Thursday, March 04, 2010

Not all equals are born equal

The relationship between two things is something that we learn early in life. It's quite an important thing to know about, especially for understanding the world we live in.

However, the question of 'are these the same' or 'are these equal' is surprisingly tricky. It might seem rather obvious at first, but there is a surprising amount of complex mathematics to do with equality and how you define it. As it turns out there are quite a few ways of saying that things are the same, and mathematicians call any one of these ways of comparing things an equivalence relation.

Equivalence relations have three key properties:

  • Reflexive - That everything is equal to itself.
  • Symmetric - That if A is equal to B then B must be equal to A.
  • Transitive - That if A equals B and B equals C then A must equal C.
All of which might seem rather obvious, but mathematicians like to flip-flop between stating the massively obvious and the painfully complex.
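To make the three properties a bit more concrete, here is a rough sketch in JavaScript that checks them for a comparison over a small handful of values. The names isEquivalenceOn, sameParity and closeTo are made up for this illustration, and checking a few sample values only shows that a rule holds for those values; it is not a proof.

```javascript
// Checks the three properties for a comparison function `rel`
// over a small list of sample values (hypothetical helper).
function isEquivalenceOn(values, rel) {
  // Reflexive: everything is equal to itself.
  const reflexive = values.every((a) => rel(a, a));
  // Symmetric: if a equals b then b equals a.
  const symmetric = values.every((a) =>
    values.every((b) => !rel(a, b) || rel(b, a)));
  // Transitive: if a equals b and b equals c then a equals c.
  const transitive = values.every((a) =>
    values.every((b) =>
      values.every((c) => !(rel(a, b) && rel(b, c)) || rel(a, c))));
  return { reflexive, symmetric, transitive };
}

const sample = [0, 1, 2, 3, 4, 5];

// "Equal" meaning both odd or both even.
const sameParity = (a, b) => a % 2 === b % 2;
// "Equal" meaning no more than 1 apart.
const closeTo = (a, b) => Math.abs(a - b) <= 1;

const parityResult = isEquivalenceOn(sample, sameParity);
const closeResult = isEquivalenceOn(sample, closeTo);

console.log(parityResult); // all three properties hold
console.log(closeResult);  // transitive is false: 0 is close to 1, 1 is close to 2, but 0 is not close to 2
```

The second comparison is the interesting one: "close enough" feels like a kind of equality, but it fails the transitive test, so it isn't an equivalence relation.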

For example consider the following:

Maisy
=
Maisy
Quite clearly, for most sensible definitions of equality these will be the same; mostly due to them being identical.

Where it starts to get tricky is if there is some variation between the two things being compared. Initially the answer to this might appear to be "no, they aren't equal", but consider the following:

Maisy
=
Maisy
If you took a poll of people to ask if these two were equal you'd probably come up with the following options in some proportion:
  • Yes, it's the same image.
  • Yes, apart from one is a bit distorted.
  • No, they look different.
All of which are acceptable answers; it just depends on how you define equality. The first answer assumes the comparison is about the contents of the image rather than how it's presented. The second answer is the middle ground, acknowledging that the images are the same but that there is a difference in how they are presented. The final answer is the other extreme: the pictures aren't the same so they're not equal.

This all ties in very deeply with one of the foundations of computer science: the distinction between comparing the concept represented by some data and comparing the data itself. A clearer example of the distinction would be to use two pictures where the idea of equality is even more fuzzy.

Maisy
=
Maisy

As before, the answer to whether these two are equal is both yes and no. If the question being asked is "are these the same cat?" then the answer is yes; if the question is instead "are these photos the same?" then the answer is no. Usually it's obvious from the context which of these questions is being asked, and generally the question is phrased in the more explicit way.

Computers are no different. Ever since Ada Lovelace realised that numbers could be used to represent ideas, concepts and physical objects there has been the need to distinguish between the two types of equality. It's necessary to explicitly tell the computer if you're asking if the numbers are the same (asking if the photos are the same for the above example) or asking if the things the numbers represent are the same (asking if it's the same cat).

This distinction between the values and what the values are meant to represent occurs frequently in computer languages. For example, when comparing objects in the Java language the operator '==' compares the references to the objects (in effect, the numbers that identify them), whereas the 'equals()' method is used to compare the concept/item that each object represents.
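The same split can be sketched in JavaScript, using the photos from earlier. JavaScript has no built-in 'equals()' method for objects, so the functions sameCat and samePixels below are made up for this illustration, as are the photo records and their pixel values.

```javascript
// Hypothetical photo records: the pixel values are invented for illustration.
const photoA = { cat: "Maisy", pixels: [12, 200, 37] };
const photoB = { cat: "Maisy", pixels: [90, 14, 255] };
const photoACopy = { cat: "Maisy", pixels: [12, 200, 37] };
const photoAAgain = photoA; // a second name for the very same object

// "Are these the same cat?" compares what the data represents.
function sameCat(a, b) {
  return a.cat === b.cat;
}

// "Are these photos the same?" compares the data itself.
function samePixels(a, b) {
  return a.pixels.length === b.pixels.length &&
    a.pixels.every((p, i) => p === b.pixels[i]);
}

console.log(photoA === photoAAgain);         // true: one object, two names
console.log(photoA === photoACopy);          // false: two separate objects
console.log(samePixels(photoA, photoACopy)); // true: identical data
console.log(samePixels(photoA, photoB));     // false: different photos
console.log(sameCat(photoA, photoB));        // true: same cat
```

When comparing two objects, the operator only asks whether they are the very same object; any idea of "same photo" or "same cat" has to be spelled out by whoever writes the program.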

Some languages complicate this further. JavaScript, for example, has both the '==' equality operator and the '===' strict equality operator. The need for this arises because when JavaScript first came into existence the '==' operator was defined in such a way that the number 5 was equal to a piece of text consisting of the character 5. This is a little confusing, because for various reasons computers represent the character/digit 5 as the number 53, not as the number 5.

This distinction between a character/digit of 5 and the number 5 (which you get from adding 2 and 3) might seem a little strange, but consider the phrase "I ate 5 strawberries". It's necessary for there to be some way to represent each character/letter as a number and the method of representation that has become standard has the digit 5 represented by the number 53.
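A quick sketch in JavaScript shows the two side by side. The variable names are just for illustration; charCodeAt and String.fromCharCode are standard string functions.

```javascript
const sum = 2 + 3;                       // the number 5
const digitCode = "5".charCodeAt(0);     // the number used to store the character 5

console.log(sum);       // 5
console.log(digitCode); // 53

// Every character in a sentence is stored as a number.
const sentence = "I ate 5 strawberries";
const codes = Array.from(sentence, (ch) => ch.charCodeAt(0));

console.log(codes[6]);                // 53, the code for the "5" in the sentence
console.log(String.fromCharCode(53)); // "5", going back from number to character
```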

Coming back to the difference between '==' and '===', the former will say that "5" is equal to 5, because it converts the text to a number before comparing. The latter will say that they are not equal, because it does no conversion and a piece of text is never the same type of thing as a number. This can be a little confusing when first coming to JavaScript and is the main reason for writing this post: I discovered the difference and thought I should share.
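Here is the difference as a small JavaScript sketch. The variable names are only for illustration.

```javascript
const text = "5";   // a piece of text containing the character 5
const number = 5;   // the number 5

const looseResult = text == number;   // compares after converting the text to a number
const strictResult = text === number; // compares without any conversion

console.log(typeof text, typeof number); // string number
console.log(looseResult);                // true
console.log(strictResult);               // false

// Converting explicitly makes it clear which comparison is wanted.
const convertedResult = Number(text) === number;
console.log(convertedResult);            // true
```

Doing the conversion yourself and then using '===' says out loud which question is being asked, rather than leaving it to the rules of '=='.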

So the next time you're trying to work out if two things are the same, just remember to think about what you want to compare.

Creative Commons License
The words and photos on this webpage which are created by Toby Gray are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 England & Wales License.