Sunday, 29 April 2012

PHP - Some strings are more equal than others

You may have read about the PHP programming language recently, when it was found that if you compare the two strings "9223372036854775807" and "9223372036854775808" with the == operator, PHP reports them as equal. Both strings look like numbers, so == compares them numerically, and the values are too large to be told apart once they are converted to floating point. Most of the time PHP does the right thing, but you need to be careful about these exceptions to the rule.

This was reported as a bug to the people who maintain PHP, but they responded that regarding these two strings as equal was really the correct thing to do. Programmers who feel these two strings should be treated as different should instead use the === operator. This operator checks if two strings are equal, but this time, means it!
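To see the difference between the two operators for yourself, here is a minimal sketch. The variable names are mine, and I've used a smaller pair of numeric-looking strings ("1e3" and "1000") because the answer for that pair doesn't depend on integer size. Newer PHP releases may give a different answer for the exact pair of 19-digit strings above, so the sketch only applies === to those.

```php
<?php
// Two strings that differ as text but describe the same number.
$a = "1e3";
$b = "1000";

// == sees two numeric strings and compares them as numbers.
$loose = ($a == $b);

// === compares the strings exactly as written.
$strict = ($a === $b);

// The pair from the bug report: identical except for the last digit.
$x = "9223372036854775807";
$y = "9223372036854775808";
$strictBig = ($x === $y);

var_dump($loose);     // bool(true)
var_dump($strict);    // bool(false)
var_dump($strictBig); // bool(false)
```

If you really do want to compare strings as strings, === (or strcmp) is the one to reach for.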

But this isn't the end of the story...

While === is fine for strings containing only digits, there's a little-known feature of Unicode where you can express an accented letter either as a single character such as 'é' (U+00E9), or as a regular ASCII 'e' (U+0065) followed by a special combining character (U+0301) which means "put an accent on that last character". If you want to compare two strings that are the same except that they use different ways of expressing an 'é', you need to add another equal sign and use ==== to differentiate them, as === will see them as equal.

There's a similar rule about the Unicode smiley face character '☺' (U+263A) and the more familiar colon-bracket smiley ':)'. These will compare equal unless you use the ==== operator. As well as that, all of these comparison operators see both the white smiley face '☺' and black smiley face '☻' (U+263B) as identical, unless php.ini has the 'Racist' setting switched on.


Even the ==== operator isn't the end of the matter. It can't tell the difference between serif and sans-serif text. Most programmers are happy to treat these as equivalent, but if the text is highly secure, you need the ===== operator, which knows that a serif 'A' and a sans-serif 'A' are different.

But the ultimate equality operator is the six-equals-sign ====== operator. As I write this, no one has found two values where x======y returns true, even when x and y are copies. Some mathematicians suspect there are no such pairs of values, but a mathematical proof remains elusive.

Picture credits:
'Equal in stature' by Kevin Dooley (CC-BY)
'Equal Opportunity Employment' by flickr user 'pasukaru76' (CC-BY)

2 comments:

  1. If you posted this on April 1st I wouldn't believe you :)

  2. For two moments I thought that's real, but then I saw the label 'Satire' ;-) For background, I'm a regular PHP programmer, but it's true, comparison in PHP is a real pain in the ass...
