Monday, April 21, 2008


Now, two different but similar explanations can be given.

The first is that the string is saved with one byte per character (ASCII), but when the file is opened again, Notepad reads those bytes in pairs as Unicode (UTF-16) characters instead of one at a time as ASCII, and that messes it up. Here's the example:

Take "bush hid the facts". The hex codes for the string (you can see them with any hex editor) are:

62 75 73 68 20 68 69 64 20 74 68 65 20 66 61 63 74 73
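If you don't want to install a hex editor, a couple of lines of Python produce the same codes. This is just a quick sketch; note that the separator argument to `bytes.hex()` needs Python 3.8 or later:

```python
# Encode the phrase with one byte per character (plain ASCII)
phrase = "bush hid the facts"
data = phrase.encode("ascii")

# bytes.hex(" ") prints each byte as two hex digits, separated by spaces
hex_codes = data.hex(" ")
print(len(data), "bytes:", hex_codes)
```
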

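Group the codes in pairs to make up two-byte Unicode characters (Windows stores Unicode text little-endian, so the second byte of each pair is the high byte) and you get: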

7562 6873 6820 6469 7420 6568 6620 6361 7374
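Here's a small Python sketch of that pairing step. It builds each 16-bit value by hand with `int.from_bytes(..., "little")`, so the first byte of each pair becomes the low byte (the pair 69 64, "id", reads as 6469), and then checks the result against Python's own `utf-16-le` decoder:

```python
data = "bush hid the facts".encode("ascii")

# Take the 18 bytes two at a time and read each pair as a little-endian
# 16-bit number: the second byte of the pair becomes the high byte.
code_units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
print(" ".join(f"{u:04x}" for u in code_units))

# Python's UTF-16LE decoder produces the same nine characters.
decoded = data.decode("utf-16-le")
assert [ord(ch) for ch in decoded] == code_units
```
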

If you look each of these values up in a Unicode character chart, you'll see that each one is a Chinese character: they all fall in the CJK Unified Ideographs block.
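To confirm the "Chinese" part without clicking through a chart, Python's standard `unicodedata` module can name each character. A minimal check, using U+4E00 to U+9FFF as the range of the CJK Unified Ideographs block:

```python
import unicodedata

text = "bush hid the facts".encode("ascii").decode("utf-16-le")

for ch in text:
    # True when the code point lies in the CJK Unified Ideographs block
    in_cjk_block = 0x4E00 <= ord(ch) <= 0x9FFF
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_cjk_block)
```

Each line prints a name of the form `CJK UNIFIED IDEOGRAPH-7562`.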

So the whole thing comes down to a coincidence: the 18 bytes of ASCII text can also be read as 9 valid two-byte Unicode characters. And, of course, Windows' inability to work out which encoding the file was really saved in.
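The "coincidence" can be checked directly: the byte count is even, so it splits cleanly into 16-bit units, and the very same bytes decode without error either way. Nothing in a plain text file records which reading was intended. This is a minimal demonstration only; it does not reproduce Notepad's actual detection logic, and the 17-byte string is just an illustrative variant:

```python
data = "bush hid the facts".encode("ascii")

# Both decodings succeed on exactly the same bytes, so the bytes alone
# cannot tell a program which encoding the writer meant.
as_ascii = data.decode("ascii")
as_utf16 = data.decode("utf-16-le")
print(len(data), len(as_ascii), len(as_utf16))  # 18 bytes: 18 vs 9 characters

# An odd number of bytes cannot be split into 16-bit units, so a
# strict UTF-16 reading fails outright for a 17-byte string.
try:
    "bush hid the fact".encode("ascii").decode("utf-16-le")
    odd_length_decodes = True
except UnicodeDecodeError:
    odd_length_decodes = False
print("17-byte string decodes as UTF-16:", odd_length_decodes)
```
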

The second explanation is slightly different, but the basics are the same: the difference between ASCII and Unicode. It's just a matter of Notepad's defaults. When you save the file, the "Encoding" drop-down is set to ANSI, so by default Notepad saves as ANSI. But if you do File -> Open, the default encoding is set to Unicode. That's exactly what happens when you double-click a saved file: Notepad knows the path, but not the encoding, so it uses the default Unicode encoding, which produces the Chinese characters explained above.
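The save/open mismatch can be simulated with a temporary file: write the text with a single-byte "ANSI" code page, then read it back as UTF-16 little-endian, which is what Notepad calls "Unicode". Using `cp1252` for ANSI is an assumption (it is the ANSI code page on US English and Western European Windows); for plain lowercase letters and spaces it writes the same bytes as ASCII.

```python
import os
import tempfile

phrase = "bush hid the facts"

# "Save as ANSI": one byte per character (cp1252 assumed as the ANSI code page)
with tempfile.NamedTemporaryFile("w", encoding="cp1252", suffix=".txt", delete=False) as f:
    f.write(phrase)
    path = f.name

try:
    # "Open as Unicode": the same bytes read back as UTF-16 little-endian
    with open(path, encoding="utf-16-le") as f:
        reopened = f.read()
finally:
    os.remove(path)

print(len(phrase), "characters saved,", len(reopened), "characters read back")
print(" ".join(f"U+{ord(ch):04X}" for ch in reopened))
```
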

And that's about it. No easter eggs, no conspiracies, no Bush interventions. Just plain old Microsoft.

It is a bug in notepad.exe, known as the "4335 series" bug: a line made of words with 4, 3, 3 and 5 letters will always cause the same problem.
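One way to see why the word lengths matter: with words of 4, 3, 3 and 5 letters, the three spaces land at byte offsets 4, 8 and 12. Those offsets are all even, so in a UTF-16LE reading every space becomes a low byte and every high byte is a letter. For lowercase letters (0x61 to 0x7A) that puts each code unit between 0x6100 and 0x7AFF, inside the CJK ideograph block. The helper names below are hypothetical, and the sketch only checks this byte layout; it is not Notepad's detection code. The second phrase is an illustrative reshuffle of the same words.

```python
def spaces_on_even_offsets(phrase: str) -> bool:
    """Hypothetical helper: True if the phrase has an even byte count and every
    space sits at an even offset, i.e. in the low byte of a UTF-16LE unit."""
    return len(phrase) % 2 == 0 and all(i % 2 == 0 for i, ch in enumerate(phrase) if ch == " ")


def all_units_cjk(phrase: str) -> bool:
    """Hypothetical helper: True if every UTF-16LE unit lands in U+4E00..U+9FFF.
    Assumes an even-length ASCII phrase."""
    text = phrase.encode("ascii").decode("utf-16-le")
    return all(0x4E00 <= ord(ch) <= 0x9FFF for ch in text)


for phrase in ["bush hid the facts", "the bush hid facts"]:
    lengths = [len(word) for word in phrase.split()]
    print(phrase, lengths, spaces_on_even_offsets(phrase), all_units_cjk(phrase))
```

In the reshuffled phrase the first space sits at offset 3, so one unit becomes 0x2065 (bytes 65 20), which is outside the CJK block. The check only looks at byte layout, so other word-length patterns with the same parity (an even-length first word followed by odd-length words) place the spaces the same way.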

