I was talking to kaze (that’s his IRC handle) about various things over breakfast during work week, and we ended up on the topic of localization and character encoding.
I found out through him (and hopefully this will change sometime in the near future) that the Gecko engine uses a Latin character encoding rather than UTF-8, so if a developer forgets to call the code that translates between the encodings, you can end up with mojibake (unreadable characters). [Note: this also makes me wonder whether some performance overhead could be cut if we changed this… I am guessing, though, that it might be a bit messy to make that switch if no one has done it by now? Maybe I’m wrong.]
I’m not 100% sure it was because of that… but a few hours later, when I was testing the import of Facebook contacts (I have a friend whose name is written in Cyrillic characters), I found mojibake in his name in the Contacts app. I wouldn’t have thought to check for it if it weren’t for the discussion I had over breakfast with kaze.
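To make the mechanism concrete, here is a minimal Python sketch of how this kind of mojibake can arise. This is not Gecko code, and I don't know that this exact mismatch caused the bug I saw; the contact name is a made-up example, and the sketch simply assumes text stored as UTF-8 bytes gets read back as Latin-1 with no conversion step.

```python
# Hypothetical contact name written in Cyrillic (not the real contact).
name = "Иван"

# The name is stored or transmitted as UTF-8 bytes (2 bytes per Cyrillic letter).
utf8_bytes = name.encode("utf-8")

# Assumed bug: the bytes are read back as Latin-1 without any conversion.
# Latin-1 maps every byte to exactly one character, so this never raises;
# it silently produces garbage instead.
garbled = utf8_bytes.decode("latin-1")

# Because nothing was lost, reversing the wrong step recovers the original.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(garbled))
print(repaired)
```

The useful part for testing is that the wrong decode never throws an error, so the only symptom is what shows up on screen. Any non-ASCII name is a candidate test input.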
Another point we discussed was the diaeresis (http://en.wikipedia.org/wiki/Diaeresis_%28diacritic%29) and the difference between a single precomposed character and a compound character built from a base letter plus a combining mark. Special characters also cause various issues. In my past experience I’ve seen a space and a hyphen cause an issue… even a long dash versus a short dash.
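Here is a small Python sketch of the single versus compound distinction, using the standard `unicodedata` module. The letter "ë" is just my choice of example, and the list of look-alike characters is illustrative rather than exhaustive.

```python
import unicodedata

# "ë" as one precomposed code point (U+00EB).
single = "\u00eb"
# "ë" as a base letter "e" followed by a combining diaeresis (U+0308).
compound = "e\u0308"

# They render the same but compare as different strings of different length.
same_without_normalizing = (single == compound)
lengths = (len(single), len(compound))

# Normalizing both sides to the same form makes them comparable.
composed = unicodedata.normalize("NFC", compound)    # compose into one code point
decomposed = unicodedata.normalize("NFD", single)    # split into base + mark

# Look-alike dashes and spaces are distinct code points too.
lookalikes = ["-", "\u2013", "\u2014", " ", "\u00a0"]
lookalike_names = [unicodedata.name(ch) for ch in lookalikes]

print(same_without_normalizing, lengths)
print(composed == single, decomposed == compound)
print(lookalike_names)
```

For test cases, this suggests entering the same visible text in both forms (and with different dashes and spaces) wherever the app searches, sorts, or compares strings.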
Knowing the finer details of how things work allows you to create more test cases and better coverage. Knowing how things are implemented helps you assess risk areas so you can find bugs faster.