18 September 2010

Who wrote Unicode, and just what were they thinking of at the time?

Is it for content or style? ASCII was essentially defined in terms of data values, and appearance was left to the character set of the host computer.

This carried through to the basic Latin character-handling in Unicode, so that there is no difference in, for example, the Anglophone 7 and the continental 7, with a bar through it; likewise the two forms of Z. That's fine.

But then comes the inconsistency... we hit things like the pre-1950s Irish script and the rules change. Let's say I chose to write in the old Irish script, and I redefined my keyboard to let me do so. Two of the main differences will be the characters & and g: & is replaced with the Tironian et (⁊) and g is replaced with ᵹ, the "insular g".

Sadly, most software won't realise that the two mean the same thing, so my spell-checker breaks, my sort function stops putting things in alphabetical order, and other people can't search or edit my documents properly. I'm also now unable to code in C, because the logical AND operator is &&, not ⁊⁊.
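
A rough Python sketch makes the problem concrete. It only illustrates the behaviour described above: the `STYLE_FOLD` table and the `fold` helper are made-up names for this example, not part of any library, and the word list is just sample data. Standard Unicode normalisation leaves both characters untouched, so any equivalence has to be supplied by hand.

```python
import unicodedata

TIRONIAN_ET = "\u204a"  # ⁊  TIRONIAN SIGN ET
INSULAR_G = "\u1d79"    # ᵹ  LATIN SMALL LETTER INSULAR G

# Hypothetical folding table: an assumption of this sketch, mapping the
# old-script characters back to the ones most software expects.
STYLE_FOLD = str.maketrans({TIRONIAN_ET: "&", INSULAR_G: "g"})


def fold(text: str) -> str:
    """Return text with the old-script characters replaced by & and g."""
    return text.translate(STYLE_FOLD)


# Normalisation does not unify the styled and plain forms.
styled = TIRONIAN_ET + INSULAR_G
unchanged = unicodedata.normalize("NFKC", styled) == styled

# Sorting by raw code point pushes the insular-g word to the end,
# because U+1D79 sorts after every basic Latin letter.
words = ["hata", INSULAR_G + "las", "fada"]
naive = sorted(words)
folded = sorted(words, key=fold)

if __name__ == "__main__":
    print("NFKC leaves them alone:", unchanged)
    print("naive sort: ", naive)
    print("folded sort:", folded)
    print("C operator: ", fold(TIRONIAN_ET * 2))
```

Sorting with `key=fold` keeps the original spelling in the output and only uses the folded form for comparison, which is roughly what a spell-checker or search function would also need to do.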

The end result is that while Unicode appears to give more freedom, it's entirely superficial, because the user must choose between style and function, and no-one wants to limit themselves in terms of function -- which is an issue that I will go into in the near future when considering the translation of software and of web services.
