2000.01.21 07:52 "Re: Unicode build of libtiff?", by Bjorn Brox
On Wed, 19 Jan 2000, Leonard Rosenthol wrote:
Nope! UTF-8 encoding is the same as ASCII for all values <=255, so all Roman/Latin based language information would look and act the same. For other languages (CJK, being the big example) a non-UTF-8 savvy reader would simply display strange looking values but would be able to handle it OK.
The values are the same, but the encoding is different for values >127. Still only an issue for accents and such, to the best of my knowledge.
Values > 127 are not ASCII, and ASCII is definitely not enough to cover all Roman/Latin based languages.
Some Roman/Latin based languages can be covered by ISO 8859-1, also called Latin-1, and in UTF-8 the values 128..255 are encoded using two bytes.
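To make the two-byte point concrete, here is a minimal sketch in C of converting a Latin-1 string to UTF-8. The function name latin1_to_utf8 and its buffer contract are assumptions made for illustration; nothing like it is claimed to exist in libtiff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtiff): convert a NUL-terminated
 * ISO 8859-1 string to UTF-8.
 * Assumption: dst has room for at least 2 * strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t latin1_to_utf8(const unsigned char *src, unsigned char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != 0; ++src) {
        if (*src < 0x80) {
            /* 0..127: identical to ASCII, one byte. */
            dst[n++] = *src;
        } else {
            /* 128..255: two bytes, 110000xx 10xxxxxx. */
            dst[n++] = (unsigned char)(0xC0 | (*src >> 6));
            dst[n++] = (unsigned char)(0x80 | (*src & 0x3F));
        }
    }
    dst[n] = 0;
    return n;
}
```

For example, the Latin-1 byte 0xF8 (o with stroke) becomes the two bytes 0xC3 0xB8, while plain ASCII characters pass through unchanged.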
Anyhow: The TIFF standard should be changed to state that all text arguments are Unicode, stored and managed as UTF-8.
Using UTF-8 is definitely the best way to make a product support Unicode: you do not have to invent new, incompatible data types, and you can guarantee that there will be no \0's in the strings. That means these strings can still be managed as char, the str*() functions do not have to be replaced by special functions, and you don't have to depend on Unicode support in your operating system.
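As a rough illustration of the str*() point, the sketch below treats a UTF-8 string as an ordinary char array. The helper utf8_codepoints is a hypothetical name introduced here; it is only needed when a character count, rather than a byte count, is wanted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8
 * string by skipping continuation bytes (bit pattern 10xxxxxx).
 * Assumption: the input is well-formed UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;

    assert(s != NULL);

    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With the string "Bj\xC3\xB8rn", strlen() reports 6 bytes and utf8_codepoints() reports 5 characters; strcpy(), strcmp() and strchr() for ASCII characters work unchanged, since no byte of a multi-byte sequence is ever 0 or below 128.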
Bjorn Brox, CORENA Norge AS, http://www.corena.no/, ICQ 17872043
Kirkegaardsvn. 45, P.O.Box 1024, N-3601 Kongsberg, NORWAY
Phone: +47 32287435, Fax: +47 32736877, Mobile: +47 92638590