ISO Character Sets (XML in a Nutshell, 2nd Edition)


5.6. ISO Character Sets

Unicode has only recently become popular. Previously, the space and processing costs associated with Unicode files prompted vendors to prefer smaller, single-byte character sets that could handle English and a few other languages of interest, but not the full panoply of human language. The International Organization for Standardization (ISO) has standardized 15 of these character sets as ISO standard 8859. In all of these single-byte character sets, characters 0 through 127 are identical to the ASCII character set, characters 128 through 159 are the C1 controls, and characters 160 through 255 are the additional characters needed for scripts such as Greek, Cyrillic, and Turkish.
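The shared lower half and the differing upper half can be checked with a short Python sketch. It relies on the codecs that ship with Python's standard library; the four encodings and the byte 0xF0 are picked only for illustration, and the variable names are not from the book.

```python
# Four ISO 8859 parts, chosen for illustration:
# Latin-1, Cyrillic, Greek, and Latin-5 (Turkish).
encodings = ["iso-8859-1", "iso-8859-5", "iso-8859-7", "iso-8859-9"]

# Bytes 0 through 127 decode to the same characters as ASCII in every set.
ascii_bytes = bytes(range(128))
ascii_text = ascii_bytes.decode("ascii")
same_lower_half = all(ascii_bytes.decode(enc) == ascii_text for enc in encodings)

# A single byte from the 160-255 range means something different in each set.
upper_byte = bytes([0xF0])
upper_half = {enc: upper_byte.decode(enc) for enc in encodings}

print(same_lower_half)
for enc, char in upper_half.items():
    print(f"{enc}: U+{ord(char):04X}")
```

The same byte value thus names four different characters, which is why a document must say which of these sets it uses.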

ISO-8859-1 (Latin-1)

Various national standards bodies have produced other character sets to cover scripts and languages of interest within their geographic and political boundaries. For example, the Korea Industrial Standards Association developed the KS C 5601-1992 standard for encoding Korean. These national standard character sets can be used in XML documents as well, provided that you include the proper encoding declaration in the document and your parser knows how to translate these character sets into Unicode.
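As a rough illustration of the two conditions named above, the following Python sketch builds a small document with an encoding declaration and translates it into Unicode before parsing. It assumes EUC-KR as the byte encoding of the Korean standard's repertoire, and the `greeting` element is a made-up example. Because not every parser can read a multi-byte national encoding directly, the sketch decodes the bytes itself with Python's `euc_kr` codec, standing in for the translation step a suitably equipped parser would perform.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the encoding declaration names the character set
# in which the bytes of the file are actually stored.
source = '<?xml version="1.0" encoding="EUC-KR"?>\n<greeting>한국어</greeting>'
document_bytes = source.encode("euc_kr")

# Translate the national character set into Unicode, then parse.
# ElementTree accepts an already-decoded string here.
unicode_text = document_bytes.decode("euc_kr")
root = ET.fromstring(unicode_text)

print(root.tag, root.text)
```

If the declaration were missing or named the wrong character set, a parser reading the raw bytes would have no reliable way to recover the Korean text.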


Copyright © 2002 O'Reilly & Associates. All rights reserved.