TrueType Typography
TrueType and Unicode
The Unicode Consortium's web site


The Unicode character encoding standard is a fixed-width, uniform text and character encoding scheme. It includes characters from the world's scripts, as well as technical symbols in common use. The Unicode standard is modeled on the ASCII character set. Since ASCII's 7-bit character size is inadequate to handle multilingual text, the Unicode Consortium adopted a 16-bit architecture which extends the benefits of ASCII to multilingual text. Unicode characters are consistently 16 bits wide, regardless of language, so no escape sequence or control code is required to specify any character in any language. Unicode character encoding treats symbols, alphabetic characters, and ideographic characters identically, so that they can be used simultaneously and with equal facility. Computer programs that use Unicode character encoding to represent characters but do not display or print text can (for the most part) remain unaltered when new scripts or characters are introduced.
    From the introduction to "The Unicode Standard Version 1.0", Addison-Wesley 1991

Unicode characters are indicated by U+nnnn, where nnnn is a four-digit hexadecimal number from 0000 to FFFF; the last two values, U+FFFE and U+FFFF, are reserved and never assigned to characters. Printable ASCII codes are thus represented by U+0020 (space) to U+007E (tilde).
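As a small illustration of the U+nnnn notation, the following Python sketch converts between a character and its label. The function names u_notation and from_u_notation are made up for this example and are not part of any library.

```python
def u_notation(ch: str) -> str:
    """Return the U+nnnn label for a single character (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # At least four uppercase hexadecimal digits, zero-padded on the left.
    return f"U+{ord(ch):04X}"


def from_u_notation(label: str) -> str:
    """Return the character named by a U+nnnn label (hypothetical helper)."""
    if not label.upper().startswith("U+"):
        raise ValueError("label must start with 'U+'")
    return chr(int(label[2:], 16))


if __name__ == "__main__":
    # The two ends of the printable ASCII range, plus a letter in between.
    for ch in (" ", "A", "~"):
        print(repr(ch), u_notation(ch))
```
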

Unicode in TrueType

TrueType fonts for use on Microsoft platforms are expected to contain a Unicode-based character mapping table (part of the 'cmap' table in the file). With the format 4 subtable used for this, 16-bit characters are mapped to 16-bit glyph numbers using an efficient encoding scheme.
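The lookup that a format 4 subtable performs can be sketched roughly as follows. This is an illustration only: it assumes the subtable has already been parsed into Python lists named after the fields in the 'cmap' specification (endCode, startCode, idDelta, idRangeOffset, glyphIdArray), and the sample segment data is invented rather than taken from any real font.

```python
def format4_lookup(code, end_codes, start_codes, id_deltas,
                   id_range_offsets, glyph_ids):
    """Map a 16-bit character code to a glyph number; 0 means 'missing glyph'."""
    seg_count = len(end_codes)
    # Segments are sorted by end code; find the first one that could hold `code`.
    for i in range(seg_count):
        if end_codes[i] >= code:
            break
    else:
        return 0
    if start_codes[i] > code:
        return 0  # the code falls in a gap between segments
    if id_range_offsets[i] == 0:
        # Simple case: glyph number is the code plus a delta, modulo 65536.
        return (code + id_deltas[i]) & 0xFFFF
    # Indirect case: idRangeOffset is a byte offset measured from its own
    # position in the table; convert it to an index into glyphIdArray.
    index = id_range_offsets[i] // 2 + (code - start_codes[i]) - (seg_count - i)
    if index < 0 or index >= len(glyph_ids):
        return 0
    glyph = glyph_ids[index]
    if glyph == 0:
        return 0
    return (glyph + id_deltas[i]) & 0xFFFF


# Invented sample data: three segments, the last being the required
# terminating segment at 0xFFFF.
END_CODES = [0x007E, 0x00A2, 0xFFFF]
START_CODES = [0x0020, 0x00A0, 0xFFFF]
ID_DELTAS = [-29, 0, 1]
ID_RANGE_OFFSETS = [0, 4, 0]
GLYPH_IDS = [100, 0, 102]


def lookup(code):
    """Look up `code` in the invented sample table above."""
    return format4_lookup(code, END_CODES, START_CODES, ID_DELTAS,
                          ID_RANGE_OFFSETS, GLYPH_IDS)


if __name__ == "__main__":
    for code in (0x0020, 0x0041, 0x007F, 0x00A0, 0x00A1, 0x00A2):
        print(f"U+{code:04X} -> glyph {lookup(code)}")
```

The scheme is compact because a contiguous run of characters whose glyphs are also numbered contiguously needs only one segment, however long the run is. Only irregular runs fall back on the explicit glyph array.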

Fonts of characters not in Unicode should use character codes starting at F000, which lies in Unicode's Private Use Area.

In this terminology, characters are semantic entities that mean exactly the same regardless of the font style, whereas glyphs are stylistic things, the actual images that are printed. In many situations, the distinction between characters and glyphs is blurry, but in character mappings it is useful to separate them.

However, full Unicode support for the computer user depends on much more than the internal layout of his or her TrueType fonts. It relies on the operating system providing function libraries that accept Unicode 2-byte strings, and on applications that use these new functions. So far, Windows NT and Java have good Unicode support internally. Windows 95 does not, although it was intended to until near its launch. (Indeed, there are books on internationalizing your Windows programs that assume the release version of Windows 95 would have Unicode functionality: the beta versions did!) No major applications yet support Unicode.

Unicode Fonts?

There's no definition of what constitutes a Unicode font. However, the following fonts cover enough of "Unicode space" to be considered usable in many environments where a single font must work for many languages and symbol systems.
  • Lucida Sans Unicode, by Bigelow & Holmes, was supplied with some versions of Windows NT. It contains 1775 glyphs.

  • Times New Roman, Arial, Courier New (4 styles each) and Impact are available free from Microsoft's web site. These fonts contain the WGL4 character set, which handles most major Latin-alphabet, Greek and Cyrillic languages. They each contain 653 glyphs, apart from Times New Roman, which contains a little surprise in the 654th glyph.

  • Bitstream Cyberbit has been released as a TrueType font for Windows 95 and NT. It contains over 8500 characters, and the Roman style is free! As well as supporting all major European languages (including Greek, Russian and Turkish), Cyberbit also supports Hebrew, Japanese, Korean, Simplified Chinese and Traditional Chinese. Bitstream also offer customized versions of Cyberbit. The design of the font is Univers (not Dutch 801 as it says on the website), extended with the help of DynaLab of Taiwan.

  • TrueType GX fonts typically contain lots of characters, but they do not necessarily contain a Unicode mapping table. Contact your GX font supplier, who should be able to tell you such things.
