A computer text-processing system inputs keystrokes and outputs
glyphs, small pictures that are assembled on paper or on a
computer screen. Keystrokes and glyphs do not, in general, coincide:
for example, if the system generates ligatures, then the
sequence of two keystrokes <f><i> will typically
correspond to a single glyph. Similarly, if the system shapes Arabic
glyphs in a vaguely reasonable manner, then multiple different glyphs
may correspond to a single keystroke.
The complex transformation rules from keystrokes to glyphs are usually factored into two simpler transformations, from keystrokes to characters and from characters to glyphs. You may want to think of characters as the basic unit of text that is stored e.g. in the buffer of your text editor. While the definition of a character is intrinsically application-specific, a number of standardised collections of characters have been defined.
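As a rough illustration of the second stage, the sketch below maps a sequence of characters to a list of glyph names, merging "f" followed by "i" into a single ligature glyph. The function name, the ligature table and the use of plain strings as glyphs are assumptions made for this example; they do not describe any particular system.

```python
# Toy character-to-glyph stage: one glyph per character, except that
# the pair "f", "i" is replaced by a single ligature glyph.
# LIGATURES and to_glyphs are hypothetical names used for illustration.
LIGATURES = {("f", "i"): "fi"}

def to_glyphs(characters):
    glyphs = []
    i = 0
    while i < len(characters):
        pair = tuple(characters[i:i + 2])
        if pair in LIGATURES:
            # Two characters collapse into one glyph.
            glyphs.append(LIGATURES[pair])
            i += 2
        else:
            glyphs.append(characters[i])
            i += 1
    return glyphs

print(to_glyphs("fine"))
```

The four characters of "fine" yield three glyphs, which is why the stored text and the rendered glyphs have to be treated as separate things.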
A coded character set is a set of characters together with a mapping from integer codes --- known as codepoints --- to characters. Examples of coded character sets include US-ASCII, ISO 8859-1, KOI8-R, and JIS X 0208(1990).
A coded character set need not use 8-bit integers to index characters. Many early systems used 6-bit character sets, while 16-bit (or larger) character sets are necessary for ideographic writing systems.
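To see that a codepoint only has meaning relative to a coded character set, the following sketch looks up the same 8-bit code in ISO 8859-1 and in KOI8-R. It uses Python's built-in codecs as a stand-in for the character set tables; the helper name is an assumption made for this example.

```python
def character_for(code, charset):
    """Return the character that an 8-bit code denotes in a charset."""
    return bytes([code]).decode(charset)

# The same code, interpreted under two coded character sets.
for charset in ("iso8859-1", "koi8-r"):
    character = character_for(0xC1, charset)
    print(f"{charset}: 0xC1 -> U+{ord(character):04X}")
```

In ISO 8859-1 the code 0xC1 denotes a capital A with acute accent, whereas in KOI8-R it denotes the Cyrillic small letter a.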
Traditionally, typographers speak about typefaces and founts. A typeface is a particular style or design, such as Times Italic, while a fount is a molten-lead incarnation of a given typeface at a given size.
Digital fonts come in font files. A font file contains the information necessary for generating glyphs of a given typeface, and applications using font files may access glyph information in an arbitrary order.
Digital fonts may consist of bitmap data, in which case they are said to be bitmap fonts. They may also consist of a mathematical description of glyph shapes, in which case they are said to be scalable fonts. Common formats for scalable font files are Type 1 (sometimes incorrectly called ATM fonts or PostScript fonts), TrueType and OpenType.
The glyph data in a digital font needs to be indexed somehow. How this is done depends on the font file format. In the case of Type 1 fonts, glyphs are identified by glyph names. In the case of TrueType fonts, glyphs are indexed by integers corresponding to one of a number of indexing schemes (usually Unicode --- see below).
The X11 core fonts system uses the data in a font file to generate font instances, which are collections of glyphs at a given size indexed according to a given encoding.
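One way to picture a font instance is as the composition of two tables: the encoding, which maps codes to characters, and the font's own index, which maps characters to glyphs. The sketch below assumes a hypothetical font indexed by Unicode codepoint and uses placeholder strings for glyphs; it is meant as a mental model, not as a description of how the X server is implemented.

```python
# Hypothetical font data: glyphs indexed by Unicode codepoint.
# Real fonts hold bitmaps or outlines; strings stand in for them here.
FONT_GLYPHS = {
    0x0041: "glyph-A",
    0x00C1: "glyph-Aacute",
    0x0430: "glyph-cyrillic-a",
}

def make_instance(font_glyphs, charset, size):
    """Build a table from 8-bit codes to (glyph, size) for one encoding."""
    instance = {}
    for code in range(256):
        try:
            character = bytes([code]).decode(charset)
        except UnicodeDecodeError:
            continue  # code not assigned in this coded character set
        glyph = font_glyphs.get(ord(character))
        if glyph is not None:
            instance[code] = (glyph, size)
    return instance

latin1 = make_instance(FONT_GLYPHS, "iso8859-1", 12)
koi8r = make_instance(FONT_GLYPHS, "koi8-r", 12)
print(latin1[0xC1], koi8r[0xC1])
```

The same font data gives two different instances: code 0xC1 selects a different glyph depending on the encoding the instance was built for.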
X11 core font instances are usually specified using a notation known
as the X Logical Font Description (XLFD). An XLFD starts with a
dash “-”, and consists of fourteen fields separated by dashes,
for example:
-adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1
Of particular interest are the last two fields “iso8859-1”, which
specify the font instance's encoding.
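The structure of an XLFD can be made concrete with a small parser. The sketch below splits a name into its fourteen fields and extracts the encoding from the last two; the function name and the short field labels are shorthand chosen for this example.

```python
# Short labels for the fourteen XLFD fields, in order.
XLFD_FIELDS = (
    "foundry", "family", "weight", "slant", "setwidth", "add_style",
    "pixel_size", "point_size", "resolution_x", "resolution_y",
    "spacing", "average_width", "registry", "encoding",
)

def parse_xlfd(name):
    """Split an XLFD into a dict of its fourteen fields."""
    if not name.startswith("-"):
        raise ValueError("an XLFD must start with a dash")
    values = name[1:].split("-")
    if len(values) != len(XLFD_FIELDS):
        raise ValueError("expected fourteen dash-separated fields")
    return dict(zip(XLFD_FIELDS, values))

fields = parse_xlfd(
    "-adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1")
print(fields["registry"] + "-" + fields["encoding"])
```

Note that a field may be empty, as the sixth field is here; that is why two dashes appear in a row in the name.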
A scalable font is specified by an XLFD which contains zeroes in place of some fields, here the pixel size, point size, resolution and average width:
-adobe-courier-medium-r-normal--0-0-0-0-m-0-iso8859-1
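A check for this form might look as follows. The sketch assumes that zero values in the pixel size, point size and average width fields (the seventh, eighth and twelfth) are what mark a name as scalable; the function name is hypothetical.

```python
def is_scalable_xlfd(name):
    """Guess whether an XLFD names a scalable font (assumed rule)."""
    fields = name[1:].split("-")
    pixel_size, point_size, average_width = fields[6], fields[7], fields[11]
    return pixel_size == point_size == average_width == "0"

print(is_scalable_xlfd(
    "-adobe-courier-medium-r-normal--0-0-0-0-m-0-iso8859-1"))
print(is_scalable_xlfd(
    "-adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1"))
```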
X11 font instances may also be specified by short name. Unlike an
XLFD, a short name has no structure and is simply a conventional name
for a font instance. Two short names are of particular interest, as
the server will not start if font instances with these names cannot be
opened. These are “fixed”, which specifies the fallback font to
use when the requested font cannot be opened, and “cursor”, which
specifies the set of glyphs to be used by the mouse pointer.
Short names are usually implemented as aliases to XLFDs; the
standard “fixed” and “cursor” aliases are defined in
/usr/share/font/X11/misc/fonts.alias
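To show how such aliases resolve to XLFDs, the sketch below reads alias definitions from a string. It assumes the usual two-column layout (alias, then font name), with lines starting with "!" treated as comments and double quotes around names that contain spaces. The sample entries and the function name are illustrative and are not copied from any particular installation.

```python
import shlex

# Sample text in the style of a fonts.alias file (illustrative only).
SAMPLE = """\
! lines starting with "!" are comments
fixed   -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1
"my font"  -adobe-courier-medium-r-normal--12-120-75-75-m-70-iso8859-1
"""

def parse_aliases(text):
    """Return a dict mapping each short name to the font name it aliases."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment
        parts = shlex.split(line)  # honours double-quoted names
        if len(parts) == 2:
            alias, target = parts
            aliases[alias] = target
    return aliases

aliases = parse_aliases(SAMPLE)
print(aliases["fixed"])
```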
Unicode (http://www.unicode.org) is a coded character set with the goal of uniquely identifying all characters for all scripts, current and historical. While Unicode was explicitly not designed as a glyph encoding scheme, it is often possible to use it as such.
Unicode is an open character set, meaning that codepoint assignments may be added to Unicode at any time (once specified, though, an assignment can never be changed). For this reason, a Unicode font will be sparse, meaning that it only defines glyphs for a subset of the character registry of Unicode.
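Because a Unicode font is sparse, software that uses one generally needs to find out which characters of a text the font cannot render. A minimal sketch, assuming the font's coverage is available as a set of codepoints (the function name and the sample coverage are hypothetical):

```python
def missing_codepoints(font_codepoints, text):
    """Return the sorted codepoints in text that the font lacks."""
    return sorted({ord(ch) for ch in text} - set(font_codepoints))

# Hypothetical font covering printable ASCII plus e with acute accent.
font = set(range(0x20, 0x7F)) | {0x00E9}

missing = missing_codepoints(font, "café π")
print([f"U+{cp:04X}" for cp in missing])
```

Here only the Greek letter pi falls outside the font's coverage, so an application would have to fall back to another font for that one character.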
The Unicode standard is defined in parallel with the international standard ISO 10646. Assignments in the two standards are always equivalent, and we often use the terms Unicode and ISO 10646 interchangeably.
When used in the X11 core fonts system, Unicode-encoded fonts should
have the last two fields of their XLFD set to “iso10646-1”.