Standard Character Set Encodings

The default GL and GR sets in Compound Text correspond to the left and right halves of ISO 8859-1 (Latin 1). As such, any legal instance of a STRING type (as defined in the ICCCM) is also a legal instance of type COMPOUND_TEXT.

[The implied initial state in ISO 2022 is defined with the sequence:

01/11 02/00 04/03    G0 and G1 in an 8-bit environment only. Designation also invokes.
01/11 02/00 04/07    In an 8-bit environment, C1 is represented as 8 bits.
01/11 02/00 04/09    Graphic character sets can be 94 or 96.
01/11 02/00 04/11    8-bit code is used.
01/11 02/08 04/02    Designate ASCII into G0.
01/11 02/13 04/01    Designate right-hand part of ISO Latin-1 into G1.
]
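The column/row notation used in these sequences (for example 01/11) names an octet by its column and row in the code table, so the octet value is 16 times the column plus the row. As a minimal sketch, assuming Python and two illustrative helper names (`octet` and `octets`, which are not part of any specification), the notation can be converted to bytes like this:

```python
def octet(colrow: str) -> int:
    """Convert column/row notation such as '01/11' to an octet value."""
    col_text, row_text = colrow.split("/")
    col, row = int(col_text), int(row_text)
    if not (0 <= col <= 15 and 0 <= row <= 15):
        raise ValueError(f"column and row must be in 0..15: {colrow!r}")
    return col * 16 + row


def octets(sequence: str) -> bytes:
    """Convert a space-separated run of column/row values to bytes."""
    return bytes(octet(token) for token in sequence.split())


# 01/11 is ESC (0x1B); the two designations in the note above become
# ESC ( B and ESC - A respectively.
ascii_into_g0 = octets("01/11 02/08 04/02")
latin1_right_into_g1 = octets("01/11 02/13 04/01")
```

With these helpers, `ascii_into_g0` is the three-octet string ESC `(` `B`, and `latin1_right_into_g1` is ESC `-` `A`.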

To define one of the approved standard character set encodings to be the GL set, one of the following control sequences is used:

01/11 02/08 {I} F          94 character set
01/11 02/04 02/08 {I} F    94N character set

To define one of the approved standard character set encodings to be the GR set, one of the following control sequences is used:

01/11 02/09 {I} F          94 character set
01/11 02/13 {I} F          96 character set
01/11 02/04 02/09 {I} F    94N character set

The "F" in the control sequences above stands for "Final character", which is always in the range 04/00 to 07/14. The "{I}" stands for zero or more "intermediate characters", which are always in the range 02/00 to 02/15, with the first intermediate character always in the range 02/01 to 02/03. The registration authority has defined an "{I} F" sequence for each registered character set encoding.

[Final characters for private encodings (in the range 03/00 to 03/15) are not permitted here in Compound Text.]
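The structure of these designation sequences can be checked mechanically. The following is a sketch only, assuming Python; the function name `parse_designation` and the shape of its return value are hypothetical choices made for illustration. It accepts exactly the five forms listed above, applies the stated ranges for intermediate and final characters, and rejects private final characters.

```python
ESC = 0x1B

# Selector octet that follows ESC (or ESC 02/04 for multi-octet sets),
# mapped to the set it defines and the kind of character set.
SELECTORS = {
    (False, 0x28): ("GL", "94"),
    (False, 0x29): ("GR", "94"),
    (False, 0x2D): ("GR", "96"),
    (True, 0x28): ("GL", "94N"),
    (True, 0x29): ("GR", "94N"),
}


def parse_designation(seq: bytes):
    """Return (target, kind, intermediates, final) for one complete sequence."""
    if len(seq) < 3 or seq[0] != ESC:
        raise ValueError("not a designation sequence")
    pos = 1
    multi = seq[pos] == 0x24  # 02/04 introduces a 94N designation
    if multi:
        pos += 1
    if pos >= len(seq) or (multi, seq[pos]) not in SELECTORS:
        raise ValueError("unsupported designation")
    target, kind = SELECTORS[(multi, seq[pos])]
    pos += 1
    start = pos
    while pos < len(seq) and 0x20 <= seq[pos] <= 0x2F:
        pos += 1
    intermediates = seq[start:pos]
    if intermediates and not 0x21 <= intermediates[0] <= 0x23:
        raise ValueError("first intermediate must be in 02/01 to 02/03")
    if pos != len(seq) - 1:
        raise ValueError("expected exactly one final character")
    final = seq[pos]
    if 0x30 <= final <= 0x3F:
        raise ValueError("private final characters are not permitted")
    if not 0x40 <= final <= 0x7E:
        raise ValueError("final character must be in 04/00 to 07/14")
    return target, kind, intermediates, final
```

For example, `parse_designation(b"\x1b(B")` yields `("GL", "94", b"", 0x42)`, while a sequence ending in a final character from column 03 raises `ValueError`.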

For GL, octet 02/00 is always defined as SPACE, and octet 07/15 (normally DELETE) is never used. For a 94-character set defined as GR, octets 10/00 and 15/15 are never used.

[This is consistent with ISO 2022.]

A 94N character set uses N octets (N > 1) for each character. The value of N is derived from the column value for F:

column 04 or 05    2 octets
column 06          3 octets
column 07          4 or more octets

In a 94N encoding, the octet values 02/00 and 07/15 (in GL) and 10/00 and 15/15 (in GR) are never used.

[The column definitions come from ISO 2022.]
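The column rule above amounts to a lookup on the high four bits of the final character. A small sketch, assuming Python and a hypothetical function name `min_octets_per_char`; because column 07 means "4 or more", the function can only report a lower bound in that case:

```python
def min_octets_per_char(final: int) -> int:
    """Minimum octets per character for a 94N set, from its final character."""
    if not 0x40 <= final <= 0x7E:
        raise ValueError("final character must be in 04/00 to 07/14")
    column = final >> 4  # high nibble is the column number
    if column in (4, 5):
        return 2
    if column == 6:
        return 3
    return 4  # column 07: four or more octets; 4 is only the lower bound
```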

Once a GL or GR set has been defined, all further octets in that range (except within control sequences and extended segments) are interpreted with respect to that character set encoding, until the GL or GR set is redefined. GL and GR sets can be defined independently; they do not have to be defined in pairs.
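To illustrate how the GL and GR definitions persist independently, here is a deliberately simplified sketch in Python. It assumes the input contains only data octets and GL/GR designation sequences (no extended segments or other control sequences, which a real decoder must handle), and it starts from the default state described at the top of this section. The function name `label_octets` is hypothetical.

```python
def label_octets(data: bytes):
    """Pair each data octet with the designation sequence in effect for it."""
    gl, gr = b"\x1b(B", b"\x1b-A"  # defaults: left and right halves of Latin 1
    labelled = []
    i = 0
    while i < len(data):
        value = data[i]
        if value == 0x1B:
            # Skip intermediates (02/00 to 02/15); the next octet is the final.
            end = i + 1
            while end < len(data) and 0x20 <= data[end] <= 0x2F:
                end += 1
            if end >= len(data):
                raise ValueError("truncated escape sequence")
            seq = data[i:end + 1]
            selector = seq[2] if seq[1] == 0x24 else seq[1]
            if selector == 0x28:
                gl = seq  # redefines GL only
            elif selector in (0x29, 0x2D):
                gr = seq  # redefines GR only
            else:
                raise ValueError("not a GL or GR designation")
            i = end + 1
            continue
        if 0x20 <= value <= 0x7F:
            labelled.append((value, gl))
        elif value >= 0xA0:
            labelled.append((value, gr))
        else:
            labelled.append((value, None))  # C0 or C1 control, not GL or GR
        i += 1
    return labelled
```

In the input `b"A\xe9\x1b-FA\xe1"`, the sequence ESC `-` `F` redefines only GR, so the second `A` is still interpreted under the default GL set while the final octet is interpreted under the newly defined GR set.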

Note that when actually using a character set encoding as the GR set, you must force the most significant bit (08/00) of each octet to be a one, so that it falls in the range 10/00 to 15/15.
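Setting the most significant bit is a bitwise OR with 0x80 (octet 08/00). A minimal sketch, assuming Python and a hypothetical helper `to_gr` that takes the 7-bit form of the characters; the range check reflects the rule that a 94-character set never uses 10/00 or 15/15 in GR:

```python
def to_gr(seven_bit: bytes, size: int = 94) -> bytes:
    """Shift 7-bit character octets into the GR range 10/00 to 15/15."""
    # A 94-character set excludes 02/00 and 07/15; a 96-character set does not.
    low, high = (0x21, 0x7E) if size == 94 else (0x20, 0x7F)
    for value in seven_bit:
        if not low <= value <= high:
            raise ValueError(f"octet 0x{value:02X} is outside the {size}-set range")
    return bytes(value | 0x80 for value in seven_bit)
```

For instance, the 7-bit octets 03/00 02/01 become 11/00 10/01 when used in GR.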

[Control sequences to specify character set encoding revisions (as in section 6.3.13 of ISO 2022) are not used in Compound Text. Revision indicators do not appear to provide useful information in the context of Compound Text. The most recent revision can always be assumed, since revisions are upward compatible.]