Documentation/admin-guide/unicode.rst

Source file repositories/reference/linux-study-clean/Documentation/admin-guide/unicode.rst

File Facts

System
Linux kernel
Corpus path
Documentation/admin-guide/unicode.rst
Extension
.rst
Size
7254 bytes
Lines
189
Domain
Support Tooling And Documentation
Bucket
Documentation
Inferred role
Support Tooling And Documentation: documentation
Status
atlas-only

Why This File Exists

Repository support layer: documentation, build tooling, samples, user-space helper tools, generated initramfs support, licenses, and validation utilities.

Dependency Surface

Detected Declarations

Annotated Snippet

Unicode support
===============

		 Last update: 2005-01-17, version 1.4

Note: The original version of this document, which was maintained at
lanana.org as part of the Linux Assigned Names And Numbers Authority
(LANANA) project, is no longer existent.  So, this version in the
mainline Linux kernel is now the maintained main document.

Introduction
------------

The Linux kernel code has been rewritten to use Unicode to map
characters to fonts.  By downloading a single Unicode-to-font table,
both the eight-bit character sets and UTF-8 mode are changed to use
the font as indicated.

This changes the semantics of the eight-bit character tables subtly.
The four character tables are now:

=============== =============================== ================
Map symbol	Map name			Escape code (G0)
=============== =============================== ================
LAT1_MAP	Latin-1 (ISO 8859-1)		ESC ( B
GRAF_MAP	DEC VT100 pseudographics	ESC ( 0
IBMPC_MAP	IBM code page 437		ESC ( U
USER_MAP	User defined			ESC ( K
=============== =============================== ================

In particular, ESC ( U is no longer "straight to font", since the font
might be completely different than the IBM character set.  This
permits for example the use of block graphics even with a Latin-1 font
loaded.

Note that although these codes are similar to ISO 2022, neither the
codes nor their uses match ISO 2022; Linux has two 8-bit codes (G0 and
G1), whereas ISO 2022 has four 7-bit codes (G0-G3).

In accordance with the Unicode standard/ISO 10646 the range U+F000 to
U+F8FF has been reserved for OS-wide allocation (the Unicode Standard
refers to this as a "Corporate Zone", since this is inaccurate for
Linux we call it the "Linux Zone").  U+F000 was picked as the starting
point since it lets the direct-mapping area start on a large power of
two (in case 1024- or 2048-character fonts ever become necessary).
This leaves U+E000 to U+EFFF as End User Zone.

[v1.2]: The Unicodes range from U+F000 and up to U+F7FF have been
hard-coded to map directly to the loaded font, bypassing the
translation table.  The user-defined map now defaults to U+F000 to
U+F0FF, emulating the previous behaviour.  In practice, this range
might be shorter; for example, vgacon can only handle 256-character
(U+F000..U+F0FF) or 512-character (U+F000..U+F1FF) fonts.


Actual characters assigned in the Linux Zone
--------------------------------------------

In addition, the following characters not present in Unicode 1.1.4
have been defined; these are used by the DEC VT graphics map.  [v1.2]
THIS USE IS OBSOLETE AND SHOULD NO LONGER BE USED; PLEASE SEE BELOW.

====== ======================================
U+F800 DEC VT GRAPHICS HORIZONTAL LINE SCAN 1
U+F801 DEC VT GRAPHICS HORIZONTAL LINE SCAN 3
U+F803 DEC VT GRAPHICS HORIZONTAL LINE SCAN 7
U+F804 DEC VT GRAPHICS HORIZONTAL LINE SCAN 9
====== ======================================

The DEC VT220 uses a 6x10 character matrix, and these characters form

Annotation

Implementation Notes