GBK


GBK is an extension of the GB2312 character set for simplified Chinese characters, used in the People's Republic of China.

"GB" stands for "National Standard", while "K" stands for "Extension". GBK not only extended the old standard GB2312 with Traditional Chinese characters, but also with Chinese characters that were simplified after the establishment of GB2312 in 1981. With the arrival of GBK, certain names with characters formerly unrepresentable, like the "rong" (镕) character in former Chinese Premier Zhu Rongji's name, are now representable.

History

In 1993, the Unicode 1.1 standard was released, including 20,902 characters used in mainland China, Taiwan, Japan and Korea. Following this, China released GB13000.1-93, a national standard ("guóbiāo") equivalent of Unicode 1.1.

The GBK character set was defined in 1993 as an extension of GB2312-80, while also including the characters of GB13000.1-93 through the unused codepoints available in GB2312. Hence GBK is upward compatible with GB2312.

Microsoft implemented GBK in Windows 95 as Code Page 936. Although GBK was never an official standard, the widespread use of Windows 95 made it the "de facto" standard. GBK included all the Chinese characters defined in Unicode 1.1 and GB13000.1-93, but it used a different code table from those standards. The primary reason for GBK's existence was simply to bridge the gap between GB2312-80 and GB13000.1-93.

In 1995, the China National Information Technology Standardization Technical Committee set down the Chinese Internal Code Specification (simplified Chinese: 汉字内码扩展规范(GBK); pinyin: Hànzì Nèimǎ Kuòzhǎn Guīfàn (GBK)), Version 1.0, known as GBK 1.0, which is a slight extension of Codepage 936. The 95 newly added characters were not found in GB 13000.1-1993 and were provisionally assigned Unicode PUA code points.

Microsoft later added the euro sign to Codepage 936 and assigned the code 0x80 to it. This is not a valid code point in GBK 1.0.

In 2000, the GB18030-2000 standard was released, superseding yet maintaining compatibility with GBK 1.0. It increased the number of definitions of Chinese characters and extended the number of possible characters through the implementation of four-byte character spaces. The subset of GB 18030 consisting of one-byte and two-byte characters is sometimes also referred to as GBK. Mapping to Unicode has been slightly changed, though, as some characters are now defined in Unicode. In the most up-to-date form of the standard, GB 18030-2005, only 14 characters are still mapped to Unicode PUA.

Encoding

A character is encoded as 1 or 2 bytes. A byte in the range 00–7F is a single byte that means the same thing as it does in ASCII. Strictly speaking, there are 95 printable characters (including the space) and 33 control codes (00–1F and 7F) in this range.

A byte with the high bit set indicates that it is the first of 2 bytes. Loosely speaking, the first byte is in the range 81–FE (that is, never 80 or FF), and the second byte is 40–FE for some areas and 80–FE for others.
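
The loose rule above can be sketched as a small splitter that walks a byte string and cuts it into one-byte and two-byte units. This is only an illustration: the function name `split_gbk` is hypothetical, and the check uses the broad ranges given in the text (lead byte 81–FE, second byte 40–FE) rather than the exact per-area ranges.

```python
def split_gbk(data):
    """Split GBK-encoded bytes into 1-byte and 2-byte units (hypothetical helper)."""
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            # Single byte with the same meaning as in ASCII.
            units.append(data[i:i + 1])
            i += 1
        elif 0x81 <= lead <= 0xFE:
            # High bit set: this is the first of two bytes.
            if i + 1 >= len(data) or not 0x40 <= data[i + 1] <= 0xFE:
                raise ValueError(f"invalid second byte at offset {i + 1}")
            units.append(data[i:i + 2])
            i += 2
        else:
            # 80 and FF are never valid first bytes under this rule.
            raise ValueError(f"invalid first byte 0x{lead:02X} at offset {i}")
    return units


data = "GBK编码".encode("gbk")
units = split_gbk(data)
print([len(u) for u in units])  # [1, 1, 1, 2, 2]
```

Because a two-byte unit can only start at a byte with the high bit set, the splitter never needs to look back; it does need to start at a known character boundary, since second bytes overlap both the ASCII range and the first-byte range.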

More specifically, the following ranges of bytes are defined:

In graphical form, the following figure shows the space of all 64K possible 2-byte codes. Green and yellow areas are assigned GBK codepoints, red are for user-defined characters. The uncolored areas are invalid byte combinations.

Relationship to other encodings

The areas indicated in the previous section as GBK/1 and GBK/2, taken by themselves, are simply GB2312-80 in its usual encoding. GB2312, or more properly the EUC-CN encoding thereof, takes a pair of bytes from the range A1–FE, like any 94² ISO-2022 character set loaded into GR. This corresponds to the lower-right quarter of the illustration above. However, GB2312 does not assign any code points to the rows located at AA–AF and F8–FE, even though it had staked out the territory.

GBK added extensions to this. You can see that the two gaps were filled in with user-defined areas.

More significantly, it extended the range of the bytes. Having two-byte characters in the ISO-2022 GR range gives a limit of 94² = 8,836 possibilities. Abandoning the ISO-2022 model of strict regions for graphics and control characters, but retaining the feature of low bytes being 1-byte characters and pairs of high bytes denoting a character, you could potentially have 128² = 16,384 positions. GBK takes part of that, extending the range from A1–FE (94 choices for each byte) to 81–FE (126 choices) for the first byte and 40–FE (191 choices) for the second byte, for a total of 24,066 positions.
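
The counts in this paragraph follow directly from the sizes of the byte ranges, as this short check shows. The variable names are illustrative only, and the second-byte range is taken as the full 40–FE span stated in the text.

```python
# Number of byte values in each inclusive range named in the text.
gr_choices = len(range(0xA1, 0xFF))      # A1–FE, the ISO-2022 GR range
high_choices = len(range(0x80, 0x100))   # any byte with the high bit set
lead_choices = len(range(0x81, 0xFF))    # 81–FE, GBK first byte
trail_choices = len(range(0x40, 0xFF))   # 40–FE, GBK second byte

print(gr_choices ** 2)               # 8836
print(high_choices ** 2)             # 16384
print(lead_choices * trail_choices)  # 24066
```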

Microsoft's Code Page 936 is generally thought of as being GBK. It has bytes in the same range, and its assignments appear to match when the two are compared. However, it defines only 21,791 two-byte code points, so there must be some differences; at the very least, 95 are missing.

GBK's successor, GB18030-2000, uses the remaining range available to the second byte to further expand the number of possibilities while retaining GBK as a subset.
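
A rough way to see both properties is to compare the output of Python's bundled `gbk` and `gb18030` codecs. This is a sketch under the assumption that those codecs follow the standards for the characters chosen here; the sample characters are arbitrary.

```python
text = "镕"            # a character that GBK can represent
emoji = "\U0001F600"   # outside the Basic Multilingual Plane, so not in GBK

# A GBK character keeps the same two bytes under GB18030.
same_bytes = text.encode("gbk") == text.encode("gb18030")

# A character beyond GBK needs the four-byte form, whose second byte
# falls in 30–39, a range GBK never uses for a second byte.
four = emoji.encode("gb18030")

print(same_bytes, len(four), hex(four[1]))
```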

External links

* Microsoft Reference page for GBK: http://www.microsoft.com/globaldev/reference/dbcs/936.mspx
* Mapping of GBK to Unicode: http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT N.B.: this is Microsoft code page 936, which contains entries for 21015 code points and 32 control characters. This is not exactly the same as GBK, which has 21886 characters.
* GBK Code Table: http://www.khngai.com/chinese/charmap/tblgbk.php?page=0 N.B.: this shows the available coding space totally populated except for 2 places, for a total of 32256 glyphs (32352 with the implied single-byte ASCII codes not illustrated), which is more than 23940 or 21886.
* Evolution of GBK and GB2312 into GB18030: http://developers.sun.com/dev/gadc/technicalpublications/articles/gb18030.html
* GBK(5) man page from HP, which has a good treatment of character ranges: http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51_HTML/MAN/MAN5/0020____.HTM


Wikimedia Foundation. 2010.
