Half-width kana

Half-width kana (半角カナ) are katakana characters displayed at half the width of the usual full-width form. The term refers to the katakana portion of the character set specified by JIS X 0201.

Although the official name is JIS X 0201 katakana, half-width kana is the commonly used name, and that term is used in this article.

History

ASCII is defined as a 7-bit character set and has room for 128 characters. However, since this standard was designed for the United States, it does not contain the characters and symbols (for example, the ¥ yen currency symbol) needed to represent Japanese.

JIS X 0201 was developed in 1969. Computers at that time did not have the processing power or memory needed to handle the thousands of kanji (Chinese-derived characters) used in written Japanese, so as a simplification, kanji were always represented by katakana.

Half-width kana were developed as "...the first Japanese characters encoded on computers because they are used for Japanese telegrams. As single-byte characters..." (Lunde 1999, pp. 144-145)

To make katakana fit into the space available, some compromises were made: the diacritical marks dakuten and handakuten are treated as separate characters instead of being part of the preceding character. This led to the so-called "half-width kana". These compromises still cause problems for computer programs today, and the characters are also frequently considered visually unattractive.
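One way to see the effect of the separate diacritical marks is to fold half-width kana back into full-width katakana. The sketch below assumes Unicode text and uses NFKC normalization from Python's standard library; the function name fold_halfwidth_kana is made up for this illustration. Note that NFKC also folds other compatibility characters (for example full-width Latin letters), so it is not a kana-only conversion.

```python
import unicodedata


def fold_halfwidth_kana(text: str) -> str:
    """Apply NFKC, which maps half-width kana to full-width katakana and
    merges a following dakuten or handakuten into the base character."""
    return unicodedata.normalize("NFKC", text)


# Half-width KA (U+FF76) followed by a separate half-width dakuten (U+FF9E).
half = "\uff76\uff9e"
full = fold_halfwidth_kana(half)

print(len(half), len(full))   # 2 1
print(full == "\u30ac")       # True: full-width GA is a single code point
```

The change in length (two characters in, one out) is exactly the kind of mismatch that trips up programs which count or compare characters without normalizing first.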

Half-width table

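Rows give the leading 4 bits of the byte, columns the trailing 4 bits. Only rows a to d (bytes A1 to DF) contain half-width kana; the other rows are omitted.

   0 1 2 3 4 5 6 7 8 9 a b c d e f
a    ｡ ｢ ｣ ､ ･ ｦ ｧ ｨ ｩ ｪ ｫ ｬ ｭ ｮ ｯ
b  ｰ ｱ ｲ ｳ ｴ ｵ ｶ ｷ ｸ ｹ ｺ ｻ ｼ ｽ ｾ ｿ
c  ﾀ ﾁ ﾂ ﾃ ﾄ ﾅ ﾆ ﾇ ﾈ ﾉ ﾊ ﾋ ﾌ ﾍ ﾎ ﾏ
d  ﾐ ﾑ ﾒ ﾓ ﾔ ﾕ ﾖ ﾗ ﾘ ﾙ ﾚ ﾛ ﾜ ﾝ ﾞ ﾟ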
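The layout of the table can also be reproduced programmatically. The sketch below relies on the fact that the JIS X 0201 katakana bytes A1 to DF correspond, in order, to the Unicode half-width forms U+FF61 to U+FF9F, so a fixed offset converts one to the other. The names halfwidth_char and table_rows are hypothetical helpers for this illustration.

```python
KATAKANA_FIRST = 0xA1    # first byte of the JIS X 0201 katakana area
KATAKANA_LAST = 0xDF     # last byte of the katakana area
UNICODE_OFFSET = 0xFEC0  # 0xA1 + 0xFEC0 == 0xFF61


def halfwidth_char(byte: int) -> str:
    """Map a JIS X 0201 katakana byte to its Unicode half-width character."""
    if not KATAKANA_FIRST <= byte <= KATAKANA_LAST:
        raise ValueError(f"0x{byte:02x} is outside the katakana area")
    return chr(byte + UNICODE_OFFSET)


def table_rows() -> list[str]:
    """Build rows a to d of the table: leading 4 bits, then 16 cells."""
    rows = []
    for lead in range(0xA, 0xE):
        cells = []
        for trail in range(16):
            byte = (lead << 4) | trail
            # Byte 0xA0 is not assigned, so that cell stays blank.
            cells.append(halfwidth_char(byte) if byte >= KATAKANA_FIRST else " ")
        rows.append(f"{lead:x}  " + " ".join(cells))
    return rows


for row in table_rows():
    print(row)
```

For example, halfwidth_char(0xBF) gives the half-width form of ソ, the last cell of row b.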

Half-width kana on the Internet

E-mail

Since the SMTP and NNTP protocols (used to deliver e-mail and Usenet, respectively) could formerly transmit only 7-bit data, the convention was to use ISO-2022-JP for sending e-mail in Japanese.

Half-width kana is not contained in ISO-2022-JP, so it cannot properly be included in such a message. When half-width kana is included anyway, for example by accident, it can become garbled during transmission.
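The restriction can be checked before a message is sent. The following is a minimal sketch using Python's built-in iso2022_jp codec, which covers ASCII, JIS X 0201 Roman and JIS X 0208 but not JIS X 0201 katakana; the function name fits_iso_2022_jp is an assumption made for this example.

```python
def fits_iso_2022_jp(text: str) -> bool:
    """Return True if text can be encoded as ISO-2022-JP using 7-bit bytes only."""
    try:
        encoded = text.encode("iso2022_jp")
    except UnicodeEncodeError:
        # Raised for characters outside the codec's character sets,
        # which includes half-width kana.
        return False
    return all(b < 0x80 for b in encoded)


print(fits_iso_2022_jp("\u30ab\u30ca"))  # full-width katakana KA NA: True
print(fits_iso_2022_jp("\uff76\uff85"))  # half-width katakana KA NA: False
```

A mail program could use such a check to convert half-width kana to full-width, or to pick a different encoding, rather than sending bytes the receiver may garble.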

This is no longer such a problem, since most e-mail servers today support ESMTP, whose 8BITMIME extension allows 8-bit characters to be transmitted. Alternatively, an encoding system such as Base64 can be used and declared in the message using MIME.
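The Base64 approach can be sketched with Python's standard email package. This example assumes the body is sent as UTF-8, which is a choice made for the illustration and not something the text above prescribes; build_message is a hypothetical helper name.

```python
from email.mime.text import MIMEText


def build_message(body: str) -> MIMEText:
    """Wrap body in a MIME text part; with the utf-8 charset, Python's
    email package applies Base64 as the transfer encoding."""
    return MIMEText(body, "plain", "utf-8")


msg = build_message("\uff76\uff85")  # half-width katakana KA NA
wire = msg.as_string()

print(msg["Content-Transfer-Encoding"])  # base64
print(wire.isascii())                    # True: safe for a 7-bit channel

# The receiver reverses the transfer encoding and then decodes the charset.
print(msg.get_payload(decode=True).decode("utf-8") == "\uff76\uff85")  # True
```

Because the headers name both the charset and the transfer encoding, the receiving program does not have to guess how to interpret the bytes.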

Web pages

The problems that exist in e-mail do not arise with Web pages, since HTTP accepts 8-bit characters.

A problem that does exist is that computer programs have difficulty determining whether to treat a sequence of bytes as Shift JIS, EUC-JP, or UTF-8. Hence the character encoding should be specified with an HTTP response header or a meta tag.
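The ambiguity is easy to reproduce: the same bytes are valid input to more than one decoder but yield different text. The sketch below encodes two half-width kana as Shift JIS and then decodes the bytes under three candidate encodings using Python's built-in codecs; decode_guesses is a hypothetical helper name, and undecodable bytes are replaced instead of raising an error.

```python
def decode_guesses(raw: bytes) -> dict[str, str]:
    """Decode the same bytes under several candidate encodings."""
    candidates = ("shift_jis", "euc_jp", "utf-8")
    return {name: raw.decode(name, errors="replace") for name in candidates}


original = "\uff76\uff85"            # half-width katakana KA NA
raw = original.encode("shift_jis")   # two single bytes: 0xB6 0xC5

for name, text in decode_guesses(raw).items():
    status = "matches" if text == original else "differs"
    print(f"{name}: {text!r} ({status})")
```

Only the Shift JIS decoding recovers the original text, which is why a declared charset is more reliable than guessing from the bytes.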

Misunderstanding of JIS X 0201

Strictly speaking, JIS X 0201 katakana is not half-width katakana. The standard does not define the width of characters; it defines only the code representation of the katakana characters. The term "half-width" is a holdover from older devices that displayed single-byte characters at half the width of double-byte ones. In the JIS X 0201 standard itself, the katakana characters in the code chart are printed at normal width, not half-width.

However, the misunderstanding that the standard defines "half-width" characters is widespread. People who know the standard will often say "so-called half-width kana."

See also

* Fullwidth form
* Halfwidth and Fullwidth Forms

References

Lunde, Ken. CJKV Information Processing. 1st ed. O'Reilly, 1999. pp. 144-145.


Wikimedia Foundation. 2010.
