Duplicate characters in Unicode

Duplicate characters in Unicode

Unicode has a certain amount of duplication of characters. These are pairs of single Unicode code points that are canonically equivalent. The reason for this are compatibility issues with legacy systems.

Unless two characters are canonically equivalent, they are not "duplicate" in the narrow sense. There is, however, room for disagreement on whether two Unicode characters really encode the same grapheme in cases such as the "micro sign" µ vs. the Greek μ.

This should be clearly distinguished from Unicode characters that are rendered as identical glyphs or near-identical glyphs (homoglyphs), either because they are historically cognate (such as Greek Η vs. Latin H) or because of coincidental similarity (such as Greek Ρ vs. Latin P, or Greek Η vs. Cyrillic Н, or the astronomical symbol for "Sun" ☉ vs. "circled dot operator" ⊙ vs. the Gothic letter Wikimedia Foundation. 2010.

Игры ⚽ Поможем написать курсовую

Look at other dictionaries:

  • Cyrillic characters in Unicode — Cyrillic script Slavic letters А Б В Г Ґ Д …   Wikipedia

  • Unicode — For the 1889 Universal Telegraphic Phrase book, see Commercial code (communications). The Unicode official logo since October 2009 …   Wikipedia

  • Unicode character property — Unicode assigns character properties to each code point.[1] These properties can be used to handle characters (code points) in processes, like in line breaking, script direction right to left or applying controls. Slightly inconsequently, some… …   Wikipedia

  • Unicode equivalence — is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character… …   Wikipedia

  • Unicode font — A Unicode font (also known as UCS font and Unicode typeface) is a computer font that contains a wide range of characters, letters, digits, glyphs, symbols, ideograms, logograms, etc., which are collectively mapped into the standard Universal… …   Wikipedia

  • Unicode symbols — v · Character Types Scripts Unihan ideographs, etc. Phonetic characters Punctuation and separators Diacritics and other marks Symbols Numerals Compatibility characters …   Wikipedia

  • Mapping of Unicode characters — Unicode’s Universal Character Set has a potential capacity to support over 1 million characters. Each UCS character is mapped to a code point which is an integer between 0 and 1,114,111 used to represent each character within the internal logic… …   Wikipedia

  • Phonetic symbols in Unicode — Unicode supports several phonetic scripts and notations through the existing writing systems and the addition of extra blocks with phonetic characters. These phonetic extras are derived of an existing script, usually Latin, Greek or Cyrillic. In… …   Wikipedia

  • Unicode numerals — Numerals (often called numbers in Unicode) are characters that denote a number. The same Arabic Indic numerals are used widely in various writing systems throughout the world and all share the same semantics for denoting numbers, However, the… …   Wikipedia

  • Mapping of Unicode graphic characters — Main article: Mapping of Unicode characters By far the most common Unicode characters are graphical characters. Graphical characters all have some visual representation or glyphs associated with them. While Unicode does not specify the concrete… …   Wikipedia

Share the article and excerpts

Direct link
Do a right-click on the link above
and select “Copy Link”