Prefix code

Prefix code

A prefix code is a code, typically a variable-length code, with the "prefix property": no code word is a prefix of any other code word in the set. A code with code words {0, 10, 11} has the prefix property; a code consisting of {0, 1, 10, 11} does not, because "1" is a prefix of both "10" and "11".

Prefix codes are also known as prefix-free codes, prefix condition codes, comma-free codes (although this is incorrect), and instantaneous codes. Although Huffman coding is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm.

Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any out-of-band markers to frame the words in the message. The recipient can decode the message unambiguously, by repeatedly finding and removing prefixes that form valid code words. This is not possible with codes that lack the prefix property, such as our example of {0, 1, 10, 11}: a receiver reading a "1" at the start of a code word would not know whether that was the complete code word "1", or merely the prefix of the code word "10" or "11".

The variable-length Huffman codes, country calling codes, the country and publisher parts of ISBNs, and the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless Standard are prefix codes. Prefix codes are also a form of entropy encoding used in lossless data compression.

Prefix codes are not error-correcting codes. In actual practice, a message might first be compressed with a prefix code, and then encoded again (with an error-correcting code) before transmission.

"This article is partly derived from Federal Standard 1037C, which uses the term comma-free code."

Techniques

Techniques for constructing a prefix code can be simple, or quite complicated.

If every word in the code has the same length, the code is called a fixed-length code. For example, ISO 8859-15 letters are always 8 bits long. UTF-32/UCS-4 letters are always 32 bits long. ATM packets are always 424 bits long. Prefixes cannot exist in a fixed-length code. Unfortunately, fixed length encodings are inefficient in situations where some words are much more likely to be transmitted than others.

Some codes mark the end of a code word with a special "comma" symbol, different from normal data. [ [http://www.imperial.ac.uk/research/hep/group/theses/JJones.pdf "Development of Trigger and Control Systems for CMS"] by J. A. Jones: "Synchronisation" p. 70] This is somewhat analogous to the period at the end of a sentence; it marks where one sentence ends and another begins. If every code word ends in a comma, and the comma does not appear elsewhere in a code word, the code is prefix-free. However, modern communication systems send everything as sequences of "1" and "0" – adding a third symbol would be expensive, and using it only at the ends of words would be inefficient. Morse code is an everyday example of a variable-length code with a comma. The long pauses between letters, and the even longer pauses between words, help people recognize where one letter (or word) ends, and the next begins. Similarly, Fibonacci coding uses a "11" to mark the end of every code word.

Huffman coding is a more sophisticated technique for constructing variable-length prefix codes. The Huffman coding algorithm takes as input the frequencies that the code words should have, and constructs a prefix code that minimizes the weighted average of the code word lengths.

Kraft's inequality characterizes the sets of code word lengths that are possible in a prefix code.

Prefix codes in use today

Examples of prefix codes include:
* country calling codes
* the country and publisher parts of ISBNs
* the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless Standard
* VCR Plus+ codes
* the UTF-8 system for encoding Unicode characters

Techniques

Commonly used techniques for constructing prefix codes include Huffman codes and the earlier Shannon-Fano codes, and universal codes such as:
* Elias delta coding
* Elias gamma coding
* Elias omega coding
* Fibonacci coding
* Levenshtein coding
* Unary coding
* Golomb Rice code

References

* P. Elias, Universal codeword sets and representations of integers, IEEE Trans. Inform. Theory 21 (2) (1975) 194-203.
*D.A. Huffman, " [http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf A method for the construction of minimum-redundancy codes] " (PDF), Proceedings of the I.R.E., Sept. 1952, pp. 1098-1102 (Huffman's original article)
* [http://www.huffmancoding.com/david/scientific.html Profile: David A. Huffman] , Scientific American, Sept. 1991, pp. 54-58 (Background story)
* Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. "Introduction to Algorithms", Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 16.3, pp.385–392.

External links

* [http://plus.maths.org/issue10/features/infotheory/index.html Codes, trees and the prefix property] by Kona Macphee


Wikimedia Foundation. 2010.

См. также в других словарях:

  • Code — redirects here. CODE may also refer to Cultural Olympiad Digital Edition. Decoded redirects here. For the television show, see Brad Meltzer s Decoded. For code (computer programming), see source code. For other uses, see Code (disambiguation).… …   Wikipedia

  • Prefix (disambiguation) — A prefix is a part of a word attached to a stem which modifies the meaning of that stem.Prefix may also refer to: *Prefix (computer science), a prefix of a string *The Prefix a mixtape by the American rapper, Lil Wayne *Numerical prefix, a prefix …   Wikipedia

  • Prefix header — In computer programming, a prefix header is a feature found in some C or C++ compilers used to simplify code and/or to reduce compilation time. Overview In the C and C++ programming languages, a header file is a file whose text is automatically… …   Wikipedia

  • Prefix — Liste des personnages d Astérix le Gaulois Voici la liste des personnages de la bande dessinée Astérix le Gaulois par René Goscinny et Albert Uderzo, classés par ordre alphabétique. Sommaire 1 Personnages du village 1.1 Abraracourcix 1.2… …   Wikipédia en Français

  • prefix — [[t]pri͟ːfɪks[/t]] prefixes 1) N COUNT A prefix is a letter or group of letters, for example un or multi , which is added to the beginning of a word in order to form a different word. For example, the prefix un is added to happy to form unhappy …   English dictionary

  • Code de procédure civile (France) — Pour les autres articles nationaux, voir Code de procédure civile. Le code de procédure civile français, souvent abrégé en CPC, est un code qui rassemble toutes les règles de procédure civile française. Article détaillé : Procédure civile en …   Wikipédia en Français

  • prefix — pre·fix || ‚prɪ fɪks n. word or word stem used at the beginning of a word; title (as Mr. or The Honorable before a person s name); dialing code, area code (for telephones) v. place before or in front of; add a prefix, add a word or word stem …   English contemporary dictionary

  • Universal code (data compression) — In data compression, a universal code for integers is a prefix code that maps the positive integers onto binary codewords, with the additional property that whatever the true probability distribution on integers, as long as the distribution is… …   Wikipedia

  • Self-synchronizing code — Not to be confused with self clocking signal. In telecommunications, a self synchronizing code[1] is a line code in which the symbol stream formed by a portion of one code word, or by the overlapped portion of any two adjacent code words, is not… …   Wikipedia

  • ZIP code — Mr. ZIP promoted the use of ZIP codes for the USPS during the 1960s and 1970s. ZIP codes are a system of postal codes used by the United States Postal Service (USPS) since 1963. The term ZIP, an acronym for Zone Improvement Plan,[1] is properly… …   Wikipedia


Поделиться ссылкой на выделенное

Прямая ссылка:
Нажмите правой клавишей мыши и выберите «Копировать ссылку»