 Entropy encoding

In information theory an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium.
One of the main types of entropy coding creates and assigns a unique prefixfree code to each unique symbol that occurs in the input. These entropy encoders then compress data by replacing each fixedlength input symbol by the corresponding variablelength prefixfree output codeword. The length of each codeword is approximately proportional to the negative logarithm of the probability. Therefore, the most common symbols use the shortest codes.
According to Shannon's source coding theorem, the optimal code length for a symbol is −log_{b}P, where b is the number of symbols used to make output codes and P is the probability of the input symbol.
Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code may be useful. These static codes include universal codes (such as Elias gamma coding or Fibonacci coding) and Golomb codes (such as unary coding or Rice coding).
Entropy as a measure of similarity
Besides using entropy encoding as a way to compress digital data, an entropy encoder can also be used to measure the amount of similarity between streams of data. This is done by generating an entropy coder/compressor for each class of data; unknown data is then classified by feeding the uncompressed data to each compressor and seeing which compressor yields the highest compression. The coder with the best compression is probably the coder trained on the data that was most similar to the unknown data.
External links
 Online textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, gives an accessible introduction to Shannon theory and data compression, including the Huffman coding and arithmetic coding.
Data compression methods Information theory Lossless Entropy encodingShannon–Fano · Shannon–Fano–Elias · Huffman · Adaptive Huffman · Arithmetic · Range · Golomb · Universal (Gamma · ExpGolomb · Fibonacci · Levenshtein)RLE · Byte pair encoding · DEFLATE · Lempel–Ziv (LZ77/78 · LZSS · LZW · LZWL · LZO · LZMA · LZX · LZRW · LZJB · LZS · LZT · ROLZ) · Statistical Lempel ZivOthersAudio Audio codec partsOthersImage TermsMethodsOthersVideo TermsVideo characteristics · Frame · Frame rate · Interlace · Frame types · Video quality · Video resolutionOthersSee Compression formats for formats and Compression software implementations for codecs Categories: Lossless compression algorithms
 Entropy and information
Wikimedia Foundation. 2010.
Look at other dictionaries:
Entropy (information theory) — In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information… … Wikipedia
Entropy (disambiguation) — Additional relevant articles may be found in the following categories: Thermodynamic entropy Entropy and information Quantum mechanical entropy Entropy, in thermodynamics, is a measure of the energy in a thermodynamic system not available to do… … Wikipedia
Entropy (general concept) — In many branches of science, entropy refers to a certain measure of the disorder of a system. Entropy is particularly notable as it has a broad, common definition that is shared across physics, mathematics and information science. Although the… … Wikipedia
entropy — Synonyms and related words: EDP, abeyance, aloofness, amorphia, amorphism, amorphousness, anarchy, apathy, bit, blurriness, catalepsy, catatonia, channel, chaos, communication explosion, communication theory, confusion, data retrieval, data… … Moby Thesaurus
Delta encoding — Not to be confused with Elias delta coding. Delta encoding is a way of storing or transmitting data in the form of differences between sequential data rather than complete files; more generally this is known as data differencing. Delta encoding… … Wikipedia
Byte pair encoding — or digram coding[1] is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data … Wikipedia
Truncated binary encoding — is an entropy encoding typically used for uniform probability distributions with a finite alphabet. It is parameterized by an alphabet with total size of number n . It is a slightly more general form of binary encoding when n is not a power of… … Wikipedia
Incremental encoding — Incremental encoding, also known as front compression, back compression, or front coding, is a type of delta encoding compression algorithm whereby common prefixes or suffixes and their lengths are recorded so that they need not be duplicated.… … Wikipedia
Principle of maximum entropy — This article is about the probability theoretic principle. For the classifier in machine learning, see maximum entropy classifier. For other uses, see maximum entropy (disambiguation). Bayesian statistics Theory Bayesian probability Probability… … Wikipedia
JPEG — For other uses, see JPEG (disambiguation). Joint Photographic Experts Group A photo of a cat compressed with successively more lossy compression ratios from right to left Filename extension .jpg … Wikipedia