Entropy encoding

In information theory, an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium.

One of the main types of entropy coding creates and assigns a unique prefix-free code to each unique symbol that occurs in the input. These entropy encoders then compress data by replacing each fixed-length input symbol with the corresponding variable-length prefix-free output codeword. The length of each codeword is approximately proportional to the negative logarithm of the symbol's probability, so the most common symbols use the shortest codes.
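
To make this concrete, the following is a minimal Huffman-style sketch in Python (an illustration of the idea, not drawn from any particular library): it builds a prefix-free code in which more frequent symbols receive shorter codewords.

    import heapq
    from collections import Counter

    def huffman_code(data: str) -> dict:
        """Build a prefix-free code; frequent symbols get shorter codewords."""
        freq = Counter(data)
        # Heap entries: (weight, tie-breaker, {symbol: codeword-so-far}).
        heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)   # the two lightest subtrees
            w2, _, c2 = heapq.heappop(heap)
            # Prefix a distinguishing bit to every codeword in each subtree.
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (w1 + w2, counter, merged))
            counter += 1
        return heap[0][2]

    print(huffman_code("abracadabra"))
    # e.g. {'a': '0', 'c': '100', 'd': '101', 'b': '110', 'r': '111'}

Note that no codeword is a prefix of another, so the concatenated bit stream can be decoded unambiguously without separators.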

According to Shannon's source coding theorem, the optimal code length for a symbol is −log_b(P), where b is the number of symbols used to make output codes and P is the probability of the input symbol.
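
For example, for a source with P(a) = 1/2, P(b) = 1/4, P(c) = P(d) = 1/8 and a binary code alphabet (b = 2), the optimal lengths are 1, 2, 3, and 3 bits, and the expected code length equals the source's Shannon entropy of 1.75 bits per symbol. A quick check in Python:

    import math

    # Optimal code lengths in bits (b = 2, so logarithms are base 2).
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    for sym, p in probs.items():
        print(sym, -math.log2(p))            # a 1.0, b 2.0, c 3.0, d 3.0

    # Expected code length = sum of p * (-log2 p) = Shannon entropy.
    print(sum(-p * math.log2(p) for p in probs.values()))   # 1.75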

Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code may be useful. These static codes include universal codes (such as Elias gamma coding or Fibonacci coding) and Golomb codes (such as unary coding or Rice coding).
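
These static codes are simple enough to state directly. Below are illustrative Python sketches of unary, Elias gamma, and Rice coding, written from their standard definitions rather than taken from any particular library:

    def unary(n: int) -> str:
        """Unary code: n zeros terminated by a one (one common convention)."""
        return "0" * n + "1"

    def elias_gamma(n: int) -> str:
        """Elias gamma code for n >= 1: a run of zeros announcing the
        length, then the binary representation of n itself."""
        binary = bin(n)[2:]                      # e.g. n = 9 -> '1001'
        return "0" * (len(binary) - 1) + binary  # -> '0001001'

    def rice(n: int, k: int) -> str:
        """Rice code with parameter k (a Golomb code with M = 2**k):
        quotient in unary, remainder in k binary digits."""
        q, r = n >> k, n & ((1 << k) - 1)
        return unary(q) + format(r, f"0{k}b")

    print(elias_gamma(9))   # 0001001
    print(rice(9, 2))       # quotient 2, remainder 1 -> 00101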

Entropy as a measure of similarity

Besides compressing digital data, an entropy encoder can also be used to measure the similarity between streams of data. This is done by building an entropy coder/compressor for each class of data; unknown data is then classified by feeding the uncompressed data to each compressor and seeing which yields the best compression. The coder with the best compression ratio was probably trained on the data most similar to the unknown data.
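
A rough sketch of this classification-by-compression idea, using zlib's preset-dictionary feature as a stand-in for a coder trained on each class (the class names and sample data here are invented for illustration, and zlib is a dictionary coder with a Huffman back end rather than a pure entropy coder, so this only approximates the scheme described above):

    import zlib

    def classify(unknown: bytes, classes: dict) -> str:
        """Return the class whose sample data compresses the unknown best."""
        def compressed_size(sample: bytes) -> int:
            # Prime the compressor with the class sample
            # (zlib's window holds at most 32 KiB of history).
            comp = zlib.compressobj(zdict=sample[-32768:])
            return len(comp.compress(unknown) + comp.flush())
        return min(classes, key=lambda name: compressed_size(classes[name]))

    classes = {
        "english": b"the quick brown fox jumps over the lazy dog " * 50,
        "digits":  b"0123456789" * 200,
    }
    print(classify(b"a lazy dog chases a quick brown fox", classes))
    # expected: english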
