- Shannon's source coding theorem
In

information theory ,**Shannon's source coding theorem**(or**noiseless coding theorem**) establishes the limits to possibledata compression , and the operational meaning of theShannon entropy .The

**source coding theorem**shows that (in the limit, as the length of a stream of i.i.d. data tends to infinity) it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.The

**source coding theorem for symbol codes**places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as arandom variable ) and of the size of the target alphabet.**Statements**"Source coding" is a mapping from (a sequence of) symbols from an information source to a sequence of alphabet symbols (usually bits) such that the source symbols can be exactly recovered from the binary bits (lossless source coding) or recovered within some distortion (lossy source coding). This is the concept behind

data compression .**Source coding theorem**In information theory, the

**source coding theorem**(Shannon 1948) informally states that::"N" i.i.d. random variables each with entropy "H(X)" can be compressed into more than "NH(X)"

bits with negligible risk of information loss, as "N" tends to infinity; but conversely, if they are compressed into fewer than "NH(X)" bits it is virtually certain that information will be lost." (MacKay 2003).**Source coding theorem for symbol codes**Let $Sigma\_1$, $Sigma\_2$ denote two finite alphabets and let $Sigma\_1^*$ and $Sigma\_2^*$ denote the set of all finite words from those alphabets (respectively).

Suppose that "X" is a random variable taking values in $Sigma\_1$ and let "f" be a decipherable code from $Sigma\_1^*$ to $Sigma\_2^*$ where $|Sigma\_2|=a$. Let "S" denote the random variable given by the wordlength "f(X)".

If "f" is optimal in the sense that it has the minimal expected wordlength for "X", then

:$frac\{H(X)\}\{log\_2\; a\}\; leq\; mathbb\{E\}S\; <\; frac\{H(X)\}\{log\_2\; a\}\; +1$(Shannon 1948)

**Proof: Source coding theorem**Given $X$ is an i.i.d. source, its

time series "X"_{1}, ..., "X"_{"n"}is i.i.d. withentropy "H"("X") in the discrete-valued case anddifferential entropy in the continuous-valued case. The Source coding theorem states that for any $epsilon\; >\; 0$ for any rate larger than theentropy of the source, there is large enough $n$ and an encoder that takes $n$ i.i.d. repetition of the source, $X^\{1:n\}$, and maps it to $n.(H(X)+epsilon)$ binary bits such that the source symbols $X^\{1:n\}$ are recoverable from the binary bits with probability at least $1\; -\; epsilon$.**Proof for achievability**Fix some for any $epsilon\; >\; 0$. The typical set,$A\_n^epsilon$, is defined as follows:

$A\_n^epsilon\; =;\; left\{x\_1^n\; :\; left|-frac\{1\}\{n\}\; log\; p(X\_1,\; X\_2,\; ...,\; X\_n)\; -\; H\_n(X)\; ight|\}\; math>$

AEP shows that for large enough "n", the probability that a sequence generated by the source lies in the typical set,$A\_n^epsilon$, defined as approaches one. In particular there for large enough "n", $P(A\_n^epsilon)>1-epsilon$ (See

AEP for a proof):The definition of typical sets implies that those sequences that lie in the typical set satisfy:

:$2^\{-n(H(X)+epsilon)\}\; leq\; p(x\_1,\; x\_2,\; ...,\; x\_n)\; leq\; 2^\{-n(H(X)-epsilon)\}$

Note that:

*The probability of a sequence from $X$ being drawn from $\{A\_epsilon\}^\{(n)\}$ is greater than $1-epsilon$

*$left|\; \{A\_epsilon\}^\{(n)\}\; ight|\; leq\; 2^\{n(H(X)+epsilon)\}$ since the probability of the whole set $\{A\_epsilon\}^\{(n)\}$ is at most one.

*$left|\; \{A\_epsilon\}^\{(n)\}\; ight|\; geq\; (1-epsilon)2^\{n(H(X)-epsilon)\}$. For the proof, use the upper bound on the probability of each term in typical set and the lower bound on the probability of the whole set $\{A\_epsilon\}^\{(n)\}$.

Since $left|\; \{A\_epsilon\}^\{(n)\}\; ight|\; leq\; 2^\{n(H(X)+epsilon)\},\; n.(H(X)+epsilon);$ bits are enough to point to any string in this set.

The encoding algorithm: The encoder checks if the input sequence lies within the typical set; if yes, it outputs the index of the input sequence within the typical set; if not, the encoder outputs an arbitrary $n.(H(X)+epsilon)$ digit number. As long as the input sequence lies within the typical set (with probability at least $1-epsilon$), the encoder doesn't make any error. So, the probability of error of the encoder is bounded above by $epsilon$

**Proof for converse**The converse is proved by showing that any set of size smaller than $A\_n^epsilon$ (in the sense of exponent) would cover a set of probability bounded away from 1.**Proof: Source coding theorem for symbol codes**Let $s\_i$ denote the wordlength of each possible $x\_i$ ($i=1,ldots,n$). Define $q\_i\; =\; a^\{-s\_i\}/C$, where "C" is chosen so that $sum\; q\_i\; =\; 1$.

Then

:$egin\{matrix\}H(X)\; =\; -\; sum\_\{i=1\}^n\; p\_i\; log\_2\; p\_i\; \backslash \; \backslash leq\; -\; sum\_\{i=1\}^n\; p\_i\; log\_2\; q\_i\; \backslash \; \backslash =\; -\; sum\_\{i=1\}^n\; p\_i\; log\_2\; a^\{-s\_i\}\; +\; sum\_\{i=1\}^n\; p\_i\; log\_2\; C\; \backslash \; \backslash =\; -\; sum\_\{i=1\}^n\; p\_i\; log\_2\; a^\{-s\_i\}\; +\; log\_2\; C\; \backslash \; \backslash leq\; -\; sum\_\{i=1\}^n\; -\; s\_i\; p\_i\; log\_2\; a\; \backslash \; \backslash leq\; mathbb\{E\}S\; log\_2\; a\; \backslash end\{matrix\}$

where the second line follows from

Gibbs' inequality and the fifth line follows fromKraft's inequality : $C\; =\; sum\_\{i=1\}^n\; a^\{-s\_i\}\; leq\; 1$ so $log\; C\; leq\; 0$.For the second inequality we may set

:$s\_i\; =\; lceil\; -\; log\_a\; p\_i\; ceil$

so that

:$-\; log\_a\; p\_i\; leq\; s\_i\; <\; -log\_a\; p\_i\; +\; 1$

and so

:$a^\{-s\_i\}\; leq\; p\_i$

and

:$sum\; a^\{-s\_i\}\; leq\; sum\; p\_i\; =\; 1$

and so by Kraft's inequality there exists a prefix-free code having those wordlengths. Thus the minimal "S" satisfies

:$egin\{matrix\}mathbb\{E\}S\; =\; sum\; p\_i\; s\_i\; \backslash \; \backslash \; sum\; p\_i\; left(\; -log\_a\; p\_i\; +1\; ight)\; \backslash \; \backslash \; =\; sum\; -\; p\_i\; frac\{log\_2\; p\_i\}\{log\_2\; a\}\; +1\; \backslash \; \backslash \; =\; frac\{H(X)\}\{log\_2\; a\}\; +1\; \backslash end\{matrix\}$

**Extension to non-stationary independent sources****Fixed Rate lossless source coding for discrete time non-stationary independent sources**Define typical set $A\_n^epsilon$ as:

: $A\_n^epsilon\; =;\; \{x\_1^n\; :\; left|-frac\{1\}\{n\}\; log\; p(X\_1,\; X\_2,\; ...,\; X\_n)\; -\; ar\{H\_n\}(X)\; ight|\}\; math>$

Then, for given $delta>0$, for n large enough, $Pr(A\_n^epsilon)>\; 1-delta$. Now we just encode the sequences in the typical set, and usual methods in source coding show that the cardinality of this set is smaller than $2^\{n(ar\{H\_n\}(X)+epsilon)\}$. Thus, on an average, $ar\{H\_n\}(X)+epsilon$ bits suffice for encoding with probability greater than $1-delta$, where $epsilon;\; mbox\{\; and\; \};\; delta$ can be made arbitrarily small, by making n larger.

**ee also***

Channel coding

* Noisy Channel Coding Theorem

*Error exponent

* Asymptotic Equipartition Property (AEP)

*Data compression **References*** cite book

last = Cover

first = Thomas M.

title = Elements of Information Theory

chapter = Chapter 5: Data Compression

year = 2006

publisher = John Wiley & Sons

isbn = 0-471-24195-4

* C.E. Shannon, " [*http://plan9.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf A Mathematical Theory of Communication*] ", "Bell System Technical Journal ", vol. 27, pp. 379-423, 623-656, July, October, 1948

*Wikimedia Foundation.
2010.*

### Look at other dictionaries:

**Noisy-channel coding theorem**— In information theory, the noisy channel coding theorem (sometimes Shannon s theorem), establishes that for any given degree of noise contamination of a communication channel, it is possible to communicate discrete data (digital information)… … Wikipedia**Distributed source coding**— (DSC) is an important problem in information theory and communication. DSC problems regard the compression of multiple correlated information sources that do not communicate with each other.[1] By modeling the correlation between multiple sources … Wikipedia**Shannon–Hartley theorem**— In information theory, the Shannon–Hartley theorem tells the maximum rate at which information can be transmitted over a communications channel of a specified bandwidth in the presence of noise. It is an application of the noisy channel coding… … Wikipedia**Claude Shannon**— Claude Elwood Shannon (1916 2001) Born April … Wikipedia**Huffman coding**— Huffman tree generated from the exact frequencies of the text this is an example of a huffman tree . The frequencies and codes of each character are below. Encoding the sentence with this code requires 135 bits, as opposed of 288 bits if 36… … Wikipedia**Golomb coding**— is a data compression scheme invented by Solomon W. Golomb in the 1960s. The scheme is based on entropy encoding and is optimal (in the sense of Shannon s source coding theorem) for alphabets following a geometric distribution, making it highly… … Wikipedia**Adaptive Huffman coding**— (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding. It permits building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one pass encoding and … Wikipedia**Information theory**— Not to be confused with Information science. Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental… … Wikipedia**Entropy (information theory)**— In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information… … Wikipedia**Entropy encoding**— In information theory an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium. One of the main types of entropy coding creates and assigns a unique prefix free code to each… … Wikipedia