Typical set

In information theory, the typical set is a set of sequences whose probability is close to two raised to the negative power of n times the entropy of their source distribution, where n is the sequence length. That this set has total probability close to one is a consequence of the asymptotic equipartition property (AEP), which is a kind of law of large numbers.
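The law-of-large-numbers character of the AEP can be illustrated numerically. The following is a minimal sketch, assuming an i.i.d. source with an illustrative two-symbol distribution (the distribution, seed and function names are not from the article): it draws sequences of increasing length and compares the per-symbol log-probability with the entropy.

```python
import math
import random

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def sample_entropy(seq, pmf):
    """Return -(1/n) * log2 p(x_1, ..., x_n) for an i.i.d. source."""
    return -sum(math.log2(pmf[s]) for s in seq) / len(seq)

pmf = {"a": 0.7, "b": 0.3}   # illustrative source (an assumption)
rng = random.Random(0)       # fixed seed so that runs are repeatable
symbols, weights = zip(*pmf.items())

for n in (10, 100, 1000, 20000):
    seq = rng.choices(symbols, weights=weights, k=n)
    # As n grows, the second column should approach the third.
    print(n, round(sample_entropy(seq, pmf), 4), round(entropy(pmf), 4))
```

For long sequences the printed per-symbol value should settle near the entropy, which is the same as saying that the sequence probability is close to two raised to the power of minus n times the entropy.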

The typical set is of great use in compression theory: it provides a theoretical means of compressing data, allowing any sequence $X^n$ to be represented using about $nH(X)$ bits on average, and hence justifies the use of entropy as a measure of information from a source.

The AEP can also be proven for a large class of stationary ergodic processes, allowing the typical set to be defined in more general cases.

(Weakly) typical sequences

If a sequence $x_1, \ldots, x_n$ is drawn from an i.i.d. distribution $X$ defined over a finite alphabet $\mathcal{X}$, then the typical set $A_\epsilon^{(n)}$ is defined as the set of sequences that satisfy:

: $2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, \ldots, x_n) \leq 2^{-n(H(X)-\epsilon)}$

where $H(X) = -\sum_{y \in \mathcal{X}} p(y)\log_2 p(y)$ is the information entropy of $X$. The probability above need only be within a factor of $2^{n\epsilon}$ of $2^{-nH(X)}$.

The typical set has the following properties. If $n$ is sufficiently large, $\epsilon$ can be chosen arbitrarily small so that:
#The probability that a sequence drawn from $X$ lies in $A_\epsilon^{(n)}$ is greater than $1-\epsilon$
#$\left| A_\epsilon^{(n)} \right| \leq 2^{n(H(X)+\epsilon)}$
#$\left| A_\epsilon^{(n)} \right| \geq (1-\epsilon)2^{n(H(X)-\epsilon)}$
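The definition and the size bounds can be checked by brute force for a small example. The sketch below assumes an illustrative binary source with probabilities 0.7 and 0.3 and a short block length; these values and the function names are assumptions, not part of the article. It enumerates every sequence, keeps those whose probability lies between the two bounds of the definition, and reports the size and total probability of the resulting set.

```python
import itertools
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def weakly_typical_set(pmf, n, eps):
    """Enumerate the length-n sequences with
    2^{-n(H+eps)} <= p(x_1..x_n) <= 2^{-n(H-eps)} for an i.i.d. source.
    Assumes every symbol in pmf has positive probability."""
    h = entropy(pmf)
    typical, total_prob = [], 0.0
    for seq in itertools.product(pmf, repeat=n):
        log_p = sum(math.log2(pmf[s]) for s in seq)  # log2 of the sequence probability
        if -n * (h + eps) <= log_p <= -n * (h - eps):
            typical.append(seq)
            total_prob += 2.0 ** log_p
    return typical, total_prob, h

pmf = {0: 0.7, 1: 0.3}   # illustrative source (an assumption)
n, eps = 16, 0.1
typical, total_prob, h = weakly_typical_set(pmf, n, eps)

print("size of typical set:", len(typical), "out of", 2 ** n)
print("upper bound 2^{n(H+eps)}:", 2 ** (n * (h + eps)))
print("probability of typical set:", total_prob)
```

The upper bound on the size holds for every n. The first and third properties only hold once n is large enough, so at a block length this short the total probability may still be well below 1 - ε. Exhaustive enumeration grows exponentially with n and is meant only as an illustration.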

For a general stochastic process $\{X(t)\}$ with AEP, the (weakly) typical set can be defined similarly with $p(x_1, x_2, \ldots, x_n)$ replaced by $p(x_0^\tau)$ (i.e. the probability of the sample limited to the time interval $[0, \tau]$), $n$ being the degree of freedom of the process in the time interval and $H(X)$ being the entropy rate. If the process is continuous-valued, differential entropy is used instead.

Strongly typical sequences (strong typicality)

If a sequence $x_1, \ldots, x_n$ is drawn from some specified joint distribution, then the strongly typical set $A_{\epsilon,\mathrm{strong}}^{(n)}$ is defined as the set of sequences which satisfy

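: $\left|\frac{N(a \mid x^n)}{n} - p(a)\right| < \frac{\varepsilon}{|\mathcal{X}|}$ for every symbol $a \in \mathcal{X}$,

where $N(a \mid x^n)$ is the number of occurrences of the symbol $a$ in the sequence $x^n$.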

It can be shown that strongly typical sequences are also weakly typical (with a different constant ε), hence the name. The two forms, however, are not equivalent. Strong typicality is often easier to work with in proving theorems for memoryless channels. However, as is apparent from the definition, this form of typicality is only defined for random variables having finite support.
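A strong typicality test is simple to write down. The sketch below assumes the common per-symbol reading of the condition, in which the empirical frequency of every symbol a must lie within ε/|X| of p(a); the function name and the example distribution are illustrative assumptions.

```python
from collections import Counter

def is_strongly_typical(seq, pmf, eps):
    """Return True if, for every symbol a in the alphabet,
    |N(a | seq)/n - p(a)| < eps / |alphabet|.
    pmf maps each alphabet symbol to its probability."""
    n = len(seq)
    counts = Counter(seq)
    # A symbol outside the alphabet cannot occur in a typical sequence.
    if any(symbol not in pmf for symbol in counts):
        return False
    bound = eps / len(pmf)
    return all(abs(counts[a] / n - p) < bound for a, p in pmf.items())

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}   # illustrative distribution (an assumption)
print(is_strongly_typical("aabcaabc", pmf, 0.1))   # frequencies match pmf exactly
print(is_strongly_typical("aaaaaaab", pmf, 0.1))   # far too many occurrences of "a"
```

Because the test counts symbol occurrences, it needs a finite alphabet, which matches the remark above that strong typicality is only defined for random variables with finite support.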

Jointly typical sequences

Two sequences $x_1^n$ and $y_1^n$ are jointly ε-typical if the pair $(x_1^n, y_1^n)$ is ε-typical with respect to the joint distribution $p(x_1^n, y_1^n)$ and both $x_1^n$ and $y_1^n$ are ε-typical with respect to their marginal distributions $p(x_1^n)$ and $p(y_1^n)$. The set of all such pairs of sequences $(x_1^n, y_1^n)$ is denoted by $A_\epsilon^n(X,Y)$. Jointly ε-typical $n$-tuple sequences are defined similarly.
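The three conditions can be checked directly when the pairs are i.i.d. The following sketch assumes weak typicality and a joint distribution given as a dictionary over symbol pairs; the example distribution (a uniform binary input observed through a channel that flips a bit with probability 0.1) and all names are illustrative assumptions.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def is_weakly_typical(seq, pmf, eps):
    """Check |-(1/n) log2 p(seq) - H| <= eps for an i.i.d. source."""
    if any(pmf.get(s, 0.0) <= 0.0 for s in seq):
        return False  # a zero-probability symbol rules the sequence out
    sample = -sum(math.log2(pmf[s]) for s in seq) / len(seq)
    return abs(sample - entropy(pmf)) <= eps

def is_jointly_typical(xs, ys, joint, eps):
    """xs and ys are jointly typical if xs, ys and the pair sequence
    are each typical for the marginals and the joint distribution."""
    px, py = {}, {}
    for (x, y), p in joint.items():   # accumulate the two marginals
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return (is_weakly_typical(xs, px, eps)
            and is_weakly_typical(ys, py, eps)
            and is_weakly_typical(list(zip(xs, ys)), joint, eps))

# Illustrative joint distribution (an assumption).
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
xs = [0, 1] * 10
ys_close = [1, 0] + [0, 1] * 9   # differs from xs in 2 of 20 positions
ys_far = [1 - x for x in xs]     # differs from xs in every position

print(is_jointly_typical(xs, ys_close, joint, 0.05))
print(is_jointly_typical(xs, ys_far, joint, 0.05))
```

In this example both marginals are uniform, so every individual sequence is marginally typical and the outcome is decided by the joint condition alone.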

Applications of typicality

Typical set encoding

In communication, typical set encoding encodes only the typical set of a stochastic source with fixed-length block codes. By the AEP, it is asymptotically lossless and achieves the minimum rate, equal to the entropy rate of the source.
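A toy version of such an encoder can be sketched as follows, assuming an i.i.d. binary source, weak typicality and exhaustive enumeration of the typical set; the function name, the source distribution and the block length are illustrative assumptions. Each typical sequence is given a binary index of fixed length, and any other sequence has no codeword, which corresponds to an encoding failure.

```python
import itertools
import math

def build_typical_codebook(pmf, n, eps):
    """Assign a fixed-length binary index to every weakly typical sequence.
    Assumes every symbol in pmf has positive probability."""
    h = -sum(p * math.log2(p) for p in pmf.values())
    typical = [
        seq for seq in itertools.product(sorted(pmf), repeat=n)
        if abs(-sum(math.log2(pmf[s]) for s in seq) / n - h) <= eps
    ]
    if not typical:
        raise ValueError("typical set is empty for these parameters")
    bits = max(1, math.ceil(math.log2(len(typical))))  # codeword length
    encode = {seq: format(i, f"0{bits}b") for i, seq in enumerate(typical)}
    decode = {code: seq for seq, code in encode.items()}
    return encode, decode, bits, h

pmf = {0: 0.7, 1: 0.3}   # illustrative source (an assumption)
n, eps = 12, 0.1
encode, decode, bits, h = build_typical_codebook(pmf, n, eps)

message = (0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0)
codeword = encode.get(message)   # None would signal an atypical sequence
print("bits per block:", bits, "rate:", bits / n, "entropy:", h)
print("codeword:", codeword, "round trip ok:", decode[codeword] == message)
```

Since the typical set has at most 2^{n(H+ε)} elements, the rate of this code is at most H + ε + 1/n bits per symbol. The failure probability is the probability of the atypical sequences, which by the AEP becomes small only as n grows.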

Typical set decoding

In communication, typical set decoding is used in conjunction with random coding to estimate the transmitted message as the one with a codeword that is jointly ε-typical with the observation, i.e.

: $\hat{w} = w \iff (\exists! w)\left( (x_1^n(w), y_1^n) \in A_\epsilon^n(X,Y) \right)$

where $\hat{w}$, $x_1^n(w)$ and $y_1^n$ are the message estimate, the codeword of message $w$ and the observation, respectively. $A_\epsilon^n(X,Y)$ is defined with respect to the joint distribution $p(x_1^n)p(y_1^n|x_1^n)$, where $p(y_1^n|x_1^n)$ is the transition probability that characterizes the channel statistics, and $p(x_1^n)$ is some input distribution used to generate the codewords in the random codebook.
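The decoding rule can be sketched for a memoryless channel as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the channel is taken to be a binary symmetric channel with crossover probability q, the codebook is drawn uniformly at random, typicality is weak typicality, and all names and parameter values are hypothetical. The decoder returns a message only when exactly one codeword is jointly typical with the observation, and otherwise declares an error.

```python
import math
import random

def entropy(pmf):
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def is_weakly_typical(seq, pmf, eps):
    if any(pmf.get(s, 0.0) <= 0.0 for s in seq):
        return False
    sample = -sum(math.log2(pmf[s]) for s in seq) / len(seq)
    return abs(sample - entropy(pmf)) <= eps

def is_jointly_typical(xs, ys, joint, eps):
    px, py = {}, {}
    for (x, y), p in joint.items():   # marginals of the joint pmf
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return (is_weakly_typical(xs, px, eps)
            and is_weakly_typical(ys, py, eps)
            and is_weakly_typical(list(zip(xs, ys)), joint, eps))

def typical_set_decode(y, codebook, joint, eps):
    """Return the unique message w whose codeword is jointly typical
    with y, or None if there is no such w or more than one."""
    matches = [w for w, x in enumerate(codebook)
               if is_jointly_typical(x, y, joint, eps)]
    return matches[0] if len(matches) == 1 else None

q = 0.1   # assumed crossover probability of the channel
# Joint pmf p(x)p(y|x) for a uniform input distribution.
joint = {(0, 0): 0.5 * (1 - q), (0, 1): 0.5 * q,
         (1, 0): 0.5 * q, (1, 1): 0.5 * (1 - q)}

rng = random.Random(0)
n, num_messages, eps = 40, 8, 0.2
codebook = [[rng.randint(0, 1) for _ in range(n)] for _ in range(num_messages)]

sent = 3
received = [bit ^ (rng.random() < q) for bit in codebook[sent]]  # pass through the channel
print("sent:", sent, "decoded:", typical_set_decode(received, codebook, joint, eps))
```

Note that an observation identical to a codeword is not jointly typical with it at this ε, because a noiseless block is itself an atypical event for a channel that flips about one bit in ten. The linear search over the codebook is exponential in the number of message bits, so this rule is mainly a proof technique rather than a practical decoder.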

Universal null-hypothesis testing

Universal channel code

See also: algorithmic complexity theory

See also

* Source coding theorem


Wikimedia Foundation. 2010.
