# Kraft's inequality

﻿
Kraft's inequality

In coding theory, Kraft's inequality, named after Leon Kraft, gives a necessary and sufficient condition for the existence of a uniquely decodable code for a given set of codeword lengths. Its applications to prefix codes and trees often find use in computer science and information theory.

More specifically, Kraft's inequality limits the lengths of codewords in a prefix code: if one takes an exponential function of each length, the resulting values must look like a probability mass function. Kraft's inequality can be thought of in terms of a constrained budget to be spent on codewords, with shorter codewords being more expensive.

• If Kraft's inequality holds with strict inequality, the code has some redundancy.
• If Kraft's inequality holds with strict equality, the code in question is a complete code.
• If Kraft's inequality does not hold, the code is not uniquely decodable.

Kraft's inequality was published by Kraft (1949). However, Kraft's paper discusses only prefix codes, and attributes the analysis leading to the inequality to Raymond Redheffer. The inequality is sometimes also called the Kraft–McMillan theorem after the independent discovery of the result by McMillan (1956); McMillan proves the result for the general case of uniquely decodable codes, and attributes the version for prefix codes to a spoken observation in 1955 by Joseph Leo Doob.

## Examples

### Binary trees

9, 14, 19, 67 and 76 are leaf nodes at depths of 3, 3, 3, 3 and 2, respectively.

Any binary tree can be viewed as defining a prefix code for the leaves of the tree. Kraft's inequality states that

$\sum_{\ell \in \mathrm{leaves}} 2^{-\mathrm{depth}(\ell)} \leq 1.$

Here the sum is taken over the leaves of the tree, i.e. the nodes without any children. The depth is the distance to the root node. In the tree to the right, this sum is

$\frac{1}{4} + 4 \left( \frac{1}{8} \right) = \frac{3}{4} \leq 1.$

### Chaitin's constant

In algorithmic information theory, Chaitin's constant is defined as

$\Omega = \sum_{p \in P} 2^{-|p|}.$

This is an infinite sum which has one summand for every syntactically correct program which halts. |p| stands for the length of the bit string of p. The programs are required to be prefix-free in the sense that no summand has a prefix representing a syntactically valid program that halts. Hence the bit strings are prefix codes, and Kraft's inequality gives that $\Omega \leq 1$.

## Formal statement

Let each source symbol from the alphabet

$S=\{\,s_1,s_2,\ldots,s_n\,\}\,$

be encoded into a uniquely decodable code over an alphabet of size r with codeword lengths

$\ell_1,\ell_2,\ldots,\ell_n.\,$

Then

$\sum_{i=1}^{n} \left( \frac{1}{r} \right)^{\ell_i} \leq 1.$

Conversely, for a given set of natural numbers $\ell_1,\ell_2,\ldots,\ell_n\,$ satisfying the above inequality, there exists a uniquely decodable code over an alphabet of size r with those codeword lengths.

A commonly occurring special case of a uniquely decodable code is a prefix code. Kraft's inequality therefore also holds for any prefix code.

## Proof for prefix codes

Example for binary tree. Red nodes represent a prefix tree leaf nodes. The method for calculation the number of descendant leaf nodes in the full tree is shown.

Suppose that $\ell_1 \leq \ell_2 \leq ... \leq \ell_n$. Let A be the full r-ary tree of depth $\ell_n$. Every word of length $\ell \leq \ell_n$ over an r-ary alphabet corresponds to a node in this tree at depth $\ell$. The ith word in the prefix code corresponds to a node vi; let Ai be the set of all leaf nodes in the subtree of A rooted at vi. Clearly

$|A_i| = r^{\ell_n-\ell_i}.$

Since the code is a prefix code,

$A_i \cap A_j = \varnothing,\quad i\neq j$.

Thus, given that the total number of nodes at depth $\ell_n$ is $r^{\ell_n}$,

$|\bigcup_{i=1}^n A_i| = \sum_{i=1}^n r^{\ell_n-\ell_i} \leq r^{\ell_n}$

from which the result follows.

Conversely, given any ordered sequence of n natural numbers,

$\ell_1 \leq \ell_2 \leq \dots \leq \ell_n$

satisfying the Kraft's inequality, one can construct a prefix code with codeword lengths equal to $\ell_i$ by pruning subtrees from a full r-ary tree of depth $\ell_n$. First choose any node from the full tree at depth $\ell_1$ and remove all of its descendents. This removes $r^{-\ell_1}$ fraction of the nodes from the full tree from being considered for the rest of the remaining codewords. Next iteration removes $r^{-\ell_2}$ fraction of the full tree for total of $r^{-\ell_1}+r^{-\ell_2}$. After m iterations,

$\sum_{i=1}^m r^{-\ell_i}$

fraction of the full tree nodes are removed from consideration for any remaining codewords. But, by the assumption, this sum is less than 1 for all m < n, thus prefix code with lengths $\ell_i$ can be constructed for all n source symbols.

## Proof of the general case

Consider the generating function in inverse of x for the code S

$F(x) = \sum_{i=1}^n x^{-|s_i|} = \sum_{\ell=\min}^\max p_\ell \, x^{-\ell}$

in which $p_\ell$—the coefficient in front of $x^{-\ell}$—is the number of distinct codewords of length $\ell$. Here min is the length of the shortest codeword in S, and max is the length of the longest codeword in S.

For any positive integer m consider the m-fold product Sm, which consists of all the words of the form $s_{i_1}s_{i_2}\dots s_{i_m}$, where $i_1, i_2, \dots, i_m$ are indices between 1 and m. Note that, since S was assumed to uniquely decodable, if $s_{i_1}s_{i_2}\dots s_{i_m}=s_{j_1}s_{j_2}\dots s_{j_m}$, then $i_1=j_1, i_2=j_2, \dots, i_m=j_m$. In other words, every word in Sm comes from a unique sequence of codewords in S. Because of this property, one can compute the generating function G(x) for Sm from the generating function F(x) as

$G(x) = \left( F(x) \right)^m = \left( \sum_{i=1}^n x^{-|s_i|} \right)^m = \sum_{i_1=1}^n \sum_{i_2=1}^n \cdots \sum_{i_m=1}^n x^{-\left(|s_{i_1}| + |s_{i_2}| + \cdots + |s_{i_m}|\right)} = \sum_{i_1=1}^n \sum_{i_2=1}^n \cdots \sum_{i_m=1}^n x^{-|s_{i_1} s_{i_2}\cdots s_{i_m}|} = \sum_{\ell=m \cdot \min}^{m \cdot \max} q_\ell \, x^{-\ell} \; .$

Here, similarly as before, $q_\ell$—the coefficient in front of $x^{-\ell}$ in G(x)—is the number of words of length $\ell$ in Sm. Clearly, $q_\ell$ can not exceed $r^\ell$. Hence for any positive x

$\left( F(x) \right)^m \le \sum_{\ell=m \cdot \min}^{m \cdot \max} r^\ell \, x^{-\ell} \; .$

Substituting the value x = r we have

$\left( F(r) \right)^m \le m \cdot (\max-\min)+1$

for any positive integer m. Left side of the inequality grows exponentially in m and right side only linearly. The only possibility for the inequality to be valid for all m is that $F(r) \le 1$. Looking back on the definition of F(x) we finally get the inequality.

$\sum_{i=1}^n r^{-\ell_i} = \sum_{i=1}^n r^{-|s_i|} = F(r) \le 1 \; .$

## References

Wikimedia Foundation. 2010.

### Look at other dictionaries:

• Kraft — has more than one meaning:* Kraft Foods, the world s third largest food and beverage company * Kraft process, a paper pulp production method * Kraft paper, paper produced by the Kraft process * Kraft (Mega Man Zero), a video game character *… …   Wikipedia

• List of inequalities — This page lists Wikipedia articles about named mathematical inequalities. Inequalities in pure mathematics =Analysis= * Askey–Gasper inequality * Bernoulli s inequality * Bernstein s inequality (mathematical analysis) * Bessel s inequality *… …   Wikipedia

• Shannon's source coding theorem — In information theory, Shannon s source coding theorem (or noiseless coding theorem) establishes the limits to possible data compression, and the operational meaning of the Shannon entropy.The source coding theorem shows that (in the limit, as… …   Wikipedia

• List of mathematics articles (K) — NOTOC K K approximation of k hitting set K ary tree K core K edge connected graph K equivalence K factor error K finite K function K homology K means algorithm K medoids K minimum spanning tree K Poincaré algebra K Poincaré group K set (geometry) …   Wikipedia

• Code — redirects here. CODE may also refer to Cultural Olympiad Digital Edition. Decoded redirects here. For the television show, see Brad Meltzer s Decoded. For code (computer programming), see source code. For other uses, see Code (disambiguation).… …   Wikipedia

• List of terms relating to algorithms and data structures — The [http://www.nist.gov/dads/ NIST Dictionary of Algorithms and Data Structures] is a reference work maintained by the U.S. National Institute of Standards and Technology. It defines a large number of terms relating to algorithms and data… …   Wikipedia

• Список терминов, относящихся к алгоритмам и структурам данных —   Это служебный список статей, созданный для координации работ по развитию темы.   Данное предупреждение не устанавливается на информационные списки и глоссарии …   Википедия

• Список терминов — Список терминов, относящихся к алгоритмам и структурам данных   Это сл …   Википедия

• Oméga de Chaitin — Un nombre Oméga de Chaitin est une suite de bits représentant, sous forme concentrée, la solution du problème de l arrêt pour tous les programme d une machine de Turing universelle donnée. En théorie algorithmique de l information, une constante… …   Wikipédia en Français

• Неравенство Крафта — Макмиллана — В теории кодирования, неравенство Крафта  Макмиллана даёт необходимое и достаточное условие существования разделимых и префиксных кодов, обладающих заданным набором длин кодовых слов. Содержание 1 Предварительные определения 2 …   Википедия