Chow-Liu tree

Figure: a first-order dependency tree representing the example product approximation given below.

A Chow-Liu tree is an efficient method for constructing a second-order product approximation of a joint probability distribution, first described by Chow & Liu (1968). As with Bayesian networks in general, the goal of such a decomposition may be either data compression or inference.

1 The Chow-Liu representation
2 The Chow-Liu algorithm
3 Variations on Chow-Liu trees
4 See also
5 Notes
6 References

The Chow-Liu representation

The Chow-Liu method describes a joint probability distribution $P(X_{1},X_{2},\ldots,X_{n})$ as a product of second-order conditional and marginal distributions. For example, the six-dimensional distribution $P(X_{1},X_{2},X_{3},X_{4},X_{5},X_{6})$ might be approximated as

$P^{\prime }(X_{1},X_{2},X_{3},X_{4},X_{5},X_{6})=P(X_{6}|X_{5})P(X_{5}|X_{2})P(X_{4}|X_{2})P(X_{3}|X_{2})P(X_{2}|X_{1})P(X_{1})$

where each new term in the product introduces just one new variable, and the product can be represented as a first-order dependency tree, as shown in the figure. The Chow-Liu algorithm (below) determines which conditional probabilities are to be used in the product approximation. In general, unless there are no third or higher-order interactions, the Chow-Liu approximation is indeed an approximation, and cannot capture the complete structure of the original distribution. Pearl (1988) provides a modern analysis of the Chow-Liu tree as a Bayesian network.
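The example product can be evaluated directly once the tree is written as a parent map. The following is a minimal sketch, assuming the joint distribution is available as a NumPy array with one axis per variable and strictly positive marginals; the names `PARENTS`, `marginal` and `tree_approximation` are illustrative and do not come from Chow & Liu (1968).

```python
import itertools
import numpy as np

# Parent of each variable (0-based), mirroring the example product:
# X1 is the root, then X2|X1, X3|X2, X4|X2, X5|X2, X6|X5.
PARENTS = {0: None, 1: 0, 2: 1, 3: 1, 4: 1, 5: 4}

def marginal(joint, keep):
    """Sum out every axis not listed in `keep`; kept axes stay in increasing order."""
    drop = tuple(ax for ax in range(joint.ndim) if ax not in keep)
    return joint.sum(axis=drop)

def tree_approximation(joint, parents):
    """Return P' (same shape as `joint`) built from first- and second-order marginals.

    Assumes every parent marginal is strictly positive (no division by zero).
    """
    single = {v: marginal(joint, (v,)) for v in parents}
    pair = {c: marginal(joint, tuple(sorted((c, p))))
            for c, p in parents.items() if p is not None}
    approx = np.zeros_like(joint, dtype=float)
    for x in itertools.product(*(range(k) for k in joint.shape)):
        prob = 1.0
        for child, parent in parents.items():
            if parent is None:
                prob *= single[child][x[child]]          # root term P(X_root)
            else:
                lo, hi = sorted((child, parent))
                # conditional term P(child | parent) = P(child, parent) / P(parent)
                prob *= pair[child][x[lo], x[hi]] / single[parent][x[parent]]
        approx[x] = prob
    return approx
```

Under these assumptions the result sums to one and reproduces the pairwise marginals of the original distribution along every edge of the tree; interactions that the tree does not encode are generally lost.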

The Chow-Liu algorithm

Chow and Liu show how to select second-order terms for the product approximation so that, among all such second-order approximations (first-order dependency trees), the constructed approximation $P^{\prime}$ has the minimum Kullback-Leibler distance to the actual distribution $P$, and is thus the closest approximation in the classical information-theoretic sense. The Kullback-Leibler distance between a second-order product approximation and the actual distribution is shown to be

$D(P\parallel P^{\prime })=-\sum_{i} I(X_{i};X_{j(i)})+\sum_{i} H(X_{i})-H(X_{1},X_{2},\ldots ,X_{n})$

where $I(X_{i};X_{j(i)})$ is the mutual information between variable $X_{i}$ and its parent $X_{j(i)}$ in the tree (the root has no parent and contributes no term), and $H(X_{1},X_{2},\ldots ,X_{n})$ is the joint entropy of the variable set $\{X_{1},X_{2},\ldots ,X_{n}\}$. Since the terms $\sum_{i} H(X_{i})$ and $H(X_{1},X_{2},\ldots ,X_{n})$ are independent of the dependency ordering in the tree, only the sum of the pairwise mutual informations, $\sum_{i} I(X_{i};X_{j(i)})$, determines the quality of the approximation. Thus, if every branch (edge) of the tree is given a weight equal to the mutual information between the variables at its vertices, then the tree that provides the optimal second-order approximation to the target distribution is just the maximum-weight tree.

The equation above also highlights the role of the dependencies in the approximation. When no dependencies are specified, the first term in the equation is absent and the approximation is based on first-order marginals only; the distance between the approximation and the true distribution is then due to the redundancies that are not accounted for when the variables are treated as independent. As second-order dependencies are specified, the approximation begins to capture some of that structure and the distance between the two distributions is reduced.
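The decomposition of the Kullback-Leibler distance can be checked numerically on a small case. The sketch below assumes a randomly generated three-variable distribution and a chain-shaped tree in which $X_{1}$ is the parent of $X_{2}$ and $X_{2}$ is the parent of $X_{3}$; the helper names are illustrative, and natural logarithms are used throughout.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability table of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a two-dimensional joint table."""
    pxy = np.asarray(pxy, dtype=float)
    return entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)

def kl_divergence(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return float((p[mask] * np.log(p[mask] / q[mask])).sum())

# Arbitrary strictly positive joint distribution over (X1, X2, X3).
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 2))
joint /= joint.sum()

p1 = joint.sum(axis=(1, 2))
p2 = joint.sum(axis=(0, 2))
p3 = joint.sum(axis=(0, 1))
p12 = joint.sum(axis=2)
p23 = joint.sum(axis=0)

# P'(x1, x2, x3) = P(x1) P(x2 | x1) P(x3 | x2) = P(x1, x2) P(x2, x3) / P(x2)
approx = p12[:, :, None] * p23[None, :, :] / p2[None, :, None]

lhs = kl_divergence(joint, approx)
rhs = (-(mutual_information(p12) + mutual_information(p23))
       + entropy(p1) + entropy(p2) + entropy(p3)
       - entropy(joint))
print(lhs, rhs)  # the two values should agree up to floating-point error
```

Swapping in a different tree changes only the mutual-information terms on the right-hand side, which is why the search for the best tree reduces to maximizing their sum.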

Chow and Liu provide a simple algorithm for constructing the optimal tree: at each stage, the procedure adds to the tree the remaining pair of variables with the largest mutual information, skipping any pair that would create a cycle. See the original paper, Chow & Liu (1968), for full details. Meilă (1999) outlines a more efficient tree construction algorithm for the common case of sparse data.
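The greedy construction amounts to finding a maximum-weight spanning tree over the pairwise mutual informations. The following is a minimal sketch rather than the procedure exactly as published: it assumes the data are given as an integer-coded NumPy array with one row per sample and one column per variable, uses plug-in (empirical) estimates of mutual information, and rejects cycle-forming pairs with a Kruskal-style union-find. The function names are illustrative.

```python
import itertools
import numpy as np

def empirical_mutual_information(a, b):
    """Plug-in estimate of I(A;B) in nats from two integer-coded columns."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)            # contingency table of counts
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / outer[mask])).sum())

def chow_liu_edges(data):
    """Return the n-1 undirected edges (i, j), i < j, of a maximum-weight tree."""
    n = data.shape[1]
    # Candidate edges sorted by decreasing mutual information.
    weighted = sorted(
        ((empirical_mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in itertools.combinations(range(n), 2)),
        reverse=True,
    )
    root = list(range(n))                    # union-find forest

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]          # path halving
            v = root[v]
        return v

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:                         # adding (i, j) creates no cycle
            root[ri] = rj
            edges.append((i, j))
            if len(edges) == n - 1:
                break
    return edges
```

The edges are undirected; choosing any variable as the root and directing the edges away from it gives the conditional terms of the product approximation. When several pairs have equal mutual information, the tree returned depends on the tie-breaking order of the sort.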

In a later paper, Chow & Wagner (1973) proved that learning the Chow-Liu tree is consistent given samples (or observations) drawn i.i.d. from a tree-structured distribution. In other words, the probability of learning an incorrect tree decays to zero as the number of samples tends to infinity. The main idea in the proof is the continuity of the mutual information in the pairwise marginal distribution. More recently, the exponential rate of convergence of the error probability was characterized.^[1]

Variations on Chow-Liu trees

When the actual distribution is not in fact a first-order dependency tree, the approximation can still in some cases be improved by fusing or aggregating densely connected subsets of variables to obtain a "large-node" Chow-Liu tree (Huang, King & Lyu 2002), or by extending the idea of greedy maximum branch weight selection to non-tree (multiple-parent) structures (Williamson 2000). Similar techniques of variable substitution and construction are common in the Bayes network literature, e.g., for dealing with loops; see Pearl (1988).

Generalizations of the Chow-Liu tree are the so-called t-cherry junction trees. It has been proved that t-cherry junction trees provide an approximation of a discrete multivariate probability distribution that is at least as good as the one given by the Chow-Liu tree. For the third-order t-cherry junction tree see Kovács & Szántai (2010); for the k-th order t-cherry junction tree see Szántai & Kovács (2010). The second-order t-cherry junction tree is in fact the Chow-Liu tree.

Notes

[1] Tan, V. Y. F.; Anandkumar, A.; Tong, L.; Willsky, A. (2009), "A Large-Deviation Analysis for the Maximum-Likelihood Learning of Tree Structures", International Symposium on Information Theory (ISIT), July 2009.

References

Chow, C. K.; Liu, C. N. (1968), "Approximating discrete probability distributions with dependence trees", IEEE Transactions on Information Theory IT-14 (3): 462–467.
Huang, Kaizhu; King, Irwin; Lyu, Michael R. (2002), "Constructing a large node Chow-Liu tree based on frequent itemsets", in Wang, Lipo; Rajapakse, Jagath C.; Fukushima, Kunihiko; Lee, Soo-Young; Yao, Xin (eds.), Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02).
Pearl, Judea (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, San Mateo, CA: Morgan Kaufmann.
Williamson, Jon (2000), "Approximating discrete probability distributions with Bayesian networks", Proceedings of the International Conference on Artificial Intelligence in Science and Technology, Tasmania, pp. 16–20.
Meilă, Marina (1999), "An Accelerated Chow and Liu Algorithm: Fitting Tree Distributions to High-Dimensional Sparse Data", Proceedings of the Sixteenth International Conference on Machine Learning, Morgan Kaufmann, pp. 249–257.
Chow, C. K.; Wagner (1973), "Consistency of an estimate of tree-dependent probability distribution", IEEE Transactions on Information Theory IT-19 (3): 369–371.
Kovács, E.; Szántai (2010), "On the approximation of a discrete multivariate probability distribution using the new concept of t-cherry junction tree", Lecture Notes in Economics and Mathematical Systems 633: 39–56.
Szántai, T.; Kovács (2010), "Hypergraphs as a mean of discovering the dependence structure of a discrete multivariate probability distribution", Annals of Operations Research.

Categories:

Knowledge representation

Wikimedia Foundation. 2010.

