Total correlation


In probability theory and in particular in information theory, total correlation (Watanabe 1960) is one of several generalizations of the mutual information. It is also known as the multivariate constraint (Garner 1962) or multiinformation (Studený & Vejnarová 1999). It quantifies the redundancy or dependency among a set of n random variables.

Definition

For a given set of n random variables \{X_1,X_2,\ldots,X_n\}, the total correlation C(X_1,X_2,\ldots,X_n) is defined as the Kullback–Leibler divergence of the joint distribution p(X_1, \ldots, X_n) from the product of the marginal distributions p(X_1)p(X_2)\cdots p(X_n), which is the distribution the variables would follow if they were independent,

C(X_1, X_2, \ldots, X_n) \equiv \operatorname{D_{KL}}\left[ p(X_1, \ldots, X_n) \| p(X_1)p(X_2)\cdots p(X_n)\right] \; .


This divergence reduces to the simpler difference of entropies,

C(X_1,X_2,\ldots,X_n) = \sum_{i=1}^n H(X_i) - H(X_1, X_2, \ldots, X_n)

where H(X_i) is the information entropy of variable X_i, and H(X_1,X_2,\ldots,X_n) is the joint entropy of the variable set \{X_1,X_2,\ldots,X_n\}. In terms of the discrete probability distributions on variables \{X_1, X_2, \ldots, X_n\}, the total correlation is given by

C(X_1,X_2,\ldots,X_n)= \sum_{x_1\in\mathcal{X}_1} \sum_{x_2\in\mathcal{X}_2} \ldots \sum_{x_n\in\mathcal{X}_n} p(x_1,x_2,\ldots,x_n)\log\frac{p(x_1,x_2,\ldots,x_n)} {p(x_1)p(x_2)\cdots p(x_n)}.
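The two expressions above can be checked numerically. The following is a minimal sketch, assuming the joint distribution is supplied as an n-dimensional NumPy array with one axis per variable and entries summing to 1; the function names `entropy`, `total_correlation` and `total_correlation_kl` are illustrative and do not come from any cited source.

```python
import numpy as np


def entropy(p, base=2.0):
    """Shannon entropy of a probability array; zero entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))


def total_correlation(joint, base=2.0):
    """Sum of the marginal entropies minus the joint entropy."""
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    marginal_sum = 0.0
    for i in range(n):
        other_axes = tuple(j for j in range(n) if j != i)
        marginal_sum += entropy(joint.sum(axis=other_axes), base)
    return marginal_sum - entropy(joint, base)


def total_correlation_kl(joint, base=2.0):
    """Same quantity as the KL divergence of the joint from the product of marginals."""
    joint = np.asarray(joint, dtype=float)
    n = joint.ndim
    product = np.ones_like(joint)
    for i in range(n):
        other_axes = tuple(j for j in range(n) if j != i)
        marginal = joint.sum(axis=other_axes)
        shape = [1] * n
        shape[i] = joint.shape[i]
        # Broadcast the i-th marginal along its own axis.
        product = product * marginal.reshape(shape)
    mask = joint > 0
    ratio = joint[mask] / product[mask]
    return float(np.sum(joint[mask] * np.log(ratio)) / np.log(base))


# Three independent fair bits: total correlation should be close to 0 bits.
independent = np.full((2, 2, 2), 1.0 / 8.0)

# Three identical fair bits: total correlation should be close to 2 bits.
copies = np.zeros((2, 2, 2))
copies[0, 0, 0] = 0.5
copies[1, 1, 1] = 0.5

print(total_correlation(independent), total_correlation_kl(independent))
print(total_correlation(copies), total_correlation_kl(copies))
```

Both functions should agree up to floating-point error, since they evaluate the same quantity in two different forms.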

The total correlation is the amount of information shared among the variables in the set. The sum \sum_{i=1}^n H(X_i) represents the amount of information in bits (assuming base-2 logs) that the variables would possess if they were totally independent of one another (non-redundant), or, equivalently, the average code length to transmit the values of all variables if each variable were (optimally) coded independently. The term H(X_1,X_2,\ldots,X_n) is the actual amount of information that the variable set contains, or equivalently, the average code length to transmit the values of all variables if the set of variables were (optimally) coded together. The difference between these terms therefore represents the absolute redundancy (in bits) present in the given set of variables, and thus provides a general quantitative measure of the structure or organization embodied in the set of variables (Rothstein 1952). The total correlation is also the Kullback–Leibler divergence between the actual distribution p(X_1,X_2,\ldots,X_n) and its maximum entropy product approximation p(X_1)p(X_2)\cdots p(X_n).
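The equivalence between the divergence form and the entropy-difference form can be verified in a few steps. The sketch below assumes discrete variables and uses only the fact that summing the joint distribution over all variables except x_i yields the marginal p(x_i).

```latex
\begin{align}
C(X_1,\ldots,X_n)
 &= \sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n)
    \log\frac{p(x_1,\ldots,x_n)}{p(x_1)\cdots p(x_n)} \\
 &= \sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n)\log p(x_1,\ldots,x_n)
    - \sum_{i=1}^n \sum_{x_1,\ldots,x_n} p(x_1,\ldots,x_n)\log p(x_i) \\
 &= -H(X_1,\ldots,X_n)
    - \sum_{i=1}^n \sum_{x_i\in\mathcal{X}_i} p(x_i)\log p(x_i) \\
 &= \sum_{i=1}^n H(X_i) - H(X_1,\ldots,X_n).
\end{align}
```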

Total correlation tells us in the most general sense how cohesive or related a group of variables is. The total correlation is zero exactly when the variables are statistically independent, and a near-zero value indicates that they are essentially independent: knowing the value of one variable provides almost no clue as to the values of the other variables. On the other hand, the maximum total correlation, given by

C_\max = \sum_{i=1}^n H(X_i)-\max\limits_{X_i}H(X_i)

occurs when one of the variables determines all of the other variables. The variables are then maximally related in the sense that knowing the value of that one variable provides complete information about the values of all the other variables, and the variables can be figuratively regarded as cogs, in which the position of one cog determines the positions of all the others (Rothstein 1952).
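The maximal case can be illustrated with a small constructed example. The sketch below is an assumption-laden illustration, not taken from the cited sources: X_1 is uniform on four values and the other two variables are deterministic functions of it (its low and high bits), so that X_1 plays the role of the cog that fixes all the others.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def marginal_entropies(joint):
    """Entropy of each single-variable marginal of an n-dimensional joint table."""
    n = joint.ndim
    return [
        entropy_bits(joint.sum(axis=tuple(j for j in range(n) if j != i)))
        for i in range(n)
    ]


# X1 is uniform on {0, 1, 2, 3}; X2 = X1 mod 2 and X3 = X1 // 2 (hypothetical choice).
joint = np.zeros((4, 2, 2))
for x1 in range(4):
    joint[x1, x1 % 2, x1 // 2] = 0.25

h = marginal_entropies(joint)           # roughly [2, 1, 1] bits
c = sum(h) - entropy_bits(joint)        # total correlation
c_max = sum(h) - max(h)                 # bound from the formula above

print(h, c, c_max)
```

Here the joint entropy equals the largest marginal entropy, H(X_1), so the total correlation attains C_max.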

It is important to note that the total correlation counts up all the redundancies among a set of variables, but that these redundancies may be distributed throughout the variable set in a variety of complicated ways (Garner 1962). For example, some variables in the set may be totally inter-redundant while others in the set are completely independent. Perhaps more significantly, redundancy may be carried in interactions of various degrees: a group of variables may not possess any pairwise redundancies, but may possess higher-order interaction redundancies of the kind exemplified by the parity function. The decomposition of total correlation into its constituent redundancies is explored in a number of sources (McGill 1954, Watanabe 1960, Garner 1962, Studený & Vejnarová 1999, Jakulin & Bratko 2003a, Jakulin & Bratko 2003b, Nemenman 2004, Han 1978, Han 1980).
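The parity case can be made concrete. The following sketch assumes two independent fair bits X_1 and X_2 with X_3 = X_1 XOR X_2; the helper names are illustrative. Every pair of variables has zero mutual information, yet the three variables together carry one bit of redundancy.

```python
import itertools

import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(joint, i, j):
    """Mutual information in bits between variables i and j of a joint table."""
    other_axes = tuple(k for k in range(joint.ndim) if k not in (i, j))
    pair = joint.sum(axis=other_axes)   # two-dimensional marginal of (X_i, X_j)
    return (
        entropy_bits(pair.sum(axis=1))
        + entropy_bits(pair.sum(axis=0))
        - entropy_bits(pair)
    )


# Joint table of (X1, X2, X3) with X3 the parity of X1 and X2.
joint = np.zeros((2, 2, 2))
for x1, x2 in itertools.product(range(2), repeat=2):
    joint[x1, x2, x1 ^ x2] = 0.25

pairwise = {
    (i, j): mutual_information(joint, i, j)
    for i, j in itertools.combinations(range(3), 2)
}

marginal_sum = sum(
    entropy_bits(joint.sum(axis=tuple(k for k in range(3) if k != i)))
    for i in range(3)
)
total_corr = marginal_sum - entropy_bits(joint)

print(pairwise)     # all three values are close to 0 bits
print(total_corr)   # close to 1 bit
```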

Conditional total correlation

Conditional total correlation is defined analogously to the total correlation, with a condition added to each term. It is the Kullback–Leibler divergence between two conditional probability distributions,

C(X_1, X_2, \ldots, X_n|Y=y) \equiv \operatorname{D_{KL}}\left[ p(X_1, \ldots, X_n|Y=y) \| p(X_1|Y=y)p(X_2|Y=y)\cdots p(X_n|Y=y)\right] \; .


Analogous to the above, conditional total correlation reduces to a difference of conditional entropies,

C(X_1,X_2,\ldots,X_n|Y=y) = \sum_{i=1}^n H(X_i|Y=y) - H(X_1, X_2, \ldots, X_n|Y=y)
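As an illustration of the conditional definition, the sketch below assumes the joint distribution of (X_1, ..., X_n, Y) is stored as a NumPy array whose last axis indexes Y. It conditions on Y = y by slicing and renormalising, then applies the entropy-difference formula. The function names and the example distribution are hypothetical.

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits; zero entries contribute nothing."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def total_correlation(joint):
    """Sum of marginal entropies minus joint entropy, in bits."""
    n = joint.ndim
    marginal_sum = sum(
        entropy_bits(joint.sum(axis=tuple(j for j in range(n) if j != i)))
        for i in range(n)
    )
    return marginal_sum - entropy_bits(joint)


def conditional_total_correlation(joint_xy, y):
    """C(X_1, ..., X_n | Y = y), with Y indexed by the last axis of joint_xy."""
    joint_xy = np.asarray(joint_xy, dtype=float)
    slice_y = joint_xy[..., y]          # unnormalised p(x_1, ..., x_n, Y = y)
    p_y = slice_y.sum()
    if p_y <= 0:
        raise ValueError("cannot condition on an event of probability zero")
    return total_correlation(slice_y / p_y)


# Y is a fair bit. Given Y = 0, X1 and X2 are independent fair bits;
# given Y = 1, X1 is a fair bit and X2 is a copy of it.
joint_xy = np.zeros((2, 2, 2))
joint_xy[:, :, 0] = 0.5 * 0.25
joint_xy[0, 0, 1] = 0.25
joint_xy[1, 1, 1] = 0.25

c_given_0 = conditional_total_correlation(joint_xy, 0)   # close to 0 bits
c_given_1 = conditional_total_correlation(joint_xy, 1)   # close to 1 bit
print(c_given_0, c_given_1)
```

The same variables can therefore be independent under one value of the condition and fully redundant under another.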

Uses of total correlation

Clustering and feature selection algorithms based on total correlation have been explored by Watanabe. Alfonso et al. (2010) applied the concept of total correlation to the optimisation of water monitoring networks.


References

  • Alfonso L, Lobbrecht A & Price R (2010). Optimization of water level monitoring network in polder systems using information theory, Water Resources Research 46, W12553, doi:10.1029/2009WR008953.
  • Garner W R (1962). Uncertainty and Structure as Psychological Concepts, John Wiley & Sons, New York.
  • Han T S (1978). Nonnegative entropy measures of multivariate symmetric correlations, Information and Control 36, 133–156.
  • Han T S (1980). Multiple mutual information and multiple interactions in frequency data, Information and Control 46, 26–45.
  • Jakulin A & Bratko I (2003a). Analyzing Attribute Dependencies, in N Lavrač, D Gamberger, L Todorovski & H Blockeel, eds, Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, Springer, Cavtat-Dubrovnik, Croatia, pp. 229–240.
  • Jakulin A & Bratko I (2003b). Quantifying and visualizing attribute interactions [1].
  • McGill W J (1954). Multivariate information transmission, Psychometrika 19, 97–116.
  • Nemenman I (2004). Information theory, multivariate dependence, and genetic network inference [2].
  • Rothstein J (1952). Organization and entropy, Journal of Applied Physics 23, 1281–1282.
  • Studený M & Vejnarová J (1999). The multiinformation function as a tool for measuring stochastic dependence, in M I Jordan, ed., Learning in Graphical Models, MIT Press, Cambridge, MA, pp. 261–296.
  • Watanabe S (1960). Information theoretical analysis of multivariate correlation, IBM Journal of Research and Development 4, 66–82.

Wikimedia Foundation. 2010.
