Entropy estimation


Estimating the differential entropy of a system or process, given some observations, is useful in various science and engineering applications, such as independent component analysis (Pham 2004), image analysis (Chang et al. 2006), genetic analysis (Martins et al. 2008), speech recognition (Jung and Oh 2008), manifold learning (Costa and Hero 2004), and time delay estimation (Benesty, Huang and Chen 2007). The simplest and most common approach uses histogram-based estimation, but other approaches have been developed and used, each with its own benefits and drawbacks (Beirlant et al. 1997). The main factor in choosing a method is often a trade-off between the bias and the variance of the estimate (Schürmann 2004), although the nature of the (suspected) distribution of the data may also be a factor.

Histogram estimator

The histogram approach uses the idea that the differential entropy,

:H(X) = -\int_{\mathbb{X}} f(x)\,\log f(x)\,dx

can be approximated by producing a histogram of the observations, and then finding the discrete entropy

:H(X) = -\sum_{i=1}^{n} f(x_i)\,\log f(x_i)

of that histogram (which is itself a maximum-likelihood estimate of the discretized frequency distribution). Histograms can be quick to calculate and simple, so this approach has some attractions. However, the estimate produced is biased, and although corrections can be made to the estimate, they may not always be satisfactory (Miller 1955).
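
As a concrete illustration, here is a minimal Python sketch of a histogram-based estimator; it is not taken from any reference above. The function name, the default bin count, the absence of any bias correction, and the log-bin-width term (which turns the discrete entropy of the bin frequencies into a differential-entropy estimate) are all choices made for this example.

    import numpy as np

    def histogram_entropy(samples, bins=30):
        """Histogram-based differential-entropy estimate (illustrative sketch).

        The bin frequencies give a discrete entropy; adding the average
        log bin width turns it into an estimate of the differential
        entropy. No bias correction is applied.
        """
        counts, edges = np.histogram(samples, bins=bins)
        widths = np.diff(edges)                 # bin widths
        p = counts / counts.sum()               # relative frequency of each bin
        nz = p > 0                              # treat 0 * log 0 as 0
        # -sum p_i log(p_i / w_i): p_i / w_i is the estimated density in bin i
        return -np.sum(p[nz] * np.log(p[nz])) + np.sum(p[nz] * np.log(widths[nz]))

    # Example: standard normal samples; the true value is 0.5*log(2*pi*e) ≈ 1.419
    rng = np.random.default_rng(0)
    print(histogram_entropy(rng.normal(size=10_000)))

The bin count controls the bias/variance trade-off mentioned above: few wide bins smooth away detail, while many narrow bins leave most bins nearly empty and the estimate noisy.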

Estimates based on sample-spacings

If the data are one-dimensional, we can imagine taking all the observations and putting them in order of their value. The spacing between one value and the next then gives a rough idea of (the reciprocal of) the probability density in that region: the closer together the values are, the higher the probability density. This is a very rough estimate with high variance, but it can be improved, for example by considering the spacing between a given value and the one "m" places away from it, where "m" is some fixed number.

The probability density estimated in this way can then be used to calculate the entropy estimate, in a similar way to that given above for the histogram, but with some slight tweaks.
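
To make the idea concrete, here is a minimal Python sketch of a one-dimensional m-spacing estimator; it is not taken from any reference above. The function name, the constants, the heuristic choice m ≈ sqrt(n), and the omission of the explicit bias corrections found in the literature are all assumptions of this example.

    import numpy as np

    def m_spacing_entropy(samples, m=None):
        """One-dimensional m-spacing entropy estimate (illustrative sketch).

        The spacing X_(i+m) - X_(i) between sorted observations acts as a
        rough local (reciprocal) density estimate. The constants follow one
        common variant; m ≈ sqrt(n) is a frequently used heuristic, and no
        explicit bias correction is applied.
        """
        x = np.sort(np.asarray(samples, dtype=float))
        n = len(x)
        if m is None:
            m = max(1, int(np.sqrt(n)))
        spacings = x[m:] - x[:-m]                # the n - m spacings of order m
        spacings = np.maximum(spacings, 1e-12)   # guard against tied observations
        return np.mean(np.log((n + 1) / m * spacings))

    # Example: standard normal samples; the true value is 0.5*log(2*pi*e) ≈ 1.419
    rng = np.random.default_rng(0)
    print(m_spacing_entropy(rng.normal(size=10_000)))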

One of the main drawbacks of this approach is going beyond one dimension: the idea of sorting the data points into order falls apart in more than one dimension. However, using analogous methods, some multidimensional entropy estimators have been developed (Learned-Miller 2003).

Estimates based on nearest-neighbours

For each point in our dataset, we can find the distance to its nearest neighbour. We can in fact estimate the entropy from the distribution of the nearest-neighbour-distance of our datapoints. (In a uniform distribution these distances all tend to be fairly similar, whereas in a strongly nonuniform distribution they may vary a lot more.)
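
For illustration, here is a minimal Python sketch of a nearest-neighbour estimator in the Kozachenko-Leonenko style; it is not taken from any reference above. The function name, the use of Euclidean distances, the digamma and unit-ball-volume constants (which follow one common form of this family of estimators), and the use of SciPy's cKDTree for the neighbour search are all assumptions of this example.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma, gammaln

    def knn_entropy(samples, k=1):
        """k-nearest-neighbour entropy estimate, Kozachenko-Leonenko style (sketch).

        samples: array of shape (n, d), or (n,) for one-dimensional data.
        Uses Euclidean distances; the digamma and unit-ball-volume terms
        follow one common form of the estimator.
        """
        x = np.asarray(samples, dtype=float)
        if x.ndim == 1:
            x = x[:, None]
        n, d = x.shape
        # Distance from each point to its k-th nearest neighbour (excluding itself).
        dist, _ = cKDTree(x).query(x, k=k + 1)
        rho = np.maximum(dist[:, k], 1e-12)      # guard against duplicate points
        # log-volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1)
        log_vd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1)
        return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(rho))

    # Example: 2-D standard normal samples; the true value is log(2*pi*e) ≈ 2.838
    rng = np.random.default_rng(0)
    print(knn_entropy(rng.normal(size=(5_000, 2))))

Unlike the sample-spacing approach, this construction extends naturally to several dimensions, since nearest-neighbour distances are defined in any dimension.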

References

Beirlant, J.; Dudewicz, E. J.; Gyorfi, L.; van der Meulen, E. C. (1997) Nonparametric entropy estimation: an overview. In "International Journal of Mathematical and Statistical Sciences", Volume 6, pp. 17–39. http://www.menem.com/ilya/digital_library/entropy/beirlant_etal_97.pdf

Benesty, J.; Yiteng Huang; Jingdong Chen (2007) Time Delay Estimation via Minimum Entropy. In "Signal Processing Letters", Volume 14, Issue 3, pp. 157–160. doi:10.1109/LSP.2006.884038

Chang, C.-I.; Du, Y.; Wang, J.; Guo, S.-M.; Thouin, P. D. (2006) Survey and comparative analysis of entropy and relative entropy thresholding techniques. In "Vision, Image and Signal Processing", Volume 153, Issue 6, pp. 837–850. doi:10.1049/ip-vis:20050032

Costa, J. A.; Hero, A. O. (2004) Geodesic entropic graphs for dimension and entropy estimation in manifold learning. In "Signal Processing", Volume 52, Issue 8, pp. 2210–2221. doi:10.1109/TSP.2004.831130

Gue Jun Jung; Yung-Hwan Oh (2008) Information Distance-Based Subvector Clustering for ASR Parameter Quantization. In "Signal Processing Letters", Volume 15, pp. 209–212. doi:10.1109/LSP.2007.913132

Learned-Miller, E. G. (2003) A new class of entropy estimators for multi-dimensional densities. In "Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP'03)", Volume 3, pp. 297–300.

Martins, D. C. et al. (2008) Intrinsically Multivariate Predictive Genes. In "Selected Topics in Signal Processing", Volume 2, Issue 3, pp. 424–439. doi:10.1109/JSTSP.2008.923841

Miller, G. (1955) Note on the bias of information estimates. In "Information Theory in Psychology: Problems and Methods", pp. 95–100.

Pham, Dinh-Tuan (2004) Fast algorithms for mutual information based independent component analysis. In "Signal Processing", Volume 52, Issue 10, pp. 2690–2700. doi:10.1109/TSP.2004.834398

Schürmann, T. (2004) Bias analysis in entropy estimation. In "J. Phys. A: Math. Gen.", Volume 37, pp. L295–L301. doi:10.1088/0305-4470/37/27/L02