# Jensen–Shannon divergence


In probability theory and statistics, the Jensen–Shannon divergence is a widely used measure of the similarity between two probability distributions. It is also known as the information radius (IRad) [Hinrich Schütze and Christopher D. Manning, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, Mass., 1999, p. 304, ISBN 0-262-13360-1, http://nlp.stanford.edu/fsnlp/] or the total divergence to the average [Ido Dagan, Lillian Lee, and Fernando Pereira, "Similarity-Based Methods For Word Sense Disambiguation", Proceedings of the Thirty-Fifth Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, 1997, pp. 56–63, http://citeseer.ist.psu.edu/dagan97similaritybased.html, accessed 2008-03-09]. It is based on the Kullback–Leibler divergence, with the notable (and useful) difference that it always takes a finite value.

Definition

Consider the set $M_+^1\left(A\right)$ of probability distributions on $A$, where $A$ is a set equipped with some σ-algebra.

The Jensen–Shannon divergence (JSD) $M_+^1\left(A\right) \times M_+^1\left(A\right) \rightarrow \left[0,1\right]$ is a symmetrized and smoothed version of the Kullback–Leibler divergence $D\left(P \parallel Q\right)$; the range $\left[0,1\right]$ assumes logarithms taken to base 2. It is defined by

$JSD\left(P \parallel Q\right) = \frac{1}{2} D\left(P \parallel M\right) + \frac{1}{2} D\left(Q \parallel M\right)$

where $M = \frac{1}{2}\left(P + Q\right)$.
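The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes discrete distributions given as equal-length lists of probabilities that each sum to 1, and it uses base-2 logarithms so that the result lies in [0, 1]. The function names `kl_divergence` and `js_divergence` are chosen here for illustration only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (assumed discrete case)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                # p puts mass where q has none: the divergence is infinite.
                return math.inf
            total += pi * math.log2(pi / qi)
    return total


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture M."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Two distributions with disjoint support.
p = [1.0, 0.0]
q = [0.0, 1.0]
print(kl_divergence(p, q))  # inf
print(js_divergence(p, q))  # 1.0
```

The example shows why the JSD stays finite: wherever P or Q has positive mass, the mixture M has at least half of that mass, so neither term D(P ∥ M) nor D(Q ∥ M) can hit the infinite branch, even when D(P ∥ Q) itself is infinite.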

See also

Kullback-Leibler divergence for details about calculating the Jensen-Shannon divergence.

References

* Bent Fuglede and Flemming Topsøe. Jensen-Shannon Divergence and Hilbert space embedding. University of Copenhagen, Department of Mathematics. [http://www.math.ku.dk/~topsoe/ISIT2004JSD.pdf]
* J. Lin. [http://citeseer.ist.psu.edu/context/395386/0 Divergence measures based on the Shannon entropy.] IEEE Trans. on Information Theory, 37(1):145–151, January 1991.
* Y. Ofran & B. Rost. [http://citeseer.ist.psu.edu/ofran03analysing.html Analysing Six Types of Protein-Protein Interfaces.] 2003.

Wikimedia Foundation. 2010.
