Explained variation



In statistics, explained variation or explained randomness measures the proportion to which a mathematical model accounts for the variation (= apparent randomness) of a given data set. Often, variation is quantified as variance; then, the more specific term explained variance can be used.

The complementary part of the total variation/randomness/variance is called unexplained or residual.

Definition

Explained variation is a relatively recent concept. The most authoritative source seems to be Kent (1983), who based his definition on information theory.

Information gain by better modelling

Following Kent (1983), we use the Fraser information (Fraser 1965):

$$F(\theta) = \int \textrm{d}r\, g(r)\, \ln f(r;\theta)$$

where $g(r)$ is the probability density of a random variable $R$, and $f(r;\theta)$, with $\theta \in \Theta_i$ ($i = 0, 1$), are two families of parametric models. Model family 0 is the simpler one, with a restricted parameter space $\Theta_0 \subset \Theta_1$.

Parameters are determined by maximum likelihood estimation:

$$\theta_i = \operatorname{arg\,max}_{\theta \in \Theta_i} F(\theta).$$

The information gain of model 1 over model 0 is written as

$$\Gamma(\theta_1 : \theta_0) = 2 [F(\theta_1) - F(\theta_0)],$$

where a factor of 2 is included for convenience. $\Gamma$ is always nonnegative; it measures the extent to which the best model of family 1 is better than the best model of family 0 in explaining $g(r)$.
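As a minimal sketch of the information gain, the integral over $g(r)$ can be approximated by an average over a sample drawn from $g$. The two families used here are assumptions chosen for illustration and are not taken from Kent (1983): family 1 is a normal distribution with free mean and variance, and family 0 restricts the mean to 0. The function names and the simulated data are likewise hypothetical.

```python
import math
import random


def normal_logpdf(r, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2.0 * math.pi * var) - (r - mu) ** 2 / (2.0 * var)


def fraser_information(sample, mu, var):
    """Sample estimate of F(theta): the average of ln f(r; theta) over draws from g."""
    return sum(normal_logpdf(r, mu, var) for r in sample) / len(sample)


def information_gain(sample):
    """Return the estimated gain 2 * [F(theta_1) - F(theta_0)] and both variances."""
    n = len(sample)
    # Family 1 (assumed): mean and variance both free; maximum likelihood estimates.
    mu1 = sum(sample) / n
    var1 = sum((r - mu1) ** 2 for r in sample) / n
    # Family 0 (assumed): mean fixed at 0, a restricted parameter space; variance free.
    mu0 = 0.0
    var0 = sum(r ** 2 for r in sample) / n
    f1 = fraser_information(sample, mu1, var1)
    f0 = fraser_information(sample, mu0, var0)
    return 2.0 * (f1 - f0), var0, var1


rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
sample = [rng.gauss(0.5, 1.0) for _ in range(2000)]
gamma, var0, var1 = information_gain(sample)
print(f"estimated information gain: {gamma:.4f}")
```

For these two normal families the estimate reduces to the log ratio of the fitted variances, `ln(var0 / var1)`, which cannot be negative because the restricted fit can never have the smaller variance.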

Information gain by a conditional model

Assume a two-dimensional random variable $R = (X, Y)$, where $X$ shall be considered as an explanatory variable, and $Y$ as a dependent variable. Models of family 1 "explain" $Y$ in terms of $X$,

$$f(y \mid x; \theta),$$

whereas in family 0, $X$ and $Y$ are assumed to be independent. We define the randomness of $Y$ by $D(Y) = \exp[-2F(\theta_0)]$, and the randomness of $Y$, given $X$, by $D(Y \mid X) = \exp[-2F(\theta_1)]$. Then

$$\rho_C^2 = 1 - D(Y \mid X) / D(Y)$$

can be interpreted as the proportion of the randomness which is explained by $X$.
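The following sketch estimates $\rho_C^2$ from simulated data. It rests on several assumptions that are not part of the definition above: both families are normal, family 1 has a centre that is linear in $X$, the Fraser information is approximated by a sample average, and the marginal density of $X$ is taken to be common to both families so that only the density of $Y$ needs to be evaluated. All names and the simulated data are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def avg_normal_loglik(residuals, var):
    """Average normal log density of the residuals, i.e. a sample estimate of F."""
    return mean([-0.5 * math.log(2.0 * math.pi * var) - e ** 2 / (2.0 * var)
                 for e in residuals])


def explained_randomness(xs, ys):
    """Estimate rho_C^2 = 1 - D(Y|X)/D(Y) for a linear normal model of Y given X."""
    mx, my = mean(xs), mean(ys)
    # Family 0 (assumed): Y ~ Normal(my, var0), independent of X.
    res0 = [y - my for y in ys]
    var0 = mean([e ** 2 for e in res0])
    # Family 1 (assumed): Y | X=x ~ Normal(a + b*x, var1), fitted by least squares,
    # which coincides with maximum likelihood for normal errors.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    res1 = [y - (a + b * x) for x, y in zip(xs, ys)]
    var1 = mean([e ** 2 for e in res1])
    d_y = math.exp(-2.0 * avg_normal_loglik(res0, var0))          # D(Y)
    d_y_given_x = math.exp(-2.0 * avg_normal_loglik(res1, var1))  # D(Y|X)
    return 1.0 - d_y_given_x / d_y


rng = random.Random(1)  # fixed seed, an arbitrary choice for reproducibility
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
rho_c2 = explained_randomness(xs, ys)
print(f"estimated rho_C^2: {rho_c2:.3f}")
```

In this simulation the variance of $Y$ is 5 and the residual variance is 1, so the estimate should lie close to 0.8.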

Special cases and generalized usage

For special models, the above definition yields particularly appealing results. Regrettably, these simplified definitions of explained variance are used even in situations where the underlying assumptions do not hold.

Linear regression

The fraction of variance unexplained is an established concept in the context of linear regression. The usual definition of the coefficient of determination seems to be compatible with the fundamental definition of explained variance.
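For a single predictor, the usual coefficient of determination can be computed as one minus the fraction of variance unexplained. The sketch below assumes an ordinary least-squares fit; the function names and the five data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def coefficient_of_determination(xs, ys):
    """Return (R^2, FVU), where FVU = SS_res / SS_tot and R^2 = 1 - FVU."""
    intercept, slope = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    fvu = ss_res / ss_tot
    return 1.0 - fvu, fvu


# Made-up illustrative data, not taken from any source.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
r2, fvu = coefficient_of_determination(xs, ys)
print(f"R^2 = {r2:.2f}, FVU = {fvu:.2f}")  # prints "R^2 = 0.60, FVU = 0.40"
```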

Correlation coefficient as measure of explained variance

Let $X$ be a random vector, and $Y$ a random variable that is modeled by a normal distribution with centre $\mu + \Psi^{\textrm{T}} X$. In this case, the above-derived proportion of randomness $\rho_C^2$ equals the squared correlation coefficient $R^2$.

Note the strong model assumptions: the centre of the $Y$ distribution must be a linear function of $X$, and for any given $x$, the $Y$ distribution must be normal. In other situations, it is generally not justified to interpret $R^2$ as proportion of explained variance.
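A sketch of why the two quantities coincide in the normal linear case is given below. It assumes that family 0 is also normal (the same model with the linear term set to zero) and that all second moments are finite. The symbols $\sigma_Y^2$, $\sigma_{Y|X}^2$, $\Sigma_{XX}$ and $\Sigma_{XY}$ are introduced here for illustration and do not appear in the text above.

```latex
% Assumed notation: \sigma_Y^2 = Var(Y), \Sigma_{XX} = Cov(X),
% \Sigma_{XY} = Cov(X, Y), \Sigma_{YX} = \Sigma_{XY}^{T}, and
% \sigma_{Y|X}^2 = \sigma_Y^2 - \Sigma_{YX}\Sigma_{XX}^{-1}\Sigma_{XY}
% is the residual variance at the optimum \Psi = \Sigma_{XX}^{-1}\Sigma_{XY}.
\begin{align*}
F(\theta_0) &= -\tfrac{1}{2}\ln\left(2\pi e\,\sigma_Y^2\right),
  & D(Y) &= \exp[-2F(\theta_0)] = 2\pi e\,\sigma_Y^2, \\
F(\theta_1) &= -\tfrac{1}{2}\ln\left(2\pi e\,\sigma_{Y|X}^2\right),
  & D(Y \mid X) &= \exp[-2F(\theta_1)] = 2\pi e\,\sigma_{Y|X}^2, \\
\rho_C^2 &= 1 - \frac{\sigma_{Y|X}^2}{\sigma_Y^2}
          = \frac{\Sigma_{YX}\,\Sigma_{XX}^{-1}\,\Sigma_{XY}}{\sigma_Y^2}
          = R^2 .
\end{align*}
```

The common factor $2\pi e$ cancels in the ratio, so under these assumptions the proportion of explained randomness reduces to a ratio of variances.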

Explained variance in principal component analysis

"Explained variance" is routinely used in principal component analysis. The relation to the Fraser-Kent information gain remains to be clarified.

Criticism

As "explained variance" essentially equals the squared correlation coefficient $R^2$, it shares all the disadvantages of the latter: it reflects not only the quality of the regression, but also the distribution of the independent (conditioning) variables.
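The dependence on the distribution of the independent variable can be illustrated with a small simulation. In this sketch, which uses made-up parameters and hypothetical function names, both data sets follow the same regression line with the same noise level; only the spread of $X$ differs.

```python
import random


def r_squared(xs, ys):
    """Squared sample correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


def simulate(x_sd, n, rng):
    """Draw (x, y) with y = x + unit-variance noise; only the spread of x varies."""
    xs = [rng.gauss(0.0, x_sd) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


rng = random.Random(2)  # fixed seed, an arbitrary choice for reproducibility
r2_narrow = r_squared(*simulate(1.0, 5000, rng))
r2_wide = r_squared(*simulate(3.0, 5000, rng))
print(f"narrow x: R^2 = {r2_narrow:.2f}, wide x: R^2 = {r2_wide:.2f}")
```

The population values are 0.5 for the narrow design and 0.9 for the wide one, although the relationship between $X$ and $Y$ and the noise are identical in both cases.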

In the words of one critic: "Thus $R^2$ gives the 'percentage of variance explained' by the regression, an expression that, for most social scientists, is of doubtful meaning but great rhetorical value. If this number is large, the regression gives a good fit, and there is little point in searching for additional variables. Other regression equations on different data sets are said to be less satisfactory or less powerful if their $R^2$ is lower. Nothing about $R^2$ supports these claims" [Achen 1982, p. 58]. And, after constructing an example where $R^2$ is enhanced just by jointly considering data from two different populations: "'Explained variance' explains nothing" [Achen 1990, p. 183].

Further information

Literature

* D. A. S. Fraser (1965): "On Information in Statistics", Ann. Math. Statist. 36(3), 890-896.
* C. H. Achen (1982): "Interpreting and Using Regression", Beverly Hills: Sage.
* J. T. Kent (1983): "Information gain and a general measure of correlation", Biometrika 70(1), 163-173.
* C. H. Achen (1990): "What Does 'Explained Variance' Explain?: Reply", Political Analysis 2(1), 173-184. http://pan.oxfordjournals.org/cgi/content/abstract/2/1/173

External links

* Variance, explained and unexplained: http://www.documentingexcellence.com/stat_tool/variance.htm
* Explained variance: http://spirxpert.com/statistical7.htm
* Explained and Unexplained Variance on a graph: http://darwin.cwru.edu/~witte/statistics/explained_variance.htm


Wikimedia Foundation. 2010.
