Squared deviations



In probability theory and statistics, variance is defined as either the expected value (for a theoretical distribution) or the average (for actual experimental data) of squared deviations from the mean. Computations for analysis of variance involve partitioning a sum of squared deviations. These computations are easier to follow after a close look at the quantity:

: \operatorname{E}(X^2).

It is well known that for a random variable X with mean \mu and variance \sigma^2:

: \sigma^2 = \operatorname{E}(X^2) - \mu^2 [Mood & Graybill: "An introduction to the Theory of Statistics" (McGraw Hill)]

Therefore

: \operatorname{E}(X^2) = \sigma^2 + \mu^2.

From the above, the following are readily derived for sums over n independent observations of X:

: \operatorname{E}\left( \sum\left( X^2 \right) \right) = n\sigma^2 + n\mu^2

: \operatorname{E}\left( \left(\sum X\right)^2 \right) = n\sigma^2 + n^2\mu^2
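The two results can be obtained in a couple of lines. The sketch below assumes the n observations X_1, ..., X_n are independent with common mean \mu and variance \sigma^2; the index j is notation introduced here for illustration and does not appear in the original text.

```latex
\begin{align*}
\operatorname{E}\left(\sum X^2\right)
  &= \sum_{j=1}^{n} \operatorname{E}\left(X_j^2\right)
   = n\left(\sigma^2 + \mu^2\right), \\
\operatorname{E}\left(\left(\sum X\right)^2\right)
  &= \operatorname{Var}\left(\sum X\right)
     + \left(\operatorname{E}\left(\sum X\right)\right)^2
   = n\sigma^2 + (n\mu)^2 .
\end{align*}
```

The second line uses the identity \operatorname{E}(Y^2) = \operatorname{Var}(Y) + (\operatorname{E}(Y))^2 with Y = \sum X, and independence is what makes the variance of the sum equal to n\sigma^2.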

Sample variance

The sum of squared deviations needed to calculate variance (before deciding whether to divide by n or n − 1) is most easily calculated as

: S = \sum x^2 - \left(\sum x\right)^2/n
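As a quick check, the shortcut form can be compared with the sum of squared deviations computed directly from the sample mean. This is a minimal sketch; the function names and the sample data are chosen here for illustration only.

```python
from statistics import mean


def sum_sq_dev_shortcut(xs):
    """S = sum(x^2) - (sum(x))^2 / n, the shortcut form."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


def sum_sq_dev_definition(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


data = [1, 2, 3, 4, 6]  # illustrative sample
# Both values agree up to floating-point rounding (about 14.8).
print(sum_sq_dev_shortcut(data))
print(sum_sq_dev_definition(data))
```

In floating-point arithmetic the shortcut form can lose precision when the mean is large compared with the spread of the data, because it subtracts two nearly equal numbers; the definition-based form is usually the safer choice in that case.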

From the two derived expectations above the expected value of this sum is

: \operatorname{E}(S) = n\sigma^2 + n\mu^2 - (n\sigma^2 + n^2\mu^2)/n

which implies

: \operatorname{E}(S) = (n - 1)\sigma^2.

This justifies the divisor n − 1 in the calculation of an unbiased sample estimate of \sigma^2: dividing S by n − 1 gives an estimator whose expected value is exactly \sigma^2.
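The result E(S) = (n − 1)\sigma^2 can be verified exactly for a small case by enumerating every possible sample. The sketch below assumes a hypothetical distribution, uniform on {0, 1, 2}, and uses exact rational arithmetic; the function name `expected_S` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import product


def expected_S(values, n):
    """Exact E(S) for n independent draws, each uniform on `values`."""
    p = Fraction(1, len(values)) ** n  # probability of each ordered sample
    total = Fraction(0)
    for sample in product(values, repeat=n):
        s = sum(x * x for x in sample) - Fraction(sum(sample) ** 2, n)
        total += p * s
    return total


values = [0, 1, 2]  # hypothetical distribution: uniform on {0, 1, 2}
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)  # population variance

for n in (2, 3, 4):
    # E(S) matches (n - 1) * sigma^2 exactly for each sample size tried.
    assert expected_S(values, n) == (n - 1) * sigma2
```

Because the enumeration grows as 3^n, this is only practical for small n; it is meant as a check of the algebra, not as a way to estimate variance.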

Partition — analysis of variance

Suppose data are available for k different treatment groups, where group i has size n_i for i = 1, ..., k, and n = \sum n_i is the total number of observations. It is then assumed that the expected mean of each group is

: \operatorname{E}(\mu_i) = \mu + T_i

and the variance of each treatment group is unchanged from the population variance \sigma^2.

Under the null hypothesis that the treatments have no effect, each of the T_i will be zero.

It is now possible to calculate three sums of squares:

;Individual

:I = \sum x^2

:\operatorname{E}(I) = n\sigma^2 + n\mu^2

;Treatments

:T = \sum_{i=1}^k \left(\left(\sum x\right)^2/n_i\right)

:\operatorname{E}(T) = k\sigma^2 + \sum_{i=1}^k n_i(\mu + T_i)^2

:\operatorname{E}(T) = k\sigma^2 + n\mu^2 + 2\mu \sum_{i=1}^k (n_iT_i) + \sum_{i=1}^k n_i(T_i)^2

Under the null hypothesis that the treatments cause no differences and all the T_i are zero, the expectation simplifies to

:\operatorname{E}(T) = k\sigma^2 + n\mu^2.
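The expectation of T follows from applying the earlier result for the square of a sum to each group separately. The sketch below assumes that the n_i observations in group i are independent, each with mean \mu + T_i and variance \sigma^2, and that n = \sum n_i.

```latex
\begin{align*}
\operatorname{E}\left(\frac{\left(\sum x\right)^2}{n_i}\right)
  &= \frac{n_i\sigma^2 + n_i^2(\mu + T_i)^2}{n_i}
   = \sigma^2 + n_i(\mu + T_i)^2, \\
\operatorname{E}(T)
  &= \sum_{i=1}^{k} \left(\sigma^2 + n_i(\mu + T_i)^2\right)
   = k\sigma^2 + \sum_{i=1}^{k} n_i(\mu + T_i)^2 .
\end{align*}
```

Expanding the square and using \sum n_i \mu^2 = n\mu^2 gives the three-term form, and setting every T_i to zero leaves k\sigma^2 + n\mu^2.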

;Combination

:C = \left(\sum x\right)^2/n

:\operatorname{E}(C) = \sigma^2 + n\mu^2

Sums of squared deviations

Under the null hypothesis, the difference of any pair of I, T, and C does not depend on \mu, only on \sigma^2.

:\operatorname{E}(I - C) = (n - 1)\sigma^2 \quad \text{(total squared deviations)}

:\operatorname{E}(T - C) = (k - 1)\sigma^2 \quad \text{(treatment squared deviations)}

:\operatorname{E}(I - T) = (n - k)\sigma^2 \quad \text{(residual squared deviations)}

The constants (n − 1), (k − 1), and (n − k) are normally referred to as the number of degrees of freedom.

Example

In a very simple example, 5 observations arise from two treatments. The first treatment gives three values 1, 2, and 3, and the second treatment gives two values 4 and 6.

:I = \frac{1^2}{1} + \frac{2^2}{1} + \frac{3^2}{1} + \frac{4^2}{1} + \frac{6^2}{1} = 66

:T = \frac{(1 + 2 + 3)^2}{3} + \frac{(4 + 6)^2}{2} = 12 + 50 = 62

:C = \frac{(1 + 2 + 3 + 4 + 6)^2}{5} = 256/5 = 51.2

Giving

: Total squared deviations = 66 − 51.2 = 14.8 with 4 degrees of freedom.

: Treatment squared deviations = 62 − 51.2 = 10.8 with 1 degree of freedom.

: Residual squared deviations = 66 − 62 = 4 with 3 degrees of freedom.
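The arithmetic of this example can be reproduced with a short script. This is a minimal sketch of the I, T, C partition described above; the function name `anova_partition` and its return layout are choices made here for illustration.

```python
def anova_partition(groups):
    """Return (total, treatment, residual) sums of squared deviations.

    `groups` holds one list of observations per treatment.
    """
    all_x = [x for g in groups for x in g]
    n = len(all_x)
    I = sum(x * x for x in all_x)                    # individual
    T = sum(sum(g) ** 2 / len(g) for g in groups)    # treatments
    C = sum(all_x) ** 2 / n                          # combination
    return I - C, T - C, I - T


groups = [[1, 2, 3], [4, 6]]
total, treatment, residual = anova_partition(groups)
k = len(groups)
n = sum(len(g) for g in groups)

print(f"total     {total:.1f} on {n - 1} df")      # total     14.8 on 4 df
print(f"treatment {treatment:.1f} on {k - 1} df")  # treatment 10.8 on 1 df
print(f"residual  {residual:.1f} on {n - k} df")   # residual  4.0 on 3 df
```

The treatment and residual sums add up to the total (10.8 + 4 = 14.8), and the degrees of freedom add up in the same way (1 + 3 = 4).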

Two-way analysis of variance

The following hypothetical example gives the yields of 15 plants subject to two different environmental variations, and three different fertilisers.

See also

* Variance decomposition
* Errors and residuals in statistics

References


Wikimedia Foundation. 2010.
