Sum of squares

Sum of squares

Sum of squares is a concept that permeates much of inferential statistics and descriptive statistics. More properly, it is "the sum of the squared deviations". Mathematically, it is an unscaled, or unadjusted measure of dispersion (also called variability). When scaled for the number of degrees of freedom, it estimates the variance, or spread of the observations about their mean value.

The distance from any point in a collection of data, to the mean of the data, is the deviation. This can be written as X_i - overline{X}, where X_i is the ith data point, and overline{X} is the estimate of the mean. If all such deviations are squared, then summed, as in sum_{i=1}^nleft(X_i-overline{X}, ight)^2, we have the "sum of squares" for these data.

When more data are added to the collection, the sum of squares will increase, except in unlikely cases such as the new data being equal to the mean. So usually, the sum of squares will grow with the size of the data collection. That is a manifestation of the fact that it is unscaled.

In many cases, the number of degrees of freedom is simply the number of data in the collection, minus one. We write this as "n" − 1, where "n" is the number of data.

Scaling (also known as normalizing) means adjusting the sum of squares so that it does not grow as the size of the data collection grows. This is important when we want to compare samples of different sizes, such as a sample of 100 people compared to a sample of 20 people. If the sum of squares was not normalized, its value would always be larger for the sample of 100 people than for the sample of 20 people. To scale the sum of squares, we divide it by the degrees of freedom, i.e., calculate the sum of squares per degree of freedom, or variance. Standard deviation, in turn, is the square root of the variance.

The above information is how sum of squares is used in descriptive statistics; see the article on total sum of squares for an application of this broad principle to inferential statistics.

ee also

*Explained sum of squares
*Polynomial SOS
*Residual sum of squares
*Mean squared error

Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Sum Of Squares — A statistical technique used in regression analysis. The sum of squares is a mathematical approach to determining the dispersion of data points. In a regression analysis, the goal is to determine how well a data series can be fitted to a function …   Investment dictionary

  • Lack-of-fit sum of squares — In statistics, a sum of squares due to lack of fit, or more tersely a lack of fit sum of squares, is one of the components of a partition of the sum of squares in an analysis of variance, used in the numerator in an F test of the null hypothesis… …   Wikipedia

  • Total sum of squares — The value of the total sum of squares (TSS) depends on the data being analyzed and the test that is being done.In statistical linear models, (particularly in standard regression models), the TSS is the sum of the squares of the difference of the… …   Wikipedia

  • Explained sum of squares — In statistics, an explained sum of squares (ESS) is the sum of squared predicted values in a standard regression model (for example y {i}=a+bx {i}+epsilon {i}), where y {i} is the response variable, x {i} is the explanatory variable, a and b are… …   Wikipedia

  • Residual sum of squares — In statistics, the residual sum of squares (RSS) is the sum of squares of residuals. It is the discrepancy between the data and our estimation model. The smaller this discrepancy is, the better the estimation will be.:RSS = sum {i=1}^n (y i f(x… …   Wikipedia

  • Residual Sum Of Squares - RSS — A statistical technique used to measure the amount of variance in a data set that is not explained by the regression model. The residual sum of squares is a measure of the amount of error remaining between the regression function and the data set …   Investment dictionary

  • Sum of two squares — In mathematics, sums of two squares occur in a number of contexts:* The Pythagorean theorem says that the square on the hypotenuse of a right triangle is equal in area to the sum of the squares on the legs * Brahmagupta–Fibonacci identity says… …   Wikipedia

  • Ordinary least squares — This article is about the statistical properties of unweighted linear regression analysis. For more general regression analysis, see regression analysis. For linear regression on a single variable, see simple linear regression. For the… …   Wikipedia

  • Least squares — The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. Least squares means that the overall solution minimizes the sum of… …   Wikipedia

  • Linear least squares/Proposed — Linear least squares is an important computational problem, that arises primarily in applications when it is desired to fit a linear mathematical model to observations obtained from experiments. Mathematically, it can be stated as the problem of… …   Wikipedia