Statistical model

A statistical model is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other random variables. The model is statistical because the variables are related stochastically rather than deterministically. In mathematical terms, a statistical model is frequently thought of as a pair (Y, P), where Y is the set of possible observations and P is a set of possible probability distributions on Y. It is assumed that one particular element of P generates the observed data. Statistical inference enables us to make statements about which element(s) of this set are likely to be the true one.

Most statistical tests can be described in the form of a statistical model. For example, Student's t-test for comparing the means of two groups can be formulated as testing whether an estimated parameter in the model differs from 0. Another similarity between tests and models is that both rest on assumptions. In most models, for instance, the error is assumed to be normally distributed.[1]
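The correspondence between the t-test and a model parameter can be checked numerically. The sketch below is only an illustration under stated assumptions: the two groups are made-up numbers, the function names pooled_t and slope_t are hypothetical, and the t-test is taken to be the equal-variance (pooled) version. It codes group membership as a 0/1 variable, fits y = b0 + b1*x by least squares, and compares the t statistic for b1 with the classical two-sample statistic.

```python
import math


def pooled_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def slope_t(x, y):
    """t statistic for b1 in y = b0 + b1*x + error, fitted by least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)
    return b1 / se_b1


# Made-up illustrative data, not taken from any study.
group_a = [4.1, 5.0, 4.7, 5.3, 4.4]
group_b = [5.6, 6.1, 5.2, 6.4, 5.9]

dummy = [0] * len(group_a) + [1] * len(group_b)  # 0 = group A, 1 = group B
outcome = group_a + group_b

t_test = pooled_t(group_a, group_b)
t_reg = slope_t(dummy, outcome)
print(t_test, t_reg)  # the two statistics agree up to rounding error
```

With a 0/1 predictor, the fitted slope b1 is the difference between the group means, so testing whether b1 differs from 0 is the same question the t-test asks.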


Formal definition

A statistical model, \mathcal{P}, is a collection of probability distribution functions or probability density functions (collectively referred to as distributions for brevity). A parametric model is a collection of distributions, each of which is indexed by a unique finite-dimensional parameter: \mathcal{P}=\{\mathbb{P}_{\theta} : \theta \in \Theta\}, where \theta is a parameter and \Theta \subseteq \mathbb{R}^d is the feasible region of parameters, which is a subset of d-dimensional Euclidean space. A statistical model may be used to describe the set of distributions from which one assumes that a particular data set is sampled. For example, if one assumes that data arise from a univariate Gaussian distribution, then one has assumed a Gaussian model: \mathcal{P}=\{\mathbb{P}(x; \mu, \sigma) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left\{ -\frac{1}{2\sigma^2}(x-\mu)^2\right\} : \mu \in \mathbb{R}, \sigma > 0\}.
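As a rough illustration of the Gaussian model, the following sketch treats each pair (mu, sigma) as one member of the family and picks the member that maximizes the likelihood of a sample. The function names and the sample values are assumptions made for this example; the estimator shown is the standard maximum-likelihood one (sample mean, and standard deviation with divisor n), which is one of several ways to choose an element of the model.

```python
import math


def gaussian_density(x, mu, sigma):
    """Density of the Gaussian model member indexed by (mu, sigma), sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


def log_likelihood(data, mu, sigma):
    """Log-likelihood of an independent sample under the member (mu, sigma)."""
    return sum(math.log(gaussian_density(x, mu, sigma)) for x in data)


def fit_gaussian(data):
    """Maximum-likelihood estimates of (mu, sigma) within the Gaussian model."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat


# Made-up illustrative sample.
sample = [2.1, 1.7, 2.9, 2.4, 1.9, 2.6]
mu_hat, sigma_hat = fit_gaussian(sample)
print(mu_hat, sigma_hat, log_likelihood(sample, mu_hat, sigma_hat))
```

Here the parameter space \Theta is the half-plane of pairs (mu, sigma) with sigma > 0, so d = 2.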

A non-parametric model is a set of probability distributions with infinite-dimensional parameters, and might be written as \mathcal{P}=\{\text{all distributions}\}. A semi-parametric model also has infinite-dimensional parameters, but is not dense in the space of distributions. By contrast, a mixture of Gaussians with one Gaussian at each data point is dense in the space of distributions. Formally, let d be the dimension of the parameter and n the number of samples; if d \rightarrow \infty and d/n \rightarrow 0 as n \rightarrow \infty, then the model is semi-parametric.

Model comparison

Models can be compared to each other. This can be done in either an exploratory or a confirmatory data analysis. In an exploratory analysis, you formulate all the models you can think of and see which describes your data best. In a confirmatory analysis, you test which of the models specified before the data were collected fits the data best, or test whether a single model fits the data. In linear regression analysis you can compare the amount of variance explained by the independent variables, R², across the different models. In general, you can compare nested models by using a likelihood-ratio test. Nested models are models that can be obtained by restricting a parameter in a more complex model to be zero.
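A minimal sketch of a likelihood-ratio test for one nested pair is given below. The pair is chosen for illustration and is an assumption of this example: the full model is a Gaussian with free mean and variance, and the nested model restricts the mean to zero. The data are made up, the function names are hypothetical, and the p-value relies on the usual large-sample chi-square approximation with one degree of freedom, which may be rough for a sample this small.

```python
import math


def max_log_likelihood(data, mu=None):
    """Maximized Gaussian log-likelihood; the mean is estimated unless mu is given."""
    n = len(data)
    center = sum(data) / n if mu is None else mu
    var_hat = sum((x - center) ** 2 for x in data) / n  # ML variance estimate
    return -0.5 * n * (math.log(2 * math.pi * var_hat) + 1)


def likelihood_ratio_test(data):
    """Compare the full model (free mean) with the nested model (mean = 0)."""
    ll_full = max_log_likelihood(data)
    ll_restricted = max_log_likelihood(data, mu=0.0)
    statistic = max(0.0, 2 * (ll_full - ll_restricted))
    # Chi-square tail with 1 degree of freedom: P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up illustrative sample.
data = [0.8, 1.4, -0.2, 1.1, 0.5, 1.9, 0.3, 1.0]
statistic, p_value = likelihood_ratio_test(data)
print(statistic, p_value)
```

A small p-value suggests that the restriction (here, mean = 0) costs too much fit, so the more complex model is preferred.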

An example

Length and age are probabilistically distributed over humans. They are stochastically related: knowing that a person is 7 years old influences the chance of this person being 6 feet tall. You could formalize this relationship in a linear regression model of the following form: length_i = b_0 + b_1 age_i + ε_i, where b_0 is the intercept, b_1 is a parameter that age is multiplied by to get a prediction of length, ε is the error term, and i is the subject. This means that length starts at some value (there is a minimum length when someone is born) and is predicted by age to some extent. This prediction is not perfect, as error is included in the model. This error contains variance that stems from sex and other variables. When sex is included in the model, the error term will become smaller, as you will have a better idea of the chance that a particular 16-year-old is 6 feet tall when you know this 16-year-old is a girl. The model would become length_i = b_0 + b_1 age_i + b_2 sex_i + ε_i, where the variable sex is dichotomous. This model would presumably have a higher R². The first model is nested in the second model: the first model is obtained from the second when b_2 is restricted to zero.
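The two regression models can be fitted and compared with a short script. This is a sketch under loud assumptions: the ages, sex codes, and lengths (in centimetres) are invented for illustration only, the helper names solve and fit_r_squared are hypothetical, and the fit is ordinary least squares via the normal equations, which is adequate for a tiny example but not how a statistics package would necessarily do it.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_r_squared(predictors, y):
    """Least-squares fit with an intercept; returns (coefficients, R squared)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1 - rss / tss


# Invented data for illustration only (length in cm, sex coded 0/1).
age = [6, 8, 10, 12, 14, 16, 7, 9, 11, 13, 15, 17]
sex = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
length = [116, 128, 139, 151, 160, 163, 122, 133, 144, 157, 170, 177]

# Nested model: length_i = b_0 + b_1 age_i + error
coef_small, r2_small = fit_r_squared([[a] for a in age], length)
# Full model: length_i = b_0 + b_1 age_i + b_2 sex_i + error
coef_full, r2_full = fit_r_squared([[a, s] for a, s in zip(age, sex)], length)

print(r2_small, r2_full)
```

Because the first model is the second with b_2 fixed at zero, the least-squares R² of the second model can never be lower than that of the first on the same data; whether the increase is meaningful is a separate question, for example one for a likelihood-ratio test.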


Based on the number of endogenous variables and the number of equations, models can be classified as complete models (the number of equations equals the number of endogenous variables) and incomplete models. Some other statistical models are the general linear model (restricted to continuous dependent variables), the generalized linear model (for example, logistic regression), the multilevel model, and the structural equation model.[2]

References


  1. Field, A. (2005). Discovering statistics using SPSS. Sage, London.
  2. Adèr, H.J. (2008). Chapter 12: Modelling. In H.J. Adèr & G.J. Mellenbergh (Eds.) (with contributions by D.J. Hand), Advising on Research Methods: A consultant's companion (pp. 271-304). Huizen, The Netherlands: Johannes van Kessel Publishing.

Wikimedia Foundation. 2010.
