 Statistical model

A statistical model is a formalization of relationships between variables in the form of mathematical equations. It describes how one or more random variables are related to one or more other variables. The model is statistical because the variables are related stochastically rather than deterministically. In mathematical terms, a statistical model is frequently thought of as a pair (Y, P), where Y is the set of possible observations and P is a set of possible probability distributions on Y. It is assumed that there is a distinct element of P which generates the observed data. Statistical inference enables us to make statements about which element(s) of this set are likely to be the true one.
Most statistical tests can be described in the form of a statistical model. For example, Student's t-test for comparing the means of two groups can be formulated as testing whether an estimated parameter in the model is different from 0. Another similarity between tests and models is that both rest on assumptions. In many models, for instance, the error is assumed to be normally distributed.^{[1]}
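The link between the t-test and a model parameter can be checked numerically. The following is a minimal sketch, assuming equal group variances (the pooled form of the test) and a regression of the outcome on a 0/1 group indicator; the data are simulated and the function names `pooled_t` and `slope_t` are illustrative, not taken from the source.

```python
import math
import random
import statistics

def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def slope_t(x, y):
    """t statistic for b1 in y = b0 + b1*x + error, fitted by least squares."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b1 / se_b1

# Simulated groups (all numbers here are made up for illustration).
random.seed(0)
group_a = [random.gauss(10.0, 2.0) for _ in range(20)]
group_b = [random.gauss(11.5, 2.0) for _ in range(25)]

# Group membership coded as a 0/1 indicator variable.
x = [0.0] * len(group_a) + [1.0] * len(group_b)
y = group_a + group_b

t_test = pooled_t(group_a, group_b)
t_regression = slope_t(x, y)
print(f"t-test: {t_test:.6f}  regression slope: {t_regression:.6f}")
```

With a 0/1 indicator, the fitted slope equals the difference in group means and its standard error equals the pooled standard error, so the two statistics coincide up to rounding.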
Formal definition
A statistical model, $\mathcal{P}$, is a collection of probability distribution functions or probability density functions (collectively referred to as distributions for brevity). A parametric model is a collection of distributions, each of which is indexed by a unique finite-dimensional parameter: $\mathcal{P} = \{P_{\theta} : \theta \in \Theta\}$, where $\theta$ is a parameter and $\Theta \subseteq \mathbb{R}^{d}$ is the feasible region of parameters, which is a subset of d-dimensional Euclidean space. A statistical model may be used to describe the set of distributions from which one assumes that a particular data set is sampled. For example, if one assumes that data arise from a univariate Gaussian distribution, then one has assumed a Gaussian model: $\mathcal{P} = \left\{ P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right\} : \mu \in \mathbb{R},\ \sigma > 0 \right\}$.
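As an illustration of a parametric family, the Gaussian model can be sketched as a density function indexed by the pair (mu, sigma), with inference picking out one member of the family. The sketch below uses maximum likelihood for that step, which is one common choice and not something the text prescribes; the simulated data and the names `gaussian_log_pdf`, `log_likelihood`, and `gaussian_mle` are assumptions made for the example.

```python
import math
import random

def gaussian_log_pdf(x, mu, sigma):
    """Log density of the family member indexed by theta = (mu, sigma), sigma > 0."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(data, mu, sigma):
    """Log-likelihood of independent observations under one member of the family."""
    return sum(gaussian_log_pdf(x, mu, sigma) for x in data)

def gaussian_mle(data):
    """Closed-form maximum likelihood estimates of (mu, sigma)."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat

# Simulated data from the member with mu = 5, sigma = 2 (values chosen arbitrarily).
random.seed(1)
data = [random.gauss(5.0, 2.0) for _ in range(500)]

mu_hat, sigma_hat = gaussian_mle(data)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"log-likelihood at the estimate: {log_likelihood(data, mu_hat, sigma_hat):.2f}")
print(f"log-likelihood at (5, 2):       {log_likelihood(data, 5.0, 2.0):.2f}")
```

The feasible region here is the half-plane of pairs with sigma > 0, a subset of two-dimensional Euclidean space, so d = 2.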
A non-parametric model is a set of probability distributions with infinite-dimensional parameters, and might be written as $\mathcal{P} = \{\text{all distributions}\}$. A semi-parametric model also has infinite-dimensional parameters, but is not dense in the space of distributions. For example, a mixture of Gaussians with one Gaussian at each data point is dense in the space of distributions, so it is non-parametric rather than semi-parametric. Formally, if d is the dimension of the parameter and n is the number of samples, then the model is semi-parametric if $d \to \infty$ as $n \to \infty$ and $d/n \to 0$ as $n \to \infty$.
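The mixture with one Gaussian at each data point can be sketched as follows. This is a minimal illustration, assuming equal weights and a common standard deviation `h` for every component (a Gaussian kernel density estimate); the value of `h` and the simulated data are arbitrary choices, not taken from the source.

```python
import math
import random

def mixture_density(x, data, h):
    """Equal-weight mixture of Gaussians, one centred on each observation, common sd h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

# Simulated observations; the number of mixture components equals the sample size.
random.seed(2)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
h = 0.4  # hypothetical bandwidth, picked by hand for this example

# Riemann-sum check that the mixture is a proper density on a wide grid.
step = 0.01
grid = [-8.0 + step * k for k in range(1601)]
total_mass = sum(mixture_density(g, data, h) for g in grid) * step
print(f"components: {len(data)}, approximate total mass: {total_mass:.4f}")
```

Because every new observation adds a component, the number of location parameters grows one-for-one with n, which is the sense in which the parameter is not of fixed finite dimension.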
Model comparison
Models can be compared to each other, in either an exploratory or a confirmatory data analysis. In an exploratory analysis, you formulate all models you can think of and see which describes your data best. In a confirmatory analysis, you test which of the models you specified before the data were collected fits the data best, or test whether your only model fits the data. In linear regression analysis you can compare the amount of variance explained by the independent variables, R^{2}, across the different models. In general, you can compare nested models by using a likelihood-ratio test. Nested models are models that can be obtained by restricting a parameter in a more complex model to be zero.
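The likelihood-ratio comparison of nested models can be written out as below. The symbols are assumptions introduced for this illustration: $L$ is the likelihood, $\Theta$ the parameter space of the more complex model, $\Theta_{0} \subset \Theta$ the restricted parameter space, $\ell = \log L$, and $k$ the number of parameters fixed by the restriction. The chi-squared approximation is the standard large-sample result (Wilks' theorem) and holds only under regularity conditions.

```latex
\Lambda
  = -2 \log \frac{\sup_{\theta \in \Theta_{0}} L(\theta)}{\sup_{\theta \in \Theta} L(\theta)}
  = 2 \left[ \ell(\hat{\theta}) - \ell(\hat{\theta}_{0}) \right],
\qquad
\Lambda \;\xrightarrow{\,d\,}\; \chi^{2}_{k}
\quad \text{when the restricted model is true.}
```

A large value of $\Lambda$ relative to the $\chi^{2}_{k}$ distribution suggests that the restriction costs too much fit, which favours the more complex model.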
An example
Length and age are probabilistically distributed over humans, and they are stochastically related: when you know that a person is 7 years old, this influences the chance of this person being 6 feet tall. You could formalize this relationship in a linear regression model of the following form: length_{i} = b_{0} + b_{1}age_{i} + ε_{i}, where b_{0} is the intercept, b_{1} is the parameter that age is multiplied by to get a prediction of length, ε_{i} is the error term, and i indexes the subject. This means that length starts at some value (there is a minimum length when someone is born) and is predicted by age to some extent. The prediction is not perfect, because error is included in the model. This error contains variance that stems from sex and other variables. When sex is included in the model, the error term will become smaller, as you will have a better idea of the chance that a particular 16-year-old is 6 feet tall when you know this 16-year-old is a girl. The model would become length_{i} = b_{0} + b_{1}age_{i} + b_{2}sex_{i} + ε_{i}, where the variable sex is dichotomous. This model would presumably have a higher R^{2}. The first model is nested in the second model: the first model is obtained from the second when b_{2} is restricted to zero.
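The two nested regression models can be fitted and compared on R^{2} with a short script. This is a sketch under stated assumptions: the data are simulated, the coefficients used to generate them (80, 5.5, 6.0, and the error standard deviation 6) are invented, sex is coded 0/1 arbitrarily, and the helpers `solve`, `fit_ols`, and `r_squared` are illustrative names for an ordinary least squares fit via the normal equations.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def r_squared(rows, y, coef):
    """Proportion of variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Simulated subjects: age in years, sex as 0/1, length in centimetres (all invented).
random.seed(3)
n = 200
age = [random.uniform(2, 18) for _ in range(n)]
sex = [random.randint(0, 1) for _ in range(n)]
length = [80 + 5.5 * a + 6.0 * s + random.gauss(0, 6) for a, s in zip(age, sex)]

rows_small = [[1.0, a] for a in age]                        # length ~ b0 + b1*age
rows_full = [[1.0, a, float(s)] for a, s in zip(age, sex)]  # length ~ b0 + b1*age + b2*sex

b_small = fit_ols(rows_small, length)
b_full = fit_ols(rows_full, length)
r2_small = r_squared(rows_small, length, b_small)
r2_full = r_squared(rows_full, length, b_full)
print(f"R^2 without sex: {r2_small:.4f}")
print(f"R^2 with sex:    {r2_full:.4f}")
```

For least squares fits with an intercept, the larger model can never have a lower R^{2} than a model nested in it, so a higher R^{2} alone does not show that the extra variable matters; a formal comparison of the nested models is needed for that.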
Classification
According to the number of endogenous variables and the number of equations, models can be classified as complete models (the number of equations equals the number of endogenous variables) and incomplete models. Some other statistical models are the general linear model (restricted to continuous dependent variables), the generalized linear model (for example, logistic regression), the multilevel model, and the structural equation model.^{[2]}
References
[1] Field, A. (2005). Discovering statistics using SPSS. London: Sage.
[2] Adèr, H.J. (2008). Chapter 12: Modelling. In H.J. Adèr & G.J. Mellenbergh (Eds.) (with contributions by D.J. Hand), Advising on Research Methods: A consultant's companion (pp. 271-304). Huizen, The Netherlands: Johannes van Kessel Publishing.
Wikimedia Foundation. 2010.