# Bayesian model comparison

A common problem in statistical inference is to use data to decide between two or more competing models. Frequentist statistics uses hypothesis tests for this purpose. There are several Bayesian approaches, one of which uses Bayes factors.

The posterior probability of a model given data, $Pr\left(H|D\right)$, is given by Bayes' theorem:

:$Pr\left(H|D\right) = \frac{Pr\left(D|H\right)Pr\left(H\right)}{Pr\left(D\right)}$

The key data-dependent term $Pr\left(D|H\right)$ is a likelihood, and is sometimes called the evidence for model $H$; evaluating it correctly is the key to Bayesian model comparison. The evidence is usually the normalizing constant or partition function of another inference, namely the inference of the parameters of model $H$ given the data $D$.
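To make the "normalizing constant" remark concrete, the parameter-level inference can be written out as follows. This is a sketch in the notation of this article; the symbol $\theta$ for the parameter vector of model $H$ is an assumption borrowed from the Bayes factor formula.

:$Pr\left(\theta|D,H\right) = \frac{Pr\left(D|\theta,H\right)Pr\left(\theta|H\right)}{Pr\left(D|H\right)}, \qquad Pr\left(D|H\right) = \int Pr\left(D|\theta,H\right)Pr\left(\theta|H\right)\,d\theta$

The denominator of the first expression is the evidence: it is whatever constant makes the posterior over $\theta$ integrate to one.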

The plausibility of two different models $H_1$ and $H_2$, parametrised by model parameter vectors $\theta_1$ and $\theta_2$, is assessed by the Bayes factor, given by

:$\frac{Pr\left(D|H_2\right)}{Pr\left(D|H_1\right)} = \frac{\int Pr\left(\theta_2|H_2\right)Pr\left(D|\theta_2,H_2\right)\,d\theta_2}{\int Pr\left(\theta_1|H_1\right)Pr\left(D|\theta_1,H_1\right)\,d\theta_1}$
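As an illustration of the Bayes factor, the sketch below compares two models of a coin after observing k heads in n tosses. The setup is an assumption made for this example and is not taken from the article: $H_1$ fixes the heads probability at 0.5 and so has no free parameter, while $H_2$ gives it a Beta(a, b) prior (uniform by default). Both marginal likelihoods then have closed forms, so no numerical integration is needed. All function names are hypothetical.

```python
from math import comb, exp, lgamma, log


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence_fixed(k: int, n: int, theta: float = 0.5) -> float:
    """log Pr(D|H_1): theta is fixed, so there is nothing to integrate over."""
    return log(comb(n, k)) + k * log(theta) + (n - k) * log(1.0 - theta)


def log_evidence_beta(k: int, n: int, a: float = 1.0, b: float = 1.0) -> float:
    """log Pr(D|H_2): binomial likelihood integrated over a Beta(a, b) prior."""
    return log(comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b)


def bayes_factor(k: int, n: int) -> float:
    """Pr(D|H_2) / Pr(D|H_1), matching the ratio in the formula above."""
    return exp(log_evidence_beta(k, n) - log_evidence_fixed(k, n))


if __name__ == "__main__":
    for heads in (5, 9):
        print(heads, "heads in 10 tosses, Bayes factor:", bayes_factor(heads, 10))
```

With a uniform prior the evidence for $H_2$ reduces to 1/(n+1) for every k, which gives a quick sanity check on the implementation. A factor above one favours $H_2$, a factor below one favours $H_1$.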

Thus Bayesian model comparison does not depend on any particular value of the parameters of each model. Instead, it considers the probability of the data under each model, averaged over all possible parameter values weighted by their prior. Alternatively, the maximum likelihood estimate could be used for each of the parameters, which yields a likelihood-ratio statistic rather than a Bayes factor.

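An advantage of Bayes factors is that they automatically, and quite naturally, include a penalty for including too much model structure: a model with more free parameters spreads its prior probability over a wider range of possible data sets, so it assigns less probability to any one of them. Bayes factors thus guard against overfitting.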
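The penalty can be seen numerically by comparing the Bayes factor with a ratio of maximised likelihoods. The following is a minimal sketch under assumed models that are not part of the article: a coin with heads probability fixed at 0.5 ($H_1$) against a coin with a uniform prior on the heads probability ($H_2$). The function names are hypothetical.

```python
from math import comb


def binom_lik(k: int, n: int, theta: float) -> float:
    """Binomial likelihood of k heads in n tosses for heads probability theta."""
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def max_likelihood_ratio(k: int, n: int) -> float:
    """Likelihood of H_2 at its best-fit theta = k/n, over the likelihood of H_1."""
    return binom_lik(k, n, k / n) / binom_lik(k, n, 0.5)


def bayes_factor(k: int, n: int) -> float:
    """Pr(D|H_2) / Pr(D|H_1) for a uniform prior on theta under H_2.

    Integrating the binomial likelihood over a uniform prior gives 1/(n+1).
    """
    return (1.0 / (n + 1)) / binom_lik(k, n, 0.5)


if __name__ == "__main__":
    n = 20
    for k in (10, 12, 18):
        print(k, max_likelihood_ratio(k, n), bayes_factor(k, n))
```

The maximised likelihood ratio is never below one, because the flexible model can always fit at least as well as the fixed one. The Bayes factor, by contrast, drops below one when the data look like a fair coin, and rises above one only when the data are far enough from 0.5 to justify the extra parameter.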

Other approaches are:
* to treat model comparison as a decision problem, computing the expected value or cost of each model choice;
* to use Minimum Message Length (MML).
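The decision-theoretic approach in the first bullet can be sketched as follows: convert the evidences into posterior model probabilities with Bayes' theorem, then pick the model with the smallest expected cost. The evidences, prior probabilities and loss matrix below are illustrative assumptions, not values from the article, and the function names are hypothetical.

```python
def posterior_model_probs(evidences, priors):
    """Posterior Pr(H_i|D), proportional to Pr(D|H_i) * Pr(H_i)."""
    joint = [e * p for e, p in zip(evidences, priors)]
    total = sum(joint)  # this is Pr(D), summed over the candidate models
    return [j / total for j in joint]


def expected_losses(posteriors, loss):
    """Expected cost of each choice; loss[a][m] is the cost of choosing
    model a when model m is the true one."""
    return [sum(c * p for c, p in zip(row, posteriors)) for row in loss]


# Assumed inputs: evidences for H_1 and H_2, equal prior probabilities,
# and an asymmetric loss in which wrongly choosing H_1 is five times as
# costly as wrongly choosing H_2.
evidences = [0.2461, 0.0909]
priors = [0.5, 0.5]
loss = [[0.0, 5.0],
        [1.0, 0.0]]

posteriors = posterior_model_probs(evidences, priors)
costs = expected_losses(posteriors, loss)
best = min(range(len(costs)), key=costs.__getitem__)

if __name__ == "__main__":
    print("posterior probabilities:", posteriors)
    print("expected losses:", costs)
    print("chosen model index:", best)
```

With these assumed numbers the first model is the more probable one, yet the second is chosen because a wrong choice of the first is so much more costly. This is the sense in which a decision-based comparison can differ from simply picking the model with the highest posterior probability.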

See also

* Nested sampling algorithm
* Akaike information criterion
* Schwarz's Bayesian information criterion
* Conditional predictive ordinate
* Deviance information criterion
* Wallace's Minimum Message Length (MML)
* Model selection


Wikimedia Foundation. 2010.
