Bayesian linear regression

In statistics, **Bayesian linear regression** is a Bayesian alternative to the more well-known ordinary least-squares linear regression.

Consider a standard linear regression problem, where we specify the conditional density of $y$ given $x$ predictor variables:

:$y_{i} = \beta x_{i} + \epsilon_{i},$

where the noise $\epsilon$ is i.i.d. and normally distributed:

:$\epsilon_{i} \sim N(0, \sigma^{2}).$

A common linear least-squares solution is to estimate the slope $\hat{\beta}$ using the Moore–Penrose pseudoinverse:

:$\hat{\beta} = (X^{T}X)^{-1}X^{T}y$

where $X$ is the vector of $x_{i}$ (of length $n$).
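For a single predictor with no intercept, the pseudoinverse formula reduces to a ratio of sums, which the following sketch computes directly. The data values are hypothetical, chosen so the true slope is roughly 2.

```python
# Least-squares slope for a single predictor (no intercept), i.e.
# beta_hat = (X^T X)^{-1} X^T y where X is the column vector of x_i.
# A minimal sketch with hypothetical data, not a regression library.

def ols_slope(x, y):
    """Return (x^T x)^{-1} x^T y for 1-D predictor and response lists."""
    xtx = sum(xi * xi for xi in x)               # X^T X (a scalar here)
    xty = sum(xi * yi for xi, yi in zip(x, y))   # X^T y
    return xty / xtx

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x plus noise
print(ols_slope(x, y))     # → 1.99
```

In the multivariate case the same formula applies with `xtx` as the $k \times k$ matrix $X^{T}X$ and a linear solve in place of the scalar division.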

This is the frequentist's view, and it assumes we have enough measurements of $x_{i}$ to say something meaningful about $y$. In the empirical Bayes approach, we will assume we have only a small sample of $x_{i}$ for our individual measurement, and we seek to correct our estimate by "borrowing" information from a larger set of similar observations.

Let us write our conditional likelihood as

:$\rho(y|X,\beta,\sigma^{2}) \propto (\sigma^{2})^{-n/2} \exp\left(-\frac{1}{2\sigma^{2}}(y-\beta X)^{T}(y-\beta X)\right),$

We seek a natural conjugate prior (a joint density $\rho(\beta,\sigma^{2})$ of the same functional form as the likelihood). Since the likelihood is quadratic in $\beta$, we re-write the likelihood so it is normal in $(\beta-\hat{\beta})$. Write

:$(y-\beta X)^{T}(y-\beta X) = (y-\hat{\beta} X)^{T}(y-\hat{\beta} X) + (\beta - \hat{\beta})^{T}(X^{T}X)(\beta - \hat{\beta})$
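This decomposition holds because the cross term vanishes at the least-squares solution ($X^{T}(y-\hat{\beta}X)=0$). The following sketch checks the identity numerically in the scalar case, with hypothetical data and an arbitrary test slope.

```python
# Numerical check of the sum-of-squares decomposition (scalar case):
# SS(beta) = SS(beta_hat) + (beta - beta_hat)^2 * (x^T x).
# Hypothetical data; beta = 1.5 is an arbitrary test value.

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

xtx = sum(xi * xi for xi in x)
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / xtx

def ss(beta):
    """Sum of squared residuals at slope beta."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))

beta = 1.5
lhs = ss(beta)
rhs = ss(beta_hat) + (beta - beta_hat) ** 2 * xtx
print(abs(lhs - rhs) < 1e-9)  # → True: the identity holds exactly
```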

Now re-write the likelihood as

:$\rho(y|X,\beta,\sigma^{2}) \propto (\sigma^{2})^{-v/2} \exp\left(-\frac{vs^{2}}{2\sigma^{2}}\right)(\sigma^{2})^{-(n-v)/2} \exp\left(-\frac{1}{2\sigma^{2}}(\beta - \hat{\beta})^{T}(X^{T}X)(\beta - \hat{\beta})\right),$

where

:$vs^{2} = (y-\hat{\beta} X)^{T}(y-\hat{\beta} X), \quad v = n-k$

with $k$ as the number of parameters to estimate.

This suggests a form for the priors:

:$\rho(\beta,\sigma^{2}) = \rho(\sigma^{2})\,\rho(\beta|\sigma^{2}),$

where $\rho(\sigma^{2})$ is an inverse-gamma distribution

:$\rho(\sigma^{2}) \propto (\sigma^{2})^{-(v_{0}/2+1)} \exp\left(-\frac{v_{0}s_{0}^{2}}{2\sigma^{2}}\right),$

and $\rho(\beta|\sigma^{2})$ is a normal distribution

:$\rho(\beta|\sigma^{2}) \propto (\sigma^{2})^{-k} \exp\left(-\frac{1}{2\sigma^{2}}(\beta - \bar{\beta})^{T}A(\beta - \bar{\beta})\right),$

with $v_{0}$ and $s_{0}^{2}$ as the prior values of $v$ and $s^{2}$, respectively.

With the prior now specified, we can express the posterior distribution as

:$\rho(\beta,\sigma^{2}|y,X) \propto \rho(y|X,\beta,\sigma^{2})\,\rho(\beta|\sigma^{2})\,\rho(\sigma^{2})$

::$\propto (\sigma^{2})^{-n/2} \exp\left(-\frac{1}{2\sigma^{2}}(y-\beta X)^{T}(y-\beta X)\right)$

:::$\times (\sigma^{2})^{-k} \exp\left(-\frac{1}{2\sigma^{2}}(\beta - \bar{\beta})^{T}A(\beta - \bar{\beta})\right)$

:::$\times (\sigma^{2})^{-(v_{0}/2+1)} \exp\left(-\frac{v_{0}s_{0}^{2}}{2\sigma^{2}}\right).$

With some re-arrangement, we can re-write the posterior so that the posterior mean $\tilde{\beta}$ is a weighted average of the least-squares estimator and the prior mean:

:$\tilde{\beta} = (X^{T}X+A)^{-1}(X^{T}X\hat{\beta}+A\bar{\beta})$

where $U$ comes from the Cholesky decomposition of $A$ (which is a positive-definite matrix by design):

:$A = U^{T}U.$

This is the key result of the empirical Bayes approach; it allows us to estimate the slope $\beta$ for our original linear regression problem by combining the least-squares estimate $\hat{\beta}$ from a single set of measurements with the empirical prior estimate $\bar{\beta}$ from a large collection of similar measurements. (Notice that the weighted average also depends on the empirical estimate of the prior covariance matrix $A$.)
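In the scalar case the weighted-average formula can be computed directly, as the sketch below shows. The prior mean `beta_bar` and prior precision `A` are hypothetical numbers standing in for estimates pooled from similar datasets.

```python
# Posterior mean as a precision-weighted average (scalar case):
# beta_tilde = (x^T x + A)^{-1} (x^T x * beta_hat + A * beta_bar).
# beta_bar and A below are hypothetical empirical-prior values.

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

xtx = sum(xi * xi for xi in x)                         # X^T X
beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / xtx  # least-squares estimate

beta_bar = 1.0   # prior mean, e.g. pooled from similar measurement sets
A = 10.0         # prior precision scale (positive, so X^T X + A > 0)

beta_tilde = (xtx * beta_hat + A * beta_bar) / (xtx + A)
print(beta_hat, beta_tilde)  # beta_tilde is shrunk from beta_hat toward beta_bar
```

Increasing `A` (a more confident prior) pulls `beta_tilde` further toward `beta_bar`; with `A = 0` the least-squares estimate is recovered.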

To justify this, collect the quadratic terms in the exponential and express them as a quadratic form in $\beta - \tilde{\beta}$:

:$(y-\beta X)^{T}(y-\beta X) + (\beta - \bar{\beta})^{T}A(\beta - \bar{\beta}) = (v-W\beta)^{T}(v-W\beta)$

::$= ns^{2} + (\beta - \tilde{\beta})^{T}W^{T}W(\beta - \tilde{\beta})$

where

::$ns^{2} = (v - W\tilde{\beta})^{T}(v - W\tilde{\beta}), \quad v = [y,\ U\bar{\beta}], \quad W = [X,\ U]$

The posterior can now be expressed as a normal distribution $N(\tilde{\beta}, \sigma^{2}(X^{T}X+A)^{-1})$ times an inverse-gamma distribution:

:$\rho(\beta,\sigma^{2}|y,X) \propto (\sigma^{2})^{-k/2} \exp\left(-\frac{1}{2\sigma^{2}}(\beta - \tilde{\beta})^{T}(X^{T}X+A)(\beta - \tilde{\beta})\right) \times (\sigma^{2})^{-((n+v_{0})/2+1)} \exp\left(-\frac{v_{0}s_{0}^{2}+ns^{2}}{2\sigma^{2}}\right)$
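This factored form makes sampling from the posterior straightforward: draw $\sigma^{2}$ from the inverse-gamma factor, then draw $\beta$ from the conditional normal. The sketch below does this for the scalar case; all hyperparameter values ($v_{0}$, $s_{0}^{2}$, $s^{2}$, and the toy data summaries) are hypothetical.

```python
# Sampling from the Normal-inverse-gamma posterior (scalar case):
# sigma^2 ~ InvGamma((n+v_0)/2, (v_0*s_0^2 + n*s^2)/2), then
# beta | sigma^2 ~ N(beta_tilde, sigma^2 / (x^T x + A)).
# All hyperparameter values below are hypothetical.
import math
import random

random.seed(0)

xtx_plus_A = 40.0                    # x^T x + A from the toy data above
beta_tilde = 1.7425                  # posterior mean
shape = (4 + 2.0) / 2                # (n + v_0)/2 with n = 4, v_0 = 2
scale = (2.0 * 1.0 + 4 * 0.05) / 2   # (v_0*s_0^2 + n*s^2)/2

def posterior_draw():
    # If G ~ Gamma(shape), then scale / G ~ InvGamma(shape, scale).
    sigma2 = scale / random.gammavariate(shape, 1.0)
    beta = random.gauss(beta_tilde, math.sqrt(sigma2 / xtx_plus_A))
    return beta, sigma2

draws = [posterior_draw() for _ in range(10000)]
mean_beta = sum(b for b, _ in draws) / len(draws)
print(mean_beta)  # close to beta_tilde
```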

A similar analysis can be performed for the general case of multivariate regression; see Bayesian estimation of covariance matrices.

**Example:** Suppose the weights of a large population of 35-year-old men are normally distributed with expected value μ and standard deviation σ. A crude measuring instrument measures a man's weight with a measurement error that is normally distributed with expected value 0 and standard deviation τ. The man's true weight is not observable; only his weight measured with error is observed. The conditional probability distribution of a randomly chosen man's true weight, given his weight-measured-with-error, can be found by using Bayes' theorem, and then the conditional expected value can be used as an estimate of his true weight, **provided** that the values of μ, σ, and τ are known. But they are not. One may use the data to estimate the standard deviation of the measurement errors by measuring each man multiple times. One may similarly estimate the population average weight and the population standard deviation of weights by weighing multiple men. These estimates of parameters based on the data are the occasion for the use of the word "empirical". Finally, one may then estimate the aforementioned conditional expected true weight by using Bayes' theorem.
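For this normal-normal model, Bayes' theorem gives the conditional expected true weight as a precision-weighted average of the population mean and the single measurement. The sketch below computes it with hypothetical values for μ, σ, τ, and the measurement.

```python
# The weight example as a normal-normal model: true weight ~ N(mu, sigma^2),
# measurement = true weight + N(0, tau^2). The conditional expected true
# weight given one measurement is a precision-weighted average of the
# population mean and the measurement. All numbers are hypothetical.

mu, sigma, tau = 80.0, 10.0, 5.0   # population mean/sd, measurement-error sd
measured = 95.0                    # one man's measured weight

# Precision weights: 1/sigma^2 on the prior mean, 1/tau^2 on the datum.
w_data = (1 / tau**2) / (1 / sigma**2 + 1 / tau**2)
expected_true = w_data * measured + (1 - w_data) * mu
print(expected_true)  # → 92.0, shrunk from 95.0 toward the population mean 80.0
```

In the empirical Bayes version, μ, σ, and τ in this formula are replaced by their estimates from repeated measurements of many men.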

**See also**
* Bayesian multivariate linear regression

**References**
* Bradley P. Carlin and Thomas A. Louis, "Bayes and Empirical Bayes Methods for Data Analysis", Chapman & Hall/CRC, Second edition, 2000
* Peter E. Rossi, Greg M. Allenby, and Robert McCulloch, "Bayesian Statistics and Marketing", John Wiley & Sons, Ltd, 2006
* Thomas P. Minka, "Bayesian Linear Regression", http://research.microsoft.com/~minka/papers/linear.html, 2001

This allows accounting for… … Wikipedia**Regression toward the mean**— In statistics, regression toward the mean (also known as regression to the mean) is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on a second measurement, and a fact that may… … Wikipedia**Regression discontinuity design**— In statistics, econometrics, epidemiology and related disciplines, a regression discontinuity design (RDD) is a design that elicits the causal effects of interventions by exploiting a given exogenous threshold determining assignment to treatment … Wikipedia**Linear least squares (mathematics)**— This article is about the mathematics that underlie curve fitting using linear least squares. For statistical regression analysis using least squares, see linear regression. For linear regression on a single variable, see simple linear regression … Wikipedia