Bayesian multivariate linear regression

Consider a collection of m linear regression problems for n observations, related through a common n \times k matrix of predictor variables X and jointly normal errors \{\epsilon_c\}:

:y_1 = X\beta_1 + \epsilon_1\,,

:\quad\vdots

:y_c = X\beta_c + \epsilon_c\,,

:\quad\vdots

:y_m = X\beta_m + \epsilon_m\,,

where the subscript c = 1, \ldots, m indexes the individual regression problems; each y_c and \epsilon_c is a column vector of n observations, and each \beta_c is a column vector of k regression coefficients.

The noise terms are jointly normal within each row of observations. That is, each row r of Y represents an m-length vector y_r of correlated observations, one on each of the dependent variables:

:y_r = B^{T}x_r + \epsilon_r\,,

where x_r is the r-th row of X (written as a column vector), and the noise vectors \epsilon_r are independent and identically normally distributed across all rows r:

:\epsilon_r \sim N(0, \Sigma_\epsilon)\,,

where B is a k \times m matrix in which each column collects the regression coefficients of one of the m problems:

:B = [\beta_1, \cdots, \beta_c, \cdots, \beta_m]\,.

We can write the entire regression problem in matrix form as:

:Y = XB + E\,,

where Y and E are n \times m matrices and X is the common n \times k design matrix.
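To make the dimensions concrete, the following sketch simulates data from this model. It is illustrative only: the values of n, k, m, B, and \Sigma_\epsilon are arbitrary choices, not quantities from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, m = 50, 3, 2                     # observations, predictors, responses
X = rng.normal(size=(n, k))            # common n x k design matrix
B = rng.normal(size=(k, m))            # "true" k x m coefficient matrix
Sigma_eps = np.array([[1.0, 0.5],      # m x m error covariance: the noise
                      [0.5, 2.0]])     # terms within a row are correlated

# each row epsilon_r ~ N(0, Sigma_eps), independently across the n rows
E = rng.multivariate_normal(np.zeros(m), Sigma_eps, size=n)   # n x m
Y = X @ B + E                          # n x m, matching Y = XB + E
```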

The classical, frequentist linear least squares solution is simply to estimate the matrix of regression coefficients \hat{B} using the Moore-Penrose pseudoinverse:

:\hat{B} = (X^{T}X)^{-1}X^{T}Y\,.
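Continuing the sketch above, this estimate can be computed for all m regressions at once; a linear solve is used instead of forming the inverse explicitly, which is the numerically safer equivalent.

```python
# classical estimate  B_hat = (X^T X)^{-1} X^T Y  for all m columns at once
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)          # k x m
```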

To obtain the Bayesian solution, we need to specify the conditional likelihood and then find the appropriate conjugate prior. As in the univariate case of Bayesian linear regression, we will find that we can specify a natural conditional conjugate prior (which is scale dependent).

Let us write our conditional likelihood as

:\rho(E|\Sigma_\epsilon) \propto |\Sigma_\epsilon|^{-n/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(E^{T}E\,\Sigma_\epsilon^{-1}\right)\right)\,,

writing the error E in terms of Y, X, and B yields

:\rho(Y|X,B,\Sigma_\epsilon) \propto |\Sigma_\epsilon|^{-n/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left((Y-XB)^{T}(Y-XB)\,\Sigma_\epsilon^{-1}\right)\right)\,,
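For illustration, this conditional log-likelihood (up to an additive constant) can be evaluated directly; the function below is a minimal sketch reusing the simulated variables above.

```python
def log_likelihood(Y, X, B, Sigma_eps):
    """log rho(Y | X, B, Sigma_eps), up to an additive constant."""
    n = Y.shape[0]
    R = Y - X @ B                                  # n x m residual matrix
    _, logdet = np.linalg.slogdet(Sigma_eps)       # log |Sigma_eps|
    quad = np.trace(R.T @ R @ np.linalg.inv(Sigma_eps))
    return -0.5 * n * logdet - 0.5 * quad
```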

We seek a natural conjugate prior, that is, a joint density \rho(B,\Sigma_\epsilon) of the same functional form as the likelihood. Since the likelihood is quadratic in B, we rewrite the likelihood so that it is normal in (B-\hat{B}), the deviation from the classical sample estimate.

Using the same technique as with Bayesian linear regression, we decompose the exponential term using a matrix form of the sum-of-squares technique. Here, however, we will also need some matrix differential calculus (the Kronecker product and the vectorization transformation).

First, let us apply the sum-of-squares decomposition to obtain a new expression for the likelihood:

:\rho(Y|X,B,\Sigma_\epsilon) \propto |\Sigma_\epsilon|^{-(n-k)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left(S\,\Sigma_\epsilon^{-1}\right)\right)\, |\Sigma_\epsilon|^{-k/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}\left((B-\hat{B})^{T}X^{T}X(B-\hat{B})\,\Sigma_\epsilon^{-1}\right)\right)\,,

:S = (Y - X\hat{B})^{T}(Y - X\hat{B})\,.
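The decomposition holds because the cross terms vanish: X^{T}(Y - X\hat{B}) = 0 by construction of \hat{B}. A quick numerical sanity check on the simulated data:

```python
Sigma_inv = np.linalg.inv(Sigma_eps)
R = Y - X @ B_hat                       # residuals at the classical fit
S = R.T @ R                             # m x m error sum-of-squares matrix

lhs = np.trace((Y - X @ B).T @ (Y - X @ B) @ Sigma_inv)
rhs = (np.trace(S @ Sigma_inv)
       + np.trace((B - B_hat).T @ X.T @ X @ (B - B_hat) @ Sigma_inv))
assert np.isclose(lhs, rhs)             # cross terms vanish: X^T R = 0
```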

We would like to develop a conditional form for the priors:

:\rho(B,\Sigma_\epsilon) = \rho(\Sigma_\epsilon)\,\rho(B|\Sigma_\epsilon)\,,

where \rho(\Sigma_\epsilon) is an inverse-Wishart distribution and \rho(B|\Sigma_\epsilon) is some form of normal distribution in the matrix B. This is accomplished using the vectorization transformation, which converts the likelihood from a function of the matrices B, \hat{B} to a function of the vectors \beta = \operatorname{vec}(B), \hat{\beta} = \operatorname{vec}(\hat{B}).

Using the identity \operatorname{tr}(A^{T}M) = \operatorname{vec}(A)^{T}\operatorname{vec}(M), write

:\operatorname{tr}\left((B - \hat{B})^{T}X^{T}X(B - \hat{B})\,\Sigma_\epsilon^{-1}\right) = \operatorname{vec}(B - \hat{B})^{T}\operatorname{vec}\left(X^{T}X(B - \hat{B})\,\Sigma_\epsilon^{-1}\right)

Using the vectorization identity \operatorname{vec}(AMC) = (C^{T} \otimes A)\operatorname{vec}(M), with \Sigma_\epsilon^{-1} symmetric, we have

:\operatorname{vec}\left(X^{T}X(B - \hat{B})\,\Sigma_\epsilon^{-1}\right) = \left(\Sigma_\epsilon^{-1} \otimes X^{T}X\right)\operatorname{vec}(B - \hat{B})\,.

Then

:\operatorname{tr}\left((B - \hat{B})^{T}X^{T}X(B - \hat{B})\,\Sigma_\epsilon^{-1}\right) = \operatorname{vec}(B - \hat{B})^{T}\left(\Sigma_\epsilon^{-1} \otimes X^{T}X\right)\operatorname{vec}(B - \hat{B})

::: = (\beta-\hat{\beta})^{T}\left(\Sigma_\epsilon^{-1} \otimes X^{T}X\right)(\beta-\hat{\beta})\,,

which leads to a likelihood that is normal in (\beta - \hat{\beta}).
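Both vectorization steps are easy to confirm numerically with np.kron, reusing Sigma_inv, B, B_hat, and X from the earlier snippets; note that \operatorname{vec} stacks columns, which corresponds to flatten(order='F') in NumPy.

```python
d = B.flatten(order='F') - B_hat.flatten(order='F')   # beta - beta_hat

trace_form = np.trace((B - B_hat).T @ X.T @ X @ (B - B_hat) @ Sigma_inv)
kron_form = d @ np.kron(Sigma_inv, X.T @ X) @ d
assert np.isclose(trace_form, kron_form)    # vectorization identity holds
```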

With the likelihood in a more tractable form, we can now find a natural (conditional) conjugate prior.

As anticipated above, it takes the matrix-normal inverse-Wishart form: an inverse-Wishart distribution on the error covariance and, conditional on it, a Kronecker-structured normal distribution on the coefficients,

:\rho(\Sigma_\epsilon) = \mathcal{W}^{-1}(V_0, \nu_0)\,,

:\rho(\beta|\Sigma_\epsilon) = N\left(\beta_0,\; \Sigma_\epsilon \otimes \Lambda_0^{-1}\right)\,,

where V_0, \nu_0, \beta_0 = \operatorname{vec}(B_0), and the k \times k precision matrix \Lambda_0 are prior hyperparameters.

Example:
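As a worked example, here is a minimal sketch of exact posterior sampling under the matrix-normal inverse-Wishart prior above, reusing X and Y from the simulation. The conjugate update formulas are the standard ones for this model; the hyperparameter values and the helper name posterior_draws are illustrative assumptions, and scipy.stats.invwishart supplies the inverse-Wishart draws.

```python
from scipy.stats import invwishart

def posterior_draws(Y, X, B0, Lambda0, V0, nu0, n_draws=1000, seed=1):
    """Exact draws from the conjugate posterior of (B, Sigma_eps).

    Prior: Sigma_eps ~ InvWishart(nu0, V0),
           vec(B) | Sigma_eps ~ N(vec(B0), Sigma_eps kron Lambda0^{-1}).
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    m = Y.shape[1]

    # standard conjugate posterior hyperparameter updates
    Lambda_n = X.T @ X + Lambda0
    B_n = np.linalg.solve(Lambda_n, X.T @ Y + Lambda0 @ B0)
    nu_n = nu0 + n
    V_n = (V0 + (Y - X @ B_n).T @ (Y - X @ B_n)
           + (B_n - B0).T @ Lambda0 @ (B_n - B0))

    Lambda_n_inv = np.linalg.inv(Lambda_n)
    B_samples, Sigma_samples = [], []
    for _ in range(n_draws):
        # Sigma | Y ~ InvWishart(nu_n, V_n)
        Sigma = invwishart.rvs(df=nu_n, scale=V_n, random_state=rng)
        # vec(B) | Sigma, Y ~ N(vec(B_n), Sigma kron Lambda_n^{-1})
        cov = np.kron(Sigma, Lambda_n_inv)
        beta = rng.multivariate_normal(B_n.flatten(order='F'), cov)
        B_samples.append(beta.reshape(k, m, order='F'))
        Sigma_samples.append(Sigma)
    return np.array(B_samples), np.array(Sigma_samples)

# weakly informative, illustrative hyperparameters
B0 = np.zeros((k, m))
Lambda0 = 1e-2 * np.eye(k)
V0 = np.eye(m)
nu0 = m + 2

B_draws, Sigma_draws = posterior_draws(Y, X, B0, Lambda0, V0, nu0)
print("posterior mean of B:\n", B_draws.mean(axis=0))
```

With these diffuse hyperparameters, the posterior mean of B should land close to the least squares estimate \hat{B} computed earlier.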

