Bayes estimator


In decision theory and estimation theory, a Bayes estimator is an estimator or decision rule that maximizes the posterior expected value of a utility function or minimizes the posterior expected value of a loss function (also called posterior expected loss).

Definition

Suppose an unknown parameter $\theta$ is known to have a prior distribution $\pi$. Let $\delta$ be an estimator of $\theta$ (based on some measurements $x$), and let $R(\theta,\delta)$ be a risk function, such as the mean squared error. The Bayes risk of $\delta$ is defined as $E_\pi[R(\theta,\delta)]$, where the expectation is taken over the probability distribution of $\theta$. An estimator $\delta$ is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators. The estimator which minimizes the posterior expected loss for each $x$ also minimizes the Bayes risk and therefore is a Bayes estimator.
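To make the definition concrete, the following Python sketch works through a small, hypothetical discrete problem with exact rational arithmetic. In it, θ takes three values, there is one binomial observation, and the loss is squared error. The sketch computes the Bayes risk $E_\pi[R(\theta,\delta)]$ of the rule that minimizes the posterior expected loss separately for each $x$. It then checks that neither the sample proportion $x/2$ nor any rule whose values lie on a coarse grid does better. The names `THETAS`, `PRIOR`, `bayes_risk` and `posterior_mean_rule` are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Hypothetical toy problem: theta is a success probability with three possible
# values under a uniform prior, and we observe x ~ Binomial(2, theta).
THETAS = [Fraction(1, 5), Fraction(1, 2), Fraction(4, 5)]
PRIOR = {t: Fraction(1, 3) for t in THETAS}
N = 2
XS = range(N + 1)


def likelihood(x, theta):
    return comb(N, x) * theta**x * (1 - theta) ** (N - x)


def bayes_risk(rule):
    """E_pi[R(theta, delta)] under squared error; `rule` maps each x to an estimate."""
    return sum(
        PRIOR[t] * likelihood(x, t) * (rule[x] - t) ** 2
        for t in THETAS
        for x in XS
    )


def posterior_mean_rule():
    """Minimize the posterior expected squared loss separately for each x."""
    rule = {}
    for x in XS:
        joint = {t: PRIOR[t] * likelihood(x, t) for t in THETAS}
        total = sum(joint.values())
        rule[x] = sum(t * w for t, w in joint.items()) / total
    return rule


bayes = posterior_mean_rule()
proportion = {x: Fraction(x, N) for x in XS}  # sample proportion x/N

# Brute force over every rule whose values lie on the grid 0, 0.1, ..., 1.
grid = [Fraction(k, 10) for k in range(11)]
best_on_grid = min(
    bayes_risk(dict(zip(XS, values))) for values in product(grid, repeat=len(XS))
)

print(float(bayes_risk(bayes)), float(bayes_risk(proportion)), float(best_on_grid))
```

Because the Bayes risk is a sum over $x$ of marginal probability times posterior expected loss, minimizing each term separately minimizes the whole sum. That is why the pointwise rule cannot be beaten.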

If the prior is improper, then an estimator which minimizes the posterior expected loss for each $x$ is called a generalized Bayes estimator.

Examples

Minimum mean square error estimation

The most common risk function used for Bayesian estimation is the mean square error (MSE), also called squared error risk. The MSE is defined by

$$\mathrm{MSE} = E\left[ (\widehat{\theta}(x) - \theta)^2 \right],$$

where the expectation is taken over the joint distribution of $\theta$ and $x$.

Using the MSE as risk, the Bayes estimate of the unknown parameter is simply the mean of the posterior distribution,

$$\widehat{\theta}(x) = E[\theta \mid x] = \int \theta\, f(\theta \mid x)\, d\theta.$$

This is known as the minimum mean square error (MMSE) estimator. In this case the minimal posterior expected loss is the posterior variance, and the Bayes risk is the posterior variance averaged over the marginal distribution of $x$.
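When the posterior has no convenient closed form, the MMSE estimate can be approximated numerically. The sketch below multiplies prior and likelihood on an evenly spaced grid, normalizes, and takes the posterior mean. The model is an assumption chosen so the result can be checked by hand: prior $\theta \sim N(0,1)$, likelihood $x \mid \theta \sim N(\theta,1)$ and $x = 1$, which give posterior mean 0.5 and posterior variance 0.5.

```python
import math


def normal_pdf(z, mean, var):
    return math.exp(-((z - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def grid_posterior(x, lo=-10.0, hi=10.0, m=20001):
    """Posterior probabilities on an evenly spaced theta grid.
    Assumed model: prior theta ~ N(0, 1), likelihood x | theta ~ N(theta, 1)."""
    step = (hi - lo) / (m - 1)
    thetas = [lo + i * step for i in range(m)]
    weights = [normal_pdf(x, t, 1.0) * normal_pdf(t, 0.0, 1.0) for t in thetas]
    total = sum(weights)
    return thetas, [w / total for w in weights]


def posterior_expected_sq_loss(estimate, thetas, probs):
    return sum(p * (estimate - t) ** 2 for t, p in zip(thetas, probs))


thetas, probs = grid_posterior(x=1.0)
mmse = sum(t * p for t, p in zip(thetas, probs))            # posterior mean
min_loss = posterior_expected_sq_loss(mmse, thetas, probs)   # posterior variance
print(mmse, min_loss)
```

Moving the estimate away from the posterior mean by $d$ increases the posterior expected squared loss by exactly $d^2$. This is the reason the mean is optimal under MSE.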

Bayes estimators for conjugate priors

If there is no inherent reason to prefer one prior probability distribution over another, a conjugate prior is sometimes chosen for simplicity. A conjugate prior is defined as a prior distribution belonging to some parametric family, for which the resulting posterior distribution also belongs to the same family. This is an important property, since the Bayes estimator, as well as its statistical properties (variance, credible interval, etc.), can all be derived from the posterior distribution.

Conjugate priors are especially useful for sequential estimation, where the posterior of the current measurement is used as the prior in the next measurement. In sequential estimation, unless a conjugate prior is used, the posterior distribution typically becomes more complex with each added measurement, and the Bayes estimator cannot usually be calculated without resorting to numerical methods.
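As a sketch of sequential estimation with a conjugate prior, the following Python code updates a Gamma prior on a Poisson rate one observation at a time, feeding each posterior back in as the next prior. It then checks that the result matches a single batch update. The code assumes the Gamma(shape $a$, scale $b$) parameterization used in the Poisson example below and tracks the rate $1/b$ internally. The counts are made-up illustrative data.

```python
def update(shape, rate, count):
    """One Poisson observation turns a Gamma(shape, rate) prior into the next prior."""
    return shape + count, rate + 1.0


a, b = 2.0, 0.5                      # assumed prior: shape a, scale b (rate 1/b)
counts = [3, 1, 4, 1, 5, 9, 2, 6]    # made-up illustrative data

shape, rate = a, 1.0 / b
for c in counts:                     # sequential: each posterior becomes the next prior
    shape, rate = update(shape, rate, c)

batch = (a + sum(counts), 1.0 / b + len(counts))  # all data at once
print((shape, rate) == batch, shape / rate)        # posterior mean = Bayes estimate under MSE
```

Each update is constant-time and keeps the posterior in closed form. Without conjugacy, the posterior would have to be recomputed numerically after every measurement.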

Following are some examples of conjugate priors.

  • If $x|\theta$ is normal, $x|\theta \sim N(\theta,\sigma^2)$, and the prior is normal, $\theta \sim N(\mu,\tau^2)$, then the posterior is also normal and the Bayes estimator under MSE is given by $\widehat{\theta}(x) = \frac{\sigma^2}{\sigma^2+\tau^2}\mu + \frac{\tau^2}{\sigma^2+\tau^2}x$.
  • If $x_1,\ldots,x_n$ are iid Poisson random variables $x_i|\theta \sim P(\theta)$, and if the prior is Gamma distributed $\theta \sim G(a,b)$ (shape $a$, scale $b$), then the posterior is also Gamma distributed, and the Bayes estimator under MSE is given by $\widehat{\theta}(X) = \frac{n\overline{X}+a}{n+\frac{1}{b}}$.
  • If $x_1,\ldots,x_n$ are iid uniformly distributed $x_i|\theta \sim U(0,\theta)$, and if the prior is Pareto distributed $\theta \sim Pa(\theta_0,a)$, then the posterior is also Pareto distributed, and the Bayes estimator under MSE (finite when $a+n>1$) is given by $\widehat{\theta}(X) = \frac{(a+n)\max(\theta_0,x_1,\ldots,x_n)}{a+n-1}$.
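The three closed-form estimators above translate directly into code. This Python sketch assumes the Gamma prior is parameterized by shape $a$ and scale $b$, and the Pareto prior by scale $\theta_0$ and shape $a$; this is the reading consistent with the formulas. The function names are hypothetical.

```python
import statistics


def normal_normal(x, mu, tau2, sigma2):
    """x | theta ~ N(theta, sigma2), theta ~ N(mu, tau2): posterior mean."""
    return (sigma2 * mu + tau2 * x) / (sigma2 + tau2)


def poisson_gamma(xs, a, b):
    """x_i | theta ~ Poisson(theta), theta ~ Gamma(shape a, scale b): posterior mean."""
    n = len(xs)
    return (n * statistics.fmean(xs) + a) / (n + 1.0 / b)


def uniform_pareto(xs, theta0, a):
    """x_i | theta ~ U(0, theta), theta ~ Pareto(scale theta0, shape a): posterior mean."""
    n = len(xs)
    if a + n <= 1:
        raise ValueError("posterior mean is infinite unless a + n > 1")
    return (a + n) * max([theta0, *xs]) / (a + n - 1)


print(normal_normal(2.0, mu=0.0, tau2=1.0, sigma2=1.0))  # halfway between mu and x
print(poisson_gamma([3, 1, 4, 1, 5, 9, 2, 6], a=2.0, b=0.5))
print(uniform_pareto([0.2, 1.5, 0.7], theta0=1.0, a=2.0))
```

In the normal case, equal prior and noise variances put equal weight on $\mu$ and $x$. Making $\tau^2$ larger moves the estimate toward the observation.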

Alternative risk functions

Risk functions are chosen depending on how one measures the distance between the estimate and the unknown parameter. The MSE is the most common risk function in use, primarily due to its simplicity. However, alternative risk functions are also occasionally used. The following are several examples of such alternatives. We denote the posterior cumulative distribution function by $F$.

  • A "linear" loss function, with $a>0$, which yields the posterior median as the Bayes estimate: $L(\theta,\widehat{\theta}) = a|\theta-\widehat{\theta}|$, with the estimate satisfying $F(\widehat{\theta}(x)|X) = \frac{1}{2}$.
  • Another "linear" loss function, which assigns different "weights" $a,b>0$ to underestimation and overestimation. It yields a quantile of the posterior distribution, and is a generalization of the previous loss function: $L(\theta,\widehat{\theta}) = \begin{cases} a|\theta-\widehat{\theta}| & \text{for } \theta-\widehat{\theta} \ge 0 \\ b|\theta-\widehat{\theta}| & \text{for } \theta-\widehat{\theta} < 0 \end{cases}$, with the estimate satisfying $F(\widehat{\theta}(x)|X) = \frac{a}{a+b}$.
  • The following loss function is trickier: it yields either the posterior mode, or a point close to it, depending on the curvature and properties of the posterior distribution. With $L>0$, $L(\theta,\widehat{\theta}) = \begin{cases} 0 & \text{for } |\theta-\widehat{\theta}| < K \\ L & \text{for } |\theta-\widehat{\theta}| \ge K \end{cases}$. Small values of the parameter $K>0$ are recommended in order to use the mode as an approximation.
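Here is a minimal Monte Carlo sketch of the first two loss functions. It assumes we can draw samples from the posterior; Gamma(3, 1) draws stand in for an arbitrary posterior. Minimizing the empirical posterior expected loss over candidate values recovers the sample median when $a = b$, and otherwise the $a/(a+b)$ sample quantile, as the bullets state. The search is a simple $O(n^2)$ scan chosen for clarity rather than speed.

```python
import random


def expected_loss(estimate, samples, a, b):
    """Monte Carlo posterior expected loss for the asymmetric linear loss:
    weight a when theta >= estimate (underestimation), weight b otherwise."""
    total = 0.0
    for t in samples:
        d = t - estimate
        total += a * d if d >= 0 else -b * d
    return total / len(samples)


def bayes_estimate(samples, a, b):
    """The empirical expected loss is piecewise linear with kinks at the samples,
    so a minimizer can be found among the samples themselves."""
    return min(samples, key=lambda c: expected_loss(c, samples, a, b))


rng = random.Random(1)
# Stand-in posterior: Gamma(shape 3, scale 1) draws; any posterior sampler would do.
samples = [rng.gammavariate(3.0, 1.0) for _ in range(401)]
ordered = sorted(samples)

median_est = bayes_estimate(samples, 1.0, 1.0)  # a = b: posterior median
q75_est = bayes_estimate(samples, 3.0, 1.0)     # a / (a + b) = 0.75 quantile
print(median_est, ordered[200], q75_est, ordered[300])
```

An odd sample size makes both minimizers unique order statistics. Sorting the samples and indexing the quantile directly gives the same answer in $O(n \log n)$.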

Other loss functions can be conceived, although the mean squared error is the most widely used and validated.

Generalized Bayes estimators

The prior distribution $\pi$ has thus far been assumed to be a true probability distribution, in that

$$\int \pi(\theta)\, d\theta = 1.$$

However, occasionally this can be a restrictive requirement. For example, there is no distribution for which every real number is equally likely. Yet, in some sense, such a "distribution" seems like a natural choice for a non-informative prior, i.e., a prior distribution which does not imply a preference for any particular value of the unknown parameter. One can still define a function $\pi(\theta) = 1$, but this would not be a proper probability distribution since it has infinite mass,

$$\int \pi(\theta)\, d\theta = \infty.$$

Such measures $\pi(\theta)$, which are not probability distributions, are referred to as improper priors.

The use of an improper prior typically results in infinite Bayes risk. As a consequence, it is no longer meaningful to speak of an estimator which minimizes the Bayes risk. Nevertheless, in many cases, one can define the posterior distribution

$$\pi(\theta|x) = \frac{p(x|\theta)\,\pi(\theta)}{\int p(x|\theta)\,\pi(\theta)\, d\theta}.$$

This is a definition, and not an application of Bayes' theorem, since Bayes' theorem can only be applied when all distributions are proper. However, it is not uncommon for the resulting "posterior" to be a valid probability distribution. In this case, the posterior expected loss

$$\int L(\theta,a)\,\pi(\theta|x)\, d\theta$$

is typically well-defined and finite. Recall that, for a proper prior, the Bayes estimator minimizes the posterior expected loss. When the prior is improper, an estimator which minimizes the posterior expected loss is referred to as a generalized Bayes estimator.
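As an illustration, assume a single observation $x \mid \theta \sim N(\theta, \sigma^2)$ with known $\sigma$ and the improper prior $\pi(\theta) = 1$. The prior has infinite mass, yet $\int p(x|\theta)\pi(\theta)\,d\theta$ is finite. The normalized "posterior" is therefore a proper $N(x, \sigma^2)$ distribution, and the generalized Bayes estimator under MSE is its mean, $x$. The Python sketch below checks this numerically with a truncated Riemann sum. `SIGMA` and the grid settings are arbitrary choices.

```python
import math

SIGMA = 2.0  # assumed known noise standard deviation


def likelihood(x, theta):
    """p(x | theta) for x | theta ~ N(theta, SIGMA^2)."""
    return math.exp(-((x - theta) ** 2) / (2 * SIGMA**2)) / (SIGMA * math.sqrt(2 * math.pi))


def flat_prior(theta):
    return 1.0  # improper: integrates to infinity over the real line


def generalized_posterior_moments(x, half_width=40.0, m=40001):
    """Normalize p(x|theta) * pi(theta) with a Riemann sum over [x - half_width, x + half_width]
    and return (normalizing constant, posterior mean, posterior variance)."""
    step = 2 * half_width / (m - 1)
    thetas = [x - half_width + i * step for i in range(m)]
    w = [likelihood(x, t) * flat_prior(t) for t in thetas]
    z = sum(w) * step  # approximates the integral of p(x|theta) pi(theta) dtheta
    mean = sum(t * wi for t, wi in zip(thetas, w)) * step / z
    var = sum((t - mean) ** 2 * wi for t, wi in zip(thetas, w)) * step / z
    return z, mean, var


z, mean, var = generalized_posterior_moments(x=1.3)
print(z, mean, var)  # close to 1, x and SIGMA**2 respectively
```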

Example

A typical example concerns the estimation of a location parameter with a loss function of the type $L(a-\theta)$. Here $\theta$ is a location parameter, i.e., $p(x|\theta) = f(x-\theta)$.

It is common to use the improper prior $\pi(\theta)=1$ in this case, especially when no more subjective information is available. This yields

$$\pi(\theta|x) = \frac{p(x|\theta)\,\pi(\theta)}{p(x)} = \frac{f(x-\theta)}{p(x)},$$

so the posterior expected loss equals

$$E[L(a-\theta)|x] = \int L(a-\theta)\,\pi(\theta|x)\, d\theta = \frac{1}{p(x)} \int L(a-\theta)\, f(x-\theta)\, d\theta.$$

The generalized Bayes estimator is the value $a(x)$ which minimizes this expression for all $x$. This is equivalent to minimizing

$$\int L(a-\theta)\, f(x-\theta)\, d\theta \quad \text{for all } x. \qquad (1)$$

It can be shown that, in this case, the generalized Bayes estimator has the form $x+a_0$, for some constant $a_0$. To see this, let $a_0$ be the value minimizing (1) when $x=0$. Then, given a different value $x_1$, we must minimize

$$\int L(a-\theta)\, f(x_1-\theta)\, d\theta = \int L(a-x_1-\theta')\, f(-\theta')\, d\theta', \qquad (2)$$

where $\theta' = \theta - x_1$. This is identical to (1) with $x=0$, except that $a$ has been replaced by $a-x_1$. Thus the minimizer satisfies $a-x_1 = a_0$, so that the optimal estimator has the form

$$a(x) = a_0 + x.$$
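The equivariance argument can be checked numerically. The sketch below assumes a hypothetical location family with $f(u) = e^{-u}$ for $u \ge 0$, so that under the flat prior $x - \theta$ is Exp(1) distributed. With squared error loss, the generalized Bayes estimator should then be $a(x) = x - 1$, i.e. $a_0 = -1$. The code approximates (1) with a midpoint rule and minimizes over $a$ by golden-section search. That search assumes the objective is unimodal in $a$, which holds for convex losses.

```python
import math


def f(u):
    """Assumed location density: standard exponential, f(u) = exp(-u) for u >= 0."""
    return math.exp(-u) if u >= 0 else 0.0


def objective(a, x, loss, width=40.0, m=1001):
    """Midpoint-rule approximation of (1): the integral of L(a - theta) f(x - theta)
    over theta in [x - width, x], where f(x - theta) > 0 (flat prior pi(theta) = 1)."""
    step = width / m
    total = 0.0
    for i in range(m):
        theta = x - (i + 0.5) * step
        total += loss(a - theta) * f(x - theta)
    return total * step


def generalized_bayes(x, loss, lo=-50.0, hi=50.0, iters=60):
    """Golden-section search for the a minimizing (1), within [x + lo, x + hi]."""
    g = (math.sqrt(5) - 1) / 2
    left, right = x + lo, x + hi
    for _ in range(iters):
        c = right - g * (right - left)
        d = left + g * (right - left)
        if objective(c, x, loss) < objective(d, x, loss):
            right = d
        else:
            left = c
    return (left + right) / 2


def squared(e):
    return e * e


offsets = [generalized_bayes(x, squared) - x for x in (0.0, 3.7, -12.5)]
print(offsets)  # each close to a_0 = -1 for this f and squared error loss
```

Passing a different convex loss changes $a_0$ but not the form $a(x) = x + a_0$. For example, absolute error would give the posterior median, $x - \ln 2$.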

Empirical Bayes estimators

A Bayes estimator derived through the empirical Bayes method is called an "empirical Bayes estimator". Empirical Bayes methods enable the use of auxiliary empirical data, from observations of related parameters, in the development of a Bayes estimator. This is done under the assumption that the estimated parameters are obtained from a common prior. For example, if independent observations of different parameters are performed, then the estimation performance of a particular parameter can sometimes be improved by using data from other observations.

There are parametric and non-parametric approaches to empirical Bayes estimation. Parametric empirical Bayes is usually preferable since it is more applicable and more accurate on small amounts of data. [Berger (1980), section 4.5.]

Example

The following is a simple example of parametric empirical Bayes estimation. Given past observations $x_1,\ldots,x_n$ having conditional distribution $f(x_i|\theta_i)$, one is interested in estimating $\theta_{n+1}$ based on $x_{n+1}$. Assume that the $\theta_i$ have a common prior $\pi$ which depends on unknown parameters. For example, suppose that $\pi$ is normal with unknown mean $\mu_\pi$ and variance $\sigma_\pi^2$. We can then use the past observations to determine the mean and variance of $\pi$ in the following way.

First, we estimate the mean $\mu_m$ and variance $\sigma_m^2$ of the marginal distribution of $x_1,\ldots,x_n$ using the maximum likelihood approach:

$$\widehat{\mu}_m = \frac{1}{n}\sum x_i, \qquad \widehat{\sigma}_m^2 = \frac{1}{n}\sum (x_i-\widehat{\mu}_m)^2.$$

Next, we use the relations

$$\mu_m = E_\pi[\mu_f(\theta)], \qquad \sigma_m^2 = E_\pi[\sigma_f^2(\theta)] + E_\pi[(\mu_f(\theta)-\mu_m)^2],$$

where $\mu_f(\theta)$ and $\sigma_f(\theta)$ are the mean and standard deviation of the conditional distribution $f(x_i|\theta_i)$, which are assumed to be known. In particular, suppose that $\mu_f(\theta) = \theta$ and that $\sigma_f^2(\theta) = K$; we then have

$$\mu_\pi = \mu_m, \qquad \sigma_\pi^2 = \sigma_m^2 - \sigma_f^2 = \sigma_m^2 - K.$$

Finally, we obtain the estimated moments of the prior,

$$\widehat{\mu}_\pi = \widehat{\mu}_m, \qquad \widehat{\sigma}_\pi^2 = \widehat{\sigma}_m^2 - K.$$

For example, if $x_i|\theta_i \sim N(\theta_i,1)$ (so that $K=1$), and if we assume a normal prior (which is a conjugate prior in this case), we conclude that $\theta_{n+1} \sim N(\widehat{\mu}_\pi,\widehat{\sigma}_\pi^2)$, from which the Bayes estimator of $\theta_{n+1}$ based on $x_{n+1}$ can be calculated.
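The recipe above can be sketched in Python for the normal case $x_i|\theta_i \sim N(\theta_i, K)$ with a normal prior. Two choices here are assumptions not in the text. First, the estimated prior variance is truncated at zero when $\widehat{\sigma}_m^2 < K$. Second, the final step plugs the estimated prior into the conjugate normal formula from the examples section. The simulated data use a made-up true prior N(5, 4).

```python
import random
import statistics


def empirical_bayes_estimate(past_x, new_x, K=1.0):
    """Parametric empirical Bayes for x_i | theta_i ~ N(theta_i, K) with a normal prior.
    Returns (estimate of theta_{n+1}, estimated prior mean, estimated prior variance)."""
    mu_m = statistics.fmean(past_x)               # ML estimate of the marginal mean
    s2_m = statistics.pvariance(past_x, mu=mu_m)  # ML (divide-by-n) marginal variance
    mu_pi = mu_m
    s2_pi = max(s2_m - K, 0.0)                    # assumption: clip negative estimates to 0
    # Conjugate normal posterior mean with prior N(mu_pi, s2_pi) and noise variance K.
    estimate = (K * mu_pi + s2_pi * new_x) / (K + s2_pi)
    return estimate, mu_pi, s2_pi


rng = random.Random(42)
true_mu, true_sd = 5.0, 2.0  # made-up prior N(5, 4), used only to simulate data
past = [rng.gauss(rng.gauss(true_mu, true_sd), 1.0) for _ in range(2000)]
est, mu_hat, s2_hat = empirical_bayes_estimate(past, new_x=10.0)
print(mu_hat, s2_hat, est)  # est is pulled from 10 toward mu_hat
```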

Properties

Admissibility

Bayes rules having finite Bayes risk are typically admissible. The following are some specific examples of admissibility theorems.
* If a Bayes rule is unique then it is admissible. [Lehmann and Casella (1998), Theorem 5.2.4.] For example, under mean squared error (MSE) a Bayes rule with finite Bayes risk is unique, and is therefore admissible.
* If θ belongs to a discrete set and the prior assigns positive probability to every value of θ, then all Bayes rules are admissible.
* If θ belongs to a continuous (non-discrete) set, the risk function R(θ,δ) is continuous in θ for every δ, and the prior assigns positive probability to every open set, then all Bayes rules with finite Bayes risk are admissible.

By contrast, generalized Bayes rules usually have infinite Bayes risk. These are often inadmissible, and verifying their admissibility can be difficult. For example, under squared error loss the generalized Bayes estimator of a p-dimensional location parameter θ based on Gaussian samples (described in the "Generalized Bayes estimators" section above) is inadmissible for p > 2; this is known as Stein's phenomenon.

Asymptotic efficiency

Let θ be an unknown random variable, and suppose that $x_1,x_2,\ldots$ are iid samples with density $f(x_i|\theta)$. Let $\delta_n = \delta_n(x_1,\ldots,x_n)$ be a sequence of Bayes estimators of θ based on an increasing number of measurements. We are interested in analyzing the asymptotic performance of this sequence of estimators, i.e., the performance of $\delta_n$ for large $n$.

To this end, it is customary to regard θ as a deterministic parameter whose true value is $\theta_0$. Under specific conditions [Lehmann and Casella (1998), section 6.8], for large samples (large values of $n$), the posterior density of θ is approximately normal. In other words, for large $n$, the effect of the prior probability on the posterior is negligible. Moreover, if δ is the Bayes estimator under MSE risk, then it is asymptotically unbiased, and $\sqrt{n}(\delta_n - \theta_0)$ converges in distribution to a normal distribution:

$$\sqrt{n}(\delta_n - \theta_0) \to N\left(0, \frac{1}{I(\theta_0)}\right),$$

where $I(\theta_0)$ is the Fisher information of $\theta_0$. It follows that the Bayes estimator $\delta_n$ under MSE is asymptotically efficient.
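Here is a quick simulation under assumptions chosen for illustration: Bernoulli observations with true value $\theta_0 = 0.3$, a Beta(2, 2) prior, and the posterior-mean estimator $(a+x)/(a+b+n)$ discussed in the binomial example below. For Bernoulli data $1/I(\theta_0) = \theta_0(1-\theta_0)$. The empirical variance of $\sqrt{n}(\delta_n - \theta_0)$ should therefore be close to 0.21 for large $n$.

```python
import math
import random
import statistics


def posterior_mean(x, n, a, b):
    """Bayes estimator under MSE with a Beta(a, b) prior and x successes in n trials."""
    return (a + x) / (a + b + n)


def scaled_errors(theta0, n, reps, a=2.0, b=2.0, seed=0):
    """Draws of sqrt(n) * (delta_n - theta0) from repeated simulated experiments."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x = sum(rng.random() < theta0 for _ in range(n))  # Binomial(n, theta0) draw
        out.append(math.sqrt(n) * (posterior_mean(x, n, a, b) - theta0))
    return out


theta0 = 0.3
z = scaled_errors(theta0, n=400, reps=2000)
# For Bernoulli data I(theta) = 1 / (theta * (1 - theta)), so 1 / I(theta0) = 0.21.
print(statistics.fmean(z), statistics.pvariance(z), theta0 * (1 - theta0))
```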

Another estimator which is asymptotically normal and efficient is the maximum likelihood estimator (MLE). The relations between the maximum likelihood and Bayes estimators can be shown in the following simple example.

Consider the estimator of θ based on a binomial sample $x \sim b(\theta,n)$, where θ denotes the probability of success. Assuming θ is distributed according to the conjugate prior, which in this case is the Beta distribution $B(a,b)$, the posterior distribution is known to be $B(a+x,b+n-x)$. Thus, the Bayes estimator under MSE is

$$\delta_n(x) = E[\theta|x] = \frac{a+x}{a+b+n}.$$

The MLE in this case is $x/n$, and so we get

$$\delta_n(x) = \frac{a+b}{a+b+n}E[\theta] + \frac{n}{a+b+n}\delta_{MLE},$$

where $E[\theta] = a/(a+b)$ is the prior mean. The last equation implies that, for $n \to \infty$, the Bayes estimator (in the described problem) is close to the MLE. On the other hand, when $n$ is small, the prior information is still relevant to the decision problem and affects the estimate.
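The decomposition into a prior-mean term and an MLE term can be verified exactly with rational arithmetic. In this sketch the Beta hyperparameters are restricted to integers only so that `Fraction` can be used. The function names are hypothetical.

```python
from fractions import Fraction


def bayes_estimate(x, n, a, b):
    """Posterior mean (a + x) / (a + b + n) for a Beta(a, b) prior; integer a, b assumed."""
    return Fraction(a + x, a + b + n)


def shrinkage_form(x, n, a, b):
    """Weighted average of the prior mean a/(a+b) and the MLE x/n."""
    w = Fraction(a + b, a + b + n)
    return w * Fraction(a, a + b) + (1 - w) * Fraction(x, n)


# The two forms agree exactly for every possible outcome.
assert all(
    bayes_estimate(x, n, 2, 3) == shrinkage_form(x, n, 2, 3)
    for n in range(1, 30)
    for x in range(n + 1)
)

# With x/n held at 0.3, the gap to the MLE shrinks roughly like 1/n.
gaps = []
for n in (10, 100, 1000):
    x = 3 * n // 10
    gaps.append(abs(bayes_estimate(x, n, 2, 3) - Fraction(x, n)))
    print(n, float(gaps[-1]))
```

The weight $(a+b)/(a+b+n)$ on the prior mean shows why $a+b$ is often read as a "prior sample size".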

See also

*Recursive Bayesian estimation
*Empirical Bayes method
*Conjugate prior

References

* Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation (2nd ed.). Springer. ISBN 0-387-98502-6.

* Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis (2nd ed.). New York: Springer-Verlag. ISBN 0-387-96098-8 and ISBN 3-540-96098-8.

External links

* Bayesian estimation on cnx.org: http://cnx.org/content/m11660/latest/


Wikimedia Foundation. 2010.
