Minimax estimator

Minimax estimator

In statistical decision theory, where we are faced with the problem of estimating a deterministic parameter (vector) \theta \in \Theta from observations x \in \mathcal{X}, an estimator (estimation rule) \delta^M \,\! is called minimax if its maximal risk is minimal among all estimators of \theta \,\!. In a sense this means that \delta^M \,\! is an estimator which performs best in the worst possible case allowed in the problem.


Problem setup

Consider the problem of estimating a deterministic (not Bayesian) parameter \theta \in \Theta from noisy or corrupt data x \in \mathcal{X} related through the conditional probability distribution P(x|\theta)\,\!. Our goal is to find a "good" estimator \delta(x) \,\! for estimating the parameter \theta \,\!, which minimizes some given risk function R(\theta,\delta) \,\!. Here the risk function is the expectation of some loss function L(\theta,\delta) \,\! with respect to P(x|\theta)\,\!. A popular example for a loss function is the squared error loss L(\theta,\delta)= \|\theta-\delta\|^2 \,\!, and the risk function for this loss is the mean squared error (MSE).

Unfortunately in general the risk cannot be minimized, since it depends on the unknown parameter \theta \,\! itself (If we knew what was the actual value of \theta \,\!, we wouldn't need to estimate it). Therefore additional criteria for finding an optimal estimator in some sense are required. One such criterion is the minimax criteria.


Definition : An estimator \delta^M:\mathcal{X} \rightarrow \Theta \,\! is called minimax with respect to a risk function R(\theta,\delta) \,\! if it achieves the smallest maximum risk among all estimators, meaning it satisfies

\sup_{\theta \in \Theta} R(\theta,\delta^M) = \inf_\delta \sup_{\theta \in \Theta} R(\theta,\delta). \,

Least favorable distribution

Logically, an estimator is minimax when it is the best in the worst case. Continuing this logic, a minimax estimator should be a Bayes estimator with respect to a prior least favorable distribution of \theta \,\!. To demonstrate this notion denote the average risk of the Bayes estimator \delta_{\pi} \,\! with respect to a prior distribution \pi \,\! as

r_\pi = \int R(\theta,\delta_{\pi}) \, d\pi(\theta) \,

Definition: A prior distribution \pi \,\! is called least favorable if for any other distribution \pi ' \,\! the average risk satisfies r_\pi \geq r_{\pi '}  \, .

Theorem 1: If r_\pi = \sup_\theta R(\theta,\delta_\pi), \, then:

  1. \delta_{\pi}\,\! is minimax.
  2. If \delta_{\pi}\,\! is a unique Bayes estimator, it is also the unique minimax estimator.
  3. \pi\,\! is least favorable.

Corollary: If a Bayes estimator has constant risk, it is minimax. Note that this is not a necessary condition.

Example 1, Unfair coin: Consider the problem of estimating the "success" rate of a Binomial variable, x \sim B(n,\theta)\,\!. This may be viewed as estimating the rate at which an unfair coin falls on "heads" or "tails". In this case the Bayes estimator with respect to a Beta-distributed prior, \theta \sim \text{Beta}(\sqrt{n}/2,\sqrt{n}/2) \, is

\delta^M=\frac{x+0.5\sqrt{n}}{n+\sqrt{n}}, \,

with constant Bayes risk

r=\frac{1}{4(1+\sqrt{n})^2}  \,

and, according to the Corollary, is minimax.

Definition: A sequence of prior distributions  {\pi}_n\,\! is called least favorable if for any other distribution \pi '\,\!,

\lim_{n \rightarrow \infty} r_{\pi_n} \geq r_{\pi '}. \,

Theorem 2: If there are a sequence of priors  \pi_n\,\! and an estimator \delta\,\! such that \sup_{\theta} R(\theta,\delta)=\lim_{n \rightarrow \infty} r_{\pi_n} \,\!, then :

  1. \delta\,\! is minimax.
  2. The sequence {\pi}_n\,\! is least favorable.

Notice that no uniqueness is guaranteed here. For example, the ML estimator from the previous example may be attained as the limit of Bayes estimators with respect to a uniform prior, \pi_n \sim U[-n,n]\,\! with increasing support and also with respect to a zero mean normal prior \pi_n \sim N(0,n \sigma^2) \,\! with increasing variance. So neither the resulting ML estimator is unique minimax not the least favorable prior is unique.

Example 2: Consider the problem of estimating the mean of p\,\! dimensional Gaussian random vector, x \sim N(\theta,I_p \sigma^2)\,\!. The Maximum likelihood (ML) estimator for \theta\,\! in this case is simply \delta_{ML}=x\,\!, and it risk is

R(\theta,\delta_{ML})=E{\|\delta_{ML}-\theta\|^2}=\sum \limits_1^p E{(x_i-\theta_i)^2}=p \sigma^2. \,
MSE of maximum likelihood estimator versus James–Stein estimator

The risk is constant, but the ML estimator is actually not a Bayes estimator, so the Corollary of Theorem 1 does not apply. However, the ML estimator is the limit of the Bayes estimators with respect to the prior sequence \pi_n \sim N(0,n \sigma^2) \,\!, and, hence, indeed minimax according to Theorem 2 . Nonetheless, minimaxity does not always imply admissibility. In fact in this example, the ML estimator is known to be inadmissible (not admissible) whenever p >2\,\!. The famous James–Stein estimator dominates the ML whenever p >2\,\!. Though both estimators have the same risk p \sigma^2\,\! when \|\theta\| \rightarrow \infty\,\!, and they are both minimax, the James–Stein estimator has smaller risk for any finite \|\theta\|\,\!. This fact is illustrated in the following figure.

Some examples

In general it is difficult, often even impossible to determine the minimax estimator. Nonetheless, in many cases a minimax estimator has been determined.

Example 3, Bounded Normal Mean: When estimating the Mean of a Normal Vector x \sim N(\theta,I_n \sigma^2)\,\!, where it is known that \|\theta\|^2 \leq M\,\!. The Bayes estimator with respect to a prior which is uniformly distributed on the edge of the bounding sphere is known to be minimax whenever M \leq n\,\!. The analytical expression for this estimator is

\delta^M=\frac{nJ_{n+1}(n\|x\|)}{\|x\|J_{n}(n\|x\|)}, \,

where J_{n}(t)\,\!, is the modified Bessel function of the first kind of order n.

Relationship to Robust Optimization

Robust optimization is an approach to solve optimization problems under uncertainty in the knowledge of underlying parameters,.[1][2] For instance, the MMSE Bayesian estimation of a parameter requires the knowledge of parameter correlation function. If the knowledge of this correlation function is not perfectly available, a popular minimax robust optimization approach[3] is to define a set characterizing the uncertainty about the correlation function, and then pursuing a minimax optimization over the uncertainty set and the estimator respectively. Similar minimax optimizations can be pursued to make estimators robust to certain imprecisely known parameters. For instance, a recent study dealing with such techniques in the area of signal processing can be found in [4].

In R. Fandom Noubiap and W. Seidel (2001) an algorithm for calculating a Gamma-minimax decision rule has been developed, when Gamma is given by a finite number of generalized moment conditions. Such a decision rule minimizes the maximum of the integrals of the risk function with respect to all distributions in Gamma. Gamma-minimax decision rules are of interest in robustness studies in Bayesian statistics.


  • E. L. Lehmann and G. Casella (1998), Theory of Point Estimation, 2nd ed. New York: Springer-Verlag.
  • F. Perron and E. Marchand (2002), "On the minimax estimator of a bounded normal mean," Statistics and Probability Letters 58: 327–333.
  • J. O. Berger (1985), Statistical Decision Theory and Bayesian Analysis, 2nd ed. New York: Springer-Verlag. ISBN 0-387-96098-8.
  • R. Fandom Noubiap and W. Seidel (2001), „An Algorithm for Calculating Gamma-Minimax Decision Rules under Generalized Moment Conditions“ ; Annals of Statistics, Aug., 2001, vol. 29, no. 4, p. 1094–1116
  • Stein, C. (1981). "Estimation of the mean of a multivariate normal distribution". Annals of Statistics 9 (6): 1135–1151. doi:10.1214/aos/1176345632. MR630098. Zbl 0476.62035. 
  1. ^ S. A. Kassam and H. V. Poor (1985), "Robust Techniques for Signal Processing: A Survey," Proceedings of the IEEE, vol. 73, pp. 433–481, March 1985.
  2. ^ A. Ben-Tal, L. El Ghoaoui, and A. Nemirovski (2009), "Robust Optimization", Princeton University Press, 2009.
  3. ^ S. Verdu and H. V. Poor (1984), "On Minimax Robustness: A general approach and applications," IEEE Transactions on Information Theory, vol. 30, pp. 328–340, March 1984.
  4. ^ M. Danish Nisar. "Minimax Robustness in Signal Processing for Communications", Shaker Verlag, ISBN 978-3-8440-0332-1, August 2011.

Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Minimax — This article is about the decision theory concept. For other uses, see Minimax (disambiguation). Minimax (sometimes minmax) is a decision rule used in decision theory, game theory, statistics and philosophy for minimizing the possible loss for a… …   Wikipedia

  • James-Stein estimator — The James Stein estimator is a nonlinear estimator which can be shown to dominate, or outperform, the ordinary (least squares) technique. As such, it is the best known example of Stein s phenomenon.An earlier version of the estimator was… …   Wikipedia

  • List of statistics topics — Please add any Wikipedia articles related to statistics that are not already on this list.The Related changes link in the margin of this page (below search) leads to a list of the most recent changes to the articles listed below. To see the most… …   Wikipedia

  • List of mathematics articles (M) — NOTOC M M estimator M group M matrix M separation M set M. C. Escher s legacy M. Riesz extension theorem M/M/1 model Maass wave form Mac Lane s planarity criterion Macaulay brackets Macbeath surface MacCormack method Macdonald polynomial Machin… …   Wikipedia

  • Regret (decision theory) — Regret (often also called opportunity loss) is defined as the difference between the actual payoff and the payoff that would have been obtained if a different course of action had been chosen. This is also called difference regret. Furthermore,… …   Wikipedia

  • Robust statistics — provides an alternative approach to classical statistical methods. The motivation is to produce estimators that are not unduly affected by small departures from model assumptions. Contents 1 Introduction 2 Examples of robust and non robust… …   Wikipedia

  • Linear least squares — is an important computational problem, that arises primarily in applications when it is desired to fit a linear mathematical model to measurements obtained from experiments. The goals of linear least squares are to extract predictions from the… …   Wikipedia

  • Linear least squares/Proposed — Linear least squares is an important computational problem, that arises primarily in applications when it is desired to fit a linear mathematical model to observations obtained from experiments. Mathematically, it can be stated as the problem of… …   Wikipedia

  • Loss function — In statistics and decision theory a loss function is a function that maps an event onto a real number intuitively representing some cost associated with the event. Typically it is used for parameter estimation, and the event in question is some… …   Wikipedia

  • Chebyshev center — In geometry, the Chebyshev center of a bounded set Q having non empty interior is the center of the minimal radius ball enclosing the entire set Q. In the field of parameter estimation, the Chebyshev center approach tries to find an estimator for …   Wikipedia