Hierarchical Bayes model

The hierarchical Bayes method is one of the most important topics in modern Bayesian analysis. It is a powerful tool for expressing rich statistical models that more fully reflect a given problem than a simpler model could.

Given data $x$ and parameters $\vartheta$, a simple Bayesian analysis starts with a prior probability ("prior") $p(\vartheta)$ and a likelihood $p(x|\vartheta)$ to compute a posterior probability $p(\vartheta|x) \propto p(x|\vartheta)p(\vartheta)$.

Often the prior on $\vartheta$ depends in turn on other parameters $\varphi$ that are not mentioned in the likelihood. The prior $p(\vartheta)$ must then be replaced by a prior $p(\vartheta|\varphi)$, and a prior $p(\varphi)$ on the newly introduced parameters $\varphi$ is required, resulting in a posterior probability

:$p(\vartheta,\varphi|x) \propto p(x|\vartheta)p(\vartheta|\varphi)p(\varphi).$

This is the simplest example of a "hierarchical Bayes model".
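The factorization can be spelled out in one more step. The derivation below is a sketch that assumes only what the text states, namely that the likelihood does not involve $\varphi$. The final line, which recovers the posterior for $\vartheta$ alone by integrating out $\varphi$, is a standard consequence rather than something stated in the original.

```latex
\begin{align*}
p(\vartheta,\varphi \mid x)
  &\propto p(x \mid \vartheta,\varphi)\, p(\vartheta,\varphi) \\
  &= p(x \mid \vartheta,\varphi)\, p(\vartheta \mid \varphi)\, p(\varphi) \\
  &= p(x \mid \vartheta)\, p(\vartheta \mid \varphi)\, p(\varphi), \\
p(\vartheta \mid x)
  &= \int p(\vartheta,\varphi \mid x)\, d\varphi .
\end{align*}
```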

The process may be repeated; for example, the parameters $\varphi$ may depend in turn on additional parameters $\psi$, which will require their own prior. Eventually the process must terminate, with priors that do not depend on any other unmentioned parameters.


Suppose we have $n$ observations $x_i$, $i=1,\dots,n$, each measured with a normally distributed error of known standard deviation $\sigma$, that is,

:$x_i \sim N(\vartheta_i, \sigma^2)$

Suppose we are interested in estimating the $\vartheta_i$. One approach would be to estimate the $\vartheta_i$ by maximum likelihood; since the observations are independent, the likelihood factorizes and the maximum likelihood estimate is simply

:$\hat{\vartheta}_i = x_i$

However, if the quantities are related, so that for example we may think that the individual $\vartheta_i$ have themselves been drawn from an underlying distribution, then this relationship destroys the independence and suggests a more complex model, e.g.,

:$x_i \sim N(\vartheta_i, \sigma^2),$
:$\vartheta_i \sim N(\varphi, \tau^2)$

with improper priors $\varphi \sim \text{flat}$, $\tau \sim \text{flat} \in (0,\infty)$. When $n \ge 3$, this is an identified model, and the posterior distributions of the individual $\vartheta_i$ will tend to move, or "shrink", away from the maximum likelihood estimates towards their common mean. This "shrinkage" is a typical behavior in hierarchical Bayes models.
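The shrinkage can be illustrated numerically. The sketch below is not from the original article. It assumes the normal model of the example with flat priors on $\varphi$ and $\tau$, and uses the standard result that, after integrating out the $\vartheta_i$ and $\varphi$, the marginal posterior of $\tau$ is proportional to $(\sigma^2+\tau^2)^{-(n-1)/2}\exp\{-S/(2(\sigma^2+\tau^2))\}$ with $S=\sum_i (x_i-\bar{x})^2$. The function name, the data values, and the truncation of the $\tau$ grid at `tau_max` are all hypothetical choices made for illustration.

```python
import math


def shrinkage_posterior_means(x, sigma, tau_max=50.0, grid_size=5000):
    """Approximate E[vartheta_i | x] on a grid over tau (illustrative sketch).

    Assumes x_i ~ N(vartheta_i, sigma^2), vartheta_i ~ N(varphi, tau^2),
    and flat priors on varphi and on tau in (0, infinity).
    The grid is truncated at tau_max, which is an approximation.
    """
    n = len(x)
    if n < 3:
        raise ValueError("flat priors give a proper posterior only for n >= 3")

    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)

    # Midpoint grid over tau in (0, tau_max).
    step = tau_max / grid_size
    taus = [(k + 0.5) * step for k in range(grid_size)]

    # Log of the unnormalized marginal posterior of tau.
    log_w = []
    for tau in taus:
        v = sigma ** 2 + tau ** 2
        log_w.append(-0.5 * (n - 1) * math.log(v) - ss / (2.0 * v))

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)

    # Posterior mean of the shrinkage factor B = sigma^2 / (sigma^2 + tau^2).
    b = sum(
        w * sigma ** 2 / (sigma ** 2 + tau ** 2)
        for w, tau in zip(weights, taus)
    ) / total

    # Given tau, E[vartheta_i | x, tau] = B * xbar + (1 - B) * x_i,
    # which is linear in B, so averaging over tau only needs E[B | x].
    means = [b * xbar + (1.0 - b) * xi for xi in x]
    return means, b


# Hypothetical measurements with known sigma = 1.
data = [2.1, -0.4, 1.3, 3.8, 0.2, 1.9]
posterior_means, shrink = shrinkage_posterior_means(data, sigma=1.0)
for xi, mi in zip(data, posterior_means):
    print(f"MLE {xi:6.2f} -> posterior mean {mi:6.2f}")
print(f"average shrinkage factor: {shrink:.3f}")
```

Each posterior mean lies between the corresponding maximum likelihood estimate $x_i$ and the sample mean $\bar{x}$, which is the pull towards the common mean that the text describes.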

Restrictions on priors

Some care is needed when choosing priors in a hierarchical model, particularly on scale variables at higher levels of the hierarchy such as the variable $\tau$ in the example. The usual priors such as the Jeffreys prior often do not work, because the posterior distribution will be improper (not normalizable), and estimates made by minimizing the expected loss will be inadmissible.
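A short calculation shows how this impropriety can arise in the normal example. It is a sketch under stated assumptions, not part of the original text: it takes the flat prior on $\varphi$ from the example and, for the scale parameter, the scale-invariant form $p(\tau) \propto 1/\tau$ that is commonly used as the Jeffreys prior. Here $\bar{x}$ is the sample mean and $S = \sum_i (x_i - \bar{x})^2$.

```latex
\begin{align*}
x_i \mid \varphi,\tau &\sim N(\varphi,\ \sigma^2+\tau^2)
  \quad\text{(after integrating out } \vartheta_i\text{)}, \\
p(\tau \mid x) &\propto p(\tau)\,(\sigma^2+\tau^2)^{-(n-1)/2}
  \exp\!\left\{-\frac{S}{2(\sigma^2+\tau^2)}\right\}
  \quad\text{(after integrating out } \varphi\text{)}. \\
\intertext{As $\tau \to 0$ the likelihood factor tends to the positive constant
  $\sigma^{-(n-1)} e^{-S/(2\sigma^2)}$, so with $p(\tau) \propto 1/\tau$}
\int_0^{\varepsilon} p(\tau \mid x)\, d\tau
  &\;\propto\; \int_0^{\varepsilon} \frac{d\tau}{\tau} \;=\; \infty .
\end{align*}
```

With the flat prior on $\tau$ the integrand is instead bounded near zero and decays like $\tau^{-(n-1)}$ as $\tau \to \infty$, which is integrable exactly when $n \ge 3$, in agreement with the condition given in the example.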

Representation by directed acyclic graphs (DAGs)

A useful graphical tool for representing hierarchical Bayes models is the directed acyclic graph, or DAG. In this diagram, the likelihood function is represented as the root of the graph; each prior is represented as a separate node pointing to the node that depends on it. In a simple Bayesian model, the data $x$ are at the root of the diagram, representing the likelihood $p(x|\vartheta)$, and the variable $\vartheta$ is placed in a node that points to the root, as in the following diagram:

:$\vartheta \rightarrow x$

In the simplest hierarchical Bayes model, where $\vartheta$ in turn depends on a new variable $\varphi$, a new node labelled $\varphi$ is added, with an arrow pointing towards the node $\vartheta$. See also Bayesian networks.

:$\varphi \rightarrow \vartheta \rightarrow x$
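The link between the diagram and the posterior factorization can be made concrete in code. The sketch below is an illustration rather than an established tool: the dictionary encoding of the graph and the function name `factorization` are hypothetical, and it relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). Each node maps to the list of nodes with arrows pointing into it, and each node contributes one factor conditioned on those parents.

```python
from graphlib import TopologicalSorter


def factorization(parents):
    """Return a parents-first node order and the implied product of factors.

    `parents` maps each node name to the list of nodes pointing to it.
    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    so only directed acyclic graphs are accepted.
    """
    order = list(TopologicalSorter(parents).static_order())
    factors = []
    for node in order:
        given = parents.get(node, [])
        if given:
            factors.append(f"p({node}|{','.join(given)})")
        else:
            factors.append(f"p({node})")
    # List the likelihood first, as in the article's formulas.
    return order, " ".join(reversed(factors))


# Simple Bayesian model: vartheta -> x
simple = {"x": ["vartheta"], "vartheta": []}

# Simplest hierarchical model: varphi -> vartheta -> x
hierarchical = {"x": ["vartheta"], "vartheta": ["varphi"], "varphi": []}

for name, graph in [("simple", simple), ("hierarchical", hierarchical)]:
    order, product = factorization(graph)
    print(name, order, product)
```

For the hierarchical graph the resulting product has one factor per node, matching the three terms of the posterior $p(x|\vartheta)p(\vartheta|\varphi)p(\varphi)$.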



* Gelman, A., et al. (2004), Bayesian Data Analysis, Second Edition. Boca Raton: Chapman & Hall/CRC. Chapter 5.

External links

* A hierarchical Bayes Model for handling sample heterogeneity in classification problems (http://www.biomedcentral.com/1471-2105/7/514/abstract): provides a classification model that takes into account the uncertainty associated with measuring replicate samples.

* Hierarchical Naive Bayes Model for handling sample uncertainty (http://www.labmedinfo.org/download/lmi339.pdf): shows how to perform classification and learning with continuous and discrete variables with replicated measurements.

Wikimedia Foundation. 2010.
