Algebraic statistics

Algebraic statistics is a relatively recent branch of statistics that uses tools from algebraic geometry and commutative algebra to study problems involving discrete random variables with finite state spaces. Such problems include parameter estimation, hypothesis testing, and experimental design. The key connection between statistics and algebra is the observation that many commonly used families of discrete distributions are described by polynomial equations in the probabilities, and can therefore be viewed as algebraic varieties.

Introductory example

Consider a random variable $X$ which can take on the values 0, 1, 2. Such a variable is completely characterized by the three probabilities $p_i = \mathrm{Pr}(X = i), \quad i = 0, 1, 2,$ and these numbers clearly satisfy $\sum_{i=0}^{2} p_i = 1 \quad \text{and} \quad 0 \leq p_i \leq 1.$ Conversely, any three such numbers unambiguously specify the distribution of a random variable, so we can identify the random variable $X$ with the tuple $(p_0, p_1, p_2) \in \mathbb{R}^3$.
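The identification above can be made concrete with a small check. The following is a minimal sketch, not part of the original article: the function name `is_distribution` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is an assumption chosen so that the condition that the probabilities sum to 1 can be tested without rounding error.

```python
from fractions import Fraction

def is_distribution(p):
    """Return True if p = (p0, p1, p2) lies in the probability simplex,
    i.e. every entry is in [0, 1] and the entries sum to 1."""
    return len(p) == 3 and all(0 <= x <= 1 for x in p) and sum(p) == 1

# The uniform distribution on {0, 1, 2} is a valid point of the simplex.
uniform = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
print(is_distribution(uniform))

# Each entry is in [0, 1], but the entries sum to 3/2, so this is rejected.
too_big = (Fraction(1, 2), Fraction(1, 2), Fraction(1, 2))
print(is_distribution(too_big))
```

With floating-point inputs the exact comparison `sum(p) == 1` would have to be replaced by a tolerance check.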

Now suppose $X$ is a binomial random variable with $n = 2$ trials and success probability $q$, i.e. $X$ represents the number of successes when a certain experiment is repeated two times, where each repetition succeeds independently with probability $q$. Then $p_i = \mathrm{Pr}(X = i) = \binom{2}{i} q^i (1-q)^{2-i},$ and it is not hard to show that the tuples $(p_0, p_1, p_2)$ which arise in this way are precisely the points of the simplex satisfying $4 p_0 p_2 - p_1^2 = 0.$ Indeed, substituting $p_0 = (1-q)^2$, $p_1 = 2q(1-q)$, $p_2 = q^2$ verifies the equation; conversely, for a point of the simplex the equation gives $p_1 = 2\sqrt{p_0 p_2}$, hence $(\sqrt{p_0} + \sqrt{p_2})^2 = 1$, and $q = \sqrt{p_2}$ reproduces the tuple. The latter is a polynomial equation defining an algebraic variety (a surface) in $\mathbb{R}^3$, and this variety, when intersected with the simplex given by $\sum_{i=0}^{2} p_i = 1 \quad \text{and} \quad 0 \leq p_i \leq 1,$ yields a piece of an algebraic curve which may be identified with the set of all binomial random variables with $n = 2$. Determining the parameter $q$ amounts to locating one point on this curve; testing the hypothesis that a given variable $X$ is binomial amounts to testing whether a certain point lies on that curve or not.
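The binomial example can be checked numerically. The sketch below is illustrative only: the helper names `binomial2`, `invariant`, and `recover_q` are hypothetical, and exact rational arithmetic is assumed so that "lies on the curve" can be tested as an exact equality. The recovery of $q$ uses the fact that $p_1 + 2 p_2 = 2q$ on the curve, which follows directly from the formula for $p_i$. This is a check on exact probabilities, not a statistical test on sampled data.

```python
from fractions import Fraction
from math import comb

def binomial2(q):
    """Distribution (p0, p1, p2) of a binomial variable with n = 2 trials."""
    return tuple(comb(2, i) * q**i * (1 - q)**(2 - i) for i in range(3))

def invariant(p):
    """Value of 4*p0*p2 - p1**2, which vanishes on the binomial curve."""
    p0, p1, p2 = p
    return 4 * p0 * p2 - p1**2

def recover_q(p):
    """Recover q from a point on the curve, using p1 + 2*p2 = 2*q."""
    p0, p1, p2 = p
    return (p1 + 2 * p2) / 2

p = binomial2(Fraction(1, 4))
print(p)             # the three probabilities for q = 1/4
print(invariant(p))  # zero, since p lies on the curve
print(recover_q(p))  # gives back q = 1/4

# The uniform distribution lies in the simplex but not on the curve.
uniform = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))
print(invariant(uniform))  # nonzero
```

For observed frequencies rather than exact probabilities, the invariant will generally be nonzero even when the underlying variable is binomial, so in practice the question becomes how far from the curve a point may lie before the hypothesis is rejected.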


Wikimedia Foundation. 2010.
