VC dimension


VC dimension

In computational learning theory, the VC dimension (for Vapnik-Chervonenkis dimension) is a measure of the capacity of a statistical classification algorithm, defined as the cardinality of the largest set of points that the algorithm can shatter. It is a core concept in Vapnik-Chervonenkis theory, and was originally defined by Vladimir Vapnik and Alexey Chervonenkis.

Informally, the capacity of a classification model is related to how complicated it can be. For example, consider the thresholding of a high-degree polynomial: if the polynomial evaluates above zero, that point is classified as positive, otherwise as negative. A high-degree polynomial can be wiggly, so it can fit a given set of training points well. But one can expect that the classifier will make errors on other points, because it is too wiggly. Such a polynomial has a high capacity. A much simpler alternative is to threshold a linear function. This polynomial may not fit the training set well, because it has a low capacity. We make this notion of capacity more rigorous below.

Shattering

A classification model f with some parameter vector heta is said to "shatter" a set of data points (x_1,x_2,ldots,x_n) if, for all assignments of labels to those points, there exists a heta such that the model f makes no errors when evaluating that set of data points.

VC dimension of a model f is h' where h' is the maximum h such that some data point set of cardinality h can be shattered by f.

For example, consider a straight line as the classification model: the model used by a perceptron. The line should separate positive data points from negative data points. When there are 3 points that are not collinear, the line can shatter them. However, the line cannot shatter four points. Thus, the VC dimension of this particular classifier is 3. It is important to remember that one can choose the arrangement of points, but then cannot change it as the labels on the points are permuted. Note, only 3 of the 8 possible permutations are shown for the 3 points.

Uses

The VC dimension has utility in statistical learning theory, because it can predict a probabilistic upper bound on the test error of a classification model.

The bound on the test error of a classification model (on data that is drawn i.i.d. from the same distribution as the training set) is given by

:Training error + sqrt{h(log(2N/h)+1)-log(eta/4)over N}

with probability 1-eta, where h is the VC dimension of the classification model, and N is the size of the training set (restriction: this formula is valid when the VC dimension is small h).

References

* Andrew Moore's [http://www-2.cs.cmu.edu/~awm/tutorials/vcdim.html VC dimension tutorial]
* V. Vapnik and A. Chervonenkis. "On the uniform convergence of relative frequencies of events to their probabilities." "Theory of Probability and its Applications", 16(2):264--280, 1971.
* A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. "Learnability and the Vapnik-Chervonenkis dimension." "Journal of the ACM", 36(4):929--865, 1989.
* Christopher Burges Tutorial on SVMs for Pattern Recognition (containing information also for VC dimension) [http://citeseer.ist.psu.edu/burges98tutorial.html]


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • dimension — [ dimɑ̃sjɔ̃ ] n. f. • 1425; lat. dimensio, de metiri « mesurer » I ♦ 1 ♦ Grandeur réelle, mesurable, qui détermine la portion d espace occupée par un corps. ⇒ étendue, grandeur, grosseur. Dimension relative. ⇒ proportion. « Notre âme est jetée… …   Encyclopédie Universelle

  • Dimension de Hamel — Dimension Voir « dimension » sur le Wiktionnaire …   Wikipédia en Français

  • Dimension spatiale — Dimension Voir « dimension » sur le Wiktionnaire …   Wikipédia en Français

  • Dimension X (TMNT) — Dimension X is a alternate dimension in the Teenage Mutant Ninja Turtles (TMNT) 1987 cartoon series and the TMNT Adventures comics. Dimension X is the home of the evil alien Krang. Krang uses a dimensional portal inside the Technodrome for… …   Wikipedia

  • Dimension D'un Espace Vectoriel — En mathématiques, la dimension d un espace vectoriel E est le cardinal (c est à dire le nombre de vecteurs) de toute base de E. Elle est parfois appelée la dimension de Hamel ou la dimension algébrique à distinguer d autres types de dimension.… …   Wikipédia en Français

  • Dimension box-counting — Dimension de Minkowski–Bouligand En géométrie fractale, la dimension de Minkowski–Bouligand, également appelée dimension de Minkowski ou dimension box counting, est une manière de déterminer la dimension fractale d un ensemble S dans un Espace… …   Wikipédia en Français

  • Dimension de Minkowski — Dimension de Minkowski–Bouligand En géométrie fractale, la dimension de Minkowski–Bouligand, également appelée dimension de Minkowski ou dimension box counting, est une manière de déterminer la dimension fractale d un ensemble S dans un Espace… …   Wikipédia en Français

  • Dimensión fractal — Saltar a navegación, búsqueda En geometría de fractales, la dimensión fractal, D es una cantidad estadística que da una idea de cuán completamente parece llenar un fractal el espacio conforme se amplía el primero hacia escalas más y más finas.… …   Wikipedia Español

  • Dimension Jump (convention) — Dimension Jump Status Active Genre Red Dwarf Location Peterborough Country United Kingdom First held 1992 Last held …   Wikipedia

  • Dimension De Hausdorff — En topologie, la dimension de Hausdorff d un espace métrique (X,d) est un nombre réel positif ou nul, éventuellement l infini. Si deux métriques sont Lipschitz équivalentes, alors elles ont la même dimension de Hausdorff. Introduite en 1918 par… …   Wikipédia en Français

  • Dimension de hausdorff — En topologie, la dimension de Hausdorff d un espace métrique (X,d) est un nombre réel positif ou nul, éventuellement l infini. Si deux métriques sont Lipschitz équivalentes, alors elles ont la même dimension de Hausdorff. Introduite en 1918 par… …   Wikipédia en Français