Margin classifier

In machine learning, a margin classifier is a classifier that is able to give an associated distance from the decision boundary for each example. For instance, if a linear classifier (e.g., perceptron or linear discriminant analysis) is used, the distance (typically Euclidean distance, though others may be used) of an example from the separating hyperplane is the margin of that example.
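Concretely, for a linear classifier with weight vector w and bias b, the signed Euclidean distance of a labeled example (x, y) from the hyperplane w \cdot x + b = 0 is y(w \cdot x + b)/\|w\|. A minimal Python sketch of this computation (the weight vector, bias, and example point are illustrative values, not taken from any particular model):

    import numpy as np

    # Illustrative linear classifier w . x + b = 0 (values are arbitrary).
    w = np.array([2.0, -1.0])
    b = 0.5

    def signed_margin(x, y):
        # Signed Euclidean distance of example (x, y) from the hyperplane:
        # positive if the example lies on the side its label predicts.
        return y * (np.dot(w, x) + b) / np.linalg.norm(w)

    print(signed_margin(np.array([1.0, 1.0]), y=+1))  # ~0.67: correctly classified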

The notion of margin is important in several machine learning classification algorithms, as it can be used to bound the generalization error of the classifier. These bounds are frequently shown using the VC dimension. Particularly prominent are the generalization error bounds for boosting algorithms and support vector machines.

Support vector machine definition of margin

See support vector machines and maximum-margin hyperplane for details.

Boosting definition of margin

The margin for an iterative boosting algorithm given a set of examples with two classes can be defined as follows. The classifier is given an example pair (x, y), where x \in X is a point in the domain space and y \in Y = \{-1, +1\} is the label of the example. At each iteration j, the boosting algorithm selects a classifier h_j \in C, where C is a space of possible classifiers that predict real values. This hypothesis is then weighted by \alpha_j \in \mathbb{R} as selected by the boosting algorithm. At iteration t, the margin of an example x can thus be defined as

\frac{y \sum_{j=1}^{t} \alpha_j h_j(x)}{\sum_{j=1}^{t} |\alpha_j|}

By this definition, the margin is positive if the example is labeled correctly and negative if the example is labeled incorrectly.
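As a concrete illustration, the following Python sketch evaluates this quantity for a toy ensemble of two decision stumps; the stumps and their weights are invented for the example and do not come from any actual boosting run:

    def boosting_margin(x, y, hypotheses, alphas):
        # Normalized margin: y * sum_j alpha_j h_j(x) / sum_j |alpha_j|.
        votes = sum(a * h(x) for a, h in zip(alphas, hypotheses))
        return y * votes / sum(abs(a) for a in alphas)

    # Toy ensemble: two threshold stumps on a one-dimensional input.
    hypotheses = [lambda x: 1.0 if x > 0 else -1.0,
                  lambda x: 1.0 if x > 2 else -1.0]
    alphas = [0.7, 0.3]

    print(boosting_margin(1.5, +1, hypotheses, alphas))  # (0.7 - 0.3) / 1.0 = 0.4
    print(boosting_margin(1.5, -1, hypotheses, alphas))  # -0.4: labeled incorrectly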

This definition may be modified and is not the only way to define the margin for boosting algorithms. However, there are reasons why this definition may be appealing (see the paper by Schapire et al. for details[1]).

Examples of margin-based algorithms

Many classifiers can give an associated margin for each example. However, only some classifiers make use of this margin information while learning from a data set.

Many boosting algorithms rely on the notion of a margin to assign weights to examples. If a convex loss is utilized (as in AdaBoost, LogitBoost, and all members of the AnyBoost family of algorithms), then an example with a higher margin will receive less (or equal) weight than an example with a lower margin. This leads the boosting algorithm to focus weight on low-margin examples. In nonconvex algorithms (e.g., BrownBoost), the margin still dictates the weighting of an example, though the weighting is non-monotone with respect to the margin. There exist boosting algorithms that provably maximize the minimum margin (e.g., see [2]).
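For instance, AdaBoost's exponential loss assigns each example a weight proportional to exp(-margin), where the margin here is the unnormalized vote y \sum_j \alpha_j h_j(x). A small sketch of that reweighting (the margin values are made up purely for illustration):

    import numpy as np

    # Illustrative unnormalized margins y_i * sum_j alpha_j h_j(x_i).
    margins = np.array([-1.0, -0.2, 0.0, 0.5, 2.0])

    # Exponential loss: weight is monotonically decreasing in the margin,
    # so misclassified (negative-margin) examples dominate the next round.
    weights = np.exp(-margins)
    weights /= weights.sum()
    print(weights)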

Support vector machines provably maximize the margin of the separating hyperplane. Support vector machines trained on noisy data (where no perfect separation of the data exists in the given space) maximize the soft margin instead. More discussion of this can be found in the support vector machine article.
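As a sketch of soft-margin training in practice, the following uses scikit-learn's LinearSVC on synthetic, non-separable data (the data, the regularization constant C, and the variable names are all illustrative assumptions, not part of the article):

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    # Two overlapping Gaussian blobs: no hyperplane separates them perfectly.
    X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
    y = np.array([-1] * 50 + [+1] * 50)

    # C trades margin width against slack on violating examples (soft margin).
    clf = LinearSVC(C=1.0).fit(X, y)

    # decision_function gives the functional margin w . x + b; dividing by
    # ||w|| converts it to the geometric (Euclidean) margin of each example.
    margins = y * clf.decision_function(X) / np.linalg.norm(clf.coef_)
    print(margins.min())  # negative: some examples violate the margin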

The voted-perceptron algorithm is a margin-maximizing algorithm based on an iterative application of the classic perceptron algorithm.
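A compact Python sketch of the voted perceptron in its usual formulation due to Freund and Schapire (the training data X, y and the number of epochs are left to the caller):

    import numpy as np

    def voted_perceptron(X, y, epochs=10):
        # Store each intermediate weight vector together with the number of
        # consecutive examples it classified correctly (its "vote").
        w, count, survivors = np.zeros(X.shape[1]), 1, []
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * np.dot(w, xi) <= 0:      # mistake: retire w, update
                    survivors.append((w.copy(), count))
                    w, count = w + yi * xi, 1
                else:
                    count += 1
        survivors.append((w, count))
        return survivors

    def predict(survivors, x):
        # Each stored perceptron votes with weight equal to its vote count.
        return np.sign(sum(c * np.sign(np.dot(w, x)) for w, c in survivors))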

Generalization error bounds

One theoretical motivation behind margin classifiers is that their generalization error may be bounded by parameters of the algorithm and a margin term. An example of such a bound is the following one for the AdaBoost algorithm.[1] Let S be a set of m examples sampled independently at random from a distribution D. Assume the VC dimension of the underlying space of base classifiers is d and m \geq d \geq 1. Then with probability 1 − δ we have the bound

P_D\left( \frac{y \sum_{j=1}^{t} \alpha_j h_j(x)}{\sum_{j=1}^{t} |\alpha_j|} \leq 0 \right) \leq P_S\left( \frac{y \sum_{j=1}^{t} \alpha_j h_j(x)}{\sum_{j=1}^{t} |\alpha_j|} \leq \theta \right) + O\left( \frac{1}{\sqrt{m}} \sqrt{\frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta)} \right)

for all θ > 0.
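To see how the second term behaves, the following sketch evaluates it with the O-constant suppressed (so the numbers illustrate scaling only, not actual error probabilities; the chosen values of m, d, θ, and δ are arbitrary):

    import numpy as np

    def complexity_term(m, d, theta, delta):
        # The O(.) term of the bound with constants dropped: it shrinks as
        # m grows and blows up as the margin threshold theta shrinks.
        return np.sqrt(d * np.log(m / d) ** 2 / theta ** 2
                       + np.log(1 / delta)) / np.sqrt(m)

    for m in (1_000, 10_000, 100_000):
        print(m, complexity_term(m, d=10, theta=0.1, delta=0.05))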

References

  1. Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.
  2. Manfred Warmuth, Karen Glocer, and Gunnar Rätsch. Boosting Algorithms for Maximizing the Soft Margin. In Advances in Neural Information Processing Systems 20, 2007, pp. 1585–1592.
