AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire. It can be used in conjunction with many other learning algorithms to improve their performance. AdaBoost is adaptive in the sense that subsequent classifiers are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers; otherwise, it is less susceptible to the overfitting problem than most learning algorithms.

AdaBoost calls a weak classifier repeatedly in a series of rounds t = 1,\ldots,T. On each call a distribution of weights D_{t} is updated that indicates the importance of each example in the data set for the classification. On each round, the weights of incorrectly classified examples are increased (or alternatively, the weights of correctly classified examples are decreased), so that the new classifier focuses more on those examples.
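A minimal sketch of this re-weighting step in Python, assuming NumPy arrays with labels and predictions in {-1, +1} (the function and argument names here are purely illustrative, not part of any standard library):

 import numpy as np

 def reweight(D, y, pred, alpha):
     # Multiply each example's weight by exp(-alpha * y * pred):
     # misclassified examples (y != pred) are up-weighted, correctly
     # classified examples are down-weighted, and D is then renormalised
     # so that it remains a probability distribution.
     D = D * np.exp(-alpha * y * pred)
     return D / D.sum()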

The algorithm for the binary classification task

Given: (x_{1},y_{1}),\ldots,(x_{m},y_{m}) where x_{i} \in X,\, y_{i} \in Y = \{-1, +1\}

Initialise D_{1}(i) = \frac{1}{m},\quad i=1,\ldots,m.

For t = 1,\ldots,T:

* Find the classifier h_{t} : X \to \{-1,+1\} that minimizes the error with respect to the distribution D_{t}:
h_{t} = \arg\min_{h_{j} \in \mathcal{H}} \epsilon_{j}, where \epsilon_{j} = \sum_{i=1}^{m} D_{t}(i)\,[y_{i} \ne h_{j}(x_{i})]
* Prerequisite: \epsilon_{t} < 0.5, otherwise stop.
* Choose \alpha_{t} \in \mathbf{R}, typically \alpha_{t} = \frac{1}{2}\,\textrm{ln}\,\frac{1-\epsilon_{t}}{\epsilon_{t}}, where \epsilon_{t} is the weighted error rate of classifier h_{t} (a small numeric example is given after the update rule below).
* Update:
D_{t+1}(i) = \frac{ D_{t}(i)\, e^{- \alpha_{t} y_{i} h_{t}(x_{i})} }{ Z_{t} }
where Z_{t} is a normalization factor (chosen so that D_{t+1} will be a probability distribution, i.e. it sums to one over all examples).
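For example, if the weighted error of h_{t} is \epsilon_{t} = 0.3, then \alpha_{t} = \frac{1}{2}\ln\frac{0.7}{0.3} \approx 0.42, so each misclassified example has its weight multiplied by e^{\alpha_{t}} \approx 1.53 and each correctly classified example by e^{-\alpha_{t}} \approx 0.65 before renormalisation by Z_{t}.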

Output the final classifier:

H(x) = \textrm{sign}\left( \sum_{t=1}^{T} \alpha_{t} h_{t}(x) \right)

The equation to update the distribution D_{t} is constructed so that:

e^{- \alpha_{t} y_{i} h_{t}(x_{i})} \begin{cases} < 1, & y_{i} = h_{t}(x_{i}) \\ > 1, & y_{i} \ne h_{t}(x_{i}) \end{cases}

Thus, after selecting an optimal classifier h_{t} for the distribution D_{t}, the examples x_{i} that the classifier h_{t} identified correctly are weighted less and those that it identified incorrectly are weighted more. Therefore, when the algorithm is testing the classifiers on the distribution D_{t+1}, it will select a classifier that better identifies those examples that the previous classifier missed.
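The whole procedure can be illustrated with a short Python sketch. The version below uses decision stumps (single-feature threshold rules) as the pool of weak classifiers \mathcal{H}; the function names (train_stump, adaboost_train, and so on) are illustrative rather than part of any standard library, and the code assumes a NumPy feature matrix X of shape (m, d) with labels y in {-1, +1}.

 import numpy as np

 def train_stump(X, y, D):
     # Exhaustively search for the single-feature threshold rule with the
     # lowest weighted error under the current distribution D.
     m, d = X.shape
     best, best_err = None, np.inf
     for j in range(d):
         for thresh in np.unique(X[:, j]):
             for sign in (1, -1):
                 pred = sign * np.where(X[:, j] <= thresh, 1, -1)
                 err = D[pred != y].sum()
                 if err < best_err:
                     best_err, best = err, (j, thresh, sign)
     return best, best_err

 def stump_predict(stump, X):
     j, thresh, sign = stump
     return sign * np.where(X[:, j] <= thresh, 1, -1)

 def adaboost_train(X, y, T):
     m = len(y)
     D = np.full(m, 1.0 / m)                 # D_1(i) = 1/m
     ensemble = []
     for t in range(T):
         stump, eps = train_stump(X, y, D)
         if eps >= 0.5:                      # prerequisite: eps_t < 0.5
             break
         alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
         pred = stump_predict(stump, X)
         D = D * np.exp(-alpha * y * pred)   # up-weight mistakes
         D = D / D.sum()                     # divide by normalisation factor Z_t
         ensemble.append((alpha, stump))
     return ensemble

 def adaboost_predict(ensemble, X):
     # H(x) = sign( sum_t alpha_t * h_t(x) )
     scores = sum(a * stump_predict(s, X) for a, s in ensemble)
     return np.sign(scores)

Calling adaboost_train(X, y, T=50) returns a list of (\alpha_{t}, h_{t}) pairs, and adaboost_predict evaluates the weighted-majority vote H(x) on new data.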

Statistical Understanding of Boosting

Boosting can be seen as minimization of a convex loss function over a convex set of functions. [T. Zhang, "Convex Risk Minimization", Annals of Statistics, 2004.] Specifically, the loss being minimized is the exponential loss

:\sum_i e^{-y_i f(x_i)}

and we are seeking a function

:f = \sum_t \alpha_t h_t
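Each round of AdaBoost adds one term \alpha_t h_t to f. Under this interpretation, the weight D_{t}(i) is proportional to the exponential loss that the current partial sum incurs on example x_{i}, and the loss added at round t is, up to a factor that does not depend on \alpha,

:(1-\epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha},

which is minimized at \alpha = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}, the value of \alpha_{t} chosen in the algorithm above.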

See also

* Bootstrap aggregating
* LPBoost
* GentleBoost


External links

* [ AdaBoost] Presentation summarizing AdaBoost (see page 4 for an illustrated example of performance)
* [ A Short Introduction to Boosting] Introduction to AdaBoost by Freund and Schapire from 1999
* [ A decision-theoretic generalization of on-line learning and an application to boosting] "Journal of Computer and System Sciences", no. 55, 1997 (original paper by Yoav Freund and Robert E. Schapire in which AdaBoost is first introduced)
* [ An applet demonstrating AdaBoost]
* [ Ensemble Based Systems in Decision Making] , R. Polikar, IEEE Circuits and Systems Magazine, vol.6, no.3, pp. 21-45, 2006. A tutorial article on ensemble systems including pseudocode, block diagrams and implementation issues for AdaBoost and other ensemble learning algorithms.
* [ A Matlab Implementation of AdaBoost]
* [] Paper by Jerome Friedman, Trevor Hastie, and Robert Tibshirani introducing probabilistic theory for AdaBoost and introducing GentleBoost
