- Bootstrap aggregating
**Bootstrap aggregating** (**bagging**) is a meta-algorithm for improving machine learning classification and regression models in terms of stability and classification accuracy. It also reduces variance and helps to avoid overfitting. Although it is usually applied to decision tree models, it can be used with any type of model. Bagging is a special case of the model averaging approach.

Given a standard training set $D$ of size $n$, bagging generates $m$ new training sets $D_i$, each of size $n' \le n$, by sampling examples from $D$ uniformly and with replacement. Because the sampling is with replacement, some examples are likely to be repeated in each $D_i$. If $n' = n$, then for large $n$ the set $D_i$ is expected to contain about 63.2% (that is, $1 - 1/e$) of the unique examples of $D$, the rest being duplicates. This kind of sample is known as a bootstrap sample. The $m$ models are fitted using the $m$ bootstrap samples and combined by averaging the output (for regression) or voting (for classification).

Since the method averages several predictors, it is not useful for improving linear models.
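The sampling and aggregation steps described above can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the function names `bootstrap_indices`, `unique_fraction` and `bagged_predict`, and the `fit` callable interface, are assumptions introduced here. The last lines estimate the fraction of unique examples in a bootstrap sample by simulation and print it next to $1 - 1/e$.

```python
import numpy as np

def bootstrap_indices(n, n_prime, rng):
    """Draw n_prime indices from range(n) uniformly and with replacement."""
    return rng.integers(0, n, size=n_prime)

def unique_fraction(n, rng, trials=200):
    """Average fraction of distinct original examples in a bootstrap sample (n' = n)."""
    fractions = [np.unique(bootstrap_indices(n, n, rng)).size / n
                 for _ in range(trials)]
    return float(np.mean(fractions))

def bagged_predict(fit, X, y, X_new, m, rng, vote=False):
    """Fit m models on bootstrap samples and combine their predictions.

    `fit(X, y)` is a hypothetical callable that returns a function
    mapping new inputs to predictions; any base learner can be wrapped this way.
    """
    n = len(X)
    preds = []
    for _ in range(m):
        idx = bootstrap_indices(n, n, rng)      # one bootstrap sample D_i
        model = fit(X[idx], y[idx])
        preds.append(model(X_new))
    preds = np.asarray(preds)                   # shape (m, len(X_new))
    if vote:
        # Majority vote per new example; assumes non-negative integer class labels.
        return np.array([np.bincount(col).argmax() for col in preds.T])
    return preds.mean(axis=0)                   # averaging, for regression

rng = np.random.default_rng(0)
frac = unique_fraction(2000, rng)
print(round(frac, 3), round(1 - np.exp(-1), 3))
```

Passing the base learner as a callable keeps the sketch independent of the model type, which reflects the point that bagging can be used with any kind of model.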

**Example: Ozone data**

This example is rather artificial, but illustrates the basic principles of bagging.

Rousseeuw and Leroy (1986) describe a data set concerning ozone levels. The data are available via the classic data sets page. All computations were performed in R.

A scatter plot reveals an apparently non-linear relationship between temperature and ozone. One way to model the relationship is to use a LOESS smoother. Such a smoother requires that a span parameter be chosen. In this example, a span of 0.5 was used.

One hundred bootstrap samples of the data were taken, and the LOESS smoother was fitted to each sample. Predictions from these 100 smoothers were then made across the range of the data. The first 10 predicted smooth fits appear as grey lines in the figure below. The lines are clearly very "wiggly" and they overfit the data, a result of the span being too low.

The red line on the plot below represents the mean of the 100 smoothers. Clearly, the mean is more stable and there is less overfitting. This is the bagged predictor.
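The procedure in this example can be sketched as follows. The original computations were done in R on the ozone data; the Python sketch below instead uses synthetic data (the curve in `truth` and the noise level are assumptions, not the Rousseeuw and Leroy data) and a hand-written tricube-weighted local linear smoother, `local_linear_smooth`, as a simplified stand-in for LOESS. It is not equivalent to R's `loess`.

```python
import numpy as np

def local_linear_smooth(x, y, x_new, span=0.5):
    """Tricube-weighted local linear fit; a simplified stand-in for LOESS."""
    k = max(2, int(np.ceil(span * len(x))))       # neighbours used per local fit
    fitted = np.empty(len(x_new))
    for j, x0 in enumerate(x_new):
        dist = np.abs(x - x0)
        h = np.sort(dist)[k - 1]                  # bandwidth: k-th nearest distance
        h = h if h > 0 else 1e-12
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3   # tricube weights
        # Weighted least squares for intercept and slope centred at x0.
        A = np.column_stack([np.ones_like(x), x - x0])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[j] = beta[0]                       # intercept = fitted value at x0
    return fitted

def truth(t):
    """Assumed non-linear relationship used only to generate synthetic data."""
    return 20 + 0.08 * (t - 60) ** 2

rng = np.random.default_rng(42)
n = 100
x = rng.uniform(60, 95, size=n)                   # synthetic "temperature"
y = truth(x) + rng.normal(0, 15, size=n)          # synthetic noisy response
grid = np.linspace(x.min(), x.max(), 50)

# Fit the smoother to 100 bootstrap samples and predict on a common grid.
fits = np.empty((100, grid.size))
for b in range(100):
    idx = rng.integers(0, n, size=n)              # sample with replacement
    fits[b] = local_linear_smooth(x[idx], y[idx], grid, span=0.5)

bagged = fits.mean(axis=0)                        # the bagged predictor
```

Plotting the first ten rows of `fits` against `grid` together with `bagged` would give a picture analogous to the one described in the text: individual fits that vary from sample to sample and a smoother average.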

**History**

Bagging (**B**ootstrap **agg**regat**ing**) was proposed by Leo Breiman in 1994 to improve classification by combining classifications of randomly generated training sets. See Breiman, 1994, Technical Report No. 421.

**References**

* Leo Breiman (1996). "Bagging predictors". Machine Learning 24 (2): 123–140. doi:10.1007/BF00058655. http://citeseer.ist.psu.edu/breiman96bagging.html
* S. Kotsiantis, P. Pintelas (2004). "Combining Bagging and Boosting". International Journal of Computational Intelligence 1 (4): 324–333. http://www.math.upatras.gr/~esdlab/en/members/kotsiantis/ijci%20paper%20kotsiantis.pdf

**See also**

* Boosting
* Cross validation

*Wikimedia Foundation. 2010.*
