DATA MINING
Desktop Survival Guide by Graham Williams
The Boosting meta-algorithm (http://en.wikipedia.org/wiki/Boosting) is an efficient, simple, and easy-to-program learning strategy. The popular variant called AdaBoost (http://en.wikipedia.org/wiki/AdaBoost), an abbreviation for Adaptive Boosting, has been described as the "best off-the-shelf classifier in the world" (attributed to Leo Breiman by Hastie, Tibshirani, et al., 2001, p. 302). Boosting algorithms build multiple models from a dataset, using some other learning algorithm that need not be a particularly good learner. Boosting associates a weight with each entity in the dataset and increases (boosts) the weights of those entities that are hard to model accurately. A sequence of models is constructed, and after each model is built the weights are modified to give more weight to the entities that it misclassified. Because weights rise after a model gets an entity wrong and fall after a model gets it right, the weights of the hard entities generally oscillate up and down from one model to the next rather than growing steadily. The final model is then an additive model constructed from the sequence of models, with each model's output weighted by some score (in AdaBoost, a score that grows as the model's weighted error shrinks). Little tuning is required and little is assumed about the learner used, except that it should be a weak learner, that is, one that need only do somewhat better than random guessing. We note that boosting can fail to perform well if there is insufficient data or if the weak models are overly complex. Boosting is also susceptible to noise.
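The reweighting loop described above can be sketched in a few lines of Python. This is a minimal illustration of the standard AdaBoost update, not the implementation used elsewhere in this book: the weak learner is assumed to be a one-dimensional decision stump, class labels are assumed to be +1 and -1, and the function names (train_stump, stump_predict, adaboost, predict) and the toy dataset are all hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    # A decision stump: one class on or below the threshold, the other above it.
    return polarity if x <= threshold else -polarity

def train_stump(xs, ys, weights):
    # Weak learner: choose the threshold and polarity with the lowest weighted error.
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n          # every entity starts with equal weight
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # score for this model
        ensemble.append((alpha, threshold, polarity))
        # Boost the weights of misclassified entities, shrink the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]     # renormalise to sum to 1
    return ensemble

def predict(ensemble, x):
    # Additive model: sign of the score-weighted sum of the stump outputs.
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (made up for illustration): no single stump separates the classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model = adaboost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
```

On this toy dataset a single stump misclassifies at least three entities, while the weighted vote of three stumps reproduces every training label. Printing the weights inside the loop also shows the oscillation noted above: an entity's weight rises after a round that misclassifies it and falls again once a later stump gets it right.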