|   | DATA MINING Desktop Survival Guide by Graham Williams |   | |||
[#!freund.schapire:1995:decis_theor!#] introduced AdaBoost, popularising the idea of ensemble learning where a committee of models cooperate to deliver a better outcome. For a demonstration of AdaBoost visit http://www1.cs.columbia.edu/~freund/adaboost/.
The original formulation of the algorithm, as
in [#!freund.schapire:1995:decis_theor!#], adjusts all weights each
iteration. Weights are increased if the corresponding record is
misclassified by  or decreased if it is correctly
classified by
 or decreased if it is correctly
classified by  . The weights are then further normalised
each iteration to ensure they continue to represent a distribution (so
that
. The weights are then further normalised
each iteration to ensure they continue to represent a distribution (so
that 
 ). This can be simplified, as
by hastie.tibshirani.etal:2001:stats_learn, to only increase
the weights of the misclassified entities.  We use this simpler
formulation in the above description of the algorithm.  Consequently,
the calculation of
). This can be simplified, as
by hastie.tibshirani.etal:2001:stats_learn, to only increase
the weights of the misclassified entities.  We use this simpler
formulation in the above description of the algorithm.  Consequently,
the calculation of  (line 9) includes the sum of the
weights as a denominator (which is 1 in the original formulation). 
Only the weights associated with the misclassified entities are
modified in line 11. The original algorithm modified all weights by
 (line 9) includes the sum of the
weights as a denominator (which is 1 in the original formulation). 
Only the weights associated with the misclassified entities are
modified in line 11. The original algorithm modified all weights by
 which equates to
 which equates to  for misclassified entities (since either
for misclassified entities (since either  or
 or 
 is -1, but not both) and to
is -1, but not both) and to  for correctly classified
entities (since both
 for correctly classified
entities (since both  or
 or 
 are either 1 or -1). 
For each iteration the new weights in the original algorithm are
normalised by dividing each weight by a calculated factor
 are either 1 or -1). 
For each iteration the new weights in the original algorithm are
normalised by dividing each weight by a calculated factor  .
. 
Cost functions other than the exponential loss criterion  have
been proposed. These include the logistic log-likelihood criterion
 have
been proposed. These include the logistic log-likelihood criterion
 used in LogitBoost),
 used in LogitBoost),  (Doom II) and
 (Doom II) and
 (Support Vector Machines).
 (Support Vector Machines).
BrownBoost addresses the issue of the sensitivity of AdaBoost to noise.
We note that if each weak classifier is always better than chance, then AdaBoost can be proven to converge to a perfectly accurate model (no training error). Also note that even after achieving an ensemble model with no error, as we add in new models to the ensemble the generalisation error continues to improve (the margin continues to grow). Although it was thought, at first, that AdaBoost does not overfit the data, it has since been shown that it can. However, it generally does not, even for large numbers of iterations.
Extensions to AdaBoost include multi-class classification, application to regression (by transforming the problem into a binary classification task), and localised boosting which is similar to mixtures of experts.
Some early practical work on boosting was undertaken with the Australian Taxation Office using boosted stumps. Multiple, simple models of tax compliance were produced. The models were easily and independently interpretable. Effectively, the models identified a collection of factors that in combination were useful in predicting compliance.
Copyright © 2004-2006 Graham.Williams@togaware.com Support further development through the purchase of the PDF version of the book.