DATA MINING
Desktop Survival Guide by Graham Williams
Generalised boosted models, as proposed by Friedman (2001) and extended by Friedman (2002), have been implemented for R as the gbm package by Greg Ridgeway. This is a much more extensive package for boosting than the boost package.
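We illustrate AdaBoost by setting the distribution option of
the gbm function to "adaboost". This distribution expects a 0/1 response, so we first recode Type to 1 for the first wine class and 0 for the others.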
> library(gbm)
> load("wine.RData")
> ds <- wine
> ds$Type <- as.numeric(ds$Type)
> ds$Type[ds$Type>1] <- 0
> ds$Type
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> ds.gbm <- gbm(Type ~ Alcohol + Malic + Ash + Alcalinity + Magnesium +
                Phenols + Flavanoids + Nonflavanoids + Proanthocyanins +
                Color + Hue + Dilution + Proline,
                data=ds, distribution="adaboost", n.trees=100)
Iter   TrainDeviance   ValidDeviance   StepSize   Improve
     1        0.9408             nan     0.0010    0.0006
     2        0.9402             nan     0.0010    0.0006
     3        0.9394             nan     0.0010    0.0007
     4        0.9387             nan     0.0010    0.0007
     5        0.9381             nan     0.0010    0.0005
     6        0.9374             nan     0.0010    0.0006
     7        0.9368             nan     0.0010    0.0006
     8        0.9361             nan     0.0010    0.0007
     9        0.9354             nan     0.0010    0.0006
    10        0.9349             nan     0.0010    0.0004
   100        0.8750             nan     0.0010    0.0007

> summary(ds.gbm)
               var  rel.inf
1          Proline 91.82978
2       Flavanoids  8.17022
3          Alcohol  0.00000
4            Malic  0.00000
5              Ash  0.00000
6       Alcalinity  0.00000
7        Magnesium  0.00000
8          Phenols  0.00000
9    Nonflavanoids  0.00000
10 Proanthocyanins  0.00000
11           Color  0.00000
12             Hue  0.00000
13        Dilution  0.00000
> pretty.gbm.tree(ds.gbm)
  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight
0       12  8.675000e+02        1         2           3       65.36408     89
1       -1 -8.139656e-04       -1        -1          -1        0.00000     62
2       -1  9.236987e-04       -1        -1          -1        0.00000     27
3       -1 -2.868090e-04       -1        -1          -1        0.00000     89
     Prediction
0 -0.0002868090
1 -0.0008139656
2  0.0009236987
3 -0.0002868090
> gbm.show.rules(ds.gbm)
Number of models: 100

Tree 1: Weight XXXX
  Proline < 867.50 : 0 (XXXX/XXXX)
  Proline >= 867.50 : 1 (XXXX/XXXX)
  Proline missing : 0 (XXXX/XXXX)

[...]

Tree 100: Weight XXXX
  Proline < 755.00 : 0 (XXXX/XXXX)
  Proline >= 755.00 : 1 (XXXX/XXXX)
  Proline missing : 0 (XXXX/XXXX)
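To see how the pretty.gbm.tree output relates to the rules listed for Tree 1, we can evaluate that first stump by hand. The following is only a sketch in base R: the split point and node predictions are copied from the output above, while the helper stump.score and the example Proline values are hypothetical and not part of gbm. It shows the contribution of a single tree only. We assume here that the fitted model combines an initial constant with the contributions of all 100 trees, so the sign of one stump is not the model's final classification.

```r
# Values copied from the pretty.gbm.tree(ds.gbm) output above (first tree).
split.value  <- 867.5          # split on Proline (SplitVar 12, counted from 0)
left.pred    <- -8.139656e-04  # node 1: Proline <  867.5
right.pred   <-  9.236987e-04  # node 2: Proline >= 867.5
missing.pred <- -2.868090e-04  # node 3: Proline missing

# Hypothetical helper (not a gbm function): the contribution of this
# single stump to the boosted score for each observation.
stump.score <- function(proline)
{
  ifelse(is.na(proline), missing.pred,
         ifelse(proline < split.value, left.pred, right.pred))
}

# Made-up Proline values: one below the split, one above, one missing.
proline <- c(500, 1200, NA)
score <- stump.score(proline)

# A positive contribution leans towards class 1, a negative one towards 0.
class.hint <- as.integer(score > 0)

print(data.frame(proline, score, class.hint))
```

The three class hints (0, 1 and 0) agree with the three rules shown for Tree 1: below the split, at or above the split, and missing. The contributions are tiny because each tree is scaled by the step size of 0.001 reported in the iteration log.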