Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Basics

Use printcp to display the complexity parameter (CP) table, which summarises the performance of the model at each tree size.

> printcp(wine.rpart)

Classification tree:
rpart(formula = Type ~ ., data = wine)

Variables actually used in tree construction:
[1] Dilution   Flavanoids Hue        Proline

Root node error: 107/178 = 0.60112

n= 178

        CP nsplit rel error  xerror     xstd
1 0.495327      0   1.00000 1.00000 0.061056
2 0.317757      1   0.50467 0.47664 0.056376
3 0.056075      2   0.18692 0.28037 0.046676
4 0.028037      3   0.13084 0.23364 0.043323
5 0.010000      4   0.10280 0.21495 0.041825
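The rel error and xerror columns are scaled so that the root node has an error of 1. Multiplying by the root node error converts them to absolute error rates. The sketch below copies the figures from the table above into a data frame (the names cp.table, train.error and cv.error are illustrative only); with a fitted model the same columns should be available in wine.rpart$cptable.

```r
# Figures copied by hand from the printcp output above.
n <- 178
root.error <- 107 / n

cp.table <- data.frame(
  nsplit    = 0:4,
  rel.error = c(1.00000, 0.50467, 0.18692, 0.13084, 0.10280),
  xerror    = c(1.00000, 0.47664, 0.28037, 0.23364, 0.21495)
)

# Rescale from "relative to the root node" to absolute error rates.
cp.table$train.error <- cp.table$rel.error * root.error
cp.table$cv.error    <- cp.table$xerror * root.error

print(round(cp.table, 4))

# Row with the lowest cross-validated error.
best <- cp.table[which.min(cp.table$xerror), ]
print(best)
```

For the final tree (4 splits) this gives a training error of about 6% (11 of 178 observations) and a cross-validated error of about 13%. The cross-validated figure is the more realistic guide to performance on new data.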

The predict function applies the model to data. The data must contain all of the variables on which the model was built, not just those that ended up in the tree. If any are missing, an error is generated. This is a common problem when applying the model to a new dataset that contains the variables actually used in the tree but not every variable named in the model formula.

> cols <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,cols])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found

Fix this by rebuilding the model using only the variables that appear in the tree:

> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline,
+                     data=wine)
> predict(wine.rpart, wine[,cols])
             1          2          3
1   0.96610169 0.03389831 0.00000000
2   0.96610169 0.03389831 0.00000000
[...]
70  0.03076923 0.93846154 0.03076923
71  0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
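The missing-variable problem can be checked for before calling predict. The following is a minimal sketch, not the author's method. It assumes the built-in iris data as a stand-in for wine, and the names fit, newdata, model.vars, used.vars and fit2 are hypothetical.

```r
library(rpart)

# Stand-in data: iris ships with R, so no wine data frame is needed here.
fit <- rpart(Species ~ ., data = iris)

# Variables named in the model formula once "." has been expanded.
model.vars <- attr(fit$terms, "term.labels")

# A hypothetical new dataset that lacks one of those variables.
newdata <- iris[, c("Petal.Length", "Petal.Width", "Sepal.Length")]

# Report what predict() would fail to find.
missing.vars <- setdiff(model.vars, names(newdata))
print(missing.vars)

# Variables that actually appear as splits in the tree.
used.vars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")
print(used.vars)

# Rebuild the model on just those variables, then predict on the new data.
f <- reformulate(used.vars, response = "Species")
fit2 <- rpart(f, data = iris)
pred <- predict(fit2, newdata)
print(head(pred))
```

Each row of the result holds the estimated class probabilities for one observation, as in the wine output above.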

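Display a confusion matrix. Rows are the predicted classes and columns are the actual classes. Because the predictions here are for the training data, the error it shows is optimistic.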

> table(predict(wine.rpart, wine, type="class"), wine$Type)

     1  2  3
  1 57  2  0
  2  2 66  4
  3  0  3 44
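Summary figures can be read off the confusion matrix directly. The sketch below re-enters the counts shown above by hand (the names conf, accuracy, recall and precision are illustrative); with the fitted model available, the result of the table call could be used in place of conf.

```r
# Counts copied from the confusion matrix above:
# rows are predicted classes, columns are actual classes.
conf <- matrix(c(57,  2,  0,
                  2, 66,  4,
                  0,  3, 44),
               nrow = 3, byrow = TRUE,
               dimnames = list(predicted = c("1", "2", "3"),
                               actual    = c("1", "2", "3")))

# Overall proportion of observations on the diagonal (correctly classified).
accuracy <- sum(diag(conf)) / sum(conf)

# Per class: share of actual members found, and share of predictions that are right.
recall    <- diag(conf) / colSums(conf)
precision <- diag(conf) / rowSums(conf)

print(round(accuracy, 4))
print(round(rbind(recall, precision), 4))
```

The diagonal holds 167 of the 178 observations, so 11 are misclassified. This is consistent with the final rel error in the printcp output (0.10280 x 107 is approximately 11).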



Copyright © 2004-2006 Graham.Williams@togaware.com