DATA MINING
Desktop Survival Guide
by Graham Williams

Basics

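Use printcp to view the performance of the model. It prints the complexity table, showing the training error (rel error) and cross-validated error (xerror) at each number of splits, both relative to the root node error.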

> printcp(wine.rpart)

Classification tree:
rpart(formula = Type ~ ., data = wine)

Variables actually used in tree construction:
[1] Dilution   Flavanoids Hue        Proline

Root node error: 107/178 = 0.60112

n= 178

        CP nsplit rel error  xerror     xstd
1 0.495327      0   1.00000 1.00000 0.061056
2 0.317757      1   0.50467 0.47664 0.056376
3 0.056075      2   0.18692 0.28037 0.046676
4 0.028037      3   0.13084 0.23364 0.043323
5 0.010000      4   0.10280 0.21495 0.041825
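The same table is stored in the fitted object, so it can be used to choose a tree size. The following is a minimal sketch, not the book's own method: it uses the built-in iris data as a stand-in for wine, and the names iris.rpart, cp.table, best.cp and iris.pruned are illustrative only.

```r
library(rpart)

set.seed(42)  # cross-validation is random; the seed value is arbitrary

# Stand-in model: iris replaces the wine data used in the text.
iris.rpart <- rpart(Species ~ ., data = iris)

# The table that printcp displays is stored in the fitted object.
cp.table <- iris.rpart$cptable

# Pick the complexity parameter with the lowest cross-validated error.
best.row <- which.min(cp.table[, "xerror"])
best.cp  <- cp.table[best.row, "CP"]

# Prune the tree back to that complexity.
iris.pruned <- prune(iris.rpart, cp = best.cp)

# Convert the final rel error into an absolute training error rate.
# The root node error is the proportion of cases outside the majority class.
root.error  <- 1 - max(table(iris$Species)) / nrow(iris)
train.error <- cp.table[nrow(cp.table), "rel error"] * root.error
```

Applied to the wine table above, the same conversion gives 0.10280 x 0.60112, or about 0.0618, which is roughly 11 of the 178 cases misclassified on the training data.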

The predict function applies the model to data. The data must contain every variable named in the formula on which the model was built, not just the variables the tree ended up using. If it does not, an error is generated. This is a common problem when applying the model to a new dataset that lacks some of the original variables but does contain the ones you are interested in. Here the model was built with Type ~ ., so predict looks for all of the original columns:

> cols <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,cols])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found

To fix this, rebuild the model with a formula that names only the variables available in the new dataset:

> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline, data=wine)
> predict(wine.rpart, wine[,cols])
             1          2          3
1   0.96610169 0.03389831 0.00000000
2   0.96610169 0.03389831 0.00000000
[...]
70  0.03076923 0.93846154 0.03076923
71  0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
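Before calling predict on a new dataset, it can help to list which variables the model expects and which of them are absent. The sketch below is one way to do that, assuming the built-in iris data as a stand-in for wine; the names iris.rpart, new.data, needed and missing.vars are illustrative only.

```r
library(rpart)

# Stand-in model: iris replaces the wine data used in the text.
iris.rpart <- rpart(Species ~ ., data = iris)

# A new dataset holding only some of the predictors.
new.data <- iris[, c("Petal.Length", "Petal.Width")]

# Predictors named in the model formula (the response is dropped).
needed <- all.vars(delete.response(terms(iris.rpart)))

# Variables that predict would look for but not find.
missing.vars <- setdiff(needed, names(new.data))
print(missing.vars)
```

An empty result means every variable in the formula is present. Anything listed must either be added to the new dataset or removed from the formula by rebuilding the model.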

Display a confusion matrix, with the predicted classes as rows and the actual classes as columns.

> table(predict(wine.rpart, wine, type="class"), wine$Type)

     1  2  3
  1 57  2  0
  2  2 66  4
  3  0  3 44
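Summary figures follow directly from the confusion matrix. The sketch below re-enters the counts shown above by hand rather than refitting the model; the names confusion, accuracy, error.rate and recall are illustrative only.

```r
# Counts copied from the confusion matrix above:
# rows are predicted classes, columns are actual classes.
confusion <- matrix(c(57,  2,  0,
                       2, 66,  4,
                       0,  3, 44),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(predicted = c("1", "2", "3"),
                                    actual    = c("1", "2", "3")))

# Correct predictions lie on the diagonal.
accuracy   <- sum(diag(confusion)) / sum(confusion)  # 167 / 178
error.rate <- 1 - accuracy                           # 11 / 178

# Proportion of each actual class that was predicted correctly.
recall <- diag(confusion) / colSums(confusion)
```

These figures are measured on the same data used to build the tree, so they are likely to be optimistic. The xerror column reported by printcp is a cross-validated alternative.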
