DATA MINING
Desktop Survival Guide
by Graham Williams

Basics

Use printcp to view the performance of the model.

    > printcp(wine.rpart)

    Classification tree:
    rpart(formula = Type ~ ., data = wine)

    Variables actually used in tree construction:
    [1] Dilution   Flavanoids Hue        Proline

    Root node error: 107/178 = 0.60112

    n= 178

            CP nsplit rel error  xerror     xstd
    1 0.495327      0   1.00000 1.00000 0.061056
    2 0.317757      1   0.50467 0.47664 0.056376
    3 0.056075      2   0.18692 0.28037 0.046676
    4 0.028037      3   0.13084 0.23364 0.043323
    5 0.010000      4   0.10280 0.21495 0.041825

We can note that:

$$\text{rel error} = \text{rel error}_{\text{before}} - (\text{nsplit} - \text{nsplit}_{\text{before}}) \times \text{CP}_{\text{before}}$$
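The relation can be checked against the printcp table, reading "before" as the previous row of the table. The following is a minimal sketch in which the values are copied by hand from the output above; the vector names (`cp`, `nsplit`, `rel_error`, `predicted`, `max_diff`) are illustrative only.

```r
# Values copied from the printcp output above.
cp        <- c(0.495327, 0.317757, 0.056075, 0.028037, 0.010000)
nsplit    <- c(0, 1, 2, 3, 4)
rel_error <- c(1.00000, 0.50467, 0.18692, 0.13084, 0.10280)

# Apply the relation to rows 2..5, each computed from the row before it.
n <- length(cp)
predicted <- rel_error[-n] - diff(nsplit) * cp[-n]

# Compare with the reported values; any difference should be rounding only.
max_diff <- max(abs(predicted - rel_error[-1]))
print(round(predicted, 5))
print(max_diff)
```

The differences are all in the sixth decimal place, which is consistent with the table being rounded for display.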

The predict function applies the model to data. The data must contain every variable named in the formula on which the model was built, not just the variables that appear in the tree. If it does not, an error is generated. This is a common problem when applying the model to a new dataset that lacks some of the original variables but does contain the variables the tree actually uses.

    > cols <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
    > predict(wine.rpart, wine[,cols])
    Error in eval(expr, envir, enclos) : Object "Alcohol" not found

Fix this by rebuilding the model using only the variables that appear in the tree:

    > wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline, data=wine)
    > predict(wine.rpart, wine[,cols])
                 1          2          3
    1   0.96610169 0.03389831 0.00000000
    2   0.96610169 0.03389831 0.00000000
    [...]
    70  0.03076923 0.93846154 0.03076923
    71  0.00000000 0.25000000 0.75000000
    [...]
    177 0.00000000 0.25000000 0.75000000
    178 0.00000000 0.02564103 0.97435897
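The same repair can be done programmatically rather than by typing the variable names. The sketch below is one possible approach, not the method used in the text. It assumes the built-in iris data as a stand-in, because the wine data is not assumed to be available, and the names `fit`, `needed`, `used`, `newdata`, `missing_vars` and `fit_small` are hypothetical.

```r
library(rpart)

# Stand-in data: iris ships with R; the wine data is not assumed available.
fit <- rpart(Species ~ ., data = iris)

# Variables the formula refers to (the "." is already expanded in the terms).
needed <- all.vars(delete.response(terms(fit)))

# Variables that actually appear in splits ("<leaf>" marks terminal nodes).
used <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")

# A new dataset holding only the split variables.
newdata <- iris[, used, drop = FALSE]

# Any variable listed here is expected to make predict(fit, newdata) fail.
missing_vars <- setdiff(needed, names(newdata))
print(missing_vars)

# Refit on the split variables only, then predict on the reduced data.
fit_small <- rpart(reformulate(used, response = "Species"), data = iris)
pred <- predict(fit_small, newdata, type = "class")
print(length(pred))
```

Checking `missing_vars` before calling predict gives a clearer message than the "not found" error, and building the formula with `reformulate` avoids retyping the variable list.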

Display a confusion matrix.

    > table(predict(wine.rpart, wine, type="class"), wine$Type)

         1  2  3
      1 57  2  0
      2  2 66  4
      3  0  3 44
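The confusion matrix can be tied back to the printcp output. In this minimal sketch the counts are copied by hand from the table above (rows are predicted classes, columns are actual classes), and the names `cm`, `errors`, `error_rate` and `rel_to_root` are illustrative only.

```r
# Counts copied from the confusion matrix above.
cm <- matrix(c(57,  2,  0,
                2, 66,  4,
                0,  3, 44),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("1", "2", "3"),
                             actual    = c("1", "2", "3")))

# Off-diagonal cells are the misclassified observations.
errors     <- sum(cm) - sum(diag(cm))
error_rate <- errors / sum(cm)

# Express the errors relative to the root node error count of 107.
rel_to_root <- errors / 107

print(errors)
print(error_rate)
print(rel_to_root)
```

The number of errors divided by the 107 root node errors agrees with the final rel error of 0.10280 reported by printcp, which suggests the two outputs describe the same training-set performance.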

