DATA MINING
Desktop Survival Guide by Graham Williams
Use printcp to view the performance of the model, summarised in the complexity parameter (CP) table for each number of splits.
> printcp(wine.rpart)
Classification tree:
rpart(formula = Type ~ ., data = wine)
Variables actually used in tree construction:
[1] Dilution Flavanoids Hue Proline
Root node error: 107/178 = 0.60112
n= 178
CP nsplit rel error xerror xstd
1 0.495327 0 1.00000 1.00000 0.061056
2 0.317757 1 0.50467 0.47664 0.056376
3 0.056075 2 0.18692 0.28037 0.046676
4 0.028037 3 0.13084 0.23364 0.043323
5 0.010000 4 0.10280 0.21495 0.041825
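The rel error and xerror columns are relative to the root node error, which is the error rate from always predicting the most common class (here 107 of the 178 wines). For the wine tree, the final rel error therefore corresponds to 0.10280 × 0.60112 ≈ 0.062, or about 11 of the 178 observations misclassified on the training data. The following is a minimal sketch of that conversion. It assumes the built-in iris data as a stand-in for wine, which is not built into R, and the object names (fit, cp, root.err, abs.err) are illustrative. Because xerror comes from cross-validation with random folds, set.seed() is used to make it repeatable.

```r
library(rpart)

# Assumption: iris stands in for the wine data used in the text.
set.seed(42)  # xerror comes from cross-validation with random folds
fit <- rpart(Species ~ ., data = iris)

# printcp() prints this matrix; columns are CP, nsplit, rel error, xerror, xstd.
cp <- fit$cptable

# Root node error: proportion misclassified by always predicting the
# most frequent class (should match printcp under default priors and losses).
n <- nrow(iris)
root.err <- 1 - max(table(iris$Species)) / n

# Scale the relative errors back to absolute error rates.
abs.err <- data.frame(
  nsplit    = cp[, "nsplit"],
  train.err = cp[, "rel error"] * root.err,
  cv.err    = cp[, "xerror"] * root.err,
  train.n   = round(cp[, "rel error"] * root.err * n)  # misclassified count
)
print(abs.err)
```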
We can note that the predict function applies the model to data, and that the data must contain every variable named in the model formula, even those not actually used in any split; otherwise an error is generated. With the formula Type ~ ., that means every variable in wine. This is a common problem when applying the model to a new dataset that contains the variables you are interested in (such as those actually used in the tree) but not all of the original variables.
> cols <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,cols])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found
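One way to anticipate this error before calling predict is to compare the variables the model requires against those present in the new data. The following is a minimal sketch, again assuming iris as a stand-in for wine; the object names (fit, needed, used, absent) are hypothetical. It reads the expanded formula from the model's terms and the splitting variables from the frame component of the fitted tree, where leaf nodes are marked "<leaf>".

```r
library(rpart)

# Assumption: iris stands in for wine; object names are illustrative.
fit <- rpart(Species ~ ., data = iris)

# All predictors in the expanded formula: predict() needs every one of them.
needed <- all.vars(delete.response(terms(fit)))

# Predictors actually chosen for a split; leaves are marked "<leaf>".
used <- setdiff(as.character(unique(fit$frame$var)), "<leaf>")

# A new dataset that keeps only the response and the variables used.
newdata <- iris[, c("Species", used)]

# Variables the formula requires but the new data lacks.
absent <- setdiff(needed, names(newdata))
if (length(absent) > 0) {
  message("predict() will fail; missing: ", paste(absent, collapse = ", "))
}
```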
One fix is to rebuild the model using only the variables of interest:
> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline,
data=wine)
> predict(wine.rpart, wine[,cols])
1 2 3
1 0.96610169 0.03389831 0.00000000
2 0.96610169 0.03389831 0.00000000
[...]
70 0.03076923 0.93846154 0.03076923
71 0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
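Rather than typing the reduced formula by hand, it can be built from a character vector of variable names with reformulate(). A sketch under the same assumptions (iris in place of wine, with an illustrative choice of predictors):

```r
library(rpart)

# Assumption: iris stands in for wine; these predictors are illustrative.
vars <- c("Petal.Length", "Petal.Width")

# Build the formula Species ~ Petal.Length + Petal.Width from the vector.
f <- reformulate(vars, response = "Species")
fit <- rpart(f, data = iris)

# The new data need contain only the predictors named in the formula.
prob <- predict(fit, iris[, vars], type = "prob")
head(prob)
```

For the wine example, the predictor vector would be setdiff(cols, "Type") with response = "Type".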
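Display a confusion matrix, with the predicted classes as rows and the actual classes as columns. Because it is computed on the same data used to build the tree, it gives an optimistic view of how the model will perform on new data.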
> table(predict(wine.rpart, wine, type="class"), wine$Type)
1 2 3
1 57 2 0
2 2 66 4
3 0 3 44
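The overall accuracy follows directly from the matrix, since the diagonal holds the correct predictions: for the wine tree, (57 + 66 + 44) / 178 = 167/178 ≈ 0.938, an error rate of 11/178 ≈ 0.062, consistent with the final rel error in the printcp output (0.10280 × 0.60112 ≈ 0.062). A minimal sketch of the same calculation, assuming iris as a stand-in for wine and using illustrative names (cm, accuracy, recall):

```r
library(rpart)

# Assumption: iris stands in for wine.
fit <- rpart(Species ~ ., data = iris)

# Rows: predicted class; columns: actual class.
cm <- table(predicted = predict(fit, iris, type = "class"),
            actual    = iris$Species)

accuracy   <- sum(diag(cm)) / sum(cm)  # correct predictions on the diagonal
error.rate <- 1 - accuracy

# Per-class recall: correct predictions over each actual-class total.
recall <- diag(cm) / colSums(cm)
print(cm)
print(round(c(accuracy = accuracy, error = error.rate), 3))
```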