

Decision Tree Induction

[Figure: rattle-audit-model-rpart]

[Figure: dm-dtree-example]

Decision tree induction is one of the classic machine learning techniques and is widely deployed in data mining. Using a simple algorithm and a simple knowledge structure, the approach has proven very effective. The resulting tree structure represents a classification (or regression) model. Starting at the root node, a simple test is applied, usually a test on the value of a variable, such as Age < 35. The branches emanating from the node correspond to the alternative answers; for the test Age < 35 the alternatives are Yes and No. We follow the branches, applying the test at each node, until a leaf node is reached (a node from which no branches emanate), and the decision or classification associated with that leaf is then taken. A probability may also be associated with each node, indicating the degree of certainty in the decision.

Decision tree algorithms handle mixed types of variables and missing values, and are robust to outliers, to monotonic transformations of the inputs, and to irrelevant inputs. Their predictive power, however, tends to be lower than that of other techniques.
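As a minimal sketch of building and using such a tree in R, the following assumes the rpart package (the decision tree implementation used by Rattle) is installed, and illustrates the idea on R's built-in iris data rather than the audit data used elsewhere in this guide:

  ## Build a classification tree: predict Species from the four
  ## measurement variables, using recursive partitioning.
  library(rpart)
  model <- rpart(Species ~ ., data = iris, method = "class")

  ## Print the tree: each node shows the test applied (e.g.,
  ## Petal.Length < 2.45), the number of observations, and the
  ## class distribution at that node.
  print(model)

  ## Classify new observations by following the tests from the
  ## root node down to a leaf (rows 1, 51, and 101 are one
  ## example of each species).
  predict(model, iris[c(1, 51, 101), ], type = "class")

Printing the model displays the sequence of tests and leaves described above, and predict() traverses those tests for each new observation to return the class (or class probabilities) of the leaf it reaches.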


