DATA MINING: Desktop Survival Guide by Graham Williams
One of the classic machine learning techniques, widely deployed in
data mining, is decision tree induction. Using a simple algorithm and
a simple knowledge structure, the approach has proven to be very
effective. These simple tree structures represent a classification
(and regression) model. Starting at the root node, a simple question
is asked (usually a test on a variable's value, such as Age < 35). The
branches emanating from the node correspond to the alternative
answers; for a test of Age < 35 the alternatives would be Yes and
No. Once a leaf node is reached (one from which no branches emanate)
we take the decision or classification associated with that node. Some
form of probability may also be associated with the nodes, indicating
a degree of certainty for the decision. Decision tree algorithms
handle mixed types of variables and missing values, and are robust to
outliers, to monotonic transformations of the input, and to irrelevant
inputs. Their predictive power, however, tends to be lower than that
of other techniques.
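The traversal described above, from root node to leaf, can be sketched in code. The following is a minimal illustration, not any particular library's API: the node structure, the Age < 35 test, and the class labels and probabilities are all invented for the example.

```python
# A hand-rolled decision tree node. Internal nodes hold a test on a
# variable's value; leaves hold a decision (class label) and a
# probability expressing the degree of certainty.
class Node:
    def __init__(self, test=None, yes=None, no=None,
                 decision=None, probability=None):
        self.test = test              # function: observation -> bool
        self.yes = yes                # branch taken when the test holds
        self.no = no                  # branch taken otherwise
        self.decision = decision      # class label (leaf only)
        self.probability = probability

    def is_leaf(self):
        return self.test is None


def classify(node, obs):
    """Start at the root; answer each test to choose a branch;
    stop at a leaf and return its decision and certainty."""
    while not node.is_leaf():
        node = node.yes if node.test(obs) else node.no
    return node.decision, node.probability


# Example tree: the root asks whether Age < 35 (labels and
# probabilities are illustrative only).
tree = Node(
    test=lambda obs: obs["Age"] < 35,
    yes=Node(decision="Risky", probability=0.7),
    no=Node(decision="Safe", probability=0.9),
)

print(classify(tree, {"Age": 28}))  # ('Risky', 0.7)
print(classify(tree, {"Age": 50}))  # ('Safe', 0.9)
```

Induction algorithms (such as those behind rpart in R) build such a tree from data by repeatedly choosing the test that best splits the training observations; the structure they produce is then read exactly as in this sketch.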