DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
One of the classic machine learning techniques, widely deployed in data mining, is decision tree induction. Using a simple algorithm and a simple knowledge structure, the approach has proven to be very effective. These simple tree structures represent a classification (and regression) model. Starting at the root node, a simple question is asked (usually a test on a variable value, like Age 35). The branches emanating from the node correspond to alternative answers. For example, with a test of Age 35 the alternatives would be Yes and No. Once a leaf node is reached (one from which no branches emanate) we take the decision or classification associated with that node. Some form of probability may also be associated with the nodes, indicating a degree of certainty for the decision. Decision tree algorithms handle mixed types of variables, handle missing values, are robust to outliers and monotonic transformations of the input, and robust to irrelevant inputs. Predictive power tends to be poorer than other techniques.