DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
Suppose we have a database of patients for whom we have, in the past, identified suitable types of contact lenses. Four categorical variables might describe each patient: Age (having possible values young, pre-presbyopic, and presbyopic), type of Prescription (myope and hypermetrope), whether the patient is Astigmatic (boolean), and TearRate (reduced or normal). The patients are already classified into one of three classes indicating the type of contact lens to fit: hard, soft, none.
Sample data is presented in Table .
We can make use of this historic data to talk about the probabilities
of requiring soft, hard, or no lenses, according to
a patient's Age, Prescription, Astigmatic
condition and TearRate. Having a probabilistic model we could
then make predictions using the model about the suitability of
particular lens types for new clients. Thus, if we know historically
that the probability of a patient requiring a soft contact lens, given
that they were young and had a normal rate of tear
production, was 0.9 (i.e., 90%) then we would supply soft lens to
most such patients in the future. Symbolically we write this
probability as:
Copyright © 2004-2006 Graham.Williams@togaware.com Support further development through the purchase of the PDF version of the book.