DATA MINING
Desktop Survival Guide
by Graham Williams

Imbalanced Decisions

Accuracy is a poor measure of model performance when the data has a very imbalanced distribution of outcomes. For example, if positive cases account for just 1% of all cases, as might happen in an insurance dataset recording cases of fraud, then a model that predicts no fraud in every case will be 99% accurate. It is also useless, because it never identifies a single case of fraud.
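
The arithmetic can be checked with a minimal sketch in Python. The dataset below is hypothetical (1,000 claims, 10 of them fraudulent, matching the 1% rate in the example), and the helper names accuracy and recall are illustrative rather than taken from any particular library. Recall, the proportion of actual positive cases that the model finds, is included only to show what accuracy hides.

```python
# Hypothetical dataset: 1,000 insurance claims, 1% of them fraudulent.
# 1 marks a fraudulent claim, 0 a legitimate one.
n_cases = 1000
n_fraud = 10
actual = [1] * n_fraud + [0] * (n_cases - n_fraud)

# A "model" that predicts no fraud for every case.
predicted = [0] * n_cases


def accuracy(actual, predicted):
    """Proportion of cases where the prediction matches the outcome."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def recall(actual, predicted):
    """Proportion of actual positive cases that were predicted positive."""
    true_positives = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    positives = sum(a == 1 for a in actual)
    return true_positives / positives if positives else 0.0


acc = accuracy(actual, predicted)
rec = recall(actual, predicted)

print(f"accuracy: {acc:.2%}")  # accuracy: 99.00%
print(f"recall:   {rec:.2%}")  # recall:   0.00%
```

The model scores 99% on accuracy while finding none of the 10 fraudulent claims, which is why a measure that looks at the positive class separately is more informative for data like this.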