DATA MINING
Desktop Survival Guide
by Graham Williams

Measuring Data Distributions

We now start to explore how the data in each variable is distributed. This might be as simple as looking at the spread of the numeric values, or counting the number of entities having a specific value for a variable. Another aspect involves measuring the central tendency of the data, typically through the mean (http://en.wikipedia.org/wiki/mean) and the median (http://en.wikipedia.org/wiki/median). Yet another is a measure of the spread, or variance (http://en.wikipedia.org/wiki/variance), of the data around this central tendency. We again begin with textual presentations of the distributions, and then move on to graphical presentations.
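
As a rough illustration of these measures, the following is a minimal sketch in Python using only the standard library. It is not taken from the book: the helper names summarise_numeric and count_values, and the small data sets, are hypothetical and chosen only for demonstration.

```python
from collections import Counter
import statistics


def summarise_numeric(values):
    """Return simple distribution measures for a list of numbers.

    Hypothetical helper for illustration only.
    """
    return {
        "min": min(values),                       # lower end of the spread
        "max": max(values),                       # upper end of the spread
        "mean": statistics.mean(values),          # central tendency
        "median": statistics.median(values),      # central tendency
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
    }


def count_values(values):
    """Count how many entities have each specific value of a variable."""
    return Counter(values)


# Made-up example data, not from any real dataset.
numeric = [2, 4, 4, 4, 5, 5, 7, 9]
categoric = ["Private", "Private", "State", "Private", "Local"]

summary = summarise_numeric(numeric)
counts = count_values(categoric)

for name, value in summary.items():
    print(name, value)
print(dict(counts))
```

Note that statistics.variance computes the sample variance; statistics.pvariance would be the choice if the data were treated as the whole population.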

Subsections

Textual Summaries
Boxplot
- Multiple Boxplots
- Boxplot by Class
Box and Whisker Plot
Box and Whisker Plot: With Means
Clustered Box Plot

Brought to you by Togaware.