DATA MINING
Desktop Survival Guide by Graham Williams
The wine dataset contains the results of a chemical analysis of wines grown in a specific area of Italy. Three types of wine are represented in the 178 samples, with the results of 13 chemical analyses recorded for each sample. Note that we convert Type into a categorical variable (a factor), but this information is preserved only in the binary R dataset. The CSV dataset stores Type as plain numbers, so reading it back does not recover the factor.
UCI <- "ftp://ftp.ics.uci.edu/pub"
REPOS <- "machine-learning-databases"
wine.url <- sprintf("%s/%s/wine/wine.data", UCI, REPOS)
# The raw file has no header row, so name the columns explicitly.
wine <- read.csv(wine.url, header=FALSE)
colnames(wine) <- c('Type', 'Alcohol', 'Malic', 'Ash',
                    'Alcalinity', 'Magnesium', 'Phenols',
                    'Flavanoids', 'Nonflavanoids',
                    'Proanthocyanins', 'Color', 'Hue',
                    'Dilution', 'Proline')
# Treat the wine type as a categorical variable.
wine$Type <- as.factor(wine$Type)
# Save both a CSV copy and a binary R copy of the dataset.
write.table(wine, "wine.csv", sep=",", row.names=FALSE)
save(wine, file="wine.RData", compress=TRUE)
At a later time you can simply read in the CSV dataset or else load in the R dataset:
> wine <- read.csv("wine.csv")
OR
> load("wine.RData")
> dim(wine)
[1] 178  14
> str(wine)
'data.frame':   178 obs. of  14 variables:
 $ Type           : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
 $ Alcohol        : num  14.2 13.2 13.2 14.4 13.2 ...
 $ Malic          : num  1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
 $ Ash            : num  2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
 $ Alcalinity     : num  15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
 $ Magnesium      : int  127 100 101 113 118 112 96 121 97 98 ...
 $ Phenols        : num  2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
 $ Flavanoids     : num  3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
 $ Nonflavanoids  : num  0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
 $ Proanthocyanins: num  2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
 $ Color          : num  5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
 $ Hue            : num  1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
 $ Dilution       : num  3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
 $ Proline        : int  1065 1050 1185 1480 735 1450 1290 1295 1045 1045 ...
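The difference between the two formats can be seen with a small round trip. The following is a minimal sketch, not part of the original recipe: it uses a tiny stand-in data frame (the name demo and its values are illustrative assumptions, not the real wine data) and a temporary file. It also shows one possible way to restore the factor when reading the CSV, using the colClasses argument of read.csv.

```r
# Small stand-in for the wine data (illustrative values only).
demo <- data.frame(Type = as.factor(c(1, 2, 3, 1)),
                   Alcohol = c(14.2, 12.4, 13.1, 13.2))

# Write it out the same way the wine dataset is written.
csv.file <- tempfile(fileext = ".csv")
write.table(demo, csv.file, sep = ",", row.names = FALSE)

# A plain read lets R guess the column types from the text.
plain <- read.csv(csv.file)

# Naming the column in colClasses asks for a factor explicitly.
typed <- read.csv(csv.file, colClasses = c(Type = "factor"))

class(plain$Type)
class(typed$Type)
levels(typed$Type)
```

With the plain read, Type should come back as an integer column, while the colClasses version should come back as a factor with the three levels. Loading the binary R dataset avoids the issue altogether, since save() stores the column types along with the data.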
Note that R provides a useful interactive file chooser through the
function file.choose. This will prompt for a file name,
and provides tab completion.
> ds <- read.csv(file.choose())
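Because file.choose needs someone to answer the prompt, a script that may also run unattended could fall back to a fixed path. The following is a rough sketch under that assumption. The helper name choose_csv and its arguments default and ask are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: prompt for a file only when asked to,
# otherwise read from a default path.
choose_csv <- function(default, ask = interactive()) {
  path <- if (ask) file.choose() else default
  if (!file.exists(path)) stop("File not found: ", path)
  read.csv(path)
}

# Usage without a prompt, reading a small temporary file.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), tmp, row.names = FALSE)
ds <- choose_csv(tmp, ask = FALSE)
```

In an interactive session, choose_csv("wine.csv") would prompt for a file as before. Under Rscript, where interactive() is FALSE, it would read wine.csv directly.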