DATA MINING
Desktop Survival Guide by Graham Williams
Any R object can be saved using the save function and then restored at a later time using the load function. By convention the data is saved into a file with the .RData extension. To illustrate this we make use of a standard dataset called iris.
We create a random sample of 20 entities from the dataset. This is
done by randomly sampling 20 distinct numbers between 1 and the number
of rows (nrow) in the iris dataset, using the
sample function. The vector of row numbers generated by
sample is then used to index the iris
dataset, selecting the sampled rows, by supplying this vector
as the first argument in the square brackets. The second argument in
the square brackets is left blank, indicating that all columns are
required in our new dataset. We then save the dataset to a file using
the save function, with compress=TRUE so that the data is compressed
for storage:
> rows <- sample(1:nrow(iris), 20)
> myiris <- iris[rows,]
> dim(myiris)
[1] 20 5
> save(myiris, file="myiris.RData", compress=TRUE)
At a later time the dataset can be restored with the load function:
> load("myiris.RData")
> dim(myiris)
[1] 20 5
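The save and load round trip described above can be checked end to end. The following is a minimal sketch rather than part of the original example: the fixed seed, the temporary file location and the names path, original and restored are assumptions introduced for illustration.

```r
# Minimal sketch: save a random sample of iris, remove it, and restore it.
set.seed(42)                                  # assumption: fixed seed so the sample is repeatable
rows <- sample(seq_len(nrow(iris)), 20)       # 20 distinct row numbers
myiris <- iris[rows, ]                        # sampled rows, all columns

path <- file.path(tempdir(), "myiris.RData")  # assumption: write to a temporary directory
save(myiris, file = path, compress = TRUE)

original <- myiris                            # keep a copy for comparison
rm(myiris)                                    # simulate a fresh session

restored <- load(path)                        # load() returns the names of the objects it restored
print(restored)
print(dim(myiris))
print(identical(myiris, original))            # compare the restored data with the copy
```

Note that load restores objects under the names they had when saved, so it will silently overwrite an existing object of the same name in the workspace.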
You can save any objects in an R binary file. For example, suppose
you have built a model and want to save it for later exploration:
> library(rpart)
> iris.rp <- rpart(Species ~ ., data=iris)
> save(iris.rp, file="irisrp.RData", compress=TRUE)
At a later stage, perhaps on a fresh start of R, you can load the
model:
> load("irisrp.RData")
> iris.rp
n= 150

node), split, n, loss, yval, (yprob)
      * denotes terminal node

1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
  2) Petal.Length< 2.45 50   0 setosa (1.00000000 0.00000000 0.00000000) *
  3) Petal.Length>=2.45 100  50 versicolor (0.00000000 0.50000000 0.50000000)
    6) Petal.Width< 1.75 54   5 versicolor (0.00000000 0.90740741 0.09259259) *
    7) Petal.Width>=1.75 46   1 virginica (0.00000000 0.02173913 0.97826087) *
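A restored model can be used just like the original. As a rough sketch, assuming the rpart package is installed, the following compares predictions made before saving with those made after reloading. The temporary file location and the names model_file, before and after are assumptions made for this illustration.

```r
library(rpart)  # assumption: rpart is installed (it ships with standard R distributions)

# Build the model and record its predictions on the training data.
iris.rp <- rpart(Species ~ ., data = iris)
before <- predict(iris.rp, iris, type = "class")

# Save the model to a temporary location (hypothetical path for this sketch).
model_file <- file.path(tempdir(), "irisrp.RData")
save(iris.rp, file = model_file, compress = TRUE)

# Remove the model to simulate a fresh start of R, then restore it.
rm(iris.rp)
load(model_file)

# The restored model should predict exactly as before.
after <- predict(iris.rp, iris, type = "class")
print(identical(before, after))
print(table(predicted = after, actual = iris$Species))
```

In a genuinely fresh session the rpart package must be loaded again before printing or predicting with the restored model, since the RData file stores the object but not the package.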
To identify what is saved into an RData file you can
attach the file and then get a listing of its contents:
attach("irisrp.RData")
ls(2)
...
detach(2)
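The attach approach can be tried on a small file, alongside an alternative that avoids touching the search path by loading into a scratch environment. This is a sketch only: the file example.RData, the objects x and label, and the names scratch, attached_names and loaded_names are hypothetical and chosen for illustration.

```r
# Create a small RData file holding two objects (hypothetical names).
rdata_file <- file.path(tempdir(), "example.RData")
x <- 1:10
label <- "demo"
save(x, label, file = rdata_file)
rm(x, label)

# Approach from the text: attach the file at position 2 of the search path,
# list its contents, then detach it again.
attach(rdata_file)
attached_names <- ls(2)
detach(2)

# Alternative: load into a scratch environment and list that instead.
scratch <- new.env()
loaded_names <- load(rdata_file, envir = scratch)

print(attached_names)
print(sort(loaded_names))
```

Either way the workspace is left unchanged, so existing objects with the same names are not overwritten while the file is being inspected.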