DATA MINING
Desktop Survival Guide by Graham Williams
When timing commands, keep in mind that garbage collection affects the results. R adjusts its garbage collection triggers according to your usage. When you first start working with large objects the trigger levels grow, so collections happen less often and things generally speed up. As a result, an early timing of a command may be slower than a later one.
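The following sketch (not from the original text) shows the trigger growth by reading the Vcells gc trigger from the matrix that gc() returns, before and after allocating a large vector. The helper vcells_trigger, the object name big and its 10 million elements are hypothetical choices. The trigger never sits below the memory in use, so the second reading should be at least the 10 million vector cells the object occupies.

```r
# Hypothetical helper: the Vcells "gc trigger" (in cells) from gc()'s matrix.
vcells_trigger <- function() gc()["Vcells", "gc trigger"]

trigger_before <- vcells_trigger()

# Hypothetical large object: 10 million doubles, roughly 76 Mb of vector cells.
big <- numeric(1e7)
trigger_after <- vcells_trigger()

# Free the object again so it does not distort later measurements.
rm(big)
invisible(gc())

print(c(before = trigger_before, after = trigger_after))
```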
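Call gcinfo(TRUE) to have R print a report each time a garbage
collection occurs, so you can see these adjustments in action. The
function returns the previous setting: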
> gcinfo(TRUE)
[1] FALSE    # the setting was previously FALSE
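Leaving reporting switched on soon clutters the console. Here is a minimal sketch, assuming you only want reports while one expression runs. The hypothetical helper with_gc_report relies on gcinfo() returning the previous setting, and uses on.exit() to restore that setting even if the expression fails.

```r
# Hypothetical helper: evaluate expr with garbage collection reporting on,
# then restore whatever gcinfo() setting was in force before.
with_gc_report <- function(expr) {
  old <- gcinfo(TRUE)               # gcinfo() returns the previous setting
  on.exit(gcinfo(old), add = TRUE)  # restore it on exit, including on error
  expr                              # lazy evaluation: expr is computed here
}

# Allocate 50 vectors of 100,000 doubles each; any collections this
# provokes are reported on the console while it runs.
n <- with_gc_report(length(lapply(1:50, function(i) numeric(1e5))))
print(n)
```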
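For the system.time function, set gcFirst=TRUE to have a garbage collection performed immediately before the timing starts, so that the cost of cleaning up after earlier work is less likely to be charged to the expression being timed.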
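The following is a minimal sketch that compares the two settings of gcFirst. The workload function work, which sorts a million uniform random numbers, is a hypothetical stand-in. Timings vary from run to run and machine to machine, so no particular numbers should be expected.

```r
# Hypothetical workload that allocates a fair amount of memory.
work <- function() sort(runif(1e6))

# Collect garbage first, so leftovers from earlier work are cleaned up
# before the clock starts.
t_gc <- system.time(work(), gcFirst = TRUE)

# Skip the preliminary collection; a collection triggered during work()
# (perhaps by garbage left from earlier) is included in the timing.
t_nogc <- system.time(work(), gcFirst = FALSE)

print(c(gcFirst = t_gc[["elapsed"]], no_gc = t_nogc[["elapsed"]]))
```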
The gc function forces a garbage collection and reports useful
information about memory usage (the primary reason for calling it).
Ncells is the number of so-called cons cells in use. Each cell is 28
bytes on a 32-bit system or 56 bytes on a 64-bit system, and these
cells store fixed-sized objects. The function converts this count to
Mb for us. Vcells is the number of vector cells in use. Each cell is 8
bytes, and these cells store variable-sized objects. The gc trigger
columns show the level at which the next garbage collection will be
triggered. The final two columns show the maximum amount of memory
used since the last call to gc(reset=TRUE).
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 177949  4.8     407500 10.9   350000  9.4
Vcells  72431  0.6     786432  6.0   332253  2.6
> survey <- read.csv("survey.csv")
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 212685  5.7     741108 19.8   514436 13.8
Vcells 366127  2.8    1398372 10.7  1387692 10.6
> rm(survey)
> gc()
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 179940  4.9     741108 19.8   514436 13.8
Vcells  72773  0.6    1118697  8.6  1387692 10.6
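These figures match the cell sizes of a 32-bit system. 177949 Ncells at 28 bytes each is about 4.75 Mb, which is shown as 4.8. 72431 Vcells at 8 bytes each is about 0.55 Mb, which is shown as 0.6. The sketch below repeats that arithmetic for the current session. It assumes only that column 1 of gc()'s result is "used" and column 2 is the same quantity in Mb. It then replays the create-and-remove pattern of the transcript, using a generated data frame as a hypothetical stand-in for survey.csv. The helper vcells_used is also hypothetical.

```r
# Approximate bytes per cell from gc(): column 1 is "used" (cells) and
# column 2 is the same quantity in Mb, rounded up to 0.1 Mb.
g <- gc()
bytes_per_cell <- g[, 2] * 2^20 / g[, "used"]
print(round(bytes_per_cell, 1))  # roughly 28 or 56 for Ncells, 8 for Vcells

# Replay the transcript: measure Vcells in use, create an object, remove it.
# Hypothetical helper; the data frame stands in for survey.csv.
vcells_used <- function() gc()["Vcells", "used"]

base <- vcells_used()
df <- data.frame(a = runif(1e6), b = runif(1e6))   # two columns of 1e6 doubles
with_df <- vcells_used()
rm(df)
after_rm <- vcells_used()

print(c(base = base, with_df = with_df, after_rm = after_rm))
```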
Copyright © 2004-2006 Graham.Williams@togaware.com Support further development through the purchase of the PDF version of the book.