DATA MINING
Desktop Survival Guide by Graham Williams |
|||||
SQLite (from www.sqlite.org) is an open source database package that is well supported in . It has the advantage that it requires no setup or administration (other than installing the package) and is an embedded system so that there is less of a connection overhead. You are able to manage very large datasets in SQLite without needing to load all the data into memory, unlike R itself, so that you are able to manipulate the data using SQL then load in just the data you need.
For small dataset SQLite is a good choice, but for very large datasets, MySQL still performs very well.
There is also a project under way as part of the http://code.google.com/soc/gnu/appinfo.html?csaid=7211BC2F22A5A01EGoogle Summer of Code project, that aims to create a package that will store data frames and matrices into sqlite databases, initially called sqlite data frames (sdf). These sdf's will behave like ordinary data frames so that existing R functions will work. This will enable R users to work with very large datasets much more readily, with no user effort.
For now, SQLite allows the easy import and export of data to text files.
library(RSQLite) con <- dbConnect(SQLite(), "foo3.db") dbGetQuery(con, "pragma cache_size") cache_size 1 2000 dbGetQuery(con, "pragma cache_size=2500") NULL dbGetQuery(con, "pragma cache_size") cache_size 1 2500 |
As an example, first create an empty SQLite database (outside of R)
and import a CSV (comma separated value) file, tell sqlite to use
commas, not '|':
$ sqlite3 -separator , audit.db sqlite> create table audit(ID INTEGER, Age INTEGER, Employment TEXT, Education TEXT, Marital TEXT, Occupation TEXT, Income REAL, Sex TEXT, Deductions REAL, Hours INTEGER, Accounts TEXT, Adjustment REAL, Adjusted INTEGER); sqlite> .tables audit sqlite> .import audit.csv audit sqlite> select count(*) from audit; 2001 sqlite> delete from audit where ID='"ID"'; sqlite> select count(*) from audit; 2000 sqlite> .quit |
Now in R:
library((DBI) library(RSQLite) library(Hmisc) survey <- read.csv("survey.csv") driver<-dbDriver("SQLite") connect<-dbConnect(driver, dbname = "survey.db") dbWriteTable(connect, "surveytbl", survey, overwrite = T, row.names = F) dbListTables(connect) query01 <- dbSendQuery(connect, statement = "select * from surveytbl"); data01 <- fetch(query01, n = 10) contents(data01) sqliteCloseResult(query01) sqliteCloseConnection(connect) sqliteCloseDriver(driver) |
Copyright © 2004-2006 Graham.Williams@togaware.com Support further development through the purchase of the PDF version of the book.