|
DATA MINING
Desktop Survival Guide by Graham Williams |
|
|||
SQLite (from www.sqlite.org) is an open source database package that is well supported in . It has the advantage that it requires no setup or administration (other than installing the package) and is an embedded system so that there is less of a connection overhead. You are able to manage very large datasets in SQLite without needing to load all the data into memory, unlike R itself, so that you are able to manipulate the data using SQL then load in just the data you need.
For small dataset SQLite is a good choice, but for very large datasets, MySQL still performs very well.
There is also a project under way as part of the http://code.google.com/soc/gnu/appinfo.html?csaid=7211BC2F22A5A01EGoogle Summer of Code project, that aims to create a package that will store data frames and matrices into sqlite databases, initially called sqlite data frames (sdf). These sdf's will behave like ordinary data frames so that existing R functions will work. This will enable R users to work with very large datasets much more readily, with no user effort.
For now, SQLite allows the easy import and export of data to text files.
library(RSQLite)
con <- dbConnect(SQLite(), "foo3.db")
dbGetQuery(con, "pragma cache_size")
cache_size
1 2000
dbGetQuery(con, "pragma cache_size=2500")
NULL
dbGetQuery(con, "pragma cache_size")
cache_size
1 2500
|
As an example, first create an empty SQLite database (outside of R)
and import a CSV (comma separated value) file, tell sqlite to use
commas, not '|':
$ sqlite3 -separator , audit.db
sqlite> create table audit(ID INTEGER, Age INTEGER, Employment TEXT,
Education TEXT, Marital TEXT, Occupation TEXT,
Income REAL, Sex TEXT, Deductions REAL,
Hours INTEGER, Accounts TEXT,
Adjustment REAL, Adjusted INTEGER);
sqlite> .tables
audit
sqlite> .import audit.csv audit
sqlite> select count(*) from audit;
2001
sqlite> delete from audit where ID='"ID"';
sqlite> select count(*) from audit;
2000
sqlite> .quit
|
Now in R:
library((DBI)
library(RSQLite)
library(Hmisc)
survey <- read.csv("survey.csv")
driver<-dbDriver("SQLite")
connect<-dbConnect(driver, dbname = "survey.db")
dbWriteTable(connect, "surveytbl", survey, overwrite = T, row.names = F)
dbListTables(connect)
query01 <- dbSendQuery(connect, statement = "select * from surveytbl");
data01 <- fetch(query01, n = 10)
contents(data01)
sqliteCloseResult(query01)
sqliteCloseConnection(connect)
sqliteCloseDriver(driver)
|
Copyright © 2004-2006 Graham.Williams@togaware.com Support further development through the purchase of the PDF version of the book.