Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Matrices

A dataset is usually more complex than a simple vector. Often several vectors make up the dataset, and we refer to the result as a matrix. A matrix is a data structure containing items all of the same data type. We construct a matrix with the matrix and c functions. Rows and columns of a matrix can have names, and the functions colnames and rownames list the current names. The same functions can also appear on the left of an assignment to give the columns or rows a new vector of names.

> ds <- matrix(c(52, 37, 59, 42, 36, 46, 38, 21, 18, 32, 10, 67), 
               nrow=3, byrow=TRUE)
> colnames(ds) <- c("Low", "Medium", "High","VHigh")
> rownames(ds) <- c("Married","Prev.Married","Single")
> ds
             Low Medium High VHigh
Married       52     37   59    42
Prev.Married  36     46   38    21
Single        18     32   10    67
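
Once rows and columns are named, the names can also be used to pick out parts of the matrix. The following is a minimal sketch rather than part of the original example: it rebuilds ds as a script (without the > prompts), and the variable names single_vhigh, married and low are introduced here purely for illustration.

```r
# Rebuild the matrix from the text; byrow = TRUE fills one row at a time.
ds <- matrix(c(52, 37, 59, 42, 36, 46, 38, 21, 18, 32, 10, 67),
             nrow = 3, byrow = TRUE)
colnames(ds) <- c("Low", "Medium", "High", "VHigh")
rownames(ds) <- c("Married", "Prev.Married", "Single")

dim(ds)                                # number of rows, then number of columns
colnames(ds)                           # list the current column names

single_vhigh <- ds["Single", "VHigh"]  # one cell, selected by row and column name
married <- ds["Married", ]             # a whole row, returned as a named vector
low <- ds[, "Low"]                     # a whole column, returned as a named vector
```

Selecting by name rather than by position keeps the code readable and unaffected if the rows or columns are later reordered.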

Of course, manually creating datasets in this way is only practical for small data collections. A slightly easier approach is to modify and add to the dataset through a simple spreadsheet-like interface, using either the edit function or the fix function. The fix function also assigns the result of the edit back to the variable being edited. Note that edit normally returns the edited dataset, which is therefore printed to the screen if it is not assigned to a variable. If you only want to browse the dataset, and so do not assign the result of edit, wrap the call in the invisible function to avoid printing the dataset.

> ds <- edit(ds)
> fix(ds)
> invisible(edit(ds))

The cbind function combines each of its arguments, column-wise (the c in the name is for column), into a single data structure:

> age <- c(35, 23, 56, 18)
> gender <- c("m", "m", "f", "f")
> people <- cbind(age, gender)
> people
     age  gender
[1,] "35" "m"
[2,] "23" "m"
[3,] "56" "f"
[4,] "18" "f"

Because the resulting matrix must have elements all of the same data type, we see that the variable age has been converted to the character data type (since the values of gender cannot sensibly be converted to numeric).
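
The conversion can be checked directly, and the ages can be recovered as numbers when needed. This is a small sketch under the assumption that we still want to do arithmetic on the ages; the names age_chr and age_num are hypothetical and used only here.

```r
age <- c(35, 23, 56, 18)
gender <- c("m", "m", "f", "f")
people <- cbind(age, gender)

typeof(people)                   # storage type shared by every element of the matrix

age_chr <- people[, "age"]       # the age column, now held as character strings
age_num <- as.numeric(age_chr)   # convert back to numeric before doing arithmetic
mean(age_num)                    # arithmetic works again on the numeric copy
```

Calling mean on age_chr itself would not give a useful answer, because the values are strings rather than numbers.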

The rbind function similarly combines its arguments, but in a row-wise manner. For the same arguments, the result is the same as transposing the cbind result with the t function:

> t(people)
       [,1] [,2] [,3] [,4]
age    "35" "23" "56" "18"
gender "m"  "m"  "f"  "f"
> people <- rbind(age, gender)
> people
       [,1] [,2] [,3] [,4]
age    "35" "23" "56" "18"
gender "m"  "m"  "f"  "f"
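
The equivalence between the two results can also be tested in code instead of by comparing the printed output. A brief sketch, with by_col, by_row and same as illustrative names that do not appear in the original:

```r
age <- c(35, 23, 56, 18)
gender <- c("m", "m", "f", "f")

by_col <- cbind(age, gender)   # 4 rows, 2 columns
by_row <- rbind(age, gender)   # 2 rows, 4 columns

dim(by_col)
dim(by_row)

# Transposing the column-wise result should reproduce the row-wise result,
# including the row names "age" and "gender".
same <- identical(t(by_col), by_row)
same
```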

Copyright © 2004-2006 Graham.Williams@togaware.com