DATA MINING
Desktop Survival Guide by Graham Williams
> letters                      # a b c [...] z
> letters[10]                  # "j"
> letters[10:15]               # "j" "k" "l" "m" "n" "o"
> letters[c(1, 2, 4, 8, 16)]   # "a" "b" "d" "h" "p"
> letters[-(10:26)]            # "a" "b" "c" "d" "e" "f" "g" "h" "i"
An operator (or function) can be applied to a vector to return a
vector. This is particularly useful for comparison and logical
operators, which return a vector of boolean values that can then be
used to select specific elements of a vector:
> letters > "j"                            # FALSE FALSE FALSE [...] TRUE
> letters[letters > "j"]                   # "k" "l" "m" "n" [...] "y" "z"
> letters[letters > "w" | letters < "e"]   # "a" "b" "c" "d" "x" "y" "z"
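The same pattern works for numeric vectors. The following is a minimal sketch, assuming a small named vector of temperatures (`temps` and its values are hypothetical, invented for illustration). It shows the logical vector produced by a comparison, its use as an index, and `which()` for recovering the matching positions.

```r
# Hypothetical example data (not from the original text).
temps <- c(mon = 18.5, tue = 21.0, wed = 25.3, thu = 19.8, fri = 27.1)

# A comparison yields a logical vector of the same length as its input.
warm <- temps > 20

# Indexing with a logical vector keeps the elements where it is TRUE.
warm_days <- temps[warm]

# which() converts the logical vector into integer positions.
warm_pos <- which(warm)

# Conditions combine element-wise with & (and), | (or) and ! (not).
mild_days <- temps[temps > 19 & temps < 26]
```

Naming the intermediate logical vector (`warm` here) is optional, but it can make longer selections easier to read and reuse.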
Here's a useful trick to exclude the terms that divide by zero, which
would otherwise make the sum infinite (Inf):
> x <- c(0.28, 0.55, 0, 2)
> y <- c(0.53, 1.34, 1.2, 2.07)
> sum(((x-y)^2/x))
[1] Inf
> sum(((x-y)^2/x)[x!=0])   # Exclude the zeros
[1] 1.360392
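To make the steps of this trick explicit, here is a sketch that reuses the same `x` and `y` and names the intermediate pieces (`terms`, `keep`, and the `total*` variables are illustrative names, not from the original). It also shows two variations that should give the same sum for this data.

```r
x <- c(0.28, 0.55, 0, 2)
y <- c(0.53, 1.34, 1.2, 2.07)

terms <- (x - y)^2 / x      # the third term is Inf because x[3] is 0
keep  <- x != 0             # TRUE where the divisor is non-zero

# Drop the offending terms after the division, as in the text.
total <- sum(terms[keep])

# Alternatively, subset the inputs first so no division by zero occurs.
total2 <- sum((x[keep] - y[keep])^2 / x[keep])

# is.finite() is a more general filter: it drops Inf, -Inf, NaN and NA
# terms whatever their cause.
total3 <- sum(terms[is.finite(terms)])
```

Which variant is preferable depends on the data: the `x != 0` test states the intent directly, while `is.finite()` may be safer when the inputs can also contain missing values.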
We could also generate random subsets of our data.
> subdataset <- dataset[sample(seq(1, nrow(dataset)), 1000),]
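A random subset is often wanted in a reproducible form, together with the rows that were left out. The sketch below assumes a made-up data frame standing in for `dataset` (its columns and size are hypothetical) and an arbitrary seed value; `sample(n, size)` draws from `1:n` without replacement by default.

```r
set.seed(42)  # assumption: fix the seed so the same rows are drawn each run

# Hypothetical data frame standing in for `dataset` (not from the original).
dataset <- data.frame(id = 1:5000, value = rnorm(5000))

# Draw 1000 distinct row numbers from 1..nrow(dataset).
rows <- sample(nrow(dataset), 1000)
subdataset <- dataset[rows, ]

# Negative indices select the complementary rows (those not sampled).
remainder <- dataset[-rows, ]
```

Keeping the sampled row numbers in `rows` makes it straightforward to split the data into two non-overlapping parts, for example for training and testing.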
We can select elements meeting set inclusion conditions. Here we
first select the rows of a data frame whose colour is green or blue,
and then the rows whose colour occurs more than 11 times in the data:
> ds[ds$colour %in% c("green", "blue"),]
> ds[ds$colour %in% names(which(table(ds$colour) > 11)),]
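The second selection packs several steps into one line. The following sketch unpacks it on a small, hypothetical data frame standing in for `ds` (the rows and the `size` column are invented for illustration), using a frequency threshold of 1 rather than 11 so that the tiny example has something to select.

```r
# Hypothetical data frame standing in for `ds` (not from the original).
ds <- data.frame(
  colour = c("red", "green", "blue", "green", "red", "green", "yellow"),
  size   = c(3, 5, 2, 8, 1, 4, 7),
  stringsAsFactors = FALSE
)

# %in% returns one logical value per row: is this colour in the given set?
in_set <- ds$colour %in% c("green", "blue")
wanted <- ds[in_set, ]

# table() counts each colour; which() keeps the counts passing the test;
# names() recovers the colour labels themselves.
counts   <- table(ds$colour)
frequent <- names(which(counts > 1))   # threshold is 1 here, 11 in the text
common   <- ds[ds$colour %in% frequent, ]
```

In this example only green and red occur more than once, so `common` keeps the rows with those two colours.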