DATA MINING
Desktop Survival Guide by Graham Williams
Missing data can affect modelling, particularly if the data is not missing at random but is missing for some underlying systematic reason (e.g., censoring). If data is missing at random, the missing values are more likely to have little effect on the modelling.
An excellent reference on dealing with missing data is Schafer (1997).
Missing values are specially recorded in R as NA. Various functions can be used to check for a missing value (is.na), to remove any entities with missing values (na.omit), and to identify those entities that are complete (complete.cases). The apply function also comes in handy here.
> ds <- ds[!apply(is.na(ds), 1, all), ]  # Remove rows in which every value is NA.
> ds <- na.omit(ds)                      # Remove rows that have any NA.
> ds <- ds[complete.cases(ds), ]         # Same effect: keep only complete rows.
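As a minimal sketch of how these functions behave, the following applies them to a small data frame. The data frame and its column names (age, income, region) are invented here purely for illustration, as are the result names; only is.na, apply, colSums, na.omit and complete.cases are standard R.

```r
# Hypothetical toy data, invented for illustration only.
ds <- data.frame(age    = c(23, NA, 41, NA, 35),
                 income = c(50, 62, NA, NA, 48),
                 region = c("N", "S", "E", NA, "W"))

# is.na() on a data frame gives a logical matrix, so the missing
# values can be counted per column and per row.
na.per.col <- colSums(is.na(ds))
na.per.row <- apply(is.na(ds), 1, sum)

# Drop only the rows in which every value is missing (row 4 here).
ds.some <- ds[!apply(is.na(ds), 1, all), ]

# Drop rows with any missing value; both forms keep the same rows.
ds.omit <- na.omit(ds)
ds.comp <- ds[complete.cases(ds), ]
```

Counting the missing values before removing anything shows how much data each approach would discard. The two final results contain the same rows, although na.omit additionally records which rows were dropped in an na.action attribute.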