DATA MINING
Desktop Survival Guide by Graham Williams
Imputation is the process of filling in the gaps (missing values) in data. Data often contain missing values, and this can cause problems for some modelling algorithms. For example, the random forest option silently removes any entity that has any missing value! For datasets with a very large number of variables and a reasonable number of missing values, this may well leave a small, unrepresentative dataset, or even no dataset at all!
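The effect of dropping every entity that has a missing value can be seen on a toy example. The following is a minimal sketch in plain Python, not Rattle's or the random forest implementation: the dataset, the variable names, and the `complete_cases` helper are all hypothetical and exist only to illustrate how quickly rows disappear.

```python
# Hypothetical toy dataset: each dict is one entity, None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "hours": 40},
    {"age": None, "income": 48000, "hours": 38},
    {"age": 51, "income": None, "hours": 45},
    {"age": 29, "income": 31000, "hours": None},
    {"age": 42, "income": 67000, "hours": 50},
]


def complete_cases(rows):
    """Keep only the entities that have no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


kept = complete_cases(rows)
print(f"{len(kept)} of {len(rows)} entities survive")
```

Here only three values out of fifteen are missing, yet three of the five entities are lost because each missing value falls in a different row. With more variables, the chance that a row contains at least one gap grows accordingly.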
When Rattle performs an imputation it stores the results in a new variable of the dataset that has the same name as the variable being imputed, but prefixed with IMP_. Such variables, whether imputed by Rattle or already present in the dataset loaded into Rattle (e.g., a dataset from SAS), are treated as input variables, and the original variable is marked to be ignored.
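The IMP_ naming convention and the resulting variable roles can be sketched as follows. This is an illustration under stated assumptions rather than Rattle's own code: mean imputation is assumed as the fill method, and the helpers `impute_mean` and `variable_roles`, the role labels "input" and "ignore", and the sample data are all hypothetical.

```python
from statistics import mean


def impute_mean(rows, var):
    """Return copies of rows with an added IMP_<var> column.

    Missing (None) values of var are replaced by the mean of the observed
    values (an assumed fill method). The original column is left untouched.
    Raises statistics.StatisticsError if var has no observed values at all.
    """
    observed = [r[var] for r in rows if r[var] is not None]
    fill = mean(observed)
    out = []
    for r in rows:
        new = dict(r)
        new["IMP_" + var] = fill if r[var] is None else r[var]
        out.append(new)
    return out


def variable_roles(columns):
    """Treat IMP_ columns as inputs and mark the originals they replace as ignored."""
    roles = {c: "input" for c in columns}
    for c in columns:
        original = c[len("IMP_"):]
        if c.startswith("IMP_") and original in roles:
            roles[original] = "ignore"
    return roles


# Hypothetical sample data with one missing age.
data = [
    {"age": 30, "income": 50000},
    {"age": None, "income": 40000},
    {"age": 50, "income": 60000},
]

imputed = impute_mean(data, "age")
roles = variable_roles(list(imputed[0].keys()))
print(roles)
```

Because the roles are derived from the column names alone, an IMP_ column that was already present when the data was loaded is handled the same way as one created by `impute_mean`, which mirrors the behaviour described above.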