What Exactly Does Complete in Mice Do

Mice in R - how can I understand what this command does?

The mice function imputes (fills in) missing values. In your case you are passing method = "rf", which selects the random forest imputation algorithm. Since I can't reproduce your dataset, I'm using airquality, a dataset built into R that contains NA values which can be imputed. With mice you are fitting a kind of prediction model for each incomplete variable. What it returns is a mids object, the class mice uses to store multiply imputed datasets (see the documentation). If you want to use those imputations, call complete to create the filled-in data frame.

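library(mice)
df <- airquality
# Impute missing values with the random forest method; returns a mids object
mice_mod <- mice(df, method = "rf")
# Extract a completed data frame (the first imputed data set by default)
mice_output <- complete(mice_mod)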

When you compare df and mice_output, you'll see that the NA values in Ozone and Solar.R have been replaced.
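
One way to check this is to count the missing values before and after, and to look at where the mids object keeps its imputations. The sketch below is only illustrative: it uses mice's default imputation methods rather than "rf" so that no extra package is needed, and the `seed` and `printFlag` arguments and the object names (`imp`, `filled`, `ozone_imps`) are additions made for this example.

```r
library(mice)

# Impute airquality with mice's default methods; m = 5 imputations per gap.
# seed and printFlag are added only for reproducibility and quiet output.
imp <- mice(airquality, m = 5, seed = 1, printFlag = FALSE)

# Missing values per column before imputation
na_before <- colSums(is.na(airquality))

# complete() returns the first of the m completed data sets by default
filled <- complete(imp)
na_after <- colSums(is.na(filled))

# The mids object keeps all m candidate values for each missing cell:
# one row per missing observation, one column per imputation
ozone_imps <- imp$imp$Ozone
dim(ozone_imps)
```

Comparing `na_before` with `na_after` shows which columns were filled, while `ozone_imps` shows that each missing Ozone value has several candidate replacements, of which complete() picks one set.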

In your example, your lecturer passes only the columns whose names are not in the given list of names, so the data frame is filtered before it is imputed.


If you want more information about the algorithm, the documentation says it is described in:

Doove, L.L., van Buuren, S., Dusseldorp, E. (2014), Recursive
partitioning for missing data imputation in the presence of
interaction effects. Computational Statistics & Data Analysis, 72,
92-104.

R, mitools::MIcombine, what is the reason for no p-values?

After looking through the documentation, there doesn't seem to be a particular reason why the mitools library doesn't provide p-values, although the package's focus is on working with multiply imputed data and combining estimates rather than on reporting test statistics.

However, you don't need either of these packages to see your results, along with the per-model p-values. I started writing this as a comment but decided to include the code. In case you weren't aware, you can use base R's summary. I realize that mice, like mitools, reports results combined across the imputations, but I thought this was important enough to mention as well.

If the result of your call is stored in model, then this will work.

library(tidyverse)
# Summarise each fitted model; [[ ]] extracts the model itself rather than a one-element list
map(seq_along(model), ~ summary(model[[.x]]))
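
The same idea works without any extra packages. The sketch below assumes that `model` is a plain list of fitted `lm` objects, one per imputed data set. Here that list is simulated with three overlapping subsets of `mtcars`, which is purely a stand-in and not real imputation output; the formula and the object names are also hypothetical.

```r
# Stand-in for m completed data sets (hypothetical; real ones would come from imputation)
datasets <- list(mtcars[1:20, ], mtcars[7:26, ], mtcars[13:32, ])

# One fitted model per data set, stored as a plain list
model <- lapply(datasets, function(d) lm(mpg ~ wt, data = d))

# Coefficient tables, including the "Pr(>|t|)" column, for each model
coef_tables <- lapply(model, function(fit) coef(summary(fit)))

# p-value of the wt slope in each fit
p_wt <- sapply(coef_tables, function(tab) tab["wt", "Pr(>|t|)"])
p_wt
```

These are per-model p-values; they are not pooled across the imputations.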

R Imputation with Ordered Categorical

Your "categorical" variable appears to be stored as character. You may want to coerce character columns to factors before imputing; otherwise mice() will ignore the variable. Do:

DATA[sapply(DATA, is.character)] <- lapply(DATA[sapply(DATA, is.character)], as.factor)

str(DATA)
# 'data.frame': 1000 obs. of 4 variables:
# $ x1: Factor w/ 5 levels "a","b","c","d",..: 2 2 NA NA 3 3 4 NA NA 4 ...
# $ x2: num 0.932 0.87 0.886 0.925 0.984 ...
# $ x3: num 0.292 0.734 0.764 0.943 0.806 ...
# $ x4: Factor w/ 4 levels "t","u","v","w": 1 3 1 3 4 3 1 4 3 2 ...
head(DATA)
#     x1        x2        x3 x4
# 1    b 0.9315629 0.2916144  t
# 2    b 0.8695138 0.7338165  v
# 3 <NA> 0.8863894 0.7642693  t
# 4 <NA> 0.9248280 0.9427943  v
# 5    c 0.9844646 0.8062173  w
# 6    c 0.6200558 0.7354498  v

Also, it might be a better idea to use a proportional odds model ("polr") for ordered categorical data instead of predictive mean matching ("pmm").

library(mice)
IMP <- mice(DATA, m=5, maxit=50, meth=c("polr", "", "", ""), seed=500)
DATAIMPUTE <- complete(IMP)
head(DATAIMPUTE)
#   x1        x2        x3 x4
# 1  b 0.9315629 0.2916144  t
# 2  b 0.8695138 0.7338165  v
# 3  a 0.8863894 0.7642693  t
# 4  a 0.9248280 0.9427943  v
# 5  c 0.9844646 0.8062173  w
# 6  c 0.6200558 0.7354498  v
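
Note that as.factor produces an unordered factor. If the categories really are ordered, a sketch like the following declares the order explicitly before imputation. The level order a < b < c < d < e is an assumption made for illustration, and `x1_chr` is a hypothetical stand-in for the x1 column.

```r
# Hypothetical character column with missing values, standing in for x1
x1_chr <- c("b", "b", NA, NA, "c", "c", "d", NA, NA, "d", "a", "e")

# Declare the categories as ordered; the level order is an assumption
x1_ord <- factor(x1_chr, levels = c("a", "b", "c", "d", "e"), ordered = TRUE)

is.ordered(x1_ord)   # TRUE
levels(x1_ord)       # the declared order
sum(is.na(x1_ord))   # missing values are preserved as NA
```

In mice, "polr" is the default method for ordered factors with more than two levels, so declaring the order up front keeps the data and the chosen method consistent.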

Important note: You seem to misunderstand the method if you think the complete() function gives you a valid multiply imputed dataset. By default it uses action=1 and returns only the first completed data set, which is not multiple imputation at all. You should probably consult a statistician and read the documentation more thoroughly. There's a nice answer around that briefly summarizes the most important point.
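
As a rough sketch of what using all imputations looks like, the example below fits the same model on every completed data set and pools the results. It uses the built-in airquality data and a linear model chosen only for illustration; the formula, the `seed`, and the object names (`imp`, `long`, `fits`, `pooled`) are assumptions and not part of the question.

```r
library(mice)

# Five imputations of the built-in airquality data (default methods)
imp <- mice(airquality, m = 5, seed = 500, printFlag = FALSE)

# All five completed data sets stacked, with .imp marking the imputation number
long <- complete(imp, action = "long")
table(long$.imp)

# Fit the analysis model on each completed data set ...
fits <- with(imp, lm(Ozone ~ Wind + Temp))

# ... and combine the five sets of estimates with Rubin's rules
pooled <- summary(pool(fits))
pooled
```

The pooled summary reflects the uncertainty from all five imputations, which a single call to complete() with its default action cannot do.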


Data:

set.seed(74)
DATA <- data.frame(x1 = sample(c(letters[1:5], NA), 1000, replace = TRUE),
                   x2 = runif(1000),
                   x3 = runif(1000),
                   x4 = sample(letters[20:23], 1000, replace = TRUE))

