Mice in R - how can I understand what this command does?
The mice() function imputes (fills in) missing values. In your case you are passing method = "rf", which means the random forest imputation algorithm is used. Since I can't reproduce your dataset, I'm using airquality, a dataset built into R that contains NA values. Those can be imputed. With mice() you are, in effect, fitting a prediction model for the missing entries. The result is a mids object, which mice uses to store multiply imputed datasets (see the documentation). If you want to use those imputations, you can call complete() to create the filled data frame.
library(mice)
df<-airquality
mice_mod <- mice(df, method='rf')
mice_output <- complete(mice_mod)
When you compare df and mice_output, you'll see that the NA values in Ozone and Solar.R have been replaced.
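One way to check this without eyeballing the two data frames is to count the missing values per column before and after imputation. The following is a minimal sketch, not part of the original answer: the seed value and printFlag = FALSE are my own additions for repeatability and quieter output, and depending on your mice version the "rf" method needs the ranger or randomForest package to be installed.

```r
library(mice)

df <- airquality

# Missing values per column before imputation
colSums(is.na(df))

# seed and printFlag are illustrative additions, not from the original answer
mice_mod <- mice(df, method = "rf", printFlag = FALSE, seed = 1)
mice_output <- complete(mice_mod)

# Missing values per column after imputation (should all be zero)
colSums(is.na(mice_output))

# Rows that had a missing Ozone value, now showing the imputed values
head(mice_output[is.na(df$Ozone), ])
```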
In your example, your lecturer keeps only the columns whose names are not in the given list of names, so the data frame is filtered before it is passed to mice().
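Since the lecturer's code isn't reproduced here, the following is only a sketch of that kind of filtering in base R. The excluded names (Month and Day) are hypothetical and chosen just to have something to drop from airquality.

```r
df <- airquality

# Hypothetical exclusion list; the names used in the original question are not shown
excluded <- c("Month", "Day")

# TRUE for every column whose name is NOT in the exclusion list
keep <- !names(df) %in% excluded

# Data frame containing only the remaining columns
df_subset <- df[, keep]
names(df_subset)
```

The filtered data frame (here df_subset) is then what gets handed to mice(), so the excluded columns are neither imputed nor used as predictors.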
If you want more information about the algorithm, the documentation cites:
Doove, L.L., van Buuren, S., Dusseldorp, E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92-104.
R, mitools::MIcombine, what is the reason for no p-values?
After looking through the documentation, there doesn't seem to be a particular reason why the mitools package doesn't provide p-values, although the package's focus is on imputation, not on model results.
However, you don't need either of these packages to see your results along with the per-model p-values. I started writing this as a comment but decided to include the code. In case you weren't aware, you can use base R's summary(). I realize that the output of mice is comparative, as is that of mitools, but I thought this was important enough to mention as well.
If the output of your call is model, then this will work.
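library(tidyverse)
# model is assumed to be a list of fitted models, one per imputed dataset
# [[ ]] extracts each fitted model; [ ] would return a one-element list instead
map(seq_along(model), ~summary(model[[.x]]))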
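If you want p-values for the pooled estimates rather than per model, one common workaround is to compute them yourself from the pieces MIcombine() returns. This is a rough sketch, not something taken from the question: the airquality data, the lm() formula and the seed are my own assumptions, and it assumes a t reference distribution with the pooled degrees of freedom stored in the df component is acceptable for your analysis.

```r
library(mice)
library(mitools)

# Illustrative imputations; the question's own data is not available here
imp <- mice(airquality, m = 5, printFlag = FALSE, seed = 1)

# complete(imp, "all") returns a list with the m completed data frames
imp_list <- imputationList(complete(imp, "all"))

# One fitted model per imputed dataset (hypothetical formula)
model <- with(imp_list, lm(Ozone ~ Wind + Temp))
pooled <- MIcombine(model)

est <- coef(pooled)                 # pooled coefficients
se <- sqrt(diag(vcov(pooled)))      # pooled standard errors
t_stat <- est / se

# Two-sided p-values from a t distribution with the pooled degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = pooled$df)

data.frame(estimate = est, std.error = se, statistic = t_stat, p.value = p_val)
```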
R Imputation with Ordered Categorical
Your "categorical" variable appears to be stored as character. You may want to coerce it to a factor before imputing; otherwise mice() will ignore the variable. Do:
DATA[sapply(DATA, is.character)] <- lapply(DATA[sapply(DATA, is.character)], as.factor)
str(DATA)
# 'data.frame': 1000 obs. of 4 variables:
# $ x1: Factor w/ 5 levels "a","b","c","d",..: 2 2 NA NA 3 3 4 NA NA 4 ...
# $ x2: num 0.932 0.87 0.886 0.925 0.984 ...
# $ x3: num 0.292 0.734 0.764 0.943 0.806 ...
# $ x4: Factor w/ 4 levels "t","u","v","w": 1 3 1 3 4 3 1 4 3 2 ...
head(DATA)
# x1 x2 x3 x4
# 1 b 0.9315629 0.2916144 t
# 2 b 0.8695138 0.7338165 v
# 3 <NA> 0.8863894 0.7642693 t
# 4 <NA> 0.9248280 0.9427943 v
# 5 c 0.9844646 0.8062173 w
# 6 c 0.6200558 0.7354498 v
Also, it might be a better idea to use a proportional odds model ("polr") for ordered categorical data instead of predictive mean matching ("pmm").
library(mice)
IMP <- mice(DATA, m=5, maxit=50, meth=c("polr", "", "", ""), seed=500)
DATAIMPUTE <- complete(IMP)
head(DATAIMPUTE)
# x1 x2 x3 x4
# 1 b 0.9315629 0.2916144 t
# 2 b 0.8695138 0.7338165 v
# 3 a 0.8863894 0.7642693 t
# 4 a 0.9248280 0.9427943 v
# 5 c 0.9844646 0.8062173 w
# 6 c 0.6200558 0.7354498 v
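Note that as.factor() creates an unordered factor. If the categories really have an order, you may want to declare it explicitly, because mice's default method for ordered factors is "polr" while unordered factors with more than two levels default to "polyreg". A small base R sketch, where the ordering a < b < c < d < e is only an assumption for illustration:

```r
# Toy character vector with a missing value, standing in for x1
x1 <- c("b", "b", NA, "c", "a")

# Hypothetical ordering of the levels; replace with the real order of your categories
x1_ord <- factor(x1, levels = c("a", "b", "c", "d", "e"), ordered = TRUE)

is.ordered(x1_ord)   # confirms the factor carries an order
levels(x1_ord)       # levels in the declared order
x1_ord               # the NA is preserved and can be imputed later
```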
Important note: You seem to misunderstand the method if you think the complete() function gives you a valid imputed dataset. It uses action=1 by default and returns only the first completed data set, which is no multiple imputation at all! You probably should consult a statistician and read the documentation more thoroughly. There's a nice answer around that briefly summarizes the most important point.
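To make the difference concrete, here is a minimal sketch of how the analysis is usually run on all imputed data sets and then pooled, instead of on the first completed data set only. It uses airquality and an arbitrary lm() formula purely as stand-ins; neither comes from the question.

```r
library(mice)

# Illustrative data and settings, not the DATA object from this answer
imp <- mice(airquality, m = 5, printFlag = FALSE, seed = 500)

# complete() with its default returns only the first completed data set
first_only <- complete(imp)

# action = "all" returns every completed data set as a list
all_sets <- complete(imp, action = "all")
length(all_sets)

# Fit the model on each imputed data set, then pool with Rubin's rules
fit <- with(imp, lm(Ozone ~ Wind + Temp))
summary(pool(fit))
```

The pooled summary reflects the uncertainty from the imputation step, which a single completed data set cannot do.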
Data:
set.seed(74)
DATA <- data.frame(x1=c(sample(c(letters[1:5], NA), 1000, replace=TRUE)),
                   x2=runif(1000),
                   x3=runif(1000),
                   x4=sample(letters[20:23], 1000, replace=TRUE))