How to Get Coefficients and Their Confidence Intervals in Mixed Effects Models

How to get coefficients and their confidence intervals in mixed effects models?

I'm going to add a bit here. If m is a fitted (g)lmer model (most of these work for lme too):

  • fixef(m) is the canonical way to extract coefficients from mixed models (this convention began with nlme and has carried over to lme4)
  • you can get the full coefficient table with coef(summary(m)); if you loaded lmerTest before fitting the model, or (after loading lmerTest) convert an already-fitted model via coef(summary(as(m,"merModLmerTest"))), the coefficient table will include p-values. (The coefficient table is a matrix; you can extract columns by name, e.g. ctab[,"Estimate"] or ctab[,"Pr(>|t|)"], or convert the matrix to a data frame and use $-indexing.)
  • as stated above, you can get likelihood profile confidence intervals via confint(m); these may be computationally intensive. confint(m, method="Wald") gives the standard ±1.96 SE confidence intervals instead. (lme uses intervals(m) rather than confint().) A short sketch putting these together follows this list.
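
A minimal sketch (assuming a fitted lmer model named m, e.g. the one fitted further down):

library(lme4)
fixef(m)                      ## fixed-effect coefficients only
ctab <- coef(summary(m))      ## full coefficient table (a matrix)
ctab[, "Estimate"]            ## extract a column by name
confint(m)                    ## likelihood profile CIs (can be slow)
confint(m, method = "Wald")   ## Wald (+/- 1.96 SE) CIs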

If you prefer to use broom.mixed:

  • tidy(m,effects="fixed") gives you a table with estimates, standard errors, etc.
  • tidy(as(m,"merModLmerTest"), effects="fixed") (or fitting with lmerTest in the first place) includes p-values
  • adding conf.int=TRUE gives (Wald) CIs
  • adding conf.method="profile" (along with conf.int=TRUE) gives likelihood profile CIs instead; see the sketch below
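
For instance (a minimal sketch, again assuming a fitted (g)lmer model named m):

library(broom.mixed)
tidy(m, effects = "fixed")                    ## estimates, standard errors, etc.
tidy(m, effects = "fixed", conf.int = TRUE)   ## adds Wald CIs
tidy(m, effects = "fixed", conf.int = TRUE,
     conf.method = "profile")                 ## likelihood profile CIs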

You can also get confidence intervals by parametric bootstrap (method="boot"), which is considerably slower but more accurate in some circumstances.
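
For example (a sketch; nsim = 500 is an arbitrary choice here, and the computation can take a while):

confint(m, method = "boot", nsim = 500)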

Linear mixed model confidence intervals question

I'm pretty sure this has to do with the dreaded "denominator degrees of freedom" question, i.e. what kind (if any) of finite-sample correction is being employed. tl;dr emmeans is using a Kenward-Roger correction, which is more or less the most accurate available option — the only reason not to use K-R is if you have a large data set for which it becomes unbearably slow.

load packages, simulate data, fit model

library(lmerTest)
library(emmeans)
library(effects)
dd <- expand.grid(f=factor(letters[1:3]),g=factor(1:20),rep=1:10)
set.seed(101)
dd$y <- simulate(~f+(1|g), newdata=dd, newparams=list(beta=rep(1,3),theta=1,sigma=1))[[1]]
m <- lmer(y~f+(1|g), data=dd)

compare default emmeans with effects

emmeans(m, ~f)
## f emmean    SE   df lower.CL upper.CL
## a  0.848 0.212 21.9    0.409     1.29
## b  1.853 0.212 21.9    1.414     2.29
## c  1.863 0.212 21.9    1.424     2.30

## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95

as.data.frame(effect("f",m))
##   f       fit        se     lower    upper
## 1 a 0.8480161 0.2117093 0.4322306 1.263802
## 2 b 1.8531805 0.2117093 1.4373950 2.268966
## 3 c 1.8632228 0.2117093 1.4474373 2.279008

effects doesn't explicitly tell us whether (or what kind of) finite-sample correction it's using: we could dig around in the documentation or the code to find out. Alternatively, we can tell emmeans not to use a finite-sample correction:

emmeans(m, ~f, lmer.df="asymptotic")
## f emmean    SE  df asymp.LCL asymp.UCL
## a  0.848 0.212 Inf     0.433      1.26
## b  1.853 0.212 Inf     1.438      2.27
## c  1.863 0.212 Inf     1.448      2.28

## Degrees-of-freedom method: asymptotic
## Confidence level used: 0.95

Testing shows that these agree to within a tolerance of about 0.001 (probably close enough). In principle we should be able to specify KR=TRUE to get effects to use the Kenward-Roger correction, but I haven't been able to get that to work yet.
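
For reference, the call I have in mind looks something like the following (as noted, I haven't gotten it to work yet, and whether it does may depend on the installed versions of effects and pbkrtest):

as.data.frame(effect("f", m, KR = TRUE))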

However, I will also say that there's something a little bit funky about your example. If we compute the distance between the mean and the lower CI in units of standard error, for emmeans we get (76.4-23.9)/8.96 = 5.86, which implies a very small denominator degrees of freedom (about 1.55). That seems questionable to me unless your data set is extremely small ...
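
A quick sketch of that arithmetic, backing out the implied denominator df:

(76.4 - 23.9)/8.96    ## ~5.86 standard errors between the mean and the lower CI
## find the df for which the 97.5% t quantile equals that half-width
uniroot(function(nu) qt(0.975, df = nu) - 5.86, interval = c(1, 50))$root  ## ~1.5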


From your updated post, it appears that Kenward-Roger is indeed estimating only 1.5 denominator df.

In general it is dicey/not recommended to try fitting random effects where the grouping variable has a small number of levels (although see here for a counterargument). I would try treating LETO (which has only two levels) as a fixed effect, i.e.

CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM)

and see if that helps. (I would expect you would then get on the order of 7 df, which would make your CIs ± 2.4 SE instead of ± 6 SE ...)
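
A quick check of the critical values behind that statement:

qt(0.975, df = 7)     ## about 2.4
qt(0.975, df = 1.5)   ## about 6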

Obtaining glmer coefficient confidence intervals via bootstrapping

If you want to compute parametric bootstrap confidence intervals, the built-in functionality

confint(fitted_model, method = "boot")

should work (see ?confint.merMod).

Also see this answer (which illustrates both parametric and nonparametric bootstrapping for user-defined quantities).

If you have multiple cores, you can speed this up by adding parallel = "multicore", ncpus = parallel::detectCores()-1 (or some other appropriate number of cores to use): see ?lme4::bootMer for details.
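
Putting that together (a sketch; fitted_model and nsim = 500 are placeholders):

confint(fitted_model, method = "boot", nsim = 500,
        parallel = "multicore", ncpus = parallel::detectCores() - 1)
## note: parallel = "multicore" relies on forking and is not available on Windows;
## use parallel = "snow" there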

How to get lm() coefficients and confidence intervals by group?

I saw in another question that these types of transformations to create new variables can be done using do({}) from dplyr. The following seems to do the trick:

df <- data.frame(
  country = rep(c("W", "Q"), 500),
  v1 = rnorm(1000, 10, 2),
  v2 = rnorm(1000, 8, 1))

df$a1 <- rnorm(1000, 2, 0.1) + df$v1*rnorm(1000, 2, 0.1) - df$v2
df$a2 <- df$v1*rnorm(1000, 1, 0.2) + df$v2*rnorm(1000, 2, 0.2)


library(dplyr)

betas <- function(dw, atts, socs){
  for (i in 1:length(atts)) {
    for (j in 1:length(socs)) {
      ## fit lm(att ~ soc) within each country and append the slope and its 95% CI
      dw1 <- dw %>% group_by(country) %>% do(
        {
          name.b <- paste0(atts[[i]], ".", socs[[j]], ".b")
          name.l <- paste0(atts[[i]], ".", socs[[j]], ".l")
          name.u <- paste0(atts[[i]], ".", socs[[j]], ".u")
          mod <- lm(as.formula(paste0(atts[[i]], "~", socs[[j]])), data = .)
          assign(name.b, summary(mod)$coefficients[socs[[j]], 1])
          cis <- confint(mod, socs[[j]], level = 0.95)
          assign(name.l, cis[socs[[j]], 1])
          assign(name.u, cis[socs[[j]], 2])
          dat <- data.frame(., get(name.b), get(name.l), get(name.u))
          colnames(dat)[(length(names(dat))-2):length(names(dat))] <- c(name.b, name.l, name.u)
          dat
        })
      dw <- dw1  ## carry the new columns forward so every att/soc combination is kept
    }
  }
  dw1
}


df1 <- betas(df, c("a1", "a2"), c("v1", "v2"))

hist(df1$a2.v2.b)
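
If you would rather have the results in long format (one row per country/response/predictor combination) instead of extra columns, here is an alternative sketch (not part of the approach above) that uses broom::tidy for the coefficients and CIs:

library(broom)
combos <- expand.grid(att = c("a1", "a2"), soc = c("v1", "v2"),
                      country = unique(df$country), stringsAsFactors = FALSE)
res <- do.call(rbind, lapply(seq_len(nrow(combos)), function(k) {
  cc  <- combos[k, ]
  fit <- lm(reformulate(cc$soc, response = cc$att),
            data = df[df$country == cc$country, ])
  ## keep the slope row (row 2) together with its 95% CI
  cbind(cc, tidy(fit, conf.int = TRUE)[2, c("estimate", "conf.low", "conf.high")])
}))
res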

Confidence intervals for hurdle or zero inflated mixed models

tl;dr as far as I can tell this is a bug in confint.glmmTMB (and probably in the internal function glmmTMB:::getParms). In the meantime, broom.mixed::tidy(m1, effects="fixed") should do what you want. (There's now a fix in progress in the development version on GitHub; it should make it to CRAN sometime soon ...)

Reproducible example:

set up data

set.seed(101)
n <- 1e3
bd <- data.frame(
  year = factor(sample(2002:2018, size = n, replace = TRUE)),
  class = factor(sample(1:20, size = n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = factor(sample(c("low", "reg", "high"), size = n, replace = TRUE),
              levels = c("low", "reg", "high")),
  count = rnbinom(n, mu = 3, size = 1))

fit

library(glmmTMB)
m1 <- glmmTMB(count ~ x1 + x2 + x3 + (1|year/class),
              data = bd, zi = ~ x2 + x3 + (1|year/class),
              family = truncated_nbinom2)

confidence intervals

confint(m1, "beta_")  ## wrong/ incomplete
broom.mixed::tidy(m1, effects="fixed", conf.int=TRUE) ## correct

You may want to think about which kind of confidence intervals you want:

  • Wald CIs (default) are much faster to compute and are generally OK as long as (1) your data set is large and (2) you aren't estimating any parameters on the log/logit scale that are near the boundaries
  • likelihood profile CIs are more accurate but much slower; see the sketch below
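
For example (a sketch; this assumes broom.mixed's tidy method for glmmTMB models accepts conf.method and passes it through to confint):

broom.mixed::tidy(m1, effects = "fixed", conf.int = TRUE)   ## Wald CIs (fast)
broom.mixed::tidy(m1, effects = "fixed", conf.int = TRUE,
                  conf.method = "profile")                  ## profile CIs (slow)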

