How to get coefficients and their confidence intervals in mixed effects models?
I'm going to add a bit here. If m is a fitted (g)lmer model (most of these work for lme too):

- fixef(m) is the canonical way to extract coefficients from mixed models (this convention began with nlme and has carried over to lme4).
- You can get the full coefficient table with coef(summary(m)); if you have loaded lmerTest before fitting the model, or convert the model after fitting (and then load lmerTest) via coef(summary(as(m, "merModLmerTest"))), then the coefficient table will include p-values. (The coefficient table is a matrix; you can extract the columns via e.g. ctab[, "Estimate"] or ctab[, "Pr(>|t|)"], or convert the matrix to a data frame and use $-indexing.)
- As stated above, you can get likelihood profile confidence intervals via confint(m); these may be computationally intensive. If you use confint(m, method = "Wald") you'll get the standard +/- 1.96 SE confidence intervals. (lme uses intervals(m) instead of confint().)
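A minimal sketch of these extractors, using the built-in sleepstudy data from lme4 (the data set and model here are just illustrative assumptions):

library(lme4)
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fixef(m)                     ## named vector of fixed-effect coefficients
ctab <- coef(summary(m))     ## coefficient table (a matrix)
ctab[, "Estimate"]           ## extract one column
confint(m, method = "Wald")  ## fast Wald CIs; confint(m) gives (slower) profile CIs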
If you prefer to use broom.mixed:

- tidy(m, effects = "fixed") gives you a table with estimates, standard errors, etc.
- tidy(as(m, "merModLmerTest"), effects = "fixed") (or fitting with lmerTest in the first place) includes p-values.
- Adding conf.int = TRUE gives (Wald) CIs.
- Adding conf.method = "profile" (along with conf.int = TRUE) gives likelihood profile CIs.

You can also get confidence intervals by parametric bootstrap (conf.method = "boot" in tidy(), or method = "boot" in confint()), which is considerably slower but more accurate in some circumstances.
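Continuing the sketch above (same assumed model m):

library(broom.mixed)
tidy(m, effects = "fixed", conf.int = TRUE)   ## Wald CIs
tidy(m, effects = "fixed", conf.int = TRUE,
     conf.method = "profile")                 ## likelihood profile CIs (slower)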
Linear mixed model confidence intervals question
I'm pretty sure this has to do with the dreaded "denominator degrees of freedom" question, i.e. what kind (if any) of finite-sample correction is being employed. tl;dr: emmeans is using a Kenward-Roger correction, which is more or less the most accurate available option; the only reason not to use K-R is if you have a large data set for which it becomes unbearably slow.
load packages, simulate data, fit model
library(lmerTest)
library(emmeans)
library(effects)
dd <- expand.grid(f=factor(letters[1:3]),g=factor(1:20),rep=1:10)
set.seed(101)
dd$y <- simulate(~f+(1|g), newdata=dd, newparams=list(beta=rep(1,3),theta=1,sigma=1))[[1]]
m <- lmer(y~f+(1|g), data=dd)
compare default emmeans with effects
emmeans(m, ~f)
## f emmean SE df lower.CL upper.CL
## a 0.848 0.212 21.9 0.409 1.29
## b 1.853 0.212 21.9 1.414 2.29
## c 1.863 0.212 21.9 1.424 2.30
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
as.data.frame(effect("f",m))
## f fit se lower upper
## 1 a 0.8480161 0.2117093 0.4322306 1.263802
## 2 b 1.8531805 0.2117093 1.4373950 2.268966
## 3 c 1.8632228 0.2117093 1.4474373 2.279008
effects doesn't explicitly tell us what/whether it's using a finite-sample correction: we could dig around in the documentation or the code to try to find out. Alternatively, we can tell emmeans not to use a finite-sample correction:
emmeans(m, ~f, lmer.df="asymptotic")
## f emmean SE df asymp.LCL asymp.UCL
## a 0.848 0.212 Inf 0.433 1.26
## b 1.853 0.212 Inf 1.438 2.27
## c 1.863 0.212 Inf 1.448 2.28
## Degrees-of-freedom method: asymptotic
## Confidence level used: 0.95
Testing shows that these are equivalent to a tolerance of about 0.001 (probably close enough); a quick check is sketched below. In principle we should be able to specify KR=TRUE to get effects to use the Kenward-Roger correction, but I haven't been able to get that to work yet.
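One way to run that comparison (a sketch reusing m from above; the column names are the ones shown in each package's printed output):

e1 <- as.data.frame(emmeans(m, ~f, lmer.df = "asymptotic"))
e2 <- as.data.frame(effect("f", m))
## compare lower and upper CI bounds at the stated tolerance
all.equal(e1$asymp.LCL, e2$lower, tolerance = 1e-3)
all.equal(e1$asymp.UCL, e2$upper, tolerance = 1e-3)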
However, I will also say that there's something a little bit funky about your example. If we compute the distance between the mean and the lower CI in units of standard error, for emmeans we get (76.4-23.9)/8.96 = 5.86, which implies a very small effective degrees of freedom (about 1.55). That seems questionable to me unless your data set is extremely small ...
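That implied df can be backed out numerically; a quick sketch (5.86 is the mean-to-bound distance in SE units computed above):

## find the df at which the two-sided 95% t quantile equals 5.86 SEs
uniroot(function(df) qt(0.975, df) - 5.86, interval = c(1, 50))$root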
From your updated post, it appears that Kenward-Roger is indeed estimating only 1.5 denominator df.
In general it is dicey/not recommended to try fitting random effects where the grouping variable has a small number of levels (although see here for a counterargument). I would try treating LETO (which has only two levels) as a fixed effect, i.e.
CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM)
and see if that helps; a sketch of the refit is below. (I would expect you would then get on the order of 7 df, which would make your CIs ± 2.4 SE instead of ± 6 SE ...)
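In case it's useful, a hedged sketch of that refit (CA, SISTEM, LETO, ponovitev, and the data frame dat here stand in for the asker's objects):

library(lmerTest)
## LETO becomes a fixed effect; it still appears in the interaction grouping terms
m2 <- lmer(CA ~ SISTEM + LETO + (1 | LETO:ponovitev) + (1 | LETO:SISTEM),
           data = dat)
summary(m2)  ## the coefficient table shows the estimated (Satterthwaite) df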
Obtaining glmer coefficient confidence intervals via bootstrapping
If you want to compute parametric bootstrap confidence intervals, the built-in functionality confint(fitted_model, method = "boot") should work (see ?confint.merMod).
Also see this answer (which illustrates both parametric and nonparametric bootstrapping for user-defined quantities).
If you have multiple cores, you can speed this up by adding parallel = "multicore", ncpus = parallel::detectCores() - 1 (or some other appropriate number of cores to use); see ?lme4::bootMer for details.
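A minimal sketch (the GLMM below, fitted to lme4's built-in cbpp data, is just an illustrative assumption; nsim is kept small for speed):

library(lme4)
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)
## parametric bootstrap CIs; "multicore" works on Unix-alikes (use "snow" on Windows)
confint(gm, method = "boot", nsim = 200,
        parallel = "multicore", ncpus = 2)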
How to get lm() coefficients and confidence intervals by group?
I saw in another question that these types of transformations to create new variables can be done using do({}) from dplyr. The following seems to do the trick:
library(dplyr)

df <- data.frame(
  country = rep(c("W", "Q"), 500),
  v1 = rnorm(1000, 10, 2),
  v2 = rnorm(1000, 8, 1))
df$a1 <- rnorm(1000, 2, 0.1) + df$v1*rnorm(1000, 2, 0.1) - df$v2
df$a2 <- df$v1*rnorm(1000, 1, 0.2) + df$v2*rnorm(1000, 2, 0.2)

betas <- function(dw, atts, socs){
  for (i in seq_along(atts)) {
    for (j in seq_along(socs)) {
      ## overwrite dw each time so the new columns accumulate across
      ## all attribute/variable pairs (not just the last one)
      dw <- dw %>% group_by(country) %>% do(
        {
          name.b <- paste0(atts[[i]], ".", socs[[j]], ".b")
          name.l <- paste0(atts[[i]], ".", socs[[j]], ".l")
          name.u <- paste0(atts[[i]], ".", socs[[j]], ".u")
          ## per-country regression of the attribute on the variable
          mod <- lm(paste0(atts[[i]], "~", socs[[j]]), .)
          assign(name.b, summary(mod)$coefficients[socs[[j]], 1])
          cis <- confint(mod, socs[[j]], level = 0.95)
          assign(name.l, cis[socs[[j]], 1])
          assign(name.u, cis[socs[[j]], 2])
          dat <- data.frame(., get(name.b), get(name.l), get(name.u))
          colnames(dat)[(ncol(dat)-2):ncol(dat)] <- c(name.b, name.l, name.u)
          dat
        })
    }
  }
  dw
}
df1 <- betas(df, c("a1", "a2"), c("v1", "v2"))
hist(df1$a2.v2.b)
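For comparison, the same per-group slope and its CI can be had more compactly with broom (a sketch for one attribute/variable pair):

library(broom)
df %>%
  group_by(country) %>%
  do(tidy(lm(a1 ~ v1, data = .), conf.int = TRUE)) %>%
  filter(term == "v1")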
Confidence intervals for hurdle or zero inflated mixed models
tl;dr: as far as I can tell this is a bug in confint.glmmTMB (and probably in the internal function glmmTMB:::getParms). In the meantime, broom.mixed::tidy(m1, effects = "fixed", conf.int = TRUE) should do what you want. (There's now a fix in progress in the development version on GitHub; it should make it to CRAN sometime soon ...)
Reproducible example:
set up data
set.seed(101)
n <- 1e3
bd <- data.frame(
  year = factor(sample(2002:2018, size = n, replace = TRUE)),
  class = factor(sample(1:20, size = n, replace = TRUE)),
  x1 = rnorm(n),
  x2 = rnorm(n),
  x3 = factor(sample(c("low", "reg", "high"), size = n, replace = TRUE),
              levels = c("low", "reg", "high")),
  count = rnbinom(n, mu = 3, size = 1))
fit
library(glmmTMB)
m1 <- glmmTMB(count ~ x1 + x2 + x3 + (1 | year/class),
              zi = ~ x2 + x3 + (1 | year/class),
              family = truncated_nbinom2, data = bd)
confidence intervals
confint(m1, "beta_")  ## wrong/incomplete
broom.mixed::tidy(m1, effects = "fixed", conf.int = TRUE)  ## correct
You may want to think about which kind of confidence intervals you want:

- Wald CIs (default) are much faster to compute and are generally OK as long as (1) your data set is large and (2) you aren't estimating any parameters on the log/logit scale that are near the boundaries.
- Likelihood profile CIs are more accurate but much slower; see the sketch below.
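A hedged sketch of the profile alternative (same m1 as above; this can take a while on a model of this size):

broom.mixed::tidy(m1, effects = "fixed", conf.int = TRUE,
                  conf.method = "profile")  ## likelihood profile CIs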