Extracting coefficients and their standard error for each unit in an lme model fit
The first comment is that this is actually a non-trivial theoretical question: there is a rather long thread on r-sig-mixed-models that goes into some of the technical details; you should definitely have a look, even though it gets a bit scary. The basic issue is that the estimated coefficient values for each group are the sum of the fixed-effect parameter and the BLUP/conditional mode for that group, which are different classes of objects (one is a parameter, one is a conditional mean of a random variable), which creates some technical difficulties.
The second point is that (unfortunately) I don't know of an easy way to do this in lme
, so my answer uses lmer
(from the lme4
package).
If you are comfortable doing the easiest thing and ignoring the (possibly ill-defined) covariance between the fixed-effect parameters and the BLUPs, you can use the code below.
Two alternatives would be (1) to fit your model with a Bayesian hierarchical approach (e.g. the MCMCglmm
package) and compute the standard deviations of the posterior predictions for each level (2) use parametric bootstrapping to compute the BLUPs/conditional modes, then take the standard deviations of the bootstrap distributions.
Please remember that as usual this advice comes with no warranty.
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
cc <- coef(fm1)$Subject
## variances of fixed effects
fixed.vars <- diag(vcov(fm1))
## extract variances of conditional modes
r1 <- ranef(fm1,condVar=TRUE)
cmode.vars <- t(apply(cv <- attr(r1[[1]],"postVar"),3,diag))
seVals <- sqrt(sweep(cmode.vars,2,fixed.vars,"+"))
res <- cbind(cc,seVals)
res2 <- setNames(res[,c(1,3,2,4)],
c("int","int_se","slope","slope_se"))
## int int_se slope slope_se
## 308 253.6637 13.86649 19.666258 2.7752
## 309 211.0065 13.86649 1.847583 2.7752
## 310 212.4449 13.86649 5.018406 2.7752
## 330 275.0956 13.86649 5.652955 2.7752
## 331 273.6653 13.86649 7.397391 2.7752
## 332 260.4446 13.86649 10.195115 2.7752
Covariance (or correlation) matrix of coefficients in lme
If you want the variance-covariance matrix of the fixed effects, use
vcov(fitted.model)
If you want the correlation matrix, use
cov2cor(vcov(fitted.model))
You can also use
summary(fitted.model)$corFixed
(use str(summary(fitted.model))
to find the bits you need), but the accessors above are better because they don't make use of the (not necessarily stable) internal structure of the results.
Why is it inadvisable to get statistical summary information for regression coefficients from glmnet model?
"It is a very natural question to ask for standard errors of regression
coefficients or other estimated quantities. In principle such standard
errors can easily be calculated, e.g. using the bootstrap.Still, this package deliberately does not provide them. The reason for
this is that standard errors are not very meaningful for strongly
biased estimates such as arise from penalized estimation methods.
Penalized estimation is a procedure that reduces the variance of
estimators by introducing substantial bias. The bias of each estimator
is therefore a major component of its mean squared error, whereas its
variance may contribute only a small part.Unfortunately, in most applications of penalized regression it is
impossible to obtain a sufficiently precise estimate of the bias. Any
bootstrap-based calculations can only give an assessment of the
variance of the estimates. Reliable estimates of the bias are only
available if reliable unbiased estimates are available, which is
typically not the case in situations in which penalized estimates are
used.Reporting a standard error of a penalized estimate therefore tells
only part of the story. It can give a mistaken impression of great
precision, completely ignoring the inaccuracy caused by the bias. It
is certainly a mistake to make confidence statements that are only
based on an assessment of the variance of the estimates, such as
bootstrap-based confidence intervals do."
Jelle Goeman, Ph.D. Leiden University, Author of the Penalized package in R.
How to fit a model I built to another data set and get residuals?
The reason you are getting new coefficients in your second attempt with data=B
is that the function lme
returns a model fitted to your data set using the formula you provide, and stores that model in the variable model
as you have selected.
To get more information about a model you can type summary(model_name)
. the nlme
library includes a method called predict.lme
which allows you to make predictions based on a fitted model. You can type predict(my_model)
to get the predictions using the original data set, or type predict(my_model, some_other_data)
as mentioned above to generate predictions using that model but with a different data set.
In your case to get the residuals you just need to subtract the predicted values from observed values. So use predict(my_model,some_other_data) - some_other_data$dependent_var
, or in your case predict(model,B) - B$Y
.
Better fits for a linear model
This answer is going to be a bit encyclopedic, because there are several important points about your question. tl;dr adding year*plot
as a random effect is the first step, but the fit is actually a bit problematic, although it doesn't appear so at first: centering the year
variable takes care of the problem, but log-transforming the data might be an even better idea.
UPDATE: OP was doing an analysis on subset(df,Depth==30)
. I accidentally did the analysis on the full data set, which may have made things harder -- the heteroscedasticity documented below probably isn't as bad for a subset of the data -- but I'm going to leave it as is for pedagogical purposes (and out of laziness).
Get the data:
df <- read.csv("rootmeansv2.csv")
library(nlme)
gdf <- groupedData( mass ~ year | plot, data=df)
Adding the year-by-plot interaction to the model as a random effect:
fit0 <- lme(mass ~ year , random = ~ year | plot, data = gdf)
However, if we look at the results we see that this doesn't help very much at all -- the year
term (among-plot variance in the slope) is tiny.
## Variance StdDev Corr
## (Intercept) 9.522880e-12 3.085916e-06 (Intr)
## year 7.110616e-08 2.666574e-04 0.32
## Residual 3.885373e-01 6.233276e-01
This does sometimes happen (a singular fit), but otherwise the lme
summary doesn't indicate in any other way that there might be something wrong. However, if we try to get confidence intervals we see there might be trouble:
intervals(fit0)
## Error in intervals.lme(fit0) :
## cannot get confidence intervals on var-cov components: Non-positive definite approximate variance-covariance
The other way to double-check is to fit the same model in lme4
.
library(lme4)
VarCorr(fit1 <- lmer(mass ~ year +(year | plot), data = gdf))
## Groups Name Std.Dev. Corr
## plot (Intercept) 0.66159674
## year 0.00059471 -1.000
## Residual 0.62324403
## Warning messages:
## 1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.132739 (tol = 0.002, component 1)
## 2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
We get apparently quite different answers, and warnings about convergence (this is in the development version, 1.1-7, which does not suffer from the false-positive warnings reported widely for 1.1-6). It looks like lme4
's fit is slightly better:
c(logLik(fit1)-logLik(fit0))
## 0.1775161
One of the potential data problems for complex fits such as mixed models is poor centering and scaling of predictor variables. In this case, because year
is a continuous predictor that starts at 2008, the intercept and slopes are extremely highly correlated (the intercept is the predicted value in year 0!). We can fix the problem in this case by centering -- it would also be reasonable to subtract the minimum (i.e. start year at 0 rather than 2008)
c. <- function(x) scale(x,center=TRUE,scale=FALSE)
VarCorr(fit2 <- update(fit1,.~ c.(year) +(c.(year) | plot)))
## Groups Name Std.Dev. Corr
## plot (Intercept) 0.53798
## c.(year) 0.10515 1.000
## Residual 0.59634
We get more reasonable answers (and no warnings), although we do still have perfectly (now positively) correlated intercepts and slopes -- this just says the model is slightly overfitting the data, but there's not much we can do about this (we could force the correlation to zero by fitting the model ~c.(year)+(c.(year)||plot))
, but this has its own problems).
Now try it in lme
:
VarCorr(fit3 <- update(fit0,
fixed.=~c.(year),
random=~c.(year)|plot,
control=lmeControl(opt="optim")))
## plot = pdLogChol(c.(year))
## Variance StdDev Corr
## (Intercept) 0.28899909 0.5375864 (Intr)
## c.(year) 0.01122497 0.1059479 0.991
## Residual 0.35559015 0.5963138
Results are similar (although correlation is only 0.991 instead of 1.0): the difference in log-likelihoods is actually slightly larger, but still small. (The fit is still somewhat problematic -- I had to change optimizers as shown in the lmeControl
argument to get here.)
Things now look better:
plot(augPred(fit3))
However, the residual vs fitted plot shows you a problem you should worry about:
plot(fit3)
The funnel shape shows you have a heteroscedasticity problem. The scale-location plot shows it even more clearly:
plot(fit3,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))
The most obvious fix is to log-transform the data:
df$logmass <- log10(df$mass) ## log10() for interpretability
gdfL <- groupedData( logmass ~ year | plot, data=df)
VarCorr(fitL1 <- lme(logmass ~ c.(year) , random = ~ c.(year) | plot,
data = gdfL))
## plot = pdLogChol(c.(year))
## Variance StdDev Corr
## (Intercept) 0.1842905872 0.42929080 (Intr)
## c.(year) 0.0009702893 0.03114947 -0.607
## Residual 0.1052946948 0.32449144
The among-year variation is small again, but this time it's probably because it's not needed. There are no warnings from lmer
when we fit the equivalent model, and we get identical results; fit is non-singular (no zero variances or perfect correlations); and intervals(fitL1)
works.
The predictions look reasonable:
plot(augPred(fitL1))
So does the diagnostic plot:
plot(fitL1)
Although there does appear to be something a bit funny about 2008 (the axes are flipped because lme
prefers to plot boxplots horizontally -- factor(year)
tells lme
to use a boxplot instead of a scatterplot).
plot(fitL1,factor(year)~resid(.))
Related Topics
R: Save Multiple Plots from a File List into a Single File (Png or PDF or Other Format)
Offline Installation of R Packages
How to Extract Data from a Rasterbrick
Model Matrix with All Pairwise Interactions Between Columns
How to Call the 'Function' Function
R How to Change One of the Level to Na
Using Strsplit and Subset in Dplyr and Mutate
Index Element from List in Rcpp
A Way to Access Google Streetview from R
How to Read Knitr/Rmd Cache in Interactive Session
R - How to Add Row Index to a Data Frame, Based on Combination of Factors
Remove White Space Between Plots and Table in Grid.Arrange
Handling Latex Backslashes in Xtable
Ggplot2 Error "No Layers in Plot"