Unscale and Uncenter Glmer Parameters

Read in data:

source("SO_unscale.txt")

Separate the unscaled and scaled variables (Transmitter.depth doesn't appear to have a scaled variant):

unsc.vars <- subset(dd, select = c(Transmitter.depth,
                                   receiver.depth, water.temperature,
                                   wind.speed, distance))
sc.vars <- subset(dd, select = c(Transmitter.depth,
                                 Receiver.depth, Water.temperature,
                                 Wind.speed, Distance))

I noticed that the means and standard deviations of the scaled variables were not exactly 0/1, perhaps because what's here is a subset of the data. In any case, we will need the means and standard deviations of the original data in order to unscale.

colMeans(sc.vars)
apply(sc.vars,2,sd)
cm <- colMeans(unsc.vars)
csd <- apply(unsc.vars,2,sd)

It is possible to 'unscale' even if the new variables are not exactly centered/scaled (one would just need to enter the actual amount of the shift/scaling done), but it's marginally more complicated, so I'm just going to go ahead and fit with precisely centered/scaled variables.
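
As an aside, if the variables had been shifted/scaled by other amounts, those amounts are recoverable whenever the scaling was done with scale(), which stores them as attributes. A minimal sketch:

## hypothetical: recover the exact shift/scale amounts from a scale() result
z <- scale(unsc.vars$distance)       ## default: center = mean, scale = SD
shift <- attr(z, "scaled:center")    ## the amount that was subtracted
scl <- attr(z, "scaled:scale")       ## the amount that was divided by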

## changed data name to dd
library(lme4)
cs. <- function(x) scale(x, center = TRUE, scale = TRUE)
m1 <- glmer(Valid.detections ~ Transmitter.depth +
                receiver.depth + water.temperature +
                wind.speed + distance + (distance | SUR.ID),
            data = dd, family = poisson,
            control = glmerControl(optimizer = c("bobyqa", "Nelder_Mead")))
## FAILS with bobyqa alone
m1.sc <- glmer(Valid.detections ~ cs.(Transmitter.depth) +
                   cs.(receiver.depth) + cs.(water.temperature) +
                   cs.(wind.speed) + cs.(distance) + (cs.(distance) | SUR.ID),
               data = dd, family = poisson,
               control = glmerControl(optimizer = c("bobyqa", "Nelder_Mead")))

An important point is that in this case the very different scaling doesn't seem to do any harm; the scaled and unscaled models achieve essentially the same goodness of fit (if the scaling mattered, we would expect the scaled fit to do better).

logLik(m1)-logLik(m1.sc)  ## 1e-7

Here is the rescaling function given in a previous answer:

rescale.coefs <- function(beta, mu, sigma) {
    beta2 <- beta  ## inherit names etc.
    beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
    beta2[1] <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
    beta2
}

The parameters do indeed match very closely. (The shifting/scaling vectors include possible scaling/shifting of the response variable, so we start with 0/1 since the response is not scaled [it would rarely make sense to scale a response variable for a GLMM, but this function can be useful for LMMs too].)

(cc <- rescale.coefs(fixef(m1.sc),mu=c(0,cm),sigma=c(1,csd)))
##            (Intercept)  cs.(Transmitter.depth)  cs.(receiver.depth)
##            3.865879406             0.011158402         -0.554392645
## cs.(water.temperature)         cs.(wind.speed)        cs.(distance)
##           -0.050833325            -0.042188495         -0.007231021

fixef(m1)
##  (Intercept)  Transmitter.depth  receiver.depth  water.temperature
##  3.865816422        0.011180213    -0.554498582       -0.050830611
##   wind.speed           distance
## -0.042179333       -0.007231004

Since the two sets of coefficients are essentially the same (because the unscaled model does fit OK in this case), we could use either one for the following calculation.

## predict expected detections as a function of distance, holding the
## other covariates at fixed values
ddist <- 1:1000
vals <- cbind(`(Intercept)` = 1, Transmitter.depth = 0.6067926,
              Receiver.depth = -0.1610828, Water.temperature = -0.1128282,
              Wind.speed = -0.2959290, distance = ddist)
pred.obs <- exp(cc %*% t(vals))  ## exp() is the inverse link for Poisson
max(ddist[pred.obs > 1])         ## largest distance with >1 expected detection

Now suppose you want to do similar scaling/unscaling for a model with interactions or other complexities, i.e. where the predictor variables (the columns of the fixed-effect model matrix) are not the same as the input variables (the variables that appear in the formula):

m2 <- update(m1,. ~ . + wind.speed:distance)
m2.sc <- update(m1.sc,. ~ . + I(cs.(wind.speed*distance)))
logLik(m2)-logLik(m2.sc)

Calculate the mean and SD of each column of the model matrix, dropping the first (intercept) column:

X <- getME(m2,"X")                                        
cm2 <- colMeans(X)[-1]
csd2 <- apply(X,2,sd)[-1]
(cc2 <- rescale.coefs(fixef(m2.sc),mu=c(0,cm2),sigma=c(1,csd2)))
all.equal(unname(cc2),unname(fixef(m2)),tol=1e-3) ## TRUE

You don't actually have to fit the full unscaled model just to get the scaling parameters: you can construct the fixed-effect model matrix directly with model.matrix([formula], data). That is, if you haven't already fitted m2, you can still get X (and hence the column means and standard deviations) via:

X <- model.matrix(Valid.detections ~ Transmitter.depth + receiver.depth +
                      water.temperature + wind.speed + distance +
                      wind.speed:distance,
                  data = dd)

If you have an LMM in which the response variable was scaled, you should also multiply all of the standard deviations (including the residual standard deviation, sigma(fitted_model)) by the original SD of the response variable.
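
For example, a minimal sketch (hypothetical names: m_sc is an lmer fit to scale(y), and dat$y is the original response):

sd_y <- sd(dat$y)          ## SD of the *original* response
sigma(m_sc) * sd_y         ## residual SD back on the original scale
## random-effect SDs rescale the same way; correlations are unchanged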

How to unscale the coefficients from an lmer()-model fitted with a scaled response

Updated: generalized to allow for scaling of the response as well as the predictors.

Here's a fairly crude implementation.

If our original (unscaled) regression is

Y = b0 + b1*x1 + b2*x2 ... 

Then our scaled regression is

(Y - mu0)/s0 = b0' + b1'*((x1 - mu1)/s1) + b2'*((x2 - mu2)/s2) + ...

This is equivalent to

Y = mu0 + s0*((b0' - (b1'/s1)*mu1 - (b2'/s2)*mu2 - ...) + (b1'/s1)*x1 + (b2'/s2)*x2 + ...)

So bi = s0*bi'/si for i>0 and

b0 = s0*b0'+mu0-sum(bi*mui)

Implement this:

rescale.coefs <- function(beta, mu, sigma) {
    beta2 <- beta  ## inherit names etc.
    beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
    beta2[1] <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
    beta2
}

Try it out for a linear model:

m1 <- lm(Illiteracy~.,as.data.frame(state.x77))
b1 <- coef(m1)

Make a scaled version of the data:

ss <- scale(state.x77)

Scaled coefficients:

m1S <- update(m1,data=as.data.frame(ss))
b1S <- coef(m1S)

Now try out rescaling:

icol <- which(colnames(state.x77)=="Illiteracy")
p.order <- c(icol,(1:ncol(state.x77))[-icol])
m <- colMeans(state.x77)[p.order]
s <- apply(state.x77,2,sd)[p.order]
all.equal(b1,rescale.coefs(b1S,m,s)) ## TRUE

This assumes that both the response and the predictors are scaled.

  • If you scale only the response and not the predictors, then you should submit c(mean(response), rep(0, ...)) for m and c(sd(response), rep(1, ...)) for s (i.e., m and s are the values by which the variables were shifted and scaled).
  • If you scale only the predictors and not the response, then submit c(0, mean(predictors)) for m and c(1, sd(predictors)) for s. (Both cases are sketched below.)
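
For concreteness, here is how those two cases would look for the state.x77 example above (a sketch; m_resp/s_resp and m_pred/s_pred are just illustrative names):

npred <- ncol(state.x77) - 1                      ## number of predictors
## case 1: response scaled, predictors unscaled
m_resp <- c(mean(state.x77[, icol]), rep(0, npred))
s_resp <- c(sd(state.x77[, icol]), rep(1, npred))
## case 2: predictors scaled, response unscaled
m_pred <- c(0, colMeans(state.x77[, -icol]))
s_pred <- c(1, apply(state.x77[, -icol], 2, sd))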

Back-transform coefficients from glmer with scaled independent variables for prediction

The problem with your approach is that it only "unscales" based on the wt variable, whereas you scaled all of the variables in your regression model. One approach that works is to adjust all of the variables in your new (prediction) data frame using the centering/scaling values that were used on the original data frame:

## scale variable x using the center/scale attributes of variable y
scfun <- function(x, y) {
    scale(x,
          center = attr(y, "scaled:center"),
          scale = attr(y, "scaled:scale"))
}
## scale the prediction frame
xweight_sc <- transform(xweight,
                        wt = scfun(wt, database$wt),
                        am = scfun(am, database$am))
## predict
p_unsc <- predict(model.1,
                  newdata = xweight_sc,
                  type = "response", re.form = NA)

Comparing this p_unsc to your predictions from the unscaled model (b in your code), i.e. all.equal(b,p_unsc), gives TRUE.

Another reasonable approach would be to

  • unscale/uncenter all of your parameters using the "unscaling" approaches presented in one of the linked questions (such as this one), generating a coefficient vector beta_unsc (a combined sketch follows this list)
  • construct the appropriate model matrix from your prediction frame:
X <- model.matrix(formula(model, fixed.only = TRUE),
                  data = pred_frame)
  • compute the linear predictor and back-transform:
pred <- plogis(X %*% beta_unsc)
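
Putting those two steps together (a sketch: model.1 and xweight come from the question, and beta_unsc is assumed to have been computed with rescale.coefs() as above; delete.response() drops the response variable so the prediction frame doesn't need to contain it):

ff <- formula(model.1, fixed.only = TRUE)
X <- model.matrix(delete.response(terms(ff)), data = xweight)
pred <- plogis(X %*% beta_unsc)   ## inverse-logit for a binomial model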

Correct way to scale for multilevel regression using lmer [R]

tl;dr The models have nearly equivalent goodness of fit; the scaled model is slightly better and more reliable; the fixed effects are nearly equivalent; both models estimate singular random-effects variance-covariance matrices, which makes comparison much harder and means you shouldn't be relying on these models for conclusions about variances in any case ...

The model should have the same meaning after centering and rescaling. As I'll show below, the fixed effects are essentially equivalent. I'm finding it harder to convince myself that the variance-covariance estimates are equivalent, but the model is singular anyway (i.e., there isn't enough information to fit a positive-definite variance-covariance matrix).

Using your example and re-running with scaled parameters:

MLdata <- transform(MLdata,
                    size_cs = scale(size),
                    GDP_cs = scale(GDP))
m2 <- lmer(PQ ~ size_cs + GDP_cs + (size_cs | country), data = MLdata,
           REML = FALSE)

Comparing the log-likelihoods:

logLik(dummymodel)  ## -828.4349
logLik(m2) ## -828.4067

This suggests that the models are pretty close, but the scaled model has done slightly better (although an improvement of 0.03 log-likelihood units is very small).

The fixed effects look different, but I'm going to show that they're equivalent:

fixef(dummymodel)
##   (Intercept)          size           GDP
##  1.345754e+01  3.706393e-05 -5.464550e-06
fixef(m2)
## (Intercept)     size_cs      GDP_cs
## 13.86155940  0.27300688 -0.05914308

(If you look at coef(summary(.)) for both models, you'll see that the t-statistics for size and GDP are nearly identical.)
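
For example (a quick check; coef(summary(.)) on an lmer fit returns a matrix with a "t value" column):

cbind(t_unscaled = coef(summary(dummymodel))[, "t value"],
      t_scaled = coef(summary(m2))[, "t value"])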

Using the rescaling function from the question linked above:

rescale.coefs <- function(beta, mu, sigma) {
    beta2 <- beta  ## inherit names etc.
    ## parameters other than intercept:
    beta2[-1] <- sigma[1] * beta[-1] / sigma[-1]
    ## intercept:
    beta2[1] <- sigma[1] * beta[1] + mu[1] - sum(beta2[-1] * mu[-1])
    return(beta2)
}
cm <- sapply(MLdata[c("size", "GDP")], mean)
csd <- sapply(MLdata[c("size", "GDP")], sd)

rescaled <- rescale.coefs(fixef(m2), mu = c(0, cm), sigma = c(1, csd))
all.equal(unname(rescaled), unname(fixef(dummymodel)), tolerance = 1e-3)
cbind(fixef(dummymodel), rescaled)
##                              rescaled
## (Intercept)  1.345754e+01  1.345833e+01
## size         3.706393e-05  3.713062e-05
## GDP         -5.464550e-06 -5.435151e-06

Very similar.

With regard to the variance-covariance matrices:

VarCorr(dummymodel)
##  Groups    Name         Std.Dev.    Corr
##  country   (Intercept)  2.3420e-01
##            size         1.5874e-05  -1.000
##  Residual               3.8267e+00

VarCorr(m2)
##  Groups    Name         Std.Dev.    Corr
##  country   (Intercept)  0.0000e+00
##            size_cs      4.7463e-08   NaN
##  Residual               3.8283e+00

Neither variance-covariance matrix is positive definite; the first has the variation in intercept across countries perfectly correlated with the variation in slope across countries, while the second assigns zero variance to the intercept across countries. In addition, the among-country variation is very small relative to the residual variance in either case. If both matrices were positive definite, we could work on finding the transformation that would scale from one case to the other (I think it would just be to multiply each element by (s_i s_j), where s_. is the scaling factor applied to a given predictor). When they're not positive definite, it becomes tricky.
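
For what it's worth, here is a sketch of that rescaling (hypothetical: it's only meaningful when both matrices are positive definite, and it ignores the shift that centering induces in the intercept term):

## scaling factors for the terms in (size_cs | country):
## the intercept is unscaled (1), size was scaled by its SD
s <- c(1, sd(MLdata$size))
vc_sc <- VarCorr(m2)$country    ## 2x2 RE covariance matrix from the scaled fit
vc_unsc <- vc_sc / outer(s, s)  ## divide element (i,j) by s_i*s_j to get back
                                ## to the original units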


