Anova Test Fails on Lme Fits Created with Pasted Formula

anova test fails on lme fits created with pasted formula

Ben's answer works, but do.call provides the more general solution he wished for.

getm <- function(xs) {
f <- as.formula(paste("y ~", paste(xs, collapse="+")))
do.call("lme", args = list(f, random=~1|id, data=d, method="ML"))
}

It works because (by default) the arguments in args = are evaluated before being passed to lme.

anova test fails on lme fits created with pasted formula

Ben's answer works, but do.call provides the more general solution he wished for.

getm <- function(xs) {
f <- as.formula(paste("y ~", paste(xs, collapse="+")))
do.call("lme", args = list(f, random=~1|id, data=d, method="ML"))
}

It works because (by default) the arguments in args = are evaluated before being passed to lme.

Any pitfalls to using programmatically constructed formulas?

I'm always hesitant to claim there are no situations in which something involving R environments and scoping might bite, but ... after some more exploration, my first usage above does look safe.

It turns out that the printed call is a bit of red herring.

The formula that actually gets used by other functions (and the one extracted by formula() and as.formula()) is the one stored in the terms element of the fit object, and it gets the actual formula right. (The terms element contains an object of class "terms", which is just a "formula" with a bunch of attached attributes.)

To see that all of the proposals in my question and the associated comments store the same "formula" object (up to the associated environment), run the following.

## First the three approaches in my post
formula(fun(XX=c("cyl", "disp")))
# mpg ~ cyl + disp
# <environment: 0x026d2b7c>

formula(lm(mpg ~ cyl + disp, data=mtcars))
# mpg ~ cyl + disp

formula(fun2(XX=c("cyl", "disp"))$call)
# mpg ~ cyl + disp
# <environment: 0x02c4ce2c>

## Then Gabor Grothendieck's idea
XX = c("cyl", "disp")
ff <- reformulate(response="mpg", termlabels=XX)
formula(do.call("lm", list(ff, quote(mtcars))))
## mpg ~ cyl + disp

To confirm that formula() really is deriving its output from the terms element of the fit object, have a look at stats:::formula.lm and stats:::formula.terms.

Create lme object within a function

I think what you're fighting is just a buggy implementation of contrast() for lme objects. I would contact the author to fix it (it may be a result of something recent changing with nlme). But in the mean time you can avoid your side effects by implementing your workaround in the contrast.lme() function rather than in your model construction function:

contrast.lme <- function(fit, ...) {
mC <- fit$call
mC$fixed <- formula(fit)
mC$data <- fit$data
fit$call <- mC

library(nlme)
contrast:::contrastCalc(fit, ...)
}
assignInNamespace("contrast.lme", contrast.lme, "contrast")

mm2 <- makeMixedModel2("y1")

contrast(mm2, list(x = 1))

Yields:

lme model parameter contrast

Contrast S.E. Lower Upper t df Pr(>|t|)
0.1613734 0.2169281 -0.2692255 0.5919722 0.74 96 0.4588

And:

print(mm2)

Yields:

Linear mixed-effects model fit by REML
Data: mdat
Log-restricted-likelihood: -136.2472
Fixed: mF
(Intercept) x
-0.1936347 0.3550081

Random effects:
Formula: ~1 | grp
(Intercept) Residual
StdDev: 0.131666 0.9365614

Number of Observations: 100
Number of Groups: 2

lme warning message because of random effects

The convergence warnings disappeared when I removed all data points <2. I stumbled over this by coincidence..

Probably this is somehow connected to the issue that for each subset within the range from 0 to about 50 all data points are almost exactly the same (and have values of about ~1).

Pasting object names into the glm function in R

The . in the formula tells R to use all variables in the data.frame data.set (except y) as predictors. This should do it:

glm( binary.outcome ~ ., family=binomial, 
data=data.set )

Call: glm(formula = binary.outcome ~ ., family = binomial, data = data.set)

Coefficients:
(Intercept) varA varB varC
-0.4820 0.1878 -0.3974 -0.4566

Degrees of Freedom: 49 Total (i.e. Null); 46 Residual
Null Deviance: 66.41
Residual Deviance: 62.06 AIC: 70.06

and from ?formula

There are two special interpretations of . in a formula. The usual one
is in the context of a data argument of model fitting functions and
means ‘all columns not otherwise in the formula’: see terms.formula.
In the context of update.formula, only, it means ‘what was previously
in this part of the formula’.



Related Topics



Leave a reply



Submit