anova test fails on lme fits created with pasted formula
Ben's answer works, but do.call
provides the more general solution he wished for.
getm <- function(xs) {
f <- as.formula(paste("y ~", paste(xs, collapse="+")))
do.call("lme", args = list(f, random=~1|id, data=d, method="ML"))
}
It works because (by default) the arguments in args =
are evaluated before being passed to lme.
anova test fails on lme fits created with pasted formula
Ben's answer works, but do.call
provides the more general solution he wished for.
getm <- function(xs) {
f <- as.formula(paste("y ~", paste(xs, collapse="+")))
do.call("lme", args = list(f, random=~1|id, data=d, method="ML"))
}
It works because (by default) the arguments in args =
are evaluated before being passed to lme.
Any pitfalls to using programmatically constructed formulas?
I'm always hesitant to claim there are no situations in which something involving R environments and scoping might bite, but ... after some more exploration, my first usage above does look safe.
It turns out that the printed call is a bit of red herring.
The formula that actually gets used by other functions (and the one extracted by formula()
and as.formula()
) is the one stored in the terms
element of the fit object, and it gets the actual formula right. (The terms
element contains an object of class "terms"
, which is just a "formula"
with a bunch of attached attributes.)
To see that all of the proposals in my question and the associated comments store the same "formula"
object (up to the associated environment), run the following.
## First the three approaches in my post
formula(fun(XX=c("cyl", "disp")))
# mpg ~ cyl + disp
# <environment: 0x026d2b7c>
formula(lm(mpg ~ cyl + disp, data=mtcars))
# mpg ~ cyl + disp
formula(fun2(XX=c("cyl", "disp"))$call)
# mpg ~ cyl + disp
# <environment: 0x02c4ce2c>
## Then Gabor Grothendieck's idea
XX = c("cyl", "disp")
ff <- reformulate(response="mpg", termlabels=XX)
formula(do.call("lm", list(ff, quote(mtcars))))
## mpg ~ cyl + disp
To confirm that formula()
really is deriving its output from the terms
element of the fit object, have a look at stats:::formula.lm
and stats:::formula.terms
.
Create lme object within a function
I think what you're fighting is just a buggy implementation of contrast()
for lme
objects. I would contact the author to fix it (it may be a result of something recent changing with nlme
). But in the mean time you can avoid your side effects by implementing your workaround in the contrast.lme()
function rather than in your model construction function:
contrast.lme <- function(fit, ...) {
mC <- fit$call
mC$fixed <- formula(fit)
mC$data <- fit$data
fit$call <- mC
library(nlme)
contrast:::contrastCalc(fit, ...)
}
assignInNamespace("contrast.lme", contrast.lme, "contrast")
mm2 <- makeMixedModel2("y1")
contrast(mm2, list(x = 1))
Yields:
lme model parameter contrast
Contrast S.E. Lower Upper t df Pr(>|t|)
0.1613734 0.2169281 -0.2692255 0.5919722 0.74 96 0.4588
And:
print(mm2)
Yields:
Linear mixed-effects model fit by REML
Data: mdat
Log-restricted-likelihood: -136.2472
Fixed: mF
(Intercept) x
-0.1936347 0.3550081
Random effects:
Formula: ~1 | grp
(Intercept) Residual
StdDev: 0.131666 0.9365614
Number of Observations: 100
Number of Groups: 2
lme warning message because of random effects
The convergence warnings disappeared when I removed all data points <2. I stumbled over this by coincidence..
Probably this is somehow connected to the issue that for each subset within the range from 0 to about 50 all data points are almost exactly the same (and have values of about ~1).
Pasting object names into the glm function in R
The . in the formula tells R to use all variables in the data.frame data.set (except y) as predictors. This should do it:
glm( binary.outcome ~ ., family=binomial,
data=data.set )
Call: glm(formula = binary.outcome ~ ., family = binomial, data = data.set)
Coefficients:
(Intercept) varA varB varC
-0.4820 0.1878 -0.3974 -0.4566
Degrees of Freedom: 49 Total (i.e. Null); 46 Residual
Null Deviance: 66.41
Residual Deviance: 62.06 AIC: 70.06
and from ?formula
There are two special interpretations of . in a formula. The usual one
is in the context of a data argument of model fitting functions and
means ‘all columns not otherwise in the formula’: see terms.formula.
In the context of update.formula, only, it means ‘what was previously
in this part of the formula’.
Related Topics
Make Conditionalpanel Depend on Files Uploaded with Fileinput
Convert Column Classes in Data.Table
Create Categories by Comparing a Numeric Column with a Fixed Value
Evaluating Both Column Name and the Target Value Within 'J' Expression Within 'Data.Table'
Why Is 'Vapply' Safer Than 'Sapply'
Generate Random Numbers with Fixed Mean and Sd
How to Paste a String on Each Element of a Vector of Strings Using Apply in R
Convert Written Number to Number in R
Replace Negative Values by Zero
How to Add a General Label to Facets in Ggplot2
Perform a Semi-Join with Data.Table
R: Data.Table Cross-Join Not Working
How to Convert R Markdown to HTML? I.E., What Does "Knit HTML" Do in Rstudio 0.96
Converting Latitude and Longitude Points to Utm