How to Succinctly Write a Formula With Many Variables from a Data Frame

How to succinctly write a formula with many variables from a data frame?

Formulas accept a special identifier, ., that stands for all the other variables in the data frame:

y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)

You can also do things like this, to use all variables but one (in this case x3 is excluded):

mod <- lm(y ~ . - x3, data = d)

Technically, . means all variables not already mentioned in the formula. For example

lm(y ~ x1 * x2 + ., data = d)

where . would only reference x3 as x1 and x2 are already in the formula.
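One way to check what . expands to, without fitting a model, is to look at the term labels that terms() derives from the formula and the data. The following is a minimal sketch that rebuilds the data frame d from above; the helper name expand_dot is hypothetical and not part of any package.

```r
# Same data frame as in the example above
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3),
                x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# expand_dot is a hypothetical helper: it returns the right-hand-side
# terms of a formula after . has been expanded against the columns of data
expand_dot <- function(formula, data) {
  attr(terms(formula, data = data), "term.labels")
}

all_terms  <- expand_dot(y ~ ., d)            # "x1" "x2" "x3"
minus_x3   <- expand_dot(y ~ . - x3, d)       # "x1" "x2"
with_inter <- expand_dot(y ~ x1 * x2 + ., d)  # main effects plus x1:x2
```

Inspecting the terms this way can be a cheap sanity check before passing a formula with . to lm() on a wide data frame.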

Succinctly write a formula with many variables from a data frame with np package

The error comes from improper use of match.call in npplregbw.formula, which is invoked by npplregbw. The error is thrown in the first few lines of code:

npplregbw.formula <- function (formula, data, subset, na.action, call, ...)
{
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "na.action"),
               names(mf), nomatch = 0)
    mf <- mf[c(1, m)]
    if (!missing(call) && is.call(call)) {
        for (i in 1:length(call)) {
            if (tryCatch(class(eval(call[[i]])) == "formula",
                         error = function(e) FALSE))
                break
        }
        mf[[2]] <- call[[i]]
    }
    mf.xf <- mf
    mf[[1]] <- as.name("model.frame")
    mf.xf[[1]] <- as.name("model.frame")
    chromoly <- explodePipe(mf[["formula"]])
    if (length(chromoly) != 3)
        stop("invoked with improper formula, please see npplregbw documentation for proper use")
    ...
}

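A small example shows the problem: match.call captures the argument as it was written (the symbol fmla), not the formula that the symbol refers to:

foo <- function(formula){
  mf <- match.call(expand.dots = FALSE)
  mf[["formula"]]
}
foo(fmla)
fmla # <=== output line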

This is definitely something to report as an issue (opened here). The quick fix is the one given by Roland in the comments:

eval(bquote(np::npplregbw(formula = .(fmla), data=df)))

The better fix has to be done on the package end.
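To see why the bquote workaround helps, the toy function foo from the small example can be called both ways. This is only a sketch: foo stands in for the package function, and the formula y ~ x1 + x2 is an assumed placeholder. With a plain call, match.call records the symbol; with bquote, the formula object itself is spliced into the call before it is evaluated.

```r
# foo mimics the first lines of the package function (toy stand-in)
foo <- function(formula) {
  mf <- match.call(expand.dots = FALSE)
  mf[["formula"]]
}

fmla <- y ~ x1 + x2  # assumed example formula

# Plain call: match.call sees only the name of the variable
plain <- foo(fmla)

# bquote splices the value of fmla into the call, so match.call
# sees the formula expression itself
spliced <- eval(bquote(foo(formula = .(fmla))))

is.symbol(plain)  # TRUE: just the name fmla
is.call(spliced)  # TRUE: a three-part call (~, left side, right side)
```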

Short formula call for many variables when building a model

You can use . as described in the help page for formula. The . stands for "all columns not otherwise in the formula".

lm(output ~ ., data = myData)

Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:

xnam <- paste("x", 1:25, sep="")
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))

You can then pass this object to a regression function: lm(fmla, data = myData).
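As an alternative to pasting strings by hand, base R's reformulate() builds a formula from a character vector of term names. The sketch below assumes a small data frame d with a response column named y; both names are placeholders for illustration.

```r
# Assumed example data: y is the response, the rest are predictors
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3),
                x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# Every column except the response becomes a predictor
predictors <- setdiff(names(d), "y")

# reformulate() joins the term labels with + and adds the response
fmla <- reformulate(predictors, response = "y")
fmla  # y ~ x1 + x2 + x3

# Variables in the formula, in order of appearance
fmla_vars <- all.vars(fmla)
```

Because the predictor list is an ordinary character vector, it can be filtered (for example with setdiff or grep) before the formula is built.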

R: How to make a for loop that performs a formula over a few columns (variables) in a new variable

A dplyr option:

library(dplyr)
df %>%
  mutate(new = 0.5*X*1.2*Y+0.75*Z)

Output:

  X Y Z  new
1 1 3 5 5.55
2 4 2 2 6.30
3 2 5 1 6.75

Data

df <- data.frame(X = c(1,4,2),
                 Y = c(3,2,5),
                 Z = c(5,2,1))
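The same column can also be computed in base R without a loop, since arithmetic on data frame columns is vectorized. This is a sketch using the data above; with() is used only to avoid repeating df$.

```r
df <- data.frame(X = c(1, 4, 2),
                 Y = c(3, 2, 5),
                 Z = c(5, 2, 1))

# The formula is applied to all rows at once; no for loop is needed
df$new <- with(df, 0.5 * X * 1.2 * Y + 0.75 * Z)
df$new  # 5.55 6.30 6.75
```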

Extract predictors from formulas

We can apply all.vars to the formula stored in the model's call:

all.vars(lm1$call[[2]])[3]
[1] "wt"

Or with get_all_vars

names(get_all_vars(lm1$call$formula, mtcars))[3]
[1] "wt"
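The snippets above do not show how lm1 was fitted. As a self-contained sketch, assume a model such as lm(mpg ~ cyl + wt, data = mtcars); this model is an assumption for illustration, not necessarily the one from the original question. Dropping the first element of all.vars() then gives all predictors at once, and the term labels of the fitted model give the same result here.

```r
# Assumed model for illustration (mtcars ships with R)
lm1 <- lm(mpg ~ cyl + wt, data = mtcars)

# All variables in the formula, response first
vars <- all.vars(formula(lm1))  # "mpg" "cyl" "wt"

# Everything after the response is a predictor
predictors <- vars[-1]          # "cyl" "wt"

# Term labels from the fitted model; these match the predictors here
# because the formula has no interactions or transformations
term_labels <- attr(terms(lm1), "term.labels")
```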

