How to succinctly write a formula with many variables from a data frame?
A formula can use the special identifier . to mean all the variables in the data frame.
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
You can also use all variables but one (here x3 is excluded):
mod <- lm(y ~ . - x3, data = d)
Technically, . means all variables not already mentioned in the formula. For example, in
lm(y ~ x1 * x2 + ., data = d)
the . would only reference x3, as x1 and x2 are already in the formula.
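One way to check what . expands to, without fitting a model, is to inspect the term labels that terms() produces for the formula. This is a minimal sketch that reuses the data frame d from above; dot_terms is a hypothetical helper name introduced here for illustration only.

```r
# Same data frame as in the example above
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# Hypothetical helper: list the right-hand-side terms after . is expanded
dot_terms <- function(formula, data) {
  attr(terms(formula, data = data), "term.labels")
}

all_terms  <- dot_terms(y ~ ., d)            # every column except the response
drop_terms <- dot_terms(y ~ . - x3, d)       # same, with x3 removed
mix_terms  <- dot_terms(y ~ x1 * x2 + ., d)  # x1, x2, their interaction, and x3

print(all_terms)
print(drop_terms)
print(mix_terms)
```

Because terms() only parses the formula against the column names, this works even when the data has too few rows to fit the model.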
Succinctly write a formula with many variables from a data frame with np package
The error comes from improper use of match.call in npplregbw.formula, which is invoked by npplregbw. The error is thrown in the first few lines of its code:
npplregbw.formula <- function (formula, data, subset, na.action, call, ...)
{
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "na.action"),
               names(mf), nomatch = 0)
    mf <- mf[c(1, m)]
    if (!missing(call) && is.call(call)) {
        for (i in 1:length(call)) {
            if (tryCatch(class(eval(call[[i]])) == "formula",
                         error = function(e) FALSE))
                break
        }
        mf[[2]] <- call[[i]]
    }
    mf.xf <- mf
    mf[[1]] <- as.name("model.frame")
    mf.xf[[1]] <- as.name("model.frame")
    chromoly <- explodePipe(mf[["formula"]])
    if (length(chromoly) != 3)
        stop("invoked with improper formula, please see npplregbw documentation for proper use")
    ...
}
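A small example shows the problem: match.call captures the argument as it was written in the call, so mf[["formula"]] is the unevaluated symbol fmla rather than the formula it refers to.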
foo <- function(formula){
    mf <- match.call(expand.dots = FALSE)
    mf[["formula"]]
}
foo(fmla)
# fmla  <=== output line
This is definitely worth reporting as an issue (opened here). The quick fix is the one given by Roland in the comments:
eval(bquote(np::npplregbw(formula = .(fmla), data=df)))
The better fix has to be done on the package end.
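The difference between the failing call and the workaround can be seen with the small foo function from above. This sketch assumes a placeholder formula y ~ x1 + x2 (not taken from the original question): called directly, foo sees only the symbol fmla, while bquote substitutes the formula itself into the call before it is evaluated.

```r
foo <- function(formula) {
  mf <- match.call(expand.dots = FALSE)
  mf[["formula"]]   # the argument exactly as it appears in the call
}

fmla <- y ~ x1 + x2   # placeholder formula, assumed for illustration

direct  <- foo(fmla)                    # the call contains the symbol `fmla`
spliced <- eval(bquote(foo(.(fmla))))   # the call contains the formula itself

print(is.name(direct))   # TRUE: only the variable name was captured
print(length(spliced))   # 3: `~`, left-hand side, right-hand side
```

This is why the length check on the formula fails inside the package when a variable holding the formula is passed, and why splicing the formula in with bquote avoids it.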
Short formula call for many variables when building a model
You can use . as described in the help page for formula. The . stands for "all columns not otherwise in the formula":
lm(output ~ ., data = myData)
Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:
xnam <- paste("x", 1:25, sep="")
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
You can then pass this object to a regression function: lm(fmla, data = myData).
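Base R also provides reformulate() in the stats package, which builds the same kind of formula from a character vector of term labels without pasting strings by hand. A minimal sketch, rebuilding the xnam vector from the example above:

```r
xnam <- paste0("x", 1:25)

# Build y ~ x1 + x2 + ... + x25 from the vector of term labels
fmla <- reformulate(xnam, response = "y")

print(fmla)
print(all.vars(fmla)[1:3])   # response first, then the predictors
```

The result is an ordinary formula object, so it can be used anywhere the pasted version could.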
R: How to make a for loop that performs a formula over a few columns (variables) in a new variable
A dplyr option:
library(dplyr)
df %>%
mutate(new = 0.5*X*1.2*Y+0.75*Z)
Output:
  X Y Z  new
1 1 3 5 5.55
2 4 2 2 6.30
3 2 5 1 6.75
Data:
df <- data.frame(X = c(1,4,2),
                 Y = c(3,2,5),
                 Z = c(5,2,1))
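The same column can be added in base R without dplyr. A minimal sketch using the data above: with() evaluates the expression inside the data frame, so whole columns are combined at once and no explicit for loop is needed.

```r
df <- data.frame(X = c(1, 4, 2),
                 Y = c(3, 2, 5),
                 Z = c(5, 2, 1))

# Vectorised arithmetic: the expression is applied to every row at once
df$new <- with(df, 0.5 * X * 1.2 * Y + 0.75 * Z)
print(df)
```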
Extract predictors from formulas
We may use all.vars on the formula:
all.vars(lm1$call[[2]])[3]
[1] "wt"
Or with get_all_vars:
names(get_all_vars(lm1$call$formula, mtcars))[3]
[1] "wt"
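The snippets above do not show how lm1 was fitted. As a self-contained sketch, assume a hypothetical model lm1 <- lm(mpg ~ cyl + wt, data = mtcars), chosen only so that the third variable is wt as in the output above. Dropping the first element of all.vars() then leaves just the predictors.

```r
# Hypothetical model; the original lm1 is not shown in the source
lm1 <- lm(mpg ~ cyl + wt, data = mtcars)

vars <- all.vars(formula(lm1))   # response first, then predictors
predictors <- vars[-1]           # drop the response

print(vars)
print(predictors)
```

Using formula(lm1) instead of lm1$call[[2]] is a design choice in this sketch: it does not depend on the position of the formula argument in the original call.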