Pass a vector of variables into lm() formula
You're almost there. You just have to paste the entire formula together, something like this:
paste("roll_pct ~ ",b,sep = "")
Then coerce it to an actual formula using as.formula and pass that to lm. Technically, I think lm may coerce a character string itself, but coercing it yourself is generally safer. (Some functions that expect formulas won't do the coercion for you; others will.)
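As a minimal sketch of the paste-then-coerce approach described above, the following uses the built-in mtcars data; the response mpg and the predictor names in vars are assumptions chosen for illustration, not the variables from the original question.

```r
# Hypothetical predictor names; any character vector of column names works.
vars <- c("wt", "hp")

# Build the formula text, e.g. "mpg ~ wt + hp", then coerce it explicitly.
formula_text <- paste("mpg ~", paste(vars, collapse = " + "))
f <- as.formula(formula_text)

# Pass the formula object (not the string) to lm().
fit <- lm(f, data = mtcars)
print(f)
print(coef(fit))
```

Collapsing the vector with " + " first matters: pasting an uncollapsed vector onto the response would produce one string per variable rather than a single formula.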
Using a function parameter and passing it in to lm formula
Use reformulate().
f <- function(d, y) lm(reformulate(names(d)[grep("x", names(d))], response=y), data=d)
f(datasets::anscombe, "y1")
# Call:
# lm(formula = reformulate(names(d)[grep("x", names(d))], response = y),
# data = d)
#
# Coefficients:
# (Intercept) x1 x2 x3 x4
# 4.33291 0.45073 NA NA -0.09873
Passing a character vector of variables into selection() formula
Wrap your paste calls with as.formula:
selection(as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))),
as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data)
Call:
selection(selection = as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))), outcome = as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data = data)
Coefficients:
S:(Intercept) S:x1 S:x2 S:x3 S:x4 O:(Intercept) O:x3 O:x4 O:x5 sigma
-1.936e-01 -5.851e-05 7.020e-05 5.475e-05 2.811e-05 2.905e+02 2.286e-01 2.437e-01 2.165e-01 4.083e+02
rho
1.000e+00
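The same two formulas could probably also be built with reformulate() instead of nested paste calls. The sketch below only constructs the formulas and does not call selection(); the contents of x_vars are an assumption inferred from the coefficient names in the output above.

```r
# Assumed variable names, inferred from the printed coefficients (x1 ... x5).
x_vars <- paste0("x", 1:5)

# Selection equation: y_prob ~ x1 + x2 + x3 + x4
selection_formula <- reformulate(x_vars[1:4], response = "y_prob")

# Outcome equation: y ~ x3 + x4 + x5
outcome_formula <- reformulate(x_vars[3:5], response = "y")

print(selection_formula)
print(outcome_formula)
```

Both objects are ordinary formulas, so they can be passed wherever the as.formula(paste(...)) results were used.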
Dynamically update formula with vector under R 4.0.0 and higher
What you need to do now is to manipulate your string so that you obtain: "~ . + x1 + x2".
myvar2 <- c("x1", "x2")
formula_update <- paste(
"~ . +",
paste(myvar2, collapse = " + ")
)
formula_update
[1] "~ . + x1 + x2"
update(y ~ 1, formula_update)
y ~ x1 + x2
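If you would rather hand update() a formula object than a string, one option is to build the right-hand side with reformulate(), treating "." as just another term label. This is a sketch of that variation, not part of the original answer; the name rhs_update is made up for illustration.

```r
myvar2 <- c("x1", "x2")

# "." stands for "whatever is already on this side of the old formula".
rhs_update <- reformulate(c(".", myvar2))

# update() substitutes the old right-hand side for "." and simplifies.
new_formula <- update(y ~ 1, rhs_update)
print(new_formula)
```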
How to dynamically name variables in formula in lm() function?
The core issue to understand here is that lm() takes an object of class formula as its first parameter, which specifies the regression.
You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. The ability to just type variable names as a formula is a convenience, but it is not practical when you are attempting to be dynamic.
To simplify your example, start with:
y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))
df <- as.data.frame(cbind(y1,x1,x2,x3))
predictors = c("x1", "x2", "x3")
Now you can dynamically create a formula as a concatenated string (paste0) and convert it to a formula. Then pass this formula to your lm() call:
form1 = as.formula(paste0("y1~", predictors[1]))
lm(form1, data = df)
As akrun pointed out, you can then start doing things like create loops to dynamically generate these.
You can also do things like:
my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))
## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)
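To illustrate the kind of loop mentioned above, here is a rough sketch that fits a sequence of nested models (y1 ~ x1, then y1 ~ x1 + x2, and so on). The data are regenerated with a seed so the snippet stands alone, and the name fits is an assumption for illustration.

```r
set.seed(42)
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# For k = 1, 2, 3 use the first k predictors on the right-hand side.
fits <- lapply(seq_along(predictors), function(k) {
  rhs <- paste(predictors[seq_len(k)], collapse = " + ")
  lm(as.formula(paste("y1 ~", rhs)), data = df)
})

# Number of estimated coefficients per model (intercept included).
n_coef <- vapply(fits, function(m) length(coef(m)), integer(1))
print(n_coef)
```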
See also: Formula with dynamic number of variables
One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:
reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
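A small sketch of what the help text describes, including a non-syntactic column name protected with backticks. The data frame d and the column name "body mass" are invented for illustration.

```r
# Toy data with one column whose name contains a space.
d <- data.frame(y1 = c(1.2, 2.1, 2.9, 4.2, 5.1), x1 = 1:5)
d[["body mass"]] <- c(2.0, 1.5, 3.1, 2.7, 4.0)

# Term labels are joined with "+"; the non-syntactic name needs backticks.
term_labels <- c("x1", "`body mass`")
f <- reformulate(term_labels, response = "y1")
print(f)

fit <- lm(f, data = d)
print(coef(fit))
```

Compared with paste0 plus as.formula, this avoids assembling the formula text by hand, though backticks still have to be added for names that are not syntactically valid.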
How to pass weights in lm object as a variable from outside the function and refer the column name as weight in the model form?
Updated
model$call[[i]] returns the arguments of the lm() call literally, exactly as they were written, so not only does model$call[[4]] look uninformative, but model$call[[2]] also returns the name of the formula object instead of the formula itself. Below is a trick to improve this a little.
x <-c(rnorm(10),NA)
df <- data.frame(y=1+2*x+rnorm(11)/2, x=x, wght1=1:11)
## Fancy weights as numeric vector
df$weight <- (df$wght1)^(3/4)
weight_var <- "weight"
eqmodel <- as.formula("y~x")
xdata <- df
### unprocessed:
if (weight_var[[1]]=='') {
model <- lm(formula = eqmodel, xdata)
} else {
model <- lm(formula = eqmodel, xdata, weights = xdata[,weight_var])
}
summary(model)
#Call:
#lm(formula = eqmodel, data = xdata, weights = xdata[, weight_var])
### a little trick:
if (weight_var[[1]]=='') {
model <- lm(formula = eqmodel, xdata)
} else {
model <- lm(formula = eqmodel, xdata, weights = xdata[,weight_var])
model$call[[4]] <- weight_var[[1]]
}
model$call[[2]] <- eqmodel
summary(model)
#Call:
#lm(formula = y ~ x, data = xdata, weights = "weight")
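An alternative to patching model$call afterwards is to construct the call with the desired pieces already substituted in, using bquote() and eval(). This is a sketch of that idea rather than the original answer's method; the seed is added only to make the example reproducible.

```r
set.seed(1)
x <- c(rnorm(10), NA)
xdata <- data.frame(y = 1 + 2 * x + rnorm(11) / 2, x = x, wght1 = 1:11)
xdata$weight <- xdata$wght1^(3/4)

weight_var <- "weight"
eqmodel <- y ~ x

if (nzchar(weight_var)) {
  # .(...) splices in the formula itself and the weight column as a symbol,
  # so lm() records them in the call and looks up the weights inside xdata.
  model <- eval(bquote(
    lm(.(eqmodel), data = xdata, weights = .(as.name(weight_var)))
  ))
} else {
  model <- eval(bquote(lm(.(eqmodel), data = xdata)))
}
print(model$call)
```

Here the weights entry of the stored call is the column name as a symbol rather than a quoted string, and the fitted coefficients should match those obtained with weights = xdata[, weight_var].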
Is there a way to pass formulas for lm from one vector and get set of R2 in another vector (without using loop)?
You might be looking for something like:
R_square <- sapply(formulas,
function(x) summary(lm(x, data = mtcars))$r.squared)
> R_square
mpg~ cyl mpg~ disp mpg~ hp mpg~ drat mpg~ wt mpg~ qsec mpg~ vs
0.7261800 0.7183433 0.6024373 0.4639952 0.7528328 0.1752963 0.4409477
mpg~ am mpg~ gear mpg~ carb
0.3597989 0.2306734 0.3035184
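The formulas vector is not shown in the answer. A plausible way to build it, assuming one single-predictor formula per remaining mtcars column, is sketched below; vapply is used instead of sapply only to pin the return type, and the strings are coerced with as.formula explicitly.

```r
# Assumed construction: one "mpg ~ <column>" string per non-response column.
predictors <- setdiff(names(mtcars), "mpg")
formulas <- paste("mpg ~", predictors)

# Fit each model and extract R-squared; names come from the formula strings.
R_square <- vapply(
  formulas,
  function(f) summary(lm(as.formula(f), data = mtcars))$r.squared,
  numeric(1)
)
print(round(R_square, 4))
```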