Pass a Vector of Variables into lm() Formula

Pass a vector of variables into lm() formula

You're almost there. You just have to paste the entire formula together, something like this:

paste("roll_pct ~ ",b,sep = "")

Then coerce the string to an actual formula using as.formula() and pass that to lm(). Technically, I think lm() may coerce a character string itself, but coercing it yourself is generally safer: some functions that expect formulas won't do the coercion for you, while others will.
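As a minimal sketch of the whole sequence, assume b is a character vector of predictor names (the original question does not show it). The built-in mtcars data is used here, with mpg standing in for roll_pct:

```r
# Hypothetical predictor names; in the original question these live in `b`.
b <- c("wt", "hp", "qsec")

# Collapse the vector into one right-hand side, then prepend the response.
formula_string <- paste("mpg ~", paste(b, collapse = " + "))

# Coerce explicitly, then fit.
f <- as.formula(formula_string)
fit <- lm(f, data = mtcars)
coef(fit)
```

The collapse = " + " step matters: without it, paste() returns one string per predictor instead of a single formula string.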

Using a function parameter and passing it in to lm formula

Use reformulate().

f <- function(d, y) lm(reformulate(names(d)[grep("x", names(d))], response=y), data=d)

f(datasets::anscombe, "y1")
# Call:
# lm(formula = reformulate(names(d)[grep("x", names(d))], response = y),
#     data = d)
#
# Coefficients:
# (Intercept)           x1           x2           x3           x4
#     4.33291      0.45073           NA           NA     -0.09873
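The printed call above shows the reformulate() expression rather than the expanded formula. A small sketch of how the expanded formula can still be recovered from the fitted object with formula(), using the same function and data (grep(..., value = TRUE) is a stylistic substitution, not the original author's code):

```r
# Same idea as above; value = TRUE returns the matching names directly.
f <- function(d, y) {
  lm(reformulate(grep("x", names(d), value = TRUE), response = y), data = d)
}

fit <- f(datasets::anscombe, "y1")

# The expanded formula is rebuilt from the model's terms object.
formula(fit)
```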

Passing a character vector of variables into selection() formula

Wrap each of your paste() calls in as.formula():

selection(as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))), 
as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data)


Call:
selection(selection = as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))), outcome = as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data = data)

Coefficients:
S:(Intercept) S:x1 S:x2 S:x3 S:x4 O:(Intercept) O:x3 O:x4 O:x5 sigma
-1.936e-01 -5.851e-05 7.020e-05 5.475e-05 2.811e-05 2.905e+02 2.286e-01 2.437e-01 2.165e-01 4.083e+02
rho
1.000e+00
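If the long call printed above is a nuisance, one option is to build both formulas first and pass the resulting objects. This is only a sketch: x_vars is assumed to be a character vector such as x1 to x5 (the question does not show it), reformulate() is used in place of paste(), and the selection() call is left as a comment because it needs the package and data from the question.

```r
# Hypothetical variable names standing in for the asker's x_vars.
x_vars <- paste0("x", 1:5)

# Build the two formulas up front.
selection_formula <- reformulate(x_vars[1:4], response = "y_prob")
outcome_formula   <- reformulate(x_vars[3:5], response = "y")

selection_formula
outcome_formula

# Then, with the package and data from the question available:
# selection(selection_formula, outcome_formula, data)
```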

Dynamically update formula with vector under R 4.0.0 and higher

What you need to do now is to manipulate your string so that you obtain: "~ . + x1 + x2".

myvar2 <- c("x1", "x2")

formula_update <- paste(
"~ . +",
paste(myvar2, collapse = " + ")
)

formula_update
[1] "~ . + x1 + x2"

update(y ~ 1, formula_update)
y ~ x1 + x2
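The same update string can also be applied to an already fitted model. A minimal sketch, assuming the built-in mtcars data and hypothetical variable choices (wt, hp, qsec) that are not part of the original question:

```r
# Variables to add, held in a character vector.
myvar2 <- c("hp", "qsec")

# Start from a small fitted model.
fit0 <- lm(mpg ~ wt, data = mtcars)

# "~ . + hp + qsec": the dot keeps everything already on the right-hand side.
formula_update <- as.formula(paste("~ . +", paste(myvar2, collapse = " + ")))

# update() refits the model with the extended formula.
fit1 <- update(fit0, formula_update)
formula(fit1)
```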

How to dynamically name variables in formula in lm() function?

The core issue to understand here is that lm() takes an object of class formula as its first argument, and that formula specifies the regression.

You've created a vector of strings (characters), but R won't dynamically generate the formula for you in the function call. Being able to just type variable names as a formula is a convenience, but it is not practical when you are trying to be dynamic.

To simplify your example, start with:

y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))

df <- as.data.frame(cbind(y1,x1,x2,x3))

predictors = c("x1", "x2", "x3")

Now you can dynamically create a formula as a concatenated string (paste0) and convert it to a formula. Then pass this formula to your lm() call:

form1 = as.formula(paste0("y1~", predictors[1]))

lm(form1, data = df)

As akrun pointed out, you can then start doing things like create loops to dynamically generate these.
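Such a loop might look like the following sketch, which fits one single-predictor model per entry of predictors. The data frame mirrors the simplified example above; the seed and the names fits and slopes are additions for illustration only.

```r
set.seed(1)  # only so the sketch is reproducible
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# One model per predictor: y1 ~ x1, y1 ~ x2, y1 ~ x3.
fits <- lapply(predictors, function(p) {
  lm(as.formula(paste0("y1 ~ ", p)), data = df)
})
names(fits) <- predictors

# Pull the slope of each model into a named numeric vector.
slopes <- vapply(predictors, function(p) unname(coef(fits[[p]])[p]), numeric(1))
slopes
```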

You can also do things like:

my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))

## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)

See also: Formula with dynamic number of variables

One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:

reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
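To illustrate the backtick requirement from the help text, here is a small sketch with made-up column names that contain spaces and parentheses. The data and names are hypothetical; sprintf() is just one way to add the backticks.

```r
set.seed(42)  # only so the sketch is reproducible

# check.names = FALSE keeps the non-syntactic column names as they are.
d <- data.frame(
  y = rnorm(20),
  `body mass` = rnorm(20),
  `age (years)` = rnorm(20),
  check.names = FALSE
)

# Protect each term with backticks before handing it to reformulate().
term_names <- setdiff(names(d), "y")
f <- reformulate(sprintf("`%s`", term_names), response = "y")
f

fit <- lm(f, data = d)
```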

How to pass weights in lm object as a variable from outside the function and refer the column name as weight in the model form?

model$call stores the arguments of lm() unevaluated, exactly as they were written in the call. As a result, model$call[[4]] shows the uninformative expression xdata[, weight_var], and model$call[[2]] holds the name eqmodel instead of the formula itself. Below is a trick that makes the printed call a little more readable.

x <-c(rnorm(10),NA)
df <- data.frame(y=1+2*x+rnorm(11)/2, x=x, wght1=1:11)

## Fancy weights as numeric vector

df$weight <- (df$wght1)^(3/4)
weight_var <- "weight"
eqmodel <- as.formula("y~x")
xdata <- df

### unprocessed:
if (weight_var[[1]]=='') {
model <- lm(formula = eqmodel, xdata)
} else {
model <- lm(formula = eqmodel, xdata, weights = xdata[,weight_var])
}
summary(model)
#Call:
#lm(formula = eqmodel, data = xdata, weights = xdata[, weight_var])

### a little trick:
if (weight_var[[1]]=='') {
model <- lm(formula = eqmodel, xdata)
} else {
model <- lm(formula = eqmodel, xdata, weights = xdata[,weight_var])
model$call[[4]] <- weight_var[[1]]
}
model$call[[2]] <- eqmodel
summary(model)

#Call:
#lm(formula = y ~ x, data = xdata, weights = "weight")
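Overwriting model$call changes only what is printed, and a call containing weights = "weight" as a string may not re-evaluate cleanly later (for example through update()). An alternative sketch, not the original author's method, substitutes the formula and the weight column name into the call before it is evaluated, using bquote(); the seed is added only for reproducibility.

```r
set.seed(123)
x <- c(rnorm(10), NA)
df <- data.frame(y = 1 + 2 * x + rnorm(11) / 2, x = x, wght1 = 1:11)
df$weight <- df$wght1^(3/4)

weight_var <- "weight"
eqmodel <- as.formula("y ~ x")
xdata <- df

# .() splices in the formula and the weight column as a symbol, so lm()
# records them directly and looks the weights up inside `xdata`.
model <- eval(bquote(
  lm(.(eqmodel), data = xdata, weights = .(as.name(weight_var)))
))

model$call
```

Because the weights are looked up by column name in the data, the fitted coefficients should match those from weights = xdata[, weight_var].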

Is there a way to pass formulas for lm from one vector and get set of R2 in another vector (without using loop)?

You might be looking for something like:

R_square <- sapply(formulas, 
function(x) summary(lm(x, data = mtcars))$r.squared)

> R_square
 mpg~ cyl mpg~ disp   mpg~ hp mpg~ drat   mpg~ wt mpg~ qsec   mpg~ vs
0.7261800 0.7183433 0.6024373 0.4639952 0.7528328 0.1752963 0.4409477
  mpg~ am mpg~ gear mpg~ carb
0.3597989 0.2306734 0.3035184
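The snippet above does not show how formulas was built. One plausible construction, inferred from the names in the output and therefore an assumption, is a character vector with one string per predictor. The sketch below also swaps sapply() for vapply() so the result type is checked:

```r
# Every column except the response becomes a one-predictor formula string.
predictors <- setdiff(names(mtcars), "mpg")
formulas <- paste0("mpg~ ", predictors)

# Fit each model and extract R-squared; names are taken from `formulas`.
R_square <- vapply(
  formulas,
  function(f) summary(lm(as.formula(f), data = mtcars))$r.squared,
  numeric(1)
)
R_square
```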

