Short Formula Call for Many Variables When Building a Model

short formula call for many variables when building a model

You can use . as described in the help page for formula. The . stands for "all columns not otherwise in the formula".

lm(output ~ ., data = myData).

Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:

xnam <- paste("x", 1:25, sep="")
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))

You can then insert this object into regression function: lm(fmla, data = myData).

How to succinctly write a formula with many variables from a data frame?

There is a special identifier that one can use in a formula to mean all the variables, it is the . identifier.

y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)

You can also do things like this, to use all variables but one (in this case x3 is excluded):

mod <- lm(y ~ . - x3, data = d)

Technically, . means all variables not already mentioned in the formula. For example

lm(y ~ x1 * x2 + ., data = d)

where . would only reference x3 as x1 and x2 are already in the formula.

Is there a short cut to typing multiple explanatory variables in lm() in R?

You could use the dot sign to select all variables, and just use the minus sign to select those that should not be used as predictors.

lm(Sepal.Length ~ .-Species -Petal.Length, iris)

Call:
lm(formula = Sepal.Length ~ . - Species - Petal.Length, data = iris)

Coefficients:
(Intercept) Sepal.Width Petal.Width
3.4573 0.3991 0.9721

Formula with dynamic number of variables

See ?as.formula, e.g.:

factors <- c("factor1", "factor2")
as.formula(paste("y~", paste(factors, collapse="+")))
# y ~ factor1 + factor2

where factors is a character vector containing the names of the factors you want to use in the model. This you can paste into an lm model, e.g.:

set.seed(0)
y <- rnorm(100)
factor1 <- rep(1:2, each=50)
factor2 <- rep(3:4, 50)
lm(as.formula(paste("y~", paste(factors, collapse="+"))))

# Call:
# lm(formula = as.formula(paste("y~", paste(factors, collapse = "+"))))

# Coefficients:
# (Intercept) factor1 factor2
# 0.542471 -0.002525 -0.147433

how to create a loop over a different set of variables and models in R

You may use nested lapply -

lapply(models, function(x) lapply(formulas, function(y) x(y, data = mtcars)))

How do I fit a model without specifying the number of variables?

Read ?formula:

There are two special interpretations
of ‘.’ in a formula. The usual one is
in the context of a ‘data’ argument of
model fitting functions and means ‘all
columns not otherwise in the formula’:
see ‘terms.formula’. In the context
of ‘update.formula’, only, it means
‘what was previously in this part of
the formula’.

Creating a loop through a list of variables for an LM model in R

You don't even have to use loops. Apply should work nicely.

training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))

Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))

allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))

If you need models summaries you can add :

allModelsSummaries = lapply(allModelsResults, summary) 

For example you can access the coefficient R² of the model lm(independent_variable ~ x1+x2+x3) by doing this:

allModelsSummaries[[1]]$r.squared

I hope it helps.



Related Topics



Leave a reply



Submit