Short formula call for many variables when building a model
You can use the dot (.), as described in the help page for formula. The . stands for "all columns not otherwise in the formula":
lm(output ~ ., data = myData)
Alternatively, construct the formula manually with paste. This example is from the as.formula() help page:
xnam <- paste("x", 1:25, sep="")
(fmla <- as.formula(paste("y ~ ", paste(xnam, collapse= "+"))))
You can then pass this object to a regression function: lm(fmla, data = myData).
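As a minimal, self-contained sketch of the paste approach: the data frame below is simulated, and the names myData, x1 to x5, and y are illustrative assumptions rather than anything from the original answer.

```r
# Simulated data: five predictors x1..x5 and a response y (all hypothetical).
set.seed(1)
myData <- as.data.frame(matrix(rnorm(50 * 5), ncol = 5))
names(myData) <- paste0("x", 1:5)
myData$y <- 2 * myData$x1 - myData$x3 + rnorm(50)

# Build the string "y ~ x1+x2+x3+x4+x5", then convert it to a formula.
xnam <- paste0("x", 1:5)
fmla <- as.formula(paste("y ~", paste(xnam, collapse = "+")))

# Fit the model; the coefficient names follow the generated terms.
fit <- lm(fmla, data = myData)
names(coef(fit))
```

Because the formula is built from a character vector, changing the number of predictors only requires changing xnam.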
How to succinctly write a formula with many variables from a data frame?
There is a special identifier that one can use in a formula to mean all the variables: the . identifier.
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
You can also do things like this, to use all variables but one (in this case x3 is excluded):
mod <- lm(y ~ . - x3, data = d)
Technically, . means all variables not already mentioned in the formula. For example:
lm(y ~ x1 * x2 + ., data = d)
Here . would only reference x3, as x1 and x2 are already in the formula.
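One way to check what the dot expands to, without fitting anything, is to ask terms() for the term labels. This is a small sketch reusing the data frame d from the answer above; with only three rows, lm() could not estimate every coefficient here anyway, so inspecting the terms is more informative than the fit.

```r
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3),
                x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# terms() expands the dot against the columns of `data` without fitting a model.
expanded <- terms(y ~ x1 * x2 + ., data = d)
labels_used <- attr(expanded, "term.labels")
print(labels_used)

# x3 is the only column contributed by the dot;
# x1, x2 and their interaction x1:x2 come from x1 * x2.
```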
Is there a short cut to typing multiple explanatory variables in lm() in R?
You could use the dot to select all variables, and the minus sign to exclude those that should not be used as predictors.
lm(Sepal.Length ~ .-Species -Petal.Length, iris)
Call:
lm(formula = Sepal.Length ~ . - Species - Petal.Length, data = iris)
Coefficients:
(Intercept)  Sepal.Width  Petal.Width
     3.4573       0.3991       0.9721
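If the excluded columns are only known at run time, the same model can be built from character vectors. The following is a sketch, not part of the original answer: it uses setdiff() and reformulate() from base R and stats, and the variable names response, excluded, and predictors are illustrative.

```r
# Response and the columns to leave out of the predictor set.
response <- "Sepal.Length"
excluded <- c("Species", "Petal.Length")
predictors <- setdiff(names(iris), c(response, excluded))

# reformulate() builds a formula from a character vector of term labels.
fmla <- reformulate(predictors, response = response)

fit_dot  <- lm(Sepal.Length ~ . - Species - Petal.Length, data = iris)
fit_prog <- lm(fmla, data = iris)

# Both fits should give the same coefficients.
all.equal(coef(fit_dot), coef(fit_prog))
```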
Formula with dynamic number of variables
See ?as.formula, e.g.:
factors <- c("factor1", "factor2")
as.formula(paste("y~", paste(factors, collapse="+")))
# y ~ factor1 + factor2
where factors is a character vector containing the names of the factors you want to use in the model. You can pass the resulting formula to lm, e.g.:
set.seed(0)
y <- rnorm(100)
factor1 <- rep(1:2, each=50)
factor2 <- rep(3:4, 50)
lm(as.formula(paste("y~", paste(factors, collapse="+"))))
# Call:
# lm(formula = as.formula(paste("y~", paste(factors, collapse = "+"))))
# Coefficients:
# (Intercept)      factor1      factor2
#    0.542471    -0.002525    -0.147433
How to create a loop over a different set of variables and models in R
You may use nested lapply:
lapply(models, function(x) lapply(formulas, function(y) x(y, data = mtcars)))
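The one-liner above assumes that models and formulas already exist. A runnable sketch with hypothetical values for both (a named list of fitting functions and a list of formulas on mtcars) might look like this:

```r
# Hypothetical inputs: a named list of fitting functions and a list of formulas.
models <- list(lm = lm, glm = glm)
formulas <- list(mpg ~ wt, mpg ~ wt + hp)

# The outer lapply iterates over fitting functions, the inner one over formulas.
fits <- lapply(models, function(x) lapply(formulas, function(y) x(y, data = mtcars)))

# fits$lm[[2]] is the lm fit of mpg ~ wt + hp;
# fits$glm[[1]] is the glm fit of mpg ~ wt.
coef(fits$lm[[2]])
```

The result is a list of lists, indexed first by model name and then by formula position.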
How do I fit a model without specifying the number of variables?
Read ?formula:
There are two special interpretations of ‘.’ in a formula. The usual one is in the context of a ‘data’ argument of model fitting functions and means ‘all columns not otherwise in the formula’: see ‘terms.formula’. In the context of ‘update.formula’, only, it means ‘what was previously in this part of the formula’.
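The second interpretation, the one specific to update.formula, can be illustrated with a short sketch. The formula and the variable names f0, added, and dropped are made up for the example.

```r
f0 <- y ~ x1 + x2

# In update(), "." on each side stands for that side of the old formula.
added   <- update(f0, . ~ . + x3)   # keep the response, add x3
dropped <- update(f0, . ~ . - x2)   # keep the response, remove x2

print(added)
print(dropped)
```

No data frame is involved here: the dot refers to the previous formula, not to columns.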
Creating a loop through a list of variables for an LM model in R
You don't even have to use explicit loops; lapply should work nicely.
training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))
Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))
allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))
If you need model summaries, you can add:
allModelsSummaries = lapply(allModelsResults, summary)
For example, you can access the R² (coefficient of determination) of the model lm(independent_variable ~ x1+x2+x3) by doing this:
allModelsSummaries[[1]]$r.squared
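To compare all models at once rather than one index at a time, the R² values can be collected into a named vector. This is a self-contained sketch along the lines of the code above; the seed and the names rhs, fits, and r_squared are additions for illustration.

```r
set.seed(42)  # seed added here only so the sketch is reproducible
training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))

# Right-hand sides "x1+x2+x3" ... "x1+x2+x7", one model per entry.
rhs <- paste0("x1+x2+x", 3:7)
fits <- lapply(paste("independent_variable ~", rhs),
               function(f) lm(as.formula(f), data = training_data))

# sapply returns a plain numeric vector; name it by right-hand side.
r_squared <- sapply(fits, function(m) summary(m)$r.squared)
names(r_squared) <- rhs
r_squared
```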
I hope it helps.