How to do a regression of a series of variables without typing each variable name
Generate a formula by pasting column names first.
f <- as.formula(paste('garisktot ~', paste(colnames(HHdata)[20:43], collapse='+')))
modelAllHexSubscales <- lm(f, HHdata)
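Here is a self-contained sketch of the same idea. It uses the built-in mtcars data as an assumed stand-in for HHdata. It also selects predictors by name with setdiff() instead of by column position (20:43). That is a choice made here, not part of the original answer, and it keeps working if the column order changes.

```r
# Use every column of mtcars except the response as a predictor
response <- "mpg"
preds <- setdiff(colnames(mtcars), response)

# Paste the names into "mpg ~ cyl + disp + ..." and convert that string to a formula
f <- as.formula(paste(response, "~", paste(preds, collapse = " + ")))
fit <- lm(f, data = mtcars)
```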
How to succinctly write a formula with many variables from a data frame?
There is a special identifier that one can use in a formula to mean all the variables: the . (dot) identifier.
y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)
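The three-row d above has too few rows to estimate all four coefficients. A quick check on a subset of mtcars (an arbitrary choice) shows that the dot expands to the remaining columns and gives the same fit as writing them out:

```r
d2 <- mtcars[, c("mpg", "wt", "hp", "qsec")]

fit_dot      <- lm(mpg ~ ., data = d2)              # . = wt + hp + qsec
fit_explicit <- lm(mpg ~ wt + hp + qsec, data = d2)

# Same coefficients, in the same order
all.equal(coef(fit_dot), coef(fit_explicit))
# TRUE
```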
You can also do things like this, to use all variables but one (in this case x3 is excluded):
mod <- lm(y ~ . - x3, data = d)
Technically, . means all variables not already mentioned in the formula. For example,
lm(y ~ x1 * x2 + ., data = d)
where . would only reference x3, as x1 and x2 are already in the formula.
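You can see what the dot expands to without fitting anything by asking terms() for the term labels. This sketch rebuilds d from above. terms() lists main effects before interactions by default:

```r
y <- c(1, 4, 6)
d <- data.frame(y = y, x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# terms() expands "." against the columns of `data`
attr(terms(y ~ . - x3, data = d), "term.labels")
# -> "x1" "x2"
attr(terms(y ~ x1 * x2 + ., data = d), "term.labels")
# -> "x1" "x2" "x3" "x1:x2"
```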
R: Use string containing variable names in regression
I don't know a simple way to construct a formula argument other than the one you are rejecting. I considered update.formula but rejected it, since it would also have required as.formula. Here is an alternative way to achieve the same goal. It uses the "."-expansion feature of R formulas and relies on the ability of the [ function to accept a character vector for column selection:
# holiday_vars: a character vector holding the names of the extra columns
r_3 <- lm(log(assaults) ~ attend_v + year + month + . ,
          data = df[ , c('assaults', 'attend_v', 'year', 'month', holiday_vars)])
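Here is a runnable sketch of the same pattern on mtcars. extra_vars is a hypothetical stand-in for holiday_vars. The fixed predictors are written out, and the character vector only controls which extra columns the dot picks up. Because the dot means the columns not otherwise in the formula, wt is not added twice.

```r
extra_vars <- c("hp", "qsec")   # hypothetical stand-in for holiday_vars

# Subset the columns with a character vector; "." then adds the remaining extra columns
fit <- lm(mpg ~ wt + ., data = mtcars[, c("mpg", "wt", extra_vars)])
names(coef(fit))
# -> "(Intercept)" "wt" "hp" "qsec"
```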
Dynamic variable names in R regressions
Personally, I like to do this with some computing on the language. For me, a combination of bquote with eval is easiest (to remember).
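# var holds the predictor name as a string, e.g. "x1"
var <- as.symbol(var)
# bquote() splices the symbol into the call, so the printed call shows x1 itself
eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))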
#Call:
#lm(formula = y ~ x1 + x2, data = df2)
#
#Residuals:
# Min 1Q Median 3Q Max
#-0.49298 -0.26248 -0.00046 0.24111 0.51988
#
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.50244 0.02480 20.258 <2e-16 ***
#x1 -0.01468 0.03161 -0.464 0.643
#x2 -0.01635 0.03227 -0.507 0.612
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.2878 on 997 degrees of freedom
#Multiple R-squared: 0.0004708, Adjusted R-squared: -0.001534
#F-statistic: 0.2348 on 2 and 997 DF, p-value: 0.7908
I find this superior to any approach that doesn't show the same call as summary(lm(y ~ x1+x2, data=df2)).
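The same bquote()/eval() trick also works in a loop over several candidate predictors. Here is a sketch on mtcars, with an arbitrarily chosen predictor list. Each stored call shows the actual variable name:

```r
predictors <- c("wt", "hp", "qsec")

fits <- lapply(predictors, function(p) {
  p <- as.symbol(p)
  # bquote() splices the symbol into the call before eval() runs it
  eval(bquote(lm(mpg ~ .(p), data = mtcars)))
})
names(fits) <- predictors

fits$wt$call
# lm(formula = mpg ~ wt, data = mtcars)
```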
Is there a short cut to typing multiple explanatory variables in lm() in R?
You could use the dot sign to select all variables, and just use the minus sign to select those that should not be used as predictors.
lm(Sepal.Length ~ .-Species -Petal.Length, iris)
Call:
lm(formula = Sepal.Length ~ . - Species - Petal.Length, data = iris)
Coefficients:
(Intercept) Sepal.Width Petal.Width
3.4573 0.3991 0.9721
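Sometimes the columns to exclude are themselves stored in a character vector. One option, sketched here and not taken from the answer, is to build the right-hand side with setdiff() and reformulate(). It gives the same fit as the dot-minus formula above:

```r
exclude <- c("Species", "Petal.Length")   # columns to leave out
preds <- setdiff(names(iris), c("Sepal.Length", exclude))

fit_vec <- lm(reformulate(preds, response = "Sepal.Length"), data = iris)
fit_dot <- lm(Sepal.Length ~ . - Species - Petal.Length, data = iris)

all.equal(coef(fit_vec), coef(fit_dot))
# TRUE
```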
Regression with for-loop with changing variables
Construct the formula using sprintf/paste0:
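m_fit <- vector("list", length(names_pc))
for (i in seq_along(names_pc)){
  # build "value ~ year + group + group:<name>" for the i-th variable
  m <- lm(as.formula(sprintf('value ~ year + group + group:%s', names_pc[i])), data = dta)
  m_fit[[i]] <- fitted(m)   # store the fitted values
}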
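Here is a self-contained version of the loop, sketched on mtcars with hypothetical stand-ins: am plays the role of group, and a few numeric columns play the role of names_pc. It stores whole model objects rather than only fitted values, so coefficients and summaries are still available afterwards:

```r
names_pc <- c("wt", "hp", "qsec")   # hypothetical stand-ins for names_pc

m_fit <- vector("list", length(names_pc))
names(m_fit) <- names_pc
for (i in seq_along(names_pc)) {
  # e.g. "mpg ~ am + am:wt" in the first iteration
  f <- as.formula(sprintf("mpg ~ am + am:%s", names_pc[i]))
  m_fit[[i]] <- lm(f, data = mtcars)
}

fitted_vals <- lapply(m_fit, fitted)   # fitted values per model
```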
Using column name of dataframe as predictor variable in linear regression
You can use fit <- glm(as.formula(paste0("re78 ~ ", var1)), data=newData), as @akrun suggests. Further, you likely do not want to call your object glm.fit, as there is a function with the same name.
Caveat: I do not know why you have the double loop and the : in your code. Do you not want a regression with a single covariate? I have no idea what you are trying to achieve otherwise.
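If the goal is one regression per covariate, a sketch like the following would work. It runs on mtcars, with hypothetical covariate names standing in for var1. Each model is fitted with glm(), which uses the gaussian family by default, and only the slope is kept. The fit is stored under a plain name rather than glm.fit:

```r
covariates <- c("wt", "hp", "qsec")

slopes <- sapply(covariates, function(v) {
  fit <- glm(as.formula(paste0("mpg ~ ", v)), data = mtcars)
  coef(fit)[[v]]   # slope for this single covariate
})
slopes
```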
How to write a function that will run multiple regression models of the same type with different dependent variables and then store them as lists?
Consider reformulate to dynamically change model formulas using character values for lm calls:
# VECTOR OF COLUMN NAMES (NOT VALUES)
dep.vars <- c("dep.var1", "dep.var2")
# USER-DEFINED METHOD TO PROCESS DIFFERENT DEP VAR
run_model <- function(dep.var) {
  fml <- reformulate(c("x1", "x2"), dep.var)   # e.g. dep.var1 ~ x1 + x2
  lm(fml, data=data)   # `data` is a data frame defined outside the function
}
# NAMED LIST OF MODELS
all_models <- sapply(dep.vars, run_model, simplify = FALSE)
# OUTPUT RESULTS
all_models$dep.var1
all_models$dep.var2
# ...
From there, you can run further extractions or processes across model objects:
# NAMED LIST OF MODEL SUMMARIES
all_summaries <- lapply(all_models, summary)
all_summaries$dep.var1
all_summaries$dep.var2
# ...
# NAMED LIST OF MODEL COEFFICIENTS
all_coefficients <- lapply(all_models, coef)
all_coefficients$dep.var1
all_coefficients$dep.var2
# ...
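Here is a self-contained variant of the pattern above, sketched on mtcars. The data frame is passed in as an argument instead of being read from a global object named data. sapply(all_models, coef) then collects the coefficients into one matrix with one column per model:

```r
dep.vars <- c("mpg", "qsec")   # hypothetical dependent variables

run_model <- function(dep.var, data) {
  fml <- reformulate(c("wt", "hp"), response = dep.var)   # e.g. mpg ~ wt + hp
  lm(fml, data = data)
}

# Extra arguments after FUN are passed on to run_model()
all_models <- sapply(dep.vars, run_model, data = mtcars, simplify = FALSE)

coef_table <- sapply(all_models, coef)   # rows: terms, columns: models
coef_table
```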