How to Do a Regression of a Series of Variables Without Typing Each Variable Name

Build the formula first by pasting the column names together:

# Join the names of columns 20 to 43 with "+" and prepend the response
f <- as.formula(paste('garisktot ~', paste(colnames(HHdata)[20:43], collapse = '+')))
modelAllHexSubscales <- lm(f, data = HHdata)
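A self-contained sketch of the same idea, assuming the built-in mtcars data as a stand-in for HHdata (its columns 2 to 4 play the role of columns 20 to 43). It also builds the same formula with reformulate(), which takes the predictor names and the response name directly instead of a pasted string:

```r
data(mtcars)

# Columns 2:4 of mtcars are cyl, disp and hp; they stand in for HHdata[20:43]
predictors <- colnames(mtcars)[2:4]

# Paste the names into "mpg ~ cyl + disp + hp" and convert it to a formula
f <- as.formula(paste("mpg ~", paste(predictors, collapse = " + ")))

# reformulate() builds the same formula without string pasting
f2 <- reformulate(predictors, response = "mpg")

fit <- lm(f, data = mtcars)
print(f)
print(coef(fit))
```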

How to succinctly write a formula with many variables from a data frame?

There is a special identifier you can use in a formula to mean all the variables: the . (dot), which stands for every column of the data frame other than the response.

y <- c(1,4,6)
d <- data.frame(y = y, x1 = c(4,-1,3), x2 = c(3,9,8), x3 = c(4,-4,-2))
mod <- lm(y ~ ., data = d)

You can also use all variables but one (here x3 is excluded):

mod <- lm(y ~ . - x3, data = d)

Technically, . means all variables not already mentioned in the formula. For example:

lm(y ~ x1 * x2 + ., data = d)

Here . refers only to x3, because x1 and x2 already appear in the formula.
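One way to check which columns the dot picks up, without fitting anything, is to ask terms() to expand each formula against the data frame d from the example above (recreated here so the snippet runs on its own). labels_of is a helper name chosen for this sketch:

```r
d <- data.frame(y = c(1, 4, 6), x1 = c(4, -1, 3), x2 = c(3, 9, 8), x3 = c(4, -4, -2))

# terms() expands "." using the column names of `data`
labels_of <- function(fml) attr(terms(fml, data = d), "term.labels")

all_vars   <- labels_of(y ~ .)            # every column except the response y
all_but_x3 <- labels_of(y ~ . - x3)       # x3 removed
with_inter <- labels_of(y ~ x1 * x2 + .)  # x1 and x2 are not duplicated

print(all_vars)
print(all_but_x3)
print(with_inter)
```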

R: Use string containing variable names in regression

I don't know a simple way to construct the formula argument other than the one you are rejecting (I considered update.formula, but rejected it because it would also have required as.formula), but here is an alternative way to achieve the same goal. It uses the "." expansion feature of R formulas and relies on the [ function accepting a character vector for column selection:

# holiday_vars is a character vector of column names
r_3 <- lm(log(assaults) ~ attend_v + year + month + .,
          data = df[ , c('assaults', 'attend_v', 'year', 'month', holiday_vars)])
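A runnable sketch of this subsetting approach on the built-in mtcars data. Here extra_vars is a hypothetical stand-in for holiday_vars, and mpg plays the role of assaults. Because mpg appears on the left-hand side (inside log()), the dot does not add it back as a predictor:

```r
data(mtcars)

extra_vars <- c("wt", "qsec")   # hypothetical stand-in for holiday_vars

# Keep only the response, the fixed predictor and the extra columns;
# "." then expands to the remaining columns, excluding the response
sub <- mtcars[ , c("mpg", "hp", extra_vars)]
fit <- lm(log(mpg) ~ hp + ., data = sub)

print(attr(terms(fit), "term.labels"))
```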

Dynamic variable names in R regressions

Personally, I like to do this with some computing on the language. For me, combining bquote with eval is the easiest approach (to remember).

# var holds the column name as a string, here "x1"
var <- as.symbol(var)
eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))
#Call:
#lm(formula = y ~ x1 + x2, data = df2)
#
#Residuals:
# Min 1Q Median 3Q Max
#-0.49298 -0.26248 -0.00046 0.24111 0.51988
#
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.50244 0.02480 20.258 <2e-16 ***
#x1 -0.01468 0.03161 -0.464 0.643
#x2 -0.01635 0.03227 -0.507 0.612
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.2878 on 997 degrees of freedom
#Multiple R-squared: 0.0004708, Adjusted R-squared: -0.001534
#F-statistic: 0.2348 on 2 and 997 DF, p-value: 0.7908

I find this superior to any approach that doesn't show the same call as summary(lm(y ~ x1+x2, data=df2)).
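A self-contained sketch of the bquote() approach, with simulated data standing in for df2 (the values are random and only there so the code runs). For contrast, it also fits the model from a formula object built with reformulate(); that call records only the variable name f, not the formula itself:

```r
set.seed(42)
# Simulated stand-in for df2
df2 <- data.frame(y = runif(100), x1 = runif(100), x2 = runif(100))

var <- as.symbol("x1")

# bquote() substitutes .(var) before evaluation, so lm() sees y ~ x1 + x2
fit_bq <- eval(bquote(lm(y ~ .(var) + x2, data = df2)))
print(fit_bq$call)

# Passing a prebuilt formula object stores only the name `f` in the call
f <- reformulate(c("x1", "x2"), response = "y")
fit_f <- lm(f, data = df2)
print(fit_f$call)
```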

Is there a short cut to typing multiple explanatory variables in lm() in R?

You can use the dot to select all variables, then use the minus sign to drop those that should not be used as predictors.

lm(Sepal.Length ~ .-Species -Petal.Length, iris)

Call:
lm(formula = Sepal.Length ~ . - Species - Petal.Length, data = iris)

Coefficients:
(Intercept) Sepal.Width Petal.Width
3.4573 0.3991 0.9721
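When the columns to exclude are themselves held in a character vector, one option is to compute the remaining predictor names with setdiff() and pass them to reformulate(). A sketch on iris (drop_vars and response are hypothetical names); it should give the same coefficients as the dot-minus formula above:

```r
data(iris)

drop_vars <- c("Species", "Petal.Length")   # hypothetical exclusion list
response  <- "Sepal.Length"

# All columns except the response and the excluded ones
preds <- setdiff(names(iris), c(response, drop_vars))
f <- reformulate(preds, response = response)

fit_names <- lm(f, data = iris)
fit_dot   <- lm(Sepal.Length ~ . - Species - Petal.Length, data = iris)

print(f)
print(coef(fit_names))
```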

Regression with for-loop with changing variables

Construct the formula using sprintf() or paste0():

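# One model per name in names_pc; keep the fitted values of each
m_fit <- vector("list", length(names_pc))

for (i in seq_along(names_pc)) {
  f <- as.formula(sprintf('value ~ year + group + group:%s', names_pc[i]))
  m <- lm(f, data = dta)
  m_fit[[i]] <- fitted(m)  # fitted values, i.e. m$fitted.values
}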
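A self-contained version of the loop on mtcars. The names here are hypothetical stand-ins: am (converted to a factor) plays the role of group, and each name in names_pc enters only through its interaction with am. The list is named so each element can be looked up by predictor:

```r
data(mtcars)
mtcars$am <- factor(mtcars$am)   # treat transmission type as the grouping factor

names_pc <- c("wt", "hp", "disp")  # hypothetical predictor names

m_fit <- setNames(vector("list", length(names_pc)), names_pc)

for (v in names_pc) {
  # e.g. "mpg ~ am + am:wt" on the first pass
  f <- as.formula(sprintf("mpg ~ am + am:%s", v))
  m_fit[[v]] <- fitted(lm(f, data = mtcars))
}

str(m_fit)
```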

Using column name of dataframe as predictor variable in linear regression

You can use fit <- glm(as.formula(paste0("re78 ~ ", var1)), data=newData) as @akrun suggests. Also, you probably do not want to call your object glm.fit, because there is already a function with that name.

Caveat: I do not know why you have the double loop and the :. Don't you want a regression with a single covariate? Otherwise, I have no idea what you are trying to achieve.
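A sketch of fitting one single-covariate model per column name, using mtcars in place of newData and mpg in place of re78 (both substitutions are assumptions for illustration; covariates and fits are hypothetical names). With the default gaussian family, glm() should give the same coefficients as lm():

```r
data(mtcars)

covariates <- c("wt", "hp", "qsec")   # hypothetical candidate predictors

# One glm per covariate; the result is called `fits` rather than glm.fit,
# which is already the name of a function in stats
fits <- lapply(covariates, function(var1) {
  glm(as.formula(paste0("mpg ~ ", var1)), data = mtcars)
})
names(fits) <- covariates

# Slope of each single-covariate model
print(sapply(fits, function(m) coef(m)[[2]]))
```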

How to write a function that will run multiple regression models of the same type with different dependent variables and then store them as lists?

Consider reformulate to build model formulas dynamically from character values for lm calls:

# VECTOR OF COLUMN NAMES (NOT VALUES)
dep.vars <- c("dep.var1", "dep.var2")

# USER-DEFINED METHOD TO PROCESS DIFFERENT DEP VAR
run_model <- function(dep.var) {
  fml <- reformulate(c("x1", "x2"), dep.var)
  lm(fml, data=data)
}

# NAMED LIST OF MODELS
all_models <- sapply(dep.vars, run_model, simplify = FALSE)

# OUTPUT RESULTS
all_models$dep.var1
all_models$dep.var2
...

From there, you can run further extractions or processes across model objects:

# NAMED LIST OF MODEL SUMMARIES
all_summaries <- lapply(all_models, summary)

all_summaries$dep.var1
all_summaries$dep.var2
...

# NAMED LIST OF MODEL COEFFICIENTS
all_coefficients <- lapply(all_models, coef)

all_coefficients$dep.var1
all_coefficients$dep.var2
...
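A self-contained sketch of the reformulate() pattern on mtcars, where mpg and qsec stand in for dep.var1 and dep.var2, and wt and hp for x1 and x2 (all assumptions for illustration). Passing the data frame as an argument (dat, a name chosen for this sketch) avoids relying on a global object called data:

```r
data(mtcars)

dep.vars <- c("mpg", "qsec")   # stand-ins for dep.var1 and dep.var2

# Fit one model per response; the data frame is passed in explicitly
run_model <- function(dep.var, dat) {
  fml <- reformulate(c("wt", "hp"), response = dep.var)
  lm(fml, data = dat)
}

# sapply(..., simplify = FALSE) keeps a list and names it by dep.vars
all_models <- sapply(dep.vars, run_model, dat = mtcars, simplify = FALSE)

# coef() extracts each coefficient vector; sapply binds them into a matrix
coef_matrix <- sapply(all_models, coef)
print(coef_matrix)
```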

