R Loop for Variable Names to Run Linear Regression Model

OK, I'll post an answer. I will use the dataset mtcars as an example; I believe it will work with your dataset too.

First, I create a store, lm.test, an object of class list. In your code, you assign the output of lm(.) to the same object every time through the loop, so in the end you would have only the last model; all the earlier ones would have been overwritten by the newer ones.

Then, inside the loop, I use function reformulate to put together the regression formula. There are other ways of doing this but this one is simple.

# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))

for (i in seq_along(col10)) {
  lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

lm.test

Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.

For instance, to extract the coefficients:

cfs <- lapply(lm.test, coef)

In order to get the summaries:

smry <- lapply(lm.test, summary)

It becomes very simple once you're familiar with *apply functions.
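As a minimal sketch of the lapply idea, the loop and the extraction step can be folded together. The names predictors, fits, slopes and r2 are hypothetical, and the predictor set is just the mtcars subset used in the example above:

```r
# Assumption: the same five mtcars predictors as in the loop example.
predictors <- c("cyl", "disp", "hp", "drat", "wt")

# Named list of models: one simple regression of mpg on each predictor.
fits <- lapply(setNames(predictors, predictors), function(v) {
  lm(reformulate(v, response = "mpg"), data = mtcars)
})

# sapply() simplifies the results to named numeric vectors.
slopes <- sapply(fits, function(m) coef(m)[[2]])
r2     <- sapply(fits, function(m) summary(m)$r.squared)

# One row per predictor.
data.frame(predictor = names(fits), slope = slopes, r.squared = r2,
           row.names = NULL)
```

Naming the list up front means every later lapply or sapply result carries the predictor names along with it.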

Is there a way to loop through column names (not numbers) in r for linear models?

If you want the statistics in a table (which might come in handy) you can use the purrr and broom packages. Here's an example using the dataset mtcars:

Code

library(tidyr)
library(purrr)
library(broom)

formula <- lapply(colnames(mtcars)[3:ncol(mtcars)], function(x) as.formula(paste0(x, " ~ cyl")))

names(formula) <- format(formula)

table <- formula %>% map(~aov(.x, mtcars)) %>% map_dfr(tidy, .id="model")

Output

> head(table)
# A tibble: 6 x 7
  model      term         df     sumsq     meansq statistic  p.value
  <chr>      <chr>     <dbl>     <dbl>      <dbl>     <dbl>    <dbl>
1 disp ~ cyl cyl           1 387454.   387454.        131.  1.80e-12
2 disp ~ cyl Residuals    30  88731.     2958.         NA   NA
3 hp ~ cyl   cyl           1 100984.   100984.         67.7 3.48e- 9
4 hp ~ cyl   Residuals    30  44743.     1491.         NA   NA
5 drat ~ cyl cyl           1      4.34      4.34       28.8 8.24e- 6
6 drat ~ cyl Residuals    30      4.52      0.151      NA   NA

Try

formula <- lapply(colnames(df)[10:ncol(df)], function(x) as.formula(paste0(x, " ~ block + tillage * residue + Error(subblock)")))

names(formula) <- format(formula)

table <- formula %>% map(~aov(.x, df)) %>% map_dfr(tidy, .id="model")

Regression with for-loop with changing variables

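Construct the formula using sprintf/paste0:

# One slot per variable name in names_pc
m_fit <- vector("list", length(names_pc))

for (i in seq_along(names_pc)) {
  # Build the formula as a string, then convert it explicitly
  f <- as.formula(sprintf("value ~ year + group + group:%s", names_pc[i]))
  m <- lm(f, data = dta)
  m_fit[[i]] <- fitted(m)  # fitted values (m$fit relies on partial matching of fitted.values)
}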
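The objects names_pc and dta come from the question and are not shown here. A self-contained sketch with made-up stand-ins (simulated columns pc1 and pc2, chosen only for illustration) might look like this:

```r
set.seed(1)  # simulated data, for illustration only
n <- 60
dta <- data.frame(
  value = rnorm(n),
  year  = rep(2001:2003, each = 20),
  group = factor(rep(c("a", "b"), times = 30)),
  pc1   = rnorm(n),
  pc2   = rnorm(n)
)
names_pc <- c("pc1", "pc2")  # hypothetical variable names

# Named list, so each result can be looked up by variable name.
m_fit <- vector("list", length(names_pc))
names(m_fit) <- names_pc

for (nm in names_pc) {
  f <- as.formula(sprintf("value ~ year + group + group:%s", nm))
  m_fit[[nm]] <- fitted(lm(f, data = dta))
}

str(m_fit)
```

Iterating over the names rather than over indices keeps the formula string and the list slot in sync.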

looping variable names in r with linear regression

Code below updated to save lm() summaries to variables.

I tried something very simple. I wrote a loop in which I used the "paste0" function to paste "x" and "y" to the iteration number and I used the "get" function to get the objects that the strings referred to. This solution would only work if there is some part of your variable names that's constant and you can iterate through the parts of the variable names that change. By the way, I also changed your data because your x's were perfectly related to your y's, thereby making your covariance matrix blow up, so to speak. Here's the code, I hope it helps:

x1 <- runif(40)
y1 <- sample(50:300, 40, replace = TRUE)

x2 <- runif(40)
y2 <- sample(225:975, 40, replace = TRUE)

dataz <- data.frame(x1, y1, x2, y2)

# Save each lm() summary under the names coef1, coef2, ...
for (i in 1:2) {
  assign(paste0("coef", i),
         summary(lm(as.formula(paste0("x", i, " ~ y", i)), data = dataz)))
}
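An alternative sketch that avoids assign() and get() altogether stores the summaries in a named list. The names smry, pair1 and pair2 are hypothetical, and the data are simulated in the same way as above:

```r
set.seed(42)  # simulated data, for illustration only
dataz <- data.frame(
  x1 = runif(40), y1 = sample(50:300, 40, replace = TRUE),
  x2 = runif(40), y2 = sample(225:975, 40, replace = TRUE)
)

# One summary per x/y pair, kept in a list instead of separate variables.
smry <- lapply(1:2, function(i) {
  f <- reformulate(paste0("y", i), response = paste0("x", i))  # x_i ~ y_i
  summary(lm(f, data = dataz))
})
names(smry) <- paste0("pair", 1:2)

smry$pair1$coefficients  # coefficient table for x1 ~ y1
```

A list scales to any number of pairs and can be passed straight to lapply or sapply, which separately named variables cannot.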

R loop over linear regression

Here’s an approach using broom::glance() and purrr::map_dfr() to collect model summary stats into a tidy tibble:

library(broom)
library(purrr)

lm.test <- map_dfr(
  set_names(names(df)[-2]),
  ~ glance(lm(
    as.formula(paste("value ~", .x)),
    data = df
  )),
  .id = "predictor"
)

Result:

# A tibble: 4 x 13
  predictor r.squared adj.r.squared sigma statistic p.value    df logLik   AIC
  <chr>         <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl>
1 num           0.131       -0.739   27.4     0.150   0.765     1  -12.5  31.1
2 person1       0.836        0.672   11.9     5.10    0.265     1  -10.0  26.1
3 person2       0.542        0.0831  19.9     1.18    0.474     1  -11.6  29.2
4 person3       0.607        0.215   18.4     1.55    0.431     1  -11.3  28.7
# ... with 4 more variables: BIC <dbl>, deviance <dbl>, df.residual <int>,
#   nobs <int>

NB, you can capture model coefficients with a similar approach using broom::tidy() instead of glance().
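A minimal sketch of that tidy() variant follows. Since the question's df is not shown, it uses mtcars with mpg as the response; the predictor list and the name coef_tbl are assumptions made for illustration:

```r
library(broom)
library(purrr)

# Assumption: mtcars stands in for the original data frame.
predictors <- c("cyl", "disp", "hp", "wt")

coef_tbl <- map_dfr(
  set_names(predictors),
  ~ tidy(lm(reformulate(.x, response = "mpg"), data = mtcars)),
  .id = "predictor"
)

# Two rows per model: the intercept and the slope.
coef_tbl
```

Each row holds one coefficient, with its estimate, standard error, test statistic and p-value, tagged by the predictor that model used.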

Creating a loop through a list of variables for an LM model in R

You don't even have to use loops; lapply should work nicely.

training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))

Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))

allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))

If you need the model summaries, you can add:

allModelsSummaries <- lapply(allModelsResults, summary)

For example, you can access the R² of the model lm(independent_variable ~ x1+x2+x3) like this:

allModelsSummaries[[1]]$r.squared
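To compare all the models at once rather than one index at a time, a sketch along these lines could help. It rebuilds a small version of the example with a fixed seed; the response column is called y here, and the names rhs, models and adj_r2 are hypothetical:

```r
set.seed(123)  # random example data, for illustration only
training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("y", paste0("x", 1:7))

# Right-hand sides of the candidate models.
rhs <- c("x1+x2+x3", "x1+x2+x4", "x1+x2+x5")

# Named list of fitted models, keyed by their right-hand side.
models <- lapply(setNames(rhs, rhs), function(r) {
  lm(as.formula(paste("y ~", r)), data = training_data)
})

# Adjusted R² of every model as a named vector.
adj_r2 <- sapply(models, function(m) summary(m)$adj.r.squared)
adj_r2

# Right-hand side of the model with the highest adjusted R².
names(which.max(adj_r2))
```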

I hope it helps.

How to create many Linear Regression models via a For Loop in R?

Your approach wasn't so bad. This is how I reproduced your work as you described it:

library(rje)   # provides the powerSet function
library(olsrr) # provides the ols_mallows_cp function to calculate Mallows's Cp values

x <- powerSet(colnames(mtcars[,-1]))
full_model <- lm( mpg ~ ., data=mtcars )

your_models <- lapply( x, function(n) {
  d_i <- mtcars[,c( "mpg", n), drop=FALSE] # use drop=FALSE to make sure it stays a 2d structure
  return( lm( mpg ~ ., data = d_i ) )
})

Cp_vec <- sapply( your_models, function(m) {
  ols_mallows_cp( m, full_model )
})

TenSmallestIndeces <- head( order( Cp_vec ), n=10 )

TenSmallestCp <- head( sort( Cp_vec ), n=10 )

TenSmallestSets <- x[ TenSmallestIndeces ]

## inspect one of your models:
your_models[[ TenSmallestIndeces[1] ]]

It's always preferable to use some sort of apply when collecting results from a loop. I also frequently use foreach from the foreach package when building data frames or other 2d structures from a loop.

I create the subsets just like you did and fit the models in much the same way; I just do it all in one go.

Then you just need to understand sort() and order() properly to look the results back up in the set you started out with, I think.
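The difference between the two functions can be seen on a tiny example. The values below are made up and stand in for Cp_vec:

```r
# Made-up values standing in for Cp_vec.
cp <- c(m1 = 7.2, m2 = 3.1, m3 = 9.8, m4 = 4.5)

sort(cp)   # the values themselves, in ascending order
order(cp)  # the positions those values occupy in the original vector

# Positions of the two smallest values, usable as an index
# into any parallel list (such as x or your_models above).
idx <- head(order(cp), n = 2)
cp[idx]
```

sort() answers "what are the smallest values?", while order() answers "where are they?", which is what is needed to index back into the list of models.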


