Looping Through Covariates in Regression Using R

Looping through covariates in regression using R

I would put the results in a list and avoid the for loop and assign statements

You can use a combination of reformulate and update to create your formula

orig_formula <- MonExp_EGM~ Month2+Month3+Month4+Month5+Month6+Month7+Month8+Month9+
Month10+Month11+Month12+Yrs_minus_2004 + as.factor(LGA)

te_variables <- paste0('TE_', 1:96)
# Or if you don't have a current version of R
# te_variables <- paste('TE', 1:96, sep = '_')

new_formula <- lapply(te_variables, function(x,orig = orig_formula) {
new <- reformulate(c(x,'.'))
update(orig, new)})
## it works!
new_formula[[1]]
## MonExp_EGM ~ TE_1 + Month2 + Month3 + Month4 + Month5 + Month6 +
## Month7 + Month8 + Month9 + Month10 + Month11 + Month12 +
## Yrs_minus_2004 + as.factor(LGA)
new_formula[[2]]
## MonExp_EGM ~ TE_2 + Month2 + Month3 + Month4 + Month5 + Month6 +
## Month7 + Month8 + Month9 + Month10 + Month11 + Month12 +
## Yrs_minus_2004 + as.factor(LGA)

models <- lapply(new_formula, lm, data = pokies)

There should now be 96 elements in the list models

You can name them to reflect your originally planned nnames

names(models) <- paste0('z.out', 1:96)
# or if you don't have a current version of R
# names(models) <-paste('z.out', 1:96 ,sep = '' )

and then access a single model by

 models$z.out5

etc

or create summaries of all of the models

 summaries <- lapply(models, summary)

etc....

 # just the coefficients
coefficients <- lapply(models, coef)

# the table with coefficient estimates and standard.errors

coef_tables <- apply(summaries, '[[', 'coefficients')

How to make loop to contain confounders(covariates) in a binary logistic regression using glm in R?

Use c() not + in reformulate. In both your functions, x takes the value of column names. I'll use mtcars column names to illustrate:

## Model1 works
reformulate("hp", response = "mpg")
# mpg ~ hp

## Model2 doesn't work
reformulate("hp" + wt + cyl, response = "mpg")
# Error in reformulate("hp" + wt + cyl, response = "mpg") :
# object 'wt' not found

## Fix it with `c()` and quoted column names
reformulate(c("hp", "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl

## Showing it works with a mix of variables and quoted column names
x = "hp"
reformulate(c(x, "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl

So in your Model2, change to reformulate(c(x, "age", "bmi"), response="outcome")

You have additional problems - you are running Model2 on all columns except for outcome (setdiff(names(example.data),"outcome")), but you should also exclude bmi and age since they are included inside the function.

How to Loop/Repeat a Linear Regression in R

You want to run 22,000 linear regressions and extract the coefficients? That's simple to do from a coding standpoint.

set.seed(1)

# number of columns in the Lung and Blood data.frames. 22,000 for you?
n <- 5

# dummy data
obs <- 50 # observations
Lung <- data.frame(matrix(rnorm(obs*n), ncol=n))
Blood <- data.frame(matrix(rnorm(obs*n), ncol=n))
Age <- sample(20:80, obs)
Gender <- factor(rbinom(obs, 1, .5))

# run n regressions
my_lms <- lapply(1:n, function(x) lm(Lung[,x] ~ Blood[,x] + Age + Gender))

# extract just coefficients
sapply(my_lms, coef)

# if you need more info, get full summary call. now you can get whatever, like:
summaries <- lapply(my_lms, summary)
# ...coefficents with p values:
lapply(summaries, function(x) x$coefficients[, c(1,4)])
# ...or r-squared values
sapply(summaries, function(x) c(r_sq = x$r.squared,
adj_r_sq = x$adj.r.squared))

The models are stored in a list, where model 3 (with DV Lung[, 3] and IVs Blood[,3] + Age + Gender) is in my_lms[[3]] and so on. You can use apply functions on the list to perform summaries, from which you can extract the numbers you want.

Looping over combinations of regression model terms

With this dummy data:

dat1 <- data.frame(y = rpois(100,5),
x1 = runif(100),
x2 = runif(100),
x3 = runif(100),
z1 = runif(100),
z2 = runif(100)
)

You could get your list of two lm objects this way:

 lapply(dat1[5:6], function(x) lm(dat1$y ~ dat1$x1 + dat1$x2 + dat1$x3 + x))

Which iterates through those two columns and substitutes them as arguments into the lm call.

As Alex notes below, it's preferable to pass the names through the formula, rather than the actual data columns as I have done here.

Creating a loop through a list of variables for an LM model in R

You don't even have to use loops. Apply should work nicely.

training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))

Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))

allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))

If you need models summaries you can add :

allModelsSummaries = lapply(allModelsResults, summary) 

For example you can access the coefficient R² of the model lm(independent_variable ~ x1+x2+x3) by doing this:

allModelsSummaries[[1]]$r.squared

I hope it helps.

R Loop for Variable Names to run linear regression model

Ok, I'll post an answer. I will use the dataset mtcarsas an example. I believe it will work with your dataset.

First, I create a store, lm.test, an object of class list. In your code you are assigning the output of lm(.) every time through the loop and in the end you would only have the last one, all others would have been rewriten by the newer ones.

Then, inside the loop, I use function reformulate to put together the regression formula. There are other ways of doing this but this one is simple.

# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))

for(i in seq_along(col10)){
lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

lm.test

Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.

For instance, to extract the coefficients:

cfs <- lapply(lm.test, coef)

In order to get the summaries:

smry <- lapply(lm.test, summary)

It becomes very simple once you're familiar with *apply functions.

How to use purrr to iterate over every combo of covariates and outcomes in lm reg

There are many ways to do this in the larger tidyverse. I am a fan of dplyr::rowwise for this kind of calculations. We can use the colnames instead of the actual data and then create a matrix like tibble with tidyr::expand_grid which contains all combinations of outcomes and covars. Then we can use dplyr::rowwise and use lm inside list() together with reformulate which takes strings as inputs. To get the result we can use broom::tidy.

library(dplyr)
library(purrr)
library(tidyr)

dataset <- tibble(
y1=rnorm(n=100),
y2=rnorm(n=100),
x1=rnorm(n=100),
x2=rnorm(n=100))

outcomes <- dataset %>%
select(y1,y2) %>% colnames

covars <- dataset %>%
select(x1,x2) %>% colnames

paramlist <- expand_grid(outcomes, covars)

paramlist %>%
rowwise %>%
mutate(mod = list(lm(reformulate(outcomes, covars), data = dataset)),
res = list(broom::tidy(mod)))

#> # A tibble: 4 x 4
#> # Rowwise:
#> outcomes covars mod res
#> <chr> <chr> <list> <list>
#> 1 y1 x1 <lm> <tibble [2 x 5]>
#> 2 y1 x2 <lm> <tibble [2 x 5]>
#> 3 y2 x1 <lm> <tibble [2 x 5]>
#> 4 y2 x2 <lm> <tibble [2 x 5]>

Created on 2021-09-06 by the reprex package (v2.0.1)

We can do the same thing with {purrr} instead of dplyr::rowwise:

paramlist %>% 
mutate(mod = map2(outcomes, covars, ~ lm(reformulate(.y, .x), data = dataset)),
res = map(mod, broom::tidy))

#> # A tibble: 4 x 4
#> outcomes covars mod res
#> <chr> <chr> <list> <list>
#> 1 y1 x1 <lm> <tibble [2 x 5]>
#> 2 y1 x2 <lm> <tibble [2 x 5]>
#> 3 y2 x1 <lm> <tibble [2 x 5]>
#> 4 y2 x2 <lm> <tibble [2 x 5]>

Created on 2021-09-06 by the reprex package (v2.0.1)

Another pure {purrr} solution is to use a nested map call. Since it is nested we need to flatten the results before we can use map(summary) on them.

# outcomes and covars are the same strings as above

outcomes %>%
map(~ map(covars, function(.y) lm(reformulate(.y, .x), data = dataset))) %>%
flatten %>%
map(summary)

#> [[1]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.20892 -0.56744 -0.08498 0.55445 2.10146
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.0009328 0.0923062 -0.010 0.992
#> x1 -0.0809739 0.0932059 -0.869 0.387
#>
#> Residual standard error: 0.9173 on 98 degrees of freedom
#> Multiple R-squared: 0.007643, Adjusted R-squared: -0.002483
#> F-statistic: 0.7548 on 1 and 98 DF, p-value: 0.3871
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.11442 -0.59186 -0.08153 0.61642 2.10575
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.02048 0.09461 -0.216 0.829
#> x2 -0.05159 0.10805 -0.477 0.634
#>
#> Residual standard error: 0.9197 on 98 degrees of freedom
#> Multiple R-squared: 0.002321, Adjusted R-squared: -0.007859
#> F-statistic: 0.228 on 1 and 98 DF, p-value: 0.6341
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.3535 -0.7389 -0.2023 0.6236 3.8627
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.08178 0.10659 -0.767 0.445
#> x1 -0.08476 0.10763 -0.788 0.433
#>
#> Residual standard error: 1.059 on 98 degrees of freedom
#> Multiple R-squared: 0.006289, Adjusted R-squared: -0.003851
#> F-statistic: 0.6202 on 1 and 98 DF, p-value: 0.4329
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.4867 -0.7020 -0.1935 0.5869 3.7574
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.06575 0.10875 -0.605 0.547
#> x2 0.12388 0.12420 0.997 0.321
#>
#> Residual standard error: 1.057 on 98 degrees of freedom
#> Multiple R-squared: 0.01005, Adjusted R-squared: -5.162e-05
#> F-statistic: 0.9949 on 1 and 98 DF, p-value: 0.321

Created on 2021-09-06 by the reprex package (v2.0.1)

How to iterate through independent variables, performing multiple linear regressions using the tidyverse framework?

For your problem you can gather() in order to utilize group_split() on all the different kinds of variants. From that point we can iterate over each split data.frame and run the linear model. Inside of the map() we'll broom::tidy() the data and add a column to distinguish the estimates for each model. I used map_df() to end up with a single dataframe but you can also just use map() to end up with a list of data.frames.

library(tidyverse)
library(lme4)
library(broom)

dat <- tribble(~Subjects, ~Gene1, ~Variant1, ~Variant2, ~Variant3, ~Variant4, ~Age, ~Sex, ~RIN, ~ABG,
1, -0.82525943, 2, 2, 2, 0, 87, 1, 6, "F2",
2, -0.20163395, 1, 2, 1, 2, 64 , 0 , 7, "F3",
3, -0.58179068, 0, 0, 1, 2, 60 , 1 , 6, "F4",
4, 0.31863009, 1, 0, 1, 1, 81 , 0 , 6, "F3",
5, 0.71453271, 1, 2, 0, 1, 85 , 0 , 8, "F8",
6, 0.08988614, 1, 2, 1, 1, 78 , 1 , 7, "F10",
7, 0.12337950, 0, 0, 1, 1, 75 , 1 , 6, "F1",
8, 0.73984050, 2, 2, 2, 1, 90 , 1 , 7, "F6",
9, -0.35986213, 2, 0, 0, 0, 76 , 0 , 8, "F5",
10, 0.09627446, 0, 2, 1, 0, 88, 0, 8, "F2")

dat %>%
gather(key = "variants", value = "var_value", Variant1:Variant4) %>%
group_split(variants) %>%
map_df(~lmer(Gene1~var_value+Age+Sex+RIN+(1|ABG), data = .x) %>%
tidy() %>%
mutate(variant_group = unique(.x$variants)))
#> # A tibble: 28 x 6
#> term estimate std.error statistic group variant_group
#> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 (Intercept) -3.91 1.96 -2.00 fixed Variant1
#> 2 var_value -0.280 0.120 -2.34 fixed Variant1
#> 3 Age 0.0401 0.00905 4.43 fixed Variant1
#> 4 Sex 0.00107 0.391 0.00274 fixed Variant1
#> 5 RIN 0.161 0.154 1.05 fixed Variant1
#> 6 sd_(Intercept).ABG 0.462 NA NA ABG Variant1
#> 7 sd_Observation.Resid… 0.00234 NA NA Residu… Variant1
#> 8 (Intercept) -3.60 2.74 -1.31 fixed Variant2
#> 9 var_value -0.0625 0.202 -0.309 fixed Variant2
#> 10 Age 0.0295 0.0175 1.69 fixed Variant2
#> # … with 18 more rows

Created on 2019-02-23 by the reprex package (v0.2.1)

how to use loop to do linear regression in R

If you really want to do this, it's pretty trivial with lapply(), where we use it to "loop" over the other columns of df. A custom function takes each variable in turn as x and fits a model for that covariate.

df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

mods <- lapply(df[, -1], function(x, dat) lm(crim ~ x, data = dat))

mods is now a list of lm objects. The names of mods contains the names of the covariate used to fit the model. The main negative of this is that all the models are fitted using a variable x. More effort could probably solve this, but I doubt that effort is worth the time.

If you are just selecting models, which may be dubious, there are other ways to achieve this. For example via the leaps package and its regsubsets function:

library("leapls")
a <- regsubsets(crim ~ ., data = df, nvmax = 1, nbest = ncol(df) - 1)
summa <- summary(a)

Then plot(a) will show which of the models is "best", for example.

Original

If I understand what you want (crim is a covariate and the other variables are the responses you want to predict/model using crim), then you don't need a loop. You can do this using a matrix response in a standard lm().

Using some dummy data:

df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

we create a matrix or multivariate response via cbind(), passing it the three response variables we're interested in. The remaining parts of the call to lm are entirely the same as for a univariate response:

mods <- lm(cbind(rm, ad, wd) ~ crim, data = df)
mods

> mods

Call:
lm(formula = cbind(rm, ad, wd) ~ crim, data = df)

Coefficients:
rm ad wd
(Intercept) -0.12026 -0.47653 -0.26419
crim -0.26548 0.07145 0.68426

The summary() method produces a standard summary.lm output for each of the responses.



Related Topics



Leave a reply



Submit