Looping through covariates in regression using R
I would put the results in a list and avoid the for loop
and assign
statements
You can use a combination of reformulate
and update
to create your formula
orig_formula <- MonExp_EGM~ Month2+Month3+Month4+Month5+Month6+Month7+Month8+Month9+
Month10+Month11+Month12+Yrs_minus_2004 + as.factor(LGA)
te_variables <- paste0('TE_', 1:96)
# Or if you don't have a current version of R
# te_variables <- paste('TE', 1:96, sep = '_')
new_formula <- lapply(te_variables, function(x,orig = orig_formula) {
new <- reformulate(c(x,'.'))
update(orig, new)})
## it works!
new_formula[[1]]
## MonExp_EGM ~ TE_1 + Month2 + Month3 + Month4 + Month5 + Month6 +
## Month7 + Month8 + Month9 + Month10 + Month11 + Month12 +
## Yrs_minus_2004 + as.factor(LGA)
new_formula[[2]]
## MonExp_EGM ~ TE_2 + Month2 + Month3 + Month4 + Month5 + Month6 +
## Month7 + Month8 + Month9 + Month10 + Month11 + Month12 +
## Yrs_minus_2004 + as.factor(LGA)
models <- lapply(new_formula, lm, data = pokies)
There should now be 96 elements in the list models
You can name them to reflect your originally planned nnames
names(models) <- paste0('z.out', 1:96)
# or if you don't have a current version of R
# names(models) <-paste('z.out', 1:96 ,sep = '' )
and then access a single model by
models$z.out5
etc
or create summaries of all of the models
summaries <- lapply(models, summary)
etc....
# just the coefficients
coefficients <- lapply(models, coef)
# the table with coefficient estimates and standard.errors
coef_tables <- apply(summaries, '[[', 'coefficients')
How to make loop to contain confounders(covariates) in a binary logistic regression using glm in R?
Use c()
not +
in reformulate
. In both your functions, x
takes the value of column names. I'll use mtcars
column names to illustrate:
## Model1 works
reformulate("hp", response = "mpg")
# mpg ~ hp
## Model2 doesn't work
reformulate("hp" + wt + cyl, response = "mpg")
# Error in reformulate("hp" + wt + cyl, response = "mpg") :
# object 'wt' not found
## Fix it with `c()` and quoted column names
reformulate(c("hp", "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl
## Showing it works with a mix of variables and quoted column names
x = "hp"
reformulate(c(x, "wt", "cyl"), response = "mpg")
# mpg ~ hp + wt + cyl
So in your Model2
, change to reformulate(c(x, "age", "bmi"), response="outcome")
You have additional problems - you are running Model2
on all columns except for outcome
(setdiff(names(example.data),"outcome")
), but you should also exclude bmi
and age
since they are included inside the function.
How to Loop/Repeat a Linear Regression in R
You want to run 22,000 linear regressions and extract the coefficients? That's simple to do from a coding standpoint.
set.seed(1)
# number of columns in the Lung and Blood data.frames. 22,000 for you?
n <- 5
# dummy data
obs <- 50 # observations
Lung <- data.frame(matrix(rnorm(obs*n), ncol=n))
Blood <- data.frame(matrix(rnorm(obs*n), ncol=n))
Age <- sample(20:80, obs)
Gender <- factor(rbinom(obs, 1, .5))
# run n regressions
my_lms <- lapply(1:n, function(x) lm(Lung[,x] ~ Blood[,x] + Age + Gender))
# extract just coefficients
sapply(my_lms, coef)
# if you need more info, get full summary call. now you can get whatever, like:
summaries <- lapply(my_lms, summary)
# ...coefficents with p values:
lapply(summaries, function(x) x$coefficients[, c(1,4)])
# ...or r-squared values
sapply(summaries, function(x) c(r_sq = x$r.squared,
adj_r_sq = x$adj.r.squared))
The models are stored in a list, where model 3 (with DV Lung[, 3] and IVs Blood[,3] + Age + Gender) is in my_lms[[3]]
and so on. You can use apply functions on the list to perform summaries, from which you can extract the numbers you want.
Looping over combinations of regression model terms
With this dummy data:
dat1 <- data.frame(y = rpois(100,5),
x1 = runif(100),
x2 = runif(100),
x3 = runif(100),
z1 = runif(100),
z2 = runif(100)
)
You could get your list of two lm
objects this way:
lapply(dat1[5:6], function(x) lm(dat1$y ~ dat1$x1 + dat1$x2 + dat1$x3 + x))
Which iterates through those two columns and substitutes them as arguments into the lm
call.
As Alex notes below, it's preferable to pass the names through the formula, rather than the actual data columns as I have done here.
Creating a loop through a list of variables for an LM model in R
You don't even have to use loops. Apply should work nicely.
training_data <- as.data.frame(matrix(sample(1:64), nrow = 8))
colnames(training_data) <- c("independent_variable", paste0("x", 1:7))
Vars <- as.list(c("x1+x2+x3",
"x1+x2+x4",
"x1+x2+x5",
"x1+x2+x6",
"x1+x2+x7"))
allModelsList <- lapply(paste("independent_variable ~", Vars), as.formula)
allModelsResults <- lapply(allModelsList, function(x) lm(x, data = training_data))
If you need models summaries you can add :
allModelsSummaries = lapply(allModelsResults, summary)
For example you can access the coefficient R² of the model lm(independent_variable ~ x1+x2+x3)
by doing this:
allModelsSummaries[[1]]$r.squared
I hope it helps.
R Loop for Variable Names to run linear regression model
Ok, I'll post an answer. I will use the dataset mtcars
as an example. I believe it will work with your dataset.
First, I create a store, lm.test
, an object of class list
. In your code you are assigning the output of lm(.)
every time through the loop and in the end you would only have the last one, all others would have been rewriten by the newer ones.
Then, inside the loop, I use function reformulate
to put together the regression formula. There are other ways of doing this but this one is simple.
# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]
lm.test <- vector("list", length(col10))
for(i in seq_along(col10)){
lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}
lm.test
Now you can use the results list for all sorts of things. I suggest you start using lapply
and friends for that.
For instance, to extract the coefficients:
cfs <- lapply(lm.test, coef)
In order to get the summaries:
smry <- lapply(lm.test, summary)
It becomes very simple once you're familiar with *apply
functions.
How to use purrr to iterate over every combo of covariates and outcomes in lm reg
There are many ways to do this in the larger tidyverse. I am a fan of dplyr::rowwise
for this kind of calculations. We can use the colnames
instead of the actual data and then create a matrix like tibble
with tidyr::expand_grid
which contains all combinations of outcomes and covars. Then we can use dplyr::rowwise
and use lm
inside list()
together with reformulate
which takes strings as inputs. To get the result we can use broom::tidy
.
library(dplyr)
library(purrr)
library(tidyr)
dataset <- tibble(
y1=rnorm(n=100),
y2=rnorm(n=100),
x1=rnorm(n=100),
x2=rnorm(n=100))
outcomes <- dataset %>%
select(y1,y2) %>% colnames
covars <- dataset %>%
select(x1,x2) %>% colnames
paramlist <- expand_grid(outcomes, covars)
paramlist %>%
rowwise %>%
mutate(mod = list(lm(reformulate(outcomes, covars), data = dataset)),
res = list(broom::tidy(mod)))
#> # A tibble: 4 x 4
#> # Rowwise:
#> outcomes covars mod res
#> <chr> <chr> <list> <list>
#> 1 y1 x1 <lm> <tibble [2 x 5]>
#> 2 y1 x2 <lm> <tibble [2 x 5]>
#> 3 y2 x1 <lm> <tibble [2 x 5]>
#> 4 y2 x2 <lm> <tibble [2 x 5]>
Created on 2021-09-06 by the reprex package (v2.0.1)
We can do the same thing with {purrr} instead of dplyr::rowwise
:
paramlist %>%
mutate(mod = map2(outcomes, covars, ~ lm(reformulate(.y, .x), data = dataset)),
res = map(mod, broom::tidy))
#> # A tibble: 4 x 4
#> outcomes covars mod res
#> <chr> <chr> <list> <list>
#> 1 y1 x1 <lm> <tibble [2 x 5]>
#> 2 y1 x2 <lm> <tibble [2 x 5]>
#> 3 y2 x1 <lm> <tibble [2 x 5]>
#> 4 y2 x2 <lm> <tibble [2 x 5]>
Created on 2021-09-06 by the reprex package (v2.0.1)
Another pure {purrr} solution is to use a nested map
call. Since it is nested we need to flatten
the results before we can use map(summary)
on them.
# outcomes and covars are the same strings as above
outcomes %>%
map(~ map(covars, function(.y) lm(reformulate(.y, .x), data = dataset))) %>%
flatten %>%
map(summary)
#> [[1]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.20892 -0.56744 -0.08498 0.55445 2.10146
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.0009328 0.0923062 -0.010 0.992
#> x1 -0.0809739 0.0932059 -0.869 0.387
#>
#> Residual standard error: 0.9173 on 98 degrees of freedom
#> Multiple R-squared: 0.007643, Adjusted R-squared: -0.002483
#> F-statistic: 0.7548 on 1 and 98 DF, p-value: 0.3871
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.11442 -0.59186 -0.08153 0.61642 2.10575
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.02048 0.09461 -0.216 0.829
#> x2 -0.05159 0.10805 -0.477 0.634
#>
#> Residual standard error: 0.9197 on 98 degrees of freedom
#> Multiple R-squared: 0.002321, Adjusted R-squared: -0.007859
#> F-statistic: 0.228 on 1 and 98 DF, p-value: 0.6341
#>
#>
#> [[3]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.3535 -0.7389 -0.2023 0.6236 3.8627
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.08178 0.10659 -0.767 0.445
#> x1 -0.08476 0.10763 -0.788 0.433
#>
#> Residual standard error: 1.059 on 98 degrees of freedom
#> Multiple R-squared: 0.006289, Adjusted R-squared: -0.003851
#> F-statistic: 0.6202 on 1 and 98 DF, p-value: 0.4329
#>
#>
#> [[4]]
#>
#> Call:
#> lm(formula = reformulate(.y, .x), data = dataset)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.4867 -0.7020 -0.1935 0.5869 3.7574
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.06575 0.10875 -0.605 0.547
#> x2 0.12388 0.12420 0.997 0.321
#>
#> Residual standard error: 1.057 on 98 degrees of freedom
#> Multiple R-squared: 0.01005, Adjusted R-squared: -5.162e-05
#> F-statistic: 0.9949 on 1 and 98 DF, p-value: 0.321
Created on 2021-09-06 by the reprex package (v2.0.1)
How to iterate through independent variables, performing multiple linear regressions using the tidyverse framework?
For your problem you can gather()
in order to utilize group_split()
on all the different kinds of variants. From that point we can iterate over each split data.frame and run the linear model. Inside of the map()
we'll broom::tidy()
the data and add a column to distinguish the estimates for each model. I used map_df()
to end up with a single dataframe but you can also just use map()
to end up with a list of data.frames.
library(tidyverse)
library(lme4)
library(broom)
dat <- tribble(~Subjects, ~Gene1, ~Variant1, ~Variant2, ~Variant3, ~Variant4, ~Age, ~Sex, ~RIN, ~ABG,
1, -0.82525943, 2, 2, 2, 0, 87, 1, 6, "F2",
2, -0.20163395, 1, 2, 1, 2, 64 , 0 , 7, "F3",
3, -0.58179068, 0, 0, 1, 2, 60 , 1 , 6, "F4",
4, 0.31863009, 1, 0, 1, 1, 81 , 0 , 6, "F3",
5, 0.71453271, 1, 2, 0, 1, 85 , 0 , 8, "F8",
6, 0.08988614, 1, 2, 1, 1, 78 , 1 , 7, "F10",
7, 0.12337950, 0, 0, 1, 1, 75 , 1 , 6, "F1",
8, 0.73984050, 2, 2, 2, 1, 90 , 1 , 7, "F6",
9, -0.35986213, 2, 0, 0, 0, 76 , 0 , 8, "F5",
10, 0.09627446, 0, 2, 1, 0, 88, 0, 8, "F2")
dat %>%
gather(key = "variants", value = "var_value", Variant1:Variant4) %>%
group_split(variants) %>%
map_df(~lmer(Gene1~var_value+Age+Sex+RIN+(1|ABG), data = .x) %>%
tidy() %>%
mutate(variant_group = unique(.x$variants)))
#> # A tibble: 28 x 6
#> term estimate std.error statistic group variant_group
#> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 (Intercept) -3.91 1.96 -2.00 fixed Variant1
#> 2 var_value -0.280 0.120 -2.34 fixed Variant1
#> 3 Age 0.0401 0.00905 4.43 fixed Variant1
#> 4 Sex 0.00107 0.391 0.00274 fixed Variant1
#> 5 RIN 0.161 0.154 1.05 fixed Variant1
#> 6 sd_(Intercept).ABG 0.462 NA NA ABG Variant1
#> 7 sd_Observation.Resid… 0.00234 NA NA Residu… Variant1
#> 8 (Intercept) -3.60 2.74 -1.31 fixed Variant2
#> 9 var_value -0.0625 0.202 -0.309 fixed Variant2
#> 10 Age 0.0295 0.0175 1.69 fixed Variant2
#> # … with 18 more rows
Created on 2019-02-23 by the reprex package (v0.2.1)
how to use loop to do linear regression in R
If you really want to do this, it's pretty trivial with lapply()
, where we use it to "loop" over the other columns of df
. A custom function takes each variable in turn as x
and fits a model for that covariate.
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))
mods <- lapply(df[, -1], function(x, dat) lm(crim ~ x, data = dat))
mods
is now a list of lm
objects. The names
of mods
contains the names of the covariate used to fit the model. The main negative of this is that all the models are fitted using a variable x
. More effort could probably solve this, but I doubt that effort is worth the time.
If you are just selecting models, which may be dubious, there are other ways to achieve this. For example via the leaps package and its regsubsets
function:
library("leapls")
a <- regsubsets(crim ~ ., data = df, nvmax = 1, nbest = ncol(df) - 1)
summa <- summary(a)
Then plot(a)
will show which of the models is "best", for example.
Original
If I understand what you want (crim
is a covariate and the other variables are the responses you want to predict/model using crim
), then you don't need a loop. You can do this using a matrix response in a standard lm()
.
Using some dummy data:
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))
we create a matrix or multivariate response via cbind()
, passing it the three response variables we're interested in. The remaining parts of the call to lm
are entirely the same as for a univariate response:
mods <- lm(cbind(rm, ad, wd) ~ crim, data = df)
mods
> mods
Call:
lm(formula = cbind(rm, ad, wd) ~ crim, data = df)
Coefficients:
rm ad wd
(Intercept) -0.12026 -0.47653 -0.26419
crim -0.26548 0.07145 0.68426
The summary()
method produces a standard summary.lm
output for each of the responses.
Related Topics
R: Saving Ggplot2 Plots in a List
Margin Adjustments When Using Ggplot's Geom_Tile()
Does Installing Blas/Atlas/Mkl/Openblas Will Speed Up R Package That Is Written in C/C++
Fastest Way to Find *The Index* of the Second (Third...) Highest/Lowest Value in Vector or Column
R Obtaining Rownames Date Using Quantmod
Generate All Combinations, of All Lengths, in R, from a Vector
Quantmod Omitting Tickers in Getsymbols
How to Add Annotation on Each Facet
R: Faceted Bar Chart with Percentages Labels Independent for Each Plot
\Sexpr{} Special Latex Characters ($, &, %, # etc.) in .Rnw-File
Drawing Non-Intersecting Circles
Sum Non Na Elements Only, But If All Na Then Return Na