Linear Regression Loop for Each Independent Variable Individually Against Dependent

Linear Regression loop for each independent variable individually against dependent

Try something like this:

models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
res.models <- lapply(models, FUN = function(x) {summary(lm(formula = x, data = mtcars))})
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")
res.models[["mpg~disp"]]


# Call:
# lm(formula = x, data = mtcars)

# Residuals:
#     Min      1Q  Median      3Q     Max
# -4.8922 -2.2022 -0.9631  1.6272  7.2305

# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
# disp        -0.041215   0.004712  -8.747 9.38e-10 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 3.251 on 30 degrees of freedom
# Multiple R-squared: 0.7183, Adjusted R-squared: 0.709
# F-statistic: 76.51 on 1 and 30 DF, p-value: 9.38e-10
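If you only need a few numbers from each model rather than the full printed summary, the list of summaries can be collapsed into one data frame. This is a minimal sketch built on the code above; the name slopes and its column names are my own choice, and it assumes every model has exactly one predictor, so the slope sits in row 2 of the coefficient table.

```r
# Rebuild the list of one-predictor summaries from the answer above
models <- lapply(paste("mpg", names(mtcars)[-1], sep = "~"), formula)
res.models <- lapply(models, function(x) summary(lm(formula = x, data = mtcars)))
names(res.models) <- paste("mpg", names(mtcars)[-1], sep = "~")

# One row per model: slope, its p-value, and R-squared
# (row 2 of the coefficient table is the single predictor)
slopes <- data.frame(
  model     = names(res.models),
  estimate  = sapply(res.models, function(s) s$coefficients[2, "Estimate"]),
  p_value   = sapply(res.models, function(s) s$coefficients[2, "Pr(>|t|)"]),
  r_squared = sapply(res.models, function(s) s$r.squared),
  row.names = NULL
)
slopes
```

Sorting with slopes[order(slopes$p_value), ] then ranks the predictors by p-value.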

Linear regression - moving independent and dependent columns for each run

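Use rollapply over the column index. Here list(c(-5, 0)) means that each iteration uses offsets -5 and 0 relative to the current column, so column i - 5 is paired with column i. Because lm applied to a data frame treats the first column as the dependent variable, the column at offset -5 is the dependent variable and the column at offset 0 is the independent variable.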

library(zoo)

resids <- t(rollapply(1:ncol(mtcars), list(c(-5, 0)),
  function(ix) resid(lm(mtcars[, ix]))))

rsquareds <- rollapply(1:ncol(mtcars), list(c(-5, 0)),
  function(ix) summary(lm(mtcars[, ix]))$r.squared)

If you meant to reverse which are the dependent and independent variables then use list(c(0, -5)) instead.
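To see which columns get paired, here is a base R sketch of what I understand the offsets c(-5, 0) to do, written without zoo. It is an illustration under that assumption, not a replacement for rollapply; the names dep_idx and rsq are made up for the example.

```r
# Column i is the dependent variable, column i + 5 the independent one,
# which is the same pairing as offsets c(-5, 0) seen from column i + 5
dep_idx <- 1:(ncol(mtcars) - 5)

rsq <- sapply(dep_idx, function(i) {
  ix <- c(i, i + 5)
  # lm() on a two-column data frame regresses the first column on the second
  summary(lm(mtcars[, ix]))$r.squared
})

# Label each result with the pair of variables it came from
names(rsq) <- paste(names(mtcars)[dep_idx], names(mtcars)[dep_idx + 5], sep = "~")
rsq
```

With the 11 columns of mtcars this gives six regressions, the first being mpg on wt.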

R, automated loop of linear regressions using same IVs on different DVs to store coefficients

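Using base R:

data("mtcars")
y <- c("mpg", "drat", "qsec")   # dependent variables
x <- c("cyl+hp", "disp")        # right-hand sides, one model per entry

# Fit every DV against every right-hand side; keep estimates and p-values
A <- Map(function(i, j)
  summary(lm(as.formula(paste0(i, "~", j)), data = mtcars))$coef[, c(1, 4)],
  rep(y, each = length(x)), x)

# Stack the coefficient rows per DV, drop the intercepts, bind the DVs side by side
B <- do.call(cbind.data.frame,
  tapply(A, rep(y, each = length(x)),
    function(s) {a <- do.call(rbind, s); a[row.names(a) != "(Intercept)", ]}))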
B
     drat.Estimate drat.Pr(>|t|) mpg.Estimate mpg.Pr(>|t|) qsec.Estimate qsec.Pr(>|t|)
cyl   -0.318242238  5.528430e-05  -2.26469360 4.803752e-04  -0.005485698   0.981671077
hp     0.003401029  6.262861e-02  -0.01912170 2.125285e-01  -0.018339365   0.005865329
disp  -0.003063904  5.282022e-06  -0.04121512 9.380327e-10  -0.006253039   0.013144036

It is still not clear to me what the third step requires; I hope you can elaborate. From your code, it looks like you want the mean, median, and so on of the coefficients. I do not know whether you also want the mean, max, etc. of the p-values, but I computed them in case you need them:

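# Separate the estimate rows from the p-value rows
C <- split(data.frame(t(B)), rep(c("Estimate", "Pr(>|t|)"), length(y)))

# For each group, apply mean, median, min and max to every coefficient column
D <- lapply(C, function(f)
  matrix(mapply(function(i, j) i(j),
    rep(c(mean, median, min, max), each = length(f)), f), length(f)))

cbind(B, do.call(cbind.data.frame, lapply(D, `colnames<-`, c("mean", "median", "min", "max"))))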


     drat.Estimate drat.Pr(>|t|) mpg.Estimate mpg.Pr(>|t|) qsec.Estimate qsec.Pr(>|t|) Estimate.mean
cyl   -0.318242238  5.528430e-05  -2.26469360 4.803752e-04  -0.005485698   0.981671077   -0.86280718
hp     0.003401029  6.262861e-02  -0.01912170 2.125285e-01  -0.018339365   0.005865329   -0.01135334
disp  -0.003063904  5.282022e-06  -0.04121512 9.380327e-10  -0.006253039   0.013144036   -0.01684402
     Estimate.median Estimate.min Estimate.max Pr(>|t|).mean Pr(>|t|).median Pr(>|t|).min Pr(>|t|).max
cyl     -0.318242238  -2.26469360 -0.005485698   0.327402245    4.803752e-04 5.528430e-05   0.98167108
hp      -0.018339365  -0.01912170  0.003401029   0.093674136    6.262861e-02 5.865329e-03   0.21252847
disp    -0.006253039  -0.04121512 -0.003063904   0.004383106    5.282022e-06 9.380327e-10   0.01314404

You can transpose the result with t() to see it on one screen instead of scrolling left and right.
If this helps, let us know. Thank you.
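If the wide table gets hard to read, a long layout may be easier to summarise. The following is a sketch of one alternative, not the approach used above: it fits the same six models with the same y and x, stores one row per coefficient, and then lets aggregate compute the statistics per term. The names grid, long and stats are made up for this example.

```r
y <- c("mpg", "drat", "qsec")
x <- c("cyl+hp", "disp")

# Every combination of dependent variable and right-hand side
grid <- expand.grid(dv = y, rhs = x, stringsAsFactors = FALSE)

# One row per non-intercept coefficient of every model
long <- do.call(rbind, lapply(seq_len(nrow(grid)), function(k) {
  fit <- lm(as.formula(paste(grid$dv[k], "~", grid$rhs[k])), data = mtcars)
  cf <- summary(fit)$coefficients
  cf <- cf[rownames(cf) != "(Intercept)", , drop = FALSE]
  data.frame(dv = grid$dv[k], term = rownames(cf),
             estimate = cf[, "Estimate"], p_value = cf[, "Pr(>|t|)"],
             row.names = NULL)
}))

# Mean, median, min and max of the estimates for each term across the DVs
stats <- aggregate(estimate ~ term, data = long,
                   FUN = function(v) c(mean = mean(v), median = median(v),
                                       min = min(v), max = max(v)))
stats
```

Replacing estimate with p_value in the aggregate call gives the same statistics for the p-values.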

How to loop a linear regression over multiple subsets of a factor variable

The problems with the code in the question are:

  1. in R it is normally better not to use loops in the first place
  2. conventionally i is used for a sequential index so it is not a good
    choice of name to use for levels
  3. the body of the loop does not do any subsetting so it will assign the same result on each iteration
  4. posts to SO should have reproducible data and the question did not include that but rather referred to objects without defining their contents. Please read the instructions at the top of the r tag page. Below we have used the built in iris data set for reproducibility.

Here are some approaches using the built-in iris data frame. Each results in a named list whose names are the levels of Species.

1) lm subset argument. Map over the levels, giving a list:

sublm <- function(x) lm(Petal.Width ~ Sepal.Width, iris, subset = Species == x)
levs <- levels(iris$Species)
Map(sublm, levs)

2) loop. sublm and levs are from (1).

L <- list()
for(s in levs) L[[s]] <- sublm(s)

3) nlme. Or use lmList from the nlme package:

library(nlme)
L3 <- lmList(Petal.Width ~ Sepal.Width | Species, iris)
coef(L3)
summary(L3)
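Once you have the named list, pulling one number per level is a single sapply call. The sketch below reuses sublm and levs from (1) and, as a cross-check, fits the same models by splitting the data frame first. The split variant is an alternative I am adding for illustration; the names fits, slopes and slopes_split are made up.

```r
sublm <- function(x) lm(Petal.Width ~ Sepal.Width, iris, subset = Species == x)
levs <- levels(iris$Species)
fits <- Map(sublm, levs)

# Slope of Sepal.Width within each species, named by level
slopes <- sapply(fits, function(m) coef(m)[["Sepal.Width"]])

# Same models obtained by splitting the data frame by Species first
fits_split <- lapply(split(iris, iris$Species),
                     function(d) lm(Petal.Width ~ Sepal.Width, data = d))
slopes_split <- sapply(fits_split, function(m) coef(m)[["Sepal.Width"]])

all.equal(slopes, slopes_split)
```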

How to Loop/Repeat a Linear Regression in R

You want to run 22,000 linear regressions and extract the coefficients? That's simple to do from a coding standpoint.

set.seed(1)

# number of columns in the Lung and Blood data.frames. 22,000 for you?
n <- 5

# dummy data
obs <- 50 # observations
Lung <- data.frame(matrix(rnorm(obs*n), ncol=n))
Blood <- data.frame(matrix(rnorm(obs*n), ncol=n))
Age <- sample(20:80, obs)
Gender <- factor(rbinom(obs, 1, .5))

# run n regressions
my_lms <- lapply(1:n, function(x) lm(Lung[,x] ~ Blood[,x] + Age + Gender))

# extract just coefficients
sapply(my_lms, coef)

# if you need more info, get full summary call. now you can get whatever, like:
summaries <- lapply(my_lms, summary)
# ...coefficients with p values:
lapply(summaries, function(x) x$coefficients[, c(1,4)])
# ...or r-squared values
sapply(summaries, function(x) c(r_sq = x$r.squared,
adj_r_sq = x$adj.r.squared))

The models are stored in a list, where model 3 (with DV Lung[, 3] and IVs Blood[,3] + Age + Gender) is in my_lms[[3]] and so on. You can use apply functions on the list to perform summaries, from which you can extract the numbers you want.
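With 22,000 models you will probably want one table rather than a list of matrices. Here is a sketch on the same dummy data that keeps only the Blood coefficient and its p-value from each fit. The helper fit_one and the column names are hypothetical, and building a small data frame per fit is just one way to get a stable coefficient name ("blood") to index by.

```r
set.seed(1)
n <- 5      # number of Lung/Blood column pairs
obs <- 50   # observations
Lung <- data.frame(matrix(rnorm(obs * n), ncol = n))
Blood <- data.frame(matrix(rnorm(obs * n), ncol = n))
Age <- sample(20:80, obs)
Gender <- factor(rbinom(obs, 1, .5))

# Fit model k and return one row with the Blood effect
fit_one <- function(k) {
  d <- data.frame(lung = Lung[[k]], blood = Blood[[k]], Age = Age, Gender = Gender)
  cf <- summary(lm(lung ~ blood + Age + Gender, data = d))$coefficients
  data.frame(column = names(Lung)[k],
             estimate = cf["blood", "Estimate"],
             p_value = cf["blood", "Pr(>|t|)"])
}

blood_effects <- do.call(rbind, lapply(seq_len(n), fit_one))
blood_effects
```

If you test this many models you will likely also want a multiple-testing correction, for instance p.adjust(blood_effects$p_value, method = "BH").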

Many individual dependent variables, code for one by one linear regression

The error occurred because you used the wrong name for one of the variables (there is no frog$MK_RF). The correct call would be

lm(as.matrix(bear) ~ frog$Mkt.RF+frog$SMB+frog$HML)

or

mmod <- lm(as.matrix(bear) ~ Mkt.RF + SMB + HML, data=frog)
summary(mmod)

This gives precisely the same coefficients, standard errors, t-values etc. as if you had looped over the columns in bear individually. Doing it this way has multiple advantages, however.

Try, for example:

anova(mmod)
coef(mmod)
residuals(mmod)

Very handy.
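Since bear and frog are not available here, the following sketch uses mtcars as a stand-in to check that a matrix response gives the same coefficients as separate fits. The choice of mpg and qsec as responses and wt and hp as predictors is arbitrary and only for illustration.

```r
# Two response columns fitted in a single call
Y <- as.matrix(mtcars[, c("mpg", "qsec")])
mmod <- lm(Y ~ wt + hp, data = mtcars)

# The same two models fitted one at a time
separate <- sapply(colnames(Y), function(v)
  coef(lm(reformulate(c("wt", "hp"), v), data = mtcars)))

# coef(mmod) has one column of coefficients per response;
# the largest absolute difference should be numerically zero
max(abs(coef(mmod) - separate))
```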

R Loop for Variable Names to run linear regression model

Ok, I'll post an answer. I will use the dataset mtcars as an example. I believe it will work with your dataset.

First, I create a store, lm.test, an object of class list. In your code you assign the output of lm(.) to the same object every time through the loop, so in the end you would only have the last model; all the others would have been overwritten by the newer ones.

Then, inside the loop, I use function reformulate to put together the regression formula. There are other ways of doing this but this one is simple.

# Use just some columns
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

lm.test <- vector("list", length(col10))

for(i in seq_along(col10)){
lm.test[[i]] <- lm(reformulate(col10[i], "mpg"), data = data)
}

lm.test

Now you can use the results list for all sorts of things. I suggest you start using lapply and friends for that.

For instance, to extract the coefficients:

cfs <- lapply(lm.test, coef)

In order to get the summaries:

smry <- lapply(lm.test, summary)

It becomes very simple once you're familiar with *apply functions.
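As an example of that, the loop itself can be replaced by lapply, and naming the list makes the results easier to look up. This is a sketch using the same columns as above; the vector r2 is a name I made up.

```r
data <- mtcars[, c("mpg", "cyl", "disp", "hp", "drat", "wt")]
col10 <- names(data)[-1]

# reformulate("disp", "mpg") builds the formula mpg ~ disp
lm.test <- lapply(col10, function(v) lm(reformulate(v, "mpg"), data = data))
names(lm.test) <- col10

# One R-squared per model; vapply checks that each result is a single number
r2 <- vapply(lm.test, function(m) summary(m)$r.squared, numeric(1))
r2
```

Individual models are then available by name, for example lm.test[["disp"]].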


