How to Run lm Regression for Every Column in R


Your code looks fine, except that when you pass i to lm(), R reads i as a character string, and you can't regress against a string. Wrapping it in get() pulls out the column that corresponds to i.

df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))

storage <- list()
for (i in names(df)[-1]) {
  storage[[i]] <- lm(get(i) ~ x, data = df)
}
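
If you prefer not to rely on get(), an alternative (just a sketch of the same idea) is to build each formula from the column name with reformulate(), which also keeps the readable formula y1 ~ x, y2 ~ x, ... inside each fitted model:

storage <- list()
for (i in names(df)[-1]) {
  # reformulate("x", response = i) builds y1 ~ x, y2 ~ x, y3 ~ x in turn
  storage[[i]] <- lm(reformulate("x", response = i), data = df)
}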

I create an empty list, storage, which gets filled up on each iteration of the loop. It's just a personal preference, but I'd also advise against the way you've written your current loop:

for (i in names(df[, -1])) {
  model = lm(i ~ x, data = df)
}

You will overwrite model, so only the results of the last iteration are kept. I suggest you store the results iteratively in a list or a matrix instead.

Hope that helps

How can I run linear regression for each column in a dataframe

You can create a function that fits a linear model for a given response variable in terms of samples.L and samples.T.

lm_func <- function(y) lm(y ~ samples.L + samples.T, data = data)

You can then use lapply() to apply this function to each of the desired columns.

lapply(data[,3:6], lm_func)
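
The result is a named list of lm objects, one per column, so you can pull summaries out of it afterwards; a minimal sketch, assuming the data and column positions from the question:

models <- lapply(data[, 3:6], lm_func)

sapply(models, coef)                              # coefficient matrix, one column per response
sapply(models, function(m) summary(m)$r.squared)  # r-squared per response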

Additionally, you can use the tidyverse packages with the broom package to simplify your outputs.

library(tidyverse)
library(broom)
map_dfr(data[,3:6], function(x) summary(lm_func(x)) %>% glance())
map_dfr(data[,3:6], function(x) summary(lm_func(x)) %>% tidy())

Better yet, you can fit all of the responses at once as a multivariate linear model:

fit <- lm(cbind(le.1, le.2, le.3, le.4) ~ samples.L + samples.T, data = data)
summary(fit) %>% map_dfr(glance)
summary(fit) %>% map_dfr(tidy)
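
The cbind() call fits a single multivariate lm, so the usual extractors return one column per response; a small sketch of what you get back:

coef(fit)        # matrix of coefficients: one column for each of le.1 ... le.4
residuals(fit)   # matrix of residuals, again one column per response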

Creating a linear regression model for each group in a column

There are a few mistakes in the syntax of your functions. A function is usually written as function(x), and x is then replaced by the data you call it with.

For example, in the linear_model function you defined, if you were to use it alone you would write:

linear_model(data)

However, because you are using it inside lapply() it is a bit trickier to see. lapply() is just looping, applying linear_model to each of the data frames you obtain from split(table2, table2$LOCATION).

The same thing happens with my_predict.

Anyway, this should work for you:

linear_model <- function(x) lm(Education ~ TIME, x)

m <- lapply(split(table2, table2$LOCATION), linear_model)

new_df <- data.frame(TIME = c(2019))

my_predict <- function(x) predict(x, new_df)

sapply(m, my_predict)
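
Because new_df has a single row, sapply() returns a named vector with one 2019 prediction per LOCATION; a sketch of how you might tidy that up (pred_2019 is just an illustrative name):

pred_2019 <- sapply(m, my_predict)
data.frame(LOCATION = names(pred_2019), Education = unname(pred_2019))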

ANSWER TO THE EDIT

There are probably more efficient ways of looping the prediction, but here is my approach:

pred_data <- list()

for (i in 3:6) {
  linear_model <- function(x) lm(x[, i] ~ TIME, x)
  m <- lapply(split(tableLinR, tableLinR$LOCATION), linear_model)
  new_df <- data.frame(TIME = c(2020, 2021), row.names = c("2020", "2021"))
  my_predict <- function(x) predict(x, new_df)
  pred_data[[colnames(tableLinR)[i]]] <- sapply(m, my_predict)
}

library(reshape2)  # for melt()
library(tidyr)     # for pivot_wider()

pred_data <- melt(pred_data)
pred_data <- as.data.frame(pivot_wider(pred_data, names_from = L1, values_from = value))

First you create an empty list where you will save the output of each pass of the loop. In for (i in 3:6) you put the range of columns you want predictions for. The result pred_data is a list that you can transform into a data frame in different ways; with melt() and pivot_wider() you obtain a format similar to your original data.
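
If you want the fitted models to carry the real column name in their formula (rather than x[, i]), a variant of the same loop could build the formula with reformulate(); just a sketch under the same assumptions about tableLinR, with col and fit_one as illustrative names:

pred_data <- list()
new_df <- data.frame(TIME = c(2020, 2021), row.names = c("2020", "2021"))

for (col in colnames(tableLinR)[3:6]) {
  # Education ~ TIME, Health ~ TIME, ... depending on the column name
  fit_one <- function(x) lm(reformulate("TIME", response = col), data = x)
  m <- lapply(split(tableLinR, tableLinR$LOCATION), fit_one)
  pred_data[[col]] <- sapply(m, function(fit) predict(fit, new_df))
}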

How can I perform and store linear regression models between all continuous variables in a data frame?

Assuming you need pairwise comparisons between all columns of mtcars, you can use the combn() function to generate every pair of column names, and then fit all of the linear models with:

combinations <- combn(colnames(mtcars), 2)

forward <- list()
reverse <- list()

for (i in 1:ncol(combinations)) {
  forward[[i]] <- lm(formula(paste0(combinations[, i][1], "~", combinations[, i][2])), data = mtcars)
  reverse[[i]] <- lm(formula(paste0(combinations[, i][2], "~", combinations[, i][1])), data = mtcars)
}

all <- c(forward, reverse)

all will be your list with all of the linear models together, covering both the forward and the reverse direction of association for each pair of variables.
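
Since all of these models end up in one flat list, it can help to name each element after its formula so you can look a specific pair up later; a small sketch building on the loop above:

names(forward) <- apply(combinations, 2, function(p) paste(p[1], "~", p[2]))
names(reverse) <- apply(combinations, 2, function(p) paste(p[2], "~", p[1]))
all <- c(forward, reverse)

all[["mpg ~ cyl"]]   # the model regressing mpg on cyl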

If you want combinations between three variables, you can do combn(colnames(mtcars), 3), and so on.

How to perform linear regression over a dataframe for each row

Updated to extract the r-squared and to forgo the use of broom::tidy:

dat %>%
  mutate(id = row_number()) %>%
  pivot_longer(starts_with("Year")) %>%
  group_by(id) %>%
  mutate(x = c(1, 2, 3, 4)) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         result = map(model, function(x) list(intercept = x$coef[1],
                                               slope = x$coef[2],
                                               rsq = summary(x)$r.squared))) %>%
  unnest_wider(result)

Output:

      id data             model  intercept   slope    rsq
   <int> <list>           <list>     <dbl>   <dbl>  <dbl>
 1     1 <tibble [4 x 4]> <lm>       1.65   0.28    0.956
 2     2 <tibble [4 x 4]> <lm>       0.550  0.33    0.995
 3     3 <tibble [4 x 4]> <lm>       0.550  2.02    0.971
 4     4 <tibble [4 x 4]> <lm>       1.9    0.18    0.953
 5     5 <tibble [4 x 4]> <lm>       1.81   0.0550  0.953
 6     6 <tibble [4 x 4]> <lm>       1.1    0.145   0.940
 7     7 <tibble [4 x 4]> <lm>       3.25   0.200   0.952
 8     8 <tibble [4 x 4]> <lm>       3.40   0.99    0.975
 9     9 <tibble [4 x 4]> <lm>       3.5    0.23    0.92
10    10 <tibble [4 x 4]> <lm>      20.7   -4       0.373

Prior Answer

You can use the tidyverse and broom packages:

library(tidyverse)
library(broom)

dat %>%
  mutate(id = row_number()) %>%
  pivot_longer(starts_with("Year")) %>%
  group_by(id) %>%
  mutate(x = c(1, 2, 3, 4)) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         tidied = map(model, tidy)) %>%
  unnest(tidied)

Output:

      id data             model  term        estimate std.error statistic   p.value
   <int> <list>           <list> <chr>          <dbl>     <dbl>     <dbl>     <dbl>
 1     1 <tibble [4 x 4]> <lm>   (Intercept)   1.65     0.116      14.2    0.00492
 2     1 <tibble [4 x 4]> <lm>   x             0.28     0.0424      6.60   0.0222
 3     2 <tibble [4 x 4]> <lm>   (Intercept)   0.550    0.0474     11.6    0.00736
 4     2 <tibble [4 x 4]> <lm>   x             0.33     0.0173     19.1    0.00274
 5     3 <tibble [4 x 4]> <lm>   (Intercept)   0.550    0.681       0.808  0.504
 6     3 <tibble [4 x 4]> <lm>   x             2.02     0.249       8.13   0.0148
 7     4 <tibble [4 x 4]> <lm>   (Intercept)   1.9      0.0775     24.5    0.00166
 8     4 <tibble [4 x 4]> <lm>   x             0.18     0.0283      6.36   0.0238
 9     5 <tibble [4 x 4]> <lm>   (Intercept)   1.81     0.0237     76.1    0.000173
10     5 <tibble [4 x 4]> <lm>   x             0.0550   0.00866     6.35   0.0239
11     6 <tibble [4 x 4]> <lm>   (Intercept)   1.1      0.0712     15.5    0.00416
12     6 <tibble [4 x 4]> <lm>   x             0.145    0.0260      5.58   0.0306
13     7 <tibble [4 x 4]> <lm>   (Intercept)   3.25     0.0866     37.5    0.000709
14     7 <tibble [4 x 4]> <lm>   x             0.200    0.0316      6.32   0.0241
15     8 <tibble [4 x 4]> <lm>   (Intercept)   3.40     0.309      11.0    0.00814
16     8 <tibble [4 x 4]> <lm>   x             0.99     0.113       8.78   0.0127
17     9 <tibble [4 x 4]> <lm>   (Intercept)   3.5      0.131      26.6    0.00141
18     9 <tibble [4 x 4]> <lm>   x             0.23     0.0480      4.80   0.0408
19    10 <tibble [4 x 4]> <lm>   (Intercept)  20.7     10.0         2.06   0.176
20    10 <tibble [4 x 4]> <lm>   x            -4        3.67       -1.09   0.389

Input:

structure(list(proteins = c("p1", "p2", "p3", "p4", "p5", "p6", 
"p7", "p8", "p9", "p10"), Year.1 = c(1.9, 0.9, 2.3, 2.1, 1.85,
1.2, 3.5, 4.2, 3.8, 23), Year.2 = c(2.3, 1.2, 5.2, 2.2, 1.92,
1.45, 3.6, 5.6, 3.9, 4.2), Year.4 = c(2.4, 1.5, 6.2, 2.5, 1.99,
1.55, 3.8, 6.5, 4.1, 6.5), Year.5 = c(2.8, 1.9, 8.7, 2.6, 2.01,
1.65, 4.1, 7.2, 4.5, 8.9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))

How to run linear regression on all variables when some columns are different classes

You can exclude the offending variables with the subtraction operator - inside the formula:

lm(goal ~ . - var, data = df)
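If more than one column gets in the way, or you simply want to keep every numeric column, two common variations are sketched below (df, goal, var1 and var2 are placeholder names, not from your data):

# Drop several offending variables at once
lm(goal ~ . - var1 - var2, data = df)

# Or keep only the numeric columns before fitting
lm(goal ~ ., data = df[, sapply(df, is.numeric)])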

How to perform linear regression for multiple columns and get a dataframe output with: regression equation and r squared value?

Something like this:

library(tidyverse)
library(broom)

df1 %>%
  pivot_longer(
    cols = starts_with("X")
  ) %>%
  mutate(name = factor(name)) %>%
  group_by(name) %>%
  group_split() %>%
  map_dfr(.f = function(df){
    lm(LH27_20822244_U_Stationary ~ value, data = df) %>%
      glance() %>%
      # tidy() %>%
      add_column(name = unique(df$name), .before = 1)
  })

Using tidy()

  name             term        estimate std.error statistic p.value
  <fct>            <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 X20676887_X2LH_S (Intercept)   12.8      2.28        5.62 0.00494
2 X20676887_X2LH_S value          0.393    0.0855      4.59 0.0101
3 X20819831_11LH_S (Intercept)   10.4      3.72        2.79 0.0495
4 X20819831_11LH_S value          0.492    0.142       3.47 0.0256
5 X20822214_X4LH_S (Intercept)   10.5      3.30        3.20 0.0329
6 X20822214_X4LH_S value          0.485    0.126       3.86 0.0182

Using glance()

  name          r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>             <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 X20676887_X2~     0.841         0.801 0.0350      21.1  0.0101     1   12.8 -19.6 -20.3  0.00490           4     6
2 X20819831_11~     0.751         0.688 0.0438      12.0  0.0256     1   11.5 -17.0 -17.6  0.00766           4     6
3 X20822214_X4~     0.788         0.735 0.0403      14.9  0.0182     1   12.0 -17.9 -18.6  0.00651           4     6
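
If you literally want one row per predictor column containing the regression equation as text plus its r-squared, you can assemble that data frame yourself; a sketch reusing the df1 and LH27_20822244_U_Stationary names assumed above:

df1 %>%
  pivot_longer(cols = starts_with("X")) %>%
  group_by(name) %>%
  group_split() %>%
  map_dfr(function(df){
    fit <- lm(LH27_20822244_U_Stationary ~ value, data = df)
    tibble(
      name      = unique(df$name),
      # build the equation string from the fitted intercept and slope
      equation  = paste0("y = ", round(coef(fit)[1], 3), " + ", round(coef(fit)[2], 3), " * x"),
      r.squared = summary(fit)$r.squared
    )
  })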

