How to run lm regression for every column in R
Your code looks fine, except that when you call i within lm, R will read i as a string, which you can't regress things against. Using get will allow you to pull the column corresponding to i.
df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))
storage <- list()
for (i in names(df)[-1]) {
  # get(i) fetches the column whose name is stored in the string i
  storage[[i]] <- lm(get(i) ~ x, data = df)
}
I create an empty list, storage, which I fill up on each iteration of the loop. It's just a personal preference, but I'd also advise against how you've written your current loop:
for(i in names(df[,-1])){
  model = lm(i~x, data=df)
}
You will overwrite model on every iteration, so you end up with only the last iteration's results. I suggest you change it to a list, or a matrix, in which you can store results iteratively.
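As a rough sketch of the "list, then matrix" idea, the fitted models can be kept in a list and their coefficients collected into a matrix afterwards. The data here is simulated, and the name coef_matrix is only illustrative.

```r
set.seed(1)  # simulated data, for illustration only
df <- data.frame(x = rnorm(100), y1 = rnorm(100), y2 = rnorm(100), y3 = rnorm(100))

# Keep every fitted model, keyed by the name of its response column
storage <- list()
for (i in names(df)[-1]) {
  storage[[i]] <- lm(get(i) ~ x, data = df)
}

# One column per response; the rows are the intercept and the slope on x
coef_matrix <- sapply(storage, coef)
print(coef_matrix)
```

Keeping the full lm objects in the list means summaries, residuals, or predictions can still be pulled out later with lapply or sapply.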
Hope that helps
How can I operate linear regression for each column in a dataframe
You can create a function that fits a linear model of the other variables in terms of samples.L and samples.T.
lm_func <- function(y) lm(y ~ samples.L + samples.T, data = data)
You can then use lapply() to apply this function to each of the desired columns.
lapply(data[,3:6], lm_func)
Additionally, you can use the tidyverse packages with the broom package to simplify your outputs.
library(tidyverse)
library(broom)
map_dfr(data[,3:6], function(x) summary(lm_func(x)) %>% glance())
map_dfr(data[,3:6], function(x) summary(lm_func(x)) %>% tidy())
Better yet, you can fit all of the responses in a single call by binding them with cbind() on the left-hand side of the formula.
fit <- lm(cbind(le.1, le.2, le.3, le.4) ~ samples.L + samples.T, data = data)
summary(fit) %>% map_dfr(glance)
summary(fit) %>% map_dfr(tidy)
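To see that the two approaches agree, here is a small base R sketch on simulated data. The data frame sim_data and its values are made up for illustration, and only two response columns are used; the column names follow the ones in the answer.

```r
set.seed(42)  # simulated stand-in for the real data
sim_data <- data.frame(
  samples.L = rnorm(30),
  samples.T = rnorm(30),
  le.1 = rnorm(30),
  le.2 = rnorm(30)
)

# One model per response column
lm_func <- function(y) lm(y ~ samples.L + samples.T, data = sim_data)
per_column <- lapply(sim_data[, c("le.1", "le.2")], lm_func)

# All responses in one multi-response fit
joint <- lm(cbind(le.1, le.2) ~ samples.L + samples.T, data = sim_data)

# coef(joint) is a matrix with one column per response
print(coef(joint))
same_coefs <- all.equal(unname(coef(joint)[, "le.1"]),
                        unname(coef(per_column$le.1)))
print(same_coefs)
```

The single cbind() fit gives the same coefficients as the separate fits, so the choice between them is mostly about how you want to post-process the results.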
Creating a linear regression model for each group in a column
You have some mistakes in the syntax of your functions. Functions are usually written as function(x), and then you substitute the x with the data you want to use it with.
For example, in the linear_model function you defined, if you were to use it alone you would write:
linear_model(data)
However, because you are using it inside the lapply function, it is a bit trickier to see. lapply simply loops over the data frames you obtain from split(table2,table2$LOCATION) and applies the linear_model function to each of them.
The same thing happens with my_predict.
Anyway, this should work for you:
linear_model <- function(x) lm(Education ~ TIME, x)
m <- lapply(split(table2,table2$LOCATION),linear_model)
new_df <- data.frame(TIME=c(2019))
my_predict <- function(x) predict(x,new_df)
sapply(m,my_predict)
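For a quick check of the pattern, here is a sketch with a made-up table2. The two locations and their Education values are invented and exactly linear in TIME, so the predictions are easy to verify by hand.

```r
# Hypothetical stand-in for table2: two locations, four years each
table2 <- data.frame(
  LOCATION = rep(c("A", "B"), each = 4),
  TIME = rep(2015:2018, times = 2),
  Education = c(10, 12, 14, 16, 30, 29, 28, 27)
)

# One model per location
linear_model <- function(x) lm(Education ~ TIME, x)
m <- lapply(split(table2, table2$LOCATION), linear_model)

# Predict the same new year from every model
new_df <- data.frame(TIME = 2019)
my_predict <- function(x) predict(x, new_df)
preds <- sapply(m, my_predict)
print(preds)
```

Location A rises by 2 per year and location B falls by 1 per year, so the 2019 predictions should come out as 18 and 26 up to floating-point error.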
ANSWER TO THE EDIT
There are probably more efficient ways of looping the prediction, but here is my approach:
library(reshape2)  # melt()
library(tidyr)     # pivot_wider()

pred_data <- list()
for (i in 3:6) {
  # fit column i against TIME, separately for each LOCATION
  linear_model <- function(x) lm(x[, i] ~ TIME, x)
  m <- lapply(split(tableLinR, tableLinR$LOCATION), linear_model)
  new_df <- data.frame(TIME = c(2020, 2021), row.names = c("2020", "2021"))
  my_predict <- function(x) predict(x, new_df)
  pred_data[[colnames(tableLinR)[i]]] <- sapply(m, my_predict)
}
pred_data <- melt(pred_data)
pred_data <- as.data.frame(pivot_wider(pred_data, names_from = L1, values_from = value))
First you create an empty list where you will save the outputs of your loop. In for (i in 3:6) you put the range of columns you want predictions for. The result, pred_data, is a list that you can transform into a data frame in different ways. With melt and pivot_wider you obtain a format similar to your original data.
How can I perform and store linear regression models between all continuous variables in a data frame?
Assuming you need pairwise comparisons between all columns of mtcars, you can use the combn() function with m = 2 to enumerate every pair of columns, and fit all of the linear models with:
combinations <- combn(colnames(mtcars), 2)
forward <- list()
reverse <- list()
for(i in 1:ncol(combinations)){
  forward[[i]] <- lm(formula(paste0(combinations[,i][1], "~", combinations[,i][2])), data = mtcars)
  reverse[[i]] <- lm(formula(paste0(combinations[,i][2], "~", combinations[,i][1])), data = mtcars)
}
all <- c(forward, reverse)
all will be your list with all of the linear models together, covering both the forward and reverse directions of association between each pair of variables.
If you want combinations of three variables, you can do combn(colnames(mtcars), 3), and so on.
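A possible variation, sketched below, names each model by its formula so that individual fits can be looked up directly. It uses reformulate() instead of pasting strings; the names ordered_pairs and models are illustrative, not from the answer.

```r
# All unordered pairs of columns, then both orderings of each pair
pairs <- combn(colnames(mtcars), 2)
ordered_pairs <- cbind(pairs, pairs[2:1, ])

# Row 1 holds the response, row 2 the predictor
models <- lapply(seq_len(ncol(ordered_pairs)), function(j) {
  f <- reformulate(ordered_pairs[2, j], response = ordered_pairs[1, j])
  lm(f, data = mtcars)
})
names(models) <- apply(ordered_pairs, 2, function(p) paste(p[1], "~", p[2]))

n_models <- length(models)
print(n_models)
print(coef(models[["mpg ~ wt"]]))
```

With the 11 columns of mtcars there are 55 pairs, so the list holds 110 models, and a specific one can be retrieved by a name such as "wt ~ mpg".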
How to perform linear regression over a dataframe for each row
Updated to extract r-squared, and to forgo the use of broom::tidy:
library(tidyverse)

dat %>%
  mutate(id = row_number()) %>%
  pivot_longer(starts_with("Year")) %>%
  group_by(id) %>%
  mutate(x = c(1, 2, 3, 4)) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         result = map(model, function(x) list(intercept = x$coef[1],
                                              slope = x$coef[2],
                                              rsq = summary(x)$r.squared))) %>%
  unnest_wider(result)
Output:
      id data             model  intercept  slope   rsq
   <int> <list>           <list>     <dbl>  <dbl> <dbl>
 1     1 <tibble [4 x 4]> <lm>        1.65   0.28 0.956
 2     2 <tibble [4 x 4]> <lm>       0.550   0.33 0.995
 3     3 <tibble [4 x 4]> <lm>       0.550   2.02 0.971
 4     4 <tibble [4 x 4]> <lm>         1.9   0.18 0.953
 5     5 <tibble [4 x 4]> <lm>        1.81 0.0550 0.953
 6     6 <tibble [4 x 4]> <lm>         1.1  0.145 0.940
 7     7 <tibble [4 x 4]> <lm>        3.25  0.200 0.952
 8     8 <tibble [4 x 4]> <lm>        3.40   0.99 0.975
 9     9 <tibble [4 x 4]> <lm>         3.5   0.23  0.92
10    10 <tibble [4 x 4]> <lm>        20.7     -4 0.373
Prior Answer
You can use tidyverse and broom
library(tidyverse)
library(broom)
dat %>%
  mutate(id = row_number()) %>%
  pivot_longer(starts_with("Year")) %>%
  group_by(id) %>%
  mutate(x = c(1, 2, 3, 4)) %>%
  nest() %>%
  mutate(model = map(data, ~ lm(value ~ x, data = .)),
         tidied = map(model, tidy)) %>%
  unnest(tidied)
Output:
      id data             model  term        estimate std.error statistic  p.value
   <int> <list>           <list> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
 1     1 <tibble [4 x 4]> <lm>   (Intercept)     1.65     0.116      14.2  0.00492
 2     1 <tibble [4 x 4]> <lm>   x               0.28    0.0424      6.60   0.0222
 3     2 <tibble [4 x 4]> <lm>   (Intercept)    0.550    0.0474      11.6  0.00736
 4     2 <tibble [4 x 4]> <lm>   x               0.33    0.0173      19.1  0.00274
 5     3 <tibble [4 x 4]> <lm>   (Intercept)    0.550     0.681     0.808    0.504
 6     3 <tibble [4 x 4]> <lm>   x               2.02     0.249      8.13   0.0148
 7     4 <tibble [4 x 4]> <lm>   (Intercept)      1.9    0.0775      24.5  0.00166
 8     4 <tibble [4 x 4]> <lm>   x               0.18    0.0283      6.36   0.0238
 9     5 <tibble [4 x 4]> <lm>   (Intercept)     1.81    0.0237      76.1 0.000173
10     5 <tibble [4 x 4]> <lm>   x             0.0550   0.00866      6.35   0.0239
11     6 <tibble [4 x 4]> <lm>   (Intercept)      1.1    0.0712      15.5  0.00416
12     6 <tibble [4 x 4]> <lm>   x              0.145    0.0260      5.58   0.0306
13     7 <tibble [4 x 4]> <lm>   (Intercept)     3.25    0.0866      37.5 0.000709
14     7 <tibble [4 x 4]> <lm>   x              0.200    0.0316      6.32   0.0241
15     8 <tibble [4 x 4]> <lm>   (Intercept)     3.40     0.309      11.0  0.00814
16     8 <tibble [4 x 4]> <lm>   x               0.99     0.113      8.78   0.0127
17     9 <tibble [4 x 4]> <lm>   (Intercept)      3.5     0.131      26.6  0.00141
18     9 <tibble [4 x 4]> <lm>   x               0.23    0.0480      4.80   0.0408
19    10 <tibble [4 x 4]> <lm>   (Intercept)     20.7      10.0      2.06    0.176
20    10 <tibble [4 x 4]> <lm>   x                 -4      3.67     -1.09    0.389
Input:
structure(list(proteins = c("p1", "p2", "p3", "p4", "p5", "p6",
"p7", "p8", "p9", "p10"), Year.1 = c(1.9, 0.9, 2.3, 2.1, 1.85,
1.2, 3.5, 4.2, 3.8, 23), Year.2 = c(2.3, 1.2, 5.2, 2.2, 1.92,
1.45, 3.6, 5.6, 3.9, 4.2), Year.4 = c(2.4, 1.5, 6.2, 2.5, 1.99,
1.55, 3.8, 6.5, 4.1, 6.5), Year.5 = c(2.8, 1.9, 8.7, 2.6, 2.01,
1.65, 4.1, 7.2, 4.5, 8.9)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L))
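For comparison, a base R sketch of the same row-wise fit is shown below, using only the first three rows (p1 to p3) of the input above. The helper fit_row and the result object are illustrative names, and x is simply 1 to 4 as in the pipeline, not the actual year numbers.

```r
# First three rows of the input data, typed out as a plain data frame
dat <- data.frame(
  proteins = c("p1", "p2", "p3"),
  Year.1 = c(1.9, 0.9, 2.3),
  Year.2 = c(2.3, 1.2, 5.2),
  Year.4 = c(2.4, 1.5, 6.2),
  Year.5 = c(2.8, 1.9, 8.7)
)

# Fit one row's values against their position (1, 2, 3, 4)
fit_row <- function(values, x = seq_along(values)) {
  fit <- lm(values ~ x)
  c(intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]),
    rsq = summary(fit)$r.squared)
}

# apply() returns one column per row of dat, so transpose before binding
stats <- t(apply(as.matrix(dat[, -1]), 1, fit_row))
result <- cbind(dat["proteins"], stats)
print(result)
```

The intercepts and slopes for these three rows should match the first three rows of the output tables above.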
How to run linear regression on all variables when some columns are different classes
You can exclude the offending variables with the subtraction operator, -:
lm(goal ~ . - var, data = df)
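A minimal sketch of this on simulated data follows. The columns a and b and the character column var are invented for illustration; several columns could be dropped the same way, for example goal ~ . - var1 - var2.

```r
set.seed(7)  # simulated data; var is a character column we do not want as a predictor
df <- data.frame(
  goal = rnorm(20),
  a = rnorm(20),
  b = rnorm(20),
  var = paste0("id", 1:20),
  stringsAsFactors = FALSE
)

# "." expands to every column except the response; "- var" removes var again
fit <- lm(goal ~ . - var, data = df)
coef_names <- names(coef(fit))
print(coef_names)
```

Only the intercept, a, and b appear among the coefficients, so the excluded column never enters the model matrix.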
How to perform linear regression for multiple columns and get a dataframe output with: regression equation and r squared value?
Something like this:
library(tidyverse)
library(broom)

df1 %>%
  pivot_longer(
    cols = starts_with("X")
  ) %>%
  mutate(name = factor(name)) %>%
  group_by(name) %>%
  group_split() %>%
  map_dfr(.f = function(df) {
    lm(LH27_20822244_U_Stationary ~ value, data = df) %>%
      glance() %>%
      # tidy() %>%   # use instead of glance() for coefficient-level output
      add_column(name = unique(df$name), .before = 1)
  })
Using tidy()
  name             term        estimate std.error statistic p.value
  <fct>            <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 X20676887_X2LH_S (Intercept)     12.8      2.28      5.62 0.00494
2 X20676887_X2LH_S value          0.393    0.0855      4.59  0.0101
3 X20819831_11LH_S (Intercept)     10.4      3.72      2.79  0.0495
4 X20819831_11LH_S value          0.492     0.142      3.47  0.0256
5 X20822214_X4LH_S (Intercept)     10.5      3.30      3.20  0.0329
6 X20822214_X4LH_S value          0.485     0.126      3.86  0.0182
Using glance()
  name          r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>             <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 X20676887_X2~     0.841         0.801 0.0350      21.1  0.0101     1   12.8 -19.6 -20.3  0.00490           4     6
2 X20819831_11~     0.751         0.688 0.0438      12.0  0.0256     1   11.5 -17.0 -17.6  0.00766           4     6
3 X20822214_X4~     0.788         0.735 0.0403      14.9  0.0182     1   12.0 -17.9 -18.6  0.00651           4     6
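If the goal is one row per predictor with a readable equation and its r squared, something along these lines might work in base R. The data frame df1, its column names, and the equation format are assumptions made up for this sketch.

```r
set.seed(3)  # simulated stand-in for df1: one response and two predictor columns
df1 <- data.frame(response = rnorm(12), X1 = rnorm(12), X2 = rnorm(12))
predictors <- c("X1", "X2")

# Fit response ~ predictor for each predictor and format the fitted line
rows <- lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "response"), data = df1)
  b <- coef(fit)
  data.frame(
    name = p,
    equation = sprintf("y = %.3f %+.3f * x", b[1], b[2]),
    r.squared = summary(fit)$r.squared
  )
})
equations <- do.call(rbind, rows)
print(equations)
```

The %+.3f format always prints the sign of the slope, which avoids output such as "+ -0.5" when the slope is negative.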