Pull Out P-Values and R-Squared from a Linear Regression

pull out p-values and r-squared from a linear regression

r-squared: You can return the r-squared value directly from the summary object summary(fit)$r.squared. See names(summary(fit)) for a list of all the items you can extract directly.

Model p-value: If you want to obtain the p-value of the overall regression model,
this blog post outlines a function to return the p-value:

lmp <- function (modelobject) {
if (class(modelobject) != "lm") stop("Not an object of class 'lm' ")
f <- summary(modelobject)$fstatistic
p <- pf(f[1],f[2],f[3],lower.tail=F)
attributes(p) <- NULL
return(p)
}

> lmp(fit)
[1] 1.622665e-05

In the case of a simple regression with one predictor, the model p-value and the p-value for the coefficient will be the same.

Coefficient p-values: If you have more than one predictor, then the above will return the model p-value, and the p-value for coefficients can be extracted using:

summary(fit)$coefficients[,4]  

Alternatively, you can grab the p-value of coefficients from the anova(fit) object in a similar fashion to the summary object above.

Extracting final p-value from output of regression (lm) in R

you can see the code that is used to print the summary by typing

class(sumres)
#> "summary.lm"

to get the class, and then get the code for the print method by typing

stats:::print.summary.lm

into the console which includes these lines:

 cat(...lots of stuff..., "p-value:", format.pval(pf(x$fstatistic[1L], 
x$fstatistic[2L], x$fstatistic[3L], lower.tail = FALSE),
digits = digits)...morestuff...)

so in this case, you want:

pf(sumres$fstatistic[1L], sumres$fstatistic[2L], sumres$fstatistic[3L], lower.tail = FALSE)

Extract lists of p-values for each regression coefficients (1104 linear regressions) with R

Here's a tidyverse solution in multiple parts, hopefully easier to read that way :-) I used mtcars as a play dataset with mpg as the invariant independent variable

library(dplyr)
library(purrr)
library(broom)
library(tibble)

# first key change is let `broom::tidy` do the hard work

test2 <- lapply(2:10, function(i) broom::tidy(lm(mtcars[,i] ~ mtcars[,"mpg"])))
names(test2) <- names(mtcars[2:10])
basic_information <-
map2_df(test2,
names(test2),
~ mutate(.x, which_dependent = .y)) %>%
mutate(term = ifelse(term == "(Intercept)", "Intercept", "mpg")) %>%
select(which_dependent, everything())

basic_information
#> # A tibble: 18 x 6
#> which_dependent term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 cyl Intercept 11.3 0.593 19.0 2.87e-18
#> 2 cyl mpg -0.253 0.0283 -8.92 6.11e-10
#> 3 disp Intercept 581. 41.7 13.9 1.26e-14
#> 4 disp mpg -17.4 1.99 -8.75 9.38e-10
#> 5 hp Intercept 324. 27.4 11.8 8.25e-13
#> 6 hp mpg -8.83 1.31 -6.74 1.79e- 7
#> 7 drat Intercept 2.38 0.248 9.59 1.20e-10
#> 8 drat mpg 0.0604 0.0119 5.10 1.78e- 5
#> 9 wt Intercept 6.05 0.309 19.6 1.20e-18
#> 10 wt mpg -0.141 0.0147 -9.56 1.29e-10
#> 11 qsec Intercept 15.4 1.03 14.9 2.05e-15
#> 12 qsec mpg 0.124 0.0492 2.53 1.71e- 2
#> 13 vs Intercept -0.678 0.239 -2.84 8.11e- 3
#> 14 vs mpg 0.0555 0.0114 4.86 3.42e- 5
#> 15 am Intercept -0.591 0.253 -2.33 2.64e- 2
#> 16 am mpg 0.0497 0.0121 4.11 2.85e- 4
#> 17 gear Intercept 2.51 0.411 6.10 1.05e- 6
#> 18 gear mpg 0.0588 0.0196 3.00 5.40e- 3

Just to change things up a bit... we'll use map to construct formula

y <- 'mpg'
x <- names(mtcars[2:10])

models <- map(setNames(x, x),
~ lm(as.formula(paste(.x, y, sep="~")),
data=mtcars))

pvalues <-
data.frame(rsquared = unlist(map(models, ~ summary(.)$r.squared)),
RSE = unlist(map(models, ~ summary(.)$sigma))) %>%
rownames_to_column(var = "which_dependent")

results <- full_join(basic_information, pvalues)

#> Joining, by = "which_dependent"
results
# A tibble: 18 x 8
which_dependent term estimate std.error statistic p.value rsquared RSE
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 cyl Intercept 11.3 0.593 19.0 2.87e-18 0.726 0.950
2 cyl mpg -0.253 0.0283 -8.92 6.11e-10 0.726 0.950
3 disp Intercept 581. 41.7 13.9 1.26e-14 0.718 66.9
4 disp mpg -17.4 1.99 -8.75 9.38e-10 0.718 66.9
5 hp Intercept 324. 27.4 11.8 8.25e-13 0.602 43.9
6 hp mpg -8.83 1.31 -6.74 1.79e- 7 0.602 43.9
7 drat Intercept 2.38 0.248 9.59 1.20e-10 0.464 0.398
8 drat mpg 0.0604 0.0119 5.10 1.78e- 5 0.464 0.398
9 wt Intercept 6.05 0.309 19.6 1.20e-18 0.753 0.494
10 wt mpg -0.141 0.0147 -9.56 1.29e-10 0.753 0.494
11 qsec Intercept 15.4 1.03 14.9 2.05e-15 0.175 1.65
12 qsec mpg 0.124 0.0492 2.53 1.71e- 2 0.175 1.65
13 vs Intercept -0.678 0.239 -2.84 8.11e- 3 0.441 0.383
14 vs mpg 0.0555 0.0114 4.86 3.42e- 5 0.441 0.383
15 am Intercept -0.591 0.253 -2.33 2.64e- 2 0.360 0.406
16 am mpg 0.0497 0.0121 4.11 2.85e- 4 0.360 0.406
17 gear Intercept 2.51 0.411 6.10 1.05e- 6 0.231 0.658
18 gear mpg 0.0588 0.0196 3.00 5.40e- 3 0.231 0.658

extracting p values from multiple linear regression (lm) inside of a ddply function using spatial data

I would suggest a combination of packages dplyr and broom.
This approach will return everything you need to know about the model (info about coefficients and info about model itself) as a dataframe:

set.seed(25)

df <- data.frame(
id = rep(1:2, 2),
x = rep(c(25, 30),10),
y = rep(c(100, 200), 10),
year = rep(1980:1989, 2),
response = rnorm(20)
)

df

# id x y year response
# 1 1 25 100 1980 -0.21183360
# 2 2 30 200 1981 -1.04159113
# 3 1 25 100 1982 -1.15330756
# 4 2 30 200 1983 0.32153150
# 5 1 25 100 1984 -1.50012988
# 6 2 30 200 1985 -0.44553326
# 7 1 25 100 1986 1.73404543
# 8 2 30 200 1987 0.51129562
# 9 1 25 100 1988 0.09964504
# 10 2 30 200 1989 -0.05789111
# 11 1 25 100 1980 -1.74278763
# 12 2 30 200 1981 -1.32495298
# 13 1 25 100 1982 -0.54793388
# 14 2 30 200 1983 -1.45638428
# 15 1 25 100 1984 0.08268682
# 16 2 30 200 1985 0.92757895
# 17 1 25 100 1986 -0.71676933
# 18 2 30 200 1987 0.96239968
# 19 1 25 100 1988 1.54588458
# 20 2 30 200 1989 -1.00976361



library(dplyr)
library(broom)


df %>%
group_by(id) %>%
do({model = lm(response~year, data=.) # create your model
data.frame(tidy(model), # get coefficient info
glance(model))}) # get model info


# id term estimate std.error statistic p.value r.squared adj.r.squared sigma statistic.1 p.value.1 df logLik AIC BIC deviance df.residual
# 1 1 (Intercept) -492.2144842 213.19252113 -2.308779 0.04978386 0.3996362 0.32459069 0.9611139 5.325253 0.04987162 2 -12.67704 31.35409 32.26184 7.389919 8
# 2 1 year 0.2479705 0.10745580 2.307651 0.04987162 0.3996362 0.32459069 0.9611139 5.325253 0.04987162 2 -12.67704 31.35409 32.26184 7.389919 8
# 3 2 (Intercept) -258.6253012 196.88284243 -1.313600 0.22539989 0.1771294 0.07427055 0.8871395 1.722063 0.22582607 2 -11.87614 29.75227 30.66003 6.296132 8
# 4 2 year 0.1301582 0.09918521 1.312274 0.22582607 0.1771294 0.07427055 0.8871395 1.722063 0.22582607 2 -11.87614 29.75227 30.66003 6.296132 8

You can use group_by(id,x,y) if you really want your x and y columns in your final dataframe. For example if you want a similar output to the one you provided you can do :

library(dplyr)
library(broom)
library(tidyr)

dd =
df %>%
group_by(id, x, y) %>%
do({model = lm(response~year, data=.) # create your model
data.frame(tidy(model), # get coefficient info
glance(model))}) # get model info

Then select the info you need:

 dd %>%
select(id, x, y, term, estimate, adj.r.squared)

# id x y term estimate adj.r.squared
# 1 1 25 100 (Intercept) -492.2144842 0.32459069
# 2 1 25 100 year 0.2479705 0.32459069
# 3 2 30 200 (Intercept) -258.6253012 0.07427055
# 4 2 30 200 year 0.1301582 0.07427055

where you get one row for the intercept and one row for the slope.

Or, even reshape that data frame to:

dd %>%
select(id, x, y, term, estimate, adj.r.squared) %>%
spread(term, estimate)

# id x y adj.r.squared (Intercept) year
# 1 1 25 100 0.32459069 -492.2145 0.2479705
# 2 2 30 200 0.07427055 -258.6253 0.1301582

column year is the slope (coefficient of variable year)

In a similar way you can also use package purrr to create a new data frame that has list columns with the corresponding info:

set.seed(25)

df <- data.frame(
id = rep(1:2, 2),
x = rep(c(25, 30),10),
y = rep(c(100, 200), 10),
year = rep(1980:1989, 2),
response = rnorm(20)
)

library(dplyr)
library(broom)
library(tidyr)
library(purrr)

dd = df %>%
group_by(id, x, y) %>% # for each combination of those variables
nest() %>% # nest data (rest of columns)
mutate(Model = map(data, ~lm(response~year, data=.)), # use nested data to build model of interest
Coeff_Info = map(Model, tidy), # get coefficient info
Model_Info = map(Model, glance)) %>% # get model info
ungroup() # forget the grouping

# check how new dataset looks like
dd

# # A tibble: 2 x 7
# id x y data Model Coeff_Info Model_Info
# <int> <dbl> <dbl> <list> <list> <list> <list>
# 1 1 25 100 <tibble [10 x 2]> <S3: lm> <data.frame [2 x 5]> <data.frame [1 x 11]>
# 2 2 30 200 <tibble [10 x 2]> <S3: lm> <data.frame [2 x 5]> <data.frame [1 x 11]>

You can still access any element you want, but remember that some of your columns now have list elements:

# get coefficient info for both models
dd$Coeff_Info

# [[1]]
# term estimate std.error statistic p.value
# 1 (Intercept) -492.2144842 213.1925211 -2.308779 0.04978386
# 2 year 0.2479705 0.1074558 2.307651 0.04987162
#
# [[2]]
# term estimate std.error statistic p.value
# 1 (Intercept) -258.6253012 196.88284243 -1.313600 0.2253999
# 2 year 0.1301582 0.09918521 1.312274 0.2258261


# get the r squared value for 1st model
dd %>% # from new dataset
filter(id == 1) %>% # keep all info / rows of 1st model
pull(Model_Info) %>% # get model info
map_dbl("r.squared") # show r squared

# [1] 0.3996362

Or, alternatively

dd %>%                        # from new dataset
unnest(id, Model_Info) %>% # umnest id and model info
filter(id == 1) %>% # keep row of 1st model
pull(r.squared) # show the r squared

# [1] 0.3996362

Extracting t-stat p values from lm in R

You could try this:

   summary(fit)


Related Topics



Leave a reply



Submit