Extract regression coefficient values
A summary.lm object stores these values in a matrix called 'coefficients'. So the value you are after can be accessed with:
a2Pval <- summary(mg)$coefficients[2, 4]
Or, more generally and readably, coef(summary(mg))["a2", "Pr(>|t|)"]. This form is preferred because it goes through the coef() extractor and selects the row and column by name rather than by position.
How to extract the regression coefficient from statsmodels.api?
You can use the params property of a fitted model to get the coefficients.
For example, the following code:
import statsmodels.api as sm
import numpy as np
np.random.seed(1)
X = sm.add_constant(np.arange(100))
y = np.dot(X, [1,2]) + np.random.normal(size=100)
result = sm.OLS(y, X).fit()
print(result.params)
will print a NumPy array [ 0.89516052 2.00334187]: the estimates of the intercept and slope, respectively.
If you want more information, call result.summary(), which returns an object containing three detailed tables describing the model.
How to extract the coefficients of a linear model and store in a variable in R?
df <- mtcars
fit <- lm(mpg~., data = df)
beta_0 = fit$coefficients[1]
#base R approach
coef_base <- coef(fit)
coef_base
#> (Intercept) cyl disp hp drat wt
#> 12.30337416 -0.11144048 0.01333524 -0.02148212 0.78711097 -3.71530393
#> qsec vs am gear carb
#> 0.82104075 0.31776281 2.52022689 0.65541302 -0.19941925
#tidyverse approach with the broom package
coef_tidy <- broom::tidy(fit)
coef_tidy
#> # A tibble: 11 x 5
#> term estimate std.error statistic p.value
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 (Intercept) 12.3 18.7 0.657 0.518
#> 2 cyl -0.111 1.05 -0.107 0.916
#> 3 disp 0.0133 0.0179 0.747 0.463
#> 4 hp -0.0215 0.0218 -0.987 0.335
#> 5 drat 0.787 1.64 0.481 0.635
#> 6 wt -3.72 1.89 -1.96 0.0633
#> 7 qsec 0.821 0.731 1.12 0.274
#> 8 vs 0.318 2.10 0.151 0.881
#> 9 am 2.52 2.06 1.23 0.234
#> 10 gear 0.655 1.49 0.439 0.665
#> 11 carb -0.199 0.829 -0.241 0.812
for (i in coef_base) {
#do work on i
print(i)
}
#> [1] 12.30337
#> [1] -0.1114405
#> [1] 0.01333524
#> [1] -0.02148212
#> [1] 0.787111
#> [1] -3.715304
#> [1] 0.8210407
#> [1] 0.3177628
#> [1] 2.520227
#> [1] 0.655413
#> [1] -0.1994193
pull out p-values and r-squared from a linear regression
r-squared: You can return the r-squared value directly from the summary object: summary(fit)$r.squared. See names(summary(fit)) for a list of all the items you can extract directly.
Model p-value: If you want to obtain the p-value of the overall regression model, a blog post outlines the following function to return it:
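lmp <- function(modelobject) {
  # inherits() is safe when class() returns more than one value
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  # fstatistic holds the F value, numerator df and denominator df
  f <- summary(modelobject)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  return(p)
}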
> lmp(fit)
[1] 1.622665e-05
In the case of a simple regression with one predictor, the model p-value and the p-value for the coefficient will be the same.
Coefficient p-values: If you have more than one predictor, then the above will return the model p-value, and the p-value for coefficients can be extracted using:
summary(fit)$coefficients[,4]
Alternatively, you can grab the p-values of coefficients from the anova(fit) object in a similar fashion to the summary object above.
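For readers doing the same thing in Python, a fitted statsmodels OLS result exposes the equivalent quantities as attributes: rsquared, f_pvalue (the overall model p-value) and pvalues (one per coefficient). The sketch below uses made-up data and a hypothetical predictor "x", and also checks the point made above that, with a single predictor, the model p-value and the coefficient p-value coincide.

```python
import math

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical data for a simple regression with one predictor
df = pd.DataFrame({"x": rng.normal(size=40)})
df["y"] = 0.5 + 0.5 * df["x"] + rng.normal(size=40)

fit = smf.ols("y ~ x", data=df).fit()

r_squared = fit.rsquared      # counterpart of summary(fit)$r.squared
model_p = fit.f_pvalue        # p-value of the overall F test
coef_p = fit.pvalues["x"]     # p-value of the t test on the slope

print(r_squared, model_p, coef_p)

# With one predictor F = t^2, so the two p-values agree up to rounding
print(math.isclose(model_p, coef_p, rel_tol=1e-6))
```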
Extracting coefficients from a regression in R
You may use names() to list the components of the summary object:
data(mtcars)
fit <- lm(mpg ~ wt, mtcars)
names(summary(fit))
[1] "call" "terms" "residuals" "coefficients" "aliased" "sigma" "df" "r.squared"
[9] "adj.r.squared" "fstatistic" "cov.unscaled"
Then extract the values by position:
Intercept:
summary(fit)$coefficients[1,1]
Slope:
summary(fit)$coefficients[2,1]
Extract regression coefficients out of large list in R
You can get the std error, p-values, etc. with the following modifications:
condlm <- function(i) {
  # ignore the columns only containing NA's
  if (all(is.na(df2012[, i]))) {
    return(NULL)
  }
  lm.model <- lm(df2013[, i] ~ df2012[, i])
  # the summary carries the std. errors, t values and p-values
  summary(lm.model)
}
lms <- lapply(1:dim(df2013)[2], condlm)
lms
However, please note that because of the way the data in your example is structured, you do not have enough data to obtain numeric values for the std. error, etc.: each model is fitted to only two observations, which leaves no residual degrees of freedom.
For example, with your sample data we get the following (partial output):
> lms
[[1]]
NULL
[[2]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.5455 NA NA NA
df2012[, i] 0.1818 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
Extract only variables and coefficients with Signif. less 0.05 in R
I think broom would make it easier:
library(tidyverse)
fit <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am,
data = mtcars
)
coef <- broom::tidy(fit)
coef %>% filter(p.value < 0.05)
# or
subset(coef, coef$p.value < 0.05)
Using python to extract regression coefficients
Without your code, it's hard to say why you are seeing this behaviour.
Here is a complete sample that works.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = pd.DataFrame(np.random.randint(100, size=(50,2)))
df.rename(columns={0:'X1', 1:'X2'}, inplace=True)
# GLM Model
model = smf.glm("X2 ~ X1", data=df, family= sm.families.Poisson()).fit()
print(model.summary())
print(model.params)
# Poisson Model
poisson = smf.poisson("X2 ~ X1", data=df).fit()
print(poisson.summary())
print(poisson.params)
Extract lists of p-values for each regression coefficients (1104 linear regressions) with R
Here's a tidyverse solution in multiple parts, hopefully easier to read that way :-) I used mtcars as a play dataset with mpg as the invariant independent variable.
library(dplyr)
library(purrr)
library(broom)
library(tibble)
# first key change is let `broom::tidy` do the hard work
test2 <- lapply(2:10, function(i) broom::tidy(lm(mtcars[,i] ~ mtcars[,"mpg"])))
names(test2) <- names(mtcars[2:10])
basic_information <-
map2_df(test2,
names(test2),
~ mutate(.x, which_dependent = .y)) %>%
mutate(term = ifelse(term == "(Intercept)", "Intercept", "mpg")) %>%
select(which_dependent, everything())
basic_information
#> # A tibble: 18 x 6
#> which_dependent term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 cyl Intercept 11.3 0.593 19.0 2.87e-18
#> 2 cyl mpg -0.253 0.0283 -8.92 6.11e-10
#> 3 disp Intercept 581. 41.7 13.9 1.26e-14
#> 4 disp mpg -17.4 1.99 -8.75 9.38e-10
#> 5 hp Intercept 324. 27.4 11.8 8.25e-13
#> 6 hp mpg -8.83 1.31 -6.74 1.79e- 7
#> 7 drat Intercept 2.38 0.248 9.59 1.20e-10
#> 8 drat mpg 0.0604 0.0119 5.10 1.78e- 5
#> 9 wt Intercept 6.05 0.309 19.6 1.20e-18
#> 10 wt mpg -0.141 0.0147 -9.56 1.29e-10
#> 11 qsec Intercept 15.4 1.03 14.9 2.05e-15
#> 12 qsec mpg 0.124 0.0492 2.53 1.71e- 2
#> 13 vs Intercept -0.678 0.239 -2.84 8.11e- 3
#> 14 vs mpg 0.0555 0.0114 4.86 3.42e- 5
#> 15 am Intercept -0.591 0.253 -2.33 2.64e- 2
#> 16 am mpg 0.0497 0.0121 4.11 2.85e- 4
#> 17 gear Intercept 2.51 0.411 6.10 1.05e- 6
#> 18 gear mpg 0.0588 0.0196 3.00 5.40e- 3
Just to change things up a bit... we'll use map to construct the formulas
y <- 'mpg'
x <- names(mtcars[2:10])
models <- map(setNames(x, x),
~ lm(as.formula(paste(.x, y, sep="~")),
data=mtcars))
pvalues <-
data.frame(rsquared = unlist(map(models, ~ summary(.)$r.squared)),
RSE = unlist(map(models, ~ summary(.)$sigma))) %>%
rownames_to_column(var = "which_dependent")
results <- full_join(basic_information, pvalues)
#> Joining, by = "which_dependent"
results
# A tibble: 18 x 8
which_dependent term estimate std.error statistic p.value rsquared RSE
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 cyl Intercept 11.3 0.593 19.0 2.87e-18 0.726 0.950
2 cyl mpg -0.253 0.0283 -8.92 6.11e-10 0.726 0.950
3 disp Intercept 581. 41.7 13.9 1.26e-14 0.718 66.9
4 disp mpg -17.4 1.99 -8.75 9.38e-10 0.718 66.9
5 hp Intercept 324. 27.4 11.8 8.25e-13 0.602 43.9
6 hp mpg -8.83 1.31 -6.74 1.79e- 7 0.602 43.9
7 drat Intercept 2.38 0.248 9.59 1.20e-10 0.464 0.398
8 drat mpg 0.0604 0.0119 5.10 1.78e- 5 0.464 0.398
9 wt Intercept 6.05 0.309 19.6 1.20e-18 0.753 0.494
10 wt mpg -0.141 0.0147 -9.56 1.29e-10 0.753 0.494
11 qsec Intercept 15.4 1.03 14.9 2.05e-15 0.175 1.65
12 qsec mpg 0.124 0.0492 2.53 1.71e- 2 0.175 1.65
13 vs Intercept -0.678 0.239 -2.84 8.11e- 3 0.441 0.383
14 vs mpg 0.0555 0.0114 4.86 3.42e- 5 0.441 0.383
15 am Intercept -0.591 0.253 -2.33 2.64e- 2 0.360 0.406
16 am mpg 0.0497 0.0121 4.11 2.85e- 4 0.360 0.406
17 gear Intercept 2.51 0.411 6.10 1.05e- 6 0.231 0.658
18 gear mpg 0.0588 0.0196 3.00 5.40e- 3 0.231 0.658
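For comparison, the same "fit many regressions, stack the results" pattern can be sketched in Python with statsmodels and pandas. Everything here is illustrative: the data is simulated, the response names y1, y2 and y3 are hypothetical, and the output column names simply mirror the R table above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 60

# Simulated data: one fixed predictor x and several responses
data = pd.DataFrame({"x": rng.normal(size=n)})
data["y1"] = 2.0 * data["x"] + rng.normal(size=n)
data["y2"] = rng.normal(size=n)
data["y3"] = -1.0 * data["x"] + rng.normal(size=n)

tables = []
for response in ["y1", "y2", "y3"]:
    fit = smf.ols(f"{response} ~ x", data=data).fit()
    # One row per term; model-level values are repeated on each row
    tables.append(pd.DataFrame({
        "which_dependent": response,
        "term": fit.params.index,
        "estimate": fit.params.to_numpy(),
        "std_error": fit.bse.to_numpy(),
        "statistic": fit.tvalues.to_numpy(),
        "p_value": fit.pvalues.to_numpy(),
        "rsquared": fit.rsquared,
        "RSE": np.sqrt(fit.mse_resid),  # residual standard error
    }))

results = pd.concat(tables, ignore_index=True)
print(results)
```

Building one small table per model and concatenating at the end keeps the loop body simple and scales to many responses without changes.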