Convert Summary to Data.Frame

Convert summary to data.frame

You can consider unclass, I suppose:

data.frame(unclass(summary(mydf)), check.names = FALSE, stringsAsFactors = FALSE)
# ADMIT GRE GPA RANK
# 1 Min. :0.0000 Min. :380.0 Min. :2.930 Min. :1.000
# 2 1st Qu.:0.2500 1st Qu.:550.0 1st Qu.:3.047 1st Qu.:2.250
# 3 Median :1.0000 Median :650.0 Median :3.400 Median :3.000
# 4 Mean :0.6667 Mean :626.7 Mean :3.400 Mean :2.833
# 5 3rd Qu.:1.0000 3rd Qu.:735.0 3rd Qu.:3.655 3rd Qu.:3.750
# 6 Max. :1.0000 Max. :800.0 Max. :4.000 Max. :4.000
str(.Last.value)
# 'data.frame': 6 obs. of 4 variables:
# $ ADMIT: chr "Min. :0.0000 " "1st Qu.:0.2500 " "Median :1.0000 " "Mean :0.6667 " ...
# $ GRE : chr "Min. :380.0 " "1st Qu.:550.0 " "Median :650.0 " "Mean :626.7 " ...
# $ GPA : chr "Min. :2.930 " "1st Qu.:3.047 " "Median :3.400 " "Mean :3.400 " ...
# $ RANK: chr "Min. :1.000 " "1st Qu.:2.250 " "Median :3.000 " "Mean :2.833 " ...

Note that there is a lot of extra whitespace there, in both the column names and the values.

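However, it might be sufficient to do something like the following. Note that this returns a numeric matrix rather than a data frame, and it assumes every column is numeric: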

do.call(cbind, lapply(mydf, summary))
#          ADMIT   GRE   GPA  RANK
# Min.    0.0000 380.0 2.930 1.000
# 1st Qu. 0.2500 550.0 3.048 2.250
# Median  1.0000 650.0 3.400 3.000
# Mean    0.6667 626.7 3.400 2.833
# 3rd Qu. 1.0000 735.0 3.655 3.750
# Max.    1.0000 800.0 4.000 4.000
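Because do.call(cbind, ...) gives a matrix, one small extra step turns it into a data.frame with the statistic names in their own column. This is a sketch only: the toy mydf below is a hypothetical stand-in for the question's data, and the column name stat is an arbitrary choice.

```r
# Hypothetical stand-in for the question's mydf (all columns numeric)
mydf <- data.frame(ADMIT = c(0, 1, 1), GRE = c(380, 650, 800))

# One summary() vector per column, bound into a numeric matrix
m <- do.call(cbind, lapply(mydf, summary))

# Move the row names (Min., 1st Qu., ...) into a regular column;
# row.names = NULL gives plain integer row names instead
out <- data.frame(stat = rownames(m), m, row.names = NULL,
                  check.names = FALSE, stringsAsFactors = FALSE)
out
```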

Convert summary of data.frame into a dataframe

We could use the matrix route, where ds is the table returned by summary(d):

ds <- summary(d)
out <- as.data.frame.matrix(ds)
row.names(out) <- NULL

-output

out
a b
1 Min. :1.0 Min. :4.0
2 1st Qu.:1.5 1st Qu.:4.5
3 Median :2.0 Median :5.0
4 Mean :2.0 Mean :5.0
5 3rd Qu.:2.5 3rd Qu.:5.5
6 Max. :3.0 Max. :6.0
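The matrix route leaves every cell as a string such as "Min.   :1.0  ". If you need numbers, one option is to split each cell at its first colon. A minimal sketch, assuming d is the small example frame below and that every column is numeric (factor or character columns produce different cell text, and as.numeric() would return NA for them with a warning). The parsed values keep whatever rounding summary() applied when formatting them.

```r
# Example data (assumed; consistent with the output shown above)
d <- data.frame(a = 1:3, b = 4:6)
ds <- summary(d)  # a character table with cells such as "Min.   :1.0  "

out <- as.data.frame.matrix(ds, stringsAsFactors = FALSE)
row.names(out) <- NULL
names(out) <- trimws(names(out))  # summary() pads the column names with spaces

# Label = text before the first ':' (taken from the first column)
stat <- trimws(sub(":.*$", "", out[[1]]))
# Value = text after the first ':', converted to numeric
vals <- lapply(out, function(col) as.numeric(trimws(sub("^[^:]*:", "", col))))

nums <- data.frame(stat = stat, vals, stringsAsFactors = FALSE)
nums
```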

If we need the statistic names (Min., Median, etc.) as row names, loop over the columns with sapply and apply summary to each one:

as.data.frame(sapply(d, summary))

-output

          a   b
Min.    1.0 4.0
1st Qu. 1.5 4.5
Median  2.0 5.0
Mean    2.0 5.0
3rd Qu. 2.5 5.5
Max.    3.0 6.0

Convert model summary to data frame


model_df <- as.data.frame(coef(summary(model)))

#               Estimate Std. Error   t value   Pr(>|t|)
# (Intercept)  0.2308795 0.15761451  1.464836 0.14616637
# x           -0.1645842 0.09883217 -1.665290 0.09904877

To avoid the row names, you could add them as a column afterwards, or do it directly with data.table:

library(data.table)
model_df <- data.table(coef(summary(model)), keep.rownames = 'term')
setDF(model_df)

#          term   Estimate Std. Error   t value   Pr(>|t|)
# 1 (Intercept)  0.2308795 0.15761451  1.464836 0.14616637
# 2           x -0.1645842 0.09883217 -1.665290 0.09904877
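
The "add them as a column afterwards" option also works in base R, without data.table. A minimal sketch; the simulated data and model below are hypothetical stand-ins for the question's model, so the numbers will differ from the output above.

```r
# Hypothetical simulated data and model, standing in for the question's `model`
set.seed(1)
dat <- data.frame(x = rnorm(100))
dat$y <- 0.2 - 0.15 * dat$x + rnorm(100)
model <- lm(y ~ x, data = dat)

coefs <- coef(summary(model))  # matrix: Estimate, Std. Error, t value, Pr(>|t|)

# Put the row names into a "term" column; row.names = NULL drops the originals,
# and check.names = FALSE keeps names such as "Std. Error" unchanged
model_df <- data.frame(term = rownames(coefs), coefs, row.names = NULL,
                       check.names = FALSE, stringsAsFactors = FALSE)
model_df
```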

EDIT: As deschen commented, the column names won't be pretty. To fix them you could use setNames():

model_df <-
  setNames(model_df, c("term", "coef", "std_error", "t_value", "p_value"))
#          term       coef  std_error   t_value    p_value
# 1 (Intercept)  0.2308795 0.15761451  1.464836 0.14616637
# 2           x -0.1645842 0.09883217 -1.665290 0.09904877

How to convert summary output to a data frame?

With your approach, you can take advantage of the fact that, for a factor, summary returns a vector of counts named by the levels of ids:

> my.summary <- summary(DATA$ids)
> data.frame(ids=names(my.summary), nums=my.summary)
  ids nums
1   1    4
2   2    5
3   3    2
4   4    2
5   5    2

Or, more straightforwardly, you can create a frequency table of ids and then convert that to a data frame (naming the table argument ids gives the column that name):

> as.data.frame(table(ids = DATA$ids), responseName = "nums")
  ids nums
1   1    4
2   2    5
3   3    2
4   4    2
5   5    2
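
A self-contained check that both approaches agree, using a hypothetical DATA whose ids factor reproduces the counts shown above. One caveat: summary only returns counts when ids is a factor (on a numeric vector it returns the Min./quartile/Mean summary instead), and summary.factor lumps levels beyond its maxsum limit (100 by default) into "(Other)", whereas table counts every level.

```r
# Hypothetical data chosen to reproduce the counts shown above
DATA <- data.frame(ids = factor(rep(1:5, times = c(4, 5, 2, 2, 2))))

# Approach 1: summary() of a factor is a named integer vector of counts
my.summary <- summary(DATA$ids)
by_summary <- data.frame(ids = names(my.summary), nums = as.vector(my.summary),
                         stringsAsFactors = FALSE)

# Approach 2: a frequency table converted with as.data.frame()
by_table <- as.data.frame(table(ids = DATA$ids), responseName = "nums",
                          stringsAsFactors = FALSE)
by_table
```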

R convert summary result (statistics with all dataframe columns) into dataframe

I commonly use a little function (adapted from a script found on the net) to do this kind of transformation:

sumstats <- function(x) {
  # Per-column helpers; range.k and the quantile helpers assume numeric columns
  null.k   <- function(x) sum(is.na(x))
  unique.k <- function(x) {
    if (sum(is.na(x)) > 0) length(unique(x)) - 1 else length(unique(x))
  }
  range.k  <- function(x) max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  mean.k   <- function(x) if (is.numeric(x)) round(mean(x, na.rm = TRUE), digits = 2) else "N*N"
  sd.k     <- function(x) if (is.numeric(x)) round(sd(x, na.rm = TRUE), digits = 2) else "N*N"
  var.k    <- function(x) if (is.numeric(x)) round(var(x, na.rm = TRUE), digits = 2) else "N*N"
  min.k    <- function(x) if (is.numeric(x)) round(min(x, na.rm = TRUE), digits = 2) else "N*N"
  max.k    <- function(x) if (is.numeric(x)) round(max(x, na.rm = TRUE), digits = 2) else "N*N"
  q05 <- function(x) quantile(x, probs = .05, na.rm = TRUE)
  q10 <- function(x) quantile(x, probs = .10, na.rm = TRUE)
  q25 <- function(x) quantile(x, probs = .25, na.rm = TRUE)
  q50 <- function(x) quantile(x, probs = .50, na.rm = TRUE)
  q75 <- function(x) quantile(x, probs = .75, na.rm = TRUE)
  q90 <- function(x) quantile(x, probs = .90, na.rm = TRUE)
  q95 <- function(x) quantile(x, probs = .95, na.rm = TRUE)

  # One row per column of x, one column per statistic
  sumtable <- cbind(as.matrix(colSums(!is.na(x))), sapply(x, null.k), sapply(x, unique.k),
                    sapply(x, range.k), sapply(x, mean.k), sapply(x, sd.k), sapply(x, var.k),
                    sapply(x, min.k), sapply(x, q05), sapply(x, q10), sapply(x, q25),
                    sapply(x, q50), sapply(x, q75), sapply(x, q90), sapply(x, q95),
                    sapply(x, max.k))

  sumtable <- as.data.frame(sumtable)
  names(sumtable) <- c('count', 'null', 'unique', 'range', 'mean', 'std', 'var', 'min',
                       '5%', '10%', '25%', '50%', '75%', '90%', '95%', 'max')
  return(sumtable)
}
sumstats(df1)
       count null unique range   mean  std   var    min     5%    10%    25%    50%    75%    90%    95%    max
gender 30.00 0.00   2.00  1.00   1.67 0.48  0.23   1.00   1.00   1.00   1.00   2.00   2.00   2.00   2.00   2.00
age    30.00 0.00   6.00  5.00   3.50 1.74  3.02   1.00   1.00   1.00   2.00   3.50   5.00   6.00   6.00   6.00
height 30.00 0.00  30.00 29.00 155.50 8.80 77.50 141.00 142.45 143.90 148.25 155.50 162.75 167.10 168.55 170.00

It already covers counts, nulls, range and several quantiles, and you can easily adapt it to add more descriptive columns. It returns a data.frame with one row per variable. You might also want to decide in advance how NAs should be handled; here the statistics are computed with na.rm = TRUE.
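
As a variation on the same idea (not the function above), the statistics can be kept in a named list of functions, so adding a column means adding one list entry. A minimal sketch that assumes all columns are numeric; sumstats2, its stats argument and the toy df1 are hypothetical names.

```r
# Hypothetical variant: each statistic is one entry in a named list
sumstats2 <- function(x, stats = list(
  count = function(v) sum(!is.na(v)),
  null  = function(v) sum(is.na(v)),
  mean  = function(v) mean(v, na.rm = TRUE),
  std   = function(v) sd(v, na.rm = TRUE),
  min   = function(v) min(v, na.rm = TRUE),
  q50   = function(v) unname(quantile(v, 0.5, na.rm = TRUE)),
  max   = function(v) max(v, na.rm = TRUE)
)) {
  # vapply() computes every statistic for one column and checks each is a single number
  per_column <- sapply(x, function(col) vapply(stats, function(f) f(col), numeric(1)))
  # sapply() puts statistics in rows; transpose to get one row per variable
  as.data.frame(t(per_column))
}

df1 <- data.frame(a = c(1, 2, NA, 4), b = c(10, 20, 30, 40))
sumstats2(df1)
```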

Hope it helps.

summary to a data frame

Like this?

var <- rnorm(100)
x <- summary(var)
data.frame(x=matrix(x),row.names=names(x))
##                x
## Min.    -2.68300
## 1st Qu. -0.70930
## Median  -0.09732
## Mean    -0.00809
## 3rd Qu.  0.71550
## Max.     2.58100
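
If you would rather have a single row with the statistics as columns (handy for rbind()-ing the summaries of several variables), a sketch along the same lines, assuming a numeric vector. set.seed() is only there to make the run reproducible, and v is a hypothetical name.

```r
set.seed(42)  # only to make the example reproducible
v <- rnorm(100)
x <- summary(v)

# unclass() drops the summaryDefault/table class, leaving a named numeric vector;
# t() turns that vector into a 1-row matrix whose column names are the statistics
wide <- data.frame(t(unclass(x)), check.names = FALSE)
wide
```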

Converting statsmodels summary object to Pandas Dataframe

The answer from @Michael B works well, but requires "recreating" the table. The table itself is actually directly available from the summary().tables attribute. Each table in this attribute (which is a list of tables) is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back as a pd.DataFrame:

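from io import StringIO

import pandas as pd
import statsmodels.api as sm

# y: the response, x: the regressors (OLS adds no intercept; use sm.add_constant(x) if you want one)
model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()

# Note that tables is a list. The table at index 1 is the "core" table. Additionally, read_html puts dfs in a list, so we want index 0
# Wrapping the HTML in StringIO avoids the deprecation warning newer pandas versions raise for literal HTML strings
# (read_html also needs lxml, or beautifulsoup4 plus html5lib, installed)
results_as_html = results_summary.tables[1].as_html()
pd.read_html(StringIO(results_as_html), header=0, index_col=0)[0]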

