Convert summary to data.frame
You can consider unclass(), I suppose:
data.frame(unclass(summary(mydf)), check.names = FALSE, stringsAsFactors = FALSE)
# ADMIT GRE GPA RANK
# 1 Min. :0.0000 Min. :380.0 Min. :2.930 Min. :1.000
# 2 1st Qu.:0.2500 1st Qu.:550.0 1st Qu.:3.047 1st Qu.:2.250
# 3 Median :1.0000 Median :650.0 Median :3.400 Median :3.000
# 4 Mean :0.6667 Mean :626.7 Mean :3.400 Mean :2.833
# 5 3rd Qu.:1.0000 3rd Qu.:735.0 3rd Qu.:3.655 3rd Qu.:3.750
# 6 Max. :1.0000 Max. :800.0 Max. :4.000 Max. :4.000
str(.Last.value)
# 'data.frame': 6 obs. of 4 variables:
# $ ADMIT: chr "Min. :0.0000 " "1st Qu.:0.2500 " "Median :1.0000 " "Mean :0.6667 " ...
# $ GRE : chr "Min. :380.0 " "1st Qu.:550.0 " "Median :650.0 " "Mean :626.7 " ...
# $ GPA : chr "Min. :2.930 " "1st Qu.:3.047 " "Median :3.400 " "Mean :3.400 " ...
# $ RANK: chr "Min. :1.000 " "1st Qu.:2.250 " "Median :3.000 " "Mean :2.833 " ...
Note that there is a lot of excessive whitespace there, in both the names and the values.
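If the character version is what you want to keep, the padding can be stripped with trimws(). A minimal sketch, assuming a small made-up data frame in place of mydf (the original data is not shown here):

```r
# Hypothetical stand-in for mydf
mydf <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

out <- data.frame(unclass(summary(mydf)), check.names = FALSE,
                  stringsAsFactors = FALSE)

# Strip the padding from the column names and from every cell
names(out) <- trimws(names(out))
out[] <- lapply(out, trimws)
```

The cells are still character strings of the form "label:value", so this only tidies the display; it does not make the values numeric.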
However, it might be sufficient to do something like:
do.call(cbind, lapply(mydf, summary))
# ADMIT GRE GPA RANK
# Min. 0.0000 380.0 2.930 1.000
# 1st Qu. 0.2500 550.0 3.048 2.250
# Median 1.0000 650.0 3.400 3.000
# Mean 0.6667 626.7 3.400 2.833
# 3rd Qu. 1.0000 735.0 3.655 3.750
# Max. 1.0000 800.0 4.000 4.000
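The practical difference is that this second form keeps the values numeric, although the result is a matrix rather than a data.frame. A sketch with made-up data standing in for mydf, wrapping the result in as.data.frame(). It assumes every column is numeric and free of NAs, so that each summary() has the same six entries:

```r
# Hypothetical stand-in for mydf
mydf <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

res <- do.call(cbind, lapply(mydf, summary))  # numeric matrix, stats as row names
res_df <- as.data.frame(res)                  # same content as a data.frame
```

The statistic names end up as row names of res_df, and each column can be used in arithmetic directly.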
Convert summary of data.frame into a dataframe
We could use the matrix route (ds is assumed here to be summary(d), where d is the data frame used further down):
out <- as.data.frame.matrix(ds)
row.names(out) <- NULL
-output
out
a b
1 Min. :1.0 Min. :4.0
2 1st Qu.:1.5 1st Qu.:4.5
3 Median :2.0 Median :5.0
4 Mean :2.0 Mean :5.0
5 3rd Qu.:2.5 3rd Qu.:5.5
6 Max. :3.0 Max. :6.0
If we need Min., 1st Qu., etc. as row names, loop over the columns with sapply() and apply summary() to each:
as.data.frame(sapply(d, summary))
-output
a b
Min. 1.0 4.0
1st Qu. 1.5 4.5
Median 2.0 5.0
Mean 2.0 5.0
3rd Qu. 2.5 5.5
Max. 3.0 6.0
Convert model summary to data frame
model_df <- as.data.frame(coef(summary(model)))
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.2308795 0.15761451 1.464836 0.14616637
# x -0.1645842 0.09883217 -1.665290 0.09904877
To avoid the row names, you could add them as a column afterwards, or do it directly with data.table:
library(data.table)
model_df <- data.table(coef(summary(model)), keep.rownames = 'term')
setDF(model_df)
# term Estimate Std. Error t value Pr(>|t|)
# 1 (Intercept) 0.2308795 0.15761451 1.464836 0.14616637
# 2 x -0.1645842 0.09883217 -1.665290 0.09904877
EDIT: As commented by deschen, the column names won't be pretty. To fix them, you could use setNames().
model_df <-
setNames(model_df, c("term", "coef", "std_error", "t_value", "p_value"))
# term coef std_error t_value p_value
# 1 (Intercept) 0.2308795 0.15761451 1.464836 0.14616637
# 2 x -0.1645842 0.09883217 -1.665290 0.09904877
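The same result is also possible without data.table. A minimal base R sketch, using a model fitted on the built-in cars dataset as a stand-in for model (the original model is not shown here):

```r
# Stand-in model; any lm fit would do
model <- lm(dist ~ speed, data = cars)
coefs <- coef(summary(model))

# row.names = NULL drops the matrix row names in favour of 1, 2, ...
model_df <- data.frame(term = rownames(coefs), coefs, row.names = NULL)
names(model_df) <- c("term", "coef", "std_error", "t_value", "p_value")
```

Renaming right away also sidesteps the mangled names (Std..Error, Pr...t..) that data.frame() would otherwise produce.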
How to convert summary output to a data frame?
With your approach, you can take advantage of the fact that summary(), for a factor such as ids, returns a vector of counts, with names for each value of ids:
> my.summary <- summary(DATA$ids)
> data.frame(ids=names(my.summary), nums=my.summary)
ids nums
1 1 4
2 2 5
3 3 2
4 4 2
5 5 2
Or, more straightforwardly, you can create a frequency table based on ids and then convert that to a data frame:
> as.data.frame(table(ids=DATA$ids), responseName="nums")
ids nums
1 1 4
2 2 5
3 3 2
4 4 2
5 5 2
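Both approaches can be checked side by side. A sketch assuming ids is a factor; the DATA below is made up to reproduce the counts shown above:

```r
# Hypothetical data: five ids with 4, 5, 2, 2 and 2 occurrences
DATA <- data.frame(ids = factor(rep(1:5, times = c(4, 5, 2, 2, 2))))

# Route 1: named count vector from summary()
my.summary <- summary(DATA$ids)
from_summary <- data.frame(ids = names(my.summary),
                           nums = as.vector(my.summary),
                           row.names = NULL)

# Route 2: frequency table converted to a data frame
from_table <- as.data.frame(table(ids = DATA$ids), responseName = "nums")
```

The counts agree; the one difference worth knowing is that ids comes back as a factor in the table() version.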
R convert summary result (statistics with all dataframe columns) into dataframe
I commonly use a little function (adapted from a script found on the net) to do this kind of transformation:
sumstats <- function(x) {
  # Per-column helpers; statistics undefined for non-numeric columns return "N*N"
  null.k <- function(x) sum(is.na(x))
  unique.k <- function(x) {
    if (sum(is.na(x)) > 0) length(unique(x)) - 1 else length(unique(x))
  }
  range.k <- function(x) max(x, na.rm = TRUE) - min(x, na.rm = TRUE)
  mean.k <- function(x) {
    if (is.numeric(x)) round(mean(x, na.rm = TRUE), digits = 2) else "N*N"
  }
  sd.k <- function(x) {
    if (is.numeric(x)) round(sd(x, na.rm = TRUE), digits = 2) else "N*N"
  }
  # Variance, reported as the 'var' column in the output below
  var.k <- function(x) {
    if (is.numeric(x)) round(var(x, na.rm = TRUE), digits = 2) else "N*N"
  }
  min.k <- function(x) {
    if (is.numeric(x)) round(min(x, na.rm = TRUE), digits = 2) else "N*N"
  }
  q05 <- function(x) quantile(x, probs = .05, na.rm = TRUE)
  q10 <- function(x) quantile(x, probs = .1, na.rm = TRUE)
  q25 <- function(x) quantile(x, probs = .25, na.rm = TRUE)
  q50 <- function(x) quantile(x, probs = .5, na.rm = TRUE)
  q75 <- function(x) quantile(x, probs = .75, na.rm = TRUE)
  q90 <- function(x) quantile(x, probs = .9, na.rm = TRUE)
  q95 <- function(x) quantile(x, probs = .95, na.rm = TRUE)
  max.k <- function(x) {
    if (is.numeric(x)) round(max(x, na.rm = TRUE), digits = 2) else "N*N"
  }
  # One row per column of x, one column per statistic
  sumtable <- cbind(
    as.matrix(colSums(!is.na(x))), sapply(x, null.k), sapply(x, unique.k),
    sapply(x, range.k), sapply(x, mean.k), sapply(x, sd.k), sapply(x, var.k),
    sapply(x, min.k), sapply(x, q05), sapply(x, q10), sapply(x, q25),
    sapply(x, q50), sapply(x, q75), sapply(x, q90), sapply(x, q95),
    sapply(x, max.k)
  )
  sumtable <- as.data.frame(sumtable)
  names(sumtable) <- c('count', 'null', 'unique', 'range', 'mean', 'std', 'var',
                       'min', '5%', '10%', '25%', '50%', '75%', '90%', '95%', 'max')
  return(sumtable)
}
sumstats(df1)
count null unique range mean std var min 5% 10% 25% 50% 75% 90% 95% max
gender 30.00 0.00 2.00 1.00 1.67 0.48 0.23 1.00 1.00 1.00 1.00 2.00 2.00 2.00 2.00 2.00
age 30.00 0.00 6.00 5.00 3.50 1.74 3.02 1.00 1.00 1.00 2.00 3.50 5.00 6.00 6.00 6.00
height 30.00 0.00 30.00 29.00 155.50 8.80 77.50 141.00 142.45 143.90 148.25 155.50 162.75 167.10 168.55 170.00
You might easily adapt it to add more descriptive columns, such as quantiles, nulls, range, etc. It does return a data.frame. You also might want to specify in advance the behaviour with NAs in the arguments.
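On the NA point, one way to make that behaviour explicit is to expose it as an argument. A reduced sketch, not the function above: num_summary and its na.rm argument are names made up for illustration, and it only looks at numeric columns:

```r
# Hypothetical helper: a few statistics per numeric column, NA handling configurable
num_summary <- function(df, na.rm = TRUE) {
  num <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  data.frame(
    count = colSums(!is.na(num)),
    null  = colSums(is.na(num)),
    mean  = vapply(num, mean, numeric(1), na.rm = na.rm),
    std   = vapply(num, sd, numeric(1), na.rm = na.rm)
  )
}

# Made-up data with one missing value and one non-numeric column
df1 <- data.frame(height = c(150, 160, NA), name = c("a", "b", "c"))
res_rm   <- num_summary(df1)                 # NAs dropped before computing
res_keep <- num_summary(df1, na.rm = FALSE)  # NAs propagate into mean and std
```

With na.rm = FALSE the mean and std of any column containing an NA come back as NA, which can be a useful signal rather than a nuisance.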
Hope it helps.
summary to a data frame
Like this?
var <- rnorm(100)
x <- summary(var)
data.frame(x=matrix(x),row.names=names(x))
## x
## Min. -2.68300
## 1st Qu. -0.70930
## Median -0.09732
## Mean -0.00809
## 3rd Qu. 0.71550
## Max. 2.58100
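If you would rather have the statistic names in a regular column than in the row names, a small variation works too. A sketch with fixed values instead of rnorm(), so the result is reproducible:

```r
var <- c(2, 4, 4, 10)   # made-up values in place of rnorm(100)
x <- summary(var)

# names(x) holds the labels; as.numeric() drops the summary class and names
long <- data.frame(stat = names(x), value = as.numeric(x))
```

This gives six rows with columns stat and value, which is usually easier to merge or plot than a data frame keyed by row names.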
Converting statsmodels summary object to Pandas Dataframe
The answer from @Michael B works well, but requires "recreating" the table. The table itself is actually directly available from the summary().tables attribute. Each table in this attribute (which is a list of tables) is a SimpleTable, which has methods for outputting different formats. We can then read any of those formats back as a pd.DataFrame:
from io import StringIO
import pandas as pd
import statsmodels.api as sm

model = sm.OLS(y, x)
results = model.fit()
results_summary = results.summary()

# Note that tables is a list. The table at index 1 is the "core" table.
# Additionally, read_html puts dfs in a list, so we want index 0.
results_as_html = results_summary.tables[1].as_html()
# StringIO avoids the deprecation warning newer pandas gives for literal HTML strings
pd.read_html(StringIO(results_as_html), header=0, index_col=0)[0]