Obtaining Separate Summary Statistics by Categorical Variable with Stargazer Package

Solution

library(stargazer)
library(dplyr)
library(tidyr)

ToothGrowth %>%
    group_by(supp) %>%
    mutate(id = 1:n()) %>%
    ungroup() %>%
    gather(temp, val, len, dose) %>%
    unite(temp1, supp, temp, sep = '_') %>%
    spread(temp1, val) %>%
    select(-id) %>%
    as.data.frame() %>%
    stargazer(type = 'text')

Result

=========================================
Statistic N   Mean  St. Dev.  Min   Max  
-----------------------------------------
OJ_dose   30 1.167   0.634   0.500 2.000 
OJ_len    30 20.663  6.606   8.200 30.900
VC_dose   30 1.167   0.634   0.500 2.000 
VC_len    30 16.963  8.266   4.200 33.900
-----------------------------------------

Explanation

This addresses the problem the OP raised in a comment on the original answer: "What I really want is a single table with summary statistics separated by a categorical variable instead of creating separate tables." The easiest way I found to do that with stargazer was to build a new data frame with one column per group/variable combination, using a gather(), unite(), spread() strategy. The only trick is that spread() fails on duplicate identifiers, so you create a unique id within each group and drop that variable before calling stargazer().
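gather() and spread() are superseded in current tidyr, so here is a rough sketch of the same reshape written with pivot_wider(). This is not the original answer's code; the object name wide and the names_glue pattern are my own choices, picked to reproduce the OJ_len / VC_len style column names.

```r
library(dplyr)
library(tidyr)

wide <- ToothGrowth %>%
  group_by(supp) %>%
  mutate(id = row_number()) %>%   # unique row id within each supp group
  ungroup() %>%
  pivot_wider(
    id_cols     = id,
    names_from  = supp,
    values_from = c(len, dose),
    names_glue  = "{supp}_{.value}"   # e.g. OJ_len, VC_dose
  ) %>%
  select(-id) %>%                 # drop the helper id before summarising
  as.data.frame()                 # stargazer expects a plain data frame
```

Passing wide to stargazer(type = 'text') should then give the same four summary rows as above, though possibly in a different row order.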

Summary statistics for each category of categorical variables in R

I'm assuming you want each categorical variable handled separately rather than in combination.
You could start with

library(SmartEDA)
library(purrr)
map(c("gender","education" ),
    ~ExpCustomStat(demographics,       
                  Cvar=.x, 
                  Nvar=c("pandl_r2","pandl_r3") ,
                  stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max'))
    )

where Nvar lists the numeric variables to summarise and the first argument to map() lists the categorical variables. If you want all the results stacked, you first have to rename the first column of each result to a generic name, like so:

library(dplyr)
map_dfr(c("gender","education" ),
    ~ExpCustomStat(demographics,       
                  Cvar=.x, 
                  Nvar=c("pandl_r2","pandl_r3") ,
                  stat = c('Count','Prop','mean','min','P0.25','median','p0.75','max')) |>
      rename_at(1, \(x)"var") |> mutate(catname = .x) |> relocate(catname)
    )

How to create a summary statistics table with two groups using stargazer?

Not entirely sure what your desired output is but does this help?

library(dplyr)
library(tidyr)

mtcars %>% 
  group_by(am) %>%
  summarise(mpg = mean(mpg), disp = mean(disp), hp = mean(hp)) %>%
  gather(key = "variable","value",mpg,disp,hp) %>%
  spread(am,value) %>%
  group_by(variable) %>%
  mutate(difference = `1`-`0`)

## Source: local data frame [3 x 4]
## Groups: variable [3]
##
##   variable       `0`       `1`  difference
##      <chr>     <dbl>     <dbl>       <dbl>
## 1     disp 290.37895 143.53077 -146.848178
## 2       hp 160.26316 126.84615  -33.417004
## 3      mpg  17.14737  24.39231    7.244939

R: Summary statistics for groups / subsets within panel data - code and layout

You can use this dplyr/tidyr pipeline:

library(tidyverse)

dt %>%
  group_by(Rating) %>% 
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations=n()
  ) %>% 
  pivot_longer(cols = !Rating) %>% 
  pivot_wider(id_cols = "name",names_from = Rating,values_from = value,names_glue = "Rating{.name}") %>% 
  mutate(`Anova F-Test (p-value)` = c(sapply(dt %>% select(Revenue:Age), function(y) anova(lm(y~dt$Rating))$`Pr(>F)`[[1]]),NA)) %>% 
  left_join(
    dt %>%  
      pivot_longer(cols=Revenue:Age) %>% 
      group_by(name = paste0("mean_",name)) %>% 
      summarize(Total_means=mean(value))
  )

Output:

  name         Rating1 Rating2 Rating3 Rating4 Rating5 `Anova F-Test (p-value)` Total_means
  <chr>          <dbl>   <dbl>   <dbl>   <dbl>   <dbl>                    <dbl>       <dbl>
1 mean_Revenue     200   400       250     300     200                    0.742       289. 
2 mean_Costs        45    26.7      40      30      20                    0.196        33.3
3 mean_Age           2     3         4       4       2                    0.552         3  
4 Observations       2     3         2       1       1                   NA            NA

Updated 4/22/22

Original answer did not limit the anova to Ratings 1 and 5

# small function to get anova
get_anova <-function(y,rating, ratings=c(1,5)) {
  y_ = y[rating %in% ratings]
  x_ = rating[rating %in% ratings]
  anova(lm(y_~x_))$`Pr(>F)`[[1]]
}

dt %>%
  group_by(Rating) %>% 
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations=n()
  ) %>% 
  pivot_longer(cols = !Rating) %>% 
  pivot_wider(id_cols = "name",names_from = Rating,values_from = value,names_glue = "Rating{.name}") %>% 
  mutate(anova = c(sapply(dt %>% select(Revenue:Age), function(y) get_anova(y,rating=dt$Rating)),NA)) %>% 
  left_join(
    dt %>%  
      pivot_longer(cols=Revenue:Age) %>% 
      group_by(name = paste0("mean_",name)) %>% 
      summarize(Total_means=mean(value))
  )
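To see what get_anova() does on its own, here is a small sketch on made-up data (the toy data frame is an assumption, not the dt from the question). It checks that the helper gives the same p-value as fitting the model on a manually filtered subset.

```r
# same helper as above, repeated so the example is self-contained
get_anova <- function(y, rating, ratings = c(1, 5)) {
  y_ <- y[rating %in% ratings]
  x_ <- rating[rating %in% ratings]
  anova(lm(y_ ~ x_))$`Pr(>F)`[[1]]
}

# made-up data: ratings 2 and 3 should be ignored by the test
toy <- data.frame(
  Rating  = c(1, 1, 2, 3, 5, 5, 5),
  Revenue = c(200, 210, 400, 250, 180, 220, 205)
)

p_helper <- get_anova(toy$Revenue, toy$Rating)

# the same test done by hand on the filtered rows
toy_15   <- subset(toy, Rating %in% c(1, 5))
p_manual <- anova(lm(Revenue ~ Rating, data = toy_15))$`Pr(>F)`[[1]]

isTRUE(all.equal(p_helper, p_manual))
```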

Analysing a data frame that contains a time series using stargazer

You can either use split + lapply from base R:

library(stargazer)

lapply(split(df, df$year), stargazer, type = "text")

or by:

by(df, df$year, stargazer, type = 'text')

Result:

===============================================================
Statistic     N      Mean        St. Dev.      Min      Max    
---------------------------------------------------------------
Population    10 9,083,988.000 7,541,970.000 491,723 21,759,420
Distance..km. 10   5,637.500     2,385.941    2,211    9,500   
year          10   2,008.000       0.000      2,008    2,008   
---------------------------------------------------------------

===============================================================
Statistic     N      Mean        St. Dev.      Min      Max    
---------------------------------------------------------------
Population    10 9,361,404.000 7,798,880.000 496,963 22,549,547
Distance..km. 10   5,637.500     2,385.941    2,211    9,500   
year          10   2,009.000       0.000      2,009    2,009   
---------------------------------------------------------------

===============================================================
Statistic     N      Mean        St. Dev.      Min      Max    
---------------------------------------------------------------
Population    10 9,645,370.000 8,065,676.000 502,384 23,369,131
Distance..km. 10   5,637.500     2,385.941    2,211    9,500   
year          10   2,010.000       0.000      2,010    2,010   
---------------------------------------------------------------
df$year: 2008
[1] ""                                                               
[2] "==============================================================="
[3] "Statistic     N      Mean        St. Dev.      Min      Max    "
[4] "---------------------------------------------------------------"
[5] "Population    10 9,083,988.000 7,541,970.000 491,723 21,759,420"
[6] "Distance..km. 10   5,637.500     2,385.941    2,211    9,500   "
[7] "year          10   2,008.000       0.000      2,008    2,008   "
[8] "---------------------------------------------------------------"
-------------------------------------------------------------------------- 
df$year: 2009
[1] ""                                                               
[2] "==============================================================="
[3] "Statistic     N      Mean        St. Dev.      Min      Max    "
[4] "---------------------------------------------------------------"
[5] "Population    10 9,361,404.000 7,798,880.000 496,963 22,549,547"
[6] "Distance..km. 10   5,637.500     2,385.941    2,211    9,500   "
[7] "year          10   2,009.000       0.000      2,009    2,009   "
[8] "---------------------------------------------------------------"
-------------------------------------------------------------------------- 
df$year: 2010
[1] ""                                                               
[2] "==============================================================="
[3] "Statistic     N      Mean        St. Dev.      Min      Max    "
[4] "---------------------------------------------------------------"
[5] "Population    10 9,645,370.000 8,065,676.000 502,384 23,369,131"
[6] "Distance..km. 10   5,637.500     2,385.941    2,211    9,500   "
[7] "year          10   2,010.000       0.000      2,010    2,010   "
[8] "---------------------------------------------------------------"

The disadvantage of these two methods is that they print the tables twice (once from stargazer's own output, and again when the value returned by lapply/by is printed). To get around this, you can use walk from purrr to call stargazer only for its side effects:

library(dplyr)
library(purrr)

df %>%
  split(.$year) %>%
  walk(~ stargazer(., type = "text"))

Result:

===============================================================
Statistic     N      Mean        St. Dev.      Min      Max    
---------------------------------------------------------------
Population    10 9,083,988.000 7,541,970.000 491,723 21,759,420
Distance..km. 10   5,637.500     2,385.941    2,211    9,500   
year          10   2,008.000       0.000      2,008    2,008   
---------------------------------------------------------------

===============================================================
Statistic     N      Mean        St. Dev.      Min      Max    
---------------------------------------------------------------
Population    10 9,361,404.000 7,798,880.000 496,963 22,549,547
Distance..km. 10   5,637.500     2,385.941    2,211    9,500   
year          10   2,009.000       0.000      2,009    2,009   
---------------------------------------------------------------

===============================================================
Statistic     N      Mean        St. Dev.      Min      Max    
---------------------------------------------------------------
Population    10 9,645,370.000 8,065,676.000 502,384 23,369,131
Distance..km. 10   5,637.500     2,385.941    2,211    9,500   
year          10   2,010.000       0.000      2,010    2,010   
---------------------------------------------------------------

Note:

All methods above work for LaTeX output (type = "latex"). I only set type = "text" for demonstration purposes.
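If you would rather stay in base R, one option (not part of the answer above, just a sketch) is to wrap the lapply call in invisible(), which stops the returned list from being auto-printed at the console. The data frame below is a made-up stand-in for df.

```r
library(stargazer)

# made-up stand-in for `df`; only the `year` column matters for the split
df <- data.frame(
  Population = c(100, 120, 90, 110, 130, 95),
  year       = rep(c(2008, 2009, 2010), each = 2)
)

# stargazer() prints each table itself; invisible() keeps the list
# returned by lapply() from being printed a second time
invisible(lapply(split(df, df$year), stargazer, type = "text"))
```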

Create and Export a Summary Statistics Table

Your data is a tibble, and stargazer doesn't support tibbles. If you convert it to a plain data frame, it works.

library(stargazer)

data_stuct <- data.frame(data_stuct)

stargazer(data_stuct[c("BNBClose", "BTCClose", "ADAClose", "LINKClose", 
      "DODGEClose")],type="text",title="Summary Statistics", out="table1.txt")

#Summary Statistics
#========================================================================
#Statistic  N    Mean    St. Dev.    Min    Pctl(25)  Pctl(75)     Max   
#------------------------------------------------------------------------
#BNBClose   10   1.568    0.219     1.217     1.411     1.659     1.965  
#BTCClose   10 4,507.324 220.613  4,229.360 4,339.010 4,731.635 4,826.480
#ADAClose   10   0.022    0.002     0.019     0.021     0.022     0.026  
#LINKClose  10   0.408    0.043     0.346     0.385     0.440     0.476  
#DODGEClose 10   0.001   0.00004    0.001     0.001     0.001     0.001  
#------------------------------------------------------------------------
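A quick way to check whether this applies to your data is to look at its class before and after the conversion. A minimal sketch with a made-up two-row tibble (the values and the name df_plain are placeholders):

```r
library(tibble)

# made-up tibble standing in for the real price data
tb <- tibble(
  BNBClose = c(1.217, 1.965),
  BTCClose = c(4229.36, 4826.48)
)

class(tb)        # includes "tbl_df" and "tbl", i.e. a tibble

# convert to a plain data frame before handing it to stargazer
df_plain <- as.data.frame(tb)
class(df_plain)  # "data.frame" only
```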
