Show Percent % Instead of Counts in Charts of Categorical Variables

Show percent % instead of counts in charts of categorical variables

Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

 require(ggplot2)
 require(scales)

 p <- ggplot(mydataf, aes(x = foo)) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        ## version 3.0.0
        scale_y_continuous(labels=percent)

Here's a reproducible example using mtcars:

 ggplot(mtcars, aes(x = factor(hp))) +  
        geom_bar(aes(y = (..count..)/sum(..count..))) + 
        scale_y_continuous(labels = percent) ## version 3.0.0

Sample Image

This question is currently the #1 hit on google for 'ggplot count vs percentage histogram' so hopefully this helps distill all the information currently housed in comments on the accepted answer.

Remark: If hp is not set as a factor, ggplot returns:

Sample Image

Show percent % instead of counts in charts of categorical variables but with multiple variables in one plot

You can just calculate the group-wise percentages before plotting them:

library(tidyverse)

subset(pivot_longer(MD3[1:6], 1:6), !is.na(value)) %>%
  group_by(name, value) %>%
  count() %>%
  group_by(value) %>%
  summarise(percent = n/sum(n), name = name) %>%
  ggplot(aes(y = value, x = percent, fill = name)) +
  geom_col(position = 'dodge') +
  scale_colour_viridis_d(aesthetics = "fill") +
  scale_x_continuous(labels = scales::percent) +
  theme_light()+
  labs(x = "Anteil in %", y ="", fill = 'Vielfalt im Bezug auf:')

Sample Image

ggplot: showing % instead of counts in charts of categorical variables with multiple levels

You could use sjp.xtab from the sjPlot-package for that:

sjp.xtab(diamonds$clarity, 
         diamonds$cut, 
         showValueLabels = F, 
         tableIndex = "row", 
         barPosition = "stack")

Sample Image

The data preparation for stacked group-percentages that sum up to 100% should be:

data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))

thus, you could write

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
  geom_bar(position = "stack", stat = "identity") +
  scale_y_continuous(labels=scales::percent)

Edit: This one adds up each category (Fair, Good...) to 100%, using 2 in prop.table and position = "dodge":

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut),2))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) + 
    geom_bar(position = "dodge", stat = "identity") +
    scale_y_continuous(labels=scales::percent)

sjp.xtab(diamonds$clarity, 
         diamonds$cut, 
         showValueLabels = F, 
         tableIndex = "col")

Sample Image

Verifying the last example with dplyr, summing up percentages within each group:

library(dplyr)
mydf %>% group_by(Var2) %>% summarise(percsum = sum(Freq))

>        Var2 percsum
> 1      Fair       1
> 2      Good       1
> 3 Very Good       1
> 4   Premium       1
> 5     Ideal       1

(see this page for further plot-options and examples from sjp.xtab...)

Percentage instead of count in ggplot

Using scales::percent and scales::percent_format this could be achieved like so:

library(ggplot2)
library(scales)

data1 = data.frame(NSE=c("A-B", "C", "D-E"), Percentage=c(68, 66, 63))

ggplot(data1, aes(x=NSE, y=Percentage))+
  geom_bar(stat = "identity", width=0.6, fill = "red", color = "grey40", alpha = 5)+ 
  geom_text(aes(label=scales::percent(Percentage, scale = 1, accuracy = 1)), vjust=1.5, color="white", 
            size=3)+
  scale_y_continuous(labels = scales::percent_format(scale = 1, accuracy = 1), limits = c(0,100)) +
  ggtitle("No se sintieron discriminados en los ultimos 
doce meses, segun NSE (en porcentaje)")+labs(y="", x="")+
  theme(plot.title = element_text(color="black", size=10, face="bold"))

Sample Image

^{Created on 2020-08-02 by the reprex package (v0.3.0)}

Add labels as percentages instead of counts on a grouped bar graph in seaborn

Use groupby.transform to compute the split percentages per day:

df['Customers (%)'] = df.groupby('Day')['Customers'].transform(lambda x: x / x.sum() * 100)

#     Day  Customers Time  Customers (%)
# 0   Mon         44    M      57.142857
# 1   Tue         46    M      50.000000
# 2   Wed         49    M      49.494949
# 3  Thur         59    M      54.629630
# 4   Fri         54    M      47.368421
# 5   Mon         33    E      42.857143
# 6   Tue         46    E      50.000000
# 7   Wed         50    E      50.505051
# 8  Thur         49    E      45.370370
# 9   Fri         60    E      52.631579

Then plot this new Customers (%) column and label the bars using ax.bar_label (with percentage formatting via the fmt param):

ax = sns.barplot(x='Day', y='Customers (%)', hue='Time', data=df) 

for container in ax.containers:
    ax.bar_label(container, fmt='%.0f%%')

Note that ax.bar_label requires matplotlib 3.4.0.

Sample Image

How to create a histogram of frequencies in percentage in ggplot?

You can use this code:

library(tidyverse)
library(scales)
df<-data.frame(corr=runif(100, min = -1, max = 1))

ggplot(df, aes(x = corr)) + 
  geom_histogram(fill = "blue", col = "black")+ 
  scale_y_continuous(breaks = seq(0,10,1),labels = paste(seq(0, 10, by = 1), "%", sep = ""))+
  geom_text(aes(y = (..count..),label =  scales::percent((..count..)/sum(..count..))), stat="bin",colour="red",vjust=-1, size = 3) +
  ylab("Percentage")

Output:

Sample Image

How to get percentage of categorical variables and overall percent of a single choice

Updated answer

The tricky part of this problem is the difference between row percentages and column percentages that are represented in the data. Since all rows but the total row are column percentages, we will need to process the data twice, first for the the province * variable level of aggregation, and then variable aggregated over province.

new_data <-data.frame(province=c("a","b"),
                      food=c("yes","no","no","yes","yes","no"),
                      shelter_type=c("unfinished","permanent","transitional"))   
library(dplyr)
library(tidyr)

First we'll generate what ultimately becomes column percentages within a wide format data frame. We use pivot_longer() to create a narrow format tidy data set, create counts, summarise() the counts, and then group_by() variable & value to generate column percentages.

new_data  %>% group_by(province) %>%
     pivot_longer(.,c(food,shelter_type),names_to = "variable",
                  values_to = "value") %>% ungroup() %>%
     group_by(province,variable,value) %>% 
     mutate(count = 1) %>% summarise(.,count = sum(count)) %>% ungroup() %>%
     group_by(variable,value) %>% 
     mutate(pct = count / sum(count)) -> prov_var

Next, we reaggregate the data to create what will become the Total province. We take the original data, convert to narrow format tidy data, and this time group_by() variable & value to calculate the percentages across province.

new_data  %>% group_by(province) %>%
     pivot_longer(.,c(food,shelter_type),names_to = "variable",
                  values_to = "value") %>% ungroup() %>%
     group_by(variable,value) %>%  
     mutate(count = 1) %>% summarise(., count = sum(count)) %>% 
     mutate(province = "Total",
            pct = count / sum(count)) -> tot_var

Finally, we rbind() the data and use tidyr::pivot_wider() to create the wide format data frame as illustrated in the original question.

# now add rows & pivot_wider()
rbind(prov_var,tot_var) %>% 
     mutate(concat_var = paste(variable,value,sep="_")) %>% 
     select(-variable,-value,-count) %>% 
     pivot_wider(id_cols = province,names_from=concat_var,
                 values_from = pct)

...and the output:

# A tibble: 3 x 6
  province food_no food_yes shelter_type_perm… shelter_type_tra… shelter_type_unf…
  <chr>      <dbl>    <dbl>              <dbl>             <dbl>             <dbl>
1 a          0.333    0.667              0.5               0.5               0.5  
2 b          0.667    0.333              0.5               0.5               0.5  
3 Total      0.5      0.5                0.333             0.333             0.333

Partial solutions with `tables::tabular()`

Another way to attempt to answer the question is with the tables package. We can generate the column percentages by province as follows.

library(tables)

# replicate column percentages, where "All" is 100

tabular((Factor(province,"Province") + 1) ~ 
                (Factor(food) + Factor(shelter_type)) * 
                (Percent("col")),data = new_data )

Unfortunately, the row for totals isn't what was requested.

          food            shelter_type                        
          no      yes     permanent    transitional unfinished
 Province Percent Percent Percent      Percent      Percent   
 a         33.33   66.67   50           50           50       
 b         66.67   33.33   50           50           50       
 All      100.00  100.00  100          100          100

We can fix the All row by configuring the table with row percentages, but then the data by province doesn't match what was requested.

# replicate row percentages in All row
tabular((Factor(province,"Province") + 1) ~ 
                (Factor(food) + Factor(shelter_type)) * 
                (Percent("row")),data = new_data )

          food            shelter_type                        
          no      yes     permanent    transitional unfinished
 Province Percent Percent Percent      Percent      Percent   
 a        33.33   66.67   33.33        33.33        33.33     
 b        66.67   33.33   33.33        33.33        33.33     
 All      50.00   50.00   33.33        33.33        33.33

Correct solution with `tabular()`

However, if we control the percentages by specifying them on the row dimension of the table instead of the column dimension, we can achieve the desired output.

tabular((Factor(province,"Province")*( colPct = Percent("col")) + 1*(rowPct = Percent("row")))  ~ 
                (Factor(food) + Factor(shelter_type)),data = new_data )

...and the output:

                 food        shelter_type                        
 Province        no    yes   permanent    transitional unfinished
 a        colPct 33.33 66.67 50.00        50.00        50.00     
 b        colPct 66.67 33.33 50.00        50.00        50.00     
 All      rowPct 50.00 50.00 33.33        33.33        33.33

Original answer

We'll use the dplyr package to summarise the data by province & food, calculate percentages, and then ungroup() to calculate percentage of total responses.

new_data <-data.frame(province=c("a","b"),
                      food=c("yes","no","no","yes","yes","no"),
                      shelter_type=c("unfinished","permanent","transitional"))

library(dplyr)

new_data %>% group_by(province,food) %>%
     summarise(count_food = n()) %>% group_by(province) %>%
     mutate(pct_food = count_food / sum(count_food)) %>%
     ungroup(.) %>%
     mutate(pct_total = count_food / sum(count_food))

...and the output:

# A tibble: 4 x 5
  province food  count_food pct_food pct_total
  <chr>    <chr>      <int>    <dbl>     <dbl>
1 a        no             1    0.333     0.167
2 a        yes            2    0.667     0.333
3 b        no             2    0.667     0.333
4 b        yes            1    0.333     0.167
>

Show the percentage instead of count in histogram using ggplot2 | R

We can replace the y aesthetic by the relative value of the count computed statistic, and set the scale to show percentages :

ggplot2.histogram(data=dat, xName='dens',
                  groupName='lines', legendPosition="top",
                  alpha=0.1) + 
  labs(x="X", y="Count") +
  theme(panel.border = element_rect(colour = "black"),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black")) + 
  theme_bw()+
  theme(legend.title=element_blank()) + 
  aes(y=stat(count)/sum(stat(count))) + 
  scale_y_continuous(labels = scales::percent)

Show Percent % Instead of Counts in Charts of Categorical Variables

Show percent % instead of counts in charts of categorical variables

Show percent % instead of counts in charts of categorical variables but with multiple variables in one plot

ggplot: showing % instead of counts in charts of categorical variables with multiple levels

Percentage instead of count in ggplot

Add labels as percentages instead of counts on a grouped bar graph in seaborn

How to create a histogram of frequencies in percentage in ggplot?

How to get percentage of categorical variables and overall percent of a single choice

Updated answer

Partial solutions with `tables::tabular()`

Correct solution with `tabular()`

Original answer

Show the percentage instead of count in histogram using ggplot2 | R

Related Topics

Leave a reply

Show percent % instead of counts in charts of categorical variables

Show percent % instead of counts in charts of categorical variables but with multiple variables in one plot

ggplot: showing % instead of counts in charts of categorical variables with multiple levels

Percentage instead of count in ggplot

Add labels as percentages instead of counts on a grouped bar graph in seaborn

How to create a histogram of frequencies in percentage in ggplot?

How to get percentage of categorical variables and overall percent of a single choice

Updated answer

Partial solutions with tables::tabular()

Correct solution with tabular()

Original answer

Show the percentage instead of count in histogram using ggplot2 | R

Related Topics

Leave a reply

Partial solutions with `tables::tabular()`

Correct solution with `tabular()`