Show Percent % Instead of Counts in Charts of Categorical Variables

Show percent % instead of counts in charts of categorical variables

Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

require(ggplot2)
require(scales)

p <- ggplot(mydataf, aes(x = foo)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent)  ## version 3.0.0

Here's a reproducible example using mtcars:

ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent)  ## version 3.0.0
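
In ggplot2 3.3.0 and later, the ..count.. notation can be written with after_stat() instead (the dot-dot form was deprecated in 3.4.0). The following is a minimal sketch of the same mtcars chart in that style. The layer_data() call at the end is only an illustrative check added here that the computed proportions sum to 1.

```r
library(ggplot2)

# Same chart as above, written with after_stat() instead of ..count..
p <- ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)

# Inspect the computed bar heights without drawing the plot
props <- layer_data(p)$y
sum(props)
```

Because sum(count) is evaluated over the whole layer, each bar shows its share of all rows, not of a facet or group.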

This question is currently the #1 hit on google for 'ggplot count vs percentage histogram' so hopefully this helps distill all the information currently housed in comments on the accepted answer.

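Remark: If hp is not converted to a factor, ggplot treats it as a continuous variable, so the bars are placed at their numeric positions on a continuous x-axis rather than as evenly spaced categories.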

Show percent % instead of counts in charts of categorical variables but with multiple variables in one plot

You can just calculate the group-wise percentages before plotting them:

library(tidyverse)

subset(pivot_longer(MD3[1:6], 1:6), !is.na(value)) %>%
  group_by(name, value) %>%
  count() %>%
  group_by(value) %>%
  # mutate() keeps one row per name/value pair; a summarise() that returns
  # several rows per group is deprecated in dplyr >= 1.1.0
  mutate(percent = n / sum(n)) %>%
  ggplot(aes(y = value, x = percent, fill = name)) +
  geom_col(position = 'dodge') +
  scale_colour_viridis_d(aesthetics = "fill") +
  scale_x_continuous(labels = scales::percent) +
  theme_light() +
  # axis and legend labels translated from German
  labs(x = "Share in %", y = "", fill = 'Diversity with regard to:')
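
Since MD3 is not shown in the answer, here is a small self-contained sketch of the same pre-computation step. The data frame md and its columns q1 and q2 are hypothetical stand-ins, and the result is left as a table rather than plotted.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey data: two questions with yes/no answers and one NA
md <- data.frame(q1 = c("yes", "no", "yes", NA),
                 q2 = c("no", "no", "yes", "yes"))

long <- md %>%
  pivot_longer(everything()) %>%       # one row per (question, answer)
  filter(!is.na(value)) %>%            # drop missing answers
  count(name, value) %>%               # n per question and answer
  group_by(value) %>%
  mutate(percent = n / sum(n)) %>%     # share of each question within an answer
  ungroup()

long
```

Grouping by value makes the shares sum to 100% within each answer. Grouping by name instead would give the share of each answer within a question.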

ggplot: showing % instead of counts in charts of categorical variables with multiple levels

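You could use sjp.xtab from the sjPlot package for that (recent sjPlot releases rename this function to plot_xtab, and its argument names differ from the ones shown here):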

sjp.xtab(diamonds$clarity, 
diamonds$cut,
showValueLabels = F,
tableIndex = "row",
barPosition = "stack")

The data preparation for stacked group-percentages that sum up to 100% should be:

data.frame(prop.table(table(diamonds$clarity, diamonds$cut),1))

thus, you could write

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut), 1))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) +
  geom_bar(position = "stack", stat = "identity") +
  scale_y_continuous(labels = scales::percent)

Edit: This one adds up each category (Fair, Good...) to 100%, using 2 in prop.table and position = "dodge":

mydf <- data.frame(prop.table(table(diamonds$clarity, diamonds$cut), 2))
ggplot(mydf, aes(Var1, Freq, fill = Var2)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_y_continuous(labels = scales::percent)

or

sjp.xtab(diamonds$clarity, 
diamonds$cut,
showValueLabels = F,
tableIndex = "col")

Verifying the last example with dplyr, summing up percentages within each group:

library(dplyr)
mydf %>% group_by(Var2) %>% summarise(percsum = sum(Freq))

> Var2 percsum
> 1 Fair 1
> 2 Good 1
> 3 Very Good 1
> 4 Premium 1
> 5 Ideal 1
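
The difference between margin 1 and margin 2 in prop.table can be seen on a small example. This is a sketch with made-up counts (the matrix below is hypothetical, not taken from diamonds): margin 1 makes each row sum to 1, margin 2 makes each column sum to 1.

```r
# Hypothetical toy counts: 2 clarity levels x 2 cut levels
counts <- matrix(c(10, 30, 20, 40), nrow = 2,
                 dimnames = list(clarity = c("I1", "SI2"),
                                 cut = c("Fair", "Good")))

row_pct <- prop.table(counts, 1)  # proportions within each clarity level
col_pct <- prop.table(counts, 2)  # proportions within each cut level

rowSums(row_pct)  # every row sums to 1
colSums(col_pct)  # every column sums to 1
```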

(see this page for further plot-options and examples from sjp.xtab...)

Percentage instead of count in ggplot

Using scales::percent and scales::percent_format this could be achieved like so:

library(ggplot2)
library(scales)

data1 <- data.frame(NSE = c("A-B", "C", "D-E"), Percentage = c(68, 66, 63))

ggplot(data1, aes(x = NSE, y = Percentage)) +
  geom_bar(stat = "identity", width = 0.6, fill = "red", color = "grey40", alpha = 5) +
  geom_text(aes(label = scales::percent(Percentage, scale = 1, accuracy = 1)),
            vjust = 1.5, color = "white", size = 3) +
  scale_y_continuous(labels = scales::percent_format(scale = 1, accuracy = 1), limits = c(0, 100)) +
  # title translated from Spanish; NSE = nivel socioeconómico (socioeconomic level)
  ggtitle("Did not feel discriminated against in the last\ntwelve months, by NSE (in percent)") +
  labs(y = "", x = "") +
  theme(plot.title = element_text(color = "black", size = 10, face = "bold"))

Created on 2020-08-02 by the reprex package (v0.3.0)

Add labels as percentages instead of counts on a grouped bar graph in seaborn

Use groupby.transform to compute the split percentages per day:

df['Customers (%)'] = df.groupby('Day')['Customers'].transform(lambda x: x / x.sum() * 100)

# Day Customers Time Customers (%)
# 0 Mon 44 M 57.142857
# 1 Tue 46 M 50.000000
# 2 Wed 49 M 49.494949
# 3 Thur 59 M 54.629630
# 4 Fri 54 M 47.368421
# 5 Mon 33 E 42.857143
# 6 Tue 46 E 50.000000
# 7 Wed 50 E 50.505051
# 8 Thur 49 E 45.370370
# 9 Fri 60 E 52.631579

Then plot this new Customers (%) column and label the bars using ax.bar_label (with percentage formatting via the fmt param):

import seaborn as sns

ax = sns.barplot(x='Day', y='Customers (%)', hue='Time', data=df)

# label every bar with its percentage, rounded to a whole number
for container in ax.containers:
    ax.bar_label(container, fmt='%.0f%%')

Note that ax.bar_label requires matplotlib 3.4.0 or later.

How to create a histogram of frequencies in percentage in ggplot?

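You can use the code below. Note that it relabels the raw counts on the y-axis with a percent sign, which is only correct because the sample has exactly 100 observations, so each count equals its percentage: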

library(tidyverse)
library(scales)
df <- data.frame(corr = runif(100, min = -1, max = 1))

ggplot(df, aes(x = corr)) +
  geom_histogram(fill = "blue", col = "black") +
  scale_y_continuous(breaks = seq(0, 10, 1), labels = paste(seq(0, 10, by = 1), "%", sep = "")) +
  geom_text(aes(y = (..count..), label = scales::percent((..count..)/sum(..count..))),
            stat = "bin", colour = "red", vjust = -1, size = 3) +
  ylab("Percentage")
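
The hard-coded breaks above depend on the sample size being 100. As a sketch of a variant that should hold for any sample size, the bar heights themselves can be mapped to the proportion with after_stat() (ggplot2 3.3.0 or later). The seed, the 250 observations and bins = 30 are arbitrary choices made for this illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(corr = runif(250, min = -1, max = 1))

p <- ggplot(df, aes(x = corr)) +
  # bar height = share of all observations falling in the bin
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 bins = 30, fill = "blue", colour = "black") +
  # text labels computed from the same binning
  geom_text(aes(y = after_stat(count / sum(count)),
                label = after_stat(scales::percent(count / sum(count), accuracy = 0.1))),
            stat = "bin", bins = 30, colour = "red", vjust = -1, size = 3) +
  scale_y_continuous(labels = scales::percent) +
  ylab("Percentage")

p
```

Passing the same bins value to both layers keeps the labels aligned with the bars.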

How to get percentage of categorical variables and overall percent of a single choice

Updated answer

The tricky part of this problem is the difference between the row percentages and column percentages represented in the data. Since every row except the total row contains column percentages, we need to process the data twice: first at the province * variable level of aggregation, and then for each variable aggregated over province.

new_data <-data.frame(province=c("a","b"),
food=c("yes","no","no","yes","yes","no"),
shelter_type=c("unfinished","permanent","transitional"))
library(dplyr)
library(tidyr)

First we'll generate what ultimately becomes column percentages within a wide format data frame. We use pivot_longer() to create a narrow format tidy data set, create counts, summarise() the counts, and then group_by() variable & value to generate column percentages.

new_data %>% group_by(province) %>%
  pivot_longer(., c(food, shelter_type), names_to = "variable",
               values_to = "value") %>% ungroup() %>%
  group_by(province, variable, value) %>%
  mutate(count = 1) %>% summarise(., count = sum(count)) %>% ungroup() %>%
  group_by(variable, value) %>%
  mutate(pct = count / sum(count)) -> prov_var

Next, we reaggregate the data to create what will become the Total province. We take the original data, convert to narrow format tidy data, and this time group_by() variable & value to calculate the percentages across province.

new_data %>% group_by(province) %>%
  pivot_longer(., c(food, shelter_type), names_to = "variable",
               values_to = "value") %>% ungroup() %>%
  group_by(variable, value) %>%
  mutate(count = 1) %>% summarise(., count = sum(count)) %>%
  mutate(province = "Total",
         pct = count / sum(count)) -> tot_var

Finally, we rbind() the data and use tidyr::pivot_wider() to create the wide format data frame as illustrated in the original question.

# now add rows & pivot_wider()
rbind(prov_var, tot_var) %>%
  mutate(concat_var = paste(variable, value, sep = "_")) %>%
  select(-variable, -value, -count) %>%
  pivot_wider(id_cols = province, names_from = concat_var,
              values_from = pct)

...and the output:

# A tibble: 3 x 6
province food_no food_yes shelter_type_perm… shelter_type_tra… shelter_type_unf…
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a 0.333 0.667 0.5 0.5 0.5
2 b 0.667 0.333 0.5 0.5 0.5
3 Total 0.5 0.5 0.333 0.333 0.333
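
As a cross-check of the food columns in this output, the same numbers can be reproduced with base R. This is only a sketch: column percentages come from prop.table with margin 2, and the Total row from the overall distribution of food. The names food_counts and food_check are introduced here for illustration.

```r
new_data <- data.frame(province = c("a", "b"),
                       food = c("yes", "no", "no", "yes", "yes", "no"),
                       shelter_type = c("unfinished", "permanent", "transitional"))

# counts of food answers per province
food_counts <- table(new_data$province, new_data$food)

# column percentages: share of each province within a food answer
col_pct <- prop.table(food_counts, 2)

# overall share of each food answer, used for the Total row
total_pct <- prop.table(table(new_data$food))

food_check <- rbind(unclass(col_pct), Total = as.vector(total_pct))
round(food_check, 3)
```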

Partial solutions with tables::tabular()

Another way to attempt to answer the question is with the tables package. We can generate the column percentages by province as follows.

library(tables)

# replicate column percentages, where "All" is 100

tabular((Factor(province,"Province") + 1) ~
(Factor(food) + Factor(shelter_type)) *
(Percent("col")),data = new_data )

Unfortunately, the row for totals isn't what was requested.

          food            shelter_type
          no      yes     permanent    transitional unfinished
 Province Percent Percent Percent      Percent      Percent
 a         33.33   66.67   50           50           50
 b         66.67   33.33   50           50           50
 All      100.00  100.00  100          100          100

We can fix the All row by configuring the table with row percentages, but then the data by province doesn't match what was requested.

# replicate row percentages in All row
tabular((Factor(province,"Province") + 1) ~
(Factor(food) + Factor(shelter_type)) *
(Percent("row")),data = new_data )

          food            shelter_type
          no      yes     permanent    transitional unfinished
 Province Percent Percent Percent      Percent      Percent
 a        33.33   66.67   33.33        33.33        33.33
 b        66.67   33.33   33.33        33.33        33.33
 All      50.00   50.00   33.33        33.33        33.33

Correct solution with tabular()

However, if we control the percentages by specifying them on the row dimension of the table instead of the column dimension, we can achieve the desired output.

tabular((Factor(province,"Province")*( colPct = Percent("col")) + 1*(rowPct = Percent("row")))  ~ 
(Factor(food) + Factor(shelter_type)),data = new_data )

...and the output:

                 food        shelter_type
 Province        no    yes   permanent    transitional unfinished
 a        colPct 33.33 66.67 50.00        50.00        50.00
 b        colPct 66.67 33.33 50.00        50.00        50.00
 All      rowPct 50.00 50.00 33.33        33.33        33.33

Original answer

We'll use the dplyr package to summarise the data by province & food, calculate percentages, and then ungroup() to calculate percentage of total responses.

new_data <-data.frame(province=c("a","b"),
food=c("yes","no","no","yes","yes","no"),
shelter_type=c("unfinished","permanent","transitional"))

library(dplyr)

new_data %>% group_by(province,food) %>%
summarise(count_food = n()) %>% group_by(province) %>%
mutate(pct_food = count_food / sum(count_food)) %>%
ungroup(.) %>%
mutate(pct_total = count_food / sum(count_food))

...and the output:

# A tibble: 4 x 5
province food count_food pct_food pct_total
<chr> <chr> <int> <dbl> <dbl>
1 a no 1 0.333 0.167
2 a yes 2 0.667 0.333
3 b no 2 0.667 0.333
4 b yes 1 0.333 0.167

Show the percentage instead of count in histogram using ggplot2 | R

We can replace the y aesthetic with the computed count statistic divided by its total, and set the scale to show percentages:

ggplot2.histogram(data = dat, xName = 'dens',
                  groupName = 'lines', legendPosition = "top",
                  alpha = 0.1) +
  labs(x = "X", y = "Percent") +  # the y-axis now shows percentages, not counts
  theme(panel.border = element_rect(colour = "black"),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black")) +
  theme_bw() +
  theme(legend.title = element_blank()) +
  aes(y = stat(count)/sum(stat(count))) +
  scale_y_continuous(labels = scales::percent)

