Ggplot Graphing of Proportions of Observations Within Categories

ggplot graphing of proportions of observations within categories

I will understand if this isn't really what you're looking for, but I found your description of what you wanted very confusing until I realized that you were simply trying to visualize your data in a way that seemed very unnatural to me.

If someone asked me to produce a graph with the proportions within each category, I'd probably turn to a segmented bar chart:

ggplot(df,aes(x = cat2,fill = cat1)) + 
geom_bar(position = "fill")

Note the y axis records proportions, not counts, as you wanted.
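To see what position = "fill" is actually drawing, the same proportions can be computed by hand. This is only a minimal sketch: the toy values below are made up for illustration, and only the column names cat1 and cat2 come from the plot call above.

```r
library(dplyr)

# Hypothetical toy data; only the column names cat1 and cat2 are taken
# from the answer above.
df <- data.frame(
  cat1 = c("a", "a", "b", "b", "b", "a"),
  cat2 = c("x", "x", "x", "y", "y", "y")
)

# Count each (cat2, cat1) combination, then divide by the total within cat2.
# These are the segment heights that position = "fill" stacks in each bar.
props <- df %>%
  count(cat2, cat1) %>%
  group_by(cat2) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

props
```

Within each level of cat2 the prop values sum to 1, which is why every bar in the filled chart has the same height.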

ggplot graphing proportions within multiple categories

Here is one approach using dplyr.

library(dplyr)
library(ggplot2)

set.seed(111)
a = sample(c(TRUE, FALSE), 50, replace=TRUE)
b = sample(c(TRUE, FALSE), 50, replace=TRUE)
c = sample(c(TRUE, FALSE), 50, replace=TRUE)
df = as.data.frame(cbind(a,b,c))

UPDATED

Given the comments of the OP, here is a revised version.

foo <- group_by(df, a, b, c) %>% 
summarise(total = n()) %>%
mutate(prop = total / sum(total))

# Draw a ggplot figure
ggplot(foo, aes(x = a, y = prop, fill = b)) +
geom_bar(stat = "identity", position = "dodge")

Graph proportion within a factor level rather than a count in ggplot2

We can calculate the proportion in each group first and then plot it. You can also specify the colors manually using scale_fill_manual.

library(dplyr)
library(ggplot2)

df %>%
  na.omit() %>%
  group_by(var1, var2) %>%
  summarise(n = n()) %>%
  mutate(n = n / sum(n)) %>%  # proportion of each var2 within var1
  ungroup() %>%
  ggplot() + aes(var2, n, fill = var1) +
  geom_bar(position = "dodge", stat = "identity") +
  labs(x = "Left or Right", y = "Proportion") +
  theme_classic() +
  theme(legend.position = "top") +
  # use a single fill scale: a second scale_fill_*() call would replace the first
  scale_fill_manual(name = "Answer:", values = c("black", "red"))

Here I have removed all the rows that contain an NA in any column. If you want to do this for specific columns only, you can use filter with is.na to remove those values. For example, to remove NA values only from var1, we can do

df %>%
  filter(!is.na(var1)) %>%
  group_by(var1, var2) %>% .....

Show percent % instead of counts in charts of categorical variables

Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

require(ggplot2)
require(scales)

p <- ggplot(mydataf, aes(x = foo)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent) ## version 3.0.0

Here's a reproducible example using mtcars:

ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent) ## version 3.0.0

This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in comments on the accepted answer.
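The syntax has moved on again since version 3.0.0: as far as I know, ggplot2 3.3.0 introduced after_stat() as a replacement for the ..count.. notation, and 3.4.0 deprecates the old form. A sketch of the same mtcars plot in that style follows, assuming a ggplot2 version recent enough to provide after_stat(). The layer_data() call is only there to check the computed bar heights.

```r
library(ggplot2)
library(scales)

# Same mtcars plot, written with after_stat() instead of ..count..
p <- ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = percent)

# Extract the computed bar heights; as proportions of all rows they sum to 1.
heights <- layer_data(p)$y
sum(heights)
```

Printing p draws the chart; the heights are the proportions of all 32 rows of mtcars that fall on each hp value.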

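Remark: if hp is not converted to a factor, ggplot treats it as a continuous variable, so the bars are positioned along a continuous x axis instead of one evenly spaced bar per category.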

Plot proportions of dummy over time for three groups

One way would be to precalculate the proportion and plot it using geom_line:

library(tidyverse)
df %>%
  mutate(date = as.POSIXct(date)) %>%         # convert date to POSIXct
  group_by(group, date) %>%                   # one row per group and date
  summarise(prop = sum(y == "1") / n()) %>%   # proportion of rows with y == "1"
  ggplot() +
  geom_line(aes(x = date, y = prop, color = group)) +
  geom_point(aes(x = date, y = prop, color = group))

Answer to the updated question in the comments:

df %>%
  mutate(date = as.POSIXct(date)) %>%         # convert date to POSIXct
  group_by(group, date) %>%                   # one row per group and date
  summarise(prop = sum(y == "1") / n()) %>%   # proportion of rows with y == "1"
  ggplot() +
  geom_line(aes(x = date, y = prop, color = group)) +
  geom_point(aes(x = date, y = prop, color = group)) +
  geom_vline(xintercept = as.POSIXct("2000-05-03 CEST"))  # vertical reference line

ggplot draw multiple plots by levels of a variable

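Changing ..count../sum(..count..) to ..density.. gives the desired proportion within each facet. Density is the proportion divided by the bin width, so with binwidth = 1 the two are the same number.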

ggplot(data=d)+geom_histogram(aes(x=n,y=..density..),binwidth = 1)+facet_wrap(~group)
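A quick way to convince yourself that the densities really are per-panel proportions is to pull the computed values out of the plot. The sketch below assumes a data frame d with columns n and group, as in the call above; the values themselves are simulated and purely hypothetical. It uses after_stat(density), which is the newer spelling of ..density...

```r
library(ggplot2)

# Hypothetical data: integer values n for two groups of different size.
set.seed(1)
d <- data.frame(
  group = rep(c("g1", "g2"), times = c(30, 70)),
  n = c(rpois(30, 3), rpois(70, 6))
)

p <- ggplot(d) +
  geom_histogram(aes(x = n, y = after_stat(density)), binwidth = 1) +
  facet_wrap(~group)

# With binwidth = 1, the densities in each panel sum to 1,
# i.e. they are proportions within that group.
ld <- layer_data(p)
panel_totals <- tapply(ld$density, ld$PANEL, sum)
panel_totals
```

With ..count../sum(..count..) the denominator would instead be the total over all panels, so the bars of a small group would look shrunken next to a large one.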

Create a bar chart with proportions

I'm not sure about the dr part. If something is wrong, please let me know.

library(dplyr)
library(ggplot2)
library(forcats)

test %>%
group_by(bird, season) %>%
summarise(key = unique(dr),
dr = sum(dr)) %>%
group_by(bird) %>%
mutate(dr = dr/sum(dr) * key/100) %>%
ungroup %>%
mutate(bird = fct_reorder(bird, desc(bird))) %>%
ggplot(aes(x=bird, y=dr, fill=season)) +
geom_bar(stat="identity")+
scale_fill_brewer(palette="Paired")+
theme_minimal() +
coord_flip()

Excluding levels/groups within categorical variable (ggplot graph)

You can count() the categories first, and then filter(), before feeding to ggplot. In this way, you would use geom_col() instead:

df %>% count(active) %>% filter(n>2) %>% 
ggplot(aes(x=active,y=n)) +
geom_col()

Alternatively, you could group_by() / filter() directly within your ggplot() call, like this:

ggplot(df %>% group_by(active) %>% filter(n()>2), aes(x=active)) + 
geom_bar()

