ggplot2 - Multi-group histogram with in-group proportions rather than frequency

Wrong solution

You can use stat_bin() and y=..density.. to get percentages in each group.

ggplot(df, alpha = 0.2,
       aes(x = LetterGrade, group = ExperimentCohort, fill = ExperimentCohort)) +
  stat_bin(aes(y = ..density..), position = 'dodge')

UPDATE - correct solution

As pointed out by @rpierce, y=..density.. calculates density values for each group, not percentages (the two are not the same).

One way to get the correct percentages is to calculate them before plotting. Here I used the function ddply() from the plyr library. Within each ExperimentCohort, the proportions are calculated with prop.table() and table() and saved as prop. The LetterGrade labels are recovered with names() and table().

library(plyr)

# one row per cohort/grade pair, with the in-cohort proportion
df.new <- ddply(df, .(ExperimentCohort), summarise,
                prop = prop.table(table(LetterGrade)),
                LetterGrade = names(table(LetterGrade)))

head(df.new)
  ExperimentCohort       prop LetterGrade
1              One 0.21739130           A
2              One 0.08695652           B
3              One 0.13043478           C
4              One 0.13043478           D
5              One 0.30434783           E
6              One 0.13043478           F

Now use this new data frame for plotting. Since the proportions are already calculated, supply them as the y values and add stat="identity" inside geom_bar().

ggplot(df.new, aes(LetterGrade, prop, fill = ExperimentCohort)) +
  geom_bar(stat = "identity", position = 'dodge')
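The same in-group proportions can also be sketched in base R, without plyr. This is only an illustration: the toy data frame below is hypothetical (it just reuses the column names from the answer), and unlike the ddply() version it keeps cohort/grade combinations with a proportion of zero.

```r
# Hypothetical toy data; the column names follow the answer above
df <- data.frame(
  ExperimentCohort = c("One", "One", "One", "One", "Two", "Two"),
  LetterGrade      = c("A", "A", "B", "C", "A", "B")
)

# margin = 1 normalizes within each row of the table, i.e. within each cohort
tab <- prop.table(table(df$ExperimentCohort, df$LetterGrade), margin = 1)

# Long format: one row per cohort/grade pair, usable with stat = "identity"
df.new <- as.data.frame(tab, responseName = "prop")
names(df.new)[1:2] <- c("ExperimentCohort", "LetterGrade")

# Proportions within each cohort should add up to 1
rowSums(tab)
```

The resulting df.new has the same three columns as the ddply() result, so it should work with the geom_bar() call above.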

Multi-group histogram with group-specific frequencies

Give this a try. I am using dplyr, a package that contains updated versions of the ddply-type functions from plyr. One thing: I am not sure whether you want your x-axis to show the Study_Groups or the Genotypes. Your question states that you want the frequency of Genotype within each group, but your graph has the Genotypes on the x-axis. The solution follows the stated desire, not the plot. However, making the change to get Genotypes on the x-axis is simple. I'll note in the code comments where and what change to make.

library(dplyr)
library(ggplot2)

df2 <- df %>%
  count(Study_Group, Genotypes) %>%
  group_by(Study_Group) %>% # change to `group_by(Genotypes) %>%` for alternative approach
  mutate(prop = n / sum(n))

ggplot(data = df2, aes(Study_Group, prop, fill = Genotypes)) +
  geom_bar(stat = "identity", position = "dodge")
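To see what the count/group_by/mutate pipeline produces, here is a small sketch on made-up data. The toy data frame and the names by_group and group_totals are assumptions for illustration only; the column names are taken from the answer above.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the answer above
toy <- data.frame(
  Study_Group = c("Control", "Control", "Control", "Control", "Treated", "Treated"),
  Genotypes   = c("AA", "AA", "AB", "BB", "AA", "AB")
)

# Count each group/genotype pair, then divide by the total of its study group
by_group <- toy %>%
  count(Study_Group, Genotypes) %>%
  group_by(Study_Group) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Sanity check: the proportions within each study group should add up to 1
group_totals <- by_group %>%
  group_by(Study_Group) %>%
  summarise(total = sum(prop))

by_group
group_totals
```

If the grouping is switched to group_by(Genotypes), the proportions would instead add up to 1 within each genotype.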

Normalizing y-axis in histograms in R ggplot to proportion by group

Like this? [edited based on OP's comment]

ggplot(all, aes(x = value, fill = dataset)) +
  geom_histogram(aes(y = 0.5 * ..density..),
                 alpha = 0.5, position = 'identity', binwidth = 0.5)

Using y=..density.. scales the histograms so that the area under each is 1, that is, sum(binwidth*y)=1. As a result, you would use y = binwidth*..density.. to have y represent the fraction of the group's total in each bin. In your case, binwidth=0.5.
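The relationship between density and fraction can be checked numerically. The sketch below uses base R's hist() rather than ggplot2's own binning, so it illustrates the arithmetic only; the simulated data and the variable names are assumptions.

```r
set.seed(1)
x <- rnorm(200)          # hypothetical data for a single group
binwidth <- 0.5

# Equal-width bins covering the whole range of x
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)
h <- hist(x, breaks = breaks, plot = FALSE)

# Density integrates to 1, so density * binwidth is the fraction per bin
frac <- h$density * binwidth

sum(frac)                                # total area under the histogram
all.equal(frac, h$counts / length(x))    # same as count / group size
```

With equal-width bins, multiplying by the bin width is therefore the same as dividing each bin's count by the number of observations in the group.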

IMO this is a little easier to interpret:

ggplot(all, aes(x = value, fill = dataset)) +
  geom_histogram(aes(y = 0.5 * ..density..), binwidth = 0.5) +
  facet_wrap(~dataset, nrow = 2)

Multiple Relative frequency histogram in R, ggplot

Below are some basic examples with the built-in iris dataset. The relative frequency is obtained by multiplying the density by the bin width.

library(ggplot2)

ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 position = "identity", alpha = 0.5)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(iris, aes(Sepal.Length)) +
  geom_histogram(aes(y = after_stat(density * width))) +
  facet_wrap(~ Species)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Created on 2022-03-07 by the reprex package (v2.0.1)
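One way to confirm that the bar heights really are in-group proportions is to inspect the data ggplot2 computes for the layer. The sketch below is an illustration, not part of the original reprex: it assumes a ggplot2 version that provides after_stat() (3.3.0 or later), and the binwidth of 0.25 is an arbitrary choice made to silence the `bins = 30` message.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 binwidth = 0.25, position = "identity", alpha = 0.5)

# layer_data() returns the data computed for the first layer;
# y holds density * width for each bin
built <- layer_data(p)

# Sum of the bar heights within each group (one group per Species)
height_sums <- tapply(built$y, built$group, sum)
height_sums
```

Each group's heights should add up to 1, because the statistic is computed separately for every Species.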

ggplot2 and R - Applying custom colors to a multi group histogram in long format

Delete the scale_color_viridis and scale_fill_viridis lines - these apply the Viridis color scale. Replace them with scale_fill_manual(values = c(lightgreen, lightred, darkpurple)). In your aesthetic mapping, replace color = variable with fill = variable. For a histogram, color refers to the color of the lines outlining each bar, and fill refers to the color each bar is filled with.

This should leave you with:

p <- density2 %>%
  ggplot(aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 1, alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c(lightgreen, lightred, darkpurple)) +
  theme_bw() +
  labs(fill = "") +
  theme(panel.grid = element_blank())

p + scale_y_sqrt() +
  theme(legend.position = "none") +
  labs(y = "data pts", x = "elevation (m)")

I've also done some other clean-up. show.legend = FALSE does not belong inside aes() - and your theme(legend.position = "none") should take care of it.
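For readers without the original data, here is a self-contained sketch of the fill mapping. Everything in it is hypothetical: the hex codes standing in for lightgreen, lightred and darkpurple, the toy data frame, and the group labels a, b and c. Naming the values vector is optional, but it pins each color to a specific level.

```r
library(ggplot2)

# Hypothetical stand-ins for the asker's color objects
lightgreen <- "#90EE90"
lightred   <- "#F08080"
darkpurple <- "#551A8B"

# Hypothetical long-format data: one value column, one grouping column
set.seed(42)
toy <- data.frame(
  value    = c(rnorm(50, 10, 2), rnorm(50, 14, 2), rnorm(50, 18, 2)),
  variable = rep(c("a", "b", "c"), each = 50)
)

p <- ggplot(toy, aes(x = value, fill = variable)) +
  geom_histogram(binwidth = 1, alpha = 0.5, position = "identity") +
  # A named vector ties each color to a level of `variable`
  scale_fill_manual(values = c(a = lightgreen, b = lightred, c = darkpurple)) +
  theme_bw()

# The fill colors actually used by the layer
used_fills <- unique(layer_data(p)$fill)
```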

I did not download your data, save it in my working directory, import it into R, and test this code on it. If you need more help, please post a small subset of your data in a copy/pasteable format (e.g., dput(density2[1:20, ]) for the first 20 rows---choose a suitable subset) and I'll be happy to test and adjust.

ggplot: relative frequencies of two groups

I usually do this by simply precalculating the values outside of ggplot2 and using stat = "identity":

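library(plyr)      # ddply()
library(reshape2)  # melt()
library(ggplot2)

# proportion of each outcome within each gender, reshaped to long format
df1 <- melt(ddply(df, .(gender), function(x) {prop.table(table(x$outcome))}), id.vars = 1)

ggplot(df1, aes(x = variable, y = value)) +
  facet_wrap(~gender, nrow = 2, ncol = 1) +
  geom_bar(stat = "identity")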
