Let Ggplot2 Histogram Show Classwise Percentages on Y Axis

Calculating from stats

You can scale the bars per group by using the special computed variables group and count: group selects the subset of count that belongs to each group, and each subset is divided by its own total.

If you have ggplot2 3.3.0 or newer, you can use the after_stat() function to access these computed variables:

ggplot(data, aes(carat, fill=color)) +
  geom_histogram(
    aes(y=after_stat(c(
      count[group==1]/sum(count[group==1]),
      count[group==2]/sum(count[group==2])
    )*100)),
    position='dodge',
    binwidth=0.5
  ) +
  ylab("Percentage") + xlab("Carat")

[Figure: a ggplot graph of Carat vs Percentage, with two sets of bars, each showing the percentage of the given color, as desired]
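The version above writes out one term per group. A minimal sketch of a variant that works for any number of groups, assuming ggplot2 3.3.0 or newer: base R's ave() replaces each count with the total of its own group, so the division happens group by group. The two-color subset of ggplot2's diamonds dataset is an assumption made for illustration, since the original data is not shown.

```r
library(ggplot2)

# Assumption: stand-in data, two colors from the built-in diamonds dataset
data <- droplevels(subset(diamonds, color %in% c("D", "J")))

p <- ggplot(data, aes(carat, fill = color)) +
  geom_histogram(
    # ave() gives, for every bin, the total count of that bin's group
    aes(y = after_stat(count / ave(count, group, FUN = sum) * 100)),
    position = "dodge",
    binwidth = 0.5
  ) +
  ylab("Percentage") + xlab("Carat")

# Sum of the plotted bar heights within each group
bars <- layer_data(p, 1)
group_totals <- tapply(bars$y, bars$group, sum)
```

Printing group_totals is a quick way to check that the bars of each group add up to the same total.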

Using older versions of ggplot

In earlier versions this is more cumbersome. If you have at least 3.0, you can wrap the stat() function around each individual variable reference; in pre-3.0 versions you have to surround each variable with two dots instead:

aes(y=c(
  ..count..[..group..==1]/sum(..count..[..group..==1]),
  ..count..[..group..==2]/sum(..count..[..group..==2])
)*100),

Yeah but what are all these stats?

For more details on where these variables come from: the computed variables are documented on the help page of the stat function being used. For example, stat_bin(), the default stat of geom_histogram(), has this "Computed variables" section:

Computed variables:

  • count: number of points in bin
  • density: density of points in bin, scaled to integrate to 1
  • ncount: count, scaled to maximum of 1
  • ndensity: density, scaled to maximum of 1
  • width: widths of bins

Beyond that, you can use ggplot_build() to inspect all the stats generated for any given plot:

> p = ggplot(data, [...etc...])
> ggplot_build(p)
$data
$data[[1]]
fill y count x xmin xmax density ncount
1 #440154FF 1.50553506 102 -0.125 -0.25 0.00 0.0301107011 0.0224323730
2 #440154FF 67.11439114 4547 0.375 0.25
[...snip...]
ndensity flipped_aes PANEL group ymin ymax colour size linetype
1 0.0224323730 FALSE 1 1 0 1.50553506 NA 0.5 1
2 1.0000000000 FALSE 1 1 0 67.11439114 NA 0.5 1
[...snip...]

How to show percentage of individuals on y axis instead of count in histogram by groups?

This can be achieved like so:

Note: Concerning the NAs you were right. Simply subset for non-NA values or use dplyr::filter or ...

a <- read.table(text = "id FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
1 fam1000-03 G1000 1 1 38 10.2 1 control
2 fam1001-03 G1001 1 1 15 7.3 1 control
3 fam1003-03 G1003 1 2 17 7.0 1 case
4 fam1005-03 G1005 1 1 36 7.7 1 control
5 fam1009-03 G1009 1 1 23 7.6 1 control
6 fam1052-03 G1052 1 1 32 7.3 1 control
7 fam1052-03 G1052 1 1 32 7.3 1 NA", header = TRUE)

library(ggplot2)

ggplot(a, aes(x=HBA1C, fill=pheno)) +
  geom_histogram(aes(y = ..count.. / tapply(..count.., ..group.., sum)[..group..]),
                 position='dodge', binwidth=0.5) +
  scale_y_continuous(labels = scales::percent)


Created on 2020-05-23 by the reprex package (v0.3.0)
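With ggplot2 3.3.0 or newer, the same normalisation can be sketched with after_stat(), dropping the NA rows first as the note suggests. The small data frame is a cut-down stand-in for a (only the two columns the plot uses), and as.vector() is added so that the per-group totals form a plain vector; both are choices made for this sketch, not part of the original answer.

```r
library(ggplot2)

# Cut-down stand-in for `a`: only the columns the plot needs
a <- data.frame(
  HBA1C = c(10.2, 7.3, 7.0, 7.7, 7.6, 7.3, 7.3),
  pheno = c("control", "control", "case", "control", "control", "control", NA)
)

# Keep only rows with a known phenotype
a_complete <- subset(a, !is.na(pheno))

p <- ggplot(a_complete, aes(x = HBA1C, fill = pheno)) +
  geom_histogram(
    # Divide each bin count by the total count of its own group
    aes(y = after_stat(count / as.vector(tapply(count, group, sum))[group])),
    position = "dodge", binwidth = 0.5
  ) +
  scale_y_continuous(labels = scales::percent)

# Sum of the plotted shares within each group
bars <- layer_data(p, 1)
share_by_group <- tapply(bars$y, bars$group, sum)
```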

Show percent % instead of counts in charts of categorical variables

Since this was answered there have been some meaningful changes to the ggplot syntax. Summing up the discussion in the comments above:

require(ggplot2)
require(scales)

p <- ggplot(mydataf, aes(x = foo)) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  ## version 3.0.0
  scale_y_continuous(labels = percent)

Here's a reproducible example using mtcars:

ggplot(mtcars, aes(x = factor(hp))) +
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = percent) ## version 3.0.0


This question is currently the #1 hit on Google for 'ggplot count vs percentage histogram', so hopefully this helps distill all the information currently housed in comments on the accepted answer.

Remark: If hp is not set as a factor, ggplot treats it as a continuous variable and returns a different plot:

[Figure: bar chart produced when hp is left numeric]
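For ggplot2 3.3.0 or newer, the mtcars example can be sketched with after_stat() in place of the double-dot notation. This is an equivalent spelling under that version assumption, not a change to the method; scales::percent is called with its namespace so that library(scales) is not needed.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(hp))) +
  # count / sum(count) is evaluated over all bars of the layer
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  scale_y_continuous(labels = scales::percent)

# The bar heights are shares of the whole data set
bars <- layer_data(p, 1)
total_share <- sum(bars$y)
```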

R ggplot with percentages

One solution would be to calculate the relative frequencies from your input data and pass the results directly to ggplot, using the stat = "identity" parameter in geom_bar (see this post):

library(tidyverse)

df <- tibble::tribble(
  ~key, ~value,
  1, "Before",
  1, "After",
  1, "During",
  1, "Before",
  2, "Before",
  2, "After",
  3, "During"
)

df %>%
  dplyr::count(key, value) %>%
  dplyr::group_by(key) %>%
  dplyr::mutate(p = n / sum(n)) %>%
  ggplot() +
  geom_bar(
    mapping = aes(x = key, y = p, fill = value),
    stat = "identity",
    position = position_dodge()
  ) +
  scale_y_continuous(labels = scales::percent_format())


Created on 2019-10-28 by the reprex package (v0.3.0)
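geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), so the same plot can be sketched a little more compactly. The intermediate name shares is introduced here for illustration only; keeping the summarised table in a variable also makes it easy to check the proportions before plotting.

```r
library(dplyr)
library(ggplot2)

df <- tibble::tribble(
  ~key, ~value,
  1, "Before",
  1, "After",
  1, "During",
  1, "Before",
  2, "Before",
  2, "After",
  3, "During"
)

# Relative frequency of each value within its key
shares <- df %>%
  count(key, value) %>%
  group_by(key) %>%
  mutate(p = n / sum(n)) %>%
  ungroup()

p <- ggplot(shares, aes(x = key, y = p, fill = value)) +
  geom_col(position = position_dodge()) +
  scale_y_continuous(labels = scales::percent_format())
```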

Frequency count histogram displaying only integer values on the y-axis?

How about:


ggplot(data = sample, aes(x = binMidPts, y = counts)) +
  geom_col() +
  scale_y_continuous(breaks = round(pretty(range(sample$counts))))


This answer suggests pretty_breaks() from the scales package. The manual page of pretty_breaks() mentions pretty() from base R, and from there you just have to round the result to the nearest integer.
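One caveat with rounding: when the counts span a small range, pretty() may step in fractions, and rounding can then map neighbouring values to the same integer. A hedged sketch that wraps the result in unique() to drop such repeats; the binned data frame is a hypothetical stand-in for the sample data, which is not shown.

```r
library(ggplot2)

# Hypothetical stand-in for the `sample` data frame used above
binned <- data.frame(binMidPts = c(0.5, 1.5, 2.5), counts = c(1, 2, 0))

# Round the pretty() breaks to integers, then drop any repeated values
int_breaks <- unique(round(pretty(range(binned$counts))))

p <- ggplot(binned, aes(x = binMidPts, y = counts)) +
  geom_col() +
  scale_y_continuous(breaks = int_breaks)
```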

R Histogram Display x-axis ticks as percentages

You don't give us any data to work with, so I will illustrate with some bogus data. Starting with the base x-values.

set.seed(2017)
dat = rnorm(500)/4
hist(dat, breaks=16)

[Figure: Wrong x-axis]

The idea is to suppress printing the x-axis and replace it with the one that you want.

H1 = hist(dat, breaks=16, xaxt="n")
axis(side=1, at=H1$breaks, labels=paste0(100*H1$breaks, "%"))

[Figure: With Percent labels]

Extract data from a ggplot

To get the values that are actually plotted, pass your plot to the function ggplot_build().

p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram() +
  facet_wrap(~cyl) +
  geom_vline(data = data.frame(x = c(20, 30)), aes(xintercept = x))

pg <- ggplot_build(p)

This returns a list, and one of its elements is named data. That element contains one data frame per layer with the values used in the plot; for a histogram, for example, it contains the y values (the same as count). If you use facets, the PANEL column shows which facet each row belongs to. If there is more than one geom_ in your plot, data will contain a data frame for each; in my example there is one data frame for the histogram and another for the vlines.

head(pg$data[[1]])
y count x ndensity ncount density PANEL group ymin ymax
1 0 0 9.791667 0 0 0 1 1 0 0
2 0 0 10.575000 0 0 0 1 1 0 0
3 0 0 11.358333 0 0 0 1 1 0 0
4 0 0 12.141667 0 0 0 1 1 0 0
5 0 0 12.925000 0 0 0 1 1 0 0
6 0 0 13.708333 0 0 0 1 1 0 0
xmin xmax
1 9.40000 10.18333
2 10.18333 10.96667
3 10.96667 11.75000
4 11.75000 12.53333
5 12.53333 13.31667
6 13.31667 14.10000

head(pg$data[[2]])
xintercept PANEL group xend x
1 20 1 1 20 20
2 30 1 1 30 30
3 20 2 2 20 20
4 30 2 2 30 30
5 20 3 3 20 20
6 30 3 3 30 30
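If only one layer is of interest, ggplot2's layer_data() helper returns that layer's data frame directly. A minimal sketch that pulls the histogram layer and totals the bin counts per facet; bins = 30 is spelled out here only to silence the default-bins message, and the variable names are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg)) +
  geom_histogram(bins = 30) +
  facet_wrap(~cyl)

# Data frame of the first (and only) layer, one row per bin and panel
hist_data <- layer_data(p, 1)

# Bin counts summed within each facet panel
counts_per_panel <- tapply(hist_data$count, hist_data$PANEL, sum)
```

Each panel total should match the number of cars with that cylinder count, which is a handy sanity check that no observations were dropped by the binning.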

