Get Values and Positions to Label a Ggplot Histogram


geom_histogram() is just a fancy wrapper around stat_bin(), so you can call stat_bin() yourself and draw the bars and the text however you like. Here's an example:

#sample data
set.seed(15)
csub<-data.frame(Anomaly10y = rpois(50,5))

And then we plot it with

library(ggplot2)
ggplot(csub, aes(x = Anomaly10y)) +
  stat_bin(binwidth = 1) + ylim(c(0, 12)) +
  # Same binning again, drawn as text: each bin's count above its bar
  stat_bin(binwidth = 1, geom = "text", aes(label = ..count..), vjust = -1.5)

This produces a histogram with each bin's count printed above its bar.

Histogram ggplot : Show count label for each bin for each category

Update for ggplot2 2.x

You can now center labels within stacked bars without pre-summarizing the data using position=position_stack(vjust=0.5). For example:

ggplot(aes(x = price ) , data = diamonds) + 
geom_histogram(aes(fill=cut), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut), position=position_stack(vjust=0.5)) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))

Original Answer

You can get the counts for each value of cut by adding cut as a group aesthetic to stat_bin. I also moved binwidth outside of aes, which was causing binwidth to be ignored in your original code:

ggplot(aes(x = price ), data = diamonds) + 
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut, y=0.8*(..count..))) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))


One issue with the code above is that I'd like the labels to be vertically centered within each bar section, but I'm not sure how to do that within stat_bin, or if it's even possible. Multiplying by 0.8 (or whatever) moves each label by a different relative amount. So, to get the labels centered, I created a separate data frame for the labels in the code below:

# Create text labels (dplyr provides %>%, group_by(), summarise(), and mutate())
library(dplyr)
dat = diamonds %>%
group_by(cut,
price=cut(price, seq(0,max(diamonds$price)+1500,1500),
labels=seq(0,max(diamonds$price),1500), right=FALSE)) %>%
summarise(count=n()) %>%
group_by(price) %>%
mutate(ypos = cumsum(count) - 0.5*count) %>%
ungroup() %>%
mutate(price = as.numeric(as.character(price)) + 750)

ggplot(aes(x = price ) , data = diamonds) +
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
geom_text(data=dat, aes(label=count, y=ypos), colour="white", size=3.5)
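
The ypos arithmetic above can be checked on a toy stack. This is a minimal sketch with made-up counts (count here is a hypothetical vector, not taken from diamonds). It only illustrates the midpoint formula; which segment ends up at the bottom of a bar still depends on the stacking order of your ggplot2 version.

```r
# Hypothetical segment counts for one bar, listed from bottom to top
count <- c(10, 20, 30)

# cumsum(count) is the top edge of each segment; stepping back half a
# segment height gives the vertical midpoint of that segment
ypos <- cumsum(count) - 0.5 * count
print(ypos)
```

If the labels land in the wrong segments, the order of the rows within each bin is the first thing to check.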


To configure the breaks on the y axis, just add scale_y_continuous(breaks=seq(0,20000,2000)) or whatever breaks you'd like.

How to label stacked histogram in ggplot

The built-in functions geom_histogram and stat_bin are perfect for quickly building plots in ggplot. However, more advanced styling often requires building the data yourself before plotting. In your case, the overlapping labels are visually messy.

The following code builds a binned frequency table for the data frame:

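library(ggplot2)   # provides the mpg dataset
library(plyr)      # ddply()
library(reshape2)  # melt()

# Subset data
mpg_df <- data.frame(displ = mpg$displ, class = mpg$class)
# Preview the displ-by-class frequency table in long form (not used below)
melt(table(mpg_df[, c("displ", "class")]))

# Bin data
breaks <- 1
cuts <- seq(0.5, 8, breaks)
mpg_df$bin <- .bincode(mpg_df$displ, cuts)

# Count the rows in each class/bin combination
mpg_df <- ddply(mpg_df, .(class, bin), nrow)
names(mpg_df) <- c("class", "bin", "Freq")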
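
To see what .bincode() returns, here is a small sketch on three hand-picked displ values (the values and the name bin_index are chosen for illustration only). It returns the index of the interval each value falls into, not a position on the displ scale, so bin 2 below stands for the interval (1.5, 2.5].

```r
cuts <- seq(0.5, 8, 1)  # break points 0.5, 1.5, ..., 7.5

# Intervals are closed on the right by default, so 2.5 belongs to (1.5, 2.5]
bin_index <- .bincode(c(1.6, 2.5, 7), cuts)
print(bin_index)
```

This is why the x axis of the plots built from this table shows bin numbers rather than displacement values.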

You can use this new table to set a conditional label, so boxes are only labelled if there are more than a certain number of observations:

ggplot(mpg_df, aes(x = bin, y = Freq,  fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, as.character(class), "")),
position=position_stack(vjust=0.5), colour="black")


Repeating the class name on the bars duplicates what the legend already shows, so it may be more useful to show the frequency of each group instead:

ggplot(mpg_df, aes(x = bin, y = Freq,  fill = class)) +
geom_bar(stat = "identity", colour = "black", width = 1) +
geom_text(aes(label=ifelse(Freq >= 4, Freq, "")),
position=position_stack(vjust=0.5), colour="black")


Update

I realised you can filter labels selectively using the computed variable ..count.. (written after_stat(count) in ggplot2 3.3.0 and later), so there is no need to preformat the data:

ggplot(mpg, aes(x = displ, fill = class, label = class)) +
geom_histogram(binwidth = 1,col="black") +
stat_bin(binwidth=1, geom="text", position=position_stack(vjust=0.5), aes(label=ifelse(..count..>4, ..count.., "")))

This post is useful for explaining special variables within ggplot: Special variables in ggplot (..count.., ..density.., etc.)

This second approach will only work if you want to label the dataset with the counts. If you want to label the dataset by the class or another parameter, you will have to prebuild the data frame using the first method.

How to put label on histogram bin

You can use stat = "bin" inside geom_text. Use stat(density) for the y axis values, and stat(count) for the label aesthetic. Nudge the text upwards with a small negative vjust to make the counts sit on top of the bars.

mpg %>%
ggplot(aes(x = cty)) +
guides(fill = 'none') +
xlab('Fuel Consumption in City Area') +
geom_histogram(aes(y = stat(density)), binwidth = 50, fill = '#3ba7c4') +
geom_text(stat = "bin", aes(y = stat(density), label = stat(count)),
binwidth = 50, vjust = -0.2) +
geom_density(alpha = 0.2)


In reality you would want more bins, and to make the density line less opaque so it does not clash too much with the labels.

mpg %>%
ggplot(aes(x = cty)) +
guides(fill = 'none') +
xlab('Fuel Consumption in City Area') +
geom_histogram(aes(y = stat(density)), binwidth = 5, fill = '#3ba7c4',
color = '#2898c0') +
geom_text(stat = "bin", aes(y = stat(density), label = stat(count)),
binwidth = 5, vjust = -0.2) +
geom_density(color = alpha("black", 0.2)) +
theme_classic()


ggplot2 Adding data labels to grouped histograms chart

You need to set the grouping correctly for the dodging to work. Instead of using ylim, which cuts off one of your bars, turn off axis expansion, which looks better for bars that start at 0. (You may need to use ylim with a higher value to make sure all labels are printed.)

ggplot(year, aes(as.factor(Stars), pct)) + 
geom_col(aes(fill = as.factor(Year)), position = "dodge") +
geom_text(
aes(label = round(pct, digits = 1), group = interaction(Stars, Year)),
position = position_dodge(0.9), size = 3, fontface = "bold", vjust = 0
) +
scale_fill_manual(values=c("#05668D", "#028090", "#00A896", "#02C39A", "#4ecdc4", "#F0F3BD")) +
scale_y_continuous(expand = c(0, 0)) +
theme(
panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = 0.5)
) +
labs(title = "Share of stars", x = "Stars", y = "Share of stars (in %)", fill = "Year")
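
The year data frame is not shown in this answer, so here is a minimal runnable sketch with a hypothetical stand-in (the values are invented; only the column names Stars, Year and pct follow the code above). As an alternative to raising ylim, it uses expansion() from ggplot2 3.3.0 or later to keep the bars flush with the axis while leaving headroom for the labels.

```r
library(ggplot2)

# Hypothetical stand-in for the `year` data frame used above:
# one row per Stars/Year combination, with pct as the share in percent
year <- expand.grid(Stars = 1:5, Year = c(2015, 2016))
year$pct <- c(5.2, 9.8, 20.1, 30.4, 34.5, 4.1, 8.3, 18.6, 32.0, 37.0)

p <- ggplot(year, aes(as.factor(Stars), pct)) +
  geom_col(aes(fill = as.factor(Year)), position = "dodge") +
  geom_text(
    aes(label = round(pct, digits = 1), group = interaction(Stars, Year)),
    position = position_dodge(0.9), size = 3, vjust = 0
  ) +
  # No padding below the bars, 10% above so the tallest label is not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

p
```

The group aesthetic in geom_text() mirrors the grouping that fill creates in geom_col(), which is what keeps each label over its own bar.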


What's the right way to add text to geom_histogram in ggplot?

Different layers typically don't share stateful information, so you could use the same stat as the histogram (stat_bin()) to display the labels. Then, you can use after_stat() to use the computed variables of the stat part of the layer to make labels.

library(ggplot2)

sample_data<- structure(list(
wage = c(81L, 77L, 63L, 84L, 110L, 151L, 59L, 109L, 159L, 71L),
school = c(15L, 12L, 10L, 15L, 16L, 18L, 11L, 12L, 10L, 11L),
expr = c(17L, 10L, 18L, 16L, 13L, 15L, 19L, 20L, 21L, 20L),
public = c(0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L),
female = c(1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L),
industry = c(63L, 93L, 71L, 34L, 83L, 38L, 82L, 50L, 71L, 37L)),
row.names = c("1","2", "3", "4", "5", "6", "7", "8", "9", "10"),
class = "data.frame")

ggplot(sample_data) +
geom_histogram(
aes(x = wage,
y = after_stat(density)),
binwidth = 4, colour = "black"
) +
stat_bin(
aes(x = wage,
y = after_stat(density),
label = after_stat(ifelse(count == 0, "", count))),
binwidth = 4, geom = "text", vjust = -1
)


Created on 2021-03-28 by the reprex package (v1.0.0)
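
To check which computed variables a stat makes available to after_stat(), one option is ggplot2's layer_data() helper. The sketch below uses the built-in mpg dataset instead of sample_data so that it runs on its own; p_check and computed are names chosen for illustration.

```r
library(ggplot2)

p_check <- ggplot(mpg, aes(x = displ)) +
  geom_histogram(binwidth = 1)

# Data for layer 1 after stat_bin() has run: one row per bin
computed <- layer_data(p_check, 1)
print(names(computed))
print(computed[, c("x", "count", "density")])
```

Any column listed there, such as count or density, can be referenced inside after_stat() in a layer that uses the same stat.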

How to print Frequencies on top of Histogram bars in ggplot

Instead of the geom_histogram wrapper, switch to the underlying stat_bin function, where you can use geom="text" together with after_stat(count) to add count labels to the histogram.

ggplot(mpg,aes(x=displ)) + 
stat_bin(binwidth=1) +
stat_bin(binwidth=1, geom="text", aes(label=after_stat(count)), vjust=0)

Modified from https://stackoverflow.com/a/24199013/10276092

density histogram in ggplot2: label bar height

You can do it with ggplot_build():

library(ggplot2)
dat = data.frame(a = c(5.5,7,4,20,4.75,6,5,8.5,10,10.5,13.5,14,11))
p=ggplot(dat, aes(x=a)) +
geom_histogram(aes(y=..density..),breaks = seq(4,20,by=2))+xlab("Required Solving Time")

ggplot_build(p)$data
#[[1]]
#           y count  x xmin xmax    density ncount ndensity PANEL group ymin       ymax colour   fill size linetype alpha
#1 0.19230769     5  5    4    6 0.19230769    1.0     26.0     1    -1    0 0.19230769     NA grey35  0.5        1    NA
#2 0.03846154     1  7    6    8 0.03846154    0.2      5.2     1    -1    0 0.03846154     NA grey35  0.5        1    NA
#3 0.07692308     2  9    8   10 0.07692308    0.4     10.4     1    -1    0 0.07692308     NA grey35  0.5        1    NA
#4 0.07692308     2 11   10   12 0.07692308    0.4     10.4     1    -1    0 0.07692308     NA grey35  0.5        1    NA
#5 0.07692308     2 13   12   14 0.07692308    0.4     10.4     1    -1    0 0.07692308     NA grey35  0.5        1    NA
#6 0.00000000     0 15   14   16 0.00000000    0.0      0.0     1    -1    0 0.00000000     NA grey35  0.5        1    NA
#7 0.00000000     0 17   16   18 0.00000000    0.0      0.0     1    -1    0 0.00000000     NA grey35  0.5        1    NA
#8 0.03846154     1 19   18   20 0.03846154    0.2      5.2     1    -1    0 0.03846154     NA grey35  0.5        1    NA


p + geom_text(data = as.data.frame(ggplot_build(p)$data),
aes(x=x, y= density , label = round(density,2)),
nudge_y = 0.005)
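
The density values in that output can be reproduced by hand: each bar's height is its count divided by the total number of observations times the bin width. The sketch below uses base R's hist() rather than ggplot2 to do the counting, on the assumption that both close the bins on the right by default, which is consistent with the counts printed above.

```r
a <- c(5.5, 7, 4, 20, 4.75, 6, 5, 8.5, 10, 10.5, 13.5, 14, 11)
breaks <- seq(4, 20, by = 2)

# Count observations per bin without drawing anything
h <- hist(a, breaks = breaks, plot = FALSE)

# density = count / (n * bin width), e.g. 5 / (13 * 2) for the first bin
dens <- h$counts / (sum(h$counts) * diff(breaks))
print(round(dens, 8))
```

Because the heights are densities, the bar areas (height times bin width) sum to 1.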

