Ggplot2 Find Number of Counts in Histogram Maximum

ggplot2 find number of counts in histogram maximum

For example used data movies as sample data are not provided.

With function ggplot_build() you can get list containing all the elements used for plotting your data. All the data are in list element data[[1]]. Column count of this element contains values for histogram. You can use maximal value of this column to set limits for your plot.

plot = ggplot(movies, aes(rating)) + geom_histogram(binwidth=0.5, aes(fill=..count..))
ggplot_build(plot)$data[[1]]
fill y count x ndensity ncount density PANEL group ymin ymax xmin xmax
1 #132B43 0 0 0.75 0.0000000000 0.0000000000 0.0000000000 1 1 0 0 0.5 1.0
2 #142E48 272 272 1.25 0.0323232323 0.0323232323 0.0092535892 1 1 0 272 1.0 1.5
3 #16314B 454 454 1.75 0.0539512775 0.0539512775 0.0154453290 1 1 0 454 1.5 2.0
4 #17344F 668 668 2.25 0.0793820559 0.0793820559 0.0227257263 1 1 0 668 2.0 2.5
5 #1B3A58 1133 1133 2.75 0.1346405229 0.1346405229 0.0385452813 1 1 0 1133 2.5 3.0

plot + scale_y_continuous(expand = c(0,0),
limits=c(0,max(ggplot_build(plot)$data[[1]]$count)*1.1))

Sample Image

Adding breaks to count (y axis) of a histogram according to the count min-max range in R?

You can make a function for breaks that takes the limits of axis as the argument.

From the documentation of scale_continuous, breaks can take:

A function that takes the limits as input and returns breaks as output

Here is an example, where I go from 0 to the maximum y axis limit by 1. (I use 0 instead of the minimum count because histograms start at 0.)

The x in the function is the limits of the axis in the plot as calculated by ggplot() or as set by the user.

byone = function(x) {
seq(0, max(x), by = 1)
}

You can pas this function to breaks in scale_y_continuous(). The limits are pulled from directly from the plot and passed to the first argument of the function.

ggplot(df, aes(user_id)) +
geom_histogram() +
scale_y_continuous(breaks = byone)

Sample Image

Calculating the maximum histogram value

stat_bin uses binwidth=range/30 by default. I'm not sure exactly how it's calculated but this should be a fairly reasonable approximation:

max(table(cut(df$x,seq(min(df$x),max(df$x),dist(range(df$x))/30)))) 

Histogram ggplot to count number of rows per timestamp

Without seeing your data, I haven't been able to fine-tune this, but something like this should be close to what you want...

#ggplot needs a dataframe, so...
datas <- data.frame(Data=as.POSIXct(Timestamp, format="%Y/%m/%d %H:%M:%S"))

ggplot(datas, aes(x=Data)) +
geom_histogram(binwidth=3600, fill="grey", colour="black") + #binwidth in seconds
ylab("Nº de Inquéritos Realizados") + #xlab should pick up variable name 'Data'
ggtitle("Datas de Realização do Inquérito")

How to show count of each bin on histogram on the plot

You can add a stat_bin to do the calculations of counts for you, and then use those values for the labels and heights. Here's an example

set.seed(15)
dd<-data.frame(x=rnorm(100,5,2))
ggplot(dd, aes(x=x))+ geom_histogram() +
stat_bin(aes(y=..count.., label=..count..), geom="text", vjust=-.5)

labeled histogram

Histogram ggplot : Show count label for each bin for each category

Update for ggplot2 2.x

You can now center labels within stacked bars without pre-summarizing the data using position=position_stack(vjust=0.5). For example:

ggplot(aes(x = price ) , data = diamonds) + 
geom_histogram(aes(fill=cut), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut), position=position_stack(vjust=0.5)) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))

Original Answer

You can get the counts for each value of cut by adding cut as a group aesthetic to stat_bin. I also moved binwidth outside of aes, which was causing binwidth to be ignored in your original code:

ggplot(aes(x = price ), data = diamonds) + 
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
stat_bin(binwidth=1500, geom="text", colour="white", size=3.5,
aes(label=..count.., group=cut, y=0.8*(..count..))) +
scale_x_continuous(breaks=seq(0,max(diamonds$price), 1500))

Sample Image

One issue with the code above is that I'd like the labels to be vertically centered within each bar section, but I'm not sure how to do that within stat_bin, or if it's even possible. Multiplying by 0.8 (or whatever) moves each label by a different relative amount. So, to get the labels centered, I created a separate data frame for the labels in the code below:

# Create text labels
dat = diamonds %>%
group_by(cut,
price=cut(price, seq(0,max(diamonds$price)+1500,1500),
labels=seq(0,max(diamonds$price),1500), right=FALSE)) %>%
summarise(count=n()) %>%
group_by(price) %>%
mutate(ypos = cumsum(count) - 0.5*count) %>%
ungroup() %>%
mutate(price = as.numeric(as.character(price)) + 750)

ggplot(aes(x = price ) , data = diamonds) +
geom_histogram(aes(fill = cut ), binwidth=1500, colour="grey20", lwd=0.2) +
geom_text(data=dat, aes(label=count, y=ypos), colour="white", size=3.5)

Sample Image

To configure the breaks on the y axis, just add scale_y_continuous(breaks=seq(0,20000,2000)) or whatever breaks you'd like.

Display maximum frequency point of each bin in ggplot2 stat_binhex

I think, after playing with the data a bit more, I now understand. Each bin in the plot represents multiple points, e.g., (9,9);(9,10)(10,9);(10,10) are all in a single bin in the plot. I must caution that this is the expected behavior. It is unclear to me why you do not want to do it this way. Instead, you seem to want to display the values of just one of those points (e.g. 9,9).

I don't think you will be able to do this directly in a call to geom_hex or stat_hexbin, as those functions are trying to faithfully represent all of the data. In fact, they are not necessarily expecting discrete coordinates like you have at all -- they work equally well on continuous data.

For your purpose, if you want finer control, you may want to instead use geom_tile and count the values yourself, eg. (using dplyr and magrittr):

countedData <-
dat %$%
table(x,y) %>%
as.data.frame()

ggplot(countedData
, aes(x = x
, y = y
, fill = Freq)) +
geom_tile()

Sample Image

and you might play with the representation a bit from there, but it would at least display each of the separate coordinates more faithfully.

Alternatively, you could filter your raw data to only include the points that are the maximum within a bin. That would require you to match the binning, but could at least be an option.

For completeness, here is how to adapt the stat_summary_hex solution that @Jon Nagra (OP) linked to. Note that there are a few additional steps, so I don't think that this is quite a duplicate. Specifically, the table step above is required to generate something that can be used as a z for the summaries, and then you need to convert x and y back from factors to the original scale.

ggplot(countedData
, aes(x = as.numeric(as.character(x))
, y = as.numeric(as.character(y))
, z = Freq)) +
stat_summary_hex(fun = max, bins = 10
, col = "white")

Sample Image

Of note, I still think that the geom_tile may be more useful, even it is not quite as flashy.

How to extract bin numbers and percents from ggplot2?

Here is how you do it:

ggplot_build(p2)$data[[1]]

Let ggplot2 histogram show classwise percentages on y axis

Calculating from stats

You can scale them by group by using the special stat variables group and count, using group to select subsets of count.

If you have ggplot 3.3.0 or newer, you can use the after_stat function to access these special variables:

ggplot(data, aes(carat, fill=color)) +
geom_histogram(
aes(y=after_stat(c(
count[group==1]/sum(count[group==1]),
count[group==2]/sum(count[group==2])
)*100)),
position='dodge',
binwidth=0.5
) +
ylab("Percentage") + xlab("Carat")

a ggplot graph of Carat vs Percentage, with two sets of bars, each showing the percentage of the given color, as desired

Using older versions of ggplot

In earlier versions, this is more cumbersome - if you have at least 3.0 you can wrap stat() function around each individual variable reference, in pre-3.0 versions you have to surround them with two dots instead:

aes(y=c(
..count..[..group..==1]/sum(..count..[..group..==1]),
..count..[..group..==2]/sum(..count..[..group..==2])
)*100),

Yeah but what are all these stats?

For more details on where these variables come from, summary stats will be documented alongside the stat function being used - for example geom_histogram's default stat_bin() has this Computed variables section:

Computed variables:

  • count number of points in bin
  • density density of points in bin, scaled to integrate to 1
  • ncount count, scaled to maximum of 1
  • ndensity density, scaled to maximum of 1
  • width widths of bins

Beyond that, you can use ggplot_build() to inspect all the stats generated for any given plot:

> p = ggplot(data, [...etc...])
> ggplot_build(p)
$data
$data[[1]]
fill y count x xmin xmax density ncount
1 #440154FF 1.50553506 102 -0.125 -0.25 0.00 0.0301107011 0.0224323730
2 #440154FF 67.11439114 4547 0.375 0.25
[...snip...]
ndensity flipped_aes PANEL group ymin ymax colour size linetype
1 0.0224323730 FALSE 1 1 0 1.50553506 NA 0.5 1
2 1.0000000000 FALSE 1 1 0 67.11439114 NA 0.5 1
[...snip...]


Related Topics



Leave a reply



Submit