Add Number of Observations Per Group in Ggplot2 Boxplot

How to add the number of observations per group to a time course graph with R?

I'm still not sure about your desired result, but maybe this helps. Instead of using stat_summary, the pragmatic approach is often to do the calculations outside of ggplot, i.e. to use an aggregated data frame that holds the labels (here the counts) and their positions. Since you want the "number of observations per group next to the time point", I also added the mean of lactate to that data frame so that each count can be placed next to its point.

Note: I opted for geom_label as by default it adds some padding around the label.

library(dplyr)
library(ggplot2)

# Aggregate outside ggplot: one row per time point and group
df_labels <- df %>%
  group_by(time, group) %>%
  summarise(mean_lactate = mean(lactate), n = n(), .groups = "drop")

ggplot(df, aes(x = time, y = lactate, color = as.factor(group))) +
  geom_point(stat = "summary", fun = mean, size = 4) +
  geom_line(stat = "summary", fun = mean, aes(group = as.factor(group))) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.01) +
  xlab("NRP Time (hours)") +
  ylab("Lactate (mmol/l)") +
  # Place each count at the group mean, just below and to the right of the point
  geom_label(data = df_labels, aes(y = mean_lactate, label = n),
             vjust = 1, hjust = 0, label.size = 0, fill = NA) +
  theme_bw() +
  ggtitle("Panel A") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(text = element_text(family = "Helvetica", size = 20)) +
  scale_color_manual(
    name = "Outcome",
    # breaks must match the group codes in the data: the sample data below
    # uses "A"/"B"; use c("0", "1") if your groups are coded 0/1
    breaks = c("A", "B"),
    labels = c("Negative", "Positive"),
    values = c("#E12000", "#002F80")
  )


DATA

df <- structure(list(id = c(
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L
), group = c(
"A", "A", "A", "B", "B", "B", "A", "A", "A",
"B", "B", "B"
), time = c(
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L
), lactate = c(
1.2, 1.1, 1.3, 0.8, 0.7, 0.9, 0.7, 0.9,
1.3, 0.5, 0.6, 0.7
)), class = "data.frame", row.names = c(
NA,
-12L
))
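
As a quick cross-check of the aggregation step, here is a minimal base R sketch that rebuilds the sample data compactly and computes the same per-cell mean and count. It swaps in aggregate() for the dplyr pipeline, and the name df_check is only illustrative. With this sample data every time/group cell holds two observations, so every label in the plot reads 2.

```r
# Same sample data as in the DATA block, written compactly
df <- data.frame(
  id = rep(1:4, each = 3),
  group = rep(c("A", "B", "A", "B"), each = 3),
  time = rep(1:3, times = 4),
  lactate = c(1.2, 1.1, 1.3, 0.8, 0.7, 0.9, 0.7, 0.9, 1.3, 0.5, 0.6, 0.7)
)

# Base R stand-in for group_by(time, group) %>% summarise(...)
means  <- aggregate(lactate ~ time + group, data = df, FUN = mean)
counts <- aggregate(lactate ~ time + group, data = df, FUN = length)

# Both calls return rows in the same order, so the columns can be combined
df_check <- data.frame(
  means[c("time", "group")],
  mean_lactate = means$lactate,
  n = counts$lactate
)
df_check
```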

How to use ggplot2 in R to plot a boxplot with number of observations?

Neither the base boxplot nor the ggplot geom_boxplot function expects data with weights/counts like this, so I think your best bet is to expand the data into individual observations.

# Repeat each row as many times as its observation count
expanded_data = data[rep(seq_len(nrow(data)), times = data$NumberObservations), ]
ggplot(data = expanded_data,
       aes(x = Grouping, y = Value, group = Grouping)) +
  geom_boxplot()
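
To make the expansion step concrete, here is a minimal sketch on a made-up summary table. The column names follow the code above; the values are hypothetical and only serve as illustration.

```r
library(ggplot2)

# Hypothetical weighted data: one row per distinct value, plus a count column
data <- data.frame(
  Grouping = c("a", "a", "b", "b"),
  Value = c(1.0, 2.0, 3.0, 5.0),
  NumberObservations = c(2L, 3L, 1L, 4L)
)

# rep() repeats each row index as many times as its count,
# so indexing with the result yields one row per observation
expanded_data <- data[rep(seq_len(nrow(data)), times = data$NumberObservations), ]

# The expanded data has as many rows as the counts add up to
nrow(expanded_data) == sum(data$NumberObservations)

p <- ggplot(expanded_data, aes(x = Grouping, y = Value, group = Grouping)) +
  geom_boxplot()
p
```

The expansion costs memory in proportion to the total count, which is usually fine for plotting but may matter for very large counts.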

Add a number of observations per group AND SUBGROUP in ggplot2 boxplot

Is this what you want?

library(ggplot2)

# Returns the label position (just above the median) and the group size;
# experiment with the multiplier to find the perfect position
give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x)))
}

# fun.data supplies both the y position and the label for each box
ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))


How to add a number of observations per group and use group mean in ggplot2 boxplot?

Is this anything like what you're after? With stat_summary, as requested:

library(ggplot2)

# function for number of observations;
# experiment with the multiplier to find the perfect position
give.n <- function(x){
  return(c(y = median(x)*1.05, label = length(x)))
}

# function for mean labels;
# experiment with the multiplier to find the perfect position
mean.n <- function(x){
  return(c(y = median(x)*0.97, label = round(mean(x),2)))
}

# plot: fun.data supplies both the y position and the label
ggplot(mtcars, aes(factor(cyl), mpg, label=rownames(mtcars))) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text") +
  stat_summary(fun.data = mean.n, geom = "text", colour = "red")

The black number is the number of observations and the red number is the mean value. joran's answer shows you how to put the numbers at the top of the boxes.

hat-tip: https://stackoverflow.com/a/3483657/1036500

Add number of observations per group in ggplot2 boxplot

You can just use position:

# give.n() is the helper defined in the answers above
p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
p


The width argument of position_dodge() controls the positioning on the horizontal axis. A width of 0.75 lines the labels up with the dodged boxes at geom_boxplot()'s default settings, and the same value works for a different number of subgroups:

p2 <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(cyl))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
p2
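
The same dodge width also works when the counts are computed outside ggplot and drawn with geom_text(). The following is only a sketch of that variant, not part of the original answer: the names counts and p3 and the choice to put every label one unit above the largest mpg value are illustrative assumptions.

```r
library(ggplot2)

# Count rows per (vs, am) cell outside ggplot; vs and am come back as factors
counts <- as.data.frame(table(vs = mtcars$vs, am = mtcars$am), responseName = "n")

# Assumed label height: one unit above the largest observed mpg
counts$y <- max(mtcars$mpg) + 1

p3 <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
  geom_boxplot() +
  geom_text(
    data = counts,
    aes(x = vs, y = y, label = n, colour = am),
    position = position_dodge(width = 0.75),  # same dodge width as for the boxes
    show.legend = FALSE
  )
p3
```

Putting all counts at a common height keeps them clear of the boxes, at the price of separating each number from its median.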


Annotate ggplot boxplot facets with number of observations per bar/group

Try this approach using dplyr and ggplot2. You can build the label with mutate() and then keep it only on the row with the maximum deliciousness in each group, so that each box gets a single label. geom_text() then draws the text at that point. Here is the code:

library(dplyr)
library(ggplot2)
#Data
mms <- data.frame(deliciousness = rnorm(100),
                  type=sample(as.factor(c("peanut", "regular")),
                              100, replace=TRUE),
                  color=sample(as.factor(c("red", "green", "yellow", "brown")),
                               100, replace=TRUE))
#Plot
mms %>% group_by(color,type) %>% mutate(N=n()) %>%
  # keep the label only on the row holding the group maximum
  mutate(N=ifelse(deliciousness==max(deliciousness,na.rm=TRUE),paste0('n=',N),NA)) %>%
  ggplot(aes(x=type, y=deliciousness, fill=type,label=N)) +
  geom_boxplot(notch=TRUE)+
  geom_text(fontface='bold')+
  facet_wrap(~ color,nrow=3, scales = "free")+
  xlab("")+
  scale_fill_manual(values = c("coral1", "lightcyan1", "olivedrab1"))+
  theme(legend.position="none")


How to add number of observation and Median per group in boxplot in ggplot2

I think you can use the following solution, taken from an article. We can use the stat_summary function, but for its fun.data argument we have to write a custom function that calculates the count and the median. Because stat_summary is used with geom = "text", the layer needs a position (x or y) and a label aesthetic, so the custom stat_box_sum function must return a data frame with those aesthetics as column names:

library(dplyr)
library(ggplot2)

# Returns one row with the label position (group maximum) and the label text
stat_box_sum <- function(y, upper_limit = max(iris$Sepal.Length)) {
  DF <- data.frame(
    y = max(y),
    label = paste("N:", length(y), "\n",
                  "Median:", median(y), "\n")
  )
  DF
}

stat_box_sum(iris$Sepal.Length)
#>     y                    label
#> 1 7.9 N: 150 \n Median: 5.8 \n

iris %>%
  mutate(grp = ifelse(Sepal.Width > mean(Sepal.Width), 'Gr-1', 'Gr-2')) %>%
  ggplot(aes(x = Species, y = Sepal.Length, fill = grp)) +
  geom_boxplot() +
  coord_flip() +
  facet_wrap(. ~ grp) +
  stat_summary(
    fun.data = stat_box_sum,
    geom = "text",
    hjust = 0.7,
    vjust = 0.7)

The only remaining problem, I suppose, is the adjustment of the labels, which I will try to figure out soon.


