How to Add the Number of Observations Per Group and Use the Group Mean in a ggplot2 Boxplot

Add number of observations per group in ggplot2 boxplot

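You can just use the position argument of stat_summary() to dodge the labels the same way the boxes are dodged:

library(ggplot2)
# give.n (also defined in the next answer) returns the label position and the count
give.n <- function(x) c(y = median(x) * 1.05, label = length(x))

p <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(am))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
p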


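The width argument of position_dodge() controls how far the labels are spread along the horizontal axis. With the default geom_boxplot() settings, 0.75 is the sweet spot that lines the labels up with the boxes. See how it works for a different number of subgroups: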

p2 <- ggplot(mtcars, aes(factor(vs), mpg, colour = factor(cyl))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))
p2
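
As a sanity check on the labels, the same counts can be tabulated outside ggplot2. This is only a sketch using base R's table() on the built-in mtcars data; the variable name counts is mine and not part of the original answer.

```r
# Count observations for every vs x cyl combination in mtcars.
counts <- table(vs = mtcars$vs, cyl = mtcars$cyl)
counts

# Combinations with a count of 0 have no data, so they get no box and no label.
sum(counts)  # 32, the number of rows in mtcars
```

Each non-zero cell should match one of the numbers printed on the plot.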


How to add a number of observations per group and use group mean in ggplot2 boxplot?

Is this anything like what you're after? With stat_summary, as requested:

library(ggplot2)

# function for number of observations
# experiment with the multiplier to find the perfect position
give.n <- function(x) {
  c(y = median(x) * 1.05, label = length(x))
}

# function for mean labels
# experiment with the multiplier to find the perfect position
mean.n <- function(x) {
  c(y = median(x) * 0.97, label = round(mean(x), 2))
}

# plot
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(fill = "grey80", colour = "#3366FF") +
  stat_summary(fun.data = give.n, geom = "text") +
  stat_summary(fun.data = mean.n, geom = "text", colour = "red")

The black number is the number of observations and the red number is the group mean. joran's answer shows you how to put the numbers at the top of the boxes.

hat-tip: https://stackoverflow.com/a/3483657/1036500
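
If you want the counts above the boxes rather than near the median, one possible variation is to anchor the label at the group maximum. This is a minimal sketch and not joran's actual code; the helper name give_n_top and the offset of 1 mpg unit are assumptions chosen for illustration.

```r
library(ggplot2)

# Hypothetical helper: place the count slightly above the largest value
# in each group. The offset of 1 is arbitrary and depends on the y scale.
give_n_top <- function(x) {
  data.frame(y = max(x) + 1, label = length(x))
}

ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  stat_summary(fun.data = give_n_top, geom = "text")
```

Using max(x) keeps the label clear of outliers as well as the box; a fixed y value for all groups would instead put every count on one line.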

How to add number of observation per group to a time course graph with R?

I'm still not sure about your desired result, but maybe this helps. Instead of using stat_summary, the pragmatic approach is quite often to do the calculations outside of ggplot, i.e. to use an aggregated data frame that holds the labels or counts and their positions. Since you want the "number of observations per group next to the time point", I also added the mean of lactate to the data frame so that the counts can be placed next to the points.

Note: I opted for geom_label as by default it adds some padding around the label.

library(dplyr)
library(ggplot2)

# one row per time point and group: label position (mean) and count
df_labels <- df %>%
  group_by(time, group) %>%
  summarise(mean_lactate = mean(lactate), n = n(), .groups = "drop")

ggplot(df, aes(x = time, y = lactate, color = as.factor(group))) +
  geom_point(stat = "summary", fun = mean, size = 4) +
  geom_line(stat = "summary", fun = mean, aes(group = as.factor(group))) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.01) +
  xlab("NRP Time (hours)") +
  ylab("Lactate (mmol/l)") +
  geom_label(data = df_labels, aes(y = mean_lactate, label = n),
             vjust = 1, hjust = 0, label.size = 0, fill = NA) +
  theme_bw() +
  ggtitle("Panel A") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(text = element_text(family = "Helvetica", size = 20)) +
  # breaks must match the values of group in your data
  # (the DATA below uses "A" and "B" rather than "0" and "1")
  scale_color_manual(
    name = "Outcome",
    breaks = c("0", "1"),
    labels = c("Negative", "Positive"),
    values = c("#E12000", "#002F80")
  )


DATA

df <- structure(list(id = c(
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L
), group = c(
"A", "A", "A", "B", "B", "B", "A", "A", "A",
"B", "B", "B"
), time = c(
1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
1L, 2L, 3L
), lactate = c(
1.2, 1.1, 1.3, 0.8, 0.7, 0.9, 0.7, 0.9,
1.3, 0.5, 0.6, 0.7
)), class = "data.frame", row.names = c(
NA,
-12L
))

Add a number of observations per group AND SUBGROUP in ggplot2 boxplot

Is this what you want?

library(ggplot2)

# experiment with the multiplier to find the perfect position
give.n <- function(x) {
  c(y = median(x) * 1.05, label = length(x))
}

ggplot(mtcars, aes(factor(cyl), mpg, fill = factor(gear))) +
  geom_boxplot() +
  stat_summary(fun.data = give.n, geom = "text",
               position = position_dodge(width = 0.75))


How to use ggplot2 in R to plot a boxplot with number of observations?

Neither the base boxplot function nor ggplot2's geom_boxplot expects data with weights/counts like this, so I think your best bet is to expand the data into individual observations.

expanded_data <- data[rep(seq_len(nrow(data)), times = data$NumberObservations), ]
ggplot(data = expanded_data,
       aes(x = Grouping, y = Value, group = Grouping)) +
  geom_boxplot()
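
To see what the expansion does, here is a small worked sketch. The column names follow the snippet above, but the values are made up for illustration and the intermediate name idx is mine.

```r
# Hypothetical summary table: each row is a value plus how often it was observed.
data <- data.frame(
  Grouping = c("a", "a", "b", "b"),
  Value = c(1.0, 2.0, 3.0, 5.0),
  NumberObservations = c(3, 1, 2, 2)
)

# Repeat each row index as many times as its count, then subset by those indices.
idx <- rep(seq_len(nrow(data)), times = data$NumberObservations)
expanded_data <- data[idx, ]

nrow(expanded_data)            # 8, i.e. sum(data$NumberObservations)
table(expanded_data$Grouping)  # observations per group after expansion
```

After the expansion each row is a single observation, so the boxplot statistics are computed with the counts taken into account.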

How to add number of valid observations of each group at each timepoint to my linechart

A typical way to show the uncertainty caused by different sample sizes is to use error bars or a ribbon to indicate the standard error. This gives a nice visual intuition of the uncertainty introduced by both the spread of the data and the sample size. However, you can also add labels showing the counts. You just need to summarize your data appropriately.

For completeness, here is your data represented with both a standard error ribbon and labels of the number of samples at each time point:

library(tidyverse)

mydata %>%
  pivot_longer(value1:value5) %>%
  group_by(groupe, name) %>%
  summarize(count = sum(!is.na(value)),
            mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE)) %>%
  mutate(time = as.numeric(gsub("\\D", "", name)),
         upper = mean + sd/sqrt(count),
         lower = mean - sd/sqrt(count)) %>%
  ggplot(aes(time, mean, color = groupe)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = groupe),
              color = NA, alpha = 0.2) +
  geom_point() +
  geom_line() +
  geom_label(aes(label = paste0("n = ", count),
                 y = mean + ifelse(groupe == "A", 1, -1)),
             key_glyph = draw_key_blank) +
  scale_color_manual(values = c("orangered3", "deepskyblue4")) +
  scale_fill_manual(values = c("orangered3", "deepskyblue4")) +
  labs(title = 'Mean values for each group over time \u00B1 standard error',
       subtitle = expression(italic("Labels show sample size at each point"))) +
  theme_light(base_size = 16)
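
The summary step above boils down to counting the non-missing values and dividing the standard deviation by the square root of that count. Here is a minimal base R sketch for a single group at a single time point; the numbers are invented for illustration and are not from the asker's data.

```r
# Hypothetical readings for one group at one time point; NA marks a missing value.
value <- c(4.1, 5.3, NA, 4.8, 5.0)

count <- sum(!is.na(value))                  # valid observations only
m <- mean(value, na.rm = TRUE)               # group mean
se <- sd(value, na.rm = TRUE) / sqrt(count)  # standard error of the mean

c(count = count, mean = m, lower = m - se, upper = m + se)
```

Counting with sum(!is.na(value)) rather than length(value) matters here: the label and the ribbon should both reflect the valid observations, not the number of rows.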
