Standard Error Bars Using stat_summary

Standard error bars using stat_summary

Well, I can't tell you how to get a multiplier by group into stat_summary.

However, it looks like your goal is to plot means and error bars that represent one standard error from the mean in ggplot without summarizing the dataset before plotting.

There is a mean_se function in ggplot2 that we can use instead of mean_cl_normal from Hmisc. The mean_se function uses a multiplier of 1 by default, so we don't need to pass any extra arguments if we want standard error bars.

library(ggplot2)

# fun = replaced fun.y = in ggplot2 3.3.0; older versions use fun.y = mean
ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar")
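To see what mean_se() actually returns, you can call it on a plain numeric vector. This is a small sketch with made-up numbers; it checks that the default (mult = 1) gives the mean plus or minus sd(x) / sqrt(length(x)).

```r
library(ggplot2)

# Made-up sample, purely for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Standard error computed by hand: sd / sqrt(n)
se_by_hand <- sd(x) / sqrt(length(x))

# mean_se() returns a one-row data frame with columns y, ymin and ymax;
# stat_summary() uses these columns to draw the error bar
res <- mean_se(x)
print(res)

# The half-width of the bar equals one standard error
print(res$ymax - res$y)
print(se_by_hand)
```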

If you want to use the mean_cl_normal function from Hmisc, you have to change the multiplier to 1 so that you get one standard error from the mean (by default it draws a 95% confidence interval). mult is an argument of mean_cl_normal, and arguments that you need to pass to the summary function you are using must be given as a list to the fun.args argument:

library(ggplot2)  # mean_cl_normal() also needs Hmisc installed

ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar",
               fun.args = list(mult = 1))
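To confirm that the two approaches draw the same bars, you can compare the computed layer data with layer_data(). A sketch, assuming Hmisc is installed (mean_cl_normal() wraps Hmisc::smean.cl.normal()):

```r
library(ggplot2)  # mean_cl_normal() additionally requires Hmisc to be installed

# Error bars from mean_se (default mult = 1)
p_se <- ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun.data = mean_se, geom = "errorbar")

# Error bars from mean_cl_normal, with mult = 1 passed via fun.args
p_cl <- ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar",
               fun.args = list(mult = 1))

# layer_data() returns the computed data for a layer (the first one by default)
se_bars <- layer_data(p_se)[, c("x", "ymin", "ymax")]
cl_bars <- layer_data(p_cl)[, c("x", "ymin", "ymax")]

print(se_bars)
print(all.equal(se_bars, cl_bars, check.attributes = FALSE))
```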

In pre-2.0 versions of ggplot2, the argument could be passed directly:

ggplot(mtcars, aes(cyl, qsec)) +
  stat_summary(fun.y = mean, geom = "bar") +
  stat_summary(fun.data = mean_cl_normal, geom = "errorbar", mult = 1)

Issue with using stat_summary to produce error bars for line graphs when faceting

Here's what I think is happening: There are two rows of data per week in the unfacetted plot, but only one row per week in each panel of the facetted plot, causing the standard error calculation to return NA. stat_summary is intended for unsummarized data and does the data summaries internally. Use bug_subset_final with stat_summary, or switch to geom_errorbar to continue using wickhami_sum. Details below.

You've pre-summarized the data, but stat_summary is intended to work on the raw data and calculate the summary values internally. In the summary data frame wickhami_sum that you've passed to ggplot, there are two rows per week: one for 2015 and one for 2016. The summary operation has collapsed all of the data for each combination of week and year into a single row.

Thus, in the unfacetted plot, there are two rows of data for stat_summary to operate on for each week. But in the facetted plot, it's trying to calculate a standard error from a single observation, which is probably returning NA, hence nothing gets plotted. Even in the unfacetted plot, your error bars are being calculated from the two mean values for each year, which isn't what you want either.
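A quick way to confirm this diagnosis is to call mean_se() (the default summary function of stat_summary()) directly on what each panel sees. The numbers below are made up for illustration:

```r
library(ggplot2)

# Hypothetical values: in the unfaceted plot each week has two summarized
# rows (one mean per year); in each facet panel it has only one
two_rows <- c(10.2, 11.8)
one_row  <- 10.2

print(mean_se(two_rows))  # SE of the two yearly means (not the SE you want)
print(mean_se(one_row))   # sd() of a single value is NA, so ymin/ymax are NA
```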

Instead, you can either keep using wickhami_sum and replace stat_summary with geom_errorbar:

geom_errorbar(aes(ymin = wickhami - se, ymax=wickhami + se))

Or, use the raw data (which looks like it's called bug_subset_final) with stat_summary:

ggplot(bug_subset_final, aes(x = week, y = wickhami)) +
  stat_summary(fun.data = mean_se, geom = "errorbar")
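Both options can be sketched side by side on simulated data. Everything here is an assumption for illustration: the data are random, and the year column used for faceting is a guess at how your data are split. The point is only that each panel gets one non-missing error bar per week either way.

```r
library(ggplot2)
set.seed(42)

# Hypothetical raw data standing in for bug_subset_final:
# five observations per week in each of two years ("year" is an assumed column)
bug_subset_final <- data.frame(
  week = rep(rep(1:4, each = 5), times = 2),
  year = rep(c(2015, 2016), each = 20),
  wickhami = rpois(40, lambda = 5)
)

# Option 1: stat_summary() on the raw data; each panel has 5 values per week
p_raw <- ggplot(bug_subset_final, aes(x = week, y = wickhami)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  facet_wrap(~ year)

# Option 2: summarize first (a stand-in for wickhami_sum), then geom_errorbar()
wickhami_sum <- aggregate(wickhami ~ week + year, data = bug_subset_final,
                          FUN = mean)
wickhami_sum$se <- aggregate(wickhami ~ week + year, data = bug_subset_final,
                             FUN = function(v) sd(v) / sqrt(length(v)))$wickhami

p_sum <- ggplot(wickhami_sum, aes(x = week, y = wickhami)) +
  geom_line() +
  geom_errorbar(aes(ymin = wickhami - se, ymax = wickhami + se), width = 0.2) +
  facet_wrap(~ year)

# The error-bar layer is the second layer in both plots: 4 weeks x 2 panels
bars_raw <- layer_data(p_raw, 2)
bars_sum <- layer_data(p_sum, 2)
print(c(rows_raw = nrow(bars_raw), rows_sum = nrow(bars_sum)))
print(anyNA(bars_raw$ymin))
```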

R: Show % differences between values: how to calculate error bars?

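First of all, you can get your original plot more easily with stat_summary(), because it calculates the mean and error bars for you directly inside the ggplot() call. With mean_cl_normal and mult = 1 the bars show ±1 standard error; use mean_sdl with mult = 1 if you want ±1 standard deviation instead.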

As for your question, you can easily calculate the fold change before passing the data to ggplot() with a mutate() that uses the control mean, mean(vol[reg == "control"]), as the denominator. Then you can format the y axis as percentages using {scales}.

library(tidyverse)
library(scales)

dd <- data.frame(id = rep(c(1, 2, 3), 2),
                 vol = c(10, 5, 8, 11, 10, 9),
                 reg = rep(c('control', 'new'), each = 3))


# original plot using stat_summary to avoid transforming data
dd %>%
  ggplot(aes(reg, vol)) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_normal, fun.args = list(mult = 1))


# calculate % of control
dd %>%
  mutate(norm_vol = vol / mean(vol[reg == "control"])) %>%
  ggplot(aes(reg, norm_vol)) +
  stat_summary(geom = "bar", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = mean_cl_normal, fun.args = list(mult = 1)) +
  scale_y_continuous(labels = scales::percent_format())


Created on 2022-02-21 by the reprex package (v2.0.1)
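To check what the normalization does to the bars, here is a small dplyr sketch on the same dd data that computes the normalized means and ±1 SE by hand (the names control_mean, mean_norm, se_norm and se_raw are just illustrative). Because every value is divided by the same constant, the control mean, the standard error is scaled by that constant too, so the error bars keep the same size relative to the bars. Note that this treats the control mean as fixed and ignores its own uncertainty.

```r
library(dplyr)

dd <- data.frame(id = rep(c(1, 2, 3), 2),
                 vol = c(10, 5, 8, 11, 10, 9),
                 reg = rep(c("control", "new"), each = 3))

control_mean <- mean(dd$vol[dd$reg == "control"])  # 23 / 3

summary_tbl <- dd %>%
  mutate(norm_vol = vol / control_mean) %>%
  group_by(reg) %>%
  summarise(
    mean_norm = mean(norm_vol),
    se_norm   = sd(norm_vol) / sqrt(n()),  # what mean_cl_normal(mult = 1) draws
    se_raw    = sd(vol) / sqrt(n()),
    .groups = "drop"
  )
print(summary_tbl)
```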

How to calculate standard error instead of standard deviation in ggplot

A couple of things. First, you need to reassign e when you add geom_violin and stat_summary; otherwise, those changes aren't carried forward when you add the boxplot in the next step. Second, when you add the boxplot last, it is drawn on top of the points and error bars from stat_summary, so they look like they've disappeared. If you add the boxplot first and then stat_summary, the points and error bars are placed on top of the boxplot. Here is an example:

library(ggplot2)
library(ggpubr)
library(Hmisc)

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

theme_set(
  theme_classic() +
    theme(legend.position = "top")
)

# Initiate a ggplot
e <- ggplot(ToothGrowth, aes(x = dose, y = len))

# Add violin plot
e <- e + geom_violin(trim = FALSE)

# Combine with box plot to add median and quartiles
# Change fill color by groups, remove legend
e <- e + geom_violin(aes(fill = dose), trim = FALSE) +
  geom_boxplot(width = 0.2) +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  theme(legend.position = "none")

# Add mean points +/- SE
# Use geom = "pointrange" or geom = "crossbar"
e +
  stat_summary(
    fun.data = "mean_se", fun.args = list(mult = 1),
    geom = "pointrange", color = "black"
  )
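Both points (reassignment and layer order) can be checked directly on the plot object. A minimal sketch, relying on the fact that a ggplot object stores its layers in the order they were added and draws them in that order; the variable layer_geoms is just for illustration:

```r
library(ggplot2)

data("ToothGrowth")
ToothGrowth$dose <- as.factor(ToothGrowth$dose)

e <- ggplot(ToothGrowth, aes(x = dose, y = len))

# Adding a layer without reassigning builds a new plot but leaves e untouched
invisible(e + geom_violin(trim = FALSE))
print(length(e$layers))  # 0

# Reassigning keeps the layer
e <- e + geom_violin(trim = FALSE)
print(length(e$layers))  # 1

# Layers are drawn in the order they were added, so the summary goes last
p <- e + geom_boxplot(width = 0.2) +
  stat_summary(fun.data = mean_se, geom = "pointrange", colour = "black")
layer_geoms <- vapply(p$layers, function(l) class(l$geom)[1], character(1))
print(layer_geoms)
```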

You said in a comment that you couldn't see any change when you tried mean_se and mean_cl_normal. The solution above may already fix that, but the two functions should give visibly different results. Here is an example comparing mean_se and mean_sdl; the error bars are noticeably smaller with mean_se.

# Mean +/- 1 standard deviation
ggplot(ToothGrowth, aes(x = dose, y = len)) +
  stat_summary(
    fun.data = "mean_sdl", fun.args = list(mult = 1),
    geom = "pointrange", color = "black"
  )

# Mean +/- 1 standard error
ggplot(ToothGrowth, aes(x = dose, y = len)) +
  stat_summary(
    fun.data = "mean_se", fun.args = list(mult = 1),
    geom = "pointrange", color = "black"
  )
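To put a number on the difference, you can call both summary functions directly on one dose group. A sketch, assuming Hmisc is installed (mean_sdl() wraps Hmisc::smean.sdl()); since SE = SD / sqrt(n), the SD bar should be sqrt(n) times wider:

```r
library(ggplot2)  # mean_sdl() additionally requires Hmisc to be installed

data("ToothGrowth")
len_05 <- ToothGrowth$len[ToothGrowth$dose == 0.5]  # the 20 observations at dose 0.5

sdl <- mean_sdl(len_05, mult = 1)  # mean +/- 1 standard deviation
se  <- mean_se(len_05)             # mean +/- 1 standard error

# Ratio of the two half-widths: sqrt(20), about 4.47
half_sd <- sdl$ymax - sdl$y
half_se <- se$ymax - se$y
ratio <- half_sd / half_se
print(ratio)
```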

Here is a simplified solution if you don't want to reassign at each step:

ggplot(ToothGrowth, aes(x = dose, y = len)) +
  geom_violin(aes(fill = dose), trim = FALSE) +
  geom_boxplot(width = 0.2) +
  stat_summary(fun.data = "mean_se", fun.args = list(mult = 1),
               geom = "pointrange", color = "black") +
  scale_fill_manual(values = c("#00AFBB", "#E7B800", "#FC4E07")) +
  theme(legend.position = "none")

Error bars look huge in R, but not in Excel

As the other answer points out, you should be looking at the standard error (sd/sqrt(n)) rather than the standard deviation. Here is a slightly more compact way to run your code, using stat_summary() to compute the summary statistics. By default, mean_cl_normal plots 95% confidence intervals based on the t distribution; mult = 1 tells it to plot ±1 SE instead. If you want the end-caps on your error bars to be narrower, adjust them with the width= argument.

(My plot still has large error bars, but I assume that's because of the size of your reproducible example.)

library(tidyverse)

filter(dat, Condition != "z" & Environment != "a") %>%
  mutate(Gate = fct_inorder(Gate)) %>%
  # group = Sound lets the lines connect across the discrete Gate axis
  ggplot(aes(Gate, Correct, colour = Sound, group = Sound)) +
  stat_summary(geom = "line", fun = mean) +
  stat_summary(geom = "errorbar", fun.data = \(x) mean_cl_normal(x, mult = 1)) +
  facet_wrap(~ Block)
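To see how much wider the default interval is than ±1 SE, and what width= changes, you can call mean_cl_normal() directly and inspect the built plot. The values and the two-group data frame below are made up for illustration, and Hmisc must be installed:

```r
library(ggplot2)  # mean_cl_normal() additionally requires Hmisc to be installed

# Hypothetical scores for one cell of the design (five observations)
x <- c(0.2, 0.5, 0.4, 0.7, 0.6)
n <- length(x)
se <- sd(x) / sqrt(n)

ci95   <- mean_cl_normal(x)            # default: 95% interval using t quantiles
se_bar <- mean_cl_normal(x, mult = 1)  # mult = 1: mean +/- 1 SE

half_ci <- ci95$ymax - ci95$y
half_se <- se_bar$ymax - se_bar$y
print(c(half_ci = half_ci, half_se = half_se, t_quantile = qt(0.975, n - 1)))

# width= sets the total width of the error-bar end caps (in x-axis units)
df <- data.frame(g = factor(rep(c("a", "b"), each = 5)), y = c(x, x + 0.1))
p <- ggplot(df, aes(g, y)) +
  stat_summary(fun.data = mean_cl_normal, fun.args = list(mult = 1),
               geom = "errorbar", width = 0.2)
caps <- layer_data(p)
cap_widths <- as.numeric(unclass(caps$xmax)) - as.numeric(unclass(caps$xmin))
print(cap_widths)
```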

Using geom_pointrange() to plot means and standard errors

It would be easier to check this if you could provide the actual data frame descriptive_blp_data. Running your code with an arbitrary dataset works as intended and produces error bars, so the ggplot part itself seems fine.

There may be a few reasons why this does not work with your actual dataset: maybe the standard errors are too small to show up with a point size of 5?

descriptive_blp_data <- data.frame(
  "group" = c("Group_3", "Group_2", "Group_1"),
  "mean_blp" = c(150, 50, -50),
  "se_blp" = c(40, 20, 30)
)

library(ggplot2)

ggplot(descriptive_blp_data) +
  aes(x = group, y = mean_blp, colour = group, size = 5) +
  geom_pointrange(aes(ymin = mean_blp - se_blp, ymax = mean_blp + se_blp), width = .2,
                  position = position_dodge(.9)) +
  scale_color_manual(
    values = list(
      Group_2 = "#9EBCDA",
      Group_3 = "#8856A7",
      Group_1 = "#E0ECF4"
    )
  ) +
  labs(y = "Mean BLP score (SE)") +
  coord_flip() +
  theme_classic() +
  theme(legend.position = "none", axis.title.y = element_blank()) +
  ylim(-218, 218)
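If the ranges are hidden behind large points, one thing to try is setting the size as a constant outside aes() (inside aes(), size = 5 is mapped through a size scale rather than fixing the size at 5) and controlling the line thickness separately with linewidth (ggplot2 3.4.0 or later). Also, geom_pointrange() has no width parameter, so width = .2 is ignored with a warning. A sketch using the same made-up data; the sizes chosen here are arbitrary:

```r
library(ggplot2)

descriptive_blp_data <- data.frame(
  group = c("Group_3", "Group_2", "Group_1"),
  mean_blp = c(150, 50, -50),
  se_blp = c(40, 20, 30)
)

p <- ggplot(descriptive_blp_data,
            aes(x = group, y = mean_blp, colour = group)) +
  # size and linewidth set as constants (outside aes()), so no size scale is used
  geom_pointrange(aes(ymin = mean_blp - se_blp, ymax = mean_blp + se_blp),
                  size = 1, linewidth = 1) +
  coord_flip()

bars <- layer_data(p)
print(bars[, c("ymin", "ymax", "size", "linewidth")])
```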



