Boxplot Show the Value of Mean


First, you can calculate the group means with aggregate:

means <- aggregate(weight ~ group, PlantGrowth, mean)

This dataset can be used with geom_text:

library(ggplot2)
ggplot(data = PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot() +
  stat_summary(fun = mean, colour = "darkred", geom = "point",
               shape = 18, size = 3, show.legend = FALSE) +
  geom_text(data = means, aes(label = weight, y = weight + 0.08))

Here, + 0.08 is used to place the label above the point representing the mean.



An alternative version without ggplot2:

means <- aggregate(weight ~ group, PlantGrowth, mean)

boxplot(weight ~ group, PlantGrowth)
points(1:3, means$weight, col = "red")
text(1:3, means$weight + 0.08, labels = means$weight)


How to display numeric mean and std values next to a box plot in a series of box plots?

The boxplot method returns a dictionary that includes the parts of the boxplot (whiskers, caps, boxes, medians, fliers, means). You can use these to add annotations at various locations within the plot. Below I added the mean and standard deviation values to the right of each median line.

For more details, see the related question "Overlaying the numeric value of median/variance in boxplots".

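import matplotlib.pyplot as plt

# data3 is assumed to be a 2-D NumPy array: rows are observations, columns are groups
m1 = data3.mean(axis=0)
st1 = data3.std(axis=0)

fig, ax = plt.subplots()
bp = ax.boxplot(data3, showmeans=True)

# annotate each box at the right-hand end of its median line
for i, line in enumerate(bp['medians']):
    x, y = line.get_xydata()[1]
    text = ' μ={:.2f}\n σ={:.2f}'.format(m1[i], st1[i])
    ax.annotate(text, xy=(x, y))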

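To make the returned dictionary less abstract, here is a minimal sketch that prints its keys and reads the mean markers back out of it. The data is a random placeholder (an assumption, not the data3 from the question), and the Agg backend line is only there so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data (an assumption): 100 observations for each of 3 groups.
rng = np.random.default_rng(0)
data = rng.normal(loc=[5.0, 4.5, 5.5], scale=0.5, size=(100, 3))

fig, ax = plt.subplots()
bp = ax.boxplot(data, showmeans=True)

# The returned dict maps part names to lists of matplotlib artists.
part_names = sorted(bp.keys())
print(part_names)

# With showmeans=True, each entry of bp['means'] is a Line2D that holds
# a single marker located at (box position, mean of that column).
mean_positions = [line.get_xdata()[0] for line in bp["means"]]
mean_values = [line.get_ydata()[0] for line in bp["means"]]
print(mean_positions, np.round(mean_values, 2))
```

Reading the coordinates from the artists, rather than recomputing them, keeps annotations aligned even if you pass custom positions to boxplot.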

Boxplot in R showing the mean


Use

abline(h=mean(x))

for a horizontal line (use v instead of h for vertical if you orient your boxplot horizontally), or

points(mean(x))

for a point. Use the parameter pch to change the symbol. You may want to colour them to improve visibility too.

Note that these are called after you have drawn the boxplot.

If you are using the formula interface, you would have to construct the vector of means. For example, taking the first example from ?boxplot:

boxplot(count ~ spray, data = InsectSprays, col = "lightgray")
means <- tapply(InsectSprays$count,InsectSprays$spray,mean)
points(means,col="red",pch=18)

If your data contains missing values, you might want to replace the last argument of the tapply function with function(x) mean(x, na.rm = TRUE).
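The same precaution applies to the Python examples further down this page. As far as I know, matplotlib's boxplot does not filter out NaN values for you, so it is safest to remove them yourself. The sketch below uses placeholder data (an assumption) and np.nanmean, which plays the role of mean(x, na.rm = TRUE).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data (an assumption) with a few missing values.
rng = np.random.default_rng(1)
data = rng.normal(10.0, 2.0, size=(50, 3))
data[[0, 7, 20], [0, 1, 1]] = np.nan

# Drop NaNs column by column; boxplot accepts a list of 1-D arrays
# of unequal length.
columns = [col[~np.isnan(col)] for col in data.T]

# NaN-aware means, one per column.
means = np.nanmean(data, axis=0)

fig, ax = plt.subplots()
ax.boxplot(columns)
# Red diamonds at box positions 1, 2, 3 mark the means.
ax.plot(np.arange(1, len(columns) + 1), means, "rD")
```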

Show mean in the box plot in Python?

This is a minimal example and produces the desired result:

import matplotlib.pyplot as plt
import numpy as np

data_to_plot = np.random.rand(100,5)

fig = plt.figure(1, figsize=(9, 6))
ax = fig.add_subplot(111)
bp = ax.boxplot(data_to_plot, showmeans=True)

plt.show()
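If the default mean marker is hard to see, boxplot also accepts meanprops and meanline. The following is a small sketch of both options with the same random data as above; the specific colours and marker are only illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

data_to_plot = np.random.rand(100, 5)

fig, (ax_marker, ax_line) = plt.subplots(1, 2, figsize=(9, 4))

# Left: the mean is drawn as a dark red diamond marker.
bp_marker = ax_marker.boxplot(
    data_to_plot,
    showmeans=True,
    meanprops=dict(marker="D", markerfacecolor="darkred",
                   markeredgecolor="darkred"),
)
ax_marker.set_title("Mean as a marker")

# Right: meanline=True draws the mean as a line across the box instead.
bp_line = ax_line.boxplot(
    data_to_plot,
    showmeans=True,
    meanline=True,
    meanprops=dict(linestyle="--", color="darkred"),
)
ax_line.set_title("Mean as a line")
```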

EDIT:

If you want to achieve the same with matplotlib version 1.3.1 you'll have to plot the means manually. This is an example of how to do it:

import matplotlib.pyplot as plt
import numpy as np

data_to_plot = np.random.rand(100,5)
positions = np.arange(5) + 1

fig, ax = plt.subplots(1,2, figsize=(9,4))

# matplotlib >= 1.4
bp = ax[0].boxplot(data_to_plot, positions=positions, showmeans=True)
ax[0].set_title("Using showmeans")

# matplotlib < 1.4
bp = ax[1].boxplot(data_to_plot, positions=positions)
means = [np.mean(data) for data in data_to_plot.T]
ax[1].plot(positions, means, 'rs')
ax[1].set_title("Plotting means manually")

plt.show()

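To also print the numeric value of each mean next to its marker, one option is to annotate at the same positions. This is a sketch under the same assumptions as the example above (random data, five columns); the six-point vertical offset is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

data_to_plot = np.random.rand(100, 5)
positions = np.arange(data_to_plot.shape[1]) + 1
means = data_to_plot.mean(axis=0)

fig, ax = plt.subplots(figsize=(9, 6))
ax.boxplot(data_to_plot, positions=positions, showmeans=True)

# Write each mean, rounded to two decimals, slightly above its marker.
labels = []
for x, m in zip(positions, means):
    label = ax.annotate(
        f"{m:.2f}",
        xy=(x, m),
        xytext=(0, 6),               # offset in points, not data units
        textcoords="offset points",
        ha="center",
        va="bottom",
    )
    labels.append(label)
```

Using an offset in points rather than adding a constant to the y value keeps the gap between marker and label the same regardless of the data scale.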

Show mean values in boxplots in R

As others said, you can share your dataset for more specific help, but in this case I think the point can be made using a dummy dataset. I'm creating one that looks pretty similar to your own in terms of naming, so theoretically you can just plug in this code and it could work.

The main thing you need here is to control how ggplot2 separates the boxplots for the data_box$Sitting_Position values that share the same data_box$Kind. Spreading the boxes around that x= axis value is called "dodging". When you supply a fill= or color= (or similar) aesthetic in aes() for a geom, ggplot2 assumes you also want to group the data by that value. Your initial ggplot() call has fill=Sitting_Position in aes(), which is why geom_boxplot() "works": it creates separate boxes that are colored differently and "dodged" properly.

When you create the points and the text, ggplot2 does not know that you want to dodge them, nor on what basis to dodge, since the fill= aesthetic doesn't make sense for a text or point geom. The fix is to:

  • Supply a group= aesthetic, which can override the grouping of a fill= or color= aesthetic, but which also can serve as a basis for the dodging for geoms that do not have a similar aesthetic.

  • Specify more clearly how you want to dodge. This will be important for accurate positioning of all things you want to dodge. Otherwise, you will have things dodged, but maybe not the same distance.

Here's how I combined all that:

library(ggplot2)

# the datasets
set.seed(1234)
data_box <- data.frame(
  Kind=c(rep('Model-free AR',100),rep('Real-world',100)),
  TimeTotal=c(rnorm(50,5.5,1),rnorm(50,5.43,1.1),rnorm(50,4.9,1),rnorm(50,4.7,0.2)),
  Sitting_Position=rep(c(rep('face to face',50),rep('side by side',50)),2)
)
means <- aggregate(TimeTotal ~ Sitting_Position*Kind, data_box, mean)

# the plot
ggplot(data_box, aes(x=Kind, y=TimeTotal)) + theme_bw() +

  # specifying dodge here and width to avoid overlapping boxes
  geom_boxplot(
    aes(fill=Sitting_Position),
    position=position_dodge(0.6), width=0.5
  ) +
  # note group aesthetic and same dodge call for next two objects
  stat_summary(
    aes(group=Sitting_Position),
    position=position_dodge(0.6),
    fun=mean,
    geom='point', color='darkred', shape=18, size=3,
    show.legend = FALSE
  ) +
  geom_text(
    data=means,
    aes(label=round(TimeTotal,2), y=TimeTotal + 0.18, group=Sitting_Position),
    position=position_dodge(0.6)
  )

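For readers working in Python, the dodging idea can be sketched by hand in matplotlib, which has no built-in dodge: compute one x position per (category, group) pair and reuse it for the box, the mean marker and the label. The helper dodge_positions and the placeholder data are hypothetical, introduced only for illustration, and the offsets are not claimed to reproduce ggplot2's internals exactly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np


def dodge_positions(n_categories, n_groups, dodge_width=0.6):
    """Return an (n_categories, n_groups) array of x positions.

    Category centers sit at 1, 2, ...; the groups within a category are
    spread symmetrically over a band of total width `dodge_width`.
    """
    centers = np.arange(1, n_categories + 1)
    slot = dodge_width / n_groups
    offsets = (np.arange(n_groups) - (n_groups - 1) / 2) * slot
    return centers[:, None] + offsets[None, :]


# Placeholder data (an assumption): 2 kinds x 2 sitting positions, 50 values each.
rng = np.random.default_rng(1234)
kinds = ["Model-free AR", "Real-world"]
sittings = ["face to face", "side by side"]
samples = rng.normal(5.0, 1.0, size=(2, 2, 50))

pos = dodge_positions(len(kinds), len(sittings))
fig, ax = plt.subplots()
labels = []
for c in range(len(kinds)):
    for g in range(len(sittings)):
        values = samples[c, g]
        x = pos[c, g]
        mean = values.mean()
        # Box, mean marker and label share the same x, so they line up.
        ax.boxplot(values, positions=[x], widths=0.25)
        ax.plot(x, mean, "D", color="darkred")
        labels.append(ax.text(x, mean + 0.18, f"{mean:.2f}", ha="center"))

# Put one tick per category at the category center.
ax.set_xticks(np.arange(1, len(kinds) + 1))
ax.set_xticklabels(kinds)
```

The point mirrors the R answer: whatever rule you use to shift the boxes, apply the identical rule to every layer you want aligned with them.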

Mean and median in r boxplot

You can create the summary statistics beforehand and pass them to geom_boxplot using stat = 'identity'. In this example the middle line of each box is set to the mean, and the median is added as a point:

library(tidyverse)

# div and level_order come from the question:
# div is the data frame, level_order the desired order of the season levels
div %>%
  mutate(season = factor(season, level_order)) %>%
  group_by(season, site) %>%
  summarize(ymin = quantile(shannon, 0),
            lower = quantile(shannon, 0.25),
            median = median(shannon),
            mean = mean(shannon),
            upper = quantile(shannon, 0.75),
            ymax = quantile(shannon, 1)) %>%
  ggplot(aes(x = season, fill = site)) +
  geom_boxplot(stat = 'identity',
               aes(ymin = ymin, lower = lower, middle = mean, upper = upper,
                   ymax = ymax)) +
  geom_point(aes(y = median, group = site),
             position = position_dodge(width = 0.9)) +
  xlab("season") +
  ylab("Shannon index")
