In ggplot2, What Do the Ends of the Boxplot Lines Represent

In ggplot2, what do the ends of the boxplot lines represent?

The "dots" beyond the ends of the boxplot lines represent outliers. There are a number of different rules for determining if a point is an outlier, but the method that R and ggplot2 use is the "1.5 rule". If a data point is:

  • less than Q1 - 1.5*IQR
  • greater than Q3 + 1.5*IQR

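then that point is classed as an "outlier". The whiskers are defined as:

upper whisker = max(x[x <= Q3 + 1.5 * IQR])

lower whisker = min(x[x >= Q1 - 1.5 * IQR])

where IQR = Q3 - Q1, the box length. So the upper whisker is located at the largest x value that is no greater than Q3 + 1.5 * IQR, whereas the lower whisker is located at the smallest x value that is no less than Q1 - 1.5 * IQR. When there are no outliers, the whiskers simply end at max(x) and min(x). When there are outliers, the whiskers end at the most extreme data points inside the limits, not at the limits themselves.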
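The rule above can be checked numerically. This is a minimal sketch in base R, assuming the quartiles are taken to be the hinges that boxplot.stats() reports; the vector x is made-up illustration data, not from the question.

```r
# Illustrative data (hypothetical): one value far above the rest
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

s   <- boxplot.stats(x)   # base R (grDevices); default coef = 1.5
q1  <- s$stats[2]         # lower hinge
q3  <- s$stats[4]         # upper hinge
iqr <- q3 - q1

upper_fence <- q3 + 1.5 * iqr
lower_fence <- q1 - 1.5 * iqr

# Whiskers: the most extreme data points that are still inside the fences
upper_whisker <- max(x[x <= upper_fence])
lower_whisker <- min(x[x >= lower_fence])

c(lower = lower_whisker, upper = upper_whisker)
s$stats[c(1, 5)]          # whisker ends as computed by boxplot.stats()
s$out                     # the points that would be drawn as dots
```

Here the upper fence is 15.5 but the upper whisker stops at 9, the largest observation below the fence, and 30 is reported as the only outlier.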

Additional information

  • See the Wikipedia boxplot page for alternative outlier rules.
  • There are actually a variety of ways of calculating quantiles. Have a look at `?quantile` for the description of the nine different methods.
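To see why the choice of quantile method matters, here is a small base R sketch on made-up data. As far as I know, boxplot() builds its box from the hinges returned by fivenum(), while ggplot2 calls quantile() with its default method, so the two can draw slightly different boxes for the same data.

```r
# Illustrative data (hypothetical)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q7 <- quantile(x, c(0.25, 0.75), type = 7)  # R's default method
q6 <- quantile(x, c(0.25, 0.75), type = 6)  # one of the eight alternatives
h  <- fivenum(x)[c(2, 4)]                   # lower and upper hinges

q7
q6
h
```

On this vector the three approaches give three different pairs of quartiles, and therefore three different IQRs and outlier limits.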

Example

Consider the following example:

set.seed(1)
x = rlnorm(20, 1/2)  # skewed data
par(mfrow = c(1, 3))
boxplot(x, range = 1.7, main = "range=1.7")
boxplot(x, range = 1.5, main = "range=1.5")  # default
boxplot(x, range = 0, main = "range=0")      # the same as range = "very big number"

This gives the following plot:
[Figure: three side-by-side boxplots of x, titled range=1.7, range=1.5 and range=0]

As we decrease range from 1.7 to 1.5 we reduce the length of the whisker. However, range=0 is a special case: the whiskers then extend to the data extremes, so it is equivalent to "range=infinity".
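The special case can be confirmed without drawing anything. A minimal sketch, relying on the fact that the range argument of boxplot() is passed on to boxplot.stats() as coef:

```r
set.seed(1)
x <- rlnorm(20, 1/2)  # same skewed data as above

w_default <- boxplot.stats(x, coef = 1.5)$stats[c(1, 5)]
w_zero    <- boxplot.stats(x, coef = 0)$stats[c(1, 5)]
w_huge    <- boxplot.stats(x, coef = 1e6)$stats[c(1, 5)]  # "very big number"

w_default
w_zero
w_huge
range(x)

# With coef = 0 no points are flagged as outliers
boxplot.stats(x, coef = 0)$out
```

Both w_zero and w_huge coincide with range(x), which is what the right-hand panel of the plot shows.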

Marking the very end of the two whiskers in each boxplot in ggplot2 in R statistics

You just need to calculate the end points of the boxplots and add them, using stat_summary. For example:

## Load the library
library(ggplot2)
data(mpg)

## Create a function to calculate the whisker end points
## (there is probably a built-in function that does this)
get_tails = function(x) {
  ## A single value has no whiskers; return it unchanged to avoid
  ## abnormal marks at the periphery of the plot
  if (length(x) == 1) return(x)
  q1 = quantile(x)[2]
  q3 = quantile(x)[4]
  iqr = q3 - q1
  upper = q3 + 1.5 * iqr
  lower = q1 - 1.5 * iqr
  ## Trim upper and lower: keep the most extreme points inside the limits
  up = max(x[x <= upper])
  lo = min(x[x >= lower])
  return(c(lo, up))
}

Use stat_summary to add it to your plot:

ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot() +
  stat_summary(geom = "point", fun = get_tails, colour = "Red")
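As for the built-in function hinted at in the comment, boxplot.stats() from base R is one candidate. The wrapper name get_tails_builtin below is hypothetical, and the vector x is made-up illustration data.

```r
# Hypothetical helper: whisker end points via the built-in boxplot.stats()
get_tails_builtin <- function(x) {
  # stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
  boxplot.stats(x)$stats[c(1, 5)]
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
tails <- get_tails_builtin(x)
tails
```

One caveat: boxplot.stats() works from hinges rather than from quantile(), so for some sample sizes its whisker ends may differ slightly from the ones ggplot2 draws. The hand-written get_tails follows the quantile() route and should line up with geom_boxplot more closely.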

Also, your definition of the end points isn't quite correct. See my answer to another question for a few more details.

Paired Boxplot with lines coloured by factor in R

Alternatively, if you want to color by natcode, just change the line geom_line(aes(group = sites, color = manage)) to geom_line(aes(group = sites, color = natcode)).

library(ggplot2)
df2 <- data.frame(manage = c("F","F","F","F","M","M"),
                  natcode = c("Y","Y","Y","Y","Y","Y"),
                  sites = c("MF1","MF2","MF3","MF4","MF1","MF2"),
                  variable = c("PESUKmedian","PESUKmedian","PESUKmedian","annualmedian","annualmedian","PESUKmedian"),
                  value = c(59.4363000,2.9628212,11.9980950,5.5549982,10.9977350,19.0449542))
df2
manage natcode sites     variable     value
     F       Y   MF1  PESUKmedian 59.436300
     F       Y   MF2  PESUKmedian  2.962821
     F       Y   MF3  PESUKmedian 11.998095
     F       Y   MF4 annualmedian  5.554998
     M       Y   MF1 annualmedian 10.997735
     M       Y   MF2  PESUKmedian 19.044954

ggplot(df2, aes(variable, value)) +
  geom_boxplot(width=0.3, size=1.5, fatten=1.5, colour="black") +
  geom_point(colour="red", size=2, alpha=0.5) +
  geom_line(aes(group=sites, color = manage)) +
  theme_classic()

Joining means on a boxplot with a line (ggplot2)

Is that what you are looking for?

library(ggplot2)

x <- factor(rep(1:10, 100))
y <- rnorm(1000)
df <- data.frame(x=x, y=y)

ggplot(df, aes(x=x, y=y)) +
  geom_boxplot() +
  stat_summary(fun=mean, geom="line", aes(group=1)) +
  stat_summary(fun=mean, geom="point")

Update:

Some clarification about setting group=1: I think that I found an explanation in Hadley Wickham's book "ggplot2: Elegant Graphics for Data Analysis". On page 51 he writes:

"Different groups on different layers.

Sometimes we want to plot summaries based on different levels of aggregation. Different layers might have different group aesthetics, so that some display individual level data while others display summaries of larger groups.

Building on the previous example, suppose we want to add a single smooth line to the plot just created, based on the ages and heights of all the boys. If we use the same grouping for the smooth that we used for the line, we get the first plot in Figure 4.4.

p + geom_smooth(aes(group = Subject), method="lm", se = F)

This is not what we wanted; we have inadvertently added a smoothed line for each boy. This new layer needs a different group aesthetic, group = 1, so that the new line will be based on all the data, as shown in the second plot in the figure. The modified layer looks like this:

p + geom_smooth(aes(group = 1), method="lm", size = 2, se = F)

[...] Using aes(group = 1) in the smooth layer fits a single line of best fit across all boys."

Boxplot with lines connecting individual data points

This code does what I need...

library(ggplot2)
library(reshape2) # melt() is assumed to come from reshape2

LN1__00 <- c(5.5,2.5,4.5,3.0,5.5,11.5)
LN2__00 <- c(9.5,9.5,5.5,7.0,11.5,17.5)
LN3__00 <- c(26.5,42.5,40.5,18.0,27.5,32.5)
condition <- c("1","2","1","2","1","2")
PB_ID <- c("A","A","B","B","C","C")

Sleepstages_Lat <- data.frame(LN1__00,LN2__00,LN3__00,condition,PB_ID)

Sleepstages_Lat2 <- melt(Sleepstages_Lat, id.vars = c("PB_ID", "condition"))

Sleepstages_Lat2$var.cond = paste(Sleepstages_Lat2$variable, Sleepstages_Lat2$condition, sep = "_")

# create jitter: condition 1 is shifted below the box centre, condition 2 above it
b1 <- runif(nrow(Sleepstages_Lat2), -0.2, -0.1)
b2 <- runif(nrow(Sleepstages_Lat2), 0.1, 0.2)
Sleepstages_Lat2$b_corr <- NA
for (i in 1:nrow(Sleepstages_Lat2)){
  if (Sleepstages_Lat2$condition[i] == "1"){
    Sleepstages_Lat2$b_corr[i] <- as.numeric(Sleepstages_Lat2$variable[i])+b1[i]
  }else{
    Sleepstages_Lat2$b_corr[i] <- as.numeric(Sleepstages_Lat2$variable[i])+b2[i]
  }
}
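The jitter loop could also be written without a loop. This is only a sketch of the same idea using ifelse(); the small data frame d is a hypothetical stand-in for Sleepstages_Lat2 that carries just the two columns the jitter needs.

```r
set.seed(42)

# Hypothetical stand-in for Sleepstages_Lat2
d <- data.frame(
  variable  = factor(rep(c("LN1__00", "LN2__00", "LN3__00"), each = 2)),
  condition = rep(c("1", "2"), times = 3)
)

# Negative offset for condition 1, positive offset for condition 2
offset <- ifelse(d$condition == "1",
                 runif(nrow(d), -0.2, -0.1),
                 runif(nrow(d),  0.1,  0.2))

# Numeric position of each box plus the offset
d$b_corr <- as.numeric(d$variable) + offset
d
```

The result has the same structure as the loop version: points for condition 1 sit slightly to one side of their box and points for condition 2 to the other.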

# PLOT
plottitle = "Conditions"
subtitle = "Sleep (Stage) Latencies"

# define some stuff
colour_datapoints = "gray45" # gray45
shape_datapoints = 1
size_datapoints = 2
stroke_datapoints = 1 # thickness of circles

margins = unit(c(1, 8, 1, 1), 'lines')
p <- ggplot(Sleepstages_Lat2, aes(x = variable,
                                  y = value,
                                  fill = condition))
p <- p + geom_boxplot(outlier.shape = NA,
                      alpha = 0.9,
                      colour = "black",
                      notch = FALSE) +
  geom_point(shape = shape_datapoints,
             size = size_datapoints,
             colour = colour_datapoints,
             stroke = stroke_datapoints,
             aes(x = b_corr,
                 group = var.cond)) +
  geom_line(aes(x = b_corr, y = value, group = interaction(PB_ID, variable)),
            colour = "gray68", show.legend = FALSE, linetype = "dashed") +
  theme_bw() +
  coord_flip()
p

ggplot2 - align overlayed points in center of boxplot, and connect the points with lines

It is possible to extract the transformed points from the geom_dotplot using ggplot_build(). See the question "Is it possible to get the transformed plot data? (e.g. coordinates of points in dot plot, density curve)".

These points can be merged onto the original data, to be used as the anchor points for the geom_line.

Putting it all together:

library(dplyr)
library(ggplot2)

examiner <- rep(1:15, 2)
time <- rep(c("before", "after"), each = 15)
result <- c(1,3,2,3,2,1,2,4,3,2,3,2,1,3,3,3,4,4,5,3,4,3,2,2,3,4,3,4,4,3)

# Create a numeric version of time
data <- data.frame(examiner, time, result) %>%
  mutate(group = case_when(
    time == "before" ~ 2,
    time == "after" ~ 1)
  )

# Build a ggplot of the dotplot to extract data
dotpoints <- ggplot(data, aes(time, result, fill=time)) +
  geom_dotplot(binaxis="y", aes(x=time, y=result, group = time),
               stackdir = "center", binwidth = 0.075)

# Extract values of the dotplot
dotpoints_dat <- ggplot_build(dotpoints)[["data"]][[1]] %>%
  mutate(key = row_number(),
         x = as.numeric(x),
         newx = x + 1.2*stackpos*binwidth/2) %>%
  select(key, x, y, newx)

# Join the extracted values to the original data
data <- arrange(data, group, result) %>%
  mutate(key = row_number())
newdata <- inner_join(data, dotpoints_dat, by = "key") %>%
  select(-key)

# Create final plot
ggplot(newdata, aes(time, result, fill=time)) +
  geom_boxplot() +
  geom_dotplot(binaxis="y", aes(x=time, y=result, group = time),
               stackdir = "center", binwidth = 0.075) +
  geom_line(aes(x=newx, y=result, group = examiner), alpha=0.3)
