Sort Boxplot by Mean (And Not Median) in R

Sort boxplot by mean (and not median) in R

This is a job for reorder():

myDataFrame$TYPE <- with(myDataFrame, reorder(TYPE, SCORE, mean))
boxplot( SCORE~TYPE, data=myDataFrame )
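A minimal sketch of what reorder() does here, using a made-up myDataFrame (the TYPE and SCORE values are invented for illustration, not taken from the question). Group z has a low median but a higher mean, so the two statistics give different orders.

```r
# Hypothetical data: three groups with three scores each
myDataFrame <- data.frame(
  TYPE  = rep(c("x", "y", "z"), each = 3),
  SCORE = c(5, 6, 7,   # x: mean 6, median 6
            1, 2, 3,   # y: mean 2, median 2
            0, 1, 11)  # z: mean 4, median 1
)

# reorder() returns a factor whose levels are sorted by the summary statistic
by_mean   <- with(myDataFrame, reorder(TYPE, SCORE, mean))
by_median <- with(myDataFrame, reorder(TYPE, SCORE, median))

levels(by_mean)    # "y" "z" "x"
levels(by_median)  # "z" "y" "x"
```

boxplot() draws the groups in level order, so assigning by_mean back to myDataFrame$TYPE gives the mean-sorted plot.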


Sorting a boxplot based on median value

Check out ?reorder. The example there seems to be what you want, but sorted in the opposite order. I changed count to -count in the first line below to sort in the order you want.

bymedian <- with(InsectSprays, reorder(spray, -count, median))
boxplot(count ~ bymedian, data = InsectSprays,
        xlab = "Type of spray", ylab = "Insect count",
        main = "InsectSprays data", varwidth = TRUE,
        col = "lightgray")
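As a quick check of the sign flip, here is a sketch using the same built-in InsectSprays data. Negating count reverses the sort, so the per-level medians come out in descending rather than ascending order. The names asc, desc, med_asc and med_desc are only illustrative.

```r
# Ascending order (plain count) versus descending order (negated count)
asc  <- with(InsectSprays, reorder(spray, count, median))
desc <- with(InsectSprays, reorder(spray, -count, median))

# Median count per spray, listed in each factor's level order
med_asc  <- tapply(InsectSprays$count, asc, median)
med_desc <- tapply(InsectSprays$count, desc, median)

print(med_asc)   # medians do not decrease from left to right
print(med_desc)  # medians do not increase from left to right
```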

How to properly sort facet boxplots by median?

Here is a relatively simple way of achieving the requested arrangement using two helper functions:

reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

scale_x_reordered <- function(..., sep = "___") {
  reg <- paste0(sep, ".+$")
  ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}

library(tidyverse)
data(diamonds)

p <- ggplot(diamonds, aes(x = reorder_within(color, price, cut, median), y = price)) +
  geom_boxplot(width = 5) +
  scale_x_reordered() +
  facet_wrap(~cut, scales = "free_x")
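To see what the two helpers are doing, here is a small base-R sketch with invented values (the toy data frame is an assumption for illustration only). reorder_within() glues each category to its facet so that every facet gets its own ordering, and the regular expression used by scale_x_reordered() strips the facet suffix again for the axis labels.

```r
# Same helper as above, repeated so this snippet runs on its own
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

# Hypothetical data: categories p and q rank differently in facets A and B
toy <- data.frame(
  category = c("p", "q", "p", "q"),
  facet    = c("A", "A", "B", "B"),
  value    = c(1, 2, 4, 3)
)

combined <- with(toy, reorder_within(category, value, facet))
levels(combined)  # "p___A" "q___A" "q___B" "p___B"

# The label function in scale_x_reordered() removes the "___facet" suffix
stripped <- gsub("___.+$", "", levels(combined))
stripped  # "p" "q" "q" "p"
```

Because scales = "free_x" lets each facet show only its own levels, facet A displays p then q, while facet B displays q then p.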


Using ylim(0, 5500) will remove a big part of the data, resulting in different box plots, which will interfere with any previously defined order. If you wish to limit an axis without dropping data, it is better to use:

p + coord_cartesian(ylim = c(0, 5500))


If you really intend to remove a big part of the data and keep the arrangement, filter the data prior to plotting:

diamonds %>%
  filter(price < 5500) %>%
  ggplot(aes(x = reorder_within(color, price, cut, median), y = price)) +
  geom_boxplot(width = 5) +
  scale_x_reordered() +
  facet_wrap(~cut, scales = "free_x")
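The reason dropping rows matters is that the order is computed from whatever values remain. A minimal base-R sketch with invented numbers (the data frame d and the cutoff of 8 are assumptions for illustration) shows the median order flipping once the large values are removed, which is what happens when ylim() discards out-of-range data before the statistics are computed:

```r
# Hypothetical data: group a has the higher median until large values are dropped
d <- data.frame(
  g = rep(c("a", "b"), each = 5),
  v = c(1, 2, 9, 10, 11,   # a: median 9
        4, 5, 6, 7, 8)     # b: median 6
)

# Order computed on the full data
full_order <- levels(reorder(d$g, d$v, median))

# Order recomputed after dropping values of 8 or more
kept <- d[d$v < 8, ]
trimmed_order <- levels(reorder(kept$g, kept$v, median))

full_order     # "b" "a"
trimmed_order  # "a" "b"
```

Filtering first and reordering inside aes() keeps the plot and the ordering based on the same rows.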


Reorder boxplots based on their box size with ggplot2 in R

Do you mean the interquartile range (IQR())? If so, you can do:

library(tidyverse)

diamonds %>%
  as_tibble() %>%  # as_tibble() replaces the deprecated as.tibble()
  ggplot(aes(reorder(cut, price, IQR), price)) +
  geom_boxplot()
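The same idea can be checked without ggplot2. This sketch uses the built-in InsectSprays data (an assumption made only so the snippet runs on its own) and confirms that reorder() with IQR sorts the groups from the narrowest box to the widest:

```r
# IQR of count for each spray, in the original level order
spread <- tapply(InsectSprays$count, InsectSprays$spray, IQR)

# Factor whose levels are sorted by increasing IQR
by_iqr <- reorder(InsectSprays$spray, InsectSprays$count, IQR)

# IQR per group, listed in the new level order
spread_sorted <- tapply(InsectSprays$count, by_iqr, IQR)
print(spread_sorted)
```

Any function that maps a numeric vector to a single number can be passed as the third argument in the same way.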

Ordering box plots on x axis by mean

Normally I'd comment and close as duplicate of, e.g.,

  • How do you specifically order ggplot axis?,
  • Order barchart in R,
  • How to change the order of a discrete x scale in ggplot?,
  • Order bars in ggplot2 bargraph,
  • sorting - R ggplot ordering bars

or pretty much anything that comes up if you search Stack Overflow for "ggplot2 order". If you want boxplot-specific examples (the method is the same), see

  • Ordering x in ggplot2 boxplot using computed statistic,
  • How to boxplot factors and order one of the factors according to a continuous variable in ggplot2?
  • r - boxplot: order groups by the mean....

Or even this one which you asked less than 2 weeks ago. Different geom, same principle.

But you also have some other issues, one of which is using data$column inside aes(), which is a bit of a pet peeve of mine, so let's address that too.

Don't use data$column inside aes()! It means you're not using the data argument correctly. Related: it's not clear at all why you start the plot with the empty data frame df_c, when df_g has everything you need:

ggplot(df_g, aes(x = Var2, y = Closeness)) +
  geom_boxplot(outlier.size = 1.5)

Correctly using the data argument and not specifying data$column inside aes() will make sure your plot works right in all cases. If you use $ inside aes(), facets and other complex features probably will not work. If you need to use multiple data frames in one plot, do it at the layer level (e.g., geom_point(data = other_data, aes(x = x_var, y = y_var))). You still don't need to use $ inside aes().

As for your two stated problems, they are both solved by editing your data. ggplot is very good at plotting data; you just need to make your data look like what you want to plot.

"I'd like the categories (A, B, C, D) to be ordered by descending mean;"

Order the factor in your data!

df_g$Var2 = with(df_g, reorder(x = Var2, X = Closeness, FUN = function(x) -mean(x, na.rm = TRUE)))
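Here is a self-contained sketch of that line on a made-up df_g (the values, including the NA, are invented for illustration). Negating the mean turns reorder()'s ascending sort into a descending one.

```r
# Hypothetical stand-in for df_g; the values are invented
df_g <- data.frame(
  Var2      = factor(c("A", "A", "B", "B", "C", "C", "C", "D")),
  Closeness = c(1, 3, 10, 12, 5, NA, 7, 4)
)
# Group means (ignoring NA): A = 2, B = 11, C = 6, D = 4

df_g$Var2 <- with(df_g, reorder(Var2, Closeness,
                                FUN = function(x) -mean(x, na.rm = TRUE)))

levels(df_g$Var2)  # "B" "C" "D" "A"
```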

"Some categories only have one sample (i.e. B, D, and E). I'd like to remove them before plotting."

Okay, remove them! You could wholly remove them from your data or just subset the data that you give to the plot:

more_than_one = levels(df_g$Var2)[table(df_g$Var2) > 1]

ggplot(subset(df_g, Var2 %in% more_than_one), aes(Var2, Closeness)) +
  geom_boxplot()
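A runnable sketch of the subsetting step on invented data (this df_g is hypothetical). Note that subset() keeps the now-empty factor levels; droplevels() removes them, which matters if you later draw the data with base graphics.

```r
# Hypothetical stand-in for df_g: B and D have a single sample each
df_g <- data.frame(
  Var2      = factor(c("A", "A", "A", "B", "C", "C", "D")),
  Closeness = c(1, 2, 3, 4, 5, 6, 7)
)

# Levels that occur more than once
more_than_one <- levels(df_g$Var2)[table(df_g$Var2) > 1]
more_than_one  # "A" "C"

kept <- subset(df_g, Var2 %in% more_than_one)
levels(kept$Var2)  # "A" "B" "C" "D" (unused levels are still recorded)

kept_clean <- droplevels(kept)
levels(kept_clean$Var2)  # "A" "C"
```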

