Sort boxplot by mean (and not median) in R
This is a job for reorder():
myDataFrame$TYPE <- with(myDataFrame, reorder(TYPE, SCORE, mean))
boxplot( SCORE~TYPE, data=myDataFrame )
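To check what reorder() did, you can inspect the level order and the per-level scores it stores. This is a minimal sketch on a made-up data frame; the TYPE and SCORE values below are invented for illustration and are not from the original question.

```r
# Hypothetical toy data; the values are invented for illustration only.
myDataFrame <- data.frame(
  TYPE  = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  SCORE = c(1, 2, 30, 4, 5, 6, 7, 8, 9)
)

# Medians: a = 2, b = 5, c = 8. Means: a = 11, b = 5, c = 8.
# Sorting by mean therefore gives a different order than sorting by median.
myDataFrame$TYPE <- with(myDataFrame, reorder(TYPE, SCORE, mean))

# Levels are now sorted by increasing mean.
levels(myDataFrame$TYPE)

# reorder() keeps the computed means in the "scores" attribute.
attr(myDataFrame$TYPE, "scores")
```

Since boxplot() draws the boxes in level order, the outlier-heavy group "a" ends up last here even though it has the smallest median.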
Sorting a boxplot based on median value
Check out ?reorder. The example there seems to be what you want, but sorted in the opposite order. I changed count to -count in the first line below to sort in the order you want.
bymedian <- with(InsectSprays, reorder(spray, -count, median))
boxplot(count ~ bymedian, data = InsectSprays,
        xlab = "Type of spray", ylab = "Insect count",
        main = "InsectSprays data", varwidth = TRUE,
        col = "lightgray")
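Negating count works because median(-x) equals -median(x), so increasing order of the negated medians is decreasing order of the real ones. A small check of that claim, as a sketch using only base R and the built-in InsectSprays data:

```r
# Median insect count per spray, largest first.
med <- with(InsectSprays, tapply(count, spray, median))
expected <- names(sort(med, decreasing = TRUE))

# reorder() sorts levels by increasing score, so negate the values.
bymedian <- with(InsectSprays, reorder(spray, -count, median))

# Both approaches should give the same spray order.
identical(levels(bymedian), expected)
```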
How to properly sort facet boxplots by median?
Here is a relatively simple way of achieving the requested arrangement, using two helper functions:
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}
scale_x_reordered <- function(..., sep = "___") {
  reg <- paste0(sep, ".+$")
  ggplot2::scale_x_discrete(labels = function(x) gsub(reg, "", x), ...)
}
library(tidyverse)
data(diamonds)
p <- ggplot(diamonds, aes(x = reorder_within(color, price, cut, median), y = price)) +
  geom_boxplot(width = 5) +
  scale_x_reordered() +
  facet_wrap(~cut, scales = "free_x")
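To see what the two helpers do without drawing anything, it can help to run the same steps on a tiny data frame. The sketch below repeats reorder_within() so it runs on its own; the toy data and its column names (group, panel, value) are hypothetical.

```r
# Same helper as above, repeated so this sketch runs on its own.
reorder_within <- function(x, by, within, fun = mean, sep = "___", ...) {
  new_x <- paste(x, within, sep = sep)
  stats::reorder(new_x, by, FUN = fun)
}

# Hypothetical toy data: two groups whose order differs between panels.
toy <- data.frame(
  group = c("a", "a", "b", "b", "a", "a", "b", "b"),
  panel = c("p1", "p1", "p1", "p1", "p2", "p2", "p2", "p2"),
  value = c(1, 2, 5, 6, 9, 10, 3, 4)
)

f <- with(toy, reorder_within(group, value, panel, median))

# One level per group/panel combination, sorted by median value.
levels(f)

# The label function used by scale_x_reordered() strips the suffix again.
gsub("___.+$", "", levels(f))
```

Because every group/panel pair becomes its own factor level, each facet can have a different order, and scales = "free_x" keeps a facet from showing the levels that belong to other facets.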
Using ylim(0, 5500) will remove a big part of the data, resulting in different box plots, which will interfere with any previously defined order. If you wish to limit an axis without removing data, it is better to use:
p + coord_cartesian(ylim = c(0, 5500))
If you really intend to remove a big part of the data and keep the arrangement, filter the data prior to plotting:
diamonds %>%
  filter(price < 5500) %>%
  ggplot(aes(x = reorder_within(color, price, cut, median), y = price)) +
  geom_boxplot(width = 5) +
  scale_x_reordered() +
  facet_wrap(~cut, scales = "free_x")
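The reason ylim() can scramble the order is that out-of-range values are dropped before the box statistics are computed, while the factor order was computed from the full data. The effect can be mimicked in base R. This is only a sketch with invented numbers, and the cutoff of 60 is an arbitrary stand-in for the axis limit.

```r
# Hypothetical toy data: group "a" has a few very large values.
toy <- data.frame(
  group = rep(c("a", "b"), each = 5),
  value = c(1, 2, 100, 200, 300, 10, 20, 30, 40, 50)
)

# Medians on the full data: a = 100, b = 30, so b sorts before a.
full <- with(toy, tapply(value, group, median))

# Mimic an upper axis limit of 60 set with ylim(): out-of-range values
# become NA before the statistics are computed.
censored <- ifelse(toy$value > 60, NA, toy$value)
cut_off <- tapply(censored, toy$group, median, na.rm = TRUE)

# Medians after censoring: a = 1.5, b = 30, so a now sorts before b.
names(sort(full))
names(sort(cut_off))
```

coord_cartesian() only zooms the view, so the statistics and the order stay as computed from the full data. Filtering first, as above, recomputes both from the same reduced data.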
Reorder boxplots based on their box size with ggplot2 in R
Do you mean the interquartile range (IQR())? If so, you can do:
diamonds %>%
  as_tibble() %>%
  ggplot(aes(reorder(cut, price, IQR), price)) +
  geom_boxplot()
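Any function that returns a single number per group can be passed to reorder() in the same way. As a rough check that IQR gives the expected order, here is a base R sketch on invented data (the group and value columns are hypothetical):

```r
# Hypothetical toy data with clearly different spreads per group.
toy <- data.frame(
  group = rep(c("a", "b", "c"), each = 4),
  value = c(0, 10, 20, 30,  1, 2, 3, 4,  0, 5, 10, 15)
)

# IQR per group, computed directly: a = 15, b = 1.5, c = 7.5.
iqr <- with(toy, tapply(value, group, IQR))

# Levels sorted by increasing IQR, i.e. from the shortest box to the tallest.
f <- with(toy, reorder(group, value, IQR))
levels(f)
```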
Ordering box plots on x axis by mean
Normally I'd comment and close as duplicate of, e.g.,
- How do you specifically order ggplot axis?,
- Order barchart in R,
- How to change the order of a discrete x scale in ggplot?,
- Order bars in ggplot2 bargraph,
- sorting - R ggplot ordering bars
or pretty much anything that comes up if you search Stack Overflow for "ggplot2 order". If you want boxplot-specific examples (the method is the same), see
- Ordering x in ggplot2 boxplot using computed statistic,
- How to boxplot factors and order one of the factors according to a continuous variable in ggplot2?
- r - boxplot: order groups by the mean....
Or even this one, which you asked less than 2 weeks ago. Different geom, same principle.
But you also have some other issues, one of which is using data$column inside aes(), which is a bit of a pet peeve of mine, so let's address that too.
Don't use data$column inside aes()! It means you're not using the data argument correctly. Related: it's not clear at all why you start the plot with the empty data frame df_c, when df_g has everything you need:
ggplot(df_g, aes(x = Var2, y = Closeness)) +
  geom_boxplot(outlier.size = 1.5)
Correctly using the data argument and not specifying data$column inside aes() will make sure your plot works right in all cases. If you use $ inside aes(), facets and other complex features probably will not work. If you need to use multiple data frames in one plot, do it at the layer level (e.g., geom_point(data = other_data, aes(x = x_var, y = y_var))). You still don't need to use $ inside aes().
As for your two stated problems, they are both solved by editing your data. ggplot is very good at plotting data; you just need to make your data look like what you want to plot.
I'd like the categories (A, B, C, D) to be ordered by descending mean;
Order the factor in your data!
df_g$Var2 = with(df_g, reorder(x = Var2, X = Closeness, FUN = function(x) -mean(x, na.rm = TRUE)))
Some categories only have one sample (i.e. B, D, and E). I'd like to remove them before plotting.
Okay, remove them! You could wholly remove them from your data or just subset the data that you give to the plot:
more_than_one = levels(df_g$Var2)[table(df_g$Var2) > 1]
ggplot(subset(df_g, Var2 %in% more_than_one), aes(Var2, Closeness)) +
  geom_boxplot()
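Both data edits can be tried out without plotting. The sketch below uses a made-up stand-in for df_g (the real data is not shown in the question, so the values are invented) and adds droplevels(), which is an optional extra step for anyone who wants the unused levels gone from the data itself.

```r
# Hypothetical stand-in for df_g; the values are invented for illustration.
df_g <- data.frame(
  Var2 = factor(c("A", "A", "A", "B", "C", "C", "D")),
  Closeness = c(0.9, 0.8, 0.7, 0.5, 0.2, 0.3, 0.6)
)

# Order the levels by descending mean (A = 0.8, D = 0.6, B = 0.5, C = 0.25).
df_g$Var2 <- with(df_g, reorder(Var2, Closeness,
                                FUN = function(x) -mean(x, na.rm = TRUE)))
levels(df_g$Var2)

# Keep only the categories that have more than one sample.
more_than_one <- levels(df_g$Var2)[table(df_g$Var2) > 1]
kept <- droplevels(subset(df_g, Var2 %in% more_than_one))

# The surviving levels keep their relative order.
levels(kept$Var2)
```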