How to Plot a Boxplot with Correctly Spaced Continuous X-Axis Values in Ggplot2

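library(ggplot2)

# Numeric x values at uneven intervals, two observations per position
df <- data.frame(y = abs(rnorm(8)),
                 x = rep(c(0, 100, 200, 500), times = 2))

# group = x gives one box per distinct x value while x stays continuous
ggplot(df, aes(x, y, group = x)) +
  geom_boxplot()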

This solution relies on two changes. First, to plot boxes positioned on a continuous x axis, we need to provide numeric rather than factor x values. However, this does not work by itself, because without x values being grouped by factor levels, ggplot no longer knows how to group the data into different boxes. So, we also need to provide an additional grouping variable.

How to plot a boxplot with correctly spaced continuous x-axis values and a grouping variable in ggplot2?

Your data has one discrete variable, class, but the boxes need to be grouped by both class and x_int. Specify this by passing interaction(x_int, class) to the group aesthetic, then fill by class.

library(tidyverse)

# df is assumed to have columns x_int (numeric), y (numeric) and class (discrete)
df %>%
  ggplot(aes(x = x_int, y = y, group = interaction(x_int, class), fill = class)) +
  geom_boxplot()

How to create geom_boxplot with large amount of continuous x-variables

Here is a way using the original data you posted on Google - which actually was much more helpful, IMO.

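library(ggplot2)

# Keep CH numeric and group by it; the log scale spreads out the small CH values
ggplot(df, aes(x = CH, y = value, group = CH)) +
  geom_boxplot(notch = FALSE, outlier.shape = NA, fill = "red", alpha = 0.2) +
  scale_x_log10()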

So, as @BenBolker said before he deleted his answer(??), you should leave the x-variable (CH) as numeric, and set group=CH in the call to aes(...).

With your real data there is another problem, though. Your CH is more or less logarithmically spaced, so there are about as many points below 1 as there are between 1 and 10, and so on. ggplot wants to make the boxes all the same width, so with a linear x-axis the box width ends up smaller than the line width and you don't see the boxes at all. Changing the x-axis to a logarithmic scale fixes that, more or less.
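
The original data is not reproduced here, so the following is only a rough sketch of the effect using made-up data. The toy data frame, its CH levels, and the names p_linear and p_log are assumptions for illustration, not the asker's data.

```r
library(ggplot2)

# Hypothetical data: CH values spaced roughly logarithmically,
# with 20 random observations at each CH value
set.seed(1)
ch_levels <- c(0.1, 0.3, 1, 3, 10, 30, 100)
toy <- data.frame(CH = rep(ch_levels, each = 20),
                  value = rnorm(20 * length(ch_levels)))

# Linear x axis: the boxes at small CH values are crowded near zero
p_linear <- ggplot(toy, aes(x = CH, y = value, group = CH)) +
  geom_boxplot()

# Log x axis: the same boxes are spread out evenly
p_log <- p_linear + scale_x_log10()
```

Printing p_linear and then p_log side by side should make the difference between the two scales visible.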

How to plot multiple boxplots with numeric x values properly in ggplot2?

Your question was a tough cookie, but I learned something new from it!

Just using group = dataset is not sufficient because you also have the tool variable to look out for. After digging around a bit, I found this post which made use of the interaction() function.

This is the trick that was missing. You want to use group because you are not using a factor for the x values, but you also need tool to separate your data. interaction() does this by computing every combination of the two variables.

library(ggplot2)

# This is for pretty-printing the axis labels
my_labs <- function(x) {
  paste0(x / 1000, "k")
}
levs <- unique(data2$dataset)

ggplot(data2, aes(x = dataset, y = time, color = tool,
                  group = interaction(dataset, tool))) +
  geom_boxplot() +
  labs(x = "Datasets", y = "Seconds", title = "Time") +
  scale_x_log10(breaks = levs, labels = my_labs) +  # define a log scale with your axis ticks
  scale_y_log10() +
  theme_bw()

This plots one box per tool at each dataset size, with the dataset sizes positioned on a log-scaled x axis.

R: How to plot a boxplot with numeric x-axis for according spacing (not ggplot)

You can use the at argument to specify x locations for your boxplots, though to get them narrow enough to avoid overplotting, you need to add an invisible box and set the relative widths of the visible boxes to a smaller value:

boxplot(cbind(kraft_ou, n = rep(NA, nrow(kraft_ou))),
        names = c("1,0 [N]", "1,3 [N]", "1,6 [N]", "2,0 [N]", "2,5 [N]", "3,1 [N]",
                  " "),
        col = "bisque",
        ylim = c(1, 7), width = c(0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1),
        at = c(1, 1.3, 1.6, 2.0, 2.5, 3.1, 3.1))
abline(h = 4)

To add a regression line, you would need to have all your data frame values in a single y variable, and a vector of their corresponding x axis positions:

abline(lm(unlist(kraft_ou) ~ rep(c(1, 1.3, 1.6, 2.0, 2.5, 3.1), each = 30)))

How to Create Boxplots with a Continuous x axis in R?

If I understood correctly:

First, turn the row names into the first column (setDT() names it rn):

library(data.table)
setDT(df, keep.rownames = TRUE)[]

Then melt it to long format with reshape2:

library(reshape2)
df <- melt(df, id.vars = c("rn", "age"))

And plot it using ggplot2, grouping by rn so that each original row gets its own box:

library(ggplot2)
ggplot(df, aes(x = age, y = value, group = rn)) +
  geom_boxplot()

Ggplot2 Boxplot width setting changes x-axis

I think the strange behaviour comes from ggplot trying to automatically dodge your boxplots apart. Setting position = position_dodge(width = 0) seems to produce the plot as expected, without changing the placement of the boxes along the x-axis, although ggplot then gives a warning about overlapping x intervals.

Lat<- c(50.70228,50.70228,50.70228,51.82067,51.82067,51.82067,52.45893,52.45893,52.45893,52.76478,52.76478,52.76478,52.78354,52.78354,52.78354,53.56102,53.56102,53.56102,53.65364,53.65364,53.65364,53.63130,53.63130,53.63130,54.19035,54.19035,54.19035,54.25751,54.25751,54.25751,54.23526,54.23526,54.23526,54.62469,54.62469,54.62469,54.67831,54.67831,54.67831,54.67900,54.67900,54.67900,54.94908,54.94908,54.94908,55.19456,55.19456,55.19456,54.79198,54.79198,54.79198,55.34981,55.34981,55.34981,55.85655,55.85655,55.85655,56.06078,56.06078,56.06078,55.84553,55.84553,55.84553,56.00197,56.00197,56.00197,56.71842,56.71842,56.71842,57.00116,57.00116,57.00116,57.06942,57.06942,57.06942,57.26815,57.26815,57.26815,57.45532,57.45532,57.45532,57.88596,57.88596,57.88596,51.07711,51.07711,51.07711,51.07801,51.07621,51.11159,51.11159,51.11159,52.02484,52.02484,52.02484,52.02581,52.02581,52.02581,52.02685,52.02685,52.02685,52.05353,52.05353,52.05626,52.05353,52.05353,52.05353,52.05353,52.05353,52.05353,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,51.93541,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.92425,52.90810,52.90810,52.90810,52.90810,52.90810,52.90810,52.78968,52.78778,52.78968,52.78968,52.78881,52.78883,52.78883,52.78883,52.78970,52.78970,52.79506,52.79506,52.79506,53.77270,53.77276,53.77109,53.77109,53.77276,53.76845,53.76845,53.77109,53.76845,53.77109,53.87020,53.87020,53.87020,53.87103,53.88205,53.88205,53.88205,53.88205,53.87701,53.87701,53.87098,53.87098,53.87098,53.86932,53.86932,53.86932,56.51869,56.51869,56.51869,56.55870,56.55870,56.55870,56.55964,56.55964,56.55964,57.51056,57.49542,57.49542,57.50878,57.50878,57.50878,57.45201,57.45477,57.45192,57.45192,57.45192)
y <- c(33.45407,21.40954,27.73487,20.38318,26.65483,31.68201,23.95467,20.77363,32.94192,22.71228,25.78824,28.39449,35.60615,24.29325,22.95047,25.65343,30.23262,22.05534,37.20565,35.53812,38.20211,39.38034,35.16619,38.82336,29.72370,38.25754,26.51339,39.38283,29.57483,31.80111,24.52967,34.83037,21.75038,35.50868,39.41830,21.96971,22.82504,32.69746,35.10747,27.75669,34.96690,37.61921,37.17226,20.50448,39.26582,22.08668,28.41502,36.69530,23.69404,23.18052,33.27420,23.04157,33.17285,32.00579,21.83845,22.97143,32.27190,21.53771,38.65481,20.14341,33.62718,39.86755,39.77881,30.59810,27.65909,24.11646,34.56981,29.30249,34.99361,32.39553,28.90443,34.88775,22.77049,36.44468,30.64496,35.81501,31.77673,24.19058,39.36298,21.47219,23.02268,31.37647,27.28457,33.14749,23.20842,39.73427,39.81399,35.51515,24.55080,39.41190,29.59987,38.46791,20.94479,37.22109,26.36060,30.91641,39.25975,39.88288,22.59061,30.24439,21.66110,30.36878,28.76901,38.75561,33.80408,31.05842,26.18921,21.30804,35.02966,33.85981,30.84373,31.67341,35.07605,37.93820,31.30481,21.45117,37.13626,25.70964,25.64736,38.58381,31.24448,26.55902,23.90817,33.70300,26.48909,37.73200,32.52413,22.44440,28.19878,32.46415,25.13711,26.66075,28.16254,20.40673,39.89327,30.83327,32.40196,39.81218,39.80391,21.87316,34.95792,33.38958,38.18441,22.03114,35.64410,34.90643,24.23056,36.66581,29.35813,20.86880,30.02044,36.13727,24.65558,39.43175,29.00154,29.78185,22.89196,37.15204,35.88188,28.73920,28.04934,37.50701,30.36306,28.39842,35.20973,26.54260,29.57763,26.03163,26.90440,27.60110,25.80086,39.98019,21.59970,28.83825,32.01711,20.50812,38.43331,32.41898,27.68722,32.59905,24.18150,29.05701,22.38512,32.93342,37.66694,37.65391,34.19613,23.89985,36.90012,20.74244,27.08511,29.21433,35.83771,35.59557,33.74533,27.08854,38.38994)
V3 <-c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2)

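library(ggplot2)
# cbind() turns the factor into its integer codes, so V3 is converted back with as.factor() in aes()
df <- as.data.frame(cbind(Lat, y, as.factor(V3)))

df_plot <- ggplot(df) +
  geom_boxplot(aes(colour = as.factor(V3), x = Lat, y = y, group = as.factor(Lat)),
               position = position_dodge(width = 0),
               width = 1) +
  theme_classic()
df_plot  # print the plot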

boxplot with overlapping boxes

Grouped Boxplot on discrete x-axis in R

You could make the x axis discrete, simply feeding in the extra factor levels that you want to make the appropriate breaks in the x axis:

ggplot(data = df, aes(x = factor(n, levels = c(200, 250, 300)), y = value)) +
  geom_boxplot(aes(fill = variable)) +
  scale_y_log10() +
  scale_x_discrete(drop = FALSE, name = "n")

Fill and dodge boxplots by group on a continuous x axis

From ?aes_group_order:

By default, the group is set to the interaction of all discrete variables in the plot.

In your data, you only have one discrete variable, "fill". However, we wish the data to be grouped by both "fill" and "x". Thus, we need to specify the desired grouping using the group argument. And yes, you were correct, interaction is the way to go.

First, a slightly smaller data set (easier to link data to output):

d <- data.frame(x = rep(c(1, 2, 4), each = 8),
                grp = rep(c("a", "b"), each = 4),
                y = sample(24))

Then the plot, where we group data by the different combinations of "x" and "grp" (interaction(x, grp)), and fill the boxes by "grp":

ggplot(d, aes(x = x, y = y, group = interaction(x, grp), fill = grp)) +
  geom_boxplot()
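
To see what the grouping variable looks like, it may help to inspect interaction(x, grp) directly. This is a small sketch on the same toy data; the seed and the name g are additions for illustration only.

```r
# Same toy data as above; the seed is only here to make y reproducible
set.seed(42)
d <- data.frame(x = rep(c(1, 2, 4), each = 8),
                grp = rep(c("a", "b"), each = 4),
                y = sample(24))

# interaction() builds one factor level per combination of x and grp
g <- interaction(d$x, d$grp)

levels(g)  # the six x/grp combinations, one per box
table(g)   # number of observations that go into each box
```

Each level of g corresponds to one box in the plot, which is why the boxes are split by both x position and fill colour.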
