Add geom_line to link all the geom_point in boxplot conditioned on a factor with ggplot2

At the time of writing, position_jitterdodge() had just gained a seed argument in the development version of ggplot2. So install that version first:

devtools::install_github("tidyverse/ggplot2")
library(ggplot2)

and then

# A shared position object with a fixed seed gives points and lines the same jitter
pd <- ggplot2::position_jitterdodge(dodge.width = 0.75, jitter.width = 0.3, seed = 1)
# df is the asker's data; it is passed directly so no pipe package is needed
ggplot(df, aes(x = time, y = result, fill = gender)) +
  geom_boxplot(alpha = 0.2) +
  geom_point(aes(color = gender, group = subject), position = pd) +
  geom_line(aes(color = gender, group = subject),
            position = pd, alpha = 0.3) +
  scale_fill_brewer(palette = 'Set1') +
  scale_color_brewer(palette = 'Set1')

gives this plot

Sample Image

Join data points on boxplot with lines ggplot2

Without the actual data it is hard to give you in-depth help. Please refer to this site for a guide to writing a great reproducible example, as mentioned in the comments.

I am assuming you want to compare one data point from alLeft6 to one from pmLeft6 (otherwise the horizontal line would make little sense). This indicates you have some column in your data linking these points together (Pair in the example data).

With made-up data this is as easy as setting the geom_line() grouping variable to this column (Pair). To keep geom_point() aligned with geom_line() when the points are offset, an easy solution is to define the position adjustment once before the ggplot call (here called pd) and pass it to both layers.

library(tidyverse)

# Made-up data: Pair links each al* observation to its pm* counterpart
grid.mag.ROIcontrols.allRuns <- tibble(
  Model = c(rep("alLeft6", 10), rep("pmLeft6", 10), rep("alRight6", 10), rep("pmRight6", 10)),
  Grid_Magnitude = c(runif(10, -1, 1), runif(10, -0.5, 1.5), runif(10, -1, 1), runif(10, -1.5, 0.5)),
  Side = c(rep("Left", 20), rep("Right", 20)),
  Pair = c(rep(1:10, 2), rep(11:20, 2))
) %>%
  mutate(Pair = as.factor(Pair))

# The same position object is used by the lines and the points
pd <- position_dodge(0.2)
ggplot(grid.mag.ROIcontrols.allRuns, aes(Model, Grid_Magnitude)) +
  geom_boxplot(aes(fill = Model), outlier.shape = NA, alpha = 0.6) +
  geom_line(aes(group = Pair), position = pd) +
  geom_point(aes(fill = Model, group = Pair), size = 2, shape = 21, position = pd) +
  facet_grid(~Side, scales = "free") +
  scale_fill_brewer(palette = "GnBu") +
  labs(title = "Average Grid Magnitude, pm vs al EC")

Sample Image

Implementing paired lines into boxplot.ggplot2

I do not know which package ggplot2.boxplot comes from, but I will show you how to perform the requested operation in ggplot.

The requested output is a bit problematic for ggplot, since you want both the points and the lines connecting them to be jittered by the same amount. One way to do that is to jitter the points prior to making the plot. Because the x axis is discrete, this needs a workaround:

b <- runif(nrow(df), -0.1, 0.1)

ggplot(df) +
  geom_boxplot(aes(x = as.numeric(condition), y = pain, group = condition)) +
  geom_point(aes(x = as.numeric(condition) + b, y = pain)) +
  geom_line(aes(x = as.numeric(condition) + b, y = pain, group = ID)) +
  scale_x_continuous(breaks = c(1, 2), labels = c("No Treatment", "Treatment")) +
  xlab("condition")

Sample Image

First I made a vector of offsets to jitter by, called b, and converted the x axis to numeric so I could add b to the x coordinates. Later I relabeled the x axis.
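Since the asker's df is not shown, here is a minimal self-contained sketch of the same idea. The data are made up, and the column name x_jit is an assumption introduced only for this illustration; the offsets are stored in the data frame so the points and the lines are guaranteed to share them.

```r
library(ggplot2)

set.seed(42)  # make the made-up data and the jitter reproducible

# Hypothetical paired data: 10 subjects, each measured under two conditions
df <- data.frame(
  ID = factor(rep(1:10, times = 2)),
  condition = factor(rep(c("No Treatment", "Treatment"), each = 10)),
  pain = c(rnorm(10, mean = 6), rnorm(10, mean = 4))
)

# One offset per row, stored in the data so both layers use the same x position
df$x_jit <- as.numeric(df$condition) + runif(nrow(df), -0.1, 0.1)

p <- ggplot(df, aes(y = pain)) +
  geom_boxplot(aes(x = as.numeric(condition), group = condition)) +
  geom_point(aes(x = x_jit)) +
  geom_line(aes(x = x_jit, group = ID)) +
  scale_x_continuous(breaks = 1:2, labels = levels(df$condition)) +
  xlab("condition")

print(p)
```

Keeping the jittered position as a column also makes it easy to check or reuse the offsets later.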

I do agree with eipi10's comment that the plot works better without jitter:

ggplot(df, aes(condition, pain)) +
  geom_boxplot(width = 0.3, size = 1.5, fatten = 1.5, colour = "grey70") +
  geom_point(colour = "red", size = 2, alpha = 0.5) +
  geom_line(aes(group = ID), colour = "red", linetype = "11") +
  theme_classic()

Sample Image

And here is the updated plot with jittered points, in eipi10's style:

ggplot(df) +
  geom_boxplot(aes(x = as.numeric(condition),
                   y = pain,
                   group = condition),
               width = 0.3,
               size = 1.5,
               fatten = 1.5,
               colour = "grey70") +
  geom_point(aes(x = as.numeric(condition) + b,
                 y = pain),
             colour = "red",
             size = 2,
             alpha = 0.5) +
  geom_line(aes(x = as.numeric(condition) + b,
                y = pain,
                group = ID),
            colour = "red",
            linetype = "11") +
  scale_x_continuous(breaks = c(1, 2),
                     labels = c("No Treatment", "Treatment"),
                     expand = c(0.2, 0.2)) +
  xlab("condition") +
  theme_classic()

Sample Image

Combining geom_point and geom_line with position_jitterdodge for two grouping factors

One possible solution is to specify the jitter values manually:

library(ggplot2)

a <- c(1,2,3,4,5,6,1,2,3,4,5,6)
# b <- c("loss", "draw", "win", "draw", "loss", "win", "loss", "draw", "win", "draw", "loss", "win")
b <- c(2, 1, 3, 1, 2, 3, 2, 1, 3, 1, 2, 3)
c <- c(2, 3, 5, 4, 4, 5, 4, 4, 3, 5, 2, 4)
d <- c(rep("x", 6), rep("y", 6))
temp <- data.frame(a,b,c,d)

set.seed(2016)
jitterVal <- runif(12, max = 0.25)
jitterVal <- jitterVal * ifelse(temp$d == "x", -1, +1)

ggplot(temp, aes(x = b + jitterVal, y = c, fill = d, colour = d)) +
  geom_point() +
  geom_line(aes(group = a)) +
  scale_x_continuous(breaks = c(1, 2, 3), labels = c("draw", "loss", "win")) +
  xlab(NULL) +
  expand_limits(x = c(0.5, 3.5))

How to connect grouped points in ggplot within groups?

Not a direct answer to your question, but I wanted to suggest an alternative visualisation.

You are dealing with paired data. A much more convincing visualisation is achieved with a scatter plot. You will use the two dimensions of your paper rather than mapping your two dimensions onto only one. You can compare control with subjects better and see immediately which one got better or worse.

library(tidyverse)

d <- data.frame(
  Subject = c("1", "2", "3", "4"),
  Group = c("Exp", "Exp", "Control", "Control"),
  Tr = c("14", "11", "4", "23"),
  Sr = c("56", "78", "12", "10"),
  Increase = c("TRUE", "TRUE", "TRUE", "FALSE")
) %>%
  ## convert to numeric first
  mutate(across(c(Tr, Sr), as.integer))

## set coordinate limits
lims <- range(c(d$Tr, d$Sr))

ggplot(d) +
  geom_point(aes(Tr, Sr, color = Group)) +
  ## adding a line of equality and setting limits equal helps guide the eye
  geom_abline(intercept = 0, slope = 1, lty = "dashed") +
  coord_equal(xlim = lims, ylim = lims)

Sample Image

geom_point() + geom_boxplot() for different colour/fill variable?

You should swap the order of geom_point and geom_boxplot and add alpha to geom_boxplot like this:

company_code <- c(1,1,1,1,2,2,2,2,3,3,3,3)
financial_year <- c("2018-19","2019-20","2020-21","2018-21","2018-19","2019-20","2020-21","2018-21","2018-19","2019-20","2020-21","2018-21")
production <- c(2000,2500,3000,7500,1000,1500,1000,3500,5000,5500,4000,14500)

df <- data.frame(company_code,financial_year,production)
df$company_code <- as.factor(df$company_code)
df$financial_year <- as.factor(df$financial_year)

library(ggplot2); library(dplyr); library(forcats)
df %>%
  arrange(financial_year != "2018-19", -production) %>%
  mutate(company_code = fct_inorder(as.factor(company_code))) %>%
  arrange(company_code, financial_year) %>%
  ggplot(., aes(x = financial_year, y = production)) +
  geom_boxplot(colour = "black", alpha = 0) +
  geom_point(aes(colour = company_code), position = position_jitter(height = 0, width = 0.2),
             size = 1.1, alpha = 0.6)

Sample Image

Created on 2022-08-29 with reprex v2.0.2

Connecting grouped points with lines in ggplot

(Inspired by @aosmith's answer here to a similar question)

I'd suggest jittering before calling ggplot. That way both the points and the lines use the same jittered positions.

library(dplyr)
library(ggplot2)

d_jit <- d %>%
  # position on x axis is based on combination of B and jittered A. Mix to taste.
  mutate(A_jit = as.numeric(B) * 0.5 - 0.75 + jitter(as.numeric(A), 0.5),
         grouping = interaction(id, A))

# Trick borrowed from https://stackoverflow.com/questions/44656299/ggplot-connecting-each-point-within-one-group-on-discrete-x-axis
# ... x-axis defined using A and geom_blank, but added layers use A_jit
ggplot(d_jit, aes(x = A, y = y, group = grouping)) +
  geom_blank() +
  geom_line(aes(A_jit), alpha = 0.2) +
  geom_point(aes(A_jit, col = B))

Sample Image

Plotting lines and the group aesthetic in ggplot2

In the words of Hadley himself:

The important thing [for a line graph with a factor on the horizontal axis] is to manually specify the grouping. By
default ggplot2 uses the combination of all categorical variables in
the plot to group geoms - that doesn't work for this plot because you
get an individual line for each point. Manually specify group = 1
indicates you want a single line connecting all the points.
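As a minimal sketch of the quoted point, the example below uses a made-up sales data frame (the data and the column names are assumptions for illustration only). With a factor on the x axis and no explicit grouping, each factor level would form its own group and geom_line() would have nothing to connect; mapping group = 1 puts all rows in a single group.

```r
library(ggplot2)

# Hypothetical data: one value per level of a factor
sales <- data.frame(
  month = factor(c("Jan", "Feb", "Mar"), levels = c("Jan", "Feb", "Mar")),
  units = c(10, 14, 9)
)

# group = 1 places every row in the same group, so one line joins all points
p <- ggplot(sales, aes(x = month, y = units, group = 1)) +
  geom_line() +
  geom_point()

print(p)
```

Replacing group = 1 with a column such as a subject ID would instead draw one line per subject.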

You can actually group the points in very different ways as demonstrated by koshke here

R: boxplot: draw lines between each subject in case of repeated measurements

Here is a solution using boxplot() and lines(). I'm not sure your approach will be easy to read; you may want to fade the boxplots a little, as shown.

lvl <- lapply(plot_data[c("group", "ID")], unique)  # store levels

# Prefab colors. Note: alpha controls the fade, here .1 for boxplots and 1 for lines
clr <- Map(function(x, alpha) rainbow(length(x), alpha = alpha),
           lvl, alpha = c(.1, 1))

boxplot(value ~ group, data = plot_data, border = "darkgrey", col = clr$group)
# Draw one line per subject on top of the boxplots
sapply(seq(lvl$ID), function(x)
  lines(value ~ group, data = plot_data[plot_data$ID == lvl$ID[x], ], type = "b",
        col = clr$ID[x], lwd = 2))
legend("topleft", legend = lvl$ID, lwd = 2, col = clr$ID, ncol = 2, cex = .8, bty = "n")

Yields

Sample Image


Data:

plot_data <- structure(list(group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L), .Label = c("stage1", "stage2", "stage3", "stage4"), class = "factor"),
value = c(5.78956733993255, 5.88809484057128, 5.10837934492156,
5.4757885155268, 5.97264352883212, 5.67352280486375, 5.20609766873531,
5.86022568447515, 5.81150085269473, 5.94000163977034, 6.0783141606953,
5.94233451236505, 6.49362113315146, 6.12048651557416, 6.1347389126895,
5.20223867462482, 5.87281575519592, 5.73356315004639, 5.75488595489878,
6.36840553430375, 4.99227170993108, 5.18667783471756, 4.99904926030431,
5.15853247770574, 5.11713153058663, 5.49876751452684, 5.20934719201177,
5.02541789095849, 5.01650351425633, 5.5694368747063, 6.42072400392499,
5.5407249458367, 5.87118571228348, 6.68436990131158, 6.81803358788602,
6.84979289071634, 6.78138321859296, 6.70059150888119, 5.99049715092406,
6.45158472727053), ID = structure(c(1L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 2L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L,
1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L, 1L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 2L), .Label = c("id1", "id10", "id2",
"id3", "id4", "id5", "id6", "id7", "id8", "id9"), class = "factor")), row.names = c(NA,
-40L), class = "data.frame")

if else condition in ggplot to add an extra layer

What you are seeing is a syntax error. The most robust way I can think of is:

tmp.data <- c(1, 2, 3)
if (tmp.data[1] != "no value") {
  p = p + geom_point()
}
p + geom_line()

So you compose the object p step by step, only adding geom_point() when the if condition evaluates to TRUE.
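The snippet above relies on a plot object p from the question. A self-contained sketch of the same pattern might look like this; the helper build_plot() and its add_points flag are hypothetical names introduced for illustration, and the built-in mtcars data set stands in for the real data.

```r
library(ggplot2)

# Hypothetical helper: returns a plot with an optional point layer
build_plot <- function(add_points) {
  p <- ggplot(mtcars, aes(x = wt, y = mpg))
  # The if statement is a complete expression on its own,
  # so it cannot sit in the middle of a chain of + operators
  if (add_points) {
    p <- p + geom_point()
  }
  p + geom_line()
}

with_points <- build_plot(TRUE)      # line layer plus point layer
without_points <- build_plot(FALSE)  # line layer only
```

Wrapping the logic in a function keeps the conditional in one place when several plots need the same optional layer.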


