ggplot geom_point() with colors based on specific, discrete values

You need to cut your values into intervals:

library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
geom_point(aes(colour = cut(qsec, c(-Inf, 17, 19, Inf))),
size = 5) +
scale_color_manual(name = "qsec",
values = c("(-Inf,17]" = "black",
"(17,19]" = "yellow",
"(19, Inf]" = "red"),
labels = c("<= 17", "17 < qsec <= 19", "> 19"))

resulting plot
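If matching the auto-generated level names feels fragile, one alternative is to name the intervals yourself through the labels argument of cut(), and then key the colours on those names. This is a minimal sketch of that variation, not part of the original answer; the object names cars, qsec_bin, bin_labels and bin_colours are made up for illustration.

```r
library(ggplot2)

# Hypothetical label strings; any unique names would do.
bin_labels <- c("<= 17", "17 < qsec <= 19", "> 19")

# cut() assigns these labels to the three intervals, in order.
cars <- mtcars
cars$qsec_bin <- cut(cars$qsec, breaks = c(-Inf, 17, 19, Inf), labels = bin_labels)

# Named vector: each label maps to its colour, so there is no interval
# syntax such as "(19, Inf]" to mistype.
bin_colours <- c("<= 17" = "black", "17 < qsec <= 19" = "yellow", "> 19" = "red")

p <- ggplot(cars, aes(wt, mpg)) +
  geom_point(aes(colour = qsec_bin), size = 5) +
  scale_colour_manual(name = "qsec", values = bin_colours)
p
```

Because the labels are set in cut(), the separate labels argument of the scale is no longer needed.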

in R, ggplot geom_point() with colors based on specific, discrete values - part 2

You were almost there! It's just the names of the levels of the 'cut' factor that are incorrect. If you try:

cut(test$TC1, c(-Inf, 1000, 5000, 50000, Inf))
# [1] (-Inf,1e+03] (1e+03,5e+03] (-Inf,1e+03] (-Inf,1e+03] (-Inf,1e+03]
# [6] (-Inf,1e+03] (-Inf,1e+03] (5e+03,5e+04] (5e+04, Inf] <NA>
# Levels: (-Inf,1e+03] (1e+03,5e+03] (5e+03,5e+04] (5e+04, Inf]

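As you can see, the level names differ slightly from what you typed: cut() formats the breaks in scientific notation (1e+03 rather than 1000), and the names used in values must match the levels exactly.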

library(ggplot2)

ggplot(data = test, aes(x = Long, y = Lat)) +
borders("world", fill="gray75", colour="gray75", ylim = c(-60, 60)) +
geom_point(aes(size=TC2, color = cut(TC1, c(-Inf, 1000, 5000, 50000, Inf)))) +
scale_color_manual(name = "TC1",
values = c("(-Inf,1e+03]" = "green",
"(1e+03,5e+03]" = "yellow",
"(5e+03,5e+04]" = "orange",
"(5e+04, Inf]" = "red"),
labels = c("up to 1", "1 to 5", "5 to 50", "greater than 50")) +
theme(legend.position = "right") +
coord_quickmap()
#> Warning: Removed 2 rows containing missing values (geom_point).

Sample Image

Data:

test <- read.table(text = 'TC1    TC2 Lat Long    Country
1 2.9 2678.0 50.62980 -95.60953 Canada
2 1775.7 5639.9 -31.81889 123.19389 Australia
3 4.4 5685.6 -10.10449 38.54364 Tanzania
4 7.9 NA 54.81822 -99.91685 Canada
5 11.2 2443.0 7.71667 -7.91667 "Cote d\'Ivoire"
6 112.1 4233.4 -17.35093 128.02609 Australia
7 4.4 114.6 45.21361 -67.31583 Canada
8 8303.5 4499.9 46.63626 -81.39866 Canada
9 100334.8 2404.5 46.67291 -93.11937 USA
10 NA 1422.9 -17.32921 31.28224 Zimbabwe', header = TRUE)
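The mismatch described above comes from how cut() formats its breaks. As a hedged sketch of two ways to sidestep it: the dig.lab argument of cut() controls how many significant digits are used in the labels, and the colour vector can be named from the actual levels rather than typed by hand. The names tc1, bins and tc1_colours are illustrative only; the values are the TC1 column of the sample data.

```r
tc1 <- c(2.9, 1775.7, 4.4, 7.9, 11.2, 112.1, 4.4, 8303.5, 100334.8, NA)
breaks <- c(-Inf, 1000, 5000, 50000, Inf)

# Default dig.lab = 3 yields labels in scientific notation, e.g. "(1e+03,5e+03]".
levels(cut(tc1, breaks))

# With dig.lab = 5 the breaks are formatted with up to five significant digits.
bins <- cut(tc1, breaks, dig.lab = 5)
levels(bins)

# Build the colour vector from the actual levels instead of typing them.
tc1_colours <- setNames(c("green", "yellow", "orange", "red"), levels(bins))
tc1_colours
```

The resulting tc1_colours could then be passed as values to scale_color_manual(), provided the same cut() call (including dig.lab) is used inside aes().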

ggplot: color points by density as they approach a specific value?

I think that you should opt for a histogram or density plot:

n <- 500
data <- data.frame(model= rep("model",n),value = rnorm(n,56.72,10))

ggplot(data, aes(x = value, y = after_stat(count))) +
geom_histogram(binwidth = 1)+
geom_density(size = 1)+
geom_vline(xintercept = 56.72, linetype = "dashed", color = "black")+
theme_bw()

Sample Image

Here is your plot with the same data:

ggplot(data, aes(x = model, y = value))+ 
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")

Sample Image

If your model is iterative and does converge to the value, I suggest plotting the value as a function of the iteration to show the convergence. Another option, which keeps a plot similar to yours, is dodging the position of the points:

ggplot(data, aes(x = model, y = value))+ 
geom_point(position = position_dodge2(width = 0.2),
shape = 1,
size = 2,
stroke = 1,
alpha = 0.5) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")

Sample Image
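The other suggestion, plotting against the iteration, might look roughly like the sketch below. The data here are entirely made up (the answer's data has no iteration column): iter_data, its iteration column and the shrinking-noise formula are assumptions used only to show the shape of such a plot.

```r
library(ggplot2)

set.seed(1)
n <- 200
target <- 56.72

# Made-up iterative output: noise around the target that shrinks over iterations.
iter_data <- data.frame(
  iteration = seq_len(n),
  value = target + rnorm(n, mean = 0, sd = 10) * exp(-seq_len(n) / 50)
)

p <- ggplot(iter_data, aes(x = iteration, y = value)) +
  geom_line(colour = "gray50") +
  geom_point(size = 1) +
  geom_hline(yintercept = target, linetype = "dashed", color = "black") +
  theme_bw()
p
```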

Here is a color density plot as you asked:

library(dplyr)
library(ggplot2)
data %>%
mutate(bin = cut(value, breaks = 10:120)) %>%
dplyr::group_by(bin) %>%
mutate(density = dplyr::n()) %>%
ggplot(aes(x = model, y = value, color = density))+
geom_point(size = 1) +
geom_hline(yintercept = 56.72, linetype = "dashed", color = "black")+
scale_colour_viridis_c(option = "A")

Sample Image

Setting geom_path color based on geom_point colors

I have found an imperfect, yet workable solution. Thank you for sharing your dataset, yet as I pointed out in the comments, it did not have any points that would satisfy your criteria indicated in the original question. With that being said, I'll answer the question using a made up dataset similar to your own:

set.seed(54321)
df <- data.frame(
x=1:50,
y=sample(c('Path1', 'Path2', 'Path3'), 50, replace=TRUE),
value=as.character(sample(1:5, 50, replace=TRUE))
)

The Question

As you posed, you wanted a way of drawing a line through all your data. Points are colored according to a value, and the logic behind the color of the line is as follows:

  • When two points adjacent to one another on the x axis have the same value (same color), the line color should match the value fill color (here I'll make it a solid line)
  • when two points adjacent to one another on the x axis have different values (different colors), the line color should be black (or here, it will be dotted and gray)

For our purposes, df$x will be the x axis and df$y will be the y axis. I made df$y discrete to match the OP's case. Critically, I have also made df$value discrete. Since the OP intends to compare adjacent points using the logic above, it's important to compare discrete or "binned" values rather than two continuous values, because comparing doubles for equality can give unexpected results. As an example, 1.0000000000000001==1.00000000000000001 evaluates to TRUE in the console even though the two literals differ, because both are rounded to the same double; both numbers would also fall within a "bin" running from 0.999 to 1.001.
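A small base R illustration of why equality tests on doubles are unreliable, and how binning (here via round()) makes the comparison well defined. The variable a is just an example value.

```r
# Both literals are rounded to the same double, so they compare equal.
1.0000000000000001 == 1.00000000000000001   # TRUE

# Arithmetic can go the other way: values that "should" match do not.
a <- 0.1 + 0.2
a == 0.3                      # FALSE
isTRUE(all.equal(a, 0.3))     # TRUE, compares within a tolerance

# Binning first makes the comparison well defined.
round(a, 1) == round(0.3, 1)  # TRUE
```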

Simple plot below. Goal is to change that dotted line according to above:

g <- ggplot(df, aes(x,y)) + theme_bw() +
scale_fill_manual(values = rainbow(5)) +
scale_color_manual(values = rainbow(5))

g + geom_path(group=1, color='gray50', linetype=2) +
geom_point(shape=21, size=4, aes(fill=value))

Sample Image

The Logic and Function

At first I thought we could just set the color=value to control color and group=1 to control connectivity and we'd be all set... but that doesn't quite work properly:

g + geom_path(group=1, aes(color=value)) +
geom_point(shape=21, size=4, aes(fill=value))

Sample Image

The problem lies in that the color is always changing according to df$value, where we want it to be black or gray when df$value changes, and then be drawn again when df$value is constant. In essence, color-changing was not the problem, it was connectivity. In this case, I wrote connect_check() and used it to create another column in the dataset to control connectivity.

connect_check <- function(x) {
  # Group id for each element; the id increases whenever the value changes
  return_vector <- vector(length = length(x), mode = 'double')

  grp_num <- 1
  previous <- x[1]

  # seq_along() also copes with an empty x, where 1:length(x) would be c(1, 0)
  for (i in seq_along(x)) {
    if (x[i] != previous) {
      grp_num <- grp_num + 1
    }
    return_vector[i] <- grp_num
    previous <- x[i]
  }
  return(return_vector)
}

# make a new column in the dataset
df$connected <- connect_check(df$value)

The result of connect_check() is a vector that increments the value every time the value of that position in the vector changes. Here's a simple example:

> test <- c(1,2,2,4,7,5,5,5,2,2,3,8)
> test
[1] 1 2 2 4 7 5 5 5 2 2 3 8
> connect_check(test)
[1] 1 2 2 3 4 5 5 5 6 6 7 8
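The same run numbering can also be computed without a loop. The sketch below is an alternative, not the answer's method; run_id is a hypothetical helper name, and it assumes a non-empty vector with no missing values.

```r
# Hypothetical helper: returns the same run ids as the loop version
# for non-empty vectors without NA.
run_id <- function(x) {
  # TRUE at position 1 and wherever an element differs from its predecessor
  changed <- c(TRUE, x[-1] != x[-length(x)])
  cumsum(changed)
}

vals <- c(1, 2, 2, 4, 7, 5, 5, 5, 2, 2, 3, 8)
run_id(vals)
# [1] 1 2 2 3 4 5 5 5 6 6 7 8
```

It works on character vectors as well, so it could be applied directly to a column such as df$value.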

The Final Plot

The final solution here is to use the newly created df$connected to control connectivity via the group= aesthetic, and assign color=value as before. The only problem is that ggplot doesn't draw a line for a group containing a single point, so the slightly wonky workaround is to first draw a light gray dotted line through all the points with geom_path, then overplot lines grouped by df$connected and colored by df$value. In the end, it works. I think there might be a way if you use duplicated(df$value), but again... this works too. :)

g +
geom_path(linetype=2, color='gray50', group=1) +
geom_line(aes(color=value, group=connected), size=1) +
geom_point(shape=21, size=3, aes(fill=value))

Sample Image

Note: I made the size= of the points smaller in the last plot so you can see the horizontal lines drawn where y remains constant and value either stays the same or changes.

Final point: in your own dataset, as I mentioned, you could "bin" the data. I would go about that by first creating a separate column, longData$value_bin (which could be as simple as longData$value_bin <- round(longData$value, 1)). You would then use longData$value_bin to compare the values of points to decide connectivity and color. If point fill= is still set to value, but line color= is set to value_bin, you may not have precisely the same color.

Changing specific value color with ggplot

You can use a condition to color only the point and CI that you want, with ifelse(test, yes, no): the test selects your group, yes is the color name for that group, and no is the color name for everything else.

library(tidyverse)

df=iris %>%
group_by(Species) %>%
summarise(min=min(Petal.Length),
max=max(Sepal.Length),
mean=(min+max)/2)
ggplot(df,aes(Species, mean,color=Species)) +
geom_point() +
geom_pointrange(aes(ymin = min, ymax = max))+
scale_color_manual(values = ifelse(df$Species=="setosa","red","black"))
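The unnamed vector above is matched to the factor levels by position, which works here because df has one row per species in level order. A slightly more defensive variation, sketched below under that observation, names each colour by its level. The names species and highlight_cols are illustrative, and the plot uses raw iris columns only to keep the example self-contained.

```r
library(ggplot2)

species <- levels(iris$Species)

# One colour per level, named by level, so the mapping does not rely on row order.
highlight_cols <- setNames(ifelse(species == "setosa", "red", "black"), species)

p <- ggplot(iris, aes(Species, Petal.Length, colour = Species)) +
  geom_point() +
  scale_colour_manual(values = highlight_cols)
p
```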

To change your axis labels, you can use ggtext as indicated in this post

library(ggtext)
library(glue)

highlight = function(x, pat, color="black", family="") {
  ifelse(grepl(pat, x), glue("<b style='font-family:{family}; color:{color}'>{x}</b>"), x)
}
ggplot(df,aes(Species, mean,color=Species)) +
  geom_point() +
  geom_pointrange(aes(ymin = min, ymax = max))+
  scale_x_discrete(labels=function(x) highlight(x, "setosa", "purple")) +
  scale_color_manual(values = ifelse(df$Species=="setosa","red","black")) +
  theme_bw()+
  theme(axis.text.x=element_markdown(size=15))

It didn't work with theme_minimal, so I used a different one.

Plot

R ggplot increase points by size and by color for same variable

Here is a way to do it: create a discrete variable and map both color and size to it, using manual scales.

Note that this ends up with discrete rather than continuous scales. In my personal experience, creating buckets for size and color gives better control over the colors and a better visualization, especially when the data contains outliers.

# Create some demo data
df <- data.frame(
day_of_week = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), 5),
light = rep(c("Daylight", "Darkness - no lighting", "Darkness - light unlit", "Darkness - light lit",
"Darkness - lighting unknown"), 7),
speed_limit = runif(35, 20, 70)
)

# Using ggplot2 for obvious reason
library(ggplot2)

# create the discrete variables using cut
df$speed_limit_cut <- cut(df$speed_limit, breaks = c(20, 30, 40, 50, 60, 70),
include.lowest = TRUE, right = TRUE)
levels_count <- length(levels(df$speed_limit_cut))

# Create the color scale corresponding to the levels from cut()
color_scales_fn <- colorRampPalette(c("#142D47", "#54AEF4"))
manual_color <- color_scales_fn(levels_count)
names(manual_color) <- levels(df$speed_limit_cut)
# Create the sizes scale
manual_size <- seq(1, by = 1, length.out = levels_count)
names(manual_size) <- levels(df$speed_limit_cut)

# Plot using the new variable
ggplot(df, aes(x=day_of_week, y=light, color = speed_limit_cut, size=speed_limit_cut)) +
geom_point() +
scale_size_manual(values = manual_size) +
scale_color_manual(values = manual_color)

Here is the output

Sample Image

Created on 2021-03-30 by the reprex package (v1.0.0)

Shade colors (change 'brightness') based on discrete values

Although changing the alpha is great, it is not exactly changing the "brightness", but the transparency of the graphs.

Here is a possible workaround, also using alpha, but applied to an overlying black box plot with the same groups.

I added the color aesthetic to the first plot in order to separate the groups by trans.

You can then play around to get the right 'brightness values' by changing the alpha in scale_alpha_manual

library(tidyverse)
ggplot(mpg) +
# the ugly interaction call is to avoid weirdly coloured outlier dots.
geom_boxplot(aes(x = manufacturer, y = hwy, fill = manufacturer,
group = interaction(manufacturer,(str_match(trans,"auto|manual"))))) +
geom_boxplot(aes(x = manufacturer, y = hwy, alpha = str_match(trans,"auto|manual")), fill = 'black') +
scale_alpha_manual(values = c(0.1,0.4))

Sample Image

Created on 2020-01-16 by the reprex package (v0.3.0)
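If the goal is darker shades rather than an overlay, another option is to compute the shaded colours directly and pass them to a manual fill scale. The sketch below is an alternative to the answer's approach, written with base R colour functions only; shade is a hypothetical helper, and it assumes amount is either a single number or has the same length as col.

```r
# Hypothetical helper: mix colours with black.
# amount = 0 keeps the colour, amount = 1 returns black.
shade <- function(col, amount) {
  m <- t(grDevices::col2rgb(col))   # n x 3 matrix of red, green, blue in 0-255
  m <- m * (1 - amount)             # scales row i by 1 - amount[i]
  grDevices::rgb(m[, 1], m[, 2], m[, 3], maxColorValue = 255)
}

# Two shades of the same base colour, e.g. one for "auto" and one for "manual".
shade(c("#1B9E77", "#1B9E77"), c(0.1, 0.4))
```

The resulting hex codes could be supplied as values to scale_fill_manual() when fill is mapped to the interaction of manufacturer and transmission type.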

ggplot2 legend: combine discrete colors and continuous point size

Doing what you originally asked - continuous + discrete in a single legend - in general doesn't seem to be possible even conceptually. The only sensible thing would be to have two legends for size, with a different color for each legend.


Now let's consider having a single legend. Given your "In my case, each unique combination of point size + color is associated with a description.", it sounds like there are very few possible point sizes. In that case, you could use both scales as discrete. But I believe even that is not enough as you use different variables for size and color scales. A solution then would be to create a single factor variable with all possible combinations of color.group and point.size. In particular,

df <- data.frame(xval, yval, f = interaction(color.group, point.size), description)
ggplot(df, aes(x = xval, y = yval, size = f, color = f)) +
geom_point() + scale_color_discrete(labels = 1:3) +
scale_size_discrete(labels = 1:3)

Sample Image

Here 1:3 are those descriptions that you want, and you may also set the colors the way you like. For instance,

ggplot(df, aes(x = xval, y = yval, size = f, color = f)) + 
geom_point() + scale_size_discrete(labels = 1:3) +
scale_color_manual(labels = 1:3, values = c("red", "blue", "green"))

Sample Image

However, we may also exploit color.group by using

ggplot(df, aes(x = xval, y = yval, size = f, color = f)) + 
geom_point() + scale_size_discrete(labels = 1:3) +
scale_color_manual(labels = 1:3, values = gsub("(.*)\\..*", "\\1", sort(df$f)))

Sample Image
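To make the last step less opaque, here is a small sketch of what interaction() produces and how the colour part can be pulled back out of the level names. The vectors below are made-up stand-ins for the question's color.group and point.size, and the object names f, f_used and colour_of are illustrative.

```r
# Made-up stand-ins for the question's variables.
color.group <- c("red", "blue", "green")
point.size  <- c(1, 2, 3)

# Levels are every colour/size combination, joined with a dot, e.g. "red.1".
f <- interaction(color.group, point.size)
levels(f)

# drop = TRUE keeps only the combinations that occur in the data.
f_used <- interaction(color.group, point.size, drop = TRUE)

# Recover the colour part of each level name (everything before the first dot).
colour_of <- sub("^([^.]*)\\..*$", "\\1", levels(f_used))
colour_of

# A greedy pattern such as "(.*)\\..*" splits at the LAST dot instead,
# which matters if a size contains a decimal point:
sub("(.*)\\..*", "\\1", "red.1.5")        # "red.1"
sub("^([^.]*)\\..*$", "\\1", "red.1.5")   # "red"
```

Taking the names from levels() rather than from the sorted data column gives one colour per level even when a combination occurs more than once.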


