How to assign colors to categorical variables in ggplot2 that have stable mapping?
For simple situations like the exact example in the OP, I agree that Thierry's answer is the best. However, I think it's useful to point out another approach that becomes easier when you're trying to maintain consistent color schemes across multiple data frames that are not all obtained by subsetting a single large data frame. Managing the factor levels in multiple data frames can become tedious if they are being pulled from separate files and not every factor level appears in each file.
One way to address this is to create a custom manual colour scale as follows:
library(ggplot2)
library(RColorBrewer)
#Some test data
dat <- data.frame(x = runif(10), y = runif(10),
                  grp = rep(LETTERS[1:5], each = 2), stringsAsFactors = TRUE)
#Create a custom color scale: naming the colours ties each one to a level
myColors <- brewer.pal(5, "Set1")
names(myColors) <- levels(dat$grp)
colScale <- scale_colour_manual(name = "grp", values = myColors)
and then add the color scale onto the plot as needed:
#One plot with all the data
p <- ggplot(dat, aes(x, y, colour = grp)) + geom_point()
p1 <- p + colScale
#A second plot with only four of the levels (B to E)
p2 <- p %+% droplevels(dat[4:10, ]) + colScale
The first plot (p1) shows all five groups; the second (p2) shows only groups B to E, each drawn in the same colour it has in p1.
This way you don't need to remember or check each data frame to see that they have the appropriate levels.
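For the case described at the top (several data frames read from separate files, none of which necessarily contains every level), one option is to build the named palette from the union of the levels across all of them. The sketch below is a minimal illustration under stated assumptions: make_palette() is a hypothetical helper, and base R's hcl.colors() is swapped in for brewer.pal() only so the example needs no extra packages.

```r
# Hypothetical helper: one named palette built from the union of the levels
# found in several data frames (e.g. data read from separate files)
make_palette <- function(dfs, col, palette = "Set 2") {
  # Every distinct value of `col` across all data frames, sorted
  lv <- sort(unique(unlist(lapply(dfs, function(d) as.character(d[[col]])))))
  # One colour per level; the names make the mapping independent of which
  # levels happen to be present in any single data frame
  setNames(hcl.colors(length(lv), palette = palette), lv)
}

# Toy data: neither data frame contains every group
d1 <- data.frame(x = 1:3, y = 1:3, grp = c("A", "B", "C"))
d2 <- data.frame(x = 1:3, y = 3:1, grp = c("C", "D", "E"))

pal <- make_palette(list(d1, d2), "grp")
print(pal)
```

The resulting vector can then be passed as values = pal to scale_colour_manual() for every plot, so group C gets the same colour whether it is drawn from d1 or d2.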
facet_wrap and assign colors to categorical variables in ggplot2
The only way I can think of is to duplicate the data across all available countries (with crossing or similar).
library(dplyr)
library(tidyr)
library(ggplot2)
# helpful to find the most-impacted countries (more than 1000 cases on a single day)
topdat <- dat %>%
  group_by(GeoId) %>%
  summarize(n = max(Cases)) %>%
  filter(n > 1000) %>%
  arrange(desc(n))
plotdat <- dat %>%
  mutate(
    `Countries and territories` =
      gsub("_", " ",
           if_else(`Countries and territories` == "CANADA",
                   "Canada", `Countries and territories`))) %>%
  inner_join(., topdat, by = "GeoId") %>%
  arrange(DateRep) %>%
  group_by(GeoId) %>%
  filter(cumany(Cases > 100)) %>%
  mutate(
    ndays = as.numeric(difftime(DateRep, min(DateRep), units = "days")),
    ncases = cumsum(Cases),
    ndeaths = cumsum(Deaths),
    ismax = ncases == max(ncases)
  ) %>%
  # one copy of every row per facet panel
  crossing(., Country = unique(.$`Countries and territories`)) %>%
  mutate(
    col = case_when(
      `Countries and territories` == Country ~ 1L,
      GeoId %in% c("CN", "IT", "UK") ~ 2L,
      TRUE ~ 3L
    )
  )
firstpane <- plotdat %>%
  select(-Country) %>%
  filter(GeoId %in% c("CN", "IT", "UK")) %>%
  group_by(GeoId) %>%
  slice(which.max(ncases)) %>%
  crossing(., Country = unique(plotdat$`Countries and territories`))
ggplot(plotdat, mapping = aes(x = ndays, y = ncases, group = GeoId)) +
  geom_line(aes(color = factor(col)), data = ~ subset(., col == 3L)) +
  geom_line(aes(color = factor(col)), data = ~ subset(., col == 2L)) +
  geom_line(aes(color = factor(col)), data = ~ subset(., col == 1L)) +
  geom_text(aes(label = `Countries and territories`),
            hjust = 0, vjust = 1.2,
            data = subset(firstpane, Country == min(Country))) +
  geom_point(data = firstpane) +
  geom_point(color = "red", data = ~ subset(., ismax & col == 1L)) +
  facet_wrap(~ Country) +
  scale_y_continuous(trans = "log10", labels = scales::comma) +
  scale_color_manual(values = c("red", "gray50", "#bbbbbb88"), guide = "none") +
  labs(x = "Days since 100th case", y = NULL) +
  lims(x = c(1, 100))
I used three geom_line calls to control the layering manually, so the red line is always drawn on top. If layering doesn't matter to you, replace all three with a single geom_line(aes(color = factor(col))).
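Because the code above depends on the asker's COVID data frame, here is a self-contained sketch of the same duplicate-and-highlight idea on made-up data. Assumptions: the toy data frame and its columns (country, day, cases) are hypothetical, and base R's merge(by = NULL) stands in for tidyr::crossing() as the cross join. The plotting part only runs if ggplot2 is installed.

```r
set.seed(1)
# Hypothetical toy data: cumulative cases for three countries over five days
toy <- data.frame(country = rep(c("X", "Y", "Z"), each = 5),
                  day     = rep(1:5, times = 3))
toy$cases <- ave(rpois(15, 20), toy$country, FUN = cumsum)

# Cross join: one copy of every row for each facet panel
panels  <- data.frame(panel = unique(toy$country))
plotdat <- merge(toy, panels, by = NULL)

# 1 = the panel's own country (highlighted), 2 = every other country
plotdat$col <- ifelse(plotdat$country == plotdat$panel, 1L, 2L)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plotdat, aes(day, cases, group = country)) +
    # grey lines first, highlighted line last so it is drawn on top
    geom_line(aes(colour = factor(col)), data = ~ subset(.x, col == 2L)) +
    geom_line(aes(colour = factor(col)), data = ~ subset(.x, col == 1L)) +
    facet_wrap(~ panel) +
    scale_colour_manual(values = c("1" = "red", "2" = "gray70"), guide = "none")
}
```

Each panel ends up containing all three countries, with only that panel's own country in red.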
Manually setting group colors for ggplot2
You can associate each of your groups with a colour in a named vector, then pass that vector to the function:
library(ggplot2)
group.colors <- c(A = "#333BFF", B = "#CC6600", C = "#9633FF", D = "#E2FF33", E = "#E3DB71")
simplePlot <- function(DT, tit) {
  ggplot(DT, aes(x = Name, y = Value, fill = Group)) +
    geom_bar(stat = "identity") + xlab("") + ggtitle(tit) +
    #Specify colours by name, so each group keeps its colour in every plot
    scale_fill_manual(values = group.colors)
}
Then using your plots:
library(gridExtra)
grid.arrange(ncol = 2, simplePlot(DT1, tit = "Plot 1"),
             simplePlot(DT2, tit = "Plot 2"))
I think the issue with your approach was that the colours weren't named, so scale_fill_manual() couldn't associate them with the groups. Compare:
library(data.table)
ColorsDT <- data.table(Group = LETTERS[1:5], Color = c("#333BFF", "#CC6600", "#9633FF", "#E2FF33", "#E3DB71"), key = "Group")
ColorsDT
# Group Color
#1: A #333BFF
#2: B #CC6600
#3: C #9633FF
#4: D #E2FF33
#5: E #E3DB71
with:
ColorsDT.name <- data.table(A = "#333BFF", B = "#CC6600", C = "#9633FF", D = "#E2FF33", E = "#E3DB71")
ColorsDT.name
# A B C D E
# 1: #333BFF #CC6600 #9633FF #E2FF33 #E3DB71
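If the colours already live in a two-column table like ColorsDT, setNames() turns the Color column into the named vector that scale_fill_manual() needs. The sketch below does that and then imitates, in plain base R, ggplot2's documented matching rule: unnamed values are matched in order to the levels present in the plot, named values are matched by name. It is an illustration of the rule, not ggplot2's own code; a plain data.frame replaces the data.table only to avoid a package dependency ($ extraction behaves the same on both).

```r
# Lookup table shaped like ColorsDT above
colors_df <- data.frame(
  Group = LETTERS[1:5],
  Color = c("#333BFF", "#CC6600", "#9633FF", "#E2FF33", "#E3DB71"),
  stringsAsFactors = FALSE
)

# Named vector: the Color column, named by Group
group.colors <- setNames(colors_df$Color, colors_df$Group)

# Suppose a plot only contains groups B and D
present <- c("B", "D")

# Unnamed values are matched in order to the levels present,
# so B would get A's colour and D would get B's colour
by_position <- unname(group.colors)[seq_along(present)]

# Named values are looked up by name, so B and D keep their own colours
by_name <- unname(group.colors[present])

print(by_position)  # "#333BFF" "#CC6600"
print(by_name)      # "#CC6600" "#E2FF33"
```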
Add color to specific categorical variable
You can assign a color manually to each level using scale_color_manual:
library(ggplot2)
ggplot(iris, aes(Sepal.Length, Petal.Width, color = Species)) +
geom_point() +
scale_color_manual(values = c('setosa' = 'Blue', 'versicolor' = 'black',
'virginica' = 'black'))
If there are many levels and it is impractical to assign a color to each one by hand, we can create a named vector instead (as suggested in another answer), defaulting every level to black and overriding only the level we want to highlight:
color_vec <- rep("black", length(unique(iris$Species)))
names(color_vec) <- unique(iris$Species)
color_vec[names(color_vec) == "setosa"] <- "blue"
and then pass it to scale_color_manual:
ggplot(iris, aes(Sepal.Length, Petal.Width, color = Species)) +
geom_point() +
scale_color_manual(values = color_vec)
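The same default-then-override pattern can be wrapped in a small function when several levels need highlighting. This is a sketch only: highlight_colors() is a hypothetical helper, not part of ggplot2, and it uses levels() rather than unique() so the vector follows the factor's level order.

```r
# Hypothetical helper: every level gets `default`, except the levels named
# in `highlight`, which get the colours given there
highlight_colors <- function(levels, highlight, default = "black") {
  # Refuse names that are not real levels, so typos fail loudly
  stopifnot(all(names(highlight) %in% levels))
  cols <- setNames(rep(default, length(levels)), levels)
  cols[names(highlight)] <- highlight
  cols
}

color_vec <- highlight_colors(levels(iris$Species), c(setosa = "blue"))
print(color_vec)
```

The result is used exactly as before, via scale_color_manual(values = color_vec); highlighting another level only means adding it to the highlight vector, e.g. c(setosa = "blue", virginica = "red").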