How to Manually Set Colours for Categorical Variables Using ggplot()

How to assign colors to categorical variables in ggplot2 with a stable mapping?

For simple situations like the exact example in the OP, I agree that Thierry's answer is the best. However, it's useful to point out another approach that becomes easier when you need consistent color schemes across multiple data frames that are not all obtained by subsetting a single large data frame. Managing the factor levels in multiple data frames can become tedious if they are pulled from separate files and not every factor level appears in each file.

One way to address this is to create a custom manual colour scale as follows:

#Some test data
dat <- data.frame(x=runif(10),y=runif(10),
grp = rep(LETTERS[1:5],each = 2),stringsAsFactors = TRUE)

#Create a custom color scale, with colors named by factor level
library(ggplot2)
library(RColorBrewer)
myColors <- brewer.pal(5,"Set1")
names(myColors) <- levels(dat$grp)
colScale <- scale_colour_manual(name = "grp",values = myColors)

and then add the color scale onto the plot as needed:

#One plot with all the data
p <- ggplot(dat,aes(x,y,colour = grp)) + geom_point()
p1 <- p + colScale

#A second plot with only four of the levels (rows 4 to 10 contain B, C, D and E)
p2 <- p %+% droplevels(dat[4:10,]) + colScale

[Figure: the first plot, with all five levels]

[Figure: the second plot, with only four of the levels]

This way you don't need to remember or check each data frame to see that they have the appropriate levels.
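When the data frames come from separate files, the palette can also be written down once, independently of any one data frame. The following is a minimal sketch under that assumption: the names group_palette, dat_a, dat_b and missing_groups are hypothetical, and the hex values are only illustrative.

```r
# Hypothetical project-wide palette: names are group labels, values are colours
group_palette <- c(A = "#E41A1C", B = "#377EB8", C = "#4DAF4A",
                   D = "#984EA3", E = "#FF7F00")

# Two small data frames standing in for data read from separate files;
# neither contains every group
dat_a <- data.frame(x = 1:3, y = c(2, 4, 1), grp = c("A", "B", "D"))
dat_b <- data.frame(x = 1:2, y = c(3, 5), grp = c("C", "Z"))

# Hypothetical helper: which groups in a data frame have no colour in the palette?
missing_groups <- function(df, palette, col = "grp") {
  setdiff(unique(as.character(df[[col]])), names(palette))
}

missing_groups(dat_a, group_palette)  # every group in dat_a is covered
missing_groups(dat_b, group_palette)  # "Z" has no entry in the palette
```

The same named vector can then be passed to scale_colour_manual(values = group_palette) for every plot. Running a check like this first is useful because a level with no matching name in the palette would otherwise be drawn in the scale's NA colour.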

facet_wrap and assign colors to categorical variables in ggplot2

The only way I can think of is to duplicate (with crossing or similar) the data across all available countries.

library(dplyr)
library(tidyr)
library(ggplot2)

# helpful to find the most-impacted countries with over 1000 cases
topdat <- dat %>%
  group_by(GeoId) %>%
  summarize(n = max(Cases)) %>%
  filter(n > 1000) %>%
  arrange(desc(n))

plotdat <- dat %>%
  mutate(
    `Countries and territories` =
      gsub("_", " ",
           if_else(`Countries and territories` == "CANADA",
                   "Canada", `Countries and territories`))) %>%
  inner_join(., topdat, by = "GeoId") %>%
  arrange(DateRep) %>%
  group_by(GeoId) %>%
  filter(cumany(Cases > 100)) %>%
  mutate(
    ndays = as.numeric(difftime(DateRep, min(DateRep), units = "days")),
    ncases = cumsum(Cases),
    ndeaths = cumsum(Deaths),
    ismax = ncases == max(ncases)
  ) %>%
  crossing(., Country = unique(.$`Countries and territories`)) %>%
  mutate(
    col = case_when(
      `Countries and territories` == Country ~ 1L,
      GeoId %in% c("CN", "IT", "UK") ~ 2L,
      TRUE ~ 3L
    )
  )

firstpane <- plotdat %>%
  select(-Country) %>%
  filter(GeoId %in% c("CN", "IT", "UK")) %>%
  group_by(GeoId) %>%
  slice(which.max(ncases)) %>%
  crossing(., Country = unique(plotdat$`Countries and territories`))

ggplot(plotdat, mapping = aes(x = ndays, y = ncases, group = GeoId)) +
  # three separate layers, drawn in order, so col == 1 (red) ends up on top
  geom_line(aes(color = factor(col)), data = ~ subset(., col == 3L)) +
  geom_line(aes(color = factor(col)), data = ~ subset(., col == 2L)) +
  geom_line(aes(color = factor(col)), data = ~ subset(., col == 1L)) +
  geom_text(aes(label = `Countries and territories`),
            hjust = 0, vjust = 1.2,
            data = subset(firstpane, Country == min(Country))) +
  geom_point(data = firstpane) +
  geom_point(color = "red", data = ~ subset(., ismax & col == 1L)) +
  facet_wrap(~ Country) +
  scale_y_continuous(trans = "log10", labels = scales::comma) +
  # guide = "none" hides the legend (guide = FALSE is deprecated in recent ggplot2)
  scale_color_manual(values = c("red", "gray50", "#bbbbbb88"), guide = "none") +
  labs(x = "Days since 100th case", y = NULL) +
  lims(x = c(1, 100))

[Figure: COVID plots of the most-impacted countries]

I used three geom_line() layers to control the layering manually, so the red line is always on top. If the drawing order doesn't matter, replace all three with a single geom_line(aes(color = factor(col))).

Manually setting group colors for ggplot2

You can associate each of your groups with a colour, then pass to the function:

group.colors <- c(A = "#333BFF", B = "#CC6600", C ="#9633FF", D = "#E2FF33", E = "#E3DB71")

simplePlot <- function(DT, tit) {
  ggplot(DT, aes(x = Name, y = Value, fill = Group)) +
    geom_bar(stat = "identity") + xlab("") + ggtitle(tit) +
    # Specify colours
    scale_fill_manual(values = group.colors)
}

Then using your plots:

library(gridExtra)
grid.arrange(ncol = 2, simplePlot(DT1, tit = "Plot 1"),
             simplePlot(DT2, tit = "Plot 2"))


I think the issue with your approach was that the colours weren't named, so scale_fill_manual() couldn't associate them with the groups. Compare:

ColorsDT <- data.table(Group = LETTERS[1:5], Color = c("#333BFF", "#CC6600", "#9633FF", "#E2FF33", "#E3DB71"), key = "Group")
ColorsDT
#    Group   Color
# 1:     A #333BFF
# 2:     B #CC6600
# 3:     C #9633FF
# 4:     D #E2FF33
# 5:     E #E3DB71

with:

ColorsDT.name <- data.table(A = "#333BFF", B = "#CC6600", C = "#9633FF", D = "#E2FF33", E = "#E3DB71")
ColorsDT.name
#          A       B       C       D       E
# 1: #333BFF #CC6600 #9633FF #E2FF33 #E3DB71
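If the colours already live in a two-column lookup table like the first one, a named vector can be derived from it rather than retyped. This is a minimal sketch using a plain data.frame in place of the data.table; colors_df and group_colors are hypothetical names.

```r
# Lookup table of groups and colours, mirroring ColorsDT above
colors_df <- data.frame(
  Group = LETTERS[1:5],
  Color = c("#333BFF", "#CC6600", "#9633FF", "#E2FF33", "#E3DB71"),
  stringsAsFactors = FALSE
)

# setNames() attaches the group labels as names, giving the named
# character vector that the values argument matches against the data
group_colors <- setNames(colors_df$Color, colors_df$Group)
group_colors
```

The result has the same shape as group.colors defined earlier, so it can be passed as scale_fill_manual(values = group_colors).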

Add color to specific categorical variable

You can assign a color to each level manually using scale_color_manual():

library(ggplot2)

ggplot(iris, aes(Sepal.Length, Petal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = c('setosa' = 'blue', 'versicolor' = 'black',
                                'virginica' = 'black'))


If there are many levels and assigning a color to each one by hand is impractical, we can build a named vector instead, as suggested in this answer.

color_vec <- rep("black", length(unique(iris$Species)))
names(color_vec) <- unique(iris$Species)
color_vec[names(color_vec) == "setosa"] <- "blue"

and use this in scale_color_manual():

ggplot(iris, aes(Sepal.Length, Petal.Width, color = Species)) +
  geom_point() +
  scale_color_manual(values = color_vec)
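The same idea can be wrapped in a small helper when several levels need highlighting. This is only a sketch: highlight_colors and its arguments are hypothetical names, not part of ggplot2.

```r
# Hypothetical helper: return a named colour vector in which the levels
# listed in `highlight` get colour `on` and all other levels get `off`
highlight_colors <- function(levels, highlight, on = "blue", off = "black") {
  cols <- ifelse(levels %in% highlight, on, off)
  setNames(cols, levels)
}

# Highlight a single species from the built-in iris data
color_vec <- highlight_colors(levels(iris$Species), "setosa")
color_vec
```

The resulting vector can be passed to scale_color_manual(values = color_vec) exactly as above. Because the names come from the factor levels, no colour depends on the order in which the levels appear in the data.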

