Easiest Way to Discretize Continuous Scales for Ggplot2 Color Scales

Easiest way to discretize continuous scales for ggplot2 color scales?

The solution is slightly complicated because you want a discrete scale; otherwise you could probably simply use round().

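library(ggplot2)

# Bin a numeric vector and map each bin to a colour. Returns a vector of
# colours whose names are the bin labels.
bincol <- function(x, low, medium, high) {
  # Sturges-based "pretty" break points spanning the data
  breaks <- function(x) pretty(range(x), n = nclass.Sturges(x), min.n = 1)

  colfunc <- colorRampPalette(c(low, medium, high))

  # include.lowest = TRUE keeps a value equal to the lowest break from becoming NA
  binned <- cut(x, breaks(x), include.lowest = TRUE)

  # One colour per level (not per observed bin), so empty bins cannot
  # push the integer codes past the end of the palette
  res <- colfunc(nlevels(binned))[as.integer(binned)]
  names(res) <- as.character(binned)
  res
}

cols   <- bincol(iris$Sepal.Length, "blue", "yellow", "red")
labels <- unique(names(cols))
breaks <- unique(cols)

# Order the legend by the numeric lower bound of each bin (highest first)
# rather than lexically, so that e.g. "(10,12]" does not sort before "(8,10]"
lower  <- as.numeric(sub("^.([^,]+),.*$", "\\1", labels))
ord    <- order(lower, decreasing = TRUE)
breaks <- breaks[ord]
labels <- labels[ord]

ggplot(iris) +
  geom_point(aes(x = Sepal.Width, y = Sepal.Length,
                 colour = bincol(Sepal.Length, "blue", "yellow", "red")), size = 4) +
  scale_color_identity("Sepal.Length", labels = labels,
                       breaks = breaks, guide = "legend")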
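In more recent versions of ggplot2 (3.3.0 and later), the binned scales such as scale_colour_stepsn() can do this discretization without a helper function. The sketch below is an alternative to the answer above, not part of it; it assumes ggplot2 >= 3.3.0, and `n.breaks = 8` is an arbitrary choice that ggplot2 treats only as a suggestion.

```r
library(ggplot2)

# Binned colour scale: ggplot2 cuts Sepal.Length into ranges itself and
# interpolates the three colours across the bins (colorsteps legend)
p <- ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, colour = Sepal.Length)) +
  geom_point(size = 4) +
  scale_colour_stepsn(colours = c("blue", "yellow", "red"), n.breaks = 8)

print(p)
```

Because the scale stays numeric, the legend keys are ordered by value automatically, so no manual sorting of labels is needed.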


ggplot2 continuous colors for discrete scale and delete a legend

You have two scales because you are using both a fill and a color aesthetic, and you are using different variables for them (one discrete, one continuous). What you want to do is use a single variable just for the fill.

Additionally, your labels are all messed up because you are passing them as character, which ggplot will then sort lexically (so "10" comes before "2").

Here is a solution that bypasses both these problems. We keep the original factor format for the output of cut, which will be labeled in the correct order, and we just use the fill aesthetic. Note also how much simpler it is to set the scale by creating the colors and labeling them with the levels, and using scale_fill_manual:

library(ggplot2)
df <- expand.grid(1:10, 1:10)                   # make up data
df <- transform(df, z=Var1 * Var2)              # make up data
df <- transform(df, z.cut=cut(z, 10))           # bin data

colors <- colorRampPalette(c("blue", "yellow", "red"))(length(levels(df$z.cut)))
ggplot(df, aes(x=Var1, y=Var2, fill=z.cut)) + 
  geom_tile() +
  scale_fill_manual(values=setNames(colors, levels(df$z.cut)))
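The lexical-sorting problem mentioned above can be seen in isolation by comparing the level order that cut() produces with the order you get after converting the labels to character and sorting them. The values and breaks here are made up for illustration.

```r
z    <- c(1, 5, 12, 20)
bins <- cut(z, breaks = c(0, 4, 8, 16, 24))

# Factor levels keep the numeric order of the intervals
levels(bins)
# "(0,4]" "(4,8]" "(8,16]" "(16,24]"

# As character, the labels are sorted lexically, so "(16,24]" jumps ahead of "(4,8]"
sort(as.character(bins))
# "(0,4]" "(16,24]" "(4,8]" "(8,16]"
```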


Combine continuous and discrete color scale in ggplot2?

I'm sure this can be made more efficient, but here's one approach. Essentially, we follow your advice of subsetting the data into the different parts, divide the continuous data into discrete bins, then patch everything back together and use a scale of our own choosing.

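library(ggplot2)
library(RColorBrewer)

#Sample data (seeded so the random NA/Inf positions are reproducible)
set.seed(1)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
dat[sample(nrow(dat), 5), "z"] <- NA
dat[sample(nrow(dat), 5), "z"] <- Inf

#Subset out the real (finite) values and create 6 bins for them
dat.good <- dat[is.finite(dat$z), ]
good.cut <- cut(dat.good$z, 6)
dat.good$col <- as.character(good.cut)

#Grab the bad ones; label NA explicitly so it gets its own colour
dat.bad <- dat[!is.finite(dat$z), ]
dat.bad$col <- ifelse(is.na(dat.bad$z), "NA", as.character(dat.bad$z))

#Rbind them back together with a fixed level order: bins, "Inf", "NA"
dat.plot <- rbind(dat.good, dat.bad)
dat.plot$col <- factor(dat.plot$col, levels = c(levels(good.cut), "Inf", "NA"))

#Make your own scale with RColorBrewer, named so each level gets its colour
yourScale <- setNames(c(brewer.pal(6, "Blues"), "red", "green"), levels(dat.plot$col))

ggplot(dat.plot, aes(x, y, colour = col)) +
  geom_point() +
  scale_colour_manual("Intensity", values = yourScale)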

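A variant that skips the subset-and-rbind step is sketched below: the infinite values are labelled inside a single factor, and missing values are coloured through the na.value argument of scale_colour_manual(). It uses the same kind of made-up data; the colorRampPalette() ramp is an assumption that stands in for brewer.pal(6, "Blues") only to avoid the RColorBrewer dependency.

```r
library(ggplot2)

# Made-up data: 5 infinite and up to 5 missing z values (the samples can overlap)
set.seed(1)
dat <- data.frame(x = rnorm(100), y = rnorm(100), z = rnorm(100))
dat$z[sample(nrow(dat), 5)] <- NA
dat$z[sample(nrow(dat), 5)] <- Inf

# Bin only the finite values into 6 intervals
fin  <- is.finite(dat$z)
bins <- cut(dat$z[fin], 6)

# One factor for everything: the 6 bins, then "Inf"; missing z stays NA
col <- rep(NA_character_, nrow(dat))
col[fin] <- as.character(bins)
col[is.infinite(dat$z)] <- "Inf"
dat$col <- factor(col, levels = c(levels(bins), "Inf"))

# Stand-in for brewer.pal(6, "Blues")
blues <- colorRampPalette(c("#EFF3FF", "#08519C"))(6)

p <- ggplot(dat, aes(x, y, colour = col)) +
  geom_point() +
  scale_colour_manual("Intensity",
                      values = setNames(c(blues, "red"), levels(dat$col)),
                      na.value = "green")
print(p)
```

Naming the values vector ties each colour to a level explicitly, so the mapping does not depend on the order in which the bad values happen to appear in the data.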

Understanding color scales in ggplot2

This is a good question, and I would have hoped there was a practical guide somewhere. One could question whether SO is a good place to ask it, but regardless, here's my attempt to summarize the various scale_color_*() and scale_fill_*() functions built into ggplot2. Here we describe the functions using scale_color_*(), but the same general rules apply to the scale_fill_*() functions.

Overall Categorization

There are 22 functions in all, but happily we can group them intelligently based on practical usage scenarios. Three key criteria define how to use each of the scale_color_*() functions in practice:

Nature of the mapping data. Is the data mapped to the color aesthetic discrete or continuous? CONTINUOUS data can be expressed via real numbers: time, temperature, and length are all continuous because, even if your observations are 1 and 2, something could exist with a theoretical value of 1.5. DISCRETE data is just the opposite: you cannot meaningfully place it on a real-number line. Take, for example, the observations "Model A" and "Model B": there is no obvious value in between them, so each can only be represented by its own color.
The colorspace. This is the color palette used to draw onto the plot. By default, ggplot2 uses evenly spaced hues for discrete data (scale_color_hue()) and a dark-to-light blue gradient for continuous data. Other functions built into the library use either Brewer palettes or Viridis colorspaces.
The level of specification. Once you have decided whether the scale is continuous and which colorspace to use, the functions still differ in how much control the user needs or can exert. A good example is the family *_continuous(), *_gradient(), *_gradient2(), and *_gradientn(), which let you specify progressively more of the color mapping.
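To illustrate that increasing level of specification, here is a sketch that colors the same plot with each member of the gradient family; the mtcars columns, the colors, and the median midpoint are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, disp, colour = hp)) + geom_point(size = 4)

# Least control: the default continuous colour scale
p_default <- p + scale_colour_continuous()
# Two end points
p_grad    <- p + scale_colour_gradient(low = "white", high = "darkblue")
# Diverging: low / mid / high around a chosen midpoint (the default midpoint is 0)
p_grad2   <- p + scale_colour_gradient2(low = "blue", mid = "white", high = "red",
                                        midpoint = median(mtcars$hp))
# Most control: any number of colours
p_gradn   <- p + scale_colour_gradientn(colours = c("navy", "green", "yellow", "red"))

print(p_gradn)
```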

Continuous Scales

We can start off with continuous scales. These functions are used for observations that are continuous variables (see above), and they can be further split into binned and unbinned scales. "Binning" groups ranges of a continuous variable so that each range is assigned a single color. You'll notice that binning changes the legend keys from a "colorbar" to a "colorsteps" legend.

The continuous example (colorbar legend):

library(ggplot2)
cont <- ggplot(mtcars, aes(mpg, disp, color=cyl)) + geom_point(size=4)

cont + scale_color_continuous()


The binned example (color steps legend):

cont + scale_color_binned()


The following are the continuous scale functions.

Name of Function	Colorspace	Legend	What it does
scale_color_continuous()	default	Colorbar	basic scale (as if you did nothing)
scale_color_gradient()	user-defined	Colorbar	define `low` and `high` values
scale_color_gradient2()	user-defined	Colorbar	define `low`, `mid`, and `high` values (around `midpoint`)
scale_color_gradientn()	user-defined	Colorbar	define any number of colors via `colours`
scale_color_binned()	default	Colorsteps	basic scale, but binned
scale_color_steps()	user-defined	Colorsteps	define `low` and `high` values
scale_color_steps2()	user-defined	Colorsteps	define `low`, `mid`, and `high` values (around `midpoint`)
scale_color_stepsn()	user-defined	Colorsteps	define any number of colors via `colours`
scale_color_viridis_c()	Viridis	Colorbar	Viridis color scale. Change palette via `option=`.
scale_color_viridis_b()	Viridis	Colorsteps	Viridis color scale, binned. Change palette via `option=`.
scale_color_distiller()	Brewer	Colorbar	Brewer color scale. Change palette via `palette=`.
scale_color_fermenter()	Brewer	Colorsteps	Brewer color scale, binned. Change palette via `palette=`.
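Within the binned scales, the colorspace variants mainly differ in which argument picks the palette. A minimal sketch, redefining the mtcars plot from above so the block runs on its own; the palette names and `n.breaks = 4` are illustrative assumptions, and the binned functions require ggplot2 3.3.0 or later:

```r
library(ggplot2)

cont <- ggplot(mtcars, aes(mpg, disp, color = cyl)) + geom_point(size = 4)

# Binned default scale with a suggested number of breaks
p_bin  <- cont + scale_color_binned(n.breaks = 4)
# Binned Viridis: pick the palette with `option`
p_vir  <- cont + scale_color_viridis_b(option = "magma")
# Binned Brewer: pick the palette with `palette`; direction = 1 keeps light-to-dark
p_ferm <- cont + scale_color_fermenter(palette = "Greens", direction = 1)

print(p_ferm)
```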

Plotting discrete and continuous scales in same ggplot

You can do this by telling grid graphics to overlay one plot on top of the other. However, you have to get the margins, spacing, etc. exactly right, and you have to think about the transparency of the top layers. In short, it's not worth it, and it can make the plot confusing.

Still, I thought some people might like a pointer on how to achieve this. N.B. I used code from this gist to make the elements in the top plot transparent so they don't obscure the elements below:

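library(ggplot2)
library(grid)

# p1 and p2 are your two plots; they must share axes and margins, and p2
# (drawn on top) needs transparent background elements
grid.newpage()
pushViewport( viewport( layout = grid.layout( 1 , 1 , widths = unit( 1 , "npc" ) ) ) ) 
print( p1 + theme(legend.position="none") , vp = viewport( layout.pos.row = 1 , layout.pos.col = 1 ) )
print( p2 + theme(legend.position="none") , vp = viewport( layout.pos.row = 1 , layout.pos.col = 1 ) )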

See my answer here for how to add legends into another position on the grid layout.

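A self-contained version of the overlay is sketched below. The two plots, the mtcars columns, and the transparent-theme settings are assumptions made for illustration (the theme settings stand in for the gist code, which is not reproduced here). Both plots map the same x and y data, so their axes and panels line up.

```r
library(ggplot2)
library(grid)

# Bottom plot: discrete colour scale
p1 <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(colour = factor(gear)), size = 6)

# Top plot: continuous colour scale, with transparent backgrounds and no grid
# so that the points of p1 stay visible underneath
p2 <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(colour = hp), size = 2) +
  theme(panel.background = element_rect(fill = "transparent", colour = NA),
        plot.background  = element_rect(fill = "transparent", colour = NA),
        panel.grid       = element_blank())

# Draw both plots into the same single-cell layout
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 1)))
print(p1 + theme(legend.position = "none"),
      vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2 + theme(legend.position = "none"),
      vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
```

Removing both legends keeps the two panels exactly the same size; any legends then have to be added separately, as described above.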

Add continuous / discrete alpha scale with custom color

One option is to use override.aes in guide_legend() to change the look of the alpha legend. For example, add an overall fill color and then an alpha value for each of the 5 legend keys. I used seq() to make an even sequence of alpha values, but I'm not sure if that is the best option.

I did this all in scale_alpha_continuous() via the guide argument:

guide = guide_legend(override.aes = list(fill = "grey24",
                                         alpha = seq(.5, 1, length.out = 5)))

Here's the whole plot and output.

# fluct_all is the data from the question (columns type, test_acc, fold)
ggplot(fluct_all, aes(x = type, y = test_acc, alpha = fold, fill = type, group = interaction(type, fold))) +
    stat_boxplot(geom = "errorbar", alpha = 1, width = 0.2, position = position_dodge(0.9)) +
    geom_boxplot(outlier.alpha = 0, outlier.size = 0, fill = "white", alpha = 1, position = position_dodge(0.9)) +
    geom_boxplot(outlier.alpha = 0.01, outlier.size = 0.75, position = position_dodge(0.9)) +
    scale_fill_discrete(
        # values = color_scale_class,
        name = "Input Data\nCombination",
        labels = c("TLS", "TLS & GEO", "TLS & RGB", "ALL")
    ) +
    theme_light() +
    # theme(
    #     text = element_text(size = 14, family = "Calibri"),
    #     legend.title = element_text(family = "Calibri", size = 16),
    #     legend.key.width = unit(0.75, "cm"),
    #     legend.key.height = unit(1, "cm"),
    #     legend.text = element_text(family = "Calibri", size = 14)
    # ) +
    scale_x_discrete(labels = c("TLS", "TLS & GEO", "TLS & RGB", "ALL")) +
    xlab("") +
    ylab("Test Accuracy\n") +
    scale_alpha_continuous(range = c(0.5, 1), name = "\nFold",
                           guide = guide_legend(override.aes = list(fill = "grey24",
                                                                    alpha = seq(.5, 1, length.out = 5))))


Created on 2021-08-04 by the reprex package (v2.0.0)
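The fluct_all data from the question is not available here, so the following sketch applies the same override.aes idea to mtcars as a made-up stand-in (cyl plays the role of type, gear the role of fold). Setting breaks explicitly is an addition of this sketch: it fixes the number of legend keys, so the length of the alpha vector in override.aes is known to match.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl),
                        alpha = gear, group = interaction(cyl, gear))) +
  geom_boxplot(position = position_dodge(0.9)) +
  scale_alpha_continuous(
    range = c(0.5, 1), breaks = 3:5, name = "Gear",
    # One grey key per break, with alpha values spanning the scale's range
    guide = guide_legend(override.aes = list(fill = "grey24",
                                             alpha = seq(0.5, 1, length.out = 3)))
  )

print(p)
```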

General way to break on unique values in ggplot2 continuous scales

Indeed, ggplot2 does not have a general way to do this. For continuous scales, training updates the range of the scale every time a new layer is examined, so only the minimum and maximum are retained, not the individual values. It makes sense in the 'grammar of graphics' that scales are mostly independent of geometry layers.

You could, in theory, tackle this problem from the bottom up by making a new Range ggproto class that keeps track of unique values. However, ggplot2 does not export its Range classes, which likely means they don't support tinkering with them. Also, it's quite a task to set up a new type of scale.
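The range-only training described above can be observed on the mpg example used later in this answer. layer_scales() returns the trained scales of a built plot; in this small sketch the output of the last call is not shown, because it depends on ggplot2's default break algorithm.

```r
library(ggplot2)

p <- ggplot(mpg, aes(cyl, cty)) + geom_point()

# The data contain only four distinct cyl values ...
sort(unique(mpg$cyl))
# 4 5 6 8

# ... but the trained continuous x scale keeps only their minimum and maximum
layer_scales(p)$x$get_limits()
# 4 8

# so the default breaks are computed from that range, not from the data values
layer_scales(p)$x$get_breaks()
```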

Instead, I propose hacking the ggplot_add() method to leak information from the global plot to the scale. The first step is to wrap the constructor of a scale so that it tags an extra class onto that scale.

library(ggplot2)

scale_x_unique <- function(...) {
  sc  <- scale_x_continuous(...)
  new <- ggproto("ScaleUnique", sc)
  new
}

Next, we define a ggplot_add() method for our ScaleUnique class. The function below checks whether there are any user-defined breaks and, if there are none, evaluates the scale's aesthetic in the global plot data and uses its sorted unique values as the breaks.

ggplot_add.ScaleUnique <- function(object, plot, object_name) {
  # "waiver" class is for undefined arguments
  if (inherits(object$breaks, "waiver")) {
    # Find common aesthetic between scale and plot mapping
    aes <- intersect(object$aesthetics, names(plot$mapping))
    # Find out the expression associated with that aesthetic
    aes <- plot$mapping[[aes[[1]]]]
    # Evaluate the aesthetic
    values <- rlang::eval_tidy(aes, plot$data)
    # Assign unique values to breaks
    object$breaks <- sort(unique(values))
  }
  plot$scales$add(object)
  plot
}

Now you can use it like any other scale:

ggplot(mpg, aes(cyl, cty)) + 
  geom_point() +
  scale_x_unique()


Created on 2021-08-11 by the reprex package (v1.0.0)

This of course only works if the aesthetic is defined in the global plot call and the data is available in the global plot. You could in theory traverse all layers and keep updating your unique values, but this becomes cumbersome.
