Can You Make geom_ribbon Leave a Gap for Missing Values

Can you make geom_ribbon leave a gap for missing values?

This looks like a bug in ggplot2: GeomRibbon seems to be missing a handle_na function, which needs to be added as part of a new unified way of dealing with NA values.

Update:

The first version of this answer defined an entirely new ggproto object to fix this, but I realized
that, as a one-line workaround, you can simply override the handle_na function, as in the line marked # fix GeomRibbon in the code below:

library(dplyr)
library(ggplot2)
library(grid)

set.seed(1)

# z must be a factor so that levels(test$z) works below
test <- data.frame(x = rep(1:10, 3), y = abs(rnorm(30)),
                   z = factor(rep(LETTERS[1:3], 10))) %>%
  arrange(x, z)

test[test$x == 4, "y"] <- NA

# compute the stacked ribbon limits by hand
test$ymax <- test$y
test$ymin <- 0
zl <- levels(test$z)
for (i in 2:length(zl)) {
  zi <- test$z == zl[i]
  zi_1 <- test$z == zl[i - 1]
  test$ymin[zi] <- test$ymax[zi_1]
  test$ymax[zi] <- test$ymin[zi] + test$ymax[zi]
}

# fix GeomRibbon
GeomRibbon$handle_na <- function(data, params) { data }

ggplot(test, aes(x = x, y = y, ymax = ymax, ymin = ymin, fill = z)) +
  geom_ribbon() +
  scale_x_continuous(breaks = 1:10)

yielding:

Sample Image

How to efficiently drop missing data from geom_ribbon? Is there a simpler approach from ggplot2?

You could try:

ggplot(new_new_mtcars) +
  geom_path(aes(x = wt, y = value, linetype = as.factor(Variable)), size = 0.71) +
  geom_ribbon(data = . %>% filter(!is.na(grouping)),
              aes(x = wt, fill = grouping, ymin = min(wt), ymax = max(wt)), alpha = .25)

A few other comments:

  • You don't need to reference your data frame in the aes call (just fill = grouping is enough),
  • If you want the transparency (alpha) parameter to take effect with a fixed value, you need to take it out of aes. Keep it inside when you are referencing a variable (e.g. you want to have different levels of alpha for certain groups/factors).
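
As a rough illustration of the second point, the sketch below builds the same ribbon twice: once with a fixed alpha outside aes(), and once with alpha mapped to a column inside aes(). The data frame toy and its columns lo, hi and grp are made up for this example and are not the data from the question.

```r
library(ggplot2)

# Hypothetical toy data, not the data from the question
toy <- data.frame(
  x   = 1:4,
  lo  = 0,
  hi  = c(1, 2, 2, 1),
  grp = c("a", "a", "b", "b")
)

# Fixed transparency: alpha is a constant, so it goes outside aes()
p_fixed <- ggplot(toy, aes(x, ymin = lo, ymax = hi, fill = grp)) +
  geom_ribbon(alpha = 0.25)

# Data-driven transparency: alpha depends on a column, so it goes inside aes()
p_mapped <- ggplot(toy, aes(x, ymin = lo, ymax = hi, fill = grp)) +
  geom_ribbon(aes(alpha = grp))
```

In the first plot every ribbon gets the same transparency; in the second, ggplot2 picks one alpha level per value of grp and adds a legend for it.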

How can I make geom_area() leave a gap for missing values?

It seems that the problem has to do with how the values are stacked. The warning message tells you that the rows containing missing values were removed, so there is simply no gap present in the data that you are plotting.

However, geom_ribbon, of which geom_area is a special case, leaves gaps for missing values. geom_ribbon plots an area as well, but you have to specify the maximum and minimum y-values. So the trick can be done by calculating these values manually and then plotting with geom_ribbon(). Starting with your data frame test, I create the ymin and ymax data as follows:

test$ymax <- test$y
test$ymin <- 0
zl <- levels(test$z)
for (i in 2:length(zl)) {
  zi <- test$z == zl[i]
  zi_1 <- test$z == zl[i - 1]
  test$ymin[zi] <- test$ymax[zi_1]
  test$ymax[zi] <- test$ymin[zi] + test$ymax[zi]
}

and then plot with geom_ribbon:

ggplot(test, aes(x = x, ymax = ymax, ymin = ymin, fill = z)) +
  geom_ribbon()

This gives the following plot:

Sample Image
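
The loop above is one way to build the stacked limits. As a hedged alternative, the same ymin and ymax values can probably be obtained with a per-x cumulative sum, assuming the rows are sorted by z within each x. The data frame toy below is a small made-up example, not the data from the question.

```r
# Hypothetical toy data: 3 x positions, 2 stacked series, x = 2 missing
toy <- data.frame(
  x = rep(1:3, each = 2),
  z = factor(rep(c("A", "B"), times = 3)),
  y = c(1, 2, NA, NA, 3, 4)
)

# Stacking order matters, so sort by x and then by z
toy <- toy[order(toy$x, toy$z), ]

# Upper edge: running total of y within each x; lower edge: total minus own y
toy$ymax <- ave(toy$y, toy$x, FUN = cumsum)
toy$ymin <- toy$ymax - toy$y

toy
```

The NA values at x = 2 carry through to ymin and ymax, which is what lets geom_ribbon leave a gap there.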

How to add a discontinuity when plotting geom_line and geom_ribbon in R?

You can add a grouping column to mark X values above and below the cutoff. In this case, I've hard-coded the criterion, but in general you can do it programmatically if you have criteria for where the discontinuities should be.

For example:

ggplot(data.1, aes(X, mean.y, group = X < 5)) +
  geom_line(color = "red") +
  geom_ribbon(aes(ymin = mean.y - sd.y, ymax = mean.y + sd.y), alpha = 0.4) +
  scale_x_continuous(limits = c(0, 11), breaks = 0:12) +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_blank())

Or, if our criterion is to have a discontinuity whenever the distance between x-values is greater than one:

data.1 %>%
  mutate(g = c(0, cumsum(diff(X) > 1))) %>%
  ggplot(aes(X, mean.y, group = g)) +
  geom_line(color = "red") +
  geom_ribbon(aes(ymin = mean.y - sd.y, ymax = mean.y + sd.y), alpha = 0.4) +
  scale_x_continuous(limits = c(0, 11), breaks = 0:12) +
  theme_bw() +
  theme(panel.grid.minor = element_blank(),
        panel.grid.major = element_blank())

Either way, here's the resulting plot:

Sample Image

Here's some additional explanation to answer the question in the comment regarding how the mutate step creates the grouping column: We want to create a grouping variable that separates X values before and after a discontinuity. In the code above, we do that with a combination of the diff and cumsum functions.

diff calculates lagged differences. For example:

diff(data.1$X)
[1] 1 1 3 1 1 1 1 1

Note that one of the differences (the one between 3 and 6) is 3. Now let's add a logical condition:

diff(data.1$X) > 1
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

So now we have a vector of logical values where TRUE marks differences greater than one. cumsum will treat TRUE as equal to 1 and FALSE as equal to zero. The value of the cumulative sum will increment by one each time we encounter a TRUE, and will stay constant when we encounter a FALSE.

cumsum(diff(data.1$X) > 1)
[1] 0 0 1 1 1 1 1 1

Okay, now we have two groups, marking the X values before and after the discontinuity (if there are multiple discontinuities, we'll get a new group for each one). But we're not quite done.

Note that diff takes a vector of length n and returns a vector of length n-1. This is simply because there are only n-1 lagged differences between n values. Thus, we add a leading zero to get a vector that's the same length as the input data:

c(0, cumsum(diff(data.1$X) > 1))
[1] 0 0 0 1 1 1 1 1 1
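
The steps above can be wrapped into a small helper. This is only a sketch: the function name gap_groups and its max_gap argument are made up here, and the X values are reconstructed from the diff() output shown above rather than taken from the original data.

```r
# Assumed x-values, reconstructed from the diff() output shown above
X <- c(1, 2, 3, 6, 7, 8, 9, 10, 11)

# Hypothetical helper: start a new group whenever the step between
# consecutive values exceeds max_gap
gap_groups <- function(x, max_gap = 1) {
  c(0, cumsum(diff(x) > max_gap))
}

g <- gap_groups(X)

# Show which X values ended up in which group
split(X, g)
```

Raising max_gap makes the grouping more tolerant of irregular spacing, so only larger jumps in X start a new line or ribbon segment.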

Possible bug in geom_ribbon

Here is a solution. I replaced data = d1[d1$big == "B",] in the first geom_ribbon function with:

data = rbind(d1[d1$big == "B", ],
             d1[c((which(diff(as.numeric(d1$big)) == -1) + 1),
                  (which(diff(as.numeric(d1$big)) == 1))), ])

This is necessary since the first and last rows of d1$big == "B" sequences often contain different csa and csb values. As a result, there is a visible ribbon connecting the data. The above command uses the last rows before and the first rows after these sequences together with the data for the first ribbon.
This problem does not exist for d1$big == "A" (the base for the second ribbon).
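
To make the index arithmetic easier to follow, here is a minimal sketch of what the two which(diff(...)) expressions select. The vector big is a made-up stand-in for d1$big, assumed to be a factor with levels "A" and "B".

```r
# Hypothetical stand-in for d1$big: a factor with levels "A" and "B"
big <- factor(c("A", "A", "B", "B", "B", "A", "A", "B", "B", "A"))

# as.numeric() turns A/B into 1/2, so diff() is +1 at A -> B and -1 at B -> A
step <- diff(as.numeric(big))

first_after <- which(step == -1) + 1  # first "A" row after each "B" run
last_before <- which(step == 1)       # last "A" row before each "B" run

first_after
last_before
```

Binding these neighbouring rows to the "B" rows gives the first ribbon a point on each side of every "B" run.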

The complete code:

ggplot() +
  geom_line(data = d2,
            aes(x = time, y = value, group = variable, color = variable)) +
  geom_hline(yintercept = 0, linetype = 2) +
  geom_ribbon(data = rbind(d1[d1$big == "B", ],
                           d1[c((which(diff(as.numeric(d1$big)) == -1) + 1),
                                (which(diff(as.numeric(d1$big)) == 1))), ]),
              aes(x = time, ymin = csa, ymax = csb),
              alpha = .25, fill = "#9999CC") +
  geom_ribbon(data = d1[d1$big == "A", ],
              aes(x = time, ymin = csb, ymax = csa),
              alpha = .25, fill = "#CC6666") +
  scale_color_manual(values = c("#CC6666", "#9999CC"))

Sample Image

How to avoid the connection lines in geom_line or geom_path when there is no data?

This is another one that was slightly more complicated than I originally thought, but I think I have a solution that seems to work. At first glance, it seems you could just set data=stat_total[which(stat_total$conc_mean!=0),], which would mean only those values greater than 0 would be plotted... but that doesn't work. The reason is simply that ggplot will still connect the line all the way through via geom_path and draw the ribbon via geom_ribbon, since data exists to the right and left of those 0 values.

The key here is to understand that we want to change and assign the group= aesthetic. This controls connectivity of geoms like lines. It's easily demonstrated via the following:

d <- data.frame(x=1:10, y=1:10, grp=c(rep(1,4),2,rep(3,5)))
ggplot(d, aes(x,y)) + theme_bw() +
geom_line(aes(group=grp)) + geom_point()

Sample Image

So the theoretical solution to your example will involve getting a group= aesthetic to apply to "sections" of stat_total$conc_mean that don't equal zero, while also just not plotting when stat_total$conc_mean equals zero. Critically, the "sections" need to have different group= aesthetic values. If they don't, then we'll just get the whole thing connected like you have now, since again--there still exists data to the right and left of those zeros, so ggplot will just draw a line through them.

Solution

First, I arranged your data frame by stat_total$gas and then stat_total$date_mean.

df <- arrange(stat_total, gas, date_mean)

Then, I wanted to

(1) Create a column that indicates whether stat_total$conc_mean is 0 or contains a value > 0. I concede there is probably a more elegant way to accomplish the goal without this step, but it also makes the solution easier to follow.

df$a <- ifelse(df$conc_mean==0, NA, 1)

(2) Use a function to create a new grouping column. The function steps through a vector and stores a count number (g_num) into a return vector in that position when there is a number, but stores NA and increments g_num when it finds an NA. The result is a return vector that has the sequence of numbers we want here.

my_func <- function(x) {
  g_num <- 1  # current group number
  return_vect <- vector(mode = "double", length = length(x))
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      # a gap: store NA and move on to the next group number
      return_vect[i] <- NA
      g_num <- g_num + 1
    } else {
      return_vect[i] <- g_num
    }
  }
  return(return_vect)
}

# create the new column
df$g <- my_func(df$a)

An example of how it works is shown below:

> test <- c(1,1,1,NA,NA,1,1,NA,1,1)
> test
[1] 1 1 1 NA NA 1 1 NA 1 1
> my_func(test)
[1] 1 1 1 NA NA 3 3 NA 4 4
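
For what it's worth, the same numbering can likely be produced without an explicit loop. The sketch below is a vectorized counterpart to my_func; the name group_runs is made up for this example. It relies on the fact that the loop's group number at any position equals one plus the number of NA values seen so far.

```r
# Hypothetical vectorized counterpart to my_func()
group_runs <- function(x) {
  gaps <- is.na(x)
  # cumsum(gaps) counts the NAs seen so far; NA positions themselves stay NA
  ifelse(gaps, NA, cumsum(gaps) + 1)
}

test <- c(1, 1, 1, NA, NA, 1, 1, NA, 1, 1)
group_runs(test)
```

On the test vector above this should give the same result as my_func(test). Consecutive NA values each bump the counter, so group numbers can skip (1, 3, 4), which is harmless because group= only needs the values to differ.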

(3) Plot it. It's the same as your original code, but we use the new column as the group= aesthetic, and also only plot values > 0 for stat_total$conc_mean (so you avoid getting a line at the bottom of the graph for certain sections).

ggplot(df[which(df$conc_mean != 0), ], aes(color = gas, group = g)) +
  geom_path(aes(x = date_mean, y = conc_mean, color = gas), size = 1.2, na.rm = TRUE) +
  geom_ribbon(aes(x = date_mean, ymin = conc_min, ymax = conc_max, fill = gas),
              color = "grey70", alpha = 0.4, na.rm = TRUE) +
  scale_x_datetime(date_breaks = "3 weeks", date_labels = "%d-%b") +
  xlab(NULL) +
  ylab("[ppb]") +
  theme_bw() +
  facet_wrap(gas ~ ., scales = "free_x", ncol = 1, nrow = 2)

Sample Image

stacking geom_ribbon

You just didn't map a variable to y in your geom_ribbon call. Adding y = y causes it to work for me. In general, geom_ribbon doesn't require a y aesthetic, but I believe it does in the case of stacking. I presume there's well-thought-out reasoning for why that is, but you never know...

Also, all the source code for ggplot2 is on GitHub.

ggplot ribbon colour variation dependent on value with variable y axis

Your example data doesn't seem to involve any crossing of one line over the other, so it is difficult to see how it could be used to demonstrate a changing of the ribbon color according to which line was higher.

I have therefore produced a simple example data set:

set.seed(8)

df <- data.frame(day = seq(as.Date("2019-01-01"), by = "day", length.out = 100),
                 positive = cumsum(rnorm(100)),
                 negative = cumsum(rnorm(100)))

head(df)
#>          day    positive   negative
#> 1 2019-01-01 -0.08458607  0.2968513
#> 2 2019-01-02  0.75581405 -1.6037223
#> 3 2019-01-03  0.29233128 -3.2510879
#> 4 2019-01-04 -0.25850372 -5.0294934
#> 5 2019-01-05  0.47753671 -4.9945500
#> 6 2019-01-06  0.36965531 -5.4890964

An individual ribbon cannot change its aesthetic along its length, so if you want multiple different regions colored differently, you will need to group them individually. For a line that crosses several times, we can do it like this:

library(ggplot2)
library(dplyr)

df %>%
  mutate(effect = factor(cumsum(abs(c(0, diff(positive > negative)))))) %>%
  ggplot(aes(day, positive)) +
  geom_line(aes(color = "positive"), size = 1) +
  geom_line(aes(y = negative, color = "negative"), size = 1) +
  geom_ribbon(aes(fill = effect, ymin = negative, ymax = positive),
              color = NA, alpha = 0.1) +
  scale_color_manual(values = c("red", "forestgreen"), name = "") +
  scale_fill_manual(values = rep(c("red", "forestgreen"), 20),
                    guide = guide_none()) +
  theme_bw()

Sample Image

Created on 2020-12-04 by the reprex package (v0.3.0)
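
To see what the effect column in the mutate step above contains, it may help to run the same expression on a short made-up pair of series. The vectors below are hypothetical toy values, not the simulated data from the answer.

```r
# Hypothetical toy series, not the simulated data above
positive <- c(1, 2, 3, 1, 0, 2)
negative <- c(0, 1, 4, 2, 1, 1)

# TRUE where the "positive" line is on top
above <- positive > negative

# diff() on a logical vector is +1 or -1 at each crossing and 0 elsewhere;
# abs() and cumsum() turn that into a counter that steps up at every crossing
effect <- cumsum(abs(c(0, diff(above))))

data.frame(positive, negative, above, effect)
```

Each run between crossings gets its own effect value, which is why the fill scale in the plot repeats the two colors many times: consecutive runs alternate between them.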


