Can you make geom_ribbon leave a gap for missing values?
This looks like a bug in ggplot2: there seems to be a missing handle_na function that needs to be added as part of a new unified way of dealing with NA values.
Update:
The first version of this post defined an entire new ggproto object to fix this, but I realized that, as a one-liner workaround, you can just override the handle_na function, as I do in the code below (see the # fix GeomRibbon comment):
library(dplyr)
library(ggplot2)
library(grid)
set.seed(1)
# z must be a factor so that levels(test$z) below returns the series names
test <- data.frame(x = rep(1:10, 3), y = abs(rnorm(30)), z = factor(rep(LETTERS[1:3], 10))) %>%
  arrange(x, z)
test[test$x == 4, "y"] <- NA
test$ymax <- test$y
test$ymin <- 0
zl <- levels(test$z)
# stack each series on top of the previous one
for (i in 2:length(zl)) {
  zi <- test$z == zl[i]
  zi_1 <- test$z == zl[i - 1]
  test$ymin[zi] <- test$ymax[zi_1]
  test$ymax[zi] <- test$ymin[zi] + test$ymax[zi]
}
# fix GeomRibbon
GeomRibbon$handle_na <- function(data, params) { data }
ggplot(test, aes(x = x, y = y, ymax = ymax, ymin = ymin, fill = z)) +
  geom_ribbon() +
  scale_x_continuous(breaks = 1:10)
yielding:
How to efficiently drop missing data from geom_ribbon? Is there a simpler approach from ggplot2?
Could try:
ggplot(new_new_mtcars) +
geom_path(aes(x= wt, y = value, linetype = as.factor(Variable)), size = 0.71) +
geom_ribbon(data = . %>% filter(!is.na(grouping)),
aes(x = wt, fill = grouping, ymin = min(wt), ymax = max(wt)), alpha = .25)
A few other comments:
- You don't need to reference your data frame in the aes call (just fill = grouping is enough).
- If you want the transparency (alpha) parameter to take effect with a fixed value, you need to take it out of aes. Keep it inside when you are referencing a variable (e.g. you want to have different levels of alpha for certain groups/factors).
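To illustrate the second comment, here is a minimal sketch contrasting a fixed alpha with a mapped one. The data frame band and its columns (lo, hi, grp) are made up for this example and are not from the question. ggplot2 may warn that mapping alpha to a discrete variable is not advised, which is expected here.

```r
library(ggplot2)

# Hypothetical toy data: two bands, each with its own lower and upper bound.
band <- data.frame(
  x   = rep(1:5, 2),
  lo  = c(rep(0, 5), rep(2, 5)),
  hi  = c(rep(1, 5), rep(3, 5)),
  grp = rep(c("a", "b"), each = 5)
)

# Fixed transparency: alpha is a constant, so it goes outside aes().
p_fixed <- ggplot(band, aes(x, ymin = lo, ymax = hi, fill = grp)) +
  geom_ribbon(alpha = 0.25)

# Mapped transparency: alpha depends on a variable, so it goes inside aes().
p_mapped <- ggplot(band, aes(x, ymin = lo, ymax = hi, fill = grp)) +
  geom_ribbon(aes(alpha = grp))

# Inspect the alpha values that each layer actually ends up using.
fixed_alpha  <- unique(layer_data(p_fixed)$alpha)
mapped_alpha <- unique(layer_data(p_mapped)$alpha)
```

In the first plot every ribbon shares the same alpha, while in the second each level of grp gets its own value chosen by the alpha scale.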
How can I make geom_area() leave a gap for missing values?
It seems that the problem has to do with how the values are stacked. The error message tells you that the rows containing missing values were removed, so there is simply no gap present in the data that you are plotting.
However, geom_ribbon, of which geom_area is a special case, leaves gaps for missing values. geom_ribbon plots an area as well, but you have to specify the maximum and minimum y-values. So the trick can be done by calculating these values manually and then plotting with geom_ribbon(). Starting with your data frame test, I create the ymin and ymax data as follows:
test$ymax <-test$y
test$ymin <- 0
zl <- levels(test$z)
for ( i in 2:length(zl) ) {
zi <- test$z==zl[i]
zi_1 <- test$z==zl[i-1]
test$ymin[zi] <- test$ymax[zi_1]
test$ymax[zi] <- test$ymin[zi] + test$ymax[zi]
}
and then plot with geom_ribbon:
ggplot(test, aes(x=x,ymax=ymax,ymin=ymin, fill=z)) + geom_ribbon()
This gives the following plot:
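The ymin and ymax columns can also be computed without an explicit loop. The following is a sketch in base R, not the answer's own method: the data frame below is a made-up stand-in for the question's test data, and it assumes the rows are sorted by x and then z so that the running sum stacks the series in order.

```r
# Hypothetical stand-in for the data frame in the question:
# three series (z) observed at x = 1..5, with y missing at x = 3.
test <- expand.grid(z = factor(c("A", "B", "C")), x = 1:5)
test$y <- seq_len(nrow(test)) / 10
test$y[test$x == 3] <- NA

# Rows must be ordered by x, then z, so the running sum stacks A, B, C in order.
test <- test[order(test$x, test$z), ]

# Upper edge: running total of y within each x.
test$ymax <- ave(test$y, test$x, FUN = cumsum)

# Lower edge: the upper edge of the series below (0 for the first series).
test$ymin <- ave(test$ymax, test$x, FUN = function(v) c(0, head(v, -1)))
```

Because cumsum propagates NA, the rows at x = 3 keep NA in ymax, which is what lets geom_ribbon leave a gap there.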
How to add a discontinuity when plotting geom_line and geom_ribbon in R?
You can add a grouping column to mark X values above and below the cutoff. In this case, I've hard-coded the criterion, but in general you can do it programmatically if you have criteria for where the discontinuities should be.
For example:
ggplot(data.1, aes(X, mean.y, group=X<5)) +
geom_line(color="red") +
geom_ribbon(aes(ymin=mean.y-sd.y, ymax=mean.y+sd.y), alpha=0.4) +
scale_x_continuous(limits=c(0,11), breaks = 0:12) +
theme_bw() +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank())
Or, if our criterion is to have a discontinuity whenever the distance between x-values is greater than one:
data.1 %>%
mutate(g = c(0, cumsum(diff(X) > 1))) %>%
ggplot(aes(X, mean.y, group=g)) +
geom_line(color="red") +
geom_ribbon(aes(ymin=mean.y-sd.y, ymax=mean.y+sd.y), alpha=0.4) +
scale_x_continuous(limits=c(0,11), breaks = 0:12) +
theme_bw() +
theme(panel.grid.minor = element_blank(),
panel.grid.major = element_blank())
Either way, here's the resulting plot:
Here's some additional explanation to answer the question in the comment regarding how the mutate step creates the grouping column: We want to create a grouping variable that separates X values before and after a discontinuity. In the code above, we do that with a combination of the diff and cumsum functions.
diff calculates lagged differences. For example:
diff(data.1$X)
[1] 1 1 3 1 1 1 1 1
Note that one of the differences (the one between 3 and 6) is 3. Now let's add a logical condition:
diff(data.1$X) > 1
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
So now we have a vector of logical values where TRUE marks differences greater than one. cumsum will treat TRUE as equal to 1 and FALSE as equal to zero. The value of the cumulative sum will increment by one each time we encounter a TRUE, and will stay constant when we encounter a FALSE.
cumsum(diff(data.1$X) > 1)
[1] 0 0 1 1 1 1 1 1
Okay, now we have two groups, marking the X values before and after the discontinuity (if there are multiple discontinuities, we'll get a new group for each one). But we're not quite done.
Note that diff takes a vector of length n and returns a vector of length n-1. This is simply because there are only n-1 lagged differences between n values. Thus, we add a leading zero to get a vector that's the same length as the input data:
c(0, cumsum(diff(data.1$X) > 1))
[1] 0 0 0 1 1 1 1 1 1
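The whole chain can be tried on a standalone vector. In this sketch, X is an assumed set of values chosen to match the diff output shown above (the original data.1 is not reproduced here), and X_multi is an additional made-up vector with two gaps.

```r
# Assumed X values, consistent with the diff() output shown above.
X <- c(1, 2, 3, 6, 7, 8, 9, 10, 11)

gaps <- diff(X) > 1          # TRUE where consecutive values are more than 1 apart
g <- c(0, cumsum(gaps))      # leading 0 pads the result back to length(X)

# A second, made-up vector with two gaps produces three groups.
X_multi <- c(1, 2, 5, 6, 10, 11)
g_multi <- c(0, cumsum(diff(X_multi) > 1))
```

Here g is 0 for the values 1 to 3 and 1 for the values 6 to 11, and g_multi takes the values 0, 1 and 2, one per run of consecutive values.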
possible bug in geom_ribbon
Here is a solution. I replaced data = d1[d1$big == "B",] in the first geom_ribbon function with:
data = rbind(d1[d1$big == "B",],
d1[c((which(diff(as.numeric(d1$big)) == -1) + 1),
(which(diff(as.numeric(d1$big)) == 1))), ])
This is necessary since the first and last rows of d1$big == "B" sequences often contain different csa and csb values. As a result, there is a visible ribbon connecting the data. The above command uses the last rows before and the first rows after these sequences together with the data for the first ribbon.
This problem does not exist for d1$big == "A" (the base for the second ribbon).
The complete code:
ggplot() +
geom_line(data = d2,
aes(x = time, y = value, group = variable, color = variable)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_ribbon(data = rbind(d1[d1$big == "B",],
d1[c((which(diff(as.numeric(d1$big)) == -1) + 1),
(which(diff(as.numeric(d1$big)) == 1))), ]),
aes(x = time, ymin = csa, ymax = csb),
alpha = .25, fill = "#9999CC") +
geom_ribbon(data = d1[d1$big == "A",],
aes(x = time, ymin = csb, ymax = csa),
alpha = .25, fill = "#CC6666") +
scale_color_manual(values = c("#CC6666" , "#9999CC"))
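To see which rows the two which(diff(...)) expressions pick out, here is a small sketch on a made-up factor. The vector big below is hypothetical and only stands in for d1$big, with "A" coded as 1 and "B" as 2.

```r
# Hypothetical toy version of d1$big; the real column comes from the question.
big <- factor(c("A", "A", "B", "B", "A", "B", "A"))

code <- as.numeric(big)   # A -> 1, B -> 2
step <- diff(code)        # +1 where A is followed by B, -1 where B is followed by A

first_after_B <- which(step == -1) + 1   # first "A" row after each run of "B"
last_before_B <- which(step == 1)        # last "A" row before each run of "B"

extra_rows <- c(first_after_B, last_before_B)
```

In this toy case extra_rows is 5, 7, 2, 5: rows 2 and 5 precede a run of "B", and rows 5 and 7 follow one. Row 5 appears twice because it sits between two runs of "B".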
How to avoid the connection lines in geom_line or geom_path when there is no data?
This is another one that was slightly more complicated than I originally thought, but I think I have a solution that seems to work. At first glance, it seems you could just set data=stat_total[which(stat_total$conc_mean!=0),], which would mean only those values greater than 0 would be plotted... but that doesn't work. The reason is simply that ggplot will still connect the line all the way through via geom_path and draw the ribbon via geom_ribbon, since data exists to the right and left of those 0 values.
The key here is to understand that we want to change and assign the group= aesthetic. This controls connectivity of geoms like lines. It's easily demonstrated via the following:
d <- data.frame(x=1:10, y=1:10, grp=c(rep(1,4),2,rep(3,5)))
ggplot(d, aes(x,y)) + theme_bw() +
geom_line(aes(group=grp)) + geom_point()
So the theoretical solution to your example will involve getting a group= aesthetic to apply to "sections" of stat_total$conc_mean that don't equal zero, while also just not plotting when stat_total$conc_mean equals zero. Critically, the "sections" need to have different group= aesthetic values. If they don't, then we'll just get the whole thing connected like you have now, since again--there still exists data to the right and left of those zeros, so ggplot will just draw a line through them.
Solution
First, I arranged your data frame by stat_total$gas and then stat_total$date_mean.
df <- arrange(stat_total, gas, date_mean)
Then, I wanted to:
(1) Create a column that indicates whether stat_total$conc_mean was 0 or contained a value > 0. I concede there is probably a more elegant way to accomplish the goal here without this step, but this part also makes it easier to follow the solution.
df$a <- ifelse(df$conc_mean==0, NA, 1)
(2) Use a function to create a new grouping column. The function steps through a vector and stores a count number (g_num) into a return vector in that position when there is a number, but stores NA and increments g_num when it finds an NA. The result is a return vector that has the sequence of numbers we want here.
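my_func <- function(x) {
  g_num <- 1
  return_vect <- vector(mode = 'double', length = length(x))
  # seq_along() is safe for zero-length input, unlike 1:length(x)
  for (i in seq_along(x)) {
    if (is.na(x[i])) {
      # missing value: no group here, and the next run gets a new number
      return_vect[i] <- NA
      g_num <- g_num + 1
    } else {
      return_vect[i] <- g_num
    }
  }
  return(return_vect)
}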
# create the new column
df$g <- my_func(df$a)
An example of how it works is shown below:
> test <- c(1,1,1,NA,NA,1,1,NA,1,1)
> test
[1] 1 1 1 NA NA 1 1 NA 1 1
> my_func(test)
[1] 1 1 1 NA NA 3 3 NA 4 4
(3) Plot it. It's the same as your original code, but we use the new column as the group= aesthetic, and also only plot values > 0 for stat_total$conc_mean (so you avoid getting a line at the bottom of the graph for certain sections).
ggplot(df[which(df$conc_mean != 0), ], aes(color = gas, group = g)) +
  geom_path(aes(x = date_mean, y = conc_mean, color = gas), size = 1.2, na.rm = TRUE) +
  geom_ribbon(aes(x = date_mean, ymin = conc_min, ymax = conc_max, fill = gas), color = "grey70", alpha = 0.4, na.rm = TRUE) +
  scale_x_datetime(date_breaks = "3 weeks", date_labels = "%d-%b") +
  xlab(NULL) +
  ylab('[ppb]') +
  theme_bw() +
  facet_wrap(gas ~ ., scales = 'free_x', ncol = 1, nrow = 2)
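The grouping function from step (2) can also be written without a loop. This is a sketch of an alternative, not the code used above, and the name group_runs is made up for illustration. It relies on the fact that the running count of NA values seen so far, plus one, equals the g_num that the loop would hold at each position.

```r
# Hypothetical vectorized equivalent of the loop-based grouping function.
group_runs <- function(x) {
  # cumsum(is.na(x)) counts the NAs seen up to each position;
  # adding 1 reproduces the loop's counter, and NA positions stay NA.
  ifelse(is.na(x), NA, cumsum(is.na(x)) + 1)
}

test <- c(1, 1, 1, NA, NA, 1, 1, NA, 1, 1)
g <- group_runs(test)
```

For the example vector shown in step (2), g is 1 1 1 NA NA 3 3 NA 4 4, the same as the output of the loop. The group numbers skip values where several NAs are adjacent, which is harmless because they only need to differ between sections.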
stacking geom_ribbon
You just didn't map a variable to y in your geom_ribbon call. Adding y = y causes it to work for me. In general, geom_ribbon doesn't require a y aesthetic, but I believe it does in the case of stacking. I presume there's well-thought-out reasoning for why that is, but you never know...
Also, all the source code for ggplot2 is on GitHub.
ggplot ribbon colour variation dependent on value with variable y axis
Your example data doesn't seem to involve any crossing of one line over the other, so it is difficult to see how it could be used to demonstrate a changing of the ribbon color according to which line was higher.
I have therefore produced a simple example data set:
set.seed(8)
df <- data.frame(day = seq(as.Date("2019-01-01"), by = "day", length = 100),
positive = cumsum(rnorm(100)),
negative = cumsum(rnorm(100)))
head(df)
#> day positive negative
#> 1 2019-01-01 -0.08458607 0.2968513
#> 2 2019-01-02 0.75581405 -1.6037223
#> 3 2019-01-03 0.29233128 -3.2510879
#> 4 2019-01-04 -0.25850372 -5.0294934
#> 5 2019-01-05 0.47753671 -4.9945500
#> 6 2019-01-06 0.36965531 -5.4890964
An individual ribbon cannot change its aesthetic along its length, so if you want multiple different regions colored differently, you will need to group them individually. For a line that crosses several times, we can do it like this:
library(ggplot2)
library(dplyr)
df %>%
mutate(effect = factor(cumsum(abs(c(0, diff(positive > negative)))))) %>%
ggplot(aes(day, positive)) +
geom_line(aes(color = "positive"), size = 1) +
geom_line(aes(y = negative, color = "negative"), size = 1) +
geom_ribbon(aes(fill = effect, ymin = negative, ymax = positive),
color = NA, alpha = 0.1) +
scale_color_manual(values = c("red", "forestgreen"), name = "") +
scale_fill_manual(values = rep(c("red", "forestgreen"), 20),
guide = guide_none()) +
theme_bw()
Created on 2020-12-04 by the reprex package (v0.3.0)
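The grouping expression in the mutate call can be followed step by step on a short pair of vectors. The values of positive and negative below are made up for this sketch and are not the simulated data from the example above.

```r
# Hypothetical short series in which the two lines cross twice.
positive <- c(1, 2, 0, -1, 3)
negative <- c(0, 1, 1,  0, 0)

above   <- positive > negative       # which line is on top at each point
crossed <- abs(c(0, diff(above)))    # 1 wherever the ordering flips, else 0
effect  <- cumsum(crossed)           # a new group number after every crossing
```

Here above is TRUE, TRUE, FALSE, FALSE, TRUE, so effect is 0, 0, 1, 1, 2. Each stretch between crossings gets its own group, and that is what lets each stretch be filled with a different color.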