Ggplot2 Make Missing Value in Geom_Tile Not Blank

ggplot2 make missing value in geom_tile not blank

This issue can also be fixed by an option in scale_fill_continuous

scale_fill_continuous(na.value = 'salmon')

Edit below:

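This only fills in the explicitly missing values (i.e. rows that are present in the data with a value of NA). Combinations that are absent from the data altogether are still left blank. (It may have worked differently in previous versions of ggplot2, I'm too lazy to check.)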

See the following code for an example:

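library(tidyverse)

# 5 x 5 grid with a random value in every cell
Data <- expand.grid(x = 1:5, y = 1:5) %>%
  mutate(Value = rnorm(25))

# Implicitly missing: the rows for y == 3 are dropped, so those tiles stay blank
Data %>%
  filter(y != 3) %>%
  ggplot(aes(x = x, y = y, fill = Value)) +
  geom_tile() +
  scale_fill_continuous(na.value = 'salmon')

# Explicitly missing: 22 randomly chosen values are set to NA, so those tiles are drawn in salmon
Data %>%
  mutate(Value = ifelse(1:n() %in% sample(1:n(), 22), NA, Value)) %>%
  ggplot(aes(x = x, y = y, fill = Value)) +
  geom_tile() +
  scale_fill_continuous(na.value = 'salmon')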

An easy fix for this is to use the complete function to make the missing values explicit.

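# Keep 22 random rows, then let complete() add the dropped x/y combinations back as NA rows
Data %>%
  filter(1:n() %in% sample(1:n(), 22)) %>%
  complete(x, y) %>%
  ggplot(aes(x = x, y = y, fill = Value)) +
  geom_tile() +
  scale_fill_continuous(na.value = 'salmon')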

In some cases the expand function may be more useful than the complete function.
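As a rough sketch of the expand route: tidyr::expand() returns only the combinations of the listed columns, so the values have to be joined back on afterwards. The data here is made up for illustration (a 5 x 5 grid with two cells removed), and the names AllCells and Full are placeholders, not anything from the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: a 5 x 5 grid where two cells are implicitly missing
Data <- expand.grid(x = 1:5, y = 1:5) %>%
  mutate(Value = seq_len(n())) %>%
  filter(!(x == 2 & y == 3), !(x == 4 & y == 1))

# expand() gives every x/y combination seen in the data, without Value
AllCells <- Data %>% tidyr::expand(x, y)

# Joining the original rows back on turns the missing cells into explicit NA rows
Full <- AllCells %>% left_join(Data, by = c("x", "y"))
```

This is handy when you want to keep the grid of combinations around as its own object, for example to join several data sets onto the same grid.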

geom_tile - draw empty tiles

I'm finding other SO posts about removing NA from legends by just filtering out the NA rows. In this case, it's clearly intentional that those NAs are there. Instead, you can set the breaks for the fill scale to include just the numbers you need and not the NAs.

ggplot(df2, aes(x = x, y = y, fill = z)) +
  # with ggplot2 >= 3.4.0, use linewidth = 0.5 instead of size = 0.5
  geom_tile(size = 0.5, color = "black") +
  scale_fill_discrete(na.value = "white", breaks = 1:5)


If you need something more robust, like if the z-scale isn't just a simple count of 1 to 5, you could pull out a vector of the unique values of the variable, excluding NA, and pass that to the breaks argument.
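Something along these lines should work for pulling the breaks out of the data. The df2 below is only a made-up stand-in (the real one isn't shown here), and z_breaks is a name I picked for the example.

```r
library(ggplot2)

# Hypothetical stand-in for df2: z is a factor with a few NA cells
df2 <- expand.grid(x = 1:4, y = 1:4)
df2$z <- factor(c(10, 20, NA, 40, 20, NA, 10, 40,
                  40, 10, 20, NA, NA, 40, 10, 20))

# Unique non-NA values of z, in level order, as character breaks
z_breaks <- as.character(sort(unique(df2$z[!is.na(df2$z)])))

# NA tiles are still drawn (in white), but NA is left out of the legend
p <- ggplot(df2, aes(x = x, y = y, fill = z)) +
  geom_tile(color = "black") +
  scale_fill_discrete(na.value = "white", breaks = z_breaks)
```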

Missing values in heatmap

Use the complete function from tidyr to fill in missing factor combinations in your data.frame with NA.

Then you can use the na.value parameter in the color gradient to set the color.

library(ggplot2)
library(dplyr)
library(tidyr)

# z <- complete(All_Tissues_BP_Head, Tissue, GO.ID)

ggplot(complete(All_Tissues_BP_Head, Tissue, GO.ID), aes(Tissue, GO.ID)) +
  geom_tile(aes(fill = Transcript_Count), color = "black") +
  scale_fill_gradient2(low = "white", mid = "blue", high = "black",
                       midpoint = mean(All_Tissues_BP_Head$Transcript_Count),
                       na.value = "black") +
  scale_x_discrete(expand = c(0, 0)) +
  ggtitle(expression(atop(bold("Biological Processes")))) +
  theme(legend.title = element_text(size = 12),
        legend.text = element_text(size = 12)) +
  theme(axis.text = element_text(size = 12),
        axis.title.y = element_blank(),
        axis.title.x = element_text(size = 12)) +
  labs(fill = "Transcript \n count")

How to change fill colour of empty cells in ggplot heatmap

Sometimes the straightforward approach to doing something in ggplot2 is to manipulate your dataset prior to plotting. In this case, expanding your dataset so it contains all combinations you want to plot is an option.

I use tidyr::complete for this along with grouping from package dplyr. This adds a row for xvar of 1 through 7 for every label. If the row doesn't currently exist, it fills in missing values with NA. NA seems sufficient for your use case, but you could also use the fill argument to set the missing values to 0 or something.
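For reference, here is a minimal sketch of the fill argument mentioned above. The activeDF below is a small made-up stand-in for the real data, and filled is just a name for the result.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for activeDF: label 1044 only has rows for xvar 1 and 5
activeDF <- data.frame(
  label = c(rep(610L, 7), 1044L, 1044L),
  xvar  = c(1:7, 1L, 5L),
  value = c(0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 1L)
)

# Add the missing xvar rows per label and set their value to 0 instead of NA
filled <- activeDF %>%
  group_by(label) %>%
  complete(xvar = 1:7, fill = list(value = 0L)) %>%
  ungroup()
```

With fill set this way there are no NA cells left, so na.value no longer matters and the added tiles simply take the colour for 0.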

library(dplyr)
library(tidyr)

longdf = activeDF %>%
  group_by(label) %>%
  complete(xvar = 1:7)

longdf

# A tibble: 49 x 3
# Groups:   label [7]
   label  xvar value
   <int> <int> <int>
 1   610     1     0
 2   610     2     0
 3   610     3     0
 4   610     4     0
 5   610     5     1
 6   610     6     1
 7   610     7     0
 8  1044     1     0
 9  1044     2    NA
10  1044     3    NA
# ... with 39 more rows

Using the expanded dataset, your plot now expands through the entire range of the data and so the tiling is complete.

library(ggplot2)
library(RColorBrewer)  # provides brewer.pal()

ggplot(longdf, aes(xvar, factor(label))) +
  geom_tile(aes(fill = value), colour = 'black') +
  scale_x_continuous(breaks = round(seq(0, 7))) +
  scale_fill_gradient(high = brewer.pal(10, 'PiYG')[8], low = 'white', na.value = 'white')


R geom_tile on theme_void has blank space above and below plot

You could try to specify axis limits and suppress automatic axis expansion. Setting axis limits is more straightforward for factors: if needed, convert numeric tile coordinates to character (as.character) or factor (as.factor). Also remove the legend.

Example of stripped heatmap:

library(ggplot2)
library(dplyr)  # for %>%

data.frame(
  x = c('col_1', 'col_2', 'col_1', 'col_2'),
  y = c('row_1', 'row_1', 'row_2', 'row_2'),
  value = runif(4)
) %>%
  ggplot() +
  geom_tile(aes(x, y, fill = value)) +
  scale_x_discrete(limits = c('col_1', 'col_2'), expand = c(0, 0)) +
  scale_y_discrete(limits = c('row_1', 'row_2'), expand = c(0, 0)) +
  scale_fill_continuous(guide = 'none') +
  theme_void()

edit: for composite charts, there's also cowplot

geom_tile do not present true colors

TLDR

Several problems cause the fill color to concentrate around the "black" end of the color spectrum, but on closer inspection the real culprit is that there are multiple rows for each value of Taxonomy in the visualization, so your value of 95 (row 59) is most likely covered by a later value (row 65) with the same Taxonomy. This explains why you felt that 'geom_tile do not present true colors', as per your question title.

One would think the biggest offender is that massive outlier you have within the ICI column, hence squashing all the other values to a very small range. This can be verified:

summary(ici_table$ICI)

Outputs:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1870  0.8489  1.0489  3.4674  1.5457 95.0000 

From your question: "But surprisingly, the colors do not match the true values (i.e., I cannot find a yellow square for the ICI = 95 in my data)."

As I've pointed out in the TLDR, you're also not really going to see a yellow square corresponding to the exact value of ICI = 95 (row 59) in your visualization, because it belongs to the f__Lachnospiraceae g__-3 taxonomy and it isn't the only observed value for that taxonomy. geom_tile draws one tile per row, so tiles that share a position are drawn on top of each other and only the last one, here row 65, stays visible.


As a verification, if you plot only the last 15 rows of the data frame, you get fewer than 15 tiles in your visualization:

ggplot(data = tail(ici_table, 15), aes(x=Sample_ID, y=Taxonomy, fill=ICI)) + ...

To reiterate, you will not find 15 squares corresponding to the 15 rows of tail(ici_table, 15). So herein lies the main problem: you need to aggregate the values, say by the sum or the mean, before visualizing them on the heatmap.

In other words, since you likely want to plot a heatmap using aggregate values (sum of values for all observations in each Taxonomy), you should perform the aggregation prior to plotting the heatmap.
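A quick way to check for this before plotting is to count the rows per tile position. This is only a sketch: the small ici_table below is invented for illustration, and dupes and n_rows are names I made up.

```r
library(dplyr)

# Hypothetical stand-in for ici_table: S1 / taxon_a appears twice
ici_table <- data.frame(
  Sample_ID = c("S1", "S1", "S1", "S2"),
  Taxonomy  = c("taxon_a", "taxon_a", "taxon_b", "taxon_a"),
  ICI       = c(95, 0.9, 1.1, 0.5)
)

# Tile positions that have more than one row, i.e. tiles that would overlap
dupes <- ici_table %>%
  count(Sample_ID, Taxonomy, name = "n_rows") %>%
  filter(n_rows > 1)
```

If dupes comes back with any rows, those are the positions where one tile hides another and where aggregating first makes a difference.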

Solution

The following code visualizes the average ICI (AvgICI) for each combination of Taxonomy and Sample_ID:

library(dplyr)
library(ggplot2)

ici_table %>%
  tail(15) %>%
  group_by(Taxonomy, Sample_ID) %>%
  summarise(AvgICI = mean(ICI)) %>%  # one value per tile
  ungroup() %>%
  ggplot(aes(x = Sample_ID, y = Taxonomy, fill = AvgICI)) +
  geom_tile(width = 0.08, height = 0.95) +
  scale_fill_gradientn(
    colors = c("black", "steelblue4",
               "steelblue3", "steelblue2", "steelblue",
               "yellow2", "yellow1", "yellow"),
    limits = c(0, 100)
  )

And this time you absolutely see the big outlier group (f__Lachnospiraceae g__-3) since their average value is 47.988 compared to the 0 to 1.3 range for values in other groups.

You may also set sum instead of mean to be the aggregation function and observe the differences.

brks <- c(0, 10, 25, 50, 75)
library(dplyr)
ici_table %>%
  group_by(Taxonomy, Sample_ID) %>%
  summarise(AvgICI = mean(ICI)) %>%
  ungroup() %>%
  ggplot(aes(x = Sample_ID, y = Taxonomy, fill = AvgICI)) +
  geom_tile(width = 0.08, height = 0.95) +
  scale_fill_gradientn(
    colors = c("black", "steelblue4",
               "steelblue3", "steelblue2", "steelblue",
               "yellow2", "yellow1", "yellow"),
    limits = c(0, 100),
    breaks = brks,
    labels = brks
  ) +
  labs(x = "ICI (Average)", y = "") +
  theme(panel.background = element_blank(),
        legend.position = "right",
        panel.border = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        axis.line.y = element_blank(),
        plot.margin = unit(c(1, 1, 1, -0.1), "cm"),
        panel.grid = element_blank())

Along with your theme settings, you can still clearly find the outlier group. Note that because most values sit close to the lower limit of the scale, most of the tiles will be approximately black.

geom_tile border missing at corners

The issue is now fixed by this update to ggplot2.

How to remove empty spaces between tiles in geom_tile and change tile size

OP. I noticed that in your response to another answer, you've refined your question a bit. I would recommend you edit your original question to reflect some of what you were looking to do, but here's the overall picture to summarize what you wanted to know:

  • How to remove the gray space between tiles
  • How to make the tiles smaller
  • How to make the tiles more square

Here's how to address each one in turn.

How to remove gray space between tiles

This was already answered in a comment and in the other answer from @dy_by. The tile geom has a width parameter (and a matching height parameter) that determines how big the tile is relative to the coordinate system, where width=1 means the tiles "touch" one another. This part is important, because the size of the tile on screen is not the same thing as its size relative to the coordinate system. If you set width=0.4, the tile takes up 40% of the distance between one discrete value on the x axis and the next. This means that with any value smaller than width=1, you will have "space" between the tiles.
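To see the effect of width for yourself, here's a small sketch on made-up data. grid_df, p_gap, p_full and the tile_width() helper are all names invented for this example; the helper reads the tile extents that ggplot2 computes via layer_data().

```r
library(ggplot2)

# Hypothetical 3 x 2 grid of discrete positions
grid_df <- expand.grid(x = c("a", "b", "c"), y = c("r1", "r2"))
grid_df$value <- seq_len(nrow(grid_df))

# width/height below 1 leave gaps; the defaults make the tiles touch
p_gap  <- ggplot(grid_df, aes(x, y, fill = value)) +
  geom_tile(width = 0.4, height = 0.4)
p_full <- ggplot(grid_df, aes(x, y, fill = value)) +
  geom_tile()

# Helper (not part of ggplot2): tile width in coordinate units
tile_width <- function(p) {
  d <- layer_data(p)
  unique(round(as.numeric(d$xmax) - as.numeric(d$xmin), 10))
}
```

Calling tile_width(p_gap) and tile_width(p_full) lets you compare the two plots: adjacent discrete positions are 1 unit apart, so anything under 1 shows up as a gap.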

How to make the tiles square

The tile geom draws a square tile, so the reason that your tiles are not square in the output has nothing to do with the geom - it has to do with your coordinate system and the graphics device drawing it in your program. By default, ggplot2 will draw your coordinate system in an aspect ratio to match that of your graphics device. Change the size of the device viewport (the window), and the aspect ratio of your coordinate system (and tiles) will change. There is an easy way to fix this to be "square", which is to use coord_fixed(). You can set any aspect ratio you want, but by default, it will be set to 1 (square).

How to make the tiles smaller

Again, the size of your tiles is not controlled by the geom_tile() function... or the coordinate system. It's controlled by the viewport you set in your graphics device. Note that the coordinate system and geoms will resize, but the text will remain constant. This means that if you scale down a viewport or window, your tiles will become smaller, but the size of the text will (relatively-speaking) seem larger. Try this out by calling ggsave() with different arguments for width= with your plot.

Putting it together

Therefore, here's my suggestion for how to change your code to fix all of that. Note I'm also suggesting you change the theme to theme_classic() or something similar, which removes the gridlines by default and the background color is set to white. It works well for tile maps like this.

p <- ggplot(df, aes(x = variable, y = id, label = value, fill = as.factor(value))) +
  geom_tile(colour = "white", alpha = 0.2, width = 1) +
  scale_fill_manual(values = c("yellow", "orange", "red", "green", "grey")) +
  labs(x = "Value", y = "id") +
  scale_x_discrete(expand = c(0, 0)) +
  scale_y_discrete(expand = c(0, 0)) +
  coord_fixed() +
  # theme_classic() is a complete theme, so it has to come before theme();
  # placed after, it would reset the axis.text.x settings
  theme_classic() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
p

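Now for saving that plot with different width= settings to show you how things change for sizing. The tiles stay square either way, since the aspect ratio is fixed at 1. If you leave out height=, ggsave() uses the height of the current graphics device, so the saved file may have some empty margin around the plot.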

ggsave("example_big.png", plot=p, width=12)


ggsave("example_small.png", plot=p, width=3)

