What Does ..Level.. Mean in Ggplot::Stat_Density2D

what does ..level.. mean in ggplot::stat_density2d

Expanding on the answer provided by @hrbrmstr -- first, the call to geom_density2d() is redundant. That is, you can achieve the same results with:

library(ggplot2)
library(MASS)

gg <- ggplot(geyser, aes(x = duration, y = waiting)) + 
    geom_point() + 
    stat_density2d(aes(fill = ..level..), geom = "polygon")

Let's consider some other ways to visualize this density estimate that may help clarify what is going on:

base_plot <- ggplot(geyser, aes(x = duration, y = waiting)) + 
  geom_point()

base_plot + 
  stat_density2d(aes(color = ..level..))

Plot1

base_plot + 
  stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE)

Plot2

base_plot +
  stat_density2d(aes(alpha = ..density..), geom = "tile", contour = FALSE)

Notice, however, we can no longer see the points generated from geom_point().

Plot3

Finally, note that you can control the bandwidth of the density estimate. To do this, we pass x and y bandwidth arguments to h (see ?kde2d):

base_plot +
  stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE,
                 h = c(2, 5))

Plot4

Again, the points from geom_point() are hidden as they are behind the call to stat_density2d().

What does fill=..level.. mean in ggplot?

The dot-dot notation ..some_var.. allows to access statistics computed by ggplot to construct the plot. In this case, the levels for the 2d-density. You can, instead, extract and use them with stat(): see related question on RStudio community

stat_density2d - What does the legend mean?

Let's take the faithful example from ggplot2:

ggplot(faithful, aes(x = eruptions, y = waiting)) +
  stat_density_2d(aes(fill = factor(stat(level))), geom = "polygon") +
  geom_point() +
  xlim(0.5, 6) +
  ylim(40, 110)

(apologies in advance for not making this prettier)

Sample Image

The level is the height at which the 3D "mountains" were sliced. I don't know of a way (others might) to translate that to a percentage but I do know to get you said percentages.

If we look at that chart, level 0.002 contains the vast majority of the points (all but 2). Level 0.004 is actually 2 polygons and they contain all but ~dozen of the points. If I'm getting the gist of what you're asking that's what you want to know, except not count but the percentage of points encompassed by polygons at a given level. That's straightforward to compute using the methodology from the various ggplot2 "stats" involved.

Note that while we're importing the tidyverse and sp packages we'll use some other functions fully-qualified. Now, let's reshape the faithful data a bit:

library(tidyverse)
library(sp)

xdf <- select(faithful, x = eruptions, y = waiting)

(easier to type x and y)

Now, we'll compute the two-dimensional kernel density estimation the way ggplot2 does:

h <- c(MASS::bandwidth.nrd(xdf$x), MASS::bandwidth.nrd(xdf$y))

dens <- MASS::kde2d(
  xdf$x, xdf$y, h = h, n = 100,
  lims = c(0.5, 6, 40, 110)
)

breaks <- pretty(range(zdf$z), 10)

zdf <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))

z <- tapply(zdf$z, zdf[c("x", "y")], identity)

cl <- grDevices::contourLines(
  x = sort(unique(dens$x)), y = sort(unique(dens$y)), z = dens$z,
  levels = breaks
)

I won't clutter the answer with str() output but it's kinda fun looking at what happens there.

We can use spatial ops to figure out how many points fall within given polygons, then we can group the polygons at the same level to provide counts and percentages per-level:

SpatialPolygons(
  lapply(1:length(cl), function(idx) {
    Polygons(
      srl = list(Polygon(
        matrix(c(cl[[idx]]$x, cl[[idx]]$y), nrow=length(cl[[idx]]$x), byrow=FALSE)
      )),
      ID = idx
    )
  })
) -> cont

coordinates(xdf) <- ~x+y

data_frame(
  ct = sapply(over(cont, geometry(xdf), returnList = TRUE), length),
  id = 1:length(ct),
  lvl = sapply(cl, function(x) x$level)
) %>% 
  count(lvl, wt=ct) %>% 
  mutate(
    pct = n/length(xdf),
    pct_lab = sprintf("%s of the points fall within this level", scales::percent(pct))
  )
## # A tibble: 12 x 4
##      lvl     n    pct pct_lab                              
##    <dbl> <int>  <dbl> <chr>                                
##  1 0.002   270 0.993  99.3% of the points fall within this level
##  2 0.004   259 0.952  95.2% of the points fall within this level
##  3 0.006   249 0.915  91.5% of the points fall within this level
##  4 0.008   232 0.853  85.3% of the points fall within this level
##  5 0.01    206 0.757  75.7% of the points fall within this level
##  6 0.012   175 0.643  64.3% of the points fall within this level
##  7 0.014   145 0.533  53.3% of the points fall within this level
##  8 0.016    94 0.346  34.6% of the points fall within this level
##  9 0.018    81 0.298  29.8% of the points fall within this level
## 10 0.02     60 0.221  22.1% of the points fall within this level
## 11 0.022    43 0.158  15.8% of the points fall within this level
## 12 0.024    13 0.0478  4.8% of the points fall within this level

I only spelled it out to avoid blathering more but the percentages will change depending on how you modify the various parameters to the density computation (same holds true for my ggalt::geom_bkde2d() which uses a different estimator).

If there is a way to tease out the percentages without re-performing the calculations there's no better way to have that pointed out than by letting other SO R folks show how much more clever they are than the person writing this answer (hopefully in more diplomatic ways than seem to be the mode of late).

How to correctly interpret ggplot's stat_density2d

HPDregionplot in package:emdbook is supposed to do that. It does use MASS::kde2d but it normalizes the result. It has the disadvantage to my mind that it requires an mcmc object.

library(MASS)
library(coda)
HPDregionplot(mcmc(data.matrix(df)), prob=0.8)
with(df, points(x,y))

Sample Image

colouring density of stat_density2d in ggplot with ggmap

I had to figure this out just last week for work. geom_density2d creates a path, which by definition has no fill. To get a fill, you need a polygon. So instead of geom_density2d, you need to call stat_density2d(geom = "polygon").

The dataframe is just random data that gave a nice density; you should be able to adapt the same for your map (I was making a very similar map for work, so using stat_density2d should be fine).

Also note calc(level) is the replacement for ..level.., but I think it's only in the github version of ggplot2, so if you're using the CRAN version, just swap that for the older ..level..

library(tidyverse)

set.seed(1234)
df <- tibble(
    x = c(rnorm(100, 1, 3), rnorm(50, 2, 2.5)),
    y = c(rnorm(80, 5, 4), rnorm(30, 15, 2), rnorm(40, 2, 1)) %>% sample(150)
)

ggplot(df, aes(x = x, y = y)) +
    stat_density2d(aes(fill = calc(level)), geom = "polygon") +
    geom_point()

Sample Image

Created on 2018-04-26 by the reprex package (v0.2.0).

Remove gaps in a stat_density2d ggplot chart without modifying XY limits

The trick is to expand the canvas using xlim and ylim so that there is enough room for ggplot to draw complete contours around the data. Then you can use tighter xlim and ylim parameters within the coord_fixed term to show the window you want...

ggplot() + 
  stat_density2d(data = Unit_J, aes(x=X, y=Y, fill=..level.., alpha=0.9), 
                                lwd= 0.05, bins=50, col="blue", geom="polygon") +
  scale_fill_continuous(low="blue",high="darkblue") +
  scale_alpha(range=c(0, 0.03), guide="none") + 
  xlim(-7000,-3500) + ylim(400,2500) + #expanded in x direction
  coord_fixed(expand=FALSE,xlim=c(-6600,-3800),ylim=c(400,2500)) + #added parameters 
  geom_point(data = Unit_J, aes(x=X, y=Y), alpha=0.5, cex=0.4, col="darkblue") + 
  theme_bw() + 
  theme(legend.position="none")

Sample Image

What Does ..Level.. Mean in Ggplot::Stat_Density2D