What Does ..Level.. Mean in Ggplot::Stat_Density2D

what does ..level.. mean in ggplot::stat_density2d

Expanding on the answer provided by @hrbrmstr -- first, the call to geom_density2d() is redundant. That is, you can achieve the same results with:

library(ggplot2)
library(MASS)

gg <- ggplot(geyser, aes(x = duration, y = waiting)) +
geom_point() +
stat_density2d(aes(fill = ..level..), geom = "polygon")

Let's consider some other ways to visualize this density estimate that may help clarify what is going on:

base_plot <- ggplot(geyser, aes(x = duration, y = waiting)) + 
geom_point()

base_plot +
stat_density2d(aes(color = ..level..))

Plot1

base_plot + 
stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE)

Plot2

base_plot +
stat_density2d(aes(alpha = ..density..), geom = "tile", contour = FALSE)

Notice, however, we can no longer see the points generated from geom_point().

Plot3

Finally, note that you can control the bandwidth of the density estimate. To do this, we pass x and y bandwidth arguments to h (see ?kde2d):

base_plot +
stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE,
h = c(2, 5))

Plot4

Again, the points from geom_point() are hidden as they are behind the call to stat_density2d().

What does fill=..level.. mean in ggplot?

The dot-dot notation ..some_var.. allows to access statistics computed by ggplot to construct the plot. In this case, the levels for the 2d-density. You can, instead, extract and use them with stat(): see related question on RStudio community

stat_density2d - What does the legend mean?

Let's take the faithful example from ggplot2:

ggplot(faithful, aes(x = eruptions, y = waiting)) +
stat_density_2d(aes(fill = factor(stat(level))), geom = "polygon") +
geom_point() +
xlim(0.5, 6) +
ylim(40, 110)

(apologies in advance for not making this prettier)

Sample Image

The level is the height at which the 3D "mountains" were sliced. I don't know of a way (others might) to translate that to a percentage but I do know to get you said percentages.

If we look at that chart, level 0.002 contains the vast majority of the points (all but 2). Level 0.004 is actually 2 polygons and they contain all but ~dozen of the points. If I'm getting the gist of what you're asking that's what you want to know, except not count but the percentage of points encompassed by polygons at a given level. That's straightforward to compute using the methodology from the various ggplot2 "stats" involved.

Note that while we're importing the tidyverse and sp packages we'll use some other functions fully-qualified. Now, let's reshape the faithful data a bit:

library(tidyverse)
library(sp)

xdf <- select(faithful, x = eruptions, y = waiting)

(easier to type x and y)

Now, we'll compute the two-dimensional kernel density estimation the way ggplot2 does:

h <- c(MASS::bandwidth.nrd(xdf$x), MASS::bandwidth.nrd(xdf$y))

dens <- MASS::kde2d(
xdf$x, xdf$y, h = h, n = 100,
lims = c(0.5, 6, 40, 110)
)

breaks <- pretty(range(zdf$z), 10)

zdf <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))

z <- tapply(zdf$z, zdf[c("x", "y")], identity)

cl <- grDevices::contourLines(
x = sort(unique(dens$x)), y = sort(unique(dens$y)), z = dens$z,
levels = breaks
)

I won't clutter the answer with str() output but it's kinda fun looking at what happens there.

We can use spatial ops to figure out how many points fall within given polygons, then we can group the polygons at the same level to provide counts and percentages per-level:

SpatialPolygons(
lapply(1:length(cl), function(idx) {
Polygons(
srl = list(Polygon(
matrix(c(cl[[idx]]$x, cl[[idx]]$y), nrow=length(cl[[idx]]$x), byrow=FALSE)
)),
ID = idx
)
})
) -> cont

coordinates(xdf) <- ~x+y

data_frame(
ct = sapply(over(cont, geometry(xdf), returnList = TRUE), length),
id = 1:length(ct),
lvl = sapply(cl, function(x) x$level)
) %>%
count(lvl, wt=ct) %>%
mutate(
pct = n/length(xdf),
pct_lab = sprintf("%s of the points fall within this level", scales::percent(pct))
)
## # A tibble: 12 x 4
## lvl n pct pct_lab
## <dbl> <int> <dbl> <chr>
## 1 0.002 270 0.993 99.3% of the points fall within this level
## 2 0.004 259 0.952 95.2% of the points fall within this level
## 3 0.006 249 0.915 91.5% of the points fall within this level
## 4 0.008 232 0.853 85.3% of the points fall within this level
## 5 0.01 206 0.757 75.7% of the points fall within this level
## 6 0.012 175 0.643 64.3% of the points fall within this level
## 7 0.014 145 0.533 53.3% of the points fall within this level
## 8 0.016 94 0.346 34.6% of the points fall within this level
## 9 0.018 81 0.298 29.8% of the points fall within this level
## 10 0.02 60 0.221 22.1% of the points fall within this level
## 11 0.022 43 0.158 15.8% of the points fall within this level
## 12 0.024 13 0.0478 4.8% of the points fall within this level

I only spelled it out to avoid blathering more but the percentages will change depending on how you modify the various parameters to the density computation (same holds true for my ggalt::geom_bkde2d() which uses a different estimator).

If there is a way to tease out the percentages without re-performing the calculations there's no better way to have that pointed out than by letting other SO R folks show how much more clever they are than the person writing this answer (hopefully in more diplomatic ways than seem to be the mode of late).

How to correctly interpret ggplot's stat_density2d

HPDregionplot in package:emdbook is supposed to do that. It does use MASS::kde2d but it normalizes the result. It has the disadvantage to my mind that it requires an mcmc object.

library(MASS)
library(coda)
HPDregionplot(mcmc(data.matrix(df)), prob=0.8)
with(df, points(x,y))

Sample Image

colouring density of stat_density2d in ggplot with ggmap

I had to figure this out just last week for work. geom_density2d creates a path, which by definition has no fill. To get a fill, you need a polygon. So instead of geom_density2d, you need to call stat_density2d(geom = "polygon").

The dataframe is just random data that gave a nice density; you should be able to adapt the same for your map (I was making a very similar map for work, so using stat_density2d should be fine).

Also note calc(level) is the replacement for ..level.., but I think it's only in the github version of ggplot2, so if you're using the CRAN version, just swap that for the older ..level..

library(tidyverse)

set.seed(1234)
df <- tibble(
x = c(rnorm(100, 1, 3), rnorm(50, 2, 2.5)),
y = c(rnorm(80, 5, 4), rnorm(30, 15, 2), rnorm(40, 2, 1)) %>% sample(150)
)

ggplot(df, aes(x = x, y = y)) +
stat_density2d(aes(fill = calc(level)), geom = "polygon") +
geom_point()

Sample Image

Created on 2018-04-26 by the reprex package (v0.2.0).

Remove gaps in a stat_density2d ggplot chart without modifying XY limits

The trick is to expand the canvas using xlim and ylim so that there is enough room for ggplot to draw complete contours around the data. Then you can use tighter xlim and ylim parameters within the coord_fixed term to show the window you want...

ggplot() + 
stat_density2d(data = Unit_J, aes(x=X, y=Y, fill=..level.., alpha=0.9),
lwd= 0.05, bins=50, col="blue", geom="polygon") +
scale_fill_continuous(low="blue",high="darkblue") +
scale_alpha(range=c(0, 0.03), guide="none") +
xlim(-7000,-3500) + ylim(400,2500) + #expanded in x direction
coord_fixed(expand=FALSE,xlim=c(-6600,-3800),ylim=c(400,2500)) + #added parameters
geom_point(data = Unit_J, aes(x=X, y=Y), alpha=0.5, cex=0.4, col="darkblue") +
theme_bw() +
theme(legend.position="none")

Sample Image



Related Topics



Leave a reply



Submit