what does ..level.. mean in ggplot::stat_density2d
Expanding on the answer provided by @hrbrmstr -- first, the call to geom_density2d()
is redundant. That is, you can achieve the same results with:
library(ggplot2)
library(MASS)
gg <- ggplot(geyser, aes(x = duration, y = waiting)) +
geom_point() +
stat_density2d(aes(fill = ..level..), geom = "polygon")
Let's consider some other ways to visualize this density estimate that may help clarify what is going on:
base_plot <- ggplot(geyser, aes(x = duration, y = waiting)) +
geom_point()
base_plot +
stat_density2d(aes(color = ..level..))
base_plot +
stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE)
base_plot +
stat_density2d(aes(alpha = ..density..), geom = "tile", contour = FALSE)
Notice, however, we can no longer see the points generated from geom_point()
.
Finally, note that you can control the bandwidth of the density estimate. To do this, we pass x
and y
bandwidth arguments to h
(see ?kde2d
):
base_plot +
stat_density2d(aes(fill = ..density..), geom = "raster", contour = FALSE,
h = c(2, 5))
Again, the points from geom_point()
are hidden as they are behind the call to stat_density2d()
.
What does fill=..level.. mean in ggplot?
The dot-dot notation ..some_var..
allows to access statistics computed by ggplot to construct the plot. In this case, the levels for the 2d-density. You can, instead, extract and use them with stat()
: see related question on RStudio community
stat_density2d - What does the legend mean?
Let's take the faithful
example from ggplot2:
ggplot(faithful, aes(x = eruptions, y = waiting)) +
stat_density_2d(aes(fill = factor(stat(level))), geom = "polygon") +
geom_point() +
xlim(0.5, 6) +
ylim(40, 110)
(apologies in advance for not making this prettier)
The level is the height at which the 3D "mountains" were sliced. I don't know of a way (others might) to translate that to a percentage but I do know to get you said percentages.
If we look at that chart, level 0.002
contains the vast majority of the points (all but 2). Level 0.004
is actually 2 polygons and they contain all but ~dozen of the points. If I'm getting the gist of what you're asking that's what you want to know, except not count but the percentage of points encompassed by polygons at a given level. That's straightforward to compute using the methodology from the various ggplot2 "stats" involved.
Note that while we're importing the tidyverse
and sp
packages we'll use some other functions fully-qualified. Now, let's reshape the faithful
data a bit:
library(tidyverse)
library(sp)
xdf <- select(faithful, x = eruptions, y = waiting)
(easier to type x
and y
)
Now, we'll compute the two-dimensional kernel density estimation the way ggplot2 does:
h <- c(MASS::bandwidth.nrd(xdf$x), MASS::bandwidth.nrd(xdf$y))
dens <- MASS::kde2d(
xdf$x, xdf$y, h = h, n = 100,
lims = c(0.5, 6, 40, 110)
)
breaks <- pretty(range(zdf$z), 10)
zdf <- data.frame(expand.grid(x = dens$x, y = dens$y), z = as.vector(dens$z))
z <- tapply(zdf$z, zdf[c("x", "y")], identity)
cl <- grDevices::contourLines(
x = sort(unique(dens$x)), y = sort(unique(dens$y)), z = dens$z,
levels = breaks
)
I won't clutter the answer with str()
output but it's kinda fun looking at what happens there.
We can use spatial ops to figure out how many points fall within given polygons, then we can group the polygons at the same level to provide counts and percentages per-level:
SpatialPolygons(
lapply(1:length(cl), function(idx) {
Polygons(
srl = list(Polygon(
matrix(c(cl[[idx]]$x, cl[[idx]]$y), nrow=length(cl[[idx]]$x), byrow=FALSE)
)),
ID = idx
)
})
) -> cont
coordinates(xdf) <- ~x+y
data_frame(
ct = sapply(over(cont, geometry(xdf), returnList = TRUE), length),
id = 1:length(ct),
lvl = sapply(cl, function(x) x$level)
) %>%
count(lvl, wt=ct) %>%
mutate(
pct = n/length(xdf),
pct_lab = sprintf("%s of the points fall within this level", scales::percent(pct))
)
## # A tibble: 12 x 4
## lvl n pct pct_lab
## <dbl> <int> <dbl> <chr>
## 1 0.002 270 0.993 99.3% of the points fall within this level
## 2 0.004 259 0.952 95.2% of the points fall within this level
## 3 0.006 249 0.915 91.5% of the points fall within this level
## 4 0.008 232 0.853 85.3% of the points fall within this level
## 5 0.01 206 0.757 75.7% of the points fall within this level
## 6 0.012 175 0.643 64.3% of the points fall within this level
## 7 0.014 145 0.533 53.3% of the points fall within this level
## 8 0.016 94 0.346 34.6% of the points fall within this level
## 9 0.018 81 0.298 29.8% of the points fall within this level
## 10 0.02 60 0.221 22.1% of the points fall within this level
## 11 0.022 43 0.158 15.8% of the points fall within this level
## 12 0.024 13 0.0478 4.8% of the points fall within this level
I only spelled it out to avoid blathering more but the percentages will change depending on how you modify the various parameters to the density computation (same holds true for my ggalt::geom_bkde2d()
which uses a different estimator).
If there is a way to tease out the percentages without re-performing the calculations there's no better way to have that pointed out than by letting other SO R folks show how much more clever they are than the person writing this answer (hopefully in more diplomatic ways than seem to be the mode of late).
How to correctly interpret ggplot's stat_density2d
HPDregionplot in package:emdbook is supposed to do that. It does use MASS::kde2d but it normalizes the result. It has the disadvantage to my mind that it requires an mcmc object.
library(MASS)
library(coda)
HPDregionplot(mcmc(data.matrix(df)), prob=0.8)
with(df, points(x,y))
colouring density of stat_density2d in ggplot with ggmap
I had to figure this out just last week for work. geom_density2d
creates a path, which by definition has no fill. To get a fill, you need a polygon. So instead of geom_density2d
, you need to call stat_density2d(geom = "polygon")
.
The dataframe is just random data that gave a nice density; you should be able to adapt the same for your map (I was making a very similar map for work, so using stat_density2d
should be fine).
Also note calc(level)
is the replacement for ..level..
, but I think it's only in the github version of ggplot2
, so if you're using the CRAN version, just swap that for the older ..level..
library(tidyverse)
set.seed(1234)
df <- tibble(
x = c(rnorm(100, 1, 3), rnorm(50, 2, 2.5)),
y = c(rnorm(80, 5, 4), rnorm(30, 15, 2), rnorm(40, 2, 1)) %>% sample(150)
)
ggplot(df, aes(x = x, y = y)) +
stat_density2d(aes(fill = calc(level)), geom = "polygon") +
geom_point()
Created on 2018-04-26 by the reprex package (v0.2.0).
Remove gaps in a stat_density2d ggplot chart without modifying XY limits
The trick is to expand the canvas using xlim
and ylim
so that there is enough room for ggplot
to draw complete contours around the data. Then you can use tighter xlim
and ylim
parameters within the coord_fixed
term to show the window you want...
ggplot() +
stat_density2d(data = Unit_J, aes(x=X, y=Y, fill=..level.., alpha=0.9),
lwd= 0.05, bins=50, col="blue", geom="polygon") +
scale_fill_continuous(low="blue",high="darkblue") +
scale_alpha(range=c(0, 0.03), guide="none") +
xlim(-7000,-3500) + ylim(400,2500) + #expanded in x direction
coord_fixed(expand=FALSE,xlim=c(-6600,-3800),ylim=c(400,2500)) + #added parameters
geom_point(data = Unit_J, aes(x=X, y=Y), alpha=0.5, cex=0.4, col="darkblue") +
theme_bw() +
theme(legend.position="none")
Related Topics
Predicting Lda Topics for New Data
Make Dataframe of Top N Frequent Terms for Multiple Corpora Using Tm Package in R
Label Minimum and Maximum of Scale Fill Gradient Legend with Text: Ggplot2
How to Create a Continuous Density Heatmap of 2D Scatter Data in R
Error: Zipping Up Workbook Failed When Trying to Write.Xlsx
Defining Minimum Point Size in Ggplot2 - Geom_Point
R Text Mining Documents from CSV File (One Row Per Doc)
In R Plot Arima Fitted Model with the Original Series
Ordering Permutation in Rcpp I.E. Base::Order()
R: Filling Missing Dates in a Time Series
Put Multiple Data Frames into List (Smart Way)
Street Address to Geolocation Lat/Long
Formatting Ggplot2 Axis Labels with Commas (And K? Mm) If I Already Have a Y-Scale
What's the Difference Between Reactive Value and Reactive Expression
Unnesting a List of Lists in a Data Frame Column
Passing Arguments to Iterated Function Through Apply
Ggplot2: Multiple Plots with Different Variables in a Single Row, Single Grouping Legend