Understanding Bandwidth Smoothing in ggplot2

Understanding bandwidth smoothing in ggplot2

adjust= is not the same as bw=: adjust is a multiplier applied to the bandwidth, while bw sets the bandwidth itself. When you plot

plot(density(log10(realdata), adjust=1.5))
lines(density(log10(simulation), adjust=1.5), lty=2)

you get the same thing as ggplot.

When this answer was written, ggplot did not let you specify a bw= parameter (newer versions of ggplot2 do accept bw= in geom_density(), as the bw.nrd0 answer further down shows). By default, density() uses bw.nrd0() to pick the bandwidth, so while you changed this for the plot using base graphics, you could not set that value directly in ggplot. What actually gets used is adjust*bw. Since we know how to calculate the default bw, we can recalculate adjust= to give us the same value.

# Helper: convert a target bandwidth b into the adjust= multiplier for data x
bw <- function(b, x) { b / bw.nrd0(x) }

library(ggplot2)
ggplot() +
    geom_density(aes(x=x, linetype="real data"), data=vec1, adjust=bw(1.5, vec1$x)) +
    geom_density(aes(x=x, linetype="simulation"), data=vec2, adjust=bw(1.5, vec2$x)) +
    scale_linetype_manual(name="data",
        values=c("real data"="solid", "simulation"="dashed"))

And that results in a plot that is the same as the base graphics plot drawn with bw=1.5.
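
As a quick sanity check of the adjust*bw relationship, here is a minimal sketch on made-up data (the vector x and the target value 1.5 are assumptions for illustration, not the question's data). It relies on base density(), which reports the bandwidth it actually used in its bw component:

```r
set.seed(1)
x <- rnorm(200)          # hypothetical sample
target_bw <- 1.5         # the bandwidth we want to end up with

# Same idea as the helper above: divide the target by the default bandwidth
adjust <- target_bw / bw.nrd0(x)

# density() multiplies the default bandwidth by adjust
d <- density(x, adjust = adjust)
d$bw                     # the bandwidth that was actually used
```

If the helper is doing its job, d$bw comes back equal to the target bandwidth.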

How do I change the kernel bandwidth used in a density plot in R?

stat_density() uses the adjust argument as a multiplier on the default (rule-of-thumb) bandwidth that ggplot calculates; see the documentation for density(). Try:

ggplot(mtcars, aes(mpg)) + geom_density() + stat_density(adjust = 2)

I gather the bandwidth is "the standard deviation of the smoothing kernel", per ?density. To see how the default value is calculated, look at ?bw.nrd0, or interrogate Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer.
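
For what it's worth, the default bandwidth can be reproduced by hand. This is a minimal sketch that assumes the rule of thumb described in ?bw.nrd0 (0.9 times the smaller of the standard deviation and IQR/1.34, times n^(-1/5)); the variable names are mine:

```r
x <- mtcars$mpg
n <- length(x)

# Rule of thumb as described in ?bw.nrd0
manual_bw  <- 0.9 * min(sd(x), IQR(x) / 1.34) * n^(-1/5)
default_bw <- bw.nrd0(x)

# adjust = 2 should double the bandwidth that density() ends up using
doubled_bw <- density(x, adjust = 2)$bw

c(manual = manual_bw, default = default_bw, doubled = doubled_bw)
```

The first two values should agree, and the third should be twice as large.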

How do I make densities with different sizes have the same smoothness in ggplot2?

With the help of @Carlos and others, I found what I was looking for. It's true that the smoothness of the density should typically reflect the size of the sample, as Carlos mentioned, but in my case I wanted the bandwidth of the two densities to be the same; in particular, I wanted both to use the bandwidth of the smaller group. The default bandwidth in ggplot2 is bw.nrd0; I can compute that on the smaller group and then set it as the global bandwidth for my plot.

bw <- bw.nrd0(bigll$educ[bigll$treat == 1])
ggplot(bigll, aes(x = educ, fill = factor(treat))) +
    geom_density(alpha = .5, bw = bw)

That definitely obscures some of the detail in the larger distribution, but for my purposes this was sufficient.
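
To see why the two groups end up with different smoothness by default, here is a small sketch with simulated groups (the data and names are hypothetical, not the bigll data frame). It uses base density() so the bandwidths can be inspected directly:

```r
set.seed(7)
small <- rnorm(50,   mean = 12, sd = 2)   # hypothetical smaller group
big   <- rnorm(5000, mean = 12, sd = 2)   # hypothetical larger group

# Default bandwidths shrink as the sample grows
bw_small <- bw.nrd0(small)
bw_big   <- bw.nrd0(big)

# Force both estimates to use the smaller group's bandwidth
d_small <- density(small, bw = bw_small)
d_big   <- density(big,   bw = bw_small)

c(default_small = bw_small, default_big = bw_big, shared = d_big$bw)
```

With a shared bandwidth the larger group is smoothed more than its own default would smooth it, which is the loss of detail mentioned above.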

How to adjust bandwidth for ridgeplots in R

Pretty sure you can just add it as an argument to geom_density_ridges(), e.g.

+ geom_density_ridges(bandwidth = 0.1)

The argument is passed to the underlying function stat_density_ridges().
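
A complete call might look like the sketch below. It assumes the ggridges package is installed and uses the built-in iris data purely for illustration; the value 0.1 is arbitrary:

```r
library(ggplot2)
library(ggridges)

# One ridge per species, all sharing the same fixed bandwidth
p <- ggplot(iris, aes(x = Sepal.Length, y = Species)) +
    geom_density_ridges(bandwidth = 0.1)

# print(p) draws the plot
```

Smaller values give bumpier ridges, larger values give smoother ones.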

Meaning of band width in ggplot geom_smooth lm

By default, it is the 95% confidence interval around the fitted line of a linear model ("lm"). The documentation from ?geom_smooth states that:

The default stat for this geom is stat_smooth see that documentation for more options to control the underlying statistical transformation.

Digging one level deeper, doc from ?stat_smooth tells us about the methods used to calculate the smoother's area.

For quick results, one can play with one of the arguments for stat_smooth which is level : level of confidence interval to use (0.95 by default)

By passing that parameter to geom_smooth, it is passed in turn to stat_smooth, so that if you wish to have a narrower region, you could use for instance .90 as a confidence level:

ggplot(mtcars, aes(x=wt, y=mpg)) +
    geom_point(aes(colour=factor(cyl))) +
    geom_smooth(method="lm", level=0.90)
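
To check the effect of level without plotting, here is a rough sketch in base R. It assumes the band corresponds to what predict() returns for an lm with interval = "confidence"; the grid of wt values is my own choice:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Evaluate the band on an evenly spaced grid of weights
new_wt <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))

band_95 <- predict(fit, new_wt, interval = "confidence", level = 0.95)
band_90 <- predict(fit, new_wt, interval = "confidence", level = 0.90)

# Width of the band at each grid point
width_95 <- band_95[, "upr"] - band_95[, "lwr"]
width_90 <- band_90[, "upr"] - band_90[, "lwr"]

all(width_90 < width_95)   # is the 90% band narrower everywhere?
```

The fitted line itself is identical in both cases; only the width of the band changes.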

Displaying smoothed (convolved) densities with ggplot2

Since my comments solved your problem, I'll convert them to an answer:

The density function takes individual measurements and calculates a kernel density distribution by convolution (gaussian is the default kernel). For example, plot(density(rnorm(1000))). You can control the smoothness with the bw (bandwidth) parameter. For example, plot(density(rnorm(1000), bw=0.01)).

But your data frame is already a density distribution (analogous to the output of the density function). To generate a smoother density estimate, you need to start with the underlying data and run density on it, adjusting bw to get the smoothness where you want it.
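
To put a number on how bw changes the smoothness, here is a minimal sketch on simulated data. The roughness() helper is something I made up for illustration (it sums the absolute second differences of the curve), and the common grid from -4 to 4 is an assumption so the three estimates are comparable:

```r
set.seed(42)
x <- rnorm(1000)

# Same data, same evaluation grid, three bandwidths
d_narrow  <- density(x, bw = 0.01, from = -4, to = 4)
d_default <- density(x,            from = -4, to = 4)
d_wide    <- density(x, bw = 1,    from = -4, to = 4)

# Hypothetical helper: larger values mean a wigglier curve
roughness <- function(d) sum(abs(diff(d$y, differences = 2)))

c(narrow  = roughness(d_narrow),
  default = roughness(d_default),
  wide    = roughness(d_wide))
```

The roughness should drop as the bandwidth grows.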

If you don't have access to the underlying data, you can smooth out your existing density distributions as follows:

ggplot(data=dataM, aes(x=bins, y=value, colour=variable)) +
    geom_smooth(se=FALSE, span=0.3) +
    scale_x_continuous(limits = c(0, 2))

Play around with the span parameter to get the smoothness you want.

What does the span argument control in geom_smooth?

The span (also called alpha) determines the width of the moving window used when smoothing your data.

"In a loess fit, the alpha parameter determines the width of the sliding window. More specifically, alpha gives the proportion of observations that is to be used in each local regression. Accordingly, this parameter is specified as a value between 0 and 1. The alpha value used for the loess curve in Fig. 2 is 0.65; so, each of the local regressions used to produce that curve incorporates 65% of the total data points. "

Taken from:

Jacoby (2000) Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies 19(4). (Paywalled paper)

For more details check the referenced paper.
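
As a rough illustration of the effect, the sketch below fits loess() directly on simulated data (the data and the two span values are assumptions for the example). geom_smooth(method = "loess", span = ...) passes span on to the same function:

```r
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)   # hypothetical noisy signal

# Narrow window: each local fit uses about 20% of the points
fit_narrow <- loess(y ~ x, span = 0.2)

# Wide window: each local fit uses about 65% of the points
fit_wide <- loess(y ~ x, span = 0.65)

# Equivalent number of parameters: higher means a more flexible curve
c(narrow = fit_narrow$enp, wide = fit_wide$enp)
```

A smaller span follows the data more closely, while a larger span gives a smoother, stiffer curve.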
