Overlay Normal Curve to Histogram in R

Overlay normal curve to histogram in R

Here's a nice, easy way I found (g is the numeric vector you want to plot):

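# count histogram; density = 10 draws shading lines instead of solid fill
h <- hist(g, breaks = 10, density = 10,
          col = "lightgray", xlab = "Accuracy", main = "Overall")
# grid of x values spanning the data
xfit <- seq(min(g), max(g), length.out = 40)
# normal density with the sample mean and SD
yfit <- dnorm(xfit, mean = mean(g), sd = sd(g))
# rescale from density to counts: multiply by bin width and sample size
yfit <- yfit * diff(h$mids[1:2]) * length(g)

lines(xfit, yfit, col = "black", lwd = 2)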
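Why the factor diff(h$mids[1:2]) * length(g) works: with equal-width bins of width w, the bars of a count histogram have total area n * w, while dnorm() integrates to 1. The minimal sketch below checks this numerically; the simulated g, the wide x grid and the Riemann-sum approximation are assumptions of the sketch, not part of the answer above.

```r
set.seed(1)
g <- rnorm(200, mean = 50, sd = 10)    # simulated stand-in for your data

h <- hist(g, breaks = 10, plot = FALSE)
w <- diff(h$breaks)[1]                 # bin width (equal widths, same as diff(h$mids[1:2]))
area_hist <- sum(h$counts) * w         # total bar area = n * w

# area under the rescaled normal curve, approximated by a Riemann sum
xs <- seq(min(g) - 50, max(g) + 50, length.out = 5001)
ys <- dnorm(xs, mean = mean(g), sd = sd(g)) * w * length(g)
area_curve <- sum(ys) * diff(xs)[1]

c(area_hist = area_hist, area_curve = area_curve)
```

Both areas should agree closely, which is why the rescaled curve sits on the same scale as the bars.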

Overlay normal curve to histogram in ggplot2

I suspect that stat_function does add the density of the normal distribution, but with the y-axis showing counts, the curve's values are so small that it disappears at the very bottom of the plot. If you scale your histogram to a density with aes(x = dist, y = after_stat(density)) (written y = ..density.. in older ggplot2 versions) instead of absolute counts, the curve from dnorm should become visible.

(As a side note, your distribution does not look normal to me. You might want to check, e.g. with a Q-Q plot.)
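For the Q-Q check mentioned in the side note, a minimal base-R sketch follows. The vector d is a hypothetical stand-in for your own data, and the Shapiro-Wilk test is an optional numeric companion added here, not something the answer prescribes.

```r
set.seed(1)
d <- rnorm(100)                 # hypothetical stand-in for your own data vector

qqnorm(d)                       # sample quantiles against standard normal quantiles
qqline(d, col = "red")          # reference line through the first and third quartiles

# optional formal test; small p-values suggest departure from normality
sw <- shapiro.test(d)
sw$p.value
```

Points that bend away from the line in the tails are the usual visual sign of non-normality.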

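library(ggplot2)

# example data: 100 draws from a standard normal
dist <- data.frame(dist = rnorm(100))

# y = after_stat(density) scales the histogram to a density,
# the same scale as the dnorm() curve added by stat_function()
plot1 <- ggplot(data = dist) +
  geom_histogram(mapping = aes(x = dist, y = after_stat(density)),
                 fill = "steelblue", colour = "black", binwidth = 1) +
  ggtitle("Frequencies") +
  stat_function(fun = dnorm, args = list(mean = mean(dist$dist), sd = sd(dist$dist)))
plot1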
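If you would rather keep counts on the y-axis, the same idea as the base-R answer applies: multiply the normal density by sample size and bin width. A minimal sketch, assuming a fixed binwidth (bw = 0.5 here is an arbitrary choice) and simulated data in a data frame named df:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(dist = rnorm(100))   # simulated example data
bw <- 0.5                             # bin width chosen for this sketch
n  <- nrow(df)
m  <- mean(df$dist)
s  <- sd(df$dist)

p <- ggplot(df, aes(x = dist)) +
  geom_histogram(binwidth = bw, fill = "steelblue", colour = "black") +
  # rescale the density to counts: density * sample size * bin width
  stat_function(fun = function(x) dnorm(x, mean = m, sd = s) * n * bw)
p
```

The factor n * bw must use the same binwidth as geom_histogram(), otherwise the curve and bars drift apart.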

Overlay a Normal curve to Histogram

If we plot a density histogram rather than a frequency histogram by setting freq=FALSE, we can overlay the curve of a normal distribution with the mean and standard deviation of the means. For the x-limits of the curve we use the range of the means.

means <- sapply(x, mean)          # sample mean of each simulated sample in x
mean.of.means <- mean(means)
sd.of.means <- sd(means)
r <- range(means)

# freq = FALSE puts the histogram on the density scale used by dnorm()
v <- hist(means, freq=FALSE, xlim=r, ylim=c(0, .5))
# inside curve(), x is the plotting grid, not the list x
curve(dnorm(x, mean=mean.of.means, sd=sd.of.means), r[1], r[2], add=TRUE, col="red")

Alternatively, draw a large sample from that normal distribution and overlay the histogram with the lines of its estimated density.

lines(density(rnorm(1e6, mean.of.means, sd.of.means)))

Note that I used 500 mean values in my answer, since the comparison with a normal distribution can become meaningless with too few values. You can also experiment with the breaks= argument of hist().
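A minimal sketch of experimenting with breaks=, drawing the same density histogram of the means with three break settings side by side. It regenerates the data from the Data section so it runs on its own; the break values 10, 25 and 50, the fixed y-range, and the use of the sample SD of the means for the curve are choices of this sketch.

```r
set.seed(42)
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)
means <- sapply(x, mean)

op <- par(mfrow = c(1, 3))              # three panels side by side
for (b in c(10, 25, 50)) {
  hist(means, breaks = b, freq = FALSE, ylim = c(0, 0.25),
       main = paste("breaks =", b), xlab = "sample mean")
  curve(dnorm(x, mean = mean(means), sd = sd(means)), add = TRUE, col = "red")
}
par(op)                                 # restore the previous layout

# breaks = b is only a suggestion: hist() rounds it to "pretty" cut points
nbins <- sapply(c(10, 25, 50),
                function(b) length(hist(means, breaks = b, plot = FALSE)$counts))
nbins
```

Checking nbins shows how many bins hist() actually used for each requested value.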

Data

set.seed(42)
# 500 samples, each of 100 draws from a normal with mean 100 and SD 25
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)

Using curve() and dnorm() to overlay histogram

Over the range of the data (roughly 10 to 35 mpg), a Normal distribution with a mean of 0 and an SD of 5 has very little probability: the data begin at about the upper end of that distribution's central 95% interval (0 + 1.96 × 5 ≈ 9.8).
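The claim can be checked with pnorm(): the sketch below computes how much probability each candidate Normal (mean 0 and mean 20, both with SD 5) places on the observed range of mtcars$mpg. The variable names are illustrative.

```r
r <- range(mtcars$mpg)               # observed range of the data

# probability mass each Normal places between the smallest and largest mpg
p_red  <- pnorm(r[2], mean = 0,  sd = 5) - pnorm(r[1], mean = 0,  sd = 5)
p_blue <- pnorm(r[2], mean = 20, sd = 5) - pnorm(r[1], mean = 20, sd = 5)
c(red = p_red, blue = p_blue)

# upper end of the central 95% interval of N(0, 5), about 9.8
qnorm(0.975, mean = 0, sd = 5)
```

The mean-0 curve places only a few percent of its mass over the data, which is why it is barely visible.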

If we add freq = FALSE to the hist() call (as is appropriate when comparing a probability distribution with the histogram), a little of the red curve becomes visible at the left edge; you could also multiply the curve by a constant if you want the tail to be more visible. The blue line shows the same distribution with a more plausible mean of 20.

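## png("tmp.png"); par(las=1, bty="l")        # optional: write the plot to a file
hist(mtcars$mpg, freq = FALSE)                   # density scale, comparable with dnorm()
curve(dnorm(x, 0, 5), add = TRUE, col = "red")   # mean 0: only the far tail overlaps the data
curve(dnorm(x, 20, 5), add = TRUE, col = "blue") # mean 20: a more plausible centre
## dev.off()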

Graphically, this might be more noticeable if you shaded the area under the Normal distribution curve.
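A minimal sketch of that shading with base graphics: polygon() fills the area between the N(0, 5) curve and y = 0 across the plotted x-range. The semi-transparent fill via adjustcolor() and the 200-point grid are choices of this sketch.

```r
hist(mtcars$mpg, freq = FALSE)
usr <- par("usr")                          # current plot limits: x1, x2, y1, y2
xs <- seq(usr[1], usr[2], length.out = 200)
ys <- dnorm(xs, mean = 0, sd = 5)

# close the polygon along y = 0 so the area under the curve is filled
polygon(c(xs, rev(xs)), c(ys, rep(0, length(ys))),
        col = adjustcolor("red", alpha.f = 0.4), border = NA)
lines(xs, ys, col = "red")
```

The same pattern with mean = 20 would shade the more plausible blue curve.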


