Overlay normal curve to histogram in R
Here's a nice easy way I found:
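# g: numeric vector to plot (hypothetical example data; replace with your own)
set.seed(1)
g <- rnorm(200, mean = 0.8, sd = 0.05)
h <- hist(g, breaks = 10, density = 10,
          col = "lightgray", xlab = "Accuracy", main = "Overall")
# Evaluate the fitted normal density on a grid spanning the data
xfit <- seq(min(g), max(g), length.out = 40)
yfit <- dnorm(xfit, mean = mean(g), sd = sd(g))
# Rescale the density to counts: bin width * number of observations
yfit <- yfit * diff(h$mids[1:2]) * length(g)
lines(xfit, yfit, col = "black", lwd = 2)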
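The rescaling line multiplies the fitted density by the bin width and the number of observations, which turns a density into an expected count per bin. Because it takes the width from diff(h$mids[1:2]), it assumes all bins have the same width. A minimal sketch of the same idea that uses each bin's own width from diff(h$breaks) instead, so it should also cope with unequal breaks; the helper name expected_normal_counts and the simulated accuracy data are hypothetical:

```r
# Hypothetical helper: expected counts per histogram bin under a normal fit.
# Uses each bin's own width, so unequal breaks are handled as well.
expected_normal_counts <- function(h, x) {
  widths <- diff(h$breaks)                            # width of every bin
  dens <- dnorm(h$mids, mean = mean(x), sd = sd(x))   # fitted density at bin centres
  dens * widths * length(x)                           # density -> expected count
}

set.seed(1)
g <- rnorm(500, mean = 0.8, sd = 0.05)   # hypothetical accuracy scores
h <- hist(g, breaks = 10, plot = FALSE)  # compute the bins without drawing
expected <- expected_normal_counts(h, g)

# Observed vs expected counts, side by side
round(cbind(mid = h$mids, observed = h$counts, expected = expected), 1)
```

With equal-width bins these values match yfit evaluated at the bin midpoints, so the table is a numeric companion to the plotted curve.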
Overlay normal curve to histogram in ggplot2
I suspect that stat_function does indeed add the density of the normal distribution. But with counts on the y-axis, the density values are tiny by comparison, so the curve is squashed against the bottom of the plot. If you scale your histogram to a density with aes(x = dist, y = after_stat(density)) (older code writes this as y = ..density..) instead of absolute counts, your curve from dnorm should become visible.
(As a side note, your distribution does not look normal to me. You might want to check, e.g. with a Q-Q plot.)
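For the normality check suggested in the side note, base R's qqnorm() plots the sample quantiles against theoretical normal quantiles, and qqline() adds a reference line through the first and third quartiles. A minimal sketch with a simulated vector standing in for your data (sample_data is a hypothetical name); points that bend away from the line suggest non-normality:

```r
set.seed(1)
sample_data <- rnorm(100)   # hypothetical stand-in for your data

qqnorm(sample_data, main = "Normal Q-Q plot")  # sample vs theoretical quantiles
qqline(sample_data, col = "red")               # line through the 1st and 3rd quartiles

# Same points without plotting; their correlation is close to 1 for normal-looking data
q <- qqnorm(sample_data, plot.it = FALSE)
cor(q$x, q$y)
```

If you prefer to stay in ggplot2, stat_qq() and stat_qq_line() draw the same plot.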
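library(ggplot2)
set.seed(1)  # make the simulated data reproducible
dist <- data.frame(dist = rnorm(100))
# y = after_stat(density) puts the bars on the same scale as dnorm()
# (after_stat() needs ggplot2 >= 3.3.0; older code writes y = ..density..)
plot1 <- ggplot(data = dist) +
  geom_histogram(mapping = aes(x = dist, y = after_stat(density)), fill = "steelblue", colour = "black", binwidth = 1) +
  ggtitle("Frequencies") +
  stat_function(fun = dnorm, args = list(mean = mean(dist$dist), sd = sd(dist$dist)))
plot1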
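The other way around also works: keep counts on the y-axis and stretch the curve instead. A density times the bin width times the number of observations gives expected counts, so a sketch could look like the one below. The names bw, n, m and s are hypothetical helpers, and bw must match the binwidth passed to geom_histogram():

```r
library(ggplot2)

set.seed(1)
dist <- data.frame(dist = rnorm(100))
bw <- 1                 # binwidth shared by the histogram and the curve
n <- nrow(dist)         # number of observations
m <- mean(dist$dist)
s <- sd(dist$dist)

plot2 <- ggplot(dist, aes(x = dist)) +
  geom_histogram(fill = "steelblue", colour = "black", binwidth = bw) +
  # Scale the density by n * binwidth so it sits on the count axis
  stat_function(fun = function(x) dnorm(x, mean = m, sd = s) * n * bw)
plot2
```

This keeps the familiar count labels on the y-axis, at the cost of having to keep bw in sync in two places.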
Overlay a Normal curve to Histogram
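When we plot the density rather than the frequency histogram by setting freq=FALSE, we can overlay a curve of a normal distribution with the mean and standard deviation of the means. For the x-range of the histogram and the curve we use the range of the means.
means <- sapply(x, mean)               # one mean per sample
mean.of.means <- mean(means)
sd.of.means <- sd(means)               # spread of the means: about 25 / sqrt(100) = 2.5 for the data below
r <- range(means)
h <- hist(means, plot = FALSE)
top <- max(h$density, dnorm(mean.of.means, mean.of.means, sd.of.means))  # leave room for the curve's peak
plot(h, freq = FALSE, xlim = r, ylim = c(0, top))
curve(dnorm(x, mean = mean.of.means, sd = sd.of.means), r[1], r[2], add = TRUE, col = "red")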
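The curve's standard deviation should be the spread of the means, not of the raw values. The standard deviation of a mean of n independent draws is sigma / sqrt(n), here 25 / sqrt(100) = 2.5, and the central limit theorem is what makes a normal curve a reasonable model for the means. A sketch that regenerates the data shown below and draws this theoretical curve instead of the fitted one (se_theory and se_sample are hypothetical names):

```r
set.seed(42)
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)  # same data as in the answer
means <- vapply(x, mean, numeric(1))                         # one mean per sample

se_theory <- 25 / sqrt(100)   # sd of a mean of 100 draws: sigma / sqrt(n) = 2.5
se_sample <- sd(means)        # empirical spread of the 500 means
c(theory = se_theory, sample = se_sample)

# Leave enough headroom on the y-axis for the curve's peak
h <- hist(means, plot = FALSE)
top <- max(h$density, dnorm(100, mean = 100, sd = se_theory))
plot(h, freq = FALSE, ylim = c(0, top), main = "Sample means with theoretical normal curve")
curve(dnorm(x, mean = 100, sd = se_theory), add = TRUE, col = "blue", lwd = 2)
```

The two standard deviations should agree closely; a large gap would point to a mistake in how the curve is parameterised.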
Alternatively, you can draw a large sample from the normal distribution and overlay the histogram with the lines of its estimated density.
lines(density(rnorm(1e6, mean.of.means, sd(sapply(x, mean)))))
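Drawing a million values only approximates the normal curve through a kernel density estimate, while dnorm() gives it exactly and without simulation. A sketch that compares the two on a common grid, assuming hypothetical parameters mu = 100 and sigma = 2.5 (roughly the mean and spread of the means in the data below):

```r
set.seed(42)
mu <- 100; sigma <- 2.5   # hypothetical parameters (mean and sd of the means)

# Approach from the text: kernel density estimate of many simulated draws
d <- density(rnorm(1e6, mu, sigma))

# Exact alternative: evaluate the normal density directly on the same grid
exact <- dnorm(d$x, mean = mu, sd = sigma)

max(abs(d$y - exact))   # small: the simulated curve approximates dnorm()

plot(d, main = "Simulated density (black) vs exact dnorm() (red, dashed)")
curve(dnorm(x, mu, sigma), add = TRUE, col = "red", lty = 2)
```

The simulated version also picks up a little smoothing bias from the kernel bandwidth, which the exact curve avoids.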
Note that I have used 500 mean values in my answer, since the comparison with a normal distribution may become meaningless with too few values. You can also experiment with the breaks= option of the hist() function.
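To see how the choice of bins affects the comparison, a sketch that redraws the histogram of the means with a few values of breaks= side by side, using the same data as below. A single number for breaks is only a suggestion to hist(), which rounds to "pretty" cut points, so the actual number of bins can differ from the value you pass:

```r
set.seed(42)
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)
means <- vapply(x, mean, numeric(1))

op <- par(mfrow = c(1, 3))   # three panels side by side
for (b in c(5, 15, 40)) {
  hist(means, breaks = b, freq = FALSE, main = paste("breaks =", b))
  curve(dnorm(x, mean(means), sd(means)), add = TRUE, col = "red")
}
par(op)                      # restore the previous graphics settings
```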
Data
set.seed(42)
x <- replicate(500, rnorm(100, 100, 25), simplify = FALSE)  # 500 samples, each of 100 draws from N(mean = 100, sd = 25)
Using curve() and dnorm() to overlay histogram
Over the range of the data (10-35), there is very little probability in the Normal distribution with a mean of 0 and an SD of 5 (i.e. the curve starts at about the upper end of the central 95% of that distribution, 0 + 1.96 * 5 ≈ 9.8).
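How little probability falls in the data range can be checked directly with pnorm(), the normal CDF. A short sketch for the two curves used below (means 0 and 20, both with SD 5):

```r
rng <- range(mtcars$mpg)   # observed range of mpg: 10.4 to 33.9

# P(min <= X <= max) for each candidate normal distribution
p_red  <- diff(pnorm(rng, mean = 0,  sd = 5))
p_blue <- diff(pnorm(rng, mean = 20, sd = 5))
c(red = p_red, blue = p_blue)
```

The first probability is roughly 2%, which is why the red curve barely shows up, while the second is roughly 97%.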
If we add freq = FALSE to the hist() call (as is appropriate if you want to compare a probability distribution to the histogram), we can see a little of the red curve at the left end of the plot (you could also multiply the density by a constant if you want the tail to be more visible). The distribution for a more plausible mean of 20 is shown as well (blue line).
## png("tmp.png"); par(las=1, bty="l")
hist(mtcars$mpg, freq=FALSE)
curve(dnorm(x, 0, 5), add = TRUE, col = "red")
curve(dnorm(x, 20, 5), add = TRUE, col = "blue")
## dev.off()
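For the idea of multiplying by a constant to make the red tail easier to see, a sketch with a hypothetical factor k = 5. The scaled curve no longer integrates to 1, so its height is for visibility only and can't be read off the density axis:

```r
hist(mtcars$mpg, freq = FALSE)
k <- 5   # hypothetical magnification factor, chosen only for visibility

# curve() invisibly returns the evaluated points, kept here for inspection
red_scaled <- curve(k * dnorm(x, 0, 5), add = TRUE, col = "red", lty = 2)
curve(dnorm(x, 20, 5), add = TRUE, col = "blue")
legend("topright", legend = c(paste0(k, " x N(0, 5)"), "N(20, 5)"),
       col = c("red", "blue"), lty = c(2, 1), bty = "n")
```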
Graphically, this might be clearer if you shaded the area under the Normal distribution curve.
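A minimal sketch of that shading with base R's polygon(), for the N(20, 5) curve over the observed mpg range. The semi-transparent colour comes from adjustcolor(), and xs/ys are hypothetical names for the evaluation grid:

```r
h <- hist(mtcars$mpg, plot = FALSE)
top <- max(h$density, dnorm(20, 20, 5))   # room for the curve's peak
plot(h, freq = FALSE, ylim = c(0, top), main = "mpg with shaded N(20, 5) density")

xs <- seq(min(mtcars$mpg), max(mtcars$mpg), length.out = 200)
ys <- dnorm(xs, mean = 20, sd = 5)

# Close the shape along the x-axis by adding the two baseline corners
polygon(c(xs[1], xs, xs[length(xs)]), c(0, ys, 0),
        col = adjustcolor("blue", alpha.f = 0.3), border = "blue")

# Shaded area (trapezoidal rule) approximates P(10.4 <= X <= 33.9) for X ~ N(20, 5)
area <- sum(diff(xs) * (head(ys, -1) + tail(ys, -1)) / 2)
area
```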