Overlay normal curve to histogram in ggplot2
I suspect that stat_function does indeed add the density of the normal distribution, but the y-axis range just lets it disappear all the way at the bottom of the plot. If you scale your histogram to a density with aes(x = dist, y = after_stat(density)) (written y = ..density.. in older ggplot2 code) instead of absolute counts, your curve from dnorm should become visible.
(As a side note, your distribution does not look normal to me. You might want to check, e.g. with a qqplot.)
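library(ggplot2)
set.seed(1)  # reproducible example data
dist <- data.frame(dist = rnorm(100))
plot1 <- ggplot(data = dist) +
  # scale the histogram to densities so it shares the y scale of dnorm()
  geom_histogram(mapping = aes(x = dist, y = after_stat(density)),
                 fill = "steelblue", colour = "black", binwidth = 1) +
  ggtitle("Frequencies") +
  # overlay a normal density using the sample mean and sd
  stat_function(fun = dnorm, args = list(mean = mean(dist$dist), sd = sd(dist$dist)))
plot1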
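For the qqplot check suggested in the side note, here is a minimal sketch using ggplot2's stat_qq() and stat_qq_line(). The simulated dist data frame is only a stand-in (an assumption); substitute your own data. Points bending away from the red line, especially in the tails, suggest the data are not normal.

```r
library(ggplot2)
set.seed(1)
# Stand-in data; replace with the data frame you are plotting
dist <- data.frame(dist = rnorm(100))
qq <- ggplot(dist, aes(sample = dist)) +
  stat_qq() +                    # sample quantiles against theoretical normal quantiles
  stat_qq_line(colour = "red")   # reference line through the first and third quartiles
qq
```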
Plotting normal curve over histogram using ggplot2: Code produces straight line at 0
Your curve and histogram are on different y scales, and you didn't check the help page on stat_function, otherwise you would have put the arguments in a list, as the example there clearly shows. You also aren't doing the aes right in your initial ggplot call. I sincerely suggest working through more tutorials and books (or at a minimum the help pages) rather than learning ggplot piecemeal on SO.
Once you fix the stat_function arg problem and the ggplot aes issue, you need to tackle the y-axis scale difference. To do that, switch the y for the histogram to the density column of the data frame calculated by the underlying stat_bin:
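library(ggplot2)
gg <- ggplot(mtcars, aes(x = mpg))
# y = density puts the bars on the same scale as dnorm(); fill still shows the counts
gg <- gg + geom_histogram(binwidth = 2, colour = "black",
                          aes(y = after_stat(density), fill = after_stat(count)))
gg <- gg + scale_fill_gradient("Count", low = "#DCDCDC", high = "#7C7C7C")
# the dnorm() parameters go in a list passed to args
gg <- gg + stat_function(fun = dnorm,
                         color = "red",
                         args = list(mean = mean(mtcars$mpg),
                                     sd = sd(mtcars$mpg)))
gg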
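To see what the density column from stat_bin actually is, you can pull the computed data frame out of a built plot with layer_data(). A small check, assuming the same mtcars histogram with binwidth = 2 (gg_check and bins are illustrative names): density is the count divided by the total count and by the bin width, so the bar areas add up to 1, just like the area under dnorm().

```r
library(ggplot2)
gg_check <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 2)
bins <- layer_data(gg_check)   # data frame computed by stat_bin for layer 1
# density is the count rescaled so that the total bar area is 1
all.equal(bins$density, bins$count / (sum(bins$count) * 2))
sum(bins$density * 2)          # total bar area, 1 up to floating-point error
```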
ggplot add Normal Distribution while using `facet_wrap`
A while ago I sort of automated this drawing of theoretical densities with a function that I put in the ggh4x package I wrote, which you might find convenient. You would just have to make sure that the histogram and theoretical density are on the same scale (for example, counts per x-axis unit).
library(palmerpenguins)
library(tidyverse)
library(ggh4x)
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 1) +
stat_theodensity(aes(y = after_stat(count))) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
You can vary the bin size of the histogram, but you'd have to adjust the theoretical density count too. Typically you'd multiply by the binwidth.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(binwidth = 2) +
stat_theodensity(aes(y = after_stat(count)*2)) +
facet_wrap(~species)
#> Warning: Removed 2 rows containing non-finite values (stat_bin).
Created on 2021-01-27 by the reprex package (v0.3.0)
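Without ggh4x, the same count scaling can be sketched with plain stat_function() by multiplying dnorm() by the number of observations and the bin width. The simulated df and the names bw, n, mu and s are hypothetical stand-ins. This draws a single panel: stat_function() evaluates the same function in every facet, so it cannot give each species its own curve.

```r
library(ggplot2)
set.seed(1)
# Hypothetical example data
df <- data.frame(x = rnorm(200, mean = 10, sd = 2))
bw <- 0.5                 # histogram bin width
n  <- nrow(df)            # number of observations
mu <- mean(df$x)
s  <- sd(df$x)
p_counts <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = bw, fill = "grey80", colour = "black") +
  # expected count per bin is approximately n * binwidth * density
  stat_function(fun = function(x) n * bw * dnorm(x, mean = mu, sd = s),
                colour = "red")
p_counts
```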
If this is too much of a hassle, you can always convert the histogram to density instead of the density to counts.
penguins %>%
ggplot(aes(x=bill_length_mm, fill = species)) +
geom_histogram(aes(y = after_stat(density))) +
stat_theodensity() +
facet_wrap(~species)
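If you'd rather avoid the extra dependency, a rough equivalent of the density version can be built by computing one normal curve per facet yourself and drawing it with geom_line(). This is a sketch, not how stat_theodensity works internally; it uses the built-in iris data in place of penguins (an assumption, so only ggplot2 is needed), and curves and dens are illustrative names.

```r
library(ggplot2)
# One normal curve per species, using that species' own mean and sd
curves <- do.call(rbind, lapply(split(iris, iris$Species), function(d) {
  x <- seq(min(d$Sepal.Length), max(d$Sepal.Length), length.out = 100)
  data.frame(Species = d$Species[1],   # keeps the factor so facet_wrap matches panels
             Sepal.Length = x,
             dens = dnorm(x, mean(d$Sepal.Length), sd(d$Sepal.Length)))
}))
p_facet <- ggplot(iris, aes(x = Sepal.Length)) +
  # histogram on the density scale, so plain dnorm() values line up with it
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25,
                 fill = "grey80", colour = "black") +
  geom_line(data = curves, aes(y = dens), colour = "red") +
  facet_wrap(~Species)
p_facet
```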
Histogram with normal Distribution in R using ggplot2 for illustrations
If your question is how to plot histograms like the one you attached in your last figure, these few lines of code produce a very similar result.
library(magrittr) ; library(ggplot2)
set.seed(42)
data <- rnorm(1e5)
bw <- 0.3  # bin width, reused below to scale the density curve to counts
p <- data.frame(x = data) %>%
  ggplot(aes(x = x)) +
  geom_histogram(fill = "white", col = "black", binwidth = bw) +
  # density * n * binwidth gives the expected count per bin
  geom_density(aes(y = bw * after_stat(count))) +
  labs(x = "Statistics", y = "Probability/Density") +
  theme_bw() + theme(axis.text = element_blank())
You could use annotate() to add symbols or text, and line segments to show the intervals on the plot, like this:
p + annotate(geom = "text", x = sd(data) / 2, y = 8000, label = "σ", size = 10) +
  annotate(geom = "text", x = sd(data), y = 6000, label = "2σ", size = 10) +
  annotate(geom = "text", x = sd(data) * 1.5, y = 4000, label = "3σ", size = 10) +
  # annotate() draws each segment once instead of once per data row
  annotate(geom = "segment", x = 0, xend = sd(data), y = 7500, yend = 7500) +
  annotate(geom = "segment", x = 0, xend = sd(data) * 2, y = 5500, yend = 5500) +
  annotate(geom = "segment", x = 0, xend = sd(data) * 3, y = 3500, yend = 3500)
This chunk of code gives you a histogram with the density curve overlaid and the σ, 2σ and 3σ intervals marked with segments and labels.
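The y positions in the annotation snippet (8000, 6000, ...) are hard-coded for 1e5 draws. Here is a sketch that instead places the segments and labels relative to the tallest bar, read from layer_data(). The fractions in lab_y and the 0.04 offset are arbitrary choices of mine, and top, k, lab_y and p_ann are illustrative names.

```r
library(ggplot2)
set.seed(42)
data <- rnorm(1e5)
p <- ggplot(data.frame(x = data), aes(x = x)) +
  geom_histogram(fill = "white", col = "black", bins = 30)
top   <- max(layer_data(p)$count)   # height of the tallest bar
s     <- sd(data)
k     <- 1:3                        # interval multiples: 1, 2 and 3 sigma
lab_y <- top * c(0.65, 0.5, 0.35)   # label heights as fractions of the tallest bar
p_ann <- p +
  # each segment runs from 0 to k * sigma, just below its label
  annotate("segment", x = 0, xend = k * s,
           y = lab_y - 0.04 * top, yend = lab_y - 0.04 * top) +
  # labels sit at the midpoint of their segment
  annotate("text", x = k * s / 2, y = lab_y,
           label = c("\u03c3", "2\u03c3", "3\u03c3"), size = 10)
p_ann
```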