Fitting a density curve to a histogram in R
If I understand your question correctly, then you probably want a density estimate along with the histogram:
X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
hist(X, prob=TRUE) # prob=TRUE for probabilities not counts
lines(density(X)) # add a density estimate with defaults
lines(density(X, adjust=2), lty="dotted") # add another "smoother" density
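To see why the second curve is smoother, here is a minimal sketch using the same X as above (the names d1 and d2 are only illustrative). The adjust argument multiplies the bandwidth that density() would otherwise pick, so adjust=2 doubles it:

```r
X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
d1 <- density(X)            # default bandwidth
d2 <- density(X, adjust=2)  # default bandwidth multiplied by 2
d1$bw
d2$bw
all.equal(d2$bw, 2 * d1$bw) # the smoother curve uses twice the bandwidth
```

Values of adjust below 1 go the other way and give a wigglier curve.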
Edit a long while later:
Here is a slightly more dressed-up version:
X <- c(rep(65, times=5), rep(25, times=5), rep(35, times=10), rep(45, times=4))
hist(X, prob=TRUE, col="grey") # prob=TRUE for probabilities not counts
lines(density(X), col="blue", lwd=2) # add a density estimate with defaults
lines(density(X, adjust=2), lty="dotted", col="darkgreen", lwd=2)
[Figure: the resulting histogram with both density curves overlaid]
How to fit a curve to a histogram
OK, so you are just struggling with the fact that density goes beyond the "natural range". Well, just set cut = 0. You possibly want to read "plot.density extends “xlim” beyond the range of my data. Why and how to fix it?" for why. In that answer, I was using from and to. But now I am using cut.
## consider the sum of a beta and an exponential variable, which does not follow any standard parametric family
## note, by construction, this is a strictly positive random variable
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)
## (kernel) density estimation offers a flexible nonparametric approach
d <- density(x, cut = 0)
## you can plot histogram and density on the density scale
hist(x, prob = TRUE, breaks = 50)
lines(d, col = 2)
Note that with cut = 0, the density estimate is evaluated only within range(x), so the curve does not extend outside this range.
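As a rough check of what cut changes, the sketch below compares the evaluation grid of the default estimate with the cut = 0 one on the same simulated data (d_default and d_cut0 are illustrative names). By default, density() extends the grid cut = 3 bandwidths beyond the extremes of the data, which is why the curve spills below zero here:

```r
set.seed(0)
x <- rbeta(1000, 3, 5) + rexp(1000, 0.5)

d_default <- density(x)           # cut = 3 by default
d_cut0    <- density(x, cut = 0)  # stop at the data extremes

range(x)
range(d_cut0$x)     # same as range(x)
range(d_default$x)  # extends 3 bandwidths beyond range(x) on each side
all.equal(range(d_default$x), range(x) + c(-3, 3) * d_default$bw)
```

Setting from and to explicitly gives the same kind of control when the natural limits are known in advance.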
Fitting Density Curves to Histograms put into a Pairs Plot in R
Using the iris data set that comes with R and simplifying your panel function, this seems to work:
data(iris)
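panel.hist = function(x, ...) {
  # save the panel's coordinates and restore them on exit
  usr = par("usr"); on.exit(par(usr = usr))
  # keep the x range, rescale y so the densities fit in the panel
  par(usr = c(usr[1:2], 0, 1.5))
  hist(x, freq = FALSE, col = "cyan", add = TRUE)
  lines(density(x))
}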
pairs(iris[, 1:4], upper.panel = panel.smooth, diag.panel = panel.hist)
You did not provide panel.cor or panel.smooth functions.
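The fixed upper limit of 1.5 happens to suit the iris columns, but other data may need a different value. As a hedged variation (panel.hist.scaled is a hypothetical name, not something from the question), the panel can derive its y limit from the histogram and the density estimate:

```r
panel.hist.scaled <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))
  h <- hist(x, plot = FALSE)         # compute the bins without drawing
  d <- density(x)
  ymax <- 1.1 * max(h$density, d$y)  # leave 10% headroom above the tallest element
  par(usr = c(usr[1:2], 0, ymax))
  # draw the bars on the density scale
  rect(h$breaks[-length(h$breaks)], 0, h$breaks[-1], h$density, col = "cyan")
  lines(d)
}
pairs(iris[, 1:4], upper.panel = panel.smooth, diag.panel = panel.hist.scaled)
```

This way neither the bars nor the curve are clipped, whatever the scale of the variable.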
Overlay normal curve to histogram in R
Here's a nice easy way I found:
# g is the numeric vector of values from the question
h <- hist(g, breaks = 10, density = 10,
          col = "lightgray", xlab = "Accuracy", main = "Overall")
xfit <- seq(min(g), max(g), length.out = 40)
yfit <- dnorm(xfit, mean = mean(g), sd = sd(g))
# rescale the density to counts: bin width times number of observations
yfit <- yfit * diff(h$mids[1:2]) * length(g)
lines(xfit, yfit, col = "black", lwd = 2)
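The rescaling step works because, for equal-width bins, a count is the density times the bin width times the number of observations. Here is a small self-contained sketch of that relationship; the simulated g is only a stand-in for the data in the question, and scale_factor is an illustrative name:

```r
set.seed(42)
g <- rnorm(200, mean = 0.7, sd = 0.1)  # hypothetical stand-in for the question's data

h <- hist(g, breaks = 10, col = "lightgray", xlab = "Accuracy", main = "Overall")
bin_width <- diff(h$breaks[1:2])       # all bins have equal width here
scale_factor <- bin_width * length(g)  # converts density to expected counts per bin

# hist() stores both scales; they differ by exactly this factor
all.equal(h$counts, h$density * scale_factor)

# draw the normal curve on the count scale
curve(dnorm(x, mean = mean(g), sd = sd(g)) * scale_factor,
      add = TRUE, col = "black", lwd = 2)
```

If counts are not needed on the y axis, calling hist() with freq = FALSE and drawing the unscaled dnorm() curve avoids the rescaling altogether.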
Overlay histogram with density curve
Here you go!
library(ggplot2)
# create some data to work with
x = rnorm(1000)
# overlay histogram, empirical density and normal density
p0 = ggplot(data.frame(x = x), aes(x)) +
  geom_line(aes(y = after_stat(density), colour = 'Empirical'), stat = 'density') +
  stat_function(fun = dnorm, aes(colour = 'Normal')) +
  geom_histogram(aes(y = after_stat(density)), alpha = 0.4) +
  scale_colour_manual(name = 'Density', values = c('red', 'blue')) +
  theme(legend.position = c(0.85, 0.85))
print(p0)
Fit curve to histogram ggplot
Depending on your goals, something like this may work by just scaling the density curve using multiplication:
ggplot(df, aes(x=x)) + geom_histogram() + geom_density(aes(y=..density..*10))
or
ggplot(df, aes(x=x)) + geom_histogram() + geom_density(aes(y=..count../10))
Choose other values (instead of 10) if you want to scale things differently.
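One natural choice for the factor, if the goal is to match the bar heights, is the bin width: the count computed by the density stat is the density times the number of observations, so multiplying it by the bin width puts the curve on the histogram's count scale. A minimal sketch, assuming a recent ggplot2 (for after_stat()), made-up data, and an arbitrarily chosen bin_width:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(500))  # hypothetical data; the question's df is not shown
bin_width <- 0.25                 # assumed bin width, chosen for illustration

p <- ggplot(df, aes(x = x)) +
  geom_histogram(binwidth = bin_width, fill = "grey70") +
  # count from the density stat is density * n, so multiplying by the
  # bin width matches the scale of the bar heights
  geom_density(aes(y = after_stat(count * bin_width)), colour = "blue")
print(p)
```

The y axis stays in counts, and the curve moves with the bars if bin_width changes.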
Edit:
Since you are defining your scaling factor in the global environment, you can define it within aes:
ggplot(df, aes(x=x)) + geom_histogram() + geom_density(aes(n=n, y=..density..*n))
# or
ggplot(df, aes(x=x, n=n)) + geom_histogram() + geom_density(aes(y=..density..*n))
or another, less nice way using get:
ggplot(df, aes(x=x)) +
  geom_histogram() +
  geom_density(aes(y=..density.. * get("n", pos = .GlobalEnv)))