Fitting a lognormal distribution to truncated data in R
Your data is not censored (that would mean that observations outside the interval
are there, but you do not know their exact value)
but truncated (those observations have been discarded).
You just have to provide fitdist
with the density and the cumulative distribution function
of your truncated distribution.
library(truncdist)
dtruncated_log_normal <- function(x, meanlog, sdlog)
dtrunc(x, "lnorm", a=.10, b=20, meanlog=meanlog, sdlog=sdlog)
ptruncated_log_normal <- function(q, meanlog, sdlog)
ptrunc(q, "lnorm", a=.10, b=20, meanlog=meanlog, sdlog=sdlog)
library(fitdistrplus)
fitdist(Dt, "truncated_log_normal", start = list(meanlog=0, sdlog=1))
# Fitting of the distribution ' truncated_log_normal ' by maximum likelihood
# Parameters:
# estimate Std. Error
# meanlog -0.7482085 0.08390333
# sdlog 1.4232373 0.0668787
R Optim() function: Truncated Log-normal Maximum Likelihood Estimation (solve for mu and sd)
I think you want to optimize a function of just two parameters. So, we could do:
x <- c(1,2,3,3,3,4,4,5,5,6,7)
log.lklh.lnorm <- function(x, mu, sd, input_min, input_max){-sum(log(dlnorm(x, meanlog = mu, sdlog = sd, log = FALSE)/((plnorm(input_max, meanlog = mu, sdlog = sd, lower.tail = TRUE, log.p = FALSE)) - (plnorm(input_min, meanlog = mu, sdlog = sd, lower.tail = TRUE, log.p = FALSE)))))}
f <- function(y) {
log.lklh.lnorm(x,y[1],y[2],1,100000)
}
optim(par = c(3,1), f)
Notes in response to further questions in the comments:
What is the function of
par = c(3,1)
? This has two functions. (1) It is the initial point (guess). (2) It determines the number of parameters to optimize for. In this case, this is 2. It is noted that you can (and should) calculate very good initial values. You can get very good estimates for μ and σ just by calculating the average and sample standard deviation. (In statistics this is sometimes calledmethod of moments
) This will make the task ofoptim
much easier. So instead ofpar=c(3,1)
we really should do:par = c(mean(x),sd(x))
.What does
y[1]/y[2]
do?. Theoptim
function organizes all parameters to optimize for, as a single vector. So I represented (μ,σ) by a single vectory
(of length 2). In the functionf
, this array is unpacked and passed on tolog.lklh.lnorm
as individual arguments.
R crashes when fitting truncated normal distribution
tl;dr fitdistr()
tries to test the distribution function by passing it an x
value of numeric(0)
, which crashes dtruncnorm()
. You could probably write a wrapper that didn't do that.
Debugging sequence
library(truncnorm)
library(fitdistrplus)
set.seed(101)
x <- rtruncnorm(100, a=8, mean=10, sd=1)
debug(fitdist)
fitdist(x,
distr = "truncnorm", fix.arg=list(a=8),
start = list(mean = 10+0.5, sd = 1+0.1))
Fails in
resdpq <- testdpqfun(distname, dpq2test, start.arg = arg_startfix$start.arg,
fix.arg = arg_startfix$fix.arg, discrete = discrete)
More minimal/deeper example:
library(truncnorm)
library(fitdistrplus)
distname <- "truncnorm"
dpq2test <- c("d", "p")
arg_startfix <- list(start.arg = list(mean = 10.5, sd = 1.1), fix.arg = list(
a = 8))
discrete <- FALSE
debug(fitdistrplus:::testdpqfun)
fitdistrplus:::testdpqfun(distname, dpq2test, start.arg = arg_startfix$start.arg,
fix.arg = arg_startfix$fix.arg, discrete = discrete)
So we don't even need x
!
Fails at
res <- rbind(res, test1fun(paste0("d", distr), start.arg, fix.arg))
So:
library(truncnorm)
library(fitdistrplus)
arg_startfix <- list(start.arg = list(mean = 10.5, sd = 1.1), fix.arg = list(
a = 8))
fitdistrplus:::test1fun("dtruncnorm",
arg_startfix$start.arg,
fix.arg = arg_startfix$fix.arg)
Which gets us down to
res0 <- try(do.call(fn, c(list(numeric(0)), start.arg, fix.arg)),
silent = TRUE)
Which suggests that just
dtruncnorm(numeric(0))
is sufficient to trigger the bug.
Related Topics
How to Change the Size of the Strip on Facets in a Ggplot
Remove Text Inside Brackets, Parens, And/Or Braces
How to Know a Function or an Operation in R Is Vectorized
Rmarkdown Removes Citation Hyperlink
How to Use Gsub() on Each Element of a Data Frame
Adding Annotation (Segment/Arrow) in Only Certain Facet Ggplot
Using Plotmath in Ggplot2 with Percent Sign (%)
Remove Some of the Axis Labels in Ggplot Faceted Plots
Dplyr . and _No Visible Binding for Global Variable '.'_ Note in Package Check
R: Ifelse Function Returns Vector Position Instead of Value (String)
Adjusting the Node Size in Igraph Using a Matrix
Specify Function Parameters in Do.Call
Prevent Knitr/Rmarkdown from Interleaving Chunk Output with Code