Fitting a Lognormal Distribution to Truncated Data in R

Fitting a lognormal distribution to truncated data in R

Your data is not censored (that would mean that observations outside the interval
are there, but you do not know their exact value)
but truncated (those observations have been discarded).

You just have to provide fitdist with the density and the cumulative distribution function
of your truncated distribution.

library(truncdist)
dtruncated_log_normal <- function(x, meanlog, sdlog)
dtrunc(x, "lnorm", a=.10, b=20, meanlog=meanlog, sdlog=sdlog)
ptruncated_log_normal <- function(q, meanlog, sdlog)
ptrunc(q, "lnorm", a=.10, b=20, meanlog=meanlog, sdlog=sdlog)

library(fitdistrplus)
fitdist(Dt, "truncated_log_normal", start = list(meanlog=0, sdlog=1))
# Fitting of the distribution ' truncated_log_normal ' by maximum likelihood
# Parameters:
# estimate Std. Error
# meanlog -0.7482085 0.08390333
# sdlog 1.4232373 0.0668787

R Optim() function: Truncated Log-normal Maximum Likelihood Estimation (solve for mu and sd)

I think you want to optimize a function of just two parameters. So, we could do:

x <- c(1,2,3,3,3,4,4,5,5,6,7)

log.lklh.lnorm <- function(x, mu, sd, input_min, input_max){-sum(log(dlnorm(x, meanlog = mu, sdlog = sd, log = FALSE)/((plnorm(input_max, meanlog = mu, sdlog = sd, lower.tail = TRUE, log.p = FALSE)) - (plnorm(input_min, meanlog = mu, sdlog = sd, lower.tail = TRUE, log.p = FALSE)))))}

f <- function(y) {
log.lklh.lnorm(x,y[1],y[2],1,100000)
}

optim(par = c(3,1), f)

Notes in response to further questions in the comments:

  1. What is the function of par = c(3,1)? This has two functions. (1) It is the initial point (guess). (2) It determines the number of parameters to optimize for. In this case, this is 2. It is noted that you can (and should) calculate very good initial values. You can get very good estimates for μ and σ just by calculating the average and sample standard deviation. (In statistics this is sometimes called method of moments) This will make the task of optim much easier. So instead of par=c(3,1) we really should do: par = c(mean(x),sd(x)).

  2. What does y[1]/y[2] do?. The optim function organizes all parameters to optimize for, as a single vector. So I represented (μ,σ) by a single vector y (of length 2). In the function f, this array is unpacked and passed on to log.lklh.lnorm as individual arguments.

R crashes when fitting truncated normal distribution

tl;dr fitdistr() tries to test the distribution function by passing it an x value of numeric(0), which crashes dtruncnorm(). You could probably write a wrapper that didn't do that.



Debugging sequence

library(truncnorm)
library(fitdistrplus)
set.seed(101)
x <- rtruncnorm(100, a=8, mean=10, sd=1)
debug(fitdist)
fitdist(x,
distr = "truncnorm", fix.arg=list(a=8),
start = list(mean = 10+0.5, sd = 1+0.1))

Fails in

 resdpq <- testdpqfun(distname, dpq2test, start.arg = arg_startfix$start.arg, 
fix.arg = arg_startfix$fix.arg, discrete = discrete)

More minimal/deeper example:

library(truncnorm)
library(fitdistrplus)
distname <- "truncnorm"
dpq2test <- c("d", "p")
arg_startfix <- list(start.arg = list(mean = 10.5, sd = 1.1), fix.arg = list(
a = 8))
discrete <- FALSE
debug(fitdistrplus:::testdpqfun)
fitdistrplus:::testdpqfun(distname, dpq2test, start.arg = arg_startfix$start.arg,
fix.arg = arg_startfix$fix.arg, discrete = discrete)

So we don't even need x!

Fails at

res <- rbind(res, test1fun(paste0("d", distr), start.arg, fix.arg))

So:

library(truncnorm)
library(fitdistrplus)
arg_startfix <- list(start.arg = list(mean = 10.5, sd = 1.1), fix.arg = list(
a = 8))
fitdistrplus:::test1fun("dtruncnorm",
arg_startfix$start.arg,
fix.arg = arg_startfix$fix.arg)

Which gets us down to

 res0 <- try(do.call(fn, c(list(numeric(0)), start.arg, fix.arg)), 
silent = TRUE)

Which suggests that just

dtruncnorm(numeric(0))

is sufficient to trigger the bug.



Related Topics



Leave a reply



Submit