How to Calculate Confidence Intervals for Nonlinear Least Squares in R

How to calculate confidence intervals for nonlinear least squares in R?

There are three ways I know of to do this, one of which is described in the other answer. Here are two other options. The first uses nls() to fit the model and investr::predFit() to compute the predictions and confidence intervals:

library(tidyverse)
library(investr)

data <- tibble(date = 1:7,
               cases = c(0, 0, 1, 4, 7, 8.5, 8.5))

## SSlogis() is a self-starting logistic model, so no start values are needed
model <- nls(cases ~ SSlogis(log(date), Asym, xmid, scal), data = data)
new.data <- data.frame(date = seq(1, 10, by = 0.1))
## 90% pointwise confidence interval for the fitted curve
interval <- as_tibble(predFit(model, newdata = new.data,
                              interval = "confidence", level = 0.9)) %>%
  mutate(date = new.data$date)

p1 <- ggplot(data) +
  geom_point(aes(x = date, y = cases), size = 2, colour = "black") +
  xlab("Date") + ylab("Cases")

p1 +
  geom_line(data = interval, aes(x = date, y = fit)) +
  geom_ribbon(data = interval, aes(x = date, ymin = lwr, ymax = upr),
              alpha = 0.5, inherit.aes = FALSE, fill = "blue") +
  theme_classic()


Another option is to do both the model fitting and the prediction with the 'drc' package (named for dose-response curves). This package relies on built-in self-starter functions that must be used (or created), but an object of class 'drc' has many helpful methods that can be utilized. One of them is predict.drc, which supports confidence intervals (albeit for only some of the built-in self-starters). Example with package 'drc':

library(drc)
model_drc <- drm(cases ~ date, data = data, fct = LL.4())
predict_drc <- as_tibble(predict(model_drc, newdata = new.data,
                                 interval = "confidence", level = 0.9)) %>%
  mutate(date = new.data$date)

p1 +
  geom_line(data = predict_drc, aes(x = date, y = Prediction)) +
  geom_ribbon(data = predict_drc, aes(x = date, ymin = Lower, ymax = Upper),
              alpha = 0.5, inherit.aes = FALSE, fill = "red") +
  ggtitle("with package 'drc'") +
  theme_classic()


More info on the 'drc' package: journal paper, blog article describing custom self-starts for drc, and the package docs

Calculate and plot 95% confidence intervals of a generalised nonlinear model

I implemented a bootstrapping solution. I initially did standard nonparametric bootstrapping, which resamples observations, but this produces 95% CIs that look suspiciously wide. I think this is because that form of bootstrapping fails to maintain the balance in the x-distribution (e.g., by resampling you could end up with no observations for small values of x). (It's also possible that there's just a bug in my code.)

As a second shot I switched to resampling the residuals from the initial fit and adding them to the predicted values; this is a fairly standard approach e.g. in bootstrapping time series (although I'm ignoring the possibility of autocorrelation in the residuals, which would require block bootstrapping).

Here's the basic bootstrap resampler.

df$res <- df$y - df$fit
bootfun <- function(newdata = df, perturb = 0, boot_res = FALSE) {
  start <- coef(mgnls)
  ## if we start exactly from the previously fitted coefficients we end
  ## up getting all-identical answers? Not sure what's going on here, but
  ## we can fix it by perturbing the starting conditions slightly
  if (perturb > 0) {
    start <- start * runif(length(start), 1 - perturb, 1 + perturb)
  }
  if (!boot_res) {
    ## bootstrap raw data
    dfboot <- df[sample(nrow(df), size = nrow(df), replace = TRUE), ]
  } else {
    ## bootstrap residuals
    dfboot <- transform(df,
                        y = fit + sample(res, size = nrow(df), replace = TRUE))
  }
  bootfit <- try(update(mgnls,
                        start = start,
                        data = dfboot),
                 silent = TRUE)
  if (inherits(bootfit, "try-error")) return(rep(NA, nrow(newdata)))
  predict(bootfit, newdata = newdata)
}
set.seed(101)
bmat <- replicate(500, bootfun(perturb = 0.1, boot_res = TRUE))   ## resample residuals
bmat2 <- replicate(500, bootfun(perturb = 0.1, boot_res = FALSE)) ## resample observations
## construct envelopes (pointwise percentile bootstrap CIs)
df$lwr <- apply(bmat, 1, quantile, 0.025, na.rm = TRUE)
df$upr <- apply(bmat, 1, quantile, 0.975, na.rm = TRUE)
df$lwr2 <- apply(bmat2, 1, quantile, 0.025, na.rm = TRUE)
df$upr2 <- apply(bmat2, 1, quantile, 0.975, na.rm = TRUE)

Now draw the picture:

ggplot(df, aes(x, y)) +
  geom_point() +
  geom_ribbon(aes(ymin = lwr, ymax = upr), colour = NA, alpha = 0.3) +
  geom_ribbon(aes(ymin = lwr2, ymax = upr2), fill = "red", colour = NA, alpha = 0.3) +
  geom_line(aes(y = fit)) +
  theme_minimal()

The pink/light-red region is the observation-level bootstrap CIs (suspicious); the gray region is the residual bootstrap CIs.

Figure: curve with bootstrap CIs
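The resampler above depends on objects (df, mgnls) that were defined in the question. A self-contained sketch of the residual-bootstrap variant might look like the following. It assumes simulated logistic data, and the names sim, fit0, boot_once and boot_mat are purely illustrative; it is not the original poster's model.

```r
## Residual bootstrap for an nls fit (simulated data; all names are illustrative)
set.seed(42)
sim <- data.frame(x = seq(1, 10, length.out = 40))
sim$y <- 8 / (1 + exp((5 - sim$x) / 1.2)) + rnorm(nrow(sim), sd = 0.3)

fit0 <- nls(y ~ Asym / (1 + exp((xmid - x) / scal)), data = sim,
            start = list(Asym = 8, xmid = 5, scal = 1))
sim$fit <- as.numeric(fitted(fit0))
sim$res <- as.numeric(residuals(fit0))

boot_once <- function() {
  ## new response = fitted values + residuals resampled with replacement
  boot_dat <- transform(sim, y = fit + sample(res, replace = TRUE))
  boot_fit <- try(update(fit0, data = boot_dat), silent = TRUE)
  if (inherits(boot_fit, "try-error")) return(rep(NA_real_, nrow(sim)))
  predict(boot_fit, newdata = sim)
}

boot_mat <- replicate(200, boot_once())   # rows = x values, columns = replicates
## pointwise 95% percentile envelope
sim$lwr <- apply(boot_mat, 1, quantile, 0.025, na.rm = TRUE)
sim$upr <- apply(boot_mat, 1, quantile, 0.975, na.rm = TRUE)
head(sim[, c("x", "fit", "lwr", "upr")])
```

Because the x values are kept fixed and only the residuals are reshuffled, every replicate has the same design as the original data, which is the property the observation-level bootstrap loses.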

It would be nice to try the delta method as well but (1) it makes stronger assumptions/approximations than bootstrapping anyway and (2) I'm out of time.
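For reference, the delta method can be sketched as follows. This is a minimal illustration under stated assumptions, not something that was run on the data in this answer: the data are simulated, the logistic mean function and the names dat, fit0, mean_fun and ci are made up for the example, and the intervals rely on the usual first-order (Wald-type) approximation. The standard error of each prediction is taken as the square root of g' V g, where g is the gradient of the mean function with respect to the parameters and V is the estimated parameter covariance from vcov().

```r
## Delta-method pointwise CIs for an nls fit (simulated data; all names are illustrative)
set.seed(1)
dat <- data.frame(x = seq(1, 10, length.out = 30))
dat$y <- 8 / (1 + exp((5 - dat$x) / 1.2)) + rnorm(nrow(dat), sd = 0.3)

fit0 <- nls(y ~ Asym / (1 + exp((xmid - x) / scal)), data = dat,
            start = list(Asym = 8, xmid = 5, scal = 1))

## deriv() builds a function that returns the mean and its gradient
## with respect to the parameters
mean_fun <- deriv(~ Asym / (1 + exp((xmid - x) / scal)),
                  namevec = c("Asym", "xmid", "scal"),
                  function.arg = c("x", "Asym", "xmid", "scal"))

newx <- seq(1, 10, by = 0.5)
val <- do.call(mean_fun, c(list(x = newx), as.list(coef(fit0))))
J <- attr(val, "gradient")[, names(coef(fit0)), drop = FALSE]  # rows = new x, cols = parameters
V <- vcov(fit0)                          # estimated covariance of the parameters
se <- sqrt(rowSums((J %*% V) * J))       # sqrt of diag(J V J')
tq <- qt(0.975, df.residual(fit0))       # two-sided 95% t quantile

ci <- data.frame(x = newx, fit = as.numeric(val),
                 lwr = as.numeric(val) - tq * se,
                 upr = as.numeric(val) + tq * se)
head(ci)
```

The resulting band is symmetric around the fitted curve by construction, which is one of the stronger approximations referred to above; bootstrap envelopes need not be symmetric.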

Cannot use predFit to get confidence interval data

Short answer

Try

a <- c(0.4,0.6)
predFit(gloss.nls, newdata = data.frame(stimulus=seq(0, 1, by = 0.1)), interval = "confidence", level= 0.9)

Long answer

First, note that your model is linear in its parameters, so you can simply estimate it with plain OLS, where confidence intervals are straightforward.

library(tidyverse)
gloss.lm <- lm(normP ~ I(stimulus^3) + stimulus,
               data = data.mlds %>% filter(overall == TRUE))
predict(gloss.lm, newdata = data.frame(stimulus = seq(0, 1, by = 0.1)),
        interval = "confidence", level = 0.9)
           fit         lwr        upr
1  0.005554547 -0.02791979 0.03902889
2  0.061136954  0.03572392 0.08654999
3  0.119410056  0.09931945 0.13950067
4  0.183064551  0.16435972 0.20176938
5  0.254791132  0.23459593 0.27498634
6  0.337280497  0.31518226 0.35937873
7  0.433223342  0.41047149 0.45597519
8  0.545310361  0.52354420 0.56707652
9  0.676232250  0.65548488 0.69697962
10 0.828679707  0.80399326 0.85336616
11 1.005343426  0.96738940 1.04329745
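One detail worth noting: the lm() formula above includes an intercept, while the nls model fitted further down has none, so the two fits are not identical (compare the first fitted value in each output). A minimal sketch, assuming the goal is an OLS fit that matches the nls model term for term, drops the intercept with 0 + in the formula. The data frame d and the parameter names a1 and a2 are hypothetical stand-ins; d holds the overall == TRUE rows of data.mlds from the Data section.

```r
## OLS fit without intercept: same mean function as a[1]*stimulus^3 + a[2]*stimulus
d <- data.frame(
  stimulus = seq(0, 1, by = 0.125),
  normP = c(0, 0.05889716, 0.17240385, 0.2646911, 0.32515557,
            0.43973235, 0.59167546, 0.81360082, 1)
)
fit_lm <- lm(normP ~ 0 + I(stimulus^3) + stimulus, data = d)
fit_nls <- nls(normP ~ a1 * stimulus^3 + a2 * stimulus, data = d,
               start = list(a1 = 0.5, a2 = 0.5))

## the two sets of coefficient estimates, side by side
cbind(lm = unname(coef(fit_lm)), nls = unname(coef(fit_nls)))

## 90% confidence intervals for the mean response via predict.lm()
predict(fit_lm, newdata = data.frame(stimulus = seq(0, 1, by = 0.1)),
        interval = "confidence", level = 0.9)
```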

If you insist on estimating the model using nonlinear least squares, then

gloss.nls <- nls(normP ~ a[1]*stimulus^3 + a[2]*stimulus,
                 data = data.mlds %>% filter(overall == TRUE),
                 start = list(a = c(.5, .5)))

Annoyingly, predict.nls does not seem to implement confidence interval calculation, so the following call returns only the point predictions, not confidence intervals.

predict(gloss.nls, newdata = data.frame(stimulus=seq(0, 1, by = 0.1)), interval = "confidence", level= 0.9)
[1] 0.00000000 0.05704647 0.11672200 0.18165566 0.25447650 0.33781360
[7] 0.43429601 0.54655279 0.67721301 0.82890574 1.00426003

Luckily, investr::predFit has an implementation for confidence interval calculation.

library(investr)
predFit(gloss.nls, interval='prediction', newdata = data.frame(stimulus=seq(0, 1, by = 0.1), confidence=.9))

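... but this returns an error (which you encountered in your question). Note that this call requests a prediction interval, and that confidence=.9 is passed as a column of newdata rather than as the level argument; the call in the short answer shows the form for a 90% confidence interval.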

I did not dig too deep into the predFit.nls code, but it seems that predFit silently re-evaluates gloss.nls$call in the background, and if it does not find everything it needs, it returns a confusing error. Creating an object named a in the workspace, with the same shape as the parameter vector a, is enough to resolve the error.

a <- coef(gloss.nls)
investr::predFit(gloss.nls, interval='prediction', newdata = data.frame(stimulus=seq(0, 1, by = 0.1), confidence=.9))
             fit          lwr        upr
 [1,] 0.00000000 -0.050130071 0.05013007
 [2,] 0.05704647  0.006916398 0.10717654
 [3,] 0.11672200  0.066591929 0.16685207
 [4,] 0.18165566  0.131525585 0.23178573
 [5,] 0.25447650  0.204346430 0.30460657
 [6,] 0.33781360  0.287683526 0.38794367
 [7,] 0.43429601  0.384165935 0.48442608
 [8,] 0.54655279  0.496422720 0.59668286
 [9,] 0.67721301  0.627082944 0.72734309
[10,] 0.82890574  0.778775670 0.87903581
[11,] 1.00426003  0.954129960 1.05439010

Interestingly, the values in a do not make any difference. Try, e.g. a <- c(7500,-100) and you will get the same results. This might be a bug in investr?

a <- c(7500,-100)
predFit(gloss.nls, interval='prediction', newdata = data.frame(stimulus=seq(0, 1, by = 0.1), confidence=.9))
             fit          lwr        upr
 [1,] 0.00000000 -0.050130071 0.05013007
 [2,] 0.05704647  0.006916398 0.10717654
 [3,] 0.11672200  0.066591929 0.16685207
 [4,] 0.18165566  0.131525585 0.23178573
 [5,] 0.25447650  0.204346430 0.30460657
 [6,] 0.33781360  0.287683526 0.38794367
 [7,] 0.43429601  0.384165935 0.48442608
 [8,] 0.54655279  0.496422720 0.59668286
 [9,] 0.67721301  0.627082944 0.72734309
[10,] 0.82890574  0.778775670 0.87903581
[11,] 1.00426003  0.954129960 1.05439010
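If relying on a global object named a feels fragile, one possible alternative is to give each parameter its own scalar name, as in the SSlogis() example earlier on this page, where predFit() was called without any extra objects. The sketch below is an illustration under assumptions rather than a confirmed fix: a1, a2, d, nd and gloss.nls2 are hypothetical names, d reproduces the overall == TRUE rows of data.mlds, and the predFit() call is guarded so that the block still runs when investr is not installed.

```r
## Same model, but with scalar parameter names instead of the indexed vector a[1], a[2]
d <- data.frame(
  stimulus = seq(0, 1, by = 0.125),
  normP = c(0, 0.05889716, 0.17240385, 0.2646911, 0.32515557,
            0.43973235, 0.59167546, 0.81360082, 1)
)
gloss.nls2 <- nls(normP ~ a1 * stimulus^3 + a2 * stimulus, data = d,
                  start = list(a1 = 0.5, a2 = 0.5))
coef(gloss.nls2)

nd <- data.frame(stimulus = seq(0, 1, by = 0.1))
if (requireNamespace("investr", quietly = TRUE)) {
  ## level is an argument of predFit(), not a column of newdata
  ci <- investr::predFit(gloss.nls2, newdata = nd,
                         interval = "confidence", level = 0.9)
  print(ci)
}
```

It is worth comparing the resulting interval widths with the outputs above: a confidence band for the mean would normally vary in width along the stimulus axis.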

Data:

data.mlds <- structure(list(id = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L),
rank = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L), stimulus = c(0,
0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1, 0), pscale = c(0,
0.3151757, 0.9225827, 1.4164383, 1.7400011, 2.3531344, 3.1662257,
4.3538122, 5.3512879, 0), normP = c(0, 0.05889716, 0.17240385,
0.2646911, 0.32515557, 0.43973235, 0.59167546, 0.81360082,
1, 0), overall = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, FALSE)), row.names = c(NA, -10L), class = "data.frame")

