Plotting Dose Response Curves with Ggplot2 and Drc

ggplot for dose-response curve using drc package

you can define the X-axis limits within the scale_x_continuous() function:

ggplot(df, aes(x = conc0, y = gi)) +
  geom_point() +
  geom_ribbon(data=newdata, aes(x=conc, y=p, ymin=pmin, ymax=pmax), alpha=0.2) +
  geom_line(data=newdata, aes(x=conc, y=p)) +
  coord_trans(x="log") +
  # here you can decide the limits of the x-axis
  scale_x_continuous(limits = c(100,3000)) +
  ggtitle("Pyridine") + xlab("Concentration (mg/l)") + ylab("Growth inhibition")

acording to your comment:

ggplot(df, aes(x = conc0, y = gi)) +
  geom_point() +
  geom_ribbon(data=newdata, aes(x=conc, y=p, ymin=pmin, ymax=pmax), alpha=0.2) +
  geom_line(data=newdata, aes(x=conc, y=p)) +
  coord_trans(x="log") +
  # here you can decide the limits of the x-axis, breaks and labels
  scale_x_log10(limits = c(10, 3000), breaks = c(10, 100, 1000, 2000, 3000), labels = c(10, 100, 1000, 2000, 3000)) +
  ggtitle("Pyridine") + xlab("Concentration (mg/l)") + ylab("Growth inhibition") + theme(axis.text.x = element_text(angle = 90))

Error in plotting dose-response curve in ggplot with drc package

You have conc values above log(3000) and you only created newdata for values until log(100) so you're not able to fit until log(3000), you just need to increase log(100) in expand.grid to higher values, examples with log(3000) :

newdata <- expand.grid(conc=exp(seq(log(0.5), log(3000), length=100)))

Sample Image

Plotting Dose Response Curve from Survival Data

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(drc)
#> Loading required package: MASS
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
#> 
#> 'drc' has been loaded.
#> Please cite R and 'drc' if used for a publication,
#> for references type 'citation()' and 'citation('drc')'.
#> 
#> Attaching package: 'drc'
#> The following objects are masked from 'package:stats':
#> 
#>     gaussian, getInitial

df <- read.table("https://pastebin.com/raw/sH5hCr2J", header=T)

Making the mortality or as I do here survival, can be done easily with the dplyr package. This will help perform many calculations. It seems that you are interested in calcuating the percent survival for each Concentration across your four rooms (or replicates). So the first step is to group the data by these columns and then calculate the statistic we want.

df_calc <- df %>%
  group_by(Concentration, room) %>%
  summarise(surv = sum(Status)/n())
#> `summarise()` has grouped output by 'Concentration'. You can override using the `.groups` argument.

I don’t know if Concentration represents arbitrary
Concentration levels so I’m moving forward with the following
assumption:

1 == higher levels of sugar, 2 == lower levels of sugar
Concentrations were coding in log space - so I convert to linear space

df_calc <- mutate(df_calc, conc = exp(-Concentration))

Just to be clear, the conc variable is just my attempt at having something close to the true known concentrations of the experiment. If your data has the true concentrations, then don't mind this calculation.

df_calc
#> # A tibble: 12 x 4
#> # Groups:   Concentration [3]
#>    Concentration  room  surv   conc
#>            <int> <int> <dbl>  <dbl>
#>  1             1     1   0.5 0.368 
#>  2             1     2   0.4 0.368 
#>  3             1     3   0.5 0.368 
#>  4             1     4   0.6 0.368 
#>  5             2     1   0   0.135 
#>  6             2     2   0.4 0.135 
#>  7             2     3   0.2 0.135 
#>  8             2     4   0.4 0.135 
#>  9             3     1   0.2 0.0498
#> 10             3     2   0   0.0498
#> 11             3     3   0   0.0498
#> 12             3     4   0.2 0.0498

mod <- drm(surv ~ conc, data =  df_calc, fct = LL.3())

Make new conc data points

newdata <- data.frame(conc = exp(seq(log(0.01), log(10), length = 100)))

EDIT

To respond to your comment I'll explain the above code chunk. Again the conc variable is expected to be the unit concentration. In this hypothetical case, we have three concentration levels c(0.049, 0.135, 0.368). For brevity, lets assume the units are mg of sugar/ml of water. Our model was fit on these three dose levels with 4 data points per dose level. If we wanted, we could have just plotted the curve between these levels of c(0.049, 0.368), but in this example, I chose c(0.01, 10) mg/ml as the domain to plot on. This was just so that we could visualize where the curve would end up based on the model fit. In short, you choose the range that you are interested in most. As I show later - even though we can choose data points outside the range of the experimental data, the confidence intervals are extremely large indicating the model will be unhelpful for those points.

The reason behind casting these values with the log() function is to ensure that we are sampling points that look evenly distributed on a log10 scale (most does response curves are plotted with this transformation). Once we get the sequence of 100 points, we use exp() to return back to the linear space (which our model was fit on). These values are then used in the predict function as the new dose levels in conjunction with the fitted model.

All this is saved into newdata variable which allows for the plotting of the line and the confidence intervals.

Use the model and the generated data points to
predict a new surv value as well as the upper and lower bound

newdata <- cbind(newdata,
                 suppressWarnings(predict(mod, newdata = newdata, interval="confidence")))

plot with ggplot2

ggplot(df_calc, aes(conc)) +
  geom_point(aes(y = surv)) +
  geom_ribbon(aes(ymin = Lower, ymax = Upper), data = newdata, alpha = 0.2) +
  geom_line(aes(y = Prediction), data = newdata) +
  scale_x_log10() +
  coord_cartesian(ylim = c(0, 1))

Sample Image

As you may notice, the confidence intervals increase greately when we try and
predict ranges that has no data.

^{Created on 2021-10-27 by the reprex package (v1.0.0)}

Plotting dr4pl dose response curves, and how to integrate them with ggplot2?

I can get most of the way but not all the way. The main step is to write a predict() method for dr4pl objects:

predict.dr4pl <- function (object, newdata=NULL, se.fit=FALSE, level, interval) {
    xseq <- if (is.null(newdata)) object$data$Dose else newdata$x
    pred <- MeanResponse(xseq, object$parameters)
    if (!se.fit) {
        return(pred)
    }
    qq <- qnorm((1+level)/2)
    se <- sapply(xseq,
                 function(x) car::deltaMethod(object, 
    "UpperLimit + (LowerLimit - UpperLimit)/(1 + (x/IC50)^Slope)")[["Estimate"]])
    return(list(fit=data.frame(fit=pred,lwr=pred-qq*se,
                                upr=pred+qq*se), se.fit=se))
}

I included a slightly hacky way to compute the confidence intervals via the delta method - this might not be too reliable (bootstrapping would be better ...)

It works OK (sort of) for your data (changed the name to dd because it's sometimes dicey to name the data data (fortunes::fortune("dog"))).

dd <- data.frame(dose = c(0.078125,0.156250,0.312500,0.625000,1.25,
                          2.50,5.0,10.0,20.0),
                 POC = c(1.05637425, 0.87380081, 0.79171200,
                         0.83166848, 0.77361290, 0.35199288,
                         0.19404609,  0.09079221, 0.09850658))

library(dr4pl)
ggplot(dd, aes(dose,POC)) + geom_point() +
     geom_smooth(method="dr4pl",se=TRUE) + coord_trans(x="log10")

4-parameter logistic curve

the confidence intervals are terrible, turn them off with se=FALSE
dr4pl puts the x-axis on a log10-scale by default, but the standard scale_x_log10() screws this up because it is applied before the fitting and prediction, so I use coord_trans(x="log10") instead.
However, coord_trans() doesn't play so nicely if the axes are on a very broad logarithmic scale - I tried the example above with the sample_data_1 data from the package and it didn't work.

But I'm afraid I've spent enough time on this for now.

It would more robust to use the predict method above to generate the values you want, over the range you want, separately, and then use geom_line() + geom_ribbon() to add the information to the plot ....

If you're willing to fit the model first (outside geom_smooth) you can do this (this is using sample_data_1 from dr4pl package - it's from the first example in ?dr4pl)

model2 <- dr4pl(dose = sample_data_1$Dose,
           response = sample_data_1$Response)

ggplot(sample_data_1, aes(Dose,Response)) + geom_point() +
    stat_function(fun=function(x) predict(model2,newdata=data.frame(x=x))) +
    scale_x_log10()

which is less sensitive to the order of scaling/unscaling the x axis.

Improved but slow bootstrap CIs:

predictdf.dr4pl <- function (model, xseq, se, level, nboot=200) {
    pred <- MeanResponse(xseq, model$parameters)
    if (!se) {
        return(base::data.frame(x=xseq, y=pred))
    }
    ## bootstrap residuals
    pred0 <- MeanResponse(model$data$Dose, model$parameters)
    res <- pred0-model$data$Response
    bootres <- matrix(nrow=length(xseq), ncol=nboot)
    pb <- txtProgressBar(max=nboot,style=3)
    for (i in seq(nboot)) {
        setTxtProgressBar(pb,i)
        mboot <- dr4pl(model$data$Dose,
                       pred0 + sample(res, size=length(pred0),
                                      replace=TRUE))
        bootres[,i] <- MeanResponse(xseq, mboot$parameters)
    }
    fit <- data.frame(x = xseq,
                      y=pred,
                      ymin=apply(bootres,1,quantile,(1-level)/2),
                      ymax=apply(bootres,1,quantile,(1+level)/2))
    return(fit)
}

print(ggplot(dd, aes(dose,POC))
    + geom_point()
    + geom_smooth(method="dr4pl",se=TRUE) + coord_trans(x="log10")
)

Sample Image

drc:: drc plot with ggplot2

NB, you can skip to the final paragraph for the simple answer. The rest of this answer documents how I arrived at that solution

Looking at the code for drc:::plot.drc, we can see that the final line invisibly returns a data.frame retData

function (x, ..., add = FALSE, level = NULL, type = c("average", 
                                                      "all", "bars", "none", "obs", "confidence"), broken = FALSE, 
          bp, bcontrol = NULL, conName = NULL, axes = TRUE, gridsize = 100, 
          log = "x", xtsty, xttrim = TRUE, xt = NULL, xtlab = NULL, 
          xlab, xlim, yt = NULL, ytlab = NULL, ylab, ylim, cex, cex.axis = 1, 
          col = FALSE, lty, pch, legend, legendText, legendPos, cex.legend = 1, 
          normal = FALSE, normRef = 1, confidence.level = 0.95) 
{
  # ...lot of lines omitted...
  invisible(retData)
}

retData contains the coordinates for the fitted model line, so we can use this to ggplot the same model that plot.drc uses

pl <- plot(chickweed.m1, xlab = "Time (hours)", ylab = "Proportion germinated", 
        xlim=c(0, 340), ylim=c(0, 0.25), log="", lwd=2, cex=1.2)
names(pl) <- c("x", "y")
ggplot(data= dt1Means, mapping=aes(x=start, y=Germinated)) + 
  geom_point() +
  geom_line(data=pl, aes(x=x, y = y)) +
  lims(y=c(0, 0.25)) +
  theme_bw()

Sample Image

Which is the same as the version you created in ggplot using predict(object=chickweed.m1). So, the difference is not in the model lines, but in where the data points are plotted. We can export the data point from drc:::plot.drc by changing the last line of the function from invisible(retData) to list(retData, plotPoints). For convenience, I copied the entire code of drc:::plot.drc into a new function. Note that if you wish to replicate this step, there are a few functions called by drcplot that are not exported in the drc namespace, so drc::: needs to be prepended to all calls to the functions parFct, addAxes, brokenAxis, and makeLegend.

drcplot <- function (x, ..., add = FALSE, level = NULL, type = c("average", 
                                                      "all", "bars", "none", "obs", "confidence"), broken = FALSE, 
          bp, bcontrol = NULL, conName = NULL, axes = TRUE, gridsize = 100, 
          log = "x", xtsty, xttrim = TRUE, xt = NULL, xtlab = NULL, 
          xlab, xlim, yt = NULL, ytlab = NULL, ylab, ylim, cex, cex.axis = 1, 
          col = FALSE, lty, pch, legend, legendText, legendPos, cex.legend = 1, 
          normal = FALSE, normRef = 1, confidence.level = 0.95) 
{
  # ...lot of lines omitted...
  list(retData, plotPoints)
}

and run this with your data

pl <- drcplot(chickweed.m1, xlab = "Time (hours)", ylab = "Proportion germinated", 
          xlim=c(0, 340), ylim=c(0, 0.25), log="", lwd=2, cex=1.2)

germ.points <- as.data.frame(pl[[2]])
drc.fit <- as.data.frame(pl[[1]])
names(germ.points) <- c("x", "y")
names(drc.fit) <- c("x", "y")

Now, plotting these with ggplot2 gets what you were looking for

ggplot(data= dt1Means, mapping=aes(x=start, y=Germinated)) + 
  geom_point(data=germ.points, aes(x=x, y = y)) +
  geom_line(data=drc.fit, aes(x=x, y = y)) +
  lims(y=c(0, 0.25)) +
  theme_bw()

Sample Image

Finally, comparing the data point values of this plot (germ.points) with those in your original ggplot (dt1Means), shows the reason for the discrepancy. Your calculated points in dt1Means are shifted one time period earlier relative to those in plot.drc. In other words, plot.drc is assigning the events to the end time of the period in which they occur, whereas you are assigning germination events to the start of the time interval in which they occur. You can simply adjust this by, for example, using

dt1 <- data.table(chickweed)
dt1[, Germinated := mean(count)/200, by=start]
dt1[, cum_Germinated := cumsum(Germinated)]
dt1[, Pred := c(predict(object=chickweed.m1), NA)]  # Note that the final time period which ends at `Inf` can not be predicted by the model, therefore added `NA` in the final row

ggplot(data= dt1, mapping=aes(x=end, y=cum_Germinated)) + 
  geom_point() +
  geom_line(aes(y = Pred)) +
  lims(y=c(0, 0.25)) +
  theme_bw()

Plotting drc model in ggplot2; issue with seq( )

You mistook dose values to give predict() or aes(x) values:

log10000 <- exp(seq(log(0.5), log(10000), length=200))
log1000 <- exp(seq(log(0.5), log(1000), length=200))

log10000df <- as.data.frame(cbind(dose = log10000, predict(mod1, data.frame(dose = log10000), interval="confidence")))
log1000df <- as.data.frame(cbind(dose = log1000, predict(mod1, data.frame(dose = log1000), interval="confidence")))

 ## a common part
p0 <- ggplot(mydata1, aes(x = dose, y = probability)) +
  geom_point() + coord_trans(x="log") + 
  xlab("dose") + ylab("response") + xlim(0.5, 10001)

p10000 <- p0 + geom_line(data = log10000df, aes(x = dose, y = Prediction)) +
  geom_ribbon(data = log10000df, aes(x = dose, y = Prediction, ymin = Lower, ymax = Upper),
              alpha = 0.2, color = "blue", fill = "pink")
  
p1000 <- p0 + geom_line(data = log1000df, aes(x = dose, y = Prediction)) +
  geom_ribbon(data = log1000df, aes(x = dose, y = Prediction, ymin = Lower, ymax = Upper),
              alpha = 0.2, color = "blue", fill = "pink")

Sample Image

Plotting Dose Response Curves with Ggplot2 and Drc

ggplot for dose-response curve using drc package

Error in plotting dose-response curve in ggplot with drc package

Plotting Dose Response Curve from Survival Data

EDIT

Plotting dr4pl dose response curves, and how to integrate them with ggplot2?

drc:: drc plot with ggplot2

Plotting drc model in ggplot2; issue with seq( )

Related Topics

Leave a reply