Extract Coefficients from Ggplot2-Created Nls Fit

Extract coefficients from ggplot2-created nls fit

My question is: using this construction, is it possible to pull out the actual nls object from that call? I'd like to know my coefficients, etc.

This is currently not possible in ggplot2. The ggplot2 functions return predictions from the model, but not the model object itself. Thus, you cannot extract an nls object from the ggplot object to find the coefficients, etc.

There are two relevant discussions in the ggplot2 and ggplot2-dev mailing lists:

https://groups.google.com/d/topic/ggplot2/7tiUB2sjCxM/discussion

https://groups.google.com/d/topic/ggplot2-dev/dLGJnzIg4ko/discussion

Quick synopsis:

While many users have asked for the ability to extract statistics from ggplot objects, the developers are considering it but seem somewhat opposed. They would prefer users to use ggplot2 for visualization, and appropriate modelling functions to explore modelling parameters. However, Hadley supports the idea of implementing the ability to pass a model object to a ggplot() call. So, instead of trying to extract the nls object from your ggplot object, you would instead:

mod <- nls(y ~ N * dnorm(x, m, s), se = F, start = list(m = 20, s = 5, N = 300), 
        data = myhist)
ggplot(data = myhist, aes(x = size, y = counts)) + geom_point() + 
        geom_smooth(mod)

That way, the model only needs to be called once, you can do anything you want to it, and you don't have to go searching through ggplot objects to find it. However, I don't know when or if this will be implemented.

How to plot the output from an nls model fit in ggplot2

The ideal solution would plot the results of nls() using ggplot, but here's a "quick and dirty" solution based on a couple of observations.

First, you can be sure that if you use the same formula for nls() and geom_smooth(method = "nls"), you will get the same coefficients. That's because the latter is calling the former.

Second, using your example data, nls() converges to the same values of Vmax and Km (different for each drug), regardless of start value. In other words, there's no need to build models using start values in the range for each individual drug. Any of the following give the same result for drug 1 (and similarly for drug 2):

library(dplyr)
# use maximum as start
df1 %>% 
  filter(drug == 1) %>% 
  nls(rate ~ Vm * Concentration/(K + Concentration), 
      data = ., 
      start = list(K = max(.$Concentration), Vm = max(.$rate)))

# use minimum as start
df1 %>% 
  filter(drug == 1) %>% 
  nls(rate ~ Vm * Concentration/(K + Concentration), 
      data = ., 
      start = list(K = min(.$Concentration), Vm = min(.$rate)))

# use arbitrary values as start
df1 %>% 
  filter(drug == 1) %>% 
  nls(rate ~ Vm * Concentration/(K + Concentration), 
      data = ., 
      start = list(K = 50, Vm = 2))

So the quickest way to plot the curves is simply to map the drug to a ggplot aesthetic, such as color. This will construct separate nls curves from the same start values and you can then go back to nls() if required to get the coefficients, knowing that the models should be the same as the plot.

Using your example data file (but don't call it file, I used df1):

library(ggplot2)
df1 <- structure(list(Concentration = c(500, 250, 100, 62.5, 50, 25, 12.5, 5, 
                                        500, 250, 100, 62.5, 50, 25, 12.5, 5), 
                      drug = c(1, 1, 1, 1, 1, 1, 1, 1, 
                               2, 2, 2, 2, 2, 2, 2, 2), 
                      rate = c(1.88922, 1.4265, 0.86472, 0.66221, 0.56434, 0.34314, 
                               0.18112, 0.07717, 3.995055, 3.0118, 1.824505, 1.397237, 
                               1.190078, 0.723637, 0.381865, 0.162771)),
                      .Names = c("Concentration", "drug", "rate"), 
                      row.names = c(NA, -16L), 
                      class = "data.frame")

# could use e.g. Km = min(df1$Concentration) for start
# but here we use arbitrary values
ggplot(df1, aes(Concentration, rate)) + 
  geom_point() + 
  geom_smooth(method = "nls", 
              method.args = list(formula = y ~ Vmax * x / (Km + x),
                                 start = list(Km = 50, Vmax = 2)), 
              data = df1,
              se = FALSE,
              aes(color = factor(drug)))

Sample Image

Method to extract stat_smooth line fit

stat_smooth does produce output that you can use elsewhere, and with a slightly hacky way, you can put it into a variable in the global environment.

You enclose the output variable in .. on either side to use it. So if you add an aes in the stat_smooth call and use the global assign, <<-, to assign the output to a varible in the global environment you can get the the fitted values, or others - see below.

qplot(hp,wt,data=mtcars) + stat_smooth(aes(outfit=fit<<-..y..))
fit
 [1] 1.993594 2.039986 2.087067 2.134889 2.183533 2.232867 2.282897 2.333626
 [9] 2.385059 2.437200 2.490053 2.543622 2.597911 2.652852 2.708104 2.764156
[17] 2.821771 2.888224 2.968745 3.049545 3.115893 3.156368 3.175495 3.181411
[25] 3.182252 3.186155 3.201258 3.235698 3.291766 3.353259 3.418409 3.487074
[33] 3.559111 3.634377 3.712729 3.813399 3.910849 3.977051 4.037302 4.091635
[41] 4.140082 4.182676 4.219447 4.250429 4.275654 4.295154 4.308961 4.317108
[49] 4.319626 4.316548 4.308435 4.302276 4.297902 4.292303 4.282505 4.269040
[57] 4.253361 4.235474 4.215385 4.193098 4.168621 4.141957 4.113114 4.082096
[65] 4.048910 4.013560 3.976052 3.936392 3.894586 3.850639 3.804557 3.756345
[73] 3.706009 3.653554 3.598987 3.542313 3.483536 3.422664 3.359701 3.294654

The outputs you can obtain are:

y, predicted value
ymin, lower pointwise confidence interval around
the mean
ymax, upper pointwise confidence interval around the mean
se, standard error

Note that by default it predicts on 80 data points, which may not be aligned with your original data.

ggplot2: how to get values for the regression line equation, r^2 and p value?

Don't use a plotting function for modelling. Fit the model using the lm function.

Then use the summary method to get everything you need to know about the fit.

You should get the same results as the plotting function, which I suspect uses lm internally.

How get plot from nls in R?

Using the first example from ?nls and following the example I pointed you to line by line achieves the following:

#This is just our data frame
DNase1 <- subset(DNase, Run == 1)
DNase1$lconc <- log(DNase1$conc)
#Fit the model
fm1DNase1 <- nls(density ~ SSlogis(lconc, Asym, xmid, scal), DNase1)

#Plot the original points
# first argument is the x values, second is the y values
plot(DNase1$lconc,DNase1$density)

#This adds to the already created plot a line
# once again, first argument is x values, second is y values
lines(DNase1$lconc,predict(fm1DNase1))

The predict method for a nls argument is automatically returning the fitted y values. Alternatively, you add a step and do

yFitted <- predict(fm1DNase1)

and pass yFitted in the second argument to lines instead. The result looks like this:

Sample Image

Or if you want a "smooth" curve, what you do is to simply repeat this but evaluate the function at more points:

r <- range(DNase1$lconc)
xNew <- seq(r[1],r[2],length.out = 200)
yNew <- predict(fm1DNase1,list(lconc = xNew))

plot(DNase1$lconc,DNase1$density)
lines(xNew,yNew)

R expression from `nls` fit?

Is this what you're looking for?

do.call(substitute, args=list(formula(mod), as.list(round(coef(mod),4))))
# m ~ 13.0097 * exp(-0.0501 * x^2) + 0.1536

It works because do.call first evaluates both of the arguments in args and only then uses substitute() to substitute the coefficients into the formula expression. i.e., the expression that do.call() ultimately evaluates looks like this one, as desired:

as.call(list(substitute, formula(mod), as.list(round(coef(mod),4))))
# .Primitive("substitute")(m ~ a * exp(-b * x^2) + c, list(a = 13.0097, 
# b = 0.0501, c = 0.1536))

trying to display original and fitted data (nls + dnorm) with ggplot2's geom_smooth()

the first error indicates that ggplot2 cannot find the variable 'count', which is used in formula, in data.

Stats take place after mapping, that is, size -> x, and counts -> y.

Here is an example for using nls in geom_smooth:

ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + 
  geom_smooth(method="nls", formula = y ~ N * dnorm(x, m, s), se=F, 
              start=list(m=20, s=5, N=300))

The point is that using x and y, instead of size and counts, in the specification of formula.

Error with pred$fit using nls in ggplot2

Your question is answered in this question on the ggplot2 mailing list. Briefly,

According to the documentation for predict.nls, it is unable to create
standard errors for the predictions, so that has to be turned off in
the stat_smooth call. .

So, we need to turn off the standard errors:

ggplot(df, aes(x=Mass, y=Solv)) +
  stat_smooth(method="nls", formula=y~i*x^z, se=FALSE, 
              start=list(i=1,z=0.2)) +
  geom_point(shape=1)

Update 2019: for new versions of ggplot2, we need the start argument to nls to be passed like this:

ggplot(df, aes(x = Mass, y = Solv)) +
  stat_smooth(method = "nls", 
              se = FALSE,
              method.args = list( 
                formula = y ~ i*x^z, 
                start = list(i=1, z=2)
              )) +
  geom_point()

Extract Coefficients from Ggplot2-Created Nls Fit