Showing equation of nls model with ggpmisc
It is also possible to add the equation using package 'ggpmisc', assembling the character string to be parsed with paste()
or sprintf()
. In this answer I will use sprintf()
. I answer the question using the example it included. I do not show this in this answer, but this approach supports grouping and facets. The drawback is that the model is fit twice, once to draw the fitted line and once to add the equation.
To find the names of the variables returned by stat_fit_tidy()
I used geom_debug()
from package 'gginnards', although the names even if dependent on the model formula and method are rather easy to predict. Instead of adding a plot layer geom_debug()
echoes is data
input to the R console. Next, once we know the names of the variables we wish to use in the label, we can assemble the string to be parsed as an R expression.
When assembling the label with sprintf()
one needs to escape the %
characters to be returned unchanged as %%
, so the multiplication sign %*%
becomes %%*%%
. It is possible, and in this case useful to embed character strings in the R expression, but we need to escape the embedded quotation marks as \"
.
library(tidyverse)
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
library(gginnards)
args <- list(formula = y ~ k * e ^ x,
start = list(k = 1, e = 2))
# we find the names of computed values
ggplot(mtcars, aes(wt, mpg)) +
stat_fit_tidy(method = "nls",
method.args = args,
geom = "debug")
#> Input 'data' to 'draw_panel()':
#> npcx npcy k_estimate e_estimate k_se e_se k_stat e_stat
#> 1 NA NA 49.65969 0.7455911 3.788755 0.01985924 13.10713 37.54378
#> k_p.value e_p.value x y fm.class fm.method fm.formula
#> 1 5.963165e-14 8.861929e-27 1.70855 32.725 nls nls y ~ k * e^x
#> fm.formula.chr PANEL group
#> 1 y ~ k * e^x 1 -1
# plot with formula
ggplot(mtcars, aes(wt, mpg)) +
geom_point() +
stat_fit_augment(method = "nls",
method.args = args) +
stat_fit_tidy(method = "nls",
method.args = args,
label.x = "right",
label.y = "top",
aes(label = sprintf("\"mpg\"~`=`~%.3g %%*%% %.3g^{\"wt\"}",
after_stat(k_estimate),
after_stat(e_estimate))),
parse = TRUE )
Created on 2022-09-02 with reprex v2.0.2
Label ggplot with group names and their equation, possibly with ggpmisc?
This first example puts the label to the right of the equation, and is partly manual. On the other hand it is very simple to code. Why this works is because group
is always present in the data
as seen by layer functions (statistics and geoms).
library(tidyverse)
library(ggpmisc)
df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))
my_formula <- y ~ x
p <- ggplot(df_mtcars, aes(x = wt, y = mpg, group = factor_cyl, colour = factor_cyl)) +
geom_smooth(method="lm")+
geom_point()+
stat_poly_eq(formula = my_formula,
label.x = "centre",
eq.with.lhs = "italic(hat(y))~`=`~",
aes(label = paste(stat(eq.label), "*\", \"*",
c("4", "6", "8")[stat(group)],
"~cylinders.", sep = "")),
label.x.npc = "right",
parse = TRUE) +
scale_colour_discrete(guide = FALSE)
p
In fact with a little bit of additional juggling one can achieve almost an answer to the question. We need to add the lhs by pasting it explicitly in aes()
so that we can add also paste text to its left based on a computed variable.
library(tidyverse)
library(ggpmisc)
df_mtcars <- mtcars %>% mutate(factor_cyl = as.factor(cyl))
my_formula <- y ~ x
p <- ggplot(df_mtcars, aes(x = wt, y = mpg, group = factor_cyl, colour = factor_cyl)) +
geom_smooth(method="lm")+
geom_point()+
stat_poly_eq(formula = my_formula,
label.x = "centre",
eq.with.lhs = "",
aes(label = paste("bold(\"", c("4", "6", "8")[stat(group)],
" cylinders: \")*",
"italic(hat(y))~`=`~",
stat(eq.label),
sep = "")),
label.x.npc = "right",
parse = TRUE) +
scale_colour_discrete(guide = FALSE)
p
ggpmisc::stat_poly_eq() does not consider weighting by a variable
Thanks for reporting this! I have created an Issue from your message. This is now fixed in the development version to be submitted before 2018-07-31 to CRAN as 'ggpmisc' 0.3.0.
Using `round` or `sprintf` function for Regression equation in ggpmisc and `dev= tikz `
1) The code below answers the dev="tikz"
part of the question if used with the 'ggpmisc' (version >= 0.2.9)
\documentclass{article}
\begin{document}
<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_chunk$set(fig.path = 'figure/pos-', fig.align = 'center', fig.show = 'hold',
fig.width = 7, fig.height = 6, size = "footnotesize", dev="tikz")
@
<<>>=
library(ggplot2)
library(ggpmisc)
@
<<>>=
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x,
y,
group = c("A", "B"),
y2 = y * c(0.5,2),
block = c("a", "a", "b", "b"))
str(my.data)
@
<<>>=
# plot
ggplot(data = my.data, mapping=aes(x = x, y = y2, colour = group)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE,
formula = y ~ poly(x=x, degree = 2, raw = TRUE)) +
stat_poly_eq(
mapping = aes(label = paste("$", ..eq.label.., "$\\ \\ \\ $",
..rr.label.., "$", sep = ""))
, geom = "text"
, formula = y ~ poly(x, 2, raw = TRUE)
, eq.with.lhs = "\\hat{Y} = "
, output.type = "LaTeX"
) +
theme_bw()
@
\end{document}
Thanks for suggesting this enhancement, I will surely also find a use for it myself!
2) Answer to the round
and sprintf
part of the question. You cannot use round
or sprintf
to change the number of digits, stat_poly_eq
currently uses signif
with three significant digits as argument applied to the whole vector of coefficients. If you want full control then you could use another statistics, stat_fit_glance
, that is also in ggpmisc
(>= 0.2.8), which uses broom:glance
internally. It is much more flexible, but you will have to take care of all the formating by yourself within the call to aes
. At the moment there is one catch, broom::glance
does not seem to work correctly with poly
, you will need to explicitly write the polynomial equation to pass as argument to formula
.
R package ggpmisc: Putting hat on y in Regression Equation
I would turn off the default value for y
that is pasted in and build your own formula. For example
ggplot(data = df, aes(x = x1, y = y1)) +
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE,
aes(label = paste("hat(italic(y))","~`=`~",..eq.label..,"~~~", ..rr.label.., sep = "")),
parse = TRUE) +
geom_point()
We use eq.with.lhs=FALSE
to turn off the automatic inclusion of y=
and then we paste()
the hat(y)
on to the front (with the equals sign). Note that the formatting comes from the ?plotmath
help page.
How to fit non-linear function to data in ggplot2 using maximum likelihood model in R?
A few things:
- you need to use
y
andx
as the variable names in theformula
argument togeom_smooth
, regardless of what the names are in your data set - you need better starting values (see below)
- there's a GLM trick you can use to fit this model; doesn't always work (can be numerically unstable), but it doesn't need starting values and will work more often than
nls()
- I don't think
lm()
andstat_poly_eq()
are going to work as expected (or maybe at all) with a nonlinear formula ...
simulate data
(same as your code but using set.seed()
- probably not important here but good practice)
set.seed(101)
x.test <- runif(50,2,8)
y.test <- 0.5^(x.test)
df <- data.frame(x.test, y.test)
attempt nls fit with your starting values
It's usually a good idea to troubleshoot by fitting any smoothing terms outside of ggplot2
, so you have fewer layers to dig through to find the problems:
nls(y.test ~ lambda/(1+ aii*x.test),
start = list(lambda=1000,aii=-816.39),
data = df)
Error in nls(y.test ~ lambda/(1 + aii * x.test), start = list(lambda = 1000, :
singular gradient
OK, still doesn't work. Let's use glm()
to get better starting values: we use an inverse-link GLM:
1/y = b0 + b1*x
y = 1/(b0 + b1*x)
= (1/b0)/(1 + (b1/b0)*x)
So:
g1 <- glm(y.test ~ x.test, family = gaussian(link = "inverse"))
s0 <- with(as.list(coef(g1)), list(lambda = 1/`(Intercept)`, aii = x.test/`(Intercept)`))
This gives lambda = -0.09, aii = -0.638 (with a little bit more work we could probably also figure out how to eyeball these by looking at the starting point and scale of the curve).
ggplot(data = df, aes(x=x.test,y=y.test)) +
geom_point(shape=21, fill="white", color="red", size=3) +
stat_smooth(method="nls",
formula = y ~ lambda/ (1 + aii*x),
method.args=list(start=s0),
se=FALSE,color="red") +
stat_smooth(method = "glm",
formula = y ~ x,
method.args = list(gaussian(link = "inverse")),
color = "blue", linetype = 2)
Coefficients per facet with output.type= numeric in ggpmisc::stat_poly_eq
Ok, I figured it out.
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(
formula = formula, output.type = "numeric",
mapping = aes(label =
sprintf(myformat,
c(formatC(stat(coef.ls)[[1]][[1, "Estimate"]]),
formatC(stat(coef.ls)[[2]][[1, "Estimate"]])),
c(formatC(stat(coef.ls)[[1]][[2, "Estimate"]]),
formatC(stat(coef.ls)[[2]][[2, "Estimate"]])),
formatC(stat(r.squared)))))
Related Topics
Using Geom_Rect for Time Series Shading in R
Automate Zip File Reading in R
Ggplot2: Splitting Facet/Strip Text into Two Lines
How to Plot a Normal Distribution by Labeling Specific Parts of the X-Axis
Use Loop to Generate Section of Text in Rmarkdown
Dplyr Without Hard-Coding the Variable Names
R Fast Single Item Lookup from List VS Data.Table VS Hash
How to Paste Together the Elements of a Vector in R Without Using a Loop
What Does the Error "Arguments Imply Differing Number of Rows: X, Y" Mean
What Are the Differences Between Concatenating Strings with Cat() and Paste()
Convert Hours:Minutes:Seconds to Minutes
"Adding Missing Grouping Variables" Message in Dplyr in R
How to Create a New Variable in a Data.Frame Based on a Condition
Gsub in R with Unicode Replacement Give Different Results Under Windows Compared with Unix