How to Neatly Align the Regression Equation and R2 and P Value

Can we neatly align the regression equation and R2 and p value?

I have updated 'ggpmisc' to make this easy. Version 0.3.4 is now on its way to CRAN; the source package is online, and binaries should be built in a few days' time.

library(ggpmisc) # version >= 0.3.4 !!

ggplot(mtcars, aes(x = wt, y = mpg, group = cyl)) +
geom_smooth(method="lm")+
geom_point()+
stat_poly_eq(formula = y ~ x,
aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label.., sep = "*`,`~")),
parse = TRUE,
label.x.npc = "right",
vstep = 0.05) # sets vertical spacing

Sample Image

Displaying regression lines based on P-value

You can try a tidyverse approach.

library(tidyverse)
library(ggpmisc)
library(broom)

The idea is to calculate the p-values beforehand using tidyr's nest, purrr's map, and broom's tidy functions. The resulting p-value is added to the data frame.

df %>% 
as_tibble() %>% # optional
nest(data =-days) %>% # calculate p values
mutate(p=map(data, ~lm(acetone~bacteria, data= .) %>%
broom::tidy() %>%
slice(2) %>%
pull(p.value))) %>%
unnest(p) # to show the pvalue output
# A tibble: 4 x 3
days data p
<dbl> <list> <dbl>
1 0 <tibble [48 x 4]> 0.955
2 10 <tibble [48 x 4]> 0.746
3 24 <tibble [48 x 4]> 0.475
4 94 <tibble [44 x 4]> 0.0152

Finally, the plot. The data is filtered for p<0.2 in the respective geoms.

df %>% 
as_tibble() %>% # optional
nest(data =-days) %>% # calculate p values
mutate(p=map(data, ~lm(acetone~bacteria, data= .) %>%
broom::tidy() %>%
slice(2) %>%
pull(p.value))) %>%
unnest(cols = c(data, p)) %>%
ggplot(aes(bacteria, acetone)) +
geom_point(aes(shape=soil_type, color=soil_type, size=soil_type, fill=soil_type)) +
facet_wrap(~days, ncol = 4, scales = "free") +
geom_smooth(data = . %>% group_by(days) %>% filter(any(p<0.2)), method = "lm", formula = y~x, color="black") +
stat_poly_eq(data = . %>% group_by(days) %>% filter(any(p<0.2)),
aes(label = paste(stat(adj.rr.label),
stat(p.value.label),
sep = "*\", \"*")),
formula = y ~ x, # same model as in geom_smooth() above
rr.digits = 1,
p.digits = 1,
parse = TRUE,size=3.5) +
scale_fill_manual(values=c("#00AFBB", "brown")) +
scale_color_manual(values=c("black", "black")) +
scale_shape_manual(values=c(21, 24))+
scale_size_manual(values=c(2.4, 1.7))+
labs(shape="Soil type", color="Soil type", size="Soil type", fill="Soil type") +
theme_bw()

Sample Image
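The per-group p-value step can also be sketched in base R, without nesting. This is only an illustration: the built-in mtcars data, grouped by cyl, stands in for the poster's df and days (which are not shown here), and the 0.2 threshold is the one used in the answer above.

```r
# Slope p-value of a simple linear regression, computed separately per group.
# mtcars / cyl / mpg / wt are stand-ins (assumptions) for the unseen df / days / acetone / bacteria.
slope_p <- function(d) {
  coefs <- coef(summary(lm(mpg ~ wt, data = d)))
  coefs["wt", "Pr(>|t|)"]
}

# Named numeric vector: one p-value per level of cyl
p_by_group <- sapply(split(mtcars, mtcars$cyl), slope_p)

# Groups whose regression line would be drawn under the p < 0.2 rule
keep <- names(p_by_group)[p_by_group < 0.2]

# Subset that could be handed to geom_smooth(data = ...) or stat_poly_eq(data = ...)
mtcars_sig <- mtcars[as.character(mtcars$cyl) %in% keep, ]
```

Passing a filtered data frame like mtcars_sig to the data argument of a layer has the same effect as the `. %>% filter(...)` function used in the plot above.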

ggplot2: add p-values to the plot

Use stat_fit_glance which is part of the ggpmisc package in R. This package is an extension of ggplot2 so it works well with it.

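# `formula` must be defined before this call, e.g. formula <- y ~ x (its definition is not shown in this answer)
ggplot(df, aes(x= new_price, y= carat, color = cut)) +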
geom_point(alpha = 0.3) +
facet_wrap(~clarity, scales = "free_y") +
geom_smooth(method = "lm", formula = formula, se = FALSE) +
stat_poly_eq(aes(label = paste(..rr.label..)),
label.x.npc = "right", label.y.npc = 0.15,
formula = formula, parse = TRUE, size = 3)+
stat_fit_glance(method = 'lm',
method.args = list(formula = formula),
geom = 'text',
aes(label = paste("P-value = ", signif(..p.value.., digits = 4), sep = "")),
label.x.npc = 'right', label.y.npc = 0.35, size = 3)

stat_fit_glance basically takes anything passed through lm() in R and allows it to be processed and printed using ggplot2. The user guide has a rundown of some of the functions like stat_fit_glance: https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html. Also, I believe this gives the model p-value, not the slope p-value (in general), which would differ for multiple linear regression. For simple linear regression they should be the same, though.

Here is the plot:

Sample Image
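The remark about model versus slope p-values can be checked with a small base R sketch. It uses the built-in mtcars data rather than the poster's df, and it computes the overall F-test p-value by hand from summary(); I am assuming this is the quantity reported as p.value by the glance-based statistic.

```r
# Helper: overall F-test p-value of a fitted lm (the "model" p-value)
model_p_value <- function(fit) {
  f <- summary(fit)$fstatistic
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# Simple linear regression: a single predictor
fit1 <- lm(mpg ~ wt, data = mtcars)
slope_p1 <- coef(summary(fit1))["wt", "Pr(>|t|)"]
model_p1 <- model_p_value(fit1)
# With one predictor, F equals t^2, so both p-values agree up to floating-point error
same_simple <- isTRUE(all.equal(slope_p1, model_p1))

# Multiple regression: two predictors
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
slope_p2 <- coef(summary(fit2))["wt", "Pr(>|t|)"]
model_p2 <- model_p_value(fit2)
# Here the slope p-value for wt and the model p-value answer different questions
same_multiple <- isTRUE(all.equal(slope_p2, model_p2))
```

So for faceted simple regressions like the one above, either p-value can be printed; with more than one predictor you need to decide which one the label should show.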

Is there a neat approach to label a ggplot plot with the equation and other statistics from geom_quantile()?

Package 'ggpmisc' (>= 0.4.5) allows a much simpler answer, which is closer to the solution hoped for by @MarkNeal in his question about median regression. This answer should be preferred to earlier ones when using a recent version of 'ggpmisc'. Not shown: passing se = FALSE to stat_quant_line() disables the confidence band.

library(ggplot2)
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate

m <- ggplot(mpg, aes(displ, 1 / hwy)) +
geom_point()

m +
stat_quant_line(quantiles = 0.5) +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
quantiles = 0.5,
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

Sample Image

Created on 2022-06-03 by the reprex package (v2.0.1)

The default is to plot the median and quartiles.

m + 
stat_quant_line() +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

Sample Image

We can also map the quantiles to color and linetype aesthetics easily.

m + 
stat_quant_line(aes(linetype = after_stat(quantile.f),
color = after_stat(quantile.f))) +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = ""),
color = after_stat(quantile.f)),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

Sample Image

We can also plot the quartiles as a band by using stat_quant_band() instead of stat_quant_line().

m + 
stat_quant_band() +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique

Sample Image

ggplot: Adding Regression Line Equation and R2 with Facet

Here is an example starting from this answer

library(ggplot2)
library(plyr)

df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

# Build a plotmath label with the fitted equation and R^2 for one data frame
lm_eqn <- function(df) {
  m <- lm(y ~ x, df)
  # unname() keeps the coefficient names out of the deparsed expression
  eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
                   list(a = format(unname(coef(m)[1]), digits = 2),
                        b = format(unname(coef(m)[2]), digits = 2),
                        r2 = format(summary(m)$r.squared, digits = 3)))
  as.character(as.expression(eq))
}

Create two groups on which you want to facet

df$group <- c(rep(1:2,50))

Create the equation labels for the two groups

eq <- ddply(df,.(group),lm_eqn)

And plot

p <- ggplot(data = df, aes(x = x, y = y)) +
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
geom_point()
p1 = p + geom_text(data=eq,aes(x = 25, y = 300,label=V1), parse = TRUE, inherit.aes=FALSE) + facet_grid(group~.)
p1

Sample Image
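If plyr is not available, the per-group label step can be sketched in base R with split() and lapply(). This is an alternative I am suggesting, not part of the original answer: the helper name group_label, the seed, and the column name label (instead of V1) are my own choices.

```r
set.seed(1)  # seed added only to make the sketch reproducible
df <- data.frame(x = 1:100)
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
df$group <- rep(1:2, 50)

# Hypothetical helper: one row per group with a plotmath label string
group_label <- function(d) {
  m <- lm(y ~ x, data = d)
  data.frame(
    group = d$group[1],
    label = sprintf(
      'italic(y) == "%.2f" + "%.2f" %%.%% italic(x) * "," ~~ italic(r)^2 ~ "=" ~ "%.3f"',
      coef(m)[1], coef(m)[2], summary(m)$r.squared
    ),
    stringsAsFactors = FALSE
  )
}

# Fit each group separately and stack the resulting rows
eq <- do.call(rbind, lapply(split(df, df$group), group_label))
```

The resulting eq can be used in the geom_text() call above by mapping label = label instead of label = V1, still with parse = TRUE.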

removing the intercept from regression line equation from ggplot using stat_reg_line() function

You can use stat_fit_tidy from the ggpmisc package:

df <- data.frame(x = c(1:100))
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))

library(ggpmisc)
my_formula <- y ~ x

ggplot(df, aes(x = x, y = y, colour = group)) +
geom_point() +
geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
stat_fit_tidy(
method = "lm",
method.args = list(formula = my_formula),
mapping = aes(label = sprintf('slope~"="~%.3g',
after_stat(x_estimate))),
parse = TRUE)

Sample Image



EDIT

If you want the R squared as well:

ggplot(df, aes(x = x, y = y, colour = group)) +
geom_point() +
geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
stat_fit_tidy(
method = "lm",
method.args = list(formula = my_formula),
mapping = aes(label = sprintf('slope~"="~%.3g',
after_stat(x_estimate))),
parse = TRUE) +
stat_poly_eq(formula = my_formula,
aes(label = ..rr.label..),
parse = TRUE,
label.x = 0.6)

Sample Image



EDIT

Another way:

myformat <- "Slope: %s --- R²: %s"
ggplot(df, aes(x, y, colour = group)) +
geom_point() +
geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
stat_poly_eq(
formula = my_formula, output.type = "numeric",
mapping = aes(label =
sprintf(myformat,
formatC(stat(coef.ls)[[1]][[2, "Estimate"]]),
formatC(stat(r.squared)))),
vstep = 0.1
)

Sample Image

Using `round` or `sprintf` function for Regression equation in ggpmisc and `dev=tikz`

1) The code below answers the dev="tikz" part of the question if used with 'ggpmisc' (version >= 0.2.9).

\documentclass{article}

\begin{document}

<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_chunk$set(fig.path = 'figure/pos-', fig.align = 'center', fig.show = 'hold',
fig.width = 7, fig.height = 6, size = "footnotesize", dev="tikz")
@

<<>>=
library(ggplot2)
library(ggpmisc)
@

<<>>=
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x,
y,
group = c("A", "B"),
y2 = y * c(0.5,2),
block = c("a", "a", "b", "b"))

str(my.data)
@

<<>>=
# plot
ggplot(data = my.data, mapping=aes(x = x, y = y2, colour = group)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE,
formula = y ~ poly(x=x, degree = 2, raw = TRUE)) +
stat_poly_eq(
mapping = aes(label = paste("$", ..eq.label.., "$\\ \\ \\ $",
..rr.label.., "$", sep = ""))
, geom = "text"
, formula = y ~ poly(x, 2, raw = TRUE)
, eq.with.lhs = "\\hat{Y} = "
, output.type = "LaTeX"
) +
theme_bw()
@

\end{document}

Sample Image

Thanks for suggesting this enhancement, I will surely also find a use for it myself!

2) Answer to the round and sprintf part of the question. You cannot use round or sprintf to change the number of digits: stat_poly_eq currently uses signif with three significant digits, applied to the whole vector of coefficients. If you want full control, you could use another statistic, stat_fit_glance, which is also in ggpmisc (>= 0.2.8) and uses broom::glance internally. It is much more flexible, but you will have to take care of all the formatting yourself within the call to aes. At the moment there is one catch: broom::glance does not seem to work correctly with poly, so you will need to write the polynomial equation explicitly to pass as the argument to formula.
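As a rough illustration of what "taking care of the formatting yourself" can look like, here is a minimal base R sketch. It reuses the artificial data from the chunk above, writes the polynomial explicitly as y ~ x + I(x^2) instead of using poly(), and formats each coefficient separately with sprintf. The label layout and the number of decimals are arbitrary choices of mine, not something ggpmisc produces.

```r
# Same artificial data as in the knitr chunk above
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)

# Explicit second-degree polynomial instead of poly(x, 2, raw = TRUE)
fit <- lm(y ~ x + I(x^2))
b <- unname(coef(fit))

# signif() applied to the whole coefficient vector, as described above
b_signif <- signif(b, 3)

# Per-coefficient control with sprintf(); "%+.1f" prints an explicit sign
eq_label <- sprintf("y = %.1f %+.1f x %+.2f x^2", b[1], b[2], b[3])
r2_label <- sprintf("R2 = %.4f", summary(fit)$r.squared)
```

Strings built this way can be added to a plot with annotate("text", ...) or mapped to label in a text layer.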

How can I run an exponential regression in R with an annotated regression equation in ggplot?

A few comments:

  1. In your exponential data example, geom_smooth without additional arguments fits a LOESS model. So this is probably not what you want.
  2. Note that your exponential data is linear on a log-scale.

Now as to how to fit a model: We can fit a linear model to the log-transformed data.

# Fit a linear model
fit <- lm(log(y2) ~ log(x2), data = df2)

# Create `data.frame` with predictions
df_predict <- data.frame(x2 = seq(min(df2$x2), max(df2$x2), length.out = 1000))
df_predict$y2_pred = exp(predict(fit, newdata = df_predict))

# Plot
ggplot(df2, aes(x = x2, y = y2)) +
geom_point() +
geom_line(data = df_predict, aes(y = y2_pred))

Sample Image

The coefficients of fit are

summary(fit)$coefficients
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -30.15675 7.152374e-16 -4.216327e+16 0
#log(x2) 8.00000 2.848167e-16 2.808824e+16 0
#Warning message:
# In summary.lm(fit) : essentially perfect fit: summary may be unreliable

Note the warning which is due to the way that you generated data from dexp (without any errors).

Also note the slope estimate (on the log scale) of 8.0, which is just the ratio of your two dexp rate parameters 0.8/0.1.
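For the annotation part, the coefficients of the log-log fit can be back-transformed and pasted into a label. The sketch below is self-contained and uses made-up data, since df2 is not shown here: the true values 2 and 1.5, the noise level, and the seed are all assumptions for illustration only.

```r
# Hypothetical data following y = 2 * x^1.5 with multiplicative noise
set.seed(42)
x2 <- 1:50
y2 <- 2 * x2^1.5 * exp(rnorm(length(x2), sd = 0.05))
df2 <- data.frame(x2 = x2, y2 = y2)

# Linear model on the log-log scale, as in the answer above
fit <- lm(log(y2) ~ log(x2), data = df2)
a <- unname(coef(fit)[1])  # intercept on the log scale
b <- unname(coef(fit)[2])  # slope on the log scale

# log(y) = a + b * log(x)  is equivalent to  y = exp(a) * x^b
eq_label <- sprintf("y = %.3g * x^%.3g", exp(a), b)
```

The string in eq_label can then be placed on the plot with ggplot2's annotate("text", x = ..., y = ..., label = eq_label), choosing coordinates that suit the data range.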


