Can we neatly align the regression equation and R2 and p value?
I have updated 'ggpmisc' to make this easy. Version 0.3.4 is now on its way to CRAN, source package is on-line, binaries should be built in a few days' time.
library(ggpmisc) # version >= 0.3.4 !!
ggplot(mtcars, aes(x = wt, y = mpg, group = cyl)) +
geom_smooth(method="lm")+
geom_point()+
stat_poly_eq(formula = y ~ x,
aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label.., sep = "*`,`~")),
parse = TRUE,
label.x.npc = "right",
vstep = 0.05) # sets vertical spacing
Displaying regression lines based on P-value
You can try a tidyverse approach.
library(tidyverse)
library(ggpmisc)
library(broom)
The idea is to calculate the p-values beforehand using tidyr's nest, purrr's map, and broom's tidy functions. The resulting p-value is added to the data frame.
df %>%
as_tibble() %>% # optional
nest(data =-days) %>% # calculate p values
mutate(p=map(data, ~lm(acetone~bacteria, data= .) %>%
broom::tidy() %>%
slice(2) %>%
pull(p.value))) %>%
unnest(p) # to show the pvalue output
# A tibble: 4 x 3
days data p
<dbl> <list> <dbl>
1 0 <tibble [48 x 4]> 0.955
2 10 <tibble [48 x 4]> 0.746
3 24 <tibble [48 x 4]> 0.475
4 94 <tibble [44 x 4]> 0.0152
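As a cross-check of what the pipeline extracts, the slope p-value of each group can also be read from summary(lm()) in base R. This is a minimal sketch on the built-in mtcars data, not on the df from the question: cyl stands in for days, and mpg ~ wt stands in for acetone ~ bacteria.

```r
# Sketch only: mtcars replaces the question's `df` (assumption for illustration).
# One simple linear regression per group, keeping only the slope p-value.
slope_p <- sapply(split(mtcars, mtcars$cyl), function(d) {
  fit <- lm(mpg ~ wt, data = d)
  # Row 2 of the coefficient table is the slope (the same row that
  # slice(2) picks from broom::tidy()); "Pr(>|t|)" is its p-value.
  summary(fit)$coefficients[2, "Pr(>|t|)"]
})
slope_p

# Groups that would pass a p < 0.2 filter and get a regression line
names(slope_p)[slope_p < 0.2]
```

The result is a named numeric vector with one p-value per group, which is the same information the p column holds after unnest(p).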
Finally, the plot. The data is filtered for p<0.2 in the respective geoms.
df %>%
as_tibble() %>% # optional
nest(data =-days) %>% # calculate p values
mutate(p=map(data, ~lm(acetone~bacteria, data= .) %>%
broom::tidy() %>%
slice(2) %>%
pull(p.value))) %>%
unnest(cols = c(data, p)) %>%
ggplot(aes(bacteria, acetone)) +
geom_point(aes(shape=soil_type, color=soil_type, size=soil_type, fill=soil_type)) +
facet_wrap(~days, ncol = 4, scales = "free") +
geom_smooth(data = . %>% group_by(days) %>% filter(any(p<0.2)), method = "lm", formula = y~x, color="black") +
stat_poly_eq(data = . %>% group_by(days) %>% filter(any(p<0.2)),
aes(label = paste(stat(adj.rr.label),
stat(p.value.label),
sep = "*\", \"*")),
formula = y ~ x,
rr.digits = 1,
p.digits = 1,
parse = TRUE,size=3.5) +
scale_fill_manual(values=c("#00AFBB", "brown")) +
scale_color_manual(values=c("black", "black")) +
scale_shape_manual(values=c(21, 24))+
scale_size_manual(values=c(2.4, 1.7))+
labs(shape="Soil type", color="Soil type", size="Soil type", fill="Soil type") +
theme_bw()
ggplot2: add p-values to the plot
Use stat_fit_glance, which is part of the ggpmisc package in R. This package is an extension of ggplot2, so it works well with it.
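library(ggpmisc)
formula <- y ~ x # assumed model formula; it is used below but not defined in the original snippet
ggplot(df, aes(x= new_price, y= carat, color = cut)) +
geom_point(alpha = 0.3) +
facet_wrap(~clarity, scales = "free_y") +
geom_smooth(method = "lm", formula = formula, se = FALSE) +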
stat_poly_eq(aes(label = paste(..rr.label..)),
label.x.npc = "right", label.y.npc = 0.15,
formula = formula, parse = TRUE, size = 3)+
stat_fit_glance(method = 'lm',
method.args = list(formula = formula),
geom = 'text',
aes(label = paste("P-value = ", signif(..p.value.., digits = 4), sep = "")),
label.x.npc = 'right', label.y.npc = 0.35, size = 3)
stat_fit_glance basically takes anything passed through lm() in R and allows it to be processed and printed using ggplot2. The user guide has a rundown of some of the functions, like stat_fit_glance: https://cran.r-project.org/web/packages/ggpmisc/vignettes/user-guide.html. Also, I believe this gives the model p-value, not the slope p-value (in general), which would be different for multiple linear regression. For simple linear regression they should be the same, though.
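The difference between the model p-value and the slope p-value can be checked directly in base R. The sketch below uses the built-in mtcars data (an assumption for illustration; it is not the data from the question) and computes the model p-value from the F statistic stored in summary(lm()).

```r
# Helper (hypothetical name): overall model p-value from the F statistic
model_p_value <- function(fit) {
  f <- summary(fit)$fstatistic
  pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
}

# Simple linear regression: one predictor
fit1 <- lm(mpg ~ wt, data = mtcars)
slope_p1 <- summary(fit1)$coefficients["wt", "Pr(>|t|)"]
model_p1 <- model_p_value(fit1)
all.equal(slope_p1, model_p1) # the two p-values should agree here

# Multiple linear regression: two predictors
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
slope_p2 <- summary(fit2)$coefficients["wt", "Pr(>|t|)"]
model_p2 <- model_p_value(fit2)
c(slope = slope_p2, model = model_p2) # these are not expected to match
```

With a single predictor the F test and the t test on the slope are the same test, which is why the two values coincide only in that case.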
Is there a neat approach to label a ggplot plot with the equation and other statistics from geom_quantile()?
Package 'ggpmisc' (>= 0.4.5) allows a much simpler answer, which is closer to the solution hoped for by @MarkNeal in his question about median regression. This answer should be preferred to earlier ones when using a recent version of 'ggpmisc'. Not shown: passing se = FALSE to stat_quant_line() disables the confidence band.
library(ggplot2)
library(ggpmisc)
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
m <- ggplot(mpg, aes(displ, 1 / hwy)) +
geom_point()
m +
stat_quant_line(quantiles = 0.5) +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
quantiles = 0.5,
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique
Created on 2022-06-03 by the reprex package (v2.0.1)
The default is to plot the median and quartiles.
m +
stat_quant_line() +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique
We can also map the quantiles to the color and linetype aesthetics easily.
m +
stat_quant_line(aes(linetype = after_stat(quantile.f),
color = after_stat(quantile.f))) +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = ""),
color = after_stat(quantile.f)),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique
We can also plot the quartiles as a band by using stat_quant_band() instead of stat_quant_line().
m +
stat_quant_band() +
stat_quant_eq(aes(label = paste(after_stat(eq.label), "*\" with \"*",
after_stat(rho.label), "*\", \"*",
after_stat(n.label), "*\".\"",
sep = "")),
size = 3)
#> Warning in rq.fit.br(x, y, tau = tau, ci = TRUE, ...): Solution may be nonunique
ggplot: Adding Regression Line Equation and R2 with Facet
Here is an example starting from this answer
require(ggplot2)
require(plyr)
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)
lm_eqn = function(df){
m = lm(y ~ x, df);
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}
Create two groups on which you want to facet
df$group <- c(rep(1:2,50))
Create the equation labels for the two groups
eq <- ddply(df,.(group),lm_eqn)
And plot
p <- ggplot(data = df, aes(x = x, y = y)) +
geom_smooth(method = "lm", se=FALSE, color="black", formula = y ~ x) +
geom_point()
p1 = p + geom_text(data=eq,aes(x = 25, y = 300,label=V1), parse = TRUE, inherit.aes=FALSE) + facet_grid(group~.)
p1
removing the intercept from regression line equation from ggplot using stat_reg_line() function
You can use stat_fit_tidy from the ggpmisc package:
df <- data.frame(x = c(1:100))
df$y <- 20 * c(0, 1) + 3 * df$x + rnorm(100, sd = 40)
df$group <- factor(rep(c("A", "B"), 50))
library(ggpmisc)
my_formula <- y ~ x
ggplot(df, aes(x = x, y = y, colour = group)) +
geom_point() +
geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
stat_fit_tidy(
method = "lm",
method.args = list(formula = my_formula),
mapping = aes(label = sprintf('slope~"="~%.3g',
after_stat(x_estimate))),
parse = TRUE)
EDIT
If you want the R squared as well:
ggplot(df, aes(x = x, y = y, colour = group)) +
geom_point() +
geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
stat_fit_tidy(
method = "lm",
method.args = list(formula = my_formula),
mapping = aes(label = sprintf('slope~"="~%.3g',
after_stat(x_estimate))),
parse = TRUE) +
stat_poly_eq(formula = my_formula,
aes(label = ..rr.label..),
parse = TRUE,
label.x = 0.6)
EDIT
Another way:
myformat <- "Slope: %s --- R²: %s"
ggplot(df, aes(x, y, colour = group)) +
geom_point() +
geom_smooth(method = "lm", formula = my_formula, se = FALSE) +
stat_poly_eq(
formula = my_formula, output.type = "numeric",
mapping = aes(label =
sprintf(myformat,
formatC(stat(coef.ls)[[1]][[2, "Estimate"]]),
formatC(stat(r.squared)))),
vstep = 0.1
)
Using `round` or `sprintf` function for Regression equation in ggpmisc and `dev=tikz`
1) The code below answers the dev="tikz" part of the question if used with 'ggpmisc' (version >= 0.2.9):
\documentclass{article}
\begin{document}
<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
opts_chunk$set(fig.path = 'figure/pos-', fig.align = 'center', fig.show = 'hold',
fig.width = 7, fig.height = 6, size = "footnotesize", dev="tikz")
@
<<>>=
library(ggplot2)
library(ggpmisc)
@
<<>>=
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x,
y,
group = c("A", "B"),
y2 = y * c(0.5,2),
block = c("a", "a", "b", "b"))
str(my.data)
@
<<>>=
# plot
ggplot(data = my.data, mapping=aes(x = x, y = y2, colour = group)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE,
formula = y ~ poly(x=x, degree = 2, raw = TRUE)) +
stat_poly_eq(
mapping = aes(label = paste("$", ..eq.label.., "$\\ \\ \\ $",
..rr.label.., "$", sep = ""))
, geom = "text"
, formula = y ~ poly(x, 2, raw = TRUE)
, eq.with.lhs = "\\hat{Y} = "
, output.type = "LaTeX"
) +
theme_bw()
@
\end{document}
Thanks for suggesting this enhancement, I will surely also find a use for it myself!
2) Answer to the round and sprintf part of the question. You cannot use round or sprintf to change the number of digits: stat_poly_eq currently uses signif with three significant digits as argument, applied to the whole vector of coefficients. If you want full control, you could use another statistic, stat_fit_glance, that is also in ggpmisc (>= 0.2.8) and uses broom::glance internally. It is much more flexible, but you will have to take care of all the formatting yourself within the call to aes. At the moment there is one catch: broom::glance does not seem to work correctly with poly, so you will need to explicitly write the polynomial equation to pass as argument to formula.
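Writing the polynomial explicitly does not change the fitted model. The sketch below, in base R only, reuses the artificial data from the document above and compares poly(x, 2, raw = TRUE) with the explicit form y ~ x + I(x^2). The sprintf() calls at the end illustrate the kind of manual formatting one would then do inside aes(); they are an assumption about how one might format the text, not code taken from ggpmisc.

```r
# Same artificial data as in the knitr document above
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)

fit_poly     <- lm(y ~ poly(x, 2, raw = TRUE))
fit_explicit <- lm(y ~ x + I(x^2)) # polynomial written out term by term

# Coefficient names differ, the estimates should not
all.equal(unname(coef(fit_poly)), unname(coef(fit_explicit)))

# Manual control over the number of digits with sprintf()
b <- coef(fit_explicit)
eq_text <- sprintf("y = %.1f %+.1f x %+.2f x^2", b[1], b[2], b[3])
r2_text <- sprintf("R^2 = %.4f", summary(fit_explicit)$r.squared)
eq_text
r2_text
```

The %+ flag prints the sign of each coefficient, so negative estimates do not produce a "+ -" sequence in the label.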
How can I run a exponential regression in R with an annotated regression equation in ggplot?
A few comments:
- In your exponential data example, geom_smooth without additional arguments fits a LOESS model, so this is probably not what you want.
- Note that your exponential data is linear on a log-scale.
Now as to how to fit a model: We can fit a linear model to the log-transformed data.
# Fit a linear model
fit <- lm(log(y2) ~ log(x2), data = df2)
# Create `data.frame` with predictions
df_predict <- data.frame(x2 = seq(min(df2$x2), max(df2$x2), length.out = 1000))
df_predict$y2_pred = exp(predict(fit, newdata = df_predict))
# Plot
ggplot(df2, aes(x = x2, y = y2)) +
geom_point() +
geom_line(data = df_predict, aes(y = y2_pred))
The coefficients of fit are
summary(fit)$coefficients
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) -30.15675 7.152374e-16 -4.216327e+16 0
#log(x2) 8.00000 2.848167e-16 2.808824e+16 0
#Warning message:
# In summary.lm(fit) : essentially perfect fit: summary may be unreliable
Note the warning, which is due to the way that you generated data from dexp (without any errors).
Also note the slope estimate (on the log scale) of 8.0, which is just the ratio of your two dexp rate parameters, 0.8/0.1.
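To turn a fit of this kind into an equation for annotation, the coefficients have to be transformed back from the log scale. The sketch below uses synthetic data (df_sim, true_a and true_b are hypothetical; the df2 from the question is not reproduced here) with a little multiplicative noise, so that summary() does not hit the perfect-fit warning.

```r
# Synthetic stand-in for the question's data: y = a * x^b with noise
set.seed(1)
true_a <- 2
true_b <- 1.5
df_sim <- data.frame(x = 1:50)
df_sim$y <- true_a * df_sim$x^true_b * exp(rnorm(nrow(df_sim), sd = 0.05))

# Linear model on the log-log scale, as in the answer above
fit_sim <- lm(log(y) ~ log(x), data = df_sim)

# Back-transform: log(y) = log(a) + b * log(x)  =>  y = a * x^b
a_hat <- exp(coef(fit_sim)[["(Intercept)"]])
b_hat <- coef(fit_sim)[["log(x)"]]

# Plotmath-style label for the fitted equation
eq_label <- sprintf("y == %.3g * x^%.3g", a_hat, b_hat)
eq_label
```

A string like eq_label can be passed to annotate("text", ..., label = eq_label, parse = TRUE) in ggplot2 to print the equation on the plot.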