R Print Equation of Linear Regression on the Plot Itself

R print equation of linear regression on the plot itself

I tried to automate the output a bit:

fit <- lm(mpg ~ cyl + hp, data = mtcars)
summary(fit)
##Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.90833 2.19080 16.847 < 2e-16 ***
## cyl -2.26469 0.57589 -3.933 0.00048 ***
## hp -0.01912 0.01500 -1.275 0.21253

plot(mpg ~ cyl, data = mtcars, xlab = "Cylinders", ylab = "Miles per gallon")
abline(coef(fit)[1:2])

## rounded coefficients for better output
cf <- round(coef(fit), 2)

## sign check to avoid having plus followed by minus for negative coefficients
eq <- paste0("mpg = ", cf[1],
ifelse(sign(cf[2])==1, " + ", " - "), abs(cf[2]), " cyl ",
ifelse(sign(cf[3])==1, " + ", " - "), abs(cf[3]), " hp")

## printing of the equation
mtext(eq, 3, line=-2)

Sample Image

Hope it helps,

alex

How do I add a regression line equation to plot?

Using the data set cars

m <- lm(dist ~ speed, data = cars)

plot(dist ~ speed,data = cars)

abline(m, col = "red")

Here we use summary() to extract the R-squared and P-value.

m1 <-  summary(m)

mtext(paste0("R squared: ",round(m1$r.squared,2)),adj = 0)

mtext(paste0("P-value: ", format.pval(pf(m1$fstatistic[1], # F-statistic
m1$fstatistic[2], # df
m1$fstatistic[3], # df
lower.tail = FALSE))))

mtext(paste0("dist ~ ",round(m1$coefficients[1],2)," + ",
round(m1$coefficients[2],2),"x"),
adj = 1)

Linear equation on plot

Take the built-in dataset iris for example:

model <- lm(Sepal.Length ~ Petal.Length, data = iris)
modsum <- summary(model)
r2 <- modsum$r.squared
b <- round(coef(model), 3)
rp <- substitute(atop(italic(R)^2 == MYVALUE, y == b1~x + b0),
list(MYVALUE = round(r2, 4), b0 = b[1], b1 = b[2]))
plot(Sepal.Length ~ Petal.Length, iris)
abline(model, col = 2)
legend("bottomright", legend = rp, bty = 'n', cex = 1.5)

Displaying regression lines based on P-value

you can try a tidyverse

library(tidyverse)
library(ggpmisc)
library(broom)

the idea is to calculate the pvalues beforehand using tidyr's nest, purrr'smap as well as boom's tidy function. The resulting pvalue is added to the dataframe.

df %>% 
as_tibble() %>% # optional
nest(data =-days) %>% # calculate p values
mutate(p=map(data, ~lm(acetone~bacteria, data= .) %>%
broom::tidy() %>%
slice(2) %>%
pull(p.value))) %>%
unnest(p) # to show the pvalue output
# A tibble: 4 x 3
days data p
<dbl> <list> <dbl>
1 0 <tibble [48 x 4]> 0.955
2 10 <tibble [48 x 4]> 0.746
3 24 <tibble [48 x 4]> 0.475
4 94 <tibble [44 x 4]> 0.0152

Finally, the plot. The data is filtered for p<0.2 in the respective geoms.

df %>% 
as_tibble() %>% # optional
nest(data =-days) %>% # calculate p values
mutate(p=map(data, ~lm(acetone~bacteria, data= .) %>%
broom::tidy() %>%
slice(2) %>%
pull(p.value))) %>%
unnest(cols = c(data, p)) %>%
ggplot(aes(bacteria, acetone)) +
geom_point(aes(shape=soil_type, color=soil_type, size=soil_type, fill=soil_type)) +
facet_wrap(~days, ncol = 4, scales = "free") +
geom_smooth(data = . %>% group_by(days) %>% filter(any(p<0.2)), method = "lm", formula = y~x, color="black") +
stat_poly_eq(data = . %>% group_by(days) %>% filter(any(p<0.2)),
aes(label = paste(stat(adj.rr.label),
stat(p.value.label),
sep = "*\", \"*")),
formula = formula,
rr.digits = 1,
p.digits = 1,
parse = TRUE,size=3.5) +
scale_fill_manual(values=c("#00AFBB", "brown")) +
scale_color_manual(values=c("black", "black")) +
scale_shape_manual(values=c(21, 24))+
scale_size_manual(values=c(2.4, 1.7))+
labs(shape="Soil type", color="Soil type", size="Soil type", fill="Soil type") +
theme_bw()

Sample Image

Trying to graph different linear regression models with ggplot and equation labels

If you are regressing Y on both X and Z, and these are both numerical variables (as they are in your example) then a simple linear regression represents a 2D plane in 3D space, not a line in 2D space. Adding an interaction term means that your regression represents a curved surface in a 3D space. This can be difficult to represent in a simple plot, though there are some ways to do it : the colored lines in the smoking / cycling example you show are slices through the regression plane at various (aribtrary) values of the Z variable, which is a reasonable way to display this type of model.

Although ggplot has some great shortcuts for plotting simple models, I find people often tie themselves in knots because they try to do all their modelling inside ggplot. The best thing to do when you have a more complex model to plot is work out what exactly you want to plot using the right tools for the job, then plot it with ggplot.

For example, if you make a prediction data frame for your interaction model:

model2 <- lm(Y ~ X * Z, data = hw_data)

predictions <- expand.grid(X = seq(min(hw_data$X), max(hw_data$X), length.out = 5),
Z = seq(min(hw_data$Z), max(hw_data$Z), length.out = 5))

predictions$Y <- predict(model2, newdata = predictions)

Then you can plot your interaction model very simply:

ggplot(hw_data, aes(X, Y)) + 
geom_point() +
geom_line(data = predictions, aes(color = factor(Z))) +
labs(color = "Z")

Sample Image

You can easily work out the formula from the coefficients table and stick it together with paste:

labs <- trimws(format(coef(model2), digits = 2))
form <- paste("Y =", labs[1], "+", labs[2], "* x +",
labs[3], "* Z + (", labs[4], " * X * Z)")
form
#> [1] "Y = -69.07 + 5.58 * x + 2.00 * Z + ( -0.13 * X * Z)"

This can be added as an annotation to your plot using geom_text or annotation


Update

A complete solution if you wanted to have only 3 levels for Z, effectively "high", "medium" and "low", you could do something like:

library(ggplot2)

model2 <- lm(Y ~ X * Z, data = hw_data)

predictions <- expand.grid(X = quantile(hw_data$X, c(0, 0.5, 1)),
Z = quantile(hw_data$Z, c(0.1, 0.5, 0.9)))

predictions$Y <- predict(model2, newdata = predictions)

labs <- trimws(format(coef(model2), digits = 2))
form <- paste("Y =", labs[1], "+", labs[2], "* x +",
labs[3], "* Z + (", labs[4], " * X * Z)")

form <- paste(form, " R\u00B2 =",
format(summary(model2)$r.squared, digits = 2))

ggplot(hw_data, aes(X, Y)) +
geom_point() +
geom_line(data = predictions, aes(color = factor(Z))) +
geom_text(x = 15, y = 25, label = form, check_overlap = TRUE,
fontface = "italic") +
labs(color = "Z")

Sample Image

How to plot trend line with regression equation in R?

Here's the method I usually use:

library(tidyverse)
library(lubridate)
library(ggpmisc)

df <- tibble::tribble(
~Date, ~Speed,
"1/2019", 4500L,
"2/2019", 3400L,
"3/2019", 5300L,
"4/2019", 2000L
)
df$Date <- lubridate::my(df$Date)

ggplot(df, aes(x = Date, y = Speed)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
stat_poly_eq(formula = x ~ y,
aes(label = paste(..eq.label.., ..rr.label.., sep = "~~~")),
parse = TRUE)

example_1.png

EDIT

With the p-value:

  geom_point() +
geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
stat_poly_eq(formula = x ~ y,
aes(label = paste(..eq.label.., ..rr.label.., ..p.value.label.., sep = "~~~")),
parse = TRUE)

example_2.png

Adding Regression Line Equation and R2 on SEPARATE LINES graph

EDIT:

In addition to inserting the equation, I have fixed the sign of the intercept value. By setting the RNG to set.seed(2L) will give positive intercept. The below example produces negative intercept.

I also fixed the overlapping text in the geom_text

set.seed(3L)
library(ggplot2)
df <- data.frame(x = c(1:100))
df$y <- 2 + 3 * df$x + rnorm(100, sd = 40)

lm_eqn <- function(df){
# browser()
m <- lm(y ~ x, df)
a <- coef(m)[1]
a <- ifelse(sign(a) >= 0,
paste0(" + ", format(a, digits = 4)),
paste0(" - ", format(-a, digits = 4)) )
eq1 <- substitute( paste( italic(y) == b, italic(x), a ),
list(a = a,
b = format(coef(m)[2], digits = 4)))
eq2 <- substitute( paste( italic(R)^2 == r2 ),
list(r2 = format(summary(m)$r.squared, digits = 3)))
c( as.character(as.expression(eq1)), as.character(as.expression(eq2)))
}

labels <- lm_eqn(df)

p <- ggplot(data = df, aes(x = x, y = y)) +
geom_smooth(method = "lm", se=FALSE, color="red", formula = y ~ x) +
geom_point() +
geom_text(x = 75, y = 90, label = labels[1], parse = TRUE, check_overlap = TRUE ) +
geom_text(x = 75, y = 70, label = labels[2], parse = TRUE, check_overlap = TRUE )

print(p)

Sample Image



Related Topics



Leave a reply



Submit