Adding a 3Rd Order Polynomial and Its Equation to a Ggplot in R

Adding a 3rd order polynomial and its equation to a ggplot in r

Part 1: to fit a polynomial, use the arguments:

  • method=lm - you did this correctly
  • formula=y ~ poly(x, 3, raw=TRUE) - i.e. don't wrap this in a call to lm

The code:

p + stat_smooth(method="lm", se=TRUE, fill=NA,
formula=y ~ poly(x, 3, raw=TRUE),colour="red")

Sample Image


Part 2: To add the equation:

  • Modify your functionlm_eqn() to correctly specify the data source to lm - you had a closing parentheses in the wrong place
  • Use annotate() to position the label, rather than geom_text

The code:

lm_eqn = function(df){
m=lm(y ~ poly(x, 3), df)#3rd degree polynomial
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq))
}

p + annotate("text", x=0.5, y=15000, label=lm_eqn(df), hjust=0, size=8,
family="Times", face="italic", parse=TRUE)

Sample Image

adding 3 third order polynomial equations in a ggplot

Your code is right, except for two things.

  • First, the function lm_eqn function does not undertand what is y and x. So, you'll have to pass the corresponding column names of lai.se1 as follows:

    lm_eqn = function(lai.se1){
    m=lm(LAI ~ poly(DAS, 3), lai.se1) #3rd degree polynomial
    eq <- substitute(italic(LAI) == a + b %.% italic(DAS)*","~~italic(r)^2~"="~r2,
    list(a = format(coef(m)[1], digits = 2),
    b = format(coef(m)[2], digits = 2),
    r2 = format(summary(m)$r.squared, digits = 3)))
    as.character(as.expression(eq))
    }
  • 2) Just the order has to be reversed a bit (due to stat_smooth as annotate just adds geoms to a plot. Otherwise, the stat_smooth output will be replaced by annotate ):

    p <- ggplot(lai.se1, aes(x = DAS, y = LAI, colour = DOS))
    p <- p + geom_errorbar(aes(ymin = LAI-sd, ymax = LAI+sd),
    colour = "black", size = .3, width = 1,
    position = position_dodge(.9))
    p <- p + stat_smooth(method = "lm", se = TRUE, fill = NA,
    formula = y ~ poly(x, 3, raw = TRUE))
    p <- p + geom_point(size = 1, shape = 21, fill = FALSE) + theme_bw()
    p + annotate("text", x=0.5, y=7.5, label=lm_eqn(lai.se1), hjust=0, size=8,
    family="Times", face="italic", parse=TRUE)
    p

Edit: Proposed solution after OP's comment. How about this?

require(plyr)
lm_eqn = daply(lai.se1, .(DOS), function(w) {
m = lm(LAI ~ poly(DAS, 3), w)
eq <- substitute(italic(LAI) == a + b %.% italic(DAS)*","~~italic(r)^2~"="~r2,
list(a = format(coef(m)[1], digits = 2),
b = format(coef(m)[2], digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq))
})

p <- ggplot(lai.se1, aes(x = DAS, y = LAI, colour = DOS))
p <- p + geom_errorbar(aes(ymin = LAI-sd, ymax = LAI+sd),
colour = "black", size = .3, width = 1,
position = position_dodge(.9))
p <- p + stat_smooth(method = "lm", se = TRUE, fill = NA,
formula = y ~ poly(x, 3, raw = TRUE))
p <- p + geom_point(size = 1, shape = 21, fill = FALSE) + theme_bw()
p <- p + annotate("text", x=0.5, y=5.5, label=lm_eqn[1], hjust=0, size=8,
family="Times", face="italic", parse=TRUE)
p <- p + annotate("text", x=0.5, y=6.5, label=lm_eqn[2], hjust=0, size=8,
family="Times", face="italic", parse=TRUE)
p <- p + annotate("text", x=0.5, y=7.5, label=lm_eqn[3], hjust=0, size=8,
family="Times", face="italic", parse=TRUE)
p

Basically, you generate all equations and then annotate that many times (you can probably loop them if you have more...)

ggplot2_edit2

How to plot a polynomial line and equation using ggplot and then combine them into one plot'

It looks like you have a variable of interest (LCU1, LCU2, LCU3, LCU4) in the column names. You can use gather from the tidyr package to reshape the data frame:

library(tidyr)
long_data <- gather(NganokeData, key = "core", value = "density",
LCU1, LCU2, LCU3, LCU4)

And then use facet_grid from the ggplot2 package to divide your plot into the four facets you are looking for.

p <- ggplot(long_data, aes(x=Depth,y=density)) + 
geom_point() +
labs(x ='Depths (cm)', y ='Density (Hu)',
title = 'Density Regression of Lake Nganoke Cores') +
ylim(1,2) +
facet_grid(rows = vars(core)) #can also use cols instead

p + stat_smooth(method = "lm", formula = y ~ poly(x, 3), size = 1)

Your code is great. But as a beginner, I highly recommend taking a few minutes to read and learn to use the tidyr package, as ggplot2 is built on the concepts of tidy data, and you'll find it much easier to make your visualizations if you can manipulate the data frame into the format you need before trying to plot it.

https://tidyr.tidyverse.org/index.html

EDIT:

To add an annotation detailing the regression equation, I found code taken from this blog post by Jodie Burchell:

http://t-redactyl.io/blog/2016/05/creating-plots-in-r-using-ggplot2-part-11-linear-regression-plots.html

First, though, it's not going to be possible to glean a displayable regression equation using the poly function in your formula as you have it. The advantage of orthogonal polynomials is that they avoid collinearity, but the drawback is that you no longer have an easily interpretable regression equation with x and x squared and x cubed as regressors.

So we'll have to change the lm fit formula to

y ~ poly(x, 3, raw = TRUE)

which will fit the raw polynomials and give you the regression equation you are looking for.

You are going to have to alter the x and y position values to determine where on the graph to place the annotation, since I don't have your data, but here's the custom function you need:

equation = function(x) {
lm_coef <- list(a = round(coef(x)[1], digits = 2),
b = round(coef(x)[2], digits = 2),
c = round(coef(x)[3], digits = 2),
d = round(coef(x)[4], digits = 2),
r2 = round(summary(x)$r.squared, digits = 2));
lm_eq <- substitute(italic(y) == a + b %.% italic(x) + c %.% italic(x)^2 + d %.% italic(x)^3*","~~italic(R)^2~"="~r2,lm_coef)
as.character(as.expression(lm_eq));
}

Then just add the annotation to your plot, adjust the x and y parameters as needed, and you'll be set:

p + 
stat_smooth(method = "lm", formula = y ~ poly(x, 3, raw = TRUE), size = 1) +
annotate("text", x = 1, y = 10, label = equation(fit), parse = TRUE)

Polynomial regression equation

Your problem is that the lm_eqn is tailored to show the equation of a linear regression, i.e. the polynomial of the first degree. I've modified it to show the equation of a polynomial of the Nth degree. Since you haven't posted your data (sth you should do in the future and prob why your question was initially down voted), I've used the cars data set from datasets.

library(datasets)
library(ggplot2)

lm_eqn <- function(df, degree, raw=TRUE){
m <- lm(y ~ poly(x, degree, raw=raw), df) # get the fit
cf <- round(coef(m), 2) # round the coefficients
r2 <- round(summary(m)$r.squared, 4) # round the r.squared
powers <- paste0("^", seq(length(cf)-1)) # create the powers for the equation
powers[1] <- "" # remove the first one as it's redundant (x^1 = x)
# first check the sign of the coefficient and assign +/- and paste it with
# the appropriate *italic(x)^power. collapse the list into a string
pcf <- paste0(ifelse(sign(cf[-1])==1, " + ", " - "), abs(cf[-1]),
paste0("*italic(x)", powers), collapse = "")
# paste the rest of the equation together
eq <- paste0("italic(y) == ", cf[1], pcf, "*','", "~italic(r)^2==", r2)
eq
}

df <- data.frame("x" = cars$speed, "y" = cars$dist)
ggplot(cars, aes(x = speed, y = dist)) +
geom_point() +
stat_smooth(method = "lm", formula = y ~ poly(x, 5, raw = TRUE), size = 1) +
annotate("text", x = 0, y = 100, label = lm_eqn(df, 5, raw = TRUE),
hjust = 0, family = "Times", parse = TRUE)

example

Plot polynomial curve in ggplot using equation, not data points

You're looking for stat_function(), I think:

x <- 1:100
dat <- data.frame(x,y=x^3+x^2+x+5)
f <- function(x) x^3+x^2+x+5
ggplot(dat, aes(x,y)) +
geom_point()+
stat_function(fun=f, colour="red")

Sample Image

Plotting Polynomial Regression Curves in R

The first thing I would do here is to convert the numbers you are treating as dates into actual dates. If you don't do this, lm will give the wrong result; as an example, rows 1 and 2 of your data frame represent data 15 days apart (20080316 - 20080301 = 15), but then rows 2 and 3 are 17 days apart, yet the regression will see them as being 86 days apart (20080402 - 20080316 = 86). This will cause lm to produce nonsensical results.

It is good practice to get into the habit of changing numbers or character strings that represent date and time data into actual dates and times as early in your analysis as you can. It makes the rest of your code much easier. In your case, that would just be:

Mangan_2008_nymph$Time <- as.Date(as.character(Mangan_2008_nymph$Time), "%Y%m%d")

For the plot itself, you can add the polynomial regression lines using geom_smooth. You specify the method lm, and the formula (in terms of x and y, not in terms of the variable names). It is also good idea to map the line to a color aesthetic so that it appears in a legend. Do this once for each polynomial order you wish to add.

library(ggplot2)

ggplot(Mangan_2008_nymph, aes(Time, Abundance)) +
geom_point(size = 4, col = 'red') +
geom_smooth(method = "lm", formula = y ~ poly(x, 2), aes(color = "2"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 3), aes(color = "3"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 4), aes(color = "4"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 5), aes(color = "5"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 6), aes(color = "6"), se = F) +
geom_smooth(method = "lm", formula = y ~ poly(x, 7), aes(color = "7"), se = F) +
labs(x = "Time", y = "Nymph abundance", color = "Degree") +
ggtitle("2008 Amblyomma americanum Abundance")

Sample Image

Personally, I think this ends up making the plot a little cluttered, and I would consider dropping a couple of polynomial levels to make this plot easier to read. You can also easily add a few stylistic tweaks:

ggplot(Mangan_2008_nymph, aes(Time, Abundance)) + 
geom_point(size = 2, col = 'gray50') +
geom_smooth(method = "lm", formula = y ~ poly(x, 3), aes(color = "3"), se = FALSE) +
geom_smooth(method = "lm", formula = y ~ poly(x, 5), aes(color = "5"), se = FALSE) +
geom_smooth(method = "lm", formula = y ~ poly(x, 7), aes(color = "7"), se = FALSE) +
scale_color_brewer(palette = "Set1") +
labs(x = "Time", y = "Nymph abundance", color = "Degree") +
ggtitle("2008 Amblyomma americanum Abundance") +
theme_classic() +
theme(panel.grid.major = element_line())

Sample Image


Data used

Mangan_2008_nymph <- structure(list(Time = c(20080301L, 20080316L, 20080402L, 
20080416L, 20080428L, 20080514L, 20080527L, 20080608L, 20080627L, 20080709L,
20080722L, 20080806L, 20080818L, 20080901L, 20080915L, 20080930L,
20081013L, 20081029L, 20081110L, 20081124L), Abundance = c(0L,
0L, 26L, 337L, 122L, 232L, 190L, 381L, 148L, 201L, 69L, 55L,
35L, 15L, 6L, 1L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))

Add regression line equation and R^2 on graph

Here is one solution

# GET EQUATION AND R-SQUARED AS STRING
# SOURCE: https://groups.google.com/forum/#!topic/ggplot2/1TgH-kG5XMA

lm_eqn <- function(df){
m <- lm(y ~ x, df);
eq <- substitute(italic(y) == a + b %.% italic(x)*","~~italic(r)^2~"="~r2,
list(a = format(unname(coef(m)[1]), digits = 2),
b = format(unname(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}

p1 <- p + geom_text(x = 25, y = 300, label = lm_eqn(df), parse = TRUE)

EDIT. I figured out the source from where I picked this code. Here is the link to the original post in the ggplot2 google groups

Output

How to perform a polynomial equation and curve?

I think what you're asking is how to how to fit a polynomial regression line to your data while still using the ggpubr functions to annotate it.

This is possible, but it seems that the in-built regression line can only be either a straight line or a loess model, neither of which is appropriate. However, you can fit a polynomial curve and get the equation and adjusted R-squared on the plot using the method below. In your case I have used a cubic formula, but you should choose your polynomial based on a known model or whatever makes most sense based on what you already know about the relationship between your variables.

You can use ggplot to add the actual line as suggested by @Roland, as long as this uses the same formula as the one you supply to stat_regline_equation

ggscatter(dftest, x = "percent_power", y = "PETCO2") +
stat_regline_equation(label.x = 20, label.y = 0.5,
formula = y ~ poly(x, 3),
aes(label = paste(..eq.label.., ..adj.rr.label.., sep = "~~~~")),) +
geom_smooth(method = "lm", formula = y ~ poly(x, 3)) +
xlab("Percentage of power (%)") +
ylab(expression(paste("PETC", O[2]," (mmHg)")))

Which gives this result:

Sample Image



Related Topics



Leave a reply



Submit