ggplot2: Issues with Dual Y-Axes and Loess Smoothing

This should give you a good start. You can play around with scale_ratio and dif to change how the two series line up.

library(tidyverse)

mydata <- read_csv(text, col_types = paste0(c("c", rep("d", 4), rep("_", 9)), collapse = ""))
mydata
#> # A tibble: 67 x 5
#> `Country Name` Score GDP Infant Longevity
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Afghanistan 48.9 586. 53.2 63.7
#> 2 Albania 64.4 4538. 8.1 78.3
#> 3 Algeria 46.5 4120 21 76.1
#> 4 Angola 48.5 4170 55.8 61.5
#> 5 Argentina 50.4 14400 9.7 76.6
#> 6 Armenia 70.3 3937. 11.9 74.6
#> 7 Australia 81 53800 3.1 82.5
#> 8 Austria 72.3 47300 3 80.9
#> 9 Azerbaijan 63.6 4132. 21.9 72.0
#> 10 Bahrain 68.5 23655. 6.4 76.9
#> # ... with 57 more rows

Calculate ratios needed to scale the two y-axes

scale_ratio <- (max(mydata$Infant, na.rm = TRUE) - min(mydata$Infant, na.rm = TRUE)) /
  (max(mydata$Longevity, na.rm = TRUE) - min(mydata$Longevity, na.rm = TRUE))

dif <- min(mydata$Longevity, na.rm = TRUE) - min(mydata$Infant, na.rm = TRUE)

myColor <- c("#d95f02", "#1b9e77")

p <- ggplot(mydata, aes(x = Score, y = Longevity)) +
  geom_point(aes(colour = "Life Expectancy"),
             shape = "triangle",
             alpha = 0.7, size = 2) +
  geom_point(aes(y = Infant/scale_ratio + dif,
                 colour = "Infant mortality (per capita)"),
             alpha = 0.7, size = 2) +
  scale_y_continuous(sec.axis = sec_axis(~ (. - dif) * scale_ratio,
                                         name = "Infant mortality (per capita)")) +
  scale_colour_manual(values = myColor) +
  theme_bw(base_size = 14) +
  labs(y = "Life Expectancy (years)",
       x = "Score",
       colour = " ") +
  guides(colour = guide_legend(title = "",
                               override.aes = list(shape = c("circle", "triangle")))) +
  theme(legend.position = 'bottom') +
  NULL
p
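The secondary axis labels are only meaningful if the transform inside aes() and the one inside sec_axis() are exact inverses of each other. Here is a minimal sketch of that check, using made-up numbers in place of mydata$Infant and mydata$Longevity; the helper names to_primary and to_secondary are hypothetical and not part of the original answer.

```r
# Made-up values standing in for mydata$Infant and mydata$Longevity (assumption)
infant    <- c(3, 8.1, 21, 53.2, 55.8)
longevity <- c(82.5, 78.3, 76.1, 63.7, 61.5)

# Same definitions as in the answer, written with range()
scale_ratio <- diff(range(infant)) / diff(range(longevity))
dif <- min(longevity) - min(infant)

# Hypothetical helper names for the two transforms
to_primary   <- function(x) x / scale_ratio + dif    # what goes inside aes()
to_secondary <- function(y) (y - dif) * scale_ratio  # what goes inside sec_axis()

# Mapping to the primary axis and back should recover the original values
round_trip <- to_secondary(to_primary(infant))
all.equal(round_trip, infant)

# After rescaling, infant mortality spans the same width as longevity
width_rescaled <- diff(range(to_primary(infant)))
all.equal(width_rescaled, diff(range(longevity)))
```

If you change scale_ratio or dif by hand, change them in one place only so that both transforms stay in sync.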


Add fitted lines and their corresponding equations/R2

### https://docs.r4photobiology.info/ggpmisc/articles/user-guide.html
library(ggpmisc)

formula <- y ~ poly(x, 2, raw = TRUE)

p +
  stat_smooth(aes(y = Longevity),
              method = "lm", formula = formula, se = FALSE, size = 1, color = myColor[2]) +
  stat_smooth(aes(y = Infant/scale_ratio + dif),
              method = "lm", formula = formula, se = FALSE, size = 1, color = myColor[1]) +
  stat_poly_eq(aes(y = Longevity,
                   label = paste(..eq.label.., ..adj.rr.label..,
                                 sep = "~~italic(\"with\")~~")),
               geom = "text", alpha = 0.7,
               formula = formula, parse = TRUE,
               color = myColor[2],
               label.x.npc = 0.5,
               label.y.npc = 0.95) +
  stat_poly_eq(aes(y = Infant/scale_ratio + dif,
                   label = paste(..eq.label.., ..adj.rr.label..,
                                 sep = "~~italic(\"with\")~~")),
               geom = "text", alpha = 0.7,
               color = myColor[1],
               formula = formula, parse = TRUE,
               label.x.npc = 0.75,
               label.y.npc = 0.15) +
  NULL
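One thing to keep in mind: the infant mortality fit is computed on the rescaled values (Infant/scale_ratio + dif), so the printed equation presumably refers to those rescaled units, while R2 is unaffected by a linear rescaling. The sketch below illustrates this with plain lm(), using mtcars as stand-in data and arbitrary constants a and b in place of scale_ratio and dif (all assumptions).

```r
# mtcars stands in for the original data; a and b are arbitrary (assumptions)
a <- 2.5   # plays the role of scale_ratio
b <- 40    # plays the role of dif

fit_raw    <- lm(mpg ~ poly(hp, 2, raw = TRUE), data = mtcars)
fit_scaled <- lm(I(mpg / a + b) ~ poly(hp, 2, raw = TRUE), data = mtcars)

# Adjusted R2 is the same for both fits
r2_raw    <- summary(fit_raw)$adj.r.squared
r2_scaled <- summary(fit_scaled)$adj.r.squared

# Coefficients differ, but can be mapped back to the original units:
# y = (y_scaled - b) * a
coef_back <- coef(fit_scaled) * a
coef_back[1] <- (coef(fit_scaled)[1] - b) * a

all.equal(r2_raw, r2_scaled)
all.equal(unname(coef_back), unname(coef(fit_raw)))
```

If the equation should be shown in the original units, one option is to fit the model yourself and add the label with annotate() instead.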


Created on 2018-10-07 by the reprex package (v0.2.1.9000)

How to smooth my line on a graph with 2 y axis in R?

The easiest way is to interpolate your data with a spline. First, note that ggplot treats month and month_new as numeric values 1:12, and plotting a discrete x-axis with custom limits on top of that causes trouble. For better control over this mapping, code your discrete x-values as a factor with explicitly ordered levels:

plot_data <- data.frame(month = factor(month.abb, levels = month.abb), 
temp = sample(25, 12),
num_unique_tags = sample(25, 12))

You can then plot your data:

library(ggplot2)

ggplot(plot_data, aes(x = month, y = num_unique_tags)) +
geom_col() +
geom_line(aes(y = temp, group = 2), color = "forestgreen", size = 2) +
scale_y_continuous(limits = c(0, 30), name = "Total Unique Detections",
sec.axis = sec_axis(~ . -2 , name = "Temperature (°C)"))

To interpolate, you just make a new dataset with interpolated data and plot that:

plot_data_interp <- as.data.frame(spline(x = 1:12, y = plot_data$temp, xout = seq(1, 12, length = 100)),
col.names = c('month', 'temp'))

ggplot(plot_data, aes(x = month, y = num_unique_tags)) +
geom_col() +
geom_line(aes(y = temp, group = 2), data = plot_data_interp, color = "forestgreen", size = 2) +
scale_y_continuous(limits = c(0, 30), name = "Total Unique Detections",
sec.axis = sec_axis(~ . -2 , name = "Temperature (°C)"))
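To see what spline() hands back before it is wrapped in as.data.frame(), here is a small standalone sketch. The temperature values are made up for illustration; the point is that the result is a list with components x and y, and that the interpolating curve passes exactly through the original points.

```r
# Made-up monthly temperatures (assumption, not the data from the question)
temp <- c(4, 6, 11, 15, 19, 23, 25, 24, 20, 14, 9, 5)

# Interpolate onto a finer grid of 100 x positions between month 1 and 12
interp <- spline(x = 1:12, y = temp, xout = seq(1, 12, length.out = 100))
names(interp)     # components "x" and "y"
length(interp$y)  # one value per grid position

# Evaluated at the original positions, the spline reproduces the input
at_knots <- spline(x = 1:12, y = temp, xout = 1:12)$y
all.equal(at_knots, temp)
```

This is why the interpolated line still touches every original data point: it only fills in the gaps between them.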

And if you want to smooth data, you can do that (e.g. with a smoothing spline):

plot_data_smooth <- as.data.frame(predict(smooth.spline(x = 1:12, y = plot_data$temp, spar = 0.5), x = seq(1, 12, length = 100)),
col.names = c('month', 'temp'))

ggplot(plot_data, aes(x = month, y = num_unique_tags)) +
geom_col() +
geom_line(aes(y = temp, group = 2), data = plot_data_smooth, color = "forestgreen", size = 2) +
scale_y_continuous(limits = c(0, 30), name = "Total Unique Detections",
sec.axis = sec_axis(~ . -2 , name = "Temperature (°C)"))
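Unlike interpolation, a smoothing spline does not have to pass through the data, and spar controls how far it may drift from it. A minimal sketch with made-up temperatures (an assumption) compares a light and a heavy setting by their residual sum of squares at the original points:

```r
# Made-up monthly temperatures (assumption, not the data from the question)
temp <- c(4, 6, 11, 15, 19, 23, 25, 24, 20, 14, 9, 5)

fit_light <- smooth.spline(x = 1:12, y = temp, spar = 0.2)  # follows the data closely
fit_heavy <- smooth.spline(x = 1:12, y = temp, spar = 1.0)  # much smoother curve

# Residual sum of squares at the original x positions
rss_light <- sum((predict(fit_light, x = 1:12)$y - temp)^2)
rss_heavy <- sum((predict(fit_heavy, x = 1:12)$y - temp)^2)
c(light = rss_light, heavy = rss_heavy)

# predict() returns a list with x and y, ready for as.data.frame()
pred_heavy <- predict(fit_heavy, x = seq(1, 12, length.out = 100))
```

A larger spar trades fidelity to the data for smoothness, so it is worth trying a few values and looking at the plot.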


ggplot with 2 y axes on each side and different scales

Sometimes a client wants two y scales, and giving them the "flawed" speech is often pointless. But I do like ggplot2's insistence on doing things the right way. I am sure that ggplot2 is in fact educating the average user about proper visualization techniques.

Maybe you can use faceting with free scales to compare the two data series instead? For example, see: https://github.com/hadley/ggplot2/wiki/Align-two-plots-on-a-page
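A minimal sketch of the faceting idea, assuming two hypothetical series (small and large) measured on very different scales. The data is stacked into long format so that facet_wrap() can give each series its own panel and its own y-axis:

```r
library(ggplot2)

# Hypothetical series on very different scales (assumption)
wide <- data.frame(
  x     = 1:20,
  small = seq(0.1, 2, length.out = 20),
  large = seq(1000, 50000, length.out = 20)
)

# Stack into long format: one row per (x, series) pair
long <- rbind(
  data.frame(x = wide$x, series = "small", value = wide$small),
  data.frame(x = wide$x, series = "large", value = wide$large)
)

# One panel per series, each with an independent y-axis
p_facets <- ggplot(long, aes(x = x, y = value)) +
  geom_line() +
  facet_wrap(~ series, ncol = 1, scales = "free_y")
p_facets
```

Stacking the panels vertically keeps the shared x-axis aligned, which makes the two series easy to compare without a second y-axis.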

Plot with geom_smooth(,) multiple colours, double y-axis with four variables in ggplot2

Probably the easiest way is to build the series label inside the aesthetic specification. In the example below, paste0() prefixes ref with the series name (Vix or Monomer), so each combination of series and ref gets its own colour.

Graph <- ggplot(Dati, aes(x= Time)) +
  geom_point(aes(y= Vix, col=paste0("Vix ", ref)), shape = 1, size = 3.5) +
  geom_smooth(aes(y= Vix, col = paste0("Vix ", ref)), method="loess") +
  geom_point(aes(y= monomer * scaleFactor, col=paste0("Monomer ", ref)), shape = 1, size = 3.5) +
  geom_smooth(aes(y=monomer * scaleFactor, col = paste0("Monomer ", ref)), method="loess") +
  scale_color_manual(values=c('#644196', '#bba6d9', '#f92410', '#fca49c'),
                     name = "Series?") +
  scale_y_continuous(name="Vix", sec.axis=sec_axis(~./scaleFactor, name="monomer")) +
  theme(
    axis.title.y.left=element_text(color='#f92410'),
    axis.text.y.left=element_text(color='#f92410'),
    axis.title.y.right=element_text(color='#644196'),
    axis.text.y.right=element_text(color='#644196')
  )

Graph
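With an unnamed values vector, colours are assigned to the pasted labels in sorted order, which is easy to get wrong. A possible variation is to name the colour vector so each label is tied to its colour explicitly. The sketch below uses a hypothetical stand-in for Dati in which ref is assumed to be a grouping column with levels "A" and "B".

```r
library(ggplot2)

# Hypothetical stand-in for Dati; `ref` is assumed to be a grouping column
Dati <- data.frame(
  Time    = rep(c(30, 60, 90, 120), times = 2),
  ref     = rep(c("A", "B"), each = 4),
  Vix     = c(40, 60, 80, 120, 50, 75, 95, 140),
  monomer = c(2, 3, 4, 6, 2.5, 3.5, 5, 7)
)
scaleFactor <- max(Dati$Vix) / max(Dati$monomer)

# Named vector: each pasted label is matched to its colour by name
series_cols <- c("Monomer A" = "#644196", "Monomer B" = "#bba6d9",
                 "Vix A"     = "#f92410", "Vix B"     = "#fca49c")

Graph <- ggplot(Dati, aes(x = Time)) +
  geom_point(aes(y = Vix, colour = paste0("Vix ", ref))) +
  geom_point(aes(y = monomer * scaleFactor, colour = paste0("Monomer ", ref))) +
  scale_colour_manual(values = series_cols, name = "Series") +
  scale_y_continuous(name = "Vix",
                     sec.axis = sec_axis(~ . / scaleFactor, name = "monomer"))
Graph
```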


Dual y-axis while using facet_wrap in ggplot with varying y-axis scale

If this is about making the ranges of the data overlap instead of just rescaling the maximum, you can try the following.

First we'll make a function factory to make our job easier:

library(ggplot2)
library(scales)
#> Warning: package 'scales' was built under R version 4.0.3

# Function factory for secondary axis transforms
train_sec <- function(from, to) {
  from <- range(from)
  to <- range(to)
  # Forward transform for the data
  forward <- function(x) {
    rescale(x, from = from, to = to)
  }
  # Reverse transform for the secondary axis
  reverse <- function(x) {
    rescale(x, from = to, to = from)
  }
  list(fwd = forward, rev = reverse)
}

Then, we can use the function factory to make transformation functions for the data and for the secondary axis.

# Learn the `from` and `to` parameters
sec <- train_sec(mtcars$hp, mtcars$cyl)

Which you can apply like this:

ggplot(mtcars, aes(x=disp)) +
  geom_smooth(aes(y=cyl), method="loess", col="blue") +
  geom_smooth(aes(y= sec$fwd(hp)), method="loess", col="red") +
  scale_y_continuous(name="cyl", sec.axis=sec_axis(~sec$rev(.), name="hp")) +
  theme(
    axis.title.y.left=element_text(color="blue"),
    axis.text.y.left=element_text(color="blue"),
    axis.title.y.right=element_text(color="red"),
    axis.text.y.right=element_text(color="red")
  )
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
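For intuition, the rescaling used by the forward and reverse transforms is just a linear map from one range onto another. Here is a base-R sketch of that arithmetic; rescale_linear is a hypothetical helper written for illustration and is not the scales implementation.

```r
# Hypothetical helper: linear map from the range `from` onto the range `to`
rescale_linear <- function(x, from, to) {
  (x - from[1]) / diff(from) * diff(to) + to[1]
}

hp_range  <- range(mtcars$hp)
cyl_range <- range(mtcars$cyl)

# Forward: squeeze hp into the range of cyl so both fit on the primary axis
hp_on_cyl_axis <- rescale_linear(mtcars$hp, from = hp_range, to = cyl_range)
range(hp_on_cyl_axis)

# Reverse: map primary-axis positions back to hp for the secondary axis labels
hp_back <- rescale_linear(hp_on_cyl_axis, from = cyl_range, to = hp_range)
all.equal(hp_back, mtcars$hp)
```

Because both directions are built from the same two ranges, the data and the secondary axis cannot drift out of sync.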


Here is an example with a different dataset.

sec <- train_sec(economics$psavert, economics$unemploy)

ggplot(economics, aes(date)) +
geom_line(aes(y = unemploy), colour = "blue") +
geom_line(aes(y = sec$fwd(psavert)), colour = "red") +
scale_y_continuous(sec.axis = sec_axis(~sec$rev(.), name = "psavert"))


Created on 2021-02-04 by the reprex package (v1.0.0)

R to scale the y-axis to fit the loess curves

You were almost there; just add a coord_cartesian() call:

ggplot(myData, aes(X,Y))+
geom_point()+
stat_smooth(method="loess", se=F, size=3)+
geom_line(aes(X,pred),colour="yellow") +
coord_cartesian(ylim=c(min(myData$pred)-.1, max(myData$pred)+.1))
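The reason for using coord_cartesian() rather than ylim() here is that coord_cartesian() only zooms the view, whereas limits set on the scale drop out-of-range points before the smoother is fitted. Here is a self-contained sketch with simulated data (an assumption, since myData is not shown in the question):

```r
library(ggplot2)

# Simulated stand-in for myData (assumption)
set.seed(1)
myData <- data.frame(X = 1:50)
myData$Y    <- sin(myData$X / 8) + rnorm(50, sd = 2)     # noisy observations
myData$pred <- predict(loess(Y ~ X, data = myData))      # fitted loess values

# Zoom window: the range of the fitted curve plus a small margin
zoom_limits <- range(myData$pred) + c(-0.1, 0.1)

# Zooming keeps every point in the loess fit; only the view changes
p_zoom <- ggplot(myData, aes(X, Y)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  coord_cartesian(ylim = zoom_limits)
p_zoom
```

Swapping the last line for ylim(zoom_limits) would instead discard the points outside the window and fit the curve to what is left, which usually changes its shape.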

smoothing a timeseries with multiple y per x

I just realized I was being dense. I think it's pretty trivial to set up an additional column with the output from the smoothing function and then do a full_join on the x-axis values.

data <- structure(list(time = c(0, 0, 6, 6, 12, 12, 18, 18, 24, 24, 30, 
30, 36, 36, 42, 42, 48, 48, 54, 54, 60, 60, 66, 66, 72, 72, 78,
78, 84, 84, 90, 90, 96, 96, 102, 102, 108, 108, 114, 114, 120,
120, 126, 126, 132, 132, 138, 138), confluence = c(14.68764,
19.73559, 2.897458, 3.478664, 3.46789, 4.122939, 4.270285, 4.534702,
4.838222, 5.578382, 5.938678, 6.337464, 7.116287, 7.824044, 8.50258,
10.16758, 11.13803, 13.25756, 18.46681, 11.97336, 24.45211, 14.61754,
30.7178, 19.91414, 37.93423, 26.0687, 45.91022, 33.69255, 57.83714,
42.13477, 69.2417, 54.8134, 79.81015, 68.28696, 89.50358, 78.21476,
95.31271, 87.13279, 97.71458, 94.69752, 98.59245, 97.71144, 98.8707,
98.87447, 98.99731, 99.42957, 99.02805, 99.6716)), row.names = c(NA,
-48L), class = c("tbl_df", "tbl", "data.frame"))

library(tidyverse)

smooth <- data.frame(supsmu(data$time, data$confluence))
data <- full_join(data, smooth, by= c("time" = "x"))

ggplot(data = data) +
geom_point(aes(x = time, y = confluence)) +
geom_smooth(aes(x = time, y = confluence)) +
geom_point(aes(x = time, y = y), color = "red")

head(data, 10)

# # A tibble: 10 x 3
# time confluence y
# <dbl> <dbl> <dbl>
# 1 0 14.7 14.7
# 2 0 19.7 14.7
# 3 6 2.90 8.72
# 4 6 3.48 8.72
# 5 12 3.47 5.10
# 6 12 4.12 5.10
# 7 18 4.27 4.49
# 8 18 4.53 4.49
# 9 24 4.84 5.30
# 10 24 5.58 5.30
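The join works because supsmu() returns one smoothed value per unique x, so every replicate at a given time picks up the same y. A base-R sketch of the same idea, using merge() in place of full_join() and simulated duplicate measurements (both are assumptions for illustration):

```r
# Simulated data: two measurements per time point (assumption)
time <- rep(seq(0, 138, by = 6), each = 2)
confluence <- 100 / (1 + exp(-(time - 80) / 12)) + rep(c(-1, 1), times = 24)
dat <- data.frame(time = time, confluence = confluence)

# supsmu() returns x (sorted, duplicates removed) and the smoothed y
smooth <- as.data.frame(supsmu(dat$time, dat$confluence))
nrow(smooth)  # one row per unique time point

# Attach the smoothed value to every original row
joined <- merge(dat, smooth, by.x = "time", by.y = "x")
head(joined)
```

Each pair of rows sharing a time value ends up with the same y, which is what the repeated values in the printed tibble show.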


R - ggplot2 - Unable to see the standard error range doing a double geom_smooth() on a graph with two y axis

You will not get an 'error band' because you only have single values of y defined for each x. If you have multiple y values for each x the band shows up with the default settings.

(Added some random y values for 30, 60, and 90. Code simplified to reduce clutter.)

library(ggplot2)

Dati <- data.frame(
  Vix  = c(40000, 62500, 80000, 60000, 87000, 12000, 122000, 180000, 80000, 140000, 154000),
  Time = c(30, 30, 30, 60, 60, 60, 90, 90, 90, 120, 135)
)

# Only Vix is plotted in this simplified version, so the monomer-based
# scaleFactor from the question is not needed here
Graph <- ggplot(Dati, aes(x = Time)) +
  geom_smooth(aes(y = Vix), method = "loess", col = "#f92410")

Graph
