Plotting Average of Multiple Variables in Time-Series Using Ggplot

Plotting the average of multiple time series objects and illustrating the error from that plot

Using a second geom_line() layer, you can plot the "raw" data in the background, for example as grey lines.

set.seed(123)
ID = factor(letters[seq(6)])
time = c(100, 102, 120, 105, 109, 130)
dat <- data.frame(ID = rep(ID,time), Time = sequence(time))
dat$group <- rep(c("GroupA","GroupB"), c(322,344))
dat$values <- sample(100, nrow(dat), TRUE)

library(dplyr)
library(ggplot2)
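d <- dat %>%
  # shortest series length within each group (computed here, but not used below;
  # add filter(Time <= maxtime) to average only where every ID has data)
  group_by(ID) %>%
  mutate(maxtime = max(Time)) %>%
  group_by(group) %>%
  mutate(maxtime = min(maxtime)) %>%
  # average across IDs at each time point
  group_by(group, Time) %>%
  summarize(values = mean(values))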
#> `summarise()` regrouping output by 'group' (override with `.groups` argument)

ggplot() +
  geom_line(data = dat, aes(Time, values, group = ID), color = "grey80", alpha = .7) +
  geom_line(data = d, aes(Time, values, colour = group)) +
  facet_wrap(. ~ group)
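
The question title also asks about showing the error around the average, which the plot above does not draw. The following is a minimal sketch, assuming the standard error of the mean across IDs is the error of interest (the answer does not say which measure is wanted). It rebuilds the same simulated data so it runs on its own; the names d_err and p_err are illustrative.

```r
library(dplyr)
library(ggplot2)

# Same simulated data as in the answer above
set.seed(123)
ID <- factor(letters[seq(6)])
time <- c(100, 102, 120, 105, 109, 130)
dat <- data.frame(ID = rep(ID, time), Time = sequence(time))
dat$group <- rep(c("GroupA", "GroupB"), c(322, 344))
dat$values <- sample(100, nrow(dat), TRUE)

# Mean and standard error across IDs at each time point
# (standard error is an assumed choice of error measure)
d_err <- dat %>%
  group_by(group, Time) %>%
  summarise(n = n(),
            mean = mean(values),
            se = sd(values) / sqrt(n),
            .groups = "drop") %>%
  filter(n > 1)  # sd() is NA where only one ID still has data

p_err <- ggplot(d_err, aes(Time, mean, colour = group, fill = group)) +
  geom_ribbon(aes(ymin = mean - se, ymax = mean + se), alpha = 0.2, colour = NA) +
  geom_line() +
  facet_wrap(. ~ group)
p_err
```

Dropping time points with a single ID keeps the ribbon defined everywhere; whether that truncation is appropriate depends on the data.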


Plotting average of multiple groups across time in ggplot2

Here are two variations. I'd recommend pre-calculating your summary stats and feeding that into ggplot.

sample_sum <- sample_data %>%
  group_by(xvar, group) %>%
  summarize(mean = mean(yvar),
            sd = sd(yvar),
            mean_p2sd = mean + 2 * sd,
            mean_m2sd = mean - 2 * sd) %>%
  ungroup()
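
The answer does not show sample_data itself. To try the summary step, a hypothetical stand-in can be simulated as in the sketch below; the column names xvar, group and yvar follow the answer, while the values and the replicate column are made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for sample_data: 20 observations per xvar/group cell
set.seed(1)
sample_data <- expand.grid(xvar = 1:10, group = c("a", "b"), replicate = 1:20)
sample_data$yvar <- rnorm(nrow(sample_data), mean = sample_data$xvar, sd = 2)

# Same summary as in the answer; .groups = "drop" replaces the trailing ungroup()
sample_sum <- sample_data %>%
  group_by(xvar, group) %>%
  summarize(mean = mean(yvar),
            sd = sd(yvar),
            mean_p2sd = mean + 2 * sd,
            mean_m2sd = mean - 2 * sd,
            .groups = "drop")
head(sample_sum)
```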

This first approach gathers mean, mean minus 2 SD, and mean plus 2 SD into the same columns, with "stat" marking which stat it is, and yvar storing the value. (I picked those because +/- 2 SD captures ~95% of a normal distribution.) Then we can plot them together in a single call to geom_line.

# gather() comes from the tidyr package
library(tidyr)
p <- ggplot(sample_sum %>%
              gather(stat, yvar, mean, mean_p2sd:mean_m2sd),
            aes(x = xvar, y = yvar)) +
  geom_line(aes(color = group, linetype = stat))
p
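
gather() still works but has been superseded in tidyr by pivot_longer(). The sketch below shows the same reshaping step with pivot_longer(), using a small hand-made summary table in place of sample_sum (its values are hypothetical).

```r
library(tidyr)
library(ggplot2)

# Hypothetical summary table with the same columns as sample_sum
sample_sum <- data.frame(
  xvar = rep(1:3, times = 2),
  group = rep(c("a", "b"), each = 3),
  mean = c(1, 2, 3, 2, 3, 4),
  sd = 0.5
)
sample_sum$mean_p2sd <- sample_sum$mean + 2 * sample_sum$sd
sample_sum$mean_m2sd <- sample_sum$mean - 2 * sample_sum$sd

# Stack the three statistics into a name column (stat) and a value column (yvar)
sample_long <- pivot_longer(sample_sum,
                            cols = c(mean, mean_p2sd, mean_m2sd),
                            names_to = "stat",
                            values_to = "yvar")

p <- ggplot(sample_long, aes(x = xvar, y = yvar)) +
  geom_line(aes(color = group, linetype = stat))
p
```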


Alternatively, we can keep them apart and plot the +/- 2 SD area using geom_ribbon.

p <- ggplot(sample_sum, aes(x = xvar, color = group, fill = group)) +
  geom_ribbon(aes(ymin = mean_m2sd, ymax = mean_p2sd), alpha = 0.1) +
  geom_line(aes(y = mean))

p


Plotting multiple time-series in ggplot

If your data is called df, something like this should work:

library(ggplot2)
library(reshape2)
meltdf <- melt(df,id="Year")
ggplot(meltdf,aes(x=Year,y=value,colour=variable,group=variable)) + geom_line()


So in aes(), I'm telling ggplot that the x-axis is Year, the y-axis is value, and the colour and grouping both come from variable.

melt() reshapes your data into the long format ggplot2 expects: one column for Year, one for the variable name and one for the value. ggplot then splits the rows back into separate lines according to variable.
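
To see what melt() produces, here is a small sketch on a hypothetical wide data frame (the column names A and B and their values are made up; only Year comes from the answer).

```r
library(reshape2)

# Hypothetical wide data: one row per year, one column per series
df <- data.frame(Year = 2001:2003,
                 A = c(1, 2, 3),
                 B = c(4, 5, 6))

# Long format: Year is kept as the id column, the other columns are
# stacked into a "variable" column (names) and a "value" column (values)
meltdf <- melt(df, id = "Year")
meltdf
```

Each original column becomes a block of rows, which is why colour = variable and group = variable draw one line per series.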

For loop in ggplot for multiple time series viz

You could try the following code, which uses lapply instead of a for loop.

# convert timestamp to a Date object
df$timestamp <- as.Date(df$timestamp, format = "%d/%m/%Y")
# function applied to each column by lapply
plotlines <- function(variables){
  ggplot(df, aes(x = timestamp, y = variables)) +
    geom_line()
}
# one plot per column, for all columns except timestamp
plots <- lapply(df[names(df) != "timestamp"], plotlines)
plots
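
With the approach above, every plot has the y-axis label "variables", because the function receives the column values rather than the column name. One possible variation, sketched below on hypothetical data, iterates over column names and uses the .data pronoun so each plot can be labelled; plot_column and the columns a and b are illustrative names.

```r
library(ggplot2)

# Hypothetical data with a timestamp column and two measured columns
df <- data.frame(
  timestamp = as.Date("2020-01-01") + 0:9,
  a = cumsum(c(1, 2, 1, 3, 2, 1, 2, 3, 1, 2)),
  b = c(5, 3, 4, 6, 2, 7, 5, 4, 6, 3)
)

# Takes a column name (not the column itself) so it can also serve as the label
plot_column <- function(col_name) {
  ggplot(df, aes(x = timestamp, y = .data[[col_name]])) +
    geom_line() +
    labs(y = col_name)
}

cols <- setdiff(names(df), "timestamp")
# setNames() keeps the column names as names of the resulting list
plots <- lapply(setNames(cols, cols), plot_column)
plots$a
```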

Plotting multiple time series on the same plot using ggplot()

ggplot allows you to have multiple layers, and that is what you should take advantage of here.

In the plot created below, two geom_line() layers each take one of your datasets and draw them together on one plot. You can extend that logic to add any other dataset, layer, or feature of the chart such as the axis labels.

library(ggplot2)

jobsAFAM1 <- data.frame(
  data_date = runif(5, 1, 100),
  Percent.Change = runif(5, 1, 100)
)

jobsAFAM2 <- data.frame(
  data_date = runif(5, 1, 100),
  Percent.Change = runif(5, 1, 100)
)

ggplot() +
  geom_line(data = jobsAFAM1, aes(x = data_date, y = Percent.Change), color = "red") +
  geom_line(data = jobsAFAM2, aes(x = data_date, y = Percent.Change), color = "blue") +
  xlab('data_date') +
  ylab('percent.change')
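
Setting color outside aes() gives no legend. If a legend is wanted, one option (a sketch, not part of the original answer) is to stack the datasets with an identifying column and map colour to it. The column name series and the labels AFAM1 and AFAM2 are assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
jobsAFAM1 <- data.frame(data_date = runif(5, 1, 100),
                        Percent.Change = runif(5, 1, 100))
jobsAFAM2 <- data.frame(data_date = runif(5, 1, 100),
                        Percent.Change = runif(5, 1, 100))

# Stack the two datasets; the list names end up in the "series" column
jobs <- bind_rows(list(AFAM1 = jobsAFAM1, AFAM2 = jobsAFAM2), .id = "series")

# Mapping colour inside aes() draws one line per series and adds a legend
p <- ggplot(jobs, aes(x = data_date, y = Percent.Change, colour = series)) +
  geom_line() +
  scale_colour_manual(values = c(AFAM1 = "red", AFAM2 = "blue"))
p
```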

Plot time series in R ggplot using multiple groups

You can try something like this. I advise you to convert the date column to a Date, for example with lubridate::ymd():

library(tidyverse)
library(lubridate)
# your data
nat %>%
  # add date as date
  mutate(date = ymd(date)) %>%
  # plot them
  ggplot(aes(x = date, y = value, color = group, group = group)) +
  geom_line() + geom_point() + theme_test()
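
The conversion matters because ggplot2 treats character dates as a discrete axis. A small sketch of the difference, using a hypothetical stand-in for nat (its values are made up):

```r
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for nat: dates stored as character strings
nat <- data.frame(
  date = rep(c("2020-01-01", "2020-02-01", "2020-03-01"), times = 2),
  value = c(1, 3, 2, 2, 4, 5),
  group = rep(c("g1", "g2"), each = 3),
  stringsAsFactors = FALSE
)

class(nat$date)            # "character": would be plotted as discrete categories
nat$date <- ymd(nat$date)  # parse year-month-day strings
class(nat$date)            # "Date": plotted on a continuous date axis

p <- ggplot(nat, aes(x = date, y = value, color = group, group = group)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_labels = "%b %Y")
p
```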


plotting average with confidence interval in ggplot2 for time-series data

If I understood correctly, you want to display the average of all three parameters (var0, var1 and var2) together with the standard deviation.

I have two solutions for you. The first uses the dplyr package to calculate the mean and standard deviation row-wise, then displays them with geom_ribbon():

library(dplyr)
library(ggplot2)
# row-wise mean and standard deviation across the three variables
q <- test_data %>%
  rowwise() %>%
  transmute(date,
            mean = mean(c(var0, var1, var2), na.rm = TRUE),
            sd = sd(c(var0, var1, var2), na.rm = TRUE))

eb <- aes(ymax = mean + sd, ymin = mean - sd)
ggplot(data = q, aes(x = date, y = mean)) +
  geom_line(size = 2) +   # use linewidth = 2 in ggplot2 3.4.0 or later
  geom_ribbon(eb, alpha = 0.5)


The second solution uses stat_summary(), which you mentioned, and works well with the code you provided. Pass mult through fun.args; mean_sdl needs the Hmisc package to be installed:

ggplot(data = test_data_long, aes(x = date, y = value)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), geom = "smooth") +
  theme_bw()
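
mean_sdl relies on the Hmisc package. As a minimal sketch of the same mean +/- 1 SD band without that dependency, a small summary function can be passed to fun.data instead. This assumes ggplot2 3.3.0 or later (for the fun argument); mean_sd and the simulated test_data_long are illustrative and not taken from the question.

```r
library(ggplot2)

# Hypothetical long-format data: three variables observed on each date
set.seed(1)
test_data_long <- data.frame(
  date = rep(as.Date("2020-01-01") + 0:9, times = 3),
  variable = rep(c("var0", "var1", "var2"), each = 10),
  value = rnorm(30, mean = 10)
)

# Returns the y, ymin and ymax columns that stat_summary() expects from fun.data
mean_sd <- function(x) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

p <- ggplot(test_data_long, aes(x = date, y = value)) +
  stat_summary(fun.data = mean_sd, geom = "ribbon", alpha = 0.3) +  # SD band
  stat_summary(fun = mean, geom = "line") +                         # mean line
  theme_bw()
p
```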


Moving average on several time series using ggplot

Is this what you need?

f <- ma_12(df[df$taxon=="Flower", ]$density)
s <- ma_12(df[df$taxon=="Seeds", ]$density)

f <- cbind(f,time(f))
s <- cbind(s,time(s))

serie <- data.frame(rbind(f, s),
                    taxon = c(rep("Flower", dim(f)[1]), rep("Seeds", dim(s)[1])))
serie$density <- exp(serie$f)

library(lubridate)
serie$time <- ymd(format(date_decimal(serie$time), "%Y-%m-%d"))

library(ggplot2)
ggplot() +
  geom_point(data = df, aes(x = ymd, y = density, color = taxon, group = taxon)) +
  geom_line(data = serie, aes(x = time, y = density, color = taxon, group = taxon))
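
The ma_12() function and the data come from the question and are not shown here. For readers without them, the sketch below uses a plain centred moving average as a stand-in, which may differ from what ma_12() computes. The helper moving_avg, the window k and the simulated data are all hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical monthly densities for two taxa
set.seed(1)
df <- data.frame(
  ymd = rep(seq(as.Date("2015-01-01"), by = "month", length.out = 36), times = 2),
  taxon = rep(c("Flower", "Seeds"), each = 36),
  density = c(rlnorm(36, 2), rlnorm(36, 3))
)

# Centred moving average over k points (k odd). stats::filter is called
# explicitly because dplyr::filter masks it.
moving_avg <- function(x, k = 3) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

# Smooth each taxon separately, in date order
serie <- df %>%
  group_by(taxon) %>%
  arrange(ymd, .by_group = TRUE) %>%
  mutate(density_ma = moving_avg(density, k = 5)) %>%
  ungroup()

p <- ggplot(serie, aes(x = ymd, colour = taxon)) +
  geom_point(aes(y = density)) +
  geom_line(aes(y = density_ma), na.rm = TRUE)  # ends of each series are NA
p
```

Computing the average inside a grouped mutate() keeps raw and smoothed values in one data frame, so a single ggplot() call can draw both layers.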

