Rolling window function for irregular time series that can handle duplicates
This can be solved by grouping in a non-equi join to aggregate over a rolling window of length k
, filtering for k
consecutive years, and an update join:
library(data.table)
k <- 3L
# group by join parameters of a non-equi join
mDT <- setDT(DT)[.(grp = grp, upper = yr, lower = yr - k),
on = .(grp, yr <= upper, yr > lower),
.(uniqueN(x.yr), mean(nr)), by = .EACHI]
# update join with filtered intermediate result
DT[mDT[V1 == k], on = .(grp, yr), paste0("nr_roll_period_", k) := V2]
DT
which returns OP's expected result:
grp nr yr nr_roll_period
1: A 1.0 2009 NA
2: A 2.0 2009 NA
3: A 1.5 2009 NA
4: A 1.0 2010 NA
5: B 3.0 2009 NA
6: B 2.0 2010 NA
7: B NA 2011 NA
8: C 3.0 2014 NA
9: C 3.0 2019 NA
10: C 3.0 2020 NA
11: C 4.0 2021 3.333333
The intermediate result mDT
contains the rolling mean V2
over k
periods and the count of unique/distinct years V1
within each period. It is created by a non-equi join of DT
with a data.table containing the upper and lower bounds which is created on-the-fly by .(grp = grp, upper = yr, lower = yr - k)
.
mDT
grp yr yr V1 V2
1: A 2009 2006 1 1.500000
2: A 2009 2006 1 1.500000
3: A 2009 2006 1 1.500000
4: A 2010 2007 2 1.375000
5: B 2009 2006 1 3.000000
6: B 2010 2007 2 2.500000
7: B 2011 2008 3 NA
8: C 2014 2011 1 3.000000
9: C 2019 2016 1 3.000000
10: C 2020 2017 2 3.000000
11: C 2021 2018 3 3.333333
This is filtered for rows which contain exactly k
distinct years:
mDT[V1 == k]
grp yr yr V1 V2
1: B 2011 2008 3 NA
2: C 2021 2018 3 3.333333
Finally, this is joined with DT
to append the new column to DT
.
Note, that mean()
returns NA
by default if there is an NA
in the input data.
Data
library(data.table)
DT <- fread(text = "rn grp nr yr
1: A 1.0 2009
2: A 2.0 2009
3: A 1.5 2009
4: A 1.0 2010
5: B 3.0 2009
6: B 2.0 2010
7: B NA 2011
8: C 3.0 2014
9: C 3.0 2019
10: C 3.0 2020
11: C 4.0 2021", drop = 1L)
R Rolling average from irregular time series
Calcuations within a sliding or rolling window of an irregular time series can be solved by data.table's ability to aggregate in a non-equi join.
There are many similar questions, e.g., r calculating rolling average with window based on value (not number of rows or date/time variable) or Rolling regression on irregular time series.
However, this question is different and thus deserves an answer on its own. From OP's own answer it can be concluded that the OP is looking for a centred rolling window. In addition, the rolling mean is to be computed for several columns.
library(data.table)
cols <- c("value2", "value3")
setDT(df)[SJ(year = (min(year) + 2):(max(year) - 2))[, c("start", "end") := .(year - 2, year + 2)],
on = .(year >= start, year < end),
c(.(year = i.year), lapply(.SD, mean)), .SDcols = cols, by = .EACHI][, -(1:2)]
year value2 value3
1: 2002 0.57494219 -0.53001134
2: 2003 0.33925292 0.75541896
3: 2004 -0.05834453 0.23987209
4: 2005 0.17031099 0.13074666
5: 2006 0.05272739 0.09297215
6: 2007 -0.12935805 -0.38780964
7: 2008 0.19716437 -0.11587017
The result is identical to OP's own result rmeans
.
Data
set.seed(123) # ensure reproducible sample data
df <- data.frame(
year = rep(2000:2010, c(3, 1, 0, 0, 4, 3, 3, 1, 2, 6, 8)),
value1 = rnorm(31), value2 = rnorm(31), value3 = rnorm(31))
Rollapply with different rolling window on each vector of time series
1) Convert to a wide form zoo object z and then to a list of zoo objects L, one per column of z, apply rollfun to each component of L creating a list of zoo objects and then merge back into a wide form zoo object zroll and either use that or optionally convert to long form data frame droll.
library(zoo)
z <- read.zoo(dat, split = "fact")
L <- as.list(z)
rollfun <- function(x) rollapplyr(x, length(na.omit(x)) - 252, mean)
zroll <- do.call("merge", Map(rollfun, L))
droll <- fortify.zoo(zroll, melt = TRUE)
2) This could also be expressed as a pipeline where rollfun is from above.
droll2 <- dat |>
read.zoo(split = "fact") |>
as.list() |>
Map(f = rollfun) |>
do.call(what = "merge") |>
fortify.zoo(melt = TRUE)
3) With dplyr
library(dplyr, exclude = c("lag", "filter"))
library(zoo)
dat %>%
group_by(fact) %>%
mutate(roll = rollapplyr(value, n() - 252, mean, fill = NA)) %>%
ungroup
Related Topics
How to Find Useful R Tutorials with Various Implementations
How to Refer to a Variable Name with Spaces
Copy/Move One Environment to Another
Debugging (Line by Line) of Rcpp-Generated Dll Under Windows
R - How to Replace Parts of Variable Strings Within Data Frame
Condition a ..Count.. Summation on the Faceting Variable
Order of Legend Entries in Ggplot2 Barplots with Coord_Flip()
Cumulative Sum for Positive Numbers Only
Display Correlation Tables as Descending List
What's My User Agent When I Parse Website with Rvest Package in R
Apply Grouped Model Back Onto Data
Is There a Technical Difference Between "=" and "<-"
Plotting Multiple Curves Same Graph and Same Scale
Summing Across Rows of a Data.Table for Specific Columns
How to Draw Two Half Circles in Ggplot in R
Save Object Using Variable with Object Name
Subscripts and Superscripts "-" or "+" with Ggplot2 Axis Labels? (Ionic Chemical Notation)