Optimized Rolling Functions on Irregular Time Series with Time-Based Window

Rolling window function for irregular time series that can handle duplicates

This can be solved by aggregating over a rolling window of length k in a grouped non-equi join, filtering for windows covering k consecutive years, and appending the result with an update join:

library(data.table)
k <- 3L
# group by join parameters of a non-equi join
mDT <- setDT(DT)[.(grp = grp, upper = yr, lower = yr - k),
                 on = .(grp, yr <= upper, yr > lower),
                 .(uniqueN(x.yr), mean(nr)), by = .EACHI]
# update join with filtered intermediate result
DT[mDT[V1 == k], on = .(grp, yr), paste0("nr_roll_period_", k) := V2]
DT

which returns OP's expected result:

    grp  nr   yr nr_roll_period_3
 1:   A 1.0 2009               NA
 2:   A 2.0 2009               NA
 3:   A 1.5 2009               NA
 4:   A 1.0 2010               NA
 5:   B 3.0 2009               NA
 6:   B 2.0 2010               NA
 7:   B  NA 2011               NA
 8:   C 3.0 2014               NA
 9:   C 3.0 2019               NA
10:   C 3.0 2020               NA
11:   C 4.0 2021         3.333333

The intermediate result mDT contains the rolling mean V2 over k periods and the count of unique/distinct years V1 within each period. It is created by a non-equi join of DT with a data.table containing the upper and lower bounds which is created on-the-fly by .(grp = grp, upper = yr, lower = yr - k).
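
For illustration, the lookup table with these bounds can also be materialised on its own; a minimal sketch (in the answer above it is created inline within i, so this step is not needed in practice):

# upper bound yr and lower bound yr - k of the half-open interval (yr - k, yr] per row of DT
setDT(DT)[, .(grp, upper = yr, lower = yr - k)]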

mDT
    grp   yr   yr V1       V2
 1:   A 2009 2006  1 1.500000
 2:   A 2009 2006  1 1.500000
 3:   A 2009 2006  1 1.500000
 4:   A 2010 2007  2 1.375000
 5:   B 2009 2006  1 3.000000
 6:   B 2010 2007  2 2.500000
 7:   B 2011 2008  3       NA
 8:   C 2014 2011  1 3.000000
 9:   C 2019 2016  1 3.000000
10:   C 2020 2017  2 3.000000
11:   C 2021 2018  3 3.333333

This is filtered for rows which contain exactly k distinct years:

mDT[V1 == k]
   grp   yr   yr V1       V2
1:   B 2011 2008  3       NA
2:   C 2021 2018  3 3.333333

Finally, this is joined with DT to append the new column to DT.

Note that mean() returns NA by default if there is an NA in the input data.
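
If NA values should be ignored instead, mean()'s na.rm argument can be used; a minimal variation of the aggregation step above:

mDT <- setDT(DT)[.(grp = grp, upper = yr, lower = yr - k),
                 on = .(grp, yr <= upper, yr > lower),
                 .(uniqueN(x.yr), mean(nr, na.rm = TRUE)), by = .EACHI]  # ignore NAs when averaging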

Data

library(data.table)
DT <- fread(text = "rn grp nr yr
1: A 1.0 2009
2: A 2.0 2009
3: A 1.5 2009
4: A 1.0 2010
5: B 3.0 2009
6: B 2.0 2010
7: B NA 2011
8: C 3.0 2014
9: C 3.0 2019
10: C 3.0 2020
11: C 4.0 2021", drop = 1L)
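
Because the new column is named via paste0("nr_roll_period_", k), the steps generalise to other window lengths. As a sketch (not part of the original answer), they can be wrapped in a small helper and applied for several values of k:

add_roll_mean <- function(DT, k) {
  # aggregate over the rolling window of length k in a grouped non-equi join
  mDT <- DT[.(grp = grp, upper = yr, lower = yr - k),
            on = .(grp, yr <= upper, yr > lower),
            .(uniqueN(x.yr), mean(nr)), by = .EACHI]
  # keep complete windows only and append the result by reference
  DT[mDT[V1 == k], on = .(grp, yr), paste0("nr_roll_period_", k) := V2]
}
setDT(DT)
for (k in 2:4) add_roll_mean(DT, k)  # e.g., window lengths of 2, 3, and 4 years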

R: Rolling average from irregular time series

Calculations within a sliding or rolling window of an irregular time series can be solved by data.table's ability to aggregate in a non-equi join.

There are many similar questions, e.g., r calculating rolling average with window based on value (not number of rows or date/time variable) or Rolling regression on irregular time series.

However, this question is different and thus deserves an answer of its own. From the OP's own answer it can be concluded that the OP is looking for a centred rolling window. In addition, the rolling mean is to be computed for several columns.

library(data.table)
cols <- c("value2", "value3")
setDT(df)[SJ(year = (min(year) + 2):(max(year) - 2))[, c("start", "end") := .(year - 2, year + 2)],
          on = .(year >= start, year < end),
          c(.(year = i.year), lapply(.SD, mean)), .SDcols = cols, by = .EACHI][, -(1:2)]
   year      value2      value3
1: 2002  0.57494219 -0.53001134
2: 2003  0.33925292  0.75541896
3: 2004 -0.05834453  0.23987209
4: 2005  0.17031099  0.13074666
5: 2006  0.05272739  0.09297215
6: 2007 -0.12935805 -0.38780964
7: 2008  0.19716437 -0.11587017

The result is identical to OP's own result rmeans.

Data

set.seed(123)   # ensure reproducible sample data
df <- data.frame(
  year = rep(2000:2010, c(3, 1, 0, 0, 4, 3, 3, 1, 2, 6, 8)),
  value1 = rnorm(31), value2 = rnorm(31), value3 = rnorm(31))
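
For a quick plausibility check (not part of the original answer), the centred means can also be computed with a plain lapply() over the target years, using the same half-open window [year - 2, year + 2) as the non-equi join; the result should match the table above:

years <- (min(df$year) + 2):(max(df$year) - 2)
do.call(rbind, lapply(years, function(y) {
  idx <- df$year >= y - 2 & df$year < y + 2  # rows falling into the centred window around y
  data.frame(year = y, value2 = mean(df$value2[idx]), value3 = mean(df$value3[idx]))
}))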

What's the fastest way to produce rolling window embeddings in time series data?

Group the values column by date, then iterate over the groups in a list comprehension and apply the sliding_window_view transformation to each; finally, vertically stack the sliding views of all groups.

For numpy version >= 1.20

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# group the values per calendar day, then build 3-wide sliding windows within each group
grp = df['values'].groupby(df['date'].dt.floor('D'))
np.vstack([sliding_window_view(v, 3) for _, v in grp])

For numpy version < 1.20

def sliding_view(a, w):
    # strided (zero-copy) view of shape (len(a) - w + 1, w)
    s = a.strides[0]
    shape = a.shape[0] - w + 1, w
    return np.lib.stride_tricks.as_strided(a, shape, (s, s))

grp = df['values'].groupby(df['date'].dt.floor('D'))
np.vstack([sliding_view(v.values, 3) for _, v in grp])


array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])

