Rolling window function for irregular time series that can handle duplicates
This can be solved by grouping in a non-equi join to aggregate over a rolling window of length k
, filtering for k
consecutive years, and an update join:
library(data.table)
k <- 3L
# group by join parameters of a non-equi join
mDT <- setDT(DT)[.(grp = grp, upper = yr, lower = yr - k),
on = .(grp, yr <= upper, yr > lower),
.(uniqueN(x.yr), mean(nr)), by = .EACHI]
# update join with filtered intermediate result
DT[mDT[V1 == k], on = .(grp, yr), paste0("nr_roll_period_", k) := V2]
DT
which returns OP's expected result:
grp nr yr nr_roll_period
1: A 1.0 2009 NA
2: A 2.0 2009 NA
3: A 1.5 2009 NA
4: A 1.0 2010 NA
5: B 3.0 2009 NA
6: B 2.0 2010 NA
7: B NA 2011 NA
8: C 3.0 2014 NA
9: C 3.0 2019 NA
10: C 3.0 2020 NA
11: C 4.0 2021 3.333333
The intermediate result mDT
contains the rolling mean V2
over k
periods and the count of unique/distinct years V1
within each period. It is created by a non-equi join of DT
with a data.table containing the upper and lower bounds which is created on-the-fly by .(grp = grp, upper = yr, lower = yr - k)
.
mDT
grp yr yr V1 V2
1: A 2009 2006 1 1.500000
2: A 2009 2006 1 1.500000
3: A 2009 2006 1 1.500000
4: A 2010 2007 2 1.375000
5: B 2009 2006 1 3.000000
6: B 2010 2007 2 2.500000
7: B 2011 2008 3 NA
8: C 2014 2011 1 3.000000
9: C 2019 2016 1 3.000000
10: C 2020 2017 2 3.000000
11: C 2021 2018 3 3.333333
This is filtered for rows which contain exactly k
distinct years:
mDT[V1 == k]
grp yr yr V1 V2
1: B 2011 2008 3 NA
2: C 2021 2018 3 3.333333
Finally, this is joined with DT
to append the new column to DT
.
Note, that mean()
returns NA
by default if there is an NA
in the input data.
Data
library(data.table)
DT <- fread(text = "rn grp nr yr
1: A 1.0 2009
2: A 2.0 2009
3: A 1.5 2009
4: A 1.0 2010
5: B 3.0 2009
6: B 2.0 2010
7: B NA 2011
8: C 3.0 2014
9: C 3.0 2019
10: C 3.0 2020
11: C 4.0 2021", drop = 1L)
R Rolling average from irregular time series
Calcuations within a sliding or rolling window of an irregular time series can be solved by data.table's ability to aggregate in a non-equi join.
There are many similar questions, e.g., r calculating rolling average with window based on value (not number of rows or date/time variable) or Rolling regression on irregular time series.
However, this question is different and thus deserves an answer on its own. From OP's own answer it can be concluded that the OP is looking for a centred rolling window. In addition, the rolling mean is to be computed for several columns.
library(data.table)
cols <- c("value2", "value3")
setDT(df)[SJ(year = (min(year) + 2):(max(year) - 2))[, c("start", "end") := .(year - 2, year + 2)],
on = .(year >= start, year < end),
c(.(year = i.year), lapply(.SD, mean)), .SDcols = cols, by = .EACHI][, -(1:2)]
year value2 value3
1: 2002 0.57494219 -0.53001134
2: 2003 0.33925292 0.75541896
3: 2004 -0.05834453 0.23987209
4: 2005 0.17031099 0.13074666
5: 2006 0.05272739 0.09297215
6: 2007 -0.12935805 -0.38780964
7: 2008 0.19716437 -0.11587017
The result is identical to OP's own result rmeans
.
Data
set.seed(123) # ensure reproducible sample data
df <- data.frame(
year = rep(2000:2010, c(3, 1, 0, 0, 4, 3, 3, 1, 2, 6, 8)),
value1 = rnorm(31), value2 = rnorm(31), value3 = rnorm(31))
What's the fastest way to produce rolling window embeddings in time series data?
Group the column values
on date
then inside a list comprehension iterate over each group and apply the sliding_window_view
transformation, then vertically stack all the sliding views corresponding to each group
For numpy version >= 1.20
from numpy.lib.stride_tricks import sliding_window_view
grp = df['values'].groupby(df['date'].dt.floor('D'))
np.vstack([sliding_window_view(v, 3) for _, v in grp])
For numpy version < 1.20
def sliding_view(a, w):
s = a.strides[0]
shape = a.shape[0] - w + 1, w
return np.lib.stride_tricks.as_strided(a, shape, (s, s))
grp = df['values'].groupby(df['date'].dt.floor('D'))
np.vstack([sliding_view(v.values, 3) for _, v in grp])
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
Related Topics
Gantt Style Time Line Plot (In Base R)
Solving Non-Square Linear System with R
Shiny Leaflet Ploygon Click Event
Flip Ordering of Legend Without Altering Ordering in Plot
A Way to Always Dodge a Histogram
Keyed Lookup on Data.Table Without 'With'
Save Imported CSV Data in Vector - R
Control the Size of Points in an R Scatterplot
Categorize Continuous Variable with Dplyr
Group Data and Plot Multiple Lines
How to Create a Bipartite Network in R with Igraph or Tnet
How to Cross-Paste All Combinations of Two Vectors (Each-To-Each)
How to Generalize Outer to N Dimensions
R Solve:System Is Exactly Singular