Is There a _Fast_ Way to Run a Rolling Regression Inside Data.Table

Is there a _fast_ way to run a rolling regression inside data.table?

Not as far as I know; data.table doesn't have any special features for rolling windows. Other packages already implement rolling functionality on vectors, so they can be used in the j of data.table. If they are not efficient enough, and no package has faster versions (?), then it's a case of writing faster versions yourself and (of course) contributing them: either to an existing package or creating your own.

Related questions (follow links in links) :

Using data.table to speed up rollapply

R data.table sliding window

Rolling regression over multiple columns in R

How do I speed up rolling regressions?

Here's how to do that with data.table, which should be the fastest way to do what you want. You first need to build a sigma function and then use rollaplyr with .SD.

set.seed(1)
library(data.table)
dt <- data.table(PERMNO=rep(LETTERS[1:3],each=13),
YearMonth=seq.Date(from=Sys.Date(),by="month",length.out =13),
Return=runif(39),VWReturn=runif(39))

#create sigma function
stdev <- function(x) sd(lm(x[, 1]~ x[, 2])$residuals)

#create new column with rollapply
dt[,roll_sd:=rollapplyr(.SD, 12, stdev, by.column = FALSE, fill = NA),
by=.(PERMNO),.SDcols = c("Return", "VWReturn")]

PERMNO YearMonth Return VWReturn roll_sd
1: A 2017-11-19 0.26550866 0.41127443 NA
2: A 2017-12-19 0.37212390 0.82094629 NA
3: A 2018-01-19 0.57285336 0.64706019 NA
4: A 2018-02-19 0.90820779 0.78293276 NA
5: A 2018-03-19 0.20168193 0.55303631 NA
6: A 2018-04-19 0.89838968 0.52971958 NA
7: A 2018-05-19 0.94467527 0.78935623 NA
8: A 2018-06-19 0.66079779 0.02333120 NA
9: A 2018-07-19 0.62911404 0.47723007 NA
10: A 2018-08-19 0.06178627 0.73231374 NA
11: A 2018-09-19 0.20597457 0.69273156 NA
12: A 2018-10-19 0.17655675 0.47761962 0.3181427
13: A 2018-11-19 0.68702285 0.86120948 0.3141638
14: B 2017-11-19 0.38410372 0.43809711 NA
....

Rolling Regression Data Frame

Consider using the roll package.

library(magrittr); requireNamespace("roll")
ds <- readr::read_csv(
" Date, open.x, high.x, low.x, x_Close, volume.x, open.y, high.y, low.y, y_Close, volume.y
2010-01-04, 57.32, 58.13, 57.32, 57.85, 442900, 6.61, 6.8400, 6.61, 6.83, 833100
2010-01-05, 57.90, 58.33, 57.54, 58.20, 436900, 6.82, 7.1200, 6.80, 7.12, 904500
2010-01-06, 58.20, 58.56, 58.01, 58.42, 850600, 7.05, 7.3800, 7.05, 7.27, 759800
2010-01-07, 58.31, 58.41, 57.14, 57.90, 463600, 7.24, 7.3000, 7.06, 7.11, 557800
2010-01-08, 57.45, 58.62, 57.45, 58.47, 206500, 7.08, 7.3500, 6.95, 7.29, 588100
2010-01-11, 58.79, 59.00, 57.22, 57.73, 331900, 7.38, 7.4500, 7.17, 7.22, 450500
2010-01-12, 57.20, 57.21, 56.15, 56.34, 428500, 7.15, 7.1900, 6.87, 7.00, 694700
2010-01-13, 56.32, 56.66, 54.83, 56.56, 577500, 7.05, 7.1700, 6.98, 7.15, 528800
2010-01-14, 56.51, 57.05, 55.37, 55.53, 368100, 7.08, 7.1701, 7.08, 7.11, 279900
2010-01-15, 56.59, 56.59, 55.19, 55.84, 417900, 7.03, 7.0500, 6.95, 7.03, 407600"
)

runs <- roll::roll_lm(
x = as.matrix(ds$x_Close),
y = as.matrix(ds$y_Close),
width = 5,
intercept = FALSE
)

# Nested in a named-column, within a matrix, within a list.
ds$beta <- runs$coefficients[, "x1"]

ds$beta
# [1] NA NA NA NA 0.1224813
# [6] 0.1238653 0.1242478 0.1246279 0.1256553 0.1259121

Double-check the alignment of the variables in your dataset. x_Close is around 50, while y_Close is around 7. That might explain the small disparity between the expected 0.1229065 and the 0.1224813 value above.



Related Topics



Leave a reply



Submit