R Data.Table Sliding Window

R data.table sliding window

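When this answer was written, data.table had no special features for rolling windows (the frollmean, frollsum and frollapply functions used in later answers on this page were added afterwards). There is more detail in my answer to a similar question: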

Is there a fast way to run a rolling regression inside data.table?

Rolling median is interesting. Doing it efficiently would need a specialized function (same link as in the earlier comment):

Rolling median algorithm in C

The data.table solutions in the question and answers here are all very inefficient relative to a proper specialized rolling median function (which isn't available for R, as far as I know).
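For odd window widths, base R's stats::runmed may already cover part of this need. The sketch below is only an illustration and not part of the original answer: the vector x and the width k = 3 are made-up values, and endrule = "keep" simply leaves the first and last values untouched. It compares runmed against a naive loop.

```r
# Illustrative data (not from the original question)
x <- c(1, 5, 2, 8, 3, 9, 4)
k <- 3L  # window width; runmed() requires an odd k

# Centered running median from base R (package stats)
fast <- stats::runmed(x, k, endrule = "keep")

# Naive reference: median of each centered window, end values kept as-is
h <- k %/% 2L
slow <- x
for (i in (h + 1L):(length(x) - h)) {
  slow[i] <- median(x[(i - h):(i + h)])
}

# Both approaches should agree
stopifnot(isTRUE(all.equal(as.numeric(fast), slow)))
```

Whether runmed is fast enough, or handles the end rules you need, depends on the use case.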

Compute the rolling minimum in data.table with adaptive window

Here is something quick with fcoalesce():

d[, xx := fcoalesce(frollapply(x, n = window, FUN = min), cummin(x)), by = grp]
# or
d[, xx := fcoalesce(frollapply(x, n = window:1L, FUN = min)), by = grp]

    x grp xx
1: 10   a 10
2:  4   a  4
3:  8   a  4
4:  1   a  1
5:  2   a  1
6:  3   a  1
7:  8   b  8
8:  9   b  8
9: 10   b  8
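The answer does not show its input, so here is a self-contained sketch of the first variant. The table d is reconstructed from the printed result, and window = 3 is an assumption (it is one value that is consistent with that output). x is stored as double so that frollapply() and cummin() return the same type for fcoalesce().

```r
library(data.table)

# Data reconstructed from the printed result; window = 3 is an assumption
d <- data.table(
  x   = c(10, 4, 8, 1, 2, 3, 8, 9, 10),
  grp = rep(c("a", "b"), times = c(6, 3))
)
window <- 3L

# Full-width rolling minimum: the first window - 1 rows of each group are NA,
# and fcoalesce() fills them with the running minimum seen so far
d[, xx := fcoalesce(frollapply(x, window, FUN = min), cummin(x)), by = grp]
print(d)
```

The cummin() fallback is what makes the window "adaptive" at the start of each group: until a full window is available, the minimum is taken over everything seen so far.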

Sliding window on date-time field in data frame R

Here is an option using data.table:

dt[, dayago := date - 24 * 60 * 60]
dt[, c("n", "avg") :=
    dt[dt, on=.(customer_id, date>=dayago, date<date),
        by=.EACHI, .(n=.N, avg=mean(amount))][, (1L:3L) := NULL]
]

data:

library(data.table)
dt <- data.table(
    order_id = 1:10,
    customer_id = c(1, rep(2, 2), rep(3, 3), rep(4, 4)),
    amount = seq(10, 100, by = 10),
    date = as.POSIXct(c("2020-10-07 12:00", # 1st customer
        "2020-10-07 12:00", "2020-10-08 11:00", # 2nd customer
        "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", # 3rd customer
        "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00", "2020-10-08 21:00" # 4th customer
    ), format="%Y-%m-%d %H:%M"))
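To see what the non-equi join matches, it can help to compare it with a row-by-row count. This is a sketch on a shortened copy of the data (first three customers only, with tz = "UTC" added as an assumption to keep the arithmetic free of daylight-saving effects). Each row is matched to earlier orders of the same customer in the interval [date - 24h, date).

```r
library(data.table)

# Shortened copy of the example data (customers 1 to 3 only)
dt <- data.table(
  customer_id = c(1, 2, 2, 3, 3, 3),
  amount      = c(10, 20, 30, 40, 50, 60),
  date = as.POSIXct(c("2020-10-07 12:00",
                      "2020-10-07 12:00", "2020-10-08 11:00",
                      "2020-10-07 12:00", "2020-10-08 11:00", "2020-10-08 20:00"),
                    format = "%Y-%m-%d %H:%M", tz = "UTC")
)
dt[, dayago := date - 24 * 60 * 60]

# Non-equi self-join, one result row per row of the inner dt (by = .EACHI)
res <- dt[dt, .(n = .N, avg = mean(amount)),
          on = .(customer_id, date >= dayago, date < date), by = .EACHI]

# Brute-force reference: count matching rows one by one
ref_n <- sapply(seq_len(nrow(dt)), function(i) {
  sum(dt$customer_id == dt$customer_id[i] &
      dt$date >= dt$dayago[i] & dt$date < dt$date[i])
})

stopifnot(all(res$n == ref_n))
print(res)
```

Rows without any earlier order in the previous 24 hours get n = 0 and avg = NA.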

Rolling sum with varying window of a data.table

Another option using frollsum mentioned by jangorecki in the comments and cumsum from Cole's answer:

sz <- seq(10, 20, 2)
DT[, c(t(outer(names(DT), sz, paste0))) := {

    #use frollsum with centering alignment
    C <- matrix(unlist(frollsum(.SD, 2L*sz + 1L, align="center")), nrow=.N)

    #largest window size
    winsz <- 2L*last(sz)+1L

    #extract head and tail of data and reverse row order of tail
    H <- head(.SD, winsz)
    B <- tail(.SD, winsz)[.N:1L]

    #calculate sums of those head and tail using frollsum and cumsum
    U <- matrix(unlist(frollsum(H, sz+1L, align="left")), nrow=winsz) +
        rep(H[, as.matrix(lapply(.SD, cumsum) - .SD)], length(sz))
    D <- matrix(unlist(frollsum(B, sz+1L, align="left")), nrow=winsz) +
        rep(B[, as.matrix(lapply(.SD, cumsum) - .SD)], length(sz))
    D <- D[rev(seq_len(nrow(D))), ]

    #update NAs in C with values from U and D
    C[is.na(C) & row(C) <= winsz] <- U[is.na(C) & row(C) <= winsz]
    C[is.na(C) & row(C) >= .N - winsz] <- D[is.na(C) & row(C) >= .N - winsz]
    as.data.table(C)
}]

output:

          money       debt      misc   money10   money12   money14   money16   money18   money20    debt10   debt12   debt14   debt16   debt18    debt20    misc10    misc12    misc14    misc16    misc18   misc20
1: 0.089669720 0.09104731 0.7268889 0.6411836 0.6794367 0.7865494 0.9133034 1.0842559 1.200004 0.8763139 1.041279 1.157053 1.277840 1.436872 1.602857 4.271920 5.814550 7.411962 8.334052 9.779066 10.83659
2: 0.026550866 0.08235301 0.4299947 0.6617810 0.7495166 1.5007527 0.9850653 1.1236370 1.930694 0.9679968 1.103519 1.847868 1.352394 1.507213 2.303379 4.294875 5.968677 7.894473 8.517160 9.815212 11.14334
3: 0.037212390 0.08914664 0.3590845 0.6794367 0.8437291 1.9539665 1.0842559 1.2571837 2.355352 0.9840995 1.157053 2.261323 1.379692 1.602857 2.723974 4.773886 6.428479 8.334052 8.738403 9.853108 12.13639
4: 0.057285336 0.07765182 0.7692328 0.7481390 0.9726475 2.3476005 1.1222594 1.4025885 2.742391 0.9944051 1.212026 2.607193 1.398099 1.667537 3.060532 5.241983 6.641051 9.154379 9.088518 9.889917 12.78821
5: 0.090820779 0.07648598 0.2425576 0.7865494 1.0427839 3.1587385 1.2000039 1.4441692 3.466894 1.0275726 1.277840 3.381874 1.473377 1.740491 3.834656 5.337479 6.389050 9.779066 8.762108 10.191386 13.04075
6: 0.020168193 0.08946781 0.7255652 0.8635335 1.1002110 3.3484789 1.2934745 1.4950018 3.645353 1.0968807 1.353772 3.618287 1.552392 1.807110 4.063629 5.668253 7.043305 10.451054 8.917119 10.677133 13.21366
7: 0.089838968 0.05116656 0.1656073 0.9133034 1.2687012 4.1316204 1.3146887 1.5768569 4.389362 1.0933946 1.436872 4.350028 1.556045 1.889654 4.773653 5.402436 7.031895 10.836591 9.204771 10.293583 13.71787
8: 0.094467527 0.07386150 0.2832141 0.9850653 1.2680323 4.3008593 1.3798561 1.5649066 4.466469 1.2079988 1.507213 4.529149 1.661338 1.952555 4.976409 6.146994 7.589442 11.143338 9.780822 10.352046 14.64574
9: 0.066079779 0.08661569 0.1861392 1.0842559 1.3251708 4.5108201 1.3924116 1.5829119 4.693454 1.3117050 1.602857 4.811455 1.764487 2.026482 5.239974 6.582934 7.765626 12.136388 9.844623 10.646906 15.26456
10: 0.062911404 0.08463658 0.2776479 1.1222594 1.4391772 4.6960468 1.4191337 1.6047869 4.900483 1.3615106 1.667537 4.977598 1.806852 2.114798 5.433264 7.134863 7.972850 12.788207 9.897467 11.475253 16.24193
11: 0.006178627 0.07388098 0.1059877 1.2000039 1.4821167 4.9233389 1.4577451 1.6647508 5.149254 1.6028572 1.740491 5.253153 1.859054 2.169010 5.693229 10.836591 8.772889 13.040754 10.186943 11.901064 16.98713
12: 0.020597457 0.09306047 0.6601738 1.2038047 1.6149864 5.0498700 1.4590841 1.8194223 5.297271 1.5764900 1.807110 5.348161 1.879667 2.262776 5.817308 10.416449 9.392601 13.213658 11.015005 12.846320 17.37602
13: 0.017655675 0.07190486 0.8824558 1.1984681 1.3924116 5.7280578 1.4973229 1.9259202 5.996804 1.5670903 1.889654 5.989201 1.861417 2.329730 6.451755 10.979504 13.040754 13.717870 10.994251 13.024409 17.83592
14: 0.068702285 0.06223986 0.7899689 1.2264231 1.3294640 6.5941969 1.5842920 2.0283774 6.910958 1.5445634 1.861507 6.888068 1.900933 2.421702 7.328995 11.272238 12.486769 14.645741 11.106813 12.602749 18.02672
15: 0.038410372 0.05353395 0.8074434 1.1816933 1.3415245 1.4973229 1.6183269 2.0818716 7.650847 1.5494551 1.853082 2.169010 1.974350 2.489036 8.130541 10.755553 12.560987 15.264564 11.130749 12.334920 18.08914
16: 0.076984142 0.05497331 0.4825107 1.1175946 1.3056511 1.4946223 1.6665349 2.1463493 8.502617 1.5358700 1.852251 2.171729 2.051198 2.555725 8.979061 10.685899 13.129773 15.515037 10.750607 11.771812 18.81893
17: 0.049769924 0.06581359 0.4395799 1.1360378 1.2866046 1.5021063 1.7264915 2.1429602 8.974990 1.5203294 1.828811 2.156329 2.489036 2.629542 9.499815 10.464546 12.979363 15.830245 17.835919 11.406698 18.96696
18: 0.071761851 0.07593171 0.8203267 1.0475379 1.2827529 1.5131019 1.6861759 2.2417412 9.444224 1.5574784 1.846091 2.159155 2.464677 2.724152 9.943081 11.226810 13.714167 15.860051 17.299831 11.762719 19.44093
19: 0.099190609 0.08310025 0.6246866 0.9913091 1.2966196 1.5157732 1.6782468 1.9440514 10.203584 1.5378292 1.823577 2.148837 2.456142 2.817369 10.776342 11.562419 13.733805 15.550718 16.932260 18.966957 20.09902
20: 0.038003518 0.07034151 0.6719877 1.0121984 1.2549887 1.4743065 1.7237717 1.9338057 10.801451 1.5449796 1.864382 2.139040 2.461605 2.795821 11.415959 12.353642 13.957088 15.498962 17.302964 18.714038 21.09127
21: 0.077744522 0.09564380 0.3855374 0.9833219 1.2204777 1.4727601 1.7333331 1.9180492 2.147768 1.5272967 1.857855 2.123390 2.477170 2.802334 3.145498 12.821197 14.133774 14.835820 16.681756 18.942139 21.61208
22: 0.093470523 0.06468017 0.3067471 1.0253513 1.2037521 1.4656585 1.7219363 1.9532079 11.672550 1.5453877 1.837928 2.166833 2.470184 2.811218 12.045236 13.104099 14.138436 15.400001 16.913168 19.575302 20.88519
23: 0.021214252 0.07295329 0.9930499 1.0647104 1.1594624 1.4380376 1.7125625 1.9370500 10.674164 1.5196614 1.827109 2.186188 2.469582 2.805300 11.042786 12.903826 13.923212 15.264812 16.845699 19.326883 20.45520
24: 0.065167377 0.06661973 0.6518186 1.0964089 1.2360211 1.4513818 1.6950946 2.4167050 10.050300 1.5144453 1.847838 2.169072 2.516446 2.728814 10.368440 12.212171 14.547012 15.552643 17.672339 11.598836 20.09612
25: 0.012555510 0.08254352 0.2525477 1.0463284 1.2822703 1.3992648 1.6417545 2.3265488 9.753023 1.5260230 1.847995 2.173302 2.514318 2.639346 10.120784 11.484626 14.034863 15.933091 17.909938 11.043275 19.32688
26: 0.026722067 0.06290084 0.1729037 1.0906553 1.3440387 1.4654572 1.5756747 2.2005311 9.516020 1.5670990 1.845589 2.197452 1.963393 2.588179 9.934296 11.406970 13.626378 16.819350 10.054008 10.987671 19.08433
27: 0.038611409 0.07392726 0.5042121 1.0805179 1.2861307 1.4859872 1.5127633 2.0707477 9.030251 1.6053427 1.872215 2.176124 1.914072 2.514318 9.414543 11.072484 13.494505 16.679990 10.241961 11.134741 18.35876
28: 0.001339033 0.08831553 0.9278707 1.1101719 1.3200915 7.7459901 1.5065847 1.9176914 8.013881 1.6090285 1.916712 8.039861 1.927168 2.427702 8.523821 11.106875 13.679308 15.797534 11.062505 11.788157 18.19315
29: 0.038238796 0.05421235 0.6188229 1.0492044 1.2842348 7.0967037 1.4859872 1.8388064 7.338829 1.6219629 1.933472 7.413011 1.850081 2.343065 7.885349 10.944644 13.717611 15.007565 10.982915 12.075120 17.90994
30: 0.086969085 0.09376607 0.9773622 1.0223849 1.5537110 6.1679003 1.4683316 1.8258308 6.382356 1.6368935 1.867658 6.475881 1.784973 2.269184 6.915137 11.312204 8.790596 14.200122 10.990853 12.852728 17.72380
31: 0.034034900 0.06695365 0.7452029 1.0255088 1.4490304 5.3797481 1.3996293 1.7723146 5.608277 1.6382850 1.791727 5.742658 1.755652 2.176124 6.152251 11.161030 8.648518 13.717611 10.912052 12.870804 17.44615
32: 0.048208012 0.09197202 0.3888906 0.9477643 1.3060758 4.9892956 1.3612189 1.7108949 5.261416 1.3000778 1.708626 5.379926 1.745882 2.104219 5.781452 7.020662 8.320750 13.278031 10.445291 12.285267 17.34016
33: 0.059956583 0.06733417 0.4599000 0.8542938 1.2606947 4.5175904 1.2842348 1.6348151 4.840875 1.2427752 1.638285 4.911428 1.698286 2.041979 5.295826 7.113858 8.041328 12.457704 10.362724 11.887864 16.67999
34: 0.049354131 0.06668875 0.1908010 0.8330796 1.1656155 4.2769529 1.2344648 1.5790701 4.681772 1.1871565 1.542641 4.704216 1.649807 1.988445 5.099809 6.262255 7.779903 11.833018 10.064591 11.204532 15.79753
35: 0.018621760 0.07381756 0.0624237 0.7679122 1.0169492 4.1951475 1.1627030 1.4468901 4.569268 1.1757326 1.477961 4.645268 1.629071 1.933472 5.048963 5.654238 7.461762 11.161030 9.288066 10.710628 15.00757
36: 0.082737332 0.09460992 0.7297878 0.7553567 0.9838624 3.4703525 1.0635124 1.3852476 3.883807 1.1050617 1.405008 3.914447 1.557844 1.867658 4.360251 6.048741 7.103890 10.775493 9.310430 10.906226 14.20012
37: 0.066846674 0.09321697 0.1480250 0.7286346 0.8923247 3.2957036 1.0255088 1.2871155 3.725645 1.0685311 1.338388 3.794958 1.513872 1.791727 4.250469 5.957016 6.506880 10.468746 8.719620 10.140707 13.71761
38: 0.079423986 0.06949948 0.4739701 0.6900232 0.8896937 2.8799432 0.9477643 1.1978494 3.281329 0.9846794 1.255844 3.317534 1.408304 1.708626 3.780185 5.847350 6.658803 9.475696 8.728629 9.920491 13.27803
39: 0.010794363 0.08886603 0.6580960 0.6886842 0.7848999 2.1674742 0.8542938 1.0817742 2.562265 0.9744355 1.192944 2.681685 1.421696 1.638285 3.135023 5.566781 7.055129 8.823877 9.069183 9.817733 12.45770
40: 0.072371095 0.09803090 0.9922467 0.6504454 0.7206287 1.2350431 0.8330796 0.9783699 1.543199 0.9458830 1.119016 1.704925 1.374402 1.542641 2.157707 5.867833 7.445133 8.571329 8.996009 10.326412 11.83302
41: 0.041127443 0.07173297 0.5208139 0.5634763 0.6886842 0.7286346 0.7679122 0.8542938 1.025509 0.8827224 1.030701 1.192944 1.338388 1.477961 1.638285 5.370158 6.966343 8.398426 8.823877 10.468746 11.16103
money debt misc money10 money12 money14 money16 money18 money20 debt10 debt12 debt14 debt16 debt18 debt20 misc10 misc12 misc14 misc16 misc18 misc20

data:

library(data.table)
N <- 41
set.seed(0)
data <- data.table(money=runif(N, min=0, max=.1),
debt=runif(N, min=.05, max=.1),
misc = runif(N, min=.05, max=1))
DT <- copy(data)
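The idea behind the head and tail handling above (a centered window that is truncated where it runs past either end of the data) can be shown on a single vector with prefix sums. This is a simplified base R sketch, not the answer's code: center_sum_partial is a hypothetical helper name, and x and k are made-up values.

```r
# Hypothetical helper: sum of x over [i - k, i + k], truncated at both ends,
# computed from prefix sums instead of re-adding every window
center_sum_partial <- function(x, k) {
  n  <- length(x)
  cs <- c(0, cumsum(x))            # cs[j + 1] holds sum(x[1:j])
  lo <- pmax(seq_len(n) - k, 1L)   # first index of each window
  hi <- pmin(seq_len(n) + k, n)    # last index of each window
  cs[hi + 1L] - cs[lo]
}

x <- c(1, 2, 3, 4, 5, 6)
res <- center_sum_partial(x, k = 1L)
print(res)

# Naive reference for comparison
ref <- sapply(seq_along(x), function(i) sum(x[max(i - 1, 1):min(i + 1, length(x))]))
stopifnot(isTRUE(all.equal(res, ref)))
```

Differences of prefix sums can accumulate floating-point error on long series, so results may differ from a direct sum in the last digits.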

Simple moving average (partial window) of a vector using data.table in R

You can use zoo's rollapplyr function with partial = TRUE.

zoo::rollapplyr(x, 5, mean, partial = TRUE)
#[1] 14.24 14.03 13.60 13.43 13.33 13.09 13.15 13.86 15.40 16.34
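If you would rather stay within data.table, one possible equivalent (a sketch, not from the original answer) is frollmean() with adaptive = TRUE and a per-row window width that grows until it reaches the nominal width. The values of x and k below are made up for illustration.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5, 6)   # made-up values
k <- 3L                    # nominal window width

# Per-row window widths 1, 2, 3, 3, 3, 3: the window grows until it reaches k
n <- pmin(seq_along(x), k)

# adaptive = TRUE lets every observation use its own (right-aligned) width
res <- frollmean(x, n, adaptive = TRUE)
print(res)
```

The first k - 1 results are means over the partial windows, which is what partial = TRUE does in rollapplyr.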

data.table rolling average timestamp window

Another option is a non-equi join in data.table:

DT[, posix_dt := as.POSIXct(posix_dt, format="%Y-%m-%d %T")]
DT[, c("start", "end") := .(posix_dt - 2*60, posix_dt)]
DT[, c("rm_sentiment", "rm_score") :=
    .SD[.SD, on=.(posix_dt>=start, posix_dt<=end),
        by=.EACHI, lapply(.SD, mean), .SDcols=c("sentiment", "score")][,
            (1L:2L) := NULL]
]

output:

               posix_dt sentiment score               start                 end rm_sentiment rm_score
1: 2019-11-02 08:45:06 0.0000 2 2019-11-02 08:43:06 2019-11-02 08:45:06 0.00000000 2.000000
2: 2019-11-02 08:45:07 0.0000 5 2019-11-02 08:43:07 2019-11-02 08:45:07 0.00000000 3.500000
3: 2019-11-02 08:45:08 0.0201 4 2019-11-02 08:43:08 2019-11-02 08:45:08 0.00670000 3.666667
4: 2019-11-02 08:45:14 0.2732 7 2019-11-02 08:43:14 2019-11-02 08:45:14 0.07332500 4.500000
5: 2019-11-02 08:45:25 0.0000 3 2019-11-02 08:43:25 2019-11-02 08:45:25 0.05866000 4.200000
6: 2019-11-02 08:45:35 0.3182 16 2019-11-02 08:43:35 2019-11-02 08:45:35 0.10191667 6.166667
7: 2019-11-02 08:45:48 0.0000 3 2019-11-02 08:43:48 2019-11-02 08:45:48 0.08735714 5.714286
8: 2019-11-02 08:45:53 -0.3582 6 2019-11-02 08:43:53 2019-11-02 08:45:53 0.03166250 5.750000
9: 2019-11-02 08:46:00 0.4003 6 2019-11-02 08:44:00 2019-11-02 08:46:00 0.06536000 5.900000
10: 2019-11-02 08:46:00 0.0000 7 2019-11-02 08:44:00 2019-11-02 08:46:00 0.06536000 5.900000
11: 2019-11-02 08:46:04 0.0000 4 2019-11-02 08:44:04 2019-11-02 08:46:04 0.05941818 5.727273
12: 2019-11-02 08:46:07 0.0000 2 2019-11-02 08:44:07 2019-11-02 08:46:07 0.05446667 5.416667
13: 2019-11-02 08:46:16 0.4939 0 2019-11-02 08:44:16 2019-11-02 08:46:16 0.08826923 5.000000
14: 2019-11-02 08:46:19 0.0000 2 2019-11-02 08:44:19 2019-11-02 08:46:19 0.08196429 4.785714
15: 2019-11-02 08:46:32 -0.5267 2 2019-11-02 08:44:32 2019-11-02 08:46:32 0.04138667 4.600000
16: 2019-11-02 08:46:49 0.2960 0 2019-11-02 08:44:49 2019-11-02 08:46:49 0.05730000 4.312500
17: 2019-11-02 08:47:05 0.9753 7 2019-11-02 08:45:05 2019-11-02 08:47:05 0.10511667 4.722222
18: 2019-11-02 08:47:05 0.0000 9 2019-11-02 08:45:05 2019-11-02 08:47:05 0.10511667 4.722222
19: 2019-11-02 08:47:07 0.0000 3 2019-11-02 08:45:07 2019-11-02 08:47:07 0.10511667 4.777778
20: 2019-11-02 08:47:10 -0.2960 9 2019-11-02 08:45:10 2019-11-02 08:47:10 0.09270588 5.058824

data:

library(data.table)
DT <- fread("posix_dt,sentiment,score
2019-11-02 08:45:06, 0.0000 , 2
2019-11-02 08:45:07, 0.0000 , 5
2019-11-02 08:45:08, 0.0201 , 4
2019-11-02 08:45:14, 0.2732 , 7
2019-11-02 08:45:25, 0.0000 , 3
2019-11-02 08:45:35, 0.3182 , 16
2019-11-02 08:45:48, 0.0000 , 3
2019-11-02 08:45:53, -0.3582 , 6
2019-11-02 08:46:00, 0.4003 , 6
2019-11-02 08:46:00, 0.0000 , 7
2019-11-02 08:46:04, 0.0000 , 4
2019-11-02 08:46:07, 0.0000 , 2
2019-11-02 08:46:16, 0.4939 , 0
2019-11-02 08:46:19, 0.0000 , 2
2019-11-02 08:46:32, -0.5267 , 2
2019-11-02 08:46:49, 0.2960 , 0
2019-11-02 08:47:05, 0.9753 , 7
2019-11-02 08:47:05, 0.0000 , 9
2019-11-02 08:47:07, 0.0000 , 3
2019-11-02 08:47:10, -0.2960 ,9")

Another approach using rolling join which should be faster:

#posix_dt contains duplicates, so aggregate first to make posix_dt unique
twomins <- 2L * 60L
aggDT <- DT[, c(.(N=.N), lapply(.SD, sum)), .(posix_dt), .SDcols=c("sentiment", "score")]

#calculate cumulative sums for calculating means later
cols <- c("N", "sentiment", "score")
aggDT[, c("start", paste0("cs_", cols)) :=
    c(.(posix_dt - twomins), lapply(.SD, cumsum)), .SDcols=cols]

#performing rolling join to find first timing that is >= time 2 minutes ago
#for current row
newcols <- c("rm_sentiment", "rm_score")
aggDT[, (newcols) := aggDT[aggDT, on=.(posix_dt=start), roll=-twomins,
.((i.cs_sentiment - x.cs_sentiment + x.sentiment) / (i.cs_N - x.cs_N + x.N),
(i.cs_score - x.cs_score + x.score) / (i.cs_N - x.cs_N + x.N))]
]

#lookup mean values into original DT using update join
DT[aggDT, on=.(posix_dt), paste0(newcols,"2") := mget(paste0("i.", newcols))]
DT

output:

               posix_dt sentiment score               start                 end rm_sentiment rm_score rm_sentiment2 rm_score2
1: 2019-11-02 08:45:06 0.0000 2 2019-11-02 08:43:06 2019-11-02 08:45:06 0.00000000 2.000000 0.00000000 2.000000
2: 2019-11-02 08:45:07 0.0000 5 2019-11-02 08:43:07 2019-11-02 08:45:07 0.00000000 3.500000 0.00000000 3.500000
3: 2019-11-02 08:45:08 0.0201 4 2019-11-02 08:43:08 2019-11-02 08:45:08 0.00670000 3.666667 0.00670000 3.666667
4: 2019-11-02 08:45:14 0.2732 7 2019-11-02 08:43:14 2019-11-02 08:45:14 0.07332500 4.500000 0.07332500 4.500000
5: 2019-11-02 08:45:25 0.0000 3 2019-11-02 08:43:25 2019-11-02 08:45:25 0.05866000 4.200000 0.05866000 4.200000
6: 2019-11-02 08:45:35 0.3182 16 2019-11-02 08:43:35 2019-11-02 08:45:35 0.10191667 6.166667 0.10191667 6.166667
7: 2019-11-02 08:45:48 0.0000 3 2019-11-02 08:43:48 2019-11-02 08:45:48 0.08735714 5.714286 0.08735714 5.714286
8: 2019-11-02 08:45:53 -0.3582 6 2019-11-02 08:43:53 2019-11-02 08:45:53 0.03166250 5.750000 0.03166250 5.750000
9: 2019-11-02 08:46:00 0.4003 6 2019-11-02 08:44:00 2019-11-02 08:46:00 0.06536000 5.900000 0.06536000 5.900000
10: 2019-11-02 08:46:00 0.0000 7 2019-11-02 08:44:00 2019-11-02 08:46:00 0.06536000 5.900000 0.06536000 5.900000
11: 2019-11-02 08:46:04 0.0000 4 2019-11-02 08:44:04 2019-11-02 08:46:04 0.05941818 5.727273 0.05941818 5.727273
12: 2019-11-02 08:46:07 0.0000 2 2019-11-02 08:44:07 2019-11-02 08:46:07 0.05446667 5.416667 0.05446667 5.416667
13: 2019-11-02 08:46:16 0.4939 0 2019-11-02 08:44:16 2019-11-02 08:46:16 0.08826923 5.000000 0.08826923 5.000000
14: 2019-11-02 08:46:19 0.0000 2 2019-11-02 08:44:19 2019-11-02 08:46:19 0.08196429 4.785714 0.08196429 4.785714
15: 2019-11-02 08:46:32 -0.5267 2 2019-11-02 08:44:32 2019-11-02 08:46:32 0.04138667 4.600000 0.04138667 4.600000
16: 2019-11-02 08:46:49 0.2960 0 2019-11-02 08:44:49 2019-11-02 08:46:49 0.05730000 4.312500 0.05730000 4.312500
17: 2019-11-02 08:47:05 0.9753 7 2019-11-02 08:45:05 2019-11-02 08:47:05 0.10511667 4.722222 0.10511667 4.722222
18: 2019-11-02 08:47:05 0.0000 9 2019-11-02 08:45:05 2019-11-02 08:47:05 0.10511667 4.722222 0.10511667 4.722222
19: 2019-11-02 08:47:07 0.0000 3 2019-11-02 08:45:07 2019-11-02 08:47:07 0.10511667 4.777778 0.10511667 4.777778
20: 2019-11-02 08:47:10 -0.2960 9 2019-11-02 08:45:10 2019-11-02 08:47:10 0.09270588 5.058824 0.09270588 5.058824

R: fast sliding window with given coordinates

Data generation:

N <- 1e5 # rows
M <- 200 # columns
W <- 10 # window size

set.seed(1)
intensities <- matrix(rnorm(N*M), nrow=N, ncol=M)
coords <- 8000000 + sort(sample(1:(5*N), N))

Original function with minor modifications I used for benchmarks:

doSlidingWindow <- function(intensities, coords, windsize) {
    windHalfSize <- ceiling(windsize/2)
    ### whole range inds
    RANGE <- integer(max(coords)+windsize)
    RANGE[coords] <- seq_along(coords)

    ### get indices of rows falling in each window
    ### NOTE: Each element of WINDOWINDS holds a zero. Not a big problem though.
    ### lapply (rather than sapply) guarantees a list, so WINDOWINDS[[i]] is safe
    WINDOWINDS <- lapply(coords, function(crds) unique(RANGE[(crds-windHalfSize):(crds+windHalfSize)]))

    ### do windowing
    wind_ints <- intensities
    wind_ints[] <- 0
    for(i in seq_along(coords)) {
        # CORRECTION: there was trouble when the window held only one row
        wind_ints[i,] <- apply(matrix(intensities[WINDOWINDS[[i]],], ncol=ncol(intensities)), 2, mean)
    }
    return(wind_ints)
}

POSSIBLE SOLUTIONS:


1) data.table

data.table is known to be fast at subsetting, but this page (and others related to sliding windows) suggests that this is not the case here. Indeed, the data.table code is elegant, but unfortunately very slow:

require(data.table)
require(plyr)
dt <- data.table(coords, intensities)
setkey(dt, coords)
# assumes WINDOWINDS (built inside doSlidingWindow above) is available in the workspace
aaply(1:N, 1, function(i) dt[WINDOWINDS[[i]], sapply(.SD,mean), .SDcols=2:(M+1)])

2) foreach+doSNOW

Basic routine is easy to run in parallel, so, we can benefit from it:

require(doSNOW)
doSlidingWindow2 <- function(intensities, coords, windsize) {
    NC <- 2 # number of nodes in cluster
    cl <- makeCluster(rep("localhost", NC), type="SOCK")
    registerDoSNOW(cl)

    N <- ncol(intensities) # total number of columns
    chunk <- ceiling(N/NC) # number of columns sent to a single node

    result <- foreach(i=1:NC, .combine=cbind, .export=c("doSlidingWindow")) %dopar% {
        start <- (i-1)*chunk+1
        end <- ifelse(i!=NC, i*chunk, N)
        doSlidingWindow(intensities[,start:end], coords, windsize)
    }

    stopCluster(cl)
    return (result)
}

The benchmark shows a notable speed-up on my dual-core processor:

system.time(res <- doSlidingWindow(intensities, coords, W))
# user system elapsed
# 306.259 0.204 307.770
system.time(res2 <- doSlidingWindow2(intensities, coords, W))
# user system elapsed
# 1.377 1.364 177.223
all.equal(res, res2, check.attributes=FALSE)
# [1] TRUE

3) Rcpp

Yes, I know you asked "without going to C". But, please, take a look. This code is inline and rather straightforward:

require(Rcpp)
require(inline)
doSlidingWindow3 <- cxxfunction(signature(intens="matrix", crds="numeric", wsize="numeric"), plugin="Rcpp", body='
    #include <vector>
    Rcpp::NumericMatrix intensities(intens);
    const int N = intensities.nrow();
    const int M = intensities.ncol();
    Rcpp::NumericMatrix wind_ints(N, M);

    std::vector<int> coords = as< std::vector<int> >(crds);
    int windsize = ceil(as<double>(wsize)/2);

    for(int i=0; i<N; i++){
        // Simple search for window range (begin:end in coords)
        // Assumed that coords are non-decreasing
        int begin = (i-windsize)<0?0:(i-windsize);
        while(coords[begin]<(coords[i]-windsize)) ++begin;
        int end = (i+windsize)>(N-1)?(N-1):(i+windsize);
        while(coords[end]>(coords[i]+windsize)) --end;

        for(int j=0; j<M; j++){
            double result = 0.0;
            for(int k=begin; k<=end; k++){
                result += intensities(k,j);
            }
            wind_ints(i,j) = result/(end-begin+1);
        }
    }

    return wind_ints;
')

Benchmark:

system.time(res <- doSlidingWindow(intensities, coords, W))
# user system elapsed
# 306.259 0.204 307.770
system.time(res3 <- doSlidingWindow3(intensities, coords, W))
# user system elapsed
# 0.328 0.020 0.351
all.equal(res, res3, check.attributes=FALSE)
# [1] TRUE

I hope the results are quite motivating. While the data fits in memory, the Rcpp version is pretty fast. Say, with N <- 1e6 and M <- 100 I got:

#    user  system elapsed
#   2.873   0.076   2.951

Naturally, after R starts using swap everything slows down. With really large data that doesn't fit in memory you should consider sqldf, ff or bigmemory.
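If compiling C++ is not an option, the same window logic (sorted coordinates, mean over all rows within half a window on either side) can also be sketched in vectorized base R with findInterval() and column-wise prefix sums. This was not part of the benchmarks above, so no timings are claimed. window_means is a hypothetical helper name, the small test data is made up, and h plays the role of windHalfSize.

```r
# Hypothetical helper: mean of all rows whose coordinate lies within
# [coords[i] - h, coords[i] + h]; coords must be sorted ascending
window_means <- function(m, coords, h) {
  # first row inside each window (number of coords strictly below the bound, plus 1)
  lo <- findInterval(coords - h, coords, left.open = TRUE) + 1L
  # last row inside each window (number of coords at or below the bound)
  hi <- findInterval(coords + h, coords)
  # column-wise prefix sums with a leading row of zeros
  cs <- rbind(0, apply(m, 2, cumsum))
  (cs[hi + 1L, , drop = FALSE] - cs[lo, , drop = FALSE]) / (hi - lo + 1L)
}

set.seed(1)
coords <- sort(sample(1:50, 12))
m <- matrix(rnorm(12 * 3), nrow = 12)
h <- 5

fast <- window_means(m, coords, h)

# Brute-force reference, one window at a time
slow <- t(sapply(seq_along(coords), function(i) {
  colMeans(m[abs(coords - coords[i]) <= h, , drop = FALSE])
}))

stopifnot(isTRUE(all.equal(fast, slow, check.attributes = FALSE)))
```

The prefix-sum matrix has the same size as the input, so memory use roughly doubles; for data near the memory limit the Rcpp loop is likely the safer choice.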

R - return datatable row number of max or min value in sliding window

I don't understand the purpose of your code. Why do you need moving window aggregates? Maybe there is a data structure more suitable for your problem. However, using the given data, I suggest the following:

"NegativeChange" is the minimum deviation from the average in a given interval. This is by definition the minimum value per interval.
You are looking for minimal (maximal) values in a moving window. The package RcppRoll provides useful functions for this task:

library(RcppRoll)
DATAFRAME2$min_Average = roll_minl(DATAFRAME2$Average, 90)
DATAFRAME2$max_Average = roll_maxl(DATAFRAME2$Average, 90)

In your next step you are trying to get the row number (or the position in the interval?) of the min/max value per interval.
If you need this information, you probably have to use a loop.

# calculate row averages, adding an "Average" column to the data set
DATAFRAME2 <- DATAFRAME[, .(Average = rowMeans(.SD, na.rm = TRUE)), "V1"]

# calculate min/max of rolling window
for (i in 1:nrow(DATAFRAME2)) {
    j = min(i+90, nrow(DATAFRAME2)) # upper bound of window
    DATAFRAME2$min_Average[i] = min(DATAFRAME2$Average[i:j])
    DATAFRAME2$pos_min_Average[i] = (i-1) + which.min(DATAFRAME2$Average[i:j])
    DATAFRAME2$max_Average[i] = max(DATAFRAME2$Average[i:j])
    DATAFRAME2$pos_max_Average[i] = (i-1) + which.max(DATAFRAME2$Average[i:j])
}
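Assigning into the table one element at a time can get slow on long data. A possible alternative is to compute the positions on a plain vector first and assign the whole column once. This is a sketch: roll_which_min is a hypothetical helper name, x and w are made-up values, and the window here covers exactly w rows starting at row i (the loop above covers i to i+90, so adjust w to match).

```r
# Hypothetical helper: for each i, the row index of the minimum of
# x[i:(i + w - 1)], with the window truncated at the end of the vector
roll_which_min <- function(x, w) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    j <- min(i + w - 1L, n)          # upper bound of the window
    i - 1L + which.min(x[i:j])       # convert window position to row number
  }, integer(1))
}

x <- c(5, 3, 8, 1, 9, 2)
pos <- roll_which_min(x, w = 3L)
print(pos)      # row numbers of the window minima
print(x[pos])   # the window minima themselves
```

The same pattern with which.max() gives the position of the maximum, and the result can be stored in one step, for example DATAFRAME2$pos_min_Average <- roll_which_min(DATAFRAME2$Average, 91L).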

