R Dplyr Rolling Sum

R dplyr rolling sum

You can instead use RcppRoll::roll_sum which returns NA if the sample size(n) is less than the window size(k).

set.seed(1)
dg$count = rpois(dim(dg)[1], 5)
library(RcppRoll)
library(dplyr)
dg %>%
arrange(site,year,animal) %>%
group_by(site, animal) %>%
mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA))
# site year animal count roll_sum
#1 Boston 2000 dog 4 NA
#2 Boston 2001 dog 5 9
#3 Boston 2002 dog 3 8
#4 Boston 2003 dog 9 12
#5 Boston 2004 dog 6 15
#6 New York 2000 dog 4 NA
#7 New York 2001 dog 8 12
#8 New York 2002 dog 8 16
#9 New York 2003 dog 6 14
#10 New York 2004 cat 2 NA

Rolling sum in dplyr

There is the rollify function in the tibbletime package that you could use. You can read about it in this vignette: Rolling calculations in tibbletime.

library(tibbletime)
library(dplyr)
rollig_sum <- rollify(.f = sum, window = 5)

df %>%
group_by(id) %>%
mutate(roll.sum = lag(rollig_sum(x))) #added lag() here
# A tibble: 20 x 3
# Groups: id [2]
# x id roll.sum
# <int> <int> <int>
# 1 3 1 NA
# 2 8 1 NA
# 3 5 1 NA
# 4 9 1 NA
# 5 10 1 NA
# 6 1 1 35
# 7 6 1 33
# 8 9 1 31
# 9 6 1 35
#10 5 1 32
#11 10 2 NA
#12 5 2 NA
#13 7 2 NA
#14 6 2 NA
#15 2 2 NA
#16 9 2 30
#17 3 2 29
#18 1 2 27
#19 4 2 21
#20 10 2 19

If you want the NAs to be some other value, you can use, for example, if_else

df %>% 
group_by(id) %>%
mutate(roll.sum = lag(rollig_sum(x))) %>%
mutate(roll.sum = if_else(is.na(roll.sum), x, roll.sum))

Calculate rolling sum by group

It could be that plyr was also loaded and the mutate from plyr masked the mutate from dplyr. We could use dplyr::mutate

library(dplyr)
library(zoo)
df %>%
group_by(person) %>%
dplyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups: person [2]
# person score1 score2 s1_rolling s2_rolling
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Peter 1 1 NA NA
# 2 Peter 3 1 NA NA
# 3 Peter 2 1 6 3
# 4 Peter 5 5 10 7
# 5 Peter 4 1 11 7
# 6 James 6 3 NA NA
# 7 James 8 4 NA NA
# 8 James 4 8 18 15
# 9 James 5 9 17 21
#10 James 3 0 12 17

If there are more than one column, we can also use across

df %>%
group_by(person) %>%
dplyr::mutate(across(starts_with('score'),
~ rollsumr(., k = 3, fill = NA), .names = '{col}_rolling'))

For a faster version, use RcppRoll::roll_sumr

df %>% 
group_by(person) %>%
dplyr::mutate(across(starts_with('score'),
~ RcppRoll::roll_sumr(., 3, fill = NA), .names = '{col}_rolling'))

The behavior can be reproduced with plyr::mutate

df %>% 
group_by(person) %>%
plyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups: person [2]
# person score1 score2 s1_rolling s2_rolling
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Peter 1 1 NA NA
# 2 Peter 3 1 NA NA
# 3 Peter 2 1 6 3
# 4 Peter 5 5 10 7
# 5 Peter 4 1 11 7
# 6 James 6 3 15 9
# 7 James 8 4 18 8
# 8 James 4 8 18 15
# 9 James 5 9 17 21
#10 James 3 0 12 17

how to make a rolling sum (or rolling average) with multiple variables

I did it using dplyr::lag()

library(dplyr)
library(tibble)

## Data
data <- tribble(
~Date, ~Prod, ~Amount,
"2010-01-28", "Corn", 1,
"2010-01-28", "Potato", 2,
"2010-02-28", "Corn", 3,
"2010-02-28", "Potato", 4,
"2010-03-28", "Corn", 5,
"2010-03-28", "Potato", 6,
"2010-04-28", "Corn", 7,
"2010-04-28", "Potato", 8
)

# Code

data %>%
group_by(Prod) %>%
mutate(cum_amount = Amount + lag(Amount, 1) + lag(Amount, 2)) %>%
filter(!is.na(cum_amount))


# A tibble: 4 x 4
# Groups: Prod [2]
Date Prod Amount cum_amount
<chr> <chr> <dbl> <dbl>
1 2010-03-28 Corn 5 9
2 2010-03-28 Potato 6 12
3 2010-04-28 Corn 7 15
4 2010-04-28 Potato 8 18

Update in order to your comment

data %>% 
group_by(Prod) %>%
mutate(cum_amount = c(rep(NA, 2), zoo::rollsum(Amount, 3))) %>%
filter(!is.na(cum_amount))

PS: Remember to include the R tag in your questions

Calculate a Rolling Sum of Count in R

For the rollapplyr function in the last line, can you try zoo::rollapplyr:

df %>%
arrange(client, date) %>%
group_by(client) %>%
mutate(roll_sum = zoo::rollapplyr(count, 12, sum, partial=TRUE))

Rolling Sum Dplyr

There are likely dedicated functions, but this seems to work. It gives you some control on how you want it to behave. For example, the default = 0 in the lead function allows it to go to the last record, even though there are no leading values. My bet is that this is relatively slow and inefficient.

library(dplyr)
library(purrr)

rolling_sum <- function(v, window = 1) {

k <- 1:window

vLag <- k %>%
map_dfc(~lag(v, .))

vLead <- k %>%
map_dfc(~lead(v, ., default = 0))

rowSums(bind_cols(vLag, V = v, vLead))

}

df <- data.frame(n = c(1,3,4,5,6,7,9,1,5))

df %>%
mutate(window1 = rolling_sum(n, 1),
window2 = rolling_sum(n, 2))


Related Topics



Leave a reply



Submit