R dplyr rolling sum
You can instead use RcppRoll::roll_sum
which returns NA if the sample size(n
) is less than the window size(k
).
set.seed(1)
dg$count = rpois(dim(dg)[1], 5)
library(RcppRoll)
library(dplyr)
dg %>%
arrange(site,year,animal) %>%
group_by(site, animal) %>%
mutate(roll_sum = roll_sum(count, 2, align = "right", fill = NA))
# site year animal count roll_sum
#1 Boston 2000 dog 4 NA
#2 Boston 2001 dog 5 9
#3 Boston 2002 dog 3 8
#4 Boston 2003 dog 9 12
#5 Boston 2004 dog 6 15
#6 New York 2000 dog 4 NA
#7 New York 2001 dog 8 12
#8 New York 2002 dog 8 16
#9 New York 2003 dog 6 14
#10 New York 2004 cat 2 NA
Rolling sum in dplyr
There is the rollify
function in the tibbletime
package that you could use. You can read about it in this vignette: Rolling calculations in tibbletime.
library(tibbletime)
library(dplyr)
rollig_sum <- rollify(.f = sum, window = 5)
df %>%
group_by(id) %>%
mutate(roll.sum = lag(rollig_sum(x))) #added lag() here
# A tibble: 20 x 3
# Groups: id [2]
# x id roll.sum
# <int> <int> <int>
# 1 3 1 NA
# 2 8 1 NA
# 3 5 1 NA
# 4 9 1 NA
# 5 10 1 NA
# 6 1 1 35
# 7 6 1 33
# 8 9 1 31
# 9 6 1 35
#10 5 1 32
#11 10 2 NA
#12 5 2 NA
#13 7 2 NA
#14 6 2 NA
#15 2 2 NA
#16 9 2 30
#17 3 2 29
#18 1 2 27
#19 4 2 21
#20 10 2 19
If you want the NA
s to be some other value, you can use, for example, if_else
df %>%
group_by(id) %>%
mutate(roll.sum = lag(rollig_sum(x))) %>%
mutate(roll.sum = if_else(is.na(roll.sum), x, roll.sum))
Calculate rolling sum by group
It could be that plyr
was also loaded and the mutate
from plyr
masked the mutate
from dplyr
. We could use dplyr::mutate
library(dplyr)
library(zoo)
df %>%
group_by(person) %>%
dplyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups: person [2]
# person score1 score2 s1_rolling s2_rolling
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Peter 1 1 NA NA
# 2 Peter 3 1 NA NA
# 3 Peter 2 1 6 3
# 4 Peter 5 5 10 7
# 5 Peter 4 1 11 7
# 6 James 6 3 NA NA
# 7 James 8 4 NA NA
# 8 James 4 8 18 15
# 9 James 5 9 17 21
#10 James 3 0 12 17
If there are more than one column, we can also use across
df %>%
group_by(person) %>%
dplyr::mutate(across(starts_with('score'),
~ rollsumr(., k = 3, fill = NA), .names = '{col}_rolling'))
For a faster version, use RcppRoll::roll_sumr
df %>%
group_by(person) %>%
dplyr::mutate(across(starts_with('score'),
~ RcppRoll::roll_sumr(., 3, fill = NA), .names = '{col}_rolling'))
The behavior can be reproduced with plyr::mutate
df %>%
group_by(person) %>%
plyr::mutate(s1_rolling = rollsumr(score1, k = 3, fill = NA),
s2_rolling = rollsumr(score2, k = 3, fill = NA))
# A tibble: 10 x 5
# Groups: person [2]
# person score1 score2 s1_rolling s2_rolling
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Peter 1 1 NA NA
# 2 Peter 3 1 NA NA
# 3 Peter 2 1 6 3
# 4 Peter 5 5 10 7
# 5 Peter 4 1 11 7
# 6 James 6 3 15 9
# 7 James 8 4 18 8
# 8 James 4 8 18 15
# 9 James 5 9 17 21
#10 James 3 0 12 17
how to make a rolling sum (or rolling average) with multiple variables
I did it using dplyr::lag()
library(dplyr)
library(tibble)
## Data
data <- tribble(
~Date, ~Prod, ~Amount,
"2010-01-28", "Corn", 1,
"2010-01-28", "Potato", 2,
"2010-02-28", "Corn", 3,
"2010-02-28", "Potato", 4,
"2010-03-28", "Corn", 5,
"2010-03-28", "Potato", 6,
"2010-04-28", "Corn", 7,
"2010-04-28", "Potato", 8
)
# Code
data %>%
group_by(Prod) %>%
mutate(cum_amount = Amount + lag(Amount, 1) + lag(Amount, 2)) %>%
filter(!is.na(cum_amount))
# A tibble: 4 x 4
# Groups: Prod [2]
Date Prod Amount cum_amount
<chr> <chr> <dbl> <dbl>
1 2010-03-28 Corn 5 9
2 2010-03-28 Potato 6 12
3 2010-04-28 Corn 7 15
4 2010-04-28 Potato 8 18
Update in order to your comment
data %>%
group_by(Prod) %>%
mutate(cum_amount = c(rep(NA, 2), zoo::rollsum(Amount, 3))) %>%
filter(!is.na(cum_amount))
PS: Remember to include the R tag in your questions
Calculate a Rolling Sum of Count in R
For the rollapplyr
function in the last line, can you try zoo::rollapplyr
:
df %>%
arrange(client, date) %>%
group_by(client) %>%
mutate(roll_sum = zoo::rollapplyr(count, 12, sum, partial=TRUE))
Rolling Sum Dplyr
There are likely dedicated functions, but this seems to work. It gives you some control on how you want it to behave. For example, the default = 0
in the lead function allows it to go to the last record, even though there are no leading values. My bet is that this is relatively slow and inefficient.
library(dplyr)
library(purrr)
rolling_sum <- function(v, window = 1) {
k <- 1:window
vLag <- k %>%
map_dfc(~lag(v, .))
vLead <- k %>%
map_dfc(~lead(v, ., default = 0))
rowSums(bind_cols(vLag, V = v, vLead))
}
df <- data.frame(n = c(1,3,4,5,6,7,9,1,5))
df %>%
mutate(window1 = rolling_sum(n, 1),
window2 = rolling_sum(n, 2))
Related Topics
"Correct" Way to Specifiy Optional Arguments in R Functions
How to Make Geom_Text Plot Within the Canvas's Bounds
Using Multiple Criteria in Subset Function and Logical Operators
Remove Ids That Occur X Times R
How to Convert R Markdown to HTML? I.E., What Does "Knit HTML" Do in Rstudio 0.96
R - Use Rbind on Multiple Variables with Similar Names
How to Check If Object (Variable) Is Defined in R
Load Multiple Packages at Once
Adding Space Between Bars in Ggplot2
Collapsing Data Frame by Selecting One Row Per Group
R Compare Multiple Values with Vector and Return Vector
Extract Every Nth Element of a Vector
Line Break When No Data in Ggplot2