Rolling Sum by Another Variable in R

Rolling sum of one variable in a data.frame over a number of steps defined by another variable

I'm not sure whether any of the rolling functions support a variable window size. Here is one way to do it with purrr::map2_dbl():

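library(dplyr)

df %>%
  group_by(a) %>%
  # For each row, sum b from the current row through the next (step_window - 1) rows.
  # Indices past the end of the group return NA, which na.rm = TRUE drops.
  mutate(sum_b_step_window = purrr::map2_dbl(
    row_number(), step_window,
    ~ sum(b[.x:(.x + .y - 1)], na.rm = TRUE)
  ))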

# a b step_window sum_b_step_window
# <int> <int> <dbl> <dbl>
# 1 1 1 2 3
# 2 1 2 3 9
# 3 1 3 1 3
# 4 1 4 2 9
# 5 1 5 4 5
# 6 2 6 1 6
# 7 2 7 2 15
# 8 2 8 3 27
# 9 2 9 2 19
#10 2 10 1 10
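
For comparison, here is a minimal base R sketch of the same forward-looking, variable-width sum using cumulative sums instead of a per-row loop. The helper name roll_sum_forward and the column name sum_b are made up for this illustration, and the sketch assumes b contains no NA values; the data frame is reconstructed from the output above.

```r
# Hypothetical helper: sum x[i], ..., x[i + k[i] - 1], clipped at the end of x.
# Assumes x has no NA values.
roll_sum_forward <- function(x, k) {
  n <- length(x)
  cs <- c(0, cumsum(x))                # cs[i + 1] holds the sum of x[1..i]
  end <- pmin(seq_len(n) + k - 1, n)   # windows may not run past the last element
  cs[end + 1] - cs[seq_len(n)]
}

# Data reconstructed from the output above
df <- data.frame(
  a = rep(1:2, each = 5),
  b = 1:10,
  step_window = c(2, 3, 1, 2, 4, 1, 2, 3, 2, 1)
)

# Apply the helper within each group of a, then put the pieces back in row order
pieces <- lapply(split(df, df$a), function(g) roll_sum_forward(g$b, g$step_window))
df$sum_b <- unsplit(pieces, df$a)
df$sum_b
# 3 9 3 9 5 6 15 27 19 10
```

The clipping with pmin() plays the role of na.rm = TRUE in the map2_dbl version: a window that would run past the end of the group is simply shortened.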

How to make a rolling sum (or rolling average) with multiple variables

I did it using dplyr::lag(), adding each value to the two values before it within the same product:

library(dplyr)
library(tibble)

## Data
data <- tribble(
  ~Date,        ~Prod,    ~Amount,
  "2010-01-28", "Corn",   1,
  "2010-01-28", "Potato", 2,
  "2010-02-28", "Corn",   3,
  "2010-02-28", "Potato", 4,
  "2010-03-28", "Corn",   5,
  "2010-03-28", "Potato", 6,
  "2010-04-28", "Corn",   7,
  "2010-04-28", "Potato", 8
)

## Code

data %>%
  group_by(Prod) %>%
  mutate(cum_amount = Amount + lag(Amount, 1) + lag(Amount, 2)) %>%
  filter(!is.na(cum_amount))


# A tibble: 4 x 4
# Groups: Prod [2]
Date Prod Amount cum_amount
<chr> <chr> <dbl> <dbl>
1 2010-03-28 Corn 5 9
2 2010-03-28 Potato 6 12
3 2010-04-28 Corn 7 15
4 2010-04-28 Potato 8 18

Update in response to your comment:

data %>%
  group_by(Prod) %>%
  # rollsum() returns only the complete windows, so pad the first two rows with NA
  mutate(cum_amount = c(rep(NA, 2), zoo::rollsum(Amount, 3))) %>%
  filter(!is.na(cum_amount))

PS: Remember to include the R tag in your questions
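
If you would rather avoid extra packages, the same trailing three-row sum per product can be sketched in base R with stats::filter() and ave(). The helper name roll3 is hypothetical, and the data frame simply rebuilds the example data above. Note that stats::filter() may fail on a group with fewer rows than the window, so treat this as a sketch for groups of at least three rows.

```r
# Rebuild the example data in base R
data <- data.frame(
  Date = rep(c("2010-01-28", "2010-02-28", "2010-03-28", "2010-04-28"), each = 2),
  Prod = rep(c("Corn", "Potato"), times = 4),
  Amount = as.numeric(1:8)
)

# Hypothetical helper: sum of the current value and the two before it.
# sides = 1 uses past values only, so the first two positions are NA.
roll3 <- function(x) as.numeric(stats::filter(x, rep(1, 3), sides = 1))

# ave() applies roll3 within each product and keeps the original row order
data$cum_amount <- ave(data$Amount, data$Prod, FUN = roll3)
data[!is.na(data$cum_amount), ]
```

The rows that remain are the same four as in the dplyr output: 9 and 15 for Corn, 12 and 18 for Potato.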

R - Rolling sum based on dates, with a condition per group

This should work:

library(sqldf)
library(dplyr)
library(data.table)

# Creating a new 'row column'
data$row_n <- 1:nrow(data)

# Identifying which lines overlap and then filtering data
data <- sqldf("select a.*,
                 coalesce(group_concat(b.rowid), '') as overlaps
               from data a
               left join data b on a.id = b.id and
                 not a.rowid = b.rowid and
                 ((a.start_date_event between
                   b.start_date_event and b.end_interval) or
                  (b.start_date_event between a.start_date_event
                   and a.end_interval))
               group by a.rowid
               order by a.rowid") %>%
  group_by(id) %>%
  mutate(row_n = as.character(row_n),
         previous_row = dplyr::lag(row_n, n = 1, default = NA),
         previous_value = dplyr::lag(sum_days, n = 1, default = NA),
         sum2 = case_when(mapply(grepl, previous_row, overlaps) == TRUE &
                            previous_value > 15 ~ as.integer(0),
                          TRUE ~ sum_days),
         previous_value = dplyr::lag(sum2, n = 1, default = NA),
         sum2 = case_when(mapply(grepl, previous_row, overlaps) == TRUE &
                            previous_value > 15 ~ as.integer(0),
                          TRUE ~ sum_days)) %>%
  dplyr::select(-c(previous_value, previous_row, row_n))

How to get tally (rolling sum) by group in R?

We can use cumsum after grouping by 'type'

library(dplyr)

df <- df %>%
  group_by(type) %>%
  mutate(tally = cumsum(count)) %>%
  ungroup()

Output:

# A tibble: 6 x 4
type rank count tally
<chr> <chr> <dbl> <dbl>
1 A low 9 9
2 A med 20 29
3 A high 31 60
4 B med 2 2
5 B high 4 6
6 C low 14 14
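
A base R equivalent, in case dplyr is not available, is to run cumsum() within each group via ave(). The data frame below is rebuilt from the output above; nothing else is assumed.

```r
# Example data rebuilt from the output above
df <- data.frame(
  type  = c("A", "A", "A", "B", "B", "C"),
  rank  = c("low", "med", "high", "med", "high", "low"),
  count = c(9, 20, 31, 2, 4, 14)
)

# ave() applies cumsum() separately within each type and keeps row order
df$tally <- ave(df$count, df$type, FUN = cumsum)
df$tally
# 9 29 60 2 6 14
```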

Cumulative sum in r based on another column excluding the current value for more than one column

Try this:

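library(data.table)
# df must be a data.table; if it is a plain data.frame, convert it first with setDT(df).
# The \(g) lambda shorthand needs R 4.1 or later.
nms <- c("categorical_variable", "categorical_variable_2")
# Grouped cumulative sum minus the current value = sum of the earlier rows in the group
df[, paste0(nms, "_transformed") :=
     lapply(nms, \(g) ave(target_variable, get(g), FUN = cumsum) - target_variable)]
df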
# categorical_variable categorical_variable_2 target_variable categorical_variable_transformed categorical_variable_2_transformed
# <char> <char> <num> <num> <num>
# 1: rock blue 0 0 0
# 2: indie green 0 0 0
# 3: rock red 1 0 0
# 4: rock red 1 1 1
# 5: pop blue 1 0 0
# 6: indie green 1 0 0
# 7: rock blue 0 2 1
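
To see why subtracting target_variable excludes the current row, here is a small base R sketch comparing it with an explicitly shifted cumulative sum. The helper name excl_cumsum and the vectors via_subtract and via_shift are made up for this illustration; the data are the first and third columns of the output above.

```r
categorical_variable <- c("rock", "indie", "rock", "rock", "pop", "indie", "rock")
target_variable      <- c(0, 0, 1, 1, 1, 1, 0)

# Approach used above: inclusive grouped cumsum, then remove the current value
via_subtract <- ave(target_variable, categorical_variable, FUN = cumsum) - target_variable

# Hypothetical helper: shift the cumulative sum down by one position instead
excl_cumsum <- function(x) c(0, head(cumsum(x), -1))
via_shift <- ave(target_variable, categorical_variable, FUN = excl_cumsum)

via_subtract
# 0 0 0 1 0 0 2
all.equal(via_subtract, via_shift)
# TRUE
```

Both give the sum of the earlier rows in the same category, so either form should work; the subtraction is just shorter to write.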

Rolling sum based on date variable

We convert date to an actual Date object, group by id, and for each date sum the values of the rows in that group whose dates fall within the preceding three months.

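library(dplyr)
library(lubridate)  # needed for months(3); base R's months() does not accept a number

df %>%
  mutate(date = as.Date(date, "%d/%m/%Y")) %>%
  group_by(id) %>%
  # For each date, sum the values dated after (date - 3 months) up to and including it
  mutate(ans = purrr::map_dbl(date, ~ sum(value[
    date > (. - months(3)) & date <= .])))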


# id date value expected ans
# <dbl> <date> <dbl> <dbl> <dbl>
# 1 1 2015-01-01 106 106 106
# 2 1 2015-02-01 57 163 163
# 3 1 2015-02-15 43 206 206
# 4 1 2015-03-01 105 311 311
# 5 1 2015-04-01 180 385 385
# 6 2 2015-01-01 109 109 109
# 7 2 2015-02-01 36 145 145
# 8 2 2015-02-15 102 247 247
# 9 2 2015-02-20 97 344 344
#10 2 2015-03-01 97 441 441
# … with 11 more rows
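
The window logic can also be sketched in base R for a single id, assuming the three-month cutoff is computed with seq() on a Date. The helper name three_months_back and the vectors dates, values and ans are hypothetical; the numbers are the id 1 rows from the output above.

```r
# id 1 rows from the output above
dates  <- as.Date(c("2015-01-01", "2015-02-01", "2015-02-15", "2015-03-01", "2015-04-01"))
values <- c(106, 57, 43, 105, 180)

# Hypothetical helper: the date three calendar months before d (d is a single Date)
three_months_back <- function(d) seq(d, by = "-3 months", length.out = 2)[2]

# For each row, sum the values dated after the cutoff and up to the row's own date
ans <- vapply(seq_along(dates), function(i) {
  cutoff <- three_months_back(dates[i])
  sum(values[dates > cutoff & dates <= dates[i]])
}, numeric(1))

ans
# 106 163 206 311 385
```

For several ids you would run the same loop within each group, for example with split() or inside group_by() as in the answer above.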

R function that can sum every x values, overlapping the data for each calculation?

You can use the rollsum function from zoo to calculate a rolling sum. Depending on where you want the result placed, you can align the output accordingly.

For example, right align: the sum of positions 1, 2, 3, 4 and 5 is placed at position 5.

library(dplyr)
library(zoo)

df %>% mutate(roll_sum = rollsumr(Value, 5, fill = NA))

# Time Value roll_sum
#1 1:00 5.5 NA
#2 1:05 5.3 NA
#3 1:10 5.4 NA
#4 1:15 6.3 NA
#5 1:20 4.6 27.1
#6 1:25 6.4 28.0

Left align: the sum of positions 1 to 5 is placed at position 1.

df %>% mutate(roll_sum = rollsum(Value, 5, fill = NA, align = 'left'))

# Time Value roll_sum
#1 1:00 5.5 27.1
#2 1:05 5.3 28.0
#3 1:10 5.4 NA
#4 1:15 6.3 NA
#5 1:20 4.6 NA
#6 1:25 6.4 NA

Center align: the sum of positions 1 to 5 is placed at position 3.

df %>% mutate(roll_sum = rollsum(Value, 5, fill = NA))

# Time Value roll_sum
#1 1:00 5.5 NA
#2 1:05 5.3 NA
#3 1:10 5.4 27.1
#4 1:15 6.3 28.0
#5 1:20 4.6 NA
#6 1:25 6.4 NA

Note that in all three cases the numbers are the same; only the position at which they are placed differs. If you store the output of any of the above operations in result, you can get the minimum value with min(result$roll_sum, na.rm = TRUE).
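
To make the alignment idea concrete without zoo, here is a rough base R sketch: compute the sums of the complete windows once with embed(), then pad with NA on the side that matches the alignment. The helper name pad and the vectors right, left and center are made up for this illustration, and the centering rule shown assumes an odd window width.

```r
x <- c(5.5, 5.3, 5.4, 6.3, 4.6, 6.4)   # the Value column from the example
k <- 5                                  # window width

# embed() lays out each complete window as one row; rowSums() adds them up
sums <- rowSums(embed(x, k))            # two complete windows: 27.1 and 28.0

# Hypothetical helper: pad the window sums with NA before and after
pad <- function(before, after) c(rep(NA, before), sums, rep(NA, after))

right  <- pad(k - 1, 0)                            # result at the last position of the window
left   <- pad(0, k - 1)                            # result at the first position of the window
center <- pad((k - 1) %/% 2, (k - 1) - (k - 1) %/% 2)  # result at the middle position

min(right, na.rm = TRUE)                # same minimum whichever alignment is used
```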

Generating a moving sum variable in R

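You can use stats::filter inside ddply (or any other function implementing the "split-apply-combine" approach). Writing stats::filter explicitly avoids picking up dplyr::filter if dplyr is also loaded:

library(plyr)
# The leading 0 gives the current year zero weight, so each row gets the sum of the previous 5 years
ddply(DF, .(country), transform,
      x5yrsum2 = as.numeric(stats::filter(x, c(0, rep(1, 5)), sides = 1)))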

# country year x x5yrsum x5yrsum2
# 1 A 1980 9 NA NA
# 2 A 1981 3 NA NA
# 3 A 1982 5 NA NA
# 4 A 1983 6 NA NA
# 5 A 1984 9 NA NA
# 6 A 1985 7 32 32
# 7 A 1986 9 30 30
# 8 A 1987 4 36 36
# 9 B 1990 0 NA NA
# 10 B 1991 4 NA NA
# 11 B 1992 2 NA NA
# 12 B 1993 6 NA NA
# 13 B 1994 3 NA NA
# 14 B 1995 7 15 15
# 15 B 1996 0 22 22

