Cumulative sum in a window (or running window sum) based on a condition in R
1) rollapply Create a Sum
function which takes FY
and Rev
as a 2 column matrix (or if not makes it one) and then sums the revenues for those years within k
of the last year. Then convert DFI
to a data table, sum rows having the same Customer/Product/Year and run rollapplyr
with Sum
for each Customer/Product group.
library(data.table)
library(zoo)
k <- 5
Sum <- function(x) {
x <- matrix(x,, 2)
FY <- x[, 1]
Rev <- x[, 2]
ok <- FY >= tail(FY, 1) - k + 1
sum(Rev[ok])
}
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
by = c("Customer", "Product"), .SDcols = c("FY", "Rev")]
giving:
> DT
Customer Product FY Rev cumsum
1: 13575 A 2011 4 4
2: 13575 A 2012 3 7
3: 13575 A 2013 3 10
4: 13575 A 2015 1 11
5: 13575 A 2016 2 9
6: 13575 B 2011 3 3
7: 13575 B 2012 3 6
8: 13575 B 2013 4 10
9: 13575 B 2014 5 15
10: 13575 B 2015 6 21
11: 13578 A 2010 3 3
12: 13578 A 2016 2 2
13: 13578 B 2013 2 2
14: 13578 C 2014 4 4
15: 13578 D 2015 2 2
16: 13578 E 2010 2 2
2) data.table only
First sum rows that have the same Customer/Product/FY and then, grouping by Customer/Product, for each FY value, fy
, pick out the Rev
values whose FY values are between fy-k+1
and fy
and sum.
library(data.table)
k <- 5
DT <- as.data.table(DFI)
DT <- DT[, list(Rev = sum(Rev)), by = c("Customer", "Product", "FY")]
DT[, cumsum := sapply(FY, function(fy) sum(Rev[between(FY, fy-k+1, fy)])),
by = c("Customer", "Product")]
giving:
> DT
Customer Product FY Rev cumsum
1: 13575 A 2011 4 4
2: 13575 A 2012 3 7
3: 13575 A 2013 3 10
4: 13575 A 2015 1 11
5: 13575 A 2016 2 9
6: 13575 B 2011 3 3
7: 13575 B 2012 3 6
8: 13575 B 2013 4 10
9: 13575 B 2014 5 15
10: 13575 B 2015 6 21
11: 13578 A 2010 3 3
12: 13578 A 2016 2 2
13: 13578 B 2013 2 2
14: 13578 C 2014 4 4
15: 13578 D 2015 2 2
16: 13578 E 2010 2 2
Cumulative sum (or running-window sum) and carry-forward values in a window based on a condition in R
Explanation inline. Uses @G.Grothendieck's Sum function and also his application of his zoo::rollapplyr
from Cumulative sum in a window (or running window sum) based on a condition in R
k <- 2
Sum <- function(x) {
x <- matrix(x,, 2)
FY <- x[, 1]
Rev <- x[, 2]
ok <- FY >= tail(FY, 1) - k + 1
sum(Rev[ok])
}
setDT(DFI)
#This is prob the only difference from your solution
#create a combination of year to year + k for each Customer and product.
#Then subset to remove future years
combis <- unique(rbindlist(lapply(seq_len(k),
function(n) unique(DFI[, .(Year=Year+n-1, Customer, Product)]))))[
Year <= DFI[,max(Year)]]
#lookup revenue
out <- DFI[combis, on=.(Year, Customer, Product)][,
Rev := ifelse(is.na(Rev), 0, Rev)]
#order before summing
setorder(out, Customer,Product,Year)
out[,CumRev := zoo::rollapplyr(.SD, k, Sum, by.column = FALSE, partial = TRUE),
by = c("Customer", "Product"), .SDcols = c("Year", "Rev")][]
Cumulative Sum in R based on Date and other conditions using data.table
Here's one implementation, just data.table
and base R:
dat[, z := sapply(Date, function(z) sum(between(z - Date, 0.1, 30)))]
dat
# Code Team Date Day Date...30d Games.played.over.the.last.30.days z
# <char> <char> <Date> <int> <Date> <int> <int>
# 1: TORATA Atalanta 2020-09-26 2 2020-08-27 NA 0
# 2: LAZATA Atalanta 2020-09-30 1 2020-08-31 1 1
# 3: ATACAG Atalanta 2020-10-04 3 2020-09-04 2 2
# 4: NAPATA Atalanta 2020-10-17 4 2020-09-17 3 3
# 5: ATASAM Atalanta 2020-10-24 5 2020-09-24 4 4
# 6: CROATA Atalanta 2020-10-31 6 2020-10-01 3 3
# 7: ATAINT Atalanta 2020-11-08 7 2020-10-09 3 3
In this case, for each Date
value, we count how many of the dates are within 30 days of it.
If you need the NA
in place of a 0
, then you can add on dat[z < 1, z := NA]
or similar.
Data:
library(data.table)
dat <- structure(list(Code = c("TORATA", "LAZATA", "ATACAG", "NAPATA", "ATASAM", "CROATA", "ATAINT"), Team = c("Atalanta", "Atalanta", "Atalanta", "Atalanta", "Atalanta", "Atalanta", "Atalanta"), Date = structure(c(18531, 18535, 18539, 18552, 18559, 18566, 18574), class = "Date"), Day = c(2L, 1L, 3L, 4L, 5L, 6L, 7L), Date...30d = structure(c(18501, 18505, 18509, 18522, 18529, 18536, 18544), class = "Date"), Games.played.over.the.last.30.days = c(NA, 1L, 2L, 3L, 4L, 3L, 3L)), class = c("data.table", "data.frame"), row.names = c(NA, -7L))
setDT(dat)
Cumulative sum of unique values based on multiple criteria
This cound help, without the need for a join.
df %>% arrange(Country, Site, species, Year) %>%
filter(Year>1980) %>%
group_by(Site, species) %>%
mutate(nYear = length(unique(Year))) %>%
mutate(spsum = rowid(species))
# A tibble: 30 x 6
# Groups: Site, species [5]
Country Site species Year nYear spsum
<chr> <chr> <int> <int> <int> <int>
1 A F 1 1981 6 1
2 A F 1 1986 6 2
3 A F 1 1991 6 3
4 A F 1 1996 6 4
5 A F 1 2001 6 5
6 A F 1 2006 6 6
7 B G 2 1982 6 1
8 B G 2 1987 6 2
9 B G 2 1992 6 3
10 B G 2 1997 6 4
# ... with 20 more rows
R - Cumsum of product over rolling windows (quarters)
Here is an option:
DT[, do :=
.SD[.SD, on=.(date<=date), by=.EACHI, {
nqtr <- floor(pmax(0, i.date - x.date) / 90)
sum(value * 0.9^nqtr)
}]$V1
]
output:
date value country desired_output do
1: 2017-01-01 2 US 2.000 2.000
2: 2017-01-05 2 UK 8.000 8.000
3: 2017-01-05 4 US 8.000 8.000
4: 2017-04-01 5 IT 12.800 12.800
5: 2017-04-03 3 US 115.800 115.800
6: 2017-04-03 100 US 115.800 115.800
7: 2017-04-11 20 UK 135.200 135.200
8: 2017-04-15 6 US 141.200 141.200
9: 2017-07-02 30 US 170.520 160.220
10: 2017-10-12 4 UK 151.912 151.372
11: 2017-10-12 6 IT 151.912 151.372
The difference is in how we define a quarter. I used 90d. If 3m is really important, I will update the post. For example on 2017-07-02, rows 2 - 6 are 1 qtr ago when using 90days whereas in your OP, only rows 2 - 4 are in a qtr ago when using 3m.
data:
library(data.table)
DT <- fread('date,value,country,desired_output
"2017-01-01", 2, "US", 2
"2017-01-05", 2, "UK", 8
"2017-01-05", 4, "US", 8
"2017-04-01", 5, "IT", 12.8
"2017-04-03", 3, "US", 115.8
"2017-04-03", 100, "US", 115.8
"2017-04-11", 20, "UK", 135.2
"2017-04-15", 6, "US", 141.2
"2017-07-02", 30, "US", 170.52
"2017-10-12", 4, "UK", 151.912
"2017-10-12", 6, "IT", 151.912')
DT[, date := as.IDate(date, format="%Y-%m-%d")]
Handling 3m and country requirement:
DT[, do :=
.SD[.SD, on=.(country, date<=date), by=.EACHI, {
vec <- rev(seq(i.date, min(x.date)-93L, by="-1 quarter"))
itvl <- findInterval(x.date, vec, rightmost.closed=TRUE)
nqtr <- length(vec) - itvl - 1L
sum(value * 0.9^nqtr)
}]$V1
]
output:
date value country desired_output do
1: 2017-01-01 2 US 2.000 2.000
2: 2017-01-05 2 UK 8.000 8.000
3: 2017-01-05 4 US 8.000 8.000
4: 2017-04-01 5 IT 12.800 13.000
5: 2017-04-03 3 US 115.800 115.800
6: 2017-04-03 100 US 115.800 115.800
7: 2017-04-11 20 UK 135.200 135.200
8: 2017-04-15 6 US 141.200 141.200
9: 2017-07-02 30 US 170.520 170.520
10: 2017-10-12 4 UK 151.912 151.912
11: 2017-10-12 6 IT 151.912 151.912
Rolling sum with a conditioned window
A possible solution:
library(lubridate) # for the '%m+%'-function
d[, roll_sum_3d := .SD[.SD[, .(date, date2 = date %m+% days(3), revenue)]
, on = .(date > date, date <= date2)
][, sum(revenue, na.rm = TRUE), by = date]$V1
, by = customer][]
which gives:
customer date revenue roll_sum_3d
1: A 2016-01-01 32 168
2: A 2016-01-03 88 80
3: A 2016-01-04 80 0
4: A 2016-02-01 38 0
5: B 2016-01-13 44 0
6: B 2016-01-24 11 96
7: B 2016-01-25 50 46
8: B 2016-01-26 46 0
What this does:
- Group
d
bycustomer with
by = customer`. - Add
roll_sum_3d
by reference with:=
. - Calculate
roll_sum_3d
by joining.SD
(Subset of Data) for each group with a date-window of that group (.SD[, .(date, date2 = date %m+% days(3), revenue)]
with a non-equi joinon = .(date > date, date <= date2)
, summarise the revenue for each date and give that back.
An alternative based on @Arun's comment:
d[, roll_sum_3d := d[d[, .(customer, date, date2 = date %m+% days(3), revenue)]
, on = .(customer, date > date, date <= date2)
, sum(revenue, na.rm = TRUE), by=.EACHI]$V1][]
Related Topics
How to Load Xlsx File Using Fread Function
Grouped Correlation with Dplyr (Works Only on Console)
Using Melt with Matrix or Data.Frame Gives Different Output
Update() Inside a Function Only Searches the Global Environment
Error: Object '.Dosnowglobals' Not Found
Combine Result from Top_N with an "Other" Category in Dplyr
Add Font to R That Is Not in Extrafonts Library
Equation Numbering in Rmarkdown - for Export to Word
Get(X) Does Not Work in R Data.Table When X Is Also a Column in the Data Table
How to Plot X-Axis Labels and Bars Between Tick Marks in Ggplot2 Bar Plot
Pivot_Longer into Multiple Columns
R: Row-Wise Dplyr::Mutate Using Function That Takes a Data Frame Row and Returns an Integer
Transparency and Alpha Levels for Ggplot2 Stat_Density2D with Maps and Layers in R
Refer to Range of Columns by Name in R
Update an Entire Row in Data.Table in R
Frustration Using Rjava to Call a Third Party Java Jar