Rolling Sum by Another Variable in R

Rolling sum of one variable in data.frame in number of steps defined by another variable

I'm not sure a variable window size is possible with any of the rolling functions. Here is one way to do it with purrr::map2_dbl, which, within each group of a, sums b from the current row forward over step_window rows:

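library(dplyr)

# For each row, .x is its position within the group and .y its step_window;
# sum b over rows .x to .x + .y - 1. Indices past the end of the group give NA,
# which na.rm = TRUE drops.
df %>% 
  group_by(a) %>% 
  mutate(sum_b_step_window = purrr::map2_dbl(row_number(), step_window, 
                             ~sum(b[.x:(.x + .y - 1)], na.rm = TRUE)))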

#      a     b step_window sum_b_step_window
#   <int> <int>       <dbl>             <dbl>
# 1     1     1           2                 3
# 2     1     2           3                 9
# 3     1     3           1                 3
# 4     1     4           2                 9
# 5     1     5           4                 5
# 6     2     6           1                 6
# 7     2     7           2                15
# 8     2     8           3                27
# 9     2     9           2                19
#10     2    10           1                10
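As a dependency-free cross-check, the same forward-looking window sum can be sketched in base R. The data frame below is reconstructed from the printed output (columns a, b and step_window), and window_sum() is a hypothetical helper that assumes every window size is at least 1; like na.rm = TRUE above, it simply truncates a window that runs past the end of its group.

```r
# Example data reconstructed from the printed output above
df <- data.frame(
  a = rep(1:2, each = 5),
  b = 1:10,
  step_window = c(2, 3, 1, 2, 4, 1, 2, 3, 2, 1)
)

# Hypothetical helper: for each position i, sum x[i], ..., x[i + w[i] - 1],
# stopping early at the end of x (mirrors na.rm = TRUE in the dplyr version).
# Assumes every w[i] >= 1.
window_sum <- function(x, w) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    end <- min(i + w[i] - 1, n)
    as.numeric(sum(x[i:end]))
  }, numeric(1))
}

# Apply per group of a, then put the results back in the original row order
df$sum_b_step_window <- unsplit(
  lapply(split(df, df$a), function(g) window_sum(g$b, g$step_window)),
  df$a
)
df$sum_b_step_window
# [1]  3  9  3  9  5  6 15 27 19 10
```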

How to make a rolling sum (or rolling average) with multiple variables

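I did it using dplyr::lag(): within each Prod, add the current Amount to the two previous ones, then drop the rows where fewer than three values are available (the sum is NA there):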

library(dplyr)
library(tibble)

## Data
data <- tribble(
  ~Date,        ~Prod,    ~Amount,
  "2010-01-28",   "Corn",      1,
  "2010-01-28", "Potato",      2,
  "2010-02-28",   "Corn",      3,
  "2010-02-28", "Potato",      4,
  "2010-03-28",   "Corn",      5,
  "2010-03-28", "Potato",      6,
  "2010-04-28",   "Corn",      7,
  "2010-04-28", "Potato",      8
)

# Code

data %>% 
  group_by(Prod) %>% 
  mutate(cum_amount = Amount + lag(Amount, 1) + lag(Amount, 2)) %>% 
  filter(!is.na(cum_amount))


# A tibble: 4 x 4
# Groups:   Prod [2]
  Date       Prod   Amount cum_amount
  <chr>      <chr>   <dbl>      <dbl>
1 2010-03-28 Corn        5          9
2 2010-03-28 Potato      6         12
3 2010-04-28 Corn        7         15
4 2010-04-28 Potato      8         18

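Update in response to your comment: the same three-row sum with zoo::rollsum(), padded with two NAs so it lines up with the rows of each group: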

data %>% 
  group_by(Prod) %>% 
  mutate(cum_amount = c(rep(NA, 2), zoo::rollsum(Amount, 3))) %>% 
  filter(!is.na(cum_amount))
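For reference, the 3-row trailing sum per group can also be sketched in base R without lag() or zoo, by differencing a running total: the sum of rows i-2 to i equals cumsum at i minus cumsum at i-3. trailing_sum() is a hypothetical helper; note that on long floating-point series the differencing can accumulate small rounding errors.

```r
data <- data.frame(
  Date = rep(c("2010-01-28", "2010-02-28", "2010-03-28", "2010-04-28"), each = 2),
  Prod = rep(c("Corn", "Potato"), times = 4),
  Amount = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Hypothetical helper: trailing sum of the last k values, NA until k values exist
trailing_sum <- function(x, k) {
  cs <- cumsum(x)
  # cs shifted k positions later, with 0 before the start of the series
  lagged <- c(rep(0, k), cs)[seq_along(cs)]
  out <- cs - lagged
  out[seq_len(min(k - 1, length(x)))] <- NA
  out
}

# ave() applies the helper within each Prod and keeps the original row order
data$cum_amount <- ave(data$Amount, data$Prod, FUN = function(x) trailing_sum(x, 3))
data[!is.na(data$cum_amount), ]
```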

PS: Remember to include the R tag in your questions

R - Rolling sum based on dates, with a condition per group

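This should work. It uses sqldf to find, for each row, the other rows with the same id whose event intervals overlap it (collected in overlaps), and then uses dplyr to set sum2 to 0 wherever the previous row of the same id is among those overlaps and that previous row's value exceeds 15; otherwise sum2 keeps sum_days: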

library(sqldf)
library(dplyr)

# Creating a new 'row column'
data$row_n <- 1:nrow(data)

# TRUE when a row number appears as a whole entry in the comma-separated
# overlaps string (plain grepl() would let "1" match "10" or "21")
in_overlaps <- function(row, overlaps) {
  mapply(function(r, o) r %in% strsplit(o, ",", fixed = TRUE)[[1]],
         row, overlaps, USE.NAMES = FALSE)
}

# Identifying which lines overlap and then filtering data
data <- sqldf("select a.*, 
                      coalesce(group_concat(b.rowid), '') as overlaps
               from data a
               left join data b on a.id = b.id and 
                                   not a.rowid = b.rowid and
                                   ((a.start_date_event between
                                     b.start_date_event and b.end_interval) or
                                    (b.start_date_event between a.start_date_event
                                     and a.end_interval))
               group by a.rowid
               order by a.rowid") %>%
               group_by(id) %>%
               mutate(row_n = as.character(row_n),
                      previous_row = dplyr::lag(row_n, n = 1, default = NA),
                      previous_value = dplyr::lag(sum_days, n = 1, default = NA),
                      sum2 = case_when(in_overlaps(previous_row, overlaps) &
                                         previous_value > 15 ~ as.integer(0),
                                       TRUE ~ sum_days),
                      previous_value = dplyr::lag(sum2, n = 1, default = NA),
                      sum2 = case_when(in_overlaps(previous_row, overlaps) &
                                         previous_value > 15 ~ as.integer(0),
                                       TRUE ~ sum_days)) %>%
               dplyr::select(-c(previous_value, previous_row, row_n))
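The sqldf self-join is the least obvious step. As a sketch of what it computes, here is a base R version of the overlaps column on a small hypothetical data set: the column names id, start_date_event and end_interval come from the query, but the values are invented for illustration, and overlap_ids() is a hypothetical helper. Two rows overlap when they share an id and either one's start date falls inside the other's interval (inclusive, like SQL BETWEEN), which is the join condition above.

```r
# Hypothetical toy data; only the column names come from the answer above
data <- data.frame(
  id = c(1, 1, 1, 2),
  start_date_event = as.Date(c("2020-01-01", "2020-01-10", "2020-03-01", "2020-01-05")),
  end_interval     = as.Date(c("2020-01-15", "2020-01-20", "2020-03-10", "2020-01-08"))
)

# Hypothetical helper: comma-separated row numbers of the other rows with the
# same id whose interval overlaps row i ("" when there are none)
overlap_ids <- function(d) {
  idx <- seq_len(nrow(d))
  vapply(idx, function(i) {
    a_in_b <- d$start_date_event[i] >= d$start_date_event &
              d$start_date_event[i] <= d$end_interval
    b_in_a <- d$start_date_event >= d$start_date_event[i] &
              d$start_date_event <= d$end_interval[i]
    hits <- idx[d$id == d$id[i] & idx != i & (a_in_b | b_in_a)]
    paste(hits, collapse = ",")
  }, character(1))
}

data$overlaps <- overlap_ids(data)
data$overlaps  # rows 1 and 2 overlap each other; rows 3 and 4 overlap nothing
```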

How to get tally (rolling sum) by group in R?

We can use cumsum() after grouping by type:

library(dplyr)
df <- df %>%
  group_by(type) %>%
  mutate(tally = cumsum(count)) %>%
  ungroup()

Output:

# A tibble: 6 x 4
  type  rank  count tally
  <chr> <chr> <dbl> <dbl>
1 A     low       9     9
2 A     med      20    29
3 A     high     31    60
4 B     med       2     2
5 B     high      4     6
6 C     low      14    14
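Without dplyr, the same grouped running total can be sketched in base R with ave(), which applies cumsum within each type and returns the results in the original row order (the accumulation follows the existing row order within each group). The data frame below is rebuilt from the printed output.

```r
df <- data.frame(
  type = c("A", "A", "A", "B", "B", "C"),
  rank = c("low", "med", "high", "med", "high", "low"),
  count = c(9, 20, 31, 2, 4, 14)
)

# Running total of count within each type
df$tally <- ave(df$count, df$type, FUN = cumsum)
df$tally
# [1]  9 29 60  2  6 14
```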

Cumulative sum in r based on another column excluding the current value for more than one column

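Try this: for each grouping column, ave() computes the running total of target_variable within that column's groups, and subtracting target_variable leaves the sum over the earlier rows of the group only (the current value excluded):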

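library(data.table)
setDT(df)  # := needs a data.table; this converts df in place if it is a data.frame
nms <- c("categorical_variable", "categorical_variable_2")
# \(g) is the R >= 4.1 shorthand for function(g)
df[, paste0(nms, "_transformed") :=
       lapply(nms, \(g) ave(target_variable, get(g), FUN = cumsum) - target_variable)]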
df
#    categorical_variable categorical_variable_2 target_variable categorical_variable_transformed categorical_variable_2_transformed
#                  <char>                 <char>           <num>                            <num>                              <num>
# 1:                 rock                   blue               0                                0                                  0
# 2:                indie                  green               0                                0                                  0
# 3:                 rock                    red               1                                0                                  0
# 4:                 rock                    red               1                                1                                  1
# 5:                  pop                   blue               1                                0                                  0
# 6:                indie                  green               1                                0                                  0
# 7:                 rock                   blue               0                                2                                  1
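If you would rather keep df as a plain data.frame, the same "running total minus current value" idea can be sketched with a loop over the grouping columns. The data frame below is rebuilt from the printed output above.

```r
df <- data.frame(
  categorical_variable   = c("rock", "indie", "rock", "rock", "pop", "indie", "rock"),
  categorical_variable_2 = c("blue", "green", "red", "red", "blue", "green", "blue"),
  target_variable        = c(0, 0, 1, 1, 1, 1, 0)
)

nms <- c("categorical_variable", "categorical_variable_2")
for (g in nms) {
  # Grouped cumulative sum, minus the current row's own value
  df[[paste0(g, "_transformed")]] <-
    ave(df$target_variable, df[[g]], FUN = cumsum) - df$target_variable
}
df
```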

Rolling sum based on date variable

We convert date to an actual Date object, group by id, and for each date take the sum of value over the dates in the same group that fall within the preceding 3 months (later than the date minus 3 months, up to and including the date itself).

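library(dplyr)
library(lubridate)  # provides months() for numbers and the %m-% operator

df %>%
  mutate(date = as.Date(date, "%d/%m/%Y")) %>%
  group_by(id) %>%
  # %m-% rolls back to the month end instead of returning NA for dates like 31 May
  mutate(ans = purrr::map_dbl(date, ~ sum(value[
             date > (.x %m-% months(3)) & date <= .x])))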


#      id date       value expected   ans
#    <dbl> <date>     <dbl>    <dbl> <dbl>
# 1     1 2015-01-01   106      106   106
# 2     1 2015-02-01    57      163   163
# 3     1 2015-02-15    43      206   206
# 4     1 2015-03-01   105      311   311
# 5     1 2015-04-01   180      385   385
# 6     2 2015-01-01   109      109   109
# 7     2 2015-02-01    36      145   145
# 8     2 2015-02-15   102      247   247
# 9     2 2015-02-20    97      344   344
#10     2 2015-03-01    97      441   441
# … with 11 more rows
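A base R sketch of the same 3-month window, using only the ten rows visible in the printed output as data. rolling_3m() is a hypothetical helper; it gets "3 months earlier" from seq() with by = "-3 months". One assumption to be aware of: for an end-of-month date whose target month is shorter, seq() counts forward into the next month (see ?seq.POSIXt), so results can differ from %m-% on such dates.

```r
# The ten rows visible in the printed output above
df <- data.frame(
  id = rep(c(1, 2), each = 5),
  date = as.Date(c("2015-01-01", "2015-02-01", "2015-02-15", "2015-03-01", "2015-04-01",
                   "2015-01-01", "2015-02-01", "2015-02-15", "2015-02-20", "2015-03-01")),
  value = c(106, 57, 43, 105, 180, 109, 36, 102, 97, 97)
)

# Hypothetical helper: for each date, sum value over (date - 3 months, date]
rolling_3m <- function(date, value) {
  vapply(seq_along(date), function(i) {
    lower <- seq(date[i], by = "-3 months", length.out = 2)[2]
    sum(value[date > lower & date <= date[i]])
  }, numeric(1))
}

# Apply per id and restore the original row order
df$ans <- unsplit(
  lapply(split(df, df$id), function(g) rolling_3m(g$date, g$value)),
  df$id
)
df
```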

R function that can sum of every x values, overlapping the data for each calculation?

You can use the rollsum function from zoo to calculate a rolling sum. Depending on where you want each sum placed, set the alignment accordingly.

For example, right align: the sum of positions 1 to 5 is placed at position 5 (rollsumr() is rollsum() with align = "right").

library(dplyr)
library(zoo)

df %>% mutate(roll_sum = rollsumr(Value, 5, fill = NA))

#  Time Value roll_sum
#1 1:00   5.5       NA
#2 1:05   5.3       NA
#3 1:10   5.4       NA
#4 1:15   6.3       NA
#5 1:20   4.6     27.1
#6 1:25   6.4     28.0

Left align: the sum of positions 1 to 5 is placed at position 1.

df %>% mutate(roll_sum = rollsum(Value, 5, fill = NA, align = 'left'))

#  Time Value roll_sum
#1 1:00   5.5     27.1
#2 1:05   5.3     28.0
#3 1:10   5.4       NA
#4 1:15   6.3       NA
#5 1:20   4.6       NA
#6 1:25   6.4       NA

Center align (the default): the sum of positions 1 to 5 is placed at position 3.

df %>% mutate(roll_sum = rollsum(Value, 5, fill = NA))

#  Time Value roll_sum
#1 1:00   5.5       NA
#2 1:05   5.3       NA
#3 1:10   5.4     27.1
#4 1:15   6.3     28.0
#5 1:20   4.6       NA
#6 1:25   6.4       NA

Note that in all three cases the sums are the same; only the positions at which they are placed differ. If you store the output of any of the above in result, you can get the minimum sum with min(result$roll_sum, na.rm = TRUE).
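To make the alignment rule concrete, here is a base R sketch of the same rolling sum that does not depend on zoo. roll_sum() is a hypothetical helper: it computes every complete window sum from a running total and then places the sums at the window start (left), end (right) or middle (center). The center placement is only meant for an odd window such as 5, where it matches the output above.

```r
# Hypothetical helper: rolling sum of width k, placed according to align
roll_sum <- function(x, k, align = c("center", "left", "right")) {
  align <- match.arg(align)
  n <- length(x)
  out <- rep(NA_real_, n)
  if (k < 1 || n < k) return(out)
  cs <- cumsum(c(0, x))
  # sums[j] is the sum of x[j], ..., x[j + k - 1], for every complete window
  sums <- cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
  # Position of the first result: window start, window end, or window middle
  first <- switch(align, left = 1, right = k, center = (k + 1) %/% 2)
  out[first:(first + length(sums) - 1)] <- sums
  out
}

v <- c(5.5, 5.3, 5.4, 6.3, 4.6, 6.4)
roll_sum(v, 5, "right")  # sums 27.1 and 28.0 land at positions 5 and 6
roll_sum(v, 5, "left")   # ... at positions 1 and 2
roll_sum(v, 5)           # ... at positions 3 and 4
```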

Generating a moving sum variable in R

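You can use stats::filter() inside ddply() (or any other function implementing the "split-apply-combine" approach). The weights c(0, rep(1, 5)) with sides = 1 give each year 0 times its own x plus the sum of x over the five preceding years, which reproduces x5yrsum: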

library(plyr)
# Call stats::filter explicitly: dplyr::filter masks it when dplyr is loaded
ddply(DF, .(country), transform, 
          x5yrsum2 = as.numeric(stats::filter(x, c(0, rep(1, 5)), sides = 1)))

#    country year x x5yrsum x5yrsum2
# 1        A 1980 9      NA       NA
# 2        A 1981 3      NA       NA
# 3        A 1982 5      NA       NA
# 4        A 1983 6      NA       NA
# 5        A 1984 9      NA       NA
# 6        A 1985 7      32       32
# 7        A 1986 9      30       30
# 8        A 1987 4      36       36
# 9        B 1990 0      NA       NA
# 10       B 1991 4      NA       NA
# 11       B 1992 2      NA       NA
# 12       B 1993 6      NA       NA
# 13       B 1994 3      NA       NA
# 14       B 1995 7      15       15
# 15       B 1996 0      22       22
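Since plyr is retired, here is a sketch of the same moving sum with base R's ave() in place of ddply(). The data frame is rebuilt from the country, year and x columns of the printed output; each group here has at least six rows, the length of the filter.

```r
DF <- data.frame(
  country = rep(c("A", "B"), times = c(8, 7)),
  year = c(1980:1987, 1990:1996),
  x = c(9, 3, 5, 6, 9, 7, 9, 4, 0, 4, 2, 6, 3, 7, 0)
)

# Within each country: weight 0 on the current year, 1 on each of the 5 previous years
DF$x5yrsum2 <- ave(DF$x, DF$country, FUN = function(v) {
  as.numeric(stats::filter(v, c(0, rep(1, 5)), sides = 1))
})
DF
```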