Rolling sum of one variable in data.frame in number of steps defined by another variable
I'm not sure whether any of the rolling functions support a variable window size. Here is one way to do this using map2_dbl:
library(dplyr)
df %>%
group_by(a) %>%
mutate(sum_b_step_window = purrr::map2_dbl(row_number(), step_window,
~sum(b[.x:(.x + .y - 1)], na.rm = TRUE)))
# a b step_window sum_b_step_window
# <int> <int> <dbl> <dbl>
# 1 1 1 2 3
# 2 1 2 3 9
# 3 1 3 1 3
# 4 1 4 2 9
# 5 1 5 4 5
# 6 2 6 1 6
# 7 2 7 2 15
# 8 2 8 3 27
# 9 2 9 2 19
#10 2 10 1 10
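If you prefer to avoid the purrr dependency, the same forward-looking variable window can be sketched in base R. The helper name forward_sum is hypothetical, the sample data is reconstructed from the output above, and the sketch assumes every window is at least 1 and should be truncated at the end of its group.

```r
# Sample data reconstructed from the printed output above (an assumption)
df <- data.frame(
  a = rep(1:2, each = 5),
  b = as.numeric(1:10),
  step_window = c(2, 3, 1, 2, 4, 1, 2, 3, 2, 1)
)

# Hypothetical helper: for each position i, sum x[i] and the following
# width[i] - 1 values, stopping at the end of the vector
forward_sum <- function(x, width) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    end <- min(i + width[i] - 1, n)
    sum(x[i:end], na.rm = TRUE)
  }, numeric(1))
}

# Apply the helper within each group of 'a' and restore the original row order
per_group <- lapply(split(df, df$a), function(g) forward_sum(g$b, g$step_window))
df$sum_b_step_window <- unsplit(per_group, df$a)
df
```

Truncating the window with min() plays the same role as na.rm = TRUE in the map2_dbl version, where out-of-range indices return NA.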
How to make a rolling sum (or rolling average) with multiple variables
I did it using dplyr::lag()
library(dplyr)
library(tibble)
## Data
data <- tribble(
~Date, ~Prod, ~Amount,
"2010-01-28", "Corn", 1,
"2010-01-28", "Potato", 2,
"2010-02-28", "Corn", 3,
"2010-02-28", "Potato", 4,
"2010-03-28", "Corn", 5,
"2010-03-28", "Potato", 6,
"2010-04-28", "Corn", 7,
"2010-04-28", "Potato", 8
)
# Code
data %>%
group_by(Prod) %>%
mutate(cum_amount = Amount + lag(Amount, 1) + lag(Amount, 2)) %>%
filter(!is.na(cum_amount))
# A tibble: 4 x 4
# Groups: Prod [2]
Date Prod Amount cum_amount
<chr> <chr> <dbl> <dbl>
1 2010-03-28 Corn 5 9
2 2010-03-28 Potato 6 12
3 2010-04-28 Corn 7 15
4 2010-04-28 Potato 8 18
Update in response to your comment:
data %>%
group_by(Prod) %>%
mutate(cum_amount = c(rep(NA, 2), zoo::rollsum(Amount, 3))) %>%
filter(!is.na(cum_amount))
PS: Remember to include the R tag in your questions
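For comparison, here is a minimal base R sketch of the same 3-row rolling sum per product, using stats::filter with equal weights. The helper name roll3 is hypothetical, and the sketch assumes each group has at least 3 rows, since stats::filter stops with an error when the filter is longer than the series.

```r
# Same data as above, as a plain data.frame
data <- data.frame(
  Date = rep(c("2010-01-28", "2010-02-28", "2010-03-28", "2010-04-28"), each = 2),
  Prod = rep(c("Corn", "Potato"), times = 4),
  Amount = c(1, 2, 3, 4, 5, 6, 7, 8)
)

# Hypothetical helper: sum of the current value and the two before it;
# the first two positions of each group come back as NA
roll3 <- function(x) as.numeric(stats::filter(x, rep(1, 3), sides = 1))

# ave() applies roll3 within each product and keeps the original row order
data$cum_amount <- ave(data$Amount, data$Prod, FUN = roll3)

# Drop the rows where the window is incomplete
data[!is.na(data$cum_amount), ]
```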
R - Rolling sum based on dates, with a condition per group
This should work:
library(sqldf)
library(dplyr)
library(data.table)
# Creating a new 'row column'
data$row_n <- 1:nrow(data)
# Identifying which lines overlap and then filtering data
data <- sqldf("select a.*,
coalesce(group_concat(b.rowid), '') as overlaps
from data a
left join data b on a.id = b.id and
not a.rowid = b.rowid and
((a.start_date_event between
b.start_date_event and b.end_interval) or
(b.start_date_event between a.start_date_event
and a.end_interval))
group by a.rowid
order by a.rowid") %>%
group_by(id) %>%
mutate(row_n = as.character(row_n),
previous_row = dplyr::lag(row_n, n = 1, default = NA),
previous_value = dplyr::lag(sum_days, n = 1, default = NA),
sum2 = case_when(mapply(grepl,previous_row, overlaps) == TRUE &
previous_value > 15 ~ as.integer(0),
TRUE ~ sum_days),
previous_value = dplyr::lag(sum2, n = 1, default = NA),
sum2 = case_when(mapply(grepl,previous_row, overlaps) == TRUE &
previous_value > 15 ~ as.integer(0),
TRUE ~ sum_days)) %>%
dplyr::select(-c(previous_value, previous_row, row_n))
How to get tally (rolling sum) by group in R?
We can use cumsum after grouping by 'type':
library(dplyr)
df <- df %>%
group_by(type) %>%
mutate(tally = cumsum(count)) %>%
ungroup
Output:
# A tibble: 6 x 4
type rank count tally
<chr> <chr> <dbl> <dbl>
1 A low 9 9
2 A med 20 29
3 A high 31 60
4 B med 2 2
5 B high 4 6
6 C low 14 14
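The same running tally can also be sketched in base R with ave(), which applies a function within each group and returns the result in the original row order. The data below is reconstructed from the output above, so treat it as an assumption about the original input.

```r
# Data reconstructed from the printed output (an assumption)
df <- data.frame(
  type = c("A", "A", "A", "B", "B", "C"),
  rank = c("low", "med", "high", "med", "high", "low"),
  count = c(9, 20, 31, 2, 4, 14)
)

# Cumulative sum of 'count' restarted for every 'type'
df$tally <- ave(df$count, df$type, FUN = cumsum)
df
```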
Cumulative sum in r based on another column excluding the current value for more than one column
Try this:
library(data.table)
nms <- c("categorical_variable", "categorical_variable_2")
df[, paste0(nms, "_transformed") :=
lapply(nms, \(g) ave(target_variable, get(g), FUN = cumsum) - target_variable)]
df
# categorical_variable categorical_variable_2 target_variable categorical_variable_transformed categorical_variable_2_transformed
# <char> <char> <num> <num> <num>
# 1: rock blue 0 0 0
# 2: indie green 0 0 0
# 3: rock red 1 0 0
# 4: rock red 1 1 1
# 5: pop blue 1 0 0
# 6: indie green 1 0 0
# 7: rock blue 0 2 1
Rolling sum based on date variable
We convert date to an actual Date object, group_by id, and for each date find the dates in the group that lie in the last 3 months and take the sum of their respective value.
library(dplyr)
library(lubridate) # months(3) below needs lubridate; base months() has no method for numbers
df %>%
mutate(date = as.Date(date, "%d/%m/%Y")) %>%
group_by(id) %>%
mutate(ans = purrr::map_dbl(date, ~ sum(value[
date > (. - months(3)) & date <= .])))
# id date value expected ans
# <dbl> <date> <dbl> <dbl> <dbl>
# 1 1 2015-01-01 106 106 106
# 2 1 2015-02-01 57 163 163
# 3 1 2015-02-15 43 206 206
# 4 1 2015-03-01 105 311 311
# 5 1 2015-04-01 180 385 385
# 6 2 2015-01-01 109 109 109
# 7 2 2015-02-01 36 145 145
# 8 2 2015-02-15 102 247 247
# 9 2 2015-02-20 97 344 344
#10 2 2015-03-01 97 441 441
# … with 11 more rows
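A dependency-free variant of the same idea can be sketched in base R, using seq() on a Date to step back three calendar months. The helper name months_back is hypothetical, and only the first five rows of the output above are reproduced as sample data. Note that this month arithmetic can behave differently from lubridate for dates near the end of a month, so treat it as a sketch rather than a drop-in replacement.

```r
# First five rows of the example, reconstructed from the output above
df <- data.frame(
  id = 1,
  date = as.Date(c("2015-01-01", "2015-02-01", "2015-02-15",
                   "2015-03-01", "2015-04-01")),
  value = c(106, 57, 43, 105, 180)
)

# Hypothetical helper: the date n calendar months before d
months_back <- function(d, n) {
  seq(d, by = paste0("-", n, " months"), length.out = 2)[2]
}

# For each row, sum 'value' over rows with the same id whose date falls
# in the window (date - 3 months, date]
df$ans <- vapply(seq_len(nrow(df)), function(i) {
  start <- months_back(df$date[i], 3)
  in_window <- df$id == df$id[i] & df$date > start & df$date <= df$date[i]
  sum(df$value[in_window])
}, numeric(1))
df
```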
R function that can sum of every x values, overlapping the data for each calculation?
You can use the rollsum function from zoo to calculate a rolling sum. Depending on where you want the result to be placed, you can align the output accordingly.
For example, right align - when you want the sum of positions 1, 2, 3, 4 and 5 to be put at position 5.
library(dplyr)
library(zoo)
df %>% mutate(roll_sum = rollsumr(Value, 5, fill = NA))
# Time Value roll_sum
#1 1:00 5.5 NA
#2 1:05 5.3 NA
#3 1:10 5.4 NA
#4 1:15 6.3 NA
#5 1:20 4.6 27.1
#6 1:25 6.4 28.0
Left align - when you want to put the sum of positions 1 to 5 at position 1.
df %>% mutate(roll_sum = rollsum(Value, 5, fill = NA, align = 'left'))
# Time Value roll_sum
#1 1:00 5.5 27.1
#2 1:05 5.3 28.0
#3 1:10 5.4 NA
#4 1:15 6.3 NA
#5 1:20 4.6 NA
#6 1:25 6.4 NA
Center align - when you want the sum of positions 1 to 5 to be put at position 3.
df %>% mutate(roll_sum = rollsum(Value, 5, fill = NA))
# Time Value roll_sum
#1 1:00 5.5 NA
#2 1:05 5.3 NA
#3 1:10 5.4 27.1
#4 1:15 6.3 28.0
#5 1:20 4.6 NA
#6 1:25 6.4 NA
Note that in all three cases the numbers are the same; only the position at which the answers are placed differs. If you store the output of the above operation in result, you can get the minimum value with min(result$roll_sum, na.rm = TRUE).
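To make the alignment idea concrete without zoo, here is a small base R sketch that computes the window sums once with embed() and then pads them with NA in three different ways. The variable names are illustrative, and the centered version assumes an odd window width.

```r
x <- c(5.5, 5.3, 5.4, 6.3, 4.6, 6.4)  # the Value column from the example
k <- 5                                # window width

# Each row of embed(x, k) holds one window of k consecutive values
sums <- rowSums(embed(x, k))

pad <- k - 1
right  <- c(rep(NA, pad), sums)                 # sum placed at the last position of the window
left   <- c(sums, rep(NA, pad))                 # sum placed at the first position of the window
center <- c(rep(NA, pad %/% 2), sums,
            rep(NA, pad %/% 2))                 # sum placed at the middle position (odd k)

data.frame(Value = x, right, left, center)
```

The three columns contain the same two sums, shifted to different rows, which is what the align argument controls.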
Generating a moving sum variable in R
You can use filter (the one from the stats package, which is masked if dplyr is loaded) in ddply (or any other function implementing the "split-apply-combine" approach):
library(plyr)
ddply(DF, .(country), transform,
x5yrsum2 = as.numeric(stats::filter(x, c(0, rep(1, 5)), sides = 1)))
# country year x x5yrsum x5yrsum2
# 1 A 1980 9 NA NA
# 2 A 1981 3 NA NA
# 3 A 1982 5 NA NA
# 4 A 1983 6 NA NA
# 5 A 1984 9 NA NA
# 6 A 1985 7 32 32
# 7 A 1986 9 30 30
# 8 A 1987 4 36 36
# 9 B 1990 0 NA NA
# 10 B 1991 4 NA NA
# 11 B 1992 2 NA NA
# 12 B 1993 6 NA NA
# 13 B 1994 3 NA NA
# 14 B 1995 7 15 15
# 15 B 1996 0 22 22
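The filter weights deserve a short illustration: with sides = 1 the first weight applies to the current value and the following weights to the values before it, so c(0, rep(1, 5)) sums the five previous years and excludes the current one. A minimal sketch on country A's values from the output above:

```r
x <- c(9, 3, 5, 6, 9, 7, 9, 4)  # x for country A, 1980 to 1987

# Weight 0 on the current year, weight 1 on each of the five years before it
prev5 <- as.numeric(stats::filter(x, c(0, rep(1, 5)), sides = 1))
prev5

# The value for 1985 checked by hand: the sum of 1980 to 1984
sum(x[1:5])
```

The first five results are NA because fewer than five earlier years are available.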