Cumulative sum that resets when 0 is encountered
Another base R option is simply
with(df, ave(b, cumsum(b == 0), FUN = cumsum))
## [1] 1 0 1 2
This simply divides column b into groups according to the appearances of 0 and computes the cumulative sum of b within each group.
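The grouping trick above (a running count of zeros as the group id) can be sketched in Python for illustration; the data mirrors the example output:

```python
from itertools import accumulate

b = [1, 0, 1, 1]  # mirrors the example column above

# Group id: increments every time a 0 is seen, so each 0 starts a new group.
grp = list(accumulate(int(x == 0) for x in b))  # [0, 1, 1, 1]

# Cumulative sum within each group.
out = []
prev_grp = None
running = 0
for g, x in zip(grp, b):
    if g != prev_grp:
        running = 0
        prev_grp = g
    running += x
    out.append(running)

print(out)  # [1, 0, 1, 2]
```

The group id never decreases, so each zero permanently opens a fresh group and the sum restarts there.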
Another solution uses data.table (v1.9.6+):
library(data.table) ## v 1.9.6+
setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)]
# campaign date b whatiwant
# 1: a jan 1 1
# 2: b feb 0 0
# 3: c march 1 1
# 4: d april 1 2
Some benchmarks, per the comments:
set.seed(123)
x <- sample(0:1e3, 1e7, replace = TRUE)
system.time(res1 <- ave(x, cumsum(x == 0), FUN = cumsum))
# user system elapsed
# 1.54 0.24 1.81
system.time(res2 <- Reduce(function(x, y) if (y == 0) 0 else x+y, x, accumulate=TRUE))
# user system elapsed
# 33.94 0.39 34.85
library(data.table)
system.time(res3 <- data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)])
# user system elapsed
# 0.20 0.00 0.21
identical(res1, as.integer(res2))
## [1] TRUE
identical(res1, res3$whatiwant)
## [1] TRUE
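For comparison, the slow `Reduce` variant is just an element-by-element fold; a single-pass loop (sketched here in Python, as a hypothetical port of the same logic) shows the linear-time shape without per-element closure overhead:

```python
def cumsum_reset(xs):
    """Cumulative sum that restarts from 0 whenever a 0 is encountered."""
    out = []
    running = 0
    for x in xs:
        # A zero resets the accumulator; anything else extends it.
        running = 0 if x == 0 else running + x
        out.append(running)
    return out

print(cumsum_reset([1, 0, 1, 2, 0, 3]))  # [1, 0, 1, 3, 0, 3]
```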
SQL Server - Cumulative Sum that resets when 0 is encountered
In SQL Server 2008, you are severely limited because you cannot use analytic functions. The following is not efficient, but it will solve your problem:
with tg as (
select t.*, g.grp
from t cross apply
(select count(*) as grp
from t t2
where t2.pk <= t.pk and t2.price = 0
) g
)
select tg.*, p.running_price
from tg cross apply
(select sum(tg2.price) as running_price
from tg tg2
where tg2.grp = tg.grp and tg2.pk <= tg.pk
) p;
Alas, prior to SQL Server 2012, the most efficient solution might involve cursors. In SQL Server 2012+, you simply do:
select t.*,
sum(price) over (partition by grp order by pk) as running_price
from (select t.*,
sum(case when price = 0 then 1 else 0 end) over (order by pk) as grp
from t
) t;
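The 2012+ query's two steps (a running count of zero prices as `grp`, then a windowed sum within each `grp`) can be sketched in pandas for illustration; the sample rows are made up, and the column names follow the SQL above:

```python
import pandas as pd

# Hypothetical sample rows mirroring table t.
t = pd.DataFrame({"pk": [1, 2, 3, 4, 5],
                  "price": [10, 0, 5, 7, 0]})

# Step 1: running count of zero prices = the group id (the inner SELECT).
t["grp"] = (t["price"] == 0).cumsum()

# Step 2: running sum of price within each group (the windowed SUM).
t["running_price"] = t.groupby("grp")["price"].cumsum()

print(t)
```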
How to perform cumsum with reset at 0 in R?
Create a temporary group column that starts a new group every time a 0 is encountered.
library(dplyr)
df %>%
group_by(ID, grp = cumsum(Counter == 0)) %>%
mutate(Cumulative = cumsum(Counter)) %>%
ungroup() %>%
select(-grp) -> result
result
# ID Counter Cumulative
# <chr> <int> <int>
#1 A 1 1
#2 A 0 0
#3 A 1 1
#4 A 1 2
#5 B 1 1
#6 B 0 0
#7 B 1 1
The same logic can be implemented in base R and data.table as:
df$Cumulative <- with(df, ave(Counter, ID, cumsum(Counter == 0), FUN = cumsum))
library(data.table)
setDT(df)[, Cumulative := cumsum(Counter), .(ID, cumsum(Counter == 0))]
data
df <- structure(list(ID = c("A", "A", "A", "A", "B", "B", "B"), Counter = c(1L,
0L, 1L, 1L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA, -7L))
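The same two-level grouping (by `ID` plus a zero-based group id) translates directly to pandas; a sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "A", "A", "A", "B", "B", "B"],
                   "Counter": [1, 0, 1, 1, 1, 0, 1]})

# New group every time Counter hits 0; grouping by ID as well
# keeps the resets from leaking across IDs.
grp = (df["Counter"] == 0).cumsum()
df["Cumulative"] = df.groupby(["ID", grp])["Counter"].cumsum()

print(df)
```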
cumsum by participant and reset on 0 R
Does this work?
library(dplyr)
library(data.table)
df %>%
mutate(grp = rleid(Correct)) %>%
group_by(Participant, grp) %>%
mutate(Count = cumsum(Correct)) %>%
select(- grp)
(grp is a grouping variable, so select() keeps it; ungroup() first if you want it dropped.)
# A tibble: 10 x 4
# Groups: Participant, grp [6]
grp Participant Correct Count
<int> <chr> <dbl> <dbl>
1 1 A 1 1
2 1 A 1 2
3 1 A 1 3
4 2 A 0 0
5 3 A 1 1
6 3 B 1 1
7 3 B 1 2
8 4 B 0 0
9 5 B 1 1
10 5 B 1 2
Toy data:
df <- data.frame(
Participant = c(rep("A", 5), rep("B", 5)),
Correct = c(1,1,1,0,1,1,1,0,1,1)
)
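data.table's `rleid` (run-length id) has no direct pandas builtin, but the standard idiom `s.ne(s.shift()).cumsum()` produces the same run ids; a sketch on the toy data above:

```python
import pandas as pd

df = pd.DataFrame({
    "Participant": ["A"] * 5 + ["B"] * 5,
    "Correct": [1, 1, 1, 0, 1, 1, 1, 0, 1, 1],
})

# Run-length id: a new id starts whenever the value changes (like rleid()).
grp = df["Correct"].ne(df["Correct"].shift()).cumsum()

df["Count"] = df.groupby(["Participant", grp])["Correct"].cumsum()
print(df)
```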
Python pandas cumsum with reset everytime there is a 0
You can use:
a = df != 0
df1 = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
print (df1)
a b
0 0 1
1 1 2
2 0 3
3 1 0
4 2 1
5 0 2
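The input frame is not shown above; one reconstructed from the printed output (inferred, so treat it as an assumption) makes the snippet runnable end to end:

```python
import pandas as pd

# Reconstructed input, inferred from the output above.
df = pd.DataFrame({"a": [0, 1, 0, 1, 1, 0],
                   "b": [1, 1, 1, 0, 1, 1]})

a = df != 0
# Running count of nonzeros, minus that count frozen at the last zero:
# the difference is the length of the current nonzero streak.
df1 = a.cumsum() - a.cumsum().where(~a).ffill().fillna(0).astype(int)
print(df1)
```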
Count the number of NA values in a row - reset when 0
You can use max instead of cumsum in your attempt:
library(data.table)
setDT(df)[, whatiwant := max(Accumulated), by = rleid(b == 0L)]
df
# b Accumulated whatiwant
# 1: 1 1 3
# 2: 1 2 3
# 3: 1 3 3
# 4: 0 0 0
# 5: 1 1 4
# 6: 1 2 4
# 7: 1 3 4
# 8: 1 4 4
# 9: 0 0 0
#10: 0 0 0
#11: 0 0 0
#12: 1 1 2
#13: 1 2 2
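The same "broadcast the group maximum back to every row" step can be sketched in pandas with `transform('max')` over run-length groups; the data mirrors the output above:

```python
import pandas as pd

df = pd.DataFrame({"b":           [1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1],
                   "Accumulated": [1, 2, 3, 0, 1, 2, 3, 4, 0, 0, 0, 1, 2]})

# Run-length id over (b == 0), like data.table's rleid(b == 0L).
runs = df["b"] == 0
grp = runs.ne(runs.shift()).cumsum()

# Per-run maximum, broadcast back to each row of the run.
df["whatiwant"] = df.groupby(grp)["Accumulated"].transform("max")
print(df)
```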
Cumulative monthly sum with reset to zero at the beginning of each new month in pandas
If the data spans multiple years, group by month periods via Series.dt.to_period:
df['Cumulative Value'] = df.groupby(df['Date'].dt.to_period('m'))['Value'].cumsum()
If there is only one year, you can use Series.dt.month:
df['Cumulative Value'] = df.groupby(df['Date'].dt.month)['Value'].cumsum()
Also, the cumulative sum restarts from 0 at the start of each new group by default, so no extra code is needed for that.
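A minimal runnable sketch of the period-based grouping (the dates are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-10", "2023-01-20", "2023-02-05",
                            "2023-02-15", "2024-01-03"]),
    "Value": [1, 2, 3, 4, 5],
})

# to_period('m') keeps the year, so January 2023 and January 2024
# land in different groups; .dt.month alone would merge them.
df["Cumulative Value"] = df.groupby(df["Date"].dt.to_period("m"))["Value"].cumsum()
print(df)
```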
Pandas - Cumulative Sum, with resets, using GroupBy
IIUC, you can take the cumsum after a groupby on the group column, paired with a second key that starts a new group every time value is 0 (itself built with a cumsum of the zero indicator).
test['consec'] = test.groupby(['group', test['value'].eq(0).cumsum()])['value'].cumsum()
test['max_c'] = test.groupby(['group'])['consec'].cummax()
print(test)
group idx value consec max_c
0 1 0 0 0 0
1 1 1 1 1 1
2 1 2 0 0 1
3 1 3 1 1 1
4 1 4 1 2 2
5 1 5 1 3 3
6 1 6 1 4 4
7 2 0 0 0 0
8 2 1 1 1 1
9 2 2 1 2 2
10 2 3 1 3 3
11 2 4 0 0 3
12 2 5 1 1 3
13 2 6 0 0 3
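The `test` frame is not shown above; one reconstructed from the printed output (an assumption) makes the two lines runnable:

```python
import pandas as pd

# Reconstructed from the printed output above.
test = pd.DataFrame({
    "group": [1] * 7 + [2] * 7,
    "idx":   list(range(7)) * 2,
    "value": [0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0],
})

# New sub-group at every zero; cumsum within (group, sub-group) resets there.
test["consec"] = test.groupby(["group", test["value"].eq(0).cumsum()])["value"].cumsum()
# Running maximum of the streak length within each group.
test["max_c"] = test.groupby(["group"])["consec"].cummax()
print(test)
```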
Cumulative sum that resets when the condition is no longer met
Note: this solution uses a global variable.
import pandas as pd

c = 0
def fun(x):
    global c
    if x['speed'] > 2.0:
        c = 0                    # reset the accumulator when speed exceeds 2.0
    else:
        c = x['timedelta'] + c   # otherwise keep accumulating timedelta
    return c
df = pd.DataFrame( {'datetime': ['1-1-2019 19:30:00']*7,
'speed': [0.5,.7,0.1,5.0,25.0,0.1,0.1], 'timedelta': [0,2,2,2,2,4,7]})
df['cum_sum']=df.apply(fun, axis=1)
datetime speed timedelta cum_sum
0 1-1-2019 19:30:00 0.5 0 0
1 1-1-2019 19:30:00 0.7 2 2
2 1-1-2019 19:30:00 0.1 2 4
3 1-1-2019 19:30:00 5.0 2 0
4 1-1-2019 19:30:00 25.0 2 0
5 1-1-2019 19:30:00 0.1 4 4
6 1-1-2019 19:30:00 0.1 7 11
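The global can be avoided by carrying the running value in a plain loop; a sketch that reproduces the same `cum_sum` column on the same data:

```python
import pandas as pd

df = pd.DataFrame({"datetime": ["1-1-2019 19:30:00"] * 7,
                   "speed": [0.5, 0.7, 0.1, 5.0, 25.0, 0.1, 0.1],
                   "timedelta": [0, 2, 2, 2, 2, 4, 7]})

# Single pass, no global state: reset whenever speed > 2.0,
# otherwise keep adding timedelta.
out = []
running = 0
for speed, td in zip(df["speed"], df["timedelta"]):
    running = 0 if speed > 2.0 else running + td
    out.append(running)
df["cum_sum"] = out
print(df)
```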