Cumsum Reset at Certain Values

How to reset cumulative sum per group when a certain column is 0 in pandas

  1. For the given resetting condition, use groupby.cumsum to build a Reset grouper that starts a new sub-group each time Quantity hits 0 within its Group:

    condition = df.Quantity.eq(0)
    df['Reset'] = condition.groupby(df.Group).cumsum()

    #    Group  Quantity  Value  Cumulative_sum  Reset
    # 0      A        10    200             200      0
    # 1      B         5    300             300      0
    # 2      A         1     50             250      0
    # 3      A         0    100               0      1
    # 4      C         5    400             400      0
    # 5      A        10    300             300      1
    # 6      B        10    200             500      0
    # 7      A        15    350             650      1
  2. Mask the Value column to 0 wherever the resetting condition is met and apply another groupby.cumsum on both Group and Reset (a self-contained sketch combining both steps follows this list):

    df['Cumul'] = df.Value.mask(condition, 0).groupby([df.Group, df.Reset]).cumsum()

    #    Group  Quantity  Value  Cumulative_sum  Reset  Cumul
    # 0      A        10    200             200      0    200
    # 1      B         5    300             300      0    300
    # 2      A         1     50             250      0    250
    # 3      A         0    100               0      1      0
    # 4      C         5    400             400      0    400
    # 5      A        10    300             300      1    300
    # 6      B        10    200             500      0    500
    # 7      A        15    350             650      1    650
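
A minimal self-contained sketch that combines both steps; the input frame below is reconstructed from the sample output above, and the resulting Cumul column reproduces the Cumulative_sum column shown there:

import pandas as pd

df = pd.DataFrame({
    'Group':    ['A', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'Quantity': [10, 5, 1, 0, 5, 10, 10, 15],
    'Value':    [200, 300, 50, 100, 400, 300, 200, 350],
})

condition = df.Quantity.eq(0)                       # rows that trigger a reset
df['Reset'] = condition.groupby(df.Group).cumsum()  # sub-group label within each Group
df['Cumul'] = (df.Value.mask(condition, 0)          # zero out the resetting rows...
                 .groupby([df.Group, df.Reset])     # ...and restart the sum per sub-group
                 .cumsum())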

Resetting Cumulative Sum once a value is reached and setting a flag to 1

"Ordinary" cumsum() is here useless, as this function "doesn't know"
where to restart summation.

You can do it with the following custom function:

def myCumSum(x, thr):
    if myCumSum.prev >= thr:
        myCumSum.prev = 0
    myCumSum.prev += x
    return myCumSum.prev

This function has "memory" of the previous call via its prev attribute, so it
"knows" where to restart.

To speed up the execution, define a vectorized version of this function:

myCumSumV = np.vectorize(myCumSum, otypes=[int], excluded=['thr'])

Then execute:

threshold = 40
myCumSum.prev = 0 # Set the "previous" value
# Replace "a" column with your cumulative sum
df.a = myCumSumV(df.a.values, threshold)
df['flag'] = df.a.ge(threshold).astype(int) # Compute "flag" column

The result is:

     a  b  flag
0    5  1     0
1   11  1     0
2   41  1     1
3  170  0     1
4    5  1     0
5   15  1     0
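
For completeness, a runnable sketch of the whole example; the input columns a and b are not given in the answer, so they are reconstructed here from the result above (an assumption):

import numpy as np
import pandas as pd

def myCumSum(x, thr):
    if myCumSum.prev >= thr:   # once the previous total reached the threshold...
        myCumSum.prev = 0      # ...restart the accumulation
    myCumSum.prev += x
    return myCumSum.prev

myCumSumV = np.vectorize(myCumSum, otypes=[int], excluded=['thr'])

# Input reconstructed from the result table above (assumed, not from the original post)
df = pd.DataFrame({'a': [5, 6, 30, 170, 5, 10], 'b': [1, 1, 1, 0, 1, 1]})

threshold = 40
myCumSum.prev = 0                            # set the "previous" value
df.a = myCumSumV(df.a.values, threshold)     # replace "a" with the resetting cumulative sum
df['flag'] = df.a.ge(threshold).astype(int)  # flag rows where the sum reached the threshold
print(df)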

Pandas Cumsum conditional reset

The running total restarts whenever adding the next Size would push it past 16; a plain loop keeps track of the running value:

df = pd.DataFrame({'Size':[8,8,8,8,7,6,7,6,5,2]})

ls = []
cumsum = 0
last_reset = 0
for _, row in df.iterrows():
    if cumsum + row.Size <= 16:
        cumsum += row.Size       # keep accumulating while the total stays within 16
    else:
        last_reset = cumsum      # remember the total reached before the reset
        cumsum = row.Size        # restart the running total from the current row
    ls.append(cumsum)

df['cumsum'] = ls

Result:

   Size  cumsum
0     8       8
1     8      16
2     8       8
3     8      16
4     7       7
5     6      13
6     7       7
7     6      13
8     5       5
9     2       7
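
The same restart-at-16 logic can also be expressed with itertools.accumulate, which carries the running value for you; a compact sketch equivalent to the loop above:

from itertools import accumulate

import pandas as pd

df = pd.DataFrame({'Size': [8, 8, 8, 8, 7, 6, 7, 6, 5, 2]})

# Keep adding while the total stays <= 16, otherwise restart from the current Size
df['cumsum'] = list(accumulate(df['Size'],
                               lambda total, x: total + x if total + x <= 16 else x))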

cumsum with a condition to restart in R

You may use cumsum to create groups as well.

library(dplyr)

df <- df %>%
  group_by(group = cumsum(dplyr::lag(port == 0, default = 0))) %>%
  mutate(cumsum_G = cumsum(G)) %>%
  ungroup()

df

#  inv   ass    port     G group cumsum_G
#  <chr> <chr> <int> <int> <dbl>    <int>
#1 i     x         2     1     0        1
#2 i     x         2     0     0        1
#3 i     x         0     1     0        2
#4 i     x         3     0     1        0
#5 i     x         3     1     1        1

You may remove the group column from output using %>% select(-group).

data

df <- structure(list(inv = c("i", "i", "i", "i", "i"),
                     ass = c("x", "x", "x", "x", "x"),
                     port = c(2L, 2L, 0L, 3L, 3L),
                     G = c(1L, 0L, 1L, 0L, 1L)),
                class = "data.frame", row.names = c(NA, -5L))
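
For comparison, a rough pandas port of the same idea, built from the data above; this translation is a sketch, not part of the original R answer:

import pandas as pd

df = pd.DataFrame({
    'inv':  ['i'] * 5,
    'ass':  ['x'] * 5,
    'port': [2, 2, 0, 3, 3],
    'G':    [1, 0, 1, 0, 1],
})

# lag(port == 0) followed by cumsum builds the same grouper as in the dplyr code
group = df['port'].eq(0).shift(fill_value=False).cumsum()
df['cumsum_G'] = df['G'].groupby(group).cumsum()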

reset cumulative sum based on another column

Here is an option: first create the type2 grouper column, then apply cumsum() within each group:

import pandas as pd
import numpy as np

df = pd.DataFrame([["y", 10],
                   ["y", 20],
                   ["y", 5],
                   ["n", 30],
                   ["n", 20],
                   ["n", 5],
                   ["y", 10],
                   ["y", 40],
                   ["y", 15]], columns=["type", "sale"])

df["type2"] = np.cumsum((df["type"] != df["type"].shift(1)))
df["cum_sale"] = df[["sale","type2"]].groupby("type2").cumsum()
df

Output:

   type  sale  type2  cum_sale
0     y    10      1        10
1     y    20      1        30
2     y     5      1        35
3     n    30      2        30
4     n    20      2        50
5     n     5      2        55
6     y    10      3        10
7     y    40      3        50
8     y    15      3        65
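
The grouped cumulative sum in the last line can also be written by selecting the sale column after the groupby, which avoids the intermediate two-column frame; a small variant of the same step:

df["cum_sale"] = df.groupby("type2")["sale"].cumsum()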

Cumulative sum that resets when the condition is no longer met

Note: this approach uses a global variable.

c = 0

def fun(x):
    global c
    if x['speed'] > 2.0:        # condition no longer met -> reset the running sum
        c = 0
    else:
        c = x['timedelta'] + c  # otherwise keep accumulating timedelta
    return c

df = pd.DataFrame({'datetime': ['1-1-2019 19:30:00'] * 7,
                   'speed': [0.5, .7, 0.1, 5.0, 25.0, 0.1, 0.1],
                   'timedelta': [0, 2, 2, 2, 2, 4, 7]})

df['cum_sum'] = df.apply(fun, axis=1)

Result:

            datetime  speed  timedelta  cum_sum
0  1-1-2019 19:30:00    0.5          0        0
1  1-1-2019 19:30:00    0.7          2        2
2  1-1-2019 19:30:00    0.1          2        4
3  1-1-2019 19:30:00    5.0          2        0
4  1-1-2019 19:30:00   25.0          2        0
5  1-1-2019 19:30:00    0.1          4        4
6  1-1-2019 19:30:00    0.1          7       11
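
For reference, the same result can be obtained without a global variable by reusing the mask-and-grouped-cumsum pattern from the first section; a sketch on the same data:

import pandas as pd

df = pd.DataFrame({'datetime': ['1-1-2019 19:30:00'] * 7,
                   'speed': [0.5, .7, 0.1, 5.0, 25.0, 0.1, 0.1],
                   'timedelta': [0, 2, 2, 2, 2, 4, 7]})

reset = df['speed'].gt(2.0)                  # rows where the condition is no longer met
df['cum_sum'] = (df['timedelta']
                 .mask(reset, 0)             # those rows contribute 0...
                 .groupby(reset.cumsum())    # ...and start a new group
                 .cumsum())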

