How to reset cumulative sum per group when a certain column is 0 in pandas
For the given resetting condition, use groupby.cumsum to create a Reset grouper that tells us when Quantity hits 0 within each Group:
condition = df.Quantity.eq(0)
df['Reset'] = condition.groupby(df.Group).cumsum()
# Group Quantity Value Cumulative_sum Reset
# 0 A 10 200 200 0
# 1 B 5 300 300 0
# 2 A 1 50 250 0
# 3 A 0 100 0 1
# 4 C 5 400 400 0
# 5 A 10 300 300 1
# 6 B 10 200 500 0
# 7 A 15 350 650 1
Mask the Value column whenever the resetting condition is met and use another groupby.cumsum on both Group and Reset:
df['Cumul'] = df.Value.mask(condition, 0).groupby([df.Group, df.Reset]).cumsum()
# Group Quantity Value Cumulative_sum Reset Cumul
# 0 A 10 200 200 0 200
# 1 B 5 300 300 0 300
# 2 A 1 50 250 0 250
# 3 A 0 100 0 1 0
# 4 C 5 400 400 0 400
# 5 A 10 300 300 1 300
# 6 B 10 200 500 0 500
# 7 A 15 350 650 1 650
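For reference, the two steps can be put together as a small self-contained sketch. The DataFrame is rebuilt from the Group, Quantity and Value columns printed above; the astype(int) call is an addition made here to keep the counting explicit and is not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the table printed above
df = pd.DataFrame({
    "Group": ["A", "B", "A", "A", "C", "A", "B", "A"],
    "Quantity": [10, 5, 1, 0, 5, 10, 10, 15],
    "Value": [200, 300, 50, 100, 400, 300, 200, 350],
})

# True on every row where the running sum must restart
condition = df["Quantity"].eq(0)

# Count the resets seen so far within each Group
df["Reset"] = condition.astype(int).groupby(df["Group"]).cumsum()

# Zero out Value on reset rows, then sum within each (Group, Reset) block
df["Cumul"] = (
    df["Value"].mask(condition, 0)
    .groupby([df["Group"], df["Reset"]])
    .cumsum()
)
print(df)
```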
Resetting Cumulative Sum once a value is reached and set a flag to 1
An ordinary cumsum() is of no use here, because it has no way of knowing where to restart the summation.
You can do it with the following custom function:
def myCumSum(x, thr):
    # Restart once the running total has reached the threshold
    if myCumSum.prev >= thr:
        myCumSum.prev = 0
    myCumSum.prev += x
    return myCumSum.prev
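This function keeps "memory" between calls in its prev attribute, so it can "know" where to restart.
To apply it element-wise to a whole array, wrap it with np.vectorize. Note that np.vectorize is a convenience wrapper around a Python loop, so do not expect a real speed-up from it:
import numpy as np
# otypes is given so that np.vectorize does not make an extra trial call that would disturb prev
myCumSumV = np.vectorize(myCumSum, otypes=[int], excluded=['thr'])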
Then execute:
threshold = 40
myCumSum.prev = 0 # Set the "previous" value
# Replace "a" column with your cumulative sum
df.a = myCumSumV(df.a.values, threshold)
df['flag'] = df.a.ge(threshold).astype(int) # Compute "flag" column
The result is:
a b flag
0 5 1 0
1 11 1 0
2 41 1 1
3 170 0 1
4 5 1 0
5 15 1 0
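The same logic can also be written without a function attribute, keeping the running total in a local variable. This is only a sketch: the helper name cumsum_reset_after is hypothetical, and the input values are illustrative ones chosen so that they reproduce the cumulative column shown above (the original input was not given).

```python
import pandas as pd

def cumsum_reset_after(values, thr):
    """Running sum that restarts on the element after the total reaches thr."""
    out = []
    running = 0
    for x in values:
        if running >= thr:  # threshold was reached on the previous element
            running = 0
        running += x
        out.append(running)
    return out

# Illustrative input (assumed, not from the original question)
df = pd.DataFrame({"a": [5, 6, 30, 170, 5, 10]})
threshold = 40

df["cum"] = cumsum_reset_after(df["a"].tolist(), threshold)
df["flag"] = df["cum"].ge(threshold).astype(int)
print(df)
```

Because the state lives inside the function call, there is nothing to reset by hand before running it a second time.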
Pandas Cumsum conditional reset
import pandas as pd
df = pd.DataFrame({'Size':[8,8,8,8,7,6,7,6,5,2]})
ls = []
cumsum = 0
for _, row in df.iterrows():
    if cumsum + row.Size <= 16:
        cumsum += row.Size
    else:
        # Adding this row would exceed 16, so restart from this row's value
        cumsum = row.Size
    ls.append(cumsum)
df['cumsum'] = ls
Result:
Size cumsum
0 8 8
1 8 16
2 8 8
3 8 16
4 7 7
5 6 13
6 7 7
7 6 13
8 5 5
9 2 7
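As a possible alternative to iterrows, the same rule can be expressed with itertools.accumulate from the standard library. This is a sketch under the assumption that the cap stays at 16 as in the loop above; the names LIMIT and capped_step are made up for this example.

```python
from itertools import accumulate

import pandas as pd

LIMIT = 16  # same cap as in the loop above

def capped_step(total, size, limit=LIMIT):
    """Add size to total, or restart from size if the cap would be exceeded."""
    new_total = total + size
    return new_total if new_total <= limit else size

df = pd.DataFrame({"Size": [8, 8, 8, 8, 7, 6, 7, 6, 5, 2]})
df["cumsum"] = list(accumulate(df["Size"].tolist(), capped_step))
print(df)
```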
cumsum with a condition to restart in R
You may use cumsum to create groups as well.
library(dplyr)
df <- df %>%
  group_by(group = cumsum(dplyr::lag(port == 0, default = 0))) %>%
  mutate(cumsum_G = cumsum(G)) %>%
  ungroup
df
# inv ass port G group cumsum_G
# <chr> <chr> <int> <int> <dbl> <int>
#1 i x 2 1 0 1
#2 i x 2 0 0 1
#3 i x 0 1 0 2
#4 i x 3 0 1 0
#5 i x 3 1 1 1
You may remove the group column from the output using %>% select(-group).
Data:
df <- structure(list(inv = c("i", "i", "i", "i", "i"), ass = c("x",
"x", "x", "x", "x"), port = c(2L, 2L, 0L, 3L, 3L), G = c(1L,
0L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA, -5L))
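For readers working in pandas rather than R, the same idea (start a new group on the row after port hits 0, then take a cumulative sum within each group) might look roughly like the sketch below. It is a translation written for this page and not part of the original answer; the data mirrors the port and G columns defined above.

```python
import pandas as pd

df = pd.DataFrame({
    "port": [2, 2, 0, 3, 3],
    "G": [1, 0, 1, 0, 1],
})

# A new group starts on the row AFTER port == 0, hence the shift by one row
starts_new_group = df["port"].eq(0).shift(fill_value=False)
df["group"] = starts_new_group.astype(int).cumsum()

# Cumulative sum of G restarted within each group
df["cumsum_G"] = df["G"].groupby(df["group"]).cumsum()
print(df)
```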
reset cumulative sum based on another column
Here is an option: first create the type2 column, which labels each consecutive run of identical type values, and then apply cumsum() within each run.
import pandas as pd
import numpy as np
df = pd.DataFrame([["y",10 ],
["y",20 ],
["y",5 ],
["n",30 ],
["n",20 ],
["n",5 ],
["y",10 ],
["y",40 ],
["y",15 ]],columns = ["type","sale"])
df["type2"] = np.cumsum((df["type"] != df["type"].shift(1)))
df["cum_sale"] = df[["sale","type2"]].groupby("type2").cumsum()
df
Output:
type sale type2 cum_sale
0 y 10 1 10
1 y 20 1 30
2 y 5 1 35
3 n 30 2 30
4 n 20 2 50
5 n 5 2 55
6 y 10 3 10
7 y 40 3 50
8 y 15 3 65
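If the helper column is not wanted in the result, the run labels can be kept in a separate Series and passed straight to groupby. A minimal sketch on the same data, where run_id is a name introduced only for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["y", "y", "y", "n", "n", "n", "y", "y", "y"],
    "sale": [10, 20, 5, 30, 20, 5, 10, 40, 15],
})

# A new run starts wherever type differs from the previous row
run_id = df["type"].ne(df["type"].shift()).astype(int).cumsum()

# Cumulative sum of sale restarted at the beginning of every run
df["cum_sale"] = df.groupby(run_id)["sale"].cumsum()
print(df)
```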
Cumulative sum that resets when the condition is no longer met
Note: this uses a global variable.
import pandas as pd
c = 0
def fun(x):
    global c
    if x['speed'] > 2.0:
        # Condition no longer met: reset the running sum
        c = 0
    else:
        c = x['timedelta']+c
    return c
df = pd.DataFrame( {'datetime': ['1-1-2019 19:30:00']*7,
    'speed': [0.5,.7,0.1,5.0,25.0,0.1,0.1], 'timedelta': [0,2,2,2,2,4,7]})
df['cum_sum']=df.apply(fun, axis=1)
datetime speed timedelta cum_sum
0 1-1-2019 19:30:00 0.5 0 0
1 1-1-2019 19:30:00 0.7 2 2
2 1-1-2019 19:30:00 0.1 2 4
3 1-1-2019 19:30:00 5.0 2 0
4 1-1-2019 19:30:00 25.0 2 0
5 1-1-2019 19:30:00 0.1 4 4
6 1-1-2019 19:30:00 0.1 7 11
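The global variable can probably be avoided with the groupby/cumsum pattern: rows where speed exceeds 2.0 both start a new block and contribute 0 to the sum. The sketch below assumes the same speed and timedelta values as above and omits the datetime column for brevity; reset and block are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "speed": [0.5, 0.7, 0.1, 5.0, 25.0, 0.1, 0.1],
    "timedelta": [0, 2, 2, 2, 2, 4, 7],
})

# True wherever the running sum must drop back to 0
reset = df["speed"] > 2.0

# Every reset row opens a new block
block = reset.astype(int).cumsum()

# Reset rows contribute 0; the other rows accumulate within their block
df["cum_sum"] = df["timedelta"].where(~reset, 0).groupby(block).cumsum()
print(df)
```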