Cumsum reset at NaN

A simple NumPy translation of your MATLAB code is this:

import numpy as np

v = np.array([1., 1., 1., np.nan, 1., 1., 1., 1., np.nan, 1.])
n = np.isnan(v)                            # positions of the NaNs
a = ~n                                     # 1 where valid, 0 at NaNs
c = np.cumsum(a)                           # running count of valid entries
d = np.diff(np.concatenate(([0.], c[n])))  # lengths of the runs of ones
v[n] = -d                                  # NaN -> negative run length
np.cumsum(v)

Executing this code returns array([ 1., 2., 3., 0., 1., 2., 3., 4., 0., 1.]). This solution is only as valid as the original MATLAB one, but maybe it will help you come up with something better if it isn't sufficient for your purposes.
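The counting trick above relies on the input being all ones. As a sketch that is not part of the original answer, the same cancellation idea extends to arbitrary values by accumulating the values themselves (with NaNs treated as zero) instead of a count:

```python
import numpy as np

def cumsum_reset_at_nan(values):
    """Cumulative sum that restarts at 0 after each NaN."""
    v = np.asarray(values, dtype=float).copy()
    n = np.isnan(v)
    # Running sum with NaNs contributing zero.
    s = np.cumsum(np.where(n, 0.0, v))
    # Sum accumulated within each run ending at a NaN.
    d = np.diff(np.concatenate(([0.0], s[n])))
    # Replacing each NaN with the negative run sum cancels it out.
    v[n] = -d
    return np.cumsum(v)

print(cumsum_reset_at_nan([2., 3., np.nan, 5., np.nan, 1.]))  # -> [2. 5. 0. 5. 0. 1.]
```

This also behaves sensibly for a leading NaN or consecutive NaNs, since the run sum between adjacent NaNs is simply zero.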

Pandas dataframe, cumsum reset on NAN

Use groupby and cumsum:

df['s_cumsum'] = df.s_number.groupby(df.s_number.isna().cumsum()).cumsum()
df

   Index  s_number  s_cumsum
0      0       1.0       1.0
1      1       4.0       5.0
2      2       6.0      11.0
3      3       NaN       NaN
4      4       7.0       7.0
5      5       2.0       9.0
6      6       3.0      12.0

Note that if "s_number" is a column of strings, use

df['s_number'] = pd.to_numeric(df['s_number'], errors='coerce')

...first, to get a float column with NaNs.


If you want to fill the NaNs,

df['s_cumsum'] = (df.s_number.groupby(df.s_number.isna().cumsum())
.cumsum()
.fillna(0, downcast='infer'))
df

   Index  s_number  s_cumsum
0      0       1.0         1
1      1       4.0         5
2      2       6.0        11
3      3       NaN         0
4      4       7.0         7
5      5       2.0         9
6      6       3.0        12
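For reference, here is a self-contained reproduction of the technique on a minimal frame; since the `downcast` keyword of `fillna` is deprecated in recent pandas, this sketch casts explicitly instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'s_number': [1.0, 4.0, 6.0, np.nan, 7.0, 2.0, 3.0]})

# Each NaN bumps the group id, so the cumsum restarts after it.
groups = df.s_number.isna().cumsum()
df['s_cumsum'] = (df.s_number.groupby(groups)
                    .cumsum()
                    .fillna(0)
                    .astype(int))  # explicit cast instead of downcast='infer'
print(df['s_cumsum'].tolist())  # -> [1, 5, 11, 0, 7, 9, 12]
```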

Matlab cumsum reset at NaN?

I can only think of a few-pass solution:

v = [1 1 1 NaN 1 1 1 1 NaN 1];
a = v==v;            % 1 where valid: [1 1 1 0 1 1 1 1 0 1]
n = a==0;            % positions of the NaNs
c = cumsum(a);       % your intermediate result
d = diff([0 c(n)]);  % lengths of the runs of ones
v(n) = -d;           % replace NaNs: [1 1 1 -3 1 1 1 1 -4 1]
cumsum(v)            % the answer: [1 2 3 0 1 2 3 4 0 1]

Note: I haven't checked extreme conditions (NaN in first/last position, consecutive NaNs, etc.)

How to reset cumulative sum per group when a certain column is 0 in pandas

  1. For the given resetting condition, use groupby.cumsum to create a Reset grouper that tells us when Quantity hits 0 within each Group:

    condition = df.Quantity.eq(0)
    df['Reset'] = condition.groupby(df.Group).cumsum()

    #   Group  Quantity  Value  Cumulative_sum  Reset
    # 0     A        10    200             200      0
    # 1     B         5    300             300      0
    # 2     A         1     50             250      0
    # 3     A         0    100               0      1
    # 4     C         5    400             400      0
    # 5     A        10    300             300      1
    # 6     B        10    200             500      0
    # 7     A        15    350             650      1
  2. mask the Value column whenever the resetting condition is met and use another groupby.cumsum on both Group and Reset:

    df['Cumul'] = df.Value.mask(condition, 0).groupby([df.Group, df.Reset]).cumsum()

    #   Group  Quantity  Value  Cumulative_sum  Reset  Cumul
    # 0     A        10    200             200      0    200
    # 1     B         5    300             300      0    300
    # 2     A         1     50             250      0    250
    # 3     A         0    100               0      1      0
    # 4     C         5    400             400      0    400
    # 5     A        10    300             300      1    300
    # 6     B        10    200             500      0    500
    # 7     A        15    350             650      1    650
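The two steps above, assembled into a runnable sketch (the input frame is reconstructed from the printed output):

```python
import pandas as pd

df = pd.DataFrame({
    'Group':    ['A', 'B', 'A', 'A', 'C', 'A', 'B', 'A'],
    'Quantity': [10, 5, 1, 0, 5, 10, 10, 15],
    'Value':    [200, 300, 50, 100, 400, 300, 200, 350],
})

condition = df.Quantity.eq(0)
# Step 1: within each Group, each zero-Quantity row opens a new segment.
df['Reset'] = condition.groupby(df.Group).cumsum()
# Step 2: zero out the resetting row itself, then cumsum per (Group, segment).
df['Cumul'] = df.Value.mask(condition, 0).groupby([df.Group, df.Reset]).cumsum()
print(df['Cumul'].tolist())  # -> [200, 300, 250, 0, 400, 300, 500, 650]
```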

Cumsum from DateTime that reset at specific times

Try groupby().cumcount() on the cumsum:

# blocks starting with `14:30:00`
# print to see the blocks
blocks = df.Time.eq('14:30:00').cumsum()

# enumerate the rows within each block with `groupby`
df['count_1430'] = df.groupby(blocks).cumcount()

Output:

          Date      Time     Open     High      Low     Last  count_1430
0   28/05/2018  14:30:00  1.16167  1.16252  1.16130  1.16166           0
1   28/05/2018  15:00:00  1.16166  1.16287  1.16159  1.16276           1
2   28/05/2018  15:30:00  1.16277  1.16293  1.16177  1.16212           2
3   28/05/2018  16:00:00  1.16213  1.16318  1.16198  1.16262           3
4   28/05/2018  16:30:00  1.16262  1.16298  1.16258  1.16284           4
5   28/05/2018  17:00:00  1.16285  1.16329  1.16264  1.16265           5
6   28/05/2018  17:30:00  1.16266  1.16300  1.16243  1.16289           6
7   28/05/2018  18:00:00  1.16288  1.16290  1.16228  1.16269           7
8   28/05/2018  18:30:00  1.16269  1.16278  1.16264  1.16274           8
9   28/05/2018  19:00:00  1.16275  1.16277  1.16270  1.16275           9
10  28/05/2018  19:30:00  1.16276  1.16284  1.16270  1.16280          10
11  28/05/2018  20:00:00  1.16279  1.16288  1.16264  1.16278          11
12  28/05/2018  20:30:00  1.16278  1.16289  1.16260  1.16265          12
13  28/05/2018  21:00:00  1.16267  1.16270  1.16251  1.16262          13
14  29/05/2018  14:30:00  1.15793  1.15827  1.15714  1.15786           0
15  29/05/2018  15:00:00  1.15785  1.15900  1.15741  1.15814           1
16  29/05/2018  15:30:00  1.15813  1.15813  1.15601  1.15647           2
17  29/05/2018  16:00:00  1.15647  1.15658  1.15451  1.15539           3
18  29/05/2018  16:30:00  1.15539  1.15601  1.15418  1.15510           4
19  29/05/2018  17:00:00  1.15508  1.15599  1.15463  1.15527           5
20  29/05/2018  17:30:00  1.15528  1.15587  1.15442  1.15465           6
21  29/05/2018  18:00:00  1.15465  1.15469  1.15196  1.15261           7
22  29/05/2018  18:30:00  1.15261  1.15441  1.15261  1.15349           8
23  29/05/2018  19:00:00  1.15348  1.15399  1.15262  1.15399           9
24  29/05/2018  19:30:00  1.15400  1.15412  1.15239  1.15322          10
25  29/05/2018  20:00:00  1.15322  1.15373  1.15262  1.15367          11
26  29/05/2018  20:30:00  1.15367  1.15419  1.15351  1.15367          12
27  29/05/2018  21:00:00  1.15366  1.15438  1.15352  1.15354          13
28  29/05/2018  21:30:00  1.15355  1.15355  1.15354  1.15354          14
29  30/05/2018  14:30:00  1.16235  1.16323  1.16133  1.16161           0
30  30/05/2018  15:00:00  1.16162  1.16193  1.16020  1.16059           1
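A trimmed, self-contained version of the same idea (the frame is cut down to a handful of hypothetical rows):

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['14:30:00', '15:00:00', '15:30:00', '14:30:00', '15:00:00'],
})

# Each occurrence of 14:30:00 starts a new block.
blocks = df.Time.eq('14:30:00').cumsum()
# cumcount enumerates rows 0, 1, 2, ... within each block.
df['count_1430'] = df.groupby(blocks).cumcount()
print(df['count_1430'].tolist())  # -> [0, 1, 2, 0, 1]
```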

Python pandas cumsum with reset every time there is a 0

You can use:

a = df != 0
df1 = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
print(df1)

   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2
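To see the mechanics, here is the same expression on a reconstructed input (the original frame isn't shown, so these values are merely chosen to reproduce the printed result): `a.cumsum()` counts nonzero cells from the top, `where(~a)` keeps that count only at the zeros, and `ffill` carries the last frozen count forward so subtracting it restarts the count after each zero.

```python
import pandas as pd

# hypothetical input consistent with the output above
df = pd.DataFrame({'a': [0, 1, 0, 1, 1, 0],
                   'b': [1, 1, 1, 0, 1, 1]})

a = df != 0
running = a.cumsum()                          # nonzero count so far
frozen = running.where(~a).ffill().fillna(0)  # count frozen at the last zero
df1 = running - frozen.astype(int)
print(df1['b'].tolist())  # -> [1, 2, 3, 0, 1, 2]
```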

How to reset cumsum after change in sign of values?

Create a new key to group by, then take the cumulative sum within each group.

Key creation: detect where the sign changes; each change increments a counter, so every run of same-signed values gets its own group id.

df.groupby(df.data.lt(0).astype(int).diff().ne(0).cumsum()).data.cumsum()
Out[798]:
0   -2
1   -3
2    1
3   -3
4   -4
5    2
6    2
7    5
8   -1
9   -3
Name: data, dtype: int64
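Reconstructing a plausible input (the original `df` isn't shown, so this data is an assumption chosen to match the printed cumsums) makes the grouper visible:

```python
import pandas as pd

# hypothetical data consistent with the output above
df = pd.DataFrame({'data': [-2, -1, 1, -3, -1, 2, 0, 3, -1, -2]})

sign = df.data.lt(0).astype(int)   # 1 for negatives, 0 otherwise
runs = sign.diff().ne(0).cumsum()  # new id whenever the sign flips
out = df.groupby(runs).data.cumsum()
print(out.tolist())  # -> [-2, -3, 1, -3, -4, 2, 2, 5, -1, -3]
```

Note that `lt(0)` puts zero on the non-negative side, so a 0 does not start a new run.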

pandas fillna in column with cumsum of previous rows (reset after every nan)

Use GroupBy.cumsum with a helper Series built by a cumulative sum over the missing-value mask, so each NaN starts a new group:

df['sum'] = df.groupby(df['points'].isna().cumsum())['points'].cumsum()
print(df)

   team  points     sum
0    GB   43.76   43.76
1   TEN   17.30   61.06
2   ARI    0.20   61.26
3   ATL   12.30   73.56
4   HOU   21.10   94.66
5   ARI    1.70   96.36
6   ATL   12.60  108.96
7    SF   15.00  123.96
8    GB    5.70  129.66
9     1     NaN     NaN
10   GB   43.76   43.76
11  TEN   17.30   61.06
12  ARI    0.20   61.26
13  ATL   12.30   73.56
14  HOU   21.10   94.66
15  ARI    1.70   96.36
16  ATL   12.60  108.96
17  BUF    7.00  115.96
18   GB    5.70  121.66
19    2     NaN     NaN
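The same pattern on a small hand-made frame (a sketch with values simplified from the question, to make the per-block restart easy to check by eye):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [1.0, 2.0, np.nan, 3.0, 4.0]})

# Each NaN bumps the block id; cumsum runs independently per block.
df['sum'] = df.groupby(df['points'].isna().cumsum())['points'].cumsum()
print(df['sum'].tolist())  # -> [1.0, 3.0, nan, 3.0, 7.0]
```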

