Pandas Groupby Cumulative Sum

Pandas function to group by cumulative sum and return another column when a certain amount is reached

cumsum

You can compute the cumulative sum within each group with:

df['amount_cumsum'] = df.groupby(['ID', 'item'])['amount'].cumsum()

Output (as separate column for clarity):

   ID  item  amount  level  amount_cumsum
0   1     1      10      5             10
1   1     1      10     10             20
2   2     4      15      5             15
3   2     9      30      8             30
4   2     4      10     10             25
5   3     4      10      4             10
6   3     4      10      6             20

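dictionary

Then keep only the rows where the cumulative sum has reached 20 and collect the level values per item:

(df[df['amount_cumsum'].ge(20)]
   .groupby(['item'])['level'].agg(list)
   .to_dict()
)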

Output:

{1: [10], 4: [10, 6], 9: [8]}
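For anyone who wants to run both steps end to end, here is a minimal sketch. The input frame is reconstructed from the output table above, and the names THRESHOLD and levels_by_item are only illustrative, not part of the original answer.

```python
import pandas as pd

# Input data reconstructed from the output table shown above.
df = pd.DataFrame({
    "ID":     [1, 1, 2, 2, 2, 3, 3],
    "item":   [1, 1, 4, 9, 4, 4, 4],
    "amount": [10, 10, 15, 30, 10, 10, 10],
    "level":  [5, 10, 5, 8, 10, 4, 6],
})

# Running total of amount within each (ID, item) pair.
df["amount_cumsum"] = df.groupby(["ID", "item"])["amount"].cumsum()

# Illustrative name for the cut-off of 20 used in the answer.
THRESHOLD = 20

# Keep rows whose running total has reached the threshold,
# then collect the level values for each item.
levels_by_item = (
    df[df["amount_cumsum"].ge(THRESHOLD)]
    .groupby("item")["level"]
    .agg(list)
    .to_dict()
)
print(levels_by_item)
```

Note that the dictionary is keyed by item only, so levels from different IDs that share an item end up in the same list (item 4 here).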

Pandas: Cumulative sum within group with two conditions

You can use .where() with the conditions x < 1 and x >= 1 to temporarily set value_1 to 0 wherever the condition does not hold, and then apply a grouped cumsum, as in the code below.

The second condition is handled by .groupby(), while the first condition is handled by .where(), as detailed here:

.where() keeps the column values where the condition is True and replaces them (with 0 in this case) where the condition is False. So for the first condition, x < 1, value_1 keeps its values on the matching rows, and these are fed to the subsequent cumsum step. On rows where x < 1 is False, value_1 is masked to 0. Adding 0 to a running total leaves it unchanged, so this has the same effect as excluding those rows from the accumulation into column cumsum_1.

The second line of code accumulates value_1 into column cumsum_2 under the opposite condition, x >= 1. In effect, the two lines allocate value_1 to cumsum_1 and cumsum_2 according to x < 1 and x >= 1, respectively.

(Thanks to @tdy for the suggestion to simplify the code.)

df['cumsum_1'] = df['value_1'].where(df['x'] < 1, 0).groupby(df['y']).cumsum()
df['cumsum_2'] = df['value_1'].where(df['x'] >= 1, 0).groupby(df['y']).cumsum()

Result:

print(df)

      x  y  value_1  cumsum_1  cumsum_2
0  0.10  1       12        12         0
1  1.20  1       10        12        10
2  0.25  1        7        19        10
3  1.00  2        3         0         3
4  0.72  2        5         5         3
5  1.50  2       10         5        13
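To see the masking step on its own, here is a small self-contained sketch. The input frame is reconstructed from the result table above, and the intermediate variable masked_low is only there for illustration.

```python
import pandas as pd

# Input data reconstructed from the result table above.
df = pd.DataFrame({
    "x":       [0.10, 1.20, 0.25, 1.00, 0.72, 1.50],
    "y":       [1, 1, 1, 2, 2, 2],
    "value_1": [12, 10, 7, 3, 5, 10],
})

# Step 1: keep value_1 where x < 1, replace it with 0 elsewhere.
masked_low = df["value_1"].where(df["x"] < 1, 0)
print(masked_low.tolist())

# Step 2: cumulative sum of the masked values within each y group.
df["cumsum_1"] = masked_low.groupby(df["y"]).cumsum()

# Same idea with the opposite condition.
df["cumsum_2"] = df["value_1"].where(df["x"] >= 1, 0).groupby(df["y"]).cumsum()
print(df)
```

Because the masked rows contribute 0, cumsum_1 simply repeats the previous running total on rows where x >= 1, and cumsum_2 does the same on rows where x < 1.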

How to calculate cumulative sum (reversed) of a Python DataFrame within given groups?

You can try a Series groupby on the column in reversed row order:

df['new'] = df.loc[::-1, 'Chi'].groupby(df['Basin']).cumsum()
df
Out[858]:
   Basin  (n=17 columns)  Chi  new
0   13.0             ...    4   14
1   13.0             ...    8   10
2   13.0             ...    2    2
3   21.0             ...    4   10
4   21.0             ...    6    6
5   38.0             ...    1   14
6   38.0             ...    7   13
7   38.0             ...    2    6
8   38.0             ...    4    4
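Here is a runnable sketch of the same idea using only the Basin and Chi columns from the output above (the other 17 columns are omitted). The cross-check at the end is an alternative formulation added for illustration, not part of the original answer.

```python
import pandas as pd

# Only the two relevant columns, taken from the output above.
df = pd.DataFrame({
    "Basin": [13.0, 13.0, 13.0, 21.0, 21.0, 38.0, 38.0, 38.0, 38.0],
    "Chi":   [4, 8, 2, 4, 6, 1, 7, 2, 4],
})

# Reverse the row order, accumulate within each Basin, and let pandas
# align the result back to the original index on assignment.
df["new"] = df.loc[::-1, "Chi"].groupby(df["Basin"]).cumsum()

# Cross-check: reversed cumsum = group total - forward cumsum + current value.
grouped = df.groupby("Basin")["Chi"]
alt = grouped.transform("sum") - grouped.cumsum() + df["Chi"]

print(df)
print(alt.tolist())
```

The assignment works without reversing back because the reversed Series keeps its original index labels, so each value lands on the row it came from.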

Pandas groupby cumulative sum and month

How about this one-liner (on pandas 2.2 and later, use freq='ME', since the 'M' alias is deprecated there):

df.groupby([pd.Grouper(key='DATE', freq='M'), 'USER'])['USER'].count().groupby(['USER']).cumsum()
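The question's data is not shown here, so the following is only a sketch on made-up rows (the dates and the users "a" and "b" are hypothetical). To sidestep the frequency alias differences between pandas versions, it groups on a monthly period derived with dt.to_period("M") instead of pd.Grouper, and uses size() to count rows; treat both as substitutions, not as the original answer's exact method.

```python
import pandas as pd

# Hypothetical sample data: one row per user event.
df = pd.DataFrame({
    "DATE": pd.to_datetime([
        "2021-01-05", "2021-01-20", "2021-01-25",
        "2021-02-03", "2021-02-10", "2021-02-11",
        "2021-03-01",
    ]),
    "USER": ["a", "a", "b", "a", "b", "b", "a"],
})

# Monthly period for each row, used as a grouping key.
month = df["DATE"].dt.to_period("M").rename("MONTH")

# Number of rows per (month, user).
monthly = df.groupby([month, "USER"]).size()

# Running total of those monthly counts for each user.
running = monthly.groupby(level="USER").cumsum()
print(running)
```

A month in which a user has no rows does not appear in the result at all; if you need a row for every month, you would have to reindex first.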

How can I use cumsum within a group in Pandas?

You can call transform and pass the cumsum function to add that column to your df:

In [156]:
df['cumsum'] = df.groupby('id')['val'].transform(pd.Series.cumsum)
df

Out[156]:
  id   stuff  val  cumsum
0  A      12    1       1
1  B   23232    2       2
2  A      13   -3      -2
3  C    1234    1       1
4  D    3235    5       5
5  B    3236    6       8
6  C  732323   -2      -1

With respect to your error: cumsum does not take a column name as an argument, so passing the name of the column as a list is meaningless. Select the column on the groupby object first, then call cumsum on it.

So this works:

In [159]:
df.groupby('id')['val'].cumsum()

Out[159]:
0    1
1    2
2   -2
3    1
4    5
5    8
6   -1
dtype: int64
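As a quick check that the two spellings agree, here is a small sketch using the data from the output above. It passes the string "cumsum" to transform, which is an alternative to pd.Series.cumsum; the variable names are illustrative.

```python
import pandas as pd

# Data taken from the output shown above.
df = pd.DataFrame({
    "id":    ["A", "B", "A", "C", "D", "B", "C"],
    "stuff": [12, 23232, 13, 1234, 3235, 3236, 732323],
    "val":   [1, 2, -3, 1, 5, 6, -2],
})

# Via transform, naming the function as a string.
via_transform = df.groupby("id")["val"].transform("cumsum")

# Calling cumsum directly on the selected column of the groupby object.
direct = df.groupby("id")["val"].cumsum()

# Both return a Series aligned to the original index.
print(via_transform.equals(direct))
df["cumsum"] = direct
print(df)
```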

Is there a faster method to do a Pandas groupby cumulative mean?

If I understand correctly, you can avoid the two separate groupby passes by aggregating sum and count first, and then taking the cumulative sum of both columns:

import numpy as np

df1 = df.groupby(['DateTime', 'Player', 'Venue'])['Score'].agg(['sum','count'])
df1 = df1.groupby(['Player', 'Venue'])[['sum', 'count']].cumsum().reset_index()
df1['Venue Preference'] = np.where(df1['count'] >= 2, df1['sum'] / df1['count'], np.nan)
df1 = df1.drop(['sum', 'count'], axis=1)
print(df1)
             DateTime Player      Venue  Venue Preference
0 2021-09-25 17:15:00    Tim  Stadium A               NaN
1 2021-09-27 10:00:00  Blake  Stadium B               NaN
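The original input is not shown, so here is a sketch on a small made-up frame (the scores and the last two timestamps are hypothetical) that follows the same steps and shows where the mean starts to appear.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two players, two games each.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2021-09-25 17:15:00", "2021-09-27 10:00:00",
        "2021-09-28 12:00:00", "2021-09-29 09:00:00",
    ]),
    "Player": ["Tim", "Blake", "Tim", "Blake"],
    "Venue":  ["Stadium A", "Stadium B", "Stadium A", "Stadium B"],
    "Score":  [20, 5, 30, 15],
})

# Sum and count of Score per (DateTime, Player, Venue).
df1 = df.groupby(["DateTime", "Player", "Venue"])["Score"].agg(["sum", "count"])

# Running sum and running count per (Player, Venue), in DateTime order.
df1 = df1.groupby(["Player", "Venue"])[["sum", "count"]].cumsum().reset_index()

# Cumulative mean, reported only once at least two scores have accumulated.
df1["Venue Preference"] = np.where(
    df1["count"] >= 2, df1["sum"] / df1["count"], np.nan
)
df1 = df1.drop(["sum", "count"], axis=1)
print(df1)
```

With this data, the first game of each player yields NaN and the second yields the mean of both scores (25.0 for Tim, 10.0 for Blake).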

