How to Use Pandas GroupBy to Get a Sum

How do I use pandas groupby to get a sum?

Use GroupBy.sum:

df.groupby(['Fruit','Name']).sum()

Out[31]:
                Number
Fruit   Name
Apples  Bob         16
        Mike         9
        Steve       10
Grapes  Bob         35
        Tom         87
        Tony        15
Oranges Bob         67
        Mike        57
        Tom         15
        Tony         1
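
For readers who want something to paste and run, here is a minimal sketch of the same call. The rows are hypothetical sample data (only the column names Fruit, Name and Number come from the answer), so the totals match the output above only partially.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "Fruit": ["Apples", "Apples", "Apples", "Grapes", "Grapes"],
    "Name": ["Bob", "Bob", "Mike", "Bob", "Tom"],
    "Number": [7, 9, 9, 35, 87],
})

# Sum Number within each (Fruit, Name) pair; the keys become a MultiIndex.
totals = df.groupby(["Fruit", "Name"]).sum()

# reset_index() turns the group keys back into ordinary columns.
flat = totals.reset_index()
print(flat)
```

If the frame has other non-numeric columns, selecting the column first, as in df.groupby(["Fruit", "Name"])["Number"].sum(), avoids aggregating them.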

Pandas - dataframe groupby - how to get sum of multiple columns

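By using apply (select the columns with a list, i.e. double brackets; the tuple-style ["col3", "col4"] selection raises an error in pandas 2.0 and later):

df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())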
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to use agg instead:

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
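
A small runnable sketch of the agg variant follows. The input rows are an assumption (string-typed numbers, which would explain the astype(int) in the apply version), chosen so the totals reproduce the output shown above.

```python
import pandas as pd

# Hypothetical input: col3 and col4 hold numbers stored as strings.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],
    "col4": ["2", "2", "2", "2", "2", "2"],
})

# Convert once up front so that 'sum' adds numbers instead of joining strings.
df = df.astype({"col3": int, "col4": int})

# A dict maps each column to the aggregation to apply to it.
out = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})

# When every column gets the same aggregation, selecting the columns
# and calling sum() directly gives the same result.
out_direct = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()
print(out)
```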

Get sum of group subset using pandas groupby

Use groupby with transform('idxmax') to find, for each id, the index of the first row where Stage equals 12, filter the dataframe to the rows up to that index, then do a second groupby to sum Value:

idx = df['Stage'].eq(12).groupby(df['id']).transform('idxmax')
output = df[df.index <= idx].groupby('id')['Value'].sum().reset_index()

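Detail

transform with idxmax returns, for every row, the index of the first row in its id group where Stage equals 12. Filtering df to the rows whose index is less than or equal to that value keeps the data up to and including the first 12 in each group. This relies on the index increasing within each group. Note that for a group with no Stage equal to 12, idxmax returns the group's first index, so only that group's first row is kept.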
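
The two steps can be sketched on a small frame as follows. The data is hypothetical; only the column names id, Stage and Value come from the answer, and a default integer index is assumed.

```python
import pandas as pd

# Hypothetical sample data with a default RangeIndex (0, 1, 2, ...).
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "Stage": [10, 12, 11, 12, 12, 9, 12],
    "Value": [1, 2, 3, 4, 5, 6, 7],
})

# Boolean mask: True where Stage is 12.
is_12 = df["Stage"].eq(12)

# For each id, the index label of the first True, broadcast to every row.
idx = is_12.groupby(df["id"]).transform("idxmax")

# Keep rows at or before the first 12 of their group, then sum per id.
output = df[df.index <= idx].groupby("id")["Value"].sum().reset_index()
print(output)
```

Here id 1 keeps rows 0 and 1 (sum 3) and id 2 keeps only row 4 (sum 5).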

Pandas Groupby and Sum Only One Column

To keep column C in the result while summing only B, include C in the groupby keys (groupby accepts a list of columns).

Give this a try:

df.groupby(['A','C'])['B'].sum()

One other thing to note: the call above returns a Series indexed by A and C. If you need to keep working with a dataframe after the aggregation, pass as_index=False so the group keys come back as regular columns. This one gave me problems when I was first working with pandas. Example:

df.groupby(['A','C'], as_index=False)['B'].sum()
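
To make the difference concrete, here is a short sketch comparing the two calls on hypothetical data (the values are made up; only the column names A, B and C come from the answer).

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "A": ["x", "x", "y"],
    "C": ["u", "u", "v"],
    "B": [1, 2, 5],
})

# Default: a Series whose MultiIndex holds the group keys A and C.
as_series = df.groupby(["A", "C"])["B"].sum()

# as_index=False: a DataFrame with A, C and B as ordinary columns.
as_frame = df.groupby(["A", "C"], as_index=False)["B"].sum()

print(type(as_series).__name__)
print(as_frame)
```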

Getting % Rate using Pandas Group By and .sum()

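You can group the dataframe on Year and aggregate using sum, once for all rows and once for the rows where Ind is 'A', then divide the second result by the first. Passing numeric_only=True keeps the text column Ind out of the sums, which recent pandas versions no longer drop automatically:

s1 = df.groupby('Year').sum(numeric_only=True)
s2 = df.query("Ind == 'A'").groupby('Year').sum(numeric_only=True)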

s2.div(s1).round(2).add_suffix('Rate')


      XRate  YRate  ZRate
Year
2011   0.20   0.29   0.33
2012   0.47   0.62   0.25
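
As a rough end-to-end illustration, the sketch below runs the same computation on hypothetical data with two value columns, X and Y. The numbers are made up, so the rates differ from the output above.

```python
import pandas as pd

# Hypothetical sample data; Ind is a text column, X and Y are numeric.
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "Ind":  ["A", "B", "A", "B"],
    "X":    [1, 4, 3, 1],
    "Y":    [2, 2, 1, 3],
})

# Yearly totals over all rows (numeric columns only).
s1 = df.groupby("Year").sum(numeric_only=True)

# Yearly totals restricted to Ind == 'A'.
s2 = df.query("Ind == 'A'").groupby("Year").sum(numeric_only=True)

# Share of each yearly total that comes from 'A', rounded to 2 decimals.
rate = s2.div(s1).round(2).add_suffix("Rate")
print(rate)
```

Because both results are indexed by Year, div aligns them by year and by column name before dividing.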

Pandas groupby.sum for all columns

You can filter the columns first and then pass the df['group'] Series (rather than the column name 'group', which is no longer in the filtered frame) to groupby, and finally add the sum column with DataFrame.assign:

df1 = (df.filter(regex=r'_name$')
         .groupby(df['group']).sum()
         .assign(sum = lambda x: x.sum(axis=1)))

Alternatively, collect the matching column names and select them after groupby:

cols = df.filter(regex=r'_name$').columns

df1 = df.groupby('group')[cols].sum()

Or:

cols = df.columns[df.columns.str.contains(r'_name$')]

df1 = df.groupby('group')[cols].sum().assign(sum = lambda x: x.sum(axis=1))


print(df1)
       a_name  b_name  q_name  sum
group
a           7      13      10   30
b          10       6      10   26
c          10       2       5   17
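
A self-contained sketch of the second variant is given below. The input rows and the extra column named other are hypothetical, chosen so the group totals reproduce the printed result above.

```python
import pandas as pd

# Hypothetical input; 'other' stands in for a column that must not be summed.
df = pd.DataFrame({
    "group":  ["a", "a", "b", "c"],
    "a_name": [3, 4, 10, 10],
    "b_name": [6, 7, 6, 2],
    "q_name": [4, 6, 10, 5],
    "other":  [100, 200, 300, 400],
})

# Names of the columns ending in '_name'.
cols = list(df.filter(regex=r"_name$").columns)

# Sum those columns per group, then add a row-wise total across them.
df1 = df.groupby("group")[cols].sum().assign(sum=lambda x: x.sum(axis=1))
print(df1)
```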

Pandas dataframe Groupby with Min,Max and Sum

print(
    df.groupby("CID", as_index=False).agg(
        {"priority": "min", "Ind": "max", "amount": "sum"}
    )
)

Prints:

    CID  priority  Ind  amount
0  C100         1    1     150
1  C300         3    0     650
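
The following sketch runs the dict-based agg on hypothetical rows (made up so the result matches the printed output) and, as an optional alternative not used in the answer, shows pandas named aggregation, which lets you choose the output column names. The names min_priority, max_ind and total_amount are illustrative.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "CID":      ["C100", "C100", "C300", "C300"],
    "priority": [1, 2, 3, 4],
    "Ind":      [0, 1, 0, 0],
    "amount":   [100, 50, 250, 400],
})

# Dict form: one aggregation per column, output keeps the column names.
summary = df.groupby("CID", as_index=False).agg(
    {"priority": "min", "Ind": "max", "amount": "sum"}
)

# Named aggregation: new_name=(source_column, aggregation).
named = df.groupby("CID", as_index=False).agg(
    min_priority=("priority", "min"),
    max_ind=("Ind", "max"),
    total_amount=("amount", "sum"),
)
print(summary)
print(named)
```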

Groupby multiple columns & Sum - Create new column with added If Condition

Cause of error

  • The syntax to select multiple columns df['column1', 'column2'] is wrong. This should be df[['column1', 'column2']]
  • Even if you use df[['column1', 'column2']] for groupby, pandas will raise another error complaining that the grouper should be one dimensional. This is because df[['column1', 'column2']] returns a dataframe which is a two dimensional object.

How to fix the error?

Hard way:

Pass each of the grouping columns as one dimensional series to groupby

df['new_column'] = (
    df['value']
      .where(df['value'] > 0)
      .groupby([df['column1'], df['column2']]) # Notice the change
      .transform('sum')
)

Easy way:

First assign the masked column values to the target column, then do groupby + transform as you would normally do

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')
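
Both fixes can be checked side by side on a small frame. The data below is hypothetical, and hard_way is an illustrative variable name used only so the two results can be compared.

```python
import pandas as pd

# Hypothetical sample data with some non-positive values.
df = pd.DataFrame({
    "column1": ["x", "x", "x", "y"],
    "column2": ["p", "p", "p", "q"],
    "value":   [5, -2, 3, -4],
})

# Hard way: mask the Series, then group it by the key columns passed as Series.
hard_way = (
    df["value"]
    .where(df["value"] > 0)          # non-positive values become NaN
    .groupby([df["column1"], df["column2"]])
    .transform("sum")                # NaN is skipped; result is per-row
)

# Easy way: store the masked values first, then group by column names.
df["new_column"] = df["value"].where(df["value"] > 0)
df["new_column"] = df.groupby(["column1", "column2"])["new_column"].transform("sum")
print(df)
```

Every row of a group receives the group's sum of positive values, including rows whose own value was masked; a group with no positive values gets 0.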

How to add a new row to each group in pandas, where one value in that row is the sum of the group's values

You can create a dataframe with the sum of each group using .groupby() and .sum(), and set prop_cd to 'Hlds' with .assign().

Then concatenate it with the original dataframe using pd.concat() and sort the rows by the grouping columns with .sort_values(), so that each sum row lands after the rows of its group. The stable sort preserves the concatenation order within each group, which is what places the Hlds row last:

df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')

df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)

Result:

print(df_out)

      eff_date  mdl_cd    ast_cd  prop_cd    value
0   2021-09-22    Comm      Agri      Car  -0.1234
1   2021-09-22    Comm      Agri     Fund   0.5123
2   2021-09-22    Comm      Agri      Mmt  -0.7612
3   2021-09-22    Comm      Agri     Hlds  -0.3723
4   2021-09-22    Comm      Engy      Car   0.1212
5   2021-09-22    Comm      Engy     Fund  -0.1234
6   2021-09-22    Comm      Engy      Mmt   0.5123
7   2021-09-22    Comm      Engy     Hlds   0.5101
8   2021-09-22    Comm  Industry      Car  -0.7612
9   2021-09-22    Comm  Industry     Fund   0.1212
10  2021-09-22    Comm  Industry      Mmt  -0.1234
11  2021-09-22    Comm  Industry     Hlds  -0.7634
12  2021-09-22    Comm     Metal      Car   0.5123
13  2021-09-22    Comm     Metal     Fund  -0.7612
14  2021-09-22    Comm     Metal      Mmt   0.1212
15  2021-09-22    Comm     Metal     Hlds  -0.1277
16  2021-09-23  Equity      Agri      Car   0.6541
17  2021-09-23  Equity      Agri     Fund   0.5123
18  2021-09-23  Equity      Agri      Mmt  -0.1874
19  2021-09-23  Equity      Agri     Hlds   0.9790
20  2021-09-23  Equity      Engy      Car   0.1212
21  2021-09-23  Equity      Engy     Fund  -0.6234
22  2021-09-23  Equity      Engy      Mmt   0.5123
23  2021-09-23  Equity      Engy     Hlds   0.0101
24  2021-09-23  Equity  Industry      Car  -0.1612
25  2021-09-23  Equity  Industry     Fund   0.1212
26  2021-09-23  Equity  Industry      Mmt  -0.1934
27  2021-09-23  Equity  Industry     Hlds  -0.2334
28  2021-09-23  Equity     Metal      Car   0.5123
29  2021-09-23  Equity     Metal     Fund   0.5412
30  2021-09-23  Equity     Metal      Mmt   0.1212
31  2021-09-23  Equity     Metal     Hlds   1.1747

Setup

df = pd.read_clipboard(',')

      eff_date  mdl_cd    ast_cd  prop_cd    value
0   2021-09-22    Comm      Agri      Car  -0.1234
1   2021-09-22    Comm      Agri     Fund   0.5123
2   2021-09-22    Comm      Agri      Mmt  -0.7612
3   2021-09-22    Comm      Engy      Car   0.1212
4   2021-09-22    Comm      Engy     Fund  -0.1234
5   2021-09-22    Comm      Engy      Mmt   0.5123
6   2021-09-22    Comm  Industry      Car  -0.7612
7   2021-09-22    Comm  Industry     Fund   0.1212
8   2021-09-22    Comm  Industry      Mmt  -0.1234
9   2021-09-22    Comm     Metal      Car   0.5123
10  2021-09-22    Comm     Metal     Fund  -0.7612
11  2021-09-22    Comm     Metal      Mmt   0.1212
12  2021-09-23  Equity      Agri      Car   0.6541
13  2021-09-23  Equity      Agri     Fund   0.5123
14  2021-09-23  Equity      Agri      Mmt  -0.1874
15  2021-09-23  Equity      Engy      Car   0.1212
16  2021-09-23  Equity      Engy     Fund  -0.6234
17  2021-09-23  Equity      Engy      Mmt   0.5123
18  2021-09-23  Equity  Industry      Car  -0.1612
19  2021-09-23  Equity  Industry     Fund   0.1212
20  2021-09-23  Equity  Industry      Mmt  -0.1934
21  2021-09-23  Equity     Metal      Car   0.5123
22  2021-09-23  Equity     Metal     Fund   0.5412
23  2021-09-23  Equity     Metal      Mmt   0.1212

Interim result:

print(df_sum)

     eff_date  mdl_cd    ast_cd    value  prop_cd
0  2021-09-22    Comm      Agri  -0.3723     Hlds
1  2021-09-22    Comm      Engy   0.5101     Hlds
2  2021-09-22    Comm  Industry  -0.7634     Hlds
3  2021-09-22    Comm     Metal  -0.1277     Hlds
4  2021-09-23  Equity      Agri   0.9790     Hlds
5  2021-09-23  Equity      Engy   0.0101     Hlds
6  2021-09-23  Equity  Industry  -0.2334     Hlds
7  2021-09-23  Equity     Metal   1.1747     Hlds
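
For a version that runs without the clipboard, here is a compact sketch of the same steps on a reduced, hypothetical frame (two asset groups with round values, so the sums are easy to check by eye). The keys variable is introduced only for brevity.

```python
import pandas as pd

# Hypothetical, shortened data using the same column names as the answer.
df = pd.DataFrame({
    "eff_date": ["2021-09-22"] * 4,
    "mdl_cd":   ["Comm"] * 4,
    "ast_cd":   ["Agri", "Agri", "Engy", "Engy"],
    "prop_cd":  ["Car", "Fund", "Car", "Fund"],
    "value":    [1.0, 2.0, 3.0, 4.0],
})

keys = ["eff_date", "mdl_cd", "ast_cd"]

# One subtotal row per group, labelled 'Hlds' in prop_cd.
df_sum = df.groupby(keys, as_index=False)["value"].sum().assign(prop_cd="Hlds")

# Append the subtotal rows, then sort by the group keys. Rows with equal
# keys keep their concatenation order, so each 'Hlds' row ends up last.
df_out = pd.concat([df, df_sum]).sort_values(keys, kind="stable", ignore_index=True)
print(df_out)
```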

