How to Pandas Group-By to Get Sum

How do I Pandas group-by to get sum?

Use GroupBy.sum:

df.groupby(['Fruit','Name']).sum()

Out[31]: 
               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
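Here is a self-contained sketch of the same call on a small hypothetical DataFrame. The rows are made up; only the Fruit/Name/Number column names come from the output above:

```python
import pandas as pd

# Hypothetical rows using the column names from the output above
df = pd.DataFrame({
    "Fruit": ["Apples", "Apples", "Apples", "Grapes", "Grapes"],
    "Name": ["Bob", "Bob", "Mike", "Bob", "Tom"],
    "Number": [10, 6, 9, 35, 87],
})

# Group on both key columns; the remaining numeric column is summed per (Fruit, Name) pair
totals = df.groupby(["Fruit", "Name"]).sum()
print(totals)
```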

Pandas - dataframe groupby - how to get sum of multiple columns

By using apply (note the double brackets: several columns must be selected with a list, not a tuple):

df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())
Out[1257]: 
           col3  col4
col1 col2            
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to use agg instead (this assumes col3 and col4 are already numeric):

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
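The following runnable sketch compares both approaches on hypothetical data. It assumes col3 and col4 hold numeric strings, which is one reason the apply version might need astype(int). The agg version casts up front so that 'sum' adds numbers instead of concatenating strings:

```python
import pandas as pd

# Hypothetical data: col3/col4 stored as strings (an assumption for illustration)
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b", "b", "b"],
    "col2": ["c", "c", "d", "d", "e", "e"],
    "col3": ["1", "1", "1", "1", "1", "1"],
    "col4": ["2", "2", "2", "2", "2", "2"],
})

# apply: select the columns with a list, cast each group to int, then sum
via_apply = df.groupby(["col1", "col2"])[["col3", "col4"]].apply(lambda x: x.astype(int).sum())

# agg: cast first, then map each column to its aggregation
via_agg = (
    df.astype({"col3": int, "col4": int})
      .groupby(["col1", "col2"])
      .agg({"col3": "sum", "col4": "sum"})
)
print(via_agg)
```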

Get sum of group subset using pandas groupby

Try using groupby with transform('idxmax') to filter the DataFrame, then do another round of groupby:

idx = df['Stage'].eq(12).groupby(df['id']).transform('idxmax')
output = df[df.index <= idx].groupby('id')['Value'].sum().reset_index()

Detail

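The transform with idxmax returns, for every row, the index label of the first row in its id group where Stage equals 12. Filtering to rows whose index is less than or equal to that label keeps each group's data up to and including the first 12. This relies on the index increasing in row order (for example, the default RangeIndex). For a group that contains no 12 at all, idxmax returns that group's first label, so only its first row is kept.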
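This small sketch uses hypothetical data. The id/Stage/Value column names come from the answer; the values are made up. It keeps the default RangeIndex that the index filter depends on:

```python
import pandas as pd

# Hypothetical data with the default RangeIndex (the filter relies on index order)
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "Stage": [5, 12, 7, 12, 3, 4, 12],
    "Value": [1, 2, 4, 8, 10, 20, 40],
})

# For every row: index label of the first row in its id group where Stage == 12
idx = df["Stage"].eq(12).groupby(df["id"]).transform("idxmax")

# Keep rows up to and including that first 12, then sum Value per id
output = df[df.index <= idx].groupby("id")["Value"].sum().reset_index()
print(output)
```

For id 1, only rows 0 and 1 survive the filter. For id 2, all three rows survive because its only 12 is the last row.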

Pandas Groupby and Sum Only One Column

The only way to do this would be to include C in your groupby (the groupby function can accept a list).

Give this a try:

df.groupby(['A','C'])['B'].sum()

One other thing to note: if you need to keep working with the result as a DataFrame after the aggregation, use the as_index=False option. It keeps the grouping columns as regular columns instead of turning them into the index. This one gave me problems when I was first working with pandas. Example:

df.groupby(['A','C'], as_index=False)['B'].sum()
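This sketch uses hypothetical data to show the difference in return type. Column D is an assumed extra column that is simply left out of the sum:

```python
import pandas as pd

# Hypothetical data: A and C are keys, B is summed, D is not wanted in the result
df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "B": [1, 2, 3, 4],
    "C": ["p", "p", "q", "r"],
    "D": [10, 20, 30, 40],
})

# Default: A and C form a MultiIndex and the result is a Series of B sums
as_series = df.groupby(["A", "C"])["B"].sum()

# as_index=False: A and C stay ordinary columns and the result is a DataFrame
as_frame = df.groupby(["A", "C"], as_index=False)["B"].sum()
print(as_frame)
```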

Getting % Rate using Pandas Group By and .sum()

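Group the DataFrame on Year and aggregate with sum twice, once over all rows and once over the rows where Ind is 'A', then divide the two. Passing numeric_only=True keeps the text column Ind out of the sum (recent pandas versions would otherwise concatenate its strings and the division would fail):

s1 = df.groupby('Year').sum(numeric_only=True)
s2 = df.query("Ind == 'A'").groupby('Year').sum(numeric_only=True)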

s2.div(s1).round(2).add_suffix('Rate')

      XRate  YRate  ZRate
Year                     
2011   0.20   0.29   0.33
2012   0.47   0.62   0.25
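Here is a self-contained sketch with made-up numbers. It assumes numeric columns X, Y and Z (inferred from the XRate/YRate/ZRate output) plus the text column Ind:

```python
import pandas as pd

# Hypothetical data: one row per (Year, Ind) with numeric X/Y/Z columns
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "Ind":  ["A", "B", "A", "B"],
    "X":    [1, 4, 3, 1],
    "Y":    [2, 2, 1, 3],
    "Z":    [1, 3, 2, 2],
})

# Yearly totals, and yearly totals restricted to Ind == 'A' (numeric columns only)
s1 = df.groupby("Year").sum(numeric_only=True)
s2 = df.query("Ind == 'A'").groupby("Year").sum(numeric_only=True)

# Share of each yearly total that comes from Ind == 'A'
rates = s2.div(s1).round(2).add_suffix("Rate")
print(rates)
```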

Pandas groupby.sum for all columns

You can filter the columns first and then pass df['group'] (instead of the column name 'group') to groupby. Finally, add a sum column with DataFrame.assign:

df1 = (df.filter(regex=r'_name$')
         .groupby(df['group']).sum()
         .assign(sum = lambda x: x.sum(axis=1)))

An alternative is to filter the column names first and select them after groupby:

cols = df.filter(regex=r'_name$').columns

df1 = df.groupby('group')[cols].sum()

Or:

cols = df.columns[df.columns.str.contains(r'_name$')]

df1 = df.groupby('group')[cols].sum().assign(sum = lambda x: x.sum(axis=1))

print (df1)
       a_name  b_name  q_name  sum
group                             
a           7      13      10   30
b          10       6      10   26
c          10       2       5   17
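This minimal sketch of the column-name variant uses hypothetical data. The 'other' column is an assumed extra column that the _name$ filter excludes:

```python
import pandas as pd

# Hypothetical data: a 'group' key, two *_name value columns, and one column to ignore
df = pd.DataFrame({
    "group":  ["a", "a", "b"],
    "a_name": [3, 4, 10],
    "b_name": [1, 2, 6],
    "other":  [100, 200, 300],
})

# Columns whose names end in '_name'
cols = df.filter(regex=r"_name$").columns

# Sum the selected columns per group, then add a row-wise total column
df1 = df.groupby("group")[cols].sum().assign(sum=lambda x: x.sum(axis=1))
print(df1)
```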

Pandas dataframe Groupby with Min,Max and Sum

print(
    df.groupby("CID", as_index=False).agg(
        {"priority": "min", "Ind": "max", "amount": "sum"}
    )
)

Prints:

    CID  priority  Ind  amount
0  C100         1    1     150
1  C300         3    0     650
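The dict passed to agg maps each column to its own aggregation. The following runnable sketch uses hypothetical rows chosen to reproduce the printed result. It also shows pandas' named-aggregation syntax, which should produce the same frame:

```python
import pandas as pd

# Hypothetical rows chosen so the result matches the printed output above
df = pd.DataFrame({
    "CID":      ["C100", "C100", "C300", "C300"],
    "priority": [1, 2, 3, 4],
    "Ind":      [0, 1, 0, 0],
    "amount":   [100, 50, 300, 350],
})

# Dict form: lowest priority, highest Ind, total amount per CID
out = df.groupby("CID", as_index=False).agg(
    {"priority": "min", "Ind": "max", "amount": "sum"}
)

# Named aggregation: output_name=(input_column, aggregation)
named = df.groupby("CID", as_index=False).agg(
    priority=("priority", "min"), Ind=("Ind", "max"), amount=("amount", "sum")
)
print(out)
```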

Groupby multiple columns & Sum - Create new column with added If Condition

Cause of error

The syntax df['column1', 'column2'] for selecting multiple columns is wrong; it should be df[['column1', 'column2']].
Even if you pass df[['column1', 'column2']] to groupby, pandas will raise another error complaining that the grouper is not one-dimensional. This is because df[['column1', 'column2']] returns a DataFrame, which is a two-dimensional object.

How to fix the error?

Hard way:

Pass each grouping column to groupby as a separate one-dimensional Series:

df['new_column'] = (
        df['value']
          .where(df['value'] > 0)
          .groupby([df['column1'], df['column2']]) # Notice the change
          .transform('sum')
)

Easy way:

First assign the masked values to the target column, then do groupby + transform as you normally would:

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')
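This sketch uses hypothetical data to check that both fixes produce the same column. The masked values become NaN, which sum skips:

```python
import pandas as pd

# Hypothetical data with a mix of positive and non-positive values
df = pd.DataFrame({
    "column1": ["a", "a", "a", "b", "b"],
    "column2": ["x", "x", "y", "y", "y"],
    "value":   [5, -3, 2, 4, 0],
})

# Hard way: pass each grouping column as its own 1-D Series
hard = (
    df["value"]
      .where(df["value"] > 0)                   # non-positive values become NaN
      .groupby([df["column1"], df["column2"]])
      .transform("sum")                         # result is aligned with df's rows
)

# Easy way: store the masked values in the frame, then group by column names
df["new_column"] = df["value"].where(df["value"] > 0)
df["new_column"] = df.groupby(["column1", "column2"])["new_column"].transform("sum")
print(df)
```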

How to add a new row to each group of a groupby in pandas, where one value of that row is the sum of the group's values

You can create a DataFrame with the sum of each group using .groupby() and .sum(), and set prop_cd to Hlds with .assign().

Then concatenate it with the original DataFrame using pd.concat() and sort the rows on the group keys with .sort_values(), so the sum rows land next to their respective groups. The stable sort keeps each group's original rows ahead of its sum row:

df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')

df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)

Result:

print(df_out)

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Agri    Hlds -0.3723
4   2021-09-22    Comm      Engy     Car  0.1212
5   2021-09-22    Comm      Engy    Fund -0.1234
6   2021-09-22    Comm      Engy     Mmt  0.5123
7   2021-09-22    Comm      Engy    Hlds  0.5101
8   2021-09-22    Comm  Industry     Car -0.7612
9   2021-09-22    Comm  Industry    Fund  0.1212
10  2021-09-22    Comm  Industry     Mmt -0.1234
11  2021-09-22    Comm  Industry    Hlds -0.7634
12  2021-09-22    Comm     Metal     Car  0.5123
13  2021-09-22    Comm     Metal    Fund -0.7612
14  2021-09-22    Comm     Metal     Mmt  0.1212
15  2021-09-22    Comm     Metal    Hlds -0.1277
16  2021-09-23  Equity      Agri     Car  0.6541
17  2021-09-23  Equity      Agri    Fund  0.5123
18  2021-09-23  Equity      Agri     Mmt -0.1874
19  2021-09-23  Equity      Agri    Hlds  0.9790
20  2021-09-23  Equity      Engy     Car  0.1212
21  2021-09-23  Equity      Engy    Fund -0.6234
22  2021-09-23  Equity      Engy     Mmt  0.5123
23  2021-09-23  Equity      Engy    Hlds  0.0101
24  2021-09-23  Equity  Industry     Car -0.1612
25  2021-09-23  Equity  Industry    Fund  0.1212
26  2021-09-23  Equity  Industry     Mmt -0.1934
27  2021-09-23  Equity  Industry    Hlds -0.2334
28  2021-09-23  Equity     Metal     Car  0.5123
29  2021-09-23  Equity     Metal    Fund  0.5412
30  2021-09-23  Equity     Metal     Mmt  0.1212
31  2021-09-23  Equity     Metal    Hlds  1.1747

Setup

df = pd.read_clipboard(sep=',')

      eff_date  mdl_cd    ast_cd prop_cd   value
0   2021-09-22    Comm      Agri     Car -0.1234
1   2021-09-22    Comm      Agri    Fund  0.5123
2   2021-09-22    Comm      Agri     Mmt -0.7612
3   2021-09-22    Comm      Engy     Car  0.1212
4   2021-09-22    Comm      Engy    Fund -0.1234
5   2021-09-22    Comm      Engy     Mmt  0.5123
6   2021-09-22    Comm  Industry     Car -0.7612
7   2021-09-22    Comm  Industry    Fund  0.1212
8   2021-09-22    Comm  Industry     Mmt -0.1234
9   2021-09-22    Comm     Metal     Car  0.5123
10  2021-09-22    Comm     Metal    Fund -0.7612
11  2021-09-22    Comm     Metal     Mmt  0.1212
12  2021-09-23  Equity      Agri     Car  0.6541
13  2021-09-23  Equity      Agri    Fund  0.5123
14  2021-09-23  Equity      Agri     Mmt -0.1874
15  2021-09-23  Equity      Engy     Car  0.1212
16  2021-09-23  Equity      Engy    Fund -0.6234
17  2021-09-23  Equity      Engy     Mmt  0.5123
18  2021-09-23  Equity  Industry     Car -0.1612
19  2021-09-23  Equity  Industry    Fund  0.1212
20  2021-09-23  Equity  Industry     Mmt -0.1934
21  2021-09-23  Equity     Metal     Car  0.5123
22  2021-09-23  Equity     Metal    Fund  0.5412
23  2021-09-23  Equity     Metal     Mmt  0.1212

Interim result:

print(df_sum)

     eff_date  mdl_cd    ast_cd   value prop_cd
0  2021-09-22    Comm      Agri -0.3723    Hlds
1  2021-09-22    Comm      Engy  0.5101    Hlds
2  2021-09-22    Comm  Industry -0.7634    Hlds
3  2021-09-22    Comm     Metal -0.1277    Hlds
4  2021-09-23  Equity      Agri  0.9790    Hlds
5  2021-09-23  Equity      Engy  0.0101    Hlds
6  2021-09-23  Equity  Industry -0.2334    Hlds
7  2021-09-23  Equity     Metal  1.1747    Hlds
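Here is a reduced, runnable version of the approach. It uses the first six rows of the Setup data, typed in directly instead of read from the clipboard:

```python
import pandas as pd

# First six rows of the Setup data above
df = pd.DataFrame({
    "eff_date": ["2021-09-22"] * 6,
    "mdl_cd":   ["Comm"] * 6,
    "ast_cd":   ["Agri", "Agri", "Agri", "Engy", "Engy", "Engy"],
    "prop_cd":  ["Car", "Fund", "Mmt", "Car", "Fund", "Mmt"],
    "value":    [-0.1234, 0.5123, -0.7612, 0.1212, -0.1234, 0.5123],
})

keys = ["eff_date", "mdl_cd", "ast_cd"]

# One 'Hlds' row per group holding the sum of that group's values
df_sum = df.groupby(keys, as_index=False)["value"].sum().assign(prop_cd="Hlds")

# Append the sum rows and sort on the keys; the stable sort keeps original rows before each sum row
df_out = pd.concat([df, df_sum]).sort_values(keys, kind="stable", ignore_index=True)
print(df_out)
```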