Group by Multiple Columns and Sum Other Multiple Columns

Pandas - dataframe groupby - how to get sum of multiple columns

By using apply

df.groupby(['col1', 'col2'])["col3", "col4"].apply(lambda x : x.astype(int).sum())
Out[1257]: 
           col3  col4
col1 col2            
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4

If you want to agg

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})

Groupby sum and count on multiple columns in python

It can be done using pivot_table this way:

>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)
>>> df1 
                ebit  profit  revenue
country month                        
Canada  201411     5      10       15
UK      201410     5      10       20
USA     201409     5      12       19

>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')
>>> df2

country  month 
Canada   201411    1
UK       201410    1
USA      201409    2

>>> pd.concat([df1,df2],axis=1)

                ebit  profit  revenue  count
country month                               
Canada  201411     5      10       15      1
UK      201410     5      10       20      1
USA     201409     5      12       19      2

UPDATE

It can be done in one-line using pivot_table and providing a dict of functions to apply to each column in the aggfunc argument:

pd.pivot_table(
   df,
   index=['country','month'],
   aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}
).rename(columns={'ID': 'count'})

                count  ebit  profit  revenue
country month                               
Canada  201411      1     5      10       15
UK      201410      1     5      10       20
USA     201409      2     5      12       19

Groupby multiple columns & Sum - Create new column with added If Condition

Cause of error

The syntax to select multiple columns df['column1', 'column2'] is wrong. This should be df[['column1', 'column2']]
Even if you use df[['column1', 'column2']] for groupby, pandas will raise another error complaining that the grouper should be one dimensional. This is because df[['column1', 'column2']] returns a dataframe which is a two dimensional object.

How to fix the error?

Hard way:

Pass each of the grouping columns as one dimensional series to groupby

df['new_column'] = (
        df['value']
          .where(df['value'] > 0)
          .groupby([df['column1'], df['column2']]) # Notice the change
          .transform('sum')
)

Easy way:

First assign the masked column values to the target column, then do groupby + transform as you would normally do

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')

Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of every other column

I think that using two separate operations on the groupby object and join them afterwards is clearer than a one-liner. Here is a minimal example, grouping on 1 column:

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0, 5.5, 1),
        ("bird", "Psittaciformes", 24.0, 4.5, 2),
        ("mammal", "Carnivora", 80.2, 33.3, 1),
        ("mammal", "Primates", np.nan, 33.7, 2),
        ("mammal", "Carnivora", 58, 23, 3),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "family", "max_speed", "height", "order"),
)
print(df, "\n")

grouped = df.groupby('class')
df_sum = grouped[['max_speed', 'height']].agg(sum)
df_first = grouped['order'].first()
df_out = pd.concat([df_sum, df_first], axis=1)
print(df_out)

Output:

          class          family  max_speed  height  order
falcon     bird   Falconiformes      389.0     5.5      1
parrot     bird  Psittaciformes       24.0     4.5      2
lion     mammal       Carnivora       80.2    33.3      1
monkey   mammal        Primates        NaN    33.7      2
leopard  mammal       Carnivora       58.0    23.0      3 

        max_speed  height  order
class                           
bird        413.0    10.0      1
mammal      138.2    90.0      1

Pandas groupby sum multiple columns together

A solution (similar to the comments) would be:

year_months_cols = ["201803", "201804", "201805", "201806"]
df["sum_months"] = df[year_months_cols].sum(axis=1)
df.groupby("District")["sum_months"].agg("sum")

Pandas dataframe, how can I group by multiple columns and apply sum for specific column and add new count column?

You can use a dict of column names and aggregation functions. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html

>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9],
...                    [np.nan, np.nan, np.nan]],
...                   columns=['A', 'B', 'C'])
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
#         A    B
# max   NaN  8.0
# min   1.0  2.0
# sum  12.0  NaN