Group by Multiple Columns and Sum Other Multiple Columns

Pandas - dataframe groupby - how to get sum of multiple columns

By using apply

df.groupby(['col1', 'col2'])["col3", "col4"].apply(lambda x : x.astype(int).sum())
Out[1257]:
col3 col4
col1 col2
a c 2 4
d 1 2
b d 1 2
e 2 4

If you want to agg

df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})

Groupby sum and count on multiple columns in python

It can be done using pivot_table this way:

>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)
>>> df1
ebit profit revenue
country month
Canada 201411 5 10 15
UK 201410 5 10 20
USA 201409 5 12 19

>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')
>>> df2

country month
Canada 201411 1
UK 201410 1
USA 201409 2

>>> pd.concat([df1,df2],axis=1)

ebit profit revenue count
country month
Canada 201411 5 10 15 1
UK 201410 5 10 20 1
USA 201409 5 12 19 2

UPDATE

It can be done in one-line using pivot_table and providing a dict of functions to apply to each column in the aggfunc argument:

pd.pivot_table(
df,
index=['country','month'],
aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}
).rename(columns={'ID': 'count'})

count ebit profit revenue
country month
Canada 201411 1 5 10 15
UK 201410 1 5 10 20
USA 201409 2 5 12 19

Groupby multiple columns & Sum - Create new column with added If Condition

Cause of error

  • The syntax to select multiple columns df['column1', 'column2'] is wrong. This should be df[['column1', 'column2']]
  • Even if you use df[['column1', 'column2']] for groupby, pandas will raise another error complaining that the grouper should be one dimensional. This is because df[['column1', 'column2']] returns a dataframe which is a two dimensional object.

How to fix the error?

Hard way:

Pass each of the grouping columns as one dimensional series to groupby

df['new_column'] = (
df['value']
.where(df['value'] > 0)
.groupby([df['column1'], df['column2']]) # Notice the change
.transform('sum')
)
Easy way:

First assign the masked column values to the target column, then do groupby + transform as you would normally do

df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')

Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of every other column

I think that using two separate operations on the groupby object and join them afterwards is clearer than a one-liner. Here is a minimal example, grouping on 1 column:

df = pd.DataFrame(
[
("bird", "Falconiformes", 389.0, 5.5, 1),
("bird", "Psittaciformes", 24.0, 4.5, 2),
("mammal", "Carnivora", 80.2, 33.3, 1),
("mammal", "Primates", np.nan, 33.7, 2),
("mammal", "Carnivora", 58, 23, 3),
],
index=["falcon", "parrot", "lion", "monkey", "leopard"],
columns=("class", "family", "max_speed", "height", "order"),
)
print(df, "\n")

grouped = df.groupby('class')
df_sum = grouped[['max_speed', 'height']].agg(sum)
df_first = grouped['order'].first()
df_out = pd.concat([df_sum, df_first], axis=1)
print(df_out)

Output:

          class          family  max_speed  height  order
falcon bird Falconiformes 389.0 5.5 1
parrot bird Psittaciformes 24.0 4.5 2
lion mammal Carnivora 80.2 33.3 1
monkey mammal Primates NaN 33.7 2
leopard mammal Carnivora 58.0 23.0 3

max_speed height order
class
bird 413.0 10.0 1
mammal 138.2 90.0 1

Pandas groupby sum multiple columns together

A solution (similar to the comments) would be:

year_months_cols = ["201803", "201804", "201805", "201806"]
df["sum_months"] = df[year_months_cols].sum(axis=1)
df.groupby("District")["sum_months"].agg("sum")

Pandas dataframe, how can I group by multiple columns and apply sum for specific column and add new count column?

You can use a dict of column names and aggregation functions. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html

>>> df = pd.DataFrame([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C'])
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
# A B
# max NaN 8.0
# min 1.0 2.0
# sum 12.0 NaN


Related Topics



Leave a reply



Submit