Pandas - dataframe groupby - how to get sum of multiple columns
By using apply
df.groupby(['col1', 'col2'])["col3", "col4"].apply(lambda x : x.astype(int).sum())
Out[1257]:
col3 col4
col1 col2
a c 2 4
d 1 2
b d 1 2
e 2 4
If you want to agg
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
Groupby sum and count on multiple columns in python
It can be done using pivot_table
this way:
>>> df1=pd.pivot_table(df, index=['country','month'],values=['revenue','profit','ebit'],aggfunc=np.sum)
>>> df1
ebit profit revenue
country month
Canada 201411 5 10 15
UK 201410 5 10 20
USA 201409 5 12 19
>>> df2=pd.pivot_table(df, index=['country','month'], values='ID',aggfunc=len).rename('count')
>>> df2
country month
Canada 201411 1
UK 201410 1
USA 201409 2
>>> pd.concat([df1,df2],axis=1)
ebit profit revenue count
country month
Canada 201411 5 10 15 1
UK 201410 5 10 20 1
USA 201409 5 12 19 2
UPDATE
It can be done in one-line using pivot_table
and providing a dict of functions to apply to each column in the aggfunc
argument:
pd.pivot_table(
df,
index=['country','month'],
aggfunc={'revenue': np.sum, 'profit': np.sum, 'ebit': np.sum, 'ID': len}
).rename(columns={'ID': 'count'})
count ebit profit revenue
country month
Canada 201411 1 5 10 15
UK 201410 1 5 10 20
USA 201409 2 5 12 19
Groupby multiple columns & Sum - Create new column with added If Condition
Cause of error
- The syntax to select multiple columns
df['column1', 'column2']
is wrong. This should bedf[['column1', 'column2']]
- Even if you use
df[['column1', 'column2']]
forgroupby
, pandas will raise another error complaining that the grouper should beone dimensional
. This is becausedf[['column1', 'column2']]
returns a dataframe which is a two dimensional object.
How to fix the error?
Hard way:
Pass each of the grouping columns as one dimensional series to groupby
df['new_column'] = (
df['value']
.where(df['value'] > 0)
.groupby([df['column1'], df['column2']]) # Notice the change
.transform('sum')
)
Easy way:
First assign the masked column values to the target column, then do groupby
+ transform
as you would normally do
df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')
Groupby multiple columns and get the sum of two other columns while keeping the first occurrence of every other column
I think that using two separate operations on the groupby object and join them afterwards is clearer than a one-liner. Here is a minimal example, grouping on 1 column:
df = pd.DataFrame(
[
("bird", "Falconiformes", 389.0, 5.5, 1),
("bird", "Psittaciformes", 24.0, 4.5, 2),
("mammal", "Carnivora", 80.2, 33.3, 1),
("mammal", "Primates", np.nan, 33.7, 2),
("mammal", "Carnivora", 58, 23, 3),
],
index=["falcon", "parrot", "lion", "monkey", "leopard"],
columns=("class", "family", "max_speed", "height", "order"),
)
print(df, "\n")
grouped = df.groupby('class')
df_sum = grouped[['max_speed', 'height']].agg(sum)
df_first = grouped['order'].first()
df_out = pd.concat([df_sum, df_first], axis=1)
print(df_out)
Output:
class family max_speed height order
falcon bird Falconiformes 389.0 5.5 1
parrot bird Psittaciformes 24.0 4.5 2
lion mammal Carnivora 80.2 33.3 1
monkey mammal Primates NaN 33.7 2
leopard mammal Carnivora 58.0 23.0 3
max_speed height order
class
bird 413.0 10.0 1
mammal 138.2 90.0 1
Pandas groupby sum multiple columns together
A solution (similar to the comments) would be:
year_months_cols = ["201803", "201804", "201805", "201806"]
df["sum_months"] = df[year_months_cols].sum(axis=1)
df.groupby("District")["sum_months"].agg("sum")
Pandas dataframe, how can I group by multiple columns and apply sum for specific column and add new count column?
You can use a dict of column names and aggregation functions. See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html
>>> df = pd.DataFrame([[1, 2, 3],
... [4, 5, 6],
... [7, 8, 9],
... [np.nan, np.nan, np.nan]],
... columns=['A', 'B', 'C'])
>>> df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
# A B
# max NaN 8.0
# min 1.0 2.0
# sum 12.0 NaN
Related Topics
How to Create a Lag Variable Within Each Group
Data.Table VS Dplyr: Can One Do Something Well the Other Can't or Does Poorly
Convert Data.Frame Columns from Factors to Characters
Complete Dataframe With Missing Combinations of Values
Finding Local Maxima and Minima
Add Column Which Contains Binned Values of a Numeric Column
Controlling Number of Decimal Digits in Print Output in R
Ggplot'S Qplot Does Not Execute on Sourcing
Dictionary Style Replace Multiple Items
Shading a Kernel Density Plot Between Two Points.
Determine Path of the Executing Script
How to Split Data into Training/Testing Sets Using Sample Function
Selecting Data Frame Rows Based on Partial String Match in a Column
Run R Script from Command Line
Dcast Warning: 'Aggregation Function Missing: Defaulting to Length'