How do I Pandas group-by to get sum?
Use GroupBy.sum:
df.groupby(['Fruit','Name']).sum()
Out[31]:
               Number
Fruit   Name
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
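As a minimal, self-contained sketch of the same call: the sample data below is hypothetical (it is not the data from the question) and only illustrates how rows sharing a (Fruit, Name) pair collapse into one summed row.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration.
df = pd.DataFrame({
    "Fruit": ["Apples", "Apples", "Apples", "Grapes", "Grapes"],
    "Name": ["Bob", "Bob", "Mike", "Bob", "Tom"],
    "Number": [7, 9, 9, 35, 87],
})

# One row per (Fruit, Name) pair, with Number summed within each pair.
totals = df.groupby(["Fruit", "Name"]).sum()
print(totals)

# Selecting the column first returns a Series and keeps other columns out of the sum.
number_totals = df.groupby(["Fruit", "Name"])["Number"].sum()
print(number_totals)
```

Selecting the column before calling sum is usually the safer form when the frame also holds columns that should not be summed.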
Pandas - dataframe groupby - how to get sum of multiple columns
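By using apply (select the columns with a list; tuple-style selection such as ["col3", "col4"] is no longer accepted by recent pandas versions):
df.groupby(['col1', 'col2'])[["col3", "col4"]].apply(lambda x : x.astype(int).sum())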
Out[1257]:
           col3  col4
col1 col2
a    c        2     4
     d        1     2
b    d        1     2
     e        2     4
If you prefer agg:
df.groupby(['col1', 'col2']).agg({'col3':'sum','col4':'sum'})
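The two approaches can be compared side by side. This is a sketch on hypothetical data in which col3 and col4 are stored as strings, which is presumably why the apply version calls astype(int); here the conversion is done once up front instead.

```python
import pandas as pd

# Hypothetical data: col3 and col4 hold numbers stored as strings.
df = pd.DataFrame({
    "col1": ["a", "a", "a", "b"],
    "col2": ["c", "c", "d", "d"],
    "col3": ["1", "1", "1", "1"],
    "col4": ["2", "2", "2", "2"],
})

# Convert once, so that sum adds numbers instead of concatenating strings.
df = df.astype({"col3": int, "col4": int})

# Select several columns with a list (double brackets), then sum.
summed = df.groupby(["col1", "col2"])[["col3", "col4"]].sum()

# The same result through agg with a column-to-function mapping.
via_agg = df.groupby(["col1", "col2"]).agg({"col3": "sum", "col4": "sum"})

print(summed)
```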
Get sum of group subset using pandas groupby
Use groupby with transform('idxmax') to filter the dataframe, then do another round of groupby:
idx = df['Stage'].eq(12).groupby(df['id']).transform('idxmax')
output = df[df.index <= idx].groupby('id')['Value'].sum().reset_index()
Detail: transform with idxmax returns, for every row of a group, the index of the first row in that group where Stage equals 12. Filtering df to the rows whose index is less than or equal to that value keeps the data up to and including the first 12.
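A small worked example may make the filtering step easier to follow. The data is hypothetical, and the sketch assumes a default, sorted integer index so that comparing df.index with idx is meaningful.

```python
import pandas as pd

# Hypothetical data with a default RangeIndex (0..6).
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2],
    "Stage": [10, 12, 12, 13, 11, 12, 14],
    "Value": [5, 3, 2, 7, 1, 4, 6],
})

# For each row, the index label of the first Stage == 12 within its id group.
idx = df["Stage"].eq(12).groupby(df["id"]).transform("idxmax")

# Keep rows up to and including that first 12, then sum Value per id.
output = df[df.index <= idx].groupby("id")["Value"].sum().reset_index()
print(output)
```

Note that for a group containing no 12 at all, idxmax falls back to the first row of the group, so only that first row would be kept; such groups may need separate handling.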
Pandas Groupby and Sum Only One Column
The only way to do this would be to include C in your groupby (the groupby function can accept a list).
Give this a try:
df.groupby(['A','C'])['B'].sum()
One other thing to note: if you need to work with df after the aggregation, you can also use the as_index=False option to return a dataframe object. This one gave me problems when I was first working with Pandas. Example:
df.groupby(['A','C'], as_index=False)['B'].sum()
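To see the difference between the two return types, here is a short sketch on hypothetical data using the same column names A, B and C.

```python
import pandas as pd

# Hypothetical data using the column names from the answer.
df = pd.DataFrame({
    "A": ["x", "x", "y", "y"],
    "C": ["p", "p", "q", "r"],
    "B": [1, 2, 3, 4],
})

# Default: a Series indexed by the (A, C) group keys.
as_series = df.groupby(["A", "C"])["B"].sum()

# as_index=False: a DataFrame with A and C kept as ordinary columns.
as_frame = df.groupby(["A", "C"], as_index=False)["B"].sum()

print(as_series)
print(as_frame)
```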
Getting % Rate using Pandas Group By and .sum()
You can group the dataframe on Year and aggregate using sum:
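# numeric_only=True keeps the non-numeric Ind column out of the sums (needed on pandas 2.x)
s1 = df.groupby('Year').sum(numeric_only=True)
s2 = df.query("Ind == 'A'").groupby('Year').sum(numeric_only=True)
s2.div(s1).round(2).add_suffix('Rate')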
      XRate  YRate  ZRate
Year
2011   0.20   0.29   0.33
2012   0.47   0.62   0.25
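The arithmetic behind the rate can be checked on a tiny hypothetical frame. The column names Year and Ind follow the answer; X and Y and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: two years, two indicator values, two numeric columns.
df = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "Ind":  ["A", "B", "A", "B"],
    "X":    [1, 4, 2, 2],
    "Y":    [2, 2, 3, 1],
})

# Totals per year over all rows.
s1 = df.groupby("Year").sum(numeric_only=True)

# Totals per year over the rows where Ind == 'A' only.
s2 = df.query("Ind == 'A'").groupby("Year").sum(numeric_only=True)

# Share of the 'A' rows in each yearly total, e.g. X for 2011 is 1 / (1 + 4).
rates = s2.div(s1).round(2).add_suffix("Rate")
print(rates)
```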
Pandas groupby.sum for all columns
You can filter the columns first and then pass df['group'] instead of 'group' to groupby, and finally add the sum column with DataFrame.assign:
df1 = (df.filter(regex=r'_name$')
         .groupby(df['group']).sum()
         .assign(sum = lambda x: x.sum(axis=1)))
Alternatively, filter the column names and select them after groupby:
cols = df.filter(regex=r'_name$').columns
df1 = df.groupby('group')[cols].sum()
Or:
cols = df.columns[df.columns.str.contains(r'_name$')]
df1 = df.groupby('group')[cols].sum().assign(sum = lambda x: x.sum(axis=1))
print (df1)
       a_name  b_name  q_name  sum
group
a           7      13      10   30
b          10       6      10   26
c          10       2       5   17
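A runnable sketch of the column-selection variant follows. The frame is hypothetical; the extra column named other is an assumption added only to show that columns not ending in _name are left out.

```python
import pandas as pd

# Hypothetical data; "other" does not match the _name$ pattern.
df = pd.DataFrame({
    "group":  ["a", "a", "b"],
    "a_name": [3, 4, 10],
    "b_name": [6, 7, 6],
    "other":  [100, 200, 300],
})

# Column labels that end with "_name".
cols = df.filter(regex=r"_name$").columns

# Sum those columns per group, then add a row-wise total called "sum".
df1 = df.groupby("group")[cols].sum().assign(sum=lambda x: x.sum(axis=1))
print(df1)
```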
Pandas dataframe Groupby with Min,Max and Sum
print(
    df.groupby("CID", as_index=False).agg(
        {"priority": "min", "Ind": "max", "amount": "sum"}
    )
)
Prints:
    CID  priority  Ind  amount
0  C100         1    1     150
1  C300         3    0     650
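For reference, a self-contained sketch on hypothetical rows that produce the same totals. It also shows pandas named aggregation as an optional variant; the output column names min_priority, max_ind and total_amount are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Hypothetical input rows chosen to give the totals shown above.
df = pd.DataFrame({
    "CID":      ["C100", "C100", "C300"],
    "priority": [2, 1, 3],
    "Ind":      [0, 1, 0],
    "amount":   [100, 50, 650],
})

# A dict maps each column to its own aggregation function.
result = df.groupby("CID", as_index=False).agg(
    {"priority": "min", "Ind": "max", "amount": "sum"}
)

# Named aggregation: same computation, with explicit output column names.
named = df.groupby("CID", as_index=False).agg(
    min_priority=("priority", "min"),
    max_ind=("Ind", "max"),
    total_amount=("amount", "sum"),
)

print(result)
print(named)
```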
Groupby multiple columns & Sum - Create new column with added If Condition
Cause of error
- The syntax used to select multiple columns, df['column1', 'column2'], is wrong. It should be df[['column1', 'column2']].
- Even if you pass df[['column1', 'column2']] to groupby, pandas will raise another error complaining that the grouper should be one-dimensional. This is because df[['column1', 'column2']] returns a dataframe, which is a two-dimensional object.
How to fix the error?
Hard way: pass each of the grouping columns as a one-dimensional series to groupby
df['new_column'] = (
    df['value']
    .where(df['value'] > 0)
    .groupby([df['column1'], df['column2']]) # Notice the change
    .transform('sum')
)
Easy way: first assign the masked column values to the target column, then do groupby + transform as you would normally do
df['new_column'] = df['value'].where(df['value'] > 0)
df['new_column'] = df.groupby(['column1', 'column2'])['new_column'].transform('sum')
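Both variants should give the same result. The sketch below checks this on hypothetical data with a mix of positive and negative values.

```python
import pandas as pd

# Hypothetical data: negative values must not contribute to the group sum.
df = pd.DataFrame({
    "column1": ["a", "a", "a", "b"],
    "column2": ["x", "x", "x", "y"],
    "value":   [5, -2, 3, -1],
})

# Hard way: mask first, then group the masked Series by one-dimensional keys.
hard = (
    df["value"]
    .where(df["value"] > 0)
    .groupby([df["column1"], df["column2"]])
    .transform("sum")
)

# Easy way: store the masked values in the frame, then group by column names.
df["new_column"] = df["value"].where(df["value"] > 0)
df["new_column"] = df.groupby(["column1", "column2"])["new_column"].transform("sum")

print(df)
```

Values that fail the condition become NaN and are skipped by sum, so a group with no positive values ends up with 0.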
How to add a new row to each group of a groupby in pandas, where one value of that row is the sum of the group's values
You can create a dataframe with the sum of each group using .groupby() and .sum(), and set prop_cd to Hlds with .assign(). Then concatenate it with the original dataframe using pd.concat() and sort with .sort_values() to put the sum rows back together with their respective groups, as follows:
df_sum = df.groupby(['eff_date','mdl_cd','ast_cd'], as_index=False)['value'].sum().assign(prop_cd='Hlds')
df_out = pd.concat([df, df_sum]).sort_values(['eff_date','mdl_cd','ast_cd'], kind='stable', ignore_index=True)
Result:
print(df_out)
eff_date mdl_cd ast_cd prop_cd value
0 2021-09-22 Comm Agri Car -0.1234
1 2021-09-22 Comm Agri Fund 0.5123
2 2021-09-22 Comm Agri Mmt -0.7612
3 2021-09-22 Comm Agri Hlds -0.3723
4 2021-09-22 Comm Engy Car 0.1212
5 2021-09-22 Comm Engy Fund -0.1234
6 2021-09-22 Comm Engy Mmt 0.5123
7 2021-09-22 Comm Engy Hlds 0.5101
8 2021-09-22 Comm Industry Car -0.7612
9 2021-09-22 Comm Industry Fund 0.1212
10 2021-09-22 Comm Industry Mmt -0.1234
11 2021-09-22 Comm Industry Hlds -0.7634
12 2021-09-22 Comm Metal Car 0.5123
13 2021-09-22 Comm Metal Fund -0.7612
14 2021-09-22 Comm Metal Mmt 0.1212
15 2021-09-22 Comm Metal Hlds -0.1277
16 2021-09-23 Equity Agri Car 0.6541
17 2021-09-23 Equity Agri Fund 0.5123
18 2021-09-23 Equity Agri Mmt -0.1874
19 2021-09-23 Equity Agri Hlds 0.9790
20 2021-09-23 Equity Engy Car 0.1212
21 2021-09-23 Equity Engy Fund -0.6234
22 2021-09-23 Equity Engy Mmt 0.5123
23 2021-09-23 Equity Engy Hlds 0.0101
24 2021-09-23 Equity Industry Car -0.1612
25 2021-09-23 Equity Industry Fund 0.1212
26 2021-09-23 Equity Industry Mmt -0.1934
27 2021-09-23 Equity Industry Hlds -0.2334
28 2021-09-23 Equity Metal Car 0.5123
29 2021-09-23 Equity Metal Fund 0.5412
30 2021-09-23 Equity Metal Mmt 0.1212
31 2021-09-23 Equity Metal Hlds 1.1747
Setup
df = pd.read_clipboard(',')
eff_date mdl_cd ast_cd prop_cd value
0 2021-09-22 Comm Agri Car -0.1234
1 2021-09-22 Comm Agri Fund 0.5123
2 2021-09-22 Comm Agri Mmt -0.7612
3 2021-09-22 Comm Engy Car 0.1212
4 2021-09-22 Comm Engy Fund -0.1234
5 2021-09-22 Comm Engy Mmt 0.5123
6 2021-09-22 Comm Industry Car -0.7612
7 2021-09-22 Comm Industry Fund 0.1212
8 2021-09-22 Comm Industry Mmt -0.1234
9 2021-09-22 Comm Metal Car 0.5123
10 2021-09-22 Comm Metal Fund -0.7612
11 2021-09-22 Comm Metal Mmt 0.1212
12 2021-09-23 Equity Agri Car 0.6541
13 2021-09-23 Equity Agri Fund 0.5123
14 2021-09-23 Equity Agri Mmt -0.1874
15 2021-09-23 Equity Engy Car 0.1212
16 2021-09-23 Equity Engy Fund -0.6234
17 2021-09-23 Equity Engy Mmt 0.5123
18 2021-09-23 Equity Industry Car -0.1612
19 2021-09-23 Equity Industry Fund 0.1212
20 2021-09-23 Equity Industry Mmt -0.1934
21 2021-09-23 Equity Metal Car 0.5123
22 2021-09-23 Equity Metal Fund 0.5412
23 2021-09-23 Equity Metal Mmt 0.1212
Interim result:
print(df_sum)
eff_date mdl_cd ast_cd value prop_cd
0 2021-09-22 Comm Agri -0.3723 Hlds
1 2021-09-22 Comm Engy 0.5101 Hlds
2 2021-09-22 Comm Industry -0.7634 Hlds
3 2021-09-22 Comm Metal -0.1277 Hlds
4 2021-09-23 Equity Agri 0.9790 Hlds
5 2021-09-23 Equity Engy 0.0101 Hlds
6 2021-09-23 Equity Industry -0.2334 Hlds
7 2021-09-23 Equity Metal 1.1747 Hlds
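The whole pipeline can also be tried without the clipboard. This is a reduced sketch with hypothetical values that keeps the column names from the answer; the keys list is only a convenience variable introduced here.

```python
import pandas as pd

# Hypothetical, shortened data with the same columns as the example above.
df = pd.DataFrame({
    "eff_date": ["2021-09-22"] * 4,
    "mdl_cd":   ["Comm"] * 4,
    "ast_cd":   ["Agri", "Agri", "Engy", "Engy"],
    "prop_cd":  ["Car", "Fund", "Car", "Fund"],
    "value":    [1.0, 2.0, 3.0, 4.0],
})

keys = ["eff_date", "mdl_cd", "ast_cd"]

# One summary row per group, labelled "Hlds" in prop_cd.
df_sum = df.groupby(keys, as_index=False)["value"].sum().assign(prop_cd="Hlds")

# Append the summary rows; the stable sort places each one after its group.
df_out = pd.concat([df, df_sum]).sort_values(keys, kind="stable", ignore_index=True)
print(df_out)
```

Because the summary rows are appended after the original rows and the sort preserves the existing order within equal keys, each Hlds row lands at the end of its group.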