Pandas percentage of total with groupby
Update 2022-03
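This answer by caner, which uses transform, looks much better than my original answer:
df['sales'] / df.groupby('state')['sales'].transform('sum')
This gives each row's share of its state total as a fraction; multiply by 100 for a percentage.
Thanks to this comment by Paul Rougieux for surfacing it.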
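As a minimal, self-contained sketch of the transform approach (the toy data and the pct_of_state column name are made up for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical toy data, not the data from the original question.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA"],
    "office_id": [1, 2, 1, 2],
    "sales": [100, 300, 50, 150],
})

# transform('sum') returns one value per row (the total of that row's state),
# so the division lines up row by row with the original frame.
state_total = df.groupby("state")["sales"].transform("sum")
df["pct_of_state"] = 100 * df["sales"] / state_total
print(df)
```

Because the result has the same index as df, it can be assigned directly as a new column without a second merge or join.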
Original Answer (2014)
Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way: just groupby the state_office and divide the sales column by its sum. Copying the beginning of Paul H's answer:
# From Paul H
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: groupby state_office and divide by sum
# group_keys=False keeps the original (state, office_id) index on recent pandas versions
state_pcts = state_office.groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / x.sum())
Returns:
                     sales
state office_id
AZ    2          16.981365
      4          19.250033
      6          63.768601
CA    1          19.331879
      3          33.858747
      5          46.809373
CO    1          36.851857
      3          19.874290
      5          43.273852
WA    2          34.707233
      4          35.511259
      6          29.781508
pandas groupby to calculate percentage of groupby columns
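You can aggregate per location first, then compute the rate from the aggregated columns:
# String names ('sum', 'max') avoid the warning recent pandas versions raise for the builtins
df = df.groupby(['location']).agg({'new_deaths': 'sum', 'population': 'max'})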
df['rate_death'] = df['new_deaths'] / df['population'] * 100
Result
             new_deaths  population  rate_death
location
Afghanistan          15    38928341    0.000039
Albania               1     2877800    0.000035
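The same idea can be written with named aggregation, which some readers find easier to scan. This is only a sketch: the locations "A" and "B" and all numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical daily rows; the values are made up.
df = pd.DataFrame({
    "location": ["A", "A", "B", "B"],
    "new_deaths": [2, 3, 1, 0],
    "population": [1_000_000, 1_000_000, 500_000, 500_000],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby("location").agg(
    new_deaths=("new_deaths", "sum"),
    population=("population", "max"),
)

# Deaths as a percentage of the population, per location
summary["rate_death"] = summary["new_deaths"] / summary["population"] * 100
print(summary)
```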
Percentage of Total with Groupby for two columns
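You can chain groupby: sum per (Product, Type) first, then group the result by Product and divide each row by its product total.
pct = lambda x: 100 * x / x.sum()
# group_keys=False keeps 'Product' from being added to the index a second time on recent pandas versions
out = df.groupby(['Product', 'Type']).sum().groupby('Product', group_keys=False).apply(pct)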
print(out)
# Output
                  Sales        Qty
Product Type
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
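A runnable sketch of the chained approach, using invented data (the rows below are not the asker's data, so the numbers differ from the output above). group_keys=False is passed so the group label is not prepended as an extra index level.

```python
import pandas as pd

# Hypothetical data with two numeric columns to convert to percentages.
df = pd.DataFrame({
    "Product": ["AA", "AA", "AA", "BB", "BB"],
    "Type": ["AC", "AD", "AD", "BC", "BD"],
    "Sales": [30, 20, 50, 40, 60],
    "Qty": [1, 2, 1, 3, 1],
})

def pct(x):
    # x is the block of rows for one product; x.sum() is its column totals
    return 100 * x / x.sum()

# First groupby: totals per (Product, Type). Second groupby: share within Product.
sums = df.groupby(["Product", "Type"]).sum()
out = sums.groupby("Product", group_keys=False).apply(pct)
print(out)
```

Each product's rows sum to 100 in every column, which is a quick way to check the result.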
How to calculate count and percentage in groupby in Python
I think you can use:
P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()
Sample:
Publisher = pd.DataFrame({'Category': ['a', 'a', 's'],
                          'Title': [4, 5, 6]})
print(Publisher)
  Category  Title
0        a      4
1        a      5
2        s      6
P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()
print(P)
  Category  Title  Percentage
0        a      2   66.666667
1        s      1   33.333333
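An alternative sketch, assuming only the number of rows per category matters: value_counts gives the counts, and value_counts(normalize=True) gives the matching proportions. The Count column name is chosen here for illustration.

```python
import pandas as pd

Publisher = pd.DataFrame({"Category": ["a", "a", "s"],
                          "Title": [4, 5, 6]})

# Rows per category, and the same counts as a share of all rows
counts = Publisher["Category"].value_counts()
shares = Publisher["Category"].value_counts(normalize=True)

# Both Series are indexed by category, so they align when combined
P = pd.DataFrame({"Count": counts, "Percentage": 100 * shares})
print(P)
```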
Pandas group by column find percentage of count in each group
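You can calculate each count as a percentage of its age group using transform with a lambda. transform keeps the original row index, so the result can be assigned straight back as a column (on recent pandas versions, apply prepends the group key to the index, which no longer lines up with df):
df['perc'] = df.groupby('age')['count'].transform(lambda x: x*100/x.sum())
     age section  count       perc
0  13-17       a    160  55.555556
1  25-34       c    128  35.555556
2  13-17       d    128  44.444444
3  25-34       a    120  33.333333
4  35-44       b    120  50.000000
5  35-44       a    120  50.000000
6  25-34       b    112  31.111111
If you want to round the percentage values:
df['perc'] = df.groupby('age')['count'].transform(lambda x: np.round(x*100/x.sum(), 2))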
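For reference, a self-contained sketch that rebuilds the frame from the table above and computes the rounded percentage without a lambda. The group_total name is introduced here for illustration.

```python
import pandas as pd

# Data copied from the table shown above
df = pd.DataFrame({
    "age": ["13-17", "25-34", "13-17", "25-34", "35-44", "35-44", "25-34"],
    "section": ["a", "c", "d", "a", "b", "a", "b"],
    "count": [160, 128, 128, 120, 120, 120, 112],
})

# Total count of each row's age group, aligned with the original rows
group_total = df.groupby("age")["count"].transform("sum")

# Share of the age group as a percentage, rounded to two decimals
df["perc"] = (100 * df["count"] / group_total).round(2)
print(df)
```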
Pandas percentage of total with groupby with more than one column
You can use groupby().transform() to keep the original index:
state_office.div(state_office.groupby(level=0).transform('sum')).mul(100)
Output:
                     sales      units
state office_id
AZ    2          16.981365  31.059160
      4          19.250033  23.664122
      6          63.768601  45.276718
CA    1          19.331879  22.049287
      3          33.858747  24.254215
      5          46.809373  53.696498
CO    1          36.851857  29.506546
      3          19.874290  35.246727
      5          43.273852  35.246727
WA    2          34.707233  34.645669
      4          35.511259  16.596002
      6          29.781508  48.758328
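To see why transform matters here, the sketch below compares sum() and transform('sum') on a small hypothetical state_office frame (the values are invented). Only the transform result shares the original index, which is what lets div work row by row.

```python
import pandas as pd

# Hypothetical frame indexed by (state, office_id) with two value columns
state_office = pd.DataFrame(
    {"sales": [10, 30, 60, 25, 75], "units": [1, 1, 2, 3, 1]},
    index=pd.MultiIndex.from_tuples(
        [("AZ", 2), ("AZ", 4), ("AZ", 6), ("CA", 1), ("CA", 3)],
        names=["state", "office_id"],
    ),
)

per_state = state_office.groupby(level=0)
collapsed = per_state.sum()             # one row per state
broadcast = per_state.transform("sum")  # one row per office, same index as state_office

pcts = state_office.div(broadcast).mul(100)
print(pcts)
```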
Pandas: percentage of a value relative to the total of the group
You need a simple groupby.transform('sum') to get the total per group, then perform classical vector arithmetic.
I provided an example as float and one as string:
total = df.groupby('Range')['Quantity'].transform('sum')
# as float
df['% of range'] = df['Quantity'].div(total)
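# as string (round first: astype(int) truncates, so floating-point error could drop a percentage by one)
df['% of range (str)'] = df['Quantity'].div(total).mul(100).round().astype(int).astype(str) + ' %'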
output:
   id Product Range  Quantity  % of range % of range (str)
0   1   Prod1     A         6         0.6             60 %
1   2   Prod2     A         4         0.4             40 %
2   3   Prod3     B         2         0.2             20 %
3   4   Prod4     B         8         0.8             80 %
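Another way to get the string column, sketched here with the data from the table above, is Python's percent format specifier, which multiplies by 100 and rounds in one step. Note that it produces "60%" without the space used above.

```python
import pandas as pd

# Data copied from the table shown above
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Product": ["Prod1", "Prod2", "Prod3", "Prod4"],
    "Range": ["A", "A", "B", "B"],
    "Quantity": [6, 4, 2, 8],
})

# Fraction of each row's Range total
share = df["Quantity"] / df.groupby("Range")["Quantity"].transform("sum")

# '{:.0%}' formats 0.6 as '60%' (rounded to zero decimals)
df["% of range (str)"] = share.map("{:.0%}".format)
print(df)
```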
Pandas Groupby Percentage of total
Try div:
q1_sales[['Jan','Feb','Mar']].div(q1_sales['Q1']*0.01, axis='rows')
Output:
                   Jan        Feb        Mar
City
Los Angeles  31.884058  28.985507  39.130435
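A self-contained sketch of the same row-wise division, assuming q1_sales is indexed by City and has a Q1 column holding the quarterly total. The monthly figures below are invented for illustration.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar"]

# Hypothetical monthly sales per city
q1_sales = pd.DataFrame(
    {"Jan": [22, 10], "Feb": [20, 30], "Mar": [27, 10]},
    index=pd.Index(["Los Angeles", "Denver"], name="City"),
)
q1_sales["Q1"] = q1_sales[months].sum(axis=1)

# axis=0 matches the divisor to rows by index label, so each row is
# divided by its own quarterly total
pct = q1_sales[months].div(q1_sales["Q1"], axis=0).mul(100)
print(pct)
```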
Pandas groupby percentage of total and add subtotals
Build this step by step: use groupby to compute a subtotal row per state and append it to the frame, add a Subtotal column that sums across the sales columns, then use transform to get each state's total and divide every value by it.
# One subtotal row per state, labelled 'Subtotal' in the office_id position
s = df.groupby('state')[['sales','sales2','sales3']].sum().assign(office_id = 'Subtotal').set_index('office_id',append=True)
out = pd.concat([df,s.reset_index()]).sort_values('state')
out['Subtotal'] = out[['sales','sales2','sales3']].sum(axis=1)
# Each state's total is counted twice (office rows plus the subtotal row), hence the / 2
v = out.groupby('state')['Subtotal'].transform('sum')/2
out.update(out[['sales','sales2','sales3','Subtotal']].div(v,axis=0))
out
   state office_id     sales    sales2    sales3  Subtotal
3     AZ         4  0.047124  0.175385  0.118068  0.340578
7     AZ         2  0.041571  0.087926  0.087902  0.217399
11    AZ         6  0.156107  0.131998  0.153919  0.442023
0     AZ  Subtotal  0.244802  0.395309  0.359889  1.000000
0     CA         1  0.062026  0.127860  0.145870  0.335756
4     CA         5  0.150188  0.107702  0.068203  0.326092
8     CA         3  0.108636  0.129193  0.100323  0.338152
1     CA  Subtotal  0.320849  0.364755  0.314396  1.000000
2     CO         3  0.058604  0.072756  0.142734  0.274095
6     CO         1  0.108667  0.208210  0.145513  0.462390
10    CO         5  0.127604  0.095630  0.040282  0.263516
2     CO  Subtotal  0.294875  0.376596  0.328529  1.000000
1     WA         2  0.106233  0.081434  0.085797  0.273463
5     WA         6  0.091156  0.127159  0.138270  0.356585
9     WA         4  0.108694  0.195807  0.065451  0.369952
3     WA  Subtotal  0.306083  0.404399  0.289518  1.000000
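The steps above can be sketched on a smaller, made-up frame. This variant is an assumption-laden rewrite rather than the answer's exact code: it resets the index after concatenating and assigns the result directly instead of calling update, and it uses only two value columns.

```python
import pandas as pd

# Hypothetical data: two offices per state, two value columns
df = pd.DataFrame({
    "state": ["AZ", "AZ", "CA", "CA"],
    "office_id": [2, 4, 1, 3],
    "sales": [10, 30, 20, 20],
    "sales2": [20, 40, 30, 30],
})
value_cols = ["sales", "sales2"]

# One subtotal row per state, labelled in the office_id column
sub = df.groupby("state", as_index=False)[value_cols].sum()
sub["office_id"] = "Subtotal"

# Append the subtotal rows; mergesort is stable, so each subtotal
# stays below its state's office rows
out = pd.concat([df, sub], ignore_index=True)
out = out.sort_values("state", kind="mergesort").reset_index(drop=True)
out["Subtotal"] = out[value_cols].sum(axis=1)

# Each state's total appears twice (office rows plus the subtotal row),
# so halve the grouped sum to get the true state total
state_total = out.groupby("state")["Subtotal"].transform("sum") / 2

cols = value_cols + ["Subtotal"]
out[cols] = out[cols].div(state_total, axis=0)
print(out)
```

In each state, the Subtotal row ends up with 1.0 in the Subtotal column, and the office rows show their fraction of the state total.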