Pandas Percentage of Total With Groupby

Pandas percentage of total with groupby

Update 2022-03

This answer by caner using transform looks much better than my original answer!

df['sales'] / df.groupby('state')['sales'].transform('sum')

Thanks to this comment by Paul Rougieux for surfacing it.
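A minimal, self-contained sketch of the transform one-liner above, using a small hypothetical frame (the offices and sales figures are invented for illustration). transform('sum') returns one value per original row, so the division lines up with df's own index.

```python
import pandas as pd

# Hypothetical sales data: one row per office, several offices per state.
df = pd.DataFrame({
    "state": ["CA", "CA", "WA", "WA", "WA"],
    "office_id": [1, 2, 1, 2, 3],
    "sales": [300, 100, 50, 30, 20],
})

# transform('sum') broadcasts each state's total back onto every row of that state,
# so the result has the same index as df and can be assigned as a new column.
state_total = df.groupby("state")["sales"].transform("sum")
df["pct_of_state"] = 100 * df["sales"] / state_total
print(df)
```

Because no aggregation step collapses the rows, there is no second groupby object to build.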

Original Answer (2014)

Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way: group state_office by its first index level (state) and divide the sales column by its sum. Copying the beginning of Paul H's answer:

# From Paul H
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: groupby state_office and divide by sum
# (group_keys=False stops pandas 2.x from prepending the state level a second time)
state_pcts = state_office.groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / x.sum())

Returns:

                     sales
state office_id
AZ    2          16.981365
      4          19.250033
      6          63.768601
CA    1          19.331879
      3          33.858747
      5          46.809373
CO    1          36.851857
      3          19.874290
      5          43.273852
WA    2          34.707233
      4          35.511259
      6          29.781508
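As a sanity check, here is a sketch that rebuilds Paul H's state_office and compares the apply-based version with the transform-based version from the 2022 update. The names via_apply and via_transform are just illustrative; both should give the same numbers, and each state's percentages should sum to 100.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    "state": ["CA", "WA", "CO", "AZ"] * 3,
    "office_id": list(range(1, 7)) * 2,
    "sales": [np.random.randint(100000, 999999) for _ in range(12)],
})
state_office = df.groupby(["state", "office_id"]).agg({"sales": "sum"})

# apply-based version; group_keys=False keeps the (state, office_id) index unchanged.
via_apply = state_office.groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / x.sum())

# transform-based version: each row is divided by its own state's total.
via_transform = 100 * state_office / state_office.groupby(level=0).transform("sum")

# Each state's percentages add up to 100.
print(via_transform.groupby(level=0).sum())
```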

pandas groupby to calculate percentage of groupby columns

You can do:

df = df.groupby(['location']).agg({'new_deaths': 'sum', 'population': 'max'})
df['rate_death'] = df['new_deaths'] / df['population'] * 100

Result:

             new_deaths  population  rate_death
location
Afghanistan          15    38928341    0.000039
Albania               1     2877800    0.000035
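A self-contained sketch of the same pattern on hypothetical daily rows (locations "X" and "Y" and all numbers are invented). Deaths are summed per location, while population repeats on every row, so max simply picks that one value.

```python
import pandas as pd

# Hypothetical daily rows: new_deaths varies by day, population is constant per location.
df = pd.DataFrame({
    "location": ["X", "X", "Y", "Y"],
    "new_deaths": [2, 3, 1, 0],
    "population": [1000, 1000, 500, 500],
})

# One row per location: total deaths and the (repeated) population figure.
agg = df.groupby("location").agg({"new_deaths": "sum", "population": "max"})
agg["rate_death"] = agg["new_deaths"] / agg["population"] * 100
print(agg)
```

Using the string names 'sum' and 'max' rather than the Python builtins avoids the FutureWarning that recent pandas versions emit for builtins.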

Percentage of Total with Groupby for two columns

You can chain groupby:

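pct = lambda x: 100 * x / x.sum()

# group_keys=False keeps the (Product, Type) index instead of adding Product again
out = df.groupby(['Product', 'Type']).sum().groupby('Product', group_keys=False).apply(pct)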
print(out)

# Output
                  Sales        Qty
Product Type
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
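The same two-step idea can also be written with transform instead of apply. This sketch uses hypothetical input rows (the numbers are invented); several rows may share a (Product, Type) pair, which is why the first groupby sums them.

```python
import pandas as pd

# Hypothetical input rows; the AA/AD pair appears twice.
df = pd.DataFrame({
    "Product": ["AA", "AA", "AA", "BB", "BB"],
    "Type": ["AC", "AD", "AD", "BC", "BD"],
    "Sales": [3, 2, 3, 4, 7],
    "Qty": [8, 4, 5, 11, 5],
})

# Step 1: one row per (Product, Type).
totals = df.groupby(["Product", "Type"])[["Sales", "Qty"]].sum()
# Step 2: divide by each Product's column totals; transform keeps the MultiIndex.
out = 100 * totals / totals.groupby(level="Product").transform("sum")
print(out)
```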

How to calculate count and percentage in groupby in Python

I think you can use:

P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()

Sample:

import pandas as pd

Publisher = pd.DataFrame({'Category': ['a', 'a', 's'],
                          'Title': [4, 5, 6]})

print(Publisher)
  Category  Title
0        a      4
1        a      5
2        s      6

P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()
print(P)
  Category  Title  Percentage
0        a      2   66.666667
1        s      1   33.333333
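If only the share of rows per category is needed, Series.value_counts(normalize=True) is a shorter alternative. Note the subtle difference: it counts rows per Category value (dropping missing categories by default), whereas groupby(...)['Title'].count() counts non-null Title values.

```python
import pandas as pd

Publisher = pd.DataFrame({"Category": ["a", "a", "s"], "Title": [4, 5, 6]})

# Fraction of rows in each category, scaled to a percentage.
pct = Publisher["Category"].value_counts(normalize=True).mul(100)
print(pct)
```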

Pandas group by column find percentage of count in each group

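You can calculate the percentage for each age/count with transform and a lambda. Unlike apply, transform returns a Series aligned with df's index, so it can be assigned straight back as a column:

df['perc'] = df.groupby('age')['count'].transform(lambda x: x * 100 / x.sum())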


     age section  count       perc
0  13-17       a    160  55.555556
1  25-34       c    128  35.555556
2  13-17       d    128  44.444444
3  25-34       a    120  33.333333
4  35-44       b    120  50.000000
5  35-44       a    120  50.000000
6  25-34       b    112  31.111111

If you want to round the percentage values:

df['perc'] = df.groupby('age')['count'].transform(lambda x: x * 100 / x.sum()).round(2)
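A runnable sketch that rebuilds the sample frame from the table above and checks the numbers. It also shows a variant that uses the built-in 'sum' with vectorized division instead of a Python lambda, which is usually faster on large frames; the column names perc_fast and perc_2dp are just illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the table above.
df = pd.DataFrame({
    "age": ["13-17", "25-34", "13-17", "25-34", "35-44", "35-44", "25-34"],
    "section": ["a", "c", "d", "a", "b", "a", "b"],
    "count": [160, 128, 128, 120, 120, 120, 112],
})

# Lambda version: returns a Series aligned with df's index.
df["perc"] = df.groupby("age")["count"].transform(lambda x: x * 100 / x.sum())
# Built-in aggregation plus vectorized arithmetic: same result, no Python-level loop per group.
df["perc_fast"] = df["count"] * 100 / df.groupby("age")["count"].transform("sum")
df["perc_2dp"] = df["perc"].round(2)
print(df)
```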

Pandas percentage of total with groupby with more than one column

You can use groupby().transform() to keep the original index:

state_office.div(state_office.groupby(level=0).transform('sum')).mul(100)

Output:

                     sales      units
state office_id
AZ    2          16.981365  31.059160
      4          19.250033  23.664122
      6          63.768601  45.276718
CA    1          19.331879  22.049287
      3          33.858747  24.254215
      5          46.809373  53.696498
CO    1          36.851857  29.506546
      3          19.874290  35.246727
      5          43.273852  35.246727
WA    2          34.707233  34.645669
      4          35.511259  16.596002
      6          29.781508  48.758328
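A self-contained sketch of the multi-column version on a small hypothetical state_office (the offices and numbers are invented). Because transform('sum') returns a frame with the same index and columns, div() works column by column without any extra alignment arguments.

```python
import pandas as pd

# Hypothetical per-office totals with two value columns.
state_office = pd.DataFrame(
    {"sales": [10, 30, 25, 75], "units": [1, 3, 2, 2]},
    index=pd.MultiIndex.from_tuples(
        [("AZ", 2), ("AZ", 4), ("CA", 1), ("CA", 3)], names=["state", "office_id"]
    ),
)

# Every row gets its state's column totals, then each column is divided element-wise.
pct = state_office.div(state_office.groupby(level=0).transform("sum")).mul(100)
print(pct)
```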

Pandas: percentage of a value relative to the total of the group

You need a simple groupby.transform('sum') to get the total for each group, then ordinary vectorized arithmetic.

Here is one version that produces a float and one that produces a string:

total = df.groupby('Range')['Quantity'].transform('sum')

# as float
df['% of range'] = df['Quantity'].div(total)

# as string (round first: astype(int) alone truncates, so 28.999... would become 28)
df['% of range (str)'] = df['Quantity'].div(total).mul(100).round().astype(int).astype(str) + ' %'

Output:

   id Product Range  Quantity  % of range % of range (str)
0   1   Prod1     A         6         0.6             60 %
1   2   Prod2     A         4         0.4             40 %
2   3   Prod3     B         2         0.2             20 %
3   4   Prod4     B         8         0.8             80 %
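A runnable sketch using the sample data from the table above. It rounds before casting to int (since 0.29 * 100 evaluates to 28.999999999999996 in floating point, a bare astype(int) would give 28), and also shows Python's '%' format specifier as an alternative that scales by 100 and appends the sign in one step. The column name "% of range (fmt)" is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "Product": ["Prod1", "Prod2", "Prod3", "Prod4"],
    "Range": ["A", "A", "B", "B"],
    "Quantity": [6, 4, 2, 8],
})

total = df.groupby("Range")["Quantity"].transform("sum")
share = df["Quantity"].div(total)

# Round to the nearest whole percent before converting to int.
df["% of range (str)"] = share.mul(100).round().astype(int).astype(str) + " %"

# Alternative: the '.0%' format spec multiplies by 100 and adds the % sign.
df["% of range (fmt)"] = share.map("{:.0%}".format)
print(df)
```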

Pandas Groupby Percentage of total

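Try div, dividing each month by 1% of that row's Q1 total (q1_sales['Q1'] * 0.01) so the results come out as percentages: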

q1_sales[['Jan','Feb','Mar']].div(q1_sales['Q1']*0.01, axis='rows')

Output:

                   Jan        Feb        Mar
City
Los Angeles  31.884058  28.985507  39.130435
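A self-contained sketch, assuming q1_sales has one row per City, month columns Jan to Mar, and a Q1 column holding their sum. All the figures below (and the second city) are hypothetical. Dividing by Q1 with axis="index" aligns the divisor Series on the row labels, so each city is divided by its own total.

```python
import pandas as pd

# Hypothetical monthly sales per city.
q1_sales = pd.DataFrame(
    {"Jan": [220, 50], "Feb": [200, 30], "Mar": [270, 20]},
    index=pd.Index(["Los Angeles", "San Diego"], name="City"),
)
q1_sales["Q1"] = q1_sales[["Jan", "Feb", "Mar"]].sum(axis=1)

# Divide each row by its own Q1 total, then scale to percent.
pct = q1_sales[["Jan", "Feb", "Mar"]].div(q1_sales["Q1"], axis="index").mul(100)
print(pct)
```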

Pandas groupby percentage of total and add subtotals

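Build the result step by step: group by state to create a subtotal row for each state (labelled 'Subtotal' in office_id), append those rows to the data, then use transform to get each state's total and divide every value by it:

import pandas as pd

# One subtotal row per state, labelled 'Subtotal' in the office_id level
s = (df.groupby('state')[['sales', 'sales2', 'sales3']].sum()
       .assign(office_id='Subtotal').set_index('office_id', append=True))
# Append the subtotal rows to the office rows and bring each state together
out = pd.concat([df, s.reset_index()]).sort_values('state')
out['Subtotal'] = out[['sales', 'sales2', 'sales3']].sum(axis=1)
# Each state's total is counted twice (its offices plus its subtotal row), hence / 2
v = out.groupby('state')['Subtotal'].transform('sum') / 2
out.update(out[['sales', 'sales2', 'sales3', 'Subtotal']].div(v, axis=0))
out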
   state office_id     sales    sales2    sales3  Subtotal
3     AZ         4  0.047124  0.175385  0.118068  0.340578
7     AZ         2  0.041571  0.087926  0.087902  0.217399
11    AZ         6  0.156107  0.131998  0.153919  0.442023
0     AZ  Subtotal  0.244802  0.395309  0.359889  1.000000
0     CA         1  0.062026  0.127860  0.145870  0.335756
4     CA         5  0.150188  0.107702  0.068203  0.326092
8     CA         3  0.108636  0.129193  0.100323  0.338152
1     CA  Subtotal  0.320849  0.364755  0.314396  1.000000
2     CO         3  0.058604  0.072756  0.142734  0.274095
6     CO         1  0.108667  0.208210  0.145513  0.462390
10    CO         5  0.127604  0.095630  0.040282  0.263516
2     CO  Subtotal  0.294875  0.376596  0.328529  1.000000
1     WA         2  0.106233  0.081434  0.085797  0.273463
5     WA         6  0.091156  0.127159  0.138270  0.356585
9     WA         4  0.108694  0.195807  0.065451  0.369952
3     WA  Subtotal  0.306083  0.404399  0.289518  1.000000
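A variation on the same idea, sketched on hypothetical data (the offices and numbers are invented): compute each office's share of its state's grand total first, then append the subtotal rows. This avoids the divide-by-two step and calling update on a frame with duplicate index labels; it is an alternative arrangement, not the answer's exact method.

```python
import pandas as pd

# Hypothetical per-office data with three sales columns.
df = pd.DataFrame({
    "state": ["AZ", "AZ", "CA", "CA"],
    "office_id": [2, 4, 1, 3],
    "sales": [10, 30, 20, 20],
    "sales2": [20, 20, 10, 30],
    "sales3": [10, 10, 10, 10],
})
cols = ["sales", "sales2", "sales3"]

# Each state's grand total across all three columns, broadcast to that state's rows.
state_total = df[cols].sum(axis=1).groupby(df["state"]).transform("sum")
rows = df[["state", "office_id"]].join(df[cols].div(state_total, axis=0))
rows["Subtotal"] = rows[cols].sum(axis=1)

# One subtotal row per state; its Subtotal value should be 1.0.
subtotals = rows.groupby("state", as_index=False)[cols + ["Subtotal"]].sum()
subtotals["office_id"] = "Subtotal"

# A stable sort by state keeps each state's offices in order, followed by its subtotal.
out = pd.concat([rows, subtotals], ignore_index=True).sort_values("state", kind="stable")
print(out)
```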

