Pandas Percentage of Total With Groupby

Update 2022-03

This answer by caner using transform looks much better than my original answer!

df['sales'] / df.groupby('state')['sales'].transform('sum')

Thanks to this comment by Paul Rougieux for surfacing it.
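
For reference, here is a minimal self-contained sketch of the transform approach. The small frame below is made up for illustration (it is not the data from the question), and the column name pct_of_state is my own. transform('sum') returns a Series aligned with the original rows, so the division works row by row; multiplying by 100 turns the fraction into a percentage.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'state': ['CA', 'CA', 'WA', 'WA'],
    'office_id': [1, 2, 1, 2],
    'sales': [100, 300, 50, 150],
})

# Per-state total, broadcast back to every row of that state
state_total = df.groupby('state')['sales'].transform('sum')

# Each office's share of its state's sales, as a percentage
df['pct_of_state'] = 100 * df['sales'] / state_total
print(df)
```

Because the result has the same index as df, it can be assigned straight back as a new column without any merge or reindexing.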

Original Answer (2014)

Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way: group state_office by state (index level 0) and divide the sales column by its sum within each group. Copying the beginning of Paul H's answer:

# From Paul H
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
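# Change: group state_office by state (index level 0) and divide by the group sum.
# transform keeps the original (state, office_id) index, whereas apply may
# prepend the group key as an extra index level on recent pandas versions.
state_pcts = state_office.groupby(level=0).transform(lambda x: 100 * x / x.sum())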

Returns:

                     sales
state office_id           
AZ    2          16.981365
      4          19.250033
      6          63.768601
CA    1          19.331879
      3          33.858747
      5          46.809373
CO    1          36.851857
      3          19.874290
      5          43.273852
WA    2          34.707233
      4          35.511259
      6          29.781508

pandas groupby to calculate percentage of groupby columns

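You can aggregate per location first and then compute the rate from the aggregated columns:

# String names ('sum', 'max') avoid the warning recent pandas emits for the builtins
df = df.groupby(['location']).agg({'new_deaths': 'sum', 'population': 'max'})
df['rate_death'] = df['new_deaths'] / df['population'] * 100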

Result

             new_deaths  population  rate_death
location
Afghanistan          15    38928341    0.000039
Albania               1     2877800    0.000035
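
The input frame is not shown above, so here is a rough sketch with hypothetical daily rows (the split of deaths across days is invented; only the column names follow the answer). It uses named aggregation, which is an alternative spelling of the same agg call, and writes to a new variable rates instead of overwriting df.

```python
import pandas as pd

# Hypothetical daily rows; only the column names come from the answer above
df = pd.DataFrame({
    'location': ['Afghanistan', 'Afghanistan', 'Albania'],
    'new_deaths': [10, 5, 1],
    'population': [38928341, 38928341, 2877800],
})

# Named aggregation: total deaths and the (constant) population per location
rates = df.groupby('location').agg(
    new_deaths=('new_deaths', 'sum'),
    population=('population', 'max'),
)

# Deaths as a percentage of the population
rates['rate_death'] = rates['new_deaths'] / rates['population'] * 100
print(rates)
```

Taking max of population assumes the population value is the same on every row for a location, which is how the answer above treats it.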

Percentage of Total with Groupby for two columns

You can chain groupby:

pct = lambda x: 100 * x / x.sum()

# transform (rather than apply) keeps the (Product, Type) index unchanged
out = df.groupby(['Product', 'Type']).sum().groupby('Product').transform(pct)
print(out)

# Output
                  Sales        Qty
Product Type                      
AA      AC    37.500000  47.058824
        AD    62.500000  52.941176
BB      BC    36.363636  68.750000
        BD    63.636364  31.250000
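
Since the input for this answer is not shown, here is a sketch on a hypothetical frame (the numbers are made up and will not reproduce the output above). It does the same chained computation without a lambda, dividing the per-(Product, Type) totals by the per-Product totals.

```python
import pandas as pd

# Hypothetical input; the original question's data is not shown here
df = pd.DataFrame({
    'Product': ['AA', 'AA', 'AA', 'BB', 'BB'],
    'Type': ['AC', 'AC', 'AD', 'BC', 'BD'],
    'Sales': [10, 20, 30, 40, 60],
    'Qty': [1, 1, 2, 3, 1],
})

# Totals per (Product, Type) pair
totals = df.groupby(['Product', 'Type']).sum()

# Product totals broadcast back onto the same (Product, Type) index
product_totals = totals.groupby(level='Product').transform('sum')

# Percentage of each Product's total, for every numeric column at once
out = 100 * totals / product_totals
print(out)
```

Within each Product the percentages in every column add up to 100, which is a quick sanity check for this kind of calculation.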

How to calculate count and percentage in groupby in Python

I think you can use:

# count() gives the number of non-null Title values per category
P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()

Sample:

Publisher = pd.DataFrame({'Category': ['a', 'a', 's'],
                          'Title': [4, 5, 6]})

print(Publisher)
  Category  Title
0        a      4
1        a      5
2        s      6

P = Publisher.groupby('Category')['Title'].count().reset_index()
P['Percentage'] = 100 * P['Title'] / P['Title'].sum()
print(P)
  Category  Title  Percentage
0        a      2   66.666667
1        s      1   33.333333
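
If only the percentages are needed, a different technique is value_counts with normalize=True, sketched here on the same sample frame. Note that this counts rows per Category, while the groupby version counts non-null Title values, so the two can differ if Title contains missing values.

```python
import pandas as pd

Publisher = pd.DataFrame({'Category': ['a', 'a', 's'],
                          'Title': [4, 5, 6]})

# Share of rows in each category, scaled to a percentage
pct = Publisher['Category'].value_counts(normalize=True) * 100
print(pct)
```

The result is a Series indexed by category, so pct['a'] gives the percentage for category a directly.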

Pandas group by column find percentage of count in each group

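You can calculate each row's percentage of its age group's total count with a lambda. Use transform rather than apply here: transform returns a result aligned with the original rows, so it can be assigned back as a column.

df['perc'] = df.groupby('age')['count'].transform(lambda x: x*100/x.sum())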

     age section  count       perc
0  13-17       a    160  55.555556
1  25-34       c    128  35.555556
2  13-17       d    128  44.444444
3  25-34       a    120  33.333333
4  35-44       b    120  50.000000
5  35-44       a    120  50.000000
6  25-34       b    112  31.111111

If you want to round the percentage values:

df['perc'] = df.groupby('age')['count'].transform(lambda x: np.round(x*100/x.sum(), 2))
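
For completeness, here is a self-contained sketch using the rows from the table above. It swaps the lambda for transform('sum') followed by a plain division, which should give the same percentages; the variable name group_total is my own.

```python
import pandas as pd

# Rows copied from the table above
df = pd.DataFrame({
    'age': ['13-17', '25-34', '13-17', '25-34', '35-44', '35-44', '25-34'],
    'section': ['a', 'c', 'd', 'a', 'b', 'a', 'b'],
    'count': [160, 128, 128, 120, 120, 120, 112],
})

# Total count per age group, aligned with the original rows
group_total = df.groupby('age')['count'].transform('sum')

# Percentage of the age group's total, rounded to two decimals
df['perc'] = (100 * df['count'] / group_total).round(2)
print(df)
```

Rounding each row separately means a group's rounded percentages may not add up to exactly 100.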