# Python: Plotting Percentage in Seaborn Bar Plot

## Python: Plotting percentage in seaborn bar plot

You can pass your own function as the `estimator` argument of `sns.barplot`. From the docs:

estimator : callable that maps vector -> scalar, optional

Statistical function to estimate within each categorical bin.

For your case, you could define the function as a lambda that returns the percentage of values equal to 0 in each group:

``sns.barplot(x='group', y='Values', data=df, estimator=lambda x: sum(x==0)*100.0/len(x))``
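As a quick sanity check of what such an estimator computes, here is a minimal sketch on a made-up DataFrame. The column names `group` and `Values` follow the snippet above; the data and the helper name `pct_zero` are invented for illustration. The same percentages can be obtained with a plain pandas `groupby`, which is handy for verifying the bar heights.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Invented data: 'Values' holds 0/1 flags per group.
df = pd.DataFrame({
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "Values": [0, 0, 1, 0, 1, 1, 0, 1],
})

def pct_zero(x):
    """Percentage of entries equal to 0 (works on a Series or a NumPy array)."""
    return (x == 0).mean() * 100.0

# Bar height per group = percentage of zeros in that group.
ax = sns.barplot(x="group", y="Values", data=df, estimator=pct_zero)
ax.set_ylabel("Values equal to 0 (%)")

# Cross-check with pandas alone: group a has 3 zeros out of 4, group b has 1 out of 4.
expected = df.groupby("group")["Values"].apply(pct_zero)
print(expected)
```

A named function is used here instead of a lambda only so that the plot and the cross-check share the same logic.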

## How to add percentages on top of bars in seaborn

The `seaborn.catplot` organizing function returns a FacetGrid, which gives you access to the fig, the ax, and its patches. If you add the labels when nothing else has been plotted you know which bar-patches came from which variables. From @LordZsolt's answer I picked up the `order` argument to `catplot`: I like making that explicit because now we aren't relying on the barplot function using the order we think of as default.

```python
import seaborn as sns
from itertools import product

titanic = sns.load_dataset("titanic")

class_order = ['First', 'Second', 'Third']
hue_order = ['child', 'man', 'woman']
# Bars are drawn one hue level at a time, so the patches come hue-major:
# all 'child' bars (First, Second, Third), then 'man', then 'woman'.
bar_order = product(hue_order, class_order)

catp = sns.catplot(data=titanic, kind='count',
                   x='class', hue='who',
                   order=class_order,
                   hue_order=hue_order)

# As long as we haven't plotted anything else into this axis,
# we know the rectangles in it are our barplot bars
# and we know the order, so we can match up graphic and calculations:
for patch, (who, cls) in zip(catp.ax.patches, bar_order):
    class_total = len(titanic[titanic['class'] == cls])
    class_who_total = len(titanic[(titanic['class'] == cls) &
                                  (titanic['who'] == who)])
    height = patch.get_height()
    catp.ax.text(patch.get_x(), height + 3,
                 '{:1.2f}'.format(class_who_total / class_total))

    # checking the patch order, not for final:
    # catp.ax.text(patch.get_x(), -3, cls[0] + who[0])
```

This labels each bar with the fraction of its class that falls in the corresponding `who` category.

An alternate approach is to do the sub-summing explicitly, e.g. with the excellent `pandas`, plot with `matplotlib`, and do the styling yourself. (Though you can get quite a lot of styling from the `sns` context even when using `matplotlib` plotting functions. Try it out.)
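The pandas-plus-matplotlib route could look roughly like the sketch below. It uses a small invented DataFrame named `people` with `class` and `who` columns as a stand-in for the titanic data, so it runs without downloading anything; `pd.crosstab` with `normalize="index"` is one way to do the sub-summing, not necessarily what the answer's author had in mind.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Invented stand-in for the titanic columns used above.
people = pd.DataFrame({
    "class": ["First", "First", "First", "First", "Second",
              "Second", "Third", "Third", "Third", "Third"],
    "who": ["man", "woman", "woman", "child", "man",
            "woman", "man", "man", "man", "child"],
})

# Rows = class, columns = who, values = share of each class (each row sums to 1).
shares = pd.crosstab(people["class"], people["who"], normalize="index")

# Grouped bars drawn by pandas on top of matplotlib.
ax = shares.plot(kind="bar", rot=0)
ax.set_ylabel("Share of class")

# Label each bar with its share, two decimals.
for patch in ax.patches:
    ax.annotate(f"{patch.get_height():.2f}",
                (patch.get_x() + patch.get_width() / 2, patch.get_height()),
                ha="center", va="bottom")
```

Because the shares are computed before plotting, the labels come straight from the bar heights and no patch-order bookkeeping is needed.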

## Add labels as percentages instead of counts on a grouped bar graph in seaborn

Use `groupby.transform` to compute the split percentages per day:

```python
df['Customers (%)'] = df.groupby('Day')['Customers'].transform(lambda x: x / x.sum() * 100)

#     Day  Customers Time  Customers (%)
# 0   Mon         44    M      57.142857
# 1   Tue         46    M      50.000000
# 2   Wed         49    M      49.494949
# 3  Thur         59    M      54.629630
# 4   Fri         54    M      47.368421
# 5   Mon         33    E      42.857143
# 6   Tue         46    E      50.000000
# 7   Wed         50    E      50.505051
# 8  Thur         49    E      45.370370
# 9   Fri         60    E      52.631579
```

Then plot this new `Customers (%)` column and label the bars using `ax.bar_label` (with percentage formatting via the `fmt` param):

```python
ax = sns.barplot(x='Day', y='Customers (%)', hue='Time', data=df)

for container in ax.containers:
    ax.bar_label(container, fmt='%.0f%%')
```

Note that `ax.bar_label` requires matplotlib 3.4.0 or later.
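On an older matplotlib without `bar_label`, a similar effect can be approximated by annotating the patches by hand. The sketch below is one possible fallback, not part of the original answer: `label_bars_with_percent` is a hypothetical helper name, and the three values are the morning shares for Mon to Wed from the table above, drawn with plain `ax.bar` to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

def label_bars_with_percent(ax, fmt="{:.0f}%"):
    """Hypothetical helper: write each bar's height above it as a percentage."""
    labels = []
    for patch in ax.patches:
        if patch.get_width() == 0:  # skip degenerate rectangles that are not real bars
            continue
        height = patch.get_height()
        text = fmt.format(height)
        ax.annotate(text,
                    (patch.get_x() + patch.get_width() / 2, height),
                    ha="center", va="bottom")
        labels.append(text)
    return labels

# Morning shares for Mon, Tue and Wed, taken from the table above.
fig, ax = plt.subplots()
ax.bar(["Mon", "Tue", "Wed"], [57.142857, 50.0, 49.494949])
labels = label_bars_with_percent(ax)
print(labels)
```

The helper only looks at `ax.patches`, so it should also work on an axes returned by `sns.barplot`, assuming nothing other than the bars has been drawn into it.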

## How can I plot a Seaborn Percentage Bar graph using a dictionary?

You can draw a bar plot with the dictionary keys on the x-axis and each value divided by the total as the bar height. Optionally, a `PercentFormatter` can be set as the display format. Please note that the values need to be converted from string to numeric so they can be used as bar heights.

Also note that using `dict` as a variable name can complicate future code, because it shadows the built-in `dict` type, which then can no longer be called in that scope.

```python
from matplotlib import pyplot as plt
from matplotlib.ticker import PercentFormatter

fruit_dict = {'apple': '12', 'orange': '9', 'banana': '9', 'kiwi': '3'}
for f in fruit_dict:
    fruit_dict[f] = int(fruit_dict[f])
total = sum(fruit_dict.values())

plt.bar(fruit_dict.keys(), [v/total for v in fruit_dict.values()], color='salmon')
plt.gca().yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))
plt.grid(axis='y')
plt.show()
```

## How to calculate percentage for row counts in groupby in Python and generate bar plot

I believe this is what you are looking for:

```python
import matplotlib.pyplot as plt

# Row count per status, largest first, converted to a percentage of all rows
counts = df.groupby('Status').size().sort_values(ascending=False)
temp_df = counts / counts.sum() * 100

ax = temp_df.plot(kind='bar')
ax.bar_label(ax.containers[0])

plt.show()
```
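A shorter way to get the same percentages is `Series.value_counts(normalize=True)`. The sketch below uses invented data; only the `Status` column name comes from the snippet above.

```python
import pandas as pd

# Invented data; only the 'Status' column name comes from the snippet above.
df = pd.DataFrame({"Status": ["open", "closed", "open", "open",
                              "pending", "closed", "open", "open"]})

# value_counts sorts by frequency (descending) and, with normalize=True,
# returns fractions of the total row count.
pct = df["Status"].value_counts(normalize=True) * 100
print(pct.round(1))
```

The resulting Series can be plotted the same way, e.g. with `pct.plot(kind='bar')`.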

## How to add percentages on countplot in seaborn

First, note that in matplotlib and seaborn, a subplot is called an "ax". Giving such a subplot a name such as "p3" or "plot" leads to unnecessary confusion when studying the documentation and online example code.

The bars in a seaborn bar plot are ordered by hue: first all the bars belonging to the first hue value, then the second, and so on. So, in the given example, first come all the blue, then all the orange and finally all the green bars. This makes looping through `ax.patches` somewhat complicated. Luckily, the same patches are also available via `ax.containers`, where each hue group forms a separate container of bars.

Here is some example code:

```python
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

def percentage_above_bar_relative_to_xgroup(ax):
    all_heights = [[p.get_height() for p in bars] for bars in ax.containers]
    for bars in ax.containers:
        for i, p in enumerate(bars):
            total = sum(xgroup[i] for xgroup in all_heights)
            percentage = f'{(100 * p.get_height() / total) :.1f}%'
            ax.annotate(percentage, (p.get_x() + p.get_width() / 2, p.get_height()),
                        size=11, ha='center', va='bottom')

df = sns.load_dataset("titanic")

plt.figure(figsize=(12, 8))
ax3 = sns.countplot(x="class", hue="who", data=df)
ax3.set(xlabel='Class', ylabel='Count')

percentage_above_bar_relative_to_xgroup(ax3)
plt.show()
```

## Graphing percentage data in seaborn

I suggest you organize your DataFrame more like this; it will make this type of data much easier to plot and organize.

Instead of keeping your DataFrame as you have it, transpose it into two simple columns like so:

```
name                  value
debt_consolidation    0.152388
credit_card           0.115689
all_other             0.170111
```

and so on. With the data in this shape, you can plot it in Seaborn like this:

``sns.barplot(x="name",y="value", data = df)``

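The reshaping step itself might look like the sketch below. The asker's original layout is not shown, so the single-row wide frame `wide` is an assumption; the three values are the ones from the table above, and `melt` is one of several ways to get the two-column form.

```python
import pandas as pd

# Assumed wide layout: one row, one column per category
# (values taken from the table above).
wide = pd.DataFrame({"debt_consolidation": [0.152388],
                     "credit_card": [0.115689],
                     "all_other": [0.170111]})

# melt turns the column labels into a 'name' column and the cells into 'value'.
long_df = wide.melt(var_name="name", value_name="value")
print(long_df)
```

`long_df` then has exactly the `name` and `value` columns that the `sns.barplot` call expects.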

## How to plot groupby as percentage in seaborn?

You can use `barplot` here. I wasn't 100% sure of what you actually want to achieve so I developed several solutions.

Frequency of successful (unsuccessful) per total successful (unsuccessful)

```python
import matplotlib.pyplot as plt
import seaborn as sns

# mainDf is the DataFrame from the question
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
mainDf['frequency'] = 0  # a dummy column to refer to
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_group = counts.div(counts.groupby('successful').transform('sum')).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_group, ax=ax)
```

Frequency of successful (unsuccessful) per group

```python
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
mainDf['frequency'] = 0  # a dummy column to refer to
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_group = counts.div(counts.groupby(col).transform('sum')).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_group, ax=ax)
```


Frequency of successful (unsuccessful) per total

```python
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
mainDf['frequency'] = 0  # a dummy column to refer to
total = len(mainDf)
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_total = counts.div(total).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_total, ax=ax)
```
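The three normalizations above map onto the `normalize` argument of `pd.crosstab`, which can make the difference between them easier to see. This is an alternative sketch, not the answer's method: `main_df` is an invented stand-in for `mainDf` with a single categorical column, and the `successful` values are written as strings here for simplicity.

```python
import pandas as pd

# Invented stand-in for mainDf: one categorical column plus the outcome.
main_df = pd.DataFrame({
    "weekday": ["Mon", "Mon", "Mon", "Tue", "Tue", "Tue", "Tue", "Wed"],
    "successful": ["yes", "no", "yes", "yes", "no", "no", "yes", "no"],
})

# Per total successful (unsuccessful): each 'successful' column sums to 1.
per_outcome = pd.crosstab(main_df["weekday"], main_df["successful"], normalize="columns")
# Per group: each weekday row sums to 1.
per_group = pd.crosstab(main_df["weekday"], main_df["successful"], normalize="index")
# Per total: all cells together sum to 1.
per_total = pd.crosstab(main_df["weekday"], main_df["successful"], normalize="all")

# Long form with a 'frequency' column, the shape sns.barplot expects.
long_per_group = per_group.reset_index().melt(
    id_vars="weekday", var_name="successful", value_name="frequency")
print(long_per_group)
```

`long_per_group` can be passed to `sns.barplot(x='weekday', y='frequency', hue='successful', data=long_per_group)` in the same way as `freq_per_group` above.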

## plot percent bar in seaborn from dataframe

Not my most elegant code, but this works:

```python
import pandas as pd

stacked = pd.DataFrame({'Scenario': list('ABCD'),
                        'Male': [79, 59, 420, 208],
                        'Female': [217, 408, 330, 1330]})

pd.DataFrame(stacked.apply(lambda x: {'Scenario': x.Scenario,
                                      'Male_pct': x.Male / (x.Male + x.Female),
                                      'Female_pct': x.Female / (x.Male + x.Female)},
                           axis=1).tolist())
```

Then just plot it how you were plotting already.
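A vectorized variant of the same calculation, followed by a reshape to long form, might look like this. It is a sketch under assumptions: the question's plotting code is not shown, and the column names `Sex` and `Share` are chosen here for illustration.

```python
import pandas as pd

stacked = pd.DataFrame({'Scenario': list('ABCD'),
                        'Male': [79, 59, 420, 208],
                        'Female': [217, 408, 330, 1330]})

# Divide each count by its row total (vectorized, no apply needed).
counts = stacked.set_index('Scenario')[['Male', 'Female']]
pct = counts.div(counts.sum(axis=1), axis=0)

# Long form: one row per (Scenario, Sex) pair.
# 'Sex' and 'Share' are column names chosen for this sketch.
long_pct = pct.reset_index().melt(id_vars='Scenario', var_name='Sex', value_name='Share')
print(long_pct)
```

In long form the data can go straight into a grouped bar plot, for example `sns.barplot(x='Scenario', y='Share', hue='Sex', data=long_pct)`.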