How to Add Multiple Annotations to a Barplot


With pandas

  • Tested with pandas v1.2.4

Imports and DataFrame

import pandas as pd
import matplotlib.pyplot as plt

# create the dataframe from values in the OP
counts = [29227, 102492, 53269, 504028, 802994]
df = pd.DataFrame(data=counts, columns=['counts'], index=['A','B','C','D','E'])

# add a percent column
df['%'] = df.counts.div(df.counts.sum()).mul(100).round(2)

# display(df)
   counts      %
A   29227   1.96
B  102492   6.87
C   53269   3.57
D  504028  33.78
E  802994  53.82

Plot using matplotlib v3.4.2 or later

  • Use matplotlib.pyplot.bar_label
  • See How to add value labels on a bar chart for additional details and examples with .bar_label.
  • Tested with pandas v1.2.4, which uses matplotlib as the plot engine.
  • Some formatting can be done with the fmt parameter, but more sophisticated formatting should be done with the labels parameter.
ax = df.plot(kind='barh', y='counts', figsize=(10, 5), legend=False, width=.75,
             title='This is the plot generated by all code examples in this answer')

# customize the label to include the percent
labels = [f' {v.get_width()}\n {df.iloc[i, 1]}%' for i, v in enumerate(ax.containers[0])]

# set the bar label
ax.bar_label(ax.containers[0], labels=labels, label_type='edge', size=13)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.show()
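The labels list above pairs each bar's width with the matching value from the '%' column. As a minimal sketch of the same string-building logic without pandas or matplotlib, the hypothetical helper two_line_labels below computes the percent directly from the counts; its result could then be passed to ax.bar_label(..., labels=...).

```python
def two_line_labels(counts, decimals=2):
    """Return one two-line 'value / percent' label per bar (hypothetical helper)."""
    total = sum(counts)
    labels = []
    for value in counts:
        percent = round(100 * value / total, decimals)
        # the leading spaces act as horizontal padding from the bar's edge
        labels.append(f' {value}\n {percent}%')
    return labels


counts = [29227, 102492, 53269, 504028, 802994]
for label in two_line_labels(counts):
    print(repr(label))
```

Computing the percent inside the helper avoids relying on a separate '%' column staying aligned with the bars.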


Plot using matplotlib before v3.4.2

# plot the dataframe
ax = df.plot(kind='barh', y='counts', figsize=(10, 5), legend=False, width=.75)
for i, y in enumerate(ax.patches):

    # get the percent label
    label_per = df.iloc[i, 1]

    # add the value label
    ax.text(y.get_width()+.09, y.get_y()+.3, str(round((y.get_width()), 1)), fontsize=10)

    # add the percent label here
    ax.text(y.get_width()+.09, y.get_y()+.1, str(f'{round((label_per), 2)}%'), fontsize=10)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.show()

Original Answer without pandas

  • Tested with matplotlib v3.3.4
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 5))

counts = [29227, 102492, 53269, 504028, 802994]

# calculate percents
percents = [100*x/sum(counts) for x in counts]

y_ax = ('A','B','C','D','E')
y_tick = np.arange(len(y_ax))

ax.barh(range(len(counts)), counts, align = "center", color = "tab:blue")
ax.set_yticks(y_tick)
ax.set_yticklabels(y_ax, size = 8)

# annotate bar plot with values
for i, y in enumerate(ax.patches):
    label_per = percents[i]
    ax.text(y.get_width()+.09, y.get_y()+.3, str(round((y.get_width()), 1)), fontsize=10)
    # add the percent label here
    # ax.text(y.get_width()+.09, y.get_y()+.3, str(round((label_per), 2)), ha='right', va='center', fontsize=10)
    ax.text(y.get_width()+.09, y.get_y()+.1, str(f'{round((label_per), 2)}%'), fontsize=10)

ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
plt.show()
  • You can play with the positioning.
  • Other formatting options mentioned by JohanC:
    • Print both parts of the text in one string with a \n in between to get a "natural" line spacing:
      • str(f'{round((y.get_width()), 1)}\n{round((label_per), 2)}%')
    • Use ax.text(..., va='center') to center the text vertically, which leaves room for a slightly larger font.
    • Use ax.set_xlim(0, max(counts) * 1.18) to get a bit more horizontal space for the text.
    • Start each line of text with a space to get a natural horizontal padding:
      • str(f' {round((label_per), 2)}%'), note the space before {.
    • y.get_width()+.09 is extremely close to y.get_width() when these values are in the tens of thousands, so the .09 offset adds no visible padding.
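The x-limit and padding tips can be combined: give the axis some headroom and compute the text offset as a fraction of the largest value, so the padding scales with the data instead of being a fixed .09. A minimal sketch of that arithmetic in plain Python; the 1.18 factor comes from the tip above, while the 1% padding fraction and the helper name text_layout are assumptions to tune.

```python
def text_layout(counts, headroom=1.18, pad_fraction=0.01):
    """Return (x_max, text_x_positions) for horizontal-bar labels (hypothetical helper)."""
    x_max = max(counts) * headroom       # value to pass to ax.set_xlim(0, x_max)
    pad = max(counts) * pad_fraction     # padding in data units, scales with the data
    positions = [width + pad for width in counts]  # x for ax.text, just past each bar
    return x_max, positions


x_max, xs = text_layout([100, 250, 400])
print(x_max, xs)
```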


Multiple annotations on a seaborn bar chart

You can use ax.bar_label with custom labels; there is no need for ax.annotate and loops.

I'm assuming the example below is what you mean by "plot the corresponding percentage values on the bars", but it can be adjusted flexibly.

Note that this doesn't show values of 1% or less, since those would overlap the x-axis and the other label. The threshold is easy to adjust in the code below.

The docs have some instructive examples.

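import seaborn as sns
import matplotlib.pyplot as plt

# df2 is the OP's dataframe with the columns 'STRUD', 'Struct_Count', and 'Perc'
fig, ax = plt.subplots(1, 1, figsize=(15, 8))
sns.barplot(x="STRUD", y="Struct_Count", data=df2, ax=ax)

# label the bar edges with the counts
ax.bar_label(ax.containers[0])

# label the bar centers with the percents, skipping values of 1% or less
ax.bar_label(ax.containers[0],
             labels=[f'{e}%' if e > 1 else "" for e in df2.Perc],
             label_type="center")
plt.title("Distribution of STRUCT")
plt.show()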
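The labels argument above keeps a percent label only for values above 1% and passes an empty string otherwise, so bar_label draws nothing for those bars. The same list comprehension as a small function with the threshold as a parameter (the name percent_labels is hypothetical):

```python
def percent_labels(percents, threshold=1):
    """Label each bar with its percent, or '' when it is too small to show."""
    # an empty string still occupies a slot, so labels stay aligned with the bars
    return [f'{p}%' if p > threshold else '' for p in percents]


print(percent_labels([45.2, 0.8, 12.0, 1]))
```

Keeping the empty strings, rather than filtering values out, matters because bar_label expects exactly one label per bar in the container.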


How do I annotate a barplot made from 2 different arrays?

I was able to annotate by iterating over ax.patches instead of plot.patches.

import matplotlib.pyplot as plt

x = [' A6', ' Q2', ' Q5', ' A5', ' A1', ' A4', ' Q3', ' A3']
y = [748, 822, 877, 882, 1347, 1381, 1417, 1929]
fig, ax = plt.subplots(figsize=(10, 7))
ax.bar(x, y)

# annotate the top-center of each bar, shifted 8 points up
for bar in ax.patches:
    ax.annotate(text=bar.get_height(),
                xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
                ha='center',
                va='center',
                size=15,
                xytext=(0, 8),
                textcoords='offset points')
plt.xlabel("Car Model")
plt.ylabel("Car Frequency")
plt.title("Frequency of Most Popular Audi Cars")
plt.ylim(bottom=0)
plt.show()
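The xy passed to ax.annotate is the top-center of each bar: bar.get_x() is the left edge, so adding half the width gives the center. With categorical x values, ax.bar places the bars at positions 0, 1, 2, … with a default width of 0.8 and align='center', so each left edge is i - 0.4. A plain-Python sketch of that arithmetic, where the Bar tuple is a hypothetical stand-in for a matplotlib Rectangle:

```python
from typing import NamedTuple


class Bar(NamedTuple):
    """Hypothetical stand-in for the x, width, and height of a bar patch."""
    x: float
    width: float
    height: float


def bars_for(heights, width=0.8):
    # center-aligned bars at positions 0, 1, 2, ...: the left edge is i - width / 2
    return [Bar(i - width / 2, width, h) for i, h in enumerate(heights)]


def annotation_points(bars):
    # top-center of each bar, i.e. the xy given to ax.annotate
    return [(b.x + b.width / 2, b.height) for b in bars]


y = [748, 822, 877, 882, 1347, 1381, 1417, 1929]
print(annotation_points(bars_for(y)))
```

The vertical gap itself comes from xytext=(0, 8) with textcoords='offset points', which is in points rather than data units, so it stays the same size regardless of the bar heights.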


Creating and Annotating a Grouped Barplot in Python

First convert the data to a long (vertical) format; there are other ways to do this, but here it is done with unstack. Then draw a bar chart from the long-form data, get the x position and height of each bar, and annotate it. In my code, the text is placed at half the bar height.

import matplotlib.pyplot as plt
import seaborn as sns

# df is the OP's wide dataframe: one row per group, one column per status
df_long = df.unstack().to_frame(name='value')
df_long = df_long.swaplevel()
df_long.reset_index(inplace=True)
df_long.columns = ['group', 'status', 'value']

fig, ax = plt.subplots(figsize=(12, 8))

g = sns.barplot(data=df_long, x='group', y='value', hue='status', ax=ax)

# place each value at half the height of its bar
for bar in g.patches:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2., 0.5 * height, int(height),
            ha='center', va='center', color='white')

plt.show()
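The unstack / swaplevel / reset_index chain turns the wide table (groups as the index, statuses as the columns) into one row per (group, status, value). The plain-Python sketch below shows the same reshaping on a nested dict, plus the half-height y position used for the text; the sample numbers and helper names are made up for illustration.

```python
def to_long(wide):
    """wide: {status: {group: value}} -> list of (group, status, value) rows."""
    rows = []
    for status, column in wide.items():     # columns outer, index inner, as in df.unstack()
        for group, value in column.items():
            rows.append((group, status, value))
    return rows


def label_y(height):
    # the loop above centers each label vertically at half the bar height
    return 0.5 * height


# hypothetical data: two groups, two statuses
wide = {'done': {'g1': 10, 'g2': 4}, 'open': {'g1': 6, 'g2': 8}}
for group, status, value in to_long(wide):
    print(group, status, value, label_y(value))
```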


How to plot a stacked bar with annotations for multiple groups

  • This is easier to implement as a stacked bar plot, so reshape the dataframe with pandas.crosstab and plot it using pandas.DataFrame.plot with kind='bar' and stacked=True.
    • This should not be implemented with plt.hist because it's more convoluted, and it's easier to use the pandas plot method directly.
    • Also a histogram is more appropriate when the x values are a continuous range of numbers, not discrete categorical values.
  • ct.iloc[:, :-1] selects all but the last column, 'tot', so only the medal counts are plotted as bars.
  • Use matplotlib.pyplot.bar_label to add annotations
    • ax.bar_label(ax.containers[2], padding=3) uses label_type='edge' by default, which results in annotating the edge with the cumulative sum ('center' annotates with the patch value), as shown in this answer.
      • The [2] in ax.containers[2] selects only the top containers to annotate with the cumulative sum. The containers are 0 indexed from the bottom.
    • See this answer for additional details and examples
    • This answer shows how to do annotations the old way, without .bar_label. I do not recommend it.
    • This answer shows how to customize labels to prevent annotations for values under a given size.
  • Tested in python 3.10, pandas 1.3.5, matplotlib 3.5.1

Load and Shape the DataFrame

import pandas as pd

# load from github repo link
url = 'https://raw.githubusercontent.com/jpiedehierroa/files/main/Libro1.csv'
df = pd.read_csv(url)

# reshape the dataframe
ct = pd.crosstab(df.countries, df.type)

# total medals per country, which is necessary to sort the bars
ct['tot'] = ct.sum(axis=1)

# sort
ct = ct.sort_values(by='tot', ascending=False)

# display(ct)
type         bronze  gold  silver  tot
countries
USA              33    39      41  113
China            18    38      32   88
ROC              23    20      28   71
GB               22    22      21   65
Japan            17    27      14   58
Australia        22    17       7   46
Italy            20    10      10   40
Germany          16    10      11   37
Netherlands      14    10      12   36
France           11    10      12   33
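pd.crosstab(df.countries, df.type) counts how often each (country, type) pair occurs, and the 'tot' column is the row sum used for sorting. A plain-Python sketch of the same counting with collections.Counter; the medal rows below are made up and much smaller than the linked CSV, and medal_table is a hypothetical helper name.

```python
from collections import Counter


def medal_table(rows):
    """rows: iterable of (country, medal_type) -> list of (country, counts, tot)."""
    pair_counts = Counter(rows)
    types = sorted({t for _, t in pair_counts})   # crosstab columns come out sorted
    countries = {c for c, _ in pair_counts}
    table = []
    for country in countries:
        # Counter returns 0 for pairs that never occur, like crosstab's zero cells
        counts = {t: pair_counts[(country, t)] for t in types}
        table.append((country, counts, sum(counts.values())))
    # largest total first, like ct.sort_values(by='tot', ascending=False)
    return sorted(table, key=lambda row: row[2], reverse=True)


rows = [('USA', 'gold'), ('USA', 'silver'), ('USA', 'gold'), ('GB', 'bronze')]
for country, counts, tot in medal_table(rows):
    print(country, counts, tot)
```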

Plot

# colors in the same order as the crosstab columns: bronze, gold, silver
colors = ("#CD7F32", "gold", "silver")
cd = dict(zip(ct.columns, colors))

# plot the medals columns
title = 'Country Medal Count for Tokyo 2020'
ax = ct.iloc[:, :-1].plot(kind='bar', stacked=True, color=cd, title=title,
                          figsize=(12, 5), rot=0, width=1, ec='k')

# annotate each container with individual values
for c in ax.containers:
    ax.bar_label(c, label_type='center')

# annotate the top container with the cumulative sum
ax.bar_label(ax.containers[2], padding=3)

# add headroom between the top labels and the edge of the axes
ax.margins(y=0.1)


  • An alternative way to annotate the top with the sum is to use the 'tot' column for custom labels, but as shown, this is not necessary.
labels = ct.tot.tolist()
ax.bar_label(ax.containers[2], labels=labels, padding=3)
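The default label_type='edge' on the top container shows the total because each stacked segment starts where the previous one ends, so the top edge of the last segment sits at the cumulative sum; label_type='center' instead writes the segment's own value at its middle. A plain-Python sketch of those positions for one bar, using the USA row from the table above (stacked_label_positions is a hypothetical helper):

```python
def stacked_label_positions(segments):
    """segments: bottom-to-top values of one stacked bar.

    Returns (centers, top): (y, value) pairs for label_type='center'
    and the y of the top edge, which equals the cumulative sum.
    """
    bottom = 0
    centers = []
    for value in segments:
        centers.append((bottom + value / 2, value))  # middle of this segment
        bottom += value                              # the next segment starts here
    return centers, bottom


centers, top = stacked_label_positions([33, 39, 41])  # USA: bronze, gold, silver
print(centers, top)
```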

How to plot and annotate grouped bars

  • The easiest solution is to use pandas. This puts the data in an object which easily facilitates further analysis, and the plot API properly manages the spacing of grouped bars.
    • This implementation uses only 6 lines of code, compared to the 18 lines in the question.
  • Use pandas.DataFrame.plot, which uses matplotlib as the default plotting backend. Columns are plotted as the bar groups and the index is the independent axis.
  • From matplotlib 3.4.2, .bar_label should be used for annotations on bars.
  • See How to add value labels on a bar chart for additional information and examples about using .bar_label, and How to plot and annotate a grouped bar chart for an additional example of grouped bars.
  • Tested in python 3.9.7, pandas 1.3.4, matplotlib 3.4.3
import pandas as pd
import matplotlib.pyplot as plt

# the OP's data, as shown in the dataframe display below
labels = ['Account_1', 'Account_2', 'Account_3', 'Account_4']
oct_data = [10, 24, 25, 30]
nov_data = [12, 42, 21, 78]

# create a dict with the data
data = {'October': oct_data, 'November': nov_data}

# create the dataframe with the labels as the index
df = pd.DataFrame(data, index=labels)

# display(df)
           October  November
Account_1       10        12
Account_2       24        42
Account_3       25        21
Account_4       30        78

# plot the dataframe
ax = df.plot(kind='bar', figsize=(10, 6), rot=0, ylabel='Cost ($)', color=['#7f6d5f', '#557f2d'])

# iterate through each group of container (bar) objects
for c in ax.containers:

    # annotate the container group
    ax.bar_label(c, label_type='center')

plt.show()


How to plot and annotate grouped bars in seaborn / matplotlib

Data

  • The data needs to be converted to a long format using .melt
  • Because of the scale of values, 'log' is used for the yscale
  • All of the categories in 'cats' are included for the example.
    • Select only the desired columns before melting, or use dfl = dfl[dfl.cats.isin(['sub', 'vc'])] to filter for the desired 'cats'.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# setup dataframe
data = {'vc': [76, 47, 140, 106, 246],
        'tv': [29645400, 28770702, 50234486, 30704017, 272551386],
        'sub': [66100, 15900, 44500, 37000, 76700],
        'name': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

# display(df.head(3))
    vc        tv    sub name
0   76  29645400  66100    a
1   47  28770702  15900    b
2  140  50234486  44500    c

# convert to long form
dfl = (df.melt(id_vars='name', var_name='cats', value_name='values')
       .sort_values('values', ascending=False).reset_index(drop=True))

# display(dfl.head(3))
  name cats     values
0    e   tv  272551386
1    c   tv   50234486
2    d   tv   30704017
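df.melt(id_vars='name', var_name='cats', value_name='values') emits one row per (name, column) pair, column by column, and the sort then puts the largest values first. A plain-Python sketch of the same reshaping on the first two rows of the data above (melt here is a hypothetical helper, not the pandas method):

```python
def melt(records, id_var, value_vars):
    """Wide records -> long rows of (id, category, value), one column at a time."""
    rows = []
    for cat in value_vars:                 # all rows for one column, then the next
        for rec in records:
            rows.append((rec[id_var], cat, rec[cat]))
    return rows


records = [
    {'name': 'a', 'vc': 76, 'tv': 29645400, 'sub': 66100},
    {'name': 'b', 'vc': 47, 'tv': 28770702, 'sub': 15900},
]
# sort by value, largest first, like .sort_values('values', ascending=False)
long_rows = sorted(melt(records, 'name', ['vc', 'tv', 'sub']),
                   key=lambda row: row[2], reverse=True)
for row in long_rows:
    print(row)
```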

Updated as of matplotlib v3.4.2

  • Use matplotlib.pyplot.bar_label
  • .bar_label works for matplotlib, seaborn, and pandas plots.
  • See How to add value labels on a bar chart for additional details and examples with .bar_label.
  • Tested with seaborn v0.11.1, which uses matplotlib as the plot engine.
# plot
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(x='name', y='values', data=dfl, hue='cats', ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
ax.set_yscale('log')

for c in ax.containers:
    # set the bar label
    ax.bar_label(c, fmt='%.0f', label_type='edge', padding=1)

# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)


Plot with seaborn v0.11.1

  • Using matplotlib before version 3.4.2
  • Note that using .annotate and .patches is much more verbose than with .bar_label.
# plot
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(x='name', y='values', data=dfl, hue='cats', ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
ax.set_yscale('log')

for p in ax.patches:
    ax.annotate(f"{p.get_height():.0f}", (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext=(0, 7), textcoords='offset points')

How to plot and annotate a grouped bar chart

Imports and DataFrame

import pandas as pd
import matplotlib.pyplot as plt

# given the following code to create the dataframe
file="https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/coursera/Topic_Survey_Assignment.csv"
df=pd.read_csv(file, index_col=0)

df.sort_values(by=['Very interested'], axis=0, ascending=False, inplace=True)

# the OP divides every column by 2233, so those lines can be replaced with the following single line
df = df.div(2233)

# display(df)
                            Very interested  Somewhat interested  Not interested
Data Analysis / Statistics         0.755934             0.198836        0.026870
Machine Learning                   0.729512             0.213614        0.033139
Data Visualization                 0.600090             0.328706        0.045678
Big Data (Spark / Hadoop)          0.596507             0.326467        0.056874
Deep Learning                      0.565607             0.344828        0.060905
Data Journalism                    0.192118             0.484102        0.273175

Using matplotlib v3.4.2 or later

  • Uses matplotlib.pyplot.bar_label and pandas.DataFrame.plot
  • Some formatting can be done with the fmt parameter, but more sophisticated formatting should be done with the labels parameter, as shown in How to add multiple annotations to a barplot.
  • See How to add value labels on a bar chart for additional details and examples using .bar_label
# your colors
colors = ['#5cb85c', '#5bc0de', '#d9534f']

# plotting with pandas makes the annotations easier
p1 = df.plot(kind='bar', color=colors, figsize=(20, 8), rot=0, ylabel='Percentage', title="The percentage of the respondents' interest in the different data science Area")

for p in p1.containers:
    p1.bar_label(p, fmt='%.2f', label_type='edge')
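fmt='%.2f' applies an old-style % format to each bar's value (here a fraction of respondents), which is enough for a fixed number of decimals. Anything more, such as showing the fractions as percents, calls for a list passed to labels. A plain-Python sketch of both, using the three values from the first row of the table:

```python
values = [0.755934, 0.198836, 0.026870]  # first row of df after df.div(2233)

# what fmt='%.2f' produces for each value
fmt_labels = ['%.2f' % v for v in values]

# a custom alternative suitable for labels=...: show the fractions as percents
pct_labels = [f'{v:.1%}' for v in values]

print(fmt_labels)
print(pct_labels)
```

Inside the loop above, the percent version could be written as p1.bar_label(p, labels=[f'{v:.1%}' for v in p.datavalues]), since a BarContainer's datavalues attribute holds the plotted values in matplotlib 3.4 and later.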

Sample Image

Using matplotlib before v3.4.2

  • Given the current code, setting the bar width to w = 0.8 / 3 will resolve the issue.
  • However, generating the plot can be accomplished more easily with pandas.DataFrame.plot
# your colors
colors = ['#5cb85c', '#5bc0de', '#d9534f']

# plotting with pandas makes the annotations easier
p1 = df.plot.bar(color=colors, figsize=(20, 8), ylabel='Percentage', title="The percentage of the respondents' interest in the different data science Area")
p1.set_xticklabels(p1.get_xticklabels(), rotation=0)

for p in p1.patches:
    p1.annotate(f'{p.get_height():0.2f}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext=(0, 10), textcoords='offset points')
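The w = 0.8 / 3 suggestion presumably splits a total group width of 0.8 among the three interest categories. A plain-Python sketch of the resulting bar centers, assuming center-aligned bars placed side by side and symmetric around each integer tick; grouped_centers is a hypothetical helper, not how pandas computes its own positions.

```python
def grouped_centers(n_groups, n_series, total_width=0.8):
    """Return the bar width and, per series, the bar centers for each group."""
    w = total_width / n_series                  # width of each individual bar
    centers = []
    for s in range(n_series):
        offset = (s - (n_series - 1) / 2) * w   # symmetric around the tick
        centers.append([g + offset for g in range(n_groups)])
    return w, centers


w, centers = grouped_centers(n_groups=2, n_series=3)
print(w, centers)
```

With three series the middle bar sits exactly on the tick and the outer bars sit one bar width to either side, so neighboring groups do not overlap.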

