Pandas - Plotting a Stacked Bar Chart

Plotting a stacked Bar Chart

Are you getting errors, or just not sure where to start?

%matplotlib inline  # Jupyter only; preferred over the discouraged %pylab inline
import pandas as pd
import matplotlib.pyplot as plt

df2 = df.groupby(['Name', 'Abuse/NFF'])['Name'].count().unstack('Abuse/NFF').fillna(0)
df2[['abuse','nff']].plot(kind='bar', stacked=True)

stacked bar plot
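Since the df from the question is not shown, here is a minimal self-contained sketch of the same groupby / count / unstack steps. The column names follow the snippet above; the rows are made up for illustration only.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the snippet above.
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob", "Bob", "Bob", "Cy"],
    "Abuse/NFF": ["abuse", "nff", "abuse", "abuse", "nff", "nff"],
})

# Count rows per (Name, Abuse/NFF) pair, then move the second key into columns.
# Pairs that never occur come out as NaN, so fill them with 0.
counts = (
    df.groupby(["Name", "Abuse/NFF"])["Name"]
    .count()
    .unstack("Abuse/NFF")
    .fillna(0)
)
print(counts)

# One bar per Name, one stacked segment per Abuse/NFF value.
ax = counts[["abuse", "nff"]].plot(kind="bar", stacked=True)
```

In a plain script (outside a notebook) you would also need matplotlib.pyplot.show() to display the figure.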

How to create a stacked bar chart in Python, color coded by category

  • The simplest way is to reshape the DataFrame with pandas.DataFrame.pivot_table, and then plot with pandas.DataFrame.plot specifying kind='bar' and stacked=True.
    • The important thing to remember is to shape the data into the correct format for the plot API.
    • Use .pivot_table if values need to be aggregated, otherwise use .pivot.
  • Tested with pandas v1.2.4 and matplotlib v3.3.4 (pandas uses matplotlib as its default plotting backend, so it does not need to be imported separately).
import seaborn as sns  # used for the titanic data
import pandas as pd

# load the two necessary columns
df = sns.load_dataset('titanic').loc[:, ['sex', 'survived']]

# create a pivot table
dfp = df.pivot_table(index='sex', columns=['survived'], aggfunc=len)

# display(dfp)
survived    0    1
sex
female     81  233
male      468  109

# plot the dataframe
dfp.plot(kind='bar', stacked=True, ylabel='Counts', xlabel='Gender',
         title='Survival Status Count by Gender', rot=0)


  • I do not recommend stacked bars because it is more difficult to differentiate and compare the values for each category. Setting stacked=False draws grouped bars instead:
dfp.plot(kind='bar', stacked=False, ylabel='Counts', xlabel='Gender',
         title='Survival Status Count by Gender', rot=0)

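To illustrate the .pivot_table versus .pivot remark, here is a small sketch that avoids the seaborn download. Both frames are hypothetical stand-ins: one holds raw rows that still need counting, the other is already aggregated. aggfunc="size" is one way to count rows per cell; it is my choice here, not what the code above uses.

```python
import pandas as pd

# Hypothetical raw observations: one row per passenger, so values must be counted.
raw = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male"],
    "survived": [1, 0, 0, 0, 1],
})

# Aggregation needed -> pivot_table; aggfunc="size" counts the rows in each cell.
counts = raw.pivot_table(index="sex", columns="survived", aggfunc="size", fill_value=0)

# Hypothetical pre-aggregated data: exactly one row per (sex, survived) pair.
agg = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "survived": [0, 1, 0, 1],
    "count": [1, 1, 2, 1],
})

# No aggregation needed -> pivot only reshapes.
reshaped = agg.pivot(index="sex", columns="survived", values="count")

# Both tables have the wide shape the plot API expects.
ax = counts.plot(kind="bar", stacked=True, rot=0)
```

Calling .pivot on the raw frame would fail because the (sex, survived) pairs are not unique, which is the practical reason to reach for .pivot_table there.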

plotting stacked bar graph

With Python 3.9.7, both of your graphs look the same:

>>> df_alpha
       a   b   c   d   e   f   g   h   i   j   k   l
2016  20  20  20  20  20  20  20  20  20  20  20  20
2017   0  13  20  20  20  20  20  20  20  20  20  20
2018   0  10  10  10  10  10  10  10  10  10  10   0
2019   0  18  15  15  15  15  15  15  15  15  15   0
2020  10  15  18  18  18  18  18  18  18  18  18   0
2021  21  45  78  75  78  78  78  78  78  78  78  20

>>> df_month
      January  February  March  April  may  June  July  August  September  October  November  December
2016       20        20     20     20   20    20    20      20         20       20        20        20
2017        0        13     20     20   20    20    20      20         20       20        20        20
2018        0        10     10     10   10    10    10      10         10       10        10         0
2019        0        18     15     15   15    15    15      15         15       15        15         0
2020       10        15     18     18   18    18    18      18         18       18        18         0
2021       21        45     78     75   78    78    78      78         78       78        78        20

Full code:

import pandas as pd
import matplotlib.pyplot as plt

years = ['2016', '2017', '2018', '2019', '2020', '2021']

dataavail1 = {'a': [20, 0, 0, 0, 10, 21], 'b': [20, 13, 10, 18, 15, 45], 'c': [20, 20, 10, 15, 18, 78], 'd': [20, 20, 10, 15, 18, 75], 'e': [20, 20, 10, 15, 18, 78], 'f': [20, 20, 10, 15, 18, 78], 'g': [20, 20, 10, 15, 18, 78], 'h': [20, 20, 10, 15, 18, 78], 'i': [20, 20, 10, 15, 18, 78], 'j': [20, 20, 10, 15, 18, 78], 'k': [20, 20, 10, 15, 18, 78], 'l': [20, 20, 0, 0, 0, 20]}

dataavail2 = {'January': [20, 0, 0, 0, 10, 21], 'February': [20, 13, 10, 18, 15, 45], 'March': [20, 20, 10, 15, 18, 78], 'April': [20, 20, 10, 15, 18, 75], 'may': [20, 20, 10, 15, 18, 78], 'June': [20, 20, 10, 15, 18, 78], 'July': [20, 20, 10, 15, 18, 78], 'August': [20, 20, 10, 15, 18, 78], 'September': [20, 20, 10, 15, 18, 78], 'October': [20, 20, 10, 15, 18, 78], 'November': [20, 20, 10, 15, 18, 78], 'December': [20, 20, 0, 0, 0, 20]}

df_alpha = pd.DataFrame(dataavail1, index=years)
df_month = pd.DataFrame(dataavail2, index=years)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
df_alpha.plot(kind='bar', stacked=True, colormap=plt.cm.tab20, ax=ax1, rot=0)
df_month.plot(kind='bar', stacked=True, colormap=plt.cm.tab20, ax=ax2, rot=0)
plt.show()


Update: the code also works with Python 3.7.12
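If you want to confirm that two frames like these differ only in their column labels, a quick check along the following lines should do. The frames here are a shortened, hypothetical subset of the data above, not the full tables.

```python
import numpy as np
import pandas as pd

years = ["2016", "2017", "2018"]

# Shortened stand-ins for df_alpha and df_month: same numbers, different labels.
df_alpha = pd.DataFrame({"a": [20, 0, 0], "b": [20, 13, 10]}, index=years)
df_month = pd.DataFrame({"January": [20, 0, 0], "February": [20, 13, 10]}, index=years)

# Compare the underlying numbers and the column labels separately.
same_values = np.array_equal(df_alpha.to_numpy(), df_month.to_numpy())
same_labels = df_alpha.columns.equals(df_month.columns)

print(same_values, same_labels)  # True False
```

Identical values with different labels means the bars are the same and only the legend entries change.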

Change the stacked bar chart to Stacked Percentage Bar Plot

You can calculate the percentages yourself, e.g. in a new column of your dataframe, since you already have the absolute values, and then plot that column instead.
Using sum() and dataframe division, you should get there quickly.

You might want to have a look at the GeeksforGeeks post that shows how this could be done.
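As a rough sketch of the sum() and division idea: divide every row by its own total and scale to percent. The counts below are hypothetical and only stand in for your absolute values.

```python
import pandas as pd

# Hypothetical absolute counts: one row per bar, one column per stacked segment.
counts = pd.DataFrame(
    {"yes": [30, 10], "no": [10, 30], "unsure": [10, 10]},
    index=["Q1", "Q2"],
)

# Row totals via sum(axis=1); div(..., axis=0) divides each row by its own total.
percent = counts.div(counts.sum(axis=1), axis=0) * 100
print(percent)

# Every bar now adds up to 100.
ax = percent.plot(kind="bar", stacked=True, rot=0, ylabel="Percent")
```

The axis=0 argument matters: it aligns the row totals with the row index instead of with the column names.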

EDIT

I have now adjusted your program so that it gives the result you want (or at least the result I think you want).
The two key functions that I used and you did not are Series.value_counts() and DataFrame.transpose(). You might want to read up on both, as they are helpful in many situations.

import pandas as pd
import matplotlib.pyplot as plt

df_responses = pd.read_csv('https://raw.githubusercontent.com/eng-aomar/Security_in_practice/main/secuirtyInPractice.csv')

df_new = df_responses.iloc[:, 9:21]
image_format = 'svg' # e.g .png, .svg, etc.

# initialize empty dataframe providing the columns
df2 = pd.DataFrame(columns=df_new.columns)

# loop over all columns
for col in df_new.columns:
    # counting occurrences for each value can be done by value_counts()
    val_counts = df_new[col].value_counts()
    # calculate the sum of all categories
    total = val_counts.sum()
    # use value count for each category and divide it by the total count of all categories
    # and multiply by 100 to get nice percent values
    df2[col] = val_counts / total * 100

# categories that do not occur in a column show up as NaN after alignment; replace them with 0
df2 = df2.fillna(0)

# columns and rows need to be transposed in order to get the result we want
df2.transpose().plot.bar(stacked=True)
plt.show()
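If you do not want to download the CSV, the same idea can be tried on a tiny made-up survey. This is only a sketch: the answers frame is hypothetical, and value_counts(normalize=True) is used as a shortcut for the manual division above.

```python
import pandas as pd

# Hypothetical survey answers: one column per question, one row per respondent.
answers = pd.DataFrame({
    "Q1": ["Agree", "Agree", "Disagree", "Neutral"],
    "Q2": ["Agree", "Disagree", "Disagree", "Disagree"],
})

# normalize=True returns shares instead of counts; building the frame from a dict
# of Series aligns them on the union of all answer categories.
shares = pd.DataFrame({col: answers[col].value_counts(normalize=True) for col in answers})

# A category that never occurs for a question is NaN after alignment; treat it as 0 %.
percent = shares.fillna(0) * 100
print(percent)

# Transpose so each question is one bar and each answer is one stacked segment.
ax = percent.transpose().plot.bar(stacked=True, rot=0)
```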

Stacked bar chart for a pandas df

You can group by both columns, count Priority, then unstack and plot the result as a stacked bar chart:

df.groupby(['Component', 'Priority']).Priority.count().unstack().plot.bar(stacked=True)

Example:

import pandas as pd

df = pd.DataFrame({'Component': list('abccc'), 'Priority': ['Minor', 'Major', 'Major', 'Minor', 'Minor']})
df.groupby(['Component', 'Priority']).Priority.count().unstack().plot.bar(stacked=True)


As an alternative, you can use a crosstab:

pd.crosstab(df.Component, df.Priority).plot.bar(stacked=True)
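To see that both routes give the same table, you can compare them on the example frame. One difference to be aware of: unstack leaves NaN for combinations that never occur, while crosstab fills them with 0, so this sketch passes fill_value=0 to unstack (my addition, not part of the snippet above).

```python
import pandas as pd

df = pd.DataFrame({
    "Component": list("abccc"),
    "Priority": ["Minor", "Major", "Major", "Minor", "Minor"],
})

# Route 1: group, count, and pivot the Priority level into columns.
by_groupby = (
    df.groupby(["Component", "Priority"])["Priority"].count().unstack(fill_value=0)
)

# Route 2: crosstab counts the co-occurrences directly.
by_crosstab = pd.crosstab(df["Component"], df["Priority"])

print(by_groupby)
print(by_crosstab)
```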

If you want to use seaborn (I only now saw the seaborn tag), you can use a displot:

import seaborn as sns

sns.displot(x='Component', hue='Priority', data=df, multiple='stack')


Draw a stacked bar chart from multiple columns instead of drawing a graph for each column?

It can be achieved by rearranging the dataframe using groupby. Consider the code:

import pandas as pd
import matplotlib.pyplot as plt

df_responses = pd.read_csv('https://raw.githubusercontent.com/eng-aomar/Security_in_practice/main/secuirtyInPractice.csv')

df_new = df_responses.iloc[:, 9:21]
image_format = 'svg' # e.g .png, .svg, etc.

# initialize empty dataframe
df2 = pd.DataFrame()

# group by each column counting the size of each category values
for col in df_new:
    grped = df_new.groupby(col).size()
    grped = grped.rename(grped.index.name)
    df2 = df2.merge(grped.to_frame(), how='outer', left_index=True, right_index=True)

# plot the merged dataframe
df2.plot.bar(stacked=True)
plt.show()

Using your data and notebook, this is what I get:

Stacked Bar Plot
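A more compact way to build the same kind of count table, shown here as a sketch on a hypothetical stand-in for df_new, is to apply value_counts to every column. This is an alternative I am suggesting, not the merge loop above.

```python
import pandas as pd

# Hypothetical stand-in for df_new: one column per question.
df_new = pd.DataFrame({
    "Q1": ["Yes", "No", "Yes"],
    "Q2": ["No", "No", "Maybe"],
})

# value_counts runs once per column; the results are aligned on the union of
# all answer values, and answers missing from a column become NaN (filled with 0).
counts = df_new.apply(pd.Series.value_counts).fillna(0)
print(counts)

# One bar per answer value, one stacked segment per question.
ax = counts.plot.bar(stacked=True, rot=0)
```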

Pandas plot of a stacked and grouped bar chart

You can get the plot data from a crosstab and then make a right aligned and a left aligned bar plot on the same axes:

import numpy as np
import pandas as pd

ax = pd.crosstab(df.MainCategory, df.SubCategory.str.partition('.')[2], df.DurationH, aggfunc='sum').plot.bar(
    stacked=True, width=-0.4, align='edge', ylabel='DurationH', ec='w', color=[(0,1,0,x) for x in np.linspace(1, 0.1, 7)], legend=False)
h_durationh, _ = ax.get_legend_handles_labels()

ax = pd.crosstab(df.MainCategory, df.SubCategory.str.partition('.')[2], df.Number, aggfunc='sum').plot.bar(
    stacked=True, width=0.4, align='edge', secondary_y=True, ec='w', color=[(0,0,1,x) for x in np.linspace(1, 0.1, 7)], legend=False, ax=ax)
h_number, _ = ax.get_legend_handles_labels()

ax.set_ylabel('Number')
ax.set_xlim(left=ax.get_xlim()[0] - 0.5)
ax.legend([h_durationh[0], h_number[0]], ['DurationH', 'Number'])

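Because the original df is not shown, here is a reduced sketch of the same left/right trick on made-up data. The column names follow the code above; the values are hypothetical, and the number of alpha steps is taken from the table width instead of the hard-coded 7.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the frame used above; the values are made up.
df = pd.DataFrame({
    "MainCategory": ["A", "A", "B", "B"],
    "SubCategory": ["A.x", "A.y", "B.x", "B.y"],
    "DurationH": [5.0, 3.0, 2.0, 4.0],
    "Number": [10, 20, 30, 40],
})

# The text after the first '.' becomes the stacked segment label.
sub = df["SubCategory"].str.partition(".")[2]
duration = pd.crosstab(df["MainCategory"], sub, df["DurationH"], aggfunc="sum")
number = pd.crosstab(df["MainCategory"], sub, df["Number"], aggfunc="sum")

# One alpha value per segment so each stack fades from opaque to light.
alphas = np.linspace(1, 0.3, duration.shape[1])

# Negative width with align='edge' draws the bars to the left of each tick.
ax = duration.plot.bar(stacked=True, width=-0.4, align="edge", ec="w",
                       color=[(0, 1, 0, a) for a in alphas], legend=False)

# Positive width draws the second set to the right, on a secondary y-axis.
ax2 = number.plot.bar(stacked=True, width=0.4, align="edge", ec="w", secondary_y=True,
                      color=[(0, 0, 1, a) for a in alphas], legend=False, ax=ax)
```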

Plot Stacked bar chart and Multiple bars chart from Pandas dataframe

You can try pivot_table to reshape the data:

# .plot() returns a matplotlib Axes, so the result is named ax rather than fig
ax = df.pivot_table(index=['year', 'quarter'], columns='type', values='total', dropna=False, fill_value=0).plot(kind='bar', stacked=True)


To display the bars side by side, just remove the stacked parameter:

import matplotlib.pyplot as plt

ax = df.pivot_table(index=['year', 'quarter'], columns='type', values='total', dropna=False, fill_value=0).plot(kind='bar')
plt.xticks(rotation=30)

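For reference, a self-contained sketch of what the reshaped table looks like. The column names follow the code above, the rows are hypothetical, and aggfunc="sum" is my assumption (the code above relies on the default, which is the mean; with one row per cell the two give the same numbers).

```python
import pandas as pd

# Hypothetical long-format data; only the column names come from the code above.
df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2021],
    "quarter": ["Q1", "Q1", "Q2", "Q1"],
    "type": ["A", "B", "A", "B"],
    "total": [10, 5, 7, 3],
})

# One row per (year, quarter), one column per type; missing combinations become 0.
wide = df.pivot_table(index=["year", "quarter"], columns="type",
                      values="total", aggfunc="sum", fill_value=0)
print(wide)

# The same wide table feeds both chart styles.
ax_stacked = wide.plot(kind="bar", stacked=True, rot=30)
ax_grouped = wide.plot(kind="bar", rot=30)
```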


