Histogram Matplotlib

Plotting a histogram on a log scale with Matplotlib

Specifying bins=8 in the hist call means that the range between the minimum and maximum value is divided equally into 8 bins. What is equal on a linear scale is distorted on a log scale.

What you could do is specify the bins of the histogram such that they are unequal in width in a way that would make them look equal on a logarithmic scale.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7,
19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1]
x = pd.Series(x)

# histogram on linear scale
plt.subplot(211)
hist, bins, _ = plt.hist(x, bins=8)

# histogram on log scale.
# Use non-equal bin sizes, such that they look equal on log scale.
logbins = np.logspace(np.log10(bins[0]),np.log10(bins[-1]),len(bins))
plt.subplot(212)
plt.hist(x, bins=logbins)
plt.xscale('log')
plt.show()
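
To see why equal-width bins look distorted, the bin widths can be compared as they would appear on a log axis. The following is a minimal sketch, assuming the same data range (1 to 286) and 8 bins as above; the variable names are chosen only for illustration.

```python
import numpy as np

# Same data range as the example above: values between 1 and 286.
lo, hi, n_bins = 1, 286, 8

linear_edges = np.linspace(lo, hi, n_bins + 1)
log_edges = np.logspace(np.log10(lo), np.log10(hi), n_bins + 1)

# Widths measured in decades, i.e. as they appear on a log axis.
linear_widths_on_log_axis = np.diff(np.log10(linear_edges))
log_widths_on_log_axis = np.diff(np.log10(log_edges))

print(linear_widths_on_log_axis.round(3))  # widths shrink from left to right
print(log_widths_on_log_axis.round(3))     # all widths are the same
```

The edges from np.linspace give bars that get narrower towards the right of a log axis, while the edges from np.logspace give bars of constant visual width.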


pyplot histogram, different color for each bar (bin)

One option is to use pyplot.bar instead of pyplot.hist. Its color argument accepts a list with one color per bar.

The inspiration is from:
https://stackabuse.com/change-font-size-in-matplotlib/

from collections import Counter
import matplotlib.pyplot as plt
plt.rcParams['font.size'] = '20'

data = ['a', 'b', 'b', 'c', 'c', 'c']
counts = Counter(data)  # counts each label once; keeps first-seen order (a, b, c)

positions = range(len(counts))
plt.bar(positions, list(counts.values()), color=['red', 'green', 'blue'])
plt.xticks(positions, list(counts.keys()))
plt.show()


UPDATE:

Following @JohanC's suggestion, there is an additional option using seaborn (it seems to me the best option):

import seaborn as sns 
sns.countplot(x=data, palette=['r', 'g', 'b'])

Also, there is a very similar question:

Have each histogram bin with a different color

Python matplotlib - doubling the histogram

Try calling plt.close(fig) after saving the first plot, just before you start the second one.
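
A minimal sketch of that pattern follows. It assumes the doubling happens because both histograms are drawn into the same figure; the sample data and the in-memory buffers (used here in place of file names) are made up for illustration.

```python
import io
import matplotlib.pyplot as plt

first_data = [1, 2, 2, 3, 3, 3]
second_data = [10, 20, 20, 30]

# In-memory buffers stand in for file names such as "first.png".
first_png, second_png = io.BytesIO(), io.BytesIO()

fig, ax = plt.subplots()
ax.hist(first_data, bins=3)
fig.savefig(first_png, format="png")
plt.close(fig)  # discard the first figure so nothing carries over

fig, ax = plt.subplots()  # a fresh figure for the second histogram
ax.hist(second_data, bins=3)
fig.savefig(second_png, format="png")
plt.close(fig)

open_figures = plt.get_fignums()  # empty list once both figures are closed
```

Creating a new figure for each plot and closing it after saving keeps the two histograms from ending up in the same image.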

Plot histogram given pre-computed counts and bins

Pyplot's bar can draw each bar aligned at its left edge (the default is centered), with the widths calculated by np.diff:

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

counts = np.array([20, 19, 40, 46, 58, 42, 23, 10, 8, 2])
bin_edges = np.array([0.5, 0.55, 0.59, 0.63, 0.67, 0.72, 0.76, 0.8, 0.84, 0.89, 0.93])

ax.bar(x=bin_edges[:-1], height=counts, width=np.diff(bin_edges), align='edge', fc='skyblue', ec='black')
plt.show()


Optionally the xticks can be set to the bin edges:

ax.set_xticks(bin_edges)

Or to the bin centers:

ax.set_xticks((bin_edges[:-1] + bin_edges[1:]) / 2)
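
An alternative that is sometimes used is to let hist draw the pre-computed counts by passing one value per bin together with the weights argument. This is a sketch, not part of the answer above; it assumes the same counts and bin_edges arrays and uses the bin centers as the representative values.

```python
import matplotlib.pyplot as plt
import numpy as np

counts = np.array([20, 19, 40, 46, 58, 42, 23, 10, 8, 2])
bin_edges = np.array([0.5, 0.55, 0.59, 0.63, 0.67, 0.72, 0.76, 0.8, 0.84, 0.89, 0.93])

# One representative value per bin (its center), weighted by that bin's count.
centers = (bin_edges[:-1] + bin_edges[1:]) / 2

fig, ax = plt.subplots()
heights, edges, _ = ax.hist(centers, bins=bin_edges, weights=counts,
                            color='skyblue', edgecolor='black')
```

The returned heights should match counts, since each bin receives exactly one value carrying that bin's weight.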

Matplotlib: incorrect histograms

By default, plt.hist() creates 10 bins (that is, 11 edges). The default value is given in the documentation and is taken from your rc parameter rcParams["hist.bins"] = 10.

So if you provide data in the range [1–6], hist will count the number of values in the bins: [1.–1.5), [1.5–2.), [2–2.5), [2.5–3.), [3–3.5), [3.5–4.), [4–4.5), [4.5–5.), [5.–5.5), [5.5–6.]. You can tell that that's the case by looking at the text output by hist() (in addition to the graph).

hist() returns 3 objects when called:

  • the height of each bar (that is the number of items in each bin), equivalent to the column "#" in that Khan video
  • the edges of the bins, which is roughly equivalent to the column "Bucket" in the video
  • a list of matplotlib objects that you can use to tweak their appearance when needed.
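
The three return values can be inspected directly. The sketch below assumes the default of 10 bins and uses a small made-up sample spanning 1 to 6; coloring the last bar is only there to illustrate the third return value.

```python
import matplotlib.pyplot as plt

data = [1, 2, 3, 4, 5, 6, 6]  # assumed sample values between 1 and 6

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(data)  # default: 10 bins

print(counts)  # number of items in each bin
print(edges)   # 11 edges, from 1.0 to 6.0 in steps of 0.5

# Third return value: one rectangle per bar, here used to recolor the last bar.
patches[-1].set_facecolor('red')
```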

In summary:

If you want to have bars of width 1, then you need to specify either the number of bins (5), or the edges of your bins.

These two calls provide the same result:

plt.hist(counts, bins=5)
plt.hist(counts, bins=[1,2,3,4,5,6])
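
The equivalence can be checked without plotting by calling np.histogram, which the Matplotlib documentation names as the function hist uses for binning. This is a small sketch with assumed sample data whose minimum is 1 and maximum is 6.

```python
import numpy as np

data = [1, 2, 3, 4, 5, 6, 6]  # assumed sample data spanning 1 to 6

n_by_number, edges_by_number = np.histogram(data, bins=5)
n_by_edges, edges_by_edges = np.histogram(data, bins=[1, 2, 3, 4, 5, 6])

print(edges_by_number)          # [1. 2. 3. 4. 5. 6.]
print(n_by_number, n_by_edges)  # [1 1 1 1 3] [1 1 1 1 3]
```

Note that the two calls only agree when the data actually spans 1 to 6, because bins=5 divides the range between the observed minimum and maximum.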

EDIT
Here is a function that can help you see the "buckets" chosen by hist:

import matplotlib.pyplot as plt
import numpy as np

def hist_and_bins(x, ax=None, **kwargs):
    ax = ax or plt.gca()
    counts, edges, patches = ax.hist(x, **kwargs)
    bin_edges = [[a, b] for a, b in zip(edges, edges[1:])]
    ticks = np.mean(bin_edges, axis=1)  # bin centers
    tick_labels = ['[{}-{})'.format(l, r) for l, r in bin_edges]
    tick_labels[-1] = tick_labels[-1][:-1] + ']'  # last bin is a closed interval
    ax.set_xticks(ticks)
    ax.set_xticklabels(tick_labels)
    return counts, edges, patches, ax.get_xticks()

fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize=(9,3))
ax1.hist([1,2,3,4,5,6,6])
hist_and_bins([1,2,3,4,5,6,6], ax=ax2)
hist_and_bins([1,2,3,4,5,6,6], ax=ax3, bins=5, ec='w')
fig.autofmt_xdate()
