Bin Size in Matplotlib (Histogram)

Actually, it's quite easy: instead of the number of bins, you can pass a list of bin boundaries. The boundaries do not have to be evenly spaced:

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])
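To see how explicit edges group the values, the same list can be passed to np.histogram, which plt.hist uses internally for the binning. This is a minimal sketch; the data list is made up purely for illustration:

```python
import numpy as np

# Hypothetical sample values, chosen only to illustrate the binning.
data = [3, 12, 15, 27, 33, 48, 51, 75, 99, 100]
edges = [0, 10, 20, 30, 40, 50, 100]

# Every bin is half-open [left, right), except the last one,
# which also includes its right edge (so 100 is counted).
counts, returned_edges = np.histogram(data, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:>3} to {right:>3}: {n}")
```

Note that there is always one count fewer than there are edges.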

If you just want them equally distributed, you can simply use range:

plt.hist(data, bins=range(min(data), max(data) + binwidth, binwidth))

Added to original answer

The above line works for data filled with integers only. As macrocosme points out, for floats you can use:

import numpy as np
plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))
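As a rough sketch, the float version can be wrapped in a small helper and checked without plotting. The name fixed_width_edges and the sample values are assumptions made for this example, not part of matplotlib or numpy. Keep in mind that np.arange with a non-integer step can occasionally produce one edge more or fewer than expected because of floating-point rounding:

```python
import numpy as np

def fixed_width_edges(data, binwidth):
    """Return bin edges of width `binwidth`, starting at min(data) and covering max(data)."""
    lo, hi = np.min(data), np.max(data)
    # Adding binwidth to the stop value keeps the last edge from falling below hi.
    return np.arange(lo, hi + binwidth, binwidth)

data = [0.3, 1.1, 2.7, 4.9]  # hypothetical float sample
edges = fixed_width_edges(data, binwidth=0.5)

# np.histogram accepts the same bins argument as plt.hist.
counts, _ = np.histogram(data, bins=edges)
print(edges)
print(counts)
```

The resulting edges array can be passed directly as bins=edges to plt.hist.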

matplotlib histogram bins shifted and individual data label

Although you can use a histogram, you have to be careful with the bin size you choose (by default, matplotlib creates 10 bins of equal width).

I think you actually want a bar plot rather than a histogram:

data = df['overall'].value_counts()

fig, ax = plt.subplots()
ax.bar(data.index, data.values)
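For a self-contained version of the same idea, here is a sketch with a made-up ratings column standing in for df['overall']. The sort_index() call and the explicit ticks are additions for readability, not part of the snippet above:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ratings column standing in for df['overall'].
df = pd.DataFrame({"overall": [5, 4, 5, 3, 5, 4, 1, 5]})

# One bar per distinct value; sort_index orders the result by rating
# instead of by frequency.
data = df["overall"].value_counts().sort_index()

fig, ax = plt.subplots()
bars = ax.bar(data.index, data.values)
ax.set_xticks(list(data.index))  # put a tick exactly on each observed value
```

Because each distinct value gets its own bar, no bin edges are involved and nothing can appear shifted.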

Is there a way to plot a histogram with given bin widths with Matplotlib?

As you already have the heights for each bin, you should create a bar plot.

The x-values should be the bin edges, except for the last. By default, the bars are centered; you need align='edge' to align them with the bin edges. The widths of the bars are the differences of the bin edges.

from matplotlib import pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import numpy as np

bin_edge = [0.36, 0.46, 0.66, 1.00, 1.30, 1.70, 2.20, 3.00, 4.00,
            5.20, 6.50, 8.00, 10.00, 12.00, 14.00, 16.00, 18.00,
            20.00, 22.00, 25.00, 28.00, 31.00, 34.00, 37.00, 40.00]
counts = [159746491, 9316595, 855578, 166092, 151198, 41293, 51051,
          26098, 38536, 1172, 2.872e-12, 24598, 3.27097e-12, 3.86874e-12,
          4.46613e-12, 5.06328e-12, 5.6602754e-12, 6.2571442e-12, 4.6652e-12,
          5.26229e-12, 5.8592429e-12, 0, 7.052837e-12, 0]
fig, ax = plt.subplots(figsize=(12, 5))
ax.bar(x=bin_edge[:-1], height=counts, width=np.diff(bin_edge), align='edge', fc='MediumOrchid', ec='black')
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel('Size / µm')
ax.set_ylabel('counts')
ax.set_title('Histogram')
ax.margins(x=0.01) # less margin left and right

ax.xaxis.set_major_formatter(FormatStrFormatter('%g'))
ax.xaxis.set_minor_formatter(FormatStrFormatter('%g'))
# "length" is the length of the tick mark, it also changes the text offset
# "labelsize" is the fontsize of the tick label
ax.tick_params(axis='x', which='minor', length=5, labelsize=8)
ax.tick_params(axis='x', which='major', length=10, labelsize=12)
plt.tight_layout()
plt.show()
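One way to confirm that align='edge' together with width=np.diff(...) places each bar exactly between two consecutive edges is to inspect the rectangles that ax.bar returns. A small sketch with made-up edges and heights:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

edges = np.array([0.0, 1.0, 3.0, 6.0])  # hypothetical unequal bin edges
heights = [4, 2, 7]                     # one height per bin

fig, ax = plt.subplots()
bars = ax.bar(edges[:-1], heights, width=np.diff(edges), align="edge")

# Each rectangle starts at its left edge and extends by the bin width.
spans = [(rect.get_x(), rect.get_x() + rect.get_width()) for rect in bars]
print(spans)
```

With the default align='center', the same call would center each bar on its left edge instead, so the bars would overlap the neighbouring bins.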

variation in matplotlib histogram bin width

As pointed out by ImportanceOfBeingErnest in the comments, the bins will be spread out among the range of your data unless you specifically set the range parameter when plotting. So in my case, for some lines the range was 0-0.18, in other places 0-0.98, thus causing the variation in bar width. The solution is to amend the histogram line to:

ax1.hist(graphed_data, bins=50, range=(0,1), label=['both', 'selective', 'clearcut'])

With the range parameter set, the 50 bins always span 0 to 1, so every plot gets the same bin width. The rwidth parameter is unrelated and optional.
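The effect can be checked without plotting by using np.histogram_bin_edges, which follows the same bins and range semantics. The two arrays below are invented stand-ins for datasets with different spreads:

```python
import numpy as np

# Two hypothetical datasets with different spreads (0-0.18 and 0-0.98).
narrow = np.linspace(0.0, 0.18, 200)
wide = np.linspace(0.0, 0.98, 200)

# Without range, the 50 bins span each dataset's own min..max,
# so the bin width differs between the two.
w_narrow = np.diff(np.histogram_bin_edges(narrow, bins=50))[0]
w_wide = np.diff(np.histogram_bin_edges(wide, bins=50))[0]
print(w_narrow, w_wide)

# With range=(0, 1), both datasets get identical edges of width 0.02.
e_narrow = np.histogram_bin_edges(narrow, bins=50, range=(0, 1))
e_wide = np.histogram_bin_edges(wide, bins=50, range=(0, 1))
print(np.array_equal(e_narrow, e_wide))
```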

Histogram shows unlimited bins despite bin specification in matplotlib

The matplotlib documentation for plt.hist() explains that the first parameter can be either a 1D array or a sequence of 1D arrays. The latter case also applies if you pass in a 2D array: each column is treated as a separate dataset and drawn as its own set of bars, with colors cycling between datasets.

This is what we see in your example: The X-axis ticks still correspond to the bin-edges that were passed in - but for each bin there are many bars. So, I'm assuming you passed in a multidimensional array.

To fix this, simply flatten your data before passing it to matplotlib, e.g. plt.hist(np.ravel(error), bins=bins).
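The difference is easy to reproduce. In this sketch, the error array and the bins are made-up stand-ins for the data in the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
error = rng.normal(size=(100, 3))  # hypothetical 2D data: 100 rows, 3 columns
bins = np.linspace(-4, 4, 9)       # 8 bins

fig, (ax_multi, ax_flat) = plt.subplots(1, 2)

# 2D input: one set of bars per column, so every bin shows three bars.
n_multi, _, _ = ax_multi.hist(error, bins=bins)

# Flattened input: a single histogram over all values.
n_flat, _, _ = ax_flat.hist(np.ravel(error), bins=bins)

print(len(n_multi), n_flat.shape)
```

Summing the per-column counts gives the same totals as the flattened histogram, so no data is lost by flattening; only the grouping changes.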
