Variation in matplotlib histogram bin width

Question

I am creating a histogram in matplotlib, and having problems because the widths of the bars are varying when they should all be the same width. An example of this is here:

[Image: histogram showing variable bar width between iterations]

In the image, the left column shows the full histograms and the right column shows zoomed-in sections of them. In the full histograms the bar widths differ between the two trials for some unknown reason, whereas in the zoomed plots on the right the bars are the same size. I would like all bars to be the same size, with rwidth=1 and no gaps between neighboring bins.

This has happened both when I leave rwidth to default and when I set it equal to 1. A similar question was asked here, but it seems to have been related to varying tick ranges or the outline of the bars overlapping, neither of which apply to my graph.

Does anyone know why my bins are varying in width, or what else I could try to make them stay the same width?

The code I am using is shown here:

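import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec

# project_images is assumed to be a module-level integer defined elsewhere
def graph_pvalues(both, selective, clearcut, trials, location):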
    # define overall figure
    plt.figure(figsize=(16, int(project_images*(trials*0.15 + 0.5))))
    gs = gridspec.GridSpec(project_images-1, 3) 

    # plot one graph per substack size
    for v in range(project_images-1):
        # define subsets of data being graphed, remove nan values, and combine
        S_sub = selective[:, v]
        C_sub = clearcut[:, v]
        B_sub = both[:, v]
        graphed_data = [B_sub[~np.isnan(B_sub)], S_sub[~np.isnan(S_sub)], C_sub[~np.isnan(C_sub)]]

        # plot main graph
        ax1 = plt.subplot2grid((project_images-1, 3), (v, 0), colspan=2)
        ax1.hist(graphed_data, bins=50, rwidth=1, label=['both', 'selective', 'clearcut'])
        ax1.axis([0, 1, 0, trials])
        ax1.set_title("Disturbance at the %s using a substack of %i images" % (location, v+1))
        ax1.set_xlabel("p-value")
        ax1.set_ylabel("Number of trials")
        ax1.legend(prop={'size': 10})

        # plot zoom graph for 0 to 0.1
        ax2 = plt.subplot2grid((project_images-1, 3), (v, 2))
        ax2.hist(graphed_data, bins=10, range=(0, 0.1), label=['both', 'selective', 'clearcut'])
        ax2.axis([0, 0.1, 0, trials])
        ax2.set_title("Zoom 0 - 0.1 (%s, %i images)" % (location, v+1))
        ax2.set_xlabel("p-value")
        ax2.legend(prop={'size': 10})

    plt.tight_layout()

    plt.show()

I suspect that the width varies, because you always use a relative width of 1, rwidth=1. What is the purpose of this? Why not remove it?
– ImportanceOfBeingErnest, Commented Jun 14, 2018 at 19:09
This same problem exists even when I don't use the rwidth parameter. I tried using it to see if I could force all the bars to be the same width.
– ycartwhelen, Commented Jun 14, 2018 at 19:20
So if you have 50 bins in the range between 0 and 1, their width will be 5 times smaller than if you only have 10 bins in that range, right?
– ImportanceOfBeingErnest, Commented Jun 14, 2018 at 19:22
Yes, so that's why the left and right graphs have different widths (as expected). My question is why do the bars in the two graphs on the left have different widths when they both have 50 bins spread out between 0 and 1?
– ycartwhelen, Commented Jun 14, 2018 at 19:25
Because apparently they don't spread over the same range. Looking at the image the first dataset spreads from 0 to 0.9, while the second dataset spreads from 0 to 0.18.
– ImportanceOfBeingErnest, Commented Jun 14, 2018 at 19:27

ycartwhelen · Accepted Answer · 2018-06-14 19:41:23Z

As ImportanceOfBeingErnest pointed out in the comments, the bins are spread across the range of the data unless you explicitly set the range parameter when plotting. In my case the data ranged over 0-0.18 in some plots and 0-0.98 in others, which caused the variation in bar width. The solution is to amend the histogram line to:

ax1.hist(graphed_data, bins=50, range=(0,1), label=['both', 'selective', 'clearcut'])

This sets the range parameter explicitly; the rwidth parameter is unrelated and optional.
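
To make the arithmetic concrete, here is a rough pure-Python sketch of how equal-width bin edges come out with and without a fixed range. It is not matplotlib's actual code: bin_edges is a hypothetical helper, the p-values are made up, and it assumes that when no range is given the bins span the smallest and largest value across all datasets passed in one call.

```python
def bin_edges(datasets, bins, value_range=None):
    """Return equal-width bin edges for a group of datasets.

    Assumption: with no explicit range, the edges span the minimum and
    maximum value found across all datasets in the group.
    """
    if value_range is None:
        flat = [x for data in datasets for x in data]
        lo, hi = min(flat), max(flat)
    else:
        lo, hi = value_range
    step = (hi - lo) / bins
    return [lo + i * step for i in range(bins + 1)]


# Made-up p-values: one panel reaches 0.9, the other only 0.18.
panel_a = [[0.0, 0.3, 0.9], [0.1, 0.5]]
panel_b = [[0.0, 0.05, 0.18], [0.02, 0.1]]

for name, panel in (("a", panel_a), ("b", panel_b)):
    auto = bin_edges(panel, bins=50)
    fixed = bin_edges(panel, bins=50, value_range=(0, 1))
    print(name, "auto width:", auto[1] - auto[0],
          "fixed width:", fixed[1] - fixed[0])
```

With the automatic range, the two panels get different bin widths (range divided by 50), which shows up as bars of different widths on axes that both run from 0 to 1. With range=(0, 1) both panels share the same edges, so the bars match.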

