What does Matplotlib hist() do with a 2-D numpy array input?

Question

Suppose I have a 2-D NumPy array that represents the learned weights of a PyTorch linear layer. Below I'm creating an example array full of Gaussian random numbers.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(size=(4, 768))
print(data.shape) # (4, 768)

I then try to use the Matplotlib function hist() to create a histogram of the values. I'm using Jupyter Notebook (Google Colab). When I call the function like below (by passing in the original 2-D array), it takes a long time to complete, and the visual output is quite bizarre.

%%time
_ = plt.hist(data, bins=100)

# Result:
# CPU times: user 48.5 s, sys: 737 ms, total: 49.2 s
# Wall time: 49.2 s

On the other hand, when I reshape the 2-D array into a 1-D array with reshape(), the hist() function completes almost immediately, and the visualization has the shape of what I would expect, namely a Gaussian curve.

data = data.reshape(-1)
print(data.shape) # (3072,)

%%time
_ = plt.hist(data, bins=100)

# Result:
# CPU times: user 70.7 ms, sys: 2.01 ms, total: 72.7 ms
# Wall time: 70.9 ms

So what exactly is going on with my first attempt where I pass in a 2-D array? Why does it take so long? What does the visualized graph represent?

Thanks for any help.

You get 768 histograms with your 4 values each distributed in 100 bins.
– Mr. T, Commented Dec 29, 2020 at 3:22

Mr. T · Accepted Answer · 2020-12-29 13:04:01Z

I am rather surprised that matplotlib, unlike numpy, does not flatten the input array first. However, the matplotlib documentation states that the input x can be an (n,) array or a sequence of (n,) arrays. This is how matplotlib interprets your input: as 768 arrays of shape (4,), one per column, which are displayed as 768 histograms in one graph. You don't see much because the bars are rather thin with 76800 bars (768 datasets times 100 bins) to display; an increase in figure size and resolution will probably improve that. The opposite case of data = np.random.normal(size=(768, 4)) reveals this, because now only 400 bars have to be displayed.
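The bar counts for the transposed case can be checked with a small sketch. It assumes the default histtype, for which hist() returns one BarContainer per dataset; the Agg backend line and the names rng, n_datasets and n_bars are my own additions so that the snippet runs without a display, and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed in Jupyter/Colab
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(size=(768, 4))  # 768 rows, 4 columns

fig, ax = plt.subplots()
# Each of the 4 columns is treated as its own dataset
counts, edges, containers = ax.hist(data, bins=100)

n_datasets = len(containers)                       # one BarContainer per column
n_bars = sum(len(c.patches) for c in containers)   # one Rectangle per bin per column

print(counts.shape)        # (4, 100)
print(n_datasets, n_bars)  # 4 400
plt.close(fig)
```

With the original (4, 768) shape the same arithmetic gives 768 datasets times 100 bins, so 76800 rectangles have to be created and drawn, which is a plausible reason for the long run time.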

But we can also have a look at what matplotlib returns:

hist_count, hist_bins, hist_bars = plt.hist(data, bins=100)
print(hist_count.shape)  # (768, 100)
print(hist_bars)  # <a list of 768 BarContainer objects>

Or for an even simpler version:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)
data = np.random.normal(size=(4, 5))
print(data.shape) #(4, 5)

hist_count, hist_bins, hist_bars = plt.hist(data, bins=6)

print(hist_count.shape) #(5, 6)
print(hist_count)
#[[0. 1. 2. 0. 0. 1.]
# [1. 0. 0. 1. 1. 1.]
# [0. 0. 1. 1. 0. 2.]
# [0. 1. 1. 0. 2. 0.]
# [0. 0. 3. 1. 0. 0.]]
print(hist_bins) #[-2.42667924 -1.65457769 -0.88247613 -0.11037458  0.66172697  1.43382853  2.20593008]
print(hist_bars) #<a list of 5 BarContainer objects>
plt.show()
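To make the column-wise interpretation concrete, here is a minimal sketch that compares each row of the returned counts with np.histogram applied to the matching column, using the bin edges that hist() itself returned. It then flattens the array with ravel() to get the single histogram the question was after. The Agg backend line and the variable names (col_counts, per_column, flat_counts) are assumptions of mine for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, not needed in Jupyter/Colab
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(123)
data = np.random.normal(size=(4, 5))

fig, (ax_cols, ax_flat) = plt.subplots(1, 2)

# 2-D input: one histogram per column, all sharing the same bin edges
col_counts, col_edges, _ = ax_cols.hist(data, bins=6)

# Recompute the counts column by column with the shared edges
per_column = np.array(
    [np.histogram(data[:, j], bins=col_edges)[0] for j in range(data.shape[1])]
)
matches = np.array_equal(col_counts, per_column)
print(matches)  # True

# 1-D input: a single histogram over all 20 values
flat_counts, flat_edges, _ = ax_flat.hist(data.ravel(), bins=6)
print(flat_counts.shape, int(flat_counts.sum()))  # (6,) 20
plt.close(fig)
```

So if one histogram over all weights is wanted, passing data.ravel() (or data.reshape(-1), as in the question) is the straightforward way to get it.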

edited Dec 29, 2020 at 13:04

answered Dec 29, 2020 at 3:51

Mr. T


2 Comments

stackoverflowuser2010 Over a year ago

Thank you for the explanation. The documentation doesn't give much information on how plt.hist() treats a 2-D array. It's not intuitive to me that it would produce 768 arrays of shape (4,) that are displayed as in your output as 768 histograms in one graph.

Mr. T Over a year ago

I don't know the inner workings of matplotlib, but maybe this is a side effect (if it is not intended, I mean) of using similar routines for hist() and hist2d()? Just a guess; if somebody with more insight posts an answer, do not hesitate to accept the better answer.
