Suppose I have a 2-D NumPy array that represents the learned weights of a PyTorch linear layer. Below I create an example array of the same shape, filled with Gaussian random numbers.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = np.random.normal(size=(4, 768))
print(data.shape)  # (4, 768)
```
I then try to use the Matplotlib function hist() to create a histogram of the values. I'm running this in a Jupyter notebook on Google Colab. When I call the function as shown below, passing in the original 2-D array, it takes a long time to complete and the visual output is quite bizarre.
```python
%%time
_ = plt.hist(data, bins=100)

# Result:
# CPU times: user 48.5 s, sys: 737 ms, total: 49.2 s
# Wall time: 49.2 s
```
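According to the Matplotlib documentation for hist(), a 2-D ndarray is interpreted as several datasets, one per column. The NumPy-only sketch below (it does not call Matplotlib; the seeded generator and the shared bin edges are my assumptions, chosen to mimic hist()'s defaults) counts what that reading implies for a (4, 768) array.

```python
import numpy as np

# Same shape as the weight matrix above: 4 rows, 768 columns.
rng = np.random.default_rng(0)  # assumption: seeded only for reproducibility
data = rng.normal(size=(4, 768))

n_bins = 100
# hist() reads a 2-D array column-wise: each column is its own dataset,
# so this array becomes 768 datasets of only 4 values each.
datasets = data.T
print(len(datasets), datasets[0].shape)  # 768 (4,)

# Assumption: one set of bin edges shared by all datasets, computed over
# every value (np.histogram_bin_edges flattens its input).
edges = np.histogram_bin_edges(data, bins=n_bins)

# One histogram per column gives a (768, 100) table of counts.
counts = np.stack([np.histogram(col, bins=edges)[0] for col in datasets])
print(counts.shape)  # (768, 100)

# Each column holds just 4 values, so at most 4 of its 100 bins are non-empty.
print(counts.sum(axis=1).max())  # 4

# With the default histtype='bar', every dataset gets one rectangle per bin.
n_rectangles = counts.size
print(n_rectangles)  # 76800
```

If that reading is right, the default histtype='bar' has to lay out 76,800 very thin rectangles side by side within the bins, each column in its own color from the color cycle, which would plausibly account for both the long runtime and the odd-looking plot.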
On the other hand, when I reshape the 2-D array into a 1-D array with reshape(), hist() completes almost immediately, and the plot has the shape I would expect, namely a bell-shaped Gaussian curve.
```python
data = data.reshape(-1)
print(data.shape)  # (3072,)
```

```python
%%time
_ = plt.hist(data, bins=100)

# Result:
# CPU times: user 70.7 ms, sys: 2.01 ms, total: 72.7 ms
# Wall time: 70.9 ms
```
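For comparison, here is the same kind of check on the flattened array. reshape(-1), ravel() and flatten() should all produce the same 1-D array of 3072 values (flatten() always returns a copy, while the other two return a view when they can), which hist() then treats as a single dataset split into 100 bins. The seeded generator is again only an assumption for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: seeded only for reproducibility
data = rng.normal(size=(4, 768))

# Three equivalent ways to get a 1-D sequence of all 3072 values.
flat = data.reshape(-1)
assert np.array_equal(flat, data.ravel())
assert np.array_equal(flat, data.flatten())

# A single dataset: one row of 100 counts covering every value.
counts, edges = np.histogram(flat, bins=100)
print(flat.shape)    # (3072,)
print(counts.shape)  # (100,)
print(counts.sum())  # 3072
```

In the notebook, plt.hist(data.ravel(), bins=100) would then only need to draw 100 rectangles.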
So what exactly is going on in my first attempt, where I pass in a 2-D array? Why does it take so long? What does the resulting graph represent?
Thanks for any help.



