I have a very large PySpark DataFrame. I took a sample of it and converted the sample to a pandas DataFrame:
sample = heavy_pivot.sample(False, fraction = 0.2, seed = None)
sample_pd = sample.toPandas()
The dataframe looks like this:
sample_pd[['client_id', 'beer_freq']].head(10)
  client_id  beer_freq
0   1000839   0.000000
1   1002185   0.000000
2   1003366   1.000000
3   1005218   1.000000
4   1005483   1.000000
5    100964   0.434783
6    101272   0.166667
7   1017462   0.000000
8   1020561   0.000000
9   1023646   0.000000
I want to plot a histogram of the column "beer_freq":
import matplotlib.pyplot as plt
plt.switch_backend('agg')
sample_pd.hist('beer_freq', bins = 100)
The plot did not show up. Instead, I get output like this:
>>>array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f60f6fd0750>]], dtype=object)
It seems that I cannot use ordinary Python code with matplotlib and a pandas DataFrame to plot figures in a PySpark environment.
If I call plt.show(), nothing happens either.
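One thing that may be relevant: 'agg' is a non-interactive backend that only renders to files, so plt.show() has no window to open. Below is a minimal sketch of writing the histogram to disk instead. It assumes the session can write to a temporary directory. The small DataFrame only stands in for sample_pd using the ten rows shown above, and the output file name is a hypothetical choice.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, opens no window
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for sample_pd, built from the ten rows printed in the question.
sample_pd = pd.DataFrame({
    "client_id": ["1000839", "1002185", "1003366", "1005218", "1005483",
                  "100964", "101272", "1017462", "1020561", "1023646"],
    "beer_freq": [0.0, 0.0, 1.0, 1.0, 1.0,
                  0.434783, 0.166667, 0.0, 0.0, 0.0],
})

# Draw the histogram on an explicit Axes so the Figure can be saved afterwards.
fig, ax = plt.subplots()
sample_pd["beer_freq"].hist(bins=100, ax=ax)
ax.set_xlabel("beer_freq")
ax.set_ylabel("count")

# Hypothetical output location; any writable path would do.
out_path = os.path.join(tempfile.gettempdir(), "beer_freq_hist.png")
fig.savefig(out_path)
plt.close(fig)  # release the figure's memory
```

The saved PNG can then be opened outside the PySpark session, or displayed by whatever image viewer the environment provides.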