3

I'm trying to do a histogram in Python just like I did in R. How can I do it?

R:

age <- c(43, 23, 56, 34, 38, 37, 41)
hist(age)

R output

Python:

import matplotlib.pyplot as plt

age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age)

matplotlib output

2
  • every plotting system produces different output, if you want output exactly like R then I suggest you stick with R... otherwise, if you can be more specific about which aspects of matplotlib output you don't like you might have a chance at getting an answer Commented Sep 2, 2019 at 16:58
  • Excellent approach + great Q. A strong understanding of how to do something in one language helps when learning how to do it in another. Commented Sep 2, 2019 at 16:59

2 Answers

3

The difference here is caused by the way R and matplotlib choose the number of bins by default.

For this particular example you can use:

age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age, bins=4)

to replicate the R-style histogram.
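
To see why the two defaults differ, it can help to look at the counts directly. The sketch below assumes that plt.hist delegates its binning to numpy.histogram and that matplotlib's default is 10 bins (the stock value of rcParams["hist.bins"]); the variable names are only illustrative.

```python
import numpy as np

age = (43, 23, 56, 34, 38, 37, 41)

# Assumed matplotlib default: 10 equal-width bins between min and max.
default_counts, default_edges = np.histogram(age, bins=10)

# Four equal-width bins, as in plt.hist(age, bins=4).
four_counts, four_edges = np.histogram(age, bins=4)

print(default_counts)
print(four_counts, four_edges)
```

With four bins the counts should come out as 1, 3, 2, 1, which matches the bar heights in the R plot even though the bin edges themselves are not the same.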

General Case

If we want matplotlib's histograms to look like R's in general, all we need to do is replicate the binning logic that R uses. Internally, R uses Sturges' formula* to calculate the number of bins. matplotlib supports this out of the box: we just have to pass 'sturges' for the bins argument.

age = (43, 23, 56, 34, 38, 37, 41)
plt.hist(age, bins='sturges')

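* It's a little more complicated internally: R treats the Sturges count only as a suggestion and then picks rounded break points with pretty(), so the bin edges can still differ. Still, this gets us most of the way there.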
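
As a rough check of the formula, Sturges' rule suggests ceil(log2(n) + 1) bins for n observations. The sketch below computes that by hand and compares it with what numpy.histogram_bin_edges returns for bins="sturges". It makes no attempt to reproduce how R rounds its break points.

```python
import math
import numpy as np

age = [43, 23, 56, 34, 38, 37, 41]

# Sturges' rule: the number of bins depends only on the sample size.
n_bins = math.ceil(math.log2(len(age)) + 1)

# numpy applies the same rule but keeps the data min and max as outer edges.
edges = np.histogram_bin_edges(age, bins="sturges")

print(n_bins, len(edges) - 1, edges)
```

For these seven values both approaches should give four bins, with numpy's edges running from 23 to 56.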

2

In short, use bins="sturges" in the plt.hist call.


From numpy.histogram_bin_edges

bins:
[...]
‘sturges’ R’s default method, only accounts for data size. Only optimal for gaussian data and underestimates number of bins for large non-gaussian datasets.

So you will get a histogram similar to R's via

import matplotlib.pyplot as plt
import numpy as np

age = np.array((43, 23, 56, 34, 38, 37, 41))

plt.hist(age, bins="sturges", facecolor="none", edgecolor="k")

plt.show()

Note, however, that the outer edges are still the minimum and maximum of the data. There is no way to change this automatically, but you could set the bins manually to exactly those from the R diagram via bins=(20,30,40,50,60).
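
If typing the edges by hand is inconvenient, one possible workaround is to snap the data range outward to a round step. The helper below, rounded_edges, is hypothetical and is not R's pretty() algorithm; the step of 10 is an assumption chosen to match this example.

```python
import math
import numpy as np

def rounded_edges(data, step):
    """Hypothetical helper: equal-width edges snapped outward to multiples of step."""
    lo = math.floor(min(data) / step) * step
    hi = math.ceil(max(data) / step) * step
    return np.arange(lo, hi + step, step)

age = (43, 23, 56, 34, 38, 37, 41)
edges = rounded_edges(age, step=10)

# Count the values per bin; plt.hist(age, bins=edges) would draw the same bars.
counts, _ = np.histogram(age, bins=edges)
print(edges, counts)
```

One caveat: R's hist closes its intervals on the right by default, while numpy closes them on the left (except for the last bin), so a value lying exactly on an edge can end up in a different bar. None of the values here sit on an edge, so the counts agree.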
