The pandas plot.kde() method (Series.plot.kde() or DataFrame.plot.kde()) is handy for plotting the estimated density function of a continuous random variable. It takes data x as input and displays the estimated density p(x) as its output.

How can I extract the density values it computes? Instead of just plotting the smoothed estimate, I would like an array or pandas Series that contains the values it computed internally.

If this can't be done with pandas' kde, please point me to an equivalent in scipy or another library.

1 Answer

There are several ways to do that. You can either compute the density yourself or extract it from the plot.

  1. As pointed out in the comment by @RichieV following this post, you can extract the data from the plot. The call below returns an (N, 2) array whose columns are the x grid and the estimated density:
data.plot.kde().get_lines()[0].get_xydata()
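  2. Use seaborn and then extract the values as in 1):

You can use seaborn to estimate the kernel density and then matplotlib to extract the values (as in this post). You can use either kdeplot or distplot (note that distplot is deprecated as of seaborn 0.11, so prefer kdeplot on recent versions):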

import seaborn as sns

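# Option A: kdeplot returns the Axes; its first line holds the KDE curve
x, y = sns.kdeplot(data).get_lines()[0].get_data()
# Option B (instead of A, on a fresh figure): distplot without the histogram
x, y = sns.distplot(data, hist=False).get_lines()[0].get_data()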

  3. You can use scipy.stats.gaussian_kde, which pandas uses under the hood, to estimate the kernel density directly:
import numpy as np
import scipy.stats

density = scipy.stats.gaussian_kde(data)

and then you can evaluate it on a set of points:

# choose a grid that covers the range of your data
x = np.linspace(0, 80, 200)
y = density(x)
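
Putting the third approach together, here is a minimal sketch that returns the density values as a pandas Series. The synthetic data, the variable names, and the evaluation grid are assumptions for illustration. The grid is chosen to resemble what pandas' plot.kde appears to use by default (1000 points, padded by half the sample range on each side), so treat that detail as an assumption rather than guaranteed behavior.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Synthetic sample standing in for the asker's data (an assumption).
rng = np.random.default_rng(0)
data = pd.Series(rng.normal(loc=40, scale=10, size=500))

# Fit the Gaussian KDE; the bandwidth defaults to Scott's rule.
density = gaussian_kde(data.to_numpy())

# Evaluation grid: 1000 points, padded by half the sample range on each side
# (assumed to approximate the default grid of pandas' plot.kde).
span = data.max() - data.min()
grid = np.linspace(data.min() - 0.5 * span, data.max() + 0.5 * span, 1000)

# Density values indexed by the grid points.
kde_values = pd.Series(density(grid), index=grid, name="density")

# Trapezoidal integral of the curve; it should be close to 1.
y = kde_values.to_numpy()
area = float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(grid)))
```

Here kde_values.index holds the x positions and kde_values.to_numpy() the estimated densities. These are density values rather than probabilities: an individual value can exceed 1, and only the integral over an interval of x is a probability.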

4 Comments

For the third method, what if the data is known to be non-Gaussian?
That's a problematic issue: neither scipy nor anything built on top of it, like pandas, can handle anything non-Gaussian. If you need that, I recommend using statsmodels. I also recommend this post about other kernels: jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/… or consult the scikit-learn docs: scikit-learn.org/stable/modules/….
pandas.plot.kde() will graphically display the estimated density of anything you send it, though, whether it be non-normal or non-unimodal.
The scipy docs say: "The estimation works best for a unimodal distribution; bimodal or multi-modal distributions tend to be oversmoothed." As for non-normal data: yes, it will always return something; the question is what. I recommend the previous links.
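
For readers following the kernel discussion above: the "Gaussian" in gaussian_kde describes the shape of the kernel placed on each sample, not an assumption that the data are normally distributed. As a rough illustration of what a different kernel looks like, here is a minimal NumPy sketch of a KDE with an Epanechnikov kernel. The helper name epanechnikov_kde, the fixed bandwidth of 4.0, and the synthetic sample are all assumptions for illustration; this is not how statsmodels or scikit-learn implement it.

```python
import numpy as np

def epanechnikov_kde(samples, grid, bandwidth):
    """Evaluate an Epanechnikov-kernel KDE on `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    # Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, zero elsewhere.
    kernel = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)
    # Average the kernels over the samples and rescale by the bandwidth.
    return kernel.sum(axis=1) / (samples.size * bandwidth)

# Synthetic sample (an assumption); the grid is padded beyond the bandwidth
# so the whole support of the estimate is covered.
rng = np.random.default_rng(0)
samples = rng.normal(loc=40, scale=10, size=500)
grid = np.linspace(samples.min() - 5.0, samples.max() + 5.0, 1000)
y = epanechnikov_kde(samples, grid, bandwidth=4.0)

# Trapezoidal integral of the estimate; it should be close to 1.
area = float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(grid)))
```

The bandwidth here is picked by hand; in practice it would need to be tuned, for example by cross-validation, and the libraries linked in the comments offer ready-made kernels and bandwidth selection.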
