How to extract density function probabilities in python (pandas kde)

Question

The pandas.plot.kde() function is handy for plotting the estimated density function of a continuous random variable. It will take data x as input, and display the probabilities p(x) of the binned input as its output.

How can I extract the values of probabilities it computes? Instead of just plotting the probabilities of bandwidthed samples, I would like an array or pandas series that contains the probability values it internally computed.

If this can't be done with pandas kde, let me know of any equivalent in scipy or another library.

Try this stackoverflow.com/a/8939010/6692898

– RichieV, Commented Aug 5, 2020 at 5:15

My Work · Accepted Answer · 2020-08-05 06:08:55Z

19

There are several ways to do that. You can either compute the density yourself or read it back from the plot.

As pointed out in the comment by @RichieV following this post, you can extract the data from the plot using

data.plot.kde().get_lines()[0].get_xydata()
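
A minimal sketch of that one-liner, assuming `data` is a pandas Series (the sample data and the `kde_curve` name are made up for illustration). The first column of the extracted array holds the x grid pandas chose and the second holds the estimated density at each grid point:

```python
import matplotlib
matplotlib.use("Agg")  # optional: headless backend so no plot window opens

import numpy as np
import pandas as pd

# Hypothetical sample standing in for `data`
rng = np.random.default_rng(0)
data = pd.Series(rng.normal(loc=40, scale=10, size=500))

ax = data.plot.kde()                 # returns a matplotlib Axes
xy = ax.get_lines()[0].get_xydata()  # array of shape (n_points, 2)

# Density values indexed by the x grid they were evaluated on
kde_curve = pd.Series(xy[:, 1], index=xy[:, 0], name="density")
```

Note that these are density values rather than probabilities: they integrate to roughly 1 over the plotted range, and individual values can exceed 1 for narrowly spread data.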

Use seaborn and then extract the values the same way as from the pandas plot:

You can use seaborn to estimate the kernel density and then matplotlib to extract the values (as in this post). You can either use distplot or kdeplot:

import seaborn as sns

# kdeplot
x, y = sns.kdeplot(data).get_lines()[0].get_data()
# distplot (deprecated since seaborn 0.11; prefer kdeplot)
x, y = sns.distplot(data, hist=False).get_lines()[0].get_data()

You can use scipy.stats.gaussian_kde directly, which is what pandas uses under the hood to estimate the kernel density:

import scipy.stats

density = scipy.stats.gaussian_kde(data)

and then you can use this to evaluate it on a set of points:

import numpy as np
xs = np.linspace(0, 80, 200)
y = density(xs)
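
Putting the scipy pieces together, here is a self-contained sketch. The sample data, the padding of three standard deviations, and the interval 30 to 50 are arbitrary choices for illustration. Since the question asks for probabilities, it also shows `integrate_box_1d`, which integrates the estimated density over an interval:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical sample standing in for `data`
rng = np.random.default_rng(0)
data = rng.normal(loc=40, scale=10, size=500)

# Gaussian kernel; bandwidth defaults to Scott's rule
density = gaussian_kde(data)

# Evaluate on a grid derived from the data instead of a hard-coded range
pad = 3 * data.std()
xs = np.linspace(data.min() - pad, data.max() + pad, 200)
ys = density(xs)  # density values at each grid point

# A density value is not a probability; integrate over an interval to get one
p_30_50 = density.integrate_box_1d(30, 50)
```

Here `ys` plays the role of the curve that pandas would draw, and `p_30_50` is the estimated probability that a value falls between 30 and 50.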


4 Comments

develarist Over a year ago

for the third method, what if the data is known to be non-gaussian?

My Work Over a year ago

That's a problematic issue: neither scipy nor anything built on top of it, like pandas, can handle anything non-gaussian. If you need that, I recommend using statsmodels. I also recommend this post about other kernels: jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/… or consult the scikit-learn docs: scikit-learn.org/stable/modules/….

develarist Over a year ago

pandas.plot.kde() will graphically display the estimated density of anything you send it though, whether it be non-normal or non-unimodal

My Work Over a year ago

The scipy docs say: "The estimation works best for a unimodal distribution; bimodal or multi-modal distributions tend to be oversmoothed." As for non-normal data: yes, it will always return something, the question is what. I recommend the previous links.
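
For readers following the scikit-learn link from the comments, a rough sketch of a non-Gaussian kernel using `sklearn.neighbors.KernelDensity`. This is not part of the original answer; the bimodal sample, the Epanechnikov kernel, and the bandwidth of 3.0 are assumptions picked only for illustration:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical bimodal sample
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(20, 3, 300), rng.normal(60, 5, 200)])

# Epanechnikov kernel; the bandwidth here is arbitrary and should be tuned
kde = KernelDensity(kernel="epanechnikov", bandwidth=3.0)
kde.fit(data.reshape(-1, 1))  # scikit-learn expects a 2D array

xs = np.linspace(0, 80, 200)
log_dens = kde.score_samples(xs.reshape(-1, 1))  # log of the density
ys = np.exp(log_dens)                            # density values
```

Unlike `gaussian_kde`, scikit-learn does not pick a bandwidth from the data by default here, so the value usually needs to be chosen, for example by cross-validation.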
