How to create a density plot

Question

In R I can create the desired output by doing:

data = c(rep(1.5, 7), rep(2.5, 2), rep(3.5, 8),
         rep(4.5, 3), rep(5.5, 1), rep(6.5, 8))
plot(density(data, bw=0.5))

[Figure: density plot produced in R]

In Python (with matplotlib), the closest I got was a simple histogram:

import matplotlib.pyplot as plt
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
plt.hist(data, bins=6)
plt.show()

[Figure: histogram produced with matplotlib]

I also tried the normed=True parameter, but couldn't get anything other than trying to fit a Gaussian to the histogram.

My latest attempts were around scipy.stats and gaussian_kde, following examples on the web, but I've been unsuccessful so far.

Xin · Accepted Answer · 2015-09-26 23:57:03Z

205

Five years later, when I Google "how to create a kernel density plot using python", this thread still shows up at the top!

Today, a much easier way to do this is to use seaborn, a package that provides many convenient plotting functions and good style management.

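import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
sns.set_style('whitegrid')
# Older seaborn versions spelled this sns.kdeplot(np.array(data), bw=0.5);
# since seaborn 0.11, bw is deprecated in favor of bw_method and bw_adjust.
sns.kdeplot(x=np.array(data), bw_method=0.5)
plt.show()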

answered Sep 26, 2015 at 23:57

Xin


5 Comments

Sitz Blogz Over a year ago

Thank you so much. I've been searching for something like this for days. Can you please explain why bw=0.5 is given?

Xin Over a year ago

@SitzBlogz The bw parameter stands for bandwidth. I was trying to match OP's setting (see his original first code example). For a detailed explanation of what bw controls, see en.wikipedia.org/wiki/…. Basically, it controls how smooth you want the density plot to be: the larger the bw, the smoother the curve.

Sitz Blogz Over a year ago

I have another question: my data is discrete in nature, and I am trying to plot the PDF for it. After reading through the SciPy docs, I understood that PMF = PDF. Any suggestions on how to plot it?

endolith Over a year ago

When I try this I get TypeError: slice indices must be integers or None or have an __index__ method

Raisin Over a year ago

Just want to add that the bw parameter is deprecated, and can be removed as a starting point.
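The smoothing effect described in the comments above can be seen numerically. The sketch below uses SciPy's gaussian_kde directly (an assumption made to keep it independent of the seaborn version) and evaluates the same data with a small and a large bandwidth factor; the wider kernel spreads each point's mass further, so its curve has lower, flatter peaks.

```python
import numpy as np
from scipy.stats import gaussian_kde

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
xs = np.linspace(0, 8, 400)

# A scalar bw_method is used directly as the KDE's factor, so the kernel
# standard deviation is factor * (sample standard deviation of the data).
narrow = gaussian_kde(data, bw_method=0.25)(xs)
wide = gaussian_kde(data, bw_method=1.0)(xs)

# Larger bandwidth -> smoother, flatter curve with a lower maximum.
print(f"peak height, factor 0.25: {narrow.max():.3f}")
print(f"peak height, factor 1.00: {wide.max():.3f}")
```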

EJoshuaS - Stand with Ukraine · Accepted Answer · 2019-03-03 05:28:45Z

150

Sven has shown how to use the gaussian_kde class from SciPy, but you will notice that its output doesn't look quite like what you generated with R. This is because gaussian_kde tries to infer the bandwidth automatically. You can adjust the bandwidth, to an extent, by changing the covariance_factor function of the gaussian_kde class. First, here is what you get without changing that function:

[Figure: gaussian_kde output with the default bandwidth]

However, if I use the following code:

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density = gaussian_kde(data)
xs = np.linspace(0, 8, 200)
# Override the bandwidth factor (Scott's rule gives about 0.51 for 29 points)
density.covariance_factor = lambda: .25
# Recompute the kernel covariance so the new factor takes effect
density._compute_covariance()
plt.plot(xs, density(xs))
plt.show()

I get

[Figure: gaussian_kde output with covariance_factor returning 0.25]

which is pretty close to what you are getting from R. What have I done? gaussian_kde uses a changeable function, covariance_factor, to calculate its bandwidth. Before changing the function, the value returned by covariance_factor for this data was about .5. Lowering this value lowered the bandwidth. I had to call _compute_covariance after changing that function so that all of the factors would be recalculated correctly. It isn't an exact correspondence with the bw parameter from R, but hopefully it helps you get in the right direction.
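The mismatch with R comes from what the numbers mean: R's bw is the standard deviation of the smoothing kernel itself, while gaussian_kde's factor multiplies the sample standard deviation of the data. A minimal sketch of converting one into the other, assuming SciPy 0.11 or later (for the bw_method argument); R evaluates the density on its own grid, so expect small numerical differences rather than an identical curve.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

data = np.array([1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8)

# Default factor (Scott's rule): n ** (-1/5) for 1-D data, about 0.51 here.
default_factor = gaussian_kde(data).factor

# gaussian_kde's kernel std is factor * std(data, ddof=1), so choose the
# factor that makes it equal to R's absolute bandwidth.
r_bw = 0.5
density = gaussian_kde(data, bw_method=r_bw / data.std(ddof=1))

xs = np.linspace(0, 8, 200)
plt.plot(xs, density(xs))
plt.show()
```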

edited Mar 3, 2019 at 5:28

EJoshuaS - Stand with Ukraine


answered Nov 11, 2010 at 6:49

Justin Peel


2 Comments

eddygeek Over a year ago

A set_bandwidth method and a bw_method constructor argument were added to gaussian_kde in scipy 0.11.0 per issue 1619

Ger Over a year ago

To connect this with the other answers: the seaborn and pandas KDE implementations use gaussian_kde by default.
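As eddygeek's comment notes, on SciPy 0.11 or later the covariance_factor override and the private _compute_covariance call can be replaced with the public bw_method constructor argument or the set_bandwidth method. A minimal sketch reproducing the 0.25 factor both ways:

```python
import numpy as np
from scipy.stats import gaussian_kde

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
xs = np.linspace(0, 8, 200)

# Option A: set the factor when constructing the estimator.
kde_a = gaussian_kde(data, bw_method=0.25)

# Option B: build with the default (Scott's rule), then change the factor;
# set_bandwidth recomputes the kernel covariance itself.
kde_b = gaussian_kde(data)
kde_b.set_bandwidth(bw_method=0.25)

ys_a, ys_b = kde_a(xs), kde_b(xs)
```

Both options should give the same curve as the covariance_factor approach, without relying on a private method.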

Aziz Alto · Accepted Answer · 2017-12-18 02:44:51Z

73

Option 1:

Use the pandas DataFrame plot method (built on top of matplotlib):

import matplotlib.pyplot as plt
import pandas as pd
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
pd.DataFrame(data).plot(kind='density')  # or pd.Series(data); requires SciPy
plt.show()

Option 2:

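Use seaborn's distplot:

import matplotlib.pyplot as plt
import seaborn as sns
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
# distplot is deprecated since seaborn 0.11; sns.kdeplot(data) is the
# modern equivalent of the call below.
sns.distplot(data, hist=False)
plt.show()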

edited Dec 18, 2017 at 2:44

answered Nov 2, 2015 at 9:28

Aziz Alto


2 Comments

Anake Over a year ago

To add the bandwidth parameter: df.plot.density(bw_method=0.5)

Nate Anderson Over a year ago

@Aziz You don't need pandas.DataFrame; you can use pandas.Series(data).plot(kind='density'). @Anake You don't need to call df.plot.density as a separate step; you can pass your bw_method kwarg directly: pd.Series(data).plot(kind='density', bw_method=0.5)
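Combining the two comments, a minimal sketch assuming pandas with SciPy installed (the pandas kde/density plot needs SciPy). pandas forwards bw_method to gaussian_kde, so 0.5 here is the same scale factor as in the SciPy answers, not R's absolute bw; the ind argument fixes the points at which the density is evaluated.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8

# bw_method is passed through to scipy.stats.gaussian_kde;
# ind sets the x positions where the density is evaluated.
ax = pd.Series(data).plot.kde(bw_method=0.5, ind=np.linspace(0, 8, 200))
plt.show()
```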

Sven Marnach · Accepted Answer · 2010-11-11 00:40:13Z

53

Maybe try something like:

import matplotlib.pyplot as plt
import numpy
from scipy import stats
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
# Formerly reached as stats.kde.gaussian_kde; that module path is deprecated
density = stats.gaussian_kde(data)
x = numpy.arange(0., 8, .1)
plt.plot(x, density(x))
plt.show()

You can easily replace gaussian_kde() with a different kernel density estimate.
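To illustrate swapping the estimator, here is a hand-rolled KDE in plain NumPy that takes the kernel as an argument. The helper names kde, gaussian, and epanechnikov are hypothetical, chosen for illustration, and are not part of SciPy. With the Gaussian kernel and bandwidth h, it computes the sum-of-kernels estimate f(x) = 1/(n·h) · Σ K((x − xᵢ)/h), where h is the kernel standard deviation as in R's bw.

```python
import numpy as np

def gaussian(u):
    # Standard normal kernel
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def epanechnikov(u):
    # Parabolic kernel with support [-1, 1]
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def kde(data, xs, h, kernel=gaussian):
    """Hypothetical helper: f(x) = 1/(n*h) * sum_i K((x - x_i) / h)."""
    data = np.asarray(data, dtype=float)
    u = (xs[:, None] - data[None, :]) / h  # shape (len(xs), n)
    return kernel(u).sum(axis=1) / (len(data) * h)

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
xs = np.linspace(-2, 10, 1201)
ys_gauss = kde(data, xs, h=0.5)
ys_epan = kde(data, xs, h=0.5, kernel=epanechnikov)
```

Either array can be passed to plt.plot(xs, ...) just like density(x) above.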

answered Nov 11, 2010 at 0:40

Sven Marnach


zerryberry · Accepted Answer · 2020-10-23 17:02:28Z

0

You can do something like:

import numpy as np
import matplotlib.pyplot as plt
s = np.random.normal(2, 3, 1000)  # 1000 samples, mean 2, std 3
count, bins, ignored = plt.hist(s, 30, density=True)
# Overlay the normal PDF the samples were drawn from
plt.plot(bins, 1/(3 * np.sqrt(2 * np.pi)) * np.exp(-(bins - 2)**2 / (2 * 3**2)),
         linewidth=2, color='r')
plt.show()
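For reference, the red curve is the normal probability density with the parameters passed to np.random.normal (mean μ = 2, standard deviation σ = 3), evaluated at the histogram's bin edges. It is the known generating distribution rather than an estimate from the sample, so this approach applies only when that distribution is known:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad \mu = 2,\ \sigma = 3
```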

answered Oct 23, 2020 at 17:02

zerryberry


baxx · Accepted Answer · 2020-04-30 01:00:11Z

-1

The density plot can also be created with matplotlib alone: the function plt.hist(data) returns the y and x values needed for the density plot (see the documentation https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.hist.html). Consequently, the following code creates a density plot using only the matplotlib library:

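import matplotlib.pyplot as plt
dat = [-1, 2, 1, 4, -5, 3, 6, 1, 2, 1, 2, 5, 6, 5, 6, 2, 2, 2]
a = plt.hist(dat, density=True)  # a[0]: bar heights, a[1]: bin edges
plt.close()                      # discard the histogram figure itself
plt.figure()
centers = (a[1][:-1] + a[1][1:]) / 2  # plot each height at its bin's midpoint
plt.plot(centers, a[0])
plt.show()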

This code produces the following density plot:

edited Apr 30, 2020 at 1:00

baxx


answered Oct 20, 2019 at 21:57

tetrisforjeff


2 Comments

András Aszódi Over a year ago

This answer deserves a downvote. I won't cast one, though (downvotes are evil); instead, I'll explain what's wrong: density estimates from a sample (a set of data points) usually involve smoothing. This is what R's density() function does, and what SciPy's gaussian_kde() does. The result is an approximation of the continuous density the data points presumably came from, and that's what the OP was looking for.

JeeyCi Over a year ago

@András Aszódi: "usually involve smoothing", but smoothing is not obligatory. The main idea of a density is that the area under the curve equals 1 (or, for a histogram, that its integral is 1: np.sum(hist*np.diff(bins))). With plt.hist, as well as with numpy.histogram (docs), the parameter density=True satisfies that PDF property even without smoothing. My upvote is to offset the downvote, since the answer is correct and simple to implement; NumPy methods are sometimes convenient even without scipy.stats.
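The normalization property described in the last comment can be checked directly with numpy.histogram on the data from this answer. Note that this is a piecewise-constant density estimate, not the smoothed curve that R's density() or gaussian_kde produce.

```python
import numpy as np

dat = [-1, 2, 1, 4, -5, 3, 6, 1, 2, 1, 2, 5, 6, 5, 6, 2, 2, 2]

# density=True rescales the counts so the histogram integrates to 1.
hist, bins = np.histogram(dat, density=True)
area = np.sum(hist * np.diff(bins))
print(area)
```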
