Is there a way to plot densities using data that has observation weights?

I have a vector of observations x and a vector of integer weights y, such that y1 indicates how many observations we have of x1. That is, the density of

   x    y 
   1    2
   2    2
   2    3 

is equal to the density of 1, 1, 2, 2, 2, 2, 2 (two 1s and five 2s). As far as I understand, matplotlib.pyplot.hist(weights=y) allows for observation weights when plotting a histogram. Is there an equivalent for computing and plotting the density?

I want the package to handle the weights directly because my data is very big, and I'm looking for something more efficient than expanding it into repeated observations.

Alternatively, I'm open to other packages.

  • You only need to generate the densities from the observations? Commented Nov 12, 2014 at 22:32
  • Sorry for the confusion, I want to plot the densities as in stackoverflow.com/questions/4150171/… Commented Nov 12, 2014 at 22:36
  • so as I understand it, you only need to create a list that you call a histogram and send it to one of the package suggested. Is your trouble creating that list from observations, or do you have a list and you're having trouble with the package? Or both? Commented Nov 12, 2014 at 22:41
  • I'm saying that I know of functions that plot histograms using observation weights, but I'm not aware of functions that plot densities using those weights. I draw the comparison because densities are, in a sense, limiting cases of histograms. I'm not aware of a way to plot densities using histograms. Commented Nov 12, 2014 at 22:44
  • Ahhh now I get it...! Sorry, can't help you too much there :) Commented Nov 12, 2014 at 22:45

1 Answer

Statsmodels' KDEUnivariate accepts weights in its fit method. See the output of the following code.

import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

# Two distinct observations; 'weight' counts how often each one occurs.
df = pd.DataFrame({'x': [1., 2.], 'weight': [2, 4]})
weighted = sm.nonparametric.KDEUnivariate(df.x)
noweight = sm.nonparametric.KDEUnivariate(df.x)
# Weights are only supported when fft=False.
weighted.fit(fft=False, weights=df.weight)
noweight.fit()

# Plot both densities side by side on a shared y-axis.
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(noweight.support, noweight.density)
ax2.plot(weighted.support, weighted.density)

ax1.set_title('No Weight')
ax2.set_title('Weighted')
plt.show()

Output: No Weight vs Weighted Densities

Note: this probably won't resolve your concern about the cost of creating large arrays, because, as the source code notes:

If FFT is False, then a ‘number_of_obs’ x ‘gridsize’ intermediate array is created
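If that intermediate array is the bottleneck, one possible workaround is to evaluate the weighted density by hand, a slice of the grid at a time. The following is only a sketch, not what statsmodels does internally: `weighted_gaussian_kde` is a hypothetical helper, the kernel is assumed to be Gaussian, and the bandwidth is a fixed value picked for illustration rather than estimated from the data. It uses the x and y values from the question and also checks that weighting gives the same density as repeating the observations.

```python
import numpy as np

def weighted_gaussian_kde(x, weights, grid, bandwidth, chunk=1024):
    """Evaluate a weighted Gaussian KDE on `grid`, one slice of the grid at a time.

    Hypothetical helper for illustration; the bandwidth must be supplied.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise the weights so the density integrates to 1
    grid = np.asarray(grid, dtype=float)
    density = np.empty(len(grid))
    norm = bandwidth * np.sqrt(2.0 * np.pi)
    for start in range(0, len(grid), chunk):
        g = grid[start:start + chunk]
        # Intermediate array is at most (chunk x len(x)), not (gridsize x nobs).
        z = (g[:, None] - x[None, :]) / bandwidth
        density[start:start + chunk] = (np.exp(-0.5 * z ** 2) @ w) / norm
    return density

# Data from the question: y[i] is the number of times x[i] was observed.
x = np.array([1.0, 2.0, 2.0])
y = np.array([2, 2, 3])
grid = np.linspace(-2.0, 5.0, 200)
h = 0.5  # assumed fixed bandwidth, chosen arbitrarily

weighted = weighted_gaussian_kde(x, y, grid, h)

# Same density computed from the expanded sample 1, 1, 2, 2, 2, 2, 2.
expanded_x = np.repeat(x, y)
expanded = weighted_gaussian_kde(expanded_x, np.ones(len(expanded_x)), grid, h)
```

The memory cost here scales with the number of distinct rows rather than the total number of observations, which is where the weights help. The resulting `grid` and `weighted` arrays can be passed to `ax.plot` in the same way as `support` and `density` above.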

1 Comment

Use ax1.plot(noweight.support, noweight.density) to get correct x-axis values. Also, note that the weights need to be a numpy array (or a pandas column); otherwise the code will complain that it cannot call weights.sum().
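A minimal sketch of that conversion, assuming the weights start out as a plain Python list (the variable names are illustrative only):

```python
import numpy as np

weights_list = [2, 4]  # a plain list has no .sum() method

# Convert before passing to fit(); float dtype is an assumption made here
# so that any later division of the weights is not integer division.
weights = np.asarray(weights_list, dtype=float)
total = weights.sum()
```

The resulting `weights` array is what would be passed as the `weights` argument of `fit(fft=False, weights=weights)`.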
