Plot density using observation weights

Question

Is there a way to plot densities using data that has observation weights?

I have a vector of observations x and a vector of integer weights y, such that y1 indicates how many observations we have of x1. That is, the density of

x = 1, 2 with weights y = 2, 5

is equal to the density of 1, 1, 2, 2, 2, 2, 2 (2x1, 5x2). As far as I understand it, matplotlib.pyplot.hist(weights=y) allows for observation weights when plotting the histogram. Is there any equivalent for computing and plotting the density?
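To make the equivalence concrete, here is a minimal sketch using NumPy only. The values x = [1, 2] and y = [2, 5] are illustrative assumptions chosen to match the "2x1, 5x2" example, and the bin edges are arbitrary. It expands the weights with np.repeat and checks that a weighted histogram of the compact data gives the same counts as an unweighted histogram of the expanded data.

```python
import numpy as np

# Illustrative values (an assumption matching the "2x1, 5x2" example).
x = np.array([1.0, 2.0])   # distinct observations
y = np.array([2, 5])       # integer weight (count) of each observation

# Expanding the weights reproduces the full sample: 1, 1, 2, 2, 2, 2, 2.
expanded = np.repeat(x, y)

# Use the same bin edges for both histograms so they are comparable.
edges = np.array([0.5, 1.5, 2.5])

counts_weighted, _ = np.histogram(x, bins=edges, weights=y)
counts_expanded, _ = np.histogram(expanded, bins=edges)

print(expanded)
print(counts_weighted, counts_expanded)
```

The expanded array has sum(y) elements, which is exactly the cost a weighted density estimate would avoid on large data.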

The reason I want the package to be able to do this is that my data is very big, and I'm looking for a more efficient alternative.

Alternatively, I'm open to other packages.

You only need to generate the densities from the observations?
– Reut Sharabani, Commented Nov 12, 2014 at 22:32
Sorry for the confusion, I want to plot the densities as in stackoverflow.com/questions/4150171/…
– FooBar, Commented Nov 12, 2014 at 22:36
So as I understand it, you only need to create a list that you call a histogram and send it to one of the suggested packages. Is your trouble creating that list from observations, or do you have a list and you're having trouble with the package? Or both?
– Reut Sharabani, Commented Nov 12, 2014 at 22:41
I am saying that I know of functions that allow plotting histograms using observation weights. On the other hand, I'm not aware of functions that allow plotting densities using these weights. I bring up the comparison because densities are, in a sense, limiting cases of histograms. I am not aware of a way to plot densities using histograms.
– FooBar, Commented Nov 12, 2014 at 22:44
Ahhh now I get it...! Sorry, can't help you too much there :)
– Reut Sharabani, Commented Nov 12, 2014 at 22:45

tozCSS · Accepted Answer · 2018-05-30 22:29:36Z

4

statsmodels' KDEUnivariate accepts weights in its fit method (with fft=False). See the output of the following code.

import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd

# Two distinct observations with integer weights (counts).
df = pd.DataFrame({'x': [1., 2.], 'weight': [2, 4]})

# Fit the same data twice: once with weights, once without.
weighted = sm.nonparametric.KDEUnivariate(df.x)
noweight = sm.nonparametric.KDEUnivariate(df.x)
weighted.fit(fft=False, weights=df.weight)  # weights are used with fft=False
noweight.fit()

# Plot the two estimates side by side on a shared y-axis.
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.plot(noweight.support, noweight.density)
ax2.plot(weighted.support, weighted.density)

ax1.set_title('No Weight')
ax2.set_title('Weighted')
plt.show()

Output: (figure not reproduced: two side-by-side density plots titled 'No Weight' and 'Weighted')

Note: Your time concern regarding array creation will probably not be resolved by this, because, as noted in the source code:

If FFT is False, then a ‘number_of_obs’ x ‘gridsize’ intermediate array is created
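To illustrate what such an intermediate array looks like, here is a rough NumPy sketch of a weighted Gaussian kernel density estimate computed by direct summation. It is not statsmodels' implementation: the function name weighted_gaussian_kde, the fixed bandwidth bw = 0.5, and the sample values are all assumptions made for the illustration. It also checks that, for the same bandwidth, the weighted estimate matches the estimate on the expanded sample.

```python
import numpy as np

def weighted_gaussian_kde(x, weights, grid, bw):
    """Evaluate a weighted Gaussian KDE on `grid` by direct summation.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Intermediate array of shape (gridsize, number_of_obs): one scaled
    # distance per (grid point, observation) pair.
    z = (grid[:, None] - x[None, :]) / bw
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Weighted sum of the kernels, normalised by the total weight.
    return kernel @ weights / (weights.sum() * bw)

x = np.array([1.0, 2.0])            # distinct observations (assumed values)
w = np.array([2, 5])                # integer weights (assumed values)
grid = np.linspace(-2.0, 5.0, 200)  # evaluation grid
bw = 0.5                            # fixed bandwidth, chosen arbitrarily

dens_weighted = weighted_gaussian_kde(x, w, grid, bw)
dens_expanded = weighted_gaussian_kde(np.repeat(x, w), np.ones(w.sum()), grid, bw)

print(np.allclose(dens_weighted, dens_expanded))
```

With weights, the intermediate array has one column per distinct observation rather than one per repeated observation, so the saving depends on how many duplicates the weights stand for. The two estimates agree only when both use the same bandwidth.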

edited May 30, 2018 at 22:29

answered Nov 8, 2015 at 1:04


1 Comment

fuyas Over a year ago

Use ax1.plot(noweight.support, noweight.density) to have correct x-axis values. Also, note that the weights need to be a numpy array (or a column in pandas) or you will have the code complaining it can not do weights.sum()
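The point about weights.sum() can be seen without statsmodels. The following small sketch (variable names are illustrative) shows that a plain Python list has no sum method, while a NumPy array built from it does, so converting the weights first avoids that error.

```python
import numpy as np

weights_list = [2, 4]                 # a plain Python list has no .sum() method
print(hasattr(weights_list, "sum"))   # False

weights = np.asarray(weights_list, dtype=float)  # convert before passing as weights
print(weights.sum())                  # 6.0
```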
