
I can perform Gaussian kernel density estimation with the SciPy library simply by running

from scipy import stats
kernel = stats.gaussian_kde(data)

but I would like to fix the covariance to some predefined value and perform the KDE with it. Is there a simple way to achieve this in Python without explicitly writing the estimation procedure myself? (I will write it if no existing library offers such functionality, but I would like to avoid that.)
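For reference, scipy.stats.gaussian_kde does accept a scalar bandwidth factor via bw_method, but that factor only rescales the sample covariance of the data rather than letting you supply an arbitrary covariance matrix, so it does not fully address the question. A minimal sketch of the scalar option (the toy data is just a placeholder):

import numpy as np
from scipy import stats

# toy 2-D dataset; gaussian_kde expects shape (n_dims, n_points)
data = np.random.randn(2, 500)

# a scalar bw_method is used directly as the bandwidth factor, so the
# kernel covariance becomes factor**2 times the sample covariance of data
kernel = stats.gaussian_kde(data, bw_method=0.3)

# evaluate the estimated density at two query points (given as columns)
print(kernel(np.array([[0.0, 1.0], [0.0, -1.0]])))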

  • Can you elaborate on what you mean by "covariance" in this case? Generally, for density estimation, the Gaussian involved serves as a "window" function, and the "covariance" of that window (effectively the bandwidth parameter in the 1-D case) just controls how the window's response falls off as a function of distance from the point under test. I am not familiar with any KDE procedure that uses a specific multivariate covariance structure for this window fall-off. Commented Sep 10, 2013 at 15:42
  • I would also guess that the most complicated such 'covariance' that would be advisable in practice is a diagonal matrix, where you just use a different bandwidth parameter for each dimension of the data. Maybe (and it's a very tenuous maybe) you could do some kind of PCA breakdown of the principal directions of your data and put the different bandwidths there, but I think it's highly unlikely this will pay off unless the data directions have wildly different scales, in which case you'd be better off just standardizing (z-scoring) your inputs before doing the KDE in the first place and using a single bandwidth. Commented Sep 10, 2013 at 15:44
  • @EMS, if you are fitting a multivariate Gaussian, you can have a covariance. I suspect that is what the OP is asking about. Commented Sep 10, 2013 at 15:53
  • I don't think the question is about fitting a Gaussian, but I could be wrong. Commented Sep 10, 2013 at 15:54

1 Answer


From my comments:

Generally, for density estimation, the Gaussian involved serves as a "window" function, and the "covariance" of that window (effectively the bandwidth parameter in the 1-D case) just controls how the window's response falls off as a function of distance from the point under test. I am not familiar with any KDE procedure that uses a specific multivariate covariance structure for this window fall-off.

I would also guess that the most complicated such 'covariance' that would be advisable in practice is a diagonal matrix, where you just use a different bandwidth parameter for each dimension of the data. Maybe (and it's a very tenuous maybe) you could do some kind of PCA breakdown of the principal directions of your data and put the different bandwidths there, but I think it's highly unlikely this will pay off unless the data directions have wildly different scales, in which case you'd be better off just standardizing (z-scoring) your inputs before doing the KDE in the first place and using a single bandwidth.

If you read the KDE examples from scikits.learn and the documentation for their KernelDensity class, it also seems that (like SciPy) they only offer a bandwidth parameter (a single floating-point number) to summarize how the kernel's response should fall off.
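A minimal sketch of that interface (the toy data and the bandwidth value are placeholders):

import numpy as np
from sklearn.neighbors import KernelDensity

# scikit-learn expects samples as rows: shape (n_samples, n_features)
X = np.random.randn(500, 2)

# a single scalar bandwidth is shared by every dimension
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)

# score_samples returns log-density values at the query points
log_dens = kde.score_samples(np.array([[0.0, 0.0], [1.0, -1.0]]))
print(np.exp(log_dens))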

To me this suggests it's not of much practical interest to have fine-grained control over a multivariate bandwidth. Your best bet is to perform some scoring or standardization that transforms your input variables so they are all on the same scale (so that smoothing in every direction with the same bandwidth is appropriate), use the KDE to predict or classify values in that transformed space, and apply the inverse transformation to each coordinate if you want to go back to the original space.
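A minimal sketch of that standardize-then-smooth idea with SciPy (the toy data and variable names are placeholders; the division by the product of the per-dimension standard deviations is the Jacobian correction needed to report densities in the original units):

import numpy as np
from scipy import stats

# data with very different scales per dimension, shape (n_dims, n_points)
data = np.vstack([np.random.randn(1000), 100.0 * np.random.randn(1000)])

# standardize each dimension so a single bandwidth is appropriate
mu = data.mean(axis=1, keepdims=True)
sigma = data.std(axis=1, keepdims=True)
z = (data - mu) / sigma

kernel = stats.gaussian_kde(z)

# evaluate the density at points given in the ORIGINAL coordinates:
# transform the query points the same way, then divide by the Jacobian
# of the scaling (the product of the per-dimension sigmas)
points = np.array([[0.0, 1.0], [0.0, 200.0]])  # shape (n_dims, n_query)
density = kernel((points - mu) / sigma) / np.prod(sigma)
print(density)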


1 Comment

This answer makes a nice point about normalizing the data to handle per-dimension (diagonal) variances, plus the arguable lack of usefulness of a full covariance in a KDE. The suggestion of operating on the data instead of on the kernel functions is on point. I was looking for a way of providing custom covariances as well, and although not "accepted", this answer convinced me that renormalizing is cleaner. I'm assuming (and probably the Q/A does as well?) that we are talking about Gaussian kernels, or other kernels that can be characterized by their mean and covariance matrix.
