
I can perform Gaussian kernel density estimation with the SciPy library simply by running

from scipy import stats
kernel = stats.gaussian_kde(data)

but I would like to fix the covariance to some predefined value and perform the KDE with it. Is there a simple way to achieve this in Python without explicitly writing the estimation procedure myself? (I will write it if no existing library offers such functionality, but I would like to avoid that.)
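For reference, scipy.stats.gaussian_kde does accept a scalar bandwidth factor via bw_method, but that factor only rescales the sample covariance of the data rather than letting you supply an arbitrary covariance matrix, so it does not fully address the question. A minimal sketch of the scalar option (the toy data is just a placeholder):

import numpy as np
from scipy import stats

# toy 2-D dataset; gaussian_kde expects shape (n_dims, n_points)
data = np.random.randn(2, 500)

# a scalar bw_method is used directly as the bandwidth factor, so the
# kernel covariance becomes factor**2 times the sample covariance of data
kernel = stats.gaussian_kde(data, bw_method=0.3)

# evaluate the estimated density at two query points (given as columns)
print(kernel(np.array([[0.0, 1.0], [0.0, -1.0]])))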

  • Can you elaborate on what you mean by "covariance" in this case? Generally, for density estimation, the Gaussian involved serves as a "window" function, and the "covariance" of that window (effectively the bandwidth parameter in the 1-D case) just controls how the window's response falls off as a function of distance from the point under test. I am not familiar with any KDE procedure that uses a specific multivariate covariance structure for this window fall-off. Commented Sep 10, 2013 at 15:42
  • I would also guess that the most complicated such 'covariance' that would be advisable in practice is a diagonal matrix, where you just use a different bandwidth parameter for each dimension of the data. Maybe (and it's a very tenuous maybe) you could do some kind of PCA breakdown of the principal directions of your data and put the different bandwidths there, but I think it's highly unlikely this will pay off unless the data directions have wildly different scales, in which case you'd be better off just standardizing (z-scoring) your inputs before doing the KDE in the first place and using a single bandwidth. Commented Sep 10, 2013 at 15:44
  • @EMS, if you are fitting a multivariate Gaussian, you can have a covariance. I suspect that is what the OP is asking about. Commented Sep 10, 2013 at 15:53
  • I don't think the question is about fitting a Gaussian, but I could be wrong. Commented Sep 10, 2013 at 15:54

1 Answer


From my comments:

Generally, for density estimation, the Gaussian involved serves as a "window" function, and the "covariance" of that window (effectively the bandwidth parameter in the 1-D case) just controls how the window's response falls off as a function of distance from the point under test. I am not familiar with any KDE procedure that uses a specific multivariate covariance structure for this window fall-off.

I would also guess that the most complicated such 'covariance' that would be advisable in practice is a diagonal matrix, where you just use a different bandwidth parameter for each dimension of the data. Maybe (and it's a very tenuous maybe) you could do some kind of PCA breakdown of the principal directions of your data and put the different bandwidths there, but I think it's highly unlikely this will pay off unless the data directions have wildly different scales, in which case you'd be better off just standardizing (z-scoring) your inputs before doing the KDE in the first place and using a single bandwidth.

If you read the KDE examples from scikits.learn and the documentation for their KernelDensity class, it also seems that (like SciPy) they only offer a bandwidth parameter (a single floating-point number) to summarize how the kernel's response should fall off.
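A minimal sketch of that interface (the toy data and the bandwidth value are placeholders):

import numpy as np
from sklearn.neighbors import KernelDensity

# scikit-learn expects samples as rows: shape (n_samples, n_features)
X = np.random.randn(500, 2)

# a single scalar bandwidth is shared by every dimension
kde = KernelDensity(kernel='gaussian', bandwidth=0.5).fit(X)

# score_samples returns log-density values at the query points
log_dens = kde.score_samples(np.array([[0.0, 0.0], [1.0, -1.0]]))
print(np.exp(log_dens))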

To me this suggests it's not of much practical interest to have fine-grained control over a multivariate bandwidth. Your best bet is to perform some scoring or standardization that transforms your input variables so they are all on the same scale (so that smoothing in every direction with the same bandwidth is appropriate), use the KDE to predict or classify values in that transformed space, and apply the inverse transformation to each coordinate if you want to go back to the original space.
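A minimal sketch of that standardize-then-smooth idea with SciPy (the toy data and variable names are placeholders; the division by the product of the per-dimension standard deviations is the Jacobian correction needed to report densities in the original units):

import numpy as np
from scipy import stats

# data with very different scales per dimension, shape (n_dims, n_points)
data = np.vstack([np.random.randn(1000), 100.0 * np.random.randn(1000)])

# standardize each dimension so a single bandwidth is appropriate
mu = data.mean(axis=1, keepdims=True)
sigma = data.std(axis=1, keepdims=True)
z = (data - mu) / sigma

kernel = stats.gaussian_kde(z)

# evaluate the density at points given in the ORIGINAL coordinates:
# transform the query points the same way, then divide by the Jacobian
# of the scaling (the product of the per-dimension sigmas)
points = np.array([[0.0, 1.0], [0.0, 200.0]])  # shape (n_dims, n_query)
density = kernel((points - mu) / sigma) / np.prod(sigma)
print(density)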


1 Comment

This answer makes a nice point about normalizing the data to handle per-dimension (diagonal) variances, plus the arguable lack of usefulness of a full covariance in a KDE. The suggestion of operating on the data instead of on the kernel functions is on point. I was looking for a way of providing custom covariances as well, and although not "accepted", this answer convinced me that renormalizing is cleaner. I'm assuming (and probably the Q/A does as well?) that we are talking about Gaussian kernels, or other kernels that can be characterized by their mean and covariance matrix.
