Efficient weighted vector distance calculation with numpy

Question

I want to calculate the squared euclidean distance between two sets of points, inputs and testing. inputs is typically a real array of size ~(200, N), whereas testing is typically ~(1e8, N), and N is around 10. The distances should be scaled in each dimension in N, so I'd be aggregating the expression scale[j]*(inputs[i,j] - testing[ii,j])**2 (where scale is the scaling vector) over N times. I am trying to make this as fast as possible, particularly as N can be large. My first test is

def old_version (inputs, testing, x0):
    nn, d1 = testing.shape
    n, d1 = inputs.shape
    b = np.zeros((n, nn))
    for d in xrange(d1):
        b += x0[d] * (((np.tile(inputs[:, d], (nn, 1)) -
             np.tile (testing[:, d], (n, 1)).T))**2).T
return b

Nothing too fancy. I then tried using scipy.spatial.distance.cdist, although I still have to loop through it to get the scaling right

def new_version (inputs, testing, x0):
    # import scipy.spatial.distance as dist
    nn, d1 = testing.shape
    n, d1 = inputs.shape
    b = np.zeros ((n, nn))

    for d in xrange(d1):
        b += x0[d] * dist.cdist(inputs[:, d][:, None], 
             testing[:, d][:, None], 'sqeuclidean')
    return b

It would appear that new_version scales better (as N > 1000), but I'm not sure that I've gone as fast as possible here. Any further ideas much appreciated!

if you have to loop 10 times only, it seems you already got a really good approach... — Saullo G. P. Castro
– Saullo G. P. Castro, Commented Jun 16, 2014 at 17:08
Check this out - i did some similar tests on stuff like that before: stackoverflow.com/questions/17527340/… — usethedeathstar
– usethedeathstar, Commented Jun 17, 2014 at 11:54

Xyand · Accepted Answer · 2014-06-16 19:00:45Z

3

This code gave me a factor of 10 over your implementation, give it a try:

x = np.random.randn(200, 10)
y = np.random.randn(1e5, 10)
scale = np.abs(np.random.randn(1, 10))
scale_sqrt = np.sqrt(scale)

dist_map = dist.cdist(x*scale_sqrt, y*scale_sqrt, 'sqeuclidean')

These are the test results:

In [135]: %timeit suggested_version(inputs, testing, x0)

1 loops, best of 3: 341 ms per loop

In [136]: %timeit op_version(inputs, testing, x00) (NOTICE: x00 is a reshape of x0)

1 loops, best of 3: 3.37 s per loop

Just make sure than when you go for the larger N you don't get low on memory. It can really slow things down.

edited Jun 16, 2014 at 19:00

answered Jun 16, 2014 at 18:52

Xyand

4,5085 gold badges41 silver badges63 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Jose Over a year ago

OK, I was pretty sure I had tried that to get rid of the loop, but I probably got the sqrt bit missing. I'm getting a 16.5x increase in speed.

Collectives™ on Stack Overflow

Efficient weighted vector distance calculation with numpy

1 Answer 1

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related