2

I want to calculate the squared euclidean distance between two sets of points, inputs and testing. inputs is typically a real array of size ~(200, N), whereas testing is typically ~(1e8, N), and N is around 10. The distances should be scaled in each dimension in N, so I'd be aggregating the expression scale[j]*(inputs[i,j] - testing[ii,j])**2 (where scale is the scaling vector) over N times. I am trying to make this as fast as possible, particularly as N can be large. My first test is

def old_version (inputs, testing, x0):
    nn, d1 = testing.shape
    n, d1 = inputs.shape
    b = np.zeros((n, nn))
    for d in xrange(d1):
        b += x0[d] * (((np.tile(inputs[:, d], (nn, 1)) -
             np.tile (testing[:, d], (n, 1)).T))**2).T
return b

Nothing too fancy. I then tried using scipy.spatial.distance.cdist, although I still have to loop through it to get the scaling right

def new_version (inputs, testing, x0):
    # import scipy.spatial.distance as dist
    nn, d1 = testing.shape
    n, d1 = inputs.shape
    b = np.zeros ((n, nn))

    for d in xrange(d1):
        b += x0[d] * dist.cdist(inputs[:, d][:, None], 
             testing[:, d][:, None], 'sqeuclidean')
    return b

It would appear that new_version scales better (as N > 1000), but I'm not sure that I've gone as fast as possible here. Any further ideas much appreciated!

2
  • if you have to loop 10 times only, it seems you already got a really good approach... Commented Jun 16, 2014 at 17:08
  • 1
    Check this out - i did some similar tests on stuff like that before: stackoverflow.com/questions/17527340/… Commented Jun 17, 2014 at 11:54

1 Answer 1

3

This code gave me a factor of 10 over your implementation, give it a try:

x = np.random.randn(200, 10)
y = np.random.randn(1e5, 10)
scale = np.abs(np.random.randn(1, 10))
scale_sqrt = np.sqrt(scale)

dist_map = dist.cdist(x*scale_sqrt, y*scale_sqrt, 'sqeuclidean')

These are the test results:

In [135]: %timeit suggested_version(inputs, testing, x0)

1 loops, best of 3: 341 ms per loop

In [136]: %timeit op_version(inputs, testing, x00) (NOTICE: x00 is a reshape of x0)

1 loops, best of 3: 3.37 s per loop

Just make sure than when you go for the larger N you don't get low on memory. It can really slow things down.

Sign up to request clarification or add additional context in comments.

1 Comment

OK, I was pretty sure I had tried that to get rid of the loop, but I probably got the sqrt bit missing. I'm getting a 16.5x increase in speed.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.