
I have to calculate the pairwise weighted distance of a potentially large number of vectors. Let A be the m×n matrix holding m vectors with n attributes each, and let W be a 1×n weight vector.

My first function calculates the pairwise distance the naive way:

import numpy as np

def weightedDistance(A, W):
    # Pairwise weighted squared Euclidean distance:
    # aff[i, j] = sum_k W[k] * (A[i, k] - A[j, k])**2
    aff = np.zeros((len(A), len(A)))
    for i in range(len(A)):
        for j in range(len(A)):
            aff[i, j] = np.sum(((A[i] - A[j])**2) * W)
    return aff

This takes over twenty seconds for a 1000x2 matrix.

Then I tried a faster version, using matrix multiplication, based on this blog post:

def weightedDistanceFast(A, W):
    # Expands (a - b)**2 = a**2 - 2ab + b**2 and vectorizes the
    # cross term as a matrix product
    AA = np.sum((A**2) * W, 1)
    A2 = -2 * ((A * W).dot(A.T))
    return (AA + A2).T + AA

This is "instant" for the same 1000x2 matrix. However, it introduces a bunch of errors on the order of 1e-17.

>>> dist = desc.weightedDistanceFast(attr,(1,1))
>>> dist2 = desc.weightedDistance(attr,(1,1))
>>> np.sum(dist-dist2)
6.9155863612166586e-11
>>> len(np.where((dist-dist2 > 0))[0])
380824
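The discrepancy comes from catastrophic cancellation: the fast version expands (a - b)**2 into a**2 - 2ab + b**2, and subtracting the large, nearly equal intermediate terms discards the low-order bits that carry the answer. A minimal standalone illustration (the numbers here are chosen purely for demonstration, they are not from the question):

```python
# Both formulas are algebraically identical, but the expanded form
# builds huge intermediates (~1e16) and then cancels them, so the
# small true result (1.0) falls below the float64 resolution there.
x, y = 1e8 + 1.0, 1e8

direct = (x - y) ** 2                   # subtract first, then square
expanded = x * x - 2 * x * y + y * y    # large terms first, cancel later

print(direct, expanded)  # the expanded form no longer equals 1.0
```

The per-entry error of ~1e-17 in the fast function is exactly this effect on a smaller scale.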

Checking other answers around here, I also tried the wminkowski metric from scipy.spatial.distance, but it yielded results that differ from the two functions above, depending on the value of W.

import scipy.spatial.distance as sp

def weightedDistanceMinkowski(A, W):
    # Note: scipy's wminkowski applies w *before* raising to the power p,
    # i.e. (sum_k |w_k * (u_k - v_k)|**p)**(1/p)
    aff = sp.squareform(sp.pdist(A, metric='wminkowski', p=2, w=W))
    return aff**2
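Because wminkowski multiplies the weights in before squaring, its squared p=2 distance is sum_k W[k]**2 * (u_k - v_k)**2, which only agrees with the other two functions when every weight is 0 or 1. Note also that wminkowski was removed in SciPy 1.8; in current SciPy the plain minkowski metric accepts w and applies it as (sum_k w_k * |u_k - v_k|**p)**(1/p), so its squared p=2 distance matches the weighted squared Euclidean definition directly. A sketch checking this against the naive loop from the question:

```python
import numpy as np
import scipy.spatial.distance as sp

def weightedDistance(A, W):
    # Naive reference: pairwise weighted squared Euclidean distance
    aff = np.zeros((len(A), len(A)))
    for i in range(len(A)):
        for j in range(len(A)):
            aff[i, j] = np.sum(((A[i] - A[j]) ** 2) * W)
    return aff

rng = np.random.default_rng(0)
A = rng.random((50, 2))
W = np.array([0.3, 1.7])

# Modern SciPy: 'minkowski' with w weights |u - v|**p itself, so
# squaring the p=2 distance yields sum_k W[k] * (u_k - v_k)**2
dist = sp.squareform(sp.pdist(A, metric='minkowski', p=2, w=W)) ** 2
print(np.allclose(dist, weightedDistance(A, W)))
```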

Anyway, I know that a certain degree of error is inevitable, but since I'm going to do further calculations on this distance matrix, I wonder if there is a way to reduce this error (or, alternatively, a less error-prone way to make function 1 faster).
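On the "less error-prone way to make function 1 faster" front, one option (a sketch, not from the original post) is broadcasting: it performs the same subtract-then-square-then-sum operations as the double loop, so it avoids the cancellation of the expanded formula entirely, at the cost of materializing an m×m×n temporary array.

```python
import numpy as np

def weightedDistance(A, W):
    # Naive reference implementation from the question
    aff = np.zeros((len(A), len(A)))
    for i in range(len(A)):
        for j in range(len(A)):
            aff[i, j] = np.sum(((A[i] - A[j]) ** 2) * W)
    return aff

def weightedDistanceBroadcast(A, W):
    # Same order of operations as the loop, vectorized: build all
    # pairwise differences at once, then square, weight, and sum.
    diff = A[:, None, :] - A[None, :, :]   # shape (m, m, n)
    return np.sum((diff ** 2) * W, axis=-1)

A = np.random.rand(100, 2)
W = np.array([1.0, 1.0])
print(np.allclose(weightedDistanceBroadcast(A, W), weightedDistance(A, W)))
```

This trades memory for speed (O(m²n) temporaries), so it suits moderate m; for very large m the matrix-product version remains the cheaper option.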

Thank you!

  • The real question is do you really need that level of precision? If not (likely!) just go for the speed! And if you did need it, then you'll probably have to worry more about the noise in your data than the numerical errors... Commented Dec 22, 2015 at 2:24
  • If necessary, you can increase the precision by changing an array's data type, e.g., A = A.astype(np.float64) or similar. Commented Dec 22, 2015 at 2:27
  • What makes you so confident that the first answer is the "correct" one? Both will have rounding errors, but if I had to guess I would expect the second answer to be closer to the true distance value. np.dot usually calls BLAS routines, which can make use of things like fused multiply-add operations to reduce the total number of floating point operations required and hence the rounding error. Commented Dec 22, 2015 at 3:30
  • Thank you all for the comments -- Julien is quite correct in mentioning that if I care about this level of error I might have bigger problems. So I guess my zeroth question should be whether these errors really matter, to which I say "I'm not sure yet". Time to think about it. ali_m: as for the correctness, the errors occur even on very small matrices, which I did check by hand. This leads me to believe that it is numpy introducing the errors, but thanks for the comment! I'm still interested in the answer for curiosity's sake. I wonder if what Reti says is all there is to it. Commented Jan 4, 2016 at 2:03
