Calculate Euclidean Distance in two NumPy arrays

Question

I have the following two NumPy arrays:

import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865, -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

These are the x and y coordinates of 10 users. I need to find the similarity between each pair of users. For example:

x1 = -0.34095692
y1 = 0.16305762
x2 = -0.34044722
y2 = 0.38554548

Euclidean distance = (|x1-y1|^2 + |x2-y2|^2)^(1/2)

Ultimately I want to get a matrix like the following. How can I achieve this?

[Image: the desired user-by-user distance matrix]

@Jonathon Reinhart: I have no idea how to start this. Any help? – Nilani Algiriyage, commented Feb 25, 2014 at 8:36
Sigh, did you consider asking Google? It leads you directly to this successfully answered question. – Jonathon Reinhart, commented Feb 25, 2014 at 8:37
Or, if you prefer, SciPy has a function that handles all distance-related problems: docs.scipy.org/doc/scipy/reference/generated/… – Carsten, commented Feb 25, 2014 at 8:42
Do you actually mean (|x1-x2|^2 + |y1-y2|^2)^0.5 instead of (|x1-y1|^2 - |x2-y2|^2)^(1/2)? – zhangxaochen, commented Feb 25, 2014 at 8:55
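For reference, here is the formula from the comment above written for a general pair of users, taking x_i and y_i to be the i-th entries of X and Y. Entry (i, j) of the requested matrix would be the value below; the matrix is symmetric with zeros on the diagonal, so only one triangle carries new information.

```latex
D_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}, \qquad i, j = 1, \dots, 10
```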

zhangxaochen · Accepted Answer · 2014-02-25 09:05:53Z

Use zip(X, Y) to get the coordinate pairs. If you want the Euclidean distance between points, it should be (|x1-x2|^2 + |y1-y2|^2)^0.5, not (|x1-y1|^2 - |x2-y2|^2)^(1/2):

In [125]: coords = list(zip(X, Y))  # list() is needed on Python 3, where zip returns an iterator

In [126]: from scipy import spatial
     ...: dists = spatial.distance.cdist(coords, coords)

In [127]: dists
Out[127]: 
array([[ 0.        ,  0.22248844,  0.09104884,  0.75377329,  0.10685954,
         0.41534165,  0.5109039 ,  0.15149362,  0.19490308,  0.58971785],
       [ 0.22248844,  0.        ,  0.28973034,  0.9737061 ,  0.23197262,
         0.62852005,  0.73270705,  0.09751671,  0.39258852,  0.81219719],
       [ 0.09104884,  0.28973034,  0.        ,  0.68642072,  0.19047682,
         0.33880688,  0.45038919,  0.23539542,  0.1064197 ,  0.53629553],
       [ 0.75377329,  0.9737061 ,  0.68642072,  0.        ,  0.79415038,
         0.35411306,  0.24770988,  0.90290761,  0.59283795,  0.20443561],
       [ 0.10685954,  0.23197262,  0.19047682,  0.79415038,  0.        ,
         0.47665258,  0.54665574,  0.13560014,  0.28381556,  0.61376196],
       [ 0.41534165,  0.62852005,  0.33880688,  0.35411306,  0.47665258,
         0.        ,  0.15477091,  0.56683251,  0.24003205,  0.25201351],
       [ 0.5109039 ,  0.73270705,  0.45038919,  0.24770988,  0.54665574,
         0.15477091,  0.        ,  0.65808357,  0.36700881,  0.09751671],
       [ 0.15149362,  0.09751671,  0.23539542,  0.90290761,  0.13560014,
         0.56683251,  0.65808357,  0.        ,  0.34181257,  0.73270705],
       [ 0.19490308,  0.39258852,  0.1064197 ,  0.59283795,  0.28381556,
         0.24003205,  0.36700881,  0.34181257,  0.        ,  0.45902146],
       [ 0.58971785,  0.81219719,  0.53629553,  0.20443561,  0.61376196,
         0.25201351,  0.09751671,  0.73270705,  0.45902146,  0.        ]])
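As a sanity check on Out[127], entry (0, 1) can be reproduced by hand from the two users quoted in the question, using the corrected formula. A minimal sketch using only the standard library:

```python
import math

# Coordinates of users 1 and 2, copied from the question
x1, y1 = -0.34095692, 0.16305762
x2, y2 = -0.34044722, 0.38554548

# Corrected formula: square the x difference and the y difference, sum, then take the root
d12 = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
print(round(d12, 8))
```

The printed value should agree with the 0.22248844 in row 0, column 1 of the matrix above.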

To get the upper triangle of this array, use numpy.triu:

In [128]: np.triu(dists)
Out[128]: 
array([[ 0.        ,  0.22248844,  0.09104884,  0.75377329,  0.10685954,
         0.41534165,  0.5109039 ,  0.15149362,  0.19490308,  0.58971785],
       [ 0.        ,  0.        ,  0.28973034,  0.9737061 ,  0.23197262,
         0.62852005,  0.73270705,  0.09751671,  0.39258852,  0.81219719],
       [ 0.        ,  0.        ,  0.        ,  0.68642072,  0.19047682,
         0.33880688,  0.45038919,  0.23539542,  0.1064197 ,  0.53629553],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.79415038,
         0.35411306,  0.24770988,  0.90290761,  0.59283795,  0.20443561],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.47665258,  0.54665574,  0.13560014,  0.28381556,  0.61376196],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.15477091,  0.56683251,  0.24003205,  0.25201351],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.65808357,  0.36700881,  0.09751671],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.34181257,  0.73270705],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.45902146],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
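Carsten's comment points to SciPy's distance module. Because the matrix is symmetric, `scipy.spatial.distance.pdist` computes each pair only once and returns a condensed vector; `squareform` expands it back to the full square matrix, and `np.triu_indices(n, k=1)` lists the (i, j) pairs in the same row-major order. A minimal sketch, assuming the X and Y arrays from the question:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

coords = np.column_stack((X, Y))   # shape (10, 2): one (x, y) row per user

condensed = pdist(coords)          # 45 distances, one per pair i < j
full = squareform(condensed)       # symmetric 10x10 matrix with a zero diagonal

# Pair indices in the same order as `condensed`: row-major over the strict upper triangle
i, j = np.triu_indices(len(coords), k=1)
for a, b, d in zip(i[:3], j[:3], condensed[:3]):
    print(f"user {a + 1} - user {b + 1}: {d:.8f}")
```

Compared with `np.triu(dists)`, this avoids computing the lower triangle at all and keeps the pair labels next to each distance.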

Thank you very much! I finally found it. Thanks a lot again. :)

Kiwi · Accepted Answer · 2014-02-25 09:05:45Z


A short snippet that does the job:

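import numpy as np

A = (X - Y)**2                                    # A[i] = (X[i] - Y[i])**2
p, q = np.meshgrid(np.arange(10), np.arange(10))  # every (i, j) index pair
np.sqrt(A[p] - A[q])                              # entry [i, j] = sqrt(A[j] - A[i]); nan where negative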

Edit: explanations

A is a precomputed vector holding the squared difference (X[i] - Y[i])**2 for each index i.
The key step is np.meshgrid: it generates every pairing of the values in two arrays. Here those values are the indices of A, so p and q together cover every (i, j) index pair. This is not the most efficient solution, because it computes the whole matrix, but that is not a problem for the number of samples you have.
The indexing A[p] is the other key step: indexing an array with an integer array returns an array of the index array's shape, filled with A's values at those indices. Try it yourself to see how it behaves.
The resulting matrix contains many nan values, but that follows from the formula as you asked it. The true Euclidean distance adds the squared differences (+); it does not subtract them (-).
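Kiwi suggests trying the `A[p]` indexing yourself. A small sketch with a hypothetical 3-element stand-in for A (values chosen only to make the output easy to read) shows what `meshgrid` produces and how integer-array indexing turns it into all pairwise differences:

```python
import numpy as np

# Hypothetical 3-element stand-in for A = (X - Y)**2
A = np.array([10.0, 20.0, 30.0])
p, q = np.meshgrid(np.arange(3), np.arange(3))

# With the default indexing="xy", p[i, j] == j and q[i, j] == i
print(p)
print(q)

# Integer-array indexing: the result takes the shape of the index array,
# so A[p][i, j] == A[j] and A[q][i, j] == A[i]
print(A[p])
print(A[q])

# Entry [i, j] is A[j] - A[i], negative below the diagonal for this increasing A
diff = A[p] - A[q]
with np.errstate(invalid="ignore"):  # silence the "invalid value in sqrt" warning
    print(np.sqrt(diff))             # nan wherever diff is negative
```

The nan entries in the last print come from the same mechanism as the nans in the 10-user matrix.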

p:

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

q:

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
       [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
       [6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
       [7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
       [8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
       [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]])

answered Feb 25, 2014 at 8:44 by Kiwi; edited Feb 25, 2014 at 9:05

3 Comments

Nilani Algiriyage: This is nice! I haven't checked its accuracy yet. Could you please explain it? Anyway, there are lots of nans, right?

Nilani Algiriyage: Thank you very much for your detailed answer. Yes, that should be +, which I have now updated in the question. One last thing I don't understand: what do all these nans mean? (Are those users closer together, further apart, or something else?)

Kiwi: The difference may be negative, and sqrt turns a negative number into nan. With the correct formula, you won't get these nans.
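Switching the question's formula to + would remove the nans, but it still combines X[i] with Y[i]. The distance between users i and j uses x_i - x_j and y_i - y_j, as in the other answer. A minimal sketch applying Kiwi's meshgrid indexing to that corrected formula, assuming the X and Y arrays from the question; the broadcasting line is an equivalent alternative that skips meshgrid:

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

n = len(X)
p, q = np.meshgrid(np.arange(n), np.arange(n))  # p[i, j] == j, q[i, j] == i

# Entry [i, j]: distance between user i and user j; a sum of squares is never negative
dists = np.sqrt((X[p] - X[q]) ** 2 + (Y[p] - Y[q]) ** 2)

# Equivalent broadcasting form: column vector minus row vector gives all pairs
dists_bc = np.sqrt((X[:, None] - X[None, :]) ** 2 + (Y[:, None] - Y[None, :]) ** 2)

print(np.isnan(dists).any())         # False
print(np.allclose(dists, dists_bc))  # True
```

Both forms build the full n-by-n matrix, so they match `cdist` in the other answer rather than the condensed `pdist` output.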
