0

I have two NumPy arrays, as follows.

X = np.array([-0.34095692,-0.34044722,-0.27155318,-0.21320583,-0.44657865,-0.19587836, -0.29414279, -0.3948753 ,-0.21655774 , -0.34857087])
Y = np.array([0.16305762,0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137,0.01301744,-0.42661108])

These are the x and y coordinates of 10 users. I need to find the similarity between each pair of users. For example:

x1 = -0.34095692
y1 = 0.16305762
x2 = -0.34044722
y2 = 0.38554548

Euclidean distance = (|x1-y1|^2 + |x2-y2|^2)^1/2

Ultimately, I want a matrix of these distances for every pair of users, like the one below. How can I achieve this?

[image of the desired matrix, not preserved]

  • Sounds good. What is the question? Commented Feb 25, 2014 at 8:31
  • @Jonathon Reinhart: I have no idea of starting this? Any help? Commented Feb 25, 2014 at 8:36
  • Sigh, did you consider asking Google? It leads you directly to this successfully-answered question. Commented Feb 25, 2014 at 8:37
  • Or, if you prefer, SciPy has a function that handles all distance-related problems: docs.scipy.org/doc/scipy/reference/generated/… Commented Feb 25, 2014 at 8:42
  • do you actually mean (|x1-x2|^2+|y1-y2|^2)^0.5 instead of (|x1-y1|^2 - |x2-y2|^2)^1/2 ? Commented Feb 25, 2014 at 8:55

2 Answers

2

Use zip(X, Y) to get the coordinate pairs. Note that the Euclidean distance between two points is (|x1-x2|^2+|y1-y2|^2)^0.5, not (|x1-y1|^2 - |x2-y2|^2)^1/2:

In [125]: coords = list(zip(X, Y))  # list() is needed on Python 3, where zip returns an iterator

In [126]: from scipy import spatial
     ...: dists=spatial.distance.cdist(coords, coords)

In [127]: dists
Out[127]: 
array([[ 0.        ,  0.22248844,  0.09104884,  0.75377329,  0.10685954,
         0.41534165,  0.5109039 ,  0.15149362,  0.19490308,  0.58971785],
       [ 0.22248844,  0.        ,  0.28973034,  0.9737061 ,  0.23197262,
         0.62852005,  0.73270705,  0.09751671,  0.39258852,  0.81219719],
       [ 0.09104884,  0.28973034,  0.        ,  0.68642072,  0.19047682,
         0.33880688,  0.45038919,  0.23539542,  0.1064197 ,  0.53629553],
       [ 0.75377329,  0.9737061 ,  0.68642072,  0.        ,  0.79415038,
         0.35411306,  0.24770988,  0.90290761,  0.59283795,  0.20443561],
       [ 0.10685954,  0.23197262,  0.19047682,  0.79415038,  0.        ,
         0.47665258,  0.54665574,  0.13560014,  0.28381556,  0.61376196],
       [ 0.41534165,  0.62852005,  0.33880688,  0.35411306,  0.47665258,
         0.        ,  0.15477091,  0.56683251,  0.24003205,  0.25201351],
       [ 0.5109039 ,  0.73270705,  0.45038919,  0.24770988,  0.54665574,
         0.15477091,  0.        ,  0.65808357,  0.36700881,  0.09751671],
       [ 0.15149362,  0.09751671,  0.23539542,  0.90290761,  0.13560014,
         0.56683251,  0.65808357,  0.        ,  0.34181257,  0.73270705],
       [ 0.19490308,  0.39258852,  0.1064197 ,  0.59283795,  0.28381556,
         0.24003205,  0.36700881,  0.34181257,  0.        ,  0.45902146],
       [ 0.58971785,  0.81219719,  0.53629553,  0.20443561,  0.61376196,
         0.25201351,  0.09751671,  0.73270705,  0.45902146,  0.        ]])
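
As a sanity check on the output above, here is a minimal sketch that recomputes one entry directly from the corrected formula and then builds the whole matrix with plain NumPy broadcasting instead of SciPy. The names d01, dx, dy and dists_np are introduced here for illustration only; this is an alternative to cdist, not what the answer itself ran.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# Distance between user 0 and user 1, straight from the formula
d01 = np.sqrt((X[0] - X[1]) ** 2 + (Y[0] - Y[1]) ** 2)

# All pairs at once: X[:, None] - X[None, :] broadcasts to shape (10, 10)
dx = X[:, None] - X[None, :]
dy = Y[:, None] - Y[None, :]
dists_np = np.hypot(dx, dy)  # elementwise sqrt(dx**2 + dy**2)

print(d01, dists_np[0, 1])
```

Both printed values should agree with the 0.22248844 shown at row 0, column 1 of the cdist result.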

To get the upper triangle of this array, use numpy.triu:

In [128]: np.triu(dists)
Out[128]: 
array([[ 0.        ,  0.22248844,  0.09104884,  0.75377329,  0.10685954,
         0.41534165,  0.5109039 ,  0.15149362,  0.19490308,  0.58971785],
       [ 0.        ,  0.        ,  0.28973034,  0.9737061 ,  0.23197262,
         0.62852005,  0.73270705,  0.09751671,  0.39258852,  0.81219719],
       [ 0.        ,  0.        ,  0.        ,  0.68642072,  0.19047682,
         0.33880688,  0.45038919,  0.23539542,  0.1064197 ,  0.53629553],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.79415038,
         0.35411306,  0.24770988,  0.90290761,  0.59283795,  0.20443561],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.47665258,  0.54665574,  0.13560014,  0.28381556,  0.61376196],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.15477091,  0.56683251,  0.24003205,  0.25201351],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.65808357,  0.36700881,  0.09751671],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.34181257,  0.73270705],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.45902146],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
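
If the zeros below the diagonal get in the way, one possible alternative is to pull out only the strictly upper-triangular entries with np.triu_indices. The sketch below rebuilds the distance matrix with NumPy as a stand-in for the cdist result above; the names i, j, pair_dists and closest are illustrative, not part of the original answer.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

# Stand-in for the cdist result: full symmetric distance matrix
dists = np.hypot(X[:, None] - X[None, :], Y[:, None] - Y[None, :])

# k=1 skips the main diagonal (each user's zero distance to itself)
i, j = np.triu_indices(len(X), k=1)
pair_dists = dists[i, j]  # one value per unordered pair: 45 for 10 users

# Index of the closest pair of distinct users
closest = np.argmin(pair_dists)
print(int(i[closest]), int(j[closest]), pair_dists[closest])
```

For this data the closest pair is users 0 and 2, matching the smallest off-diagonal value (0.09104884) in the matrix above.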

2 Comments

Thank you very much! At last found it. Thanks a lot again. :)
@NilaniAlgiriyage glad to help, np ;)
2

A short snippet that computes the formula as originally written in the question:

A = (X-Y)**2
p, q = np.meshgrid(np.arange(10), np.arange(10))
np.sqrt(A[p]-A[q])

Edit : Explanations

  1. A is just a precomputed vector holding the squared difference (X[i] - Y[i])**2 for every user.
  2. The magic is in np.meshgrid: its purpose is to generate all pairs of values from two different arrays. This is not the most efficient solution, because you get the whole matrix (both triangles), but that is no big deal for the number of samples you have. The generated values are used as indices into A.
  3. The indexing A[p] is a kind of magic too: indexing with an integer array returns an array with the shape of p, filled with the corresponding elements of A. Try it yourself to see how it behaves.
  4. Here the matrix is full of nan, but that is what the formula in the question asks for. The true Euclidean distance uses +, not -.

p & q :

 array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
   [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
   [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
   [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
   [4, 4, 4, 4, 4, 4, 4, 4, 4, 4],
   [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
   [6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
   [7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
   [8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
   [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]]) 
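
For comparison, the same meshgrid indexing can be applied to the corrected formula, with + and with the x and y differences taken between users. This is a sketch of how the snippet above might be adapted, not the answer's original code; n and D are names introduced here.

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

n = len(X)
# p[i, j] == j and q[i, j] == i, exactly the two index grids shown above
p, q = np.meshgrid(np.arange(n), np.arange(n))

# Euclidean distance between user p and user q for every index pair
D = np.sqrt((X[p] - X[q]) ** 2 + (Y[p] - Y[q]) ** 2)

print(np.isnan(D).any())  # False: a sum of squares is never negative
```

Because the quantity under the square root is a sum of squares, no entry can be nan, and D is symmetric with zeros on the diagonal.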

3 Comments

This is nice! I haven't checked the accuracy of this. Could you please explain it? Anyway, there are lots of nans, right?
Thank you very much for your detailed answer. Yes, that should be +, which I have now updated in the question. One last question, which I don't get: what do all these 'nans' mean? (Are they closer, or more separated, or what?)
The difference may be negative, and sqrt turns negative numbers into nan. With the correct formula, you won't get these nans.
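
To make the last comment concrete, here is a small sketch of where the nans come from when the minus-sign formula is applied to the arrays from the question. The names diff, out and n_nan are illustrative. A nan here means "undefined", not "close" or "far".

```python
import numpy as np

X = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
Y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

A = (X - Y) ** 2
# Antisymmetric matrix: diff[i, j] == -diff[j, i], so many entries are negative
diff = A[:, None] - A[None, :]

# Suppress the "invalid value" RuntimeWarning just for this demonstration
with np.errstate(invalid="ignore"):
    out = np.sqrt(diff)

n_nan = int(np.isnan(out).sum())
print(n_nan, int((diff < 0).sum()))  # the two counts are equal
```

Every negative entry of diff becomes nan, which is why the result carries no usable distance information.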
