Euclidean distance in Python

Question

I have two 3000x3 vectors and I'd like to compute 1-to-1 Euclidean distance between them. For example, vec1 is

The vec2 is

I'd like to get the results as

I tried scipy.spatial.distance.cdist(vec1, vec2), which returns a 3000x3000 matrix, whereas I only need the main diagonal. I also tried np.sqrt(np.sum((vec1-vec2)**2 for vec1,vec2 in zip(vec1,vec2))), but it didn't work for my purpose. Is there any way to compute these distances? I'd appreciate any comments.

Yes, in 2 different files. The following posts answered my question. Thanks anyway.
– user3821120, Commented Aug 21, 2015 at 18:41

ali_m · Accepted Answer · 2015-08-21 17:56:01Z

3

cdist gives you back a 3000 x 3000 array because it computes the distance between every pair of row vectors in your two input arrays.

To compute only the distances between corresponding row indices, you could use np.linalg.norm:

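import numpy as np

# Example data: two 3000x3 arrays whose corresponding rows differ by 1 in every column
a = np.repeat((np.arange(3000) + 1)[:, None], 3, 1)
b = a + 1

# Euclidean norm of each row of the difference, giving one distance per row pair
dist = np.linalg.norm(a - b, axis=1)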

Or using standard vectorized array operations:

dist = np.sqrt(((a - b) ** 2).sum(1))
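As a quick sanity check, here is a minimal sketch showing that the row-wise result equals the main diagonal of the full all-pairs distance matrix, which is what cdist returns with its default Euclidean metric. The small random arrays are made up for illustration, and the all-pairs matrix is built with NumPy broadcasting rather than cdist so the snippet needs only NumPy.

```python
import numpy as np

# Hypothetical small inputs standing in for the 3000x3 arrays.
rng = np.random.default_rng(0)
a = rng.random((5, 3))
b = rng.random((5, 3))

# Row-wise distances: one value per pair of corresponding rows.
rowwise = np.linalg.norm(a - b, axis=1)

# All-pairs distances (5 x 5): entry [i, j] is the distance from a[i] to b[j].
all_pairs = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

# Only the diagonal pairs row i of a with row i of b.
diagonal = np.diag(all_pairs)

print(np.allclose(rowwise, diagonal))
```

For 3000 rows the all-pairs matrix holds 9,000,000 entries, of which only 3000 are needed, so the row-wise form avoids almost all of that work.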

answered Aug 21, 2015 at 17:56

ali_m
wgwz · Accepted Answer · 2015-08-22 16:53:21Z

0

Here's another way that works. It still uses np.linalg.norm, but it first parses the data from raw text, in case that is something you need.

import numpy as np

vec1 = '''1 1 1
    2 2 2
    3 3 3
    4 4 4'''
vec2 = '''2 2 2
    3 3 3
    4 4 4
    5 5 5'''

# Parse each line of text into a row of floats; splitlines() iterates over
# lines (iterating over the string itself would yield single characters)
process_vec1 = np.array([list(map(float, line.split())) for line in vec1.splitlines()])
process_vec2 = np.array([list(map(float, line.split())) for line in vec2.splitlines()])

# Euclidean distance between corresponding rows
dist = np.linalg.norm(process_vec1 - process_vec2, axis=1)

print(dist)

[1.73205081 1.73205081 1.73205081 1.73205081]

edited Aug 22, 2015 at 16:53

answered Aug 21, 2015 at 18:25

wgwz

1 Comment

ali_m Over a year ago

In general it's going to be a lot faster to use vectorization to process multiple rows (e.g. np.linalg.norm(process_vec1 - process_vec2, axis=1)) rather than using map, which implicitly iterates over the rows in Python rather than C.
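To illustrate the point about vectorization, here is a rough timing sketch, not a rigorous benchmark. The function names loop_dist and vectorized_dist and the random 3000x3 inputs are made up for this example, and the actual timings depend on the machine and NumPy version.

```python
import timeit

import numpy as np

# Hypothetical inputs with the same shape as in the question.
rng = np.random.default_rng(0)
a = rng.random((3000, 3))
b = rng.random((3000, 3))


def loop_dist(a, b):
    # One np.linalg.norm call per row pair, with the iteration done in Python.
    return np.array([np.linalg.norm(x - y) for x, y in zip(a, b)])


def vectorized_dist(a, b):
    # A single call; the iteration over rows happens inside NumPy.
    return np.linalg.norm(a - b, axis=1)


loop_time = timeit.timeit(lambda: loop_dist(a, b), number=10)
vec_time = timeit.timeit(lambda: vectorized_dist(a, b), number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```

Both functions return the same distances; only the place where the loop runs differs.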
