
I have a DataFrame 'df' from which I want to extract values into two different arrays of 3D points. Then I want to compute the Minkowski distances between the two arrays for every row of the dataset and append them (one column per value of p) to the original DataFrame. But I can't get the function to work properly.

My df looks like:

    x1         y1       z1        x2        y2        z2
0  0.040928  0.250813  0.258730  0.050584  0.298290  0.273055
1  0.000000  0.174905  0.228518  0.011435  0.215528  0.233548
2  0.990905  0.746038  0.790401  0.972913  0.755414  0.822155
3  0.914052  0.669185  0.707238  0.922316  0.676172  0.734213
4  0.909504  0.480774  0.484074  0.915810  0.503221  0.489242

Then I defined two arrays, p1 and p2:

p1 = df[["x1", "y1", "z1"]].to_numpy() 
p2 = df[["x2", "y2", "z2"]].to_numpy() 
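Note that p1 and p2 are 2-D arrays of shape (n_rows, 3): each row holds one 3-D point. A minimal sketch that rebuilds df from the five sample rows shown above (assuming the full dataset has the same six columns) confirms the shapes:

```python
import pandas as pd

# The five sample rows from the question
df = pd.DataFrame(
    [[0.040928, 0.250813, 0.258730, 0.050584, 0.298290, 0.273055],
     [0.000000, 0.174905, 0.228518, 0.011435, 0.215528, 0.233548],
     [0.990905, 0.746038, 0.790401, 0.972913, 0.755414, 0.822155],
     [0.914052, 0.669185, 0.707238, 0.922316, 0.676172, 0.734213],
     [0.909504, 0.480774, 0.484074, 0.915810, 0.503221, 0.489242]],
    columns=["x1", "y1", "z1", "x2", "y2", "z2"],
)

# One 3-D point per row in each array
p1 = df[["x1", "y1", "z1"]].to_numpy()
p2 = df[["x2", "y2", "z2"]].to_numpy()

print(p1.shape, p2.shape)  # (5, 3) (5, 3)
```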

Now I want to calculate the Minkowski distance between the two arrays for different values of p:

from math import sqrt
 
# calculate minkowski distance
def minkowski_distance(a, b, p):
    return sum(abs(e1-e2)**p for e1, e2 in zip(a,b))**(1/p)

dist = minkowski_distance(p1,p2, 2)
dist
array([13.0317225 ,  9.36364486,  7.56526207])

I want the resulting DataFrame to look like:

x1  y1  z1  x2  y2  z2  m(1)  m(2)  m(3) ...

where m(1) is the Minkowski distance for p=1, and so on. Each row's distance should be computed between the two points in that same row, i.e.

(x1, y1, z1) <---------m--------> (x2,y2,z2)
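For reference, the Minkowski distance of order p between the two points in a single row is shown below, so m(1) is the Manhattan distance and m(2) the Euclidean distance:

```latex
m(p) = \left( |x_1 - x_2|^p + |y_1 - y_2|^p + |z_1 - z_2|^p \right)^{1/p}
```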
  • What is the error? Copy-paste it into your post (no screenshot). Commented Jan 22, 2022 at 8:03
  • It gives cumulative results, as shown by the variable 'dist', over all values of x1, y1, z1 and x2, y2, z2 instead of one distance per row. Commented Jan 22, 2022 at 8:43

1 Answer


You could try to calculate Minkowski distance in a vectorised way:

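import numpy as np

def minkowski_distance(a, b, p=2):
    # Reduce over axis=1 (the x, y, z coordinates) so each row gets its own distance
    return np.sum(np.abs(a - b)**p, axis=1)**(1/p)

# Add one column per value of p: m(1), m(2), m(3)
for p in range(1, 4):
    df[f'm({p})'] = minkowski_distance(p1, p2, p)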
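A self-contained usage sketch on the five sample rows from the question. The cross-check with np.linalg.norm(..., ord=p, axis=1) is an addition, not part of the original answer: for vector norms taken along an axis, NumPy computes sum(abs(x)**ord)**(1/ord), which is the same row-wise Minkowski distance.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question
df = pd.DataFrame(
    [[0.040928, 0.250813, 0.258730, 0.050584, 0.298290, 0.273055],
     [0.000000, 0.174905, 0.228518, 0.011435, 0.215528, 0.233548],
     [0.990905, 0.746038, 0.790401, 0.972913, 0.755414, 0.822155],
     [0.914052, 0.669185, 0.707238, 0.922316, 0.676172, 0.734213],
     [0.909504, 0.480774, 0.484074, 0.915810, 0.503221, 0.489242]],
    columns=["x1", "y1", "z1", "x2", "y2", "z2"],
)
p1 = df[["x1", "y1", "z1"]].to_numpy()
p2 = df[["x2", "y2", "z2"]].to_numpy()

def minkowski_distance(a, b, p=2):
    # Reduce over axis=1 (the coordinates) so each row gets its own distance
    return np.sum(np.abs(a - b)**p, axis=1)**(1/p)

for p in range(1, 4):
    df[f'm({p})'] = minkowski_distance(p1, p2, p)

# Cross-check against NumPy's vector norm of the row-wise differences
for p in range(1, 4):
    assert np.allclose(df[f'm({p})'], np.linalg.norm(p1 - p2, ord=p, axis=1))

print(df.columns.tolist())
# ['x1', 'y1', 'z1', 'x2', 'y2', 'z2', 'm(1)', 'm(2)', 'm(3)']
```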

2 Comments

This worked for me. Thanks a lot, @mathfux! Can you explain why my code didn't work this way, taking each pair of row vectors and giving the corresponding value?
Alright, let's take an example. Your loop ends up summing [np.array([4, 9, 16]), np.array([0, 4, 9]), np.array([4, 1, 9]), np.array([4, 16, 0]), np.array([9, 9, 4])]. That's a bad idea. I didn't expect it to work at all, but it just adds your columns: it's equivalent to np.sum(arr, axis=0). You need axis=1. Also, you should avoid iterating over arrays in Python, because NumPy is not designed for it.
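To make the comment concrete: in the question's loop, zip(a, b) yields one pair of rows at a time, and the built-in sum adds the resulting 3-element arrays element-wise, so the result has one value per column (x, y, z) rather than one per row. A small sketch using the example arrays from the comment:

```python
import numpy as np

# Per-row powered differences, one 3-element array per row (the comment's example)
rows = [np.array([4, 9, 16]), np.array([0, 4, 9]), np.array([4, 1, 9]),
        np.array([4, 16, 0]), np.array([9, 9, 4])]
arr = np.array(rows)

# Built-in sum over a list of arrays adds them element-wise: one value per column
print(sum(rows))              # [21 39 38]
print(np.sum(arr, axis=0))    # [21 39 38]

# axis=1 sums within each row instead: one value per row, as intended
print(np.sum(arr, axis=1))    # [29 13 14 20 22]
```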
