
I would like to create my own customized k-nearest-neighbor method.

For this I need a matrix (x : y) that returns the distance for every combination of x and y under a given distance function (e.g. the Euclidean distance based on 7 features (columns) of my dataset).

For example, given this data:

        x1  x2  x3
row 1:   1   2   3
row 2:   1   1   1
row 3:   4   2   3

If I select x1 and x2 and the Euclidean distance, the output should be a 3x3 matrix:

1:1 = 0
1:2 = sqrt((1-1)^2 + (2-1)^2) = 1
1:3 = sqrt((1-4)^2 + (2-2)^2) = sqrt(3)
2:1 = 1:2 = 1
2:2 = 0
2:3 = sqrt((1-4)^2 + (1-1)^2) = 2
3:3 = 0

and so forth...
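Stated generally, each entry of the matrix described above is the Euclidean distance between two rows, restricted to the selected columns. The symbols below ($D$ for the matrix, $x_{ic}$ for the value of row $i$ in column $c$, $S$ for the selected columns) are notation introduced here for illustration:

$$
D_{ij} = \sqrt{\sum_{c \in S} \left(x_{ic} - x_{jc}\right)^2}, \qquad S = \{\text{x1}, \text{x2}\}
$$

For example, $D_{12} = \sqrt{(1-1)^2 + (2-1)^2} = 1$. Because $D_{ii} = 0$ and $D_{ij} = D_{ji}$, the matrix is symmetric with a zero diagonal, which is why 2:1 equals 1:2.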

How can I compute this without iterating through the DataFrame?

Thanks in advance for your support.

  • It looks like some of your example calculations are wrong, i.e. 1:3 should be sqrt(9)=3, and 2:3 should be sqrt(10). Commented Nov 29, 2016 at 17:01

1 Answer


You can use scipy.spatial.distance.pdist and scipy.spatial.distance.squareform:

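import pandas as pd
from scipy.spatial.distance import pdist, squareform

df = pd.DataFrame({'x1': [1, 1, 4], 'x2': [2, 1, 2], 'x3': [3, 1, 3]})  # example data from the question
dist = pdist(df[['x1', 'x2']], 'euclidean')  # condensed vector of pairwise distances
df_dist = pd.DataFrame(squareform(dist))     # expanded to a full symmetric n x n matrix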

If you want a NumPy array as the output rather than a DataFrame, use squareform(dist) by itself without wrapping it in pd.DataFrame.
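Since the question asks for the distance under "a given function", it may help to know that pdist's metric argument also accepts a Python callable that takes two 1-D arrays (two rows) and returns a scalar. The sketch below assumes the example data from the question; `my_distance` is a hypothetical stand-in for your own metric (here it simply re-implements the Euclidean distance), and the result is returned as a plain array.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Example data from the question
df = pd.DataFrame({'x1': [1, 1, 4], 'x2': [2, 1, 2], 'x3': [3, 1, 3]})

def my_distance(u, v):
    """Hypothetical custom metric: the Euclidean distance written out by hand.

    pdist calls this once per pair of rows, passing each row as a 1-D array.
    """
    return np.sqrt(np.sum((u - v) ** 2))

# With a callable metric, pdist still returns the condensed form:
# one value per unordered pair of rows, n * (n - 1) / 2 values in total.
dist = pdist(df[['x1', 'x2']].to_numpy(dtype=float), my_distance)

# squareform expands the condensed vector into a plain n x n NumPy array
matrix = squareform(dist)
print(matrix)
```

Note that a Python callable is invoked once per pair, so it is noticeably slower than the built-in string metrics such as 'euclidean'; it is worth using only when no built-in metric matches.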

The resulting output (as a DataFrame):

     0         1         2
0  0.0  1.000000  3.000000
1  1.0  0.000000  3.162278
2  3.0  3.162278  0.000000
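As an alternative that does not depend on SciPy, the same matrix can be built with NumPy broadcasting (a swapped-in technique, not part of the answer above). Since the end goal is a k-nearest-neighbor method, the sketch also shows one possible way to read neighbors off the matrix with argsort; `k` and the choice to exclude each row from its own neighbors are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({'x1': [1, 1, 4], 'x2': [2, 1, 2], 'x3': [3, 1, 3]})
X = df[['x1', 'x2']].to_numpy(dtype=float)  # shape (n, d)

# Broadcasting: X[:, None, :] has shape (n, 1, d) and X[None, :, :] has
# shape (1, n, d), so their difference has shape (n, n, d) and holds
# x_i - x_j for every pair (i, j) without an explicit Python loop.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))        # shape (n, n)

# Hypothetical k-NN lookup: for each row, the indices of its k closest
# other rows. The diagonal (distance to itself) is set to inf so that a
# row is never returned as its own neighbor.
k = 1
D_no_self = D.copy()
np.fill_diagonal(D_no_self, np.inf)
neighbors = np.argsort(D_no_self, axis=1)[:, :k]

print(D)
print(neighbors)
```

The (n, n, d) intermediate array needs memory proportional to n² · d, so for large datasets pdist (which computes each pair only once) is likely the better choice.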