
I am trying to write a loop that works on both NumPy arrays and floats, specifically ndarray and float64.

My current code is:

import math

def euclidean_distance(a, b):
    # Debug output: show what types are being passed in
    print(type(a))
    print(type(b))
    total_distance = 0

    # Sum the squared differences element by element
    for index in range(len(a)):
        total_distance += (a[index] - b[index]) ** 2
    total_distance = math.sqrt(total_distance)

    return total_distance

My output is:

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.float64'>
<class 'numpy.float64'>

Traceback (most recent call last):
  File "D:/ML/WiP_KMeans.py", line 289, in <module>
    main()
  File "D:/ML/WiP_KMeans.py", line 286, in main
    k_means(test, 3)
  File "D:/ML/WiP_KMeans.py", line 239, in k_means
    centroid_error = centroid_error + get_centroid_error(currCent , oldCent)
  File "D:/ML/WiP_KMeans.py", line 70, in get_centroid_error
    total_error = total_error + euclidean_distance(centroid[index], old_centroid[index])
  File "D:/ML/WiP_KMeans.py", line 48, in euclidean_distance
    for index in range(len(a)):
TypeError: object of type 'numpy.float64' has no len()

I have tried using different variations of nditer from numpy documentation, but have not found a solution that will allow me to properly iterate either an ndarray or a float to calculate Euclidean Distance.

A normal input looks something like a=[0.3, 5.4, 3.2, 11.0] and b=[0.0, 5.0, 31.3, 2.0]. Here are some of the pairs that get passed in:

[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.3, 1.7, 0.5]
[5.9, 3.0, 5.1, 1.8]  -  [4.8, 3.4, 1.9, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.0, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.4, 1.6, 0.4]
[5.9, 3.0, 5.1, 1.8]  -  [5.2, 3.5, 1.5, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.2, 3.4, 1.4, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.7, 3.2, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.8, 3.1, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.4, 3.4, 1.5, 0.4]
[5.9, 3.0, 5.1, 1.8]  -  [5.2, 4.1, 1.5, 0.1]
[5.9, 3.0, 5.1, 1.8]  -  [4.9, 3.1, 1.5, 0.1]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.2, 1.2, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.5, 3.5, 1.3, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.9, 3.1, 1.5, 0.1]
[5.9, 3.0, 5.1, 1.8]  -  [4.4, 3.0, 1.3, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.4, 1.5, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.5, 1.3, 0.3]
[5.9, 3.0, 5.1, 1.8]  -  [4.5, 2.3, 1.3, 0.3]
[5.9, 3.0, 5.1, 1.8]  -  [4.4, 3.2, 1.3, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.5, 1.6, 0.6]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.8, 1.9, 0.4]
[5.9, 3.0, 5.1, 1.8]  -  [4.8, 3.0, 1.4, 0.3]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.8, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.6, 3.2, 1.4, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.3, 3.7, 1.5, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.3, 1.4, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.9, 2.4, 3.3, 1.0]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 2.0, 3.5, 1.0]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 2.3, 3.3, 1.0]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 2.5, 3.0, 1.1]
[5.488288288288287]  -  [6.4]

Can anybody assist?

  • Use a.shape[0] instead of len(a). Numpy arrays do not work with the len() function in Python. Commented Feb 25, 2019 at 3:58
  • @CollinPhillips I receive the error TypeError: 'int' object is not iterable. What it is normally iterating through is something that looks like [0.2,2.9,3.4,5.6], but can also just be two floats 4.2 and 9.6 (arbitrary) Commented Feb 25, 2019 at 4:00
  • Please give in the input and if possible the output too. That would be more helpful to understand the array. Commented Feb 25, 2019 at 4:02
  • Sure. Editing now @JimTodd Commented Feb 25, 2019 at 4:02
  • @IanThompson I unfortunately am working under a constraint where I cannot use scipy Commented Feb 25, 2019 at 4:22

2 Answers


This operation can be fully vectorized (no Python for loops needed, massive performance increase):

a = np.array([0.3, 5.4, 3.2, 11.0])
b = np.array([0.0, 5.0, 31.3, 2.0])
np.sqrt(np.sum((a - b) ** 2))

However, NumPy comes with batteries included. There is a function for this:

np.linalg.norm(a - b)

Both methods should perform similarly, though the second is probably slightly faster.
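Because the inputs in the question are sometimes 1-D arrays and sometimes bare numpy.float64 values, one way to keep a single code path is to promote both arguments with np.atleast_1d before taking the norm. The following is only a minimal sketch, not the asker's actual code: the name euclidean_distance is borrowed from the question, and it assumes both inputs have matching shapes once promoted.

```python
import numpy as np

def euclidean_distance(a, b):
    # np.atleast_1d turns a scalar such as numpy.float64 into a
    # 1-element array and leaves 1-D arrays unchanged.
    a = np.atleast_1d(np.asarray(a, dtype=float))
    b = np.atleast_1d(np.asarray(b, dtype=float))
    # Vectorized distance: no Python-level loop, so len() is never called.
    return float(np.linalg.norm(a - b))

# Array inputs, as in most of the pairs listed in the question
print(euclidean_distance(np.array([5.9, 3.0, 5.1, 1.8]),
                         np.array([5.1, 3.3, 1.7, 0.5])))

# Scalar inputs, as in the last pair that triggered the TypeError
print(euclidean_distance(np.float64(5.488288288288287), np.float64(6.4)))
```

For two scalars this reduces to the absolute difference, which is what a one-dimensional Euclidean distance should be.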

2 Comments

This is an excellent answer. NumPy's power is the ability to avoid implementing loops in Python.
This is the better answer, there should be no reason to iterate through the values.

Here is an example that should work for you.

import numpy as np

a=np.array([0.3, 5.4, 3.2, 11.0])
b=np.array([0.0, 5.0, 31.3, 2.0])
c=np.array([0.1])
d=np.array([6.2])

def dist(x,y):
    return np.sqrt(sum([(x[i]-y[i])**2 for i in range(x.shape[0])]))

print(dist(a,b))
print(dist(c,d))
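Note that c and d above are 1-element arrays, not bare scalars. A numpy.float64 is 0-dimensional: its shape is the empty tuple, so len() and shape[0] both fail on it, which matches the errors reported in this thread. The snippet below is a small diagnostic sketch; the helper names has_len and first_dim_ok are made up for illustration.

```python
import numpy as np

x = np.float64(5.488288288288287)      # a bare scalar, like the failing input
arr = np.array([0.3, 5.4, 3.2, 11.0])  # a normal 1-D input

print(np.ndim(x), x.shape)      # scalar: zero dimensions, empty shape tuple
print(np.ndim(arr), arr.shape)  # array: one dimension, shape (4,)

def has_len(value):
    # len() raises TypeError on 0-dimensional NumPy scalars
    try:
        len(value)
        return True
    except TypeError:
        return False

def first_dim_ok(value):
    # shape[0] raises IndexError when shape is the empty tuple
    try:
        value.shape[0]
        return True
    except IndexError:
        return False

print(has_len(x), first_dim_ok(x))
print(has_len(arr), first_dim_ok(arr))
```

Checking np.ndim(value) == 0 before looping, or wrapping the inputs with np.atleast_1d, are two ways to handle the scalar case.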

4 Comments

Hey @Colin Phillips. Thanks for the suggestion. I just tried that and got the error IndexError: tuple index out of range. Thoughts?
@Jerry, your code itself seems to work fine without error for me. Collin's code also runs without error. Which version of Python are you using?
For what it is worth, it looks like it errors out when the values are: 5.488288288288287 and 6.3. I edited your code to print out types as well.
@JimTodd version 3.7
