Properly iterate through ndarray

Question

I am trying to write a loop that can iterate over both NumPy arrays and floats, specifically numpy.ndarray and numpy.float64.

My current code is:

import math

def euclidean_distance(a, b):
    print(type(a))
    print(type(b))
    total_distance = 0

    # Fails when a is a numpy.float64 scalar, which has no len()
    for index in range(len(a)):
        total_distance = total_distance + ((a[index] - b[index]) * (a[index] - b[index]))
    total_distance = math.sqrt(total_distance)

    return total_distance

My output is:

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.float64'>
<class 'numpy.float64'>

Traceback (most recent call last):
  File "D:/ML/WiP_KMeans.py", line 289, in <module>
    main()
  File "D:/ML/WiP_KMeans.py", line 286, in main
    k_means(test, 3)
  File "D:/ML/WiP_KMeans.py", line 239, in k_means
    centroid_error = centroid_error + get_centroid_error(currCent , oldCent)
  File "D:/ML/WiP_KMeans.py", line 70, in get_centroid_error
    total_error = total_error + euclidean_distance(centroid[index], old_centroid[index])
  File "D:/ML/WiP_KMeans.py", line 48, in euclidean_distance
    for index in range(len(a)):
TypeError: object of type 'numpy.float64' has no len()
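A likely cause, judging from the traceback: get_centroid_error passes centroid[index] to euclidean_distance. Indexing a 2-D array with a single index returns a 1-D row (an ndarray), but indexing a 1-D array returns a single numpy.float64, which has no len(). The arrays below are hypothetical stand-ins for the centroids, not the asker's actual data.

```python
import numpy as np

# Hypothetical centroids stored as a 2-D array: one row per centroid
centroids_2d = np.array([[5.9, 3.0, 5.1, 1.8],
                         [5.0, 3.4, 1.5, 0.2]])
# Hypothetical centroids stored as a 1-D array: one value per centroid
centroids_1d = np.array([5.488288288288287, 6.4])

row = centroids_2d[0]     # a 1-D ndarray with 4 elements
value = centroids_1d[0]   # a numpy.float64 scalar

print(type(row), len(row))
print(type(value))

try:
    len(value)
except TypeError as exc:
    # Same message as in the traceback above
    print(exc)
```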

I have tried several variations of nditer from the NumPy documentation, but have not found a solution that lets me iterate over either an ndarray or a float to calculate the Euclidean distance.
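One way to make the original loop accept both shapes, without nditer, is to normalize the inputs with np.atleast_1d: it leaves 1-D arrays unchanged and turns a scalar such as numpy.float64 into a length-1 array. This is a minimal sketch that keeps the question's loop structure; the vectorized forms in the answers are usually preferable.

```python
import math

import numpy as np


def euclidean_distance(a, b):
    # Promote scalars (e.g. numpy.float64) to 1-D arrays of length 1;
    # inputs that are already 1-D pass through unchanged.
    a = np.atleast_1d(a)
    b = np.atleast_1d(b)
    total_distance = 0.0
    for index in range(len(a)):
        diff = a[index] - b[index]
        total_distance += diff * diff
    return math.sqrt(total_distance)


print(euclidean_distance(np.array([0.3, 5.4, 3.2, 11.0]),
                         np.array([0.0, 5.0, 31.3, 2.0])))
print(euclidean_distance(np.float64(5.488288288288287), np.float64(6.4)))
```

Because np.atleast_1d also converts plain Python lists, the same function works on inputs like [3.0, 4.0].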

A typical input is a=[0.3, 5.4, 3.2, 11.0] and b=[0.0, 5.0, 31.3, 2.0]. Here are some more example pairs:

[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.3, 1.7, 0.5]
[5.9, 3.0, 5.1, 1.8]  -  [4.8, 3.4, 1.9, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.0, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.4, 1.6, 0.4]
[5.9, 3.0, 5.1, 1.8]  -  [5.2, 3.5, 1.5, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.2, 3.4, 1.4, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.7, 3.2, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.8, 3.1, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.4, 3.4, 1.5, 0.4]
[5.9, 3.0, 5.1, 1.8]  -  [5.2, 4.1, 1.5, 0.1]
[5.9, 3.0, 5.1, 1.8]  -  [4.9, 3.1, 1.5, 0.1]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.2, 1.2, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.5, 3.5, 1.3, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.9, 3.1, 1.5, 0.1]
[5.9, 3.0, 5.1, 1.8]  -  [4.4, 3.0, 1.3, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.4, 1.5, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.5, 1.3, 0.3]
[5.9, 3.0, 5.1, 1.8]  -  [4.5, 2.3, 1.3, 0.3]
[5.9, 3.0, 5.1, 1.8]  -  [4.4, 3.2, 1.3, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.5, 1.6, 0.6]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.8, 1.9, 0.4]
[5.9, 3.0, 5.1, 1.8]  -  [4.8, 3.0, 1.4, 0.3]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 3.8, 1.6, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.6, 3.2, 1.4, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.3, 3.7, 1.5, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 3.3, 1.4, 0.2]
[5.9, 3.0, 5.1, 1.8]  -  [4.9, 2.4, 3.3, 1.0]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 2.0, 3.5, 1.0]
[5.9, 3.0, 5.1, 1.8]  -  [5.0, 2.3, 3.3, 1.0]
[5.9, 3.0, 5.1, 1.8]  -  [5.1, 2.5, 3.0, 1.1]
[5.488288288288287]  -  [6.4]
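For reference, the Euclidean distance is d(a, b) = sqrt(sum_i (a_i - b_i)^2). For the first pair above, the differences are 0.8, -0.3, 3.4 and 1.3, the squares sum to 13.98, and the distance is about 3.739. With NumPy broadcasting, many points can be compared against one centroid at once. The sketch below uses only the first three right-hand vectors from the list as sample data, plus the final single-value pair.

```python
import numpy as np

centroid = np.array([5.9, 3.0, 5.1, 1.8])
# Sample rows taken from the list above (right-hand side of each pair)
points = np.array([[5.1, 3.3, 1.7, 0.5],
                   [4.8, 3.4, 1.9, 0.2],
                   [5.0, 3.0, 1.6, 0.2]])

# points - centroid subtracts the (4,) centroid from every (4,) row;
# axis=1 takes the Euclidean norm of each row separately.
distances = np.linalg.norm(points - centroid, axis=1)
print(distances)

# The last pair in the list is a 1-element array on each side
last = np.linalg.norm(np.array([5.488288288288287]) - np.array([6.4]))
print(last)
```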

Can anybody assist?

Use a.shape[0] instead of len(a). Numpy arrays do not work with the len() function in python.
– Collin Phillips, Commented Feb 25, 2019 at 3:58
@CollinPhillips I receive the error TypeError: 'int' object is not iterable. What it normally iterates through looks like [0.2, 2.9, 3.4, 5.6], but it can also be just two floats, e.g. 4.2 and 9.6 (arbitrary).
– artemis, Commented Feb 25, 2019 at 4:00
Please provide the input and, if possible, the expected output too. That would make the arrays easier to understand.
– Jim Todd, Commented Feb 25, 2019 at 4:02
@IanThompson Unfortunately, I am working under a constraint where I cannot use scipy.
– artemis, Commented Feb 25, 2019 at 4:22

iz_ · Accepted Answer · 2019-02-25 04:27:52Z

3

This operation can be fully vectorized, so no Python for loop is needed, and it is typically much faster on large arrays:

import numpy as np

a = np.array([0.3, 5.4, 3.2, 11.0])
b = np.array([0.0, 5.0, 31.3, 2.0])
np.sqrt(np.sum((a - b) ** 2))

However, NumPy comes with batteries included: there is a function for this:

np.linalg.norm(a - b)

Both methods should perform similarly, though the second (np.linalg.norm) is probably slightly faster.
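A quick check that the two expressions agree, using the question's sample vectors. Since neither expression calls len(), both also accept numpy.float64 scalars like the ones that triggered the original TypeError; the scalar values here are taken from the question's output.

```python
import numpy as np

a = np.array([0.3, 5.4, 3.2, 11.0])
b = np.array([0.0, 5.0, 31.3, 2.0])

# Explicit vectorized formula: elementwise difference, square, sum, root
manual = np.sqrt(np.sum((a - b) ** 2))
# np.linalg.norm defaults to the 2-norm (Euclidean norm) for a vector
builtin = np.linalg.norm(a - b)
print(manual, builtin)

# No len() involved, so 0-dimensional numpy.float64 inputs work as well
x, y = np.float64(5.488288288288287), np.float64(6.4)
print(np.sqrt(np.sum((x - y) ** 2)), np.linalg.norm(x - y))
```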

edited Feb 25, 2019 at 4:27

answered Feb 25, 2019 at 4:21

iz_



2 Comments

Nikolas Stevenson-Molnar Over a year ago

This is an excellent answer. NumPy's power is the ability to avoid implementing loops in Python.

Collin Phillips Over a year ago

This is the better answer, there should be no reason to iterate through the values.

Collin Phillips · Accepted Answer · 2019-02-25 04:13:04Z

0

Here is an example that should work for you.

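import numpy as np

a = np.array([0.3, 5.4, 3.2, 11.0])
b = np.array([0.0, 5.0, 31.3, 2.0])
c = np.array([0.1])  # single values wrapped in 1-element arrays
d = np.array([6.2])

def dist(x, y):
    # Sum the squared element-wise differences, then take the square root.
    # x.shape[0] requires x to be at least 1-D.
    return np.sqrt(sum((x[i] - y[i]) ** 2 for i in range(x.shape[0])))

print(dist(a, b))
print(dist(c, d))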

answered Feb 25, 2019 at 4:13

Collin Phillips


4 Comments

artemis Over a year ago

Hey @Collin Phillips. Thanks for the suggestion. I just tried that and got the error IndexError: tuple index out of range. Thoughts?

Jim Todd Over a year ago

@Jerry, your code itself seems to work fine for me, and Collin's code also runs without error. Which version of Python are you using?

artemis Over a year ago

For what it is worth, it looks like it errors out when the values are: 5.488288288288287 and 6.3. I edited your code to print out types as well.

artemis Over a year ago

@JimTodd version 3.7
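The IndexError reported above is consistent with a 0-dimensional input. For a 1-D array, len(x) and x.shape[0] agree, but a numpy.float64 has shape == (), so x.shape[0] raises IndexError: tuple index out of range. The values below come from the comments; np.ndim is one way to tell the two cases apart.

```python
import numpy as np

vector = np.array([0.3, 5.4, 3.2, 11.0])
scalar = np.float64(5.488288288288287)

# For a 1-D array, len() and shape[0] both give the number of elements
print(len(vector), vector.shape[0])

# A numpy.float64 is 0-dimensional: its shape is an empty tuple
print(np.ndim(scalar), scalar.shape)

try:
    scalar.shape[0]
except IndexError as exc:
    # Same error as reported in the comment above
    print(exc)
```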
