Detect ordered pair in numpy array

Question

I am using a numpy array to hold a list of ordered pairs (representing grid coordinates). The algorithm I am writing needs to check if a newly generated ordered pair is already in this array. Below is a schematic of the code:

cluster=np.array([[x1,y1]])
cluster=np.append(cluster,[[x2,y2]],axis=0)
cluster=np.append...etc.

new_spin=np.array([[x,y]])

if new_spin not in cluster:
    do something

The problem with this current code is that it gives false positives. If x or y appear in the cluster, then new_spin in cluster evaluates as true. At first I thought a simple fix would be to ask if x and y appear in cluster, but this would not ensure that they appear as an ordered pair. To make sure they appear as an ordered pair I'd have to find the indices where x and y appear in cluster and compare them, which seems very clunky and inelegant, and I'm certain there must be a better solution out there. However, I have not been able to work it out myself.
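To make the false positive concrete, here is a minimal sketch. The sample values are made up for illustration; they are not from my actual code. As far as I understand it, `in` on a numpy array behaves roughly like an elementwise comparison followed by any(), which is why a single matching coordinate is enough to trigger it.

```python
import numpy as np

# Illustrative data only: the pair (1, 2) is NOT a row of cluster,
# but 1 appears in the x column and 2 appears in the y column.
cluster = np.array([[9, 7], [7, 2], [8, 9], [1, 3], [3, 4]])
new_spin = np.array([[1, 2]])

# `in` on an ndarray is True as soon as any single element matches.
false_positive = new_spin in cluster

# Requiring a whole row to match gives the intended answer.
row_match = bool((cluster == new_spin).all(axis=1).any())

print(false_positive, row_match)  # True False
```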

Thanks for any help.

It's a bit annoying because of a small related bug in numpy < 1.7, but if you query the same set many times, you should use sorting, or maybe hack something together with scipy.spatial.cKDTree if the current numpy bugs are too annoying.
– seberg, Commented Dec 15, 2012 at 17:05

unutbu · Accepted Answer · 2012-12-15 03:49:01Z

Let's work through an example:

In [7]: import numpy as np
In [8]: cluster = np.random.randint(10, size = (5,2))
In [9]: cluster
Out[9]: 
array([[9, 7],
       [7, 2],
       [8, 9],
       [1, 3],
       [3, 4]])

In [10]: new_spin = np.array([[1,2]])

In [11]: new_spin == cluster
Out[11]: 
array([[False, False],
       [False,  True],
       [False, False],
       [ True, False],
       [False, False]], dtype=bool)

new_spin == cluster is a numpy array of dtype bool. new_spin is broadcast against every row of cluster, so each entry is True where the value in cluster equals the corresponding value in new_spin.

For new_spin to be "in" cluster, a row of the above boolean array must all be True. We can find such rows by calling the all(axis = 1) method:

In [12]: (new_spin == cluster).all(axis = 1)
Out[12]: array([False, False, False, False, False], dtype=bool)

So new_spin is "in" cluster, if any of the rows is all True:

In [14]: (new_spin == cluster).all(axis = 1).any()
Out[14]: False
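The steps above can be wrapped in a small helper. This is only a sketch: the name contains_pair is hypothetical (it is not a numpy function), and it assumes cluster is a 2-D array with one pair per row.

```python
import numpy as np

def contains_pair(cluster, pair):
    """Return True if `pair` matches some full row of `cluster`.

    Hypothetical helper, not part of numpy.
    """
    cluster = np.asarray(cluster)
    # Shape (1, 2) so the pair broadcasts against every row.
    pair = np.asarray(pair).reshape(1, -1)
    # all(axis=1): every column in a row matches; any(): at least one such row.
    return bool((cluster == pair).all(axis=1).any())

cluster = np.array([[9, 7], [7, 2], [8, 9], [1, 3], [3, 4]])
print(contains_pair(cluster, [1, 2]))  # False: 1 and 2 never share a row
print(contains_pair(cluster, [1, 3]))  # True: [1, 3] is the fourth row
```

Wrapping the result in bool() returns a plain Python bool rather than a numpy scalar, which reads more naturally in an if statement.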

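By the way, np.append is a very slow operation, slower than Python's list.append: every call allocates a new array and copies all existing rows into it, so each append costs time proportional to the current size of the array, whereas list.append is amortized constant time. Chances are, you will get much better performance if you avoid np.append. If cluster is not too large, you may be better off making cluster a Python list of lists, at least until you are done appending items. Then, if needed, convert cluster to a numpy array with cluster = np.array(cluster).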
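A minimal sketch of the build-as-a-list, convert-once pattern. The loop and the generated coordinates are placeholders for illustration, standing in for however the real pairs are produced.

```python
import numpy as np

# Collect pairs in a plain Python list while the cluster is growing.
points = [(0, 0)]
for step in range(1, 5):
    # Placeholder coordinates; list.append does not copy existing items.
    points.append((step, 2 * step))

# Convert once at the end, when array operations are actually needed.
cluster = np.array(points)
print(cluster.shape)  # (5, 2)
```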

edited Dec 15, 2012 at 3:49

answered Dec 15, 2012 at 3:07


2 Comments

Dylan B Over a year ago

I did end up using a list of lists, which can be queried using a simple (x,y) in cluster statement without issue (and because one of my friends pointed out that lists should be faster to use than arrays, and I didn't really need it to be an array - I'm just used to working with them as a data type). I like your answer for using arrays, and it's good to know that any and all can accept axes.

seberg Over a year ago

@DylanB Of the built-in data types, a set is most likely preferable to a list, though, since its in test is much more efficient.
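A rough sketch of the set-based variant, assuming the pairs are stored as hashable tuples (lists cannot be set members). The sample values are illustrative.

```python
# Store each ordered pair as a tuple so it can live in a set.
cluster = {(9, 7), (7, 2), (8, 9), (1, 3), (3, 4)}

new_spin = (1, 2)

# Membership in a set is a hash lookup, and tuples compare as whole
# ordered pairs, so (1, 2) only matches an identical (1, 2).
if new_spin not in cluster:
    cluster.add(new_spin)

print((1, 2) in cluster, (2, 1) in cluster)  # True False
```

Note that a set does not preserve insertion order, so this fits best when only membership matters.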
