
I am using a numpy array to hold a list of ordered pairs (representing grid coordinates). The algorithm I am writing needs to check if a newly generated ordered pair is already in this array. Below is a schematic of the code:

cluster=np.array([[x1,y1]])
cluster=np.append(cluster,[[x2,y2]],axis=0)
cluster=np.append...etc.

new_spin=np.array([[x,y]])

if not (new_spin in cluster):
    ...  # do something

The problem with the current code is that it gives false positives: if either x or y appears anywhere in cluster, then new_spin in cluster evaluates to True. At first I thought a simple fix would be to check whether both x and y appear in cluster, but that would not ensure that they appear together as an ordered pair. To ensure that, I'd have to find the indices where x and y appear in cluster and compare them, which seems clunky and inelegant, and I'm certain there must be a better solution. However, I have not been able to work it out myself.

Thanks for any help.

  • It's a bit annoying because of a small related bug in numpy < 1.7, but if you query the same set many times, you should use sorting, or maybe hack something together with scipy.spatial.cKDTree if the current numpy bugs are too annoying. Commented Dec 15, 2012 at 17:05

1 Answer


Let's work through an example:

In [7]: import numpy as np
In [8]: cluster = np.random.randint(10, size = (5,2))
In [9]: cluster
Out[9]: 
array([[9, 7],
       [7, 2],
       [8, 9],
       [1, 3],
       [3, 4]])

In [10]: new_spin = np.array([[1,2]])

In [11]: new_spin == cluster
Out[11]: 
array([[False, False],
       [False,  True],
       [False, False],
       [ True, False],
       [False, False]], dtype=bool)

new_spin == cluster is a numpy array of dtype bool. It is True where the value in cluster equals the corresponding value in new_spin.

For new_spin to be "in" cluster, a row of the above boolean array must all be True. We can find such rows by calling the all(axis = 1) method:

In [12]: (new_spin == cluster).all(axis = 1)
Out[12]: array([False, False, False, False, False], dtype=bool)

So new_spin is "in" cluster if any row is all True:

In [14]: (new_spin == cluster).all(axis = 1).any()
Out[14]: False
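The test above can be wrapped in a small helper. This is only a minimal sketch: the name contains_row is hypothetical (it is not a numpy function), and it assumes cluster is a 2-D array with one ordered pair per row.

```python
import numpy as np

def contains_row(cluster, pair):
    """Return True if `pair` matches an entire row of the 2-D array `cluster`.

    `contains_row` is a hypothetical helper name, not part of numpy.
    """
    cluster = np.asarray(cluster)
    pair = np.asarray(pair).reshape(-1)  # accept (x, y), [x, y] or [[x, y]]
    # Broadcasting compares `pair` against every row; all(axis=1) demands a
    # full-row match, and any() asks whether at least one row qualified.
    return bool((cluster == pair).all(axis=1).any())

cluster = np.array([[9, 7], [7, 2], [8, 9], [1, 3], [3, 4]])
print(contains_row(cluster, [1, 2]))  # False: 1 and 2 occur, but not as a pair
print(contains_row(cluster, [1, 3]))  # True: [1, 3] is a row of cluster
```

Each call scans the whole array, so this is fine for occasional checks but may become the bottleneck if the same cluster is queried many times.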

By the way, np.append is a very slow operation, slower than Python's list.append, because it copies the entire array on every call. You will likely get much better performance if you avoid it. If cluster is not too large, you may be better off making cluster a Python list of lists, at least until you are done appending items. Then, if needed, convert it to a numpy array with cluster = np.array(cluster).
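As a rough illustration of that pattern, the sketch below collects pairs in a plain list and converts once at the end. The name points and the sample coordinates are made up for the example.

```python
import numpy as np

points = [[9, 7], [7, 2]]   # plain Python list of [x, y] pairs (made-up data)
points.append([8, 9])       # list.append does not copy the existing items
points.append([1, 3])

cluster = np.array(points)  # convert once, after all the appends are done
print(cluster.shape)        # (4, 2)
```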


2 Comments

I did end up using a list of lists, which can be queried with a simple (x,y) in cluster statement without issue. One of my friends pointed out that lists should be faster to use than arrays, and I didn't really need an array; I'm just used to working with them as a data type. I like your answer for arrays, and it's good to know that any and all accept an axis argument.
@DylanB Among the built-in data types, a set is most likely preferable to a list, though, since its in test is much more efficient.
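A minimal sketch of the set-based variant suggested in that comment, assuming the coordinates are integers and that the order in which pairs were added does not matter. The name occupied and the sample values are hypothetical.

```python
occupied = set()             # hypothetical name for the set of visited sites
occupied.add((9, 7))         # pairs are stored as tuples, which are hashable
occupied.add((7, 2))
occupied.add((1, 3))

new_spin = (1, 2)
if new_spin not in occupied:  # hash lookup instead of scanning every pair
    occupied.add(new_spin)

print((1, 2) in occupied)    # True: it was just added
print((2, 1) in occupied)    # False: order within the pair matters
```

Note that the pairs have to be tuples here: lists are not hashable, so they cannot be set members, and in Python a tuple never compares equal to a list, so the stored and queried pairs should use the same type.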
