I'd like to pick, at random, the index of one element of a numpy array that meets a condition. My arrays are typically 2D, a few million elements total; the condition is computed over the whole array, and relatively few elements (under a percent) come out true. I need to pick one element where the condition is true, uniformly at random. Because of how the data is used, the random choice has to be unbiased (every true element picked with the same probability), and I only pick one per array on each pass (so no reuse of any calculations).
Slow code which does the right thing by building a list of all candidate indexes explicitly:
import numpy as np

# prepare sample data
img = np.zeros((2048, 2048), dtype=bool)
for n in range(10000):
    i, j = np.random.randint(img.shape[0]), np.random.randint(img.shape[1])
    img[i, j] = True

def pick(img):
    indexes = np.argwhere(img)      # (K, 2) array of coordinates of all true elements
    k = np.random.randint(len(indexes))
    return indexes[k]

pick(img)  # around 8 ms
This seems to take a stupidly long time to pick one element out of 10000. The culprit is, of course, np.argwhere(), which is where most of the time is spent. I don't need the whole list it returns; I just need one element from a random shuffle of that list, and could stop the calculation early at that point.
How do I do the same thing, but faster?
P.S. The elements may be clustered - it is entirely possible for all of the true values to be in one corner of the array. So any speedup which relies on dividing areas probably won't work :)
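For what it's worth, here is a sketch of the "stop the calculation early" idea from the question, done as rejection sampling: draw uniform random flat indices and accept the first one that lands on a true element. This is not from the question itself, just one way the early-exit idea can be made concrete; it stays unbiased because every flat index is drawn with equal probability, and clustering doesn't matter since each draw is independent. The expected number of draws is about N/K (array size over true count), so it only pays off when the true elements aren't vanishingly rare.

```python
import numpy as np

rng = np.random.default_rng(0)

def pick_rejection(img, max_tries=1_000_000):
    """Sample uniform random flat indices until one lands on a True
    element. Each True element is equally likely, since every flat
    index is drawn with the same probability."""
    flat = img.ravel()
    for _ in range(max_tries):
        k = rng.integers(flat.size)
        if flat[k]:
            # convert the flat index back to a 2D (i, j) coordinate
            return np.unravel_index(k, img.shape)
    raise RuntimeError("no True element found; array may be all False")

# demo on sparse data similar to the question's
img = np.zeros((2048, 2048), dtype=bool)
img[rng.integers(2048, size=10000), rng.integers(2048, size=10000)] = True
i, j = pick_rejection(img)
```

With ~0.24% density as in the test case, this needs a few hundred cheap scalar draws on average, with no full-array scan at all.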
How sparse is img in your test case? I agree 8ms sounds incredibly long for np.argwhere(). np.unravel_index(np.random.choice(np.flatnonzero(img)), img.shape) seems to do a lot better; there might be better yet, though. There are functions (like np.ix_) that could be useful here, but I'm not super familiar with that set of numpy functionality. I'll look around for a bit and expand on the comment if I can't find anything else in a few minutes.
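Spelled out, the comment's one-liner looks like the following (written here with the newer Generator API; the classic np.random.choice works the same way). It still scans the whole array once, but np.flatnonzero only builds a 1D array of flat indices instead of argwhere's (K, 2) coordinate array, and the conversion back to 2D happens for just the one chosen element:

```python
import numpy as np

rng = np.random.default_rng(0)

# sparse boolean test data, as in the question
img = np.zeros((2048, 2048), dtype=bool)
img[rng.integers(2048, size=10000), rng.integers(2048, size=10000)] = True

# flat indices of all True elements, one uniform pick, then back to (i, j)
flat_idx = rng.choice(np.flatnonzero(img))
i, j = np.unravel_index(flat_idx, img.shape)
```

The pick is unbiased for the same reason the argwhere version is: choice draws uniformly over the list of true positions, only the list is cheaper to build.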