I'd like to sample indices from a 2D NumPy array, with each index weighted by the value stored at that position. The only way I know is numpy.random.choice, but that returns the sampled value rather than its index. Is there an efficient way to do this?

Here is my code:

import numpy as np
A=np.arange(1,10).reshape(3,3)
A_flat=A.flatten()
d=np.random.choice(A_flat,size=10,p=A_flat/float(np.sum(A_flat)))
print(d)
  • See this: stackoverflow.com/a/10803136/553404 with minor modifications Commented Nov 4, 2013 at 0:26
  • @MrE, but that means to make an extra array to store the indices, right? Commented Nov 4, 2013 at 0:29
  • Yes. I'd save the output of np.indices(A.shape), flatten the resulting index arrays as well as your array of weights, use the linked method, and your result is then given by flattened_indices_x[idx], flattened_indices_y[idx]. EDIT: Actually, you can probably use docs.scipy.org/doc/numpy/reference/generated/… to avoid creating the index array and get the 2D index straight from idx and the shape of your weights array. Commented Nov 4, 2013 at 0:36
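
One possible reading of the suggestion in the comments, as a rough sketch rather than a definitive answer: let np.random.choice draw flat positions instead of values, then convert them with np.unravel_index (the same function the answers below rely on). The names p and flat_idx are only illustrative.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# Draw flat positions 0..A.size-1, each weighted by the value stored there.
p = A.ravel() / A.sum()
flat_idx = np.random.choice(A.size, size=10, p=p)

# Convert the flat positions back into (row, col) index pairs.
rows, cols = np.unravel_index(flat_idx, A.shape)
print(list(zip(rows, cols)))
```

This avoids building a separate array of indices, since np.random.choice accepts an integer and treats it like np.arange of that length.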

2 Answers

You could do something like:

import numpy as np

def wc(weights):
    cs = np.cumsum(weights)
    idx = cs.searchsorted(np.random.random() * cs[-1], 'right')
    return np.unravel_index(idx, weights.shape)

Note that the cumsum is the slowest part of this, so if you need to do this repeatedly for the same array, I'd suggest computing the cumsum ahead of time and reusing it.
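
A minimal sketch of that precomputation, assuming the weights stay fixed between draws. The helper names make_sampler and sample are made up for illustration, and the final clamp is an added safety measure against floating-point rounding, not part of the function above.

```python
import numpy as np

def make_sampler(weights):
    """Compute the cumulative sum once; return a function that draws n indices."""
    weights = np.asarray(weights, dtype=float)
    cs = np.cumsum(weights)  # with no axis given, cumsum works on the flattened array
    shape = weights.shape

    def sample(n):
        # n uniform draws scaled to [0, total), each located by binary search
        u = np.random.random(n) * cs[-1]
        idx = cs.searchsorted(u, 'right')
        # Guard against a draw rounding up to the total (assumed edge case)
        idx = np.minimum(idx, cs.size - 1)
        return np.unravel_index(idx, shape)

    return sample

A = np.arange(1, 10).reshape(3, 3)
sample = make_sampler(A)
rows, cols = sample(5)
print(list(zip(rows, cols)))
```

Each call to sample reuses the stored cumulative sum, so only the random draws and the binary searches are repeated.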

3 Comments

Unfortunately the weights are changing, so I cannot do that. Isn't your search going to be costly, though? I think keeping the indices in a separate array would make the computation cheaper, right?
The search is quite cheap, especially compared to cumsum or sum. I don't see how indices in a separate array would help here, but maybe I'm missing something. The cumsum is not too bad either, but for people who need to take multiple samples from the same distribution, it's an easy optimization to do the cumsum ahead of time.
@Naji, cs is sorted and searchsorted() exploits that to do a binary search - only O(log(len(weights))) comparisons are needed. Very cheap.

To expand on my comment, here is an adaptation of the weighted choice method presented at https://stackoverflow.com/a/10803136/553404:

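import numpy as np

def weighted_choice_indices(weights):
    # Normalized cumulative weights over the flattened array
    cs = np.cumsum(weights.flatten())/np.sum(weights)
    # Count how many cumulative values lie below a uniform draw
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)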
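
As a rough sanity check (not part of the original answer), one could draw many samples and compare the observed frequencies with the normalized weights. The weights array and the sample count of 4000 below are arbitrary choices for illustration.

```python
import numpy as np

def weighted_choice_indices(weights):
    cs = np.cumsum(weights.flatten()) / np.sum(weights)
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)

# Illustrative weights: position (1, 0) should be drawn about three times
# as often as (0, 1), and the zero-weight cells essentially never.
weights = np.array([[0.0, 1.0], [3.0, 0.0]])
counts = np.zeros(weights.shape, dtype=int)
for _ in range(4000):
    counts[weighted_choice_indices(weights)] += 1

print(counts / counts.sum())       # observed frequencies
print(weights / weights.sum())     # target probabilities
```

The two printed arrays should be close but not identical, since the first is a random estimate.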
