Efficient way of sampling from indices of a Numpy array?

Question

I'd like to sample indices of a 2D NumPy array, with each index weighted by the value stored at that position. The only way I know is numpy.random.choice, but that returns the sampled value rather than its index. Is there an efficient way to do this?

Here is my code:

import numpy as np
A = np.arange(1, 10).reshape(3, 3)
A_flat = A.flatten()
# Samples values of A (not their indices), weighted by the values themselves
d = np.random.choice(A_flat, size=10, p=A_flat / float(np.sum(A_flat)))
print(d)

See this: stackoverflow.com/a/10803136/553404 with minor modifications
– YXD, Commented Nov 4, 2013 at 0:26
@MrE, but that means making an extra array to store the indices, right?
– Cupitor, Commented Nov 4, 2013 at 0:29
Yes. I'd save the output of np.indices(A.shape), flatten the resulting index arrays as well as your array of weights, use the linked method, and your result is then given by flattened_indices_x[idx], flattened_indices[idx]. EDIT: Actually you can probably use docs.scipy.org/doc/numpy/reference/generated/… to avoid creating the index array and get the 2D index straight from idx and your weights array shape.
– YXD, Commented Nov 4, 2013 at 0:36
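The idea in the comments (sample a flat position, then convert it back to a 2D index) can be sketched with numpy.random.choice itself. This is only an illustration using the question's array A; the names p, flat_idx, rows and cols are made up here, and it assumes all weights are non-negative with a positive sum.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# Normalise the weights so they sum to 1 (assumes non-negative entries).
p = A.ravel() / A.sum()

# Passing an integer samples from np.arange(A.size), i.e. flat positions.
flat_idx = np.random.choice(A.size, size=10, p=p)

# Convert the flat positions back to (row, column) index arrays.
rows, cols = np.unravel_index(flat_idx, A.shape)
print(list(zip(rows, cols)))
```

No separate index array has to be stored this way; np.unravel_index derives the 2D index from the flat position and the array shape.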

Bi Rico · Accepted Answer · 2013-11-04 01:32:11Z

2

You could do something like:

import numpy as np

def wc(weights):
    cs = np.cumsum(weights)
    idx = cs.searchsorted(np.random.random() * cs[-1], 'right')
    return np.unravel_index(idx, weights.shape)

Notice that the cumsum is the slowest part of this, so if you need to do this repeatedly for the same array, I'd suggest computing the cumsum ahead of time and reusing it.
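A rough sketch of that reuse, assuming the weights stay fixed between draws: make_sampler is a hypothetical helper (not part of NumPy) that computes the cumsum once and returns a function drawing any number of indices from it.

```python
import numpy as np

def make_sampler(weights):
    """Hypothetical helper: precompute the cumsum once, reuse it per draw."""
    cs = np.cumsum(weights)      # with no axis given, cumsum flattens the array
    shape = weights.shape

    def sample(size):
        # Uniform draws scaled to the total weight, located by binary search.
        r = np.random.random(size) * cs[-1]
        idx = cs.searchsorted(r, 'right')
        return np.unravel_index(idx, shape)

    return sample

A = np.arange(1, 10).reshape(3, 3)
sample = make_sampler(A)
rows, cols = sample(1000)        # 1000 weighted (row, column) indices
```

If the weights change between draws, as in the question, the cumsum has to be recomputed each time and this helper gives no benefit.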

edited Nov 4, 2013 at 1:32

answered Nov 4, 2013 at 0:52

Bi Rico


3 Comments

Cupitor Over a year ago

Unfortunately the weights are changing, so I cannot do that. However, your search is going to be costly, right? I think keeping the indices in a separate array would make the computation cheaper, right?

Bi Rico Over a year ago

The search is quite cheap, especially compared to cumsum or sum. I don't see how indices in a separate array would help here, but maybe I'm missing something. The cumsum is not too bad either, but for people who need to take multiple samples from the same distribution, it's an easy optimization to do the cumsum ahead of time.

Tim Peters Over a year ago

@Naji, cs is sorted and searchsorted() exploits that to do a binary search - only O(log(len(weights))) comparisons are needed. Very cheap.
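To make the binary search concrete, here is a small illustration with made-up weights and hand-picked draws in place of random numbers. Each draw in [0, cs[-1]) falls into exactly one interval of the cumulative sums, and searchsorted reports which one.

```python
import numpy as np

weights = np.array([1, 2, 3, 4])
cs = np.cumsum(weights)                  # array([ 1,  3,  6, 10])

# Intervals: [0, 1) -> 0, [1, 3) -> 1, [3, 6) -> 2, [6, 10) -> 3
draws = np.array([0.5, 1.0, 2.9, 3.0, 9.99])
idx = cs.searchsorted(draws, 'right')
print(idx)                               # [0 1 1 2 3]
```

The width of each interval equals the corresponding weight, which is why a uniform draw selects index i with probability proportional to weights[i].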

Community · Accepted Answer · 2017-05-23 12:33:00Z

2

To expand on my comment, adapting the weighted choice method presented here: https://stackoverflow.com/a/10803136/553404

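import numpy as np

def weighted_choice_indices(weights):
    # Normalised cumulative weights over the flattened array
    cs = np.cumsum(weights.flatten())/np.sum(weights)
    # The number of entries below a uniform draw is the sampled flat index
    idx = np.sum(cs < np.random.rand())
    return np.unravel_index(idx, weights.shape)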
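As a side note, counting the entries of cs below the draw is a linear scan that yields the same position as a left-sided binary search on cs. The sketch below checks this on the question's weights; the names linear and binary are only illustrative.

```python
import numpy as np

weights = np.arange(1, 10).reshape(3, 3)
cs = np.cumsum(weights.flatten()) / np.sum(weights)

r = np.random.rand(1000)

# Linear scan per draw: how many cumulative weights lie strictly below it.
linear = np.array([np.sum(cs < x) for x in r])

# Binary search: side='left' returns the count of entries strictly below.
binary = np.searchsorted(cs, r, side='left')

print(np.array_equal(linear, binary))
```

For small arrays the difference is negligible; for large weight arrays the binary search avoids the O(n) comparison per draw, though the cumsum itself remains O(n).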

edited May 23, 2017 at 12:33

answered Nov 4, 2013 at 0:47

YXD
