How could I get numpy array indices by some conditions

Question

I've run into the following problem. Suppose I have arrays like this:

a = np.array([[1,2,3,4,5,4,3,2,1],])
label = np.array([[1,0,1,0,0,1,1,0,1],])

I need the index into a of the largest value of a among the positions where label is 1.

This may be confusing, so here is the example worked through: the indices where label is 1 are 0, 2, 5, 6, 8, and the corresponding values of a are 1, 3, 4, 3, 1. The largest of these is 4, so I need the result 5, which is the index of that 4 in a. How can I do this with numpy?

Divakar · Accepted Answer · 2018-09-25 07:10:41Z

Get the indices of the 1s in label as idx, index into a with them, take the argmax of that subset, and finally map it back to the original positions by indexing into idx -

idx = np.flatnonzero(label==1)
out = idx[a[idx].argmax()]

Sample run -

# Assuming inputs to be 1D
In [18]: a
Out[18]: array([1, 2, 3, 4, 5, 4, 3, 2, 1])

In [19]: label
Out[19]: array([1, 0, 1, 0, 0, 1, 1, 0, 1])

In [20]: idx = np.flatnonzero(label==1)

In [21]: idx[a[idx].argmax()]
Out[21]: 5
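The two steps above could be wrapped in a small helper. This is only a sketch: the name argmax_where, the flattening of the inputs (so that the (1, 9) arrays from the question also work), and returning None when no label is 1 are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def argmax_where(a, label):
    """Index of the largest a[i] among positions where label[i] == 1."""
    a = np.ravel(a)                    # flatten, so 2D row inputs work too
    label = np.ravel(label)
    idx = np.flatnonzero(label == 1)   # positions where label is 1
    if idx.size == 0:
        return None                    # assumption: report "no match" as None
    # argmax runs on the subset, so map the winner back through idx
    return int(idx[a[idx].argmax()])

a = np.array([[1, 2, 3, 4, 5, 4, 3, 2, 1]])
label = np.array([[1, 0, 1, 0, 0, 1, 1, 0, 1]])
result = argmax_where(a, label)
print(result)  # 5
```

Because the comparison happens only among the selected elements, this version does not depend on the sign or dtype of a.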

For a as ints and label as an array of 0s and 1s, we could optimize further by offsetting the labeled elements by the range of values in a, so that every labeled element ends up larger than every unlabeled one, like so -

(label*(a.max()-a.min()+1) + a).argmax()

Furthermore, if a has non-negative numbers only, it would simplify to -

(label*(a.max()+1) + a).argmax()
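To see why the two variants differ, here is a small check on made-up data containing only negative values (the arrays are illustrative, not from the question). The general form adds the full range of a to each labeled element, while the shortcut adds only a.max()+1, which is 0 here and therefore changes nothing.

```python
import numpy as np

a = np.array([-3, -1, -4, -1, -5, -9, -2, -6])
label = np.array([1, 0, 1, 0, 1, 1, 1, 0])

# Reference answer from the two-step approach
idx = np.flatnonzero(label == 1)
reference = int(idx[a[idx].argmax()])

# General form: offset is max - min + 1 = -1 - (-9) + 1 = 9
general = int((label * (a.max() - a.min() + 1) + a).argmax())

# Shortcut: offset is a.max() + 1 = 0, so unlabeled values can still win
shortcut = int((label * (a.max() + 1) + a).argmax())

print(reference, general, shortcut)  # 6 6 1
```

The shortcut picks index 1, where label is 0, so it should only be used when a is known to be non-negative.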

Timings on a largish a of non-negative ints -

In [115]: np.random.seed(0)
     ...: a = np.random.randint(0,10,(100000))
     ...: label = np.random.randint(0,2,(100000))

In [117]: %%timeit
     ...: idx = np.flatnonzero(label==1)
     ...: out = idx[a[idx].argmax()]
1000 loops, best of 3: 592 µs per loop

In [116]: %timeit (label*(a.max()-a.min()+1) + a).argmax()
1000 loops, best of 3: 357 µs per loop

# @coldspeed's soln
In [120]: %timeit np.ma.masked_where(~label.astype(bool), a).argmax()
1000 loops, best of 3: 1.63 ms per loop

# won't work with negative numbers in a
In [119]: %timeit (label*(a.max()+1) + a).argmax()
1000 loops, best of 3: 292 µs per loop

# @klim's soln (won't work with negative numbers in a)
In [121]: %timeit np.argmax(a * (label == 1))
1000 loops, best of 3: 229 µs per loop
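Outside IPython, a comparison along the same lines could be run with the standard timeit module. This is a rough sketch: the function names are made up for illustration, and the numbers will differ by machine and NumPy version, so no timings are claimed here.

```python
import timeit
import numpy as np

rng = np.random.RandomState(0)
a = rng.randint(0, 10, 100000)       # non-negative ints, as in the timings above
label = rng.randint(0, 2, 100000)

def two_step():
    idx = np.flatnonzero(label == 1)
    return idx[a[idx].argmax()]

def scaled():
    return (label * (a.max() - a.min() + 1) + a).argmax()

def masked():
    return np.ma.masked_where(~label.astype(bool), a).argmax()

def product():
    return np.argmax(a * (label == 1))

results = {}
for fn in (two_step, scaled, masked, product):
    results[fn.__name__] = int(fn())
    # best of 3 runs of 100 calls each, reported per call
    seconds = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{fn.__name__:>8}: {seconds * 1e6:8.1f} us per call")
```

On this input all four functions return the same index, which is worth checking before comparing their speed.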

cs95 · Accepted Answer · 2018-09-25 05:48:55Z

You can use masked arrays:

>>> np.ma.masked_where(~label.astype(bool), a).argmax()
5
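Since masked entries are excluded from the comparison rather than replaced by a value from a, this approach should also cope with negative numbers. A minimal check on made-up data (the arrays are illustrative, and the mask is written as label != 1, which is equivalent for 0/1 labels):

```python
import numpy as np

a = np.array([-4, -1, -3, -1, -5])
label = np.array([1, 0, 1, 0, 1])

# Mask out every position where label is not 1, then take argmax
masked = np.ma.masked_where(label != 1, a)
result = int(masked.argmax())
print(result)  # 2
```

The unlabeled -1 values are larger than every labeled value, yet the result is index 2, the position of the largest labeled value (-3).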

klim · Accepted Answer · 2018-09-25 05:58:30Z

Here is one of the simplest ways.

>>> np.argmax(a * (label == 1))
5
>>> np.argmax(a * (label == 1), axis=1)
array([5])

Coldspeed's method may take more time.

3 Comments

Divakar Over a year ago

What if there are negative numbers in a?

klim Over a year ago

Divakar, yes, this will not work if there are negative numbers in a or if there is no match in label.

Divakar Over a year ago

Still pretty fast under the constraints, going by the timings just added.
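The two failure cases mentioned in these comments can be reproduced with a short sketch. The arrays below are made up for illustration; the point is that multiplying by the mask turns unlabeled entries into 0, which beats every negative labeled value and is indistinguishable from "no match".

```python
import numpy as np

a = np.array([-4, -1, -3, -1, -5])
label = np.array([1, 0, 1, 0, 1])

# Negative values: unlabeled entries become 0 and win the argmax
with_negatives = int(np.argmax(a * (label == 1)))

# Correct answer for comparison, via the two-step approach
idx = np.flatnonzero(label == 1)
expected = int(idx[a[idx].argmax()])

# No match: the product is all zeros, so argmax silently returns 0
no_match = int(np.argmax(a * (np.zeros_like(label) == 1)))

print(with_negatives, expected, no_match)  # 1 2 0
```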
