Select values of one array based on a boolean expression applied to another array

Question

Starting with the following array

array([ nan,  nan,  nan,   1.,  nan,  nan,   0.,  nan,  nan])

which is generated like so:

import numpy as np
row = np.array([ np.nan,  np.nan,  np.nan,   1.,  np.nan,  np.nan,   0.,  np.nan,  np.nan])

I'd like to get the indices that would sort the array (as returned by argsort), excluding the NaNs. In this case, I'd like to get [6, 3].

I've come up with the following way to do this:

vals = np.sort(row)
inds = np.argsort(row)

def select_index_by_value(indices, values):
    selected_indices = []
    for i in range(len(indices)):
        if not np.isnan(values[i]):
            selected_indices.append(indices[i])
    return selected_indices

selected_inds = select_index_by_value(inds, vals)

Now selected_inds is [6,3]. However, this seems like quite a few lines of code to achieve something simple. Is there perhaps a shorter way of doing this?

Divakar · Accepted Answer · 2016-07-30 13:41:54Z

3

You could do something like this:

# Store non-NaN indices
idx = np.where(~np.isnan(row))[0]

# Select non-NaN elements, perform argsort and use those argsort       
# indices to re-order non-NaN indices as final output
out = idx[row[idx].argsort()]
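Here is a self-contained version of the approach above, run on the question's row. The function name sorted_non_nan_indices is a hypothetical helper for illustration, and np.flatnonzero(mask) is used as an equivalent of np.where(mask)[0] for a 1-D mask.

```python
import numpy as np

def sorted_non_nan_indices(arr):
    """Hypothetical helper: indices of the non-NaN entries of a 1-D array, ordered by value."""
    # Positions of the non-NaN entries, in ascending index order
    idx = np.flatnonzero(~np.isnan(arr))
    # argsort only the non-NaN values, then use that order to permute idx
    return idx[arr[idx].argsort()]

row = np.array([np.nan, np.nan, np.nan, 1., np.nan, np.nan, 0., np.nan, np.nan])
print(sorted_non_nan_indices(row))  # [6 3]
```

Because only the non-NaN values are passed to argsort, this does not depend on where NaNs end up in the sort order.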


akuiper · Accepted Answer · 2016-07-30 13:47:02Z

1

Another option:

row.argsort()[~np.isnan(np.sort(row))]
# array([6, 3])
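This works because np.sort and np.argsort both place NaN values at the end, so np.sort(row) lines up element for element with row.argsort(), and the mask ~np.isnan(np.sort(row)) keeps exactly the leading, non-NaN positions. Below is a sketch of two variants that rely on the same property but sort only once; the helper names via_mask and via_prefix are hypothetical.

```python
import numpy as np

def via_mask(arr):
    # Sort once, then drop the NaN entries that argsort moved to the end
    order = arr.argsort()
    return order[~np.isnan(arr[order])]

def via_prefix(arr):
    # NaNs sort last, so the first k argsort indices are the non-NaN ones
    k = np.count_nonzero(~np.isnan(arr))
    return arr.argsort()[:k]

row = np.array([np.nan, np.nan, np.nan, 1., np.nan, np.nan, 0., np.nan, np.nan])
print(via_mask(row), via_prefix(row))  # [6 3] [6 3]
```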


1 Comment

Kurt Peek Over a year ago

Both solutions work, but I find the use of boolean indexing more elegant than NumPy's where. Thanks!

Merlin · Accepted Answer · 2016-08-01 15:53:14Z

There is another, faster solution (for the OP's data).

Psidom's Solution

%timeit row.argsort()[~np.isnan(np.sort(row))]

The slowest run took 31.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.16 µs per loop

Divakar's Solution

%timeit idx = np.where(~np.isnan(row))[0]; idx[row[idx].argsort()]

The slowest run took 35.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.73 µs per loop

Based on Divakar's Solution

%timeit np.where(~np.isnan(row))[0][::-1]

The slowest run took 9.42 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.86 µs per loop

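Note that this only works for the OP's data. np.where(~np.isnan(row)) returns the non-NaN indices in ascending positional order ([3, 6] here), not ordered by value; reversing them gives [6, 3] only because the non-NaN values happen to decrease from left to right (1 at index 3, 0 at index 6). For general data it does not produce the sorted order.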
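A quick check of that caveat, using a made-up array whose non-NaN values increase from left to right: the reversal shortcut returns the wrong order, while Divakar's argsort-based approach is still correct.

```python
import numpy as np

# Hypothetical test data: the non-NaN values already increase left to right
row2 = np.array([np.nan, 0., np.nan, 1.])

# Reversal shortcut: only reverses the positional order of the non-NaN indices
shortcut = np.where(~np.isnan(row2))[0][::-1]

# Divakar's approach: orders the non-NaN indices by their values
idx = np.where(~np.isnan(row2))[0]
by_value = idx[row2[idx].argsort()]

print(shortcut, by_value)  # [3 1] [1 3]
```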
