Starting with the following array

array([ nan,  nan,  nan,   1.,  nan,  nan,   0.,  nan,  nan])

which is generated like so:

import numpy as np
row = np.array([ np.nan,  np.nan,  np.nan,   1.,  np.nan,  np.nan,   0.,  np.nan,  np.nan])

I'd like to get the indices that would sort the array, excluding the ones that point at NaNs. In this case, I'd like to get [6,3].

I've come up with the following way to do this:

vals = np.sort(row)
inds = np.argsort(row)

def select_index_by_value(indices, values):
    selected_indices = []
    for i in range(len(indices)):
        if not np.isnan(values[i]):
            selected_indices.append(indices[i])
    return selected_indices

selected_inds = select_index_by_value(inds, vals)

Now selected_inds is [6,3]. However, this seems like quite a few lines of code to achieve something simple. Is there perhaps a shorter way of doing this?

3 Answers

You could do something like this:

# Store non-NaN indices
idx = np.where(~np.isnan(row))[0]

# Select the non-NaN elements, argsort them, and use those argsort
# indices to reorder the non-NaN indices into the final output
out = idx[row[idx].argsort()]
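
To see that this approach is not specific to the array in the question, here is a minimal sketch on a different input. The function name `argsort_without_nan` and the test array are hypothetical, chosen only for illustration; `np.flatnonzero` is used as an equivalent of `np.where(...)[0]` for 1-D input.

```python
import numpy as np

def argsort_without_nan(row):
    """Return the indices that sort `row`, skipping NaN entries (hypothetical helper)."""
    # Positions of the non-NaN elements, in their original order
    idx = np.flatnonzero(~np.isnan(row))
    # Argsort only the non-NaN values, then map back to original positions
    return idx[np.argsort(row[idx])]

# Hypothetical test data: non-NaN values are 5, 2, 9, 0 at positions 1, 3, 4, 6
sample = np.array([np.nan, 5.0, np.nan, 2.0, 9.0, np.nan, 0.0])
result = argsort_without_nan(sample)
print(result.tolist())  # [6, 3, 1, 4]
```

Only the non-NaN values are sorted here, so the cost of the sort scales with the number of valid entries rather than the full array length.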

Another option:

row.argsort()[~np.isnan(np.sort(row))]
# array([6, 3])
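
This one-liner relies on NumPy placing NaNs at the end of the sorted order, for both np.sort and np.argsort. Under that same assumption, a possible variant avoids the second sort by counting the valid entries and slicing. The function name `argsort_drop_nan` is hypothetical; treat this as a sketch rather than the answerer's method.

```python
import numpy as np

def argsort_drop_nan(row):
    """Indices that sort `row`, with the NaN entries dropped (hypothetical helper)."""
    # Number of non-NaN entries
    n_valid = np.count_nonzero(~np.isnan(row))
    # argsort puts NaNs last, so the first n_valid indices are the valid ones
    return np.argsort(row)[:n_valid]

row = np.array([np.nan, np.nan, np.nan, 1., np.nan, np.nan, 0., np.nan, np.nan])
selected = argsort_drop_nan(row)
print(selected.tolist())  # [6, 3]
```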

1 Comment

Both solutions work, but I find the use of boolean indexing more elegant than Numpy's where. Thanks!

There is an even faster solution, though it only works for the OP's data.

Psidom's Solution

%timeit row.argsort()[~np.isnan(np.sort(row))]

The slowest run took 31.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.16 µs per loop

Divakar's Solution

%timeit idx = np.where(~np.isnan(row))[0]; idx[row[idx].argsort()]

The slowest run took 35.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.73 µs per loop

Based on Divakar's Solution

%timeit np.where(~np.isnan(row))[0][::-1]

The slowest run took 9.42 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.86 µs per loop

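I think this works because np.where(~np.isnan(row)) returns the indices in their original order, and the non-NaN values in the OP's array happen to be in descending order, so reversing the indices gives the sorted order. Note that it does no sorting at all, so it is not a general solution.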
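
A quick check on a hypothetical array whose non-NaN values are already ascending suggests the reversal shortcut does not carry over to other data, while the where-plus-argsort approach does:

```python
import numpy as np

# Hypothetical data: the non-NaN values (0 and 1) appear in ascending order
row = np.array([np.nan, 0.0, np.nan, 1.0])

# Reversal shortcut: just reverses the non-NaN positions, no sorting
shortcut = np.where(~np.isnan(row))[0][::-1]

# General approach: argsort the non-NaN values and map back to positions
idx = np.where(~np.isnan(row))[0]
general = idx[row[idx].argsort()]

print(shortcut.tolist())  # [3, 1] -> points at values 1.0, 0.0 (not sorted)
print(general.tolist())   # [1, 3] -> points at values 0.0, 1.0 (sorted)
```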
