Optimal way for finding index of 'greatest value less than' in Numpy array

Question

I have a sorted numpy array X and also two constants k and delta that are not in X. I would like to find the index corresponding to, the largest value in X less than or equal to k and the value must be within delta of k i.e. I want

max {i | k - delta <= X[i] <= k }    (1)

Note this set may be empty in which case I will return None. The way I'm doing it I currently feel is unoptimal as it doesn't take advantage of the fact X is ordered at the first step

# Get the max from the set of indices in X satisfying (1)
idx = np.where((k-delta <= X) * (X <= k))[0].max()

I'm not sure how clever Numpy can be when doing this as it doesn't already know X is sorted hence the (k-delta <= X) * (X <= k)) will I assume take longer than necessary. Note we can use the .max() as we know ourselves the array is sorted.

What would be a more optimal way of doing this?

You say the array is sorted. So isn't the element that satisfies your condition will always be in the index before k? That fact greatly simplifies the solution. — DeepSpace
– DeepSpace, Commented Aug 3, 2016 at 11:04
Yes but k is not in X so I must first do a comparison between k and the elements n X — rwolst
– rwolst, Commented Aug 3, 2016 at 11:08
I see. It's important to note that k isn't necessarily in X. You might want to clarify it in your question. — DeepSpace
– DeepSpace, Commented Aug 3, 2016 at 11:10

Divakar · Accepted Answer · 2016-08-03 11:59:13Z

4

One efficient approach to leverage the sorted order would be with np.searchsorted -

def largest_within_delta(X, k, delta):
    right_idx = X.searchsorted(k,'right')-1
    if (k - X[right_idx]) <= delta:
        return right_idx
    else:
        return None

Sample runs to cover various scenarios -

In [216]: X
Out[216]: array([ 8,  9, 33, 35, 36, 37, 44, 45, 71, 81])

In [217]: largest_within_delta(X, 36, 0) # this k is already in array
Out[217]: 4

In [218]: largest_within_delta(X, 36, 1) # shouldn't choose for next one 37
Out[218]: 4    

In [220]: largest_within_delta(X, 40, 3) # Gets 37's index
Out[220]: 5

In [221]: largest_within_delta(X, 40, 2) # Out of 37's reach

Runtime test

In [212]: # Inputs
     ...: X = np.unique(np.random.randint(0,1000000,(10000)))
     ...: k = 50000
     ...: delta = 100
     ...: 

In [213]: %timeit np.where((k-delta <= X) * (X <= k))[0].max()
10000 loops, best of 3: 44.6 µs per loop

In [214]: %timeit largest_within_delta(X, k, delta)
100000 loops, best of 3: 3.22 µs per loop

edited Aug 3, 2016 at 11:59

answered Aug 3, 2016 at 11:21

Divakar

222k19 gold badges273 silver badges374 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

rwolst Over a year ago

I like this, just a quick note that to answer the question precisely the function should return right_idx as opposed to X[right_idx]. I'll wait to see if anyone has any speed analysis before accepting.

Divakar Over a year ago

@rwolst Updated my post accordingly. Added some timings there.

dms_quant · Accepted Answer · 2016-08-03 11:29:20Z

1

Numpy.argmax could be usefull for taking advantage of the sorted list.

import numpy as np
np.argmax(X <= k) if k-d < np.argmax(X <= k) < k+d else None

edited Aug 3, 2016 at 11:29

answered Aug 3, 2016 at 11:11

dms_quant

951 gold badge5 silver badges11 bronze badges

1 Comment

Jean Paul Over a year ago

note that in case of ties np.argmax will return the index of the first largest value, but it seems more logical to prefer the last one, also i don't think that np.argmax takes advantage of the sorted list

Collectives™ on Stack Overflow

Optimal way for finding index of 'greatest value less than' in Numpy array

2 Answers 2

2 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related