4

I have a sorted numpy array X and also two constants k and delta that are not in X. I would like to find the index corresponding to, the largest value in X less than or equal to k and the value must be within delta of k i.e. I want

max {i | k - delta <= X[i] <= k }    (1)

Note this set may be empty in which case I will return None. The way I'm doing it I currently feel is unoptimal as it doesn't take advantage of the fact X is ordered at the first step

# Get the max from the set of indices in X satisfying (1)
idx = np.where((k-delta <= X) * (X <= k))[0].max()

I'm not sure how clever Numpy can be when doing this as it doesn't already know X is sorted hence the (k-delta <= X) * (X <= k)) will I assume take longer than necessary. Note we can use the .max() as we know ourselves the array is sorted.

What would be a more optimal way of doing this?

3
  • 1
    You say the array is sorted. So isn't the element that satisfies your condition will always be in the index before k? That fact greatly simplifies the solution. Commented Aug 3, 2016 at 11:04
  • Yes but k is not in X so I must first do a comparison between k and the elements n X Commented Aug 3, 2016 at 11:08
  • 1
    I see. It's important to note that k isn't necessarily in X. You might want to clarify it in your question. Commented Aug 3, 2016 at 11:10

2 Answers 2

4

One efficient approach to leverage the sorted order would be with np.searchsorted -

def largest_within_delta(X, k, delta):
    right_idx = X.searchsorted(k,'right')-1
    if (k - X[right_idx]) <= delta:
        return right_idx
    else:
        return None

Sample runs to cover various scenarios -

In [216]: X
Out[216]: array([ 8,  9, 33, 35, 36, 37, 44, 45, 71, 81])

In [217]: largest_within_delta(X, 36, 0) # this k is already in array
Out[217]: 4

In [218]: largest_within_delta(X, 36, 1) # shouldn't choose for next one 37
Out[218]: 4    

In [220]: largest_within_delta(X, 40, 3) # Gets 37's index
Out[220]: 5

In [221]: largest_within_delta(X, 40, 2) # Out of 37's reach

Runtime test

In [212]: # Inputs
     ...: X = np.unique(np.random.randint(0,1000000,(10000)))
     ...: k = 50000
     ...: delta = 100
     ...: 

In [213]: %timeit np.where((k-delta <= X) * (X <= k))[0].max()
10000 loops, best of 3: 44.6 µs per loop

In [214]: %timeit largest_within_delta(X, k, delta)
100000 loops, best of 3: 3.22 µs per loop
Sign up to request clarification or add additional context in comments.

2 Comments

I like this, just a quick note that to answer the question precisely the function should return right_idx as opposed to X[right_idx]. I'll wait to see if anyone has any speed analysis before accepting.
@rwolst Updated my post accordingly. Added some timings there.
1

Numpy.argmax could be usefull for taking advantage of the sorted list.

import numpy as np
np.argmax(X <= k) if k-d < np.argmax(X <= k) < k+d else None

1 Comment

note that in case of ties np.argmax will return the index of the first largest value, but it seems more logical to prefer the last one, also i don't think that np.argmax takes advantage of the sorted list

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.