1

I want to perform a binary-search using e.g. np.searchsorted, however, I do not want to create an explicit array containing values. Instead, I want to define a function giving the value to be expected at the desired position of the array, e.g. p(i) = i, where i denotes the position within the array.

Generating an array of values regarding the function would, in my case, be neither efficient nor elegant. Is there any way to achieve this?

3
  • You need at least to define the maximum i. Then I think it's easier/simpler to write your own binary search function. Commented Jun 28, 2019 at 14:50
  • Yes, the function will be bounded, e.g. 0<= i <= 10. I would like to know if there is some way to do this with numpy and rather not write my own binary-search for this problem. Commented Jun 28, 2019 at 14:53
  • Perhaps you can use numpy for the binary-search, but it is likely that you need your own class implementing an on-the-fly sequence implementing the collections.abc.Sequence interface. Commented Jun 28, 2019 at 14:56

2 Answers 2

2

What about something like:

import collections

class GeneratorSequence(collections.Sequence):
    def __init__(self, func, size):
        self._func = func
        self._len = size

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if 0 <= i < self._len:
            return self._func(i)
        else:
            raise IndexError

    def __iter__(self):
        for i in range(self._len):
            yield self[i]

This would work with np.searchsorted(), e.g.:

import numpy as np

gen_seq = GeneratorSequence(lambda x: x ** 2, 100)
np.searchsorted(gen_seq, 9)
# 3

You could also write your own binary search function, you do not really need NumPy in this case, and it can actually be beneficial:

def bin_search(seq, item):
    first = 0
    last = len(seq) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if seq[midpoint] == item:
            first = midpoint
            found = True
        else:
            if item < seq[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return first

Which gives identical results:

all(bin_search(gen_seq, i) == np.searchsorted(gen_seq, i) for i in range(100))
# True

Incidentally, this is also WAY faster:

gen_seq = GeneratorSequence(lambda x: x ** 2, 1000000)

%timeit np.searchsorted(gen_seq, 10000)
# 1 loop, best of 3: 1.23 s per loop
%timeit bin_search(gen_seq, 10000)
# 100000 loops, best of 3: 16.1 µs per loop
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you! This was what I was looking for.
1

Inspired by @norok2 comment, I think you can use something like this:

def f(i):
    return i*2 # Just an example

class MySeq(Sequence):
    def __init__(self, f, maxi):
        self.maxi = maxi
        self.f = f
    def __getitem__(self, x):
        if x < 0 or x > self.maxi:
             raise IndexError()
        return self.f(x)
    def __len__(self):
        return self.maxi + 1

In this case f is your function while maxi is the maximum index. This of course only works if the function f return values in sorted order.
At this point you can use an object of type MySeq inside np.searchsorted.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.