Binary-search without an explicit array

Question

I want to perform a binary-search using e.g. np.searchsorted, however, I do not want to create an explicit array containing values. Instead, I want to define a function giving the value to be expected at the desired position of the array, e.g. p(i) = i, where i denotes the position within the array.

Generating an array of values regarding the function would, in my case, be neither efficient nor elegant. Is there any way to achieve this?

You need at least to define the maximum i. Then I think it's easier/simpler to write your own binary search function. — marco romelli
– marco romelli, Commented Jun 28, 2019 at 14:50
Yes, the function will be bounded, e.g. 0<= i <= 10. I would like to know if there is some way to do this with numpy and rather not write my own binary-search for this problem. — St123
– St123, Commented Jun 28, 2019 at 14:53
Perhaps you can use numpy for the binary-search, but it is likely that you need your own class implementing an on-the-fly sequence implementing the collections.abc.Sequence interface. — norok2
– norok2, Commented Jun 28, 2019 at 14:56

norok2 · Accepted Answer · 2019-06-28 17:20:31Z

What about something like:

import collections

class GeneratorSequence(collections.Sequence):
    def __init__(self, func, size):
        self._func = func
        self._len = size

    def __len__(self):
        return self._len

    def __getitem__(self, i):
        if 0 <= i < self._len:
            return self._func(i)
        else:
            raise IndexError

    def __iter__(self):
        for i in range(self._len):
            yield self[i]

This would work with np.searchsorted(), e.g.:

import numpy as np

gen_seq = GeneratorSequence(lambda x: x ** 2, 100)
np.searchsorted(gen_seq, 9)
# 3

You could also write your own binary search function, you do not really need NumPy in this case, and it can actually be beneficial:

def bin_search(seq, item):
    first = 0
    last = len(seq) - 1
    found = False
    while first <= last and not found:
        midpoint = (first + last) // 2
        if seq[midpoint] == item:
            first = midpoint
            found = True
        else:
            if item < seq[midpoint]:
                last = midpoint - 1
            else:
                first = midpoint + 1
    return first

Which gives identical results:

all(bin_search(gen_seq, i) == np.searchsorted(gen_seq, i) for i in range(100))
# True

Incidentally, this is also WAY faster:

gen_seq = GeneratorSequence(lambda x: x ** 2, 1000000)

%timeit np.searchsorted(gen_seq, 10000)
# 1 loop, best of 3: 1.23 s per loop
%timeit bin_search(gen_seq, 10000)
# 100000 loops, best of 3: 16.1 µs per loop

marco romelli · Accepted Answer · 2019-06-28 15:38:36Z

1

Inspired by @norok2 comment, I think you can use something like this:

def f(i):
    return i*2 # Just an example

class MySeq(Sequence):
    def __init__(self, f, maxi):
        self.maxi = maxi
        self.f = f
    def __getitem__(self, x):
        if x < 0 or x > self.maxi:
             raise IndexError()
        return self.f(x)
    def __len__(self):
        return self.maxi + 1

In this case f is your function while maxi is the maximum index. This of course only works if the function f return values in sorted order.
At this point you can use an object of type MySeq inside np.searchsorted.

edited Jun 28, 2019 at 15:38

answered Jun 28, 2019 at 15:16

marco romelli

1,1839 silver badges21 bronze badges

Collectives™ on Stack Overflow

Binary-search without an explicit array

2 Answers 2

1 Comment

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Related