
I've seen a lot of different implementations of this algorithm, but I'm wondering if there are ways to improve the efficiency beyond just making the search binary. I've designed this particular version so that the endpoints and midpoint of the array/list are checked immediately for the key being searched for, so as to avoid looping through a full search when the key you're looking for is just the first, middle, or last element.

def searchRB(the_array, the_key, imin, imax):
    print("searching")
    found = False
    if (0 > the_key or the_key > len(the_array)):
        return found
    else:
        imid = imin + ((imax - imin) // 2)
        if imid == the_key or imin == the_key or imax == the_key:
            found = True
            return found
        elif the_array[imid] > the_key:
            return searchRB(the_array, the_key, imin, imid-1)
        elif the_array[imid] < the_key:
            return searchRB(the_array, the_key, imid+1, imax)
        else:
            return found

For example, if you were looking for the number 1 in a list of the numbers 1-100, this would find it on the first call, unlike some other implementations.

However, I'm not sure whether this actually improves efficiency at all (except in certain edge cases), or whether checking the first, middle, and last values in the list/array is actually detrimental once you recurse and have to re-check those three values every time.

Is this a good or a bad implementation of this type of algorithm, or am I just splitting hairs?

  • Iterative binary search will be more efficient. Commented Oct 3, 2015 at 2:57
  • Interesting, could you elaborate a little on why that is? Commented Oct 3, 2015 at 3:01
  • how would you start the search? what are the values for the_key, imin, imax? Commented Oct 3, 2015 at 3:09
  • the_key = the number you're searching for, imin = the min value in the range you're searching, imax = the max value in the range you're searching. Commented Oct 3, 2015 at 3:10
  • Because an iterative implementation avoids the overhead of more function calls, which in Python aren't blazingly fast. I think this is splitting hairs, yes: asymptotically (as the size of the_array gets large), recognizing the endpoints quickly doesn't help. The number of possible endpoints is quite small compared to the number of elements, so you waste extra time on the average element checking for a rare case. Also, the found variable isn't useful: just return False, True, and False (respectively) where you now return found, and delete the two lines found = <bool-const>. Commented Oct 3, 2015 at 3:13

1 Answer


The main improvement is changing from a recursive approach to a while loop, which saves growing a call stack (since Python doesn't do tail-call optimisation).
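For illustration, an iterative version of the same search might look like this (a sketch, not the poster's code: the name search_iter is my own, it drops the extra endpoint checks, and it compares array values rather than indices):

```python
def search_iter(the_array, the_key):
    # Classic iterative binary search: no call stack, O(log n) comparisons.
    imin, imax = 0, len(the_array) - 1
    while imin <= imax:
        imid = imin + (imax - imin) // 2
        if the_array[imid] == the_key:
            return True
        elif the_array[imid] < the_key:
            imin = imid + 1   # key is in the right half
        else:
            imax = imid - 1   # key is in the left half
    return False
```

The loop maintains the same invariant as the recursion (the key, if present, lies between imin and imax) without paying for a function call per halving.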

You have small redundancies that could be optimised away, but the algorithm is already efficient enough; don't over-optimise unless you understand what the compiler/interpreter is doing.

If you're going down the left side of the tree you'll be comparing the same imin over and over; and this whole line might be parallelised or evaluated sequentially:

if the_array[imid] == the_key or the_array[imin] == the_key or the_array[imax] == the_key:

This could also hurt cache performance, since the_array[imin] will always be kept in cache. Compilers and runtimes sometimes cache a chunk of the array around the index you access, so you might be wasting even more cache than just that one value.

Statements like this could also be simplified; you could just write return True, though again this should be picked up by the compiler:

found = True
return found

Not keeping found as a variable would also help slightly, because that object wouldn't have to be held in memory for the whole call.

This else statement seems redundant, as there's no possible way to reach it:

else:
    return found

Actual relevant optimisations will come from knowing more about the dataset.

If you are able to preprocess the data (or have more information about the data) you can do other algorithms.
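As a hypothetical illustration of such preprocessing (my example, not from the answer): if the same list will be queried many times, building a set once turns each membership test into an expected O(1) lookup instead of an O(log n) search:

```python
# Hypothetical preprocessing example: trade an O(n) setup pass and extra
# memory for expected O(1) membership tests on repeated queries.
data = list(range(1, 101))
lookup = set(data)          # one-time preprocessing pass


def contains(key):
    return key in lookup    # average O(1) hash-based membership test
```

Whether this wins depends on the dataset and query pattern, which is the answer's point: the relevant optimisations come from knowing more about the data.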


3 Comments

One thing I noticed here is that searching the min, max values for equivalence to the key doesn't work very well. Like you said, the way I was doing it just compares the same min and max values every time it loops (which is useless), but if you use the_key == the_array[imin] or the_key == the_array[imax], the index goes out of range. I think it's best to just drop it completely, and compare the key to the midpoint only.
Yeah, just compare the midpoint. Also, I think your logic for when to stop is a bit broken: it should compare imid with 0 and the length of the array to determine whether the index is within bounds.
You should define imid at the beginning of the function, and that statement should be if 0 > imid or imid > len(the_array):, like you said.
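Putting the comment thread's fixes together, a corrected recursive version might look like this (a sketch under those suggestions: it compares array values rather than indices, checks only the midpoint, bounds-checks the indices instead of the key, and returns booleans directly):

```python
def searchRB(the_array, the_key, imin, imax):
    # Stop when the search window is empty or the indices leave the array.
    if imin > imax or imin < 0 or imax >= len(the_array):
        return False
    imid = imin + (imax - imin) // 2
    if the_array[imid] == the_key:      # compare the *value* at imid, not imid itself
        return True
    elif the_array[imid] > the_key:
        return searchRB(the_array, the_key, imin, imid - 1)
    else:
        return searchRB(the_array, the_key, imid + 1, imax)
```

Called as searchRB(the_array, the_key, 0, len(the_array) - 1), this still finds an endpoint like 1 in a sorted list of 1-100, just via the normal halving rather than special-case checks.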
