
I've seen a lot of different implementations of this algorithm, but I'm wondering if there are ways to improve the efficiency beyond just making the search binary. I've designed this particular version so that the endpoints and midpoint of the array/list are checked immediately for the key being searched for, so as to avoid looping through a full search when the key you're looking for is just the first, middle, or last element.

def searchRB(the_array, the_key, imin, imax):
    print("searching")
    found = False
    if (0 > the_key or the_key > len(the_array)):
        return found
    else:
        imid = imin + ((imax - imin) // 2)
        if imid == the_key or imin == the_key or imax == the_key:
            found = True
            return found
        elif the_array[imid] > the_key:
            return searchRB(the_array, the_key, imin, imid-1)
        elif the_array[imid] < the_key:
            return searchRB(the_array, the_key, imid+1, imax)
        else:
            return found

For example, if you were looking for the number 1 in a list of the numbers 1-100, this would find it on the first call, unlike some other implementations.

However, I'm not sure whether this actually improves efficiency at all (except in certain edge cases), or whether checking the first, middle, and last values in the list/array is actually detrimental once you recurse and have to re-check those three values every time.

Is this a good or a bad implementation of this type of algorithm, or am I just splitting hairs?

  • Iterative binary search will be more efficient. Commented Oct 3, 2015 at 2:57
  • Interesting, could you elaborate a little on why that is? Commented Oct 3, 2015 at 3:01
  • how would you start the search? what are the values for the_key, imin, imax? Commented Oct 3, 2015 at 3:09
  • the_key = the number you're searching for, imin = the min value in the range you're searching, imax = the max value in the range you're searching. Commented Oct 3, 2015 at 3:10
  • Because an iterative implementation avoids the overhead of more function calls, which in Python aren't blazingly fast. I think this is splitting hairs, yes: asymptotically (as the size of the_array gets large), recognizing the endpoints quickly doesn't help. The number of possible endpoints is quite small compared to the number of elements, so you waste extra time on the average element checking for a rare case. Also, the found variable isn't useful: just return False, True, and False (respectively) where you now return found, and delete the two lines found = <bool-const>. Commented Oct 3, 2015 at 3:13

1 Answer


The main improvement is changing from a recursive approach to a while loop, which saves growing a call stack (since Python doesn't do tail-call optimisation).
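For illustration, an iterative version of the same search might look like this (a sketch, not the poster's code: the name search_iter is my own, it drops the extra endpoint checks, and it compares array values rather than indices):

```python
def search_iter(the_array, the_key):
    # Classic iterative binary search: no call stack, O(log n) comparisons.
    imin, imax = 0, len(the_array) - 1
    while imin <= imax:
        imid = imin + (imax - imin) // 2
        if the_array[imid] == the_key:
            return True
        elif the_array[imid] < the_key:
            imin = imid + 1   # key is in the right half
        else:
            imax = imid - 1   # key is in the left half
    return False
```

The loop maintains the same invariant as the recursion (the key, if present, lies between imin and imax) without paying for a function call per halving.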

You have small redundancies that could be optimised away, but the algorithm is already efficient enough; don't over-optimise unless you understand what the compiler/interpreter is doing.

If you're going down the left side of the tree you'll be comparing the same imin over and over; and this whole line might be parallelised or evaluated sequentially:

if the_array[imid] == the_key or the_array[imin] == the_key or the_array[imax] == the_key:

This could also hurt cache performance, since the_array[imin] will always be kept in cache. Compilers and runtimes sometimes cache a chunk of the array around the index you access, so you might be wasting even more cache than just that one value.

Statements like this could also be simplified; you could just write return True, though again this should be picked up by the compiler:

found = True
return found

Not keeping found as a variable would also help slightly, because that object wouldn't have to be held in memory for the whole call.

This else statement seems redundant, as there's no possible way to reach it:

else:
    return found

Actual relevant optimisations will come from knowing more about the dataset.

If you are able to preprocess the data (or have more information about the data) you can do other algorithms.
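As a hypothetical illustration of such preprocessing (my example, not from the answer): if the same list will be queried many times, building a set once turns each membership test into an expected O(1) lookup instead of an O(log n) search:

```python
# Hypothetical preprocessing example: trade an O(n) setup pass and extra
# memory for expected O(1) membership tests on repeated queries.
data = list(range(1, 101))
lookup = set(data)          # one-time preprocessing pass


def contains(key):
    return key in lookup    # average O(1) hash-based membership test
```

Whether this wins depends on the dataset and query pattern, which is the answer's point: the relevant optimisations come from knowing more about the data.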


3 Comments

One thing I noticed here is that searching the min, max values for equivalence to the key doesn't work very well. Like you said, the way I was doing it just compares the same min and max values every time it loops (which is useless), but if you use the_key == the_array[imin] or the_key == the_array[imax], the index goes out of range. I think it's best to just drop it completely, and compare the key to the midpoint only.
Yeah, just compare the midpoint. Also, I think your logic for when to stop is a bit broken: it should compare imid with 0 and the length of the array to determine whether the index is within bounds.
You should define imid at the beginning of the function, and that statement should be if 0 > imid or imid > len(the_array):, like you said.
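Putting the comment thread's fixes together, a corrected recursive version might look like this (a sketch under those suggestions: it compares array values rather than indices, checks only the midpoint, bounds-checks the indices instead of the key, and returns booleans directly):

```python
def searchRB(the_array, the_key, imin, imax):
    # Stop when the search window is empty or the indices leave the array.
    if imin > imax or imin < 0 or imax >= len(the_array):
        return False
    imid = imin + (imax - imin) // 2
    if the_array[imid] == the_key:      # compare the *value* at imid, not imid itself
        return True
    elif the_array[imid] > the_key:
        return searchRB(the_array, the_key, imin, imid - 1)
    else:
        return searchRB(the_array, the_key, imid + 1, imax)
```

Called as searchRB(the_array, the_key, 0, len(the_array) - 1), this still finds an endpoint like 1 in a sorted list of 1-100, just via the normal halving rather than special-case checks.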
