2
bool binsearch(string phrase, vector<string> words, int from, int to, int &test)
{
    while (tf == "y") //tf is a global variable
    {
        int mid = (to+from)/2;
        if (words[mid] == phrase) {tf = "t"; return true;}
        if (mid == test) {tf = "f"; return false;}
        if (words[mid] > phrase) {return binsearch(phrase, words, mid-1, to, mid);}
        else {return binsearch(phrase, words, from, mid+1, mid);}
     }
}

i'm working on getting this binary search working. i need the overall function to return either "true" or "false". i understand how the recursion works up until either line 6 or 7 executes and the return command is invoked. i've done research, and it seems like there's no way to exit the function right there and it has to "unwind" itself. the tf global variable nonsense is so it won't execute that body again when it's unwinding...but i'm still not getting the results i want.

essentially, i just want to get out of the function ASAP after either the return true or return false command is invoked, and return the value to the main function accordingly

Thanks

3
  • 2
    I suspect you are doing this for learning purposes? Because otherwise you could just use std::binary_search. Is the signature fixed? It would be more elegant to use an iterator-based solution. Commented Apr 14, 2011 at 7:59
  • 3
    You don't need a loop here at all, hence you don't need the global variable either. Commented Apr 14, 2011 at 8:01
  • 1
    ... and a const reference to the vector instead of a copy. Commented Apr 14, 2011 at 8:02

7 Answers 7

4

I don't understand your binary search either, and using global variables in addition to recursion leads to programs which are very hard to understand. It's better to go back the call stack again and "unwind" it properly. Look at the following example (untested):

bool binsearch(const string& phrase, const vector<string> &words, int from, int to)
{
    if (from > to)
        return false; // word not found
    int mid = from + (to - from) / 2; // i think your calculation is wrong here
    int cmp = phrase.compare(words[mid]);
    if (cmp < 0)
        return binsearch(phrase, words, from, mid - 1);
    else if (cmp > 0)
        return binsearch(phrase, words, mid + 1, to);
    else
        return true; // word found
}
Sign up to request clarification or add additional context in comments.

3 Comments

thanks for your reply. a couple of Q: when will "from" ever be smaller than "to"? i forgot to mention, "from" is initially 0 and "to" is initially words.size().
In each step we reduce the list of words by a half and one (because of the mid element). So, at some point, the list might become empty. In that case, we can say for sure that the phrase is not part of the empty list. Empty lists are represented by from > to. Let's say words is {"a", "b", "d"} and you want to search for "c". At first, binsearch(0, 3) is called. Since "c" > words[1] = "b", binsearch(1 + 1, 3) is called. Now we have, "c" < words[2] = "d", so binsearch(2, 3-1) is called. At least we have "c" < words[2] = "d", and binsearch(2, 2-1) is called. (with 2 > 1). End. Element not found.
Ah, and I just noticed that you should call it with (0, words.size()-1) initially. :)
4

You can use the STL's built-in binary_search as follows:

binary_search(words.begin(),words.end(), phrase)

If you're doing this to learn; there are a few things...

  • You don't need a while loop. There are three cases to consider: the word comes before mid, at mid, or after mid. Each of these three cases returns - so it's impossible to even reach the end of the loop body.
  • You use test when exactly, and do you need this variable?
  • You should consider carefully exactly which range of indexes still needs to be searched. Are from and to inclusive or exclusive? You need to be precise and consistent.
  • Consider that division of positive integers rounds down. No matter what values they have, ensure that the recursive call calls a smaller range - to avoid infinite loops. This will help avoid the need for your test variable (see David's comment below).
  • It's not good practice to use global variables; certainly not in otherwise pure functions - I'm assuming you're doing this for debugging purposes?
  • How large can to and from be? In some cases, note that to+from may exceed 2^31-1.
  • It's typical in C++ to express these notions with iterators. You don't have to, of course.
  • It's typical in C++ to pass large objects by const & where possible - that way, the recursive call doesn't need to copy the entire vector. This is not important for correctness, but practically very important for efficient code.

4 Comments

thanks for the reply. i'm really a newbie to programming so i'm still digesting everything you've said. but a couple things: without the use of "test", how could i prevent the recursion from going on indefinitely? i'm taking "phrase" and trying to see if it can be found in vector<string> words. the "from" is initially 0 and the "to" is initially words.size(). if the "phrase" isn't in the vector, wouldn't the recursion go on indefinitely without the use of "test"?
also, i'm currently learning how to manually create a binary search function, so i can't use the built in binary_search
@pauliwago: That is what the point of ensuring that in each recursive call the problem is strictly smaller than in the previous call. You need to figure out how to partition the space, but that depends on other decisions, like whether the to and from are inclusive or exclusive (usually to would be exclusive and from inclusive), and that you stop when the size of the problem is 0. The combination of those two: strictly decreasing, stop when empty; is a guarantee of termination.
What David says is fundamental: recursive functions should call themselves only with smaller problems and they should stop when empty. If you do that, then you can never get an infinite recursion (aka stackoverflow); and that means you only need to look at the correctness of one execution of the function - which is much easier than nebulously reasoning about a potentially long call chain. Ignore the call-chain, make sure the function is correct for one run, and make sure that it eventually terminates - and you're home free.
1

Pass vector<string> words as reference in your binsearch() function. Presently it keeps creating copies of vector<string> whenever the function is called which is not needed. Moreover in future if you want to update that vector<>, then passing by reference is the best way.

There should be return statement outside the while loop. That will be the final 'return`.

1 Comment

THANK YOU for the vector<string> &words advice! I could not understand why my code was so slow, this fixed the speed issue
1

One of the classical way to get rid of this : rewrite it without recursion.

For example, use a while loop, then as soon as you find the result, use a break to go out. You can have a look at following code (not compiled, just written quickly from your own code)

bool binsearch(const string& phrase, const vector<string> &words, int from, int to)
{
    bool retVal = false;
    int start = from;
    int end = to;


    while(start<end) {
       int mid = from + (to - from) / 2; // i think your calculation is wrong here
       int cmp = phrase.compare(words[mid]);
       if (cmp < 0) {
           end=mid-1;
       } else if (cmp > 0) {
           start=mid+1;
       } else {
           retVal = true;
           break;
       }
    }
    return retVal;
}

There is no elegant or portable way to jump out of a full call stack, it's at best fairly risky. Moreover, the derecursified function will be much quicker : it does not need to push stuff on stack and do a a function call

Edit

  • added missing return
  • concerning performances : just benchmark it. In this particular case, code complexity (for the human reader) is almost the same, but depending on the algo, it can be much more complex (or even impossible).

1 Comment

The performance might not be that different. Binary search can be implemented as a tail-recursive function, and that might in turn be converted by the compiler into a loop. In this particular case, turning it into non-recursive does not really complicate the code so either way is fine (recursive/iterative), but in other problems while conversion to iterative can be done, it makes reasoning about the code harder. As a side note, you are missing a return false at the end of the non-recursive function.
1

There are several things wrong with your code. For starters, it's not clear what to and from mean: are they inclusive, or exclusive. And if they're both inclusive (which your arguments to the recursive calls seems to suggest), how do you detect the end. And what does test mean? You seem to be using it as an end criterion when you don't find the word, but I don't see how.

If I were writing this, I'd use a simple helper class to hold the target and the word list (but you can just propagate them down explicitly), and a wrapper function so that the client code doesn't have to specify the to and from arguments. There's no need for a global variable, or any additional test. And I'd use the half open intervals that are idiomatic in C++: lower bound inclusive, upper bound exclusive (so top == bottom specifies an empty range, so I've finished without finding the element):

bool 
binSearch( std::string const& target,
           std::vector<std::string> const& words );

namespace {

    class BinSearcher
    {
        std::vector<std::string> const& myWords;
        std::string const& myTarget;

        bool doSearch( int bottom, int top ) const
        {
            int mid = (top - bottom) / 2;
            return mid != top
                && (myTarget == myWords[mid]
                    || (myTarget > myWords[mid] && doSearch( mid + 1, top ))
                    || (myTarget < myWords[mid] && doSearch( bottom, mid ));
        }
        BinSearcher( std::string const& target,
                     std::vector<std::string> const& words )
            : myWords( words )
            , myTarget( target )
        {
        }
        friend bool binSearch( std::string const&,
                               std::vector<std::string> const& );
    };
}

bool 
binSearch( std::string const& target,
           std::vector<std::string> const& words )
{
    BinSearcher searcher( target, words );
    return searcher.doSearch( 0, words.size() );
}

Note that you can't do the comparison before testing that the range isn't equal; it will cause an out of bounds access if all of the elements are less than the target.

Also: I presume that you're doing this for pedagogical reasons. Otherwise, you should just use the function in the standard library. And I wouldn't normally use recursion here: there's a straightforward iterative solution:

namespace {

    bool
    doBinSearch( std::string const& target,
                 std::vector<std::string> const& words,
                 int bottom,
                 int top )
    {
        bool found = false;
        while ( bottom != top && ! found ) {
            int mid = (top - bottom) / 2;
            int cmp = target.compare( words[mid] );
            if ( cmp < 0 ) {
                top = mid ;
            } else if ( 0 < cmp ) {
                bottom = mid + 1;
            } else {
                found = true;
            }
        }
        return found;
    }
}

bool
binSearch( std::string const& target,
           std::vector<std::string> const& words )
{
    return doBinSearch( target, words, 0, words.size() );
}

(Finally, you'll notice that I've converted your parameters to references. It doesn't change anything in the logic of the code, but if words is relatively large, it will make a very significant impact on the performance.)

Comments

0

You can use a longjmp, aka a "non-local goto", to exit the inner recursion immediately, but the question is whether this micro-optimization is worth the trouble.

A better option is to change the recursion into a loop. Since all the recursive calls are "in tail position" (are the argument to return), you can replace them with code that resets the parameter variables. Unfortunately, I don't understand your code, so I can't give you an example.

4 Comments

How would such a trick affect the stack? The stack builds up for each recursion. Would a jump leave the stack in a bad state?
@daramarak: longjmp restores the stack to more or less the state it was in when setjmp was called. I'm not sure how it interacts with RAII-style objects, though; my guess is "poorly", which is yet another reason for recommending the other options to the OP.
Avoid doing this! This is more often than not a source of problems and maintenance burden --execution flow becomes harder to reason. And in the particular problem of a binary search, which is an exercise on recursion, it will have funny side effects (where are you going to store the result of the search when you bail out from the inner recursive call to the outer one? a global? another bad idea.
@dribeas: I do recommand unfolding the recursion over doing a longjmp. Mentioning that was only intended to answer the OP's question of how to jump out of a recursive function.
0

This is a classic exercise in using recursion - sure, one can also do things nonrecursively, but it's very elegant to "let the recursion manage one's bookkeeping." For those whose knee-jerk reaction is "do it iteratively", I suggest doing the analogous exercise on a merge-sort or quicksort. Very similar recursive structure, but the bookkeeping there is greatly eased in a recursive context. And on modern CPUs the recursive code - surprise! - often runs as fast or faster, to boot.

Here is my recursive implementation, using the OP's problem context. Note there is no need to separately test the midpoint element: Within the context of the C++ "less than" paradigm for comparison predicates (where given <, one can infer equality via .not.(a.lt.b) .and. .not.(b.lt.a) ), an extra test for equality of the midpoint makes little sense, although in the special case of the string class with its many-valued compare result, it may yield a modest speedup to add special handling for the 0-return result. My sample version assumes only < (in the sense that if one really only had <, one would slightly rearrange the divide-and-conquer conditional to use that), which generalizes more easily to numeric and user-defined data types:

bool binsearch(const string&phrase, vector<string>&words, int from, int to)
{
    if(from > to) return false;    // range sanity check
    if(from == to) return (phrase.compare(words[to]) == 0);    // bottom of the recursion
    int mid = from + (to-from)/2;        // Range still has >= 2 elements; set up to recurse
    if(phrase.compare(words[mid]) <= 0)  // Recurse into the subinterval bracketing the target
        return binsearch(phrase,words, from,mid);
    else
        return binsearch(phrase,words, mid+1,to);
}

And here is a non-recursive version of the above, which corrects several problems with the sample code posted by user 'Bruce' and again uses no separate midpoint-value test:

bool binsearch_nr(const string& phrase, const vector<string> &words, int from, int to)
{
    if(from > to) return false;    // range sanity check
    while(from < to) {
        int mid = from + (to-from)/2;
        int cmp = phrase.compare(words[mid]);
        if (cmp <= 0)
            to = mid;
        else
            from = mid+1;
    }
    return (phrase.compare(words[to]) == 0);
}

I did a comparative timing of the above 2 implementations using a vector of 1 million quasi-random text snippets, code built using gcc 4.2 on MacOS ... the recursive version runs ~20% slower. For my hand-rolled merge-sort code, though, recursive is faster. YMMV.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.