0

The Minimal Example that reproduces the issue is as follows:

#include <iostream>
#include <string>
#include <list>
#include <iterator>
#include <vector>
#include <cctype>

class Tokenizer{
public:
    void TokenizeLine(const std::string& CurrLine)
    {
        std::list<std::string> sList;
        std::size_t bPos = 0;
        char c = 0;
        int nTokens = 0;

        while(bPos < CurrLine.size())
        {
            c = CurrLine[bPos];
            if (isspace(c))
            {
                bPos++;
                continue;
            }
            else
                sList.push_back(this->ReadSimpleToken(bPos, CurrLine));
        }

        this->_Tokens.clear();
        this->_Tokens.reserve(nTokens);
        std::copy(sList.begin(), sList.end(), std::back_inserter(_Tokens));
    }

    std::string ReadSimpleToken(std::size_t& start, const std::string& line)
    {
        std::size_t startpos = start;
        std::size_t currpos = start;
        while(currpos < line.size() && !isspace(line[currpos]))
            currpos++;
        return line.substr(startpos, currpos - startpos);
    }

    int GetNumberOfTokens()
    {
        return (int) _Tokens.size();
    }

private:
    std::vector<std::string> _Tokens;
};

int main() {
    // Write C++ code here
    std::string test{"staticstructural nsteps 1 nmodules 1"};
    Tokenizer t;
    t.TokenizeLine(test);
    std::cout << "Output: " << t.GetNumberOfTokens() << std::endl;

    return 0;
}

The above code seems to compile, but doesn't give the desired output.

However, when I replace the method ReadSimpleToken() with the below implementation:

    std::string Tokenizer::ReadSimpleToken(std::size_t& start, const std::string& line)
    {
        std::size_t startpos = start;
        // std::size_t currpos = start;
        while(start < line.size() && !isspace(line[start]))
            start++;
        return line.substr(startpos, start - startpos);
    }

It works perfectly fine.

Can anyone explain to me what is wrong with the first implementation? In the first implementation, I am updating currpos, until it reaches either the end of the string or the character at currpos is a space char. This should be same if I am assigning currpos to be start or if I am doing the same for start itself. The only difference between both implementations is that the first one uses currpos and second one does the same but with only start.

I tried this with g++ and whatever compiler progamiz uses. The link is here:

https://www.programiz.com/online-compiler/2Wvrf5F6I3Spc

8
  • 5
    The first version does not change start. Commented May 19 at 19:21
  • 1
    The above code seems to compile, but doesn't give the desired output -- A program that compiles successfully only means there are no syntax errors. It doesn't mean that the program will run successfully, due to logical bugs. What if you were asked to write a program to add two numbers, but instead you make a mistake and subtracted the two numbers? The program will still compile successfully, but when running the program, the results are incorrect. Commented May 20 at 2:47
  • 3
    Also, this really is not a "curious issue" -- this is how C++ works. See this simple example. Commented May 20 at 3:02
  • what is the desired output? Commented May 20 at 18:22
  • What makes you call this issue "curious"? Your code works if you modify start, and does not work if you do not (because you modify a copy instead). Even without looking at the role of start, I do not find a change in behavior unusual. Why do you think the behavior should be the same? (Explain your understanding of how your code should work.) Commented May 20 at 23:27

2 Answers 2

0

Both versions of the

std::string ReadSimpleToken(std::size_t& start, const std::string& line)

function pass start by reference.

In the first implementation the value of the start variable is copied to a different variable with std::size_t currpos = start;

Then currpos is incremented in the function with currpos++; however start never changes in that function.

The reason why this is important is the following part of Tokenizer::TokenizeLine requires that the bPos variable to advance in the call to ReadSimpleToken(bPos, CurrLine):

        while(bPos < CurrLine.size())
        {
            c = CurrLine[bPos];
            if (isspace(c))
            {
                bPos++;
                continue;
            }
            else
                sList.push_back(this->ReadSimpleToken(bPos, CurrLine));
        }

If bPos didn't advance in this call the above code becomes an infinite loop for all cases where CurrLine contains any character that is not a space.

The second implementation of std::string ReadSimpleToken(std::size_t& start, const std::string& line) doesn't make this mistake of copying the value of the start. Instead it increments it correctly.

Sign up to request clarification or add additional context in comments.

Comments

-1

The loop inside Tokenizer::TokenizeLine does not advance the bPos if the current character is not a space. Because bPos is passed by reference to Tokenizer::ReadSimpleToken it could be changed by that method.

The first implementation of Tokenizer::ReadSimpleToken does not do this however, so the loop inside Tokenizer::TokenizeLine should never terminate if CurrLine contains a non-space character.

In the second implementation the line start++ ends up incrementing the value of bPos inside Tokenizer::TokenizeLine due to start being passed by reference and not by value.

3 Comments

start is passed by reference in both implementations. In the first implementation the variable is copied to a different variable with std::size_t currpos = start; that is incremented in the function with currpos++; however start never changes in that function making Tokenizer::TokenizeLine contain an infinite loop in the case that the data has a character that is not a space.
The comment @drescherjm answers the question I had. Thank you so much everyone for contributing. Accepting this answer. Please edit the answer to include the above comment.
@lokitkhemka don't accept the answer when you see the answer in a comment but not in the answer itself. Chances are the author will never come back to edit. You don't have to accept an answer immediately, you can wait for one that you deem acceptable. You could even write an answer yourself (and then accept it if no other is as good)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.