Behavior is different when using the input "reference" variable, compared to when the same reference is copied

Question

The Minimal Example that reproduces the issue is as follows:

#include <iostream>
#include <string>
#include <list>
#include <iterator>
#include <vector>
#include <cctype>

class Tokenizer{
public:
    void TokenizeLine(const std::string& CurrLine)
    {
        std::list<std::string> sList;
        std::size_t bPos = 0;
        char c = 0;
        int nTokens = 0;

        while(bPos < CurrLine.size())
        {
            c = CurrLine[bPos];
            if (isspace(c))
            {
                bPos++;
                continue;
            }
            else
                sList.push_back(this->ReadSimpleToken(bPos, CurrLine));
        }

        this->_Tokens.clear();
        this->_Tokens.reserve(nTokens);
        std::copy(sList.begin(), sList.end(), std::back_inserter(_Tokens));
    }

    std::string ReadSimpleToken(std::size_t& start, const std::string& line)
    {
        std::size_t startpos = start;
        std::size_t currpos = start;
        while(currpos < line.size() && !isspace(line[currpos]))
            currpos++;
        return line.substr(startpos, currpos - startpos);
    }

    int GetNumberOfTokens()
    {
        return (int) _Tokens.size();
    }

private:
    std::vector<std::string> _Tokens;
};

int main() {
    // Write C++ code here
    std::string test{"staticstructural nsteps 1 nmodules 1"};
    Tokenizer t;
    t.TokenizeLine(test);
    std::cout << "Output: " << t.GetNumberOfTokens() << std::endl;

    return 0;
}

The above code seems to compile, but doesn't give the desired output.

However, when I replace the method ReadSimpleToken() with the below implementation:

    std::string Tokenizer::ReadSimpleToken(std::size_t& start, const std::string& line)
    {
        std::size_t startpos = start;
        // std::size_t currpos = start;
        while(start < line.size() && !isspace(line[start]))
            start++;
        return line.substr(startpos, start - startpos);
    }

It works perfectly fine.

Can anyone explain to me what is wrong with the first implementation? In the first implementation, I am updating currpos, until it reaches either the end of the string or the character at currpos is a space char. This should be same if I am assigning currpos to be start or if I am doing the same for start itself. The only difference between both implementations is that the first one uses currpos and second one does the same but with only start.

I tried this with g++ and whatever compiler progamiz uses. The link is here:

https://www.programiz.com/online-compiler/2Wvrf5F6I3Spc

The above code seems to compile, but doesn't give the desired output -- A program that compiles successfully only means there are no syntax errors. It doesn't mean that the program will run successfully, due to logical bugs. What if you were asked to write a program to add two numbers, but instead you make a mistake and subtracted the two numbers? The program will still compile successfully, but when running the program, the results are incorrect. — PaulMcKenzie
– PaulMcKenzie, Commented May 20 at 2:47
Also, this really is not a "curious issue" -- this is how C++ works. See this simple example. — PaulMcKenzie
– PaulMcKenzie, Commented May 20 at 3:02
What makes you call this issue "curious"? Your code works if you modify start, and does not work if you do not (because you modify a copy instead). Even without looking at the role of start, I do not find a change in behavior unusual. Why do you think the behavior should be the same? (Explain your understanding of how your code should work.) — JaMiT
– JaMiT, Commented May 20 at 23:27

drescherjm · Accepted Answer · 2025-05-21 14:53:39Z

Both versions of the

std::string ReadSimpleToken(std::size_t& start, const std::string& line)

function pass start by reference.

In the first implementation the value of the start variable is copied to a different variable with std::size_t currpos = start;

Then currpos is incremented in the function with currpos++; however start never changes in that function.

The reason why this is important is the following part of Tokenizer::TokenizeLine requires that the bPos variable to advance in the call to ReadSimpleToken(bPos, CurrLine):

        while(bPos < CurrLine.size())
        {
            c = CurrLine[bPos];
            if (isspace(c))
            {
                bPos++;
                continue;
            }
            else
                sList.push_back(this->ReadSimpleToken(bPos, CurrLine));
        }

If bPos didn't advance in this call the above code becomes an infinite loop for all cases where CurrLine contains any character that is not a space.

The second implementation of std::string ReadSimpleToken(std::size_t& start, const std::string& line) doesn't make this mistake of copying the value of the start. Instead it increments it correctly.

Paweł Tarbaj · Accepted Answer · 2025-05-19 19:44:55Z

-1

The loop inside Tokenizer::TokenizeLine does not advance the bPos if the current character is not a space. Because bPos is passed by reference to Tokenizer::ReadSimpleToken it could be changed by that method.

The first implementation of Tokenizer::ReadSimpleToken does not do this however, so the loop inside Tokenizer::TokenizeLine should never terminate if CurrLine contains a non-space character.

In the second implementation the line start++ ends up incrementing the value of bPos inside Tokenizer::TokenizeLine due to start being passed by reference and not by value.

answered May 19 at 19:44

Paweł Tarbaj

16

3 Comments

drescherjm May 19 at 19:49

start is passed by reference in both implementations. In the first implementation the variable is copied to a different variable with std::size_t currpos = start; that is incremented in the function with currpos++; however start never changes in that function making Tokenizer::TokenizeLine contain an infinite loop in the case that the data has a character that is not a space.

user30582213 May 21 at 8:31

The comment @drescherjm answers the question I had. Thank you so much everyone for contributing. Accepting this answer. Please edit the answer to include the above comment.

463035818_is_not_an_ai May 21 at 12:04

@lokitkhemka don't accept the answer when you see the answer in a comment but not in the answer itself. Chances are the author will never come back to edit. You don't have to accept an answer immediately, you can wait for one that you deem acceptable. You could even write an answer yourself (and then accept it if no other is as good)

Collectives™ on Stack Overflow

Behavior is different when using the input "reference" variable, compared to when the same reference is copied

2 Answers 2

Comments

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related