Take the piece of code below, which simply trims a string by removing whitespace characters from either end:
const std::string TrimString(const std::string& s)
{
    const auto iter = std::find_if(s.cbegin(), s.cend(), [](auto c) -> bool { return !std::isspace(static_cast<int32_t>(c)); });
    return iter != s.end() ?
        std::string(iter, std::find_if(s.crbegin(), s.crend(), [](auto c) -> bool { return !std::isspace(static_cast<int32_t>(c)); }).base()) :
        std::string();
}
// Usage
std::vector<uint8_t> d{ 0xc5, 0xbc }; // example UTF-8 character
std::string utf8(d.begin(), d.end());
std::string trimmed = TrimString(utf8);
If you run the above code on MSVC (17.14.19) it will actually crash, but on Linux using GCC (14.2.0) it works perfectly.
Now, I know WHY it crashes and it's easy enough to fix, but what I'm looking for is to understand this difference and even what the standard says about it.
The reason for the crash is that on MSVC, std::isspace takes an int whose value must be in the range -1 to 255 (according to the runtime crash dialog). But then, why does this work on GCC?
Obviously, this has to do with the auto parameter of the lambda. In MSVC, the auto parameter is probably deduced as char, so each byte is sign-extended, and that's what causes the crash (as it ends up as a negative value). What I'm not sure about is what happens in the case of GCC. Surely it would be doing something similar? Is std::isspace less picky on Linux?
As I said, it's an easy fix, but I'm looking for more understanding of the difference between MSVC and GCC in this regard.
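To make the suspected sign extension concrete, here is a minimal sketch of what happens to the byte 0xC5 when it is widened to int. It uses signed char explicitly, on the assumption that plain char is signed on the platforms in question (typical for x86 MSVC and GCC, but not guaranteed by the standard). The function names are hypothetical and only for illustration.

```cpp
// Hypothetical helper: mimics static_cast<int32_t>(c) when plain char is signed.
// The byte is first reinterpreted as a signed 8-bit value, then sign-extended.
int WidenAsSignedChar(unsigned char byte)
{
    const auto c = static_cast<signed char>(byte);
    return static_cast<int>(c);
}

// Hypothetical helper: widening through unsigned char keeps the value in 0..255,
// which is the range std::isspace is specified to accept (plus EOF).
int WidenAsUnsignedChar(unsigned char byte)
{
    return static_cast<int>(byte);
}
```

On a two's-complement machine the first function maps 0xC5 to a negative number, while the second keeps it at 197.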
std::isspace documentation: "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF". Your program takes 0xC5, converts it to char (when constructing the std::string), which likely turns it into a negative value. Then that negative value is converted to int - still negative. Then you pass that negative value to std::isspace, whereupon your program exhibits undefined behavior. "Seems to work" is one possible manifestation of undefined behavior; "crash" is another.
auto should be char in this example, with both implementations. You can confirm this by printing sizeof(c), or typeid(c).name(), or adding static_assert(std::is_same_v<decltype(c), char>).
If you change auto to some type that is different to what will be deduced from its initializer, it will have different behaviour - of course. But that's not the fault of auto - auto just does what it is specified to do: take on the type of what it is initialised with.
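One way to avoid the undefined behaviour described above is to convert each char to unsigned char before it reaches std::isspace. The following is a minimal sketch, assuming the default "C" locale; TrimStringSafe and IsSpaceByte are hypothetical names, not part of the original code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: guarantees the argument passed to std::isspace
// is representable as unsigned char, as the function requires.
inline bool IsSpaceByte(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical variant of TrimString that never passes a negative value
// to std::isspace, regardless of whether plain char is signed.
std::string TrimStringSafe(const std::string& s)
{
    const auto notSpace = [](char c) { return !IsSpaceByte(c); };

    // First non-whitespace character from the front.
    const auto first = std::find_if(s.cbegin(), s.cend(), notSpace);
    if (first == s.cend())
        return std::string(); // empty or all whitespace

    // One past the last non-whitespace character, found from the back.
    const auto last = std::find_if(s.crbegin(), s.crend(), notSpace).base();
    return std::string(first, last);
}
```

With this version the two bytes 0xC5 0xBC from the question should pass through unchanged on both implementations, since neither is whitespace in the "C" locale.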