Does anyone know why the C++ standard library's std::string class (or, more generally, the std::basic_string class template) lacks ordinary string functionality such as uppercasing, substring replacement, and trimming, compared to, e.g., the QString class from Qt or Python strings?
3 Answers
I'm going to take a stab at this...
The template std::basic_string was originally written and included in the STL, which represents the "abstract" parts of the Standard Library as we know it (containers, iterators, algorithms, allocators, etc.). This also included std::string.
Notice how there is absolutely no encoding, internationalization, or locale-dependent functionality in the STL. It wasn't a design goal.
Now my take on the happenings of previous generations: When C++ was standardized, there was need for a comprehensive Standard Library. The STL was a very good fit for this, and was taken over almost verbatim. Only later was stuff like <iostream> and <locale> added. The clunky and very incoherent interface differences between streams and strings only prove this "let's throw it all together" attitude.
As with many std facilities, interoperation between the components wasn't optimized. On top of that, the simplicity of writing a small C++ function that wraps existing C functionality (like toupper) has been used as a reason not to include this in the Standard Library.
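To illustrate how small such a wrapper is, here is a minimal sketch. The name to_upper_copy is hypothetical (it is not a Standard Library function), and the result depends on the current C locale, so it only handles single-byte characters.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, not part of the Standard Library: returns an
// uppercased copy of s using the C function std::toupper.
std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        // Taking the character as unsigned char avoids undefined behavior
        // when a plain char holds a negative value.
        return static_cast<char>(std::toupper(c));
    });
    return s;
}
```

Calling to_upper_copy("hello") in the default "C" locale yields "HELLO". Anything outside ASCII is left to whatever the active locale does with single bytes.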
By the next revision of the Standard (and thus the Library it comprises), backward compatibility prevented any useful and necessary changes (such as injecting locale into std::string functionality) from being added.
Note that this conjecture does not at all explain why for example a std::trim taking a string and locale object wasn't added. It does kind of attempt to explain the background process involved.
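For what it's worth, a free function of that shape is not hard to write. The sketch below is only an assumption about what such a function could look like; no std::trim exists, and the name trim and its default argument are made up for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <locale>
#include <string>

// Hypothetical free function: strips leading and trailing whitespace,
// where "whitespace" is whatever the given locale classifies as space.
std::string trim(const std::string& s, const std::locale& loc = std::locale()) {
    auto not_space = [&loc](char c) { return !std::isspace(c, loc); };

    // First non-space character from the front.
    auto first = std::find_if(s.begin(), s.end(), not_space);

    // First non-space character from the back, never searching past `first`,
    // so an all-whitespace string yields an empty range.
    auto last = std::find_if(s.rbegin(), std::make_reverse_iterator(first),
                             not_space).base();

    return std::string(first, last);
}
```

This uses the locale-aware std::isspace overload from <locale>, so the caller can pass a specific std::locale or fall back to the global one.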
Now that all has been said, I wholly agree the C++ Standard Library is clunky and incomplete in its general usefulness.
UPDATE: I've been informed that my timeline is reversed: the Standard Library (and iostream) existed before the STL was added. The point above is still valid though: the STL was copy-pasted, with little to no integration (simple example: the until recently missing std::basic_ifstream<T>::open(const std::basic_string<T>&), which will be deprecated in the next iteration due to std::filesystem).
Can't answer about all missing features in the most general sense, but…
The two features mentioned, trimming and uppercasing, are locale-dependent. They aren't only functions of characters, but also the encoding and language being used.
std::string doesn't really handle that. Although in practice, everyone uses Unicode with whitespace as defined by ASCII, that's not general enough for the kind of standardization process that defines C++.
Such operations are available through streams (for example, reading out of a std::stringstream to strip excess whitespace) and locale objects (for example, through the std::tolower overload that takes a std::locale).
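A rough sketch of both approaches follows. The helper names squeeze_spaces and to_lower_copy are hypothetical; only std::istringstream, operator>>, and the two-argument std::tolower from <locale> are actual library facilities.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: reads whitespace-separated words out of a stream,
// which drops leading/trailing whitespace and collapses runs to one space.
std::string squeeze_spaces(const std::string& s) {
    std::istringstream in(s);
    std::string word, out;
    while (in >> word) {
        if (!out.empty()) out += ' ';
        out += word;
    }
    return out;
}

// Hypothetical helper: lowercases each char with the locale-aware
// std::tolower(charT, const std::locale&) overload.
std::string to_lower_copy(std::string s, const std::locale& loc = std::locale()) {
    for (char& c : s) c = std::tolower(c, loc);
    return s;
}
```

Note that the stream approach also squeezes interior whitespace, so it is not a pure trim, and the per-character std::tolower still cannot handle multi-byte encodings.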
9 Comments
[...] toupper) on an entire collection is more difficult to define (because some characters may be pure ASCII and some Cyrillic in a single string -- I haven't thought through this bit completely yet) than it is for single characters (where you can put a check before conversion and act accordingly).
[...] toupper semantics is difficult. I'd say the same can be applied to trimming -- punctuation (do you consider an apostrophe in elision in French l'arbre part of the string?) is difficult to codify without diving into an even deeper problem than what the scope of the language library standard may define.
[...] std::string::toupper function taking a std::locale with a default argument. There is a certain level of poor interface design, especially around char_traits. But as for features being missing, the library as a whole does plenty if you look outside std::string proper (and there's nothing wrong with that).
[...] notdef characters or put a default value)? Any ideas?
[...] std::string itself. and so, clearly, that criterion is not any explanation of how std::string can be missing e.g. trimming and splitting (not to mention simple uppercasing).
Poor functionality? std::string is considered to be one of the bloated components in the Standard Library. You have an entire set of algorithms that operate on std::strings: all the standard algorithms. Don't restrict yourself to member functions; there is much more in an interface than that...
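As a small illustration of standard algorithms applied to a std::string through its iterators, consider the sketch below. The helper names strip_all_spaces and count_char are made up for this example; std::remove_if, std::count, and std::string::erase are the actual library pieces.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: removes every whitespace character using the
// erase-remove idiom (std::remove_if followed by std::string::erase).
std::string strip_all_spaces(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isspace(c) != 0; }),
            s.end());
    return s;
}

// Hypothetical helper: counts how often ch occurs, via std::count.
std::string::difference_type count_char(const std::string& s, char ch) {
    return std::count(s.begin(), s.end(), ch);
}
```

Because std::string exposes begin() and end(), any algorithm that works on a sequence of char works here without a dedicated member function.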
11 Comments
std::string (or more generally std::basic_string) has extremely poor functionality as a string. try simple things like uppercasing a string, or replacing all occurrences of a substring with another. even more basic, try trimming a string. the C++ std::string is good for one thing only, and that's as a standard way to pass strings around, even if it's needlessly inefficient and restricted for that purpose.
[...] size and length for a very simple example. This was the result of combining a standalone string library with the STL.
[...] basic_string. For instance, std::transform(s.begin(), s.end(), s.begin(), ::toupper) uppercases the entire string.
[...] std::string class.
std::string doesn't use any particular encoding. Once you step outside of ASCII, you're not asking for std::string anymore; you're asking for a std::utf8_string, which is a completely different issue.
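Replacing all occurrences of a substring, one of the operations brought up in these comments, has no single member function, but it can be sketched with std::string::find and std::string::replace. The name replace_all is hypothetical.

```cpp
#include <string>

// Hypothetical helper: returns a copy of s with every non-overlapping
// occurrence of `from` replaced by `to`, scanning left to right.
std::string replace_all(std::string s, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return s;  // an empty pattern would never advance
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text
    }
    return s;
}
```

Advancing past the replacement keeps the loop from rescanning text it just inserted, which matters when `to` contains `from`.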
[...] ctype::toupper (particularly the second overload, which takes a range) and 2) an email by James Kanze on concerns with uppercasing strings.
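A minimal sketch of the range overload of ctype::toupper mentioned here, assuming a plain char string; the wrapper name to_upper_ctype is hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical wrapper: uppercases s in place through the std::ctype<char>
// facet of the given locale, using the (begin, end) range overload.
std::string to_upper_ctype(std::string s, const std::locale& loc = std::locale()) {
    if (!s.empty()) {
        const std::ctype<char>& facet = std::use_facet<std::ctype<char>>(loc);
        facet.toupper(&s[0], &s[0] + s.size());
    }
    return s;
}
```

This still converts one char at a time, so it shares the limitations discussed above for multi-byte encodings and for case mappings that change the length of the string.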