
If I have a long string:

std::string data("This is a string long enough so that it is not a short string");

And then I have a view onto that string:

std::string_view dataView(std::begin(data) + 5, std::end(data) - 5);

If I move the original string:

std::string movedData(std::move(data));

Then I would expect the view dataView to remain valid.

But this assumption does not hold if the std::string short string optimization (SSO) takes effect. The characters are then stored inside the string object rather than dynamically allocated, so under the hood the move becomes a copy of the characters, leaving the view pointing into the moved-from string and therefore invalid.
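Here is a minimal sketch for observing this on a particular implementation. It moves a string and compares data() before and after the move. The helper name buffer_survives_move is hypothetical, and the result is implementation-specific because the standard guarantees neither outcome.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: move-constructs a new string from a copy of `text`
// and reports whether the new string took over the moved-from string's
// character buffer instead of copying the characters.
bool buffer_survives_move(const std::string& text)
{
    std::string source(text);
    const char* before = source.data();     // address of source's characters
    std::string target(std::move(source));  // move construction
    return target.data() == before;         // was the same buffer handed over?
}
```

On mainstream implementations a long string such as std::string(1000, 'x') typically reports true, and a short one such as "short" typically reports false. That matches the behaviour described above.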

Is there a way to detect SSO (so I can take appropriate action in my class move constructor)? And does the standard make any reference to SSO?

Context

I have a class that holds a URL as a string and provides access to each part of the URL through views into the original URL. Calculating the views has a cost, but it's better to calculate them once than on each access (or that was my thought process). A copy has to recalculate the views, but a move (I thought) does not, because the underlying storage is moved and thus the views remain valid.

 class URL
 {
     std::string      url;
     std::string_view schema;
     std::string_view host;
     std::string_view path;
     // .. etc (for the multiple parts of a URL you can extract).
     // Note: Parsing a URL correctly is non-trivial (handling IPV6, etc.).
     //       So I don't want to do it that often.

     public:
         // Default constructor.
         URL() {}
         // Normal constructor: Accept input by copy/move
         URL(std::string urlInput)
             : url(std::move(urlInput))
         {
             // Compute Views.
         }
         // Copy constructor.
         URL(URL const& copy)
             : url(copy.url)
         {
             // Compute Views.
         }
         // Move constructor
         // I hoped I could simply swap the two objects.
         // This works if there is no short string optimization.
         URL(URL&& move) noexcept
         {
             swap(move);
         }
         // Assignment (both copy and move in one place).
         // Use standard copy and swap idiom.
         URL& operator=(URL assign) noexcept
         {
             swap(assign);
             return *this;
         }
         // Faithful swap function.
         void swap(URL& other) noexcept
         {
             using std::swap;
             swap(url, other.url);
             swap(schema, other.schema);
             swap(host,   other.host);
             swap(path,   other.path);
         }

         // Getter functions removed. But simply return std::string_view.
 };
  • Moving a string (in fact, any standard container) invalidates all of its iterators, including those referenced by a string_view. This is true regardless of whether the string implementation uses SSO or not. So your assertion that your approach "works" unless SSO is used is incorrect: it has undefined behaviour regardless. One of the "joys" of undefined behaviour is that the observed behaviour (e.g. with a particular implementation) can "work" as the programmer expects... until something changes (e.g. an update of the implementation, building your program with a different compiler, etc.). Commented Nov 3 at 1:19
  • Use an existing URL class. Or, replace your string_view with something that uses indices into the string to remember the starting point. Or, do that to adjust the string_view in your move. Commented Nov 3 at 5:14
  • With SSO I would expect data() to return an address between the address of the string object and that address plus sizeof(std::string). If data() returns an address outside that range, the string data has been dynamically allocated. Commented Nov 3 at 7:46
  • std::string{}.capacity() Commented Nov 3 at 11:35
  • Why would you expect a view to remain valid if the viewed object was moved? I'm curious about your reasoning. If you try to formulate an interface that has this property and implement some non-trivial objects with it, you'll quickly find out that it's an implementation nightmare where you have to be careful about side effects everywhere forever, which is why it's usually not guaranteed. Commented Nov 3 at 14:13
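The suggestion above to adjust the string_view in the move constructor could look roughly like the following sketch. It assumes a single scheme view and a stand-in parser that takes everything before "://" as the scheme. The names offsetIn and getScheme are hypothetical, and the copy constructor and the remaining views are left out. Each view is recorded as an offset and length before the move and rebuilt against the destination string afterwards, which stays correct whether or not SSO caused the characters to be copied.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Sketch: keeps std::string_view members but re-points them in the move
// constructor. Only the scheme view is shown; copying is not supported.
class URL
{
    std::string      url;
    std::string_view scheme;

    // Offset of `view` inside `base`; 0 for an empty default-constructed view.
    static std::size_t offsetIn(std::string_view view, const std::string& base)
    {
        return view.data() ? static_cast<std::size_t>(view.data() - base.data()) : 0;
    }

public:
    explicit URL(std::string input) : url(std::move(input))
    {
        // Stand-in for real parsing: the scheme is everything before "://".
        std::string_view all(url);
        scheme = all.substr(0, all.find("://"));
    }

    URL(URL&& other) noexcept
    {
        // Record the view as (offset, length) while it still refers into other.url.
        const std::size_t offset = offsetIn(other.scheme, other.url);
        const std::size_t length = other.scheme.size();
        url    = std::move(other.url);   // may copy (SSO) or hand over the buffer
        scheme = std::string_view(url.data() + offset, length);
        other.scheme = {};               // the moved-from view would be stale
    }

    std::string_view getScheme() const { return scheme; }
};
```

The design keeps string_view members for cheap getters and pays a small fixed cost per view in the move constructor instead of re-parsing.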

4 Answers

  1. The standard doesn't mention the short string optimization.
  2. The standard requires move construction to run in constant time.
  3. Unlike for the other sequence containers, the standard doesn't require iterators, references, or pointers to elements of the source string to remain valid after move construction.

So SSO is an implementation detail that the requirements above permit, and dereferencing iterators, references, or pointers into a string after it has been moved from is, in general, undefined behavior.

I would fix it by storing the URL in a std::vector<char> and keep using string views, because I would not like to lose the string-related API.

std::vector<char>      url;

URL(const std::string& urlInput) {
   // Copy the characters plus the terminating '\0'
   // (data()[size()] is guaranteed to be '\0').
   url.assign(urlInput.data(), urlInput.data() + urlInput.size() + 1);

   // Compute Views.
}
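Filled out into a compilable sketch, the vector-based idea might look like this. The single scheme view, the stand-in parser (everything before "://"), and the getScheme and c_str accessors are assumptions for illustration. The point it relies on is that std::vector's constant-time move constructor hands the heap buffer to the new object, so in practice a defaulted move keeps the views pointing at valid data.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Sketch: characters (plus a trailing '\0') live in a std::vector<char>;
// views refer into that buffer, which the vector's move hands over.
class URL
{
    std::vector<char> url;
    std::string_view  scheme;

public:
    explicit URL(const std::string& urlInput)
        : url(urlInput.data(), urlInput.data() + urlInput.size() + 1)
    {
        // Stand-in for real parsing: the scheme is everything before "://".
        std::string_view all(url.data(), url.size() - 1);  // exclude the '\0'
        scheme = all.substr(0, all.find("://"));
    }

    // Moves `url` (buffer handed over) and copies `scheme`, which still
    // points into that same buffer. The moved-from object's view is stale.
    URL(URL&&) noexcept = default;

    // A copy would have to recompute the views, so it is disabled here.
    URL(const URL&) = delete;

    std::string_view getScheme() const { return scheme; }
    const char* c_str() const { return url.data(); }
};
```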

Comments

Which sequence container requires the validity of iterators in the source object after move? I didn't find it mentioned anywhere.
@pptaszni Other sequence containers don't invalidate iterators on a swap() operation; see 23.2.2.2 Container requirements [container.reqmts], paragraph 66.6, which maps to my use case in the URL example. Sorry for the flawed initial example.
I think this is the best answer posted, but I don't think this is what I would use. As the input is a std::string, I want to keep the underlying type the same (to avoid a copy into a vector). I think the better solution (as suggested below) is not to store the views as std::string_view but as offsets into the string (as the offsets don't change). Then a std::string_view can be created on demand at low cost. This is why I have not accepted this answer.

Is there a way to detect SSO (so I can take appropriate action in my class move constructor)?

Possibly, but if you need to detect an optimization, you're probably prematurely optimizing and/or writing fragile code. It would be better to rethink your approach. For the URL example, it would be simpler to store an offset and length for each piece instead of a string_view.

The big advantage of storing offset and length is that this permits the compiler-generated copy and move operations to work correctly. There is no special logic required, not even re-computing the pieces in the copy constructor. The default logic just works.

There is not much of a downside to storing offset and length. Storage size is the same if you use appropriately-sized integers, as a string_view also consists of two values, pointer and length. Runtime is almost the same; you would need to generate a string_view as needed, but this is just a step above trivial (construct the view from url.data() + offset and length).
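A minimal sketch of the offset-and-length approach, assuming a stand-in parser that only understands "scheme://host/..."; the Part struct and the view, getScheme and getHost members are illustrative names rather than anything from the question. Because no copy or move operation is user-declared, the compiler-generated ones are used and remain correct.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Each URL part is stored as a position inside `url` rather than as a
// pointer, so it stays correct wherever the characters live (SSO or heap).
class URL
{
    struct Part { std::size_t offset = 0; std::size_t length = 0; };

    std::string url;
    Part        scheme;
    Part        host;

    std::string_view view(Part p) const
    {
        if (p.offset > url.size()) return {};  // e.g. a moved-from object
        return std::string_view(url).substr(p.offset, p.length);
    }

public:
    explicit URL(std::string urlInput) : url(std::move(urlInput))
    {
        // Stand-in for real parsing: "<scheme>://<host>/...".
        std::size_t sep = url.find("://");
        if (sep == std::string::npos) return;  // leave parts empty
        scheme = {0, sep};
        std::size_t hostStart = sep + 3;
        std::size_t hostEnd   = url.find('/', hostStart);
        if (hostEnd == std::string::npos) hostEnd = url.size();
        host = {hostStart, hostEnd - hostStart};
    }

    // No user-declared copy/move: the compiler-generated ones are correct,
    // because offsets do not depend on the address of the characters.

    std::string_view getScheme() const { return view(scheme); }
    std::string_view getHost()   const { return view(host); }
};
```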

Skip the SSO detector and switch to "easy mode" (a.k.a. compiler-generated functions).

does the standard make any reference to SSO?

Not directly. However, it is notable that for containers, "passing a container as an argument to a library function" does not invalidate iterators unless otherwise specified ([container.reqmts]/67), whereas for strings, iterators may be invalidated when a string is passed to a library function if the argument is "a reference to non-const basic_string" ([string.require]/4). This difference in requirements serves little purpose other than permitting SSO, so one could say the standard was engineered to permit SSO without naming (or requiring) it.

Comments

I agree with your interpretation that the difference between container and string requirements is simply to allow SSO.
Storing offsets into the strings does seem like a better solution here. The get methods can still return std::string_view, but these can simply be calculated on demand using the offsets.
Accepting this, because this is the direction I will take: storing the views as offsets into the string. The change to the get functions to convert these offsets to views is trivial.

Although the other answers, indicating that you probably should re-think your approach, are correct, I think it's worth recording an answer to the actual question (how to detect whether a given string is using SSO).

This fairly simple function works in practice:

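#include <functional>
#include <string>

auto is_sso(const std::string& s) -> bool
{
    const auto* const s_addr = reinterpret_cast<const char*>(&s);
    std::less<const char*> lt;  // total order, unlike raw < on unrelated pointers
    return !lt(s.data(), s_addr) && lt(s.data(), s_addr + sizeof(std::string));
}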

Some examples on godbolt
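As an illustration that also ties in the std::string{}.capacity() comment on the question, the same address test can probe how many characters fit before a string allocates. This is a sketch; chars_inside_object and probe_sso_capacity are hypothetical names, and the loop simply grows one string until its data leaves the object.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Same address test as is_sso: do the characters of `s` lie inside the
// std::string object itself?
bool chars_inside_object(const std::string& s)
{
    const char* begin = reinterpret_cast<const char*>(&s);
    std::less<const char*> lt;  // total order on unrelated pointers
    return !lt(s.data(), begin) && lt(s.data(), begin + sizeof(std::string));
}

// Largest length that still fits in the in-object buffer, found by growing
// one string a character at a time; 0 if no SSO is detected at all.
std::size_t probe_sso_capacity()
{
    std::string s;
    std::size_t fits = 0;
    for (std::size_t n = 1; n <= sizeof(std::string); ++n) {
        s.assign(n, 'x');
        if (!chars_inside_object(s)) break;
        fits = n;
    }
    return fits;
}
```

On mainstream 64-bit implementations this typically agrees with std::string{}.capacity() (15 for libstdc++ and MSVC, 22 for libc++), but the standard specifies neither number.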

Comment

Nice. +1 for the code to detect SSO.

My first hack at fixing this was:

     void swap(URL& other) noexcept
     {
         using std::swap;
         // Get pointers to the underlying data of both strings.
         char const* mine   = url.data();
         char const* theirs = other.url.data();
         swap(url, other.url);

         if (url.data() == theirs && other.url.data() == mine) {
             // Both buffers simply changed owner (no SSO copy in either
             // direction), so the views are still correct and simply need
             // to be swapped.
             swap(schema, other.schema);
             swap(host,   other.host);
             swap(path,   other.path);
         }
         else {
             // At least one buffer was copied rather than handed over
             // (SSO in use), so the views of both objects must be recomputed.
             // Since this only happens for short strings, re-parsing the URL
             // should not be overly burdensome.

             // Compute Views (for both *this and other).
         }
     }

After reading the comments on the original question, I think this is probably UB. So I am reading the standard to see if I can find anything definitive. But I may switch to using a vector.

Comments

"This is probably UB." -- So you are saying that this answer should not be followed / is not useful to others?
Not only that, but the answer seems to rely on context that wasn’t present in the question. “How can I know if a view into a string remains valid after the string is moved” is a different question from “How can I pry implementation details from the standard library that aren’t of my concern to begin with”.
(Simplest solution: use a pair of indices into the string instead of std::string_view.)
@dumpass Maybe my initial question was a bit flawed. I was trying to provide a simple reproducible example, and I reduced it badly. I was assuming that std::string would behave like the other standard containers, which do provide guarantees around swap. Unfortunately, std::string is a bit of an outlier among the containers and does not provide these guarantees.
I do like the idea of rather than keeping a std::string_view I keep a pair of integer offsets (representing offset from the beginning). Then the get methods create the std::string_view dynamically from the offsets.
