42

In C++11, we know that std::string is guaranteed to be both contiguous and null-terminated (or more pedantically, terminated by charT(), which in the case of char is the null character 0).

There is this C API I need to use that fills in a string by pointer. It writes the whole string + null terminator. In C++03, I was always forced to use a vector<char>, because I couldn't assume that string was contiguous or null-terminated. But in C++11 (assuming a properly conforming basic_string class, which is still iffy in some standard libraries), I can.

Or can I? When I do this:

std::string str(length, '\0');

The string will allocate length+1 bytes, with the last filled in by the null-terminator. That's good. But when I pass this off to the C API, it's going to write length+1 characters. It's going to overwrite the null-terminator.

Admittedly, it's going to overwrite the null-terminator with a null character. Odds are good that this will work (indeed, I can't imagine how it couldn't work).

But I don't care about what "works". I want to know, according to the spec, whether it's OK to overwrite the null-terminator with a null character?
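For reference, the C++03-era workaround mentioned in the question could be sketched as follows. The function `fill_text` is a hypothetical stand-in for the C API (the question does not name it); it is assumed to write `length` characters followed by a null terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the C API: writes `length` characters
// followed by a null terminator, so it touches length + 1 bytes.
void fill_text(char* out, std::size_t length) {
    std::memset(out, 'x', length);
    out[length] = '\0';
}

// C++03-style workaround: let the API write into a vector<char> that has
// room for the terminator, then copy the characters into a std::string.
std::string read_via_vector(std::size_t length) {
    std::vector<char> buffer(length + 1);
    fill_text(&buffer[0], length);
    return std::string(&buffer[0], length);
}
```

This sidesteps the question entirely, at the cost of an extra copy.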

18
  • 5
    Right, but @NicolBolas's question is not "does it cause a problem", but "does the spec allow for it". Commented Oct 5, 2012 at 6:22
  • 2
    @texasbruce In other words, who cares what the spec allows, if it works on your system use it? Luckily, not everyone has that attitude. Commented Oct 5, 2012 at 7:02
  • 3
    It should be possible to avoid the problem by using a std::string with an extra null character at the end, with length length + 1, unless I'm missing something? Commented Oct 5, 2012 at 7:06
  • 7
    @texasbruce That is utterly irrelevant. The point is that nothing in the standard guarantees that the null termination is at a writeable memory location at all. It’s entirely possible (if unlikely) that it’s in read-only memory, for instance. Then any attempt to write to it will crash the program. Any competent C programmer will tell you that you are stark raving mad if you attempt to write portable programs that ignore these effects. It is not “perfectly normal” at all. Commented Oct 5, 2012 at 8:31
  • 4
    @FrerichRaabe Agreed. But that’s a completely different discussion. And even then it doesn’t pay to ignore what the spec says: you may still consciously decide to break the spec – but you should know it first. Commented Oct 5, 2012 at 8:44

4 Answers

25

Unfortunately, this is UB, if I interpret the wording correctly (in any case, it's not allowed):

§21.4.5 [string.access] p2

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

(Editorial error that it says T not charT.)

.data() and .c_str() basically point back to operator[] (§21.4.7.1 [string.accessors] p1):

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
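The quoted relationship can be checked with reads only, which are allowed at every index up to and including size(). A minimal sketch (the helper name `accessors_agree` is made up for illustration):

```cpp
#include <cstddef>
#include <string>

// Checks the quoted guarantee using reads only: data() + i must equal
// &s[i] for every i in [0, size()], and s[size()] must read as charT().
bool accessors_agree(const std::string& s) {
    const char* p = s.data();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    return s[s.size()] == '\0';
}
```

Nothing here writes to the terminator, so it does not run into the "shall not be modified" clause.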


15 Comments

Does writing '\0' into something which is already '\0' actually count as modifying it?
@MichaelAnderson, yes, definitely. It writes to memory.
@KonradRudolph Well, data and c_str return const pointers, so they are out of the question for modifying, anyway. And from 21.4.5 p2 it doesn't really follow that *(&str[0] + str.size()) is even allowed, since [] is only equal to *(begin()+pos) for pos < size(). I think an implementation is perfectly allowed to hold the string data in a length array together with an additional static const charT member for the null (of course this means it would have to maintain an additional buffer to return by data and c_str, but why not?).
@KonradRudolph: It has to. The buffer is contiguous. Therefore, &str[str.size() - 1] == &str[str.size()] - 1 must be true (assuming length() is at least 1). If it weren't, then the buffer wouldn't be contiguous.
@NicolBolas No. size() is an invalid argument for operator[] so nothing guarantees that its return value will point to the buffer. For instance (far-fetched), operator[] could contain the following logic: static CharT terminator{}; if (index == size()) return terminator; else return _data[index];.
14

LWG 2475 made this valid by editing the specification of operator[](size()) (inserted text in bold):

Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object to any value other than charT() leads to undefined behavior.
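Under that revised wording, a write like the one below is permitted as long as the value written is charT(). This is only a sketch, and it assumes a standard library that implements the revised wording; the helper name `rewrite_terminator` is made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Overwrites the terminator with charT() and reports the unchanged size.
std::size_t rewrite_terminator(std::string& s) {
    s[s.size()] = '\0';    // writes charT(): allowed by the revised wording
    // s[s.size()] = 'x';  // any other value would be undefined behavior
    return s.size();
}
```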

8 Comments

I don't see how this resolves the situation: It means you can write s[s.size()] = '\0'; but it is still not defined (in C++14) that &s[s.size() - 1] + 1 is dereferenceable (let alone yielding a null terminator); so by extension an algorithm that starts at &s[0] and increments a char * still can't read or write the null terminator. The section [string.require]/4 only defines this pointer arithmetic for the index being strictly less than the size.
@M.M. That's hidden in the specification of data().
I agree that if data() is used as the source of the pointer then it's all good; however, I don't see where &s[0] gives the same guarantee. Specifically, I don't see anything that stops the implementation from leaving the null terminator unwritten until data() or c_str() is actually called (and either writing it or using a dummy for the &s[s.size()] case).
@M.M Then file an LWG issue. The intent here is pretty clear, so if the wording doesn't match it's in the "obvious defect" category.
I don't know why LWG 2475 is being referred to in the past tense; the status report shows that it is not yet resolved.
10

According to the spec, overwriting the terminating NUL should be undefined behavior. So, the right thing to do would be to allocate length+1 characters in the string, pass the string buffer to the C API, and then resize() back to length:

// "+ 1" to make room for the terminating NUL for the C API
std::string str(length + 1, '\0');

// Call the C API passing &str[0] to safely write to the string buffer
...

// Resize back to length
str.resize(length);
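Filled in end to end, the steps above might look like the sketch below. `fill_text` is a hypothetical stand-in for the C API (the answer leaves the actual call elided); it is assumed to write `length` characters plus a terminating NUL.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the C API: writes `length` characters
// followed by a NUL, so it touches length + 1 bytes.
void fill_text(char* out, std::size_t length) {
    std::memset(out, 'x', length);
    out[length] = '\0';
}

std::string read_via_string(std::size_t length) {
    // One extra element, so the API's NUL lands on an ordinary character
    // of the string rather than on the string's own terminator.
    std::string str(length + 1, '\0');
    fill_text(&str[0], length);
    // Drop the extra element; the string keeps its own terminator.
    str.resize(length);
    return str;
}
```

Because the string always has at least one element here, &str[0] is valid even when `length` is zero.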

(FWIW, I tried the "overwriting NUL" approach on MSVC10, and it works fine.)


EDIT 2024-FEB-27: Since 2012 (the year in which this question was originally asked and answered here) the C++ standard has been modified, and since C++17 it's legal to overwrite std::string's NUL terminator with another NUL.

3 Comments

I’d go with this solution as well. But it’s unsatisfying that this requires a totally needless allocation of an extra character. Why didn’t the spec just make the null termination writeable?
Because that would mean more typing? 'You must not overwrite the null, except with another null'. Perhaps their fingers were getting tired, or it was the end of the day and the bars were open.
@KonradRudolph: I agree that the standard should be changed, making it possible to overwrite a NUL with another NUL. I see no reason why it shouldn't be possible or should trigger undefined behavior, and I don't like the needless allocation of an extra character either.
5

I suppose n3092 isn't current any more but that's what I have. Section 21.4.5 allows access to a single element. It requires pos <= size(). If pos < size() then you get the actual element, otherwise (i.e. if pos == size()) then you get a non-modifiable reference.

I think that as far as the programming language is concerned, a kind of access which could modify the value is considered a modification even if the new value is the same as the old value.

Does g++ have a pedantic library that you can link to?

1 Comment

libstdc++ has a debug mode but there are limits to what it will diagnose. It validates iterator operations, but can't notice writes of individual bytes through pointers.
