Legal to overwrite std::string's null terminator?

Question

In C++11, we know that std::string is guaranteed to be both contiguous and null-terminated (or more pedantically, terminated by charT(), which in the case of char is the null character 0).

There is this C API I need to use that fills in a string by pointer. It writes the whole string + null terminator. In C++03, I was always forced to use a vector<char>, because I couldn't assume that string was contiguous or null-terminated. But in C++11 (assuming a properly conforming basic_string class, which is still iffy in some standard libraries), I can.
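Here is a minimal sketch of the vector<char> workaround. It uses std::snprintf as a stand-in for the C API, since it also writes the characters plus a terminating '\0' into a caller-supplied buffer; that substitution is an assumption. The helper name format_id_vector and the "id-%d" format are purely illustrative. The snippet itself uses C++11 spellings such as nullptr and std::snprintf.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: builds "id-<n>" through a C-style API.
// std::snprintf stands in for the C API: given a buffer of n bytes, it
// writes at most n - 1 characters followed by a terminating '\0'.
std::string format_id_vector(int id) {
    // A first call with a null buffer only reports the required length.
    int needed = std::snprintf(nullptr, 0, "id-%d", id);
    if (needed < 0) {
        return std::string();  // encoding error
    }
    std::size_t length = static_cast<std::size_t>(needed);
    // vector<char> is guaranteed contiguous, so &buf[0] is a writable
    // buffer of length + 1 chars: room for the text and its '\0'.
    std::vector<char> buf(length + 1);
    std::snprintf(&buf[0], buf.size(), "id-%d", id);
    // Copy the text (without the '\0') into the result string.
    return std::string(&buf[0], length);
}
```

For example, format_id_vector(42) returns "id-42". The price is the extra copy out of the vector, which is exactly what writing into the string directly would avoid.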

Or can I? When I do this:

std::string str(length, '\0');

The string will allocate length+1 bytes, with the last filled in by the null-terminator. That's good. But when I pass this off to the C API, it's going to write length+1 characters. It's going to overwrite the null-terminator.

Admittedly, it's going to overwrite the null-terminator with a null character. Odds are good that this will work (indeed, I can't imagine how it couldn't work).

But I don't care about what "works". I want to know whether, according to the spec, it's OK to overwrite the null terminator with a null character.
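For concreteness, here is the pattern in question. As before, std::snprintf is assumed as a stand-in for the C API, and the helper name is hypothetical. The string holds exactly length characters, but the call is told the buffer has length + 1 bytes, so its final '\0' lands on the string's own terminator. Whether that last write is permitted is exactly the point at issue, so treat this as an illustration rather than a recommendation.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper illustrating the question's approach.
std::string format_id_in_place(int id) {
    int needed = std::snprintf(nullptr, 0, "id-%d", id);
    if (needed < 0) {
        return std::string();  // encoding error
    }
    std::size_t length = static_cast<std::size_t>(needed);
    // size() == length; the string's own terminator sits at index length.
    std::string str(length, '\0');
    // The API writes length characters and then '\0' at index length,
    // i.e. it overwrites the string's null terminator with a null.
    std::snprintf(&str[0], length + 1, "id-%d", id);
    return str;
}
```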

Right, but @NicolBolas's question is not "does it cause a problem", but "does the spec allow for it". – nneonneo, Commented Oct 5, 2012 at 6:22
@texasbruce In other words, who cares what the spec allows, if it works on your system use it? Luckily, not everyone has that attitude. – user743382, Commented Oct 5, 2012 at 7:02
It should be possible to avoid the problem by using a std::string with an extra null character at the end, i.e. with size length + 1, unless I'm missing something? – user743382, Commented Oct 5, 2012 at 7:06
@texasbruce That is utterly irrelevant. The point is that nothing in the standard guarantees that the null termination is at a writeable memory location at all. It’s entirely possible (if unlikely) that it’s in read-only memory, for instance. Then any attempt to write to it will crash the program. Any competent C programmer will tell you that you are stark raving mad if you attempt to write portable programs that ignore these effects. It is not “perfectly normal” at all. – Konrad Rudolph, Commented Oct 5, 2012 at 8:31
@FrerichRaabe Agreed. But that’s a completely different discussion. And even then it doesn’t pay to ignore what the spec says: you may still consciously decide to break the spec – but you should know it first. – Konrad Rudolph, Commented Oct 5, 2012 at 8:44

Xeo · Accepted Answer · 2012-10-05 06:44:31Z

25

Unfortunately, this is UB, if I interpret the wording correctly (in any case, it's not allowed):

§21.4.5 [string.access] p2

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

(It is an editorial error that this says T rather than charT.)

.data() and .c_str() are specified in terms of operator[] (§21.4.7.1 [string.accessors] p1):

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
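The guarantee quoted above can be checked directly. This is a minimal sketch with a hypothetical function name; it only reads through the pointer, so it says nothing about whether writing to the terminator is allowed.

```cpp
#include <cstddef>
#include <string>

// Returns true if c_str() + i == &s[i] for every i in [0, size()] and the
// element at index size() holds charT() (here '\0').
bool pointer_matches_operator_index(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        // s is const here, so this calls the const operator[], which
        // accepts i == size() and returns a reference to the terminator.
        if (p + i != &s[i]) {
            return false;
        }
    }
    return p[s.size()] == '\0';
}
```

With a conforming C++11 library this should return true for any string, including the empty one.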

answered Oct 5, 2012 at 6:44

Xeo

15 Comments

Michael Anderson Over a year ago

Does writing '\0' into something which is already '\0' actually count as modifying it?

Jonathan Wakely Over a year ago

@MichaelAnderson, yes, definitely. It writes to memory.

Christian Rau Over a year ago

@KonradRudolph Well, data and c_str return const pointers, so they are out of the question for modifying anyway. And from 21.4.5 p2 it doesn't really follow that *(&str[0] + str.size()) is even allowed, since [] is only equal to *(begin()+pos) for pos < size(). I think an implementation is perfectly allowed to hold the string data in an array of exactly size() characters, together with an additional static const charT member for the null (of course, this means it would have to maintain an additional buffer to return from data and c_str, but why not?).

Nicol Bolas Over a year ago

@KonradRudolph: It has to. The buffer is contiguous. Therefore, &str[str.size() - 1] == &str[str.size()] - 1 must be true (assuming length() is at least 1). If it weren't, then the buffer wouldn't be contiguous.

Konrad Rudolph Over a year ago

@NicolBolas No. size() is an invalid argument for operator[], so nothing guarantees that its return value will point into the buffer. For instance (far-fetched), operator[] could contain the following logic: static CharT terminator{}; if (index == size()) return terminator; else return _data[index];.
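To make that far-fetched scenario concrete, here is a hypothetical toy class (not std::string and not any real library) whose operator[] follows the logic in the comment above. Because the terminator is a single static object, &s[s.size()] is not adjacent to the character data at all; it is even shared between instances.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical toy string, NOT a conforming std::string implementation:
// the character buffer holds no terminator, and operator[](size())
// returns one shared static terminator instead.
class odd_string {
public:
    explicit odd_string(const char* s) : data_(s, s + std::strlen(s)) {}

    std::size_t size() const { return data_.size(); }

    char& operator[](std::size_t index) {
        static char terminator{};  // value-initialized to '\0'
        if (index == size()) {
            return terminator;
        }
        return data_[index];
    }

private:
    std::vector<char> data_;
};
```

With odd_string a("abc"), b("hello");, the expression &a[a.size()] == &b[b.size()] is true, so neither "terminator" lives in its string's buffer. For std::string, the data()/c_str() requirement quoted earlier (p + i == &operator[](i) for each i in [0, size()]) is the wording that appears to rule such a layout out.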


T.C. · Accepted Answer · 2016-12-24 04:54:52Z

14

LWG 2475 made this valid by editing the specification of operator[](size()); the inserted text is the clause "where modifying the object to any value other than charT() leads to undefined behavior":

Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object to any value other than charT() leads to undefined behavior.
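Under this wording (which, as Mr.C64's later edit notes, applies since C++17), the write below is allowed. A minimal sketch; the helper name is hypothetical.

```cpp
#include <string>

// Hypothetical helper: assigns charT() over the terminator via operator[].
// With the LWG 2475 wording this is permitted; assigning any other value
// (e.g. s[s.size()] = 'x') is undefined behavior.
void rewrite_terminator(std::string& s) {
    s[s.size()] = '\0';  // same value as before: the only permitted write
}
```

After rewrite_terminator(s) on s = "abc", s still compares equal to "abc" and s.size() is still 3. This covers access through operator[]; whether the same holds for a pointer obtained from &s[0] is what the following comments discuss.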

answered Dec 24, 2016 at 4:54

T.C.

8 Comments

M.M Over a year ago

I don't see how this resolves the situation: it means you can write s[s.size()] = '\0';, but it is still not defined (in C++14) that &s[s.size() - 1] + 1 is dereferenceable (let alone yields a null terminator), so by extension an algorithm that starts at &s[0] and increments a char * still can't read or write the null terminator. The section [string.require]/4 only defines this pointer arithmetic for the index being strictly less than the size.

T.C. Over a year ago

@M.M. That's hidden in the specification of data().

M.M Over a year ago

I agree that if data() is used as the source of the pointer then it's all good; however, I don't see where &s[0] gives the same guarantee. Specifically, I don't see anything preventing the implementation from deferring writing the null terminator until data() or c_str() is actually called (and either writing it then or using a dummy for the &s[s.size()] case).

T.C. Over a year ago

@M.M Then file an LWG issue. The intent here is pretty clear, so if the wording doesn't match it's in the "obvious defect" category.

Ben Voigt Over a year ago

I don't know why LWG 2475 is being referred to in the past tense; the status report shows that it is not yet resolved.


Mr.C64 · Accepted Answer · 2024-02-27 12:43:46Z

10

According to the spec, overwriting the terminating NUL is undefined behavior. So the right thing to do is to allocate length+1 characters in the string, pass the string buffer to the C API, and then resize() back to length:

// "+ 1" to make room for the terminating NUL for the C API
std::string str(length + 1, '\0');

// Call the C API passing &str[0] to safely write to the string buffer
...

// Resize back to length
str.resize(length);
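Here is a self-contained version of the same approach. It assumes std::snprintf as a stand-in for the C API and uses a hypothetical function name. Because the string is one character longer than the text, the API's '\0' lands on an ordinary element, and the string's real terminator is never touched.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds "id-<n>" by letting a C-style API write
// straight into the string's buffer.
std::string format_id(int id) {
    int needed = std::snprintf(nullptr, 0, "id-%d", id);
    if (needed < 0) {
        return std::string();  // encoding error
    }
    std::size_t length = static_cast<std::size_t>(needed);
    // "+ 1" to make room for the terminating NUL written by the C API.
    std::string str(length + 1, '\0');
    // Writes length characters, then '\0' at index length (an ordinary
    // element, since size() == length + 1).
    std::snprintf(&str[0], str.size(), "id-%d", id);
    // Resize back to length, dropping the API's '\0'.
    str.resize(length);
    return str;
}
```

For example, format_id(42) returns "id-42" with size() == 5.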

(FWIW, I tried the "overwriting NUL" approach on MSVC10, and it works fine.)

EDIT 2024-FEB-27: Since 2012 (the year in which this question was originally asked and answered here) the C++ standard has been modified, and since C++17 it's legal to overwrite std::string's NUL terminator with another NUL.

edited Feb 27, 2024 at 12:43

answered Oct 5, 2012 at 8:36

Mr.C64

3 Comments

Konrad Rudolph Over a year ago

I’d go with this solution as well. But it’s unsatisfying that this requires a totally needless allocation of an extra character. Why didn’t the spec just make the null termination writeable?

Martin James Over a year ago

Because that would mean more typing? 'You must not overwrite the null, except with another null'. Perhaps their fingers were getting tired, or it was the end of the day and the bars were open.

Mr.C64 Over a year ago

@KonradRudolph: I agree that the standard should be changed, making it possible to overwrite a NUL with another NUL. I see no reason why it shouldn't be possible or should trigger undefined behavior, and I don't like the needless allocation of an extra character either.

Windows programmer · Accepted Answer · 2012-10-05 06:33:16Z

5

I suppose n3092 isn't current anymore, but that's what I have. Section 21.4.5 allows access to a single element and requires pos <= size(). If pos < size(), you get the actual element; otherwise (i.e. if pos == size()) you get a non-modifiable reference.

I think that as far as the programming language is concerned, a kind of access which could modify the value is considered a modification even if the new value is the same as the old value.

Does g++ have a pedantic library that you can link to?

answered Oct 5, 2012 at 6:33

Windows programmer

1 Comment

Jonathan Wakely Over a year ago

libstdc++ has a debug mode but there are limits to what it will diagnose. It validates iterator operations, but can't notice writes of individual bytes through pointers.