12

I have a std::string and a function that takes a char* and writes into it. Since std::string::c_str() and std::string::data() return const char*, I can't use them. So far I have been allocating a temporary buffer, calling the function with it, and then copying the result into the std::string.

Now I plan to work with large amounts of data, where copying this buffer will have a noticeable impact, and I want to avoid it.

Some people suggested using &str.front() or &str[0], but does that invoke undefined behavior?

  • "C++17 added non-const data() to std::string but it still says that you can't modify the buffer." Huh? Where does it say that? Commented Aug 29, 2016 at 7:38
  • Before accessing &str[0] or &str.front(), make sure the string actually has memory, i.e. that str.size() > 0. If you have only just instantiated str, call str.resize() first. Commented Aug 29, 2016 at 7:47
  • @ildjam It looks like I totally misread that paper. .data() should just work. Commented Aug 29, 2016 at 7:49
  • See also: How to convert a std::string to const char* or char* Commented Jun 3, 2022 at 6:01

4 Answers

26

C++98/03

Impossible. A string may be implemented as copy-on-write, so it needs to intercept all reads and writes.

C++11/14

In [string.require]:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

So &str.front() and &str[0] should work.
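A minimal sketch of what that looks like in C++11, assuming a made-up C-style function write_greeting that stands in for whatever API writes into a char* (it is not from the question). The string is sized first, so that &s[0] refers to characters the string really owns:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API: writes at most `capacity`
// characters into `out` (no terminator) and returns how many it wrote.
std::size_t write_greeting(char* out, std::size_t capacity) {
    static const char text[] = "hello";
    const std::size_t len = sizeof(text) - 1;            // exclude the '\0'
    const std::size_t n = len < capacity ? len : capacity;
    std::memcpy(out, text, n);
    return n;
}

std::string greeting_via_index() {
    std::string s(16, '\0');          // size, not just capacity, must cover the write
    assert(&s[0] == &s.front());      // both expressions name the first character
    const std::size_t written = write_greeting(&s[0], s.size());
    s.resize(written);                // drop the unused tail
    return s;
}
```

The function only ever writes inside [0, size()), so the terminator at s[size()] is left alone.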

C++17

str.data(), &str.front() and &str[0] work.

Here it says:

charT* data() noexcept;

Returns: A pointer p such that p + i == &operator[](i) for each i in [0, size()].

Complexity: Constant time.

Requires: The program shall not alter the value stored at p + size().

The non-const .data() just works.
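As a rough illustration, here is a sketch that writes through the non-const data() when the compiler reports C++17 or later, and falls back to &s[0] otherwise. The helper name copy_through_data is made up for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `src` into a string through a mutable pointer.
std::string copy_through_data(const char* src) {
    const std::size_t n = std::strlen(src);
    std::string s(n, '\0');           // make the string own n characters first
#if __cplusplus >= 201703L
    char* p = s.data();               // non-const overload, C++17 and later
#else
    char* p = &s[0];                  // equivalent pointer before C++17
#endif
    std::memcpy(p, src, n);           // touches [p, p + n) only, never p[size()]
    return s;
}
```

Note that the write stops before p + size(), which is the one position the quoted wording says must not be altered.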

The recent draft has the following wording for .front():

const charT& front() const;

charT& front();

Requires: !empty().

Effects: Equivalent to operator[](0).

And the following for operator[]:

const_reference operator[](size_type pos) const;

reference operator[](size_type pos);

Requires: pos <= size().

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

Throws: Nothing.

Complexity: Constant time.

So it uses iterator arithmetic, which means we need to inspect the information about iterators. Here it says:

3 A basic_string is a contiguous container ([container.requirements.general]).

So we need to go here:

A contiguous container is a container that supports random access iterators ([random.access.iterators]) and whose member types iterator and const_iterator are contiguous iterators ([iterator.requirements.general]).

Then here:

Iterators that further satisfy the requirement that, for integral values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous iterators.

Apparently, contiguous iterators are a C++17 feature that was added in these papers.

The requirement can be rewritten as:

assert(*(a + n) == *(&*a + n));

So the right-hand side dereferences the iterator, takes the address of the value it points to, does pointer arithmetic on that address and dereferences the result, and this gives the same value as advancing the iterator and then dereferencing it. This means that a contiguous iterator points into memory where each value is stored right after the previous one, hence contiguous. Since functions that take char* expect contiguous memory, you can pass the result of &str.front() or &str[0] to these functions.
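To make the identity concrete, here is a small check you could run yourself. It compares addresses rather than values, which is the stronger form quoted from [string.require] above. The function name is_laid_out_contiguously is just an illustrative choice:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if every element reached through the iterator sits at the
// address obtained by plain pointer arithmetic from the first element.
bool is_laid_out_contiguously(const std::string& s) {
    if (s.empty()) return true;       // nothing to dereference
    const char* first = &*s.begin();
    for (std::string::size_type n = 0; n < s.size(); ++n) {
        const std::string::const_iterator it =
            s.begin() + static_cast<std::string::difference_type>(n);
        if (&*it != first + n) return false;
    }
    return true;
}
```

On a conforming C++11 (or later) implementation this should hold for any string, short or long, so something like assert(is_laid_out_contiguously(s)) is only a sanity check, not a proof.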


10 Comments

Good answer. Just some additions: std::string is a contiguous memory container only since C++11, so you can safely use &str[0] since C++11 only (before that, it's standard library implementation specific). Before C++11 you have to use a temporary std::vector and copy it to a std::string, or use it to instantiate a std::string.
Given that the definition of operator[] says "where modifying the object leads to undefined behavior." I think your answer is wrong. I think this is a defect in the standard.
@Garf365: In practise though, there was no standard library that didn't offer contiguous memory for std::string (since C++98), so it has always been safe, and C++11 says that it always will be safe.
@MartinBonner it says that only for pos == size(). In other words, you are not supposed to touch the terminating symbol. Everything before it is fair game.
I think I have misread the definition of operator[]. The undefined behaviour only applies in the case where pos is not <size() (which given the "requires" section, means pos == size()).
3

You can simply use &s[0] for a non-empty string. This gives you a pointer to the start of the buffer.

When you use it to put a string of n characters there, the string's length (not just the capacity) needs to be at least n beforehand, because there's no way to adjust it up without clobbering the data.

I.e., usage can go like this:

auto foo( int const n )
    -> string
{
    if( n <= 0 ) { return ""; }

    string result( n, '#' );   // # is an arbitrary fill character.
    int const n_stored = some_api_function( &result[0], n );
    assert( n_stored <= n );
    result.resize( n_stored );
    return result;
}
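For a version of the same pattern that can be compiled as-is, here is a sketch using std::istream::read as the function that takes a char* and writes into it. This is a swapped-in example, not the some_api_function above, and the name read_into_string is made up:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads up to `max_len` characters from `in` straight into a std::string,
// without going through a temporary buffer.
std::string read_into_string(std::istream& in, std::size_t max_len) {
    if (max_len == 0) return std::string();

    std::string result(max_len, '\0');                    // length, not just capacity
    in.read(&result[0], static_cast<std::streamsize>(max_len));
    const std::streamsize got = in.gcount();              // characters actually stored
    assert(got >= 0 && static_cast<std::size_t>(got) <= max_len);
    result.resize(static_cast<std::size_t>(got));         // shrink to what was read
    return result;
}
```

The shape is the same as in foo: size the string up front, let the API write into &result[0], then resize down to the count it reports.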

This approach has worked formally since C++11. Before that, in C++98 and C++03, the buffer was not formally guaranteed to be contiguous. In practice, however, the approach has worked since C++98, the first standard. The contiguous buffer requirement could be adopted in C++11 (it was added at the Lillehammer meeting, I think that was 2005) because there were no extant standard library implementations with a non-contiguous string buffer.


Regarding

C++17 added non-const data() to std::string but it still says that you can't modify the buffer.

I'm not aware of any such wording, and since that would defeat the purpose of non-const data() I doubt that this statement is correct.


Regarding

Now I plan to work with big amount of information and copying this buffer will have a noticeable impact and I want to avoid it.

If copying the buffer has a noticeable impact, then you'd want to avoid inadvertently copying the std::string.

One way is to wrap it in a class that's not copyable.
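A minimal sketch of such a wrapper, assuming C++11. The class name UniqueBuffer and its members are invented for illustration. Copying is deleted, moving stays available:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical wrapper: owns a std::string that can be moved but not copied.
class UniqueBuffer {
public:
    explicit UniqueBuffer(std::size_t n) : data_(n, '\0') {}

    UniqueBuffer(const UniqueBuffer&) = delete;             // no accidental copies
    UniqueBuffer& operator=(const UniqueBuffer&) = delete;
    UniqueBuffer(UniqueBuffer&&) = default;                 // moves are still allowed
    UniqueBuffer& operator=(UniqueBuffer&&) = default;

    char* data() { return &data_[0]; }                      // writable for size() chars
    std::size_t size() const { return data_.size(); }
    const std::string& str() const { return data_; }

private:
    std::string data_;
};

static_assert(!std::is_copy_constructible<UniqueBuffer>::value,
              "copies are rejected at compile time");
static_assert(std::is_move_constructible<UniqueBuffer>::value,
              "moves remain available");
```

With this, passing a UniqueBuffer by value without std::move fails to compile, so an unintended copy shows up as an error rather than as a slowdown.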


0

I don't know what you intend to do with that string, but if
all you need is a buffer of chars which frees its own memory automatically,
then I usually use vector<char> or vector<int> or whatever type
of buffer you need.

With v being a non-empty vector, it's guaranteed that &v[0] points to
contiguous memory which you can use as a buffer.
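Roughly like this, where fill_digits is a made-up stand-in for the function that writes into the buffer. Note that turning the vector into a std::string at the end still costs one copy, which is exactly what the question wants to avoid, so this fits best when a std::string is not needed afterwards:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an API that fills a caller-supplied buffer
// and returns the number of characters written.
std::size_t fill_digits(char* out, std::size_t capacity) {
    for (std::size_t i = 0; i < capacity; ++i) {
        out[i] = static_cast<char>('0' + i % 10);
    }
    return capacity;
}

std::string digits_via_vector(std::size_t n) {
    std::vector<char> buffer(n);                  // frees itself automatically
    const std::size_t written =
        buffer.empty() ? 0 : fill_digits(&buffer[0], buffer.size());
    assert(written <= buffer.size());
    // Building the string copies the characters once.
    return std::string(buffer.begin(),
                       buffer.begin() + static_cast<std::ptrdiff_t>(written));
}
```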

1 Comment

This is also true for std::string since C++11. Before that, as MartinBonner said in a comment on another answer, the behavior was not standard, but most standard library implementations stored std::string in contiguous memory anyway.
0

Note: if you consider &string.front() to be the same as &string[0], then the following is a redundant answer:

According to cplusplus: In C++98, you shouldn't write to .data() or .c_str(); they are to be treated as read-only/const:

A program shall not alter any of the characters in this sequence.

In C++11 this warning was removed, but the return values are still const, so officially it isn't allowed in C++11 either. To avoid undefined behavior, you can use string::front(), for which:

If the string object is const-qualified, the function returns a const char&. Otherwise, it returns a char&.

So if your string isn't const, then you are officially allowed to manipulate the contents returned by string::front(), which is a reference to the first element of the buffer. But the link doesn't mention which C++ standard this applies to. I assume C++11 and later.

Also, it returns a reference to the first element, not a pointer, so you'll need to take its address. It's not clear whether you are officially allowed to use that as a char* for the whole buffer, but in combination with the other answers, I'm sure it's safe. At least it doesn't produce any compiler warnings.
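To see the two overloads in action, here is a small C++11 sketch. The static_asserts check the return types quoted above, and capitalize_first (a made-up name, and it assumes an ASCII-compatible character set) writes through the address of front():

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// front() on a non-const string yields char&, on a const string const char&.
static_assert(std::is_same<decltype(std::declval<std::string&>().front()),
                           char&>::value,
              "non-const front() returns a mutable reference");
static_assert(std::is_same<decltype(std::declval<const std::string&>().front()),
                           const char&>::value,
              "const front() returns a read-only reference");

// Upper-cases the first character in place (assumes ASCII letters).
void capitalize_first(std::string& s) {
    if (s.empty()) return;            // front() requires a non-empty string
    char* p = &s.front();             // address of the first character
    if (*p >= 'a' && *p <= 'z') {
        *p = static_cast<char>(*p - 'a' + 'A');
    }
}
```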

