std::string vs. byte buffer (difference in C++)

Question

I have a project where I transfer data between a client and a server using boost.asio sockets. Once one side of the connection receives data, it converts it into a std::vector of std::strings, which is then passed on to the actual recipient object of the data via previously defined "callback" functions. This works fine so far, but at this point I am using functions like atoi() and to_string to convert data types other than strings into a sendable format and back. That is of course a bit wasteful in terms of network usage (especially when transferring larger amounts of data than just single ints and floats). Therefore I'd like to serialize and deserialize the data. Since any serialization method will effectively produce a byte array or buffer, it would be convenient for me to just use std::string for it. Is there any disadvantage to doing that? I don't see why there should be one, since strings should be nothing more than byte arrays.
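
To make the size difference concrete, here is a minimal sketch that is not part of the original question. The helper names encode_as_text, encode_as_bytes and decode_from_bytes are hypothetical, and the binary variant assumes both peers use the same byte order and the same 32-bit integer type.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: text encoding, roughly what to_string/atoi amounts to.
std::string encode_as_text(std::int32_t value) {
    return std::to_string(value);
}

// Hypothetical helper: copies the raw bytes of the value into a std::string.
// Assumption: sender and receiver share the same byte order.
std::string encode_as_bytes(std::int32_t value) {
    std::string out(sizeof value, '\0');
    std::memcpy(&out[0], &value, sizeof value);
    return out;
}

// Hypothetical helper: reverses encode_as_bytes; returns 0 if the input is too short.
std::int32_t decode_from_bytes(const std::string& in) {
    std::int32_t value = 0;
    if (in.size() >= sizeof value) {
        std::memcpy(&value, in.data(), sizeof value);
    }
    return value;
}
```

For the value -2000000000 the text form takes 11 characters while the binary form always takes 4 bytes. One thing to watch for with std::string as a byte buffer: the binary form can contain zero bytes, so it has to be handled through size() and data(). Passing it through an interface that takes a plain const char* would cut it off at the first zero byte.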

"Is there any disadvantage to doing that?" No. Maybe a std::vector<uint8_t> might be semantically clearer.
– πάντα ῥεῖ, Commented May 24, 2017 at 18:41
std::string pretty much has to null-terminate its buffer as far as I can tell, whereas std::vector<char> wouldn't have to. Probably not enough of a performance impact to worry about, though, compared to the extra functionality std::string makes available.
– Daniel Schepler, Commented May 24, 2017 at 18:41
@DanielSchepler I thought std::string isn't null-terminated, and only string::c_str and string::data give you a null-terminated sequence.
– Passer By, Commented May 25, 2017 at 3:28
But string::c_str is documented to be constant-time, at least at cppreference.com, and I don't see how you would achieve that other than by keeping a null terminator after the string data.
– Daniel Schepler, Commented May 25, 2017 at 5:41
@DanielSchepler std::string is not necessarily null-terminated within the user data it manages in the index range [0, size()). However, the specification guarantees a hidden null terminator at index size(), so that .c_str() always returns a null-terminated C string in constant time.
– Константин Ван, Commented Mar 3, 2024 at 6:51
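
The point made in the comments above can be checked with a small sketch, assuming C++11 or later. The function names are hypothetical and only serve as illustration.

```cpp
#include <string>

// True if the character one past the last element is '\0'.
// Since C++11, c_str() points to size() + 1 characters, the last being '\0'.
bool has_hidden_terminator(const std::string& s) {
    return s.c_str()[s.size()] == '\0';
}

// Builds a string that carries embedded zero bytes as ordinary data.
std::string make_binary_string() {
    const char raw[] = {'a', '\0', 'b', '\0'};
    // The (pointer, length) constructor keeps the zero bytes.
    return std::string(raw, sizeof raw);
}
```

Here make_binary_string() yields a string of size 4 whose zero bytes count as content, and has_hidden_terminator() still holds for it, because the terminator sits at index size(), outside the managed range.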

Xirema · Accepted Answer · 2017-05-24 18:53:42Z

7

In terms of functionality, there's no real difference.

For reasons of both performance and code clarity, however, I would recommend using std::vector<uint8_t> instead, as it makes it far clearer to anyone maintaining the code that it's a sequence of bytes, not a string.
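
As a rough illustration of what that could look like (not taken from the answer itself), the sketch below uses a hypothetical ByteBuffer alias and two hypothetical helpers, append_u32 and read_u32. It assumes 32-bit values are sent in big-endian order, which is a common convention but still a choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using ByteBuffer = std::vector<std::uint8_t>;  // hypothetical alias

// Appends value in big-endian (network) byte order.
void append_u32(ByteBuffer& buf, std::uint32_t value) {
    buf.push_back(static_cast<std::uint8_t>(value >> 24));
    buf.push_back(static_cast<std::uint8_t>(value >> 16));
    buf.push_back(static_cast<std::uint8_t>(value >> 8));
    buf.push_back(static_cast<std::uint8_t>(value));
}

// Reads a big-endian 32-bit value at offset.
// Returns false (and leaves out untouched) if fewer than 4 bytes are available.
bool read_u32(const ByteBuffer& buf, std::size_t offset, std::uint32_t& out) {
    if (offset > buf.size() || buf.size() - offset < 4) {
        return false;
    }
    out = (std::uint32_t{buf[offset]} << 24) |
          (std::uint32_t{buf[offset + 1]} << 16) |
          (std::uint32_t{buf[offset + 2]} << 8) |
          std::uint32_t{buf[offset + 3]};
    return true;
}
```

Writing the bytes out by shifting, rather than copying the in-memory representation, keeps the wire format independent of the host's byte order.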

answered May 24, 2017 at 18:53

Slava · Accepted Answer · 2017-05-24 19:52:29Z

4

You should use std::string when you work with strings; when you work with a binary blob, you are better off with std::vector<uint8_t>. There are several benefits:

your intention is clear, so the code is less error-prone
you cannot pass your binary buffer by mistake to a function that expects a std::string
you can overload operator<< for std::ostream for this type to print the blob in a proper format (usually a hex dump). It is very unlikely that you would want to print a binary blob as a string.

There could be more. The only benefit of std::string that I can see is that you do not need a typedef.
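
The hex-dump idea can be sketched as follows. Note the swap: instead of a plain typedef of std::vector<uint8_t>, this sketch uses a small hypothetical wrapper struct named Blob, because an operator<< written for a bare alias of a standard container is not found through argument-dependent lookup. The output format (lowercase, space-separated) and the to_hex helper are likewise assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: a distinct type, so it cannot be passed where a
// std::string is expected without an explicit conversion.
struct Blob {
    std::vector<std::uint8_t> bytes;
};

// Prints the blob as space-separated, two-digit lowercase hex values
// and restores the stream's formatting state afterwards.
std::ostream& operator<<(std::ostream& os, const Blob& blob) {
    const std::ios_base::fmtflags saved_flags = os.flags();
    const char saved_fill = os.fill('0');
    os << std::hex << std::nouppercase;
    for (std::size_t i = 0; i < blob.bytes.size(); ++i) {
        if (i != 0) {
            os << ' ';
        }
        os << std::setw(2) << static_cast<unsigned>(blob.bytes[i]);
    }
    os.fill(saved_fill);
    os.flags(saved_flags);
    return os;
}

// Convenience helper: renders the hex dump into a std::string.
std::string to_hex(const Blob& blob) {
    std::ostringstream oss;
    oss << blob;
    return oss.str();
}
```

With this, a Blob holding the bytes 0x00, 0xff, 0x0a prints as "00 ff 0a", and a call such as some_function_taking_string(blob) simply does not compile.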

answered May 24, 2017 at 19:52

Wagner Patriota · Accepted Answer · 2017-05-27 00:41:03Z

1

You're right. Strings are nothing more than byte arrays. std::string is just a convenient way to manage the buffer array that represents the string. That's it!

There's no disadvantage of using std::string unless you are working on something REALLY REALLY performance critical, like a kernel, for example... then working with std::string would have a considerable overhead. Besides that, feel free to use it.

--

A std::string needs to do a bunch of checks behind the scenes on the state of the string in order to decide whether it uses the small-string optimization or not. Today pretty much all standard library implementations provide a small-string optimization. They all use different techniques, but basically the string needs to test bit flags that tell whether the characters are stored inline in the string object or on the heap. This overhead doesn't exist if you use char[] directly. But again, unless you are working on something REALLY critical, like a kernel, you won't notice anything, and std::string is much more convenient.

Again, this is just ONE of the things that happen under the hood, given as an example of the difference between them.
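
Whether a given string currently uses the inline buffer can be probed with a small sketch like the one below. It is an implementation-dependent diagnostic, not something the standard provides, and the function name stored_inline is hypothetical. It assumes std::uintptr_t is available and that comparing addresses as integers is meaningful on the platform.

```cpp
#include <cstdint>
#include <string>

// Returns true if the character data lives inside the std::string object
// itself (small-string optimization), false if it lives elsewhere.
// Implementation-dependent probe; the standard does not mandate either layout.
bool stored_inline(const std::string& s) {
    const auto obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto obj_end = obj_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj_begin && data < obj_end;
}
```

A string of 1000 characters cannot fit inside the object, so stored_inline reports false for it. For short strings the result depends on the standard library, and the inline capacity differs between implementations. By contrast, a std::vector<uint8_t> has no such inline buffer, so it typically allocates as soon as it holds any element.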

edited May 27, 2017 at 0:41

answered May 24, 2017 at 18:40

6 Comments

Wagner Patriota Over a year ago

Yes, if you use std::string at the kernel level, the overhead is very considerable. Here is an example, but there are many more out there: stackoverflow.com/questions/21946447/…

Xirema Over a year ago

@Ðаn I don't personally know the details, but there is a small amount of extra overhead in std::string because it has several constraints it needs to conform to, including but not limited to the fact that it needs to always have an extra byte allocated to null-terminate the string. At the same time though, std::string objects can be subject to "Small String Optimizations", which can improve the memory footprint. The critical point to take away is that std::string can do things under-the-hood that you might not expect.

Wagner Patriota Over a year ago

@Xirema, about the null-terminating char: both a C string and std::string have one, so this is not the issue. The overhead is associated with the code necessary to construct and destroy the string. For example, it needs to handle the small-string-optimization case, etc., and this makes std::string a little heavy! I will update the answer with details.

Xirema Over a year ago

@WagnerPatriota We're not comparing std::string and "C-Strings" though, we're comparing std::string and char[] or std::vector<char>. char[] and std::vector<char> do not allocate and manage the null terminating character automatically; it needs to be manually added by the user (or, more likely, ignored, since no good String use depends on it).

Xirema Over a year ago

@Ðаn Well, again, I don't know all the details. I only know that there are various things that affect std::string that std::vector is happy to ignore, that have impacts on performance.

forsamori · Accepted Answer · 2017-05-24 19:47:08Z

-2

Depending on how often you're firing network messages, std::string should be fine. It's a convenience class that handles a lot of char work for you. If you have a lot of data to push, though, it might be worth using a char array directly and converting it to bytes, just to minimise the extra overhead std::string has.

Edit: if someone could comment and point out why you think my answer is bad, that'd be great and help me learn too.

edited May 24, 2017 at 19:47

answered May 24, 2017 at 18:48
