I have a project where I transfer data between client and server using boost.asio sockets. Once one side of the connection receives data, it converts it into a std::vector of std::strings, which is then passed on to the actual recipient object via previously defined "callback" functions. This works fine so far, except that I currently use functions like atoi() and to_string to convert data types other than strings into a sendable format and back. This is of course a bit wasteful in terms of network usage (especially when transferring larger amounts of data than just single ints and floats). Therefore I'd like to serialize and deserialize the data. Since, effectively, any serialisation method will produce a byte array or buffer, it would be convenient for me to just use std::string instead. Is there any disadvantage to doing that? I don't see why there should be one, since strings should be nothing more than byte arrays.
4 Answers
You should use std::string when you work with strings; when you work with a binary blob, you are better off with std::vector<uint8_t>. There are many benefits:
your intention is clear, so the code is less error prone
you would not pass your binary buffer as a string to a function that expects std::string by mistake
you can overload operator<< on std::ostream for this type to print the blob in a proper format (usually a hex dump). It is very unlikely that you would want to print a binary blob as a string.
There could be more. The only benefit of std::string that I can see is that you do not need a typedef.
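A minimal sketch of the hex-dump idea above. The names Blob and to_hex are made up for illustration, and the wrapper struct is an assumption on my part: it avoids overloading operator<< directly for a std::vector typedef, which the answer does not go into.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper type: a distinct name keeps binary payloads from being
// passed by accident to functions that expect text.
struct Blob {
    std::vector<std::uint8_t> bytes;
};

// Print the blob as space-separated, two-digit, lowercase hex values.
std::ostream& operator<<(std::ostream& os, const Blob& blob) {
    const std::ios_base::fmtflags saved_flags = os.flags();
    const char saved_fill = os.fill('0');
    os << std::hex;
    for (std::size_t i = 0; i < blob.bytes.size(); ++i) {
        if (i != 0) os << ' ';
        // Cast to int: a uint8_t would otherwise be printed as a character.
        os << std::setw(2) << static_cast<int>(blob.bytes[i]);
    }
    // Restore the stream state so later output is unaffected.
    os.fill(saved_fill);
    os.flags(saved_flags);
    return os;
}

// Demonstration helper: render the hex dump into a std::string.
std::string to_hex(const Blob& blob) {
    std::ostringstream out;
    out << blob;
    return out.str();
}
```

With this in place, streaming a Blob that holds the bytes 0x00, 0xff and 0x41 writes them as two-digit hex values instead of raw characters.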
Comments
You're right. Strings are nothing more than byte arrays. std::string is just a convenient way to manage the buffer array that represents the string. That's it!
There's no disadvantage to using std::string unless you are working on something REALLY REALLY performance critical, like a kernel, for example. In that case working with std::string would add considerable overhead. Besides that, feel free to use it.
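As a rough sketch of what using std::string as a byte buffer could look like, the following packs a 32-bit integer into four bytes and reads it back. The function names append_u32 and read_u32 are hypothetical, and the big-endian (network byte order) layout is an assumption, not something the question specifies.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Append a 32-bit value to the buffer as 4 bytes, most significant byte first
// (network byte order), so both peers agree regardless of host endianness.
void append_u32(std::string& buffer, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        buffer.push_back(static_cast<char>((value >> shift) & 0xFFu));
    }
}

// Read a 32-bit value back from the buffer, starting at byte `offset`.
std::uint32_t read_u32(const std::string& buffer, std::size_t offset) {
    if (offset > buffer.size() || buffer.size() - offset < 4) {
        throw std::out_of_range("read_u32: not enough bytes");
    }
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        value = (value << 8) | static_cast<unsigned char>(buffer[offset + i]);
    }
    return value;
}
```

Every value takes exactly 4 bytes this way, whereas std::to_string needs up to 10 characters for a 32-bit unsigned value (though only one character for a small number such as 7). The buffer may contain zero bytes, which std::string stores without problems as long as its length is taken from size() rather than from strlen().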
--
Behind the scenes, a std::string has to check its own state in order to decide whether it is using the small-string optimization or not. Today pretty much all standard library implementations use a small-string optimization. They all use different techniques, but basically the string needs to test flag bits that tell whether the characters are stored inside the string object itself or on the heap. This overhead doesn't exist if you use a plain char[]. But again, unless you are working on something REALLY critical, like a kernel, you won't notice anything, and std::string is much more convenient.
Again, this is just ONE of the things that happen under the hood, given as an example to show the difference between them.
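To make the small-string optimization a little more concrete, here is a small sketch that checks whether a string's characters live inside the std::string object itself. The helper name stored_inline is made up, and the result for short strings depends on the standard library in use, so treat it as an experiment rather than a guarantee.

```cpp
#include <cstdint>
#include <string>

// Returns true when the character data lives inside the std::string object
// itself (small-string optimization) rather than in a separate heap block.
bool stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}
```

A string of 1000 characters cannot fit inside the object, so stored_inline reports false for it. For a short string such as "hi" the common implementations are expected to report true, but the standard does not require that.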
6 Comments
[...] std::string in the kernel level the overhead is very considerable. Here is an example... but there are many more out there: stackoverflow.com/questions/21946447/…
[...] std::string because it has several constraints it needs to conform to, including but not limited to the fact that it needs to always have an extra byte allocated to null-terminate the string. At the same time though, std::string objects can be subject to "Small String Optimizations", which can improve the memory footprint. The critical point to take away is that std::string can do things under-the-hood that you might not expect.
[...] std::string have. So this is not the issue. The overhead is associated with the code necessary to construct and delete the string. For example, it needs to handle the case for small string optimizations, etc... and this makes the std::string a little heavy! I will update the answer with details.
[...] std::string and "C-Strings" though, we're comparing std::string and char[] or std::vector<char>. char[] and std::vector<char> do not allocate and manage the null terminating character automatically; it needs to be manually added by the user (or, more likely, ignored, since no good String use depends on it).
[...] std::string that std::vector is happy to ignore, that have impacts on performance.
Depending on how often you're firing network messages, std::string should be fine. It's a convenience class that handles a lot of char work for you. If you have a lot of data to push though, it might be worth using a char array directly and converting it to bytes, just to minimise the extra overhead std::string has.
Edit: if someone could comment and point out why you think my answer is bad, that'd be great and help me learn too.
std::vector<uint8_t> might be semantically clearer.
std::string pretty much has to null-terminate its buffer as far as I can tell, whereas std::vector<char> wouldn't have to. Probably not enough of a performance impact to worry about, though, compared to the extra functionality std::string makes available.
std::string isn't null terminated, only string::c_str and string::data gives you a null terminated sequence
string::c_str is documented to be constant-time at least at cppreference.com, and I don't see how you would achieve that aside from maintaining the string data with a null terminator after it.
std::string is not necessarily null-terminated within its managed user data of the index range “[0,size()).” However, the specification ensures the presence of the hidden null terminator at the index size() so that .c_str() always returns a null-terminated C string in constant time.
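The null-terminator behaviour discussed in these comments can be checked with a short sketch, assuming C++11 or later. The helper names make_payload and c_length are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Build a 5-byte payload that contains zero bytes in the middle.
std::string make_payload() {
    const char raw[] = {'a', '\0', 'b', '\0', 'c'};
    // The (pointer, length) constructor copies all bytes, including '\0'.
    return std::string(raw, sizeof raw);
}

// Length as seen by C string functions, which stop at the first '\0'.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For this payload size() is 5 while c_length() returns 1, so binary data held in a std::string should be sent using data() together with size(), never via strlen(). The character at index size() of the c_str() result is the hidden terminator, which is '\0'.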