GCC's std::string: why such a weird implementation?

Question

When I was looking at how std::string is implemented in GCC, I noticed that sizeof(std::string) is exactly equal to the size of a pointer (4 bytes in a 32-bit build and 8 bytes in a 64-bit build). Since a string must hold at least a pointer to its buffer and its length, this made me think that a std::string object in GCC is actually a pointer to some internal structure that holds this data.

As a consequence, creating a new string would require one dynamic memory allocation (even if the string is empty).

In addition to the performance overhead, this also causes memory overhead (which happens whenever we allocate a very small chunk of memory).

So I see only downsides to such a design. What am I missing? What are the upsides, and why was it implemented this way in the first place?

For myself, I usually find it a safe assumption that the compiler and standard library writers have given more thought to their designs than I have. I'd just assume there are good reasons until I had a specific problem that might be caused by this. Now, you might be asking out of curiosity. In that case I would recommend that you do more research yourself: the source code and its repository are right there :) – Magnus Hoff, May 28, 2012 at 13:33
GCC's std::string is implemented as a copy-on-write smart pointer to the actual buffer. It's open source, so you can just read it. – Jan Hudec, May 28, 2012 at 13:35
Thanks, Jan, I should have found this myself. Please post it as an answer and I will accept it. – Alex Z, May 28, 2012 at 13:38

Jonathan Wakely · Accepted Answer · 2012-05-29 00:38:08Z


Read the long comment at the top of <bits/basic_string.h>; it explains what the pointer points to, where the string length (and reference count) are stored, and why it's done that way.

However, C++11 doesn't allow a reference-counted copy-on-write std::string, so the GCC implementation will have to change. Doing so would break the ABI, so the change is being delayed until an ABI change is inevitable. We don't want to change the ABI, then have to change it again a few months later, and then again. When it changes, it should change only once, to minimise the hassle for users.


3 Comments

Kerrek SB

Any updates on this? Is the ABI change foreseeable? Can it be enabled conditionally?

Jonathan Wakely

The change might happen for GCC 4.9, and would take effect only when C++11 is used.

John Zwinck

@KerrekSB The C++11 ABI change happened in GCC 5.1: gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html (and there's a macro, _GLIBCXX_USE_CXX11_ABI, that you can define to 0 to keep the old ABI, even in today's GCC 8.x).
