Optimizations are nice, but sometimes their effects can take you by surprise,
especially if you’re not thinking about them while trying to debug an issue.
If that’s the case, they can make your head hurt for a while. This post is
about the std::string class, but it’s far from being the only string class out
there, so keep the outlined issues in mind while working with other
implementations as well.
Updating a string might do more than you expect
I don’t remember the exact code where I hit this, so here is something close that will demonstrate the issue. Can you guess what the bug is, given the hint in the section’s title?
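    // Needs <cstring> for std::strchr and <string> for std::string.
    std::string copy = original;
    const char *const start = copy.c_str();
    const char *line = start;
    while (*line != '\0') {
        const char *comma = std::strchr(line, ',');
        if (comma == nullptr) {
            process(line);
            break;
        } else {
            // Index from the start of the buffer, not from the current piece.
            copy[comma - start] = '\0';
            process(line);
            line = comma + 1;
        }
    }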
The code breaks a single string into null-terminated pieces at , and processes
them separately. Its behaviour in the debugger after the line that writes '\0'
into copy looks very weird, unless you notice that the location of copy’s
storage changes on that line and recall that std::string in GCC can have a
copy-on-write (COW) implementation (even today I don’t think that all Linux
distributions have moved to the new ABI). This means that right after that
write, line still points into original’s buffer while copy has diverged. The
first solution was probably to force a deep copy of the string right after
making the second object; later the code was changed to use string views
instead.
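If you are not sure whether your toolchain still uses COW strings, a small check along these lines can tell you. This is only a sketch: sharesBuffer and copiesAreLazy are names made up for this post, and the check relies on comparing data() pointers, which says something about the implementation rather than anything the standard promises.

```cpp
#include <string>

// Hypothetical helper: true if both strings currently point at the same buffer.
// Taking const references matters: const data() never triggers unsharing.
bool sharesBuffer(const std::string &a, const std::string &b) {
    return a.data() == b.data();
}

// Returns true if a copy shares storage with its source until the first
// write, which is the observable signature of a copy-on-write string.
bool copiesAreLazy() {
    const std::string original(64, 'x');  // long enough to live on the heap
    std::string copy = original;
    const bool sharedAfterCopy = sharesBuffer(original, copy);
    copy[0] = 'y';  // non-const access: a COW string gets its own buffer here
    return sharedAfterCopy && !sharesBuffer(original, copy);
}
```

With libstdc++ you can switch between the two implementations by defining the _GLIBCXX_USE_CXX11_ABI macro to 0 or 1 when compiling, which makes it easy to see both results on the same machine.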
This behaviour has probably contributed to COW strings being banned by the C++11 standard, although I tripped over this only once myself (or maybe I noticed it only once, who knows).
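Both fixes mentioned above can be sketched roughly like this. It is not the original code: deepCopy and splitAtCommas are hypothetical names, the snippet assumes C++17 for std::string_view, and it collects the pieces into a vector instead of calling process() on each one.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// First fix: build the copy from a pointer and a length, so that even a COW
// implementation has to allocate a separate buffer right away.
std::string deepCopy(const std::string &s) {
    return std::string(s.data(), s.size());
}

// Second fix: split at commas without writing to the buffer at all.
// The returned views point into the caller's string, which must outlive them.
std::vector<std::string_view> splitAtCommas(std::string_view text) {
    std::vector<std::string_view> pieces;
    while (!text.empty()) {
        const std::size_t comma = text.find(',');
        if (comma == std::string_view::npos) {
            pieces.push_back(text);  // last piece, no comma left
            break;
        }
        pieces.push_back(text.substr(0, comma));
        text.remove_prefix(comma + 1);  // continue after the comma
    }
    return pieces;
}
```

Since nothing is ever written through the views, there is no write that could make a COW string switch buffers behind your back.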
Moving strings around
The code in the previous section breaks only with a standard library that
doesn’t conform to C++11 (it took some time for libstdc++ maintainers to
address this because of the ABI breakage). Does it mean that a fully conforming
C++11 std::string has nothing to surprise you with? No, it doesn’t, as now
there is a different feature that can make you scratch your head. Do you expect
this function to always return true?
    // Needs <string> and <utility> (for std::move).
    bool moveStringAround(std::string &&str)
    {
        const char *line = str.c_str();
        std::string moved = std::move(str);
        return (line == moved.c_str());
    }
Obviously, I wouldn’t ask if it couldn’t return false. And it will do so only
sometimes: when you use small enough strings, where the definition of
“small enough” varies depending on the standard library you’re using. The
latter point might actually give you a clue if you see your tests failing for
slightly different lengths of strings on different platforms/compilers.
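This is the effect of small string optimization (SSO), which the standard
permits but doesn’t require and which libstdc++ adopted together with its
C++11-conforming std::string.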
This means that if your string is small enough, then moving it is actually
copying it (because the value is contained within the object itself). In my
case the string was using a custom allocator while being a field of another
object, and I was worrying about propagating allocators on move correctly and
completely forgot about SSO. (See this presentation for how to handle
allocators.) The solution here is to switch to std::vector<char>, which doesn’t
have such optimizations (yet?) and fits well in my case (a bunch of string
views pointing into a single buffer).
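To see the difference on your own platform, something like the following sketch might help. All three function names are made up for illustration, and the string results depend on the standard library: with the mainstream implementations I would expect a short string such as "hi" to be stored inline and a 100-character one to live on the heap, but nothing in the standard fixes the threshold.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true if the characters live inside the std::string
// object itself, which is what SSO does for small strings.
bool storedInline(const std::string &s) {
    const char *objBegin = reinterpret_cast<const char *>(&s);
    const char *objEnd = objBegin + sizeof(s);
    std::less<const char *> precedes;  // total order even for unrelated pointers
    return !precedes(s.data(), objBegin) && precedes(s.data(), objEnd);
}

// Same check as moveStringAround(), with the argument taken by value.
bool stringKeepsBuffer(std::string str) {
    const char *before = str.c_str();
    std::string moved = std::move(str);
    return before == moved.c_str();
}

// The std::vector<char> counterpart: move construction hands over the heap
// buffer, so pointers into the data stay usable through the new vector.
bool vectorKeepsBuffer(std::vector<char> buf) {
    const char *before = buf.data();
    std::vector<char> moved = std::move(buf);
    return before == moved.data();
}
```

Calling stringKeepsBuffer() with strings of growing length is a quick way to find where “small enough” ends for a particular library.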
By the way, it was easy to realize the cause of this behaviour after stepping into the string implementation in gdb, so you don’t always want to skip the standard C++ library during a debug session.