reversed(top())

Don't trip over string optimizations

March 31, 2020
[c++] [c++11] [gcc] [programming]

Optimizations are nice, but sometimes their effects can take you by surprise, especially if you’re not thinking about them while trying to debug an issue. If that’s the case, they can make your head hurt for a while. This post is about the std::string class, but it’s far from being the only string class out there, so keep the outlined issues in mind while working with other implementations as well.

Updating a string might do more than you expect

I don’t remember the exact code where I hit this, so here is something close that demonstrates the issue. Can you guess what the bug is, given the hint in the section’s title?

std::string copy = original;
const char *line = copy.c_str();
while (*line != '\0') {
    const char *comma = std::strchr(line, ',');
    if (comma == nullptr) {
        process(line);
        break;
    } else {
        copy[comma - line] = '\0';
        process(line);
        line = comma + 1;
    }
}

The code breaks a single string into null-terminated pieces at each comma and processes them separately. Its behaviour in the debugger after copy[comma - line] = '\0'; looks very weird, unless you notice that the location of copy’s storage changes on that line and recall that std::string in GCC can have a copy-on-write (COW) implementation (even today I don’t think that all Linux distributions have moved to the new ABI). Until that write, copy shares its buffer with original, so line and comma point into the shared storage. The write makes copy allocate its own buffer, which means that right after copy[comma - line] = '\0'; line still points into original’s data while copy has diverged, and process() never sees the terminator. The first solution was probably to force a deep copy of the string right after creating the second object; later the code was changed to use string views instead.
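For illustration, here is a minimal sketch of what the string view variant could look like. It is not the code I ended up with: splitAtCommas is a made-up name, and I’m assuming C++17’s std::string_view here. Nothing is written into the string, so a COW implementation has no reason to reallocate anything behind your back.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` at commas without modifying it.
// The returned views point into the caller's buffer, so that buffer must
// outlive them and must not be modified while they are in use.
std::vector<std::string_view> splitAtCommas(std::string_view text)
{
    std::vector<std::string_view> pieces;
    // Like the original loop, stop once nothing is left, so a trailing
    // comma does not produce an extra empty piece.
    while (!text.empty()) {
        const std::size_t comma = text.find(',');
        if (comma == std::string_view::npos) {
            pieces.push_back(text);  // last piece, no comma left
            break;
        }
        pieces.push_back(text.substr(0, comma));
        text.remove_prefix(comma + 1);  // skip the piece and its comma
    }
    return pieces;
}
```

Each element can then be handed to a process() that accepts a std::string_view. The pieces are no longer null-terminated, so this only works if process() doesn’t rely on the terminator.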

This behaviour has probably contributed to COW strings being banned by the C++11 standard, although I tripped over it only once myself (or maybe I noticed it only once, who knows).

Moving strings around

The code in the previous section misbehaves only with a std::string that does not conform to C++11, even when it’s compiled as C++11 (it took libstdc++ maintainers some time to work through the ABI breakage). Does it mean that a fully conforming C++11 std::string has nothing to surprise you with? No, it doesn’t, as now there is a different feature that can make you scratch your head. Do you expect this function to always return true?

bool
moveStringAround(std::string &&str)
{
    const char *line = str.c_str();
    std::string moved = std::move(str);
    return (line == moved.c_str());
}

Obviously, I wouldn’t ask if it couldn’t return false. It does so only sometimes: when the string is small enough, and the definition of “small enough” varies depending on the standard library you’re using. That variation might actually give you a clue if you see your tests failing for slightly different lengths of strings on different platforms/compilers.
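If you want a rough idea of where the threshold lies on your platform, one quick probe is the capacity of an empty string. This is only a sketch: inlineCapacityGuess is a made-up name, and the standard doesn’t promise that capacity() reports the in-object buffer size, it just happens to do so on the common implementations I know of.

```cpp
#include <cstddef>
#include <string>

// Hypothetical probe: capacity of a default-constructed string.
// With SSO this is typically the longest string that fits inside the
// object itself; with a COW implementation it is typically 0.
// Neither value is guaranteed by the standard.
std::size_t inlineCapacityGuess()
{
    return std::string().capacity();
}
```

Strings up to that length are the ones whose c_str() is likely to change on move.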

This is the effect of the small string optimization (SSO), which the major standard libraries use for their C++11-conforming std::string. If your string is small enough, its characters are stored within the object itself, so moving it actually copies them. In my case the string was using a custom allocator while being a field of another object, and I was worrying about propagating allocators on move correctly and completely forgot about SSO. (See this presentation for how to handle allocators.) The solution here is to switch to std::vector<char>, which doesn’t have such optimizations (yet?) and fits well in my case (a bunch of string views pointing into a single buffer).
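To make the difference concrete, here is a small sketch that runs the same check on both containers. The function names are made up for this example, and both containers use the default allocator, which is simpler than my actual case.

```cpp
#include <string>
#include <utility>
#include <vector>

// Reports whether moving a std::string kept its characters at the same
// address. Expect false for short strings on implementations with SSO.
bool stringBufferSurvivesMove(std::string str)
{
    const char *before = str.c_str();
    std::string moved = std::move(str);
    return before == moved.c_str();
}

// The same check for std::vector<char>. Its elements always live in a
// separately allocated buffer, which a move just hands over.
bool vectorBufferSurvivesMove(std::vector<char> buf)
{
    const char *before = buf.data();
    std::vector<char> moved = std::move(buf);
    return before == moved.data();
}
```

With a two-character input, the vector version returns true, while the string version returns false on SSO implementations. With a long enough string both return true in practice, because the heap buffer is simply handed over.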

By the way, it was easy to realize the cause of this behaviour after stepping into the string implementation in gdb, so you don’t always want to skip the standard C++ library during a debug session.