Meaning of Acronym Sso in the Context of Std::String

Meaning of acronym SSO in the context of std::string

Background / Overview

Operations on automatic variables ("from the stack", which are variables that you create without calling malloc / new) are generally much faster than those involving the free store ("the heap", which are variables that are created using new). However, the size of automatic arrays is fixed at compile time, but the size of arrays from the free store is not. Moreover, the stack size is limited (typically a few MiB), whereas the free store is only limited by your system's memory.

SSO is the Short / Small String Optimization. A std::string typically stores the string as a pointer to the free store ("the heap"), which gives similar performance characteristics as if you were to call new char [size]. This prevents a stack overflow for very large strings, but it can be slower, especially with copy operations. As an optimization, many implementations of std::string create a small automatic array, something like char [20]. If you have a string that is 20 characters or smaller (given this example, the actual size varies), it stores it directly in that array. This avoids the need to call new at all, which speeds things up a bit.

EDIT:

I wasn't expecting this answer to be quite so popular, but since it is, let me give a more realistic implementation, with the caveat that I've never actually read any implementation of SSO "in the wild".

Implementation details

At the minimum, a std::string needs to store the following information:

The size
The capacity
The location of the data

The size could be stored as a std::string::size_type or as a pointer to the end. The only difference is whether you want to have to subtract two pointers when the user calls size or add a size_type to a pointer when the user calls end. The capacity can be stored either way as well.

You don't pay for what you don't use.

First, consider the naive implementation based on what I outlined above:

class string {
public:
    // all 83 member functions
private:
    std::unique_ptr<char[]> m_data;
    size_type m_size;
    size_type m_capacity;
    std::array<char, 16> m_sso;
};

For a 64-bit system, that generally means that std::string has 24 bytes of 'overhead' per string, plus another 16 for the SSO buffer (16 chosen here instead of 20 due to padding requirements). It wouldn't really make sense to store those three data members plus a local array of characters, as in my simplified example. If m_size <= 16, then I will put all of the data in m_sso, so I already know the capacity and I don't need the pointer to the data. If m_size > 16, then I don't need m_sso. There is absolutely no overlap where I need all of them. A smarter solution that wastes no space would look something a little more like this (untested, example purposes only):

class string {
public:
    // all 83 member functions
private:
    size_type m_size;
    union {
        class {
            // This is probably better designed as an array-like class
            std::unique_ptr<char[]> m_data;
            size_type m_capacity;
        } m_large;
        std::array<char, sizeof(m_large)> m_small;
    };
};

I'd assume that most implementations look more like this.

std::string SSO tuning

The whole idea with "short string optimisation" is that it "takes no extra space. So the size is calculated such that the local buffer overlays other variables in the class that are used when the string is longer.

It is a bad idea to modify system headers, since they are often compiler version dependent, and there are implementation details that make it "binary incompatible".

As the comment says, make sure this really is a problem (performance or otherwise) before doing anything about it. And then consider carefully what you should do about it. What problem are you trying to fix, and are you sure it's worth it. Remember that if you do something like:

 std::string func(std::string arg)
 {
   ...
 }

you will copy more bytes on passing arg on the stack. And no, it doesn't really help making it const std::string& arg if your calling code makes a temporary string, e.g. func("Name: " + name);. And if you do vector<std::string>, the size of each will be bigger, so the vector will take more space - even for the cases where the string STILL don't fit, so more time will be taken when you grow/shrink the vector.

And I think the right solution, once you have made a decision, is to implement your own string class. std::string is a standard template library class, they are not extendable, and you aren't supposed to modify the standard library header files, as, like I said earlier, it's highly compiler dependent. It will be quite some work to make it completely compatible with std::string, but you could of course "cheat" and make a converter function operator std::string() for your string class, so you only need to produce the more basic functions that std::string offers.

What are the mechanics of short string optimization in libc++?

The libc++ basic_string is designed to have a sizeof 3 words on all architectures, where sizeof(word) == sizeof(void*). You have correctly dissected the long/short flag, and the size field in the short form.

what value would __min_cap, the capacity of short strings, take for different architectures?

In the short form, there are 3 words to work with:

1 bit goes to the long/short flag.
7 bits goes to the size.
Assuming char, 1 byte goes to the trailing null (libc++ will always store a trailing null behind the data).

This leaves 3 words minus 2 bytes to store a short string (i.e. largest capacity() without an allocation).

On a 32 bit machine, 10 chars will fit in the short string. sizeof(string) is 12.

On a 64 bit machine, 22 chars will fit in the short string. sizeof(string) is 24.

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
    size_type __cap_;
    size_type __size_;
    pointer   __data_;
};

to:

struct __long
{
    pointer   __data_;
    size_type __size_;
    size_type __cap_;
};

The motivation for this change is the belief that putting __data_ first will have some performance advantages due to better alignment. An attempt was made to measure the performance advantages, and it was difficult to measure. It won't make the performance worse, and it may make it slightly better.

The flag should be used with care. It is a different ABI, and if accidentally mixed with a libc++ std::string compiled with a different setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT will create run time errors.

I recommend this flag only be changed by a vendor of libc++.

Disable std:string's SSO

As SSO is an optional optimization, there won't be a standard way to turn it off.

In reality, you can just reserve a string that won't fit into the SSO buffer to force the buffer being allocated dynamically:

std::string str;
str.reserve(sizeof(str) + 1);

That seems to work for gcc at least and should even work portably as the internal buffer needs to fit inside the string. (Live)

What does in-situ without memory allocation mean for folly implementation of String?

This means that the cost of std::string already includes 23 characters. Anything more requires additional allocations.

This means internally the structure looks approximately like:

struct string {
  // ...
  char internal[23];
  char* external;
};

This is presumably to make copying short strings very cheap as no heap operations have to be performed, plus it doesn't require any delete calls to clear.

On the implementation of std::string moves

I've only analyzed GCC's version. Here's what's going on: the code handles different kind of allocators. If the allocator has the trait of _S_propagate_on_move_assign or _S_always_equal, then the move is almost free, as you expect. This is the if in move operator=:

if (!__str._M_is_local()
    && (_Alloc_traits::_S_propagate_on_move_assign()
      || _Alloc_traits::_S_always_equal()))
          // cheap move
else assign(__str);

If the condition is true (_M_is_local() means small string, description here), then the move is cheap.

If it is false, then it calls normal assign (not the moving one). This is the case when either:

the string is small, so the assign will do a simple memcpy (cheap)
or the allocator doesn't have the trait always-equal nor propagate-on-move-assign, so the assign will allocate (not cheap)

What does this mean?

It means, that if you use the default allocator (or any allocator with traits mentioned earlier), then the move is still almost free.

On the other hand, the generated code is unnecessarily huge, and can be improved I think. It should have a separate code for handling usual allocators, or have a better assign code (the problem is that assign doesn't check for _M_is_local(), but it does a capacity check, so the compiler cannot decide whether an allocation is needed or not, so it puts the allocation codepath into the executable unnecessarily - you can check out the exact details in the source code).

cstring - c++ string conversion

The std::string constructor will be called with a const char* argument

There is no telling whether memory would be allocated (dynamically), but the chances are that your standard library implementation has the SSO in place, which means it can store small strings without dynamic allocations.

Meaning of Acronym Sso in the Context of Std::String