What Are the Mechanics of Short String Optimization in libc++?

What are the mechanics of short string optimization in libc++?

The libc++ basic_string is designed to have a sizeof of 3 words on all architectures, where sizeof(word) == sizeof(void*). You have correctly dissected the long/short flag and the size field in the short form.
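
That invariant can be checked directly when compiling against libc++ (the assertion below holds for both 32-bit and 64-bit targets):

#include <string>

// Holds when the standard library is libc++:
static_assert(sizeof(std::string) == 3 * sizeof(void*),
              "libc++'s std::string occupies exactly three words");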

What value would __min_cap, the capacity of short strings, take for different architectures?

In the short form, there are 3 words to work with:

  • 1 bit goes to the long/short flag.
  • 7 bits go to the size.
  • Assuming char, 1 byte goes to the trailing null (libc++ always stores a trailing null after the data).

This leaves 3 words minus 2 bytes to store a short string (i.e. the largest capacity() possible without an allocation).

On a 32 bit machine, 10 chars will fit in the short string. sizeof(string) is 12.

On a 64 bit machine, 22 chars will fit in the short string. sizeof(string) is 24.
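
Putting those pieces together, a simplified sketch of the short form (illustrative only, not libc++'s actual code, and ignoring endianness details) looks like this:

struct __short   // illustrative layout of the short form
{
    unsigned char __is_long_ : 1;   // long/short flag
    unsigned char __size_    : 7;   // size of a short string
    char __data_[3 * sizeof(void*) - 1];  // 11 bytes on 32-bit, 23 on 64-bit
};
// __min_cap is sizeof(__data_) minus 1 for the trailing null:
// 10 chars on 32-bit, 22 chars on 64-bit.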

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
    size_type __cap_;
    size_type __size_;
    pointer   __data_;
};

to:

struct __long
{
    pointer   __data_;
    size_type __size_;
    size_type __cap_;
};

The motivation for this change is the belief that putting __data_ first will have some performance advantages due to better alignment. An attempt was made to measure those advantages, and they proved difficult to measure. It won't make the performance worse, and it may make it slightly better.

The flag should be used with care. It changes the ABI, and if a libc++ std::string compiled with one setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT is accidentally mixed with one compiled with the other, run-time errors will result.

I recommend this flag only be changed by a vendor of libc++.

No small string optimization with gcc?

One of your flags is:

--with-default-libstdcxx-abi=gcc4-compatible

and GCC4 does not support the small string optimization.


GCC5 started supporting it. isocpp states:

A new implementation of std::string is enabled by default, using the small string optimization instead of copy-on-write reference counting.

which supports my claim.

Moreover, Exploring std::string mentions:

As we see, older libstdc++ implements copy-on-write, and so it makes sense for them to not utilize small objects optimization.

and then the context changes when GCC5 comes into play.
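
If you want to see for yourself which behavior you got, a quick heuristic (implementation-probing, so not portable in any guaranteed sense) is to check whether a short string's buffer lives inside the string object itself:

#include <cstdio>
#include <string>

// True if s stores its characters inline, i.e. SSO kicked in.
bool stored_inline(const std::string& s)
{
    const char* obj = reinterpret_cast<const char*>(&s);
    return s.data() >= obj && s.data() < obj + sizeof(s);
}

int main()
{
    std::string small = "hi";
    std::string large(100, 'x');
    std::printf("small inline: %d\n", stored_inline(small)); // 1 with SSO, 0 with COW
    std::printf("large inline: %d\n", stored_inline(large)); // 0 either way
}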

Meaning of acronym SSO in the context of std::string

Background / Overview

Operations on automatic variables ("from the stack", which are variables that you create without calling malloc / new) are generally much faster than those involving the free store ("the heap", which are variables that are created using new). However, the size of automatic arrays is fixed at compile time, but the size of arrays from the free store is not. Moreover, the stack size is limited (typically a few MiB), whereas the free store is only limited by your system's memory.

SSO is the Short / Small String Optimization. A std::string typically stores the string as a pointer to the free store ("the heap"), which gives similar performance characteristics as if you were to call new char[size]. This prevents a stack overflow for very large strings, but it can be slower, especially with copy operations. As an optimization, many implementations of std::string create a small automatic array, something like char[20]. If you have a string of 20 characters or fewer (in this example; the actual size varies by implementation), it is stored directly in that array. This avoids the need to call new at all, which speeds things up a bit.

EDIT:

I wasn't expecting this answer to be quite so popular, but since it is, let me give a more realistic implementation, with the caveat that I've never actually read any implementation of SSO "in the wild".

Implementation details

At the minimum, a std::string needs to store the following information:

  • The size
  • The capacity
  • The location of the data

The size could be stored as a std::string::size_type or as a pointer to the end. The only difference is whether you want to have to subtract two pointers when the user calls size or add a size_type to a pointer when the user calls end. The capacity can be stored either way as well.
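
For example, both of these (purely illustrative) layouts record the same information:

#include <cstddef>

struct stores_size  // size() is free, end() costs an addition
{
    char*       m_begin;
    std::size_t m_size;   // end() == m_begin + m_size
};

struct stores_end   // end() is free, size() costs a subtraction
{
    char* m_begin;
    char* m_end;          // size() == m_end - m_begin
};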

You don't pay for what you don't use.

First, consider the naive implementation based on what I outlined above:

#include <array>
#include <cstddef>
#include <memory>

class string {
public:
    // all 83 member functions
private:
    using size_type = std::size_t;

    std::unique_ptr<char[]> m_data;
    size_type m_size;
    size_type m_capacity;
    std::array<char, 16> m_sso;  // inline buffer for short strings
};

For a 64-bit system, that generally means that std::string has 24 bytes of 'overhead' per string, plus another 16 for the SSO buffer (16 chosen here instead of 20 due to padding requirements). It wouldn't really make sense to store those three data members plus a local array of characters, as in my simplified example. If m_size <= 16, then I will put all of the data in m_sso, so I already know the capacity and I don't need the pointer to the data. If m_size > 16, then I don't need m_sso. There is absolutely no overlap where I need all of them. A smarter solution that wastes no space would look a little more like this (untested, example purposes only):

#include <array>
#include <cstddef>
#include <memory>

class string {
public:
    // all 83 member functions
private:
    using size_type = std::size_t;

    // Named type, since C++ forbids defining a class inside an anonymous
    // union. This is probably better designed as an array-like class.
    struct large_t {
        std::unique_ptr<char[]> m_data;
        size_type m_capacity;
    };

    size_type m_size;
    union {
        // Because of unique_ptr, the union's members need manual
        // construction and destruction in those 83 member functions.
        large_t m_large;                            // heap-allocated case
        std::array<char, sizeof(large_t)> m_small;  // in-object buffer
    };
};

I'd assume that most implementations look more like this.
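
To see what the union buys, here are trivialized stand-ins for the two sketches (hypothetical names; the sizes assume a typical 64-bit ABI):

#include <array>
#include <cstddef>
#include <memory>

struct naive_t {
    std::unique_ptr<char[]> data;
    std::size_t size, capacity;
    std::array<char, 16> sso;
};

struct large_t { std::unique_ptr<char[]> data; std::size_t capacity; };
struct compact_t {
    std::size_t size;
    union {
        large_t large;
        std::array<char, sizeof(large_t)> small;
    };
};

static_assert(sizeof(naive_t) == 40, "bookkeeping plus a separate buffer");
static_assert(sizeof(compact_t) == 24, "the buffer overlaps pointer and capacity");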

Does FBString's small string optimization rely on undefined behavior?

Not only does that appear to be UB, it is quite unnecessary, because the only use of bytes_ appears to be for reading the last byte of this, which can be done without UB:

reinterpret_cast<const char*>(this)[sizeof(*this) - 1]

That's thanks to the special exemption in C++ which allows reinterpreting objects as char arrays.
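
For illustration, here is a hypothetical FBString-like tail byte being read through exactly that exemption (the struct and field names are invented for the example):

#include <cstdio>

// Hypothetical FBString-like scheme: the last byte of the object stores
// the remaining inline capacity (0 when full, doubling as the null).
struct sso_buffer {
    char data[23];
    unsigned char spare;
};

int main()
{
    sso_buffer s = {"hello", 17};
    // Well-defined: any object may be inspected through a char pointer,
    // unlike the union-based bytes_ trick.
    char last = reinterpret_cast<const char*>(&s)[sizeof(s) - 1];
    std::printf("spare capacity: %d\n", last);  // prints 17
}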

Why does move() of a string change the underlying data's position in memory?

What you are seeing is a consequence of the short string optimization. In the most basic sense, there is an array inside the string object to save a call to new for small strings. Since the array is a member of the class, it has its own address in each object, and when you move a string whose data lives in that array, the bytes have to be copied.
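
A small demonstration of both cases (the results assume an SSO-based implementation such as libc++, hence "typically"):

#include <cstdio>
#include <string>
#include <utility>

int main()
{
    std::string s = "short";           // fits in the SSO buffer
    const void* before = s.data();
    std::string t = std::move(s);      // buffer is part of the object, so bytes are copied
    std::printf("same buffer: %d\n", before == t.data());  // 0 with SSO

    std::string big(100, 'x');         // heap-allocated
    before = big.data();
    std::string u = std::move(big);    // pointer is stolen, no copy
    std::printf("same buffer: %d\n", before == u.data());  // typically 1
}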

What is the reasoning behind libc++'s 16-byte alignment pattern for std::basic_string?

When I first designed <string>, libc++ wasn't yet destined to be open-source. I was writing for Apple's platforms only. And Apple's malloc always allocates at least 16 bytes, and in multiples of 16 bytes, no matter how much you ask for (at least this was true in 2007, I haven't checked recently).

So if the most commonly used allocator is going to hand you 16 bytes, you might as well use them in your capacity.

At one time, a few years earlier, I tried to change the allocator API so that it could ask the allocator how much memory it actually handed out for any particular request. But that attempt failed. So the next best thing was taking advantage of a priori knowledge of the most common allocator the code was going to deal with.
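
A sketch of that reasoning (illustrative arithmetic only, not libc++'s actual code): round the requested capacity up so the allocation fills a whole multiple of 16 bytes, remembering one byte for the trailing null.

#include <cstddef>

// Hypothetical helper: the capacity to actually reserve for a request,
// assuming the allocator always hands out multiples of 16 bytes.
std::size_t recommend(std::size_t requested)
{
    const std::size_t chunk = 16;
    const std::size_t bytes = requested + 1;           // + trailing null
    return ((bytes + chunk - 1) / chunk) * chunk - 1;  // fill the chunk
}

// recommend(17) == 31: asking malloc for 18 bytes yields a 32-byte block
// anyway, so the string may as well report the extra room as capacity().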


