Small String Optimization for Vector

small string optimization for vector?

You can borrow the SmallVector implementation from LLVM. (header only, located in LLVM\include\llvm\ADT)

How could C++ Small String Optimized (SSO) work with containers?

The difference between "small string" and the "large string" is not the difference between storing it on the stack or on the heap. Instead what differs is the level of indirection.

What that means is std::string object can hold a pointer to the actual string data, which could be of (almost) any length, but has all the downsides of indirect dynamic memory - allocations, deallocations, cache misses etc.

Alternatively SSO allows std::string to store small strings "in place", right inside thestd::string object, wherever it is allocated. If the object is in some container (on the heap) that's where the string will be, but it won't require another indirection as the large string would.

std::vector-like class optimized to hold a small number of items

LLVM has a class for that called SmallVector.

What are the mechanics of short string optimization in libc++?

The libc++ basic_string is designed to have a sizeof 3 words on all architectures, where sizeof(word) == sizeof(void*). You have correctly dissected the long/short flag, and the size field in the short form.

what value would __min_cap, the capacity of short strings, take for different architectures?

In the short form, there are 3 words to work with:

  • 1 bit goes to the long/short flag.
  • 7 bits goes to the size.
  • Assuming char, 1 byte goes to the trailing null (libc++ will always store a trailing null behind the data).

This leaves 3 words minus 2 bytes to store a short string (i.e. largest capacity() without an allocation).

On a 32 bit machine, 10 chars will fit in the short string. sizeof(string) is 12.

On a 64 bit machine, 22 chars will fit in the short string. sizeof(string) is 24.

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
size_type __cap_;
size_type __size_;
pointer __data_;
};

to:

struct __long
{
pointer __data_;
size_type __size_;
size_type __cap_;
};

The motivation for this change is the belief that putting __data_ first will have some performance advantages due to better alignment. An attempt was made to measure the performance advantages, and it was difficult to measure. It won't make the performance worse, and it may make it slightly better.

The flag should be used with care. It is a different ABI, and if accidentally mixed with a libc++ std::string compiled with a different setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT will create run time errors.

I recommend this flag only be changed by a vendor of libc++.

Is the small string optimization undesirable?

but as a programmer I'm not supposed to know the internal implementation of a data structure

You are meant to know the iterator invalidation rules of the data structures in the standard library that you use. They are part of the public contract of every container, and make it easy enough to deduce when an iterator/reference/pointer to something in a standard library container may be used without risk of undefined behavior.

It's no different than any other interface in any other language that hands out handles to something. The handle is only going to be valid as long as certain conditions are met.

And it's not as though C++ doesn't give you the tools to safeguard your code. You can create containers of smart pointers if you need certain more elaborate ownership semantics, and different containers have different iterator invalidation rules (paid for by run-time or memory complexity).

May std::vector make use of small buffer optimization?

23.2.1 / p10 / b6:

Unless otherwise specified ...

  • no swap() function invalidates any references, pointers, or iterators referring to the elements of the containers being swapped.
    ...

Nowhere does it "specify otherwise" for vector. So this outlaws the SBO for vector.

string is not bound by this rule because it does "specify otherwise" in 21.4.1/p6:

References, pointers, and iterators referring to the elements of a
basic_string sequence may be invalidated by the following uses of that
basic_string object:

  • as an argument to any standard library function taking a reference to non-const basic_string as an argument.^234

234) For example, as an argument to non-member functions swap()
(21.4.8.8), operator>>() (21.4.8.9), and getline() (21.4.8.9), or as
an argument to basic_string::swap()

What are the mechanics of short string optimization in libc++?

The libc++ basic_string is designed to have a sizeof 3 words on all architectures, where sizeof(word) == sizeof(void*). You have correctly dissected the long/short flag, and the size field in the short form.

what value would __min_cap, the capacity of short strings, take for different architectures?

In the short form, there are 3 words to work with:

  • 1 bit goes to the long/short flag.
  • 7 bits goes to the size.
  • Assuming char, 1 byte goes to the trailing null (libc++ will always store a trailing null behind the data).

This leaves 3 words minus 2 bytes to store a short string (i.e. largest capacity() without an allocation).

On a 32 bit machine, 10 chars will fit in the short string. sizeof(string) is 12.

On a 64 bit machine, 22 chars will fit in the short string. sizeof(string) is 24.

A major design goal was to minimize sizeof(string), while making the internal buffer as large as possible. The rationale is to speed move construction and move assignment. The larger the sizeof, the more words you have to move during a move construction or move assignment.

The long form needs a minimum of 3 words to store the data pointer, size and capacity. Therefore I restricted the short form to those same 3 words. It has been suggested that a 4 word sizeof might have better performance. I have not tested that design choice.

_LIBCPP_ABI_ALTERNATE_STRING_LAYOUT

There is a configuration flag called _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT which rearranges the data members such that the "long layout" changes from:

struct __long
{
size_type __cap_;
size_type __size_;
pointer __data_;
};

to:

struct __long
{
pointer __data_;
size_type __size_;
size_type __cap_;
};

The motivation for this change is the belief that putting __data_ first will have some performance advantages due to better alignment. An attempt was made to measure the performance advantages, and it was difficult to measure. It won't make the performance worse, and it may make it slightly better.

The flag should be used with care. It is a different ABI, and if accidentally mixed with a libc++ std::string compiled with a different setting of _LIBCPP_ABI_ALTERNATE_STRING_LAYOUT will create run time errors.

I recommend this flag only be changed by a vendor of libc++.

How to reserve a vector of strings, if string size is variable?

The compiler doesn't know the size of the strings, it knows the size of std::string object. Now, the size of std::string object does not depend on size of string. That is because - most of the time [1] - std::string will allocate on heap, so the object itself is only a pointer and length.

This also means, when you reserve the vector, you don't yet reserve memory for the strings. This is, however, not always a problem. std::strings come from somewhere: if the strings you receive are the return value of a function (i.e., you have them by value), then the memory is already allocated for the string (in the return value). Thus, e.g. std::swap() can help you speeding up populating the array with the results.

If however you populate it using passing references, then the callee will do the operations that result in alloc. In this case, you'd likely want to loop over the vector and reserve each string:

std::vector<std::string> v;
v.reserve(60000); // expected number of strings
for (auto& s : v) {
s.reserve(500); // expected/max. size of strings
}

[1] It might occur that the specific implementation of std::string actually has a small, fixed-size buffer for sort strings and thus allocates only on heap when the string is longer than that.

std::string — small string optimization and swap

(Reposting from comment)

std::string is not a container – the fact that basic_string is not described in the Containers library chapter (§23) of the FDIS is a good clue. ;-]



Related Topics



Leave a reply



Submit