How exactly is std::string_view faster than const std::string&?

std::string_view is faster in a few cases.

First, std::string const& requires the data to be in a std::string, and not a raw C array, a char const* returned by a C API, a std::vector<char> produced by some deserialization engine, etc. The avoided format conversion avoids copying bytes, and (if the string is longer than the SBO¹ for the particular std::string implementation) avoids a memory allocation.

#include <iostream>
#include <string_view>

void foo( std::string_view bob ) {
    std::cout << bob << "\n";
}
int main(int argc, char const*const* argv) {
    foo( "This is a string long enough to avoid the std::string SBO" );
    if (argc > 1)
        foo( argv[1] );
}

No allocations are done in the string_view case, but there would be if foo took a std::string const& instead of a string_view.

The second really big reason is that it permits working with substrings without a copy. Suppose you are parsing a 2 gigabyte JSON string (!)². If you parse it into std::strings, every parse node that stores the name or value of a node copies that data out of the 2 GB string into the node.

Instead, if you parse it to std::string_views, the nodes refer to the original data. This can save millions of allocations and halve memory requirements during parsing.

The speedup you can get is simply ridiculous.

This is an extreme case, but other "get a substring and work with it" cases can also generate decent speedups with string_view.
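As a rough sketch of the "substring without a copy" idea, here is a small splitter that returns views into the original buffer. The name split_views and the single-character delimiter are assumptions made for illustration; a real parser would be more involved.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on `delim` and returns views into the
// original buffer. No characters are copied. The views are only valid while
// the buffer behind `text` stays alive and unmodified.
std::vector<std::string_view> split_views(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last piece, may be empty
            return parts;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;  // skip the delimiter
    }
}
```

Each element costs a pointer and a length. The only allocation is the vector's own storage, whereas a std::string-based version would also allocate for every piece that exceeds the SBO.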

An important part to the decision is what you lose by using std::string_view. It isn't much, but it is something.

You lose implicit null termination, and that is about it. So if the same string will be passed to 3 functions all of which require a null terminator, converting to std::string once may be wise. Thus if your code is known to need a null terminator, and you don't expect strings fed from C-style sourced buffers or the like, maybe take a std::string const&. Otherwise take a std::string_view.

If std::string_view had a flag that stated if it was null terminated (or something fancier) it would remove even that last reason to use a std::string const&.

There is a case where taking a std::string with no const& is optimal over a std::string_view. If you need to own a copy of the string indefinitely after the call, taking by-value is efficient. You'll either be in the SBO case (and no allocations, just a few character copies to duplicate it), or you'll be able to move the heap-allocated buffer into a local std::string. Having two overloads std::string&& and std::string_view might be faster, but only marginally, and it would cause modest code bloat (which could cost you all of the speed gains).
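A minimal sketch of that by-value "sink" case, assuming a hypothetical Person class that keeps its own copy of the name:

```cpp
#include <string>
#include <utility>

// Hypothetical type that owns a copy of the name it is given.
class Person {
public:
    // Taken by value: an lvalue argument is copied once into `name`,
    // an rvalue argument is moved into it. Either way the parameter is
    // then moved into the member, so no second copy is made.
    explicit Person(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

With a std::string_view parameter the constructor would always have to build a fresh std::string, even when the caller was ready to give up a std::string it already owned.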


¹ Small Buffer Optimization

² Actual use case.

Why is std::string_view faster than const char*?

Simply because a std::string_view carries its length with it, and you don't have to insert a null char whenever you want a new string. Code that takes a char* has to search for the end every time, and if you want a substring you'll probably have to copy, as you'll need a null char at the end of the substring.

string_view vs const char* performance

But I pass them a std::string_view object using its data() member function. Is this bad practice?

Yes, this is a bad practice. It's bad primarily because a string view doesn't necessarily point to a string that is null terminated. In case it doesn't, passing data() into a function that requires null termination will result in undefined behaviour.

Secondarily, there are cases where knowing the length of the string beforehand is more efficient. The length is known since it's stored in the string view. When you use data() only as an argument, you're not providing the known size to the function.

Use this instead: std::istringstream iss(std::string{str});
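To make that concrete, here is a small sketch built around the same line. The function name read_ints and the choice of parsing integers are assumptions for illustration only.

```cpp
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: reads whitespace-separated integers from a view.
std::vector<int> read_ints(std::string_view str) {
    // Copies exactly str.size() characters into an owning string, so
    // nothing past the end of the view is ever read.
    std::istringstream iss(std::string{str});
    std::vector<int> values;
    for (int v; iss >> v;) {
        values.push_back(v);
    }
    return values;
}
```

If the view covers only the first five characters of "10 20 30 40", this reads two numbers. Code that handed str.data() to something expecting a null-terminated string would instead see the whole underlying buffer, or run off the end of it when no terminator follows.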

Should I revert back to const char*?

I see no good reason for doing so in this case.

Why is `std::string_view` not implemented differently?

string_view knows nothing about string. It is not a "wrapper" around a string. It has no idea that std::string even exists as a type; the conversion from string to string_view happens within std::string. string_view has no association with or reliance on std::string.

In fact, that is the entire purpose of string_view: to be able to have a non-modifiable sized string without knowing how it is allocated or managed. That it can reference any string type that stores its characters contiguously is the point of the thing. It allows you to create an interface that takes a string_view without knowing or caring whether the caller is using std::string, CString, or any other string type.

Since the owning string's behavior is not string_view's business, there is no possible mechanism for string_view to be told when the string it references is no longer valid.



We could store a pointer to the string object itself and then determine if the string is in SSO mode or not and work accordingly.

For the sake of argument, let us ignore that string_view is not supposed to know or care whether its characters come from std::string. Let's assume that string_view only works with std::string (even though that makes the type completely worthless).

Even then, this would not work. Or rather, it would only work if the type was functionally no different from a std::string const&.

If string_view stores a pointer to the first character and a size, then any modification to the std::string might change this. It could change the size even without breaking small-string optimization. It could change the size without causing reallocation. The only way to correct this is to have the string_view always ask the std::string it references what its character data and size are.

And that's no different from just using a std::string const& directly.
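A small sketch of the "pointer plus size is a snapshot" point. The struct and function names are made up for the example, and only the view's stored size is inspected after the string changes, since its pointer should be treated as stale from then on.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result type for the demonstration below.
struct Sizes {
    std::size_t view_size;
    std::size_t string_size;
};

Sizes sizes_after_append() {
    std::string s = "hello";
    s.reserve(64);            // room to grow without reallocating
    std::string_view v = s;   // snapshot: pointer to 'h' and size 5
    s += " world";            // the string changes, the view is not told
    // Only v.size() is read here. The characters behind v must not be
    // relied on after the string has been modified.
    return {v.size(), s.size()};
}
```

The view still reports 5 while the string reports 11, even though no reallocation was needed.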

std::string_view complexity for constant strings

Although the std::string_view(const char*) constructor has linear complexity, it is a constexpr constructor and a string literal is a compile-time constant, so optimisers are often able to perform the linear-time length computation at compile time in practice, making the runtime cost constant. In constant evaluation contexts, this is guaranteed.
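A minimal way to see the guaranteed case, assuming C++17 or later (the variable name kGreeting is just for illustration):

```cpp
#include <string_view>

// Initialising a constexpr variable is a constant evaluation context, so
// the length of the literal has to be computed by the compiler.
constexpr std::string_view kGreeting("hello, world");

// Checked at compile time; nothing is left to do at run time.
static_assert(kGreeting.size() == 12);
```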

Note that your suggested macro, as well as the template suggested in the other answer, behaves differently from std::string_view(const char*), because string literals may contain embedded null characters: the constructor only extends until the first terminator, while the macro extends over the entire literal.

This is particularly problematic if used with a non-string literal array that contains uninitialised elements that have garbage values:

const string_view svMyVar1("test\0test");
// svMyVar1.size() == 4

const string_view svMyVar2(foo("test\0test"));
// svMyVar2.size() == 9

char arr[32];
arr[0] = 'a';
arr[1] = '\0';
// arr now contains a null terminated "a" followed by 30 garbage chars
const string_view svMyVar3(foo(arr));
// svMyVar3.size() == 31, contains garbage
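For a version that compiles on its own, here is a sketch with a stand-in for the macro or template discussed above. view_of_array is a hypothetical name, and the sketch assumes the helper sizes the view from the array extent minus one.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the macro/template: the size comes from the
// array extent, not from searching for the first '\0'.
template <std::size_t N>
constexpr std::string_view view_of_array(const char (&arr)[N]) {
    return std::string_view(arr, N - 1);  // drops only the final element
}

// The const char* constructor stops at the first null character.
constexpr std::string_view from_pointer("test\0test");
// The extent-based helper covers the whole literal, embedded null included.
constexpr std::string_view from_extent = view_of_array("test\0test");

static_assert(from_pointer.size() == 4);
static_assert(from_extent.size() == 9);
```

With a non-literal array such as char buf[8] = "a", the helper would return a view of size 7 whose last six characters are padding rather than text, which is the same problem as the garbage in svMyVar3.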

If they do, is the null-terminator preserved?

String view doesn't modify the array that it refers to. If the referred array contains a null terminator after the referred string, then the null terminator remains there. If there isn't a null terminator, then no null terminator is added.

Reading svMyVar[svMyVar.size()] still has undefined behaviour even if there is a null terminator outside the bounds of the view.

On the other hand, reading *(svMyVar.data() + svMyVar.size()) is fine if you know that there is a null terminator (or any other character) there. You cannot rely on that being the case in general with string view, but you can rely on it if the view is created from a string literal which is guaranteed to be null terminated.
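As a sketch of that distinction, the helper below (a hypothetical name) reads the character just past the view through data(). It is only safe under the stated precondition that such a character exists in the underlying array.

```cpp
#include <string_view>

// Returns true when the character just past the end of the view is '\0'.
// Precondition (assumed by this sketch, not checkable from inside): the
// array behind `sv` has at least one more element after the view, as is
// the case for a view made from a string literal or a prefix of one.
bool ends_at_terminator(std::string_view sv) {
    return *(sv.data() + sv.size()) == '\0';
}
```

For a view of a whole literal this returns true. For a prefix of that literal it returns false, because the next character is simply the rest of the text.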



without using internal strlen

Compilers are smart enough to calculate even strlen("literal") at compile time.

Technically, std::string_view uses Traits::length internally, not std::strlen.


