What Is String_View

What is string_view?

The purpose of any and all kinds of "string reference" and "array reference" proposals is to avoid copying data which is already owned somewhere else and of which only a non-mutating view is required. The string_view in question is one such proposal; there were earlier ones called string_ref and array_ref, too.

The idea is always to store a pair of pointer-to-first-element and size of some existing data array or string.

Such a view-handle class could be passed around cheaply by value and would offer cheap substringing operations (which can be implemented as simple pointer increments and size adjustments).
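To make the "pointer increment and size adjustment" point concrete, here is a minimal sketch using std::string_view as standardized in C++17. The helper name skip_front is made up for illustration; it is not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: drop the first n characters of a view.
// Nothing is copied; remove_prefix only advances the stored pointer
// and shrinks the stored size.
std::string_view skip_front(std::string_view text, std::size_t n) {
    assert(n <= text.size());  // remove_prefix requires n <= size()
    text.remove_prefix(n);
    return text;
}
```

Calling skip_front("hello world", 6) yields a view of "world" whose data() pointer is simply the original pointer plus 6. The member function substr works the same way and also returns a view, not a copy.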

Many uses of strings don't require actual owning of the strings, and the string in question will often already be owned by someone else. So there is a genuine potential for increasing the efficiency by avoiding unneeded copies (think of all the allocations and exceptions you can save).

The original C strings were suffering from the problem that the null terminator was part of the string APIs, and so you couldn't easily create substrings without mutating the underlying string (a la strtok). In C++, this is easily solved by storing the length separately and wrapping the pointer and the size into one class.
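As a rough illustration of the contrast with strtok, the following sketch splits a string into pieces without writing any terminators into it. The function name split is an assumption chosen for this example, and it uses the C++17 std::string_view rather than any of the earlier proposals.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: split text at every occurrence of sep.
// Each piece is a (pointer, size) pair into the original buffer,
// so the input is never modified, unlike with strtok.
std::vector<std::string_view> split(std::string_view text, char sep) {
    std::vector<std::string_view> parts;
    while (true) {
        std::size_t pos = text.find(sep);
        if (pos == std::string_view::npos) {
            parts.push_back(text);  // last piece, possibly empty
            return parts;
        }
        parts.push_back(text.substr(0, pos));
        text.remove_prefix(pos + 1);  // skip the piece and the separator
    }
}
```

Because the pieces only refer to the input, the input can be a string literal or any other read-only buffer, which strtok could not handle.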

The one major obstacle and divergence from the C++ standard library philosophy that I can think of is that such "referential view" classes have completely different ownership semantics from most of the standard library. The library's string and container types own their data, so code that uses them is largely safe by construction (roughly: if it compiles, the lifetimes are correct). With reference classes like this, that's no longer true. The correctness of your program depends on the ambient code that uses these classes, which is harder to check and to teach.

How exactly is std::string_view faster than const std::string&?

std::string_view is faster in a few cases.

First, std::string const& requires the data to be in a std::string, and not a raw C array, a char const* returned by a C API, a std::vector<char> produced by some deserialization engine, etc. The avoided format conversion avoids copying bytes, and (if the string is longer than the SBO¹ for the particular std::string implementation) avoids a memory allocation.

#include <iostream>
#include <string_view>

void foo( std::string_view bob ) {
  std::cout << bob << "\n";
}
int main(int argc, char const*const* argv) {
  foo( "This is a string long enough to avoid the std::string SBO" );
  if (argc > 1)
    foo( argv[1] );
}

No allocations are done in the string_view case, but there would be if foo took a std::string const& instead of a string_view.

The second really big reason is that it permits working with substrings without a copy. Suppose you are parsing a 2 gigabyte JSON string (!)². If you parse it into std::strings, every parse node that stores the name or value of a node copies that data out of the 2 GB string into its own storage.

Instead, if you parse it to std::string_views, the nodes refer to the original data. This can save millions of allocations and halve memory requirements during parsing.
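A toy version of this idea can be sketched as follows. The Node struct, the parse_pairs function and the "name=value;name=value" format are all assumptions invented for this example (it is not a JSON parser); the point is only that each node stores views into the input rather than copies.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical parse node: both members are views into the input buffer.
struct Node {
    std::string_view name;
    std::string_view value;
};

// Parses a toy "name=value;name=value" format. No characters are copied.
// The caller must keep the input buffer alive while the nodes are in use.
std::vector<Node> parse_pairs(std::string_view input) {
    std::vector<Node> nodes;
    while (!input.empty()) {
        std::size_t end = input.find(';');
        // substr clamps the count, so npos means "the rest of the input".
        std::string_view item = input.substr(0, end);
        std::size_t eq = item.find('=');
        if (eq != std::string_view::npos) {
            nodes.push_back({item.substr(0, eq), item.substr(eq + 1)});
        }
        if (end == std::string_view::npos) break;
        input.remove_prefix(end + 1);
    }
    return nodes;
}
```

The only allocation here is the vector of nodes itself. With std::string members instead, every name and value longer than the SBO would need its own allocation and copy.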

The speedup you can get is simply ridiculous.

This is an extreme case, but other "get a substring and work with it" cases can also generate decent speedups with string_view.

An important part to the decision is what you lose by using std::string_view. It isn't much, but it is something.

You lose implicit null termination, and that is about it. So if the same string will be passed to 3 functions all of which require a null terminator, converting to std::string once may be wise. Thus if your code is known to need a null terminator, and you don't expect strings fed from C-style sourced buffers or the like, maybe take a std::string const&. Otherwise take a std::string_view.
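The lost null terminator can be sketched like this. legacy_length is a made-up stand-in for any C-style API that expects a terminated string, and safe_length is likewise a hypothetical wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Stand-in for a C API that expects a null-terminated string.
std::size_t legacy_length(const char* s) { return std::strlen(s); }

// Copy the view into a std::string to obtain a terminator,
// then hand c_str() to the C API.
std::size_t safe_length(std::string_view sv) {
    std::string owned(sv);
    return legacy_length(owned.c_str());
}
```

For a view of the first five characters of "hello world", safe_length returns 5, whereas passing data() straight to legacy_length would read on to the end of the underlying literal and return 11. If the underlying buffer had no terminator at all, that direct call would be undefined behavior.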

If std::string_view had a flag that stated if it was null terminated (or something fancier) it would remove even that last reason to use a std::string const&.

There is a case where taking a std::string with no const& is optimal over a std::string_view. If you need to own a copy of the string indefinitely after the call, taking by-value is efficient. You'll either be in the SBO case (and no allocations, just a few character copies to duplicate it), or you'll be able to move the heap-allocated buffer into a local std::string. Having two overloads std::string&& and std::string_view might be faster, but only marginally, and it would cause modest code bloat (which could cost you all of the speed gains).
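A minimal sketch of that by-value "sink" parameter, using a hypothetical Person class that keeps its own copy of the name:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class that needs to own the string after the call.
class Person {
public:
    // Taken by value: an lvalue argument is copied once into name,
    // an rvalue argument is moved into it. Either way the buffer is
    // then moved (not copied again) into the member.
    explicit Person(std::string name) : name_(std::move(name)) {}

    const std::string& name() const { return name_; }

private:
    std::string name_;
};
```

Taking a std::string_view here instead would force a copy even when the caller was ready to give up a std::string it no longer needs.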


¹ Small Buffer Optimization

² Actual use case.

When should I use std::string / std::string_view for parameter / return type

std::string_view is a way to get some of std::string's const member functions without creating a std::string, for when you have a char* or want to reference a subset of a string.

Consider it as a const reference. If the object it refers to vanishes (or changes) for any reason, you have a problem. If your code can return a reference, you can return a string_view.
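As a sketch of the safe side of that rule, a function can return a view into data that the caller already owns. The function name extension is made up for this illustration, and the logic is deliberately naive (it ignores directory separators).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the part of path after the last '.',
// or an empty view if there is no '.'.
// The result refers to the caller's buffer, exactly like a returned
// reference would, so it is valid only as long as that buffer is.
std::string_view extension(std::string_view path) {
    std::size_t dot = path.rfind('.');
    if (dot == std::string_view::npos) return {};
    return path.substr(dot + 1);
}
```

This is fine when called with a named std::string or a literal. It would dangle if the result were stored after being computed from a temporary std::string.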

Example:

#include <cstring>
#include <iostream>
#include <string_view>

int main()
{
    char* a = new char[10];
    std::strcpy(a, "Hello");
    std::string_view s(a);  // s points at a's buffer; nothing is copied
    std::cout << s;         // OK
    delete[] a;
    std::cout << s;         // Oops: undefined behavior, s now dangles.
                            // A std::string would have held its own copy.
}


Edit: string_view doesn't have a c_str() member, because that would require writing a \0 at the end of the substring, which cannot be done without modifying the underlying data.

Is string_view really promoting use-after-free errors?

The problem with this code...

std::string_view sv = s + "World\n";

... is that sv does not refer to s but to a nameless temporary created by the expression s + "World\n". That temporary is destroyed as soon as the whole expression ends (at the semicolon).

So yes, this is a "use after free" type error.

If you want to extend the life of that temporary you have to assign it to a variable that will maintain it - like a new std::string object:

std::string sv = s + "World\n"; // copy the temporary to new storage in sv

A std::string_view is merely a "view" onto a string, it is not a string in itself. It is only valid as long as the string it "looks" at is valid.

There is another quirk here too. You can also bind the temporary to a const reference which extends the life of temporaries:

std::string const& sv = s + "World\n"; // just keep the temporary living

Why is initializing a std::string_view from a temporary allowed?

I cannot speak for the standards committee, but my suspicion is that std::string_view is expected to be used as a function parameter, so that temporaries can be passed into a function (as with a const ref). The lifetime is fine in that scenario, because the temporary lives until the end of the full expression containing the call.

If initialization from temporaries were forbidden, a major use of std::string_view would be negated. You would be forced to create a new std::string (or bind the temporary to a const ref) before calling the function, which makes the process awkward.
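The function-parameter case can be sketched as follows; count_char is a hypothetical function written only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical function that only reads its argument during the call.
std::size_t count_char(std::string_view text, char c) {
    std::size_t n = 0;
    for (char ch : text) {
        if (ch == c) ++n;
    }
    return n;
}
```

A call such as count_char(s + "World\n", 'o') with a std::string s is safe: the temporary produced by operator+ is destroyed only at the end of the full expression, which is after count_char has returned. The danger starts only when the view outlives that expression, for example by being stored in a local variable.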

Creating a std::string from std::string_view

Creating a std::string object always copies (or moves, if it can) the string, and handles its own memory internally.

For your example, the strings handled by sv and s are totally different and separate.
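A small sketch of that separation; the helper name make_owned is made up here, and the explicit std::string constructor taking a string_view is the C++17 one.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: build an owning std::string from a view.
// The conversion is explicit and copies the viewed characters
// into storage managed by the new string.
std::string make_owned(std::string_view sv) {
    return std::string(sv);
}
```

After copy = make_owned(sv), changing a character of the string that sv refers to is visible through sv but not through copy, since the two no longer share any storage.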

Why is `std::string_view` not implemented differently?

string_view knows nothing about string. It is not a "wrapper" around a string. It has no idea that std::string even exists as a type; the conversion from string to string_view happens within std::string. string_view has no association with or reliance on std::string.

In fact, that is the entire purpose of string_view: to be able to have a non-modifiable sized string without knowing how it is allocated or managed. That it can reference any string type that stores its characters contiguously is the point of the thing. It allows you to create an interface that takes a string_view without knowing or caring whether the caller is using std::string, CString, or any other string type.
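For illustration, here is a sketch of such an interface. The names starts_with and as_view are assumptions for this example (C++20 adds a starts_with member to string_view, but this sketch only relies on C++17).

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical interface: works on any contiguous character sequence,
// without knowing who owns it or how it was allocated.
bool starts_with(std::string_view text, std::string_view prefix) {
    // substr clamps the count, so a prefix longer than text compares unequal.
    return text.substr(0, prefix.size()) == prefix;
}

// Hypothetical adapter for a buffer that is not a string type at all.
std::string_view as_view(const std::vector<char>& buffer) {
    return std::string_view(buffer.data(), buffer.size());
}
```

The same starts_with accepts a std::string (through std::string's conversion operator), a string literal, or as_view(some_vector), and none of these calls allocates or copies the characters.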

Since the owning string's behavior is not string_view's business, there is no possible mechanism for string_view to be told when the string it references is no longer valid.



We could store a pointer to the string object itself and then determine whether the string is in SSO mode or not and work accordingly.

For the sake of argument, let us ignore that string_view is not supposed to know or care whether its characters come from std::string. Let's assume that string_view only works with std::string (even though that makes the type completely worthless).

Even then, this would not work. Or rather, it would only work if the type was functionally no different from a std::string const&.

If string_view stores a pointer to the first character and a size, then any modification to the std::string might change this. It could change the size even without breaking small-string optimization. It could change the size without causing reallocation. The only way to correct this is to have the string_view always ask the std::string it references what its character data and size are.

And that's no different from just using a std::string const& directly.

string_view vs const char* performance

But I pass them a std::string_view object using its data() member function. Is this bad practice?

Yes, this is a bad practice. It's bad primarily because a string view doesn't necessarily point to a string that is null terminated. In case it doesn't, passing data() into a function that requires null termination will result in undefined behaviour.

Secondarily, there are cases where knowing the length of the string beforehand is more efficient. The length is known since it's stored in the string view. When you use data() only as an argument, you're not providing the known size to the function.

Use this instead: std::istringstream iss(std::string{str});
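As a sketch of both points, here are two hypothetical integer parsers. The first follows the suggestion above and copies the view into a std::string for the stream. The second is an alternative not mentioned in the original answer: it passes the known length along by using std::from_chars from the C++17 header <charconv>, so no terminator and no copy are needed.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <system_error>

// Stream-based: copies the viewed characters into a std::string first.
std::optional<int> parse_with_stream(std::string_view str) {
    std::istringstream iss(std::string{str});
    int value = 0;
    if (iss >> value) return value;
    return std::nullopt;
}

// Range-based: uses data() together with size(), never data() alone.
std::optional<int> parse_with_from_chars(std::string_view str) {
    int value = 0;
    auto result = std::from_chars(str.data(), str.data() + str.size(), value);
    if (result.ec != std::errc{}) return std::nullopt;
    return value;
}
```

Both functions work on a view of the first two characters of "42 apples" without ever reading past them. Note that the two parsers are not identical in behavior: the stream version skips leading whitespace, while std::from_chars does not.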

Should I revert back to const char*?

I see no good reason for doing so in this case.


