Why Is There No Support for Concatenating std::string and std::string_view

Why is there no support for concatenating std::string and std::string_view?

The reason is given in N3512, "string_ref: a non-owning reference to a string, revision 2", by Jeffrey Yasskin:

I also omitted operator+(basic_string, basic_string_ref) because LLVM returns a lightweight object from this overload and only performs the concatenation lazily. If we define this overload, we'll have a hard time introducing that lightweight concatenation later.

It was later suggested on the std-proposals mailing list that these operator overloads be added to the standard.
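Where those overloads are not available, the concatenation can be spelled out by hand. The following is only a minimal sketch, assuming C++17; concat is a hypothetical helper name, not something the standard library provides:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: builds an owning std::string from two views.
// std::string::append accepts a std::string_view since C++17.
std::string concat(std::string_view lhs, std::string_view rhs) {
    std::string result;
    result.reserve(lhs.size() + rhs.size());  // reserve once, then append twice
    result.append(lhs);
    result.append(rhs);
    return result;
}

// Small self-check: std::string and string literals both convert
// implicitly to std::string_view, so mixed arguments are accepted.
void concat_demo() {
    std::string s{"hello, "};
    std::string_view sv{"world"};
    assert(concat(s, sv) == "hello, world");
    assert(concat("abc", s) == "abchello, ");
}
```

Because both parameters are views, the same function covers string + string_view, string_view + string, and string_view + string_view.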

Concatenating string_view objects

std::string_view is an alias for std::basic_string_view<char>, i.e. the std::basic_string_view class template instantiated for one specific character type, char.

But what does it look like?

Besides a fairly large number of useful member functions such as find and substr (arguably an ordinary number, compared with the other container- and string-like types in the standard library), std::basic_string_view<_CharT>, with _CharT being the generic char-like type, has just two data members:

// directly from my /usr/include/c++/12.2.0/string_view
size_t _M_len;
const _CharT* _M_str;

i.e. a pointer to const _CharT that indicates where the view starts, and a size_t (an unsigned integer type) that indicates how long the view is, counting from _M_str's pointee.

In other words, a string view just knows where it starts and how long it is, so it represents a sequence of char-like entities which are consecutive in memory. With just two such members, you can't represent a string which is made up of non-contiguous substrings.
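To make the "pointer plus length" idea concrete, here is a toy sketch. TinyView is a hypothetical type invented for illustration; it is not how any particular standard library spells its string_view, and it skips all bounds checking:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy type, NOT the real std::string_view: just a start
// pointer and a length, like the two members described above.
struct TinyView {
    const char* str;
    std::size_t len;

    // A sub-view is only a new (pointer, length) pair; no characters are copied.
    // Precondition (unchecked in this sketch): pos + count <= len.
    TinyView substr(std::size_t pos, std::size_t count) const {
        return TinyView{str + pos, count};
    }
};

// Small self-check: a sub-view points into the same buffer as its parent.
void tiny_view_demo() {
    const char* text = "hello world";
    TinyView whole{text, std::strlen(text)};
    TinyView word = whole.substr(6, 5);  // views "world"
    assert(word.str == text + 6);
    assert(word.len == 5);
}
```

Every operation on such a type can only move the pointer or change the length, which is why the viewed characters must be contiguous.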

Put yet another way, to create a std::string_view you need to be able to tell where it starts and how many chars long it is. Given two views sv1 and sv2, can you tell where sv1 + sv2 would have to start and how long it should be? Think about it: in general you can't, because the characters viewed by sv1 and sv2 are not adjacent in memory.

Maybe a diagram can help.

Assume these lines of code

std::string s1{"hello"};
std::string s2{"world"};

s1 and s2 are totally unrelated objects, as far as their memory location is concerned; here is what they look like:

                            &s2[0]
                              |
                              | &s2[1]
                              |   |
&s1[0]                        |   | &s2[2]
  |                           |   |   |
  | &s1[1]                    |   |   | &s2[3]
  |   |                       |   |   |   |
  |   | &s1[2]                |   |   |   | &s2[4]
  |   |   |                   |   |   |   |   |
  |   |   | &s1[3]            v   v   v   v   v
  |   |   |   |             +---+---+---+---+---+
  |   |   |   | &s1[4]      | w | o | r | l | d |
  |   |   |   |   |         +---+---+---+---+---+
  v   v   v   v   v
+---+---+---+---+---+
| h | e | l | l | o |
+---+---+---+---+---+

I've intentionally drawn them misaligned to mean that &s1[0], the memory location where s1 starts, and &s2[0], the memory location where s2 starts, have nothing to do with each other.

Now, imagine you create two string views like this:

std::string_view sv1{s1};
std::string_view sv2(s2.begin() + 1, s2.begin() + 4);

Here's what they will look like, in terms of the two libstdc++ data members _M_str and _M_len (an implementation detail, not something the standard specifies):

                                &s2[0]
                                  |
                                  | &s2[1]
                                  |   |
    &s1[0]                        |   | &s2[2]
      |                           |   |   |
      | &s1[1]                    |   |   | &s2[3]
      |   |                       |   |   |   |
      |   | &s1[2]                |   |   |   | &s2[4]
      |   |   |                   |   |   |   |   |
      |   |   | &s1[3]            v   v   v   v   v
      |   |   |   |             +---+---+---+---+---+
      |   |   |   | &s1[4]      | w | o | r | l | d |
      |   |   |   |   |         +---+---+---+---+---+
      v   v   v   v   v             · ^         ·
    +---+---+---+---+---+           · |         ·
    | h | e | l | l | o |         +---+         ·
    +---+---+---+---+---+         | ·           ·
    · ^                 ·         | · sv2._M_len·
    · |                 ·         | <----------->
  +---+                 ·         |
  | ·                   ·         +-- sv2._M_str
  | ·     sv1._M_len    ·
  | <------------------->
  |
  +-------- sv1._M_str

Given the above, can you see what's wrong with expecting that

std::string_view sv3{sv1 + sv2};

works?

How can you possibly define sv3._M_str and sv3._M_len (based on sv1._M_str, sv1._M_len, sv2._M_str, and sv2._M_len) such that they represent a view on the concatenation, "helloorl"?

You can't, because the characters viewed by sv1 and sv2 are located in two unrelated areas of memory.
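The same point can be checked in code. This is a sketch under the assumption of C++17; the name contiguity_demo is made up for illustration, and sv2 is built from a pointer and a length rather than from iterators so that it does not need C++20:

```cpp
#include <cassert>
#include <string>
#include <string_view>

void contiguity_demo() {
    std::string s1{"hello"};
    std::string s2{"world"};
    std::string_view sv1{s1};
    std::string_view sv2{s2.data() + 1, 3};  // views "orl"

    // sv2 does not start where sv1 ends, so no single (pointer, length)
    // pair can cover both views.
    assert(sv1.data() + sv1.size() != sv2.data());

    // The concatenation needs storage of its own: copy both views into an
    // owning std::string, and only then take a view of the result.
    std::string owner(sv1);       // explicit string_view -> string conversion
    owner += sv2;                 // operator+= accepts a string_view since C++17
    std::string_view sv3{owner};  // valid only while `owner` is alive
    assert(sv3 == "helloorl");
}
```

Note that sv3 depends on owner staying alive; a view of a temporary concatenation result would dangle as soon as the temporary is destroyed.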

Why doesn't std::string_view have assign() and clear() methods?

This is only ever really going to be speculation, but the general consensus seems to be that the meaning of these operations would be somewhat unclear.

Personally I think "clearing a view" makes perfect sense (and let's also not forget that remove_prefix and remove_suffix exist! Though see below...), but I also agree that there are other interpretations, which may be common, which make less sense. Recall that string_view is intended to complement const std::string&, not std::string, and neither of the functions you name is part of std::string's const interface.

To be honest, the fact that we need this conversation at all is, itself, probably a good reason to just not have the function in the first place.

From the final proposal for string_view, the following passage is not about assign or clear specifically but does act as a relevant view [lol] into the minds of the committee on this subject:

s/remove_prefix/pop_front/, etc.

In Kona 2012, I proposed a range<> class with pop_front, etc. members that adjusted the bounds of the range. Discussion there indicated that committee members were uncomfortable using the same names for lightweight range operations as container operations. Existing practice doesn't agree on a name for this operation, so I've kept the name used by Google's StringPiece.

This proposal did in fact include a clear(), which was unceremoniously struck off the register in a later, isolated, rationale-starved proposal.

Now, one might argue that the functions could therefore have been provided under different names, but that was never proposed, and it's hard to imagine what alternative names would resolve this problem without being simply bad names for the operations.

Since we can assign a new string_view easily enough, including an empty one, the whole problem is solved by simply not bothering to address it.
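As a small illustration of that last point, plain assignment already covers both operations. This is a sketch only; reset_demo is a made-up name:

```cpp
#include <cassert>
#include <string_view>

void reset_demo() {
    std::string_view sv{"hello world"};

    sv = "goodbye";           // plays the role of assign(): rebind the view
    assert(sv == "goodbye");

    sv.remove_prefix(4);      // shrink from the front; the characters are untouched
    assert(sv == "bye");

    sv = std::string_view{};  // plays the role of clear(): an empty view
    assert(sv.empty());
}
```

In each case only the view changes; the underlying character data is never modified.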

Why doesn't std::stringstream work with std::string_view?

At this point (i.e. as we approach C++23), there's just not much point to it.

Since you used stringstream instead of one of the more usage-specific versions, there are two possibilities: you either intend to be able to write to the stream, or you don't.

If you don't intend to write to the stream, then you don't need the data to be copied. All forms of stringstream own the characters they act on, so constructing one implies a copy, which you should try to avoid. You can use the C++23 type ispanstream (a replacement for the old strstream). This takes a span<const CharT>, but string_view should be compatible with one of ispanstream's constructors too.

If you do intend to write to the stream, then you will need to copy the data into the stringstream. But you need not perform two copies. So C++20 gives stringstream a move-constructor from a std::string. See constructor #6 here:

explicit basic_stringstream( std::basic_string<CharT,Traits,Allocator>&& str,
                             std::ios_base::openmode mode =
                                 std::ios_base::in | std::ios_base::out );

  6) Move-construct the contents of the underlying string device with str. The underlying basic_stringbuf object is constructed as basic_stringbuf<CharT,Traits,Allocator>(std::move(str), mode).

And since std::string is constructible from a string_view (explicitly, so the conversion has to be written out as std::string(sv)), passing the resulting temporary into a std::stringstream constructor will use this move-constructor overload, which should minimize copying.

So there's really no need for a string_view-specific constructor.
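A minimal sketch of that approach follows. first_token is a hypothetical helper written for this example; with a C++20 library the temporary string is moved into the stream, while older libraries fall back to the copying overload:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical helper: reads the first whitespace-separated token of a view
// through a std::stringstream.
std::string first_token(std::string_view sv) {
    // The string_view -> string conversion is explicit, so the temporary
    // std::string is spelled out. Braces avoid the "most vexing parse" that
    // `std::stringstream ss(std::string(sv));` would run into.
    std::stringstream ss{std::string{sv}};
    std::string token;
    ss >> token;
    return token;
}

void first_token_demo() {
    std::string_view sv{"alpha beta"};
    assert(first_token(sv) == "alpha");
}
```

The view's characters are copied exactly once, into the temporary string that the stream then takes over.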

Why doesn't std::basic_string support concatenation through expression templates?

Because nobody proposed it for the standard; unless someone proposes something, it doesn't get in. Also because it could break existing code (if the expression templates were hooked into operator+, that is).

Also, expression templates don't work well in the presence of auto. Doing something as simple as auto concat = str1 % str2; can easily be broken. Hopefully, this is an issue that C++17 will resolve via some means.
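To show what the auto problem looks like, here is a deliberately tiny sketch of lazy concatenation. LazyConcat and lazy_concat are hypothetical names; a real expression-template design would overload an operator and handle arbitrarily long chains:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical lightweight result of a lazy concatenation: it only remembers
// its two operands and copies nothing until it is converted to std::string.
struct LazyConcat {
    std::string_view lhs;
    std::string_view rhs;

    operator std::string() const {
        std::string out;
        out.reserve(lhs.size() + rhs.size());
        out.append(lhs).append(rhs);
        return out;
    }
};

// Hypothetical free function standing in for an expression-template operator.
LazyConcat lazy_concat(std::string_view a, std::string_view b) {
    return LazyConcat{a, b};
}

void lazy_concat_demo() {
    std::string str1{"foo"};
    std::string str2{"bar"};

    std::string ok = lazy_concat(str1, str2);  // concatenation happens here
    assert(ok == "foobar");

    // Hazard: `auto` deduces LazyConcat, not std::string. This is fine only
    // because str1 and str2 outlive `expr`; with temporary operands, `expr`
    // would be left pointing at destroyed storage.
    auto expr = lazy_concat(str1, str2);
    std::string materialized = expr;
    assert(materialized == "foobar");
}
```

The lightweight object is safe only as long as its operands are; auto makes it easy to keep the object around longer than that without noticing.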

Why doesn't std::string have a constructor that directly takes std::string_view?

The problem is an ambiguity: std::string and std::string_view are both constructible from const char *. That makes things like

std::string{}.assign("ABCDE", 0, 1)

ambiguous if the first parameter can be either a string or a string_view.

There are several defect reports trying to sort this out, starting with LWG 2758:

https://cplusplus.github.io/LWG/lwg-defects.html#2758

The first thing was to make members taking string_view into templates, which lowers their priority in overload resolution. Apparently, that was a bit too effective, so other adjustments were added later.
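The outcome of those adjustments can be observed directly. The sketch below assumes a C++17 (or later) standard library that implements the resolved wording; string_from_view_demo is a made-up name:

```cpp
#include <cassert>
#include <string>
#include <string_view>

void string_from_view_demo() {
    std::string_view sv{"ABCDE"};

    std::string direct(sv);        // OK: explicit construction from a view
    // std::string implicit = sv;  // does not compile: the conversion is explicit
    assert(direct == "ABCDE");

    // For a string literal, the templated string_view overload of assign() is
    // constrained away (the argument converts to const char*), so the call
    // resolves to assign(const std::string&, pos, count): 1 char from index 0.
    std::string s;
    s.assign("ABCDE", 0, 1);
    assert(s == "A");
}
```

So the conversion from a view exists, but it has to be requested explicitly, and calls that were valid before string_view was introduced keep their old meaning.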


