Why is there no support for concatenating std::string and std::string_view?
The reason for this is given in N3512, "string_ref: a non-owning reference to a string, revision 2", by Jeffrey Yasskin:
I also omitted operator+(basic_string, basic_string_ref) because LLVM returns a lightweight object from this overload and only performs the concatenation lazily. If we define this overload, we'll have a hard time introducing that lightweight concatenation later.
It was later suggested on the std-proposals mailing list that these operator overloads be added to the standard.
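In the absence of such an overload, one common workaround is to append the view to a copy of the string. The following is only a minimal sketch, assuming C++17; the helper name `concat` is hypothetical and not part of the standard library:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a standard function): builds a new std::string
// from an owning string followed by a non-owning view.
std::string concat(const std::string& lhs, std::string_view rhs) {
    std::string result;
    result.reserve(lhs.size() + rhs.size());  // reserve once for both parts
    result += lhs;
    result += rhs;  // std::string can append a string_view since C++17
    return result;
}
```

A call such as `concat(name, suffix)` returns an owning string, so the result stays valid even after the buffer behind `suffix` goes away.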
Concatenating string_view objects
A std::string_view is an alias for std::basic_string_view<char>, which is a std::basic_string_view templated on a specific type of character, i.e. char.
But what does it look like?
Besides a fairly large number of useful member functions such as find, substr, and others (perhaps an ordinary number, compared to other container- and string-like types in the standard library), std::basic_string_view<_CharT>, with _CharT being the generic char-like type, has just two data members:
// directly from my /usr/include/c++/12.2.0/string_view
size_t _M_len;
const _CharT* _M_str;
i.e. a pointer to const _CharT that indicates where the view starts, and a size_t (an unsigned integer type) that indicates how long the view is, counting from _M_str's pointee.
In other words, a string view knows only where it starts and how long it is, so it represents a sequence of char-like entities that are consecutive in memory. With just two such members, you can't represent a string made up of non-contiguous substrings.
Put yet another way, to create a std::string_view you need to be able to say where it starts and how many chars long it is. For two views s1 and s2, can you tell where s1 + s2 would have to start and how many characters long it should be? Think about it: you can't, because s1 and s2 are not adjacent.
Maybe a diagram can help.
Assume these lines of code:
std::string s1{"hello"};
std::string s2{"world"};
s1 and s2 are totally unrelated objects as far as their memory locations are concerned; here is what they look like:
                                  &s2[0]
                                    |
                                    | &s2[1]
                                    |   |
    &s1[0]                          |   | &s2[2]
      |                             |   |   |
      | &s1[1]                      |   |   | &s2[3]
      |   |                         |   |   |   |
      |   | &s1[2]                  |   |   |   | &s2[4]
      |   |   |                     |   |   |   |   |
      |   |   | &s1[3]              v   v   v   v   v
      |   |   |   |               +---+---+---+---+---+
      |   |   |   | &s1[4]        | w | o | r | l | d |
      |   |   |   |   |           +---+---+---+---+---+
      v   v   v   v   v
    +---+---+---+---+---+
    | h | e | l | l | o |
    +---+---+---+---+---+
I've intentionally drawn them misaligned to show that &s1[0], the memory location where s1 starts, and &s2[0], the memory location where s2 starts, have nothing to do with each other.
Now, imagine you create two string views like this:
std::string_view sv1{s1};
std::string_view sv2(s2.begin() + 1, s2.begin() + 4);
Here's what they will look like, in terms of the two implementation-specific members _M_str and _M_len:
                                  &s2[0]
                                    |
                                    | &s2[1]
                                    |   |
    &s1[0]                          |   | &s2[2]
      |                             |   |   |
      | &s1[1]                      |   |   | &s2[3]
      |   |                         |   |   |   |
      |   | &s1[2]                  |   |   |   | &s2[4]
      |   |   |                     |   |   |   |   |
      |   |   | &s1[3]              v   v   v   v   v
      |   |   |   |               +---+---+---+---+---+
      |   |   |   | &s1[4]        | w | o | r | l | d |
      |   |   |   |   |           +---+---+---+---+---+
      v   v   v   v   v               · ^         ·
    +---+---+---+---+---+             · |         ·
    | h | e | l | l | o |           +---+         ·
    +---+---+---+---+---+           | ·           ·
    · ^                 ·           | · sv2._M_len·
    · |                 ·           | <----------->
  +---+                 ·           |
  | ·                   ·           +-- sv2._M_str
  | ·    sv1._M_len     ·
  | <------------------->
  |
  +-------- sv1._M_str
Given the above, can you see what's wrong with expecting that
std::string_view sv3{sv1 + sv2};
works?
How can you possibly define sv3._M_str and sv3._M_len (based on sv1._M_str, sv1._M_len, sv2._M_str, and sv2._M_len) such that they represent a view on the concatenation "helloorl"?
You can't, because "hello" and "world" are located in two unrelated areas of memory.
Why doesn't std::string_view have assign() and clear() methods?
This is only ever really going to be speculation, but the general consensus seems to be that the meaning of these operations would be rather unclear.
Personally I think "clearing a view" makes perfect sense (and let's also not forget that remove_prefix and remove_suffix exist! Though see below...), but I also agree that there are other interpretations, which may be common, which make less sense. Recall that string_view is intended to complement const std::string&, not std::string, and neither of the functions you name is part of std::string's constant interface.
To be honest, the fact that we need this conversation at all is, itself, probably a good reason to just not have the function in the first place.
From the final proposal for string_view, the following passage is not about assign or clear specifically but does act as a relevant view [lol] into the minds of the committee on this subject:
s/remove_prefix/pop_front/, etc.
In Kona 2012, I proposed a range<> class with pop_front, etc. members that adjusted the bounds of the range. Discussion there indicated that committee members were uncomfortable using the same names for lightweight range operations as container operations. Existing practice doesn't agree on a name for this operation, so I've kept the name used by Google's StringPiece.
This proposal did in fact include a clear(), which was unceremoniously struck off the register in a later, isolated, rationale-starved proposal.
Now, one might argue that the functions could therefore have been provided under different names, but that was never proposed, and it's hard to imagine what alternative names would resolve this problem without being simply bad names for the operations.
Since we can assign a new string_view easily enough, including an empty one, the whole problem is solved by simply not bothering to address it.
Why doesn't std::stringstream work with std::string_view?
At this point (i.e., as we approach C++23), there's just not much point to it.
Since you used stringstream instead of one of the more usage-specific versions, there are two possibilities: you either intend to be able to write to the stream, or you don't.
If you don't intend to write to the stream, then you don't need the data to be copied. All forms of stringstream own the characters they act on, so you should try to avoid the copy. You can use the C++23 type ispanstream (a replacement for the old strstream). This takes a span<const CharT>, but string_view should be compatible with one of ispanstream's constructors too.
If you do intend to write to the stream, then you will need to copy the data into the stringstream. But you need not perform two copies: C++20 gives stringstream a constructor that moves from a std::string. See constructor #6 here:
explicit basic_stringstream( std::basic_string<CharT,Traits,Allocator>&& str,
std::ios_base::openmode mode =
std::ios_base::in | std::ios_base::out );
- Move-construct the contents of the underlying string device with str. The underlying basic_stringbuf object is constructed as basic_stringbuf<CharT,Traits,Allocator>(std::move(str), mode).
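And since std::string is constructible from a string_view, building a temporary std::string from the view and passing it to a std::stringstream constructor will use this move-constructor overload, which should minimize copying. Note that the conversion from string_view to std::string is explicit, so the temporary has to be spelled out, e.g. std::string{sv}.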
So there's really no need for a string_view-specific constructor.
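A minimal sketch of that approach, assuming C++17 or later (the move-from-string overload itself is C++20; on older standards the same line compiles but copies the temporary). The function `sum_ints` is a hypothetical example, not library code:

```cpp
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical example: sums the whitespace-separated integers in a view.
int sum_ints(std::string_view text) {
    // The std::string temporary is built explicitly from the view, then
    // handed to the stream (moved in C++20, copied before that).
    std::stringstream stream{std::string{text}};
    int total = 0;
    for (int value = 0; stream >> value;) {
        total += value;
    }
    return total;
}
```

The characters are copied once, from the view into the temporary string; the stream then owns that buffer.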
Why doesn't std::basic_string support concatenation through expression templates?
Because nobody proposed it for the standard; unless someone proposes something, it doesn't get in. Also because it could break existing code (if the expression templates used operator+, that is).
Also, expression templates don't work well in the presence of auto. Something as simple as auto concat = str1 % str2; can easily break. Hopefully, this is an issue that C++17 will resolve via some means.
Why doesn't std::string have a constructor that directly takes std::string_view?
The ambiguity is that std::string and std::string_view are both constructible from const char *. That makes things like
std::string{}.assign("ABCDE", 0, 1)
ambiguous if the first parameter can be either a string or a string_view.
There are several defect reports trying to sort this out, starting here.
https://cplusplus.github.io/LWG/lwg-defects.html#2758
The first thing was to make members taking string_view into templates, which lowers their priority in overload resolution. Apparently, that was a bit too effective, so other adjustments were added later.
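The outcome, as far as I understand it, is that since C++17 std::string can be built from a view, but only explicitly. The sketch below checks this with type traits; `to_owned` is a hypothetical helper name:

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Direct (explicit) construction from a view is allowed...
static_assert(std::is_constructible_v<std::string, std::string_view>);
// ...but there is no implicit conversion from a view to a string.
static_assert(!std::is_convertible_v<std::string_view, std::string>);

// Hypothetical helper: the conversion has to be written out explicitly.
std::string to_owned(std::string_view sv) {
    return std::string{sv};
}
```

So `std::string s{sv};` compiles while `std::string s = sv;` does not, which keeps calls that pass a const char * from becoming ambiguous between the two types.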