Legal to overwrite std::string's null terminator?
LWG 2475 made this valid by editing the specification of operator[](size()). The amended wording reads as follows (the resolution inserted the words "to any value other than charT()"):
Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object to any value other than charT() leads to undefined behavior.
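As a minimal sketch of what the amended wording permits, the snippet below stores charT() into the terminator slot and nothing else. The helper name rewrite_terminator is hypothetical and only serves the illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: writes charT() into the terminator slot, which is the
// only store to s[s.size()] that the amended wording allows.
inline char rewrite_terminator(std::string& s) {
    s[s.size()] = '\0';       // allowed: the stored value equals char()
    // s[s.size()] = 'x';     // would be undefined behaviour
    assert(s[s.size()] == '\0');
    return s[s.size()];
}
```

Calling rewrite_terminator on any std::string should leave its size and contents unchanged.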
Why did C++11 make std::string::data() add a null terminating character?
Advantages of the change:
1. When data also guarantees the null terminator, the programmer doesn't need to know the obscure differences between c_str and data, and consequently avoids undefined behaviour from passing strings without guaranteed null termination into functions that require it. Such functions are ubiquitous in C interfaces, and C interfaces are used in C++ a lot.
2. The subscript operator was also changed to allow read access to str[str.size()]. Not allowing access to str.data() + str.size() would be inconsistent.
3. While not initialising the null terminator upon resize etc. may make that operation faster, it forces the initialisation in c_str, which makes that function slower¹. The optimisation that was removed was not universally the better choice. Given the change mentioned in point 2, that slowness would have affected the subscript operator as well, which would certainly not have been acceptable for performance. As such, the null terminator was going to be there anyway, and therefore there is no downside in guaranteeing that it is.
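To illustrate the first point, here is a small sketch that hands data() to a C function requiring a terminator. The helper c_length is a made-up name, and std::strlen stands in for any C interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: passes the string's buffer to a C function that
// relies on null termination.
inline std::size_t c_length(const std::string& s) {
    // Since C++11, data() and c_str() refer to the same terminated buffer.
    assert(s.data() == s.c_str());
    return std::strlen(s.data());
}
```

For a string without embedded nulls, the result should equal s.size().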
Curious detail: str.at(str.size()) still throws an exception.
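The difference between the two accessors can be sketched like this; at_end_throws is a hypothetical helper name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: reports whether at(size()) throws, after confirming
// that unchecked access at the same index reads the terminator.
inline bool at_end_throws(const std::string& s) {
    assert(s[s.size()] == '\0');     // operator[] may read the terminator
    try {
        (void)s.at(s.size());        // at() bounds-checks against size()
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```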
P.S. There was another change: strings are now guaranteed to have contiguous storage (which is why data is provided in the first place). Prior to C++11, implementations could have used ropes and reallocated upon a call to c_str. No major implementation chose to exploit this freedom (to my knowledge).
P.P.S. Old versions of GCC's libstdc++, for example, apparently set the null terminator only in c_str until version 3.4. See the related commit for details.
¹ A factor in this is concurrency, which was introduced to the language standard in C++11. Concurrent non-atomic modification is data-race undefined behaviour, which is why C++ compilers are allowed to optimize aggressively and keep things in registers. So a library implementation written in ordinary C++ would have UB for concurrent calls to .c_str().
In practice (see comments), having multiple threads write the same value wouldn't cause a correctness problem, because asm for real CPUs doesn't have UB. And C++ UB rules mean that multiple threads actually modifying a std::string object (other than calling c_str()) without synchronization is something the compiler + library can assume doesn't happen.
But it would dirty the cache line and prevent other threads from reading it, so it is still a poor choice, especially for strings that potentially have concurrent readers. It would also stop .c_str() from basically optimizing away, because of the store side effect.
Does std::string have a null terminator?
No, not as part of its contents: the terminator is not counted by size(). But if you call temp.c_str(), the array returned by this method ends with a null terminator.
It's also worth saying that you can include a null character in a string just like any other character.
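#include <iostream>
#include <string>
int main() {
    std::string s("hello");
    std::cout << s.size() << ' ';
    s[1] = '\0';  // an embedded null is an ordinary character
    std::cout << s.size() << '\n';
}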
prints 5 5 and not 5 1, as you might expect if null characters had a special meaning for strings.
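The special meaning only appears once the buffer is read through a C interface. As a rough sketch (visible_to_c is a hypothetical helper), std::strlen stops at the first embedded null even though size() does not:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: how many characters a C function would see.
inline std::size_t visible_to_c(const std::string& s) {
    // strlen scans until the first '\0', embedded or terminating.
    return std::strlen(s.c_str());
}
```

With the string from the example above, size() stays at 5 while the C view ends after the first character.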
Is it Safe to strncpy Into a string That Doesn't Have Room for the Null Terminator?
This is safe, as long as you copy at most size() characters into the string, that is, you write only to positions in [0, size()). Per [basic.string]/3:
In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a “null terminator”), and size() <= capacity() is true.
So string bar(length, '\0') gives you a string with a size() of 11, with an immutable null terminator at the end (for a total of 12 characters in actual size). As long as you do not overwrite that null terminator, or try to write past it, you're okay.
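A minimal sketch of that pattern, assuming the source is an ordinary C string; copy_exact and its parameter are hypothetical names, while bar and length follow the answer above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: sizes the string to the source length, then lets
// strncpy fill exactly the positions [0, size()).
inline std::string copy_exact(const char* src) {
    const std::size_t length = std::strlen(src);
    std::string bar(length, '\0');
    // strncpy writes bar.size() characters and no terminator of its own,
    // so the string's own terminator at bar[bar.size()] is never touched.
    std::strncpy(&bar[0], src, bar.size());
    assert(bar[bar.size()] == '\0');
    return bar;
}
```

Some compilers may warn that strncpy truncates here; in this sketch that is intentional, because the string supplies the terminator.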
Will std::string always be null-terminated in C++11?
Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return
a pointer p such that p + i == &operator[](i) for each i in [0,size()].
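This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]). Taking i == size(), it also means that s.c_str()[s.size()] is the same object as s[s.size()], which has the value charT(), so the string's own buffer is null-terminated.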
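The quoted requirement can be checked directly. The following is only a sketch, and matches_c_str_requirement is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: checks p + i == &s[i] for each i in [0, size()],
// then checks that the final element is the null terminator.
inline bool matches_c_str_requirement(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;
    }
    return p[s.size()] == '\0';
}
```

On a conforming C++11 (or later) implementation this should return true for every string, including the empty one.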
How to resize std::string to remove all null terminator characters?
There are many ways to do this, but the one that seems most "C++" rather than C to me is:
str.erase(std::find(str.begin(), str.end(), '\0'), str.end());
That is, erase everything from the first null to the end.
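Wrapped in a function, the one-liner might look like the sketch below. Both helper names are hypothetical. The second variant is an alternative using the erase-remove idiom, for the case where nulls in the middle of the string should be dropped as well rather than treated as the end of the text.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: cuts the string at the first null character.
inline void trim_at_first_null(std::string& str) {
    str.erase(std::find(str.begin(), str.end(), '\0'), str.end());
}

// Hypothetical alternative: removes every null character, keeping the
// characters between them (erase-remove idiom).
inline void remove_all_nulls(std::string& str) {
    str.erase(std::remove(str.begin(), str.end(), '\0'), str.end());
}
```

A string that was resized to make room for a C API, and so carries padding nulls at the end, is the typical input for trim_at_first_null.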
C++: Does strcat() overwrite or move the null?
The first \0 is overwritten, and a new \0 is added at the end of the concatenated string. There is no scope for "moving" anything here. These are locations to which values get assigned.
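This can be observed by inspecting the slot that held the old terminator. The sketch below assumes the caller provides a destination buffer with enough room; char_at_old_terminator is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the character that ends up in the slot where
// dest's terminator was before the call. The caller must guarantee that dest
// has room for the combined string plus its terminator.
inline char char_at_old_terminator(char* dest, const char* src) {
    const std::size_t old_len = std::strlen(dest);
    std::strcat(dest, src);    // overwrites dest[old_len], appends a new '\0'
    return dest[old_len];
}
```

For a non-empty src, the returned character is the first character of src, and the new terminator sits at the end of the combined text.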