Legal to Overwrite std::string's Null Terminator

Legal to overwrite std::string's null terminator?

LWG 2475 made it valid to write charT() (and only charT()) over the terminator by editing the specification of operator[](size()). The inserted text is "to any value other than charT()":

Otherwise, returns a reference to an object of type charT with value
charT(), where modifying the object to any value other than charT()
leads to undefined behavior.
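
A minimal sketch of what that wording permits, assuming a standard library that implements the LWG 2475 resolution. The function name rewrite_terminator is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: stores charT() over the terminator and reads it back.
// Writing '\0' is the one store the amended wording allows at index size().
inline char rewrite_terminator(std::string& s) {
    assert(s[s.size()] == '\0');   // reading the terminator is always fine
    s[s.size()] = '\0';            // allowed: the value written is charT()
    // s[s.size()] = 'x';          // undefined behaviour: never do this
    return s[s.size()];
}
```

The string's size() and contents are unchanged by the call; only the (already null) terminator is rewritten.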

Why did C++11 make std::string::data() add a null terminating character?

Advantages of the change:

  1. When data also guarantees the null terminator, the programmer doesn't need to know the obscure differences between c_str and data, and so avoids the undefined behaviour of passing a string that is not guaranteed to be null-terminated to a function that requires null termination. Such functions are ubiquitous in C interfaces, and C interfaces are used in C++ a lot.

  2. The subscript operator was also changed to allow read access to str[str.size()]. Not allowing access to str.data() + str.size() would be inconsistent.

  3. While not initialising the null terminator upon resize etc. may make that operation faster, it forces the initialisation into c_str, which makes that function slower¹. The optimisation that was removed was not universally the better choice. Given the change mentioned in point 2, that slowness would have affected the subscript operator as well, which would certainly not have been acceptable for performance. As such, the null terminator was going to be there anyway, so there is no downside in guaranteeing that it is.

Curious detail: str.at(str.size()) still throws an exception.
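
To illustrate the difference between the access paths, here is a small sketch for C++11 or later. The helper names at_end_throws and check_end_access are invented for this example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if s.at(s.size()) throws std::out_of_range.
inline bool at_end_throws(const std::string& s) {
    try {
        (void)s.at(s.size());      // at() bounds-checks against size()
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}

// Hypothetical helper: checks all three ways of reaching index size().
inline void check_end_access(const std::string& s) {
    assert(s[s.size()] == '\0');          // operator[] may read the terminator
    assert(s.data()[s.size()] == '\0');   // data() + size() points at it too
    assert(at_end_throws(s));             // at() still rejects the index
}
```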

P.S. There was another change: strings are now guaranteed to have contiguous storage (which is why data is provided in the first place). Prior to C++11, implementations could have used rope-based strings and reallocated upon a call to c_str. No major implementation had chosen to exploit this freedom (to my knowledge).

P.P.S. Old versions of GCC's libstdc++, for example, apparently set the null terminator only in c_str until version 3.4. See the related commit for details.


¹ One factor here is concurrency, which was introduced to the language standard in C++11. Concurrent non-atomic modification is a data race and therefore undefined behaviour, which is why C++ compilers are allowed to optimize aggressively and keep things in registers. So a library implementation written in ordinary C++ that stored the terminator inside c_str() would have UB for concurrent calls to .c_str().

In practice, having multiple threads write the same value wouldn't cause a correctness problem, because asm for real CPUs doesn't have UB. And C++ UB rules mean that multiple threads actually modifying a std::string object (other than calling c_str()) without synchronization is something the compiler + library can assume doesn't happen.

But the store would dirty the cache line and temporarily keep other threads from reading it, so it is still a poor choice, especially for strings that potentially have concurrent readers. The store side-effect would also stop .c_str() from basically optimizing away.

Does std::string have a null terminator?

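Not as part of its contents: the terminator is not counted by size(). If you say temp.c_str(), though, the returned pointer refers to null-terminated characters, and since C++11 the same is guaranteed for temp.data().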

It's also worth saying that you can include a null character in a string just like any other character.

string s("hello");
cout << s.size() << ' ';
s[1] = '\0';
cout << s.size() << '\n';

prints

5 5

and not 5 1 as you might expect if null characters had a special meaning for strings.
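
The flip side is that C functions do stop at the first null. A short sketch of the same string seen both ways (the function name embedded_null_demo is made up for illustration):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical demo: std::string tracks its own length, strlen does not.
inline void embedded_null_demo() {
    std::string s("hello");
    s[1] = '\0';                            // embed a null in the middle
    assert(s.size() == 5);                  // the string still holds 5 characters
    assert(std::strlen(s.c_str()) == 1);    // a C function sees only "h"
}
```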

Is it Safe to strncpy Into a string That Doesn't Have Room for the Null Terminator?

This is safe, as long as you copy only into the [0, size()) characters of the string. Per [basic.string]/3:

In all cases, [data(), data() + size()] is a valid range, data() + size() points at an object with value charT() (a “null terminator”), and size() <= capacity() is true.

So, with length equal to 11, string bar(length, '\0') gives you a string with a size() of 11, with an immutable null terminator at the end (for a total of 12 characters in actual size). As long as you do not overwrite that null terminator with anything else, or try to write past it, you're okay.
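
A sketch of that pattern, assuming C++11 or later. The function name copy_prefix and its parameters are hypothetical and not taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies at most `length` characters of src into a
// string whose size() is exactly `length`.
inline std::string copy_prefix(const char* src, std::size_t length) {
    std::string bar(length, '\0');            // terminator lives at bar[length]
    std::strncpy(&bar[0], src, bar.size());   // writes only [0, size())
    assert(bar[bar.size()] == '\0');          // terminator was never touched
    return bar;
}
```

If src is shorter than length, strncpy pads the remainder with nulls, so size() stays at length even though strlen of the result is smaller.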

Will std::string always be null-terminated in C++11?

Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return

a pointer p such that p + i == &operator[](i) for each i in [0,size()].

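This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]). And because operator[](size()) refers to an object with value charT(), the character at s.c_str()[s.size()] is the null terminator.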
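
The requirement can be checked directly. A minimal sketch, with the function name check_c_str_layout invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: verifies p + i == &operator[](i) for i in [0, size()].
inline void check_c_str_layout(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        assert(p + i == &s[i]);     // c_str() aliases the string's own storage
    }
    assert(p[s.size()] == '\0');    // the element at size() is the terminator
}
```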

How to resize std::string to remove all null terminator characters?

There are many ways to do this, but the one that seems to me the most "C++" rather than C is:

str.erase(std::find(str.begin(), str.end(), '\0'), str.end());

That is, erase everything from the first null character to the end.
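
Wrapped into a function with the headers it needs, this might look as follows. The name truncate_at_first_null is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: drops the first embedded null and everything after it.
inline void truncate_at_first_null(std::string& str) {
    // std::find returns end() when there is no null, making erase a no-op.
    str.erase(std::find(str.begin(), str.end(), '\0'), str.end());
    assert(str.find('\0') == std::string::npos);
}
```

Note that any non-null characters after the first null are dropped as well, which is usually what is wanted for a buffer filled by a C API.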

C++: Does strcat() overwrite or move the null?

The first \0 is overwritten, and a new \0 is added at the end of the concatenated string. There is no scope for "moving" anything here. These are locations to which values get assigned.
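
A small sketch that inspects the two positions after the call. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical result type: the characters at the old and new terminator slots.
struct StrcatResult {
    char at_old_terminator;
    char at_new_terminator;
};

inline StrcatResult demo_strcat() {
    char buf[16] = "foo";          // buf[3] holds the terminator of "foo"
    assert(buf[3] == '\0');
    std::strcat(buf, "bar");       // buf must have room for "foobar" plus '\0'
    return { buf[3], buf[6] };     // 'b' replaced the old '\0'; a new one follows 'r'
}
```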


