Does std::string Have a Null Terminator?

Does std::string have a null terminator?

No, the null terminator is not part of the string's contents (size() does not count it), but if you call temp.c_str(), the pointer it returns points to a null-terminated character array.

It's also worth saying that you can include a null character in a string just like any other character.

string s("hello");
cout << s.size() << ' ';
s[1] = '\0';
cout << s.size() << '\n';

prints

5 5

and not 5 1 as you might expect if null characters had a special meaning for strings.

Do std::strings end in '\0' when initialized with a string literal?

So the constructor copies the null terminator as well, but does not increment the length?

As you already know, std::string does not count the null character as part of its contents (and the constructor does not copy the literal's null character here).

The point is that you're using std::basic_string::operator[]. Since C++11, std::basic_string::operator[] returns a reference to a null character when the specified index equals size().

If pos == size(), a reference to the character with value CharT() (the null character) is returned.

For the first (non-const) version, the behavior is undefined if this character is modified to any value other than CharT().

Are std::strings with null characters possible?

Contrary to what you seem to think, C++ strings are not null-terminated in the C sense: a std::string stores its length, so a null character does not mark its end.

The difference in behavior comes from the << operator overloads.

This code:

cout << a.c_str(); // a.c_str() is a const char*

This uses the operator<< overload for const char* that comes with cout: it prints a C-style char array and stops at the first null character (so the char array must be null-terminated).

This code:

cout << a; // a is a std::string

This uses the operator<< overload that comes with std::string: it prints a string object, which knows its own length internally, so null characters are printed too.

Will std::string always be null-terminated in C++11?

Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return

a pointer p such that p + i == &operator[](i) for each i in [0,size()].

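This means that, given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]), so c_str() cannot hand back a separate copy. And because the range includes i == size(), p + size() must equal &s[size()], which refers to a null character; the internal buffer therefore has to be null-terminated.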

Is std::string str(array.begin(), array.end()) adding the null character on its own?

Since C++11, std::string must contain a terminating null character. However, a null character in a std::string does not necessarily terminate the std::string.

I hope the following example makes this clearer:

#include <string>
#include <iostream>

int main() {
    std::string x{"Hello World"};

    std::cout << x.c_str() << "\n";

    x[5] = '\0';
    std::cout << x << "\n";
    std::cout << x.c_str();
}

prints:

Hello World
HelloWorld
Hello

We can use c_str() to get a null-terminated C string, and the << overload for const char* knows where to stop because there is a null terminator. After putting a \0 in the middle of the std::string, we can still print the whole string with the << overload for std::string; the embedded \0 is written to the stream too, but terminals typically display nothing for it, which is why the second line looks like HelloWorld.
Calling c_str() again returns a pointer to an array of 11 characters (including the embedded \0) followed by the terminating \0, but this time the << overload for const char* stops at the first \0, because that is what marks the end of a C string.

TL;DR: You don't need to add the null terminator yourself; std::string does that for you, and c_str() gives you a C string whenever you need one. On the other hand, be aware that a std::string can contain null characters in the middle, not only at its end.

std::string::c_str & Null termination

Before C++11, there was no requirement that a std::string (or the class template std::basic_string, of which std::string is an instantiation) store a trailing '\0'. This was reflected in the different specifications of the data() and c_str() member functions: data() returned a pointer to the underlying data (which was not required to be terminated with a '\0'), while c_str() returned a pointer to an array with a terminating '\0', which an implementation could produce as a copy. Equally, however, there was no requirement NOT to store a trailing '\0' internally (accessing characters past the end of the stored data was undefined behaviour), and, for simplicity, some implementations chose to append a trailing '\0' anyway.

With C++11, this changed. Essentially, the data() member function was specified as having the same effect as c_str() (i.e. the returned pointer is to the first character of an array that has a trailing '\0'). This has the consequence of requiring the trailing '\0' on the array returned by data(), and therefore on the internal representation.

So the behaviour you're seeing is consistent with C++11 - one of the invariants of the class is a trailing '\0' (i.e. constructors ensure that is the case, member functions which modify the string ensure it remains true, and all public member functions can rely on it being true).

The behaviour you're seeing is not inconsistent with C++ standards before C++11. Strictly speaking, std::string before C++11 was not required to maintain a trailing '\0' but, equally, an implementer could choose to do so.

Why does std::string_view::data not include a null terminator?

So, why does std::string_view::data not return a null-terminated string like std::string::data?

Simply because it can't. A string_view can be a narrower view into a larger string (a substring of a string). That means the viewed characters will not necessarily be followed by a null terminator at the end of a particular view. You can't write a null terminator into the underlying string, because the view does not own those characters (they may be read-only, and other code may rely on them), and you can't create a copy of the string and return a char* without a memory leak.

If you want a null-terminated string, you have to create a std::string copy of it.

Let me show a good use of std::string_view:

template <class Pred>
auto tokenize(std::string_view str, Pred is_delim) -> std::vector<std::string_view>;

Here the resulting vector contains tokens as views into the larger string.

Can a std::string contain embedded nulls?

Yes, you can have embedded nulls in your std::string.

Example:

std::string s;
s.push_back('\0');
s.push_back('a');
assert(s.length() == 2);

Note: std::string's c_str() member always returns a pointer to a null-terminated buffer. Before C++11, the data() member was not required to null-terminate the buffer it returned; since C++11, data() and c_str() return the same null-terminated buffer.

Be careful of operator+=

One thing to look out for: don't use operator+= with a const char* (such as a string literal) on the right-hand side when the data contains nulls. It only appends characters up to the first null character.

For example:

std::string s = "hello";
s += "\0world";
assert(s.length() == 5);

The correct way:

std::string s = "hello";
s += std::string("\0world", 6);
assert(s.length() == 11);

Storing binary data: std::vector is more common

Generally it's more common to use std::vector to store arbitrary binary data.

std::vector<char> buf;
buf.resize(1024);
char *p = &buf.front();

It is probably more common because std::string's c_str() returns a const pointer (as did data() before C++17), so the buffer cannot be modified through it. With &buf.front() you are free to modify the contents of the buffer directly.