Will std::string Always Be Null-Terminated in C++11

Will std::string always be null-terminated in C++11?

Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return

a pointer p such that p + i == &operator[](i) for each i in [0,size()].

This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]).
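If you want to see that guarantee for yourself, you can compare the pointers directly. This is only a minimal sketch, assuming a C++11 or later compiler; the helper name c_str_matches_index is made up for illustration and is not part of any library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: checks that c_str() + i == &s[i] for every i in
// the closed range [0, size()], as the quoted wording requires.
bool c_str_matches_index(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) {
            return false;
        }
    }
    // The element at index size() is the terminator.
    return p[s.size()] == '\0';
}
```

On a conforming implementation this should return true for any string, including an empty one.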

Does std::string have a null terminator?

No, not as part of its contents, but if you call temp.c_str(), the returned buffer will include a null terminator.

It's also worth saying that you can include a null character in a string just like any other character.

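#include <iostream>
#include <string>
int main() {
    std::string s("hello");
    std::cout << s.size() << ' ';
    s[1] = '\0';  // an embedded null is an ordinary character
    std::cout << s.size() << '\n';
}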

prints

5 5

and not 5 1 as you might expect if null characters had a special meaning for strings.
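The flip side is that C functions, which stop at the first null, no longer agree with size() once a null is embedded. A rough sketch of that mismatch follows; the function names c_length and hello_with_embedded_null are only illustrative.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Illustrative helper: the length a C function sees through c_str(),
// which stops at the first null character.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.c_str());
}

// Builds "hello" with s[1] overwritten by a null character.
std::string hello_with_embedded_null() {
    std::string s("hello");
    s[1] = '\0';
    return s;
}
```

Here hello_with_embedded_null().size() is still 5, while c_length() on the same string reports 1.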

Will C++11 std::string::operator[] return null-terminated buffer

In practice, yes. There are exactly zero standards-conforming implementations of std::string that do not store a NUL character at the end of the buffer.

So if you aren't wondering for wondering's sake, you are done.

However, if you are wondering about the standard being abstruse:


In C++14, yes. There is a clear requirement that [] return a contiguous set of elements, and [size()] must return a NUL character, and const methods may not modify state. So *((&str[0])+size()) must be the same as str[size()], and str[size()] must be a NUL, thus game over.


In C++11, almost certainly. There are rules that const methods may not modify state. There are guarantees that data() and c_str() return a null-terminated buffer that agrees with [] at each point.

A convoluted reading of the C++11 standard would state that prior to any call of data() or c_str(), [size()] doesn't return the NUL terminator at the end of the buffer but rather a static const CharT that is stored separately, and the buffer has an uninitialized value (or even a trap value) where the NUL should be. Due to the requirement that const methods not modify state, I believe this reading is incorrect.

That reading requires &str[str.size()] to change across a call to .data(), which is an observable change in the string's state over a const call, and I would read that as illegal.
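The observable change in question can be probed with a small check. This is a sketch only, with a made-up helper name; it shows what conforming behaviour is expected to look like and proves nothing about the standard's wording.

```cpp
#include <string>

// Illustrative check: the address of str[size()] is the same before and
// after a call to data(), and both agree with data() + size().
bool terminator_address_is_stable(const std::string& str) {
    const char* before = &str[str.size()];
    const char* d = str.data();
    const char* after = &str[str.size()];
    return before == after && after == d + str.size() && *after == '\0';
}
```

If an implementation kept the terminator in a separate static object until data() was called, before and after would differ and this would return false.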

An alternative way to get around the standard might be to not initialize str[str.size()] until you legally access it via calling .data(), .c_str() or actually passing str.size() to operator[]. As there are no defined ways to access that element other than those 3 in the standard, you could stretch things and say lazy initialization of the NUL is legal.

I'd question this. The definition of .data() implies that the elements returned by [] are contiguous, so &[0] is the same address as .data(). Since .data()+.size() is guaranteed to point to a NUL CharT, so must (&[0])+.size(), and with no non-const methods called, the state of the std::string may not change between the calls.

But what if the compiler can see that you never call .data() or .c_str()? Does the contiguity requirement still hold if it can be proven that you never call them?

At which point I'd throw my hands up and shoot the hostile compiler.


The standard is very passively voiced about this, so there may be a way to make an arguably standards-conforming std::string that doesn't follow these rules. But because the guarantees get closer and closer to explicitly requiring that NUL terminator, the odds of a new compiler showing up that uses a tortured reading of C++ to claim compliance are low.

Does std::string::c_str() always return a null-terminated string?

Does std::string's c_str() method always return a null-terminated string?

Yes.

Its specification is:

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

Note that the range specified for i is closed, so that size() is a valid index, referring to the character past the end of the string.

operator[] is specified thus:

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT()

In the case of std::string, which is an alias for std::basic_string<char> so that charT is char, a value-initialized char has the value zero; therefore the character array pointed to by the result of std::string::c_str() is zero-terminated.
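Both halves of that argument can be checked in a few lines. The sketch below assumes C++11 for static_assert; index_size_is_zero is a hypothetical helper name.

```cpp
#include <string>

// charT() is a value-initialized character, which is zero for the
// built-in character types.
static_assert(char() == '\0', "value-initialized char is zero");
static_assert(wchar_t() == L'\0', "value-initialized wchar_t is zero");

// Illustrative helper: reads the element at index size() through the
// const operator[], which the quoted wording says has the value charT().
template <typename CharT>
bool index_size_is_zero(const std::basic_string<CharT>& s) {
    return s[s.size()] == CharT();
}
```

The same helper works for std::wstring, since nothing in the specification is specific to char.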

Is string::c_str() no longer null terminated in C++11?

Strings are now required to use null-terminated buffers internally. Look at the definition of operator[] (21.4.5):

Requires: pos <= size().

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

Looking back at c_str (21.4.7.1/1), we see that it is defined in terms of operator[]:

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

And both c_str and data are required to be O(1), so the implementation is effectively forced to use null-terminated buffers.

Additionally, as David Rodríguez - dribeas points out in the comments, the return value requirement also means that you can use &operator[](0) as a synonym for c_str(), so the terminating null character must lie in the same buffer (since *(p + size()) must be equal to charT()); this also means that even if the terminator is initialised lazily, it's not possible to observe the buffer in the intermediate state.

Are C constant character strings always null terminated?

A string is only a string if it contains a null character.

A string is a contiguous sequence of characters terminated by and including the first null character. C11 §7.1.1 1

"abc" is a string literal. It also always contains a null character. A string literal may contain more than 1 null character.

"def\0ghi"  // 2 null characters.

In the following, though, x is not a string (it is an array of char without a null character). y and z are both arrays of char and both are strings.

char x[3] = "abc";
char y[4] = "abc";
char z[] = "abc";
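The difference between the three arrays can be checked by looking for a null within the array bounds. One caveat for this sketch: C accepts char x[3] = "abc", but C++ rejects it as having too many initializers, so the C++ version below spells x with a brace initializer. The helper names are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Illustrative helper: true if the array holds a null character somewhere
// within its bounds, i.e. it contains a string in the C sense.
template <std::size_t N>
bool contains_null(const char (&a)[N]) {
    return std::memchr(a, '\0', N) != nullptr;
}

bool arrays_match_description() {
    // C accepts char x[3] = "abc"; C++ rejects it, so braces are used here.
    const char x[3] = {'a', 'b', 'c'};   // no room for '\0': not a string
    const char y[4] = "abc";             // 'a', 'b', 'c', '\0'
    const char z[] = "abc";              // size deduced as 4
    return !contains_null(x) && contains_null(y) && contains_null(z)
        && sizeof z == 4;
}
```

Passing x to a function such as strlen would read past the end of the array, which is why the check is bounded by N.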

With OP's code, s points to a string, the string literal "abc". *(s + 3) and s[3] have the value of 0. Attempting to modify s[3] is undefined behavior because 1) s is a const char * and 2) the data pointed to by s is a string literal, and attempting to modify a string literal is also undefined behavior.

const char* s = "abc";

Deeper: C does not define "constant character strings".

The language defines a string literal, like "abc" to be a character array of size 4 with the value of 'a', 'b', 'c', '\0'. Attempting to modify these is UB. How this is used depends on context.

The standard C library defines string.

With const char* s = "abc";, s is a pointer to data of type const char. Because s is a const some_type * pointer, using s to modify the data is UB. s is initialized to point to the string literal "abc". s itself is not a string. The memory s initially points to is a string.

Should std::end for strings point past null terminator?

For a character array, std::end indeed points past the last character in the array. For

char test[3] = {'h', '\0', 'e'};

the pointer std::end(test) is the same as test + 3. Dereferencing it is the same as evaluating test[3]. This is undefined behaviour. In your particular case it just happened that it yielded '\0'. But in general it might yield a different value, or crash, or something else entirely. std::end(test) does not point to the '\0' character at index 1 in the array test!

Note that std::end behaves uniformly with respect to all arrays. That is, if we have an array T a[N], then std::end(a) returns a + N, regardless of whether T is char or what the content of a is. It doesn't give you the end of the string; it gives you the end of the array. Again, the return value is always a + N. No exceptions!
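To make the distinction concrete, here is a small sketch that uses std::end only as a search limit and finds the null separately. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Illustrative helper: index of the first null within the array bounds,
// or N if there is none. std::end(a) is only used as the search limit
// and is never dereferenced.
template <std::size_t N>
std::size_t first_null_index(const char (&a)[N]) {
    return static_cast<std::size_t>(
        std::find(std::begin(a), std::end(a), '\0') - std::begin(a));
}

// Mirrors the array from the text.
bool end_is_array_end() {
    const char test[3] = {'h', '\0', 'e'};
    return std::end(test) == test + 3 && first_null_index(test) == 1;
}
```

The end of the array is at index 3 and the end of the string is at index 1; the two only coincide when the null happens to be the last element.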

For std::string, there is a terminating null character, but it's not considered part of the string. (Unlike the other characters, you're not allowed to modify it, on pain of undefined behaviour.) If you have

std::string s("hello");

then s[5] will have the value of the null character, but as I said, it's not considered part of the string: s is considered to have five characters, not six. It's best to think of std::string as not being null-terminated at all. The last character is s[4] which has value 'o', and std::end(s) is the iterator just past std::begin(s) + 4, that is, std::begin(s) + 5.

This is a bit more subtle than it looks, as the standard doesn't technically guarantee that std::end(s) is dereferenceable at all, so you can't necessarily say that it points to the terminating null. In practice, it does point to the terminating null, but dereferencing it is still undefined behaviour.
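A short sketch of those properties, written so that the end iterator is compared and decremented but never dereferenced. The function name is only illustrative.

```cpp
#include <iterator>
#include <string>

// Illustrative check of the properties described above for "hello".
bool hello_has_five_characters() {
    const std::string s("hello");
    return s.size() == 5
        && std::end(s) - std::begin(s) == 5   // end() is begin() + 5
        && *(std::end(s) - 1) == 'o'          // last character is s[4]
        && s[5] == '\0';                      // readable via operator[]
}
```

Reading the terminator goes through s[5] (or c_str()), which is well defined, rather than through *std::end(s).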

Is wstring null terminated?

Does it include the length?

Yes. It's required by the C++11 standard.

§ 21.4.4

size_type size() const noexcept;

1. Returns: A count of the number of char-like objects currently in the string.

2. Complexity: constant time.

Note, however, that size() is unaware of Unicode: it counts char-like objects (code units), not code points or user-perceived characters.
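As a rough illustration of what that means, the sketch below uses std::string with UTF-8 bytes and std::u16string rather than std::wstring, because the width of wchar_t varies by platform. The function names are made up for the example.

```cpp
#include <cstddef>
#include <string>

// "é" encoded as UTF-8 takes two bytes (0xC3 0xA9). The bytes are written
// as escapes so the result does not depend on the source file encoding.
std::size_t utf8_e_acute_size() {
    const std::string s("\xC3\xA9");
    return s.size();   // counts char objects, not characters
}

// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it
// as a surrogate pair of two char16_t code units.
std::size_t utf16_emoji_size() {
    const std::u16string s(u"\U0001F600");
    return s.size();   // counts code units, not code points
}
```

Both functions return 2 even though each string holds what a reader would call one character.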


Is it null terminated?

Yes. The C++11 standard also requires std::basic_string::c_str to return a pointer that is valid over the closed range [0,size()], and my_string[my_string.size()] refers to a value-initialized character, hence a null character.

§ 21.4.7.1

const charT* c_str() const noexcept;

const charT* data() const noexcept;

1. Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

2. Complexity: constant time.

3. Requires: The program shall not alter any of the values stored in the character array.
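For a wide string specifically, the terminator can be checked against the C wide-character functions. This is a minimal sketch with a hypothetical helper name, and it assumes the string has no embedded nulls.

```cpp
#include <cwchar>
#include <string>

// Illustrative check: c_str() on a wide string ends in L'\0', so the C
// function wcslen agrees with size() when there are no embedded nulls.
bool wide_c_str_is_terminated(const std::wstring& ws) {
    const wchar_t* p = ws.c_str();
    return p[ws.size()] == L'\0' && std::wcslen(p) == ws.size();
}
```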

What is a null-terminated string?

A null-terminated string is a contiguous sequence of characters, the last one of which has the binary bit pattern all zeros. I'm not sure what you mean by a "usual string", but if you mean std::string, then a std::string is not required (until C++11) to be contiguous, and is not required to have a terminator. Also, a std::string's string data is always allocated and managed by the std::string object that contains it; for a null-terminated string, there is no such container, and you typically refer to and manage such strings using bare pointers.
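The practical difference shows up when converting between the two. In this sketch (function names are illustrative), constructing a std::string from a bare pointer has to rely on the terminator, while the pointer-plus-length constructor does not.

```cpp
#include <cstddef>
#include <string>

// Constructing from a bare pointer relies on the terminator to find the end.
std::size_t size_from_pointer(const char* p) {
    return std::string(p).size();
}

// Constructing from a pointer plus length ignores terminators entirely.
std::size_t size_from_pointer_and_length(const char* p, std::size_t n) {
    return std::string(p, n).size();
}
```

Given the literal "def\0ghi", the first function reports 3 and the second, called with a length of 7, reports 7.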

All of this should really be covered in any decent C++ textbook. I recommend getting hold of Accelerated C++, one of the best of them.


