Is Writing to &str[0] Buffer (of a std::string) Well-Defined Behaviour in C++11?

Is writing to &str[0] buffer (of a std::string) well-defined behaviour in C++11?

Yes, the code is legal in C++11 because the storage for std::string is guaranteed to be contiguous, and your code avoids overwriting the terminating null character (the value-initialized charT() at position size()).

From N3337, §21.4.5 [string.access]

const_reference operator[](size_type pos) const;
reference operator[](size_type pos);

1 Requires: pos <= size().

2 Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

Your example satisfies the requirements stated above, so the behavior is well defined.

C++: Safe reading from file with std::string (&str[0]) as a buffer?

std::string storage is always contiguous, at least as of C++11. I believe that before C++11 this wasn't explicitly guaranteed, but no implementation used non-contiguous storage.

You do not need to explicitly add space for a null terminator in std::string.
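A minimal sketch of reading a whole file into a std::string through &str[0], assuming a binary-mode std::ifstream and a file whose size tellg() can report; read_file is a name made up for illustration, and error handling is deliberately thin:

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads the whole file at `path` into a std::string,
// using &str[0] as the destination buffer (contiguous since C++11).
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();   // -1 if the size can't be determined
    in.seekg(0, std::ios::beg);
    if (size <= 0) return std::string();      // empty (or unknown-size) file

    std::string str;
    // resize() makes size() characters writable; no extra slot for '\0' is
    // needed, the string maintains its own terminator past the end.
    str.resize(static_cast<std::size_t>(size));
    in.read(&str[0], static_cast<std::streamsize>(size));
    str.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return str;
}
```

Note that the early return also covers the empty-file case, where &str[0] would refer to the string's terminator, which must not be written.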

Are there downsides to using std::string as a buffer?

Don't use std::string as a buffer.

It is bad practice to use std::string as a buffer, for several reasons (listed in no particular order):

  • std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
  • As a concrete example: before C++17, you can't even write through the pointer you get from data(), since it's a const charT*, so your code would cause undefined behavior. (But &(str[0]), &(str.front()), or &(*(str.begin())) would work.)
  • Using std::strings for buffers is confusing to readers of your function's definition, who assume you would be using std::string for, well, strings. In other words, doing so breaks the Principle of Least Astonishment.
  • Worse yet, it's confusing for whoever might use your function - they too may think what you're returning is a string, i.e. valid human-readable text.
  • std::unique_ptr<char[]> would be fine for your case, or even std::vector. In C++17, you can use std::byte as the element type, too. A more sophisticated option is a class with an SSO-like feature, e.g. Boost's small_vector (thank you, @gast128, for mentioning it).
  • (Minor point:) libstdc++ had to change its ABI for std::string to conform to the C++11 standard, so in some cases (which by now are rather unlikely), you might run into some linkage or runtime issues that you wouldn't with a different type for your buffer.

Also, your code may perform two heap allocations instead of one (implementation dependent): once when the string is constructed and again when you resize() it. But that in itself is not really a reason to avoid std::string, since you can avoid the double allocation using the construction in @Jarod42's answer.
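For comparison, a sketch of the std::vector alternative suggested in the list. fill_buffer is a made-up stand-in for whatever C-style function fills your buffer; it deliberately produces an embedded NUL to show that the result is raw bytes rather than text:

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical C-style API (illustration only): writes up to `cap` raw bytes
// into `out` and returns how many it wrote.
std::size_t fill_buffer(char* out, std::size_t cap) {
    static const char payload[] = "raw\0bytes";   // 9 bytes + implicit '\0'
    const std::size_t len = sizeof payload - 1;    // 9
    const std::size_t n = len < cap ? len : cap;
    if (n > 0) std::memcpy(out, payload, n);
    return n;
}

// Uses std::vector<char> as the buffer: data() has been non-const since
// C++11, and nobody will mistake the result for human-readable text.
std::vector<char> read_chunk(std::size_t cap) {
    std::vector<char> buf(cap);                    // zero-initialized storage
    const std::size_t n = fill_buffer(buf.data(), buf.size());
    buf.resize(n);                                 // keep only the bytes produced
    return buf;
}
```

Swapping char for std::byte (C++17) would make the "not a string" intent even more explicit, at the cost of casts at the C boundary.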

Directly write into char* buffer of std::string

C++98/03

Not portably possible. The characters aren't guaranteed to be contiguous, and the string may be copy-on-write, so the implementation needs to handle all reads and writes itself.

C++11/14

In [string.require]:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

So &str.front() and &str[0] should work (front() additionally requires a non-empty string).
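A minimal C++11 sketch of that: std::snprintf writes directly into the string's storage through &str[0]. The buffer is sized one character larger so that the '\0' snprintf always writes lands inside [0, size()) instead of on the string's own terminator, which must not be modified; format_int is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats an int by letting std::snprintf write
// straight into the std::string through &str[0].
std::string format_int(int value) {
    // A null buffer of size 0 makes snprintf report the length it needs.
    const int len = std::snprintf(nullptr, 0, "%d", value);
    if (len < 0) return std::string();               // encoding error

    // len characters plus the '\0' that snprintf always appends.
    std::string str(static_cast<std::size_t>(len) + 1, '\0');
    std::snprintf(&str[0], str.size(), "%d", value);
    str.resize(static_cast<std::size_t>(len));       // drop snprintf's '\0'
    return str;
}
```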

C++17

str.data(), &str.front() and &str[0] work.

The C++17 wording for the non-const overload is:

charT* data() noexcept;

Returns: A pointer p such that p + i == &operator[](i) for each i in [0, size()].

Complexity: Constant time.

Requires: The program shall not alter the value stored at p + size().

The non-const .data() just works.
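A C++17 sketch using the non-const data() overload: its char* result can be handed to std::to_chars (from <charconv>, also C++17) without any cast. The buffer size assumes an int needs at most digits10 + 1 digits plus a sign; to_string_via_data is a made-up name:

```cpp
#include <charconv>
#include <cstddef>
#include <limits>
#include <string>
#include <system_error>

// Hypothetical helper: since C++17, str.data() returns char*, so the
// string's own storage can be the output range of std::to_chars.
std::string to_string_via_data(int value) {
    // Room for every decimal digit of an int plus a leading '-'.
    std::string str(std::numeric_limits<int>::digits10 + 2, '\0');
    const auto result = std::to_chars(str.data(), str.data() + str.size(), value);
    if (result.ec != std::errc()) return std::string();
    // Shrink to what was written; str[size()] stays the string's own '\0'.
    str.resize(static_cast<std::size_t>(result.ptr - str.data()));
    return str;
}
```

std::to_chars never writes a terminator, so the character at p + size() is left untouched, as the quoted requirement demands.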

The recent draft has the following wording for .front():

const charT& front() const;

charT& front();

Requires: !empty().

Effects: Equivalent to operator[](0).

And the following for operator[]:

const_reference operator[](size_type pos) const;

reference operator[](size_type pos);

Requires: pos <= size().

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

Throws: Nothing.

Complexity: Constant time.

So it is defined in terms of iterator arithmetic, which means we need to look at the requirements on the string's iterators. [basic.string] says:

3 A basic_string is a contiguous container ([container.requirements.general]).

So we need to look up contiguous containers in [container.requirements.general]:

A contiguous container is a container that supports random access iterators ([random.access.iterators]) and whose member types iterator and const_iterator are contiguous iterators ([iterator.requirements.general]).

And then contiguous iterators in [iterator.requirements.general]:

Iterators that further satisfy the requirement that, for integral values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous iterators.

Apparently, contiguous iterators are a C++17 addition, introduced through WG21 proposals.

The requirement can be rewritten as:

assert(*(a + n) == *(&*a + n));

On the right-hand side, we dereference the iterator, take the address of the value it refers to, do pointer arithmetic on that address, and dereference the result; this must be the same as advancing the iterator and then dereferencing it. This means that a contiguous iterator points into memory where the values are stored one right after the other, hence contiguous. Since functions that take a char* expect contiguous memory, you can pass the result of &str.front() or &str[0] to them.
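A small sketch that checks the [string.require] identity at runtime and then relies on it by handing &s.front() to a C function that expects contiguous chars. is_contiguous and blank_out are hypothetical names, and a runtime check can only illustrate the guarantee, not prove it:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Checks &*(s.begin() + n) == &*s.begin() + n for every n in [0, size()).
// An empty string has no valid n, so it trivially passes.
bool is_contiguous(std::string& s) {
    for (std::size_t n = 0; n < s.size(); ++n) {
        const auto offset = static_cast<std::string::difference_type>(n);
        if (&*(s.begin() + offset) != &*s.begin() + n) return false;
    }
    return true;
}

// Relies on contiguity: std::memset sees s.size() adjacent chars starting
// at &s.front(). front() requires a non-empty string, hence the check.
void blank_out(std::string& s) {
    if (!s.empty()) std::memset(&s.front(), ' ', s.size());
}
```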

Is it safe to use std::string::c_str() to modify the underlying std::string?

The reference for c_str that you cited is quite clear on the matter:

The program shall not modify any of the values stored in the character array; otherwise, the behavior is undefined.

The fact that it happens to work doesn't mean anything. Undefined behavior means anything can happen, including the program doing the "right" thing.


If you do want to modify the underlying data, you can use the non-const overload of data(), available since C++17, which lets you modify every character except the null terminator (which may only be overwritten with charT()):

The program shall not modify the value stored at p + size() to any value other than charT(); otherwise, the behavior is undefined.

Why shouldn't I use std::string.c_str() as a buffer?

It's not allowed!

21.4.7 basic_string string operations [string.ops]

21.4.7.1 basic_string accessors [string.accessors]

const charT* c_str() const noexcept;
const charT* data() const noexcept;
  1. Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
  2. Complexity: constant time.
  3. Requires: The program shall not alter any of the values stored in the character array.

Besides, you're modifying data through a const char*, which usually implies a const_cast<char*>. Not only will this result in undefined behaviour, but according to Herb Sutter, const should nowadays be read as thread-safe (see his talk about const and mutable).

However, as has been stated, writing through &str[0] is safe as long as str is large enough, i.e. its size() (not merely its capacity()) covers everything you write. Just don't write through .c_str(), or through .data() before C++17.

Is there a way to set the length of a std::string without modifying the buffer content?

You should be using resize(), not reserve(), and then resize() again to set the final length.

Otherwise, when you resize() from zero to the result returned by strlen(), the array will be filled with zero characters, overwriting what you wrote into it. The string is allowed to do that because it (correctly) assumes that everything from the current size to the current reserved capacity is uninitialized data that doesn't contain anything; writing past size() isn't sanctioned by the standard in the first place.

In order for the string to know that the characters are actually valid and their contents should be preserved, you need to use resize() initially, not reserve(). Then, when you resize() again to make the string smaller, it only truncates the unwanted end of the string and adds a null terminator; it won't overwrite what you wrote into it.
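A minimal sketch of the resize(), write, resize() pattern. get_message is a hypothetical C-style function that writes a NUL-terminated string of at most cap bytes (terminator included), and fetch_message is likewise made up for illustration:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style API: copies a NUL-terminated message into `out`,
// writing at most `cap` bytes including the terminator.
void get_message(char* out, std::size_t cap) {
    const char msg[] = "hello from C";
    if (cap == 0) return;
    std::size_t n = std::strlen(msg);
    if (n >= cap) n = cap - 1;               // truncate to leave room for '\0'
    std::memcpy(out, msg, n);
    out[n] = '\0';
}

std::string fetch_message(std::size_t max_len) {
    std::string str;
    str.resize(max_len);      // resize(), not reserve(): [0, max_len) is now valid
    get_message(&str[0], str.size());
    // The second resize() only truncates; the characters before the first
    // '\0' written by get_message are preserved.
    str.resize(std::strlen(str.c_str()));
    return str;
}
```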

N.B. the initial resize() will zero-fill the string, which is not strictly necessary in your case because you're going to overwrite the portion you care about and then discard the rest anyway. If the strings are very long and profiling shows the zero-filling is a problem then you could do this instead:

std::unique_ptr<char[]> str(new char[SOME_MAX_VALUE]);
some_C_API_func(str.get());

Can you avoid using temporary buffers when using std::string to interact with C style APIs?

In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.

Previously, you could use .data() or .c_str(), but the string is not mutable through these.

Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.

Will std::string always be null-terminated in C++11?

Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return

a pointer p such that p + i == &operator[](i) for each i in [0,size()].

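This means that, given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]). And because the range [0,size()] includes i == size(), p + s.size() is the address of operator[](size()), which refers to a charT() (i.e. '\0'), so the character array is always null-terminated.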
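A sketch that checks the quoted requirement for a given string, including the i == size() case that yields the null terminator; c_str_guarantees_hold is a made-up name, and the check only illustrates the wording:

```cpp
#include <cstddef>
#include <string>

// Checks p + i == &s[i] for each i in [0, size()], as required of c_str(),
// then reads p[size()], which must therefore be '\0'.
bool c_str_guarantees_hold(const std::string& s) {
    const char* p = s.c_str();
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (p + i != &s[i]) return false;    // const operator[] allows i == size()
    }
    return p[s.size()] == '\0';
}
```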

C++ string's internal buffer address undefined behaviour

In C++11, the code is well-defined, but may not do what you expect. The exact effects are, as per 21.4.5/2:

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

So if the string is non-empty, str[0] returns a reference to the start of the internal buffer. If it's empty, it returns a reference to a char with value 0, whose location in memory is an implementation detail.
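A small sketch of guarding against the empty case before writing through &s[0]; copy_into is a hypothetical helper that copies as much of a C string as fits into the string's existing characters:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Copies up to s.size() characters of `src` over the start of `s`.
// For an empty string, s[0] refers to a charT() that must not be modified,
// so nothing is written and 0 is returned.
std::size_t copy_into(std::string& s, const char* src) {
    if (s.empty()) return 0;
    const std::size_t n = std::min(s.size(), std::strlen(src));
    std::memcpy(&s[0], src, n);
    return n;
}
```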