Is writing to &str[0] buffer (of a std::string) well-defined behaviour in C++11?
Yes, the code is legal in C++11 because the storage for std::string is guaranteed to be contiguous and your code avoids overwriting the terminating null character (or value-initialized charT).
From N3337, §21.4.5 [string.access]
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
1 Requires: pos <= size().
2 Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
Your example satisfies the requirements stated above, so the behavior is well defined.
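As a minimal sketch of the pattern being discussed (the question's own code isn't shown here, so this is an assumed shape): size the string first, then write only into indices [0, size()) through &str[0]. The writer fill_with_letters is a hypothetical stand-in for whatever fills the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C-style writer: fills exactly `len` chars of `buf` and does
// not write a terminator (std::string maintains its own).
void fill_with_letters(char* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i)
        buf[i] = static_cast<char>('a' + i % 26);
}

std::string make_letters(std::size_t len) {
    std::string str;
    str.resize(len);  // make [0, len) valid, writable characters first
    if (len > 0)      // for len == 0, &str[0] names the terminator: don't write through it
        fill_with_letters(&str[0], len);
    return str;
}
```

Only positions below size() are written, so the charT() object at str[size()] is never modified.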
C++: Safe reading from file with std::string (&str[0]) as a buffer?
std::string storage is always contiguous, at least as of C++11. I believe that before C++11 it wasn't clearly specified, but no implementations used non-contiguous storage.
You do not need to explicitly add space for a null terminator in std::string.
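A sketch of reading into a std::string used as a buffer, assuming the data comes from any std::istream (for a real file you would pass a std::ifstream opened with std::ios::binary). The function name read_up_to is hypothetical. Note that the string is sized to exactly max_len characters; no extra slot is reserved for a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read up to `max_len` bytes from `in` into a std::string used as a buffer.
std::string read_up_to(std::istream& in, std::size_t max_len) {
    std::string buf(max_len, '\0');  // max_len writable characters, no +1 needed
    if (max_len == 0) return buf;
    in.read(&buf[0], static_cast<std::streamsize>(max_len));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return buf;
}
```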
Are there downsides to using std::string as a buffer?
Don't use std::string as a buffer.
It is bad practice to use std::string as a buffer, for several reasons (listed in no particular order):
- std::string was not intended for use as a buffer; you would need to double-check the description of the class to make sure there are no "gotchas" which would prevent certain usage patterns (or make them trigger undefined behavior).
  - As a concrete example: before C++17, you can't even write through the pointer you get with data(), since it's a const charT *; so your code would cause undefined behavior. (But &(str[0]), &(str.front()), or &(*(str.begin())) would work.)
- Using std::strings for buffers is confusing to readers of your function's definition, who assume you would be using std::string for, well, strings. In other words, doing so breaks the Principle of Least Astonishment.
- Worse yet, it's confusing for whoever might use your function: they too may think what you're returning is a string, i.e. valid human-readable text.
- std::unique_ptr would be fine for your case, or even std::vector. In C++17, you can use std::byte for the element type, too. A more sophisticated option is a class with an SSO-like feature, e.g. Boost's small_vector (thank you, @gast128, for mentioning it).
- (Minor point:) libstdc++ had to change its ABI for std::string to conform to the C++11 standard, so in some cases (which by now are rather unlikely), you might run into linkage or runtime issues that you wouldn't with a different type for your buffer.
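A minimal sketch of the std::vector alternative mentioned above. It uses unsigned char as the element type so it compiles as C++11 (in C++17 you could swap in std::byte); the producer produce_bytes is a hypothetical stand-in for a C-style API that fills a byte buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical C-style producer: copies up to `cap` bytes of a fixed binary
// payload into `out` and returns how many bytes it wrote.
std::size_t produce_bytes(unsigned char* out, std::size_t cap) {
    static const unsigned char payload[] = {0x00, 0xFF, 0x10, 0x42};
    std::size_t n = cap < sizeof payload ? cap : sizeof payload;
    if (n > 0) std::memcpy(out, payload, n);  // skip memcpy for an empty buffer
    return n;
}

// The buffer type says "bytes", not "text", and vector::data() is writable in C++11.
std::vector<unsigned char> fetch_bytes(std::size_t cap) {
    std::vector<unsigned char> buf(cap);  // value-initialized to zero
    std::size_t n = produce_bytes(buf.data(), buf.size());
    assert(n <= buf.size());
    buf.resize(n);                        // keep only what was produced
    return buf;
}
```

Embedded zero bytes (like the first one here) are unremarkable in a vector, whereas in a std::string they tend to surprise readers who treat it as text.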
Also, your code may make two heap allocations instead of one (implementation dependent): once upon string construction and again when resize()ing. But that in itself is not really a reason to avoid std::string, since you can avoid the double allocation using the construction in @Jarod42's answer.
Directly write into char* buffer of std::string
C++98/03
Impossible to do portably. The string may be copy-on-write, so it needs to handle all reads and writes itself.
C++11/14
In [string.require]:
The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
So &str.front() and &str[0] should work.
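A short sketch of the C++11/14 approach, where data() still returns const char*, so the write goes through &dst.front(). The function overwrite_prefix is hypothetical; the caller is assumed to guarantee the string is non-empty and large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// C++11/14: write through &dst.front() (equivalently &dst[0]).
// front() requires !empty(), and the write must stay inside [0, size()).
void overwrite_prefix(std::string& dst, const char* src, std::size_t src_len) {
    assert(!dst.empty() && src_len <= dst.size());
    std::memcpy(&dst.front(), src, src_len);
}
```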
C++17
str.data(), &str.front() and &str[0] all work.
The C++17 wording for data() says:
charT* data() noexcept;
Returns: A pointer p such that p + i == &operator[](i) for each i in [0, size()].
Complexity: Constant time.
Requires: The program shall not alter the value stored at p + size().
The non-const .data() just works.
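A sketch of writing through the non-const data() overload. This requires C++17 (on C++11/14 data() returns const char* and the assignment to char* won't compile); the function upper_ascii is a hypothetical example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// C++17: std::string::data() has a non-const overload returning char*.
std::string upper_ascii(std::string s) {
    char* p = s.data();  // char* only in C++17 and later
    for (std::size_t i = 0; i < s.size(); ++i)  // never touch p[s.size()]
        if (p[i] >= 'a' && p[i] <= 'z')
            p[i] = static_cast<char>(p[i] - 'a' + 'A');
    return s;
}
```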
The recent draft has the following wording for .front():
const charT& front() const;
charT& front();
Requires: !empty().
Effects: Equivalent to operator[](0).
And the following for operator[]:
const_reference operator[](size_type pos) const;
reference operator[](size_type pos);
Requires: pos <= size().
Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
Throws: Nothing.
Complexity: Constant time.
So it is defined in terms of iterator arithmetic, and we need to look at what the standard says about the string's iterators:
3 A basic_string is a contiguous container ([container.requirements.general]).
So we need to look at [container.requirements.general]:
A contiguous container is a container that supports random access iterators ([random.access.iterators]) and whose member types iterator and const_iterator are contiguous iterators ([iterator.requirements.general]).
And then at [iterator.requirements.general]:
Iterators that further satisfy the requirement that, for integral values n and dereferenceable iterator values a and (a + n), *(a + n) is equivalent to *(addressof(*a) + n), are called contiguous iterators.
Apparently, contiguous iterators are a C++17 feature, added by separate proposal papers.
The requirement can be rewritten as:
assert(*(a + n) == *(&*a + n));
So, on the right-hand side we dereference the iterator, take the address of the value it points to, do pointer arithmetic on that address and dereference the result, and that must be equivalent to advancing the iterator and then dereferencing it. This means that a contiguous iterator points into memory where each value is stored right after the previous one, hence "contiguous". Since functions that take a char* expect contiguous memory, you can pass the result of &str.front() or &str[0] to them.
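To make the guarantee concrete, here is a small check of the [string.require] identity &*(s.begin() + n) == &*s.begin() + n for every n in [0, size()). On a conforming C++11 library it should always return true; the function is_contiguous is only an illustration, not something you would need in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool is_contiguous(const std::string& s) {
    if (s.empty()) return true;  // begin() is not dereferenceable on an empty string
    const char* first = &*s.begin();
    for (std::size_t n = 0; n < s.size(); ++n) {
        auto it = s.begin() + static_cast<std::ptrdiff_t>(n);
        if (&*it != first + n) return false;  // iterator and pointer arithmetic disagree
    }
    return true;
}
```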
Is it safe to use std::string::c_str() to modify the underlying std::string?
The reference for c_str that you cited is quite clear on the matter:
The program shall not modify any of the values stored in the character array; otherwise, the behavior is undefined.
The fact that it happens to work doesn't mean anything. Undefined behavior means the program is free to do anything, including appearing to do the "right" thing.
If you do want to modify the underlying data, you can use the non-const overload of data(), available since C++17, which allows you to modify all but the null terminator:
The program shall not modify the value stored at p + size() to any value other than charT(); otherwise, the behavior is undefined.
Why shouldn't I use std::string::c_str() as a buffer?
It's not allowed!
21.4.7 basic_string string operations [string.ops]
21.4.7.1 basic_string accessors [string.accessors]
const charT* c_str() const noexcept;
const charT* data() const noexcept;
- Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
- Complexity: constant time.
- Requires: The program shall not alter any of the values stored in the character array.
Other than that, you're modifying data referenced by a const char *, which usually requires a const_cast<char*>. Not only will this result in undefined behaviour, but according to Herb Sutter, const should nowadays be read as "thread-safe" (see his talk about const and mutable).
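A small sketch of the point above: the type system already tells you c_str() hands out read-only memory, and the supported way to change a character is through the string's own interface. The helper set_first is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// c_str() only hands out a pointer to const char.
static_assert(std::is_same<decltype(std::string().c_str()), const char*>::value,
              "c_str() returns const char*");

// Writing through it would need const_cast<char*>(s.c_str()), and the write
// itself is undefined behavior. Modify the string through its own interface:
void set_first(std::string& s, char c) {
    assert(!s.empty());
    s[0] = c;
}
```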
However, as has been stated, using &str[0] on a std::string str is safe as long as str is sufficiently large. Just don't use .c_str() or .data() (at least before C++17, which added a non-const data()).
Is there a way to set the length of a std::string without modifying the buffer content?
You should be using resize(), not reserve(), and then resize() again to set the final length.
Otherwise, when you resize() from zero to the result returned by strlen(), the array will be filled with zero characters, overwriting what you wrote into it. The string is allowed to do that, because it (correctly) assumes that everything from the current size to the current reserved capacity is uninitialized data that doesn't contain anything.
In order for the string to know that the characters are actually valid and their contents should be preserved, you need to use resize() initially, not reserve(). Then, when you resize() again to make the string smaller, it only truncates the unwanted end of the string and adds a null terminator; it won't overwrite what you wrote into it.
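A sketch of the resize / write / resize sequence. The real C API from the question isn't shown, so hypothetical_c_api below is an assumed stand-in that writes a NUL-terminated string of unknown length into a caller-provided buffer of cap bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for the question's C API: writes a NUL-terminated
// string using at most `cap` bytes of `buf` (terminator included). Requires cap >= 1.
void hypothetical_c_api(char* buf, std::size_t cap) {
    const char msg[] = "result";
    std::size_t n = std::strlen(msg);
    if (n >= cap) n = cap - 1;
    std::memcpy(buf, msg, n);
    buf[n] = '\0';
}

std::string call_api(std::size_t max_len) {
    assert(max_len > 0);
    std::string str;
    str.resize(max_len);                      // resize(), not reserve(): the chars now belong to the string
    hypothetical_c_api(&str[0], str.size());  // every write lands in [0, size())
    str.resize(std::strlen(str.c_str()));     // shrinking only truncates; the text survives
    return str;
}
```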
N.B. the initial resize() will zero-fill the string, which is not strictly necessary in your case because you're going to overwrite the portion you care about and then discard the rest anyway. If the strings are very long and profiling shows the zero-filling is a problem, then you could do this instead:
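#include <memory>  // std::unique_ptr; SOME_MAX_VALUE and some_C_API_func come from the question
std::unique_ptr<char[]> str(new char[SOME_MAX_VALUE]);  // uninitialized: no zero-fill
some_C_API_func(str.get());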
Can you avoid using temporary buffers when using std::string to interact with C style APIs?
In C++11 you can simply pass a pointer to the first element of the string (&str[0]): its elements are guaranteed to be contiguous.
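As one sketch of skipping the temporary buffer, here is std::snprintf formatting straight into a std::string. The function format_int is hypothetical. The string is sized one past the text so snprintf's own NUL lands inside [0, size()) rather than on the terminator slot, then trimmed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format an int directly into a std::string, with no temporary char array.
std::string format_int(int value) {
    int len = std::snprintf(nullptr, 0, "%d", value);  // length the output needs
    if (len < 0) return std::string();                 // encoding error
    std::string s(static_cast<std::size_t>(len) + 1, '\0');  // +1: room for snprintf's NUL
    std::snprintf(&s[0], s.size(), "%d", value);        // writes len chars plus a NUL, all in [0, size())
    s.resize(static_cast<std::size_t>(len));            // drop the extra NUL
    return s;
}
```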
Previously, you could use .data() or .c_str(), but the string is not mutable through these.
Otherwise, yes, you must perform a copy. But I wouldn't worry about this too much until profiling indicates that it's really an issue for you.
Will std::string always be null-terminated in C++11?
Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return a pointer p such that p + i == &operator[](i) for each i in [0,size()].
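This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]). And because i may equal size(), p[s.size()] is the charT() object returned by operator[](size()), i.e. a null character.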
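A quick check of both consequences of that wording: c_str() points at the first character, and a null character follows the last one, even when the string contains embedded nulls. The function name is hypothetical.

```cpp
#include <cassert>
#include <string>

bool c_str_matches_storage(const std::string& s) {
    const char* p = s.c_str();
    return p == &s[0]             // same address as the first character
        && p[s.size()] == '\0';   // terminator right after the last character
}
```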
C++ string's internal buffer address undefined behaviour
In C++11, the code is well-defined, but may not do what you expect. The exact effects are, as per 21.4.5/2:
Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.
So if the string is non-empty, it returns a reference to the start of the internal buffer. If it's empty, it returns a reference to a char with value 0, whose location in memory is an implementation detail.
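Since &str[0] on an empty string refers to that read-only null character, code that writes through it should check size() first. A minimal sketch, with a hypothetical safe_fill helper:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fill every character of `str` with `c`; returns how many were written.
std::size_t safe_fill(std::string& str, char c) {
    if (str.empty()) return 0;  // &str[0] would refer to the terminator: don't write
    char* p = &str[0];
    for (std::size_t i = 0; i < str.size(); ++i) p[i] = c;
    return str.size();
}
```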