Why does this work: returning C string literal from std::string function and calling c_str()
Your analysis is correct. What you have is undefined behaviour, which means pretty much anything can happen. In your case, it seems the memory used for the string, although de-allocated, still holds the original contents when you access it. This often happens because the memory allocator (and the OS beneath it) does not clear out de-allocated memory; it just marks it as available for future use. This is not something the C++ language has to deal with: it is really an implementation detail of the allocator and the OS. As far as C++ is concerned, the catch-all "undefined behaviour" applies.
Can std::string::c_str() be used whenever a string literal is expected?
Looking at the documentation you linked to, it seems like you are trying to call the overload of AddMember taking two StringRefTypes (and an Allocator). StringRefType is a typedef for GenericStringRef<Ch>, which has two overloaded constructors taking a single argument:
template<SizeType N>
GenericStringRef(const CharType(&str)[N]) RAPIDJSON_NOEXCEPT;
explicit GenericStringRef(const CharType *str);
When you pass a string literal, the type is const char[N], where N is the length of the string + 1 (for the null terminator). This can be implicitly converted to a GenericStringRef<Ch> using the first constructor overload. However, std::string::c_str() returns a const char*, which cannot be converted implicitly to a GenericStringRef<Ch>, because the second constructor overload is declared explicit.
The error message you get from the compiler is caused by it choosing another overload of AddMember, which is a closer match.
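To see the conversion rule in isolation, here is a minimal sketch that does not use RapidJSON at all. StrRef and addMember are hypothetical stand-ins that only mimic the two single-argument constructors quoted above: an implicit template constructor taking a reference to a char array, and an explicit constructor taking a raw pointer.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for GenericStringRef<char>; it only mimics the two
// single-argument constructors, not RapidJSON's real class.
struct StrRef {
    // Implicit: binds to an array such as a string literal (const char[N]).
    template <std::size_t N>
    StrRef(const char (&str)[N]) : s(str), length(N - 1) {}

    // Explicit: a raw const char* (e.g. from c_str()) must be wrapped by hand.
    explicit StrRef(const char* str)
        : s(str), length(std::char_traits<char>::length(str)) {}

    const char* s;       // non-owning: the characters must outlive this object
    std::size_t length;  // number of characters, excluding the terminator
};

// Hypothetical function taking a StrRef, playing the role of AddMember.
std::size_t addMember(StrRef name) { return name.length; }

// Compile-time check of the conversion rules described above.
static_assert(std::is_convertible<const char (&)[6], StrRef>::value,
              "a string literal converts implicitly");
static_assert(!std::is_convertible<const char*, StrRef>::value,
              "a const char* needs an explicit conversion");
```

With this sketch, addMember("abc") compiles, while addMember(key.c_str()) does not; addMember(StrRef(key.c_str())) does, because the explicit constructor is named directly. In RapidJSON itself an explicit wrapper plays the same role (for instance the StringRef() helper, if your version provides it), and since the reference does not copy, the std::string must outlive any use of it.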
What happens when a string literal is passed to a function accepting a const std::string & in C++?
When you pass a string literal to a function that accepts const std::string&, the following events occur:
- The string literal is converted to const char*.
- A temporary std::string object is created. Its internal buffer is allocated, and initialized by copying the data from the const char* until the terminating null is seen. The parameter refers to this temporary object.
- The function body runs.
- Assuming the function returns normally, the temporary object is destroyed at the end of the full-expression that contains the call.
If the c_str() pointer is saved from the parameter, it becomes a dangling pointer after the temporary object is destroyed, since it points into the temporary object's internal buffer.
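A similar problem will occur if the function accepts std::string by value. The std::string object will be created when the function is called and destroyed when the function returns or soon afterward (whether a parameter is destroyed at function exit or at the end of the enclosing full-expression is implementation-defined), so any saved c_str() pointer will become dangling.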
If the function accepts const std::string& and the argument has type std::string, however, no new object is created when the function is called. The reference refers to the existing object. The c_str() pointer will remain valid until the original std::string object is destroyed (or modified in a way that invalidates pointers into it).
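A minimal sketch of both cases, assuming a hypothetical function remember() that stores the c_str() pointer of its const std::string& parameter in a hypothetical global g_saved. The safe case passes an existing std::string; the unsafe case passes a literal and deliberately never reads through the dangling pointer.

```cpp
#include <string>

// Hypothetical global used only to illustrate a "saved" c_str() pointer.
const char* g_saved = nullptr;

// Hypothetical helper: keeps a pointer into the object s refers to.
void remember(const std::string& s) {
    g_saved = s.c_str();
}

// Safe: the argument is an existing std::string, so no temporary is
// created and g_saved stays valid while `owner` is alive.
std::string readBackFromNamedString() {
    std::string owner = "named string";
    remember(owner);
    return std::string(g_saved);  // owner is still alive here
}

// Unsafe pattern, shown without the bad read: the literal creates a
// temporary std::string that is destroyed at the end of the full-expression.
void rememberTemporary() {
    remember("temporary");  // g_saved dangles once this statement ends
    g_saved = nullptr;      // reading through the old value would be UB
}
```

The only difference between the two calls is whether the parameter binds to an object the caller owns or to a temporary created just for the call.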
Why does calling std::string.c_str() on a function that returns a string not work?
"getString() would return a copy of str (getString() returns by value);"
That's right.
"thus, the copy of str would stay 'alive' in main() until main() returns."
No, the returned copy is a temporary std::string, which will be destroyed at the end of the full-expression in which it was created, i.e. before std::cout << cStr << std::endl; executes. Then cStr becomes dangling; dereferencing it leads to UB, and anything is possible.
You can copy the returned temporary to a named variable, or bind it to a const lvalue reference or an rvalue reference (the lifetime of the temporary will be extended until the reference goes out of scope). For example:
std::string s1 = getString(); // s1 will be copy initialized from the temporary
const char* cStr1 = s1.c_str();
std::cout << cStr1 << std::endl; // safe
const std::string& s2 = getString(); // lifetime of temporary will be extended when bound to a const lvalue-reference
const char* cStr2 = s2.c_str();
std::cout << cStr2 << std::endl; // safe
std::string&& s3 = getString(); // likewise, binding to an rvalue reference extends the lifetime
const char* cStr3 = s3.c_str();
std::cout << cStr3 << std::endl; // safe
Or use the pointer before the temporary is destroyed, e.g.:
std::cout << getString().c_str() << std::endl; // temporary gets destroyed after the full expression
Here is an explanation from The C++ Programming Language, Special Edition, §10.4.10 Temporary Objects [class.temp]:
Unless bound to a reference or used to initialize a named object, a temporary object is destroyed at the end of the full expression in which it was created. A full expression is an expression that is not a subexpression of some other expression.
The standard string class has a member function c_str() that returns a C-style, zero-terminated array of characters (§3.5.1, §20.4.1). Also, the operator + is defined to mean string concatenation. These are very useful facilities for strings. However, in combination they can cause obscure problems. For example:
void f(string& s1, string& s2, string& s3)
{
    const char* cs = (s1 + s2).c_str();
    cout << cs;
    if (strlen(cs = (s2 + s3).c_str()) < 8 && cs[0] == 'a') {
        // cs used here
    }
}
Probably, your first reaction is "but don’t do that," and I agree. However, such code does get written, so it is worth knowing how it is interpreted.
A temporary object of class string is created to hold s1 + s2. Next, a pointer to a C-style string is extracted from that object. Then – at the end of the expression – the temporary object is deleted. Now, where was the C-style string allocated? Probably as part of the temporary object holding s1 + s2, and that storage is not guaranteed to exist after that temporary is destroyed. Consequently, cs points to deallocated storage. The output operation cout << cs might work as expected, but that would be sheer luck. A compiler can detect and warn against many variants of this problem.
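One way to avoid the problem in the quoted f() is to give each concatenation a name, so the buffer behind cs lives as long as cs is used. The sketch below is an assumption-based rewrite of the if condition only; the function name concatStartsWithAAndIsShort is hypothetical, and the s1 + s2 output line can be fixed the same way.

```cpp
#include <cstring>
#include <string>

// Hypothetical safe rewrite of the condition in f(): the concatenation is
// stored in a named std::string instead of a temporary, so the c_str()
// pointer refers to storage that lives until the end of the function.
bool concatStartsWithAAndIsShort(const std::string& s2, const std::string& s3) {
    const std::string joined = s2 + s3;  // named object, not a temporary
    const char* cs = joined.c_str();     // valid for as long as `joined` lives
    return std::strlen(cs) < 8 && cs[0] == 'a';
}
```

For an empty result, cs[0] is the terminating '\0', so the comparison is still well-defined.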
Curious behaviour of c_str() and strings when passed to class
As already noted, the problems in the posted code arise from dangling references to temporary objects, either stored as class members or returned and accessed by .c_str().
The first fix is to store actual std::strings as members, not (dangling) references, and then write accessor functions returning const references to those:
#include <iostream>
#include <string>
#include <utility> // for std::move
class DataContainer {
public:
DataContainer(std::string name, std::string description)
: name_(std::move(name)), description_(std::move(description)) {}
auto getName() const -> std::string const& { return name_; }
auto getDescription() const -> std::string const& { return description_; }
private:
const std::string name_;
const std::string description_;
};
int main() {
auto dataContainer = DataContainer{"parameterName", "parameterDescription"};
std::cout << "name: " << dataContainer.getName().c_str() << std::endl;
std::cout << "description: " << dataContainer.getDescription().c_str() << std::endl;
return 0;
}
The output is as expected (even when using intermediate local variables).
Regarding your remark, "I use .c_str() here as this is how I use it in my actual codebase": in that case, consider adding a couple of accessors returning exactly that:
//...
auto Name() const { return name_.c_str(); }
auto Description() const { return description_.c_str(); }
//...
std::cout << "name: " << dataContainer.Name() << std::endl;
std::cout << "description: " << dataContainer.Description() << std::endl;
Why does string::c_str() return a const char* when strings are allocated dynamically?
In C and C++, const translates more or less to "read only".
So, when something returns a char const *, that doesn't necessarily mean the data it's pointing at is actually const; it just means that the pointer you're receiving only supports reading, not writing, the data it points at.
The string object itself may be able to modify that data, but (at least via the pointer you're receiving) you're not allowed to modify the data directly.
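A minimal sketch of that distinction, using a hypothetical function name: the const char* cannot be used to write, yet the owning std::string can still change the characters it points at. Writing through operator[] does not invalidate pointers into the string, so reading through the old pointer afterwards is well-defined.

```cpp
#include <string>

// Hypothetical demo: the pointer is read-only for us, but the owning
// std::string may still modify the characters it points at.
char firstCharAfterOwnerWrite() {
    std::string s = "hello";
    const char* p = s.c_str();  // read-only view into s's buffer
    // p[0] = 'J';              // would not compile: p points to const char
    s[0] = 'J';                 // the owner writes; operator[] keeps p valid
    return p[0];                // sees the owner's change
}
```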
What actually is done when `string::c_str()` is invoked?
Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer. And since c_str() must be null-terminated (and, since C++11, so must data(), which returns the same pointer), that effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.
Prior to C++11, the behavior of c_str()
was technically implementation-specific, but most implementations I've ever seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.
UPDATE
Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer is required to be dynamically allocated when the string is empty. It could point to an SSO (small string optimization) buffer, or even to a single static nul character. There is no guarantee that the pointer returned by c_str()/data() remains pointing at the same memory address as the content of the string changes.
std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.
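The guarantees above can be checked directly. This is a small sketch assuming a C++11 or later compiler; the function names are hypothetical. It tests that c_str() and data() expose the same buffer, that a terminator sits at index size(), and that substr() produces a string with its own buffer.

```cpp
#include <string>

// Hypothetical check of the C++11 guarantees for one string.
bool cStrGuaranteesHold(const std::string& s) {
    const char* p = s.c_str();
    return p == s.data()          // c_str() and data() expose the same buffer
        && p[s.size()] == '\0';   // a terminator follows the last character
}

// Hypothetical check that substr() builds a separate, null-terminated string.
// Assumes s is non-empty: two empty strings could share a static buffer.
bool substrHasOwnBuffer(const std::string& s) {
    const std::string sub = s.substr(0, 1);
    return sub.c_str() != s.c_str() && sub.c_str()[sub.size()] == '\0';
}
```

The terminator check also holds for strings with embedded nulls, because it looks at index size() rather than scanning with strlen().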
C++ String Literal Changing After Function Terminates
C++ is an old language that grew out of C; as a result, both the behavior and the terminology used to describe that behavior can be rather confusing.
A "string literal" is a sequence of characters in the source code surrounded by quotes. In most contexts it evaluates to a pointer to a null-terminated sequence of characters (a "C string"). Under normal circumstances* said sequence of characters will indeed remain valid for the entire lifetime of the program.
The type string in your code, on the other hand, is probably referring to std::string (via using namespace std somewhere), which is a class representing an automatically managed string.
When you do get_var("string literal"); or string my_literal = "string literal"; the "C string" is implicitly converted to a std::string. This operation creates a copy of the sequence of characters. Unlike the original sequence of characters, this copy will be freed when the std::string that owns it is destroyed.
&*literal.begin() is a somewhat unorthodox way to get a pointer to the sequence of characters owned by the std::string; using c_str() would be more usual. That isn't relevant to your problem, though. The important bit is that the sequence of characters in memory is the one owned by the std::string, not the original sequence from the string literal.
In the case of get_var("string literal");, the std::string is destroyed as soon as the statement completes. In the case of string my_literal = "string literal";, it is destroyed when the variable my_literal goes out of scope. Either way, it is destroyed before foo() returns. So when you do std::cout << var.string_attribute;, you are referencing a stale pointer whose associated memory has already been freed.
The reason it works "sometimes" is that memory managers do not generally overwrite memory as soon as it is freed. Typically the memory is not actually overwritten until something re-uses it.
Edit: I misread your question. It is possible for a use-after-free to "work" sometimes, but that is not what is going on here. The cout calls you say are working are at points in the code where the std::string is still alive.
* Excluding cases like unloading shared libraries at runtime that are beyond the scope of the C standard.
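The asker's code is not shown here, so this is only a sketch with hypothetical definitions: the struct Var and this version of foo() are assumptions modelled on the names var.string_attribute, my_literal and foo() used above. It contrasts a pointer to the literal itself, which is valid for the whole program, with storing a std::string by value, which owns its own copy instead of pointing into a std::string that is about to be destroyed.

```cpp
#include <string>

// A string literal has static storage duration, so returning a pointer
// to it is fine: the characters stay valid for the whole program.
const char* literalPointer() {
    return "string literal";
}

// Hypothetical struct modelled on var.string_attribute: it stores a
// std::string, not a pointer into some other std::string.
struct Var {
    std::string string_attribute;
};

// Hypothetical foo(): my_literal is destroyed on return, but var keeps
// its own copy of the characters, so nothing dangles.
Var foo() {
    Var var;
    std::string my_literal = "string literal";  // copy of the literal's characters
    var.string_attribute = my_literal;          // copied again into var
    return var;
}
```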