Why does this work: returning C string literal from std::string function and calling c_str()

Why does this work: returning C string literal from std::string function and calling c_str()

Your analysis is correct: what you have is undefined behaviour, which means pretty much anything can happen. In your case the memory used for the string, although already deallocated, apparently still holds the original contents when you access it. This often happens because the memory allocator does not clear deallocated memory; it just marks it as available for future use. (For short strings, many implementations store the characters inside the std::string object itself, so the "freed" bytes may simply be stack memory that has not been reused yet.) None of this is something the C++ language defines: it is an implementation detail of the allocator and the OS. As far as C++ is concerned, the catch-all "undefined behaviour" applies.

Can std::string::c_str() be used whenever a string literal is expected?

Looking at the documentation you linked to, it seems like you are trying to call the overload of AddMember taking two StringRefTypes (and an Allocator). StringRefType is a typedef for GenericStringRef<Ch>, which has two overloaded constructors taking a single argument:

template<SizeType N>
GenericStringRef(const CharType(&str)[N]) RAPIDJSON_NOEXCEPT;

explicit GenericStringRef(const CharType *str);

When you pass a string literal, the type is const char[N], where N is the length of the string + 1 (for the null terminator). This can be implicitly converted to a GenericStringRef<Ch> using the first constructor overload. However, std::string::c_str() returns a const char*, which cannot be converted implicitly to a GenericStringRef<Ch>, because the second constructor overload is declared explicit.

The error message you get from the compiler is caused by it choosing another overload of AddMember which is a closer match.

What happens when a string literal is passed to a function accepting a const std::string & in C++?

When you pass a string literal to a function that accepts const std::string&, the following events occur:

  • The string literal is converted to const char*
  • A temporary std::string object is created. Its internal buffer is allocated, and initialized by copying the data from the const char* until the terminating null is seen. The parameter refers to this temporary object.
  • The function body runs.
  • Assuming the function returns normally, the temporary object is destroyed at the end of the full-expression containing the call (typically at the semicolon of the calling statement).

If the c_str() pointer is saved from the parameter, it becomes a dangling pointer after the temporary object is destroyed since it points into the temporary object's internal buffer.

A similar problem will occur if the function accepts std::string. The std::string object will be created when the function is called and destroyed when the function returns or soon afterward, so any saved c_str() pointer will become dangling.

If the function accepts const std::string& and the argument has type std::string, however, no new object is created when the function is called. The reference refers to the existing object. The c_str() pointer will remain valid until the original std::string object is destroyed.

Why does calling std::string.c_str() on a function that returns a string not work?

getString() would return a copy of str (getString() returns by value);

That's right.

thus, the copy of str would stay "alive" in main() until main() returns.

No, the returned copy is a temporary std::string, which is destroyed at the end of the full-expression in which it was created, i.e. before std::cout << cStr << std::endl;. After that cStr is dangling, and dereferencing it leads to UB: anything is possible.

You can copy the returned temporary into a named variable, or bind it to a const lvalue reference or an rvalue reference (the lifetime of the temporary is then extended until the reference goes out of scope). For example:

std::string s1 = getString();    // s1 will be copy initialized from the temporary
const char* cStr1 = s1.c_str();
std::cout << cStr1 << std::endl; // safe

const std::string& s2 = getString(); // lifetime of temporary will be extended when bound to a const lvalue-reference
const char* cStr2 = s2.c_str();
std::cout << cStr2 << std::endl; // safe

std::string&& s3 = getString(); // likewise, lifetime is extended when bound to an rvalue reference
const char* cStr3 = s3.c_str();
std::cout << cStr3 << std::endl; // safe

Or use the pointer before the temporary is destroyed, e.g.

std::cout << getString().c_str() << std::endl;  // temporary gets destroyed after the full expression

Here is an explanation from The C++ Programming Language, Special Edition, §10.4.10 Temporary Objects [class.temp]:

Unless bound to a reference or used to initialize a named object, a
temporary object is destroyed at the end of the full expression in
which it was created. A full expression is an expression that is
not a subexpression of some other expression.

The standard string class has a member function c_str() that
returns a C-style, zero-terminated array of characters (§3.5.1, §20.4.1). Also, the operator + is defined to mean string concatenation.
These are very useful facilities for strings. However, in combination they can cause obscure problems.
For example:

void f(string& s1, string& s2, string& s3)
{

const char* cs = (s1 + s2).c_str();
cout << cs ;
if (strlen(cs=(s2+s3).c_str())<8 && cs[0]=='a') {
// cs used here
}

}

Probably, your first reaction is "but don’t do that," and I agree.
However, such code does get written, so it is worth knowing how it is
interpreted.

A temporary object of class string is created to hold s1 + s2.
Next, a pointer to a C-style string is extracted from that object. Then
– at the end of the expression – the temporary object is deleted. Now,
where was the C-style string allocated? Probably as part of the
temporary object holding s1 + s2, and that storage is not guaranteed
to exist after that temporary is destroyed. Consequently, cs points
to deallocated storage. The output operation cout << cs might work
as expected, but that would be sheer luck. A compiler can detect and
warn against many variants of this problem.

Curious behaviour of c_str() and strings when passed to class

As already noted, the problems in the posted code arise from dangling references to temporary objects, either stored as class members or returned and accessed via .c_str().

The first fix is to store actual std::string objects as members, not (dangling) references, and then write accessor functions that return const references to them:

#include <iostream>
#include <string>
#include <utility>  // std::move

class DataContainer {
public:
    // Take the strings by value and move them into the members, so the
    // container owns its own copies instead of referring to temporaries.
    DataContainer(std::string name, std::string description)
        : name_(std::move(name)), description_(std::move(description)) {}
    auto getName() const -> std::string const& { return name_; }
    auto getDescription() const -> std::string const& { return description_; }
private:
    const std::string name_;
    const std::string description_;
};

int main() {
    auto dataContainer = DataContainer{"parameterName", "parameterDescription"};

    std::cout << "name: " << dataContainer.getName().c_str() << std::endl;
    std::cout << "description: " << dataContainer.getDescription().c_str() << std::endl;
    return 0;
}

With this version the output is as expected (even when using intermediate local variables).



I use *.c_str() here as this is how I use it in my actual codebase

Then consider adding a couple of accessors returning exactly that:

//...
auto Name() const { return name_.c_str(); }
auto Description() const { return description_.c_str(); }
//...
std::cout << "name: " << dataContainer.Name() << std::endl;
std::cout << "description: " << dataContainer.Description() << std::endl;

Why does string::c_str() return a const char* when strings are allocated dynamically?

In C and C++, const translates more or less to "read only".

So, when something returns a char const *, that doesn't necessarily mean the data it's pointing at is actually const; it just means that the pointer you're receiving only supports reading, not writing, the data it points at.

The string object itself may be able to modify that data, but (at least via the pointer you're receiving) you're not allowed to modify the data directly.

What actually is done when `string::c_str()` is invoked?

Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer (in fact the same pointer). And since c_str() must be null-terminated, that effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc. (Before C++11, data() was not required to return a null-terminated buffer.)

Prior to C++11, the behavior of c_str() was technically implementation-specific, but most implementations I've ever seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.

UPDATE

Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer is required to be dynamically allocated when the string is empty. It could point to an SSO buffer, or even to a single static nul character. There is no guarantee that the pointer returned by c_str()/data() stays valid, or keeps pointing at the same address, once the content of the string changes.

std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.

C++ String Literal Changing After Function Terminates

C++ is an old language that grew out of C; as a result, both the behavior and the terminology used to describe it can be rather confusing.

A "string literal" is a sequence of characters in the source code surrounded by quotes. It has array type (const char[N]), and in most contexts it decays to a pointer to a null-terminated sequence of characters (a "C string"). Under normal circumstances* that sequence of characters remains valid for the entire lifetime of the program.

The type string in your code, on the other hand, probably refers to std::string (via using namespace std somewhere), which is a class representing an automatically managed string.

When you do get_var("string literal"); or string my_literal = "string literal"; the "C string" is implicitly converted to a std::string. This operation creates a copy of the sequence of characters. Unlike the original sequence of characters this sequence of characters will be freed when the std::string that owns it is destroyed.

&*literal.begin() is a somewhat unorthodox way to get a pointer to the sequence of characters owned by the std::string; using c_str() would be more usual. That isn't relevant to your problem, though. The important bit is that the sequence of characters in memory is the one owned by the std::string, not the original sequence from the string literal.

In the case of get_var("string literal"); the std::string is destroyed as soon as the statement completes. In the case of string my_literal = "string literal"; it is destroyed when the variable my_literal goes out of scope. Either way it is destroyed before foo() returns. So when you do std::cout << var.string_attribute; you are referencing a stale pointer for which the associated memory has already been freed.

The reason it works "sometimes" is that memory managers do not generally overwrite memory as soon as it is freed. Typically the memory is not actually overwritten until something re-uses it.

Edit: I misread your question. It is possible for a use-after-free to "work" sometimes, but that is not what is going on here. The cout calls you say are working are at points in the code where the std::string is still alive.

* Excluding cases like unloading shared libraries at runtime that are beyond the scope of the C standard.