std::string::c_str() and temporaries

The pointer returned by std::string::c_str() points to memory
maintained by the string object. It remains valid until a non-const
member function (other than element access such as operator[] or
iterator functions such as begin()) is called on the string object, or
the string object is destroyed. The string object you're concerned about
is a temporary. It will be destroyed at the end of the full expression,
not before and not after. In your case, the end of the full expression
is after the call to consumer has returned, so your code is safe. It
wouldn't be if consumer saved the pointer somewhere with the idea of
using it later.

The lifetime of temporaries has been strictly defined since C++98.
Before that, it varied depending on the compiler, and the code you've
written wouldn't have worked with g++ (pre-1995, roughly; g++
changed this almost immediately after the standards committee voted it in).
(There wasn't a std::string then either, but the same issues affect
any user-written string class.)

Usage of string::c_str on temporary string

The temporary will be destroyed at the end of the full expression, namely at the ; (semicolon). So you are safe.

§ 12.2 ... Temporary objects are destroyed as the last step in
evaluating the full-expression (1.9) that (lexically) contains the
point where they were created. This is true even if that evaluation
ends in throwing an exception.

Why does calling std::string.c_str() on a function that returns a string not work?

getString() would return a copy of str (getString() returns by value);

That's right.

thus, the copy of str would stay "alive" in main() until main() returns.

No, the returned copy is a temporary std::string, which will be destroyed at the end of the full expression in which it was created, i.e. before std::cout << cStr << std::endl;. Then cStr is left dangling; dereferencing it is undefined behavior (UB), and anything is possible.

You can copy the returned temporary to a named variable, or bind it to a const lvalue reference or an rvalue reference (the lifetime of the temporary is then extended until the reference goes out of scope). For example:

#include <iostream>
#include <string>

// Stand-in for the question's getString(), which returns by value.
std::string getString() { return "hello"; }

int main() {
    std::string s1 = getString();        // s1 is initialized from the temporary
    const char* cStr1 = s1.c_str();
    std::cout << cStr1 << std::endl;     // safe

    const std::string& s2 = getString(); // lifetime of the temporary is extended when bound to a const lvalue reference
    const char* cStr2 = s2.c_str();
    std::cout << cStr2 << std::endl;     // safe

    std::string&& s3 = getString();      // same as above, with an rvalue reference
    const char* cStr3 = s3.c_str();
    std::cout << cStr3 << std::endl;     // safe
}

Or use the pointer before the temporary is destroyed, e.g.:

std::cout << getString().c_str() << std::endl;  // temporary gets destroyed after the full expression

Here is an explanation from The C++ Programming Language, Special Edition, §10.4.10 Temporary Objects:

Unless bound to a reference or used to initialize a named object, a
temporary object is destroyed at the end of the full expression in
which it was created. A full expression is an expression that is
not a subexpression of some other expression.

The standard string class has a member function c_str() that
returns a C-style, zero-terminated array of characters (§3.5.1, §20.4.1). Also, the operator + is defined to mean string concatenation.
These are very useful facilities for strings. However, in combination they can cause obscure problems.
For example:

void f(string& s1, string& s2, string& s3)
{
    const char* cs = (s1 + s2).c_str();
    cout << cs;
    if (strlen(cs = (s2 + s3).c_str()) < 8 && cs[0] == 'a') {
        // cs used here
    }
}

Probably, your first reaction is "but don’t do that," and I agree.
However, such code does get written, so it is worth knowing how it is
interpreted.

A temporary object of class string is created to hold s1 + s2.
Next, a pointer to a C-style string is extracted from that object. Then,
at the end of the expression, the temporary object is deleted. Now,
where was the C-style string allocated? Probably as part of the
temporary object holding s1 + s2, and that storage is not guaranteed
to exist after that temporary is destroyed. Consequently, cs points
to deallocated storage. The output operation cout << cs might work
as expected, but that would be sheer luck. A compiler can detect and
warn against many variants of this problem.

Curious behaviour of c_str() and strings when passed to class

As already noted, the problems in the posted code arise from dangling references to temporary objects, either stored as class members or returned and accessed via .c_str().

The first fix is to store actual std::strings as members, not (dangling) references and then write accessor functions returning const references to those:

#include <iostream>
#include <string>
#include <utility> // std::move

class DataContainer {
public:
    DataContainer(std::string name, std::string description)
        : name_(std::move(name)), description_(std::move(description)) {}
    auto getName() const -> std::string const& { return name_; }
    auto getDescription() const -> std::string const& { return description_; }
private:
    const std::string name_;
    const std::string description_;
};

int main() {
    auto dataContainer = DataContainer{"parameterName", "parameterDescription"};

    std::cout << "name: " << dataContainer.getName().c_str() << std::endl;
    std::cout << "description: " << dataContainer.getDescription().c_str() << std::endl;
    return 0;
}

The output is as expected (even when using intermediate local variables).



I use *.c_str() here as this is how I use it in my actual codebase

Then consider adding a couple of accessors returning exactly that:

//...
auto Name() const { return name_.c_str(); }
auto Description() const { return description_.c_str(); }
//...
std::cout << "name: " << dataContainer.Name() << std::endl;
std::cout << "description: " << dataContainer.Description() << std::endl;

Is it safe to call c_str() directly on returned std::string from a function?

It is safe in the given example: the lifetime of the returned string ends at the end of the full expression (the statement) containing the function call, so the pointer from c_str() stays valid for the whole of that statement.

In general, I recommend reading https://en.cppreference.com/w/cpp/language/lifetime

Can I safely get a c_str out of std::stringstream in c++?

Yes, that's fine, and ss.str().c_str() crops up quite often when working with C-style APIs.

Informally speaking, the anonymous temporary std::string returned from ss.str() survives the puts() function call. This means that the const char* pointer returned by c_str() remains valid for the duration of puts().

What you can't do is:

const char* ub = ss.str().c_str();
puts(ub);

as the pointer ub is dangling.

Keep temporary std::string and return c_str() to prevent memory leaks

In C++ you cannot simply ignore object lifetimes. You cannot talk to an interface while ignoring object lifetimes.

If you think you are ignoring object lifetimes, you almost certainly have a bug.

Your interface ignores the lifetime of the returned buffer. It lasts "long enough" -- "until someone calls me again". That is a vague guarantee that will lead to really bad bugs.

Ownership should be clear. One way to make ownership clear is to use a C-style interface. Another is to use C++ library types and require your clients to match your library version. Another is to use custom smart objects and guarantee their stability across versions.

These all have downsides. C-style interfaces are annoying. Forcing the same C++ library on your clients is annoying. Having custom smart objects is code duplication, and forces your clients to use whatever string classes you wrote, not whatever they want to use, or well written std ones.

A final way is to type erase, and guarantee the stability of the type erasure.

Let us look at that option. We type-erase down to "assigning to a std-like container". This means we forget the type of the thing we erase, but we remember how to assign to it.

#include <iterator>    // std::begin, std::end
#include <memory>      // std::addressof
#include <string>
#include <type_traits> // std::enable_if_t, std::is_same, std::decay_t
#include <vector>

namespace container_writer {
  using std::begin; using std::end;
  // The trailing LowPriority pack makes these generic overloads lose
  // against a more specific append/clear found by argument-dependent lookup.
  template<class C, class It, class...LowPriority>
  void append( C& c, It b, It e, LowPriority&&... ) {
    c.insert( end(c), b, e );
  }

  template<class C, class...LowPriority>
  void clear(C& c, LowPriority&&...) {
    c = {};
  }
  template<class T>
  struct sink {
    using append_f = void(*)(void*, T const* b, T const* e);
    using clear_f = void(*)(void*);
    void* ptr = nullptr;
    append_f append_to = nullptr;
    clear_f clear_it = nullptr;

    template<class C,
      std::enable_if_t< !std::is_same<std::decay_t<C>, sink>{}, int> =0
    >
    sink( C&& c ):
      ptr(std::addressof(c)),
      append_to([](void* ptr, T const* b, T const* e){
        auto* pc = static_cast< std::decay_t<C>* >(ptr);
        append( *pc, b, e );
      }),
      clear_it([](void* ptr){
        auto* pc = static_cast< std::decay_t<C>* >(ptr);
        clear(*pc);
      })
    {}
    sink(sink&&)=default;
    sink(sink const&)=delete;
    sink()=default;

    void set( T const* b, T const* e ) {
      clear_it(ptr);
      append_to(ptr, b, e);
    }
    explicit operator bool()const{return ptr;}
    template<class Traits>
    sink& operator=(std::basic_string<T, Traits> const& str) {
      set( str.data(), str.data()+str.size() );
      return *this;
    }
    template<class A>
    sink& operator=(std::vector<T, A> const& str) {
      set( str.data(), str.data()+str.size() );
      return *this;
    }
  };
}

Now, container_writer::sink<T> is a pretty darn DLL-safe class. Its state is 3 C-style pointers. While it is a template, it is also standard layout, and standard layout basically means "has a layout like a C struct would".

A C struct that contains 3 pointers is ABI safe.

Your code takes a container_writer::sink<char>, and inside your DLL you can assign a std::string or a std::vector<char> to it. (Extending it to support more ways to assign to it is easy.)

The DLL-calling code sees the container_writer::sink<char> interface, and on the client side converts a passed std::string to it. This creates some function pointers on the client side that know how to clear a std::string and append data to it.

These function pointers (and a void*) pass over the DLL boundary. On the DLL side, they are blindly called.

No allocated memory passes from the DLL side to the client side, or vice versa. Despite that, every bit of data has a well-defined lifetime associated with an object (RAII style). There are no messy lifetime issues, because the client controls the lifetime of the buffer being written to, while the server writes to it through an automatically generated callback.

If you have a non-std style container and you want to support container_writer::sink, it is easy. Add append and clear free functions to the namespace of your type, and have them do the required action. container_writer::sink will automatically find them (via argument-dependent lookup) and use them to fill your container.

As an example, you can use CStringA like this:

void append( CStringA& str, char const* b, char const* e) {
  str += CStringA( b, e-b );
}
void clear( CStringA& str ) {
  str = CStringA{};
}

and magically CStringA is now a valid argument for something taking a container_writer::sink<char>.

The use of append is there just in case you need fancier construction of the container. You could write a container_writer::sink method that eats non-contiguous buffers by having it feed the stored container fixed sized chunks at a time; it does a clear, then repeated appends.


Now, this doesn't let you return the value from a function.

To get that to work, first do the above. Expose functions that return their strings through container_writer::sink<char> over the DLL barrier.

Make them private. Or mark them as not-to-be-called. Whatever.

Next, write inline public functions that call those functions, and return the filled std::string. These are pure header file constructs, so the code lives in the DLL client.

So we get:

class SomeClass
{
private:
  void Name(container_writer::sink<char>);
public:
  // in header file exposed from DLL:
  // (block any kind of symbol export of this!)
  std::string Name() {
    std::string r;
    Name(r);
    return r;
  }
};

void SomeClass::Name(container_writer::sink<char> s)
{
  std::string tempStr = "My name is: " +
    _rawName + ". Your name is: " + GetOtherName();
  s = tempStr;
}

and done. The DLL interface acts like C++, but is actually just passing 3 raw C pointers through. All resources are owned at all times.

Passing pointer to a temporary std::string to another thread

Or am I running into some kind of undefined behavior?

That is exactly what happens. The string that was returned from getString only lives until the end of the full expression

std::thread T(printString, getString().c_str());

That means that in printString you have a pointer to data that is no longer valid. You might get what it contained, or something else can happen. Accessing the pointer is undefined behavior so any result you get is "correct".


If you change getString to

const char * getString() {
return "hello world";
}

and then create the thread like

std::thread T(printString, getString());

Then this would be okay, since the string literal "hello world" has static storage duration, so it lives for the rest of the program.


