Extending Temporary's Lifetime Through Rvalue Data-Member Works with Aggregate, But Not with Constructor, Why


TL;DR

Aggregate initialization can extend the lifetime of a temporary; a user-defined constructor cannot, because binding a temporary to a constructor parameter is effectively a function call.

_{Note: Both T const& and T&& members qualify: a temporary bound to either one through aggregate initialization has its lifetime extended.}

What is an Aggregate?

struct S {                // (1)
  std::vector<int>&& vec;
};

To answer this question we will have to dive into the difference between initialization of an aggregate and initialization of a class type, but first we must establish what an aggregate is:

8.5.1p1 Aggregates [dcl.init.aggr]

An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no private or protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3)

^{Note: The above means that (1) is an aggregate.}
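A quick way to check the classification is the std::is_aggregate trait. The sketch below assumes a C++17 compiler (the trait does not exist earlier), and the names PlainS and CtorS are made up for illustration; PlainS has the same shape as (1).

```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Same shape as (1): a public reference member and no constructors.
struct PlainS {
    std::vector<int>&& vec;
};

// Hypothetical counterpart with a user-provided constructor.
struct CtorS {
    std::vector<int>&& vec;
    CtorS(std::vector<int>&& v) : vec(std::move(v)) {}
};

// Both checks happen at compile time.
static_assert(std::is_aggregate<PlainS>::value, "PlainS is an aggregate");
static_assert(!std::is_aggregate<CtorS>::value, "CtorS is not an aggregate");
```

If either assumption were wrong, the translation unit would simply fail to compile.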

How are Aggregates initialized?

Initialization of an aggregate differs greatly from initialization of a "non-aggregate". Here is another section straight from the Standard:

8.5.1p2 Aggregates [dcl.init.aggr]

When an aggregate is initialized by an initializer list, as specified in 8.5.4, the elements of the initializer list are taken as initializers for the members of the aggregate, in increasing subscript or member order. Each member is copy-initialized from the corresponding initializer-clause.

The above quotation states that the members of our aggregate are initialized directly from the initializers in the initializer list; there is no step in between.

struct A { std::string a; int b; };

A x { std::string {"abc"}, 2 };

Semantically the above is equivalent to initializing the members as shown below, except that A::a and A::b are in this case only accessible as x.a and x.b.

std::string A::a { std::string {"abc"} };
int         A::b { 2 };

If we change the type of A::a to an rvalue-reference, or a const lvalue-reference, the temporary used for initialization is bound directly to x.a.

The rules for rvalue-references and const lvalue-references say that the temporary's lifetime is extended to that of the reference, and therefore of the enclosing object x, which is exactly what happens here.
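The extension can be observed without ever touching a dangling object. The following is a minimal sketch, not part of the original question: Tracer, Holder and aggregate_demo are hypothetical names, and Tracer only records when it is constructed and destroyed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: logs its own construction and destruction.
struct Tracer {
    std::vector<std::string>& log;
    explicit Tracer(std::vector<std::string>& l) : log(l) { log.push_back("ctor"); }
    ~Tracer() { log.push_back("dtor"); }
};

// Aggregate with an rvalue-reference member, like (1).
struct Holder {
    Tracer&& t;
};

std::vector<std::string> aggregate_demo() {
    std::vector<std::string> log;
    {
        Holder h { Tracer{log} };      // h.t binds directly to the temporary
        assert(log.back() == "ctor");  // the temporary is still alive here
        log.push_back("in scope");
        (void)h;
    }                                  // the temporary is destroyed here
    log.push_back("after scope");
    return log;
}
```

On a conforming compiler the log should read ctor, in scope, dtor, after scope: the temporary lives exactly as long as h.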

How does initialization using a user-declared constructor differ?

struct S {                    // (2)
    std::vector<int>&& vec;
    S(std::vector<int>&& v)
        : vec{std::move(v)}   // bind to the temporary provided
    { }
};

A constructor is really nothing more than a fancy function, used to initialize a class instance. The same rules that apply to functions, apply to them.

When it comes to extending the life-time of temporaries there is no difference.

std::string&& func (std::string&& ref) {
  return std::move (ref);
}

A temporary passed to func will not have its lifetime extended just because the parameter is declared as an rvalue- or lvalue-reference. Even if we return the "same" reference so that it is available outside of func, the extension just won't happen.

This is what happens in the constructor of (2), after all a constructor is just a "fancy function" used to initialize an object.
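The same kind of logging makes the difference visible for the constructor case. Again a sketch under assumptions: Tracer, CtorHolder and constructor_demo are illustrative names, and CtorHolder mirrors the shape of (2). The dangling reference is never read.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: logs its own construction and destruction.
struct Tracer {
    std::vector<std::string>& log;
    explicit Tracer(std::vector<std::string>& l) : log(l) { log.push_back("ctor"); }
    ~Tracer() { log.push_back("dtor"); }
};

// Same shape as (2): the reference member is set through a constructor.
struct CtorHolder {
    Tracer&& t;
    explicit CtorHolder(Tracer&& v) : t(std::move(v)) {}
};

std::vector<std::string> constructor_demo() {
    std::vector<std::string> log;
    {
        CtorHolder h { Tracer{log} };  // the temporary binds to parameter v
        assert(log.back() == "dtor");  // already destroyed: h.t dangles
        log.push_back("in scope");
        (void)h;                       // h.t must not be used from here on
    }
    log.push_back("after scope");
    return log;
}
```

Here the log should read ctor, dtor, in scope, after scope: the temporary is gone as soon as the declaration of h has been evaluated.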

12.2p5 Temporary objects [class.temporary]

The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except:

A temporary bound to a reference member in a constructor's ctor-initializer (12.6.2) persists until the constructor exits.

A temporary bound to a reference parameter in a function call (5.2.2) persists until the completion of the full-expression containing the call.

The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is destroyed at the end of the full-expression in the return statement.

A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the full-expression containing the new-initializer.

_{Note: Aggregate initialization through new T { ... } differs from the previously mentioned rules.}

Extending temporary's lifetime, works with block-scoped aggregate, but not through `new`; why?

^{LONG STORY, SHORT

The compiler cannot extend the lifetime of the temporary involved in new A { "temporary" }, because the A created and the temporary have different storage durations.

A reference to what the Standard says can be found at the end of this post. The Standard explicitly says that the lifetime will not be extended, but it doesn't go into detail about why.

This post will try to explain the reason in a way that is understandable for a broader audience, not only for the average language-lawyer.}

Introduction

In C++ an object can have one of several storage durations; among them are automatic and dynamic storage duration, explained briefly below:

Automatic storage duration

The storage for an object with automatic storage duration persists until the block in which it is created exits.

Objects declared at block scope have automatic storage duration (unless they are declared static or extern; register does not change this).
Temporaries are created within the enclosing block, so for the purposes of this discussion they can be treated as having automatic storage duration too.

Dynamic storage duration

The storage for an object with dynamic storage duration will persist until it is explicitly stated that it should be released; such storage is, in other words, not bound to any specific scope.

Objects created dynamically through operator new have, as hinted, dynamic storage duration.
The storage persists until a matching delete has been performed.

Aggregate initialization with automatic storage duration

As stated in the previous section, a temporary has automatic storage duration.

If we construct an aggregate with automatic storage duration, this too will have storage bound to the current scope; meaning that the lifetime of the temporary can easily be extended to match that of the aggregate.

_{Note: We can imagine them living in the same "box", and at the end of the scope we discard this box, which is fine; neither the temporary, nor the aggregate, will outlive the lifetime of the box.}

Our implementation (A)

struct A { std::string const& ref; };

void func () {
  A x { {"hello world"} };
}

Behind the scenes (A)

Since both x, and the temporary, have automatic storage duration, the compiler can implement the function as the following, semantically equivalent, snippet:

void  __func () {
  std::string __unnamed_temporary { "hello world" };
  A x { __unnamed_temporary };
}

_{Note: Both the temporary and the aggregate have their lifetime bound to the current scope, awesome!}

Aggregate initialization with dynamic storage duration

Our implementation (B)

A* gunc () {
  A *    ptr = new A { { "hello world" } };
  return ptr;
}

int main () {
  A * p = gunc ();

  std::cout << p->ref << std::endl; // DANGER, WILL ROBINSON!

  delete p;
}

In the previous sections it has been stated that temporaries have automatic storage duration, which means that our temporary, bound to A::ref, will be constructed on storage that resides in the current scope.

Behind the scenes (B)

A semantically equivalent implementation of gunc could look like this:

A* gunc () {
  std::string __unnamed_temporary { "hello world" };

  A * ptr = new A { __unnamed_temporary }; // (1)

  return ptr;
}

You are thinking it too, aren't you?

No longer can we extend the lifetime of our temporary to match that of the A created with dynamic storage duration, at (1).

The problem is that automatic storage for __unnamed_temporary will disappear as soon as we return from gunc, effectively killing our temporary.

The dynamically created A will however still be alive, leaving us with a dangling reference in main.

Conclusion

The compiler is unable to extend the lifetime of any temporaries involved when creating an object through a new-initializer, because the newed object and the temporaries have different storage durations.
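The new-initializer case can be observed safely by logging when the temporary dies instead of reading through the dangling reference. This is only a sketch: Tracer, Holder and new_demo are hypothetical names, and a compiler may warn about the line marked below.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: logs its own construction and destruction.
struct Tracer {
    std::vector<std::string>& log;
    explicit Tracer(std::vector<std::string>& l) : log(l) { log.push_back("ctor"); }
    ~Tracer() { log.push_back("dtor"); }
};

// Aggregate with a reference member.
struct Holder {
    Tracer&& t;
};

std::vector<std::string> new_demo() {
    std::vector<std::string> log;
    Holder* p = new Holder { Tracer{log} };  // temporary bound in a new-initializer
    assert(log.back() == "dtor");            // destroyed at the end of that statement
    log.push_back("after new");
    delete p;                                // p->t was dangling and is never read
    return log;
}
```

The log should read ctor, dtor, after new: the Holder outlives the temporary it refers to.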

What does the Standard (n3797) say?

12.2p5 Temporary objects [class.temporary]

The temporary to which the reference is bound or the temporary that is the complete object of a subobject to which the reference is bound persists for the lifetime of the reference except:

...

A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the full-expression containing the new-initializer.

[ Note: This may introduce a dangling reference, and implementations are encouraged to issue a warning in such case. -- end note ]

Extending the lifetime of a temporary object without copying it

Why is the lifetime of object x not extended past the function call even if a const reference has been bound to it?

Technically, the lifetime of the object is extended past the function call. It is not however extended past the initialization of wrap. But that's a technicality.

Before we dive in, I'm going to impose a simplification: let's get rid of wrapper. Also, I'm removing the template part because it too is irrelevant:

const object &function(const object &arg)
{
  return arg;
}

This changes precisely nothing about the validity of your code.

Given this statement:

const object &obj = function(object{}); // Let's call that temporary object x

What you want is for the compiler to recognize that "object x" and obj refer to the same object, and therefore the temporary's lifetime should be extended.

That's not possible. The compiler isn't guaranteed to have enough information to know that. Why? Because the compiler may only know this:

const object &function(const object &arg);

See, it's the definition of function that associates arg with the return value. If the compiler doesn't have the definition of function, then it cannot know that the object being passed in is the reference being returned. Without that knowledge, it cannot know to extend x's lifetime.
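To see the two cases side by side, here is a minimal sketch that logs construction and destruction rather than using the reference. The names Tracer, pass_through and the two demo functions are assumptions made for illustration; pass_through plays the role of function above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: logs its own construction and destruction.
struct Tracer {
    std::vector<std::string>& log;
    explicit Tracer(std::vector<std::string>& l) : log(l) { log.push_back("ctor"); }
    ~Tracer() { log.push_back("dtor"); }
};

// Returns its argument, like function in the text.
const Tracer& pass_through(const Tracer& arg) { return arg; }

std::vector<std::string> direct_binding_demo() {
    std::vector<std::string> log;
    {
        const Tracer& r = Tracer{log};                // bound directly: extended
        assert(log.back() == "ctor");
        log.push_back("in scope");
        (void)r;
    }
    return log;
}

std::vector<std::string> pass_through_demo() {
    std::vector<std::string> log;
    {
        const Tracer& r = pass_through(Tracer{log});  // bound to a parameter first
        assert(log.back() == "dtor");                 // r already dangles
        log.push_back("in scope");
        (void)r;                                      // r must not be used
    }
    return log;
}
```

With direct binding the destructor should be logged after "in scope"; through the function call it should be logged before it.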

Now, you might say that if function's definition is provided, then the compiler can know. Well, there are complicated chains of logic that might prevent the compiler from knowing at compile time. You might do this:

const object &minimum(const object &lhs, const object &rhs)
{
  return lhs < rhs ? lhs : rhs;
}

Well, that returns a reference to one of them, but which one can only be determined from the runtime values of the objects. Whose lifetime should be extended by the caller?

We also don't want the behavior of code to change based on whether the compiler only has a declaration or has a full definition. Either it's always OK to compile the code if it only has a declaration, or it's never OK to compile the code only with a declaration (as in the case of inline, constexpr, or template functions). Whether the definition happens to be visible may affect performance, but never behavior. And that's good.

Since the compiler may not have the information needed to recognize that a const& parameter outlives the function call, and since even with that information the answer may not be statically determinable, the C++ standard does not permit an implementation to even try to solve the problem. Thus, every C++ user has to recognize that passing a temporary to a function that returns a reference can cause problems, even if the reference is hidden inside some other object.

What you want cannot be done. This is one of the reasons why you should not make an object non-moveable at all unless it is essential to its behavior or performance.

Is it possible to extend the life of an rvalue passed into a function?

Because the rvalues are first bound to the parameters of the constructor, and binding them to the class members afterwards does not further extend their lifetime.

You can define your LazyAddition class as an aggregate to avoid binding the rvalues to parameters of constructor:

struct LazyAddition
{
    const Number& lhs;
    const Number& rhs;
};

But be careful. Since C++20 an aggregate can also be initialized with parentheses, as in LazyAddition(Number(3), Number(4)), and that form does not extend the lifetime of the temporaries. Use list-initialization syntax, like LazyAddition { Number(3), Number(4) }, to get the extension.
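As a small illustration of the brace form, here is a sketch in which Number is reduced to a wrapper around an int. The value member, evaluate() and lazy_sum() are assumptions made for this example and are not taken from the question.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's Number type.
struct Number {
    int value;
    explicit Number(int v) : value(v) {}
};

// Still an aggregate: member functions do not change that.
struct LazyAddition {
    const Number& lhs;
    const Number& rhs;
    int evaluate() const { return lhs.value + rhs.value; }
};

int lazy_sum() {
    // Braces: both temporaries live as long as sum does.
    LazyAddition sum { Number(3), Number(4) };
    assert(sum.lhs.value == 3 && sum.rhs.value == 4);
    return sum.evaluate();  // safe, the operands are still alive
}
```

Because both temporaries are lifetime-extended, evaluating lazily on a later line is well-defined; lazy_sum() returns 7.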

Weird behaviour when holding std::ostream rvalue member

The temporary std::ofstream{"testic"} that you created only exists for the duration of the constructor call. After that it is destroyed and the file is closed, which means you are left with a reference that refers to garbage. Using that reference results in undefined behavior.

~~To fix it you can remove the reference all together (the && from std::ostream&& out and std::ostream&& o) and have it create a new object that is initialized from the temporary.~~

The above won't work because std::ostream cannot be moved. If polymorphism isn't important, you can change every std::ostream&& to std::ofstream. If you want to maintain polymorphism, you will have to use a pointer instead:

#include <fstream>
#include <memory>
#include <ostream>
#include <utility>

class A {
 private:
  std::unique_ptr<std::ostream> out;

 public:
  A(std::unique_ptr<std::ostream> o) : out(std::move(o)) {
    out->write("test", 4);
  }

  void writeTest2() {
    out->write("test2", 5);
    out->flush();
  }
};

int main() {
  A a{std::make_unique<std::ofstream>("testic")};
  a.writeTest2();
}

What determines when the lifetimes of temporaries get extended into const references or rvalue references?

From [class.temporary]:

There are two contexts in which temporaries are destroyed at a different point than the end of the full-expression.
The first context is when a default constructor is called to initialize an element of an array [...]

The second context is when a reference is bound to a temporary. The temporary to which the reference is
bound or the temporary that is the complete object of a sub-object to which the reference is bound persists
for the lifetime of the reference except:

(5.1) — A temporary object bound to a reference parameter in a function call (5.2.2) persists until the completion
of the full-expression containing the call.

(5.2) — The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not
extended; the temporary is destroyed at the end of the full-expression in the return statement.

(5.3) — A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the
full-expression containing the new-initializer.

So two things. First, get_mhurg is undefined behavior. The lifetime of the temporary you're returning is not extended. Second, the temporary passed into id lasts until the end of the full-expression containing the function call, but no further. As with get_mhurg, the lifetime is not extended through the function call, so using the returned reference afterwards would also be undefined behavior.

Aggregate reference member and temporary lifetime

The lifetime of temporary objects bound to references is extended, unless there's a specific exception. That is, if there is no such exception, then the lifetime will be extended.

From a fairly recent draft, N4567:

The second context [where the lifetime is extended] is when a
reference is bound to a temporary. The temporary to which the
reference is bound or the temporary that is the complete object of a
subobject to which the reference is bound persists for the lifetime of
the reference except:
(5.1) A temporary object bound to a reference parameter in a function call (5.2.2) persists until the completion of the
full-expression containing the call.
(5.2) The lifetime of a temporary bound to the returned value in a function return statement (6.6.3) is not extended; the temporary is
destroyed at the end of the full-expression in the return statement.
(5.3) A temporary bound to a reference in a new-initializer (5.3.4) persists until the completion of the full-expression
containing the new-initializer.

The only significant change from C++11 is, as the OP mentioned, that in C++11 there was an additional exception for data members of reference types (from N3337):

A temporary bound to a reference member in a constructor’s ctor-initializer (12.6.2) persists until the constructor exits.

This was removed in CWG 1696 (post-C++14), and binding temporary objects to reference data members via the mem-initializer is now ill-formed.

Regarding the examples in the OP:

struct S
{
    const std::string& str_;
};

S a{"foo"}; // direct-initialization

This creates a temporary std::string and initializes the str_ data member with it. The S a{"foo"} uses aggregate-initialization, so no mem-initializer is involved. None of the exceptions for lifetime extensions apply, therefore the lifetime of that temporary is extended to the lifetime of the reference data member str_.

auto b = S{"bar"}; // copy-initialization with rvalue

Prior to mandatory copy elision with C++17:
Formally, we create a temporary std::string, initialize a temporary S by binding the temporary std::string to the str_ reference member. Then, we move that temporary S into b. This will "copy" the reference, which will not extend the lifetime of the std::string temporary.
However, implementations will elide the move from the temporary S to b. This must not affect the lifetime of the temporary std::string though. You can observe this in the following program:

#include <iostream>

#define PRINT_FUNC() { std::cout << __PRETTY_FUNCTION__ << "\n"; }

struct loud
{
    loud() PRINT_FUNC()
    loud(loud const&) PRINT_FUNC()
    loud(loud&&) PRINT_FUNC()
    ~loud() PRINT_FUNC()
};

struct aggr
{
    loud const& l;
    ~aggr() PRINT_FUNC()
};

int main() {
    auto x = aggr{loud{}};
    std::cout << "end of main\n";
    (void)x;
}


Note that the destructor of loud is called before the "end of main", whereas x lives until after that trace. Formally, the temporary loud is destroyed at the end of the full-expression which created it.

The behaviour does not change if the move constructor of aggr is user-defined.

With mandatory copy-elision in C++17: We identify the object on the rhs S{"bar"} with the object on the lhs b. This causes the lifetime of the temporary to be extended to the lifetime of b. See CWG 1697.

For the remaining two examples, the move constructor - if called - simply copies the reference. The move constructor (of S) can be elided, of course, but this is not observable since it only copies the reference.

Binding rvalue ref to string literal in constructor vs construct in-place

Option 1 will copy when you have a movable string, and will construct and move when you have a literal.

Option 2 will move when you have a movable string, and will construct and move when you have a literal.

Option 3 is the worst, as it will always copy.

As you can see, in terms of cost, Option 2 <= Option 1 <= Option 3.

Also consider

Constructor("string_literal"s)

This is a std::string literal. So Option 2 will just move that right in.

Note: The compiler can optimize copies away as well in many cases.
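For reference, a sketch of what the s suffix does. The class below is hypothetical: it assumes a constructor that takes std::string by value and moves it into a member, which may or may not match any of the options in the original question. The suffix needs C++14 and the std::string_literals namespace.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

using namespace std::string_literals;  // enables the s suffix

// The suffixed literal is a std::string prvalue, not a char array.
static_assert(std::is_same<decltype("string_literal"s), std::string>::value,
              "the s suffix yields a std::string");

// Hypothetical class: takes the string by value and moves it into place.
struct Constructor {
    std::string text;
    explicit Constructor(std::string s) : text(std::move(s)) {}
};

std::string literal_demo() {
    Constructor c { "string_literal"s };  // no char-array-to-string step at the call
    assert(c.text.size() == 14);
    return c.text;
}
```

Since the argument is already a std::string temporary, a constructor like this one only has to move from it.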
