C++11: the Range-Based for Statement: "Range-Init" Lifetime

C++11: The range-based for statement: range-init lifetime?

Is this reasoning correct? If not, why not?

It is correct up until this point:

And so the temporary return value of boo() is destroyed at the end of the statement "auto&&r=boo()" [...]

Binding a temporary to a reference extends its lifetime to be that of the reference. So the temporary lasts for the whole loop (that's also why there is an extra set of {} around the whole construct: to correctly limit the lifetime of that temporary).

This is according to paragraph 5 of §12.2 of the C++ standard:

The second context is when a reference is bound to a temporary. The
temporary to which the reference is bound or the temporary that is the
complete object of a subobject to which the reference is bound
persists for the lifetime of the reference except:

[various exceptions that don't apply here]

This is an interesting property that allows abusing the ranged-for loop for non-rangey things: http://ideone.com/QAXNf

Is it safe to use a C++11 range-based for-loop with an rvalue range-init?

Yes, it's perfectly safe.

From [class.temporary]/4-5:

There are two contexts in which temporaries are destroyed at a different point than the end of the fullexpression. The first context is when a default constructor is called [...]
The second context is when a reference is bound to a temporary. The temporary to which the reference is
bound or the temporary that is the complete object of a subobject to which the reference is bound persists
for the lifetime of the reference except:
A temporary bound to a reference member in a constructor’s ctor-initializer [...]
A temporary bound to a reference parameter in a function call [...]
The lifetime of a temporary bound to the returned value in a function return statement [...]
A temporary bound to a reference in a new-initializer [...]

None of those exceptions apply. The temporary thus persists for the lifetime of the reference, __range, which is the entire loop.

Range-based for loop on a temporary range

Note that using a temporary as the range expression directly is fine, its lefetime will be extended. But for f()[5], what f() returns is the temporary and it's constructed within the expression, and it'll be destroyed after the whole expression where it's constructed.

From C++20, you can use init-statement for range-based for loop to solve such problems.

(emphasis mine)

If range_expression returns a temporary, its lifetime is extended
until the end of the loop, as indicated by binding to the rvalue
reference __range, but beware that the lifetime of any temporary
within range_expression is not extended.

This problem may be worked around using init-statement:
for (auto& x : foo().items()) { /* .. */ } // undefined behavior if foo() returns by value
for (T thing = foo(); auto& x : thing.items()) { /* ... */ } // OK

e.g.

for(auto thing = f(); auto e : thing[5])
    std::cout << e << std::endl;

What's the lifetime of temporary objects in a range-for?

The current standard says in The range-based for statement [stmt.ranged] that

The range-based for statement

for ( init-statementopt for-range-declaration : for-range-initializer ) statement

is equivalent to

{
    init-statementopt
    auto &&__range = for-range-initializer ;
    auto __begin = begin-expr ;
    auto __end = end-expr ;
    for ( ; __begin != __end; ++__begin ) {
      for-range-declaration = *__begin;
      statement
    }
}

This means that your Foo().words() is only used in the assignment auto &&__range = Foo().words(); and that the temporary object not lives until the code reaches the for loop.

Please note that I copied from the latest C++20 draft. In C++11 the code is a bit different, but the relevant part is the same.

A temporary object in range-based for-loop

This is an obscure form of temporary lifetime extension. Normally you have to bind the temporary directly to the reference for it to work (e.g. for (auto x : foo())), but according to cppreference, this effect propagates through:

parentheses ( ) (grouping, not a function call),
array access [ ] (not overloaded; must use an array and not a pointer),
member access ., .*,
ternary operator ? :,
comma operator , (not overloaded),
any cast that doesn't involve a "user-defined conversion" (presumably uses no constructors nor conversion operators)

I.e. if a.b is bound to a reference, the lifetime of a is extended.

temporary lifetime in range-for expression

Lifetime extension only occurs when binding directly to references outside of a constructor.

Reference lifetime extension within a constructor would be technically challenging for compilers to implement.

If you want reference lifetime extension, you will be forced to make a copy of it. The usual way is:

struct wrap {
  wrap(A&& a) : a(std::move(a))
  {} 

  const char* begin() const { return a.begin(); }
  const char* end() const { return a.end(); }

  A a;
};

In many contexts, wrap is itself a template:

template<class A>
struct wrap {
  wrap(A&& a) : a(std::forward<A>(a))
  {} 

  const char* begin() const { return a.begin(); }
  const char* end() const { return a.end(); }

  A a;
};

and if A is a Foo& or a Foo const&, references are stored. If it is a Foo, then a copy is made.

An example of such a pattern in use would be if wrap where called backwards, and it returned iterators that where reverse iterators constructed from A. Then temporary ranges would be copied into backwards, while non-temporary objects would be just viewed.

In theory, a language that allowed you to markup parameters to functions and constructors are "dependent sources" whose lifetime should be extended as long as the object/return value would be interesting. This probably is tricky. As an example, imagine new wrap( A{"works"} ) -- the automatic storage temporary now has to last as long as the free store wrap!

temporary object in range-based for

Since you're using C++11, you could use initialization list instead. This will pass valgrind:

int main() {
  for (auto i : QList<int>{1, 2, 3})
    std::cout << i << std::endl;
  return 0;
}

The problem is not totally related to range-based for or even C++11. The following code demonstrates the same problem:

QList<int>& things = QList<int>() << 1;
things.end();

or:

#include <iostream>

struct S {
    int* x;

    S() { x = NULL; }
    ~S() { delete x; }

    S& foo(int y) {
        x = new int(y);
        return *this;
    }
};

int main() {
    S& things = S().foo(2);
    std::cout << *things.x << std::endl;
    return 0;
}

The invalid read is because the temporary object from the expression S() (or QList<int>{}) is destructed after the declaration (following C++03 and C++11 §12.2/5), because the compiler has no idea that the method foo() (or operator<<) will return that temporary object. So you are now refering to content of freed memory.

Why does using a temporary object in the range-based for initializer result in a crash?

The range initialization line of a for(:) loop does not extend lifetime of anything but the final temporary (if any). Any other temporaries are discarded prior to the for(:) loop executing.

Now, do not despair; there is an easy fix to this problem. But first a walk through of what is going wrong.

The code for(auto x:exp){ /* code */ } expands to, basically:

{
  auto&& __range=exp;
  auto __it=std::begin(__range);
  auto __end=std::end(__range);
  for(; __it!=__end;++__it){
    auto x=*__it;
    /* code */
  }
}

(With a modest lies on the __it and __end lines, and all variables starting with __ have no visible name. Also I am showing C++17 version, because I believe in a better world, and the differences do not matter here.)

Your exp creates a temporary object, then returns a reference to within it. The temporary dies after that line, so you have a dangling reference in the rest of the code.

Fixing it is relatively easy. To fix it:

std::string const& func() const& // notice &
{
    return m.find("key")->second;
}
std::string func() && // notice &&
{
    return std::move(m.find("key")->second);
}

do rvalue overloads and return moved-into values by value when consuming temporaries instead of returning references into them.

Then the

auto&& __range=exp;

line does reference lifetime extension on the by-value returned string, and no more dangling references.

As a general rule, never return a range by reference to a parameter that could be an rvalue.

Appendix: Wait, && and const& after methods? rvalue references to *this?

C++11 added rvalue references. But the this or self parameter to functions is special. To select which overload of a method based on the rvalue/lvalue-ness of the object being invoked, you can use & or && after the end of the method.

This works much like the type of a parameter to a function. && after the method states that the method should be called only on non-const rvalues; const& means it should be called for constant lvalues. Things that don't exactly match follow the usual precidence rules.

When you have a method that returns a reference into an object, make sure you catch temporaries with a && overload and either don't return a reference in those cases (return a value), or =delete the method.

range based loop C++11 for range(L,R)

Specialization

template< class Iterator>
struct range_impl<Iterator, false>
{
    range_impl(Iterator first, Iterator last)
    : first(first), last(last)
    {}

    constexpr Iterator begin()const noexcept { return { first }; }
    constexpr Iterator end  ()const noexcept { return { last  }; }

    Iterator first;
    Iterator last ;
};

Test:

int main(){
     for(auto e : range(0,10) ) std::cout << e <<  ' ';
     std::cout << std::endl;
     const char* a[] = { "Say", "hello", "to", "the", "world" };
     for(auto e : range(a, a + 5) ) std::cout << e <<  ' ';
     std::cout << std::endl;
}

Why is a type of range based for loop over brace-init list illegal c++

It does not work because it isn't supposed to work. That's the design of the feature. That feature being list initialization, which as the name suggests is about initializing something.

When C++11 introduced initializer_list, it was done for precisely one purpose: to allow the system to generate an array of values from a braced-init-list and pass them to a properly-tagged constructor (possibly indirectly) so that the object could initialize itself with that sequence of values. The "proper tag" in this case being that the constructor took the newly-minted std::initializer_list type as its first/only parameter. That's the purpose of initializer_list as a type.

Initialization, broadly speaking, should not modify the values it is given. The fact that the array backing the list is a temporary also doubles-down on the idea that the input values should logically be non-modifiable. If you have:

std::vector<int> v = {1, 2, 3, 4, 5};

We want to give the compiler the freedom to make that array of 5 elements a static array in the compiled binary, rather than a stack array that bloats the stack size of this function. More to the point, we logically want to think of the braced-init-list like a literal.

And we don't allow literals to be modified.

Your attempt to make {a, b, c, d} into a range of modifiable references is essentially trying to take a construct that already exists for one purpose and turn it into a construct for a different purpose. You're not initializing anything; you're just using a seemingly convenient syntax that happens to make iterable lists of values. But that syntax is called a "braced-init-list", and it generates an initializer list; the terminology is not an accident.

If you take a tool intended for X and try to hijack it do Y, you're likely to encounter rough edges somewhere down the road. So the reason why it doesn't work is that this it's not meant to; these are not what braced-init-lists and initializer_lists are for.

You might next say "if that's the case, why does for(auto i: {1, 2, 3, 4, 5}) work at all, if braced-init-lists are only for initialization?"

Once upon a time, it didn't; in C++11, that would be il-formed. It was only in C++14 when auto l = {1, 2, 3, 4}; became legal syntax; an auto variable was allowed to automatically deduce a braced-init-list as an initializer_list.

Range-based for uses auto deduction for the range type, so it inherited this ability. This naturally led people to believe that braced-init-lists are about building ranges of values, not initializing things.

In short, someone's convenience feature led you to believe that a construct meant to initialize an objects is just a quick way to create an array. It never was.

Having established that braced-init-lists aren't supposed to do the thing you want them to do, what would it take to make them do what you want?

Well, there are basically two ways to do it: small scale and large scale. The large-scale version would be to change how auto i = {a, b, c, d}; works, so that it could (based on something) create a modifiable range of references to expressions. Range-based for would just use its existing auto-deduction machinery to pick up on it.

This is of course a non-starter. That definition already has a well-defined meaning: it creates a non-modifiable list of copies of those expressions, not references to their results. Changing it would be a breaking change.

A small-scale change would be to hack range-based for to do some fancy deduction based on whether the range expression is a braced-init-list or not. Now, because such ranges and their iterators are buried by the compiler-generated code for range-based for, you won't have as many backwards compatibility problems. So you could do make a rule where, if your range-statement defines a non-const reference variable, and the range-expression is a braced-init-list, then you do some different deduction mechanisms.

The biggest problem here is that it's a complete and total hack. If it's useful to do for(auto &i : {a, b, c d}), then it's probably useful to be able to get the same kind of range outside of a range-based for loop. As it currently stands, the rules about auto-deduction of braced-init-lists are consistent everywhere. Range-based for gains its capabilities simply because it uses auto deduction.

The last thing C++ needs is more inconsistency.

This is doubly important in light of C++20 adding an init-statement to range-for. These two things ought to be equivalent:

for(auto &i : {a, b, c, d})
for(auto &&rng = {a, b, c, d}; auto &i: rng)

But if you change the rules only based on the range-expression and range-statement, they wouldn't be. rng would be deduced according to existing auto rules, thus making the auto &i non-functional. And having a super-special-case rule that changes how the init-statement of a range-for behaves, different from the same statement in other locations, would be even more inconsistent.

Besides, it's not too difficult to write a generic reference_range function that takes non-const variadic reference parameters (of the same type) and returns some kind of random-access range over them. That will work everywhere equally and without compatibility problems.

So let's just avoid trying to make a syntactic construct intended for initializing objects into a generic "list of stuff" tool.

C++11: the Range-Based for Statement: "Range-Init" Lifetime