Why Does Using a Temporary Object in the Range-Based for Initializer Result in a Crash

Why does using a temporary object in the range-based for initializer result in a crash?

The range initialization line of a for(:) loop does not extend lifetime of anything but the final temporary (if any). Any other temporaries are discarded prior to the for(:) loop executing.

Now, do not despair; there is an easy fix to this problem. But first a walk through of what is going wrong.

The code for(auto x:exp){ /* code */ } expands to, basically:

{
  auto&& __range=exp;
  auto __it=std::begin(__range);
  auto __end=std::end(__range);
  for(; __it!=__end;++__it){
    auto x=*__it;
    /* code */
  }
}

(With a modest lies on the __it and __end lines, and all variables starting with __ have no visible name. Also I am showing C++17 version, because I believe in a better world, and the differences do not matter here.)

Your exp creates a temporary object, then returns a reference to within it. The temporary dies after that line, so you have a dangling reference in the rest of the code.

Fixing it is relatively easy. To fix it:

std::string const& func() const& // notice &
{
    return m.find("key")->second;
}
std::string func() && // notice &&
{
    return std::move(m.find("key")->second);
}

do rvalue overloads and return moved-into values by value when consuming temporaries instead of returning references into them.

Then the

auto&& __range=exp;

line does reference lifetime extension on the by-value returned string, and no more dangling references.

As a general rule, never return a range by reference to a parameter that could be an rvalue.

Appendix: Wait, && and const& after methods? rvalue references to *this?

C++11 added rvalue references. But the this or self parameter to functions is special. To select which overload of a method based on the rvalue/lvalue-ness of the object being invoked, you can use & or && after the end of the method.

This works much like the type of a parameter to a function. && after the method states that the method should be called only on non-const rvalues; const& means it should be called for constant lvalues. Things that don't exactly match follow the usual precidence rules.

When you have a method that returns a reference into an object, make sure you catch temporaries with a && overload and either don't return a reference in those cases (return a value), or =delete the method.

Range-based for loop on a temporary range

Note that using a temporary as the range expression directly is fine, its lefetime will be extended. But for f()[5], what f() returns is the temporary and it's constructed within the expression, and it'll be destroyed after the whole expression where it's constructed.

From C++20, you can use init-statement for range-based for loop to solve such problems.

(emphasis mine)

If range_expression returns a temporary, its lifetime is extended
until the end of the loop, as indicated by binding to the rvalue
reference __range, but beware that the lifetime of any temporary
within range_expression is not extended.

This problem may be worked around using init-statement:
for (auto& x : foo().items()) { /* .. */ } // undefined behavior if foo() returns by value
for (T thing = foo(); auto& x : thing.items()) { /* ... */ } // OK

e.g.

for(auto thing = f(); auto e : thing[5])
    std::cout << e << std::endl;

R-value reference to a temporary object does not segfault

If you build the program with address sanitizer, you get:

SUMMARY: AddressSanitizer: heap-use-after-free

At this line:

uto &&x = S().func();

Memory issues may not crash your program immediately, but they may crash a long-running program at other functions or other threads, lead to very weird crash scenarios.

Why is a type of range based for loop over brace-init list illegal c++

It does not work because it isn't supposed to work. That's the design of the feature. That feature being list initialization, which as the name suggests is about initializing something.

When C++11 introduced initializer_list, it was done for precisely one purpose: to allow the system to generate an array of values from a braced-init-list and pass them to a properly-tagged constructor (possibly indirectly) so that the object could initialize itself with that sequence of values. The "proper tag" in this case being that the constructor took the newly-minted std::initializer_list type as its first/only parameter. That's the purpose of initializer_list as a type.

Initialization, broadly speaking, should not modify the values it is given. The fact that the array backing the list is a temporary also doubles-down on the idea that the input values should logically be non-modifiable. If you have:

std::vector<int> v = {1, 2, 3, 4, 5};

We want to give the compiler the freedom to make that array of 5 elements a static array in the compiled binary, rather than a stack array that bloats the stack size of this function. More to the point, we logically want to think of the braced-init-list like a literal.

And we don't allow literals to be modified.

Your attempt to make {a, b, c, d} into a range of modifiable references is essentially trying to take a construct that already exists for one purpose and turn it into a construct for a different purpose. You're not initializing anything; you're just using a seemingly convenient syntax that happens to make iterable lists of values. But that syntax is called a "braced-init-list", and it generates an initializer list; the terminology is not an accident.

If you take a tool intended for X and try to hijack it do Y, you're likely to encounter rough edges somewhere down the road. So the reason why it doesn't work is that this it's not meant to; these are not what braced-init-lists and initializer_lists are for.

You might next say "if that's the case, why does for(auto i: {1, 2, 3, 4, 5}) work at all, if braced-init-lists are only for initialization?"

Once upon a time, it didn't; in C++11, that would be il-formed. It was only in C++14 when auto l = {1, 2, 3, 4}; became legal syntax; an auto variable was allowed to automatically deduce a braced-init-list as an initializer_list.

Range-based for uses auto deduction for the range type, so it inherited this ability. This naturally led people to believe that braced-init-lists are about building ranges of values, not initializing things.

In short, someone's convenience feature led you to believe that a construct meant to initialize an objects is just a quick way to create an array. It never was.

Having established that braced-init-lists aren't supposed to do the thing you want them to do, what would it take to make them do what you want?

Well, there are basically two ways to do it: small scale and large scale. The large-scale version would be to change how auto i = {a, b, c, d}; works, so that it could (based on something) create a modifiable range of references to expressions. Range-based for would just use its existing auto-deduction machinery to pick up on it.

This is of course a non-starter. That definition already has a well-defined meaning: it creates a non-modifiable list of copies of those expressions, not references to their results. Changing it would be a breaking change.

A small-scale change would be to hack range-based for to do some fancy deduction based on whether the range expression is a braced-init-list or not. Now, because such ranges and their iterators are buried by the compiler-generated code for range-based for, you won't have as many backwards compatibility problems. So you could do make a rule where, if your range-statement defines a non-const reference variable, and the range-expression is a braced-init-list, then you do some different deduction mechanisms.

The biggest problem here is that it's a complete and total hack. If it's useful to do for(auto &i : {a, b, c d}), then it's probably useful to be able to get the same kind of range outside of a range-based for loop. As it currently stands, the rules about auto-deduction of braced-init-lists are consistent everywhere. Range-based for gains its capabilities simply because it uses auto deduction.

The last thing C++ needs is more inconsistency.

This is doubly important in light of C++20 adding an init-statement to range-for. These two things ought to be equivalent:

for(auto &i : {a, b, c, d})
for(auto &&rng = {a, b, c, d}; auto &i: rng)

But if you change the rules only based on the range-expression and range-statement, they wouldn't be. rng would be deduced according to existing auto rules, thus making the auto &i non-functional. And having a super-special-case rule that changes how the init-statement of a range-for behaves, different from the same statement in other locations, would be even more inconsistent.

Besides, it's not too difficult to write a generic reference_range function that takes non-const variadic reference parameters (of the same type) and returns some kind of random-access range over them. That will work everywhere equally and without compatibility problems.

So let's just avoid trying to make a syntactic construct intended for initializing objects into a generic "list of stuff" tool.

What's the lifetime of temporary objects in a range-for?

The current standard says in The range-based for statement [stmt.ranged] that

The range-based for statement

for ( init-statementopt for-range-declaration : for-range-initializer ) statement

is equivalent to

{
    init-statementopt
    auto &&__range = for-range-initializer ;
    auto __begin = begin-expr ;
    auto __end = end-expr ;
    for ( ; __begin != __end; ++__begin ) {
      for-range-declaration = *__begin;
      statement
    }
}

This means that your Foo().words() is only used in the assignment auto &&__range = Foo().words(); and that the temporary object not lives until the code reaches the for loop.

Please note that I copied from the latest C++20 draft. In C++11 the code is a bit different, but the relevant part is the same.

Is this unsafe usage of a braced initializer list in a range-based for loop?

I'll answer in two parts: A meta-answer and specific one.

The general (and more relevant) answer: Don't use `initializer_list`'s unless you have to.

Is this unsafe usage of a braced initializer list?

Of course it is. Generally, you shouldn't construct initializer lists yourself. It's an ugly piece of detail on the seam between the language and the standard library. You should ignore its very existence unless you absolutely have to work with it, and even then - do so because some other code made you, don't put yourself into holes from which you need to be a language lawyer to climb out. The definition of std::initializer_list has a lot of fine print full of weasel words like "proxy", "provides access to" (as opposed to "is"), "may be implemented as" (but also may not) and "is not guaranteed to exist after". Ugh.

More specifically, you're using a ranged-based for loop. Whatever made you want to explicitly instantiate an std::initializer_list? What's wrong with the following:

struct Test { int x; };

int main() {
    std::unique_ptr<Test> a(new Test);
    std::unique_ptr<Test> b(new Test);
    std::unique_ptr<Test> c(new Test);
    int id = 0;
    for( auto t : {a.get(), b.get(), c.get()} )
         t->x = id++;
}

? That's the way your example should be written.

Now, you might be asking: "But what's the type of that brace-enclosed initializer list?"

Well, to that I would say:

What's it to you? The intent is clear, and there is no reason to believe any segfault should occur here. If this code had not worked, you could (and should) have complained straight to the standard committee about how the language was brain-dead.
I don't know. Or I rather, I make it a point not to know unless I really have to.
(Ok, just between you and me - it's std::initializer_list<Test *> &&, but really, forget that fact.)

About your specific final version of the code

Your loop,

for(auto t : std::initializer_list<Test*>({a.get(), b.get(), c.get()})) { t->x = id++; }

is equivalent to the following code:

auto && range = std::initializer_list<Test*>({a.get(), b.get(), c.get()});
for (auto begin = std::begin(range), end = range.end(); begin != end; ++begin) {
    auto t = *begin;
    t->x = id++;
}

The initializer list definition says:

The underlying array is not guaranteed to exist after the lifetime of the original initializer list object has ended.

I am not a language lawyer, but my guess is as follows: It seems like you're initializing the outer initializer_list with an inner one, and thus not getting a guaranteed lifetime extension of the inner one beyond the constructor of the outer initializer_list. It is therefore up to the compiler's whim whether to keep it alive or not, and it's entirely possible that different versions of different compilers behave differently in this respect.

Using Auto based Ranged for loop vs using pair based Ranged for loop

Well, here's the thing. Your two loops don't use the same type of element.

The map iterator returns a reference to std::pair<const std::string, int>¹. And your first loop constructs a temporary object out of that (since your std::string is not const qualified). So you aren't printing the address of the pair inside the map in the first loop.

It's not impossible the compiler cleverly uses the same storage to construct the temporary object at every iteration, so you always see the same address printed.

_{¹ You aren't allowed to modify the key via iterator, remember. It will mess up the map invariants.}

Why does const std::pairK,V& in range-based for loop on std::map not work?

std::map<K, V>::value_type is std::pair<const K, V>, not std::pair<K, V> (see cppreference)

#include <map>
#include <string>
#include <iostream>

int main(void)
{
    std::map<std::string, int> my_map {{"foo", 42}};
    for(const auto& entry : my_map)
        std::cout << entry.first << ' ' << entry.second << ' ' << &(entry.second) << std::endl;
    for(const std::pair<std::string, int>& entry : my_map)
        std::cout << entry.first << ' ' << entry.second << ' ' << &(entry.second) << std::endl;
    for(const std::pair<const std::string, int>& entry : my_map)
        std::cout << entry.first << ' ' << entry.second << ' ' << &(entry.second) << std::endl;
    return 0;
}

Example output:

foo 42 0x2065eb0
foo 42 0x7ffc2d536070
foo 42 0x2065eb0

Your second loop works because it's creating a temporary std::pair<std::string, int> and binding it to your reference (explanation). You can see it fail if you try to use a non-const reference instead (since it can't bind to a temporary):

error: invalid initialization of reference of type 'std::pair<std::__cxx11::basic_string<char>, int>&' from expression of type 'std::pair<const std::__cxx11::basic_string<char>, int>'

Why Does Using a Temporary Object in the Range-Based for Initializer Result in a Crash