Is the Range-Based for Loop Beneficial to Performance

Is the range-based for loop beneficial to performance?

The Standard is your friend; see [stmt.ranged]/1:

For a range-based for statement of the form
for ( for-range-declaration : expression ) statement
let range-init be equivalent to the expression surrounded by parentheses
( expression )
and for a range-based for statement of the form
for ( for-range-declaration : braced-init-list ) statement
let range-init be equivalent to the braced-init-list. In each case, a range-based for statement is equivalent to
{
  auto && __range = range-init;
  for ( auto __begin = begin-expr,
             __end = end-expr;
        __begin != __end;
        ++__begin )
  {
    for-range-declaration = *__begin;
    statement
  }
}

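So yes: the Standard defines the loop as equivalent to an iterator loop that evaluates the end expression once and advances with pre-increment, which is about as efficient as a carefully hand-written loop gets.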
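To make the equivalence concrete, here is a minimal sketch for a std::vector<int>. The function names and the local names range, first and last are illustrative stand-ins for the Standard's exposition-only __range, __begin and __end; they are not taken from the quoted text.

```cpp
#include <vector>

// Sum the elements with a range-based for loop.
long long sum_range_for(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) {
        total += x;
    }
    return total;
}

// Hand-written form of what the loop above is specified to be equivalent to.
long long sum_expanded(const std::vector<int>& v) {
    long long total = 0;
    {
        auto&& range = v;
        for (auto first = range.begin(), last = range.end();  // end() evaluated once
             first != last;
             ++first) {
            int x = *first;  // the for-range-declaration
            total += x;      // the loop body
        }
    }
    return total;
}
```

Both functions should return the same value for any input; with optimization enabled a compiler will typically generate very similar code for the two.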

And for a number of containers, such as vector, modifying the container (inserting or erasing) during this iteration is undefined behavior, because doing so can invalidate the iterators the loop is holding.
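One way to stay clear of that problem is to collect the changes while iterating and apply them afterwards. The sketch below is only an illustration; duplicate_evens is a made-up function, not something from the question.

```cpp
#include <vector>

// Appends a copy of every even element to the end of v.
// Calling v.push_back inside `for (int x : v)` could reallocate the storage
// and invalidate the iterators the loop holds, so the new elements are
// gathered in a separate vector and appended once the loop has finished.
void duplicate_evens(std::vector<int>& v) {
    std::vector<int> pending;
    for (int x : v) {
        if (x % 2 == 0) {
            pending.push_back(x);
        }
    }
    v.insert(v.end(), pending.begin(), pending.end());
}
```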

C++ for loop and range-based loop performance

Here's a crude test. I'm not saying this is a definitive answer as to which is faster, but it seems to me in this particular case, the gcc compiler is able to optimize both loops to roughly the same performance level. You can definitely improve on the testing method, if you wish.

On my system (Ubuntu 14.04, some sort of i7, 8 GB DDR3, gcc):

Without optimization (g++ main.cpp -std=c++11):

Old-fashioned loop: 5.45131 seconds.

Range-based loop: 9.90306 seconds.

With optimization (g++ main.cpp -O3 -std=c++11):

Old-fashioned loop: 0.469001 seconds.

Range-based loop: 0.467045 seconds.

#include <iostream>
#include <vector>
#include <time.h>
using namespace std;

double time_elapsed(timespec& start, timespec& end)
{
    return ((1e9 * end.tv_sec + end.tv_nsec) - 
            (1e9 * start.tv_sec + start.tv_nsec)) / 1.0e9;
}

int main()
{
    vector<int> v(1e9, 42);

    timespec start, end;

//  Old-fashioned loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start);
    size_t size = v.size();
    for (size_t i = 0; i < size; i++)
    {
        v[i] *= v[i];
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    cout << "Old-fashioned loop: " << time_elapsed(start, end) << " seconds.\n";

//  Range-based loop.
    clock_gettime(CLOCK_MONOTONIC_RAW, &start); 
    for (int& val : v)
    {
        val *= val;
    }
    clock_gettime(CLOCK_MONOTONIC_RAW, &end);

    cout << "Range-based loop: " << time_elapsed(start, end) << " seconds.\n";
}
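clock_gettime is POSIX-specific. If you want to repeat the experiment elsewhere, a portable variant could use std::chrono::steady_clock instead. This is a rough sketch under the same caveats as the test above; seconds_taken, square_indexed and square_range_for are names I made up for it.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Runs f once and returns the elapsed time in seconds.
template <typename F>
double seconds_taken(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Old-fashioned loop: squares every element through an index.
void square_indexed(std::vector<int>& v) {
    const std::size_t size = v.size();
    for (std::size_t i = 0; i < size; ++i) {
        v[i] *= v[i];
    }
}

// Range-based loop: squares every element through a reference.
void square_range_for(std::vector<int>& v) {
    for (int& val : v) {
        val *= val;
    }
}
```

You would call it as seconds_taken([&] { square_indexed(v); }) and print the result. As before, a single run is only a crude measurement.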

Is there any advantage of using a range for loop rather than an iterator?

Iterators predate range-based for loops, so they used to be the only one of the two available. Nowadays, range-based for has mostly replaced explicit iterator loops for simple iteration.

However, iterators can be used in various other contexts as well. For example, you can do std::sort(v.begin(), std::next(v.begin(), 5)) to sort the first five elements of a vector while leaving the rest of it alone.
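Spelled out as a small function, that could look like the sketch below. sort_prefix is a hypothetical helper; the clamping to the vector's size is my addition so that a short vector does not lead to an out-of-range iterator.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Sorts only the first `count` elements of v and leaves the rest untouched.
// If v has fewer than `count` elements, the whole vector is sorted.
void sort_prefix(std::vector<int>& v, std::size_t count) {
    const std::size_t n = std::min(count, v.size());
    std::sort(v.begin(), std::next(v.begin(), static_cast<std::ptrdiff_t>(n)));
}
```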

Going back to iterating over a whole container:

If you can accomplish what you want with a range-based for, it leads to more legible code, so it is preferable by default.
If you need iterators for some reason, such as using an algorithm that requires them, or because you need to jump ahead or back while iterating, then use those instead.

Also: in the latter case, you can/should still use auto when declaring the iterator:

for(auto it = just_a_vec.begin(); it != just_a_vec.end(); ++it) {
}

Edit: as asked: here's a simple, if a bit contrived, example where an iterator-based loop can still be useful:

#include <cassert>
#include <iterator>
#include <vector>

// adds all values in the vector, but skips over the next two values when encountering a 0
// e.g.: {1,2,0,4,5,2} => 5
int my_weird_accum(const std::vector<int>& data) {
  int result = 0;

  for(auto it = data.begin(); it != data.end(); ++it) {
    auto v = *it;
    result += v;

    if(v == 0) {
      // skip over the next two (the loop's ++it then moves past them)
      assert(std::distance(it, data.end()) > 2);
      std::advance(it, 2);
    }
  }
  return result;
}

Range-based for loop C++11 from optimization side

There is no fundamental reason why a range-based for loop would be slower than a manual loop. The code that a range-based for loop is defined to be equivalent to is a relatively optimal loop.

In your case, each loop iteration logically calls size(). If at any point in the loop the compiler becomes unable to prove that size() cannot change in your loop, the compiler must actually call size() (which typically involves subtracting two pointers) and check it. This could cost some performance in your manual loop case.

In the for(:) loop it could cost correctness, in that if the vector being looped over has its begin or end iterator (or current iterator) invalidated during the loop, the loop continues to loop over the old iterators and undefined behavior can result.

So the two loops are not (in general) equivalent.
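A small illustration of the difference, as a sketch with a made-up function: an index loop that re-reads size() on every iteration stays well defined when the vector grows inside the loop, and it also visits the newly appended elements. Writing the same body inside for(:) would keep using iterators that push_back may have invalidated.

```cpp
#include <cstddef>
#include <vector>

// For every element n > 0, appends n - 1 to the same vector.
// The condition i < v.size() is re-evaluated each time, so elements
// appended during the loop are processed as well.
void countdown_append(std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        const int n = v[i];  // copy first: push_back may reallocate
        if (n > 0) {
            v.push_back(n - 1);
        }
    }
}
```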

In the cases where the looped-over range is unchanging, the for(:) loop will perform as well, or better, than most hand-crafted loops that just iterate over elements, because most hand-crafted loops don't make a copy of end() and compare against that copy.

In this particular case, the overhead of IO (and to a lesser extent, formatting) will massively overwhelm any loop overhead.

Performance of range-based for in C++

Yep, that's correct. Accidentally copying into a ranged for loop variable is a particularly common problem when one is using auto:

for (auto it : x)
     sum += it.size();

That's inefficient: even though dereferencing the iterator yields a reference to each vector stored in x, plain auto deduces the value type, so it ends up having type vector and every element is copied. (The solution there would be auto&, or even better, auto const&.)

Incidentally, the main performance sink here would not just be the copying of elements from inside x to your temporary it, but the allocation and deallocation of the memory used by it.
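If you want to see the copies for yourself, a type that counts its own copy operations makes them visible. Tracked and the two functions below are purely illustrative and not from the question.

```cpp
#include <vector>

// A type that counts how often it is copied.
struct Tracked {
    static int copies;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
};
int Tracked::copies = 0;

// `auto e` makes a fresh copy of each element on every iteration.
int copies_by_value(const std::vector<Tracked>& v) {
    Tracked::copies = 0;
    for (auto e : v) {
        (void)e;  // silence unused-variable warnings
    }
    return Tracked::copies;
}

// `const auto& e` binds directly to each element; nothing is copied.
int copies_by_const_ref(const std::vector<Tracked>& v) {
    Tracked::copies = 0;
    for (const auto& e : v) {
        (void)e;
    }
    return Tracked::copies;
}
```

For a vector of four elements the first function reports four copies and the second reports none.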

Should a range for loop be used instead of iterators on a vector?

From the performance point of view there isn't really a difference. As Bjarne Stroustrup writes in his book The C++ Programming Language (4th edition):

The simplest loop is a range-for-statement; it simply gives the programmer access to each element of a range.

As a fan of the KISS principle I tend to prefer simpler constructs over more complex ones. However, it really boils down to what you want to achieve. From the same book Bjarne reveals why a range-for loop is designed to be simple:

Note that a range-for loop is a deliberately simple construct. For
example, using it you can’t touch two elements at the same time and
can’t effectively traverse two ranges simultaneously. For that we need
a general for-statement.

Consequently, there are contexts in which you can't use a range-for loop and have to resort to the classical for-statement.

Bottom line: use a range-for loop whenever possible, because it is simpler and more readable.
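The two cases from the quote could look like this with a general for-statement. Both functions are my own sketches; the decision to stop at the shorter of the two ranges in dot is an assumption, not something the book prescribes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Traversing two ranges simultaneously: dot product over the common prefix.
long long dot(const std::vector<int>& a, const std::vector<int>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<long long>(a[i]) * b[i];
    }
    return total;
}

// Touching two elements at the same time: count adjacent ascending pairs.
std::size_t ascending_pairs(const std::vector<int>& v) {
    std::size_t count = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i - 1] < v[i]) {
            ++count;
        }
    }
    return count;
}
```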

Does the range-based 'for' loop deprecate many simple algorithms?

The first version

std::generate(numbers.begin(), numbers.end(), rand);

tells us that you want to generate a sequence of values.

In the second version the reader will have to figure that out himself.

Saving on typing is usually a false economy, as whatever is saved is most often lost again in reading time. Most code is read a lot more than it is typed.
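The second version is not quoted in this answer; presumably it is a range-based for loop along the lines of fill_with_range_for below. To keep the comparison deterministic, this sketch swaps rand for a made-up Counter generator.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical generator standing in for rand: yields 0, 1, 2, ...
struct Counter {
    int next = 0;
    int operator()() { return next++; }
};

// The algorithm's name states the intent: generate a sequence of values.
void fill_with_generate(std::vector<int>& numbers) {
    std::generate(numbers.begin(), numbers.end(), Counter{});
}

// Same effect, but the reader has to infer the intent from the loop body.
void fill_with_range_for(std::vector<int>& numbers) {
    Counter counter;
    for (int& n : numbers) {
        n = counter();
    }
}
```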

What is the advantage of using forwarding references in range-based for loops?

The only advantage I can see is when the sequence iterator returns a proxy reference and you need to operate on that reference in a non-const way. For example consider:

#include <vector>

int main()
{
    std::vector<bool> v(10);
    for (auto& e : v)
        e = true;
}

This doesn't compile because rvalue vector<bool>::reference returned from the iterator won't bind to a non-const lvalue reference. But this will work:

#include <vector>

int main()
{
    std::vector<bool> v(10);
    for (auto&& e : v)
        e = true;
}

All that being said, I wouldn't code this way unless you know you need to satisfy such a use case. I.e. I wouldn't do this gratuitously, because it does cause people to wonder what you're up to. And if I did do it, it wouldn't hurt to include a comment as to why:

#include <vector>

int main()
{
    std::vector<bool> v(10);
    // using auto&& so that I can bind to the proxy rvalue
    //   returned for the vector<bool> case
    for (auto&& e : v)
        e = true;
}

Edit

This last case of mine should really be a template to make sense. If you know the loop is always handling a proxy reference, then auto would work as well as auto&&. But when the loop was sometimes handling non-proxy references and sometimes proxy-references, then I think auto&& would become the solution of choice.
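Such a template might look like the following sketch. fill_all is a hypothetical name; the point is only that the same loop body has to work for both proxy and non-proxy references.

```cpp
#include <vector>

// Assigns `value` to every element of the container.
// auto&& binds to the int& yielded by std::vector<int> as well as to the
// temporary proxy object yielded by std::vector<bool>.
template <typename Container, typename T>
void fill_all(Container& c, const T& value) {
    for (auto&& e : c) {
        e = value;
    }
}
```

With auto& in place of auto&&, the std::vector<bool> instantiation would fail to compile for the reason given above.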
