Are Loops Really Faster in Reverse

Are loops really faster in reverse?

It's not that i-- is faster than i++. Actually, they're both equally fast.

What takes time in ascending loops is evaluating, for each i, the size of your array. In this loop:

for(var i = array.length; i--;)

You evaluate .length only once, when you declare i, whereas for this loop

for(var i = 1; i <= array.length; i++)

you evaluate .length each time you increment i, when you check if i <= array.length.

In most cases you shouldn't even worry about this kind of optimization.

Are for-loops in Java faster when run backwards?

There does not seem to be any performance benefit in Java, in general. (Also, the assertion that the JDK code does reverse looping a lot seems to be unfounded.)

See all the linked similar/duplicate questions, such as Why does decrement loop runs faster than increment loop?

There are also some articles online benchmarking the performance of different loop types, based on different collection types.
Like https://www.trinea.cn/android/arraylist-linkedlist-loop-performance-en/

Which is faster for reverse iteration, for or while loops?

TL;DR: Use the for loop.

Both should be equally fast. We can check the compiler's ability to peel away the layers of abstraction involved in the for loop quite simply:

#[inline(never)]
fn blackhole() {}

#[inline(never)]
fn with_for(n: usize) {
    for i in (0..n).rev() { blackhole(); }
}

#[inline(never)]
fn with_while(n: usize) {
    let mut i = n;
    while i > 0 {
        blackhole();
        i -= 1;
    }
}

This generates this LLVM IR:

; Function Attrs: noinline nounwind readnone uwtable
define internal void @_ZN8with_for20h645c385965fcce1fhaaE(i64) unnamed_addr #0 {
entry-block:
  ret void
}

; Function Attrs: noinline nounwind readnone uwtable
define internal void @_ZN10with_while20hc09c3331764a9434yaaE(i64) unnamed_addr #0 {
entry-block:
  ret void
}

Even if you are not versed in LLVM, it is obvious that both functions compiled down to the same IR (and thus obviously to the same assembly).

Since their performance is the same, one should prefer the more explicit for loop and reserve the while loop to cases where the iteration is irregular.

EDIT: to address starblue's concern of unfitness.

#[link(name = "snappy")]
extern {
    fn blackhole(i: libc::c_int) -> libc::c_int;
}

#[inline(never)]
fn with_for(n: i32) {
    for i in (0..n).rev() { unsafe { blackhole(i as libc::c_int); } }
}

#[inline(never)]
fn with_while(n: i32) {
    let mut i = n;
    while i > 0 {
        unsafe { blackhole(i as libc::c_int); }
        i -= 1;
    }
}

compiles down to:

; Function Attrs: noinline nounwind uwtable
define internal void @_ZN8with_for20h7cf06f33e247fa35maaE(i32) unnamed_addr #1 {
entry-block:
  %1 = icmp sgt i32 %0, 0
  br i1 %1, label %match_case.preheader, label %clean_ast_95_

match_case.preheader:                             ; preds = %entry-block
  br label %match_case

match_case:                                       ; preds = %match_case.preheader, %match_case
  %.in = phi i32 [ %2, %match_case ], [ %0, %match_case.preheader ]
  %2 = add i32 %.in, -1
  %3 = tail call i32 @blackhole(i32 %2)
  %4 = icmp sgt i32 %2, 0
  br i1 %4, label %match_case, label %clean_ast_95_.loopexit

clean_ast_95_.loopexit:                           ; preds = %match_case
  br label %clean_ast_95_

clean_ast_95_:                                    ; preds = %clean_ast_95_.loopexit, %entry-block
  ret void
}

; Function Attrs: noinline nounwind uwtable
define internal void @_ZN10with_while20hee8edd624cfe9293IaaE(i32) unnamed_addr #1 {
entry-block:
  %1 = icmp sgt i32 %0, 0
  br i1 %1, label %while_body.preheader, label %while_exit

while_body.preheader:                             ; preds = %entry-block
  br label %while_body

while_exit.loopexit:                              ; preds = %while_body
  br label %while_exit

while_exit:                                       ; preds = %while_exit.loopexit, %entry-block
  ret void

while_body:                                       ; preds = %while_body.preheader, %while_body
  %i.05 = phi i32 [ %3, %while_body ], [ %0, %while_body.preheader ]
  %2 = tail call i32 @blackhole(i32 %i.05)
  %3 = add nsw i32 %i.05, -1
  %4 = icmp sgt i32 %i.05, 1
  br i1 %4, label %while_body, label %while_exit.loopexit
}

The core loops are:

; -- for loop
match_case:                                       ; preds = %match_case.preheader, %match_case
  %.in = phi i32 [ %2, %match_case ], [ %0, %match_case.preheader ]
  %2 = add i32 %.in, -1
  %3 = tail call i32 @blackhole(i32 %2)
  %4 = icmp sgt i32 %2, 0
  br i1 %4, label %match_case, label %clean_ast_95_.loopexit

; -- while loop
while_body:                                       ; preds = %while_body.preheader, %while_body
  %i.05 = phi i32 [ %3, %while_body ], [ %0, %while_body.preheader ]
  %2 = tail call i32 @blackhole(i32 %i.05)
  %3 = add nsw i32 %i.05, -1
  %4 = icmp sgt i32 %i.05, 1
  br i1 %4, label %while_body, label %while_exit.loopexit

And the only difference is that:

for decrements before calling blackhole, while decrements after
for compares against 0, while compares against 1

otherwise, it's the same core loop.

for loop vs for loop in reverse: performance

Yes, something has changed since the article was released. Firefox has gone from version 3 to version 38 for one thing. Mostly when a new version of a browser is released, the performance of several things has changed.

If you try that code in different versions of different browsers on different systems, you will see that you will get quite a difference in performance. Different browsers are optimised for different Javascript code.

As performance differs, and you can't rely on any measurements to be useful for very long, there are basically two principles that you can follow if you need to optimise Javascript:

Use the simplest and most common code for each task; that is the code that browser vendors will try to optimise the most.
Don't look for the best performance in a specific browser, look for the worst performance in any brower. Test the code in different browsers, and pick a method that doesn't give remarkably bad performance in any of them.

Why is using a loop to iterate from start of array to end faster than iterating both start to end and end to start?

The answer is pretty obvious:

More operations take more time.

When judging the speed of code, you look at how many operations it will perform. Just step through and count them. Every instruction will take one or more CPU cycles, and the more there are the longer it will take to run. That different instructions take a different amount of cycles mostly does not matter - while an array lookup might be more costly than integer arithmetic, both of them basically take constant time and if there are too many, it dominates the cost of our algorithm.

In your example, there are few different types of operations that you might want to count individually:

comparisons
increments/decrements
array lookup
conditional jumps

(we could be more granular, such as counting variable fetch and store operations, but those hardly matter - everything is in registers anyway - and their number basically is linear to the others).

Now both of your codes iterate about 50 times - they element on which they break the loop is in the middle of the array. Ignoring off-by-a-few errors, those are the counts:

               |  forwards  | forwards and backwards
---------------+------------+------------------------
>=/===/<       |       100  |                   200
++/--          |        50  |                   100
a[b]           |        50  |                   100
&&/||/if/for   |       100  |                   200

Given that, it's not unexpected that doing twice the works takes considerably longer.

I'll also answer a few questions from your comments:

Is additional time needed for the second object lookup?

Yes, every individual lookup counts. It's not like they could be performed at once, or optimised into a single lookup (imaginable if they had looked up the same index).

Should there be two separate loops for each start to end and end to start?

Doesn't matter for the number of operations, just for their order.

Or, put differently still, what is the fastest approach to find an element in an array?

There is no "fastest" regarding the order, if you don't know where the element is (and they are evenly distributed) you have to try every index. Any order - even random ones - would work the same. Notice however that your code is strictly worse, as it looks at each index twice when the element is not found - it does not stop in the middle.

But still, there are a few different approaches at micro-optimising such a loop - check these benchmarks.

let is (still?) slower than var, see Why is using `let` inside a `for` loop so slow on Chrome? and Why is let slower than var in a for loop in nodejs?. This tear-up and tear-down (about 50 times) of the loop body scope in fact does dominate your runtime - that's why your inefficient code isn't completely twice as slow.
comparing against 0 is marginally faster than comparing against the length, which puts looping backwards at an advantage. See Why is iterating through an array backwards faster then forwards, JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing and Are loops really faster in reverse?
in general, see What's the fastest way to loop through an array in JavaScript?: it changes from engine update to engine update. Don't do anything weird, write idiomatic code, that's what will get optimised better.

Under what circumstances is a reverse for loop faster in Flash and why?

You don't have to recalculate the length of an array or Vector if you iterate backwards.

for(var i:int = list.length; i > 0; i--)
// -------------^^^^^^^^^^^ Length is calculated once for the start value.

Versus:

for(var i:int = 0; i < list.length; i++)
// --------------------^^^^^^^^^^^ Length is calculated for each iteration.

FYI, the difference is speed is negligible. I personally use whichever is more readable to me (the former).

That aside - if you are iterating over a collection where all objects are of the same type, you should be using a for each loop. This is much faster, more readable and more logical than either of the above. The reason that it is faster is because no type conversion is required at each iteration - the type is set when you define the loop:

var strings:Array = ["a","b","c"];
for each(var i:String in strings)
{
    trace(i);
}

Why this loop backwards is much slower than going forward?

I found the bottle-neck myself.

when I did this

for (i = string.length; i > 1; i--) {

I accidentaly deleted the "var" from var i, so I've made i global.
After fixing it I got the expected results.

for (var i = string.length; i > 1; i--) {

I never though that this may be a HUGE difference, so pay attention guys.

Fixed Benckmark test here

Before:

Before

After:

After

PS: for practical use, do NOT use this functions, the indexOf version is much faster.

What's the fastest way to loop through an array in JavaScript?

After performing this test with most modern browsers:
https://jsben.ch/wY5fo

Currently, the fastest form of loop (and in my opinion the most syntactically obvious).

A standard for-loop with length caching

    var i = 0, len = myArray.length;
    while (i < len) {
        // your code
        i++
    }

I would say, this is definitely a case where I applaud JavaScript engine developers. A runtime should be optimized for clarity, not cleverness.

Are Loops Really Faster in Reverse