Why Is Looping Over Range() in Python Faster Than Using a While Loop

Why is looping over range() in Python faster than using a while loop?

If you look at the disassembly of the Python bytecode, you get a more concrete idea of why.

Using a while loop:

1           0 LOAD_CONST               0 (0)
            3 STORE_NAME               0 (i)

2           6 SETUP_LOOP              28 (to 37)
      >>    9 LOAD_NAME                0 (i)             # <-
           12 LOAD_CONST               1 (100000000)     # <-
           15 COMPARE_OP               0 (<)             # <-
           18 JUMP_IF_FALSE           14 (to 35)         # <-
           21 POP_TOP                                    # <-

3          22 LOAD_NAME                0 (i)             # <-
           25 LOAD_CONST               2 (1)             # <-
           28 INPLACE_ADD                                # <-
           29 STORE_NAME               0 (i)             # <-
           32 JUMP_ABSOLUTE            9                 # <-
      >>   35 POP_TOP
           36 POP_BLOCK

The loop body (the opcodes marked with # <-) contains 10 operations.

Using range:

1           0 SETUP_LOOP              23 (to 26)
            3 LOAD_NAME                0 (range)
            6 LOAD_CONST               0 (0)
            9 LOAD_CONST               1 (100000000)
           12 CALL_FUNCTION            2
           15 GET_ITER
      >>   16 FOR_ITER                 6 (to 25)         # <-
           19 STORE_NAME               1 (n)             # <-

2          22 JUMP_ABSOLUTE           16                 # <-
      >>   25 POP_BLOCK
      >>   26 LOAD_CONST               2 (None)
           29 RETURN_VALUE

The loop body (the opcodes marked with # <-) contains only 3 operations.

The time spent running C code is much shorter than the time spent in the interpreter and can be ignored.
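
If you want to reproduce listings like these yourself, the dis module prints the bytecode of any code object. A minimal sketch (the exact offsets and opcode names depend on the interpreter version; newer CPython releases no longer emit SETUP_LOOP, for example):

import dis

while_src = """
i = 0
while i < 100000000:
    i += 1
"""

for_src = """
for n in range(0, 100000000):
    pass
"""

# compile() turns each snippet into a module-level code object;
# dis.dis() then prints its bytecode, as in the listings above
dis.dis(compile(while_src, "<while>", "exec"))
dis.dis(compile(for_src, "<for>", "exec"))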

Why is the while loop much faster than range() in this case?

Since you are using Python 2 (your code needs integer division to work under Python 3), you are running into the fact that Python 2's range() builds a list of all the elements first and then iterates over that list.

This explains the difference in running time between the while and range versions.

Incidentally, the code for Python 3 needs the following change:

def isPrimeRange(n):
    for i in range(2, n // 2 + 1):  # integer division
        if n % i == 0:
            return i
    return n

This Python blog post explains in great detail the difference between range (which returns a list) and xrange (which returns an iterator) in Python 2, and how Python 3 changed this behaviour.

The most relevant paragraph from that source is quoted here:

When you're using an iterator, every loop of the for statement produces the next number on the fly. Whereas the original range() function produced all numbers instantaneously, before the for loop started executing. The problem with the original range() function was that it used a very large amount of memory when producing a lot of numbers. However it tends to be quicker with a small amount of numbers. Note that in Python 3.x, you can still produce a list by passing the generator returned to the list() function.
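
For a quick illustration of that last point, in Python 3:

>>> range(5)            # a lazy range object, not a list
range(0, 5)
>>> list(range(5))      # materialise it explicitly if you really need a list
[0, 1, 2, 3, 4]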

While loop 1000 times faster than for loop?

You are creating 10k range() objects. These take some time to materialise. You then have to create iterator objects for those 10k objects too (for the for loop to iterate over the values). Next, the for loop uses the iterator protocol by calling the __next__ method on the resulting iterator. Those latter two steps also apply to the for loop over a list.
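
To make those steps concrete, this is roughly what the for loop expands to under the iterator protocol (a sketch for illustration; CPython performs these steps in C, not in Python code like this):

r = range(10000)           # step 1: materialise the range() object
it = iter(r)               # step 2: create an iterator for it
while True:
    try:
        i = next(it)       # step 3: the loop calls __next__ until exhaustion
    except StopIteration:
        break
    pass                   # the loop body would run here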

But most of all, you are cheating on the while loop test. The while loop only has to run once, because you never reset i back to 0 (thanks to Jim Fasarakis Hilliard for pointing that out). You are in effect running the while loop through a total of 19999 comparisons; the first test runs 10k comparisons, the remaining 9999 tests run one comparison each. And that comparison is fast:

>>> import timeit
>>> timeit.timeit('while i<10000: True; i+=1',setup='i=0', number=10000)
0.0008302750065922737
>>> (
... timeit.timeit('while i<10000: True; i+=1', setup='i=0', number=1) +
... timeit.timeit('10000 < 10000', number=9999)
... )
0.0008467709994874895

See how close those numbers are?

My machine is a little faster, so let's create a baseline to compare against; this is using 3.6.1 on a Macbook Pro (Retina, 15-inch, Mid 2015) running OS X 10.12.5. And let's also fix the while loop to set i = 0 in the test, not in the setup (which is run just once):

>>> import timeit
>>> timeit.timeit('for i in range(10000): pass', number=10000)
1.9789885189966299
>>> timeit.timeit('i=0\nwhile i<10000: True; i+=1', number=10000)
5.172155902953818

Oops, so a correctly run while loop is actually slower here; there goes your premise (and mine!).

I used pass to avoid having to answer questions about how fast referencing that object is (it's fast, but beside the point). My timings are going to be about 6x faster than on your machine.

If you wanted to explore why the iteration is faster, you could time the various components of the for loop in Python, starting with creating the range() object:

>>> timeit.timeit('range(10000)', number=10000)
0.0036197409499436617

So creating 10000 range() objects takes more time than running a single while loop that iterates 10k times. range() objects are more expensive to create than integers.

This does involve a global name lookup, which is slower; you could make it faster by using setup='_range = range' and then timing _range(10000). This shaves off about a third of the timing.
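
That trick would look something like this (a sketch; timings omitted, but per the above it should shave off roughly a third):

import timeit

# global / builtin lookup of the name `range` on every call
t_builtin = timeit.timeit('range(10000)', number=10000)

# bind range to a name in the timing namespace first, so the lookup is cheaper
t_bound = timeit.timeit('_range(10000)', setup='_range = range', number=10000)

print(t_builtin, t_bound)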

Next, create an iterator for this; here I'll use a local name for the iter() function, as the for loop doesn't have to do a hash-table lookup and just reaches for the C function instead. Hard-coded references to a memory location in a binary are a lot faster, of course:

>>> timeit.timeit('_iter(r)', setup='_iter = iter; r = range(10000)', number=10000)
0.0009729859884828329

Fairly fast; it takes about the same amount of time as your single while loop iterating 10k times. So creating iterator objects is cheap. The C implementation is faster still. Note that we haven't actually iterated yet.

Last, we call __next__ on the iterator object, 10k times. This is again done in C code, with cached references to internal C implementations, but with a functools.partial() object we can at least attempt to get a ball-park figure:

>>> timeit.timeit('n()', setup='from functools import partial; i = iter(range(10000)); n = partial(i.__next__)', number=10000) * 10000
7.759470026940107

Boy, 10k times 10k calls to iter(range(10000)).__next__ takes almost 4x more time than the whole for loop managed; this goes to show how efficient the actual C implementation really is.

However, it does illustrate that looping in C code is a lot faster, and this is why the while loop is actually slower when executed correctly: adding integers and making boolean comparisons in bytecode take more time than iterating over range() in C code (where the CPU does the incrementing and comparisons directly in registers):

>>> (
... timeit.timeit('9999 + 1', number=10000 ** 2) +
... timeit.timeit('9999 < 10000', number=10000 ** 2)
... )
3.695550534990616

It is those operations that make the while loop about 3 seconds slower.


TL;DR: You didn't actually test the while loop correctly. I should have noticed this earlier too.

Why is a loop on list slice faster than range()?

As far as I know, the slice operation creates a new list containing a copy of the reference to each item in the slice.

Indeed, the standard implementation, the CPython interpreter, does make a copy.

Intuitively, it should be more expensive than the no_slice way, which doesn't involve any copy operation. But from the test results, the slice way is even faster than the pure loop over a range(). Why does this happen?

The point is that copying a list is much faster than iterating over it, because the interpreter's loops are pretty slow compared to the slicing operation, which is optimized in C (in the same interpreter).

On my machine, with nums containing 5000 random items, slicing nums takes about 8 µs while iterating over nums takes about 33 µs. Iterating over a sliced nums takes 41 µs. Iterating over a range object (with the same number of items) takes 68 µs.
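
A sketch of how those measurements could be reproduced with timeit (the 5000-item list mirrors the setup described above; exact numbers will differ per machine):

import random
import timeit

nums = [random.random() for _ in range(5000)]   # ~5000 random items, as above

def slice_only():
    nums[:]                        # copy the list via a slice, no iteration

def iterate_list():
    for x in nums:                 # iterate over the existing list
        pass

def iterate_slice():
    for x in nums[:]:              # copy first, then iterate over the copy
        pass

def iterate_range():
    for i in range(len(nums)):     # iterate over a range object of the same length
        pass

for fn in (slice_only, iterate_list, iterate_slice, iterate_range):
    per_call_us = timeit.timeit(fn, number=10000) / 10000 * 1e6
    print(fn.__name__, round(per_call_us, 1), "us per call")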

The fact that iterating over a range object is significantly slower than the other methods is quite surprising at first glance. You might expect it to be faster than iterating over nums, or at least as fast. It is more expensive because the loop iterates over an iterator object (more specifically a range_iterator). Iterating over such an object is pretty slow in CPython. This is mainly due to the increment of pure-Python integers and their comparison on each loop iteration. Pure-Python integers are quite expensive because they can be bigger than native integers (CPython needs to handle the case where the range is very large, even though that rarely happens in practice). Managing pure-Python objects also introduces additional overhead (allocation, deallocation, reference counting, indirections, etc.). There is no need to use such integers when CPython internally iterates over a list, because the size of a list always fits in a native integer (typically up to about 2**64 on a 64-bit machine).
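
The cost of pure-Python integers is easy to see interactively (sizes are from a 64-bit CPython build and can vary slightly between versions):

>>> import sys
>>> sys.getsizeof(1)          # even a tiny int is a full heap object
28
>>> 2 ** 100                  # Python ints are arbitrary precision...
1267650600228229401496703205376
>>> range(2 ** 100)[10]       # ...so range() must handle bounds beyond any native integer
10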

To conclude: with_slice measures the time to slice nums and then iterate over the resulting list object. pure_loop measures the time taken to iterate over a range object. no_slice measures the time to iterate over a range object and, for each item, access nums to put the value in a local variable. This last operation is much more expensive than the other two because the nums access is interpreted on each iteration, whereas the interpreter can iterate over the list faster in the first function. It also includes the cost of the second function, which is already pretty big. What is missing is a comparison between iterating over nums[:] and iterating over nums.
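
The three functions being compared are not shown above; the following is a plausible reconstruction based purely on how they are described (the names come from the question, the bodies are assumptions):

def with_slice(nums):
    # slice nums (a copy done in C), then iterate over the resulting list
    for x in nums[:]:
        pass

def pure_loop(nums):
    # iterate over a range object only; never touch nums
    for i in range(len(nums)):
        pass

def no_slice(nums):
    # iterate over a range object and index into nums on every iteration
    for i in range(len(nums)):
        x = nums[i]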

Note that the cost of with_slice gets closer to pure_loop when the size is big, because the list is then too big to fit in the fast L1 cache and needs to be copied from the last-level cache or, more typically, from the slower main RAM.

What is faster in Python, while or for xrange

I am sure the while version is slower. Python has to look up the add operation for the integer object on each turn of the loop, and so on; it is not pure C just because it looks like it!

And if you want a pythonic version of exactly the above, use:

print " ".join(str(i) for i in xrange(10))
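
In Python 3, where xrange is gone and print is a function, the equivalent is:

print(" ".join(str(i) for i in range(10)))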

Edit: My timings look like this. This is just a silly loop without any printing, just to show you what writing out "i += 1" and so on costs in Python.

$ python -mtimeit "i=0" "while i < 1000: i+=1"
1000 loops, best of 3: 303 usec per loop
$ python -mtimeit "for i in xrange(1000): pass"
10000 loops, best of 3: 120 usec per loop

Which loop is faster, while or for?

That clearly depends on the particular implementation of the interpreter/compiler of the specific language.

That said, theoretically, any sane implementation could implement one in terms of the other if that were faster, so the difference should be negligible at most.

Of course, I assumed that while and for behave as they do in C and similar languages. You could create a language with completely different semantics for while and for.


