Are List-Comprehensions and Functional Functions Faster Than "For Loops"

Are list-comprehensions and functional functions faster than for loops?

The following are rough guidelines and educated guesses based on experience. You should timeit or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.

A list comprehension is usually a tiny bit faster than the precisely equivalent for loop (that actually builds a list), most likely because it doesn't have to look up the list and its append method on every iteration. However, a list comprehension still does a bytecode-level loop:

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE

Using a list comprehension in place of a loop that doesn't build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren't magic that is inherently faster than a good old loop.

As for functional list processing functions: While these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speed up is expected if the function is written in C too. But most cases using a lambda (or other Python function), the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work in-line, without function calls (e.g. a list comprehension instead of map or filter) is often slightly faster.

Suppose that in a game that I'm developing I need to draw complex and huge maps using for loops. This question would be definitely relevant, for if a list-comprehension, for example, is indeed faster, it would be a much better option in order to avoid lags (Despite the visual complexity of the code).

Chances are, if code like this isn't already fast enough when written in good non-"optimized" Python, no amount of Python level micro optimization is going to make it fast enough and you should start thinking about dropping to C. While extensive micro optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speed up with the same effort) to bite the bullet and write some C.

Why is a list comprehension so much faster than appending to a list?

List comprehension is basically just a "syntactic sugar" for the regular for loop. In this case the reason that it performs better is because it doesn't need to load the append attribute of the list and call it as a function at each iteration. In other words and in general, list comprehensions perform faster because suspending and resuming a function's frame, or multiple functions in other cases, is slower than creating a list on demand.

Consider the following examples :

In [1]: def f1(): 
...: l = []
...: for i in range(5):
...: l.append(i)
...:
...:
...: def f2():
...: [i for i in range(5)]
...:

In [3]: import dis

In [4]: dis.dis(f1)
2 0 BUILD_LIST 0
2 STORE_FAST 0 (l)

3 4 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (5)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 14 (to 28)
14 STORE_FAST 1 (i)

4 16 LOAD_FAST 0 (l)
18 LOAD_METHOD 1 (append)
20 LOAD_FAST 1 (i)
22 CALL_METHOD 1
24 POP_TOP
26 JUMP_ABSOLUTE 12
>> 28 LOAD_CONST 0 (None)
30 RETURN_VALUE

In [5]:

In [5]: dis.dis(f2)
8 0 LOAD_CONST 1 (<code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>)
2 LOAD_CONST 2 ('f2.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (5)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 POP_TOP
18 LOAD_CONST 0 (None)
20 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>:
8 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE

In [6]:

You can see that on offset 18 in the first function we have an append attribute while there's no such thing in second function using list comprehension. All those extra bytecodes will make the appending approach slower and since in this case you'll have loading of the append attribute in each iteration, in the end it will make the code to take approximately twice as slower as the second function using only list comprehension.

Speed/efficiency comparison for loop vs list comprehension vs other methods

Don't be too quick to write off the humble for loop. If you don't actually need a list, like in this case, a standard for loop can be faster than using a list comprehension. And of course it has less memory overheads.

Here's a program to perform timing tests; it can be easily modified to add more tests.

#!/usr/bin/env python

''' Time various implementations of string diff function

From http://stackoverflow.com/q/28581218/4014959

Written by PM 2Ring 2015.02.18
'''

from itertools import imap, izip, starmap
from operator import ne

from timeit import Timer
from random import random, seed

def h_dist0(s1,s2):
''' For loop '''
tot = 0
for d1, d2 in zip(s1, s2):
if d1 != d2:
tot += 1
return tot

def h_dist1(s1,s2):
''' List comprehension '''
return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])

def h_dist2(s1,s2):
''' Generator expression '''
return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

def h_dist3(s1,s2):
''' Generator expression with if '''
return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)

def h_dist3a(s1,s2):
''' Generator expression with izip '''
return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

def h_dist4(s1,s2):
''' imap '''
return sum(imap(ne, s1, s2))

def h_dist5(s1,s2):
''' starmap '''
return sum(starmap(ne, izip(s1, s2)))

funcs = [
h_dist0,
h_dist1,
h_dist2,
h_dist3,
h_dist3a,
h_dist4,
h_dist5,
]

# ------------------------------------

def check_full():
print 'Testing all functions with strings of length', len(s1)
for func in funcs:
print '%s:%s\n%d\n' % (func.func_name, func.__doc__, func(s1, s2))

def check():
print 'Testing all functions with strings of length', len(s1)
print [func(s1, s2) for func in funcs], '\n'

def time_test(loops=10000, reps=3):
''' Print timing stats for all the functions '''
slen = len(s1)
print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)

for func in funcs:
#Get function name and docstring
fname = func.func_name
fdoc = func.__doc__

print '\n%s:%s' % (fname, fdoc)
t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
results = t.repeat(reps, loops)
results.sort()
print results
print '\n' + '- '*30 + '\n'

def make_strings(n, r=0.5):
print 'r:', r
s1 = 'a' * n
s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
return s1, s2

# ------------------------------------

seed(37)

s1, s2 = make_strings(100)
#print '%s\n%s\n' % (s1, s2)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)

s1, s2 = make_strings(10)
check()
time_test(50000)

s1, s2 = make_strings(1000)
check()
time_test(1000)

The results below are from a 32 bit 2GHz Pentium 4 running Python 2.6.6 on Linux.

output

r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45]

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]

h_dist1: List comprehension
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]

h_dist2: Generator expression
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]

h_dist3: Generator expression with if
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]

h_dist3a: Generator expression with izip
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]

h_dist4: imap
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]

h_dist5: starmap
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13]

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]

h_dist1: List comprehension
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]

h_dist2: Generator expression
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]

h_dist3: Generator expression with if
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]

h_dist3a: Generator expression with izip
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]

h_dist4: imap
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]

h_dist5: starmap
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94]

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]

h_dist1: List comprehension
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]

h_dist2: Generator expression
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]

h_dist3: Generator expression with if
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]

h_dist3a: Generator expression with izip
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]

h_dist4: imap
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]

h_dist5: starmap
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5]

Length = 10, Loops = 50000, Repetitions = 3

h_dist0: For loop
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]

h_dist1: List comprehension
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]

h_dist2: Generator expression
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]

h_dist3: Generator expression with if
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]

h_dist3a: Generator expression with izip
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]

h_dist4: imap
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]

h_dist5: starmap
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506]

Length = 1000, Loops = 1000, Repetitions = 3

h_dist0: For loop
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]

h_dist1: List comprehension
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]

h_dist2: Generator expression
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]

h_dist3: Generator expression with if
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]

h_dist3a: Generator expression with izip
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]

h_dist4: imap
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]

h_dist5: starmap
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Is it more efficient to use list comprehension or nested loops?

As with any question about efficiency, the only real answer is to profile your code, because it always depends.

That said, there should be no noticeable difference between for loops and list comprehensions (they most likely compile to the same byte code) and you should use what is most readable. In this case, I'd say the nested for loops are far more readable than nested comprehensions.

Comparing list comprehensions and explicit loops (3 array generators faster than 1 for loop)

Let's define the functions we will need to answer the question and timeit them:

In [18]: def iter():
l = [x for x in range(100) if x > 10]
....:

In [19]: %timeit iter()
100000 loops, best of 3: 7.92 µs per loop

In [20]: def loop():
l = []
for x in range(100):
if x > 10:
l.append(x)
....:

In [21]: %timeit loop()
10000 loops, best of 3: 20 µs per loop

In [22]: def loop_fast():
l = []
for x in range(100):
if x > 10:
pass
....:

In [23]: %timeit loop_fast()
100000 loops, best of 3: 4.69 µs per loop

we can see that the for loops without the append command is as fast as the list comprehension. In fact, if we have a look at the bytecode we can see that in the case of the list comprehension python is able to use a built-in bytecode command called LIST_APPEND instead of:

  • Load the list: 40 LOAD_FAST
  • Load the attribute: 43 LOAD_ATTRIBUTE
  • Call the loaded function: 49 CALL_FUNCTION
  • Unload the list(?): 52 POP_TOP

As you can see from the output below the previous bytecode are missing with list comprehension and with the "loop_fast" function. Comparing the timeit of the three function is clear that those are responsible for the different timing of the three methods.

In [27]: dis.dis(iter)
2 0 BUILD_LIST 0
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (1)
9 LOAD_CONST 2 (100)
12 CALL_FUNCTION 2
15 GET_ITER
>> 16 FOR_ITER 24 (to 43)
19 STORE_FAST 0 (x)
22 LOAD_FAST 0 (x)
25 LOAD_CONST 2 (100)
28 COMPARE_OP 4 (>)
31 POP_JUMP_IF_FALSE 16
34 LOAD_FAST 0 (x)
37 LIST_APPEND 2
40 JUMP_ABSOLUTE 16
>> 43 STORE_FAST 1 (l)
46 LOAD_CONST 0 (None)
49 RETURN_VALUE

In [28]: dis.dis(loop)
2 0 BUILD_LIST 0
3 STORE_FAST 0 (1)

3 6 SETUP_LOOP 51 (to 60)
9 LOAD_GLOBAL 0 (range)
12 LOAD_CONST 1 (1)
15 LOAD_CONST 2 (100)
18 CALL_FUNCTION 2
21 GET_ITER
>> 22 FOR_ITER 34 (to 59)
25 STORE_FAST 1 (x)

4 28 LOAD_FAST 1 (x)
31 LOAD_CONST 3 (10)
34 COMPARE_OP 4 (>)
37 POP_JUMP_IF_FALSE 22

5 40 LOAD_FAST 0 (l)
43 LOAD_ATTR 1 (append)
46 LOAD_FAST 1 (x)
49 CALL_FUNCTION 1
52 POP_TOP
53 JUMP_ABSOLUTE 22
56 JUMP_ABSOLUTE 22
>> 59 POP_BLOCK
>> 60 LOAD_CONST 0 (None)
63 RETURN_VALUE

In [29]: dis.dis(loop_fast)
2 0 BUILD_LIST 0
3 STORE_FAST 0 (1)

3 6 SETUP_LOOP 38 (to 47)
9 LOAD_GLOBAL 0 (range)
12 LOAD_CONST 1 (1)
15 LOAD_CONST 2 (100)
18 CALL_FUNCTION 2
21 GET_ITER
>> 22 FOR_ITER 21 (to 46)
25 STORE_FAST 1 (x)

4 28 LOAD_FAST 1 (x)
31 LOAD_CONST 3 (10)
34 COMPARE_OP 4 (>)
37 POP_JUMP_IF_FALSE 22

5 40 JUMP_ABSOLUTE 22
43 JUMP_ABSOLUTE 22
>> 46 POP_BLOCK
>> 47 LOAD_CONST 0 (None)
50 RETURN_VALUE

Python: Why is list comprehension slower than for loop

You are using a generator expression in your list comprehension:

sum(samples[i-j] for j in range(n))

Generator expressions require a new frame to be created each time you run one, just like a function call. This is relatively expensive.

You don't need to use a generator expression at all; you only need to slice the samples list:

sum(samples[i - n + 1:i + 1])

You can specify a second argument, a start value for the sum() function; set it to 0.0 to get a float result:

sum(samples[i - n + 1:i + 1], 0.0)

Together these changes make all the difference:

>>> from timeit import timeit
>>> import random
>>> testdata = [i*random.random() for i in range(1000)]
>>> def slow_moving_average(samples, n=3):
... return [float(sum(samples[i-j] for j in range(n)))/n for i in range(n-1, len(samples))]
...
>>> def fast_moving_average(samples, n=3):
... return [sum(samples[i - n + 1:i + 1], 0.0) / n for i in range(n-1, len(samples))]
...
>>> def verbose_moving_average(samples, n=3):
... l =[]
... for i in range(n-1, len(samples)):
... x = 0.0
... for j in range(n):
... x+= samples[i-j]
... l.append(x / n)
... return l
...
>>> timeit('f(s)', 'from __main__ import verbose_moving_average as f, testdata as s', number=1000)
0.9375386269966839
>>> timeit('f(s)', 'from __main__ import slow_moving_average as f, testdata as s', number=1000)
1.9631599469939829
>>> timeit('f(s)', 'from __main__ import fast_moving_average as f, testdata as s', number=1000)
0.5647804250038462

Can all for loops in python be converted into list comprehension equivalent?

Yes, sort of.

You can convert nested for loops to list comprehensions, and even use variable assignment with :=.

It is true that list comprehensions are slighly faster than for loops, but the difference is so negligible that it is not worth trying to cram a complicated loop into a list comprehension.



Related Topics



Leave a reply



Submit