What Is the Advantage of a List Comprehension Over a for Loop

List comprehensions are more compact and faster than an explicit for loop building a list:

def slower():
    result = []
    for elem in some_iterable:
        result.append(elem)
    return result

def faster():
    return [elem for elem in some_iterable]

This is mainly because the explicit loop has to look up the list's .append attribute and call it as a method on every iteration, while the list comprehension appends through a dedicated LIST_APPEND bytecode. In CPython both versions grow the list object incrementally (in chunks) as elements arrive; the comprehension does not size the list in advance:

>>> some_iterable = range(1000)
>>> import timeit
>>> timeit.timeit('f()', 'from __main__ import slower as f', number=10000)
1.4456570148468018
>>> timeit.timeit('f()', 'from __main__ import faster as f', number=10000)
0.49323201179504395
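
A self-contained variant of the measurement above, as a rough sketch: it passes the functions to timeit directly instead of importing them from __main__, and first checks that both build the same list. The names follow the answer; the repeat count of 1000 is an arbitrary choice, and absolute timings depend on the machine and Python version, so no output is shown.

```python
import timeit

some_iterable = range(1000)

def slower():
    result = []
    for elem in some_iterable:
        result.append(elem)
    return result

def faster():
    return [elem for elem in some_iterable]

# Timing is only meaningful if both approaches build the same list.
same_result = slower() == faster()

# timeit.timeit accepts a callable; number is kept small so this runs quickly.
loop_time = timeit.timeit(slower, number=1000)
comp_time = timeit.timeit(faster, number=1000)
print(f"for loop: {loop_time:.4f}s  comprehension: {comp_time:.4f}s")
```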

However, this does not mean you should start using list comprehensions for everything! A list comprehension will still build a list object; if you are using a list comprehension just because it gives you a one-line loop, think again. You are probably wasting cycles building a list object that you then discard again. Just stick to a normal for loop in that case.

When to use 'For Loop vs List Comprehension' to create New Lists?

What are the advantages of using List Comprehensions? First of all, you’re reducing 3 lines of code into one, which will be instantly recognizable to anyone who understands list comprehensions. Secondly, the second code is faster, as Python will allocate the list’s memory first, before adding the elements to it, instead of having to resize on runtime. It’ll also avoid having to make calls to ‘append’, which may be cheap but add up. Lastly, code using comprehensions is considered more ‘Pythonic’, better fitting Python’s style guidelines.

(Python’s List Comprehensions: Uses and Advantages, Luciano Strika)

List comprehension:

  • Easier to read
  • Usually quicker, because it avoids looking up and calling append on every iteration

For loop:

  • More flexible
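
To illustrate the "more flexible" point, here is a minimal sketch. The function name split_until_negative and the sample data are made up for this example: a loop that fills two lists and stops early is awkward to express as a single comprehension, while a plain transformation reads well as one.

```python
def split_until_negative(values):
    """Collect evens and odds separately, stopping at the first negative number."""
    evens, odds = [], []
    for value in values:
        if value < 0:
            break  # early exit has no direct equivalent in a comprehension
        if value % 2 == 0:
            evens.append(value)
        else:
            odds.append(value)
    return evens, odds

# A simple one-to-one transformation is where a comprehension fits well.
squares = [n * n for n in range(5)]

evens, odds = split_until_negative([4, 7, 10, 3, -1, 8])
print(squares, evens, odds)
```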

Are list-comprehensions and functional functions faster than for loops?

The following are rough guidelines and educated guesses based on experience. You should timeit or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.

A list comprehension is usually a tiny bit faster than the precisely equivalent for loop (that actually builds a list), most likely because it doesn't have to look up the list and its append method on every iteration. However, a list comprehension still does a bytecode-level loop:

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
  1           0 BUILD_LIST               0
              3 LOAD_FAST                0 (.0)
        >>    6 FOR_ITER                12 (to 21)
              9 STORE_FAST               1 (x)
             12 LOAD_FAST                1 (x)
             15 LIST_APPEND              2
             18 JUMP_ABSOLUTE            6
        >>   21 RETURN_VALUE
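
The exact bytecode differs between Python versions, so it can be worth checking on your own interpreter. The sketch below is one way to do that; the helper name opnames is hypothetical. It walks the compiled expression and any nested code objects, because older Python 3 releases compile the comprehension into a separate code object while newer ones may inline it.

```python
import dis

def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = [instr.opname for instr in dis.get_instructions(code)]
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object, e.g. <listcomp>
            names.extend(opnames(const))
    return names

code = compile("[x for x in range(10)]", "<example>", "eval")
names = opnames(code)

# The comprehension still loops at the bytecode level and appends item by item.
has_loop = "FOR_ITER" in names
has_list_append = "LIST_APPEND" in names
print(has_loop, has_list_append)
```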

Using a list comprehension in place of a loop that doesn't build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren't magic that is inherently faster than a good old loop.
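
A small sketch of that anti-pattern, using made-up names (log, discarded): the comprehension performs the side effect, but it also builds a list of None values that nobody needs.

```python
log = []

# Anti-pattern: used only for its side effect, yet it still builds a list.
discarded = [log.append(n) for n in range(3)]

# Plain loop: same side effect, no throwaway list.
log2 = []
for n in range(3):
    log2.append(n)

print(discarded)  # the values are whatever log.append returned
```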

As for functional list processing functions: while these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speedup is expected if the function being applied is written in C too. But in most cases, when a lambda (or other Python function) is used, the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work inline, without function calls (e.g. a list comprehension instead of map or filter), is often slightly faster.
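
The three cases can be compared with a rough sketch like the following. The function names and the sample data are invented for illustration, and the timings will vary between machines and Python versions, so treat any ordering you observe as a measurement of your setup rather than a rule.

```python
import timeit

words = ["alpha", "beta", "gamma"] * 100

def map_c_function():
    # map applying a method that is itself implemented in C
    return list(map(str.upper, words))

def map_lambda():
    # map applying a lambda: one extra Python-level call per item
    return list(map(lambda w: w.upper(), words))

def inline_comprehension():
    # the same work done inline, without an extra function call
    return [w.upper() for w in words]

results_match = map_c_function() == map_lambda() == inline_comprehension()

for func in (map_c_function, map_lambda, inline_comprehension):
    print(func.__name__, timeit.timeit(func, number=200))
```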

Suppose that in a game that I'm developing I need to draw complex and huge maps using for loops. This question would be definitely relevant, for if a list-comprehension, for example, is indeed faster, it would be a much better option in order to avoid lags (Despite the visual complexity of the code).

Chances are, if code like this isn't already fast enough when written in good non-"optimized" Python, no amount of Python level micro optimization is going to make it fast enough and you should start thinking about dropping to C. While extensive micro optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speed up with the same effort) to bite the bullet and write some C.

In Python, is it better to use list comprehensions or for-each loops?

If the iteration is being done for its side effect (as it is in your "print" example), then a loop is clearer.

If the iteration is executed in order to build a composite value, then list comprehensions are usually more readable.

Python List Comprehension vs For

Essentially, list comprehensions and for loops do pretty similar things, with the list comprehension doing away with some overhead and looking neater.
To understand why this is faster, look at "Efficiency of list comprehensions"; to quote the part relevant to your problem:

List comprehensions perform better here because you don’t need to load
the append attribute off of the list (loop program, bytecode 28) and
call it as a function (loop program, bytecode 38). Instead, in a
comprehension, a specialized LIST_APPEND bytecode is generated for a
fast append onto the result list (comprehension program, bytecode 33).

In the loop_faster program, you avoid the overhead of the append
attribute lookup by hoisting it out of the loop and placing the result
in a fastlocal (bytecode 9-12), so it loops more quickly; however, the
comprehension uses a specialized LIST_APPEND bytecode instead of
incurring the overhead of a function call, so it still trumps.

The link also details some of the possible pitfalls associated with list comprehensions, and I would recommend reading through it.
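
The programs the quote refers to are not reproduced here, so the following is only a guess at their shape. loop_faster is the name used in the quote; loop_plain and comp are names made up for this sketch.

```python
def loop_plain(items):
    result = []
    for item in items:
        result.append(item)  # attribute lookup plus call on every iteration
    return result

def loop_faster(items):
    result = []
    append = result.append  # look the bound method up once, keep it in a local
    for item in items:
        append(item)
    return result

def comp(items):
    # no append lookup or method call at all; uses the LIST_APPEND bytecode
    return [item for item in items]

data = range(5)
outputs = [loop_plain(data), loop_faster(data), comp(data)]
print(outputs)
```

Hoisting the method into a local trades a little readability for speed, which is why it is usually reserved for loops that profiling has shown to be hot.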

Why is a list comprehension so much faster than appending to a list?

A list comprehension is basically just syntactic sugar for the regular for loop. In this case it performs better because it doesn't need to load the append attribute of the list and call it as a function on each iteration. In other words, and in general, list comprehensions perform faster because suspending and resuming a function's frame, or multiple functions in other cases, is slower than creating a list on demand.

Consider the following examples :

In [1]: def f1():
   ...:     l = []
   ...:     for i in range(5):
   ...:         l.append(i)
   ...:
   ...:
   ...: def f2():
   ...:     [i for i in range(5)]
   ...:

In [3]: import dis

In [4]: dis.dis(f1)
  2           0 BUILD_LIST               0
              2 STORE_FAST               0 (l)

  3           4 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               1 (5)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                14 (to 28)
             14 STORE_FAST               1 (i)

  4          16 LOAD_FAST                0 (l)
             18 LOAD_METHOD              1 (append)
             20 LOAD_FAST                1 (i)
             22 CALL_METHOD              1
             24 POP_TOP
             26 JUMP_ABSOLUTE           12
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

In [5]: dis.dis(f2)
  8           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>)
              2 LOAD_CONST               2 ('f2.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               3 (5)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 POP_TOP
             18 LOAD_CONST               0 (None)
             20 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>:
  8           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE

You can see that at offset 18 in the first function the append attribute is loaded, while there is no such step in the second function, which uses a list comprehension. All those extra bytecodes make the appending approach slower, and since the append attribute is loaded on every iteration, the first function ends up taking approximately twice as long as the second.

Multiple list comprehension vs single for loop

I would try it without a loop using np.where clauses for the if-elif-else combinations. That's usually pretty fast.

import numpy as np

# dataframe is a DataFrame containing data
# Now this:

dataframe["Price"] = np.where(dataframe["Price_Dummy"] == "0", 0, 1)

# String operations work on whole string columns as well
# (.str.lower() is needed here: a Series has no plain .lower() method)
unit_of_measure = dataframe["Size"].str.split(" ", expand=True)[1].str.lower()

size = dataframe["Size"].str.split(" ", expand=True)[0].astype("float")

kb_case = np.where(unit_of_measure == "kb", size / 1000, size)
dataframe["Size"] = np.where(unit_of_measure == "gb", size * 1000, kb_case)

Notice that I replaced the [-1] in the unit_of_measure = line with [1] as the expand=True option does not support the -1 indexing. So you would have to know at which position your unit ends up.

Information on splitting strings in DataFrames can be found here.

In the last two lines, I reproduced the if-elif-else combination which you kind of have to create from the bottom up: Your final result dataframe["Size"] equals size*1000 if the unit is gb. If not, it equals the kb_case which includes the case where the unit is kb as well as all other cases.
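
The bottom-up construction can be checked on a few values. The sketch below assumes only NumPy (a third-party package) and uses plain arrays instead of DataFrame columns; the sample units and sizes, and the helper convert, are invented for illustration.

```python
import numpy as np

# Hypothetical sample data standing in for the two DataFrame columns.
unit_of_measure = np.array(["kb", "mb", "gb"])
size = np.array([500.0, 2.0, 1.5])

# Innermost case first: kb is converted, everything else is left as-is.
kb_case = np.where(unit_of_measure == "kb", size / 1000, size)
# Outermost case last: gb overrides, otherwise fall back to kb_case.
result = np.where(unit_of_measure == "gb", size * 1000, kb_case)

def convert(unit, value):
    """The same if-elif-else logic, written element by element."""
    if unit == "gb":
        return value * 1000
    elif unit == "kb":
        return value / 1000
    else:
        return value

expected = [convert(u, v) for u, v in zip(["kb", "mb", "gb"], [500.0, 2.0, 1.5])]
print(result.tolist(), expected)
```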

Speed/efficiency comparison for loop vs list comprehension vs other methods

Don't be too quick to write off the humble for loop. If you don't actually need a list, as in this case, a standard for loop can be faster than a list comprehension. And of course it has less memory overhead.

Here's a program to perform timing tests; it can be easily modified to add more tests.

#!/usr/bin/env python

''' Time various implementations of string diff function

From http://stackoverflow.com/q/28581218/4014959

Written by PM 2Ring 2015.02.18
'''

from itertools import imap, izip, starmap
from operator import ne

from timeit import Timer
from random import random, seed

def h_dist0(s1,s2):
    ''' For loop '''
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

def h_dist1(s1,s2):
    ''' List comprehension '''
    return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])

def h_dist2(s1,s2):
    ''' Generator expression '''
    return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))

def h_dist3(s1,s2):
    ''' Generator expression with if '''
    return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)

def h_dist3a(s1,s2):
    ''' Generator expression with izip '''
    return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)

def h_dist4(s1,s2):
    ''' imap '''
    return sum(imap(ne, s1, s2))

def h_dist5(s1,s2):
    ''' starmap '''
    return sum(starmap(ne, izip(s1, s2)))

funcs = [
    h_dist0,
    h_dist1,
    h_dist2,
    h_dist3,
    h_dist3a,
    h_dist4,
    h_dist5,
]

# ------------------------------------

def check_full():
    print 'Testing all functions with strings of length', len(s1)
    for func in funcs:
        print '%s:%s\n%d\n' % (func.func_name, func.__doc__, func(s1, s2))

def check():
    print 'Testing all functions with strings of length', len(s1)
    print [func(s1, s2) for func in funcs], '\n'

def time_test(loops=10000, reps=3):
    ''' Print timing stats for all the functions '''
    slen = len(s1)
    print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)

    for func in funcs:
        #Get function name and docstring
        fname = func.func_name
        fdoc = func.__doc__

        print '\n%s:%s' % (fname, fdoc)
        t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
        results = t.repeat(reps, loops)
        results.sort()
        print results
    print '\n' + '- '*30 + '\n'

def make_strings(n, r=0.5):
    print 'r:', r
    s1 = 'a' * n
    s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
    return s1, s2

# ------------------------------------

seed(37)

s1, s2 = make_strings(100)
#print '%s\n%s\n' % (s1, s2)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)

s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)

s1, s2 = make_strings(10)
check()
time_test(50000)

s1, s2 = make_strings(1000)
check()
time_test(1000)

The results below are from a 32 bit 2GHz Pentium 4 running Python 2.6.6 on Linux.

output

r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45]

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]

h_dist1: List comprehension
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]

h_dist2: Generator expression
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]

h_dist3: Generator expression with if
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]

h_dist3a: Generator expression with izip
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]

h_dist4: imap
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]

h_dist5: starmap
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13]

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]

h_dist1: List comprehension
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]

h_dist2: Generator expression
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]

h_dist3: Generator expression with if
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]

h_dist3a: Generator expression with izip
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]

h_dist4: imap
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]

h_dist5: starmap
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94]

Length = 100, Loops = 10000, Repetitions = 3

h_dist0: For loop
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]

h_dist1: List comprehension
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]

h_dist2: Generator expression
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]

h_dist3: Generator expression with if
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]

h_dist3a: Generator expression with izip
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]

h_dist4: imap
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]

h_dist5: starmap
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5]

Length = 10, Loops = 50000, Repetitions = 3

h_dist0: For loop
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]

h_dist1: List comprehension
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]

h_dist2: Generator expression
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]

h_dist3: Generator expression with if
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]

h_dist3a: Generator expression with izip
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]

h_dist4: imap
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]

h_dist5: starmap
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506]

Length = 1000, Loops = 1000, Repetitions = 3

h_dist0: For loop
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]

h_dist1: List comprehension
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]

h_dist2: Generator expression
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]

h_dist3: Generator expression with if
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]

h_dist3a: Generator expression with izip
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]

h_dist4: imap
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]

h_dist5: starmap
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
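
The program above targets Python 2, where itertools.imap and itertools.izip are the lazy counterparts of map and zip. As a hedged sketch of how four of the variants might look on Python 3, where the built-in map and zip are already lazy: the function names below are invented for this port, and the timings above should not be assumed to carry over.

```python
from itertools import starmap
from operator import ne

def h_dist_loop(s1, s2):
    """For loop"""
    tot = 0
    for d1, d2 in zip(s1, s2):
        if d1 != d2:
            tot += 1
    return tot

def h_dist_genexp(s1, s2):
    """Generator expression with if"""
    return sum(1 for d1, d2 in zip(s1, s2) if d1 != d2)

def h_dist_map(s1, s2):
    """Built-in map takes the place of imap; True counts as 1 in sum"""
    return sum(map(ne, s1, s2))

def h_dist_starmap(s1, s2):
    """starmap over the built-in zip, which takes the place of izip"""
    return sum(starmap(ne, zip(s1, s2)))

s1 = "aaaaaa"
s2 = "abaaba"
funcs = (h_dist_loop, h_dist_genexp, h_dist_map, h_dist_starmap)
distances = [func(s1, s2) for func in funcs]
print(distances)
```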

Generator expressions vs. list comprehensions

John's answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it's also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won't work:

def gen():
    return (something for something in get_some_stuff())

print(gen()[:2])     # generators don't support indexing or slicing
print([5, 6] + gen())  # generators can't be added to lists

Basically, use a generator expression if all you're doing is iterating once. If you want to store and use the generated results, then you're probably better off with a list comprehension.
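
A minimal sketch of the difference, with a made-up gen() that yields squares: itertools.islice stands in for slicing, list() materializes the values before concatenation, and a second pass over the same generator finds nothing left.

```python
from itertools import islice

def gen():
    # hypothetical stand-in for a generator over some data source
    return (n * n for n in range(5))

first_two = list(islice(gen(), 2))  # islice replaces [:2] on a generator
combined = [5, 6] + list(gen())     # materialize before adding to a list

g = gen()
first_pass = list(g)
second_pass = list(g)  # the generator is already exhausted

print(first_two, combined, first_pass, second_pass)
```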

Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.


