What is the advantage of a list comprehension over a for loop?
List comprehensions are more compact and faster than an explicit for
loop building a list:
def slower():
result = []
for elem in some_iterable:
result.append(elem)
return result
def faster():
return [elem for elem in some_iterable]
This is because calling .append()
on a list
causes the list object to grow (in chunks) to make space for new elements individually, while the list comprehension gathers all elements first before creating the list
to fit the elements in one go:
>>> some_iterable = range(1000)
>>> import timeit
>>> timeit.timeit('f()', 'from __main__ import slower as f', number=10000)
1.4456570148468018
>>> timeit.timeit('f()', 'from __main__ import faster as f', number=10000)
0.49323201179504395
However, this does not mean you should start using list comprehensions for everything! A list comprehension will still build a list object; if you are using a list comprehension just because it gives you a one-line loop, think again. You are probably wasting cycles building a list object that you then discard again. Just stick to a normal for
loop in that case.
When to use 'For Loop vs List Comprehension' to create New Lists?
What are the advantages of using List Comprehensions? First of all, you’re reducing 3 lines of code into one, which will be instantly recognizable to anyone who understands list comprehensions. Secondly, the second code is faster, as Python will allocate the list’s memory first, before adding the elements to it, instead of having to resize on runtime. It’ll also avoid having to make calls to ‘append’, which may be cheap but add up. Lastly, code using comprehensions is considered more ‘Pythonic’ — better fitting Python’s style guidelines. Python’s List Comprehensions: Uses and Advantages, Luciano Strika
List comprehension:
- Easier to read
- Quicker, because of prior memory allocation
For loop:
- More flexible
Are list-comprehensions and functional functions faster than for loops?
The following are rough guidelines and educated guesses based on experience. You should timeit
or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.
A list comprehension is usually a tiny bit faster than the precisely equivalent for
loop (that actually builds a list), most likely because it doesn't have to look up the list and its append
method on every iteration. However, a list comprehension still does a bytecode-level loop:
>>> dis.dis(<the code object for `[x for x in range(10)]`>)
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE
Using a list comprehension in place of a loop that doesn't build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren't magic that is inherently faster than a good old loop.
As for functional list processing functions: While these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speed up is expected if the function is written in C too. But most cases using a lambda
(or other Python function), the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work in-line, without function calls (e.g. a list comprehension instead of map
or filter
) is often slightly faster.
Suppose that in a game that I'm developing I need to draw complex and huge maps using for loops. This question would be definitely relevant, for if a list-comprehension, for example, is indeed faster, it would be a much better option in order to avoid lags (Despite the visual complexity of the code).
Chances are, if code like this isn't already fast enough when written in good non-"optimized" Python, no amount of Python level micro optimization is going to make it fast enough and you should start thinking about dropping to C. While extensive micro optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speed up with the same effort) to bite the bullet and write some C.
In Python, is it better to use list comprehensions or for-each loops?
If the iteration is being done for its side effect ( as it is in your "print" example ), then a loop is clearer.
If the iteration is executed in order to build a composite value, then list comprehensions are usually more readable.
Python List Comprehension vs For
Essentially, list comprehension and for loops does pretty similar things, with list comprehension doing away some overheads and making it look pretty.
To understand why this is faster, you should look in Efficiency of list comprehensions and to quote the relevant part for your problem:
List comprehensions perform better here because you don’t need to load
the append attribute off of the list (loop program, bytecode 28) and
call it as a function (loop program, bytecode 38). Instead, in a
comprehension, a specialized LIST_APPEND bytecode is generated for a
fast append onto the result list (comprehension program, bytecode 33).In the loop_faster program, you avoid the overhead of the append
attribute lookup by hoisting it out of the loop and placing the result
in a fastlocal (bytecode 9-12), so it loops more quickly; however, the
comprehension uses a specialized LIST_APPEND bytecode instead of
incurring the overhead of a function call, so it still trumps.
The link also details some of the possible pitfalls associated with lc and I would recommend you to go through it once.
Why is a list comprehension so much faster than appending to a list?
List comprehension is basically just a "syntactic sugar" for the regular for
loop. In this case the reason that it performs better is because it doesn't need to load the append attribute of the list and call it as a function at each iteration. In other words and in general, list comprehensions perform faster because suspending and resuming a function's frame, or multiple functions in other cases, is slower than creating a list on demand.
Consider the following examples :
In [1]: def f1():
...: l = []
...: for i in range(5):
...: l.append(i)
...:
...:
...: def f2():
...: [i for i in range(5)]
...:
In [3]: import dis
In [4]: dis.dis(f1)
2 0 BUILD_LIST 0
2 STORE_FAST 0 (l)
3 4 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (5)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 14 (to 28)
14 STORE_FAST 1 (i)
4 16 LOAD_FAST 0 (l)
18 LOAD_METHOD 1 (append)
20 LOAD_FAST 1 (i)
22 CALL_METHOD 1
24 POP_TOP
26 JUMP_ABSOLUTE 12
>> 28 LOAD_CONST 0 (None)
30 RETURN_VALUE
In [5]:
In [5]: dis.dis(f2)
8 0 LOAD_CONST 1 (<code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>)
2 LOAD_CONST 2 ('f2.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (5)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 POP_TOP
18 LOAD_CONST 0 (None)
20 RETURN_VALUE
Disassembly of <code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>:
8 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
In [6]:
You can see that on offset 18 in the first function we have an append
attribute while there's no such thing in second function using list comprehension. All those extra bytecodes will make the appending approach slower and since in this case you'll have loading of the append
attribute in each iteration, in the end it will make the code to take approximately twice as slower as the second function using only list comprehension.
Multiple list comprehension vs single for loop
I would try it without a loop using np.where
clauses for the if-elif-else combinations. That's usually pretty fast.
import numpy as np
# dataframe is a DataFrame containing data
# Now this:
dataframe["Price"] = np.where(dataframe["Price_Dummy"] == "0",0,1)
# String operations work on whole string columns as well
unit_of_measure = dataframe["Size"].str.split(" ", expand=True)[1].lower()
size = dataframe["Size"].str.split(" ", expand=True)[0].astype("float")
kb_case = np.where(unit_of_measure =="kb", size/1000, size)
dataframe["Size"] = np.where(unit_of_measure =="gb", size*1000, kb_case)
Notice that I replaced the [-1]
in the unit_of_measure =
line with [1]
as the expand=True
option does not support the -1
indexing. So you would have to know at which position your unit ends up.
Information on splitting strings in DataFrames can be found here.
In the last two lines, I reproduced the if-elif-else combination which you kind of have to create from the bottom up: Your final result dataframe["Size"]
equals size*1000
if the unit is gb
. If not, it equals the kb_case
which includes the case where the unit is kb
as well as all other cases.
Speed/efficiency comparison for loop vs list comprehension vs other methods
Don't be too quick to write off the humble for
loop. If you don't actually need a list, like in this case, a standard for
loop can be faster than using a list comprehension. And of course it has less memory overheads.
Here's a program to perform timing tests; it can be easily modified to add more tests.
#!/usr/bin/env python
''' Time various implementations of string diff function
From http://stackoverflow.com/q/28581218/4014959
Written by PM 2Ring 2015.02.18
'''
from itertools import imap, izip, starmap
from operator import ne
from timeit import Timer
from random import random, seed
def h_dist0(s1,s2):
''' For loop '''
tot = 0
for d1, d2 in zip(s1, s2):
if d1 != d2:
tot += 1
return tot
def h_dist1(s1,s2):
''' List comprehension '''
return sum([dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2)])
def h_dist2(s1,s2):
''' Generator expression '''
return sum(dgt1 != dgt2 for dgt1, dgt2 in zip(s1, s2))
def h_dist3(s1,s2):
''' Generator expression with if '''
return sum(1 for dgt1, dgt2 in zip(s1, s2) if dgt1 != dgt2)
def h_dist3a(s1,s2):
''' Generator expression with izip '''
return sum(1 for dgt1, dgt2 in izip(s1, s2) if dgt1 != dgt2)
def h_dist4(s1,s2):
''' imap '''
return sum(imap(ne, s1, s2))
def h_dist5(s1,s2):
''' starmap '''
return sum(starmap(ne, izip(s1, s2)))
funcs = [
h_dist0,
h_dist1,
h_dist2,
h_dist3,
h_dist3a,
h_dist4,
h_dist5,
]
# ------------------------------------
def check_full():
print 'Testing all functions with strings of length', len(s1)
for func in funcs:
print '%s:%s\n%d\n' % (func.func_name, func.__doc__, func(s1, s2))
def check():
print 'Testing all functions with strings of length', len(s1)
print [func(s1, s2) for func in funcs], '\n'
def time_test(loops=10000, reps=3):
''' Print timing stats for all the functions '''
slen = len(s1)
print 'Length = %d, Loops = %d, Repetitions = %d' % (slen, loops, reps)
for func in funcs:
#Get function name and docstring
fname = func.func_name
fdoc = func.__doc__
print '\n%s:%s' % (fname, fdoc)
t = Timer('%s(s1, s2)' % fname, 'from __main__ import s1, s2, %s' % fname)
results = t.repeat(reps, loops)
results.sort()
print results
print '\n' + '- '*30 + '\n'
def make_strings(n, r=0.5):
print 'r:', r
s1 = 'a' * n
s2 = ''.join(['b' if random() < r else 'a' for _ in xrange(n)])
return s1, s2
# ------------------------------------
seed(37)
s1, s2 = make_strings(100)
#print '%s\n%s\n' % (s1, s2)
check()
time_test(10000)
s1, s2 = make_strings(100, 0.1)
check()
time_test(10000)
s1, s2 = make_strings(100, 0.9)
check()
time_test(10000)
s1, s2 = make_strings(10)
check()
time_test(50000)
s1, s2 = make_strings(1000)
check()
time_test(1000)
The results below are from a 32 bit 2GHz Pentium 4 running Python 2.6.6 on Linux.
output
r: 0.5
Testing all functions with strings of length 100
[45, 45, 45, 45, 45, 45, 45]
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop
[0.62271595001220703, 0.63597297668457031, 0.65991997718811035]
h_dist1: List comprehension
[0.80136799812316895, 1.0849411487579346, 1.1687240600585938]
h_dist2: Generator expression
[0.81829214096069336, 0.82315492630004883, 0.85774612426757812]
h_dist3: Generator expression with if
[0.67409086227416992, 0.67418098449707031, 0.68189001083374023]
h_dist3a: Generator expression with izip
[0.54596519470214844, 0.54696321487426758, 0.54910516738891602]
h_dist4: imap
[0.4574120044708252, 0.45927596092224121, 0.46362900733947754]
h_dist5: starmap
[0.38610100746154785, 0.38653087615966797, 0.39858913421630859]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.1
Testing all functions with strings of length 100
[13, 13, 13, 13, 13, 13, 13]
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop
[0.59487199783325195, 0.61918497085571289, 0.62035894393920898]
h_dist1: List comprehension
[0.77733206748962402, 0.77883815765380859, 0.78676295280456543]
h_dist2: Generator expression
[0.8313758373260498, 0.83669614791870117, 0.8419950008392334]
h_dist3: Generator expression with if
[0.60900688171386719, 0.61443901062011719, 0.6202390193939209]
h_dist3a: Generator expression with izip
[0.48425912857055664, 0.48703289031982422, 0.49215483665466309]
h_dist4: imap
[0.45452284812927246, 0.46001195907592773, 0.4652099609375]
h_dist5: starmap
[0.37329483032226562, 0.37666082382202148, 0.40111804008483887]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.9
Testing all functions with strings of length 100
[94, 94, 94, 94, 94, 94, 94]
Length = 100, Loops = 10000, Repetitions = 3
h_dist0: For loop
[0.69256496429443359, 0.69339799880981445, 0.70190787315368652]
h_dist1: List comprehension
[0.80547499656677246, 0.81107187271118164, 0.81337189674377441]
h_dist2: Generator expression
[0.82524299621582031, 0.82638883590698242, 0.82899308204650879]
h_dist3: Generator expression with if
[0.80344915390014648, 0.8050081729888916, 0.80581092834472656]
h_dist3a: Generator expression with izip
[0.63276004791259766, 0.63585305213928223, 0.64699077606201172]
h_dist4: imap
[0.46122288703918457, 0.46677708625793457, 0.46921491622924805]
h_dist5: starmap
[0.38288688659667969, 0.38731098175048828, 0.38867902755737305]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.5
Testing all functions with strings of length 10
[5, 5, 5, 5, 5, 5, 5]
Length = 10, Loops = 50000, Repetitions = 3
h_dist0: For loop
[0.55377697944641113, 0.55385804176330566, 0.56589198112487793]
h_dist1: List comprehension
[0.69614696502685547, 0.71386599540710449, 0.71778011322021484]
h_dist2: Generator expression
[0.74240994453430176, 0.77340388298034668, 0.77429509162902832]
h_dist3: Generator expression with if
[0.66713404655456543, 0.66874384880065918, 0.67353487014770508]
h_dist3a: Generator expression with izip
[0.59427285194396973, 0.59525203704833984, 0.60147690773010254]
h_dist4: imap
[0.46971893310546875, 0.4749150276184082, 0.4831998348236084]
h_dist5: starmap
[0.46615099906921387, 0.47054886817932129, 0.47225403785705566]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
r: 0.5
Testing all functions with strings of length 1000
[506, 506, 506, 506, 506, 506, 506]
Length = 1000, Loops = 1000, Repetitions = 3
h_dist0: For loop
[0.59869503974914551, 0.60042905807495117, 0.60753512382507324]
h_dist1: List comprehension
[0.68359518051147461, 0.70072579383850098, 0.7146599292755127]
h_dist2: Generator expression
[0.7492527961730957, 0.75325894355773926, 0.75805497169494629]
h_dist3: Generator expression with if
[0.59286904335021973, 0.59505105018615723, 0.59793591499328613]
h_dist3a: Generator expression with izip
[0.49536395072937012, 0.49821090698242188, 0.54327893257141113]
h_dist4: imap
[0.42384982109069824, 0.43060398101806641, 0.43535709381103516]
h_dist5: starmap
[0.34122705459594727, 0.35040402412414551, 0.35851287841796875]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Generator expressions vs. list comprehensions
John's answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it's also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won't work:
def gen():
return (something for something in get_some_stuff())
print gen()[:2] # generators don't support indexing or slicing
print [5,6] + gen() # generators can't be added to lists
Basically, use a generator expression if all you're doing is iterating once. If you want to store and use the generated results, then you're probably better off with a list comprehension.
Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.
Related Topics
How to Join Two Wav Files Using Python
Can't Install New Packages for Python (Python 3.9.0, Windows 10)
How to Do a Not Equal in Django Queryset Filtering
Zlib.Error: Error -3 While Decompressing: Incorrect Header Check
Configuring Spark to Work with Jupyter Notebook and Anaconda
Inverse Distance Weighted (Idw) Interpolation with Python
Making an Asynchronous Task in Flask
Python: Call a Function from String Name
Calculating Difference Between Two Rows in Python/Pandas
How to Add Items into a Numpy Array
Difference Between Static Static_Url and Static_Root on Django
The Problem with Installing Pil Using Virtualenv or Buildout
Python Serialization - Why Pickle
Removing Index Column in Pandas When Reading a CSV