Why Is a List Comprehension So Much Faster Than Appending to a List

Why is a list comprehension so much faster than appending to a list?

List comprehension is basically just a "syntactic sugar" for the regular for loop. In this case the reason that it performs better is because it doesn't need to load the append attribute of the list and call it as a function at each iteration. In other words and in general, list comprehensions perform faster because suspending and resuming a function's frame, or multiple functions in other cases, is slower than creating a list on demand.

Consider the following examples :

In [1]: def f1(): 
...: l = []
...: for i in range(5):
...: l.append(i)
...:
...:
...: def f2():
...: [i for i in range(5)]
...:

In [3]: import dis

In [4]: dis.dis(f1)
2 0 BUILD_LIST 0
2 STORE_FAST 0 (l)

3 4 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (5)
8 CALL_FUNCTION 1
10 GET_ITER
>> 12 FOR_ITER 14 (to 28)
14 STORE_FAST 1 (i)

4 16 LOAD_FAST 0 (l)
18 LOAD_METHOD 1 (append)
20 LOAD_FAST 1 (i)
22 CALL_METHOD 1
24 POP_TOP
26 JUMP_ABSOLUTE 12
>> 28 LOAD_CONST 0 (None)
30 RETURN_VALUE

In [5]:

In [5]: dis.dis(f2)
8 0 LOAD_CONST 1 (<code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>)
2 LOAD_CONST 2 ('f2.<locals>.<listcomp>')
4 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
8 LOAD_CONST 3 (5)
10 CALL_FUNCTION 1
12 GET_ITER
14 CALL_FUNCTION 1
16 POP_TOP
18 LOAD_CONST 0 (None)
20 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>:
8 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (i)
8 LOAD_FAST 1 (i)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE

In [6]:

You can see that on offset 18 in the first function we have an append attribute while there's no such thing in second function using list comprehension. All those extra bytecodes will make the appending approach slower and since in this case you'll have loading of the append attribute in each iteration, in the end it will make the code to take approximately twice as slower as the second function using only list comprehension.

Why is updating a list faster when using a list comprehension as opposed to a generator expression?

This answer concerns CPython implementation only. Using a list comprehension is faster, since the generator is first converted into a list anyway. This is done because the length of the sequence should be determined before proceeding to replace data, and a generator can't tell you its length.

For list slice assignment, this operation is handled by the amusingly named list_ass_slice. There is a special-case handling for assigning a list or tuple, here - they can use PySequence_Fast ops.

This is the v3.7.4 implementation of PySequence_Fast, where you can clearly see a type-check for list or tuples:

PyObject *
PySequence_Fast(PyObject *v, const char *m)
{
PyObject *it;

if (v == NULL) {
return null_error();
}

if (PyList_CheckExact(v) || PyTuple_CheckExact(v)) {
Py_INCREF(v);
return v;
}

it = PyObject_GetIter(v);
if (it == NULL) {
if (PyErr_ExceptionMatches(PyExc_TypeError))
PyErr_SetString(PyExc_TypeError, m);
return NULL;
}

v = PySequence_List(it);
Py_DECREF(it);

return v;
}

A generator expression will fail this type check and continue to the fallback code, where it is converted into a list object, so that the length can be predetermined.

In the general case, a predetermined length is desirable in order to allow efficient allocation of list storage, and also to provide useful error messages with extended slice assignment:

>>> vals = (x for x in 'abc')
>>> L = [1,2,3]
>>> L[::2] = vals # attempt assigning 3 values into 2 positions
---------------------------------------------------------------------------
Traceback (most recent call last)
...
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
>>> L # data unchanged
[1, 2, 3]
>>> list(vals) # generator was fully consumed
[]

Why list comprehension can be faster than map() in Python?

Test one: throwing away the result.

Here's our dummy function:

def examplefunc(x):
pass

And here are our challengers:

def listcomp_throwaway():
[examplefunc(i) for i in range(100)]

def forloop_throwaway():
for i in range(100):
examplefunc(i)

I won't do an analysis of its raw speed, only why, per the OP's question. Lets take a look at the diffs of the machine code.

--- List comprehension
+++ For loop
@@ -1,15 +1,16 @@
- 55 0 BUILD_LIST 0
+ 59 0 SETUP_LOOP 30 (to 33)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1
12 GET_ITER
- >> 13 FOR_ITER 18 (to 34)
+ >> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
- 19 LOAD_GLOBAL 1 (examplefunc)
+
+ 60 19 LOAD_GLOBAL 1 (examplefunc)
22 LOAD_FAST 0 (i)
25 CALL_FUNCTION 1
- 28 LIST_APPEND 2
- 31 JUMP_ABSOLUTE 13
- >> 34 POP_TOP
- 35 LOAD_CONST 0 (None)
- 38 RETURN_VALUE
+ 28 POP_TOP
+ 29 JUMP_ABSOLUTE 13
+ >> 32 POP_BLOCK
+ >> 33 LOAD_CONST 0 (None)
+ 36 RETURN_VALUE

The race is on. Listcomp's first move is to build an empty list, while for loop's is to setup a loop. Both of them then proceed to load global range(), the constant 100, and call the range function for a generator. Then they both get the current iterator and get the next item, and store it into the variable i. Then they load examplefunc and i and call examplefunc. Listcomp appends it to the list and starts the loop over again. For loop does the same in three instructions instead of two. Then they both load None and return it.

So who seems better in this analysis? Here, list comprehension does some redundant operations such as building the list and appending to it, if you don't care about the result. For loop is pretty efficient too.

If you time them, using a for loop is about one-third faster than a list comprehension. (In this test, examplefunc divided its argument by five and threw it away instead of doing nothing at all.)

Test two: Keeping the result like normal.

No dummy function this test. So here are our challengers:

def listcomp_normal():
l = [i*5 for i in range(100)]

def forloop_normal():
l = []
for i in range(100):
l.append(i*5)

The diff isn't any use to us today. It's just the two machine codes in two blocks.

List comp's machine code:

 55           0 BUILD_LIST               0
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 16 (to 32)
16 STORE_FAST 0 (i)
19 LOAD_FAST 0 (i)
22 LOAD_CONST 2 (5)
25 BINARY_MULTIPLY
26 LIST_APPEND 2
29 JUMP_ABSOLUTE 13
>> 32 STORE_FAST 1 (l)
35 LOAD_CONST 0 (None)
38 RETURN_VALUE

For loop's machine code:

 59           0 BUILD_LIST               0
3 STORE_FAST 0 (l)

60 6 SETUP_LOOP 37 (to 46)
9 LOAD_GLOBAL 0 (range)
12 LOAD_CONST 1 (100)
15 CALL_FUNCTION 1
18 GET_ITER
>> 19 FOR_ITER 23 (to 45)
22 STORE_FAST 1 (i)

61 25 LOAD_FAST 0 (l)
28 LOAD_ATTR 1 (append)
31 LOAD_FAST 1 (i)
34 LOAD_CONST 2 (5)
37 BINARY_MULTIPLY
38 CALL_FUNCTION 1
41 POP_TOP
42 JUMP_ABSOLUTE 19
>> 45 POP_BLOCK
>> 46 LOAD_CONST 0 (None)
49 RETURN_VALUE

As you can probably already tell, the list comprehension has fewer instructions than for loop does.

List comprehension's checklist:

  1. Build an anonymous empty list.
  2. Load range.
  3. Load 100.
  4. Call range.
  5. Get the iterator.
  6. Get the next item on that iterator.
  7. Store that item onto i.
  8. Load i.
  9. Load the integer five.
  10. Multiply times five.
  11. Append the list.
  12. Repeat steps 6-10 until range is empty.
  13. Point l to the anonymous empty list.

For loop's checklist:

  1. Build an anonymous empty list.
  2. Point l to the anonymous empty list.
  3. Setup a loop.
  4. Load range.
  5. Load 100.
  6. Call range.
  7. Get the iterator.
  8. Get the next item on that iterator.
  9. Store that item onto i.
  10. Load the list l.
  11. Load the attribute append on that list.
  12. Load i.
  13. Load the integer five.
  14. Multiply times five.
  15. Call append.
  16. Go to the top.
  17. Go to absolute.

(Not including these steps: Load None, return it.)

The list comprehension doesn't have to do these things:

  • Load append of the list every time, since it's pre-bound as a local variable.
  • Load i twice per loop
  • Spend two instructions going to the top
  • Directly append to the list instead of calling a wrapper that appens the list

In conclusion, listcomp is a lot faster if you are going to use the values, but if you don't it's pretty slow.

Real speeds

Test one: for loop is faster by about one-third*

Test two: list comprehension is faster by about two-thirds*

*About -> second decimal place acurrate

Are list-comprehensions and functional functions faster than for loops?

The following are rough guidelines and educated guesses based on experience. You should timeit or profile your concrete use case to get hard numbers, and those numbers may occasionally disagree with the below.

A list comprehension is usually a tiny bit faster than the precisely equivalent for loop (that actually builds a list), most likely because it doesn't have to look up the list and its append method on every iteration. However, a list comprehension still does a bytecode-level loop:

>>> dis.dis(<the code object for `[x for x in range(10)]`>)
1 0 BUILD_LIST 0
3 LOAD_FAST 0 (.0)
>> 6 FOR_ITER 12 (to 21)
9 STORE_FAST 1 (x)
12 LOAD_FAST 1 (x)
15 LIST_APPEND 2
18 JUMP_ABSOLUTE 6
>> 21 RETURN_VALUE

Using a list comprehension in place of a loop that doesn't build a list, nonsensically accumulating a list of meaningless values and then throwing the list away, is often slower because of the overhead of creating and extending the list. List comprehensions aren't magic that is inherently faster than a good old loop.

As for functional list processing functions: While these are written in C and probably outperform equivalent functions written in Python, they are not necessarily the fastest option. Some speed up is expected if the function is written in C too. But most cases using a lambda (or other Python function), the overhead of repeatedly setting up Python stack frames etc. eats up any savings. Simply doing the same work in-line, without function calls (e.g. a list comprehension instead of map or filter) is often slightly faster.

Suppose that in a game that I'm developing I need to draw complex and huge maps using for loops. This question would be definitely relevant, for if a list-comprehension, for example, is indeed faster, it would be a much better option in order to avoid lags (Despite the visual complexity of the code).

Chances are, if code like this isn't already fast enough when written in good non-"optimized" Python, no amount of Python level micro optimization is going to make it fast enough and you should start thinking about dropping to C. While extensive micro optimizations can often speed up Python code considerably, there is a low (in absolute terms) limit to this. Moreover, even before you hit that ceiling, it becomes simply more cost efficient (15% speedup vs. 300% speed up with the same effort) to bite the bullet and write some C.

Why is my list comprehension function slower than my list appending function for string concatenation?

With the list-comprehension, three if-conditions, one function call and global.
The second version only uses two if-conditions.

You can reduce to one if-condition inside the loop, to get even faster results:

def string_compression(string):
string = iter(string)
char = next(string)
result = [char]
counter = 1
for new_char in string:
if new_char != char:
# if char doesn't equal previous char, append counter and new char then reset counter
result.append(str(counter))
result.append(new_char)
counter = 0
char = new_char
counter += 1 # inc counter every pass
result.append(str(counter)) # append final count of latest char
return ''.join(result)

Why is list.remove faster than list comprehension?

As mentioned in the comments, your list comprehension is O(n) no matter what the content of the list, while remove will iterate over the list until the first element is present and then will break. So this depends on the position of the element you want to remove.

A second reason that the remove is much faster is that it's implemented in C, the interpreter has overhead of calling the magic method __eq__ while the C code calls a C function (PyObject_RichCompareBool).

You can see the source code here:

https://svn.python.org/projects/python/trunk/Objects/listobject.c

Search for listremove



Related Topics



Leave a reply



Submit