Why Is [] Faster Than List()

Why is [] faster than list()?

Because [] and {} are literal syntax. Python can compile each of them to a single bytecode instruction that creates the list or dictionary object:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE

list() and dict() are separate objects. Their names need to be resolved, the stack has to be involved to push the arguments, the frame has to be stored to retrieve later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the builtins module) followed by a CALL_FUNCTION, which has to preserve the current frame:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE
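The exact opcode names and offsets differ between CPython versions, so the listings above may not match your interpreter byte for byte. As a rough sketch, assuming CPython and a hypothetical helper named opnames, you can list the opcodes your own interpreter emits and check that the literal never needs a call instruction:

```python
import dis

def opnames(source):
    """Return the opcode names CPython emits for an expression (hypothetical helper)."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

literal_ops = opnames("[]")
call_ops = opnames("list()")

print("[]     ->", literal_ops)
print("list() ->", call_ops)
```

On every version the literal should show a BUILD_LIST and no call opcode, while list() should show a name lookup followed by some form of call opcode.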

You can time the name lookup separately with timeit:

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.

You can avoid the global lookup cost by aliasing the global names as locals (using a timeit setup, everything you bind to a name is a local):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

but you never can overcome that CALL_FUNCTION cost.
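To repeat these measurements as a script rather than at the prompt, something like the following should work. It is only a sketch: best_of is a hypothetical helper, the loop counts are arbitrary, and the absolute numbers will differ from the ones quoted above depending on your machine and Python version.

```python
import timeit

def best_of(stmt, setup="pass", number=200_000, repeat=5):
    """Return the fastest of several timing runs, in seconds (hypothetical helper)."""
    return min(timeit.repeat(stmt, setup, number=number, repeat=repeat))

timings = {
    "[]": best_of("[]"),                            # literal, no lookup, no call
    "list()": best_of("list()"),                    # global/builtin lookup plus call
    "_list()": best_of("_list()", "_list = list"),  # local lookup plus call
}

for stmt, seconds in timings.items():
    print(f"{stmt:8} {seconds:.4f} s")
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes.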

Why is list.remove faster than list comprehension?

As mentioned in the comments, your list comprehension is O(n) no matter what the content of the list is, while remove iterates over the list only until it finds the first matching element and then stops. So the result depends on the position of the element you want to remove.

A second reason that remove is much faster is that it's implemented in C: the interpreter has the overhead of calling the magic method __eq__, while the C code calls a C function (PyObject_RichCompareBool).

You can see the source code here:

https://svn.python.org/projects/python/trunk/Objects/listobject.c

Search for listremove
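The early exit is easy to observe by counting comparisons. This is only an illustrative sketch: Counted is a hypothetical wrapper class that records how often its __eq__ is invoked, and the comprehension shown is just one plausible way to write the filter.

```python
class Counted:
    """Hypothetical wrapper that counts how often __eq__ is invoked."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.calls += 1
        return self.value == getattr(other, "value", other)

# remove() stops at the first match: items 0, 1, 2, 3 are compared, then it exits.
items = [Counted(i) for i in range(10)]
Counted.calls = 0
items.remove(3)
remove_calls = Counted.calls

# The comprehension has to compare every element, wherever the match is.
items = [Counted(i) for i in range(10)]
Counted.calls = 0
kept = [c for c in items if not (c == 3)]
comprehension_calls = Counted.calls

print(remove_calls, comprehension_calls)
```

If the element to remove sits at the very end of the list, both approaches make the same number of comparisons, and only the C-versus-bytecode overhead remains.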

Why is tuple faster than list in Python?

The reported "speed of construction" ratio only holds for constant tuples (ones whose items are expressed by literals). Observe carefully (and repeat on your machine -- you just need to type the commands at a shell/command window!)...:

$ python3.1 -mtimeit -s'x,y,z=1,2,3' '[x,y,z]'
1000000 loops, best of 3: 0.379 usec per loop
$ python3.1 -mtimeit '[1,2,3]'
1000000 loops, best of 3: 0.413 usec per loop

$ python3.1 -mtimeit -s'x,y,z=1,2,3' '(x,y,z)'
10000000 loops, best of 3: 0.174 usec per loop
$ python3.1 -mtimeit '(1,2,3)'
10000000 loops, best of 3: 0.0602 usec per loop

$ python2.6 -mtimeit -s'x,y,z=1,2,3' '[x,y,z]'
1000000 loops, best of 3: 0.352 usec per loop
$ python2.6 -mtimeit '[1,2,3]'
1000000 loops, best of 3: 0.358 usec per loop

$ python2.6 -mtimeit -s'x,y,z=1,2,3' '(x,y,z)'
10000000 loops, best of 3: 0.157 usec per loop
$ python2.6 -mtimeit '(1,2,3)'
10000000 loops, best of 3: 0.0527 usec per loop

I didn't do the measurements on 3.0 because of course I don't have it around -- it's totally obsolete and there is absolutely no reason to keep it around, since 3.1 is superior to it in every way (Python 2.7, if you can upgrade to it, measures as being almost 20% faster than 2.6 in each task -- and 2.6, as you see, is faster than 3.1 -- so, if you care seriously about performance, Python 2.7 is really the only release you should be going for!).

Anyway, the key point here is that, in each Python release, building a list out of constant literals is about the same speed, or slightly slower, than building it out of values referenced by variables; but tuples behave very differently -- building a tuple out of constant literals is typically three times as fast as building it out of values referenced by variables! You may wonder how this can be, right?-)

Answer: a tuple made out of constant literals can easily be identified by the Python compiler as being one, immutable constant literal itself: so it's essentially built just once, when the compiler turns the source into bytecodes, and stashed away in the "constants table" of the relevant function or module. When those bytecodes execute, they just need to recover the pre-built constant tuple -- hey presto!-)

This easy optimization cannot be applied to lists, because a list is a mutable object, so it's crucial that, if the same expression such as [1, 2, 3] executes twice (in a loop -- the timeit module makes the loop on your behalf;-), a fresh new list object is constructed anew each time -- and that construction (like the construction of a tuple when the compiler cannot trivially identify it as a compile-time constant and immutable object) does take a little while.

That being said, tuple construction (when both constructions actually have to occur) still is about twice as fast as list construction -- and that discrepancy can be explained by the tuple's sheer simplicity, which other answers have mentioned repeatedly. But, that simplicity does not account for a speedup of six times or more, as you observe if you only compare the construction of lists and tuples with simple constant literals as their items!-)
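You can check the constant-folding claim on your own interpreter. The sketch below assumes CPython; opnames is a hypothetical helper, and the exact opcodes printed vary by version.

```python
import dis

def opnames(source):
    """Return the opcode names CPython emits for an expression (hypothetical helper)."""
    code = compile(source, "<example>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]

# The constant tuple is stored whole in the code object's constants table.
constant_tuple_code = compile("(1, 2, 3)", "<example>", "eval")
print(constant_tuple_code.co_consts)

constant_tuple_ops = opnames("(1, 2, 3)")  # no BUILD_TUPLE: nothing is built at run time
variable_tuple_ops = opnames("(x, y, z)")  # BUILD_TUPLE: built on every evaluation
constant_list_ops = opnames("[1, 2, 3]")   # BUILD_LIST: a fresh list every time

print(constant_tuple_ops)
print(variable_tuple_ops)
print(constant_list_ops)
```

Compiling "(x, y, z)" does not require x, y and z to exist, because the names are only looked up when the code actually runs.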

What makes sets faster than lists?

Sets are implemented using hash tables. Whenever you add an object to a set, the position within the memory of the set object is determined using the hash of the object to be added. When testing for membership, all that needs to be done is basically to look if the object is at the position determined by its hash, so the speed of this operation does not depend on the size of the set. For lists, in contrast, the whole list needs to be searched, which will become slower as the list grows.

This is also the reason that sets do not preserve the order of the objects you add.

Note that sets aren't faster than lists in general -- membership test is faster for sets, and so is removing an element. As long as you don't need these operations, lists are often faster.
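To make the "position determined by the hash" idea concrete, here is a toy hash set. It is a deliberately simplified sketch: TinyHashSet is a hypothetical class that uses a fixed number of buckets with chaining, which is not how CPython's set is actually laid out, but the lookup principle is the same.

```python
class TinyHashSet:
    """Toy hash set with fixed buckets (illustration only, not CPython's design)."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # The hash alone decides which bucket to look in.
        return self._buckets[hash(item) % len(self._buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        if item not in bucket:
            bucket.append(item)

    def __contains__(self, item):
        # Only one small bucket is searched, never the whole collection.
        return item in self._bucket(item)


tiny = TinyHashSet()
for word in ("spam", "eggs", 42):
    tiny.add(word)

print("spam" in tiny, "ham" in tiny)
```

Because items land wherever their hash sends them, walking the buckets yields them in an order unrelated to insertion order.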

Why use Lists, when Arrays are faster?

How can Array be so fast even though it has to copy around items continuously?

Arrays are faster for linear processing because array contents are stored contiguously in memory. When you access memory linearly, multiple objects are fetched into the processor cache at once. Linked list nodes, on the other hand, are scattered throughout memory, so processing them linearly results in more accesses to main memory. Reading from cache is much, much faster than reading from main memory.

And why even use Lists then?

One major reason to use a linked list is that inserting new elements, or removing existing ones, does not invalidate references (including iterators and pointers) to other elements in the linked list. An array cannot offer such a guarantee.
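The question is not tied to one language, so here is the idea sketched in Python. Node and insert_after are hypothetical names for a minimal singly linked list, and an integer index stands in for a "reference" into an array.

```python
class Node:
    """Minimal singly linked list node (hypothetical, for illustration)."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def insert_after(node, value):
    """Splice a new node in after `node` without touching any other node."""
    new_node = Node(value, node.next)
    node.next = new_node
    return new_node


# Linked list: a reference to the "c" node survives the insertion.
head = Node("a", Node("c"))
ref_to_c = head.next
insert_after(head, "b")
print(ref_to_c.value)

# Array: an index that used to mean "c" now points at the inserted element.
array = ["a", "c"]
index_of_c = 1
array.insert(1, "b")
print(array[index_of_c])
```

In languages with raw pointers or iterators into a growable array the problem is sharper, since a reallocation can leave them dangling rather than merely pointing at the wrong element.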

Are sets really faster than lists?

What you've been told is correct: searching in a set is O(1) on average, since members are stored using a hash table. Searching in an (unsorted) array is O(n).

The problem with your tests is that you're both creating the set/array and searching it in the same line. In this case, you're both testing the speed of inserting all the items, and then searching for a single entry.

Try something like this instead:

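import timeit

test_range = range(10000000)
test_set = set(test_range)
test_array = list(test_range)

# globals=globals() lets timeit see test_set and test_array
timeit.timeit('10000 in test_set', number=10, globals=globals())
timeit.timeit('10000 in test_array', number=10, globals=globals())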

Why is a list comprehension so much faster than appending to a list?

A list comprehension is basically just "syntactic sugar" for the regular for loop. In this case it performs better because it doesn't need to load the append attribute of the list and call it as a function at each iteration. In other words and in general, list comprehensions perform faster because suspending and resuming a function's frame, or multiple functions in other cases, is slower than creating a list on demand.

Consider the following examples:

In [1]: def f1():
   ...:     l = []
   ...:     for i in range(5):
   ...:         l.append(i)
   ...:
   ...:
   ...: def f2():
   ...:     [i for i in range(5)]
   ...:

In [3]: import dis

In [4]: dis.dis(f1)
  2           0 BUILD_LIST               0
              2 STORE_FAST               0 (l)

  3           4 LOAD_GLOBAL              0 (range)
              6 LOAD_CONST               1 (5)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                14 (to 28)
             14 STORE_FAST               1 (i)

  4          16 LOAD_FAST                0 (l)
             18 LOAD_METHOD              1 (append)
             20 LOAD_FAST                1 (i)
             22 CALL_METHOD              1
             24 POP_TOP
             26 JUMP_ABSOLUTE           12
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

In [5]: dis.dis(f2)
  8           0 LOAD_CONST               1 (<code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>)
              2 LOAD_CONST               2 ('f2.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               3 (5)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 POP_TOP
             18 LOAD_CONST               0 (None)
             20 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f397abc0d40, file "<ipython-input-1-45c11e415ee9>", line 8>:
  8           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE


You can see that at offset 18 the first function loads the append attribute, while there is no such step in the second function, which uses a list comprehension. All those extra bytecodes make the appending approach slower, and since the append attribute is loaded in each iteration, the first function ends up taking approximately twice as long as the second one.
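To measure the difference yourself, a sketch along these lines should do. The function names and loop sizes are arbitrary choices for illustration, and the ratio you get depends heavily on the Python version, so "twice" should be read as a rough figure.

```python
import timeit

def with_append(n):
    """Build the list with an explicit loop and list.append."""
    result = []
    for i in range(n):
        result.append(i)
    return result

def with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i for i in range(n)]

for func in (with_append, with_comprehension):
    # timeit accepts a zero-argument callable as the statement to time.
    seconds = min(timeit.repeat(lambda: func(1000), number=1000, repeat=5))
    print(f"{func.__name__:20} {seconds:.4f} s")
```

Both functions return their list here, so the two results can be compared for equality before trusting the timings.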

Why in Python is the list(range) way much faster than the [i for i in range()] way when creating a number sequence?

The short answer is that CPython is not a very performant implementation of Python. CPython rarely performs optimizations, even when they would be trivially simple; this is done deliberately to keep the CPython implementation simple.

For the long answer, continue for the next few paragraphs.

The key to understanding the performance difference between list() and a list comprehension is in the disassembly:

import dis
N = 10000
def m1():
    return list(range(N))
def m2():
    return [i for i in range(N)]
dis.dis(m1)
dis.dis(m2)

The disassembly for m1 outputs:

  8           0 LOAD_GLOBAL              0 (list)
              3 LOAD_GLOBAL              1 (range)
              6 LOAD_GLOBAL              2 (N)
              9 CALL_FUNCTION            1
             12 CALL_FUNCTION            1
             15 RETURN_VALUE

As the disassembly shows, the entire conversion is just a single bytecode instruction, which is a function call to list, and the entire implementation of list is in C.

On the other hand, the disassembly for the list comprehension:

 11           0 BUILD_LIST               0
              3 LOAD_GLOBAL              0 (range)
              6 LOAD_GLOBAL              1 (N)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                12 (to 28)
             16 STORE_FAST               0 (i)
             19 LOAD_FAST                0 (i)
             22 LIST_APPEND              2
             25 JUMP_ABSOLUTE           13
        >>   28 RETURN_VALUE

As the disassembly shows, a list comprehension is a lot more complicated than list(range(N)), but more importantly, the looping of the list comprehension happens over multiple bytecode instructions. Processing each bytecode costs an entire pass through the interpreter loop, with the interpreter fetching the next bytecode, figuring out what to do, and executing the bytecode instruction. None of this overhead exists in list().

So yes, the implementation of list is significantly more efficient than a list comprehension, and you would want to just call list if all you want to do is turn an iterable such as a generator into a list without using any of the other list comprehension features.
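A quick way to confirm this on your own machine is to time m1 and m2 directly. This is only a sketch: the repeat counts are arbitrary and the size of the gap varies between Python versions.

```python
import timeit

N = 10_000

def m1():
    return list(range(N))

def m2():
    return [i for i in range(N)]

# Take the best of several runs to reduce noise from other processes.
t1 = min(timeit.repeat(m1, number=200, repeat=5))
t2 = min(timeit.repeat(m2, number=200, repeat=5))

print(f"list(range(N))          {t1:.4f} s")
print(f"[i for i in range(N)]   {t2:.4f} s")
```

Both functions produce the same list, so the comparison is between two ways of doing identical work.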
