Why Sum on Lists Is (Sometimes) Faster Than Itertools.Chain

Why is sum on lists (sometimes) faster than itertools.chain?

Your test inputs are tiny. At those scales, the horrific O(n^2) asymptotic runtime of the sum version isn't visible. The timings are dominated by constant factors, and sum has a better constant factor, since it doesn't have to work through iterators.
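To see the constant-factor effect, here is a minimal timing sketch for tiny inputs (the data and numbers are purely illustrative and will vary by machine and Python version):

import timeit

# Tiny input: ten single-element lists, as in the original question.
setup = 'import itertools; l = [[i] for i in range(10)]'

print(timeit.timeit('sum(l, [])', setup, number=100000))
print(timeit.timeit('list(itertools.chain.from_iterable(l))', setup, number=100000))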

With bigger lists, it becomes clear that sum is not at all designed for this kind of thing:

>>> timeit.timeit('list(itertools.chain.from_iterable(l))',
... 'l = [[i] for i in xrange(5000)]; import itertools',
... number=1000)
0.20425895931668947
>>> timeit.timeit('sum(l, [])', 'l = [[i] for i in xrange(5000)]', number=1000)
49.55303902059097

Could sum be made faster on lists?

We could try to make sum() smarter, but Alex Martelli and Guido van Rossum wanted to keep it focused on arithmetic summations.

FWIW, you should get reasonable performance with this simple code:

result = []
for seq in mylists:
    result += seq

For your other question, "why can't sum use this accumulative approach when available?", see this comment for builtin_sum() in Python/bltinmodule.c:

    /* It's tempting to use PyNumber_InPlaceAdd instead of
       PyNumber_Add here, to avoid quadratic running time
       when doing 'sum(list_of_lists, [])'.  However, this
       would produce a change in behaviour: a snippet like

         empty = []
         sum([[x] for x in range(10)], empty)

       would change the value of empty. */
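The behaviour change the comment warns about is easy to demonstrate with a rough Python model of an in-place sum (inplace_sum below is purely hypothetical, not the actual CPython code):

def inplace_sum(iterable, start):
    # Hypothetical sum() variant that uses += (in-place add).
    total = start
    for item in iterable:
        total += item          # mutates `start` when it is a list
    return total

empty = []
inplace_sum([[x] for x in range(3)], empty)
print(empty)   # [0, 1, 2] -- the caller's list was modified

empty = []
sum([[x] for x in range(3)], empty)
print(empty)   # [] -- the real sum() leaves the start value untouched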

Why do 'functools.reduce' and 'itertools.chain.from_iterable' have different computation times when merging nested lists?

Yes, because list concatenation, i.e. using +, is an O(N) operation. When you use it to incrementally build a list of size N, the whole process becomes O(N²).

Instead, chain.from_iterable simply iterates once over all N items, and the list constructor collects them into the final list, which gives linear performance.

This is why you shouldn't use sum to flatten a list (note, reduce(lambda x, y: x+y,...) is simply sum).
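As a concrete illustration (a short sketch; `a` is just an assumed example nested list), the reduce form and sum with a list start value produce the same result and share the same quadratic behaviour:

from functools import reduce

a = [[1, 2], [3], [4, 5]]   # assumed example input

flat_reduce = reduce(lambda x, y: x + y, a, [])
flat_sum = sum(a, [])

print(flat_reduce)               # [1, 2, 3, 4, 5]
print(flat_reduce == flat_sum)   # True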

Note, the idiomatic way to flatten a nested list like this is to use a list comprehension:

[x for sub in a for x in sub]
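For example, with the same assumed a as in the sketch above:

a = [[1, 2], [3], [4, 5]]   # assumed example input
print([x for sub in a for x in sub])   # [1, 2, 3, 4, 5]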

This is such an anti-pattern that sum explicitly prevents you from doing it with str objects:

>>> sum(['here', 'is', 'some', 'strings'], '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Note, your reduce/sum approach is equivalent to:

result = []
for sub in a:
    result = result + sub

Which demonstrates the expensive + in the loop quite clearly. Note that the following naive approach actually has O(N) behavior instead of O(N²):

result = []
for sub in a:
    result += sub

That is because my_list += something is equivalent to my_list.extend(something), and .extend (along with .append) runs in amortized constant time per element added, so overall it will be O(N).
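A quick sketch of the difference between += and + on lists (the variable names are just for illustration):

a = [1]
b = a
b += [2]          # in-place: extends the same list object, like b.extend([2])
print(a)          # [1, 2] -- a sees the change

a = [1]
b = a
b = b + [2]       # builds a brand new list and rebinds b
print(a)          # [1] -- a is untouched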

Why is itertools.chain faster than a flattening list comprehension?

itertools.chain.from_iterable is implemented in C (Modules/itertoolsmodule.c in the CPython source). It's not hard to read even if you don't know C, and you can tell everything is happening at the C level (before being used to generate a list in your code).
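For reference, the itertools documentation gives a rough pure-Python equivalent of chain.from_iterable; the real version does the same work at the C level:

def from_iterable(iterables):
    # Roughly equivalent pure-Python version, per the itertools docs:
    # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F
    for it in iterables:
        for element in it:
            yield element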

The bytecode for list comprehensions is like so:

def f(lsts):
    return [x for y in lsts for x in y]

dis.dis(f.__code__.co_consts[1])
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                18 (to 24)
              6 STORE_FAST               1 (y)
              8 LOAD_FAST                1 (y)
             10 GET_ITER
        >>   12 FOR_ITER                 8 (to 22)
             14 STORE_FAST               2 (x)
             16 LOAD_FAST                2 (x)
             18 LIST_APPEND              3
             20 JUMP_ABSOLUTE           12
        >>   22 JUMP_ABSOLUTE            4
        >>   24 RETURN_VALUE

These are all the Python interpreter operations involved in creating the list comprehension. Having all the operations at the C level (in chain), rather than having the interpreter step through each bytecode instruction (in the comprehension), is what gives you the performance boost.

Still, that boost is so small I wouldn't worry about it. This is Python: readability over speed.
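If you want to measure the gap yourself, a minimal sketch (the input data and iteration count are illustrative only):

import timeit

setup = 'import itertools; lsts = [list(range(100)) for _ in range(100)]'

print(timeit.timeit('list(itertools.chain.from_iterable(lsts))', setup, number=10000))
print(timeit.timeit('[x for y in lsts for x in y]', setup, number=10000))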


Edit:

For a generator expression wrapped in list():

def g(lsts):
    return list(x for y in lsts for x in y)

# the generator expression
dis.dis(g.__code__.co_consts[1])
  2           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                20 (to 24)
              4 STORE_FAST               1 (y)
              6 LOAD_FAST                1 (y)
              8 GET_ITER
        >>   10 FOR_ITER                10 (to 22)
             12 STORE_FAST               2 (x)
             14 LOAD_FAST                2 (x)
             16 YIELD_VALUE
             18 POP_TOP
             20 JUMP_ABSOLUTE           10
        >>   22 JUMP_ABSOLUTE            2
        >>   24 LOAD_CONST               0 (None)
             26 RETURN_VALUE

So the interpreter has a similar number of steps to run for the generator expression that list is consuming, but, as you would expect, the Python-level overhead of having list consume a generator (as opposed to the C-level LIST_APPEND instruction) is what slows it down.

dis.dis(f)
  2           0 LOAD_CONST               1 (<code object <listcomp> at 0x000000000FB58B70, file "<ipython-input-33-1d46ced34d66>", line 2>)
              2 LOAD_CONST               2 ('f.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_FAST                0 (lsts)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

dis.dis(g)
  2           0 LOAD_GLOBAL              0 (list)
              2 LOAD_CONST               1 (<code object <genexpr> at 0x000000000FF6F420, file "<ipython-input-40-0334a7cdeb8f>", line 2>)
              4 LOAD_CONST               2 ('g.<locals>.<genexpr>')
              6 MAKE_FUNCTION            0
              8 LOAD_FAST                0 (lsts)
             10 GET_ITER
             12 CALL_FUNCTION            1
             14 CALL_FUNCTION            1
             16 RETURN_VALUE
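A quick way to confirm the gap between the two is a timeit sketch (the input data is assumed; absolute numbers depend on your machine and Python version):

import timeit

lsts = [list(range(100)) for _ in range(100)]   # assumed example input

def f(lsts):
    return [x for y in lsts for x in y]

def g(lsts):
    return list(x for y in lsts for x in y)

print(timeit.timeit('f(lsts)', globals=globals(), number=10000))   # list comprehension
print(timeit.timeit('g(lsts)', globals=globals(), number=10000))   # list() over a generator expression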

How to get a set from a Pandas series formed by lists of terms

Unpack the series into set.union, which is O(n) overall:

>>> set().union(*my_series)
{'a', 'b', 'c', 'd', 'e'}

If you prefer old-fashioned, there's the set-comprehension equivalent -

>>> {y for x in my_series for y in x}
{'a', 'b', 'c', 'd', 'e'}
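For completeness, a self-contained sketch (my_series here is just assumed example data):

import pandas as pd

# Assumed example data: a Series whose values are lists of terms.
my_series = pd.Series([['a', 'b'], ['b', 'c', 'd'], ['d', 'e']])

print(set().union(*my_series))             # {'a', 'b', 'c', 'd', 'e'} (set order may vary)
print({y for x in my_series for y in x})   # same elements via the set comprehension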

Concatenating all arrays into a bigger array in Python

You could use chain.from_iterable:

from itertools import chain

a = [[0, 1], [2, 3], [4, 5], [6, 7]]
result = list(chain.from_iterable(a))

print(result)

Output

[0, 1, 2, 3, 4, 5, 6, 7]

Most computationally efficient way to build a Python list of repeating, incrementing numbers like [0, 0, 0, 1, 1, 1, 2, 2, 2, ...]

Use itertools.chain.from_iterable over itertools.repeat iterables:

import itertools

result = list(itertools.chain.from_iterable(itertools.repeat(i,3) for i in range(32)))

print(result)

result:

[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27, 27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31]

This technique avoids creating intermediate lists and minimizes pure-Python looping (one Python-level loop in total; map could possibly remove even that one, but it would require a lambda here, which adds one more function call).

EDIT: let's benchmark this answer against Ted's answer:

import itertools, time

n = 1000000

start_time = time.time()
for _ in range(n):
    list(itertools.chain.from_iterable(itertools.repeat(i, 3) for i in range(32)))

print("itertools", time.time() - start_time)

start_time = time.time()
for _ in range(n):
    [i for i in range(32) for _ in range(3)]
print("flat listcomp", time.time() - start_time)

results:

itertools 10.719785928726196
flat listcomp 13.869723081588745

So using itertools instead of a list comprehension is around 30% faster (Python 3.4, Windows).

Notes:

The small repeat count means a lot of itertools.repeat calls in the inner loop, so with only 3 repeats it's faster to do what NickA suggests:

list(itertools.chain.from_iterable((i,)*3 for i in range(32)))

(7 seconds vs 10 in the above bench)

And a NumPy solution is even faster (around 1.5 seconds), if you can use NumPy:

import numpy as np
np.arange(32).repeat(3) # credits: liliscent

Summing two lists together

The start argument of sum() defaults to 0, which is why, to concatenate lists, you have to pass an empty list as the start value explicitly:

sum(..., [])
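A short illustration (the lists here are just assumed example input):

lists = [[1, 2], [3, 4]]   # assumed example input

print(sum(lists, []))      # [1, 2, 3, 4]

# Without the [] start value, sum tries 0 + [1, 2] and raises:
# TypeError: unsupported operand type(s) for +: 'int' and 'list'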

Join a list of lists in Python

import itertools
a = [['a','b'], ['c']]
print(list(itertools.chain.from_iterable(a)))

This gives

['a', 'b', 'c']

