What Is the Fastest Way to Flatten Arbitrarily Nested Lists in Python

What is the fastest way to flatten arbitrarily nested lists in Python?

Here's a recursive approach that is string friendly:

nests = [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]

def flatten(container):
for i in container:
if isinstance(i, (list,tuple)):
for j in flatten(i):
yield j
else:
yield i

print list(flatten(nests))

returns:

[1, 2, 3, 4, 5, 'hi', 6, 7, 'hello']

Note, this doesn't make any guarantees for speed or overhead use, but illustrates a recursive solution that hopefully will be helpful.

Flatten an irregular (arbitrarily nested) list of lists

Using generator functions can make your example easier to read and improve performance.

Python 2

Using the Iterable ABC added in 2.6:

from collections import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, basestring):
for item in flatten(x):
yield item
else:
yield x

Python 3

In Python 3, basestring is no more, but the tuple (str, bytes) gives the same effect. Also, the yield from operator returns an item from a generator one at a time.

from collections.abc import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x

Flattening arbitrarily deep nested lists

You can use a recursive generator to yield elements from nested lists:

from typing import Collection

def check_nested(obj):
for sub_obj in obj:
# tuples, lists, dicts, and sets are all Collections
if isinstance(sub_obj, Collection):
yield from check_nested(sub_obj)
else:
yield sub_obj

l = [[[[[1, 2]]]]]
list(check_nested(l))
[1, 2]

# This will work for other formats
l = [[[[[1, 2]]]], [[3, 4]]]

list(check_nested(l))
[1, 2, 3, 4]

Note about typing.Collection

Because this got a new upvote, I wanted to come back and correct something:

from typing import Collection

isinstance('', Collection)
True

This could result in unintended errors, so a better solution would be an instance check:

def check_nested(obj):
for sub_obj in obj:
if isinstance(sub_obj, (list, dict, set, tuple)):
yield from check_nested(sub_obj)
else:
yield sub_obj

Flattening a shallow list in Python

If you're just looking to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.

>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']

It will work on anything that's iterable, which should include Django's iterable QuerySets, which it appears that you're using in the question.

Edit: This is probably as good as a reduce anyway, because reduce will have the same overhead copying the items into the list that's being extended. chain will only incur this (same) overhead if you run list(chain) at the end.

Meta-Edit: Actually, it's less overhead than the question's proposed solution, because you throw away the temporary lists you create when you extend the original with the temporary.

Edit: As J.F. Sebastian says itertools.chain.from_iterable avoids the unpacking and you should use that to avoid * magic, but the timeit app shows negligible performance difference.

How do I make a flat list out of a list of lists?

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
for item in sublist:
flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

def flatten(l):
return [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.

Recursively flatten nested list in python

So, you're trying to flatten a list. You're on the right track but you've made a couple of mistakes. Here they are.

  1. Move your try-except inside the loop. With your code, if a TypeError is raised for one element, then the loop stops running. You don't want that happening.

  2. In the try, you yield nothing. Only making a function call. You should be returning something from there as well. I'd recommend yield from if you have python3.3+.

  3. Finally, in the except, you need to yield element, not toflatten. Don't yield the entire list.

def flatten(toflatten):    
for element in toflatten:
try:
yield from flatten(element)
except TypeError:
yield element

This gives,

>>> list(flatten([1,2,3,[4,5,6]]))
[1, 2, 3, 4, 5, 6]

You've used the EAFP (Easier to Ask Forgiveness, than Permission), which is nice. That's one way to do it (my favourite, actually), but there's a drawback: This crashes on strings.


There's another approach: LYBL (Look Before You Leap). It consists of being more cautious, using if statements so errors are not raised.

def flatten(toflatten):    
for element in toflatten:
if isinstance(element, list):
yield from flatten(element)
else:
yield element

Which works the same as before, and gives,

>>> list(flatten([1,2,3,[4,5,6]]))
[1, 2, 3, 4, 5, 6]

However, this is advantageous, in the fact that the yield from generator delegation is only invoked upon sub-lists. Did I mention it also works with string elements?

>>> list(flatten([1,2,3,[4,5,'abc']]))
[1, 2, 3, 4, 5, 'abc']

Note that in either case, the caveat of infinite recursion, if you have recursively defined lists. For example, flatten will crash with this sort of input.

x = [1, 2, 3]
x.append(x)

flatten(x)

You'll end up getting a runtime error:

RuntimeError: maximum recursion depth exceeded

Flatten an irregular (arbitrarily nested) list of lists

Using generator functions can make your example easier to read and improve performance.

Python 2

Using the Iterable ABC added in 2.6:

from collections import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, basestring):
for item in flatten(x):
yield item
else:
yield x

Python 3

In Python 3, basestring is no more, but the tuple (str, bytes) gives the same effect. Also, the yield from operator returns an item from a generator one at a time.

from collections.abc import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x


Related Topics



Leave a reply



Submit