What is the fastest way to flatten arbitrarily nested lists in Python?
Here's a recursive approach that is string friendly:
nests = [1, 2, [3, 4, [5],['hi']], [6, [[[7, 'hello']]]]]
def flatten(container):
for i in container:
if isinstance(i, (list,tuple)):
for j in flatten(i):
yield j
else:
yield i
print list(flatten(nests))
returns:
[1, 2, 3, 4, 5, 'hi', 6, 7, 'hello']
Note, this doesn't make any guarantees for speed or overhead use, but illustrates a recursive solution that hopefully will be helpful.
Flatten an irregular (arbitrarily nested) list of lists
Using generator functions can make your example easier to read and improve performance.
Python 2
Using the Iterable
ABC added in 2.6:
from collections import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, basestring):
for item in flatten(x):
yield item
else:
yield x
Python 3
In Python 3, basestring
is no more, but the tuple (str, bytes)
gives the same effect. Also, the yield from
operator returns an item from a generator one at a time.
from collections.abc import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x
Flattening arbitrarily deep nested lists
You can use a recursive generator to yield elements from nested lists:
from typing import Collection
def check_nested(obj):
for sub_obj in obj:
# tuples, lists, dicts, and sets are all Collections
if isinstance(sub_obj, Collection):
yield from check_nested(sub_obj)
else:
yield sub_obj
l = [[[[[1, 2]]]]]
list(check_nested(l))
[1, 2]
# This will work for other formats
l = [[[[[1, 2]]]], [[3, 4]]]
list(check_nested(l))
[1, 2, 3, 4]
Note about typing.Collection
Because this got a new upvote, I wanted to come back and correct something:
from typing import Collection
isinstance('', Collection)
True
This could result in unintended errors, so a better solution would be an instance check:
def check_nested(obj):
for sub_obj in obj:
if isinstance(sub_obj, (list, dict, set, tuple)):
yield from check_nested(sub_obj)
else:
yield sub_obj
Flattening a shallow list in Python
If you're just looking to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.
>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']
It will work on anything that's iterable, which should include Django's iterable QuerySet
s, which it appears that you're using in the question.
Edit: This is probably as good as a reduce anyway, because reduce will have the same overhead copying the items into the list that's being extended. chain
will only incur this (same) overhead if you run list(chain)
at the end.
Meta-Edit: Actually, it's less overhead than the question's proposed solution, because you throw away the temporary lists you create when you extend the original with the temporary.
Edit: As J.F. Sebastian says itertools.chain.from_iterable
avoids the unpacking and you should use that to avoid *
magic, but the timeit app shows negligible performance difference.
How do I make a flat list out of a list of lists?
Given a list of lists l
,
flat_list = [item for sublist in l for item in sublist]
which means:
flat_list = []
for sublist in l:
for item in sublist:
flat_list.append(item)
is faster than the shortcuts posted so far. (l
is the list to flatten.)
Here is the corresponding function:
def flatten(l):
return [item for sublist in l for item in sublist]
As evidence, you can use the timeit
module in the standard library:
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop
Explanation: the shortcuts based on +
(including the implied use in sum
) are, of necessity, O(L**2)
when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2
.
The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
Recursively flatten nested list in python
So, you're trying to flatten a list. You're on the right track but you've made a couple of mistakes. Here they are.
Move your
try-except
inside the loop. With your code, if aTypeError
is raised for one element, then the loop stops running. You don't want that happening.In the try, you yield nothing. Only making a function call. You should be returning something from there as well. I'd recommend
yield from
if you have python3.3+.Finally, in the
except
, you need toyield element
, nottoflatten
. Don't yield the entire list.
def flatten(toflatten):
for element in toflatten:
try:
yield from flatten(element)
except TypeError:
yield element
This gives,
>>> list(flatten([1,2,3,[4,5,6]]))
[1, 2, 3, 4, 5, 6]
You've used the EAFP (Easier to Ask Forgiveness, than Permission), which is nice. That's one way to do it (my favourite, actually), but there's a drawback: This crashes on strings.
There's another approach: LYBL (Look Before You Leap). It consists of being more cautious, using if
statements so errors are not raised.
def flatten(toflatten):
for element in toflatten:
if isinstance(element, list):
yield from flatten(element)
else:
yield element
Which works the same as before, and gives,
>>> list(flatten([1,2,3,[4,5,6]]))
[1, 2, 3, 4, 5, 6]
However, this is advantageous, in the fact that the yield from
generator delegation is only invoked upon sub-lists. Did I mention it also works with string elements?
>>> list(flatten([1,2,3,[4,5,'abc']]))
[1, 2, 3, 4, 5, 'abc']
Note that in either case, the caveat of infinite recursion, if you have recursively defined lists. For example, flatten
will crash with this sort of input.
x = [1, 2, 3]
x.append(x)
flatten(x)
You'll end up getting a runtime error:
RuntimeError: maximum recursion depth exceeded
Flatten an irregular (arbitrarily nested) list of lists
Using generator functions can make your example easier to read and improve performance.
Python 2
Using the Iterable
ABC added in 2.6:
from collections import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, basestring):
for item in flatten(x):
yield item
else:
yield x
Python 3
In Python 3, basestring
is no more, but the tuple (str, bytes)
gives the same effect. Also, the yield from
operator returns an item from a generator one at a time.
from collections.abc import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x
Related Topics
How to Create a Namespace Package in Python
Parse HTML Table to Python List
How to Make a Call to an Executable from Python Script
How I Open Remote Server Folder Using Python
Loading .Rdata Files into Python
How Is the 'Is' Keyword Implemented in Python
Convert String in Base64 to Image and Save on Filesystem
Transpose Column to Row with Spark
Matplotlib Make Tick Labels Font Size Smaller
Difference Between Subprocess.Popen and Os.System
Editing the Date Formatting of X-Axis Tick Labels in Matplotlib
Make Virtualenv Inherit Specific Packages from Your Global Site-Packages
Split a Large Pandas Dataframe
Tkinter: "Python May Not Be Configured for Tk"
What's a Good Rate Limiting Algorithm