How do I make a flat list out of a list of lists?
Given a list of lists l,
flat_list = [item for sublist in l for item in sublist]
which means:
flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)
The comprehension is faster than the shortcuts posted so far, such as sum(l, []) and reduce with +.
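The for clauses in the comprehension appear in the same order as the nested for statements in the loop version, outermost first. Reversing them does not work, because the leftmost clause runs first. A quick self-check on a small sample list (chosen here only for illustration) confirms that both forms give the same result:

```python
# Sample input chosen for illustration.
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]

# Nested comprehension: outer loop first, inner loop second.
comprehension = [item for sublist in l for item in sublist]

# The equivalent explicit loops.
flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

assert comprehension == flat_list
print(comprehension)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```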
Here is the corresponding function:
def flatten(l):
    return [item for sublist in l for item in sublist]
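This function removes exactly one level of nesting: every element of l must itself be iterable, and anything nested more deeply is left as it is. Strings count as iterables too. A short usage sketch with made-up inputs:

```python
def flatten(l):
    # Removes exactly one level of nesting.
    return [item for sublist in l for item in sublist]

print(flatten([[1, 2], [3], []]))   # [1, 2, 3]
print(flatten([[1, [2, 3]], [4]]))  # [1, [2, 3], 4]  (deeper levels untouched)
print(flatten(["ab", "cd"]))        # ['a', 'b', 'c', 'd']  (strings are iterable)

try:
    flatten([1, [2, 3]])  # the top-level 1 is not iterable
except TypeError as exc:
    print("TypeError:", exc)
```

For arbitrarily nested input, the generator-based answer to the next question is the better fit.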
As evidence, you can use the timeit module in the standard library:
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'from functools import reduce; l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop
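The same comparison can be run from inside a script with timeit.repeat. In Python 3, reduce is no longer a builtin and has to be imported from functools. This sketch wraps each candidate in a lambda, which adds a small constant call overhead, and the absolute numbers will depend on your machine and Python version:

```python
import timeit
from functools import reduce  # Python 3: reduce is no longer a builtin

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

candidates = {
    "comprehension": lambda: [item for sublist in l for item in sublist],
    "sum": lambda: sum(l, []),
    "reduce": lambda: reduce(lambda x, y: x + y, l),
}

# All candidates must produce the same flat list before comparing speed.
expected = candidates["comprehension"]()
for name, fn in candidates.items():
    assert fn() == expected, name

for name, fn in candidates.items():
    # Best of 3 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(fn, number=100, repeat=3))
    print(f"{name:14s} {best / 100 * 1e6:8.1f} usec per call")
```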
Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists. As the intermediate result list keeps getting longer, at each step a new intermediate result list object is allocated, and all the items in the previous intermediate result must be copied over (as well as the few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied L-1 times, the second I items L-2 times, and so on; the total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * L * (L-1) / 2, roughly I * L**2 / 2.
The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
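The counting argument can be checked with a few lines of arithmetic. This is a model, not a measurement: it assumes L sublists of I items each and counts item copies for reduce(lambda x, y: x + y, l), where every x + y builds a brand-new list of len(x) + len(y) items. Subtracting the I * (L-1) newly appended items leaves exactly the I * L * (L-1) / 2 re-copies from the explanation:

```python
def plus_copies(L, I):
    """Item copies made by reduce(lambda x, y: x + y, l) on L sublists of I items."""
    copies = 0
    acc_len = I  # reduce starts from the first sublist without copying it
    for _ in range(L - 1):
        acc_len += I       # x + y allocates a new list of this length...
        copies += acc_len  # ...and copies every item into it
    return copies

def comprehension_copies(L, I):
    """The list comprehension copies each item exactly once."""
    return L * I

I = 3
for L in (10, 100, 1000):
    total = plus_copies(L, I)
    recopies = total - I * (L - 1)  # copies of items already in the intermediate result
    assert recopies == I * L * (L - 1) // 2
    print(L, total, recopies, comprehension_copies(L, I))
```

Growing L tenfold multiplies the + cost by roughly a hundred, but the comprehension cost only by ten.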
Flatten an irregular (arbitrarily nested) list of lists
Using a generator function can make the code easier to read and can improve performance, because items are yielded one at a time instead of being collected into intermediate lists.
Python 2
Using the Iterable ABC added in 2.6:
from collections import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, basestring):
            for item in flatten(x):
                yield item
        else:
            yield x
Python 3
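In Python 3, basestring is no more, but the tuple (str, bytes) gives the same effect. Also, yield from delegates to the recursive call, passing its items along one at a time. Note that Iterable is imported from collections.abc: importing it directly from collections no longer works as of Python 3.10.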
from collections.abc import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x
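A usage sketch of the Python 3 generator on a made-up mixed input. The (str, bytes) check is not cosmetic: iterating over a one-character string yields that same string again, so without the exclusion the recursion would never terminate and would end in a RecursionError. Because flatten is a generator, wrap it in list() when you need an actual list:

```python
from collections.abc import Iterable

def flatten(xs):
    for x in xs:
        # Recurse into any iterable except text and byte strings.
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

nested = [1, [2, [3, [4, "five"]]], (6, 7), b"eight"]
print(list(flatten(nested)))  # [1, 2, 3, 4, 'five', 6, 7, b'eight']
```

Tuples, sets and dicts (whose keys are iterated) are flattened as well; narrow the isinstance check if that is not what you want.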
Flatten nested lists in a list
Loop through the list, unlist recursively, then return as a list:
lapply(LIST2, function(i) list(unlist(i, recursive = TRUE)))
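For comparison with the Python answers, the same per-element idea can be sketched in Python: flatten each top-level element on its own and keep the outer list. This is an analogue, not a translation (the R version additionally wraps each flattened vector in a list()), and list2 below is a made-up stand-in for LIST2:

```python
from collections.abc import Iterable

def flatten(xs):
    # Recursively yield leaves, treating str/bytes as atoms.
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

# Hypothetical stand-in for LIST2: each top-level element is itself nested.
list2 = [[1, [2, 3]], [[4], [5, [6]]], [7]]

# Flatten each element separately; the outer list keeps its length.
per_element = [list(flatten(element)) for element in list2]
print(per_element)  # [[1, 2, 3], [4, 5, 6], [7]]
```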
How can I completely flatten a list (of lists (of lists) ... )
Unfortunately there's no direct built-in that completely flattens a data structure even when sub-lists are wrapped in item containers.
Some possible solutions:
Gather/take
You've already come up with a solution like this, but deepmap can take care of all the tree iteration logic to simplify it. Its callback is called once for every leaf node of the data structure, so using take as the callback means that gather will collect a flat list of the leaf values:
sub reallyflat (+@list) { gather @list.deepmap: *.take }
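Python has no deepmap or gather/take, but the idea translates: visit every leaf with a callback and let the callback collect the leaves. The deepmap below is a hypothetical Python stand-in written only for illustration, and it treats only lists and tuples as branches:

```python
def deepmap(fn, data):
    """Hypothetical Python analogue of Raku's deepmap: apply fn to every leaf, keep the nesting."""
    if isinstance(data, (list, tuple)):
        mapped = [deepmap(fn, x) for x in data]
        return mapped if isinstance(data, list) else tuple(mapped)
    return fn(data)  # a leaf: hand it to the callback

leaves = []
# list.append plays the role of take; the leaves list plays the role of gather.
deepmap(leaves.append, [1, [2, [3]], (4, 5)])
print(leaves)  # [1, 2, 3, 4, 5]

# With an ordinary function, the nesting is preserved.
print(deepmap(lambda x: x * 10, [1, [2, [3]], (4, 5)]))  # [10, [20, [30]], (40, 50)]
```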
Custom recursive function
You could use a subroutine like this to recursively slip lists into their parent:
multi reallyflat (@list) { @list.map: { slip reallyflat $_ } }
multi reallyflat (\leaf) { leaf }
Another approach would be to recursively apply <> to sub-lists to free them of any item containers they're wrapped in, and then call flat on the result:
sub reallyflat (+@list) {
    flat do for @list {
        when Iterable { reallyflat $_<> }
        default       { $_ }
    }
}
Multi-dimensional array indexing
The postcircumfix [ ] operator can be used with a multi-dimensional subscript to get a flat list of leaf nodes up to a certain depth, though unfortunately the "infinite depth" version is not yet implemented:
say @ab[*;*]; # (a (b c) (d) e f [a (b c)] x (y z) w)
say @ab[*;*;*]; # (a b c d e f a (b c) x y z w)
say @ab[*;*;*;*]; # (a b c d e f a b c x y z w)
say @ab[**]; # HyperWhatever in array index not yet implemented. Sorry.
Still, if you know the maximum depth of your data structure this is a viable solution.
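The same depth-limited idea is easy to express in Python when the maximum depth is known. A minimal sketch, assuming only lists and tuples count as nesting and that depth is the number of levels to remove; flatten_to_depth is a hypothetical helper name:

```python
def flatten_to_depth(xs, depth):
    """Remove up to `depth` levels of list/tuple nesting (hypothetical helper)."""
    result = []
    for x in xs:
        if depth > 0 and isinstance(x, (list, tuple)):
            # Splice the sub-list in, flattened one level less deep.
            result.extend(flatten_to_depth(x, depth - 1))
        else:
            result.append(x)
    return result

data = [1, [2, [3, [4]]]]
print(flatten_to_depth(data, 1))  # [1, 2, [3, [4]]]
print(flatten_to_depth(data, 2))  # [1, 2, 3, [4]]
print(flatten_to_depth(data, 3))  # [1, 2, 3, 4]
```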
Avoiding containerization
The built-in flat function can flatten deeply nested lists of lists just fine. The problem is that it doesn't descend into item containers (Scalars). Common sources of unintentional item containers in nested lists are:
- An Array (but not List) wraps each of its elements in a fresh item container, no matter if it had one before.
  - How to avoid: Use Lists of Lists instead of Arrays of Arrays, if you don't need the mutability that Array provides. Binding with := can be used instead of assignment, to store a List in a @ variable without turning it into an Array:
        my @a := 'a', ('b', 'c');
        my @b := ('d',), 'e', 'f', @a;
        say flat @b; # (d e f a b c)
- $ variables are item containers.
  - How to avoid: When storing a list in a $ variable and then inserting it as an element into another list, use <> to decontainerize it. The parent list's container can also be bypassed using | when passing it to flat:
        my $a = (3, 4, 5);
        my $b = (1, 2, $a<>, 6);
        say flat |$b; # (1 2 3 4 5 6)