Flatten List of Lists

How do I make a flat list out of a list of lists?

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop
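
The same comparison can also be run from a script. The following is a minimal sketch, assuming Python 3, where reduce has to be imported from functools; the dictionary name candidates and the repeat counts are illustrative choices, and the timings it prints will differ from the figures above depending on machine and interpreter version.

```python
import timeit
from functools import reduce  # reduce is not a builtin in Python 3

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

# The three approaches timed above, wrapped as zero-argument callables.
candidates = {
    "comprehension": lambda: [item for sublist in l for item in sublist],
    "sum": lambda: sum(l, []),
    "reduce": lambda: reduce(lambda x, y: x + y, l),
}

for name, func in candidates.items():
    # Best of 3 repeats of 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f"{name}: {best * 1e6:.1f} usec per loop")
```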

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; the total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * L * (L-1) / 2, which is roughly I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
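
To make the copy count concrete, here is a small sketch that simulates the bookkeeping described above without building any lists. The function name recopied_items is hypothetical, and it counts only the re-copying of items already in the intermediate result, as in the explanation.

```python
def recopied_items(num_sublists, items_per_sublist):
    """Count how often already-accumulated items are copied again
    when sublists are joined one at a time with +."""
    copies = 0
    accumulated = items_per_sublist  # the first sublist starts the result
    for _ in range(num_sublists - 1):
        copies += accumulated              # the whole previous result is copied
        accumulated += items_per_sublist   # then the new sublist is appended
    return copies

for L in (10, 100, 1000):
    I = 3
    # Simulated count, closed form I * L * (L - 1) / 2, and the
    # L * I single copies a list comprehension performs.
    print(L, recopied_items(L, I), I * L * (L - 1) // 2, L * I)
```

The first two printed columns after L agree, and both grow quadratically, while the last column grows only linearly.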

How to flatten list of lists?

Use SelectMany:

var legalEntityIds =
    query.SelectMany(x => x.LegalEntities).Select(y => y.LegalEntityId).ToList();

or, using query syntax:

var legalEntityIds = (
    from item in query
    from legalEntity in item.LegalEntities
    select legalEntity.LegalEntityId
).ToList();
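
For readers coming from the Python answers on this page, the SelectMany projection corresponds roughly to a nested comprehension. The sketch below is only an analogy: the LegalEntity and Item classes and the sample query list are hypothetical stand-ins for whatever types the original query returns.

```python
from dataclasses import dataclass, field

@dataclass
class LegalEntity:
    legal_entity_id: int  # hypothetical counterpart of LegalEntityId

@dataclass
class Item:
    legal_entities: list = field(default_factory=list)

# Hypothetical sample data standing in for the query results.
query = [
    Item([LegalEntity(1), LegalEntity(2)]),
    Item([]),
    Item([LegalEntity(3)]),
]

# Outer loop over items, inner loop over each item's entities,
# projecting every entity to its id: the same shape as SelectMany + Select.
legal_entity_ids = [
    entity.legal_entity_id
    for item in query
    for entity in item.legal_entities
]
print(legal_entity_ids)  # [1, 2, 3]
```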

Flatten an irregular (arbitrarily nested) list of lists

Using generator functions can make your example easier to read and improve performance.

Python 2

Using the Iterable ABC added in 2.6:

from collections import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, basestring):
            for item in flatten(x):
                yield item
        else:
            yield x

Python 3

In Python 3, basestring no longer exists, but the tuple (str, bytes) gives the same effect. Also, yield from delegates to a sub-generator and yields each of its items in turn, which replaces the explicit inner loop.

from collections.abc import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x
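
A short usage sketch of the Python 3 version follows; the sample data in nested is made up for illustration. Strings and bytes are yielded whole because iterating a str produces one-character strings, which are themselves iterable, so recursing into them would never terminate.

```python
from collections.abc import Iterable

def flatten(xs):
    for x in xs:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

# Mixed nesting: lists, a tuple, a str, a bytes object, and an empty list.
nested = [1, [2, [3, "abc"]], (4, 5), b"xy", []]

# flatten() is a generator, so wrap it in list() to see all items at once.
print(list(flatten(nested)))  # [1, 2, 3, 'abc', 4, 5, b'xy']
```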

How can I completely flatten a list (of lists (of lists) ... )

Unfortunately there's no direct built-in that completely flattens a data structure even when sub-lists are wrapped in item containers.

Some possible solutions:

Gather/take

You've already come up with a solution like this, but deepmap can take care of all the tree iteration logic to simplify it. Its callback is called once for every leaf node of the data structure, so using take as the callback means that gather will collect a flat list of the leaf values:

sub reallyflat (+@list) { gather @list.deepmap: *.take }

Custom recursive function

You could use a subroutine like this to recursively slip lists into their parent:

multi reallyflat (@list) { @list.map: { slip reallyflat $_ } }
multi reallyflat (\leaf) { leaf }

Another approach would be to recursively apply <> to sub-lists to free them of any item containers they're wrapped in, and then call flat on the result:

sub reallyflat (+@list) {
    flat do for @list {
        when Iterable { reallyflat $_<> }
        default       { $_ }
    }
}

Multi-dimensional array indexing

The postcircumfix [ ] operator can be used with a multi-dimensional subscript to get a flat list of leaf nodes up to a certain depth, though unfortunately the "infinite depth" version is not yet implemented:

say @ab[*;*];     # (a (b c) (d) e f [a (b c)] x (y z) w)
say @ab[*;*;*]; # (a b c d e f a (b c) x y z w)
say @ab[*;*;*;*]; # (a b c d e f a b c x y z w)
say @ab[**]; # HyperWhatever in array index not yet implemented. Sorry.

Still, if you know the maximum depth of your data structure this is a viable solution.
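
The idea of flattening only up to a known depth can be sketched in Python as well, the language used for the other examples on this page. This is a loose analogy rather than a model of the subscript semantics above: the function name flatten_to_depth is hypothetical, and it treats only list and tuple as nestable.

```python
def flatten_to_depth(xs, depth):
    """Yield the items of xs, descending at most `depth` levels
    into nested lists or tuples."""
    for x in xs:
        if depth > 0 and isinstance(x, (list, tuple)):
            yield from flatten_to_depth(x, depth - 1)
        else:
            yield x

data = ["a", ["b", ["c", ["d"]]], "e"]
print(list(flatten_to_depth(data, 1)))  # ['a', 'b', ['c', ['d']], 'e']
print(list(flatten_to_depth(data, 2)))  # ['a', 'b', 'c', ['d'], 'e']
print(list(flatten_to_depth(data, 3)))  # ['a', 'b', 'c', 'd', 'e']
```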

Avoiding containerization

The built-in flat function can flatten deeply nested lists of lists just fine. The problem is just that it doesn't descend into item containers (Scalars). Common sources of unintentional item containers in nested lists are:

  • An Array (but not List) wraps each of its elements in a fresh item container, no matter if it had one before.

    • How to avoid: Use Lists of Lists instead of Arrays of Arrays, if you don't need the mutability that Array provides. Binding with := can be used instead of assignment, to store a List in a @ variable without turning it into an Array:

      my @a := 'a', ('b', 'c' );
      my @b := ('d',), 'e', 'f', @a;

      say flat @b; # (d e f a b c)
  • $ variables are item containers.

    • How to avoid: When storing a list in a $ variable and then inserting it as an element into another list, use <> to decontainerize it. The parent list's container can also be bypassed using | when passing it to flat:

      my $a = (3, 4, 5);
      my $b = (1, 2, $a<>, 6);

      say flat |$b; # (1 2 3 4 5 6)

Flattening a list of strings and lists to work on each item

A more concise approach is:

for entry in initial_list:
    for term in ([entry] if isinstance(entry, str) else entry):
        do_something(term)
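
A runnable sketch of the same loop, wrapped in a generator so the terms can be collected or processed lazily. The name iter_terms and the sample initial_list are hypothetical; the approach assumes every entry is either a str or an iterable of terms.

```python
def iter_terms(initial_list):
    for entry in initial_list:
        # Wrap a bare string in a one-element list so both cases
        # can be handled by the same inner loop.
        for term in ([entry] if isinstance(entry, str) else entry):
            yield term

initial_list = ["alpha", ["beta", "gamma"], "delta"]
print(list(iter_terms(initial_list)))  # ['alpha', 'beta', 'gamma', 'delta']
```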

