Comprehension for Flattening a Sequence of Sequences

Comprehension for flattening a sequence of sequences?

With a comprehension? Well...

>>> seq = '012345'
>>> swapped_pairs = zip(seq[1::2], seq[::2])
>>> ''.join(item for pair in swapped_pairs for item in pair)
'103254'

How does the list comprehension to flatten a python list work?

Let's take a look at your list comprehension then, but first let's start with list comprehension at it's easiest.

l = [1,2,3,4,5]
print [x for x in l] # prints [1, 2, 3, 4, 5]

You can look at this the same as a for loop structured like so:

for x in l:
print x

Now let's look at another one:

l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print a # prints [2,4]

That is the exact same as this:

a = []
l = [1,2,3,4,5]
for x in l:
if x % 2 == 0:
a.append(x)
print a # prints [2,4]

Now let's take a look at the examples you provided.

l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print flattened_l # prints [1,2,3,4,5,6]

For list comprehension start at the farthest to the left for loop and work your way in. The variable, item, in this case, is what will be added. It will produce this equivalent:

l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
for item in sublist:
flattened_l.append(item)

Now for the last one

exactly_the_same_as_l = [item for item in sublist for sublist in l]

Using the same knowledge we can create a for loop and see how it would behave:

for item in sublist:
for sublist in l:
exactly_the_same_as_l.append(item)

Now the only reason the above one works is because when flattened_l was created, it also created sublist. It is a scoping reason to why that did not throw an error. If you ran that without defining the flattened_l first, you would get a NameError

Can I flatten a list using recursive function and list comprehension?

You can though it is a little strange:

def fun_d(x):
return [i for e in x for i in (fun_d(e) if isinstance(e,list) else [e])]

In[] :
l=[1,[2,3],[4,5,[6,7,8,9]]]
fun_d(l)

Out[]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

You may want to use Sequence rather than list so other types of sequences would also be flattened.

from typing import Sequence

def fun_d(x):
return [i for e in x for i in (fun_d(e) if isinstance(e, Sequence) and not isinstance(e, str) else [e])]

A named lambda is trivial, for a truly anonymous lambda you can use a y-combinator

And just to show how ridiculous this is, an anonymous recursive lambda:

In []:
lis = [1,[2,3],[4,5,[6,[7,8],9]]]
(lambda f: f(f))(lambda f: (lambda fun_d: lambda x: [i for e in x for i in (fun_d(e) if isinstance(e, list) else [e])])(lambda x: f(f)(x)))(lis)

Out[]:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Flattening a shallow list in Python

If you're just looking to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.

>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']

It will work on anything that's iterable, which should include Django's iterable QuerySets, which it appears that you're using in the question.

Edit: This is probably as good as a reduce anyway, because reduce will have the same overhead copying the items into the list that's being extended. chain will only incur this (same) overhead if you run list(chain) at the end.

Meta-Edit: Actually, it's less overhead than the question's proposed solution, because you throw away the temporary lists you create when you extend the original with the temporary.

Edit: As J.F. Sebastian says itertools.chain.from_iterable avoids the unpacking and you should use that to avoid * magic, but the timeit app shows negligible performance difference.

Flatten an irregular (arbitrarily nested) list of lists

Using generator functions can make your example easier to read and improve performance.

Python 2

Using the Iterable ABC added in 2.6:

from collections import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, basestring):
for item in flatten(x):
yield item
else:
yield x

Python 3

In Python 3, basestring is no more, but the tuple (str, bytes) gives the same effect. Also, the yield from operator returns an item from a generator one at a time.

from collections.abc import Iterable

def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x

Idiom for flattening a shallow nested list: how does it work?

From the list displays documentation:

When a list comprehension is supplied, it consists of a single expression followed by at least one for clause and zero or more for or if clauses. In this case, the elements of the new list are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and evaluating the expression to produce a list element each time the innermost block is reached.

Thus, your expression can be rewritten as:

thingys = []
for y in l:
for x in y:
thingys.append(x)

Flatten a list of strings to characters and then de-dupe the new list

Easiest is to use a set comprehension instead of a list comp:

letterlist = {lt for wd in wordlist for lt in wd}

All I did was replace the square brackets with curly braces. This works in Python 2.7 and up.

For Python 2.6 and earlier, you'd use the set() callable with a generator expression instead:

letterlist = set(lt for wd in wordlist for lt in wd)

Last, but not least, you can replace the comprehension syntax altogether by producing the letters from all sequences by chaining the strings together, treat them all like one long sequence, with itertools.chain.from_iterable(); you give that a sequence of sequences and it'll give you back one long sequence:

from itertools import chain
letterlist = set(chain.from_iterable(wordlist))

How do I make a flat list out of a list of lists?

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
for item in sublist:
flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

def flatten(l):
return [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.



Related Topics



Leave a reply



Submit