Flattening a Shallow List in Python

Flattening a shallow list in Python

If you're just looking to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.

>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']

It works on any iterable, which should include Django's QuerySets, which you appear to be using in the question.

Edit: This is probably as good as a reduce anyway, because reduce will have the same overhead copying the items into the list that's being extended. chain will only incur this (same) overhead if you run list(chain) at the end.

Meta-Edit: Actually, it's less overhead than the question's proposed solution, because you throw away the temporary lists you create when you extend the original with the temporary.

Edit: As J.F. Sebastian says, itertools.chain.from_iterable avoids the unpacking, and you should use it to avoid the * magic, although timeit shows a negligible performance difference.

Flatten a list in python

If you just want to flatten the list, just use itertools.chain.from_iterable: http://docs.python.org/library/itertools.html#itertools.chain.from_iterable
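
As a minimal sketch of the two spellings side by side, reusing the list_of_menuitems data from the first answer: both return a lazy iterator, and nothing is copied until you call list() on it. The variable names unpacked, lazy, flat_unpacked, and flat_lazy are only for illustration.

```python
import itertools

list_of_menuitems = [['image00', 'image01'], ['image10'], []]

# chain(*...) unpacks the outer list into separate arguments up front.
unpacked = itertools.chain(*list_of_menuitems)

# chain.from_iterable takes the outer iterable itself and walks it lazily.
lazy = itertools.chain.from_iterable(list_of_menuitems)

# Materialise both iterators so the results can be compared or indexed.
flat_unpacked = list(unpacked)
flat_lazy = list(lazy)
print(flat_lazy)
```

If you only need to loop over the items once, you can skip the list() call and iterate over the chain object directly.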

Flattening list in python

What reduce does, in plain English, is that it takes two things:

  1. A function f that:
    1. Accepts exactly 2 arguments
    2. Returns a value computed using those two values
  2. An iterable iter (e.g. a list or str)

reduce computes the result of f(iter[0],iter[1]) (the first two items of the iterable), and keeps track of this value that was just computed (call it temp). reduce then computes f(temp,iter[2]) and now keeps track of this new value. This process continues until every item in iter has been passed into f, and returns the final value computed.
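
To make that process concrete, here is a small sketch that records every pair of values f receives while reduce flattens a list of lists. The names calls and nested are illustrative only, and f uses + so that each step builds a new list.

```python
from functools import reduce  # required on Python 3; reduce is also a builtin on Python 2

calls = []  # records each (temp, next_item) pair that f receives

def f(temp, item):
    """Concatenate two lists, remembering the arguments for inspection."""
    calls.append((temp, item))
    return temp + item

nested = [[1, 2], [3], [4, 5]]
result = reduce(f, nested)

print(result)
for temp, item in calls:
    print(temp, item)
```

With three sublists, f is called twice: once with the first two sublists, and once with the running result and the third sublist.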

The * in front of myList, when passing *myList into the reduce function, takes an iterable and turns it into multiple positional arguments. These two lines do the same thing:

myFunc(10,12)
myFunc(*[10,12])

In the case of myList, you're using a list that contains exactly one list. For that reason, putting the * in front passes myList[0] in place of myList.
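
A quick way to see what * does is to pass the values to a function that simply hands back whatever positional arguments it received. The helper show_args and the variable my_list below are hypothetical names used only for this sketch.

```python
def show_args(*args):
    """Return the positional arguments exactly as received (hypothetical helper)."""
    return args

my_list = [[1, 2, 3]]          # a list containing exactly one list

plain = show_args(my_list)     # one argument: the outer list itself
starred = show_args(*my_list)  # one argument: my_list[0]
pair = show_args(*[10, 12])    # two arguments: 10 and 12

print(plain)
print(starred)
print(pair)
```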

Regarding compatibility, note that reduce is a builtin in Python 2, but in Python 3 it lives in the functools module, so you'll have to do this:

import functools
functools.reduce(some_function, some_iterable)

Flattening shallow list with pandas

Just change join to:

join = lambda list_of_lists: (val for sublist in list_of_lists for val in sublist if isinstance(sublist, list))

Here is the output:

In[69]: df_grouped['merged'] = df_grouped['recipe'].apply(lambda x: list(join(x)))
In[70]: df_grouped['merged']
Out[70]:
0 [olive oil, low sodium chicken broth, cilantro...
1 [coconut milk, frozen banana, pure acai puree,...
Name: merged, dtype: object
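
The following pure-Python sketch (no pandas needed) shows what the isinstance filter does to non-list entries. join_original repeats the lambda above; join_guarded is an assumed variant, not part of the answer, that moves the check in front of the inner loop so that non-iterable entries such as None are skipped instead of raising.

```python
# Same generator as the lambda above: the check runs after iterating sublist.
join_original = lambda list_of_lists: (
    val for sublist in list_of_lists for val in sublist
    if isinstance(sublist, list)
)

# Assumed variant: test each entry before trying to iterate over it.
join_guarded = lambda list_of_lists: (
    val for sublist in list_of_lists if isinstance(sublist, list)
    for val in sublist
)

# A string entry is iterated, but its characters are filtered out.
with_string = list(join_original([['a', 'b'], 'cd', ['e']]))

# A None entry cannot be iterated, so the original version raises.
try:
    list(join_original([['a'], None, ['e']]))
    original_failed = False
except TypeError:
    original_failed = True

with_none = list(join_guarded([['a'], None, ['e']]))
print(with_string, original_failed, with_none)
```

Whether the guarded form is needed depends on whether the column can contain missing values; that is an assumption about the data, not something the question states.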

How do I make a flat list out of a list of lists?

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

This is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

def flatten(l):
    return [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
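
The copy counts described above can be checked with a small counting model. This is only a sketch: it models sum(l, []) as repeated acc = acc + sublist and counts one copy per element placed in each new list, ignoring allocator details. The function names are hypothetical.

```python
def copies_with_plus(list_of_lists):
    """Count element copies made by repeated `acc = acc + sublist`."""
    copies = 0
    acc_len = 0
    for sublist in list_of_lists:
        # acc + sublist allocates a new list and copies both operands into it.
        copies += acc_len + len(sublist)
        acc_len += len(sublist)
    return copies

def copies_with_comprehension(list_of_lists):
    """Each item is copied into the result list exactly once."""
    return sum(len(sublist) for sublist in list_of_lists)

# L = 10 sublists of I = 3 items each.
uniform = [[0, 0, 0] for _ in range(10)]
plus_copies = copies_with_plus(uniform)
comp_copies = copies_with_comprehension(uniform)
print(plus_copies, comp_copies)
```

For L sublists of I items, this model gives I * L * (L + 1) / 2 copies for the + approach, which grows like I * (L**2)/2, against I * L for the list comprehension.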

How to flatten a heterogeneous list of list into a single list in python?

Here is a relatively simple recursive version which will flatten a list nested to any depth:

l = [35,53,[525,6743],64,63,[743,754,757]]

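def flatten(xs):
    result = []
    if isinstance(xs, (list, tuple)):
        # Recurse into each element and collect its flattened items.
        for x in xs:
            result.extend(flatten(x))
    else:
        # A non-list, non-tuple value is a leaf.
        result.append(xs)
    return result

print(flatten(l))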

How does the list comprehension to flatten a python list work?

Let's take a look at your list comprehension then, but first let's start with list comprehension at its simplest.

l = [1,2,3,4,5]
print([x for x in l]) # prints [1, 2, 3, 4, 5]

You can look at this the same as a for loop structured like so:

for x in l:
    print(x)

Now let's look at another one:

l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print(a) # prints [2, 4]

That is the exact same as this:

a = []
l = [1,2,3,4,5]
for x in l:
    if x % 2 == 0:
        a.append(x)
print(a) # prints [2, 4]

Now let's take a look at the examples you provided.

l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print(flattened_l) # prints [1, 2, 3, 4, 5, 6]

To read a list comprehension, start at the leftmost for clause and work your way right; each later clause is nested inside the one before it. The variable item, in this case, is what gets added to the result. It produces this equivalent:

l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)

Now for the last one

exactly_the_same_as_l = [item for item in sublist for sublist in l]

Using the same knowledge we can create a for loop and see how it would behave:

exactly_the_same_as_l = []
for item in sublist:
    for sublist in l:
        exactly_the_same_as_l.append(item)

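The only reason the above runs at all is that, in Python 2, building flattened_l left the loop variable sublist defined afterwards, because list comprehension variables leak into the enclosing scope there. If you ran it without creating flattened_l first, you would get a NameError. In Python 3, comprehension variables do not leak, so this comprehension raises a NameError unless sublist happens to be defined elsewhere.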
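
Here is a short sketch of the clause order on Python 3. Each comprehension is wrapped in a function so that no leftover variable can hide the error. The function names and the reversed_failed flag are hypothetical.

```python
def correct_order(l):
    # Clauses read left to right, outermost loop first.
    return [item for sublist in l for item in sublist]

def reversed_order(l):
    # The first clause reads `sublist` before any clause has bound it.
    return [item for item in sublist for sublist in l]

l = [[1, 2, 3], [4, 5, 6]]
flattened = correct_order(l)

try:
    reversed_order(l)
    reversed_failed = False
except NameError:
    reversed_failed = True

print(flattened, reversed_failed)
```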

Flatten a list of strings and lists of strings and lists in Python

The oft-repeated flatten function can be applied to this circumstance with a simple modification.

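from collections.abc import Iterable

def flatten(coll):
    for i in coll:
        # Recurse into nested iterables, but treat strings as single items.
        if isinstance(i, Iterable) and not isinstance(i, (str, bytes)):
            for subc in flatten(i):
                yield subc
        else:
            yield i

Excluding str (and bytes) makes sure that strings are not split into characters. On Python 2, import Iterable from collections and test against basestring instead, which covers both str and unicode objects.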

There are also versions which count on i not having the __iter__ attribute. I don't know about all that, because I think that str now has that attribute. But, it's worth mentioning.



