Flattening a shallow list in Python
If you're just looking to iterate over a flattened version of the data structure and don't need an indexable sequence, consider itertools.chain and company.
>>> list_of_menuitems = [['image00', 'image01'], ['image10'], []]
>>> import itertools
>>> chain = itertools.chain(*list_of_menuitems)
>>> print(list(chain))
['image00', 'image01', 'image10']
It will work on anything that's iterable, which should include Django's iterable QuerySets, which it appears that you're using in the question.
Edit: This is probably as good as a reduce anyway, because reduce will have the same overhead copying the items into the list that's being extended. chain will only incur this (same) overhead if you run list(chain) at the end.
Meta-Edit: Actually, it's less overhead than the question's proposed solution, because that solution throws away the temporary lists it creates each time it extends the original with a temporary.
Edit: As J.F. Sebastian says, itertools.chain.from_iterable avoids the unpacking and you should use that to avoid * magic, but the timeit app shows negligible performance difference.
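For what it's worth, here is a minimal sketch of the from_iterable variant, reusing list_of_menuitems from the example above. The names lazy_items, first, rest and flat are only illustrative. It shows that items can be pulled one at a time, and that a list is built only if you ask for one.

```python
import itertools

list_of_menuitems = [['image00', 'image01'], ['image10'], []]

# chain.from_iterable takes the outer iterable directly, so no * unpacking is needed.
lazy_items = itertools.chain.from_iterable(list_of_menuitems)

# Items are produced on demand; pulling one does not build a flattened list.
first = next(lazy_items)

# Whatever has not been consumed yet can still be collected afterwards.
rest = list(lazy_items)

# Materialize everything only when an indexable sequence is really needed.
flat = list(itertools.chain.from_iterable(list_of_menuitems))

print(first)  # image00
print(rest)   # ['image01', 'image10']
print(flat)   # ['image00', 'image01', 'image10']
```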
Flatten a list in python
If you just want to flatten the list, just use itertools.chain.from_iterable: http://docs.python.org/library/itertools.html#itertools.chain.from_iterable
Flattening list in python
What reduce does, in plain English, is that it takes two things:
- A function f that:
  - Accepts exactly 2 arguments
  - Returns a value computed using those two values
- An iterable iter (e.g. a list or str)
reduce computes the result of f(iter[0],iter[1]) (the first two items of the iterable), and keeps track of this value that was just computed (call it temp). reduce then computes f(temp,iter[2]) and now keeps track of this new value. This process continues until every item in iter has been passed into f, and returns the final value computed.
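To make those steps visible, here is a small sketch that logs every call reduce makes. The helper add_and_log and the calls list are made up for illustration; only functools.reduce is the real API.

```python
from functools import reduce

calls = []

def add_and_log(temp, item):
    # Record each (temp, item) pair so the order of evaluation is visible.
    calls.append((temp, item))
    return temp + item

total = reduce(add_and_log, [1, 2, 3, 4])

print(calls)  # [(1, 2), (3, 3), (6, 4)]
print(total)  # 10
```

The first call receives the first two items, and every later call receives the running value plus the next item.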
The purpose of * in passing *myList into the reduce function is to take an iterable and turn it into multiple arguments. These two lines do the same thing:
myFunc(10,12)
myFunc(*[10,12])
In the case of myList, you're using a list that contains exactly one list. For that reason, putting the * in front replaces myList with myList[0].
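A quick way to convince yourself of this is the sketch below. my_func and describe are hypothetical helpers, and my_list stands in for the question's myList.

```python
def my_func(a, b):
    return a + b

def describe(*args):
    # Return the positional arguments exactly as the function received them.
    return args

# Unpacking a two-item list is the same as passing two arguments.
same_result = my_func(10, 12) == my_func(*[10, 12])

# A list holding exactly one list unpacks to a single argument: that inner list.
my_list = [[1, 2, 3]]
unpacked = describe(*my_list)
direct = describe(my_list[0])

print(same_result)  # True
print(unpacked)     # ([1, 2, 3],)
print(direct)       # ([1, 2, 3],)
```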
Regarding compatibility, note that reduce is a built-in function in Python 2, but in Python 3 it lives in functools, so you'll have to do this:
import functools
functools.reduce(some_function, some_iterable)
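As a rough Python 3 sketch of flattening with reduce: operator.concat plays the role of the two-argument function, and the sample data is made up. Passing [] as the third argument is my own addition, because reduce raises a TypeError on an empty iterable when no initial value is given.

```python
import functools
import operator

list_of_lists = [[1, 2], [3], [4, 5]]

# operator.concat(a, b) is a + b for sequences; [] is the starting value.
flat = functools.reduce(operator.concat, list_of_lists, [])

# With the initial value, an empty input simply yields an empty list.
empty = functools.reduce(operator.concat, [], [])

print(flat)   # [1, 2, 3, 4, 5]
print(empty)  # []
```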
Flattening shallow list with pandas
Just change join to:
join = lambda list_of_lists: (val for sublist in list_of_lists for val in sublist if isinstance(sublist, list))
Here is the output:
In[69]: df_grouped['merged'] = df_grouped['recipe'].apply(lambda x: list(join(x)))
In[70]: df_grouped['merged']
Out[70]:
0 [olive oil, low sodium chicken broth, cilantro...
1 [coconut milk, frozen banana, pure acai puree,...
Name: merged, dtype: object
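If you want to see what that join does without a DataFrame, here is a plain-Python sketch. The rows and mixed sample data are invented. Note that the isinstance check only filters out entries that are iterable but not lists (a bare string here); something that is not iterable at all would still raise a TypeError.

```python
join = lambda list_of_lists: (val for sublist in list_of_lists for val in sublist if isinstance(sublist, list))

# Ordinary case: every entry is a list, so everything is kept.
rows = [['olive oil', 'cilantro'], ['coconut milk']]
merged = list(join(rows))

# A bare string is iterated character by character, but the filter drops each character.
mixed = [['a', 'b'], 'cd', ['e']]
filtered = list(join(mixed))

print(merged)    # ['olive oil', 'cilantro', 'coconut milk']
print(filtered)  # ['a', 'b', 'e']
```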
How do I make a flat list out of a list of lists?
Given a list of lists l,
flat_list = [item for sublist in l for item in sublist]
which means:
flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)
is faster than the shortcuts posted so far. (l is the list to flatten.)
Here is the corresponding function:
def flatten(l):
    return [item for sublist in l for item in sublist]
As evidence, you can use the timeit module in the standard library:
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop
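If you'd rather rerun the comparison from a script on Python 3 (where reduce has to be imported), something like the following sketch should do. The candidates and timings names and the repeat counts are my own choices, and the absolute numbers will differ from machine to machine.

```python
import timeit
from functools import reduce

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

# The three approaches timed above, wrapped as zero-argument callables.
candidates = {
    "comprehension": lambda: [item for sublist in l for item in sublist],
    "sum": lambda: sum(l, []),
    "reduce": lambda: reduce(lambda x, y: x + y, l),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats of 200 calls each, converted to seconds per call.
    timings[name] = min(timeit.repeat(func, number=200, repeat=3)) / 200

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} usec per call")
```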
Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists -- as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.
The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.
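The arithmetic can be checked with a toy model that counts element writes instead of timing anything. Both function names are hypothetical, and the model is a simplification: it counts every item written into each freshly allocated intermediate list, including the very first one, so the exact constant differs slightly from the figure above while the quadratic growth is the same.

```python
def copies_with_repeated_concat(num_sublists, items_each):
    # Each + allocates a new list holding everything accumulated so far.
    total = 0
    intermediate_size = 0
    for _ in range(num_sublists):
        intermediate_size += items_each
        total += intermediate_size
    return total

def copies_with_comprehension(num_sublists, items_each):
    # Every item is written into the result list exactly once.
    return num_sublists * items_each

print(copies_with_repeated_concat(4, 3))    # 30
print(copies_with_comprehension(4, 3))      # 12
print(copies_with_repeated_concat(100, 3))  # 15150
print(copies_with_comprehension(100, 3))    # 300
```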
How to flatten a heterogeneous list of lists into a single list in Python?
Here is a relatively simple recursive version which will flatten any depth of list:
l = [35,53,[525,6743],64,63,[743,754,757]]
def flatten(xs):
    result = []
    if isinstance(xs, (list, tuple)):
        for x in xs:
            result.extend(flatten(x))
    else:
        result.append(xs)
    return result
print(flatten(l))
How does the list comprehension to flatten a python list work?
Let's take a look at your list comprehension then, but first let's start with a list comprehension at its simplest.
l = [1,2,3,4,5]
print([x for x in l]) # prints [1, 2, 3, 4, 5]
You can look at this the same as a for loop structured like so:
for x in l:
    print(x)
Now let's look at another one:
l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print(a) # prints [2, 4]
That is the exact same as this:
a = []
l = [1,2,3,4,5]
for x in l:
    if x % 2 == 0:
        a.append(x)
print(a) # prints [2, 4]
Now let's take a look at the examples you provided.
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print(flattened_l) # prints [1, 2, 3, 4, 5, 6]
To read a list comprehension, start at the leftmost for loop and work your way to the right. The variable item, in this case, is what will be added. It will produce this equivalent:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
Now for the last one:
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Using the same knowledge we can create a for loop and see how it would behave:
for item in sublist:
    for sublist in l:
        exactly_the_same_as_l.append(item)
Now the only reason the above one works is that, in Python 2, creating flattened_l also left the loop variable sublist defined afterwards. It is a scoping quirk, which is why it did not throw an error. If you ran it without defining flattened_l first, you would get a NameError. In Python 3, comprehension variables no longer leak, so you get a NameError either way unless sublist happens to be defined elsewhere.
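Here is a Python 3 sketch of that scoping behaviour. The variable names reversed_result and emulated are my own, and binding sublist by hand is only an emulation of the old Python 2 leak, not something the comprehension does in Python 3.

```python
l = [[1, 2, 3], [4, 5, 6]]
flattened_l = [item for sublist in l for item in sublist]

# In Python 3 the comprehension above does not leave `sublist` behind,
# so the reversed form cannot even evaluate its first iterable.
try:
    reversed_result = [item for item in sublist for sublist in l]
except NameError:
    reversed_result = None

# Emulate the Python 2 leak by binding `sublist` to the last sublist by hand.
sublist = l[-1]
emulated = [item for item in sublist for sublist in l]

print(flattened_l)      # [1, 2, 3, 4, 5, 6]
print(reversed_result)  # None
print(emulated)         # [4, 4, 5, 5, 6, 6]
```

So even when the reversed form runs, it walks the leftover sublist once and repeats each of its items for every sublist in l. It is not a flattened copy of l.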
Flatten a list of strings and lists of strings and lists in Python
The oft-repeated flatten function can be applied to this circumstance with a simple modification.
from collections import Iterable  # Python 2; on Python 3 this lives in collections.abc
def flatten(coll):
    for i in coll:
        if isinstance(i, Iterable) and not isinstance(i, basestring):
            for subc in flatten(i):
                yield subc
        else:
            yield i
basestring (Python 2 only) will make sure that both str and unicode objects are not split.
There are also versions which count on i not having the __iter__ attribute. I don't know about all that, because I think that str now has that attribute. But, it's worth mentioning.