How exactly does a generator comprehension work?
Do you understand list comprehensions? If so, a generator expression is like a list comprehension, but instead of finding all the items you're interested in and packing them into a list, it waits and yields each item out of the expression, one by one.
>>> my_list = [1, 3, 5, 9, 2, 6]
>>> filtered_list = [item for item in my_list if item > 3]
>>> print(filtered_list)
[5, 9, 6]
>>> len(filtered_list)
3
>>> # compare to generator expression
...
>>> filtered_gen = (item for item in my_list if item > 3)
>>> print(filtered_gen) # notice it's a generator object
<generator object <genexpr> at 0x7f2ad75f89e0>
>>> len(filtered_gen) # So technically, it has no length
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
>>> # We extract each item out individually. We'll do it manually first.
...
>>> next(filtered_gen)
5
>>> next(filtered_gen)
9
>>> next(filtered_gen)
6
>>> next(filtered_gen) # Should be all out of items and give an error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> # Yup, the generator is spent. No values for you!
...
>>> # Let's prove it gives the same results as our list comprehension
...
>>> filtered_gen = (item for item in my_list if item > 3)
>>> gen_to_list = list(filtered_gen)
>>> print(gen_to_list)
[5, 9, 6]
>>> filtered_list == gen_to_list
True
>>>
Because a generator expression only has to yield one item at a time, it can lead to big savings in memory usage. Generator expressions make the most sense in scenarios where you need to take one item at a time, do a lot of calculations based on that item, and then move on to the next item. If you need more than one value, you can also use a generator expression and grab a few at a time. If you need all the values before your program proceeds, use a list comprehension instead.
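As a rough illustration of the memory point, the sketch below compares the sizes reported by sys.getsizeof for a list comprehension and for the equivalent generator expression, then pulls a few values at a time with itertools.islice. The names squares_list and squares_gen and the input size are made up for the example, and getsizeof counts only the list's own storage, not the integers it refers to.

```python
import sys
from itertools import islice

numbers = range(100_000)

# The list comprehension builds every result up front.
squares_list = [n * n for n in numbers]
# The generator expression computes nothing until it is iterated.
squares_gen = (n * n for n in numbers)

list_size = sys.getsizeof(squares_list)  # grows with the number of items
gen_size = sys.getsizeof(squares_gen)    # small, independent of the input size

# Grab a few values at a time instead of all of them.
first_three = list(islice(squares_gen, 3))
next_two = list(islice(squares_gen, 2))

print(list_size > gen_size)
print(first_three, next_two)
```

Each islice call resumes where the previous one stopped, because the generator remembers its position.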
How does this lambda/yield/generator comprehension work?
Since Python 2.5, yield <value> is an expression, not a statement. See PEP 342.
The code is hideously and unnecessarily ugly, but it's legal (up to Python 3.7; since Python 3.8, yield inside a comprehension or generator expression is a SyntaxError). Its central trick is using f((yield x)) inside the generator expression. Here's a simpler example of how this works:
>>> def f(val):
...     return "Hi"
>>> x = [1, 2, 3]
>>> list(f((yield a)) for a in x)
[1, 'Hi', 2, 'Hi', 3, 'Hi']
Basically, using yield in the generator expression causes it to produce two values for every value in the source iterable. As the generator expression iterates over the list of strings, on each iteration the yield x first yields a string from the list. The target expression of the genexp is f((yield x)), so for every value in the list, the "result" of the generator expression is the value of f((yield x)). But f just ignores its argument and always returns the option string "-o". So on every step through the generator, it yields first the key-value string (e.g., "x=1"), then "-o". The outer list(reversed(list(...))) just makes a list out of this generator and then reverses it so that the "-o"s will come before each option instead of after.
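To see why two values come out per item, it may help to spell the same control flow out as an ordinary generator function. This is only a sketch: interleave is a made-up name, and f is the simplified function from the example above. Unlike the genexp form, it also runs on Python 3.8 and later.

```python
def f(val):
    # Ignores its argument, like the function in the question.
    return "Hi"


def interleave(items, func):
    for a in items:
        # First the item itself is yielded. When the consumer asks for the
        # next value, this yield expression evaluates to None (or to
        # whatever was passed in through send()).
        received = yield a
        # Then the result of calling func on that value is yielded.
        yield func(received)


result = list(interleave([1, 2, 3], f))
print(result)  # [1, 'Hi', 2, 'Hi', 3, 'Hi']
```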
However, there is no reason to do it this way. There are a number of much more readable alternatives. Perhaps the most explicit is simply:
kvs = [...] # same list comprehension can be used for this part
result = []
for keyval in kvs:
    result.append("-o")
    result.append(keyval)
return result
Even if you like terse, "clever" code, you could still just do
return sum([["-o", keyval] for keyval in kvs], [])
The kvs list comprehension itself is a bizarre mix of attempted readability and unreadability. It is more simply written:
kvs = [str(optName) + separator + str(optValue) for optName, optValue in options.items()]
You should consider arranging an "intervention" for whoever put this in your codebase.
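Put together as a self-contained function, the explicit version might look like the sketch below. build_option_args, its parameters and the sample options dict are assumptions made for illustration, since the original function is not shown here.

```python
def build_option_args(options, separator="="):
    # One "name<separator>value" string per option.
    kvs = [str(name) + separator + str(value) for name, value in options.items()]
    result = []
    for keyval in kvs:
        # Put the "-o" flag in front of each key-value string.
        result.append("-o")
        result.append(keyval)
    return result


args = build_option_args({"x": 1, "y": 2})
print(args)  # ['-o', 'x=1', '-o', 'y=2']
```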
Differences between generator comprehension expressions
This is what you should be doing:
g = (i for i in range(10))
It's a generator expression. It's equivalent to
def temp(outer):
    for i in outer:
        yield i
g = temp(range(10))
but if you just wanted an iterable with the elements of range(10), you could have done
g = range(10)
You do not need to wrap any of this in a function.
If you're here to learn what code to write, you can stop reading. The rest of this post is a long and technical explanation of why the other code snippets are broken and should not be used, including an explanation of why your timings are broken too.
This:
g = [(yield i) for i in range(10)]
is a broken construct that should have been taken out years ago. Eight years after the problem was originally reported, the process to remove it finally began: Python 3.7 emits a DeprecationWarning for it, and Python 3.8 and later reject it with a SyntaxError. Don't do it.
On Python 3 versions that still accept it, it's equivalent to
def temp(outer):
    l = []
    for i in outer:
        l.append((yield i))
    return l
g = temp(range(10))
List comprehensions are supposed to return lists, but because of the yield, this one doesn't. It acts kind of like a generator expression, and it yields the same things as your first snippet, but it builds an unnecessary list and attaches it to the StopIteration raised at the end.
>>> g = [(yield i) for i in range(10)]
>>> [next(g) for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> next(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: [None, None, None, None, None, None, None, None, None, None]
This is confusing and a waste of memory. Don't do it. (If you want to know where all those Nones are coming from, read PEP 342.)
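As a short sketch of where those values come from: a yield expression evaluates to whatever the consumer passes in with send(), and to None when the generator is advanced with plain next(). The function name collect and the sent strings below are made up for the example; the function body mirrors the equivalent code shown earlier.

```python
def collect(outer):
    received = []
    for i in outer:
        # The value of the yield expression is what the caller sends in.
        received.append((yield i))
    return received


g = collect(range(3))
first = next(g)       # runs up to the first yield
second = g.send("a")  # "a" becomes the value of the first yield expression
third = g.send("b")

try:
    g.send("c")
except StopIteration as stop:
    # The generator's return value rides along on StopIteration.
    result = stop.value

print(first, second, third, result)  # 0 1 2 ['a', 'b', 'c']
```

Driving the same generator with next() only would fill the list with None, which is what the traceback above shows.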
On Python 2, g = [(yield i) for i in range(10)] does something entirely different. Python 2 doesn't give list comprehensions their own scope (specifically list comprehensions, not dict or set comprehensions), so the yield is executed by whatever function contains this line. On Python 2, this:
def f():
    g = [(yield i) for i in range(10)]
is equivalent to
def f():
    temp = []
    for i in range(10):
        temp.append((yield i))
    g = temp
making f a generator-based coroutine, in the pre-async sense. Again, if your goal was to get a generator, you've wasted a bunch of time building a pointless list.
This:
g = [(yield from range(10))]
is silly, but none of the blame is on Python this time.
There is no comprehension or genexp here at all. The brackets are not a list comprehension; all the work is done by yield from, and then you build a 1-element list containing the (useless) return value of yield from. Your f3:
def f3():
    g = [(yield from range(10))]
when stripped of the unnecessary list-building, simplifies to
def f3():
    yield from range(10)
or, ignoring all the coroutine support stuff yield from does,
def f3():
    for i in range(10):
        yield i
Your timings are also broken.
In your first timing, f1 and f2 create generator objects that can be used inside those functions, though f2's generator is weird. f3 doesn't do that; f3 is a generator function. f3's body does not run in your timings, and if it did, its g would behave quite unlike the other functions' gs. A timing that would actually be comparable with f1 and f2 would be
def f4():
    g = f3()
In your second timing, f2 doesn't actually run, for the same reason f3 was broken in the previous timing. In your second timing, f2 is not iterating over a generator. Instead, the yield from turns f2 into a generator function itself.
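The point that calling a generator function does not run its body can be checked directly. In the sketch below, the log list and the demo wrapper are additions for illustration; f3 is otherwise the simplified form shown earlier, with a shorter range.

```python
def demo():
    log = []

    def f3():
        log.append("body started")  # runs only once iteration begins
        yield from range(3)

    g = f3()                   # creates a generator object; no body code runs
    log_after_call = list(log)
    values = list(g)           # iterating is what executes the body
    return log_after_call, values, log


log_after_call, values, log_after_iteration = demo()
print(log_after_call)       # []
print(values)               # [0, 1, 2]
print(log_after_iteration)  # ['body started']
```

A benchmark that only calls such a function therefore measures the cost of creating a generator object, not the cost of the work inside it.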
Generator expressions vs. list comprehensions
John's answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it's also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won't work:
def gen():
    return (something for something in get_some_stuff())
print(gen()[:2]) # generators don't support indexing or slicing
print([5, 6] + gen()) # generators can't be added to lists
Basically, use a generator expression if all you're doing is iterating once. If you want to store and use the generated results, then you're probably better off with a list comprehension.
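If a generator is all you have, two common workarounds are itertools.islice for taking a leading slice and list() for materializing the values. The sketch below uses stand-in data, since get_some_stuff() is not shown in the original.

```python
from itertools import islice


def gen():
    # Stand-in for the generator in the example above.
    return (n * 2 for n in range(5))


# Slicing: islice pulls only the first two items.
first_two = list(islice(gen(), 2))

# Concatenation: materialize the generator first.
combined = [5, 6] + list(gen())

# Direct slicing still fails, as described above.
try:
    gen()[:2]
    slicing_error = None
except TypeError as exc:
    slicing_error = exc

print(first_two)  # [0, 2]
print(combined)   # [5, 6, 0, 2, 4, 6, 8]
print(type(slicing_error).__name__)  # TypeError
```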
Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.
list comprehension returning generator object...
Just use:
articles = [i['title'] for i in news['article']]
The list comprehension already returns a list, so there is no need to create an empty one and then append values to it. For a guide on list comprehensions you may check this one.
Regarding the generator object, the issue here is that writing the comprehension between () (or leaving it unenclosed, for example as the sole argument of a function call) creates a generator instead of a list. For more info on generators and how they differ from lists, see Generator Expressions vs. List Comprehension, and for generator comprehensions, see How exactly does a generator comprehension work?.
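The difference between the bracket styles is easy to check. The news dict below is hypothetical data shaped like the structure in the question.

```python
import types

# Hypothetical data; the real structure comes from the question's feed.
news = {"article": [{"title": "First"}, {"title": "Second"}]}

as_list = [i["title"] for i in news["article"]]  # square brackets: a list
as_gen = (i["title"] for i in news["article"])   # parentheses: a generator

# Unenclosed form, allowed when it is the sole argument of a call.
joined = ", ".join(i["title"] for i in news["article"])

print(type(as_list).__name__)                    # list
print(isinstance(as_gen, types.GeneratorType))   # True
print(joined)                                    # First, Second
```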
Generator Comprehension different output from list comprehension?
In a list comprehension, expressions are evaluated eagerly. In a generator expression, they are only looked up as needed.
Thus, as the generator expression iterates over for c in all_configs, it refers to c[k] but only looks up c after the loop is done, so it only uses the latest value for both tuples. By contrast, the list comprehension is evaluated immediately, so it creates a tuple with the first value of c and another tuple with the second value of c.
Consider this small example:
>>> r = range(3)
>>> i = 0
>>> a = [i for _ in r]
>>> b = (i for _ in r)
>>> i = 3
>>> print(*a)
0 0 0
>>> print(*b)
3 3 3
When creating a, the interpreter created that list immediately, looking up the value of i as soon as it was evaluated. When creating b, the interpreter just set up that generator and didn't actually iterate over it and look up the value of i. The print calls told the interpreter to evaluate those objects. a already existed as a full list in memory with the old value of i, but b was evaluated at that point, and when it looked up the value of i, it found the new value.
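The same effect can be reproduced with a loop variable, which is closer to the c[k] situation described above. The question's code is not reproduced here, so the build function, the key "k" and the sample configs are assumptions for illustration only.

```python
def build(all_configs):
    lists = []
    gens = []
    for c in all_configs:
        # Evaluated right now, while c is the current config.
        lists.append([c["k"] for _ in range(2)])
        # Only set up here; c is looked up later, when the generator runs.
        gens.append((c["k"] for _ in range(2)))
    # The loop is finished, so c now refers to the last config.
    return [tuple(l) for l in lists], [tuple(g) for g in gens]


from_lists, from_gens = build([{"k": 1}, {"k": 2}])
print(from_lists)  # [(1, 1), (2, 2)]
print(from_gens)   # [(2, 2), (2, 2)]
```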
How does this input work with the Python 'any' function?
If you use any(lst) you see that lst is the iterable, which is a list of some items. If it contained [0, False, '', 0.0, [], {}, None] (which all have boolean values of False) then any(lst) would be False. If lst also contained any of the following [-1, True, "X", 0.00001] (all of which evaluate to True) then any(lst) would be True.
In the code you posted, x > 0 for x in lst, this is a different kind of iterable, called a generator expression. Before generator expressions were added to Python, you would have created a list comprehension, which looks very similar, but with surrounding []'s: [x > 0 for x in lst]. From the lst containing [-1, -2, 10, -4, 20], you would get this comprehended list: [False, False, True, False, True]. This internal value would then get passed to the any function, which would return True, since there is at least one True value.
But with generator expressions, Python no longer has to create that internal list of True(s) and False(s); the values are generated one at a time as the any function iterates through the generator expression. And, since any short-circuits, it will stop iterating as soon as it sees the first True value. This would be especially handy if you created lst using something like lst = range(-1,int(1e9)) (or xrange if you are using Python 2.x). Even though this expression will generate over a billion entries, any only has to go as far as the third entry, 1, for which x>0 evaluates to True, and so any can return True.
If you had created a list comprehension, Python would first have had to create the billion-element list in memory, and then pass that to any. But by using a generator expression, you can have Python's builtin functions like any and all break out early, as soon as a True or False value is seen.
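The early exit can be observed by recording which values are actually tested. In this sketch, demo, is_positive and the checked list are helper names introduced for the illustration; the range matches the one mentioned above.

```python
def demo():
    checked = []

    def is_positive(x):
        checked.append(x)  # record every value any() actually looks at
        return x > 0

    lst = range(-1, int(1e9))
    # any() stops pulling values as soon as one test returns True.
    found = any(is_positive(x) for x in lst)
    return found, checked


found, checked = demo()
print(found, checked)  # True [-1, 0, 1]
```

Only three of the roughly one billion candidate values are ever examined.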
Why do list comprehensions write to the loop variable, but generators don't?
Python’s creator, Guido van Rossum, mentioned this when he wrote about generator expressions, which were uniformly built into Python 3: (emphasis mine)
We also made another change in Python 3, to improve equivalence between list comprehensions and generator expressions. In Python 2, the list comprehension "leaks" the loop control variable into the surrounding scope:
x = 'before'
a = [x for x in 1, 2, 3]
print x # this prints '3', not 'before'
This was an artifact of the original implementation of list comprehensions; it was one of Python's "dirty little secrets" for years. It started out as an intentional compromise to make list comprehensions blindingly fast, and while it was not a common pitfall for beginners, it definitely stung people occasionally. For generator expressions we could not do this. Generator expressions are implemented using generators, whose execution requires a separate execution frame. Thus, generator expressions (especially if they iterate over a short sequence) were less efficient than list comprehensions.
However, in Python 3, we decided to fix the "dirty little secret" of list comprehensions by using the same implementation strategy as for generator expressions. Thus, in Python 3, the above example (after modification to use print(x) :-) will print 'before', proving that the 'x' in the list comprehension temporarily shadows but does not override the 'x' in the surrounding scope.
So in Python 3 you won’t see this happen anymore.
Interestingly, dict comprehensions in Python 2 don’t do this either; this is mostly because dict comprehensions were backported from Python 3 and as such already had that fix in them.
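A quick Python 3 check of the behaviour described in the quote, as a sketch with arbitrary sample values: the loop variable of a list comprehension, dict comprehension or generator expression leaves the outer x untouched.

```python
x = "before"

a = [x for x in (1, 2, 3)]
after_list = x  # the list comprehension did not rebind the outer x

squares = {x: x * x for x in (1, 2, 3)}
after_dict = x  # neither did the dict comprehension

consumed = list(x for x in (1, 2, 3))
after_gen = x   # nor the generator expression

print(after_list, after_dict, after_gen)  # before before before
```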
There are some other questions that cover this topic too, but I’m sure you have already seen those when you searched for the topic, right? ;)
- Python list comprehension rebind names even after scope of comprehension. Is this right?
- Why the list comprehension variable is accessible after the operation is done?
How to implement comprehension in a class
You are mixing up two things that are not related: class instantiation and (generator-)comprehensions.
tuple(i for i in range(3))
is equivalent to
tuple((i for i in range(3)))
which is equivalent to
generator = (i for i in range(3))
tuple(generator)
The generator-comprehension is evaluated before tuple.__init__ (or __new__) is called. In general, all arguments in Python are evaluated before being passed to a callable.
Any class can accept an iterable (such as generators) for instantiation if you code __init__ accordingly, e.g.
class AcceptIterable:
    def __init__(self, it):
        for x in it:
            print(x)
Demo:
>>> a = AcceptIterable(i for i in range(3))
0
1
2
>>> a = AcceptIterable([i for i in range(3)])
0
1
2
>>> a = AcceptIterable(range(3))
0
1
2
>>> a = AcceptIterable([0, 1, 2])
0
1
2