Return and yield in the same function
Yes, it' still a generator. The return
is (almost) equivalent to raising StopIteration
.
PEP 255 spells it out:
Specification: Return
A generator function can also contain return statements of the form:
"return"
Note that an expression_list is not allowed on return statements in
the body of a generator (although, of course, they may appear in the
bodies of non-generator functions nested within the generator).When a return statement is encountered, control proceeds as in any
function return, executing the appropriate finally clauses (if any
exist). Then a StopIteration exception is raised, signalling that the
iterator is exhausted. A StopIteration exception is also raised if
control flows off the end of the generator without an explict return.Note that return means "I'm done, and have nothing interesting to
return", for both generator functions and non-generator functions.Note that return isn't always equivalent to raising StopIteration:
the difference lies in how enclosing try/except constructs are
treated. For example,>>> def f1():
... try:
... return
... except:
... yield 1
>>> print list(f1())
[]
because, as in any function, return simply exits, but
>>> def f2():
... try:
... raise StopIteration
... except:
... yield 42
>>> print list(f2())
[42]
because StopIteration is captured by a bare "except", as is any
exception.
Return or yield from a function that calls a generator?
Generators are lazy-evaluating so return
or yield
will behave differently when you're debugging your code or if an exception is thrown.
With return
any exception that happens in your generator
won't know anything about generate_all
, that's because when generator
is really executed you have already left the generate_all
function. With yield
in there it will have generate_all
in the traceback.
def generator(some_list):
for i in some_list:
raise Exception('exception happened :-)')
yield i
def generate_all():
some_list = [1,2,3]
return generator(some_list)
for item in generate_all():
...
Exception Traceback (most recent call last)
<ipython-input-3-b19085eab3e1> in <module>
8 return generator(some_list)
9
---> 10 for item in generate_all():
11 ...
<ipython-input-3-b19085eab3e1> in generator(some_list)
1 def generator(some_list):
2 for i in some_list:
----> 3 raise Exception('exception happened :-)')
4 yield i
5
Exception: exception happened :-)
And if it's using yield from
:
def generate_all():
some_list = [1,2,3]
yield from generator(some_list)
for item in generate_all():
...
Exception Traceback (most recent call last)
<ipython-input-4-be322887df35> in <module>
8 yield from generator(some_list)
9
---> 10 for item in generate_all():
11 ...
<ipython-input-4-be322887df35> in generate_all()
6 def generate_all():
7 some_list = [1,2,3]
----> 8 yield from generator(some_list)
9
10 for item in generate_all():
<ipython-input-4-be322887df35> in generator(some_list)
1 def generator(some_list):
2 for i in some_list:
----> 3 raise Exception('exception happened :-)')
4 yield i
5
Exception: exception happened :-)
However this comes at the cost of performance. The additional generator layer does have some overhead. So return
will be generally a bit faster than yield from ...
(or for item in ...: yield item
). In most cases this won't matter much, because whatever you do in the generator typically dominates the run-time so that the additional layer won't be noticeable.
However yield
has some additional advantages: You aren't restricted to a single iterable, you can also easily yield additional items:
def generator(some_list):
for i in some_list:
yield i
def generate_all():
some_list = [1,2,3]
yield 'start'
yield from generator(some_list)
yield 'end'
for item in generate_all():
print(item)
start
1
2
3
end
In your case the operations are quite simple and I don't know if it's even necessary to create multiple functions for this, one could easily just use the built-in map
or a generator expression instead:
map(do_something, get_the_list()) # map
(do_something(i) for i in get_the_list()) # generator expression
Both should be identical (except for some differences when exceptions happen) to use. And if they need a more descriptive name, then you could still wrap them in one function.
There are multiple helpers that wrap very common operations on iterables built-in and further ones can be found in the built-in itertools
module. In such simple cases I would simply resort to these and only for non-trivial cases write your own generators.
But I assume your real code is more complicated so that may not be applicable but I thought it wouldn't be a complete answer without mentioning alternatives.
What does the yield keyword do?
To understand what yield
does, you must understand what generators are. And before you can understand generators, you must understand iterables.
Iterables
When you create a list, you can read its items one by one. Reading its items one by one is called iteration:
>>> mylist = [1, 2, 3]
>>> for i in mylist:
... print(i)
1
2
3
mylist
is an iterable. When you use a list comprehension, you create a list, and so an iterable:
>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
... print(i)
0
1
4
Everything you can use "for... in...
" on is an iterable; lists
, strings
, files...
These iterables are handy because you can read them as much as you wish, but you store all the values in memory and this is not always what you want when you have a lot of values.
Generators
Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:
>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
... print(i)
0
1
4
It is just the same except you used ()
instead of []
. BUT, you cannot perform for i in mygenerator
a second time since generators can only be used once: they calculate 0, then forget about it and calculate 1, and end calculating 4, one by one.
Yield
yield
is a keyword that is used like return
, except the function will return a generator.
>>> def create_generator():
... mylist = range(3)
... for i in mylist:
... yield i*i
...
>>> mygenerator = create_generator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object create_generator at 0xb7555c34>
>>> for i in mygenerator:
... print(i)
0
1
4
Here it's a useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.
To master yield
, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky.
Then, your code will continue from where it left off each time for
uses the generator.
Now the hard part:
The first time the for
calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield
, then it'll return the first value of the loop. Then, each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield
. That can be because the loop has come to an end, or because you no longer satisfy an "if/else"
.
Your code explained
Generator:
# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):
# Here is the code that will be called each time you use the generator object:
# If there is still a child of the node object on its left
# AND if the distance is ok, return the next child
if self._leftchild and distance - max_dist < self._median:
yield self._leftchild
# If there is still a child of the node object on its right
# AND if the distance is ok, return the next child
if self._rightchild and distance + max_dist >= self._median:
yield self._rightchild
# If the function arrives here, the generator will be considered empty
# there are no more than two values: the left and the right children
Caller:
# Create an empty list and a list with the current object reference
result, candidates = list(), [self]
# Loop on candidates (they contain only one element at the beginning)
while candidates:
# Get the last candidate and remove it from the list
node = candidates.pop()
# Get the distance between obj and the candidate
distance = node._get_dist(obj)
# If the distance is ok, then you can fill in the result
if distance <= max_dist and distance >= min_dist:
result.extend(node._values)
# Add the children of the candidate to the candidate's list
# so the loop will keep running until it has looked
# at all the children of the children of the children, etc. of the candidate
candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result
This code contains several smart parts:
The loop iterates on a list, but the list expands while the loop is being iterated. It's a concise way to go through all these nested data even if it's a bit dangerous since you can end up with an infinite loop. In this case,
candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
exhausts all the values of the generator, butwhile
keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node.The
extend()
method is a list object method that expects an iterable and adds its values to the list.
Usually, we pass a list to it:
>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]
But in your code, it gets a generator, which is good because:
- You don't need to read the values twice.
- You may have a lot of children and you don't want them all stored in memory.
And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question...
You can stop here, or read a little bit to see an advanced use of a generator:
Controlling a generator exhaustion
>>> class Bank(): # Let's create a bank, building ATMs
... crisis = False
... def create_atm(self):
... while not self.crisis:
... yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
... print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...
Note: For Python 3, useprint(corner_street_atm.__next__())
or print(next(corner_street_atm))
It can be useful for various things like controlling access to a resource.
Itertools, your best friend
The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator?
Chain two generators? Group values in a nested list with a one-liner? Map / Zip
without creating another list?
Then just import itertools
.
An example? Let's see the possible orders of arrival for a four-horse race:
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
(1, 2, 4, 3),
(1, 3, 2, 4),
(1, 3, 4, 2),
(1, 4, 2, 3),
(1, 4, 3, 2),
(2, 1, 3, 4),
(2, 1, 4, 3),
(2, 3, 1, 4),
(2, 3, 4, 1),
(2, 4, 1, 3),
(2, 4, 3, 1),
(3, 1, 2, 4),
(3, 1, 4, 2),
(3, 2, 1, 4),
(3, 2, 4, 1),
(3, 4, 1, 2),
(3, 4, 2, 1),
(4, 1, 2, 3),
(4, 1, 3, 2),
(4, 2, 1, 3),
(4, 2, 3, 1),
(4, 3, 1, 2),
(4, 3, 2, 1)]
Understanding the inner mechanisms of iteration
Iteration is a process implying iterables (implementing the __iter__()
method) and iterators (implementing the __next__()
method).
Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
There is more about it in this article about how for
loops work.
Mixing yield and return. `yield [cand]; return` vs `return [[cand]]`. Why do they lead to different output?
In a generator function, return
just defines the value associated with the StopIteration
exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for
loops) intentionally ignore the StopIteration
exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").
For example, try:
>>> def foo():
... yield 'onlyvalue' # Existence of yield keyword makes this a generator
... return 'returnvalue'
...
>>> f = foo() # Makes a generator object, stores it in f
>>> next(f) # Pull one value from generator
'onlyvalue'
>>> next(f) # There is no other yielded value, so this hits the return; iteration over
--------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
...
StopIteration: 'returnvalue'
As you can see, your return
value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases involving using generators as coroutines (where you're using .send()
and .throw()
on instances of the generator and manually advancing it with next(genobj)
), the return value of a generator won't be seen.
In short, you have to pick one:
- Use
yield
anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches ayield
) andreturn
just ends generation (while maybe hiding some data in theStopIteration
exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted), it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once). - Don't use
yield
, andreturn
works as expected (because it's not a generator function).
As an example to explain what happens to the return
value in normal looping constructs, this is what for x in gen():
effectively expands to a C optimized version of:
__unnamed_iterator = iter(gen())
while True:
try:
x = next(__unnamed_iterator)
except StopIteration: # StopIteration caught here without inspecting it
break # Loop ends, StopIteration exception cleaned even from sys.exc_info() to avoid possible reference cycles
# body of loop goes here
# Outside of loop, there is no StopIteration object left
As you can see, the expanded form of the for
loop has to look for a StopIteration
to indicate the loop is over, but it doesn't use it. And for anything that's not a generator, the StopIteration
never has any associated values; the for
loop has no way to report them even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration
are explicitly not part of the values iterated anyway). Anything else that consumes the generator (e.g. calling list
on it) is doing roughly the same thing as the for
loop, ignoring the StopIteration
in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration
object (at the C layer, there are optimizations that StopIteration
objects aren't even produced by most iterators; they return NULL
and leave the set exception empty, which all iterator protocol using things know is equivalent to returning NULL
and setting a StopIteration
object, so for anything but a generator, there isn't even an exception to inspect much of the time).
Why can't I use yield with return?
Python has to decide whether a function is a generator at bytecode compilation time. This is because the semantics of generators say that none of the code in a generator function runs before the first next
call; the generator function returns a generator iterator that, when next
is called, runs the generator code. Thus, Python can't decide whether a function should be a generator or not by running it until it hits a yield
or a return
; instead, the presence of a yield
in a function signals that the function is a generator.
Return in generator together with yield
This is a new feature in Python 3.3. Much like return
in a generator has long been equivalent to raise StopIteration()
, return <something>
in a generator is now equivalent to raise StopIteration(<something>)
. For that reason, the exception you're seeing should be printed as StopIteration: 3
, and the value is accessible through the attribute value
on the exception object. If the generator is delegated to using the (also new) yield from
syntax, it is the result. See PEP 380 for details.
def f():
return 1
yield 2
def g():
x = yield from f()
print(x)
# g is still a generator so we need to iterate to run it:
for _ in g():
pass
This prints 1
, but not 2
.
Python yield statement returns the same value every time
Because you're initiating the generator every time you loop instead of looping a range just iterate the generator:
for sunrise, sunset in forecast(daily_return):
print(sunrise, sunset)
If you only want the first 3 you can zip it with a range or use itertools.islice
as @cs95 has shown:
for sunrise, sunset, _ in zip(forecast(daily_return), range(3)):
print(rise, set)
If you must use next
then initiate the generator outside the loop:
gen = forecast(daily_return)
for i in range(3):
print(next(gen))
You can also use operator.itemgetter
to achieve this same functionality instead of your custom function:
from operator import itemgetter
from itertools import islice
forecast_gen = map(itemgetter('sunrise', 'sunset'), daily_return)
for sunrise, sunset in islice(forecast_gen, 3):
print(sunrise, sunset)
Related Topics
Pandas Groupby Without Turning Grouped by Column into Index
Longest Common Substring from More Than Two Strings
Pandas Filling Missing Dates and Values Within Group
How to Open a Website with Urllib via Proxy in Python
Rolling Mean on Pandas on a Specific Column
Python Variables as Keys to Dict
How to Find Duplicate Elements in Array Using for Loop in Python
Using Lxml and Iterparse() to Parse a Big (+- 1Gb) Xml File
How to Make Custom Legend in Matplotlib
How to Get 'Real-Time' Information Back from a Subprocess.Popen in Python (2.5)
Comparing Numpy Arrays Containing Nan
How to Find First Non-Zero Value in Every Column of a Numpy Array
Overriding "+=" in Python? (_Iadd_() Method)