Return and Yield in the Same Function

Yes, it's still a generator. The return is (almost) equivalent to raising StopIteration.

PEP 255 spells it out:

Specification: Return


A generator function can also contain return statements of the form:

"return"

Note that an expression_list is not allowed on return statements in
the body of a generator (although, of course, they may appear in the
bodies of non-generator functions nested within the generator).

When a return statement is encountered, control proceeds as in any
function return, executing the appropriate finally clauses (if any
exist). Then a StopIteration exception is raised, signalling that the
iterator is exhausted. A StopIteration exception is also raised if
control flows off the end of the generator without an explicit return.

Note that return means "I'm done, and have nothing interesting to
return", for both generator functions and non-generator functions.

Note that return isn't always equivalent to raising StopIteration:
the difference lies in how enclosing try/except constructs are
treated. For example,

>>> def f1():
...     try:
...         return
...     except:
...         yield 1
>>> print list(f1())
[]

because, as in any function, return simply exits, but

>>> def f2():
...     try:
...         raise StopIteration
...     except:
...         yield 42
>>> print list(f2())
[42]

because StopIteration is captured by a bare "except", as is any
exception.

Return or yield from a function that calls a generator?

Generators are lazily evaluated, so return and yield from behave differently when you're debugging your code or when an exception is thrown.

With return, any exception that happens inside your generator won't know anything about generate_all; by the time the generator actually executes, you have already left the generate_all function. With yield from, generate_all will appear in the traceback.

def generator(some_list):
    for i in some_list:
        raise Exception('exception happened :-)')
        yield i

def generate_all():
    some_list = [1, 2, 3]
    return generator(some_list)

for item in generate_all():
    ...

Exception                                 Traceback (most recent call last)
<ipython-input-3-b19085eab3e1> in <module>
      8     return generator(some_list)
      9
---> 10 for item in generate_all():
     11     ...

<ipython-input-3-b19085eab3e1> in generator(some_list)
      1 def generator(some_list):
      2     for i in some_list:
----> 3         raise Exception('exception happened :-)')
      4         yield i
      5

Exception: exception happened :-)

And if it's using yield from:

def generate_all():
    some_list = [1, 2, 3]
    yield from generator(some_list)

for item in generate_all():
    ...

Exception                                 Traceback (most recent call last)
<ipython-input-4-be322887df35> in <module>
      8     yield from generator(some_list)
      9
---> 10 for item in generate_all():
     11     ...

<ipython-input-4-be322887df35> in generate_all()
      6 def generate_all():
      7     some_list = [1, 2, 3]
----> 8     yield from generator(some_list)
      9
     10 for item in generate_all():

<ipython-input-4-be322887df35> in generator(some_list)
      1 def generator(some_list):
      2     for i in some_list:
----> 3         raise Exception('exception happened :-)')
      4         yield i
      5

Exception: exception happened :-)

However, this comes at a cost in performance: the additional generator layer has some overhead, so return will generally be a bit faster than yield from ... (or for item in ...: yield item). In most cases this won't matter much, because whatever you do in the generator typically dominates the run-time, so the additional layer won't be noticeable.
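If you want to measure this yourself, here is a minimal sketch using the standard timeit module (the function names are made up for illustration; the absolute numbers depend on your machine and Python version):

import timeit

def inner():
    for i in range(1000):
        yield i

def via_return():        # hands the inner generator straight to the caller
    return inner()

def via_yield_from():    # adds one delegating generator frame per item
    yield from inner()

print(timeit.timeit(lambda: list(via_return()), number=10_000))
print(timeit.timeit(lambda: list(via_yield_from()), number=10_000))  # usually a bit slower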

However, the yield from approach has some additional advantages: you aren't restricted to a single iterable, and you can easily yield additional items:

def generator(some_list):
    for i in some_list:
        yield i

def generate_all():
    some_list = [1, 2, 3]
    yield 'start'
    yield from generator(some_list)
    yield 'end'

for item in generate_all():
    print(item)
start
1
2
3
end

In your case the operations are quite simple, and I don't know if it's even necessary to create multiple functions for this; one could easily just use the built-in map or a generator expression instead:

map(do_something, get_the_list())          # map
(do_something(i) for i in get_the_list())  # generator expression

Both should behave identically (except for some differences when exceptions happen). And if they need a more descriptive name, you could still wrap them in one function:
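A minimal sketch of that wrapping (do_something and get_the_list are stand-ins here, since the real ones aren't shown):

def do_something(x):   # stand-in for the real operation
    return x * 2

def get_the_list():    # stand-in for the real data source
    return [1, 2, 3]

def generate_all():    # descriptive name wrapping the one-liner
    return map(do_something, get_the_list())

print(list(generate_all()))  # [2, 4, 6]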

Several helpers that wrap very common operations on iterables are built in, and further ones can be found in the itertools module. In such simple cases I would simply resort to these, and write my own generators only for non-trivial cases.
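For example, a quick sketch of such helpers in action (everything stays lazy; islice comes from itertools):

from itertools import islice

items = [1, 2, 3]
doubled = map(lambda i: i * 2, items)  # lazy: no intermediate list is built
first_two = islice(doubled, 2)         # lazily take only the first two results
print(list(first_two))                 # [2, 4]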

But I assume your real code is more complicated, so this may not be applicable; still, the answer wouldn't be complete without mentioning these alternatives.

What does the yield keyword do?

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...     print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...     print(i)
0
1
4

Everything you can use "for... in..." on is an iterable: lists, strings, files...

These iterables are handy because you can read them as much as you wish, but they store all the values in memory, and this is not always what you want when you have a lot of values.
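A quick way to see the difference (a sketch; the exact sizes vary by Python version and platform):

import sys

all_in_memory = [x for x in range(100_000)]  # a list: every value stored at once
lazy = range(100_000)                        # an iterable that computes values on demand

print(sys.getsizeof(all_in_memory))  # hundreds of kilobytes
print(sys.getsizeof(lazy))           # ~48 bytes, no matter how large the range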

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
0
1
4

It is just the same except you used () instead of []. BUT, you cannot perform for i in mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and finish after calculating 4, one by one.
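You can see the exhaustion directly by trying a second pass:

>>> mygenerator = (x*x for x in range(3))
>>> list(mygenerator)
[0, 1, 4]
>>> list(mygenerator)  # already exhausted: nothing left to produce
[]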

Yield

yield is a keyword that is used like return, except the function will return a generator.

>>> def create_generator():
...     mylist = range(3)
...     for i in mylist:
...         yield i*i
...
>>> mygenerator = create_generator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object create_generator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

Here it's a useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object; this is a bit tricky.

Then, your code will continue from where it left off each time for uses the generator.

Now the hard part:

The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, or because you no longer satisfy an "if/else".
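A small REPL sketch (with made-up print markers) makes this suspend/resume cycle visible:

>>> def counter():
...     print("starting")
...     yield 1
...     print("resumed after the first yield")
...     yield 2
...     print("resumed again, about to fall off the end")
...
>>> gen = counter()  # nothing printed yet: the body hasn't started running
>>> next(gen)        # runs from the top until the first yield
starting
1
>>> next(gen)        # resumes right after the first yield
resumed after the first yield
2
>>> next(gen)        # resumes, falls off the end, raises StopIteration
resumed again, about to fall off the end
Traceback (most recent call last):
  ...
StopIteration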



Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty;
    # there are no more than two values: the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If the distance is ok, then you can fill in the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate to the candidate's list
    # so the loop will keep running until it has looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list, but the list expands while the loop is being iterated. It's a concise way to go through all these nested data even if it's a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually, we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code, it gets a generator, which is good because:

  1. You don't need to read the values twice.
  2. You may have a lot of children and you don't want them all stored in memory.

And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question...
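For instance, extend happily consumes a generator expression or a string (a quick sketch):

>>> a = [1, 2]
>>> a.extend(x * 10 for x in range(3))  # any iterable works, not just lists
>>> a
[1, 2, 0, 10, 20]
>>> a.extend("cd")                      # strings are iterables too
>>> a
[1, 2, 0, 10, 20, 'c', 'd']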

You can stop here, or read a little bit more to see an advanced use of a generator:

Controlling a generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...     crisis = False
...     def create_atm(self):
...         while not self.crisis:
...             yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...     print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: for Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm)).

It can be useful for various things like controlling access to a resource.

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator?
Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.

An example? Let's see the possible orders of arrival for a four-horse race:

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
(1, 2, 4, 3),
(1, 3, 2, 4),
(1, 3, 4, 2),
(1, 4, 2, 3),
(1, 4, 3, 2),
(2, 1, 3, 4),
(2, 1, 4, 3),
(2, 3, 1, 4),
(2, 3, 4, 1),
(2, 4, 1, 3),
(2, 4, 3, 1),
(3, 1, 2, 4),
(3, 1, 4, 2),
(3, 2, 1, 4),
(3, 2, 4, 1),
(3, 4, 1, 2),
(3, 4, 2, 1),
(4, 1, 2, 3),
(4, 1, 3, 2),
(4, 2, 1, 3),
(4, 2, 3, 1),
(4, 3, 1, 2),
(4, 3, 2, 1)]
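And as a quick sketch of the other questions above (duplicating, chaining, grouping), all of it lazy until consumed:

from itertools import tee, chain, groupby, islice

squares = (x * x for x in range(5))
a, b = tee(squares)        # "duplicate" a generator: two independent views
print(list(islice(a, 3)))  # [0, 1, 4]
print(list(b))             # [0, 1, 4, 9, 16]

both = chain([1, 2], (x for x in [3, 4]))  # chain two iterables/generators
print(list(both))          # [1, 2, 3, 4]

grouped = [(k, list(g)) for k, g in groupby([1, 1, 2, 2, 3])]  # group consecutive values
print(grouped)             # [(1, [1, 1]), (2, [2, 2]), (3, [3])]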

Understanding the inner mechanisms of iteration

Iteration is a process involving iterables (objects implementing the __iter__() method) and iterators (objects implementing the __next__() method).
Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.

There is more about it in this article about how for loops work.
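A hand-rolled sketch of the two roles (the class names are made up for illustration):

class Countdown:                  # an iterable: it can hand out fresh iterators
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        return CountdownIterator(self.start)

class CountdownIterator:          # an iterator: it produces the values
    def __init__(self, current):
        self.current = current
    def __iter__(self):           # iterators are themselves iterable
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # [3, 2, 1]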

Mixing yield and return. `yield [cand]; return` vs `return [[cand]]`. Why do they lead to different output?

In a generator function, return just defines the value associated with the StopIteration exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").

For example, try:

>>> def foo():
...     yield 'onlyvalue'  # Existence of yield keyword makes this a generator
...     return 'returnvalue'
...

>>> f = foo()  # Makes a generator object, stores it in f

>>> next(f)  # Pull one value from generator
'onlyvalue'

>>> next(f)  # There is no other yielded value, so this hits the return; iteration over
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
...
StopIteration: 'returnvalue'

As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases involving using generators as coroutines (where you're using .send() and .throw() on instances of the generator and manually advancing it with next(genobj)), the return value of a generator won't be seen.
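If you do want that value without yield from, one option (a sketch) is to drive the generator by hand and catch the StopIteration yourself:

def foo():
    yield 'onlyvalue'
    return 'returnvalue'

gen = foo()
while True:
    try:
        print('yielded:', next(gen))
    except StopIteration as e:
        print('return value:', e.value)  # the value attached by return
        break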

In short, you have to pick one:

  1. Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield) and return just ends generation (while maybe hiding some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted), it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once).
  2. Don't use yield, and return works as expected (because it's not a generator function).

As an example of what happens to the return value in normal looping constructs, here is what for x in gen(): effectively expands to (CPython actually runs a C-optimized equivalent):

__unnamed_iterator = iter(gen())
while True:
    try:
        x = next(__unnamed_iterator)
    except StopIteration:  # StopIteration caught here without inspecting it
        break  # Loop ends; the exception is cleaned up (even from sys.exc_info()) to avoid possible reference cycles

    # body of loop goes here

# Outside of the loop, there is no StopIteration object left

As you can see, the expanded form of the for loop has to catch StopIteration to know the loop is over, but it doesn't use it. For anything that's not a generator, the StopIteration never has any associated value, and the for loop has no way to report one even if it did: it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values iterated.

Anything else that consumes the generator (e.g. calling list on it) does roughly the same thing as the for loop, ignoring the StopIteration in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object. (At the C layer there are optimizations such that most iterators don't even produce StopIteration objects: they return NULL and leave no exception set, which everything using the iterator protocol treats as equivalent to returning NULL with a StopIteration set. So for anything but a generator, much of the time there isn't even an exception to inspect.)

Why can't I use yield with return?

Python has to decide whether a function is a generator at bytecode compilation time. This is because the semantics of generators say that none of the code in a generator function runs before the first next call; the generator function returns a generator iterator that, when next is called, runs the generator code. Thus, Python can't decide whether a function should be a generator or not by running it until it hits a yield or a return; instead, the presence of a yield in a function signals that the function is a generator.
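You can see this compile-time decision in action: even a yield that can never execute makes the function a generator (a REPL sketch):

>>> def not_really_a_function():
...     if False:
...         yield  # never reached, but its mere presence makes this a generator
...     return 42
...
>>> not_really_a_function()        # a generator object, not 42
<generator object not_really_a_function at 0x...>
>>> list(not_really_a_function())  # the 42 rides on the StopIteration, unseen here
[]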

Return in generator together with yield

This is a new feature in Python 3.3. Much like return in a generator has long been equivalent to raise StopIteration(), return <something> in a generator is now equivalent to raise StopIteration(<something>). For that reason, the exception you're seeing should be printed as StopIteration: 3, and the value is accessible through the attribute value on the exception object. If the generator is delegated to using the (also new) yield from syntax, it is the result. See PEP 380 for details.

def f():
    return 1
    yield 2

def g():
    x = yield from f()
    print(x)

# g is still a generator so we need to iterate to run it:
for _ in g():
    pass

This prints 1, but not 2.

Python yield statement returns the same value every time

Because you're creating a brand-new generator every time through the loop. Instead of looping over a range, just iterate the generator:

for sunrise, sunset in forecast(daily_return):
    print(sunrise, sunset)

If you only want the first 3 you can zip it with a range or use itertools.islice as @cs95 has shown:

for (sunrise, sunset), _ in zip(forecast(daily_return), range(3)):
    print(sunrise, sunset)

If you must use next, then create the generator outside the loop:

gen = forecast(daily_return)
for i in range(3):
    print(next(gen))

You can also use operator.itemgetter to achieve this same functionality instead of your custom function:

from operator import itemgetter
from itertools import islice

forecast_gen = map(itemgetter('sunrise', 'sunset'), daily_return)

for sunrise, sunset in islice(forecast_gen, 3):
    print(sunrise, sunset)

