Return in Generator Together with Yield

This is a new feature in Python 3.3 (as a comment notes, it doesn't even work in 3.2). Much like return in a generator has long been equivalent to raise StopIteration(), return <something> in a generator is now equivalent to raise StopIteration(<something>). For that reason, the exception you're seeing should be printed as StopIteration: 3, and the value is accessible through the value attribute on the exception object. If the generator is delegated to using the (also new) yield from syntax, the value becomes the result of the yield from expression. See PEP 380 for details.

def f():
    return 1
    yield 2

def g():
    x = yield from f()
    print(x)

# g is still a generator, so we need to iterate to run it:
for _ in g():
    pass

This prints 1, but not 2.
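
If you need the value without yield from, you can also drive the generator by hand and read it off the StopIteration exception; a small sketch reusing f from above:

gen = f()
while True:
    try:
        print(next(gen))        # would print yielded values, if there were any
    except StopIteration as e:
        print(e.value)          # 1 -- the generator's return value
        break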

Best way of getting both the yield'ed output and return'ed value of a generator in Python, without wrapping it inside another class?

Maybe with just a function instead?

Version 1:

def output_and_return(it):
    def with_result():
        yield (yield from it)
    *elements, result = with_result()
    return elements, result

Version 2:

def output_and_return(it):
    result = None
    def get_result():
        nonlocal result
        result = yield from it
    return list(get_result()), result
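
Either version can be used the same way; a quick usage sketch with a hypothetical generator gen:

def gen():
    yield 1
    yield 2
    return "done"        # becomes the StopIteration value

elements, result = output_and_return(gen())
print(elements)  # [1, 2]
print(result)    # done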

Mixing yield and return. `yield [cand]; return` vs `return [[cand]]`. Why do they lead to different output?

In a generator function, return just defines the value associated with the StopIteration exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").

For example, try:

>>> def foo():
...     yield 'onlyvalue' # Existence of yield keyword makes this a generator
...     return 'returnvalue'
...

>>> f = foo() # Makes a generator object, stores it in f

>>> next(f) # Pull one value from generator
'onlyvalue'

>>> next(f) # There is no other yielded value, so this hits the return; iteration over
--------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
...
StopIteration: 'returnvalue'

As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases involving generators used as coroutines (where you call .send() and .throw() on the generator instance and advance it manually with next(genobj)), the return value of a generator won't be seen.

In short, you have to pick one:

  1. Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield) and return just ends generation (while maybe hiding some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted), it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once).
  2. Don't use yield, and return works as expected (because it's not a generator function).
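
A minimal sketch contrasting the two options (the names are illustrative):

def as_generator(x):
    yield x * 2               # yield makes this a generator function

def as_function(x):
    return x * 2              # no yield: a plain function

print(as_generator(5))        # <generator object as_generator at 0x...>
print(next(as_generator(5)))  # 10 -- the value only appears once iterated
print(as_function(5))         # 10 -- the value is returned directly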

As an example of what happens to the return value in normal looping constructs, for x in gen(): effectively expands to a C-optimized version of:

__unnamed_iterator = iter(gen())
while True:
    try:
        x = next(__unnamed_iterator)
    except StopIteration:  # StopIteration caught here without inspecting it
        break  # Loop ends; the exception is cleaned up (even from sys.exc_info()) to avoid possible reference cycles

    # body of loop goes here

# Outside the loop, no StopIteration object is left

As you can see, the expanded form of the for loop has to look for a StopIteration to know the loop is over, but it doesn't use it. For anything that's not a generator, the StopIteration never carries an associated value anyway, and the for loop has no way to report one even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values iterated). Anything else that consumes the generator (e.g. calling list on it) does roughly the same thing as the for loop, ignoring the StopIteration in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object. (At the C layer there is an optimization by which most iterators never even produce a StopIteration object: they return NULL without setting an exception, which everything using the iterator protocol knows to treat as equivalent to returning NULL with a StopIteration set. So for anything but a generator, much of the time there isn't even an exception to inspect.)
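
You can verify this quickly: normal consumers simply discard the value attached to the StopIteration. A sketch:

def gen_with_return():
    yield 1
    return "hidden"               # attached to StopIteration, then ignored

print(list(gen_with_return()))    # [1] -- "hidden" is nowhere to be seen
print([x for x in gen_with_return()])  # [1]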

Python `yield from`, or return a generator?

The difference is that your first mymap is just a usual function,
in this case a factory which returns a generator. Everything
inside the body gets executed as soon as you call the function.

def gen_factory(func, seq):
    """Generator factory returning a generator."""
    # do stuff ... immediately when factory gets called
    print("build generator & return")
    return (func(*args) for args in seq)

The second mymap is also a factory, but it's also a generator itself, yielding from a self-built sub-generator inside. Because it is a generator itself, execution of the body does not start until the first invocation of next(generator).

def gen_generator(func, seq):
    """Generator yielding from sub-generator inside."""
    # do stuff ... first time when 'next' gets called
    print("build generator & yield")
    yield from (func(*args) for args in seq)

I think the following example will make it clearer.
We define data packages which shall be processed with functions,
bundled up in jobs we pass to the generators.

def add(a, b):
    return a + b

def sqrt(a):
    return a ** 0.5

data1 = [*zip(range(1, 5))]  # [(1,), (2,), (3,), (4,)]
data2 = [(2, 1), (3, 1), (4, 1), (5, 1)]

job1 = (sqrt, data1)
job2 = (add, data2)

Now we run the following code inside an interactive shell like IPython to see the different behavior. gen_factory prints immediately, while gen_generator only does so after next() is called.

gen_fac = gen_factory(*job1)
# build generator & return <-- printed immediately
next(gen_fac) # start
# Out: 1.0
[*gen_fac] # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]

gen_gen = gen_generator(*job1)
next(gen_gen) # start
# build generator & yield <-- printed with first next()
# Out: 1.0
[*gen_gen] # deplete rest of generator
# Out: [1.4142135623730951, 1.7320508075688772, 2.0]

To give you a more reasonable use case for a construct like gen_generator, we'll extend it a little and make a coroutine out of it by assigning the result of yield to a variable, so we can inject jobs into the running generator with send().

Additionally we create a helper function which will run all tasks inside a job and ask for a new one upon completion.

def gen_coroutine():
    """Generator coroutine yielding from sub-generator inside."""
    # do stuff ... first time when 'next' gets called
    print("receive job, build generator & yield, loop")
    while True:
        try:
            func, seq = yield "send me work ... or I quit with next next()"
        except TypeError:
            return "no job left"
        else:
            yield from (func(*args) for args in seq)

def do_job(gen, job):
    """Run all tasks in job."""
    print(gen.send(job))
    while True:
        result = next(gen)
        print(result)
        if result == "send me work ... or I quit with next next()":
            break

Now we run gen_coroutine with our helper function do_job and two jobs.

gen_co = gen_coroutine()
next(gen_co)  # start
# receive job, build generator & yield, loop <-- printed with first next()
# Out: 'send me work ... or I quit with next next()'
do_job(gen_co, job1)  # prints out all results from job
# 1.0
# 1.4142135623730951
# 1.7320508075688772
# 2.0
# send me work ... or I quit with next next()
do_job(gen_co, job2)  # send another job into generator
# 3
# 4
# 5
# 6
# send me work ... or I quit with next next()
next(gen_co)
# Traceback ...
# StopIteration: no job left

To come back to your question of which version is the better approach in general: IMO something like gen_factory only makes sense if you need the same thing done for multiple generators you are going to create, or in cases where the construction process for the generators is complicated enough to justify using a factory instead of building individual generators in place with a generator comprehension (see the sketch below).
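
For instance, building the generator in place with a generator expression, reusing sqrt and data1 from above:

# No factory needed; just build the generator where it's used:
results = (sqrt(*args) for args in data1)
print(list(results))  # [1.0, 1.4142135623730951, 1.7320508075688772, 2.0]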

Note:

The description above for the gen_generator function (second mymap) states "it is a generator itself". That is a bit vague and technically not really correct, but it facilitates reasoning about the differences between the functions in this tricky setup, where gen_factory also returns a generator, namely the one built by the generator comprehension inside.

In fact any function (not only those from this question with generator comprehensions inside!) with a yield inside, upon invocation, just returns a generator object, which gets constructed out of the function body.

type(gen_coroutine) # function
gen_co = gen_coroutine(); type(gen_co) # generator

So the whole action we observed above for gen_generator and gen_coroutine takes place within these generator objects, which functions with yield inside have spit out beforehand.

Return or yield from a function that calls a generator?

Generators are evaluated lazily, so return and yield from behave differently when you're debugging your code or when an exception is thrown.

With return, any exception that happens in your generator won't know anything about generate_all; that's because by the time the generator is actually executed, you have already left the generate_all function. With yield from in there, generate_all will show up in the traceback.

def generator(some_list):
    for i in some_list:
        raise Exception('exception happened :-)')
        yield i

def generate_all():
    some_list = [1,2,3]
    return generator(some_list)

for item in generate_all():
    ...

Exception                                 Traceback (most recent call last)
<ipython-input-3-b19085eab3e1> in <module>
      8     return generator(some_list)
      9
---> 10 for item in generate_all():
     11     ...

<ipython-input-3-b19085eab3e1> in generator(some_list)
      1 def generator(some_list):
      2     for i in some_list:
----> 3         raise Exception('exception happened :-)')
      4         yield i
      5

Exception: exception happened :-)

And if it's using yield from:

def generate_all():
    some_list = [1,2,3]
    yield from generator(some_list)

for item in generate_all():
    ...

Exception                                 Traceback (most recent call last)
<ipython-input-4-be322887df35> in <module>
      8     yield from generator(some_list)
      9
---> 10 for item in generate_all():
     11     ...

<ipython-input-4-be322887df35> in generate_all()
      6 def generate_all():
      7     some_list = [1,2,3]
----> 8     yield from generator(some_list)
      9
     10 for item in generate_all():

<ipython-input-4-be322887df35> in generator(some_list)
      1 def generator(some_list):
      2     for i in some_list:
----> 3         raise Exception('exception happened :-)')
      4         yield i
      5

Exception: exception happened :-)

However, this comes at the cost of performance. The additional generator layer does have some overhead, so return will generally be a bit faster than yield from ... (or for item in ...: yield item). In most cases this won't matter much, because whatever you do in the generator typically dominates the run-time, so the additional layer won't be noticeable.
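
A rough micro-benchmark sketch of that overhead (the helper names are made up, and the absolute numbers vary by machine and Python version):

import timeit

setup = """
def source():
    return range(1000)

def returns_it():
    return source()          # hands the iterable through directly

def yields_from_it():
    yield from source()      # adds one generator frame to every step
"""

print(timeit.timeit("list(returns_it())", setup=setup, number=10_000))
print(timeit.timeit("list(yields_from_it())", setup=setup, number=10_000))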

However, the yield approach has some additional advantages: you aren't restricted to a single iterable, and you can also easily yield additional items:

def generator(some_list):
    for i in some_list:
        yield i

def generate_all():
    some_list = [1,2,3]
    yield 'start'
    yield from generator(some_list)
    yield 'end'

for item in generate_all():
    print(item)

start
1
2
3
end

In your case the operations are quite simple, and I don't know if it's even necessary to create multiple functions for this; one could easily just use the built-in map or a generator expression instead:

map(do_something, get_the_list())          # map
(do_something(i) for i in get_the_list())  # generator expression

Both should be practically identical to use (except for some differences when exceptions happen). And if they need a more descriptive name, you could still wrap them in one function.

Several helpers that wrap very common operations on iterables are built in, and further ones can be found in the itertools module of the standard library. In such simple cases I would simply resort to these, and write your own generators only for the non-trivial cases.
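
For example, two such itertools helpers that often replace hand-written generators (a sketch):

from itertools import chain, islice

# chain: prepend/append items without an extra generator function
print(list(chain(['start'], [1, 2, 3], ['end'])))  # ['start', 1, 2, 3, 'end']

# islice: take only the first n items of any iterable
print(list(islice(range(10), 3)))                  # [0, 1, 2]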

But I assume your real code is more complicated, so that may not be applicable; still, the answer wouldn't be complete without mentioning alternatives.

Generator with return statement

The presence of yield in a function body turns it into a generator function instead of a normal function. In a generator function, using return is a way of saying "the generator has ended; there are no more elements." By having the first statement of a generator function be return str_in, you are guaranteed to get a generator that yields no elements.

As a comment mentions, the return value is used as an argument to the StopIteration exception that gets raised when the generator has ended. See:

>>> gen = simple_gen_function("hello", "foo")
>>> next(gen)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: hello

If there's a yield anywhere in your def, it's a generator!

In the comments, the asker mentions they thought the function turned into a generator dynamically, when the yield statement is executed. But this is not how it works! The decision is made before the code is ever executed. If Python finds a yield anywhere at all under your def, it turns that def into a generator function.

See this ultra-condensed example:

>>> def foo():
...     if False:
...         yield "bar"
...     return "baz"
...
>>> foo()
<generator object foo at ...>
>>> # The return value "baz" is only exposed via StopIteration.
>>> # You probably shouldn't use this behavior.
>>> next(foo())
Traceback (most recent call last):
...
StopIteration: baz
>>> # Nothing is ever yielded from the generator, so it generates no values.
>>> list(foo())
[]

Return and yield in the same function

Yes, it's still a generator. The return is (almost) equivalent to raising StopIteration.

PEP 255 spells it out:

Specification: Return


A generator function can also contain return statements of the form:

"return"

Note that an expression_list is not allowed on return statements in
the body of a generator (although, of course, they may appear in the
bodies of non-generator functions nested within the generator).

When a return statement is encountered, control proceeds as in any
function return, executing the appropriate finally clauses (if any
exist). Then a StopIteration exception is raised, signalling that the
iterator is exhausted. A StopIteration exception is also raised if
control flows off the end of the generator without an explicit return.

Note that return means "I'm done, and have nothing interesting to
return", for both generator functions and non-generator functions.

Note that return isn't always equivalent to raising StopIteration:
the difference lies in how enclosing try/except constructs are
treated. For example,

>>> def f1():
...     try:
...         return
...     except:
...         yield 1
>>> print list(f1())
[]

because, as in any function, return simply exits, but

>>> def f2():
...     try:
...         raise StopIteration
...     except:
...         yield 42
>>> print list(f2())
[42]

because StopIteration is captured by a bare "except", as is any
exception.
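
Note that PEP 255 predates Python 3.3, where return <value> became legal in generators (PEP 380), as covered at the top of this page. The quoted examples also use Python 2's print statement; the same behavior can be checked on Python 3 with this sketch:

def f1():
    try:
        return                # plain return: simply exits, nothing to catch
    except:
        yield 1

def f2():
    try:
        raise StopIteration   # a real exception, caught by the bare except
    except:
        yield 42

print(list(f1()))  # []
print(list(f2()))  # [42]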


