Some Help Understanding "Yield"


A method using yield return must be declared as returning one of the following interfaces (or their non-generic counterparts, IEnumerable and IEnumerator):

IEnumerable<SomethingAppropriate>
IEnumerator<SomethingAppropriate>

(thanks Jon and Marc for pointing out IEnumerator)

Example:

public IEnumerable<AClass> YourMethod()
{
    foreach (XElement header in headersXml.Root.Elements())
    {
        yield return (ParseHeader(header));
    }
}

yield is a lazy producer of data: it produces the next item only after the previous one has been retrieved, whereas returning a list returns everything in one go.

So there is a difference, and you need to declare the method correctly.

For more information, read Jon's answer here, which contains some very useful links.

Need help understanding C# yield in IEnumerable

The best way to think of it is this: when you request the first item from an IEnumerator (for example, in a foreach), it starts running through the method, and when it hits a yield return it pauses execution and hands that item back for your foreach to use. When you request the next item, it resumes the code where it left off and repeats the cycle until it encounters either yield break or the end of the method.

public IEnumerator<string> enumerateSomeStrings()
{
    yield return "one";
    yield return "two";
    var array = new[] { "three", "four" };
    foreach (var item in array)
        yield return item;
    yield return "five";
}

What does the yield keyword do?

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...     print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...     print(i)
0
1
4

Everything you can use "for... in..." on is an iterable: lists, strings, files...

These iterables are handy because you can read them as often as you wish, but all the values are stored in memory, and this is not always what you want when you have a lot of values.
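
To get a sense of that memory cost, here is a quick sketch (exact byte counts vary by platform and Python version, so only the comparison matters):

>>> import sys
>>> mylist = [x*x for x in range(100000)]
>>> sys.getsizeof(mylist) > 100000  # all 100000 references live in memory at once
True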

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
0
1
4

It is just the same except you used () instead of []. BUT, you cannot perform for i in mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and finish after calculating 4, one by one.
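
You can see the one-shot behaviour directly (a small sketch):

>>> mygenerator = (x*x for x in range(3))
>>> list(mygenerator)
[0, 1, 4]
>>> list(mygenerator)  # already exhausted: a second pass produces nothing
[]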

Yield

yield is a keyword that is used like return, except the function will return a generator.

>>> def create_generator():
...     mylist = range(3)
...     for i in mylist:
...         yield i*i
...
>>> mygenerator = create_generator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object create_generator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

Here it's a rather useless example, but it's handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object; this is a bit tricky.

Then, your code will continue from where it left off each time for uses the generator.
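
You can check this with a print inside the function body (a minimal sketch):

>>> def noisy_generator():
...     print("body is running")
...     yield 1
...
>>> gen = noisy_generator()  # nothing is printed: the body has not run yet
>>> next(gen)                # the body runs only now
body is running
1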

Now the hard part:

The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it will return the first value of the loop. Each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs to the end without hitting yield. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.
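
For example, a generator whose if is never satisfied simply comes up empty (a small sketch):

>>> def maybe_yield(flag):
...     if flag:
...         yield "value"
...
>>> list(maybe_yield(True))
['value']
>>> list(maybe_yield(False))  # the body ran to the end without hitting yield
[]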



Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty:
    # there are no more than two values, the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If the distance is ok, then you can fill in the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate to the candidate's list
    # so the loop will keep running until it has looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list, but the list expands while the loop is being iterated. It's a concise way to go through all these nested data even if it's a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually, we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code, it gets a generator, which is good because:

  1. You don't need to read the values twice.
  2. You may have a lot of children and you don't want them all stored in memory.

And it works because Python does not care whether the argument of a method is a list or not. Python expects iterables, so it will work with strings, lists, tuples, and generators! This is called duck typing, and it is one of the reasons why Python is so cool. But this is another story, for another question...
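
For instance, extend() happily consumes a generator expression (a quick sketch):

>>> a = [1, 2]
>>> a.extend(x*x for x in range(3))  # any iterable works, not just lists
>>> print(a)
[1, 2, 0, 1, 4]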

You can stop here, or read on a little to see an advanced use of generators:

Controlling a generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...     crisis = False
...     def create_atm(self):
...         while not self.crisis:
...             yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...     print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: For Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm))

It can be useful for various things like controlling access to a resource.

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map/zip without creating another list?

Then just import itertools.

An example? Let's see the possible orders of arrival for a four-horse race:

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
(1, 2, 4, 3),
(1, 3, 2, 4),
(1, 3, 4, 2),
(1, 4, 2, 3),
(1, 4, 3, 2),
(2, 1, 3, 4),
(2, 1, 4, 3),
(2, 3, 1, 4),
(2, 3, 4, 1),
(2, 4, 1, 3),
(2, 4, 3, 1),
(3, 1, 2, 4),
(3, 1, 4, 2),
(3, 2, 1, 4),
(3, 2, 4, 1),
(3, 4, 1, 2),
(3, 4, 2, 1),
(4, 1, 2, 3),
(4, 1, 3, 2),
(4, 2, 1, 3),
(4, 2, 3, 1),
(4, 3, 1, 2),
(4, 3, 2, 1)]

Understanding the inner mechanisms of iteration

Iteration is a process involving iterables (objects implementing the __iter__() method) and iterators (objects implementing the __next__() method).
Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate on iterables.
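
To make the protocol concrete, here is a minimal hand-written iterator (a sketch of the mechanism, not production code):

class CountDown:
    """Counts down from n to 1; it is its own iterator."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self              # an iterator is also an iterable
    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # tells "for" to stop
        self.n -= 1
        return self.n + 1

for x in CountDown(3):
    print(x)                     # prints 3, 2, 1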

There is more about it in this article about how for loops work.

Trouble understanding yield in C#

This is called deferred execution: yield is lazy and will only do as much work as it needs to.

This has a great many advantages, one of which is that you can create seemingly infinite enumerations:

public IEnumerable<int> InfiniteOnes()
{
    while (true)
        yield return 1;
}

Now imagine that the following:

var infiniteOnes = InfiniteOnes();

executed eagerly; you'd have a StackOverflow exception coming your way quite happily.

On the other hand, because it's lazy, you can do the following:

var infiniteOnes = InfiniteOnes();
//.... some code
foreach (var one in infiniteOnes.Take(100)) { ... }

And later,

foreach (var one in infiniteOnes.Take(10000)) { ... }

Iterator blocks will run only when they need to: when the enumeration is iterated, not before, not after.
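
For comparison, the same laziness in Python (a sketch; itertools.islice plays the role of Take here):

import itertools

def infinite_ones():
    while True:
        yield 1

ones = infinite_ones()                       # nothing runs yet
hundred = list(itertools.islice(ones, 100))  # exactly 100 values are produced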

Help understanding yield and enumerators in Ruby

The Enumerator::Yielder#yield method and the Enumerator::Yielder#<< method are exactly the same. In fact, they are aliases.

So, which one of those two you use, is 100% personal preference, just like Enumerable#collect and Enumerable#map or Enumerable#inject and Enumerable#reduce.

Understanding 'Yield' keyword in javascript?

There was no direct equivalent in older (pre-ES6) JavaScript; modern JavaScript supports generators natively via function*. However, one can fake it by returning a "generator" object. Basically, the continuation code is moved into the next() of the generator.

Consider this fib-generator example from MDN (written in the legacy SpiderMonkey generator syntax):

function fib() {
    var i = 0, j = 1;
    while (true) {
        yield i;
        var t = i;
        i = j;
        j += t;
    }
}

var g = fib();
for (var i = 0; i < 10; i++) {
    console.log(g.next());
}

And re-written using a fake generator:

function fib() {
    var i = 0, j = 1;
    return {
        'next': function () {
            var yieldRet = i;
            // These haven't occurred before the `yield` in the above generator,
            // but it makes it easier to do it in the same order here.
            // Just make sure there are no OBSERVABLE side-effects.
            var t = i;
            i = j;
            j += t;
            return yieldRet;
        }
    };
}

var g = fib();
for (var i = 0; i < 10; i++) {
    console.log(g.next());
}

Now, this does become a bit trickier with the addition of observable mutable state; the given example could still be expressed as a state machine. Note that each next can "advance" the state.

var currentNode;
function yield1 () {
    var y = { next: st0 };
    return y;
    function st0 () {
        if (currentNode) {
            y.next = st1;
            return currentNode;
        } else {
            y.next = stZ;
        }
    }
    function st1 () {
        y.next = stZ;
        currentNode = null; // observable side-effect!
    }
    function stZ () {
    }
}

var g = yield1();
currentNode = "x";
console.log(g.next()); // "x"
console.log(currentNode); // still "x"
g.next();
console.log(currentNode); // null

What is the yield keyword used for in C#?

The yield contextual keyword actually does quite a lot here.

The method returns an object that implements the IEnumerable<T> interface. When a calling function starts foreaching over this object, the method body runs until it "yields", then pauses until the next element is requested. This is syntactic sugar introduced in C# 2.0. In earlier versions you had to create your own IEnumerable and IEnumerator objects to do stuff like this.

The easiest way to understand code like this is to type in an example, set some breakpoints, and see what happens. Try stepping through this example:

public void Consumer()
{
    foreach(int i in Integers())
    {
        Console.WriteLine(i.ToString());
    }
}

public IEnumerable<int> Integers()
{
    yield return 1;
    yield return 2;
    yield return 4;
    yield return 8;
    yield return 16;
    yield return 16777216;
}

When you step through the example, you'll find that the first iteration over Integers() returns 1. The second iteration returns 2, and the line yield return 1 is not executed again.

Here is a real-life example:

public IEnumerable<T> Read<T>(string sql, Func<IDataReader, T> make, params object[] parms)
{
    using (var connection = CreateConnection())
    {
        using (var command = CreateCommand(CommandType.Text, sql, connection, parms))
        {
            command.CommandTimeout = dataBaseSettings.ReadCommandTimeout;
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    yield return make(reader);
                }
            }
        }
    }
}

Attempting to understand yield as an expression

The confusion here is that the generator expression is doing a hidden yield. Here it is in function form:

def foo():
    for x in range(10):
        yield (yield x)

When you do a .send(), what happens is that the inner yield x gets executed first, which yields x. Then the expression evaluates to the value passed to .send(), and the next yield yields that. Here it is in clearer form:

def foo():
    for x in range(10):
        sent_value = (yield x)
        yield sent_value

Thus the output is very predictable:

>>> a = foo()
#start it off
>>> a.next()
0
#execution has now paused at "sent_value = ?"
#now we fill in the "?". whatever we send here will be immediately yielded.
>>> a.send("yieldnow")
'yieldnow'
#execution is now paused at the 'yield sent_value' expression
#as this is not assigned to anything, whatever is sent now will be lost
>>> a.send("this is lost")
1
#now we're back where we were at the 'yieldnow' point of the code
>>> a.send("yieldnow")
'yieldnow'
#etc, the loop continues
>>> a.send("this is lost")
2
>>> a.send("yieldnow")
'yieldnow'
>>> a.send("this is lost")
3
>>> a.send("yieldnow")
'yieldnow'

EDIT: Example usage. By far the coolest one I've seen so far is Twisted's inlineCallbacks function. See here for an article explaining it. The gist is that it lets you yield functions to be run in threads, and once the functions are done, Twisted sends the result of the function back into your code. You can thus write code that relies heavily on threads in a very linear and intuitive manner, instead of having to write tons of little functions all over the place.

See PEP 342 for more info on the rationale for having .send work, along with potential use cases (the Twisted example I provided is an example of the boon this change offered to asynchronous I/O).

In practice, what are the main uses for the yield from syntax in Python 3.3?

Let's get one thing out of the way first. The explanation that yield from g is equivalent to for v in g: yield v does not even begin to do justice to what yield from is all about. Because, let's face it, if all yield from did was expand the for loop, it would not warrant being added to the language, and it would not have precluded a whole bunch of new features from being implemented in Python 2.x.

What yield from does is it establishes a transparent bidirectional connection between the caller and the sub-generator:

  • The connection is "transparent" in the sense that it will propagate everything correctly too, not just the elements being generated (e.g. exceptions are propagated).

  • The connection is "bidirectional" in the sense that data can be both sent from and to a generator.

(If we were talking about TCP, yield from g might mean "now temporarily disconnect my client's socket and reconnect it to this other server socket".)

BTW, if you are not sure what sending data to a generator even means, you need to drop everything and read about coroutines first—they're very useful (contrast them with subroutines), but unfortunately lesser-known in Python. Dave Beazley's Curious Course on Coroutines is an excellent start. Read slides 24-33 for a quick primer.

Reading data from a generator using yield from

def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v

wrap = reader_wrapper(reader())
for i in wrap:
    print(i)

# Result
<< 0
<< 1
<< 2
<< 3

Instead of manually iterating over reader(), we can just yield from it.

def reader_wrapper(g):
    yield from g

That works, and we eliminated one line of code. And probably the intent is a little bit clearer (or not). But nothing life changing.

Sending data to a generator (coroutine) using yield from - Part 1

Now let's do something more interesting. Let's create a coroutine called writer that accepts data sent to it and writes to a socket, fd, etc.

def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print('>> ', w)

Now the question is, how should the wrapper function handle sending data to the writer, so that any data that is sent to the wrapper is transparently sent to the writer()?

def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in range(4):
    wrap.send(i)

# Expected result
>> 0
>> 1
>> 2
>> 3

The wrapper needs to accept the data that is sent to it (obviously) and should also handle the StopIteration when the for loop is exhausted. Evidently just doing for x in coro: yield x won't do. Here is a version that works.

def writer_wrapper(coro):
    coro.send(None)  # prime the coro
    while True:
        try:
            x = (yield)   # Capture the value that's sent
            coro.send(x)  # and pass it to the writer
        except StopIteration:
            pass

Or, we could do this.

def writer_wrapper(coro):
    yield from coro

That saves six lines of code, makes it much, much more readable, and it just works. Magic!

Sending data to a generator (coroutine) using yield from - Part 2 - Exception handling

Let's make it more complicated. What if our writer needs to handle exceptions? Let's say the writer handles a SpamException and it prints *** if it encounters one.

class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print('***')
        else:
            print('>> ', w)

What if we don't change writer_wrapper? Does it work? Let's try

# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None)  # "prime" the coroutine
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)

# Expected Result
>> 0
>> 1
>> 2
***
>> 4

# Actual Result
>> 0
>> 1
>> 2
Traceback (most recent call last):
... redacted ...
File ... in writer_wrapper
x = (yield)
__main__.SpamException

Um, it's not working because x = (yield) just raises the exception and everything comes to a crashing halt. Let's make it work, by manually handling exceptions and sending or throwing them into the sub-generator (writer):

def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them"""
    coro.send(None)  # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e:  # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            pass

This works.

# Result
>> 0
>> 1
>> 2
***
>> 4

But so does this!

def writer_wrapper(coro):
    yield from coro

The yield from transparently handles sending the values and throwing exceptions into the sub-generator.

This still does not cover all the corner cases, though. What happens if the outer generator is closed? What about the case when the sub-generator returns a value (yes, in Python 3.3+ generators can return values): how should the return value be propagated? The fact that yield from transparently handles all these corner cases is really impressive; it just magically works.
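
For instance, the return-value propagation just mentioned works like this (a small sketch, Python 3.3+):

def sub():
    yield 1
    yield 2
    return "finished"            # a generator's return value (Python 3.3+)

def outer():
    result = yield from sub()    # yield from captures the sub-generator's return value
    print("sub returned:", result)

for v in outer():
    print(v)
# prints 1, then 2, then "sub returned: finished"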

I personally feel yield from is a poor keyword choice because it does not make the two-way nature apparent. Other keywords were proposed (like delegate), but they were rejected because adding a new keyword to the language is much more difficult than combining existing ones.

In summary, it's best to think of yield from as a transparent two way channel between the caller and the sub-generator.

References:

  1. PEP 380 - Syntax for Delegating to a Subgenerator (Ewing) [v3.3, 2009-02-13]
  2. PEP 342 - Coroutines via Enhanced Generators (GvR, Eby) [v2.5, 2005-05-10]

How to understand `yield from` in python coroutine?

This reminds me how nice asyncio is, and why everybody should use it...

What is happening is best explained by walking through the operation of the iterators. This is the inner generator, simplified:

def averager():
    local_var = None
    while True:
        term = yield
        if term is None:
            break
        local_var = do_stuff(term)  # do_stuff stands in for the real accumulation
    return local_var

This does two things. First, it gets some data with yield (ugh, explaining that choice of words is just confusing) as long as that data isn't None. Then, when it is None, it raises StopIteration with the value of local_var. (This is what returning from a generator does.)
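
You can see that mechanism directly (a small sketch):

>>> def g():
...     return 42
...     yield  # never reached, but its presence makes g a generator
...
>>> next(g())
Traceback (most recent call last):
  ...
StopIteration: 42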

Here is the outer generator:

def grouper(results, key):
    while True:
        results[key] = yield from averager()

What this does is expose the inner generator's yield up to the calling code, until the inner generator raises StopIteration, which is silently captured by the yield from expression, its value being assigned to results[key]. Then it gets ready to do the same thing again.

Then we have the calling code:

def main(data):
    results = {}
    for key, values in data.items():
        group = grouper(results, key)
        next(group)
        for value in values:
            group.send(value)
        group.send(None)

What this does is:

  • it iterates the outer generator exactly once
  • this exposes the inner generator's yield, and it uses that (.send) to communicate with the inner generator.
  • it 'ends' the inner generator by sending None, at which point the first yield from statement ends, and assigns the value passed up.
  • at this point, the outer generator gets ready to send another value
  • the loop moves on, and the previous generator is reclaimed by garbage collection (a runnable version of the whole pattern is sketched below).
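
For reference, here is a concrete, runnable version of the whole pattern (a sketch: the do_stuff placeholder above is assumed to accumulate a running average, which is the classic form of this example):

def averager():
    """Accumulates a running average of the values sent in; returns it at the end."""
    total = 0.0
    count = 0
    average = None
    while True:
        term = yield
        if term is None:
            break
        total += term
        count += 1
        average = total / count
    return average

def grouper(results, key):
    while True:
        results[key] = yield from averager()

results = {}
data = {'a': [1, 2, 3], 'b': [10, 20]}  # hypothetical input
for key, values in data.items():
    group = grouper(results, key)
    next(group)                # prime: run up to the inner generator's first yield
    for value in values:
        group.send(value)      # passes straight through to averager via yield from
    group.send(None)           # ends averager; its return value lands in results[key]
print(results)                 # {'a': 2.0, 'b': 15.0}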

What's with the while True: loop?

Consider this code, which also works for the outer generator:

def grouper(result, key):
    result[key] = yield from averager()
    yield 7

The only important thing is that the generator must not be exhausted before the final send, so that it doesn't pass an exception up the chain saying 'I have nothing left to iterate'.

P.S. Confused? I was. I had to check this out; it's been a while since I last tried to use generator-based coroutines. They are scheduled for removal; use asyncio, it's much nicer.


