Yield Statement Implementation

yield works by having the compiler build a state machine internally. The state machine stores the current state of the routine when it yields and resumes from that state the next time it is called.

You can use a decompiler such as Reflector to see how the compiler implements it.

yield break is used when you want to stop returning results. If you don't write a yield break, the compiler assumes one at the end of the function (just like a return; statement in a normal method).
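
For comparison, Python's closest analogue of yield break is a bare return inside a generator function. Here is a minimal sketch (the first_n helper is hypothetical, not from any of the answers below):

def first_n(items, n):
    """Yield at most n items, then end the generator early."""
    count = 0
    for item in items:
        if count >= n:
            return  # plays the role of C#'s yield break
        yield item
        count += 1

print(list(first_n(range(10), 3)))  # [0, 1, 2]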

What does the yield keyword do in Python?

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...     print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and so an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...     print(i)
0
1
4

Everything you can use "for... in..." on is an iterable: lists, strings, files...

These iterables are handy because you can read them as often as you wish, but all the values are stored in memory, and that is not always what you want when you have a lot of values.

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
0
1
4

It is just the same, except you used () instead of []. BUT you cannot perform for i in mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and finally calculate 4, one by one.
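
A quick way to see this one-shot behaviour (a minimal sketch): a second pass over the same generator simply produces nothing, because it is already exhausted:

>>> mygenerator = (x*x for x in range(3))
>>> list(mygenerator)
[0, 1, 4]
>>> list(mygenerator)
[]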

Yield

yield is a keyword that is used like return, except the function will return a generator.

>>> def create_generator():
...     mylist = range(3)
...     for i in mylist:
...         yield i*i
...
>>> mygenerator = create_generator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object create_generator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

This example is not useful in itself, but yield is handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object. This is a bit tricky.

Then, your code will continue from where it left off each time for uses the generator.

Now the hard part:

The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it'll return the first value of the loop. Then, each subsequent call will run another iteration of the loop you have written in the function and return the next value. This will continue until the generator is considered empty, which happens when the function runs without hitting yield. That can be because the loop has come to an end, or because you no longer satisfy an "if/else".
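
You can watch this pausing and resuming directly by driving the generator with next() instead of a for loop (a small sketch of the mechanics just described):

>>> gen = create_generator()
>>> next(gen)   # runs the body until the first yield
0
>>> next(gen)   # resumes right after the yield and runs one more loop iteration
1
>>> next(gen)
4
>>> next(gen)   # the loop ends without hitting yield: the generator is empty
Traceback (most recent call last):
  ...
StopIteration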



Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if the distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if the distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there are no more than two values: the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If the distance is ok, then you can fill in the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate to the candidate's list
    # so the loop will keep running until it has looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

This code contains several smart parts:

  • The loop iterates on a list, but the list expands while the loop is being iterated. It's a concise way to go through all these nested data even if it's a bit dangerous since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects which will produce different values from the previous ones since it's not applied on the same node.

  • The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually, we pass a list to it:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

But in your code, it gets a generator, which is good because:

  1. You don't need to read the values twice.
  2. You may have a lot of children and you don't want them all stored in memory.

And it works because Python does not care if the argument of a method is a list or not. Python expects iterables so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But this is another story, for another question...
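
For instance, extend() happily consumes a generator expression directly; a quick sketch of that duck typing at work:

>>> a = [1, 2]
>>> a.extend(x*x for x in range(3))
>>> print(a)
[1, 2, 0, 1, 4]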

You can stop here, or read on a little to see an advanced use of generators:

Controlling generator exhaustion

>>> class Bank(): # Let's create a bank, building ATMs
...     crisis = False
...     def create_atm(self):
...         while not self.crisis:
...             yield "$100"
>>> hsbc = Bank() # When everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # Crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # It's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # The trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # Build a new one to get back in business
>>> for cash in brand_new_atm:
...     print cash
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: For Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm)), and print(cash) instead of print cash.

It can be useful for various things like controlling access to a resource.

Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator?
Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.
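
Here is a quick sketch of two of those one-liners: itertools.tee() "duplicates" an iterator, and itertools.chain() concatenates several:

>>> import itertools
>>> g = (x*x for x in range(3))
>>> g1, g2 = itertools.tee(g)
>>> list(itertools.chain(g1, g2))
[0, 1, 4, 0, 1, 4]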

An example? Let's see the possible orders of arrival for a four-horse race:

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
(1, 2, 4, 3),
(1, 3, 2, 4),
(1, 3, 4, 2),
(1, 4, 2, 3),
(1, 4, 3, 2),
(2, 1, 3, 4),
(2, 1, 4, 3),
(2, 3, 1, 4),
(2, 3, 4, 1),
(2, 4, 1, 3),
(2, 4, 3, 1),
(3, 1, 2, 4),
(3, 1, 4, 2),
(3, 2, 1, 4),
(3, 2, 4, 1),
(3, 4, 1, 2),
(3, 4, 2, 1),
(4, 1, 2, 3),
(4, 1, 3, 2),
(4, 2, 1, 3),
(4, 2, 3, 1),
(4, 3, 1, 2),
(4, 3, 2, 1)]

Understanding the inner mechanisms of iteration

Iteration is a process involving iterables (objects implementing the __iter__() method) and iterators (objects implementing the __next__() method).
Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.
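
Concretely (a minimal sketch of the protocol): iter() asks an iterable for an iterator, and next() advances that iterator:

>>> mylist = [1, 2, 3]
>>> it = iter(mylist)   # calls mylist.__iter__()
>>> next(it)            # calls it.__next__()
1
>>> next(it)
2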

There is more about it in this article about how for loops work.

Example Python implementation of generator/yield

You would have to implement the iterator protocol yourself.

class InfiniteSequence:
    def __init__(self):
        self.x = 0

    def __iter__(self):
        return self

    def __next__(self):
        x = self.x
        self.x = x + 1
        return x

# Note: this loop never terminates, because the sequence is infinite
for i in InfiniteSequence():
    print(i)

The __iter__ method has to return something that implements __next__. The __next__ method has to return the next value in the sequence. The instance attribute maintains state between calls to __next__.

The same class doesn't have to implement both methods. You might want a separate iterator class so that you can have multiple, independent iterators over the iterable sequence.

class InfiniteSequence:
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return InfSeqIterator(self.start)

class InfSeqIterator:
    def __init__(self, x):
        self.x = x

    def __iter__(self):
        return self

    def __next__(self):
        x = self.x
        self.x += 1
        return x

nats = InfiniteSequence(0)

i1 = iter(nats)
i2 = iter(nats)

assert next(i1) == 0
assert next(i2) == 0  # not 1

generator is just one example of a class that implements this protocol; you create an instance of generator using either a generator expression ((x for x in [1,2,3])) or a generator function (i.e., a function that uses yield).
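
Both forms produce objects that already implement this protocol; in particular, a generator is its own iterator. A quick sketch:

g = (x for x in [1, 2, 3])   # generator expression

def gen():                   # generator function
    for x in [1, 2, 3]:
        yield x

assert iter(g) is g          # a generator's __iter__ returns itself
assert next(g) == 1          # next() calls its __next__
assert next(gen()) == 1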

Algorithm for implementing C# yield statement

The particular code sample you are looking at involves a series of transformations.
Please note that this is an approximate description of the algorithm. The actual names used by the compiler and the exact code it generates may be different. The idea is the same, however.

The first transformation is the "foreach" transformation, which transforms this code:

foreach (var x in y)
{
    //body
}

into this code:

var enumerator = y.GetEnumerator();
try
{
    while (enumerator.MoveNext())
    {
        var x = enumerator.Current;
        //body
    }
}
finally
{
    if (enumerator != null)
    {
        enumerator.Dispose();
    }
}

The second transformation finds all the yield return statements in the function body, assigns a number to each (a state value), and creates a "goto label" right after the yield.

The third transformation lifts all the local variables and function arguments in the method body into an object called a closure.

Given the code in your example, that would look similar to this:

class ClosureEnumerable : IEnumerable<string>
{
    private IEnumerable<string> args;
    private ClassType originalThis;

    public ClosureEnumerable(ClassType origThis, IEnumerable<string> args)
    {
        this.args = args;
        this.originalThis = origThis;
    }

    public IEnumerator<string> GetEnumerator()
    {
        return new Closure(originalThis, args);
    }
}

class Closure : IEnumerator<string>
{
    public Closure(ClassType originalThis, IEnumerable<string> args)
    {
        state = 0;
        this.args = args;
        this.originalThis = originalThis;
    }

    private IEnumerable<string> args;
    private IEnumerator<string> enumerator2;
    private IEnumerator<string> argEnumerator;

    //- Here ClassType is the type of the object that contained the method.
    //  This may be optimized away if the method does not access any
    //  class members.
    private ClassType originalThis;

    //This holds the state value.
    private int state;

    //The current value to return
    private string currentValue;

    public string Current
    {
        get
        {
            return currentValue;
        }
    }
}

The method body is then moved from the original method to a method inside "Closure" called MoveNext, which returns a bool and implements IEnumerator.MoveNext.
Any access to any locals is routed through "this", and any access to any class members is routed through this.originalThis.

Any "yield return expr" is translated into:

currentValue = expr;
state = //the state number of the yield statement;
return true;

Any yield break statement is translated into:

state = -1;
return false;

There is an "implicit" yield break statement at the end of the function.
A switch statement is then introduced at the beginning of the procedure that looks at the state number and jumps to the associated label.

The original method is then translated into something like this:

IEnumerable<string> strings(IEnumerable<string> args)
{
    return new ClosureEnumerable(this, args);
}

The fact that the state of the method is all pushed into an object and that the MoveNext method uses a switch statement / state variable is what allows the iterator to behave as if control is being passed back to the point immediately after the last "yield return" statement the next time "MoveNext" is called.
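
To make the transformation concrete, here is a rough, hand-written Python analogue of the state machine the compiler generates for a two-yield iterator (the names and structure are illustrative, not actual compiler output):

class TwoStrings:
    """Mimics the generated closure for:
    yield return "first"; yield return "second";"""
    def __init__(self):
        self.state = 0
        self.current = None

    def move_next(self):
        # The "switch" on the state number, resuming just after
        # the last yield return
        if self.state == 0:
            self.current = "first"    # yield return "first";
            self.state = 1
            return True
        if self.state == 1:
            self.current = "second"   # yield return "second";
            self.state = -1
            return True
        return False                  # the implicit yield break

it = TwoStrings()
while it.move_next():
    print(it.current)   # prints "first", then "second"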

It is important to point out, however, that the transformation used by the C# compiler is not the best way to do this. It suffers from poor performance when trying to use "yield" with recursive algorithms. There is a good paper that outlines a better way to do this here:

http://research.microsoft.com/en-us/projects/specsharp/iterators.pdf

It's worth a read if you haven't read it yet.

Yield statement's effect on program flow

The foreach will iterate over the IEnumerable returned from ComputePower. "yield return" automatically creates an implementation of IEnumerable, so you don't have to hand-roll it. If you put a breakpoint inside your while loop, you will see it being hit on each iteration.

From MSDN:

You consume an iterator method by using a foreach statement or LINQ query. Each iteration of the foreach loop calls the iterator method. When a yield return statement is reached in the iterator method, expression is returned, and the current location in code is retained. Execution is restarted from that location the next time that the iterator function is called.

Is the 'yield' keyword syntactic sugar? What is its implementation?

The generated code depends on the original, but generally speaking a state machine gets generated which keeps track of the current state of the collection.

See yield statement implementation, this answer by Eric Lippert and this blog post by Jon Skeet.

Implementing yield in C

You can do so, even in portable C. Here is a crude example that works (gcc -Wall gen.c). Bear in mind that jumping back into a function after it has already returned, as YIELD does below, is technically undefined behavior, so treat this as a sketch rather than production code:

#include <stdbool.h>
#include <stdio.h>
#include <setjmp.h>

#define YIELD(func, n) if (! setjmp(func##_gen_jmp)) { \
    func##_ret = n; \
    longjmp(func##_caller_jmp, 1); \
}

#define GENERATOR(ret, func, argt, argv) \
static jmp_buf func##_caller_jmp; \
static jmp_buf func##_gen_jmp; \
static bool func##_continue = false; \
static ret func##_ret; \
\
void func##__real(argt argv); \
\
ret func(argt argv) { \
    if (!func##_continue) { \
        func##_continue = true; \
        if (! setjmp(func##_caller_jmp)) { \
            func##__real(argv); \
        } else { \
            return func##_ret; \
        } \
    } \
    else { \
        longjmp(func##_gen_jmp, 1); \
    } \
    return 0; \
} \
void func##__real(argt argv)

GENERATOR(int, getNext, int, n) {
    static int counter;

    counter = n;
    while (true) {
        counter = counter + 1;
        YIELD(getNext, counter);
    }
}

int main() {
    while (true) {
        int n = getNext(1);
        if (n > 42)
            break;
        printf("%d\n", n);
    }
    return 0;
}

Is it possible to implement Python yield functionality in freestanding C?

I'm going to answer using setjmp and longjmp since those interfaces are standard and you can easily find their implementations for any HW platform. They are freestanding, but HW dependent.

#include <setjmp.h>

struct _yield_state {
    jmp_buf buf;
    _Bool yielded;
};

#define yieldable static struct _yield_state _state; \
    if (_state.yielded) longjmp(_state.buf, 1); else {}

#define yield(x) if (setjmp(_state.buf)) { _state.yielded = false; } \
    else { _state.yielded = true; return x; }

int func(int a, int b)
{
    yieldable;

    if (a > b)
        yield(0);

    return a + b;
}

You can find an example setjmp and longjmp implementation here. It is pure assembly, specific only to the underlying hardware.

What is the yield keyword used for in C#?

The yield contextual keyword actually does quite a lot here.

The function returns an object that implements the IEnumerable<object> interface. If a calling function starts foreaching over this object, the function is called again until it "yields". This is syntactic sugar introduced in C# 2.0. In earlier versions you had to create your own IEnumerable and IEnumerator objects to do stuff like this.

The easiest way to understand code like this is to type in an example, set some breakpoints, and see what happens. Try stepping through this example:

public void Consumer()
{
    foreach (int i in Integers())
    {
        Console.WriteLine(i.ToString());
    }
}

public IEnumerable<int> Integers()
{
    yield return 1;
    yield return 2;
    yield return 4;
    yield return 8;
    yield return 16;
    yield return 16777216;
}

When you step through the example, you'll find that the first iteration over Integers() returns 1. The second iteration returns 2, and the line yield return 1 is not executed again.

Here is a real-life example:

public IEnumerable<T> Read<T>(string sql, Func<IDataReader, T> make, params object[] parms)
{
    using (var connection = CreateConnection())
    {
        using (var command = CreateCommand(CommandType.Text, sql, connection, parms))
        {
            command.CommandTimeout = dataBaseSettings.ReadCommandTimeout;
            using (var reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    yield return make(reader);
                }
            }
        }
    }
}

In practice, what are the main uses for the yield from syntax in Python 3.3?

Let's get one thing out of the way first. The explanation that yield from g is equivalent to for v in g: yield v does not even begin to do justice to what yield from is all about. Because, let's face it, if all yield from did was expand the for loop, it would not warrant adding the construct to the language, nor would its absence have precluded a whole bunch of new features from being implemented in Python 2.x.

What yield from does is it establishes a transparent bidirectional connection between the caller and the sub-generator:

  • The connection is "transparent" in the sense that it will propagate everything correctly too, not just the elements being generated (e.g. exceptions are propagated).

  • The connection is "bidirectional" in the sense that data can be both sent from and to a generator.

(If we were talking about TCP, yield from g might mean "now temporarily disconnect my client's socket and reconnect it to this other server socket".)

BTW, if you are not sure what sending data to a generator even means, you need to drop everything and read about coroutines first—they're very useful (contrast them with subroutines), but unfortunately lesser-known in Python. Dave Beazley's Curious Course on Coroutines is an excellent start. Read slides 24-33 for a quick primer.
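
As a one-minute primer (a minimal sketch): a bare (yield) expression is where values passed in via send() show up inside a coroutine:

def echo():
    while True:
        received = (yield)      # send(value) resumes here, delivering that value
        print('got:', received)

e = echo()
next(e)          # advance to the first (yield): "priming" the coroutine
e.send('hello')  # prints: got: hello
e.send(42)       # prints: got: 42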

Reading data from a generator using yield from

def reader():
    """A generator that fakes a read from a file, socket, etc."""
    for i in range(4):
        yield '<< %s' % i

def reader_wrapper(g):
    # Manually iterate over data produced by reader
    for v in g:
        yield v

wrap = reader_wrapper(reader())
for i in wrap:
    print(i)

# Result
<< 0
<< 1
<< 2
<< 3

Instead of manually iterating over reader(), we can just yield from it.

def reader_wrapper(g):
    yield from g

That works, and we eliminated one line of code. And probably the intent is a little bit clearer (or not). But nothing life changing.

Sending data to a generator (coroutine) using yield from - Part 1

Now let's do something more interesting. Let's create a coroutine called writer that accepts data sent to it and writes to a socket, fd, etc.

def writer():
    """A coroutine that writes data *sent* to it to fd, socket, etc."""
    while True:
        w = (yield)
        print('>> ', w)

Now the question is, how should the wrapper function handle sending data to the writer, so that any data that is sent to the wrapper is transparently sent to the writer()?

def writer_wrapper(coro):
    # TBD
    pass

w = writer()
wrap = writer_wrapper(w)
wrap.send(None) # "prime" the coroutine
for i in range(4):
    wrap.send(i)

# Expected result
>> 0
>> 1
>> 2
>> 3

The wrapper needs to accept the data that is sent to it (obviously) and should also handle the StopIteration when the for loop is exhausted. Evidently just doing for x in coro: yield x won't do. Here is a version that works.

def writer_wrapper(coro):
    coro.send(None) # prime the coro
    while True:
        try:
            x = (yield) # Capture the value that's sent
            coro.send(x) # and pass it to the writer
        except StopIteration:
            pass

Or, we could do this.

def writer_wrapper(coro):
    yield from coro

That saves six lines of code, makes it much more readable, and it just works. Magic!

Sending data to a generator using yield from - Part 2 - Exception handling

Let's make it more complicated. What if our writer needs to handle exceptions? Let's say the writer handles a SpamException and it prints *** if it encounters one.

class SpamException(Exception):
    pass

def writer():
    while True:
        try:
            w = (yield)
        except SpamException:
            print('***')
        else:
            print('>> ', w)

What if we don't change writer_wrapper? Does it work? Let's try

# writer_wrapper same as above

w = writer()
wrap = writer_wrapper(w)
wrap.send(None) # "prime" the coroutine
for i in [0, 1, 2, 'spam', 4]:
    if i == 'spam':
        wrap.throw(SpamException)
    else:
        wrap.send(i)

# Expected Result
>> 0
>> 1
>> 2
***
>> 4

# Actual Result
>> 0
>> 1
>> 2
Traceback (most recent call last):
... redacted ...
File ... in writer_wrapper
x = (yield)
__main__.SpamException

Um, it's not working, because x = (yield) just raises the exception and everything comes to a crashing halt. Let's make it work by manually catching exceptions and sending or throwing them into the sub-generator (writer).

def writer_wrapper(coro):
    """Works. Manually catches exceptions and throws them"""
    coro.send(None) # prime the coro
    while True:
        try:
            try:
                x = (yield)
            except Exception as e: # This catches the SpamException
                coro.throw(e)
            else:
                coro.send(x)
        except StopIteration:
            pass

This works.

# Result
>> 0
>> 1
>> 2
***
>> 4

But so does this!

def writer_wrapper(coro):
    yield from coro

yield from transparently handles both sending values and throwing exceptions into the sub-generator.

This still does not cover all the corner cases though. What happens if the outer generator is closed? What about the case where the sub-generator returns a value (yes, in Python 3.3+, generators can return values): how should the return value be propagated? That yield from transparently handles all of these corner cases is really impressive; it just works.
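
The return-value case, as a brief sketch: in Python 3.3+, the value a sub-generator returns becomes the value of the yield from expression itself:

def sub():
    yield 1
    yield 2
    return 'done'   # wrapped in StopIteration('done') under the hood

def delegator():
    result = yield from sub()   # receives the sub-generator's return value
    print('sub returned:', result)

for v in delegator():
    print(v)

# Output:
# 1
# 2
# sub returned: done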

I personally feel yield from is a poor keyword choice because it does not make the two-way nature apparent. There were other keywords proposed (like delegate), but they were rejected because adding a new keyword to the language is much more difficult than combining existing ones.

In summary, it's best to think of yield from as a transparent two way channel between the caller and the sub-generator.

References:

  1. PEP 380 - Syntax for Delegating to a Subgenerator (Ewing) [v3.3, 2009-02-13]
  2. PEP 342 - Coroutines via Enhanced Generators (GvR, Eby) [v2.5, 2005-05-10]

