Proper Use of 'Yield Return'

Proper use of 'yield return'

I tend to use yield-return when I calculate the next item in the list (or even the next group of items).

Using your Version 2, you must have the complete list before returning.
By using yield-return, you really only need to have the next item before returning.

Among other things, this helps spread the computational cost of complex calculations over a larger time-frame. For example, if the list is hooked up to a GUI and the user never goes to the last page, you never calculate the final items in the list.

Another case where yield-return is preferable is if the IEnumerable represents an infinite set. Consider the list of Prime Numbers, or an infinite list of random numbers. You can never return the full IEnumerable at once, so you use yield-return to return the list incrementally.

In your particular example, you have the full list of products, so I'd use Version 2.

Why use the yield keyword, when I could just use an ordinary IEnumerable?

Using yield makes the collection lazy.

Let's say you just need the first five items. Your way, I have to loop through the entire list to get the first five items. With yield, I only loop through the first five items.

Can I use using in a yield-return-method?

IEnumerator<T> implements IDisposable, and foreach loops will dispose the thing they're enumerating over when they're finished (this includes linq methods which use a foreach loop, such as .ToArray()).

It turns out that the compiler-generated state machine for generator methods implements Dispose in a smart way: if the state machine is in a state which is "inside" a using block, then calling Dispose() on the state machine will dispose the thing protected by the using statement.


Let's take an example:

public IEnumerable<string> M() {
yield return "1";
using (var ms = new MemoryStream())
{
yield return "2";
yield return "3";
}
yield return "4";
}

I'm not going to paste the entire generated state machine, as it's very large. You can see it on SharpLab here.

The core of the state machine is the following switch statement, which tracks our progress past each of the yield return statements:

switch (<>1__state)
{
default:
return false;
case 0:
<>1__state = -1;
<>2__current = "1";
<>1__state = 1;
return true;
case 1:
<>1__state = -1;
<ms>5__1 = new MemoryStream();
<>1__state = -3;
<>2__current = "2";
<>1__state = 2;
return true;
case 2:
<>1__state = -3;
<>2__current = "3";
<>1__state = 3;
return true;
case 3:
<>1__state = -3;
<>m__Finally1();
<ms>5__1 = null;
<>2__current = "4";
<>1__state = 4;
return true;
case 4:
<>1__state = -1;
return false;
}

You can see that we create the MemoryStream as we enter state 2, and dispose it (by calling <>m__Finally1()) as we exit state 3.

Here's the Dispose method:

void IDisposable.Dispose()
{
int num = <>1__state;
if (num == -3 || (uint)(num - 2) <= 1u)
{
try
{
}
finally
{
<>m__Finally1();
}
}
}

If we're in states -3, 2, or 3, then we'll call <>m__Finally1();. States 2 and 3 are those inside the using block.

(State -3 seems to be a guard in case we wrote yield return Foo() and Foo() threw an exception: in this case we would stay in state -3 and would be unable to iterate any further. However we are still allowed to dispose the MemoryStream in this case).

Just for completeness, <>m__Finally1 is defined as:

private void <>m__Finally1()
{
<>1__state = -1;
if (<ms>5__1 != null)
{
((IDisposable)<ms>5__1).Dispose();
}
}

You can find the specification for this in the C# Language Specification, section 10.14.4.3:

  • If the state of the enumerator object is suspended, invoking Dispose:

    • Changes the state to running.
    • Executes any finally blocks as if the last executed yield return statement were a yield break statement. If this causes an exception to be thrown and propagated out of the iterator body, the state of the enumerator object is set to after and the exception is propagated to the caller of the Dispose method.
    • Changes the state to after.

Does yield return have any uses other than for IEnumerable?

Your question

Does C#'s yield return have a use outside of IEnumerables

The answer is no, you can see this by the yield documentation

yield (C# Reference)

When you use the yield contextual keyword in a statement, you indicate
that the method, operator, or get accessor in which it appears is an
iterator. Using yield to define an iterator removes the need for an
explicit extra class (the class that holds the state for an
enumeration, see IEnumerator for an example) when you implement the
IEnumerable and IEnumerator pattern for a custom collection type.

Iterator methods and get accessors

The declaration of an iterator must meet the following requirements:

>>>The return type must be IEnumerable, IEnumerable, IEnumerator, or IEnumerator <<<

What is the purpose/advantage of using yield return iterators in C#?

But what if you were building a collection yourself?

In general, iterators can be used to lazily generate a sequence of objects. For example Enumerable.Range method does not have any kind of collection internally. It just generates the next number on demand. There are many uses to this lazy sequence generation using a state machine. Most of them are covered under functional programming concepts.

In my opinion, if you are looking at iterators just as a way to enumerate through a collection (it's just one of the simplest use cases), you're going the wrong way. As I said, iterators are means for returning sequences. The sequence might even be infinite. There would be no way to return a list with infinite length and use the first 100 items. It has to be lazy sometimes. Returning a collection is considerably different from returning a collection generator (which is what an iterator is). It's comparing apples to oranges.

Hypothetical example:

static IEnumerable<int> GetPrimeNumbers() {
for (int num = 2; ; ++num)
if (IsPrime(num))
yield return num;
}

static void Main() {
foreach (var i in GetPrimeNumbers())
if (i < 10000)
Console.WriteLine(i);
else
break;
}

This example prints prime numbers less than 10000. You can easily change it to print numbers less than a million without touching the prime number generation algorithm at all. In this example, you can't return a list of all prime numbers because the sequence is infinite and the consumer doesn't even know how many items it wants from the start.

When to use Yield?

The yield construct is used to create an iterator that can produce multiple values in succession:

IEnumerable<int> three_numbers() {
yield return 1;
yield return 2;
yield return 3;
}
...
foreach (var i in three_numbers()) {
// i becomes 1, 2 and 3, in turn.
}

When NOT to use yield (return)

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
if (root == null) yield break;
yield return root.Value;
foreach(T item in PreorderTraversal(root.Left))
yield return item;
foreach(T item in PreorderTraversal(root.Right))
yield return item;
}

Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then there will at most points be O(h) nested iterators built. Calling "MoveNext" on the outer iterator will then make O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, that means that the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worse case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (On implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)

But iterating a tree can be O(n) in time and O(1) in stack space. You can write this instead like:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
var stack = new Stack<Tree<T>>();
stack.Push(root);
while (stack.Count != 0)
{
var current = stack.Pop();
if (current == null) continue;
yield return current.Value;
stack.Push(current.Left);
stack.Push(current.Right);
}
}

which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.

Further reading: see Wes Dyer's article on the subject:

http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

What is the yield keyword used for in C#?

The yield contextual keyword actually does quite a lot here.

The function returns an object that implements the IEnumerable<object> interface. If a calling function starts foreaching over this object, the function is called again until it "yields". This is syntactic sugar introduced in C# 2.0. In earlier versions you had to create your own IEnumerable and IEnumerator objects to do stuff like this.

The easiest way understand code like this is to type-in an example, set some breakpoints and see what happens. Try stepping through this example:

public void Consumer()
{
foreach(int i in Integers())
{
Console.WriteLine(i.ToString());
}
}

public IEnumerable<int> Integers()
{
yield return 1;
yield return 2;
yield return 4;
yield return 8;
yield return 16;
yield return 16777216;
}

When you step through the example, you'll find the first call to Integers() returns 1. The second call returns 2 and the line yield return 1 is not executed again.

Here is a real-life example:

public IEnumerable<T> Read<T>(string sql, Func<IDataReader, T> make, params object[] parms)
{
using (var connection = CreateConnection())
{
using (var command = CreateCommand(CommandType.Text, sql, connection, parms))
{
command.CommandTimeout = dataBaseSettings.ReadCommandTimeout;
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
yield return make(reader);
}
}
}
}
}


Related Topics



Leave a reply



Submit