Performance of Nested Yield in a Tree

It's certainly not ideal in terms of performance - you end up creating a lot of iterators for large trees, instead of a single iterator which knows how to traverse efficiently.

Some blog entries concerning this:

  • Wes Dyer: All about iterators
  • Eric Lippert: Immutability in C#, part 6
  • Eric again: Immutability in C#, part 7

It's worth noting that F# has the equivalent of the proposed "yield foreach" with "yield!".

Get children of a Tree-Structure recursively by using yield return

To be recursive, your method should take a parameter (otherwise the recursion will never stop):

public IEnumerable<Member> GetMembers(Group group) {
    foreach (var member in group.MemberCollection) {
        yield return member;
    }
    foreach (var subGroup in group.GroupCollection) {
        foreach (var member in GetMembers(subGroup)) {
            yield return member;
        }
    }
}

However, recursive iterator blocks tend to be quite inefficient for deeply nested hierarchies. A better approach is to use iteration rather than recursion. To get a depth-first traversal as in the recursive approach, you can do this:

public IEnumerable<Member> GetMembers() {
    var stack = new Stack<Group>();
    stack.Push(this);
    while (stack.Count > 0) {
        var group = stack.Pop();
        foreach (var member in group.MemberCollection) {
            yield return member;
        }
        foreach (var subGroup in group.GroupCollection) {
            stack.Push(subGroup);
        }
    }
}

To get a breadth-first traversal, use a queue instead of a stack.
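For instance, here is a minimal sketch of the breadth-first variant, reusing the same Group/Member shape as above (the method name GetMembersBreadthFirst is just illustrative):

public IEnumerable<Member> GetMembersBreadthFirst() {
    // Same traversal as the stack-based version, but a FIFO queue
    // yields all members of one level before descending to the next.
    var queue = new Queue<Group>();
    queue.Enqueue(this);
    while (queue.Count > 0) {
        var group = queue.Dequeue();
        foreach (var member in group.MemberCollection) {
            yield return member;
        }
        foreach (var subGroup in group.GroupCollection) {
            queue.Enqueue(subGroup);
        }
    }
}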

Overhead involved in nested sequences in F#

I thought the call to Seq.collect might be what generates an IEnumerable per call, so I replaced it with a for loop, but the decompiled code (via ILSpy) came out the same.

That made me curious, so I decompiled a simpler sequence expression I'd expect to be "flattened":

let rec infSeq n =
    seq {
        yield n
        yield! infSeq (n+1)
    }

The code to generate the next element in the sequence decompiles to this:

public override int GenerateNext(ref IEnumerable<int> next)
{
    switch (this.pc)
    {
        case 1:
            this.pc = 2;
            next = Test.infSeq(this.n + 1);
            return 2;
        case 2:
            this.pc = 3;
            break;
        case 3:
            break;
        default:
            this.pc = 1;
            this.current = this.n;
            return 1;
    }
    this.current = 0;
    return 0;
}

As you can see, it calls itself recursively, generating a fresh IEnumerable each time. A quick test in FSI

infSeq 0 |> Seq.take 10000000 |> Seq.length

you can see there's a lot of GC:

> Real: 00:00:01.759, CPU: 00:00:01.762, GC gen0: 108, gen1: 107, gen2: 1

Compared to the C# version

public static IEnumerable<int> InfSeq(int n) {
    while (true) yield return n++;
}

in FSI:

> Real: 00:00:00.991, CPU: 00:00:00.998, GC gen0: 0, gen1: 0, gen2: 0

It's faster and uses constant memory (no extra IEnumerables).

I thought F# would generate a single IEnumerable for yield! in a tail position, but apparently not.

EDIT

The spec confirms this: {| yield! expr |} is elaborated as expr, that is, subsequences (recursive or otherwise) are not merged into a single IEnumerable.

Performance issue on query over nested recursive objects

You can use a simple loop instead of recursion:

public IEnumerable<long> GetAllParentIdsOf(MyRecursiveObject obj)
{
    MyRecursiveObject child = obj;

    while (child.Parent != null)
    {
        child = child.Parent;
        yield return child.Id;
    }
}

Sample:

MyRecursiveObject obj = new MyRecursiveObject {
    Id = 1,
    Parent = new MyRecursiveObject {
        Id = 2,
        Parent = new MyRecursiveObject { Id = 3 }
    }
};

GetAllParentIdsOf(obj).ToList().ForEach(Console.WriteLine);

// 2
// 3

Nested vs. Chained Unions

While dasblinkenlight is right that the items are each iterated exactly once, the three versions may still have measurable performance differences, depending on your objects.

The items will be inserted into a different number of HashSets, depending on how far down the Union tree they are.

While inserting into a HashSet is nominally O(1), it does have a cost, and in practice that cost is not always constant, depending on the details of your objects.

When an item is inserted into a HashSet, GetHashCode is called, and the item must be compared using Equals against any objects already in the set that have the same hash code. For extremely complex objects, GetHashCode may be expensive. If the items' hash codes are not widely distributed, Equals may be called often, which may also be expensive.

The following demo, based on @dasblinkenlight's answer, shows GetHashCode being called a different number of times depending on the Union ordering. I have not demoed Equals being called in the case of hash collisions, but you can try that out if you desire.

using System;
using System.Linq;
using System.Collections;
using System.Collections.Generic;

public class Test {
    public static void Main() {
        var A = new VisibleIterator("A");
        var B = new VisibleIterator("B");
        var C = new VisibleIterator("C");
        var D = new VisibleIterator("D");
        Console.WriteLine("--- A.Union(B).Union(C).Union(D)");
        var foo = (A.Union(B).Union(C).Union(D)).ToList();
        Console.WriteLine("--- A.Union(B.Union(C.Union(D)))");
        var bar = (A.Union(B.Union(C.Union(D)))).ToList();
        Console.WriteLine("--- D.Union(C.Union(B.Union(A)))");
        var baz = (D.Union(C.Union(B.Union(A)))).ToList();
    }
}

class VisibleIterator : IEnumerable<VisibleHasher> {
    private readonly string name;

    public VisibleIterator(string name) {
        this.name = name;
    }

    public IEnumerator<VisibleHasher> GetEnumerator() {
        for (var i = 0; i != 4; i++) {
            var res = name + i;
            Console.WriteLine("Iterating " + res);
            yield return new VisibleHasher(res);
        }
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

class VisibleHasher {
    private readonly string val;

    public VisibleHasher(String val) {
        this.val = val;
    }

    public override int GetHashCode() {
        Console.WriteLine("Hashing '" + val + "'");
        return val.GetHashCode();
    }
}


Alternative Approach

If you think the costs of these hash insertions may be significant, then the following should guarantee one hash insertion per item:

A.Concat(B).Concat(C).Concat(D).Distinct().ToList()
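(Concat performs no hashing at all, and Distinct builds a single set internally, so each element's GetHashCode is called only once, when it reaches that one set.)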

When NOT to use yield (return)

What are the cases where use of yield will be limiting, unnecessary, get me into trouble, or otherwise should be avoided?

It's a good idea to think carefully about your use of "yield return" when dealing with recursively defined structures. For example, I often see this:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    if (root == null) yield break;
    yield return root.Value;
    foreach (T item in PreorderTraversal(root.Left))
        yield return item;
    foreach (T item in PreorderTraversal(root.Right))
        yield return item;
}

Perfectly sensible-looking code, but it has performance problems. Suppose the tree is h deep. Then there will at most points be O(h) nested iterators built. Calling "MoveNext" on the outer iterator will then make O(h) nested calls to MoveNext. Since it does this O(n) times for a tree with n items, that makes the algorithm O(hn). And since the height of a binary tree is lg n <= h <= n, that means that the algorithm is at best O(n lg n) and at worst O(n^2) in time, and best case O(lg n) and worst case O(n) in stack space. It is O(h) in heap space because each enumerator is allocated on the heap. (On implementations of C# I'm aware of; a conforming implementation might have other stack or heap space characteristics.)

But iterating a tree can be done in O(n) time and O(1) stack space. You can write it like this instead:

public static IEnumerable<T> PreorderTraversal<T>(Tree<T> root)
{
    var stack = new Stack<Tree<T>>();
    stack.Push(root);
    while (stack.Count != 0)
    {
        var current = stack.Pop();
        if (current == null) continue;
        yield return current.Value;
        // Push Right first so that Left is popped (and yielded) first,
        // matching the order of the recursive version.
        stack.Push(current.Right);
        stack.Push(current.Left);
    }
}

which still uses yield return, but is much smarter about it. Now we are O(n) in time and O(h) in heap space, and O(1) in stack space.

Further reading: see Wes Dyer's article on the subject:

http://blogs.msdn.com/b/wesdyer/archive/2007/03/23/all-about-iterators.aspx

Why is using a Python generator much slower to traverse binary tree than not?

On PyPy, function calls are much more highly optimized than generators or iterators.

There are many things that have different performance characteristics in PyPy (for example, PyPy's itertools.islice() performs abysmally).

You're doing the right thing by measuring the performance to see which way is fastest.

Also note PyPy has tools to show the code that is generated so you get a more detailed answer to the question "what does it do". Of course, the question of "why does it do that" has a human component in the answer that involves what was convenient to implement or the proclivities of the implementers.

Yield Return Many?

No, there isn't, unless you completely replace every yield return with a single return statement using LINQ.

For example:

return someSet
    .Concat(someOtherSet.SelectMany(s => FindSingle(context, s)));
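For context, here is a hedged sketch of what a fully LINQ-based recursive search might look like, with no yield return at all. The Node type, Children property, and Find method are made up for illustration; they are not the poster's actual types:

using System.Collections.Generic;
using System.Linq;

public class Node {
    // Hypothetical tree node, for illustration only.
    public string Value { get; set; }
    public List<Node> Children { get; } = new List<Node>();
}

public static class TreeSearch {
    // A single return statement: the node's own value concatenated with
    // whatever the recursive calls on its children produce.
    public static IEnumerable<string> Find(Node node) {
        return new[] { node.Value }
            .Concat(node.Children.SelectMany(Find));
    }
}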

Recursion applied to lists vs trees in Python

Firstly, bear in mind that a language is not the same as the implementation of a language. The call stack size in Python varies by implementation. The limit of 1000 applies to CPython but not necessarily all implementations.



Is it reasonable to use recursion to process balanced trees?

Recursion is reasonable for balanced trees. As you've described quite clearly, the maximum call stack depth is a logarithmic factor on the size of the structure. You'd need a huge input to blow the stack.

With a list or degenerate unbalanced tree, recursion is inappropriate because the maximum call stack depth is linear on the structure length.

That said, I don't find it necessary to use self-balancing trees often in Python. Most work is done on lists and dicts, occasionally nested and recursively defined. The fact that recursion happens to be a good fit for structures like trees doesn't imply that it's widely applicable in general in Python.



Is it true that for most problems it is just not an option (lists, non-balanced trees, etc)?

Yes, but recursion is inhibited by design in CPython. Python is not a functional language. CPython doesn't support tail call optimization and "never" will.

Guido has a great blog post that defends CPython's anti-recursion design. Regardless of whether you agree with this design or not, tools should generally be used as intended, not as one wishes them to be.

In a pinch, Python (as a multi-paradigm language) can support code written in a functional style and there are workarounds to make long recursion work, but that doesn't change the fact that they're workarounds.



Anything else that I should take into account here? I am just trying to learn more about when recursion is the good option in general when using Python.

Function calls in CPython have more overhead (time and space) than iteration, so there are performance issues to consider even when the structure size fits in the call stack or you use a trick to support deep recursion.

Using setrecursionlimit is unsafe and almost never necessary. Such an option is not available in most languages. Increasing it means you're running the risk of the operating system killing the CPython interpreter. Sure, fiddling with the limit can come in handy for quick scripts and debugging, but it's not a general solution.

The recursion tag is flooded with questions from well-meaning CS students tasked with solving problems with recursion similar to your example that blows the stack using Python and other non-functional languages. The quantity of these posts and the confusion on the part of the students suggests that the CS education system has a role in pushing recursion as the default option for iteration. Recursion is only as elegant as the extent to which the language was designed for it. The fact that recursion tends to be confusing to students is more a symptom of the misapplication of it to imperative languages where recursion is a second-class citizen after loops than anything inherent in recursion itself.

You'll probably never need recursion in Python other than school assignments, algorithm coding challenges and the rare practical business application use. But if you have a hammer, everything looks like a nail, so this isn't to say you can't apply recursion to each and every problem you see, it's that you probably shouldn't.

Recursive thinking is very valuable, and this isn't an attack on recursion as a tool in general, just recognizing that Python is particularly ill-suited for using it (as are most other imperative languages).


Unrelated to recursion, and I understand your code is just a demonstration, but max is a built-in function, so I'd pick a different name. Taken alone, max(list(range(999))) looks like it should work.

Python recursive generators performance

rec_subsets() is still faster (for range(20)) even if result.append(s) is added in place of # do something with s and the results of both gen_subsets() and rec_subsets() are consumed.

It might be explained by the following quote from PEP 380 (yield from syntax support):

Using a specialised syntax opens up possibilities for optimisation
when there is a long chain of generators. Such chains can arise, for
instance, when recursively traversing a tree structure. The overhead
of passing __next__() calls and yielded values down and up the chain
can cause what ought to be an O(n) operation to become, in the worst
case, O(n**2).

You could generate a powerset using itertools.combinations():

from itertools import combinations

def subsets_comb(lst):
    return (comb for r in range(len(lst)+1) for comb in combinations(lst, r))

It is faster for range(20) on my machine:

name                 time      ratio  comment
subsets_comb         227 msec   1.00  [range(0, 20)]
subsets_ipowerset    476 msec   2.10  [range(0, 20)]
subsets_rec          957 msec   4.22  [range(0, 20)]
subsets_gen_pep380   2.34 sec  10.29  [range(0, 20)]
subsets_gen          2.63 sec  11.59  [range(0, 20)]
To reproduce the results, run time-subsets.py.


