How Do Closures Work Behind the Scenes? (C#)

The compiler (as opposed to the runtime) generates a hidden class. The lambda or anonymous method becomes a method on that class, and every variable you closed over (also described as hoisted or captured) is rewritten throughout your code as a field of that class. A closure in .NET is implemented as one instance of this hidden class.

That means your count variable is a field of a different class entirely, and the lifetime of that instance works like any other CLR object: it's not eligible for garbage collection until it's no longer rooted. As long as you hold a callable reference to the method (that is, the delegate), it's not going anywhere.
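
As a minimal sketch of that lifetime rule (the names CounterDemo and MakeCounter are made up for illustration and are not from the question), a captured local keeps its value after the method that declared it has returned:

```csharp
using System;

static class CounterDemo
{
    // Hypothetical helper: returns a delegate that closes over the local 'count'.
    public static Func<int> MakeCounter()
    {
        int count = 0;          // hoisted into a field of the compiler-generated class
        return () => ++count;   // the delegate keeps that instance reachable
    }

    static void Main()
    {
        Func<int> next = MakeCounter();
        Console.WriteLine(next()); // 1
        Console.WriteLine(next()); // 2: 'count' survived MakeCounter returning
    }
}
```

The second call prints 2 because both calls go through the same hidden-class instance, which stays alive for as long as next is reachable.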

Deep diving into the implementation of closures

It helps to look at the fully de-compiled code:

// Decompiled with JetBrains decompiler
// Type: Program
// Assembly: test, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null
// MVID: D26FF17C-3FD8-4920-BEFC-ED98BC41836A
// Assembly location: C:\temp\test.exe
// Compiler-generated code is shown

using System;
using System.Runtime.CompilerServices;

internal static class Program
{
  private static void Main()
  {
    Program.\u003C\u003Ec__DisplayClass1 cDisplayClass1 = new Program.\u003C\u003Ec__DisplayClass1();
    cDisplayClass1.x = 1;
    // ISSUE: method pointer
    Action action = new Action((object) cDisplayClass1, __methodptr(\u003CMain\u003Eb__0));
    cDisplayClass1.x = 3;
    action();
    Console.WriteLine(cDisplayClass1.x);
  }

  [CompilerGenerated]
  private sealed class \u003C\u003Ec__DisplayClass1
  {
    public int x;

    public \u003C\u003Ec__DisplayClass1()
    {
      base.\u002Ector();
    }

    public void \u003CMain\u003Eb__0()
    {
      Console.WriteLine(this.x);
      this.x = 2;
    }
  }
}

Specifically, look at how Main got re-written:

  private static void Main()
  {
    Program.\u003C\u003Ec__DisplayClass1 cDisplayClass1 = new Program.\u003C\u003Ec__DisplayClass1();
    cDisplayClass1.x = 1;
    // ISSUE: method pointer
    Action action = new Action((object) cDisplayClass1, __methodptr(\u003CMain\u003Eb__0));
    cDisplayClass1.x = 3;
    action();
    Console.WriteLine(cDisplayClass1.x);
  }

You see that the x being affected is attached to the closure class generated from the code. The following line changes x to 3:

    cDisplayClass1.x = 3;

And this is the same x that the method behind action is referring to.
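
For comparison, source along the following lines would plausibly decompile to the code above. This is a reconstruction from the decompiled output, not the asker's original file, and the class name ReconstructedProgram is an assumption:

```csharp
using System;

static class ReconstructedProgram
{
    public static void Main()
    {
        int x = 1;                 // becomes the field cDisplayClass1.x
        Action action = () =>
        {
            Console.WriteLine(x);  // reads the shared field: prints 3
            x = 2;
        };
        x = 3;                     // writes the same field the lambda reads
        action();
        Console.WriteLine(x);      // prints 2, the value the lambda assigned
    }
}
```

There is only ever one x, so the assignment made after the lambda was created is visible inside it, and the lambda's own assignment is visible afterwards in Main.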

C# - How do closures work in lambdas and how does the garbage collector deal with them?

It does it through a closure.

A closure is a block of code which can be executed at a later time, but which keeps and maintains the environment in which it was first created, via a compiler-generated class. It can still use the local variables even when the method has finished. The .NET garbage collector does not count references; it traces reachability. The delegate holds a reference to the instance of the generated class, so the captured variables stay alive for as long as the delegate is reachable, and nothing gets collected which shouldn't be.

C# closure. How it changes the variable of struct type?

Your pseudocode isn't quite right. It actually replaces all use of the local with the closure class's field:

static void Main(string[] args)
{
  //create helper class
  DisplayClass1 class1 = new DisplayClass1();
  //initialize fields
  class1.i = 100;

  //create instance of delegate
  Action d = new Action(class1.Main);

  d.Invoke();

  Console.WriteLine(class1.i.ToString());
}
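
A small sketch of what that means for a struct local. The struct Counter and the class StructCaptureDemo are hypothetical names chosen for illustration. Because every use of the local is replaced by the field, the lambda mutates the struct in place rather than a copy:

```csharp
using System;

// Hypothetical value type for illustration.
struct Counter
{
    public int I;
}

static class StructCaptureDemo
{
    public static int Run()
    {
        var c = new Counter { I = 100 };
        Action d = () => c.I++;  // 'c' is a field on the closure class, mutated in place
        d();
        return c.I;              // reads the same field the lambda just changed
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 101
    }
}
```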

Closures and Lambda in C#

Yes, FindAll will create a new list. You want "Where", which will return an IEnumerable object that knows how to loop over your existing list:

foreach (string name in names.Where(n => n.StartsWith("C")))
{
    Console.WriteLine(name);
}

But there's no closure in that code, because there's no local variable to capture.
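
By contrast, here is a sketch of a variant that does create a closure. The local prefix and the sample names are assumptions added for illustration. Because Where is evaluated lazily, the lambda sees the value prefix holds when the query is enumerated, not the value it held when the query was built:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WhereCaptureDemo
{
    public static List<string> Run()
    {
        var names = new List<string> { "Carol", "Dave", "Chris" };
        string prefix = "C";  // captured local: this is what makes it a closure

        IEnumerable<string> query = names.Where(n => n.StartsWith(prefix));
        prefix = "D";         // changed before the query is enumerated

        return query.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // Dave
    }
}
```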

Captured variable in a loop in C#

Yes - take a copy of the variable inside the loop:

while (variable < 5)
{
    int copy = variable;
    actions.Add(() => copy * 2);
    ++variable;
}

You can think of it as if the C# compiler creates a "new" local variable every time it hits the variable declaration. In fact it'll create appropriate new closure objects, and it gets complicated (in terms of implementation) if you refer to variables in multiple scopes, but it works :)

Note that a more common occurrence of this problem is using for or foreach:

for (int i=0; i < 10; i++) // Just one variable
foreach (string x in foo) // And again, despite how it reads out loud

See section 7.14.4.2 of the C# 3.0 spec for more details of this, and my article on closures has more examples too.

Note that as of the C# 5 compiler and beyond (even when specifying an earlier version of C#), the behavior of foreach changed so you no longer need to make a local copy. See this answer for more details.
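
A short sketch contrasting the two loop forms, assuming a C# 5 or later compiler (the class and method names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LoopCaptureDemo
{
    public static int[] ForResults()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            funcs.Add(() => i);   // every lambda shares the single 'i'
        return funcs.Select(f => f()).ToArray();
    }

    public static int[] ForeachResults()
    {
        var funcs = new List<Func<int>>();
        foreach (int n in new[] { 0, 1, 2 })
            funcs.Add(() => n);   // C# 5+: a fresh 'n' for each iteration
        return funcs.Select(f => f()).ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", ForResults()));     // 3, 3, 3
        Console.WriteLine(string.Join(", ", ForeachResults())); // 0, 1, 2
    }
}
```

The for loop still needs the local copy shown earlier; only foreach changed.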

Why Are Some Closures 'Friendlier' Than Others?

Your first example had two different int count variable declarations (from the separate method calls). Your second example is sharing the same variable declaration.

Your first example would behave the same as the second example had int count been a field of your main program:

static int count = 0;

static Action GetWorker(int k)
{
    return k == 0 ? (Action)(() => Console.WriteLine("Working 1 - {0}",count++))
                  : (Action)(() => Console.WriteLine("Working 2 - {0}",count++));
}

This outputs:

Working 1 - 0
Working 2 - 1
Working 1 - 2
Working 2 - 3
Working 1 - 4
Working 2 - 5
Working 1 - 6
Working 2 - 7

You can simplify it without the ternary operator as well:

static Action GetWorker(int k)
{
    int count = 0;

    return (Action)(() => Console.WriteLine("Working {0} - {1}", k + 1, count++));
}

Which outputs:

Working 1 - 0
Working 2 - 0
Working 1 - 1
Working 2 - 1
Working 1 - 2
Working 2 - 2
Working 1 - 3
Working 2 - 3

The main issue is that a local variable declared in a method (in your case int count = 0;) is unique to that invocation of the method. When the lambda delegate is created, each one closes over its own unique count variable:

Action x1 = GetWorker(0); //gets a count
Action x2 = GetWorker(1); //gets a new, different count

Closures with anonymous methods in a for loop

Yes, many instances are generated.

You need the extra variable j in the scope of the loop body because the variable i is scoped to the loop as a whole rather than to each iteration, so only a single closure object would be generated for it.

void Main()
{
    AddActions(10);

    var closure1 = functions[0]();
    var closure2 = functions[1]();

    Console.WriteLine(object.ReferenceEquals(closure1, closure2));
    // False
}

public static void AddActions(int count)
{
    for (int i = 0; i < count; i++)
    {
        int j = i;
        functions.Add(delegate()
        {
            Console.WriteLine(j);
            Expression<Func<int>> exp = () => j;
            Console.WriteLine(exp.ToString());
            var m = (MemberExpression)exp.Body;
            var c = (ConstantExpression)m.Expression;
            Console.WriteLine(c.Value.ToString());
            return c.Value;
        });
    }
}

public static List<Func<object>> functions = new List<Func<object>>();

Result

0
() => value(UserQuery+<>c__DisplayClass1_0).j
UserQuery+<>c__DisplayClass1_0
1
() => value(UserQuery+<>c__DisplayClass1_0).j
UserQuery+<>c__DisplayClass1_0
False
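
The same point can be checked more directly through Delegate.Target, which for a capturing lambda refers to the closure object. That is an implementation detail of current Microsoft compilers rather than a language guarantee, so treat this as a sketch; the class and method names are hypothetical:

```csharp
using System;
using System.Collections.Generic;

static class TargetDemo
{
    // Captures the loop variable itself: one closure object for the whole loop.
    public static bool SharedLoopVariable()
    {
        var list = new List<Func<int>>();
        for (int i = 0; i < 2; i++)
            list.Add(() => i);
        return ReferenceEquals(list[0].Target, list[1].Target);
    }

    // Captures a per-iteration copy: a new closure object for each iteration.
    public static bool PerIterationCopy()
    {
        var list = new List<Func<int>>();
        for (int i = 0; i < 2; i++)
        {
            int j = i;
            list.Add(() => j);
        }
        return ReferenceEquals(list[0].Target, list[1].Target);
    }

    static void Main()
    {
        Console.WriteLine(SharedLoopVariable()); // True
        Console.WriteLine(PerIterationCopy());   // False
    }
}
```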

How is a referencing environment generally implemented for closures?

Here's a running example in pseudo-javascript-like syntax.

function f(x) {
  var y = ...;
  function g(z) {
    function h(w) {
      .... y, z, w ....
    }
    .... x, h ....
  }
  .... x, g ....
}

One representation is a linked chain of environments. That is, a closure consists of a code pointer, some slots, and a reference to the enclosing closure or the top-level environment. In this representation,

f = [<code>, <top-level-env>]
g = [<code>, f, x, y]
h = [<code>, g, z]

except sometimes it's better to let every function have a direct reference to the top-level environment, since it's used so often:

f = [<code>, <top-level-env>]
g = [<code>, <top-level-env>, f, x, y]
h = [<code>, <top-level-env>, g, z]

(There are other variations too.)

One advantage of this representation is that you can store mutable variables right in the closure. (Well, maybe, depending on how you represent function activations.) One disadvantage is that some variables may take multiple hops to reach if you have deeply nested closures. Another disadvantage is that if a closure outlives its parent (e.g., g returns h) then this representation might prevent the GC from collecting environment frames that are mostly or even completely unreachable.
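
To make the "multiple hops" cost concrete, here is a simplified model of a linked chain in C#. The Env class and its members are invented for this sketch, and it merges f's frame into the top level to stay short; a real implementation would use fixed slots rather than dictionaries:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one environment frame in a linked chain.
sealed class Env
{
    private readonly Dictionary<string, object> slots = new Dictionary<string, object>();
    private readonly Env parent;

    public Env(Env parent) { this.parent = parent; }

    public void Define(string name, object value) { slots[name] = value; }

    // Walks outward until the name is found; 'hops' counts the frames crossed.
    public object Lookup(string name, out int hops)
    {
        hops = 0;
        for (Env e = this; e != null; e = e.parent, hops++)
        {
            if (e.slots.TryGetValue(name, out object value)) return value;
        }
        throw new KeyNotFoundException(name);
    }
}

static class EnvDemo
{
    static void Main()
    {
        var top = new Env(null);
        top.Define("print", "builtin");

        var g = new Env(top);  // g's closure holds x and y
        g.Define("x", 1);
        g.Define("y", 2);

        var h = new Env(g);    // h's closure holds z and points at g
        h.Define("z", 3);

        h.Lookup("z", out int hopsZ);
        h.Lookup("y", out int hopsY);
        h.Lookup("print", out int hopsTop);
        Console.WriteLine($"{hopsZ} {hopsY} {hopsTop}"); // 0 1 2
    }
}
```

Note that h holding a reference to g is exactly what keeps all of g's slots alive, including x, which h never uses.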

Another representation is "flat closures": each closure contains a code pointer and slots for all of the code's free variables.

g = [<code>, x, y]
h = [<code>, y, z]

This representation fixes the space/GC problem; no closure pins another closure in memory. On the other hand, free-variable slots are copied instead of shared, so if there is a nested closure with many free variables (or many instances of a nested closure), overall memory usage might be higher. Also, this representation typically requires storage for mutable variables to be heap-allocated (but only for variables that are actually mutated and only when the mutation cannot be automatically rewritten).
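
The heap-allocation point can be sketched as follows. Ref<T>, Getter and Incrementer are hypothetical types written by hand to stand in for what a flat-closure compiler might emit; each closure copies a reference to the cell, so both observe the same mutable y:

```csharp
using System;

// Hypothetical heap cell holding one mutable variable.
sealed class Ref<T>
{
    public T Value;
    public Ref(T value) { Value = value; }
}

// A flat closure: code plus a slot for its single free variable.
sealed class Getter
{
    private readonly Ref<int> y;
    public Getter(Ref<int> y) { this.y = y; }
    public int Invoke() { return y.Value; }
}

// A second flat closure sharing the same cell, not a copy of the int.
sealed class Incrementer
{
    private readonly Ref<int> y;
    public Incrementer(Ref<int> y) { this.y = y; }
    public void Invoke() { y.Value++; }
}

static class FlatClosureDemo
{
    static void Main()
    {
        var cell = new Ref<int>(0);
        var get = new Getter(cell);
        var inc = new Incrementer(cell);

        inc.Invoke();
        inc.Invoke();
        Console.WriteLine(get.Invoke()); // 2
    }
}
```

Had each closure copied the int itself, the increments would have been invisible to the getter, which is why only mutated variables need the extra cell.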

There are also hybrid approaches. For example, you might have mostly-flat closures but treat the top-level environment specially:

g = [<code>, <top-level-env>, x, y]

Or you might have a "sufficiently clever" (or at least "sufficiently ambitious") compiler that tries to pick between representations based on number of free variables, nesting depth, etc.

Possible to Lock() within a Closure? What does that look like in Lambdas and code output?

If you have a reason to lock, then yes, there's nothing stopping you from putting a lock statement in a closure.

For example, you could do this:

public static Action<T> GetLockedAdd<T>(IList<T> list)
{
    var lockObj = new object();
    return x =>
    {
        lock (lockObj)
        {
            list.Add(x);
        }
    };
}

What does this look like, in terms of compiler-generated code? Ask yourself: what is captured?

A local object used for locking.
The IList<T> passed in.

These will be captured as instance fields in a compiler-generated class. So the result will look something like this:

class LockedAdder<T>
{
    // This field serves the role of the lockObj variable; it will be
    // initialized when the type is instantiated.
    public object LockObj = new object();

    // This field serves as the list parameter; it will be set within
    // the method.
    public IList<T> List;

    // This is the method for the lambda.
    public void Add(T x)
    {
        lock (LockObj)
        {
            List.Add(x);
        }
    }
}

public static Action<T> GetLockedAdd<T>(IList<T> list)
{
    // Initializing the lockObj variable becomes equivalent to
    // instantiating the generated class.
    var lockedAdder = new LockedAdder<T> { List = list };

    // The lambda becomes a method call on the instance we have
    // just made.
    return new Action<T>(lockedAdder.Add);
}
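
As a usage sketch, the returned delegate can be called from several threads at once. The class name LockedAddDemo, the Run method and the use of Parallel.For are assumptions added for illustration; GetLockedAdd is repeated so the example is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class LockedAddDemo
{
    public static Action<T> GetLockedAdd<T>(IList<T> list)
    {
        var lockObj = new object();
        return x =>
        {
            lock (lockObj)   // the captured lock object serializes the adds
            {
                list.Add(x);
            }
        };
    }

    public static int Run()
    {
        var numbers = new List<int>();
        Action<int> add = GetLockedAdd(numbers);

        // Many threads share one delegate, hence one lockObj and one list.
        Parallel.For(0, 1000, i => add(i));
        return numbers.Count;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 1000
    }
}
```

Each call to GetLockedAdd creates a new lockObj, so two delegates obtained for the same list would not exclude each other.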

Does that make sense?
