Delegate Caching Behavior Changes in Roslyn

Delegate caching behavior changes in Roslyn

Yes. The most important part is that the method containing the lambda's implementation is now an instance method.

You can see a delegate as a middleman receiving an instance call through Invoke and dispatching that call according to the calling convention of the implementing method.

Note that platform ABI requirements specify how arguments are passed, how results are returned, which arguments go in which registers, how "this" is passed, and so on. Violating these rules can break tools that rely on stack walking, such as debuggers.

Now, if the implementing method is an instance method, the only thing that needs to happen inside the delegate is to patch "this", which is the delegate instance at the time of Invoke, to be the enclosed Target object. At that point, since everything else is already where it needs to be, the delegate can jump directly to the implementing method body. In many cases this is noticeably less work than what would need to happen if the implementing method were a static method.

Lambda treated as a closed delegate in Roslyn

This was a change made to Roslyn in 2014. It is quite strange, but it was actually done to improve performance. "In the new strategy all lambdas are emitted as instance methods" - from the discussion at roslyn.codeplex.com (note: dead link).

Why is Action.Method.IsStatic different between Visual Studio 2013 and 2015 for certain lambda expressions

See the answer to the following question:
Delegate caching behavior changes in Roslyn

In short, here is what changed, quoting @Yuzal from the linked answer:

"Delegate caching behavior was changed in Roslyn. Previously, as
stated, any lambda expression which didn't capture variables was
compiled into a static method at the call site. Roslyn changed this
behavior. Now, any lambda, which captures variables or not, is
transformed into a display class:"

By display class he means a compiler-generated private sealed class that contains the instance method invoked by the Action delegate.

Why was the change made? Quoting @Kevin Pilch-Bisson (a member of the C# IDE team):

The reason it's faster is because delegate invokes are optimized for
instance methods and have space on the stack for them. To call a
static method they have to shift parameters around.

The comment is self-explanatory. The difference in behavior you see in the example above exists because the team found that an Action delegate invokes an instance method faster than a static method, regardless of whether the lambda captures variables.
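To see which strategy a given compiler used, you can inspect the delegate through reflection. The following is a minimal sketch; the class name DelegateInspector and the helper Describe are hypothetical names chosen for illustration, and the printed values depend on the compiler version, so no expected output is shown.

```csharp
using System;

public static class DelegateInspector
{
    // Hypothetical helper: summarizes how the compiler emitted the method
    // behind a delegate (static or instance, and which type declares it).
    public static string Describe(Delegate d)
    {
        string kind = d.Method.IsStatic ? "static" : "instance";
        string target = d.Target == null ? "(none)" : d.Target.GetType().FullName;
        return kind + " method on " + d.Method.DeclaringType.FullName
             + ", Target = " + target;
    }

    public static void Main()
    {
        // A lambda that captures nothing.
        Action<int> a = x => Console.WriteLine(x);

        // Output differs between the pre-Roslyn compiler and Roslyn.
        Console.WriteLine(Describe(a));
    }
}
```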

Lambda without nonlocal dependencies is not static

Or does C# not guarantee that lambdas without nonlocal dependencies are made static?

It makes no such guarantee. That was an implementation detail of some versions of the compiler; one that it has no obligation to continue with going forward. As you have seen, they've changed that implementation detail.

Why has a lambda with no capture changed from a static in C# 5 to an instance method in C# 6?

I don't have an answer as to why that is so (reproduced locally, too).

However, the answer to:

Why is it so? How can this be avoided so that Expression.Call would
begin to work again in new Visual Studio?

You can do this (works on both compilers):

Action<int, int> a = (x, y) => Console.WriteLine(x + y);

ParameterExpression p1 = Expression.Parameter(typeof(int), "p1");
ParameterExpression p2 = Expression.Parameter(typeof(int), "p2");

MethodCallExpression call;
if (a.Method.IsStatic)
{
    call = Expression.Call(a.Method, p1, p2);
}
else
{
    call = Expression.Call(Expression.Constant(a.Target), a.Method, p1, p2);
}

Thanks to Jeppe Stig Nielsen for the fix regarding a.Target.
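The same idea can be wrapped in a helper and checked end to end by compiling the resulting expression. This is only a sketch: LambdaCallDemo and Rebuild are hypothetical names, and it uses a Func<int, int, int> instead of the Action above so the result can be printed. It relies on Expression.Call accepting null as the instance when the method is static.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaCallDemo
{
    // Hypothetical helper: builds a call to the delegate's underlying method,
    // passing Target as the instance only when that method is not static.
    public static Func<int, int, int> Rebuild(Func<int, int, int> source)
    {
        ParameterExpression p1 = Expression.Parameter(typeof(int), "p1");
        ParameterExpression p2 = Expression.Parameter(typeof(int), "p2");

        // Expression.Call accepts a null instance for a static method.
        Expression instance = source.Method.IsStatic
            ? null
            : Expression.Constant(source.Target);

        MethodCallExpression call = Expression.Call(instance, source.Method, p1, p2);
        return Expression.Lambda<Func<int, int, int>>(call, p1, p2).Compile();
    }

    public static void Main()
    {
        Func<int, int, int> add = (x, y) => x + y;
        Console.WriteLine(Rebuild(add)(2, 3)); // prints 5
    }
}
```

Because the branch is taken at run time, the helper does not care which compiler produced the lambda.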

List<T> Sort Memory Leak

What you're seeing is not a memory leak. It's simply the way the compiler is caching your Comparison<IndicatorPropReport> as a static delegate at the call site, thus saving you the need to create an instance of it for each invocation.

If you look at this simplified example:

var ints = new List<int> { 3, 2, 1, 8, 5 };
ints.Sort((x, y) => x.CompareTo(y));

And look at what the compiler generates using a .NET decompiler:

[CompilerGenerated]
private static Comparison<int> CS$<>9__CachedAnonymousMethodDelegate2;

public static void Main(string[] args)
{
    List<int> ints = new List<int> { 3,2,1,8,5 };
    List<int> arg_51_0 = ints;

    if (Program.CS$<>9__CachedAnonymousMethodDelegate2 == null)
    {
        Program.CS$<>9__CachedAnonymousMethodDelegate2 = 
                new Comparison<int>(Program.<Main>b__1);
    }
    arg_51_0.Sort(Program.CS$<>9__CachedAnonymousMethodDelegate2);
}

[CompilerGenerated]
private static int <Main>b__1(int x, int y)
{
    return x.CompareTo(y);
}

You see that the Comparison<int> was cached as a static delegate. The same thing happens in your method call.

Note that this behavior is pre-Roslyn. Roslyn changes the way delegates are cached by creating a display class instead of the static delegate, even when there are no captured variables.

Roslyn scripting engine does not throw runtime exception when used as delegate

There are two issues with your code:

1. In the first version, you're catching Exception, which means that when the Assert.Fail is reached and throws AssertionException, that exception is then caught and ignored. This means that there is no difference between RunAsync and delegate here, neither of them throws DivideByZeroException.
2. Both RunAsync and the ScriptRunner<T> delegate return Task. That means to actually wait for them to complete or to observe any exceptions, you need to use await. Once you do that, you will see the DivideByZeroException that you're expecting.
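The second point can be illustrated without the scripting API. In the sketch below, DivideAsync is a hypothetical stand-in for a ScriptRunner<int>-style delegate (it is not part of Roslyn): calling it only produces a Task, and the exception surfaces when that task is awaited.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitDemo
{
    // Hypothetical stand-in for a delegate that returns Task<int>.
    public static async Task<int> DivideAsync(int divisor)
    {
        await Task.Yield();
        return 10 / divisor; // throws DivideByZeroException when divisor is 0
    }

    public static async Task Main()
    {
        // Calling the method does not throw; the failure is stored in the task.
        Task<int> pending = DivideAsync(0);

        try
        {
            await pending; // the exception is rethrown here
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("observed DivideByZeroException");
        }
    }
}
```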

C# Why using instance method as delegate allocates GC0 temp objects but 10% faster than a cached delegate

So after reading the question much too quickly and thinking it was asking something else, I've finally had some time to sit down and play with the Aeron test in question.

I tried a few things, first of all I compared the IL and Assembler produced and found that there was basically no difference at either the site where we call Poll() or at the site where the handler is actually called.

Secondly I tried commenting out the code in the Poll() method to confirm that the cached version did actually run faster (which it did).

Thirdly I tried looking at the CPU counters (cache misses, instructions retired and branch mispredictions) in the VS profiler but could not see any differences between the two versions other than the fact that the delegate constructor obviously was called more times.

This made me think about a similar case that we ran across in porting Disruptor-net where we had a test that was running slower than the Java version but we were sure we weren't doing anything more costly. The reason for the "slowness" of the test was that we were actually faster and therefore batched less and therefore our throughput was lower.

If you insert a Thread.SpinWait(5) just before the call to Poll() you will see the same or better performance as the non-cached version.

Original answer to the question which I thought at the time was "why using an instance method delegate is slower than caching the delegate manually":

The clue is in the question. It's an instance method and therefore it implicitly captures the this member and the fact that this is captured means that it cannot be cached. Given that this will never change during the lifetime of the cached delegate it should be cacheable.

If you expand the method group to (first, second) => this.Add(first, second) the capture becomes more obvious.
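A small sketch makes the captured instance visible. The Calculator class and its members are hypothetical names used only for illustration; the delegate produced from the method group carries the instance in its Target, which is why each conversion allocates a new delegate bound to that instance.

```csharp
using System;

public class Calculator
{
    public int Add(int first, int second) { return first + second; }

    // Method group conversion: "this" is captured implicitly.
    public Func<int, int, int> AsMethodGroup() { return Add; }

    // The expanded form, where the capture of "this" is explicit.
    public Func<int, int, int> AsLambda()
    {
        return (first, second) => this.Add(first, second);
    }
}

public static class MethodGroupDemo
{
    public static void Main()
    {
        var calc = new Calculator();
        Func<int, int, int> viaGroup = calc.AsMethodGroup();

        // The delegate is bound to the instance it was created from.
        Console.WriteLine(ReferenceEquals(viaGroup.Target, calc)); // True

        // Separate conversions are equal by value (same target, same method).
        Console.WriteLine(viaGroup.Equals(calc.AsMethodGroup())); // True

        Console.WriteLine(calc.AsLambda()(1, 2)); // 3
    }
}
```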

Note that the Roslyn team is working on fixing this: https://github.com/dotnet/roslyn/issues/5835

Why does the C# compiler create a new Action instance for every passed delegate?

we can write our own simple performance improvement that reduces unnecessary garbage

You have rediscovered a special case of common subexpression elimination -- the optimization of identifying when two or more expressions have exactly the same value, computing the value once, and storing it in a variable to be re-used.

Before continuing, I caution you that all so-called "optimizations" are actually trading off one thing for another. Your proposed optimization trades a small amount of collection pressure generated on each call for instead a small memory leak. The cached value in the static field would become a permanent member of the gen 2 heap. Is that worthwhile? It's a question that you'd want to answer by actually making measurements.
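For reference, the hand-written version of this trade-off might look like the sketch below. The names ManualCacheDemo, cachedHandler, Handle and GetHandler are hypothetical; the point is only that the static field keeps one delegate alive for the lifetime of the program in exchange for avoiding an allocation per call.

```csharp
using System;

public static class ManualCacheDemo
{
    // Hypothetical cache field: once assigned, the delegate is never collected.
    private static Action<string> cachedHandler;

    private static void Handle(string message) { }

    public static Action<string> GetHandler()
    {
        // Create the delegate on first use, then reuse the same instance.
        return cachedHandler ?? (cachedHandler = Handle);
    }

    public static void Main()
    {
        Action<string> first = GetHandler();
        Action<string> second = GetHandler();
        Console.WriteLine(ReferenceEquals(first, second)); // True
    }
}
```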

For this very simple case, is there any reason Roslyn couldn't make a similar optimisation?

There is no in principle reason why this optimization could not be performed if the optimization did not produce an unacceptable change in the behaviour of the program.

In particular, the optimization causes two delegates that were previously value-equal but not reference-equal to become reference-equal. That's likely acceptable.

In practice, implementing optimizations requires large amounts of effort in designing, implementing, testing and maintaining the code that does the optimization. C# does not implement common subexpression elimination optimizations. This optimization has poor bang-for-buck. Few people write code that would benefit from the optimization, and the optimization is small, and it is easy, as you discovered, to do the optimization "by hand" if you care.

I note that C# does do a similar cache on lambdas. It will not do common subexpression elimination, but it will generate certain lambdas only once and cache the results:

void M() { Action x = () => {}; ... }

is generated as though you wrote:

static Action anon = null;
void M() 
{
  if (anon == null) anon = () => {};
  Action x = anon;
  ...
}
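You can probe this caching from the outside by asking the same lambda expression for its delegate twice. This is a sketch with hypothetical names (LambdaCacheDemo, GetNoop); since the caching is a compiler choice rather than a language guarantee, no expected output is given.

```csharp
using System;

public static class LambdaCacheDemo
{
    public static Action GetNoop()
    {
        // A lambda that captures nothing; the compiler may cache its delegate.
        return () => { };
    }

    public static void Main()
    {
        Action first = GetNoop();
        Action second = GetNoop();

        // Whether the two references are identical depends on the compiler's
        // caching strategy; the language does not promise either result.
        Console.WriteLine(ReferenceEquals(first, second));
    }
}
```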

If the answer is no, what about when the methods are not static but instance members?

There is no in principle reason why this optimization could not be performed if the optimization did not produce an unacceptable change in the behaviour of the program.

I note that in this case the optimization would be required to deduce when the instances were the same of course. To not do so would be to fail to maintain the invariant that program behaviour must not change.

Again, in practice, C# does not do common subexpression elimination.

And what about when there are closed-over variables captured?

Captured by what? You were talking about method group conversions to delegates just now, and apparently now we are talking about lambdas converted to delegates.

The C# specification explicitly states that the compiler may choose to do common subexpression elimination on identical lambdas, or not, as it sees fit.

There is no in principle reason why this optimization could not be performed if the optimization did not produce an unacceptable change in the behaviour of the program. Since the specification explicitly calls out that this optimization is permitted, it is by definition acceptable.

Again, in practice, C# does not do common subexpression elimination.

Perhaps you are noticing a trend here. The answer to the question "is such and such an optimization permitted?" is almost always "yes, if it does not produce an unacceptable change in the behaviour of the program". But the answer to the question "does C# implement such and such an optimization in practice?" is usually no.

If you want some background on the optimizations the compiler does perform, I described them in 2009.

Roslyn does a better job of these optimizations for the most part. For example, Roslyn does a better job of reifying temporary values and locals as ephemeral rather than durable variables. I completely rewrote the nullable arithmetic optimizer; my eight-part series of articles describing how is here. And there were many more improvements. We never considered doing CSE though.

Behavior of Assembly.GetTypes() changed in Visual Studio 2015

Has anyone else seen this?

Yes, this is caused by the new compiler behavior for lifting lambda expressions.

Previously, if a lambda expression didn't capture any local variables, it would be cached as a static method at the call site, which made the compiler team need to jump through some hoops in order to properly align the method arguments and the this parameter. The new behavior in Roslyn is that all lambda expressions get lifted into a display class, where the delegate is exposed as an instance method in the display class, regardless of whether it captures any local variables.

If you decompile your method in Roslyn, you see this:

private static void Main(string[] args)
{
    IEnumerable<Type> arg_33_0 = typeof(Program).Assembly.GetTypes();
    Func<Type, bool> arg_33_1;
    if ((arg_33_1 = Program.<>c.<>9__0_0) == null)
    {
        arg_33_1 = Program.<>c.<>9__0_0 = 
                        new Func<Type, bool>(Program.<>c.<>9.<Main>b__0_0);
    }
    using (IEnumerator<Type> enumerator = arg_33_0.Where(arg_33_1).GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            Console.WriteLine(enumerator.Current.FullName);
        }
    }
    Console.ReadKey();
}

[CompilerGenerated]
[Serializable]
private sealed class <>c
{
    public static readonly Program.<>c <>9;
    public static Func<Type, bool> <>9__0_0;
    static <>c()
    {
        // Note: this type is marked as 'beforefieldinit'.
        Program.<>c.<>9 = new Program.<>c();
    }
    internal bool <Main>b__0_0(Type t)
    {
        return !t.IsAbstract && t.IsClass;
    }
}

Whereas with the old compiler, you'd see this:

[CompilerGenerated]
private static Func<Type, bool> CS$<>9__CachedAnonymousMethodDelegate1;

private static void Main(string[] args)
{
    IEnumerable<Type> arg_34_0 = typeof(Program).Assembly.GetTypes();
    if (Program.CS$<>9__CachedAnonymousMethodDelegate1 == null)
    {
        Program.CS$<>9__CachedAnonymousMethodDelegate1 = 
                            new Func<Type, bool>(Program.<Main>b__0);
    }
    IEnumerable<Type> types =
                arg_34_0.Where(Program.CS$<>9__CachedAnonymousMethodDelegate1);

    foreach (Type type in types)
    {
        Console.WriteLine(type.FullName);
    }
    Console.ReadKey();
}

[CompilerGenerated]
private static bool <Main>b__0(Type t)
{
    return !t.IsAbstract && t.IsClass;
}

You can get the desired result by filtering out classes that have the CompilerGenerated attribute attached to them:

var types = typeof(Program)
            .Assembly
            .GetTypes()
            .Where(t => !t.IsAbstract && 
                         t.IsClass && 
                         Attribute.GetCustomAttribute(
                            t, typeof (CompilerGeneratedAttribute)) == null);

For more, see my question Delegate caching behavior changes in Roslyn
