Performance of "Direct" Virtual Call VS. Interface Call in C#

I think the article "Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects" will answer your questions. In particular, see the section "Interface Vtable Map and Interface Map" and the following section on virtual dispatch.

The JIT compiler can probably figure things out and optimize the code for your simple case, but not in the general case. For example, if you have:

IFoo f2 = GetAFoo();

and GetAFoo is declared as returning an IFoo, then the JIT compiler can't know the concrete type and wouldn't be able to optimize the call.
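To make the general case concrete, here is a minimal sketch. IFoo and GetAFoo come from the text above; FooA, FooB, the Bar method, and the bool parameter on GetAFoo are hypothetical names added for illustration. Because the concrete type is chosen at run time, the call site only sees the static type IFoo.

```csharp
using System;

public interface IFoo
{
    int Bar();
}

// Hypothetical implementations, invented for this sketch.
public sealed class FooA : IFoo
{
    public int Bar() => 1;
}

public sealed class FooB : IFoo
{
    public int Bar() => 2;
}

public static class Factory
{
    // Declared as returning IFoo: callers cannot see which class comes back.
    public static IFoo GetAFoo(bool useA) => useA ? new FooA() : new FooB();
}

public static class Program
{
    public static void Main(string[] args)
    {
        // Simple case: the concrete type is visible at the call site.
        var f1 = new FooA();
        Console.WriteLine(f1.Bar());

        // General case: the concrete type depends on a run-time value,
        // so the call is dispatched through the interface.
        IFoo f2 = Factory.GetAFoo(args.Length == 0);
        Console.WriteLine(f2.Bar());
    }
}
```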

Avoiding the overhead of C# virtual calls

You can cause the JIT to devirtualize your interface calls by using a struct with a constrained generic.

public class SomeObject<TMathFunction> where TMathFunction : struct, IMathFunction
{
private readonly TMathFunction mathFunction_;

public double SomeWork(double input, double step)
{
var f = mathFunction_.Calculate(input);
var dv = mathFunction_.Derivate(input);
return f - (dv * step);
}
}

// ...

var obj = new SomeObject<CoolMathFunction>();
obj.SomeWork(x, y);

Here are the important pieces to note:

  • The implementation of the IMathFunction interface, CoolMathFunction, is known at compile time through a generic type argument. This limits the applicability of this optimization quite a bit.
  • Methods are called on the generic parameter type TMathFunction directly rather than through the interface IMathFunction.
  • The generic is constrained to implement IMathFunction so we can call those methods.
  • The generic is constrained to a struct. This is not strictly required for the code to compile or run, but it ensures we exploit how the JIT generates code for generics: without a struct we won't get the optimization we want.

When generics are instantiated, codegen is different depending on the generic parameter being a class or a struct. For classes, every instantiation actually shares the same code and is done through vtables. But structs are special: they get their own instantiation that devirtualizes the interface calls into calling the struct's methods directly, avoiding any vtables and enabling inlining.

This feature exists to avoid boxing value types into reference types every time you call a generic. It avoids allocations and is a key factor in List<T> etc. being an improvement over the non-generic ArrayList etc.
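For readers who want to try this, a self-contained version of the snippet above might look like the following sketch. The IMathFunction members are taken from the calls in SomeWork; the bodies of CoolMathFunction are an assumption (simple arithmetic chosen for illustration), as are the field initializer and the sample arguments.

```csharp
using System;

public interface IMathFunction
{
    double Calculate(double input);
    double Derivate(double input);
}

// Struct implementation; the method bodies are assumed for this sketch.
public struct CoolMathFunction : IMathFunction
{
    public double Calculate(double input) => input + input;
    public double Derivate(double input) => input * input;
}

public class SomeObject<TMathFunction> where TMathFunction : struct, IMathFunction
{
    // default(TMathFunction) is a usable instance because TMathFunction is a struct.
    private readonly TMathFunction mathFunction_ = default;

    public double SomeWork(double input, double step)
    {
        // These calls are made on the struct type itself, not through IMathFunction.
        var f = mathFunction_.Calculate(input);
        var dv = mathFunction_.Derivate(input);
        return f - (dv * step);
    }
}

public static class Program
{
    public static void Main()
    {
        var obj = new SomeObject<CoolMathFunction>();
        // With these assumed bodies: f = 3 + 3 = 6, dv = 3 * 3 = 9, result = 6 - 9 * 0.5.
        Console.WriteLine(obj.SomeWork(3.0, 0.5));
    }
}
```

Swapping CoolMathFunction for a class would still compile if the struct constraint were dropped, but the calls would then go through shared code and interface dispatch as described above.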

Some implementation:

I made a simple implementation of IMathFunction for testing:

class SomeImplementationByRef : IMathFunction
{
public double Calculate(double input)
{
return input + input;
}

public double Derivate(double input)
{
return input * input;
}
}

... as well as a struct version and an abstract version.

So, here's what happens with the interface version. You can see it is relatively inefficient because it performs two levels of indirection:

    return obj.SomeWork(input, step);
sub esp,40h
vzeroupper
vmovaps xmmword ptr [rsp+30h],xmm6
vmovaps xmmword ptr [rsp+20h],xmm7
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov r11,7FFED7980020h ; load vtable address of the IMathFunction.Calculate function.
cmp dword ptr [rcx],ecx
call qword ptr [r11] ; call IMathFunction.Calculate function which will call the actual Calculate via vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov r11,7FFED7980028h ; load vtable address of the IMathFunction.Derivate function.
cmp dword ptr [rcx],ecx
call qword ptr [r11] ; call IMathFunction.Derivate function which will call the actual Derivate via vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7
vmovaps xmm6,xmmword ptr [rsp+30h]
vmovaps xmm7,xmmword ptr [rsp+20h]
add rsp,40h
pop rsi
ret

Here's an abstract class. It's a little more efficient but only negligibly:

        return obj.SomeWork(input, step);
sub esp,40h
vzeroupper
vmovaps xmmword ptr [rsp+30h],xmm6
vmovaps xmmword ptr [rsp+20h],xmm7
mov rsi,rcx
vmovsd qword ptr [rsp+60h],xmm2
vmovaps xmm6,xmm1
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
call qword ptr [rax+20h] ; call Calculate via offset 0x20 of vtable.
vmovaps xmm7,xmm0
mov rcx,qword ptr [rsi+8] ; load mathFunction_ into rcx.
vmovaps xmm1,xmm6
mov rax,qword ptr [rcx] ; load object type data from mathFunction_.
mov rax,qword ptr [rax+40h] ; load address of vtable into rax.
call qword ptr [rax+28h] ; call Derivate via offset 0x28 of vtable.
vmulsd xmm0,xmm0,mmword ptr [rsp+60h] ; dv * step
vsubsd xmm7,xmm7,xmm0 ; f - (dv * step)
vmovaps xmm0,xmm7
vmovaps xmm6,xmmword ptr [rsp+30h]
vmovaps xmm7,xmmword ptr [rsp+20h]
add rsp,40h
pop rsi
ret

So both an interface and an abstract class rely heavily on branch target prediction to have acceptable performance. Even then, you can see there's quite a lot more going into it, so the best-case is still relatively slow while the worst-case is a stalled pipeline due to a mispredict.

And finally here's the generic version with a struct. You can see it's massively more efficient because everything has been fully inlined so there's no branch prediction involved. It also has the nice side effect of removing most of the stack/parameter management that was in there too, so the code becomes very compact:

    return obj.SomeWork(input, step);
push rax
vzeroupper
movsx rax,byte ptr [rcx+8]
vmovaps xmm0,xmm1
vaddsd xmm0,xmm0,xmm1 ; Calculate - got inlined
vmulsd xmm1,xmm1,xmm1 ; Derivate - got inlined
vmulsd xmm1,xmm1,xmm2 ; dv * step
vsubsd xmm0,xmm0,xmm1 ; f - (dv * step)
add rsp,8
ret

Is there a performance penalty when returning an Interface?

It very much depends on what type of interface you return. Let's look at three simple examples:

  1. Return class object as interface: Most likely negligible impact (as shown here)
public ILoggedData GetLoggedData() => new LoggedDataClass();

  2. Return struct object as interface: Boxing occurs, a common performance bottleneck (MSDN).
public ILoggedData GetLoggedData() => new LoggedDataStruct();

  3. Return list of objects as interface: Memory pressure increases, performance can suffer greatly on hot paths. Detailed explanation here.
public IEnumerable<ILoggedData> GetLoggedData()
{
return new List<ILoggedData>() { new LoggedDataClass() };
}
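The boxing in the second case can be observed directly. The sketch below uses hypothetical names (ICounter, CounterStruct, GetCounter) rather than the ILoggedData types above, whose definitions are not shown. Returning the struct through the interface creates a heap copy, so mutating it through the interface leaves the original value untouched.

```csharp
using System;

// Hypothetical interface and struct, invented for this sketch.
public interface ICounter
{
    int Count { get; }
    void Increment();
}

public struct CounterStruct : ICounter
{
    public int Count { get; private set; }
    public void Increment() => Count++;
}

public static class Program
{
    // Returning the struct through the interface boxes it onto the heap.
    public static ICounter GetCounter(CounterStruct source) => source;

    public static void Main()
    {
        var original = new CounterStruct();
        ICounter boxed = GetCounter(original); // boxed copy of 'original'

        boxed.Increment();

        Console.WriteLine(original.Count); // the local struct is unchanged
        Console.WriteLine(boxed.Count);    // only the boxed copy was incremented
    }
}
```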

The answer...

...is yes! In certain circumstances, the "type conversion" implies unwanted side-effects. The other answer does not take the inner workings of the C# compiler into consideration. E.g. this is incorrect:

"The use or not of interfaces has very little to do with performance
and all about implementation decoupling and polymorphism."

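Using interfaces makes it harder, and sometimes impossible, for the compiler to optimize certain operations. The most common pitfall is the GetEnumerator pattern used by the foreach keyword. With a concrete type such as List<T>, the compiler binds to the type's own GetEnumerator method, which returns a struct enumerator, so iteration needs no heap allocation and no interface dispatch. Through an interface type such as IList<T> or IEnumerable<T>, foreach has to call IEnumerable<T>.GetEnumerator, which returns a boxed IEnumerator<T>, and every MoveNext and Current goes through the interface.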

Every C# developer should read Lippert's blog post about this.
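As a rough illustration of the foreach difference, the sketch below sums the same list twice, once through List<int> and once through IEnumerable<int>, and then prints whether the declared return type of each GetEnumerator method is a value type. The method names SumList and SumEnumerable are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class Program
{
    // foreach binds to List<int>.GetEnumerator(), which returns the struct List<int>.Enumerator.
    public static int SumList(List<int> items)
    {
        int sum = 0;
        foreach (int item in items) sum += item;
        return sum;
    }

    // foreach binds to IEnumerable<int>.GetEnumerator(), which returns IEnumerator<int>.
    public static int SumEnumerable(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int item in items) sum += item;
        return sum;
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        Console.WriteLine(SumList(list));
        Console.WriteLine(SumEnumerable(list));

        // Declared return types of the two GetEnumerator methods.
        Console.WriteLine(typeof(List<int>).GetMethod("GetEnumerator").ReturnType.IsValueType);
        Console.WriteLine(typeof(IEnumerable<int>).GetMethod("GetEnumerator").ReturnType.IsValueType);
    }
}
```

Both loops produce the same sum; only the enumerator type the compiler binds to differs.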

Performance of Expression.Compile vs Lambda, direct vs virtual calls

I slightly modified the code of @Serge Semonov and ran it on .NET Core 3.1. It seems the performance of Expression.Compile() has changed dramatically. I have also added code that uses CSharpScript to compile lambdas from a string. Note that .CompileToMethod is not available in .NET Core.

Virtual (Func<int>)Expression.Compile(): 908 ms
Direct (Func<int>)Expression.Compile(): 584 ms
Virtual (Func<IFoo, int>)Expression.Compile(): 531 ms
Direct (Func<FooImpl, int>)Expression.Compile(): 426 ms
Virtual (iFooArg) => iFooArg.Bar(): 622 ms
Direct (fooArg) => fooArg.Bar(): 478 ms
Virtual () => IFoo.Bar(): 640 ms
Direct () => FooImpl.Bar(): 477 ms
Virtual IFoo.Bar(): 431 ms
Direct Foo.Bar(): 319 ms
Virtual CSharpScript.EvaluateAsync: 799 ms
Direct CSharpScript.EvaluateAsync: 748 ms
Virtual CSharpScript.EvaluateAsync + Expression.Compile(): 586 ms
Direct CSharpScript.EvaluateAsync + Expression.Compile(): 423 ms
Virtual MethodInfo.Invoke(IFoo, Bar): 43533 ms
Direct MethodInfo.Invoke(FooImpl, Bar): 29012 ms

Code:

#define NET_FW    // Keep this line only when building for .NET Framework; remove it on .NET Core or .NET (5+), where CompileToMethod is not available

using System;
using System.Diagnostics;
using System.Linq.Expressions;
using System.Reflection;
using System.Reflection.Emit;
using System.Runtime.CompilerServices;
using Microsoft.CodeAnalysis.CSharp.Scripting;
using Microsoft.CodeAnalysis.Scripting;

namespace ExpressionTest
{
public interface IFoo
{
int Bar();
}

public sealed class FooImpl : IFoo
{
[MethodImpl(MethodImplOptions.NoInlining)]
public int Bar()
{
return 0;
}
}

class Program
{
static void Main(string[] args)
{
var foo = new FooImpl();
var iFoo = (IFoo)foo;

Func<int> directLambda = () => foo.Bar();
Func<int> virtualLambda = () => iFoo.Bar();
Func<FooImpl, int> directArgLambda = fooArg => fooArg.Bar();
Func<IFoo, int> virtualArgLambda = iFooArg => iFooArg.Bar();
var compiledDirectCall = CompileBar(foo, asInterfaceCall: false);
var compiledVirtualCall = CompileBar(foo, asInterfaceCall: true);
var compiledArgDirectCall = CompileBar<FooImpl>();
var compiledArgVirtualCall = CompileBar<IFoo>();
var barMethodInfo = typeof(FooImpl).GetMethod(nameof(FooImpl.Bar));
var iBarMethodInfo = typeof(IFoo).GetMethod(nameof(IFoo.Bar));
#if NET_FW
var compiledToModuleDirect = CompileToModule<FooImpl>();
var compiledToModuleVirtual = CompileToModule<IFoo>();
#endif
var compiledViaScriptDirect = CompileViaScript<FooImpl>();
var compiledViaScriptVirtual = CompileViaScript<IFoo>();
var compiledViaExprScriptDirect = CompileFromExprFromScript<FooImpl>();
var compiledViaExprScriptVirtual = CompileFromExprFromScript<IFoo>();

var iterationCount = 0;

int round = 0;
start:
if (round == 0)
{
iterationCount = 2000000;
Console.WriteLine($"Burn in");
Console.WriteLine($"Iteration count: {iterationCount:N0}");
goto doWork;
}
if (round == 1)
{
iterationCount = 200000000;
Console.WriteLine($"Iteration count: {iterationCount:N0}");
goto doWork;
}
return;

doWork:
{
var sw = Stopwatch.StartNew();
for (int i = 0; i < iterationCount; i++)
compiledVirtualCall();
var elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual (Func<int>)Expression.Compile(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledDirectCall();
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct (Func<int>)Expression.Compile(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledArgVirtualCall(iFoo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual (Func<IFoo, int>)Expression.Compile(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledArgDirectCall(foo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct (Func<FooImpl, int>)Expression.Compile(): {elapsedMs} ms");

#if NET_FW
sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledToModuleVirtual(iFoo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual (Func<IFoo, int>)Expression.CompileToMethod(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledToModuleDirect(foo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct (Func<FooImpl, int>)Expression.CompileToMethod(): {elapsedMs} ms");
#endif

sw.Restart();
for (int i = 0; i < iterationCount; i++)
virtualArgLambda(iFoo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual (iFooArg) => iFooArg.Bar(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
directArgLambda(foo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct (fooArg) => fooArg.Bar(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
virtualLambda();
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual () => IFoo.Bar(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
directLambda();
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct () => FooImpl.Bar(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
iFoo.Bar();
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual IFoo.Bar(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
foo.Bar();
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct Foo.Bar(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledViaScriptVirtual(iFoo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual CSharpScript.EvaluateAsync: {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledViaScriptDirect(foo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct CSharpScript.EvaluateAsync: {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledViaExprScriptVirtual(iFoo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual CSharpScript.EvaluateAsync + Expression.Compile(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
compiledViaExprScriptDirect(foo);
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct CSharpScript.EvaluateAsync + Expression.Compile(): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
{
int result = (int)iBarMethodInfo.Invoke(iFoo, null);
}
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Virtual MethodInfo.Invoke(IFoo, Bar): {elapsedMs} ms");

sw.Restart();
for (int i = 0; i < iterationCount; i++)
{
int result = (int)barMethodInfo.Invoke(foo, null);
}
elapsedMs = sw.ElapsedMilliseconds;
Console.WriteLine($"Direct MethodInfo.Invoke(FooImpl, Bar): {elapsedMs} ms");
}
round++;
goto start;
}

static Func<int> CompileBar(IFoo foo, bool asInterfaceCall)
{
var fooType = asInterfaceCall ? typeof(IFoo) : foo.GetType();
var methodInfo = fooType.GetMethod(nameof(IFoo.Bar));
var instance = Expression.Constant(foo, fooType);
var call = Expression.Call(instance, methodInfo);
var lambda = Expression.Lambda(call);
var compiledFunction = (Func<int>)lambda.Compile();
return compiledFunction;
}

static Func<TInput, int> CompileBar<TInput>()
{
var fooType = typeof(TInput);
var methodInfo = fooType.GetMethod(nameof(IFoo.Bar));
var instance = Expression.Parameter(fooType, "foo");
var call = Expression.Call(instance, methodInfo);
var lambda = Expression.Lambda(call, instance);
var compiledFunction = (Func<TInput, int>)lambda.Compile();
return compiledFunction;
}

#if NET_FW
static Func<TInput, int> CompileToModule<TInput>()
{
var fooType = typeof(TInput);
var methodInfo = fooType.GetMethod(nameof(IFoo.Bar));
var instance = Expression.Parameter(fooType, "foo");
var call = Expression.Call(instance, methodInfo);
var lambda = Expression.Lambda(call, instance);

var asmName = new AssemblyName(fooType.Name);
var asmBuilder = AssemblyBuilder.DefineDynamicAssembly(asmName, AssemblyBuilderAccess.Run);
var moduleBuilder = asmBuilder.DefineDynamicModule(fooType.Name);
var typeBuilder = moduleBuilder.DefineType(fooType.Name, TypeAttributes.Public);
var methodBuilder = typeBuilder.DefineMethod(nameof(IFoo.Bar), MethodAttributes.Static, typeof(int), new[] { fooType });
Expression.Lambda<Action>(lambda).CompileToMethod(methodBuilder);
var createdType = typeBuilder.CreateType();

var mi = createdType.GetMethods(BindingFlags.NonPublic | BindingFlags.Static)[1];
var func = Delegate.CreateDelegate(typeof(Func<TInput, int>), mi);
return (Func<TInput, int>)func;
}
#endif

static Func<TInput, int> CompileViaScript<TInput>()
{
ScriptOptions scriptOptions = ScriptOptions.Default;

//Add reference to mscorlib
var mscorlib = typeof(System.Object).Assembly;
var systemCore = typeof(System.Func<>).Assembly;
var thisAssembly = typeof(IFoo).Assembly;
scriptOptions = scriptOptions.AddReferences(mscorlib, systemCore, thisAssembly);

var result = CSharpScript.EvaluateAsync<Func<TInput, int>>("it => it.Bar()", options: scriptOptions).Result;
return result;
}
static Func<TInput, int> CompileFromExprFromScript<TInput>()
{
ScriptOptions scriptOptions = ScriptOptions.Default;

//Add reference to mscorlib
var mscorlib = typeof(System.Object).Assembly;
var systemCore = typeof(System.Func<>).Assembly;
var thisAssembly = typeof(IFoo).Assembly;
scriptOptions = scriptOptions.AddReferences(mscorlib, systemCore, thisAssembly);

var result = CSharpScript.EvaluateAsync<Expression<Func<TInput, int>>>("it => it.Bar()", options: scriptOptions).Result;
var compiledFunction = result.Compile();
return compiledFunction;
}
}
}

How to use CSharpScript:

https://joshvarty.com/2015/10/15/learn-roslyn-now-part-14-intro-to-the-scripting-api/

https://www.strathweb.com/2018/01/easy-way-to-create-a-c-lambda-expression-from-a-string-with-roslyn/

C# generics performance vs interface

"I want to know why using the generic method B has completely no advantage over using interface in both x86 and x64 modes (like C++ templates vs virtual calls)."

CLR generics are not C++ templates.

Templates are basically a search-and-replace mechanism; if you have ten instantiations of a template then ten copies of the source code are generated and all compiled and optimized. This trades off improved optimizations at compile time against increased compile time and increased binary size.

Generics, by contrast, are compiled once to IL by the C# compiler, and then code is generated for each instantiation of the generic by the jitter. However, as an implementation detail, all instantiations that give reference types for the type arguments use the same generated code. So if you have a method C<T>.M(T t), and it is called with T being both string and IList, then the x86 (or whatever) code is generated once and used for both cases.

Therefore there is no getting around any penalty imposed by virtual function invocations or interface invocations (which use similar but somewhat different mechanisms). If, say, T.ToString() is called inside the method, then the jitter does not say "oh, I happen to know that if T is string then ToString is an identity; I will elide the virtual function call", or inline the body, or any such thing.

This optimization trades off decreased jit time and smaller memory usage for slightly slower invocations.

If that performance tradeoff is not the one you want, then don't use generics, interfaces or virtual function calls.

Performance of calling delegates vs methods

I haven't seen that effect - I've certainly never encountered it being a bottleneck.

Here's a very rough-and-ready benchmark which shows (on my box anyway) delegates actually being faster than interfaces:

using System;
using System.Diagnostics;

interface IFoo
{
int Foo(int x);
}

class Program : IFoo
{
const int Iterations = 1000000000;

public int Foo(int x)
{
return x * 3;
}

static void Main(string[] args)
{
int x = 3;
IFoo ifoo = new Program();
Func<int, int> del = ifoo.Foo;
// Make sure everything's JITted:
ifoo.Foo(3);
del(3);

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
x = ifoo.Foo(x);
}
sw.Stop();
Console.WriteLine("Interface: {0}", sw.ElapsedMilliseconds);

x = 3;
sw = Stopwatch.StartNew();
for (int i = 0; i < Iterations; i++)
{
x = del(x);
}
sw.Stop();
Console.WriteLine("Delegate: {0}", sw.ElapsedMilliseconds);
}
}

Results (.NET 3.5; .NET 4.0b2 is about the same):

Interface: 5068
Delegate: 4404

Now I don't have particular faith that that means delegates are really faster than interfaces... but it makes me fairly convinced that they're not an order of magnitude slower. Additionally, this is doing almost nothing within the delegate/interface method. Obviously the invocation cost is going to make less and less difference as you do more and more work per call.

One thing to be careful of is that you're not creating a new delegate several times where you'd only use a single interface instance. This could cause an issue as it would provoke garbage collection etc. If you're using an instance method as a delegate within a loop, you will find it more efficient to declare the delegate variable outside the loop, create a single delegate instance and reuse it. For example:

Func<int, int> del = myInstance.MyMethod;
for (int i = 0; i < 100000; i++)
{
MethodTakingFunc(del);
}

is more efficient than:

for (int i = 0; i < 100000; i++)
{
MethodTakingFunc(myInstance.MyMethod);
}

Could this have been the problem you were seeing?
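The allocation behind that advice can be checked with a small sketch. Worker is a hypothetical class standing in for the type of myInstance above. Each conversion of an instance method group produces a separate delegate object, even though the two delegates compare equal.

```csharp
using System;

// Hypothetical class, invented for this sketch.
public class Worker
{
    public int MyMethod(int x) => x * 3;
}

public static class Program
{
    public static void Main()
    {
        var myInstance = new Worker();

        // Each method-group conversion on an instance method allocates a delegate.
        Func<int, int> first = myInstance.MyMethod;
        Func<int, int> second = myInstance.MyMethod;

        Console.WriteLine(ReferenceEquals(first, second)); // distinct objects
        Console.WriteLine(first.Equals(second));           // same target and method
    }
}
```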

What are the performance implications of marking methods / properties as virtual?

Virtual functions only have a very small performance overhead compared to direct calls. At a low level, you're basically looking at an array lookup to get a function pointer, and then a call via a function pointer. Modern CPUs can even predict indirect function calls reasonably well in their branch predictors, so they generally won't hurt modern CPU pipelines too badly. At the assembly level, a virtual function call translates to something like the following, where I is an arbitrary immediate value.

MOV EAX, [EBP + I] ; Move pointer to class instance into register
MOV EBX, [EAX] ; Move vtbl pointer into register.
CALL [EBX + I] ; Call function

Vs. the following for a direct function call:

CALL I  ;  Call function directly

The real overhead comes in that virtual functions can't be inlined, for the most part. (They can be in JIT languages if the VM realizes they're always going to the same address anyhow.) Besides the speedup you get from inlining itself, inlining enables several other optimizations such as constant folding, because the caller can know how the callee works internally. For functions that are large enough not to be inlined anyhow, the performance hit will likely be negligible. For very small functions that might be inlined, that's when you need to be careful about virtual functions.

Edit: Another thing to keep in mind is that all programs require flow control, and this is never free. What would replace your virtual function? A switch statement? A series of if statements? These are still branches that may be unpredictable. Furthermore, given an N-way branch, a series of if statements will find the proper path in O(N), while a virtual function will find it in O(1). The switch statement may be O(N) or O(1) depending on whether it is optimized to a jump table.
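As a small illustration of that last point, here is the same two-way choice written once as a virtual call and once as a switch. All type and method names (Shape, Square, Rect, ShapeKind, AreaBySwitch) are hypothetical. Both versions still contain a branch; they differ only in how the target is selected.

```csharp
using System;

// Hypothetical types, invented for this sketch.
public abstract class Shape
{
    public abstract double Area();
}

public sealed class Square : Shape
{
    public double Side;
    public override double Area() => Side * Side;
}

public sealed class Rect : Shape
{
    public double Width, Height;
    public override double Area() => Width * Height;
}

public enum ShapeKind { Square, Rect }

public static class Program
{
    // Hand-written replacement for the virtual call: still a branch.
    public static double AreaBySwitch(ShapeKind kind, double a, double b)
    {
        switch (kind)
        {
            case ShapeKind.Square: return a * a;
            case ShapeKind.Rect: return a * b;
            default: throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }

    public static void Main()
    {
        Shape shape = new Rect { Width = 2.0, Height = 3.0 };
        Console.WriteLine(shape.Area());                           // target chosen via the vtable
        Console.WriteLine(AreaBySwitch(ShapeKind.Rect, 2.0, 3.0)); // target chosen via the switch
    }
}
```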


