How performant is StackFrame?
edit: Some background
We have a similar feature which is disabled 99% of the time; we were using an approach like:
public void DoSomething()
{
TraceCall(MethodBase.GetCurrentMethod().Name);
// Do Something
}
public void TraceCall(string methodName)
{
if (!loggingEnabled) { return; }
// Log...
}
It was simple, but regardless of whether tracing was enabled we were paying the cost of using reflection to look up the method name.
Our options were either to require more code in every method (and risk simple mistakes or omissions) or to switch to using StackFrame
to determine the calling method only when logging was enabled.
Option A:
public void DoSomething()
{
if (loggingEnabled)
{
TraceCall(MethodBase.GetCurrentMethod().Name);
}
// Do Something
}
public void TraceCall(string methodName)
{
if (!loggingEnabled) { return; }
// Log...
}
Option B:
public void DoSomething()
{
TraceCall();
// Do Something
}
public void TraceCall()
{
if (!loggingEnabled) { return; }
StackFrame stackFrame = new StackFrame(1);
// Log...
}
We opted for Option B: it offers significant performance improvements over our original approach when logging is disabled (99% of the time) and is very simple to implement.
Here's an alteration of Michael's code to show the cost/benefit of this approach:
using System;
using System.Diagnostics;
using System.Reflection;
namespace ConsoleApplication
{
class Program
{
static bool traceCalls;
static void Main(string[] args)
{
Stopwatch sw;
// warm up
for (int i = 0; i < 100000; i++)
{
TraceCall();
}
// call 100K times, tracing *disabled*, passing method name
sw = Stopwatch.StartNew();
traceCalls = false;
for (int i = 0; i < 100000; i++)
{
TraceCall(MethodBase.GetCurrentMethod());
}
sw.Stop();
Console.WriteLine("Tracing Disabled, passing Method Name: {0}ms"
, sw.ElapsedMilliseconds);
// call 100K times, tracing *enabled*, passing method name
sw = Stopwatch.StartNew();
traceCalls = true;
for (int i = 0; i < 100000; i++)
{
TraceCall(MethodBase.GetCurrentMethod());
}
sw.Stop();
Console.WriteLine("Tracing Enabled, passing Method Name: {0}ms"
, sw.ElapsedMilliseconds);
// call 100K times, tracing *disabled*, determining method name
sw = Stopwatch.StartNew();
traceCalls = false;
for (int i = 0; i < 100000; i++)
{
TraceCall();
}
sw.Stop();
Console.WriteLine("Tracing Disabled, looking up Method Name: {0}ms"
, sw.ElapsedMilliseconds);
// call 100K times, tracing *enabled*, determining method name
sw = Stopwatch.StartNew();
traceCalls = true;
for (int i = 0; i < 100000; i++)
{
TraceCall();
}
sw.Stop();
Console.WriteLine("Tracing Enabled, looking up Method Name: {0}ms"
, sw.ElapsedMilliseconds);
Console.ReadKey();
}
private static void TraceCall()
{
if (traceCalls)
{
StackFrame stackFrame = new StackFrame(1);
TraceCall(stackFrame.GetMethod().Name);
}
}
private static void TraceCall(MethodBase method)
{
if (traceCalls)
{
TraceCall(method.Name);
}
}
private static void TraceCall(string methodName)
{
// Write to log
}
}
}
The Results:
Tracing Disabled, passing Method Name: 294ms
Tracing Enabled, passing Method Name: 298ms
Tracing Disabled, looking up Method Name: 0ms
Tracing Enabled, looking up Method Name: 1230ms
The performance hit comes from the StackFrame constructor and the method-name lookup, but with Option B it is only paid when tracing is enabled.
As of C# 5, it would definitely be better to get the compiler to bake this into the call site instead, using [CallerMemberName]
public void Log(string message, [CallerMemberName] caller = null)
{
}
Then:
public void DoSomething()
{
logger.Log("A message");
}
... is converted by the C# compiler into
public void DoSomething()
{
logger.Log("A message", "DoSomething");
}
C# stack frames, curly brackets and performance
When I write this:
int x = 4;
int y = 3;
Console.WriteLine(x + y);
Console.WriteLine(x);
Or this:
int x = 4;
{
int y = 3;
Console.WriteLine(x + y);
}
Console.WriteLine(x);
I still get this in the IL:
.maxstack 2
.locals init (
[0] int32 x,
[1] int32 y
)
There's no extra stack frame; the braces only introduce a new lexical scope, and both locals live in the same frame.
Is it cheaper to get a specific StackFrame instead of StackTrace.GetFrame?
See the usual recommendations for writing a correct benchmark: use a prime number of iterations (to suppress the JIT's loop-unrolling optimization), run in Release mode without the debugger attached, warm up the cache first, and so on.
I added your example to BenchmarkDotNet; see StackFrameProgram.cs:
public class StackFrameProgram
{
private const int IterationCount = 100001;
public void Run()
{
var competition = new BenchmarkCompetition();
competition.AddTask("StackFrame", () => StackFrame());
competition.AddTask("StackTrace", () => StackTrace());
competition.Run();
}
private StackFrame StackFrame()
{
StackFrame method = null;
for (int i = 0; i < IterationCount; i++)
method = new StackFrame(1, false);
return method;
}
private StackFrame StackTrace()
{
StackFrame method = null;
for (int i = 0; i < IterationCount; i++)
method = new StackTrace().GetFrame(1);
return method;
}
}
Here are my results (Intel Core i7-3632QM CPU 2.20GHz):
x86, .NET 3.5:
StackFrame : 1035ms
StackTrace : 1619ms
x64, .NET 3.5:
StackFrame : 981ms
StackTrace : 1754ms
x86, .NET 4.0:
StackFrame : 735ms
StackTrace : 1150ms
x64, .NET 4.0:
StackFrame : 637ms
StackTrace : 880ms
Let's look inside:
public StackFrame.ctor(int skipFrames, bool fNeedFileInfo)
{
this.InitMembers();
this.BuildStackFrame(skipFrames, fNeedFileInfo);
}
private void StackFrame.BuildStackFrame(int skipFrames, bool fNeedFileInfo)
{
StackFrameHelper sfh = new StackFrameHelper(fNeedFileInfo, null);
StackTrace.GetStackFramesInternal(sfh, 0, null);
int numberOfFrames = sfh.GetNumberOfFrames();
skipFrames += StackTrace.CalculateFramesToSkip(sfh, numberOfFrames);
if ((numberOfFrames - skipFrames) > 0)
{
this.method = sfh.GetMethodBase(skipFrames);
this.offset = sfh.GetOffset(skipFrames);
this.ILOffset = sfh.GetILOffset(skipFrames);
if (fNeedFileInfo)
{
this.strFileName = sfh.GetFilename(skipFrames);
this.iLineNumber = sfh.GetLineNumber(skipFrames);
this.iColumnNumber = sfh.GetColumnNumber(skipFrames);
}
}
}
public StackTrace.ctor()
{
this.m_iNumOfFrames = 0;
this.m_iMethodsToSkip = 0;
this.CaptureStackTrace(0, false, null, null);
}
private void StackTrace.CaptureStackTrace(int iSkip, bool fNeedFileInfo, Thread targetThread, Exception e)
{
this.m_iMethodsToSkip += iSkip;
StackFrameHelper sfh = new StackFrameHelper(fNeedFileInfo, targetThread);
GetStackFramesInternal(sfh, 0, e);
this.m_iNumOfFrames = sfh.GetNumberOfFrames();
if (this.m_iMethodsToSkip > this.m_iNumOfFrames)
{
this.m_iMethodsToSkip = this.m_iNumOfFrames;
}
if (this.m_iNumOfFrames != 0)
{
this.frames = new StackFrame[this.m_iNumOfFrames];
for (int i = 0; i < this.m_iNumOfFrames; i++)
{
bool flag = true;
bool flag2 = true;
StackFrame frame = new StackFrame(flag, flag2);
frame.SetMethodBase(sfh.GetMethodBase(i));
frame.SetOffset(sfh.GetOffset(i));
frame.SetILOffset(sfh.GetILOffset(i));
frame.SetIsLastFrameFromForeignExceptionStackTrace(sfh.IsLastFrameFromForeignExceptionStackTrace(i));
if (fNeedFileInfo)
{
frame.SetFileName(sfh.GetFilename(i));
frame.SetLineNumber(sfh.GetLineNumber(i));
frame.SetColumnNumber(sfh.GetColumnNumber(i));
}
this.frames[i] = frame;
}
if (e == null)
{
this.m_iMethodsToSkip += CalculateFramesToSkip(sfh, this.m_iNumOfFrames);
}
this.m_iNumOfFrames -= this.m_iMethodsToSkip;
if (this.m_iNumOfFrames < 0)
{
this.m_iNumOfFrames = 0;
}
}
else
{
this.frames = null;
}
}
public virtual StackFrame StackTrace.GetFrame(int index)
{
if (((this.frames != null) && (index < this.m_iNumOfFrames)) && (index >= 0))
{
return this.frames[index + this.m_iMethodsToSkip];
}
return null;
}
Is the stack frame of a function cleared, or is it left as-is on the stack, when we return from the function?
All the data remains in memory. The only things that change are the stack-top pointer (esp) and the stack-frame base (ebp). A stack frame is a somewhat abstract concept, defined only by the values of these two registers; the region above esp is therefore "not visible", even though it may still contain data left over from previous stack frames.
Understanding fastcall stack frame
push rbp ;save non-volatile rbp
mov rbp, rsp ;save old stack
sub rsp, 32 ;reserve space for 32 bytes of local variables = 8 integers
;or 4 pointers.
;this is per the MS/Intel guides. You can use this as temp
;storage for the parameters or for local variables.
and spl, -16 ;align stack by 16 bytes (for sse code)
mov rsp, rbp ;restore the old stack
pop rbp ;restore rbp
How does subtracting 32 from RSP do anything at all?
RSP is the stack pointer, not just another register. Doing anything to it affects the stack. In this case it reserves 8x4 = 32 bytes of space on the stack for local variables to be placed in.
What is SPL, and why does masking it make something 16-byte aligned?
The and spl, -16 forces the four LSBs of RSP to zero, and because the stack grows down this aligns the stack by 16 bytes.
The 16-byte alignment is needed when using SSE code, which x64 uses for floating-point math; it allows the compiler to use the faster aligned SSE load and store instructions.
SPL is the lower 8 bits of RSP. Why the compiler chooses to operate on SPL makes no sense: both instructions are 4 bytes, and and rsp, -16 is strictly better because it does not incur a partial register update.
Disassembly:
0: 40 80 e4 f0 and spl,-16 ;bad! partial register update.
4: 48 83 e4 f0 and rsp,-16 ;good
8: 83 e4 f0 and esp,-16 ;not possible will zero upper 32 bits of rsp
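The effect of the -16 mask is easy to verify outside of assembly. This small sketch (the pointer value is made up for illustration) shows that AND-ing with -16 clears the four low bits, which is exactly a downward 16-byte alignment:

```java
public class AlignDemo {
    public static void main(String[] args) {
        long rsp = 0x7ffce3b15a2dL;      // hypothetical, unaligned stack-pointer value
        long aligned = rsp & -16L;       // -16 is ...fffffff0: clears the four LSBs

        System.out.println(Long.toHexString(aligned)); // 7ffce3b15a20
        System.out.println(aligned % 16 == 0);         // true: 16-byte aligned
        System.out.println(aligned <= rsp);            // true: moved down, as the stack grows down
    }
}
```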
[RSP is] just another register, right?
No, RSP is magically special. It points to the stack, which is what the PUSH and POP instructions act upon.
All local variables and parameters (those that do not fit into registers) are stored on the stack.
Understanding fastcall
There is only one calling convention on x64. To make matters more confusing, if you specify a calling convention other than __fastcall, most compilers will remap it to __fastcall on x64.
Access stacktraces with good performance?
Thread.currentThread().getStackTrace() is basically the same thing, as @apangin points out, but it could be faster in the future if it avoided creating a Throwable.
However, you might find that sub-sampling gives you the improvement you need. Instead of recording every access, record every N-th access. If you record every 10th access you reduce the overhead by up to 90%, and if you have a high number of accesses it will be almost as accurate.
Another option is to use a profiler like YourKit, which can do this much more efficiently. It can show you how often different callers invoke a method, along with their stack traces (typically it records every 10th call).
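The sub-sampling idea above can be sketched like this (the class and counter names are made up for illustration): a cheap counter runs on every access, and the expensive stack-trace capture happens only on every N-th one.

```java
import java.util.concurrent.atomic.AtomicLong;

public class SampledTracer {
    private static final int SAMPLE_EVERY = 10;        // record every 10th access
    private static final AtomicLong calls = new AtomicLong();
    private static long recorded = 0;

    static void onAccess() {
        // Cheap counter increment on every call...
        if (calls.incrementAndGet() % SAMPLE_EVERY != 0) {
            return;
        }
        // ...the expensive stack-trace capture only on every 10th call.
        StackTraceElement[] trace = Thread.currentThread().getStackTrace();
        recorded++;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            onAccess();
        }
        System.out.println(recorded); // 1000 calls, only 100 stack traces captured
    }
}
```

With 1000 accesses, only 100 stack traces are captured, cutting the capture overhead by 90% while keeping the caller distribution statistically representative.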