How to Log Stack Frames with Windows x64

I finally found a reliable way to log stack frames on x64: the Windows function CaptureStackBackTrace(). Because I did not want to update my SDK, I call it via GetProcAddress(LoadLibrary()):

typedef USHORT (WINAPI *CaptureStackBackTraceType)(__in ULONG, __in ULONG, __out PVOID*, __out_opt PULONG);
CaptureStackBackTraceType func = (CaptureStackBackTraceType)(GetProcAddress(LoadLibrary("kernel32.dll"), "RtlCaptureStackBackTrace"));

if (func == NULL)
    return; // WOE 29.SEP.2010

// Quote from Microsoft Documentation:
// ## Windows Server 2003 and Windows XP:
// ## The sum of the FramesToSkip and FramesToCapture parameters must be less than 63.
const int kMaxCallers = 62;

void* callers[kMaxCallers];
int count = (func)(0, kMaxCallers, callers, NULL);
// TraceFile is the FILE* of the log; print each return address as 16 hex digits.
for (int i = 0; i < count; i++)
    fprintf(TraceFile, "*** %d called from %016I64X\n", i, (unsigned __int64)(ULONG_PTR)callers[i]);
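
With a current SDK the GetProcAddress detour should not be needed, because CaptureStackBackTrace is declared once windows.h is included. A minimal sketch under that assumption follows. The helper name capture_callers is hypothetical, and the non-Windows branch exists only so the snippet compiles elsewhere; there it captures nothing.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Same limit as above: FramesToSkip + FramesToCapture must stay below 63
   on Windows XP and Windows Server 2003. */
enum { kMaxCallers = 62 };

/* Hypothetical helper: fills callers[] with up to `capacity` return
   addresses and returns how many were captured. */
static size_t capture_callers(void **callers, size_t capacity)
{
    if (capacity > kMaxCallers)
        capacity = kMaxCallers;
#ifdef _WIN32
    /* Skip 0 frames; no hash requested. */
    return CaptureStackBackTrace(0, (DWORD)capacity, callers, NULL);
#else
    (void)callers;  /* not Windows: nothing to capture in this sketch */
    return 0;
#endif
}
```

The clamp keeps the request within the documented limit even when the caller passes a larger buffer.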

Where, and why, is the x64 frame pointer supposed to point? (Windows x64 ABI)

The diagram makes it quite clear that the frame pointer points to the bottom of the fixed portion of the local stack frame. The "fixed portion" is the part whose size does not change and whose location is fixed relative to the initial stack pointer. In the diagram it is labelled "Local variables and saved nonvolatile registers."[1]

The precise location of the frame pointer doesn't matter to the operating system because from an information-theoretical point of view, local variables are indistinguishable from memory allocated by alloca immediately upon entry to a function.

void function1()
{
    int a;
    int *b = (int*)alloca(sizeof(int));
    ...
}

void function2()
{
    int& a = *(int*)alloca(sizeof(int));
    int *b = (int*)alloca(sizeof(int));
    ...
}

The operating system has no way of distinguishing between these two functions. They both store a on the stack directly below the nonvolatile registers.

This equivalence is why the diagram says "generally". In practice, compilers point it where indicated, but in theory they could point it anywhere inside the local frame, as long as the distance from the frame pointer to the return address is a constant.

The function needs to inform the operating system where the frame pointer is so that the stack can be unwound during exception handling. Without this information, it would not be possible to walk the stack because the frame is variable-sized.
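
On Windows x64 this information travels in the function's unwind data: UNWIND_INFO names the frame register and stores a frame offset in a 4-bit field scaled by 16, so the frame pointer can sit anywhere from 0 to 240 bytes above the post-prolog RSP. The arithmetic can be sketched as follows; the function names are hypothetical and the code only models the calculation, it does not read real unwind data.

```c
#include <assert.h>
#include <stdint.h>

/* UNWIND_INFO keeps the frame offset in a 4-bit field, scaled by 16. */
enum { kFrameOffsetScale = 16, kMaxScaledOffset = 15 };

/* Value the prolog places in the frame register:
   RSP (after the fixed allocation) + 16 * scaled_offset. */
static uint64_t frame_pointer_from_rsp(uint64_t rsp, unsigned scaled_offset)
{
    assert(scaled_offset <= kMaxScaledOffset);
    return rsp + (uint64_t)kFrameOffsetScale * scaled_offset;
}

/* The reverse step an unwinder performs: recover the fixed-frame RSP
   from the frame register, however much alloca moved the real RSP. */
static uint64_t rsp_from_frame_pointer(uint64_t fp, unsigned scaled_offset)
{
    assert(scaled_offset <= kMaxScaledOffset);
    return fp - (uint64_t)kFrameOffsetScale * scaled_offset;
}
```

Because the offset is a constant recorded per function, the unwinder never needs to know how much dynamic stack was allocated below the frame pointer.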

[1] You can infer this from the fact that the text says that the frame pointer points to "the base of the fixed part of the stack" and the diagram says "The frame pointer will generally point here", and it's pointing at the base of the local variables and saved nonvolatile registers. Assuming the text and diagram are in agreement, this implies that the fixed part of the stack is the same as the local variables and saved nonvolatile registers. This is the same sort of inference you make every day without even realizing it. For example, if a story says

Sally called out to her brother. "Billy, where are you?"

You can infer that Billy is Sally's brother.

Stack Walker for x64 Windows

What you want to do is unwind the stack. Rather than fix that ugly mess I'll just tell you the general principles involved. On x86 and x86_64 the ebp/rbp and esp/rsp registers form an implicit linked list of memory locations. esp/rsp points to the top of the current stack frame, and, with the conventional prologue, ebp/rbp points to the slot holding the previous frame's ebp/rbp, with the return address stored just above it. Armed with this knowledge, it's fairly trivial to walk through the frames.
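
The linked-list idea can be sketched on simulated frames. This is an illustration only: struct frame and walk_frames are hypothetical names, the frames are ordinary structs rather than real stack memory, and the layout assumes the conventional push-rbp / mov-rbp,rsp prologue. Compilers may omit frame pointers, and Windows x64 unwinding relies on unwind data rather than an rbp chain.

```c
#include <stddef.h>
#include <stdint.h>

/* What a frame pointer points at with the conventional prologue:
   the caller's saved frame pointer, then the return address. */
struct frame {
    const struct frame *prev;   /* saved ebp/rbp of the caller */
    uintptr_t return_address;   /* pushed by the call instruction */
};

/* Follows the chain from the innermost frame outwards and records each
   return address. Stops at a NULL link or when `max` entries are stored. */
static size_t walk_frames(const struct frame *fp, uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        fp = fp->prev;
    }
    return n;
}
```

A real walker would also validate each link (for example, that it lies within the thread's stack and moves toward higher addresses) before dereferencing it.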

Stack frame creation in 64 bit machine

On x86-64 the standard way of passing arguments is in registers, not on the stack (unless you have more than six integer or pointer arguments). See http://www.x86-64.org/documentation/abi.pdf

I highly recommend not doing any kind of experimentation without reading proper documents first (like the one I just linked).

Anyway, you could easily see the arguments were not passed on the stack if you disassembled main:

0x0000000000400509 <+0>: push %rbp
0x000000000040050a <+1>: mov %rsp,%rbp
0x000000000040050d <+4>: mov $0x4,%ecx
0x0000000000400512 <+9>: mov $0x3,%edx
0x0000000000400517 <+14>: mov $0x2,%esi
0x000000000040051c <+19>: mov $0x1,%edi
0x0000000000400521 <+24>: callq 0x4004f0 <test>
0x0000000000400526 <+29>: pop %rbp
0x0000000000400527 <+30>: retq

And you could also see how they end up on the stack within test:

0x00000000004004f0 <+0>: push %rbp
0x00000000004004f1 <+1>: mov %rsp,%rbp
0x00000000004004f4 <+4>: mov %edi,-0x14(%rbp)
0x00000000004004f7 <+7>: mov %esi,-0x18(%rbp)
0x00000000004004fa <+10>: mov %edx,-0x1c(%rbp)
0x00000000004004fd <+13>: mov %ecx,-0x20(%rbp)
0x0000000000400500 <+16>: movl $0x64,-0x4(%rbp)
0x0000000000400507 <+23>: pop %rbp
0x0000000000400508 <+24>: retq
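
The register assignment visible in both listings (edi, esi, edx, ecx for arguments one to four) follows the System V AMD64 order for integer and pointer arguments. A small lookup sketch; the function name is hypothetical, and floating-point arguments, which use separate registers, are not covered.

```c
#include <stddef.h>

/* Where the System V AMD64 ABI places the index-th (0-based) integer or
   pointer argument: one of six registers, otherwise the stack. */
static const char *sysv_int_arg_location(size_t index)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    if (index < sizeof regs / sizeof regs[0])
        return regs[index];
    return "stack";
}
```

The listings use the 32-bit names (edi, esi, ...) because the arguments are int; they are the low halves of the registers named here.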

Fast capture stack trace on windows / 64-bit / mixed mode

9-1-2015 - I've located the original function that Process Hacker calls:

C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll
OutOfProcessFunctionTableCallback

Its source code was here:
https://github.com/dotnet/coreclr/blob/master/src/debug/daccess/fntableaccess.cpp

From there I identified the author of most of the changes to that file, Jan Kotas (jkotas@microsoft.com), and contacted him about this problem.

From: Jan Kotas <jkotas@microsoft.com>
To: Tarmo Pikaro <tapika@yahoo.com>
Sent: Friday, January 8, 2016 3:27 PM
Subject: RE: Fast capture stack trace on windows 64 bit / mixed mode...

...

The mscordacwks.dll is called mscordaccore.dll in CoreCLR / github repro. The VS project
files are auto-generated for it during the build
(\coreclr\bin\obj\Windows_NT.x64.Debug\src\dlls\mscordac\mscordaccore.vcxproj).
You should be able to build and debug CoreCLR to understand how it works.
...

From: Jan Kotas <jkotas@microsoft.com>
To: Tarmo Pikaro <tapika@yahoo.com>
Sent: Saturday, January 9, 2016 2:02 AM
Subject: RE: Fast capture stack trace on windows 64 bit / mixed mode...

> I've tried to replace
> C:\Windows\Microsoft.NET\Framework64\v4.0.30319\mscordacwks.dll dll loading
> with C:\Prototyping\dotNet\coreclr-master\bin\obj\Windows_NT.x64.Debug\src\dlls\mscordac\Debug\mscordaccore.dll
> loading (just compiled), but if previously I could get mixed mode stack trace correctly:
> ...

mscordacwks.dll is tightly coupled with the runtime. You cannot mix and match them between runtimes.
What I meant is that you can use CoreCLR to understand how this works.

But then he recommended this solution, which worked for me:

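// Captures up to nFrames return addresses by unwinding with the x64 unwind data.
// Note: FramesToSkip and pBackTraceHash are not used yet.
int CaptureStackBackTrace3(int FramesToSkip, int nFrames, PVOID* BackTrace, PDWORD pBackTraceHash)
{
    CONTEXT ContextRecord;
    RtlCaptureContext(&ContextRecord);

    int iFrame; // int, so the comparison with nFrames and the return type match
    for (iFrame = 0; iFrame < nFrames; iFrame++)
    {
        DWORD64 ImageBase;
        PRUNTIME_FUNCTION pFunctionEntry = RtlLookupFunctionEntry(ContextRecord.Rip, &ImageBase, NULL);

        // No unwind data registered for this address: stop walking.
        if (pFunctionEntry == NULL)
            break;

        PVOID HandlerData;
        DWORD64 EstablisherFrame;
        // Unwind one frame; afterwards ContextRecord describes the caller.
        RtlVirtualUnwind(UNW_FLAG_NHANDLER,
                         ImageBase,
                         ContextRecord.Rip,
                         pFunctionEntry,
                         &ContextRecord,
                         &HandlerData,
                         &EstablisherFrame,
                         NULL);

        BackTrace[iFrame] = (PVOID)ContextRecord.Rip;
    }

    return iFrame;
}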

This code snippet is still missing the backtrace hash calculation, but that can be added afterwards.
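
One possible way to fill in the missing hash is to sum the low 32 bits of the captured addresses. This is a sketch under assumptions: backtrace_hash is a hypothetical name, the additive scheme is chosen here for illustration, and it has not been verified against the value CaptureStackBackTrace itself returns.

```c
#include <stddef.h>
#include <stdint.h>

/* Additive hash over the captured frames: the sum of the low 32 bits of
   every address, wrapping modulo 2^32. Identical call stacks always
   produce identical values. */
static uint32_t backtrace_hash(void *const *frames, size_t count)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < count; i++)
        hash += (uint32_t)(uintptr_t)frames[i];
    return hash;
}
```

A sum ignores frame order, so two different stacks made of the same addresses collide; that is acceptable for bucketing in a hash table as long as the frames themselves are compared afterwards.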

It's also very important to note that when debugging this code snippet you should use native debugging, not mixed mode (C# projects use mixed mode by default), because mixed mode somehow disturbs the stack trace in the debugger. (How and why this distortion happens is still something to figure out.)

There is still one missing piece of the puzzle: how to make symbol resolution fully resistant to FreeLibrary and to disposal of JIT-compiled code. I still need to figure that out.

Please note that RtlVirtualUnwind will most probably work only on 64-bit architectures, not on ARM or 32-bit.

One more funny thing is that there is a function RtlCaptureStackBackTrace, which resembles the Windows API function CaptureStackBackTrace but differs from it, at least in name. Also, if you check RtlCaptureStackBackTrace, it eventually calls RtlVirtualUnwind; you can verify this in the Windows Research Kernel source code:

RtlCaptureStackBackTrace
>
RtlWalkFrameChain
>
RtlpWalkFrameChain
>
RtlVirtualUnwind

But in my tests RtlCaptureStackBackTrace did not work correctly, unlike the RtlVirtualUnwind-based function above.

It's a kinda magic. :-)

I'll continue with a follow-up question here:

Resolve managed and native stack trace - which API to use?

What is the 'shadow space' in x64 assembly?

The Shadow space (also sometimes called Spill space or Home space) is 32 bytes above the return address which the called function owns (and can use as scratch space), below stack args if any. The caller has to reserve space for their callee's shadow space before running a call instruction.
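
The reservation rule can be sketched as arithmetic. The function name is hypothetical, and the rounding assumes the rest of the caller's frame already keeps RSP 16-byte aligned, so the outgoing area only has to be a multiple of 16 itself.

```c
#include <stddef.h>

enum { kSlotSize = 8, kShadowSlots = 4, kStackAlign = 16 };

/* Bytes a caller reserves at the bottom of its frame before a call:
   always at least 4 slots (32 bytes) of shadow space, one 8-byte slot
   per argument beyond the fourth, rounded up to a multiple of 16. */
static size_t outgoing_area_size(size_t arg_count)
{
    size_t slots = arg_count > (size_t)kShadowSlots ? arg_count : (size_t)kShadowSlots;
    size_t bytes = slots * kSlotSize;
    return (bytes + kStackAlign - 1) / kStackAlign * kStackAlign;
}
```

Even a call that passes no arguments reserves the full 32 bytes, because the callee is entitled to use them as scratch space.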

It is meant to make debugging x64 code easier.

Recall that the first 4 parameters are passed in registers. If you break into the debugger and inspect the call stack for a thread, you won't be able to see any parameters passed to functions. The values stored in registers are transient and cannot be reconstructed when moving up the call stack.

This is where the Home space comes into play: It can be used by compilers to leave a copy of the register values on the stack for later inspection in the debugger. This usually happens for unoptimized builds. When optimizations are enabled, however, compilers generally treat the Home space as available for scratch use. No copies are left on the stack, and debugging a crash dump turns into a nightmare.
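
Where a debugger would look for those copies can be sketched as an offset calculation. The function name is hypothetical, and the offsets are relative to RSP at function entry, before the prolog has run.

```c
#include <stddef.h>

/* Offset from RSP, at function entry, of the stack slot that belongs to
   parameter `index` (0-based).
   [rsp+0] holds the return address; slots 0..3 are the home space for
   the register parameters (RCX, RDX, R8, R9 for integer arguments);
   slot 4 onward holds the arguments that were passed on the stack. */
static size_t param_slot_offset(size_t index)
{
    return 8 + 8 * index;
}
```

The layout is uniform: register parameters get a slot in the same sequence as stack parameters, which is what lets an unoptimized build spill them and present all arguments as one contiguous array.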

Challenges of Debugging Optimized x64 Code offers in-depth information on the issue.


