What Does the "mov rax, QWORD PTR fs:0x28" Assembly Instruction Do?

Why does this memory address %fs:0x28 ( fs[0x28] ) have a random value?

Both the FS and GS registers can be used as base-pointer addresses in order to access special operating system data-structures. So what you're seeing is a value loaded at an offset from the value held in the FS register, and not bit manipulation of the contents of the FS register.

Specifically, FS:0x28 on Linux holds a special sentinel stack-guard value (a stack canary), and the code is performing a stack-guard check. If you look further in your code, you'll see that the value at FS:0x28 is stored on the stack; later, the contents of that stack slot are reloaded and XORed with the original value at FS:0x28. If the two values are equal, the zero flag (ZF) is set, because XORing two identical values gives zero, and we jump to the test routine. Otherwise we jump to a special function (__stack_chk_fail with GCC and glibc) that reports that the stack was somehow corrupted and the sentinel value stored on the stack was changed.
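The check described above can be modelled in plain C. This is only a sketch: struct frame_model, canary_intact and copy_and_check are hypothetical names, the reference value is passed in as a parameter instead of being read from FS:0x28, and the struct merely imitates a buffer sitting directly below the saved canary.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a stack frame: a small buffer with the saved
   canary directly above it, as in a protected function. */
struct frame_model {
    char buffer[8];
    uint64_t canary;
};

/* Models the epilogue check: XOR yields zero only if both values match. */
int canary_intact(uint64_t saved_on_stack, uint64_t reference)
{
    return (saved_on_stack ^ reference) == 0;
}

/* Copies len bytes into the frame without respecting buffer's size, then
   runs the check. Returns 1 if the canary survived, 0 if it was changed. */
int copy_and_check(const char *input, size_t len, uint64_t reference)
{
    struct frame_model frame;
    memset(&frame, 0, sizeof frame);
    frame.canary = reference;          /* prologue: save the canary in the frame */
    if (len > sizeof frame)
        len = sizeof frame;            /* keep the model itself memory-safe */
    memcpy(&frame, input, len);        /* more than 8 bytes spills into canary */
    return canary_intact(frame.canary, reference);
}
```

A 3-byte copy leaves the canary alone, while a 12-byte copy overwrites part of it and the check fails. In real code that failing path is where the corruption handler gets called.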

If using GCC, this can be disabled with:

-fno-stack-protector

Strange pointer position in the stack

From these lines

int k=1;
int l=1337;
int *p;
p=NULL;
p=&k;

It is clear that these assembler instructions correspond to

int k=1;
40055d: c7 45 e8 01 00 00 00 mov DWORD PTR [rbp-0x18],0x1

int l=1337;
400564: c7 45 ec 39 05 00 00 mov DWORD PTR [rbp-0x14],0x539

p=NULL;
40056b: 48 c7 45 f0 00 00 00 mov QWORD PTR [rbp-0x10],0x0
400572: 00

p=&k;
400573: 48 8d 45 e8 lea rax,[rbp-0x18]
400577: 48 89 45 f0 mov QWORD PTR [rbp-0x10],rax

So the local variable p is placed at [rbp-0x10] and occupies a QWORD: the 8 bytes from rbp-0x10 up to, but not including, rbp-0x8 (rbp-0x10 + 0x8 == rbp-0x8).
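The same arithmetic can be checked with a small C model of the three locals. This is a sketch under assumptions: struct locals_model and rbp_displacement are hypothetical names, and the layout assumes a 4-byte int and 8-byte pointers, as on x86-64 Linux. The compiler is free to lay out real locals differently; the struct only mirrors what this particular listing shows.

```c
#include <stddef.h>

/* Model of the locals in the order the listing places them
   (assumes 4-byte int and 8-byte pointers). */
struct locals_model {
    int k;   /* [rbp-0x18] */
    int l;   /* [rbp-0x14] */
    int *p;  /* [rbp-0x10] */
};

/* Translates an offset inside the model into a displacement from rbp,
   given that the first local sits at rbp-0x18. */
long rbp_displacement(size_t member_offset)
{
    return -0x18 + (long)member_offset;
}
```

With these assumptions, offsetof(struct locals_model, p) maps to -0x10, and adding sizeof(int *) gives -0x8, the first byte past p.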

Why do I have this problem with breakpoints in GDB? GDB stops

GDB stopping like this is a bug that occurs when GDB throws an error while trying to place a breakpoint. It was fixed in upstream GDB with this patch:

https://sourceware.org/ml/gdb-patches/2019-05/msg00361.html

Once you see GDB stopped like this:

[1]+ Stopped

you should be dropped back to a shell. Just resume GDB with the fg command and continue your debug session. Once GDB 9 is out this bug will be fixed.

As was pointed out in a comment, the breakpoint address is incorrect because you are using a Position Independent Executable (PIE): the code is relocated when the process starts.

Start the program in GDB with starti, then disassemble main to see where the code has actually been placed.
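The relocation itself is just an addition, which a tiny C helper can illustrate. This is a sketch: runtime_address is a hypothetical name, and the values in the usage note are illustrative assumptions, not taken from the question.

```c
#include <stdint.h>

/* For a PIE, the address used at run time is the base the loader chose
   plus the offset that the disassembly of the file on disk shows. */
uint64_t runtime_address(uint64_t load_base, uint64_t file_offset)
{
    return load_base + file_offset;
}
```

For example, assuming a load base of 0x555555554000 (a value commonly seen under GDB on x86-64 Linux when address randomization is disabled) and a made-up offset of 0x1139, the breakpoint would need to go at 0x555555555139 and not at 0x1139.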

Why copy the same value to rax that it already has?

This is unoptimized code. There are a lot of instructions here that are redundant and make very little sense, so I'm not sure why you've fixated on the particular indicated one. Consider the instructions immediately preceding it:

xor eax,eax
lea rax,[rbp-0xc]

First, RAX is cleared (instructions that write the lower 32 bits of a 64-bit register implicitly zero the upper 32 bits, so xor reg32, reg32 is equivalent to, and slightly more efficient than, xor reg64, reg64), then RAX is loaded with a value. There was absolutely no reason to clear RAX first, so the first instruction could have been elided altogether.
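The zero-extension rule can be modelled in C. This is a sketch with hypothetical helper names (write_low32, write_low16) that only imitate what the hardware does to the full 64-bit register. The 16-bit case is included for contrast, since writes to 16-bit register names leave the upper bits untouched.

```c
#include <stdint.h>

/* Models a write to a 32-bit register name such as eax:
   the upper 32 bits of the full register become zero. */
uint64_t write_low32(uint64_t old_full, uint32_t value)
{
    (void)old_full;               /* the old upper half is discarded */
    return (uint64_t)value;
}

/* Models a write to a 16-bit register name such as ax:
   the upper 48 bits of the full register are preserved. */
uint64_t write_low16(uint64_t old_full, uint16_t value)
{
    return (old_full & ~(uint64_t)0xFFFF) | value;
}
```

So in this model, writing 0 through the 32-bit name clears the whole register, which is why xor eax,eax is enough to zero RAX.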

In this code:

lea rax,[rbp-0xc]
mov rdi,rax

RAX is loaded, and then its value is copied into RDI. This makes sense if you need the same value in both RAX and RDI, but you don't. The value just needs to be in RDI in preparation for the function call. (The System V AMD64 calling convention passes the first integer parameter in the RDI register.) So this could have simply been:

lea rdi, [rbp-0xc]

but, again, this is unoptimized code. The compiler is prioritizing fast code generation and the ability to set breakpoints on individual (high-level language) statements over the generation of efficient code (which takes longer to produce and is harder to debug).

The cyclical spill-reload from the stack in get_v is another symptom of unoptimized code:

mov QWORD PTR [rbp-0x8],rdi
mov rax,QWORD PTR [rbp-0x8]

None of this is required. It's all just busy work, a common calling card of unoptimized code. In an optimized build, or hand-written assembly, it would have been written simply as a register-to-register move, e.g.:

mov rax, rdi

You'll see that GCC always follows the pattern you've observed in unoptimized builds. Consider this function:

void SetParam(int& a)
{
a = 0x2;
}

With -O0 (optimizations disabled), GCC emits the following:

SetParam(int&):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], 2
nop
pop rbp
ret

Look familiar?

Now enable optimizations, and we get the more sensible:

SetParam(int&):
mov DWORD PTR [rdi], 2
ret

Here, the store is done directly into the address passed in the RDI register. No stack frame needs to be set up or torn down. In fact, the stack is bypassed altogether. Not only is the code much simpler and easier to understand, it is also much faster.

Which serves as a lesson: when you are trying to analyze a compiler's object-code output, always enable optimization. Studying unoptimized builds is largely a waste of time, unless you are actually interested in how the compiler generates unoptimized code (e.g., because you're writing or reverse-engineering the compiler itself). Otherwise, what you care about is optimized code because it is simpler to understand and much more real-world.

Your entire get_v function could be simply:

mov DWORD PTR [rdi], 0x2
mov eax, DWORD PTR [rdi]
ret

There's no reason to use the stack, shuffling values back and forth. There's no reason to reload the data from the address RBP-8, since we already have that value loaded into RDI.

But actually, we can do even better than this, since we are moving a constant into the address stored in RDI:

mov DWORD PTR [rdi], 0x2
mov eax, 0x2
ret

In fact, this is exactly what GCC generates for what I imagine is your get_v function:

int get_v(int& a)
{
a = 0x2;
return a;
}

Unoptimized:

get_v(int&):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], 2
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
pop rbp
ret

Optimized:

get_v(int&):
mov DWORD PTR [rdi], 2
mov eax, 2
ret
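If you want to reproduce this from C, which has no references, a pointer parameter is the closest equivalent. This is a sketch and not the code from the question. My assumption is that it compiles to much the same instructions as the reference version, since under the System V AMD64 ABI a reference parameter is passed as an address in RDI just as a pointer is, but I have not compared the output for every compiler version.

```c
/* C counterpart of the C++ get_v above: the argument arrives as an
   address in RDI, the store goes through it, and the stored value is
   returned in EAX. */
int get_v(int *a)
{
    *a = 0x2;
    return *a;
}
```

Compiling it once with -O0 and once with -O2, and comparing the two listings, is a quick way to see the spill-and-reload pattern appear and disappear.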

How to access a segment register without linking libc.so?

Accessing a segment register is no problem: just mov eax, fs. But what you're trying to do is access thread-local storage at a small offset from the FS segment base, which libc's init code will have asked the kernel to set up.

The simplest thing would be to just access your stack canary with a normal RIP-relative addressing mode, not relative to FS base, like GCC will do when targeting other ISAs. Only if you want to make it harder for some other exploit to reach the canary (and for its address to be separately randomizable) do you need TLS. (Or so library code can access it without the indirection of loading a pointer from the GOT, instead of only being efficient for code in the main executable.)

You can of course make the same system calls libc does to set up thread-local storage and use it, if you want to copy GCC's stack-canary code.


Fun fact: sub rax, qword fs:[0x28] is a more efficient way to check the canary than XOR - it can macro-fuse with the JCC into a single uop. That's why current GCC changed to using sub. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 - fixed in GCC10+.

My GCC bug report actually included self-contained microbenchmark code (to prove that sub can macro-fuse even with an FS: addressing mode).

The code below, for a static executable without libc, sets up the FS segment so that its base address is the address of a buffer, so [fs: 0x28] will work. This is a basic form of TLS.

global _start
_start:

cookie equ 12345
mov eax, 158 ; __NR_arch_prctl
mov edi, 0x1002 ; ARCH_SET_FS
lea rsi, [buf]
syscall

mov qword [fs: 0x28], cookie

...

section .bss
buf: resb 4096 ; fs.base will point at this buffer
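For comparison, in a normal program that does link libc you would not want to overwrite the FS base, but you can read it back with the same system call. The following is a sketch: read_fs_base is a hypothetical helper name, it uses the ARCH_GET_FS counterpart of the ARCH_SET_FS code above, and it assumes Linux on x86-64. On any other platform it simply reports failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for the syscall() declaration */
#endif
#include <stdint.h>

#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>           /* ARCH_GET_FS */
#include <sys/syscall.h>         /* SYS_arch_prctl */
#include <unistd.h>              /* syscall() */
#endif

/* Stores the current thread's FS base in *out and returns 0 on
   Linux x86-64; returns -1 on other platforms or if the call fails. */
int read_fs_base(uintptr_t *out)
{
#if defined(__linux__) && defined(__x86_64__)
    unsigned long base = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return -1;
    *out = (uintptr_t)base;
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```

In a libc-linked process the returned address is the block that libc set up at startup, which is what FS-relative accesses such as fs:0x28 are measured from.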

If the kernel enabled wrfsbase for user-space use, you could use wrfsbase rsi instead of making a system call. As far as I know, Linux has enabled FSGSBASE since version 5.9, both for its own use and for user space, on CPUs that support it.

(It probably doesn't toggle FSGSBASE on/off every time it uses it, so kernel usage would mean user-space can use it; the fault conditions in the manual don't mention privilege level, only the CPUID feature bit and a bit in the CR4 control register. And only in 64-bit mode; it will #UD in other modes including compat mode.)

How are the fs/gs registers used in Linux AMD64?

On x86-64 there are 3 TLS entries, two of them accessible via FS and GS. FS is used internally by glibc (on IA-32, apparently FS is used by Wine and GS by glibc).

Glibc makes its TLS entry point to a struct pthread that contains some internal structures for threading. Glibc usually refers to a struct pthread variable as pd, presumably for pthread descriptor.

On x86-64, struct pthread starts with a tcbhead_t (this depends on the architecture, see the macros TLS_DTV_AT_TP and TLS_TCB_AT_TP). This Thread Control Block Header, AFAIU, contains some fields that are needed even when there is a single thread. The DTV is the Dynamic Thread Vector, and contains pointers to TLS blocks for DSOs loaded via dlopen(). Before or after the TCB there is a static TLS block for the executable and DSOs linked at (program's) load time. The TCB and DTV are explained pretty well in Ulrich Drepper's TLS document (look for the diagrams in chapter 3).
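To tie this back to the stack guard at FS:0x28, here is a C model of the start of that header. This is a sketch written from my recollection of glibc's x86-64 tls.h, not a copy of it: struct tcbhead_model and stack_guard_offset are hypothetical names, the field names and order should be checked against your glibc version, and the layout assumes 8-byte pointers and a 4-byte int.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the first fields of the thread control block header
   (assumed layout; verify against your glibc sources). */
struct tcbhead_model {
    void *tcb;              /* 0x00: pointer to the TCB itself */
    void *dtv;              /* 0x08: Dynamic Thread Vector */
    void *self;             /* 0x10: pointer to the thread descriptor */
    int multiple_threads;   /* 0x18 */
    int gscope_flag;        /* 0x1c */
    uintptr_t sysinfo;      /* 0x20 */
    uintptr_t stack_guard;  /* 0x28: the value read via fs:0x28 */
};

/* Returns the offset of the stack guard within the model. */
size_t stack_guard_offset(void)
{
    return offsetof(struct tcbhead_model, stack_guard);
}
```

Under those assumptions the stack guard lands at offset 0x28, which matches the displacement in the mov rax, QWORD PTR fs:0x28 instruction.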


