Why does this memory address %fs:0x28 ( fs[0x28] ) have a random value?
Both the FS and GS registers can be used as base-pointer addresses in order to access special operating system data structures. So what you're seeing is a value loaded at an offset from the base held in the FS register, and not bit manipulation of the contents of the FS register.
Specifically, FS:0x28 on Linux stores a special sentinel stack-guard value, and the code is performing a stack-guard check. For instance, if you look further in your code, you'll see that the value at FS:0x28 is stored on the stack, and then the contents of the stack are recalled and an XOR is performed with the original value at FS:0x28. If the two values are equal, which means that the zero flag has been set because XOR'ing two identical values results in zero, then we jump to the test routine; otherwise we jump to a special function that indicates that the stack was somehow corrupted and the sentinel value stored on the stack was changed.
If using GCC, this can be disabled with -fno-stack-protector.
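As a rough illustration of the check described above, here is a minimal C model of the logic the compiler inserts. It is a sketch, not what GCC actually emits: thread_guard is a hypothetical stand-in for the value at FS:0x28, struct frame is a hypothetical stack layout, and copy_and_check is an invented name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-thread guard value kept at FS:0x28. */
static uint64_t thread_guard = 0x1badc0de5eed1234ULL;

/* Hypothetical frame layout: a local buffer with the canary copy above it. */
struct frame {
    char     buf[16];
    uint64_t canary;
};

/* Returns 0 if the canary survived the copy, 1 if it was overwritten. */
int copy_and_check(const char *src, size_t len)
{
    struct frame f;

    f.canary = thread_guard;          /* prologue: copy the guard into the frame */

    if (len > sizeof f)               /* keep the sketch itself well-defined */
        len = sizeof f;
    memcpy(&f, src, len);             /* a copy that may run past buf into canary */

    /* epilogue: XOR the frame copy with the guard; nonzero means corruption */
    return (f.canary ^ thread_guard) != 0;
}
```

Copying 16 bytes leaves the canary alone and the function returns 0; copying 24 bytes overwrites it and the function returns 1. That second case is where the real code would call its stack-corruption handler instead of returning.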
Strange pointer position in the stack
From these lines
int k=1;
int l=1337;
int *p;
p=NULL;
p=&k;
It is clear that these assembler instructions correspond to
int k=1;
40055d: c7 45 e8 01 00 00 00 mov DWORD PTR [rbp-0x18],0x1
int l=1337;
400564: c7 45 ec 39 05 00 00 mov DWORD PTR [rbp-0x14],0x539
p=NULL;
40056b: 48 c7 45 f0 00 00 00 mov QWORD PTR [rbp-0x10],0x0
400572: 00
p=&k;
400573: 48 8d 45 e8 lea rax,[rbp-0x18]
400577: 48 89 45 f0 mov QWORD PTR [rbp-0x10],rax
So the local variable p is placed at [rbp-0x10] and occupies a QWORD: the 8 bytes starting at [rbp-0x10] and ending just before [rbp-0x8] (rbp-0x10 + 0x8 == rbp-0x8).
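The offset arithmetic can be checked with a small C sketch. The struct below is a hypothetical mirror of the three locals in the order the disassembly shows them. uint64_t stands in for the 8-byte pointer so the layout is the same on any host, and rbp_offset is an invented helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the locals as laid out in the frame above. */
struct locals {
    int32_t  k;   /* [rbp-0x18] */
    int32_t  l;   /* [rbp-0x14] */
    uint64_t p;   /* [rbp-0x10], 8 bytes wide */
};

/* Offset from rbp of the byte at struct offset `off` (k sits at rbp-0x18). */
long rbp_offset(size_t off)
{
    return -0x18 + (long)off;
}
```

Under these assumptions, rbp_offset(offsetof(struct locals, p)) is -0x10, and adding sizeof(uint64_t) to that offset gives -0x8, the first byte after p.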
Why do i have this problem with breakpoints on GDB? GDB Stops
GDB stopping like this is a bug that occurs when GDB throws an error while trying to place a breakpoint. It was fixed in upstream GDB with this patch:
https://sourceware.org/ml/gdb-patches/2019-05/msg00361.html
Once you see GDB stopped like this:
[1]+ Stopped
you should be dropped back to a shell. Just resume GDB with the fg command and continue your debug session. Once GDB 9 is out, this bug will be fixed.
As was pointed out in a comment, the breakpoint address is incorrect because you are using a Position Independent Executable (PIE): the code is relocated when the process starts.
Start the program inside GDB with the starti command, then you can disassemble main and see where the code has actually been placed.
Why copy the same value to rax that he already has?
This is unoptimized code. There are a lot of instructions here that are redundant and make very little sense, so I'm not sure why you've fixated on the particular one indicated. Consider the instructions immediately preceding it:
xor eax,eax
lea rax,[rbp-0xc]
First, RAX is cleared (instructions that operate on the lower 32 bits of a 64-bit register implicitly clear the upper bits, so xor reg32, reg32 is equivalent to, and slightly more optimal than, xor reg64, reg64), then RAX is loaded with a value. There was absolutely no reason to clear RAX first, so the first instruction could have been elided altogether.
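The zero-extension rule mentioned in the parenthetical can be modeled in a few lines of C. This is only a sketch of the architectural behavior; write_reg32 and xor_eax_eax are invented names, and a uint64_t stands in for RAX.

```c
#include <stdint.h>

/* Model of writing a 32-bit result to a 64-bit register: the old contents
   are discarded and the upper 32 bits become zero. */
uint64_t write_reg32(uint64_t old_reg, uint32_t value)
{
    (void)old_reg;               /* nothing of the old value survives */
    return (uint64_t)value;
}

/* Model of `xor eax, eax` applied to a register currently holding `rax`. */
uint64_t xor_eax_eax(uint64_t rax)
{
    uint32_t eax = (uint32_t)rax;
    return write_reg32(rax, eax ^ eax);
}
```

Whatever RAX held before, xor_eax_eax returns 0 for the full 64-bit register, which is why the 32-bit form is enough to clear it.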
In this code:
lea rax,[rbp-0xc]
mov rdi,rax
RAX is loaded, and then its value is copied into RDI. This makes sense if you need the same value in both RAX and RDI, but you don't. The value just needs to be in RDI in preparation for the function call. (The System V AMD64 calling convention passes the first integer parameter in the RDI register.) So this could have simply been:
lea rdi, [rbp-0xc]
but, again, this is unoptimized code. The compiler is prioritizing fast code generation and the ability to set breakpoints on individual (high-level language) statements over the generation of efficient code (which takes longer to produce and is harder to debug).
The cyclical spill-reload from the stack in get_v is another symptom of unoptimized code:
mov QWORD PTR [rbp-0x8],rdi
mov rax,QWORD PTR [rbp-0x8]
None of this is required. It's all just busy work, a common calling card of unoptimized code. In an optimized build, or hand-written assembly, it would have been written simply as a register-to-register move, e.g.:
mov rax, rdi
You'll see that GCC always follows the pattern you've observed in unoptimized builds. Consider this function:
void SetParam(int& a)
{
a = 0x2;
}
With -O0 (optimizations disabled), GCC emits the following:
SetParam(int&):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], 2
nop
pop rbp
ret
Look familiar?
Now enable optimizations, and we get the more sensible:
SetParam(int&):
mov DWORD PTR [rdi], 2
ret
Here, the store is done directly into the address passed in the RDI register. No stack frame needs to be set up or torn down. In fact, the stack is bypassed altogether. Not only is the code much simpler and easier to understand, it is also much faster.
Which serves as a lesson: when you are trying to analyze a compiler's object-code output, always enable optimization. Studying unoptimized builds is largely a waste of time, unless you are actually interested in how the compiler generates unoptimized code (e.g., because you're writing or reverse-engineering the compiler itself). Otherwise, what you care about is optimized code because it is simpler to understand and much more real-world.
Your entire get_v function could be simply:
mov DWORD PTR [rdi], 0x2
mov eax, DWORD PTR [rdi]
ret
There's no reason to use the stack, shuffling values back and forth. There's no reason to reload the data from the address RBP-8, since we already have that value loaded into RDI.
But actually, we can do even better than this, since we are moving a constant into the address stored in RDI:
mov DWORD PTR [rdi], 0x2
mov eax, 0x2
ret
In fact, this is exactly what GCC generates for what I imagine is your get_v function:
int get_v(int& a)
{
a = 0x2;
return a;
}
Unoptimized:
get_v(int&):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], 2
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
pop rbp
ret
Optimized:
get_v(int&):
mov DWORD PTR [rdi], 2
mov eax, 2
ret
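If you want to reproduce the comparison yourself, here is a plain C version of the same function, assuming a pointer parameter in place of the C++ reference (both are passed as an address in RDI under the System V AMD64 convention). Compiling it with gcc -S -masm=intel, once with -O0 and once with -O2, should give listings along the lines of the ones above, though the exact output depends on the GCC version.

```c
/* C sketch of the guessed get_v: store a constant through the pointer,
   then return the value just stored. */
int get_v(int *a)
{
    *a = 0x2;
    return *a;
}
```

Calling get_v(&x) sets x to 2 and returns 2, regardless of the optimization level; only the generated instructions differ.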
How to access segment register with out linking libc.so?
Accessing a segment register is no problem, just mov eax, fs. But what you're trying to do is access thread-local storage at a small offset from the FS segment base, which libc's init code will have asked the kernel to set up.
The simplest thing would be to just access your stack canary with a normal RIP-relative addressing mode, not relative to FS base, like GCC will do when targeting other ISAs. Only if you want to make it harder for some other exploit to reach the canary (and for its address to be separately randomizable) do you need TLS. (Or so library code can access it without the indirection of loading a pointer from the GOT, instead of only being efficient for code in the main executable.)
You can of course make the same system calls libc does to set up thread-local storage and use it, if you want to copy GCC's stack-canary code.
Fun fact: sub rax, qword fs:[0x28] is a more efficient way to check the canary than XOR: it can macro-fuse with the JCC into a single uop. That's why current GCC changed to using sub (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568, fixed in GCC10+).
My GCC bug report actually included self-contained microbenchmark code (to prove that sub can macro-fuse even with an FS: addressing mode).
Without libc in a static executable, the code below sets up the FS segment so that its base address is the address of a buffer, so [fs: 0x28] will work. This is a basic form of TLS.
global _start
_start:
cookie equ 12345
mov eax, 158 ; __NR_arch_prctl
mov edi, 0x1002 ; ARCH_SET_FS
lea rsi, [buf]
syscall
mov qword [fs: 0x28], cookie
...
section .bss
buf: resb 4096 ; fs.base will point at this buffer
If the kernel enabled wrfsbase for user-space use, you could use wrfsbase rsi instead of making a system call. I think the most recent Linux kernel (5.10) may have started using wrfsbase itself, but I don't know if it enables user-space use of it.
(It probably doesn't toggle FSGSBASE on/off every time it uses it, so kernel usage would mean user-space can use it; the fault conditions in the manual don't mention privilege level, only the CPUID feature bit and a bit in the CR4 control register. And only in 64-bit mode; it will #UD in other modes including compat mode.)
How are the fs/gs registers used in Linux AMD64?
In x86-64 there are 3 TLS entries, two of them accessible via FS and GS. FS is used internally by glibc (in IA32, apparently, FS is used by Wine and GS by glibc).
Glibc makes its TLS entry point to a struct pthread that contains some internal structures for threading. Glibc usually refers to a struct pthread variable as pd, presumably for pthread descriptor.
On x86-64, struct pthread starts with a tcbhead_t (this depends on the architecture, see the macros TLS_DTV_AT_TP and TLS_TCB_AT_TP). This Thread Control Block Header, AFAIU, contains some fields that are needed even when there is a single thread. The DTV is the Dynamic Thread Vector, and contains pointers to TLS blocks for DSOs loaded via dlopen(). Before or after the TCB there is a static TLS block for the executable and DSOs linked at (program's) load time. The TCB and DTV are explained pretty well in Ulrich Drepper's TLS document (look for the diagrams in chapter 3).
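To connect this with the FS:0x28 stack canary, here is a simplified C mirror of the first fields of glibc's x86-64 tcbhead_t. It is a sketch written from memory of glibc's headers, not a copy of them: the struct name tcbhead_sketch is invented, the pointer fields are written as uint64_t so the layout is the same on any host, and later fields are left out. Field names and order may differ between glibc versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, assumed layout of the start of the x86-64 TCB header. */
struct tcbhead_sketch {
    uint64_t tcb;               /* pointer to the TCB itself (FS base points here) */
    uint64_t dtv;               /* pointer to the Dynamic Thread Vector */
    uint64_t self;              /* pointer to the thread descriptor */
    int32_t  multiple_threads;
    int32_t  gscope_flag;
    uint64_t sysinfo;
    uint64_t stack_guard;       /* the canary that compiled code reads at fs:0x28 */
    uint64_t pointer_guard;
};

/* With this layout the stack guard lands at offset 0x28 from the FS base. */
static_assert(offsetof(struct tcbhead_sketch, stack_guard) == 0x28,
              "stack_guard is expected at offset 0x28");
```

Under these assumptions, the fs:0x28 access in compiler-generated code is simply a read of the stack_guard field of the current thread's TCB header.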