What is the default register state when program launches (asm, linux)?
This depends entirely on the ABI for each platform. Since you mention eax and ebx, let's see what's the case for x86 (as of Linux v5.17.5). In fs/binfmt_elf.c, inside load_elf_binary(), the kernel checks if the ABI specifies any requirements for register values at program loading:
/*
* The ABI may specify that certain registers be set up in special
* ways (on i386 %edx is the address of a DT_FINI function, for
* example. In addition, it may also specify (eg, PowerPC64 ELF)
* that the e_entry field is the address of the function descriptor
* for the startup routine, rather than the address of the startup
* routine itself. This macro performs whatever initialization to
* the regs structure is required as well as any relocations to the
* function descriptor entries when executing dynamically links apps.
*/
It then calls ELF_PLAT_INIT, which is a macro defined for each architecture in arch/xxx/include/asm/elf.h. For 32-bit x86, it does the following:
#define ELF_PLAT_INIT(_r, load_addr) \
do { \
_r->bx = 0; _r->cx = 0; _r->dx = 0; \
_r->si = 0; _r->di = 0; _r->bp = 0; \
_r->ax = 0; \
} while (0)
So, when your statically-linked ELF binary is loaded on Linux x86, you could count on all of these general-purpose registers being zero (the stack pointer, of course, is not). Doesn't mean you should, though. :-)
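To make the effect of the macro concrete, here is a small user-space model in C. It is only a sketch: struct demo_regs and demo_prepare_regs() are hypothetical names, standing in for the kernel's struct pt_regs and for the part of load_elf_binary() that applies ELF_PLAT_INIT and then starts the thread at the entry point. The field list is an assumption that covers only the registers the i386 macro touches, plus the instruction and stack pointers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct pt_regs (i386 subset only). */
struct demo_regs {
    unsigned long bx, cx, dx, si, di, bp, ax;
    unsigned long ip, sp;
};

/* Same body as the i386 macro quoted above; load_addr is unused there too. */
#define ELF_PLAT_INIT(_r, load_addr) \
    do { \
        (_r)->bx = 0; (_r)->cx = 0; (_r)->dx = 0; \
        (_r)->si = 0; (_r)->di = 0; (_r)->bp = 0; \
        (_r)->ax = 0; \
    } while (0)

/* Model of the register setup for a freshly loaded process image. */
void demo_prepare_regs(struct demo_regs *r, unsigned long entry,
                       unsigned long stack_top)
{
    assert(r != NULL);
    memset(r, 0xAA, sizeof *r);  /* pretend the registers held stale junk */
    ELF_PLAT_INIT(r, 0);         /* general-purpose registers become zero */
    r->ip = entry;               /* execution starts at the ELF entry point */
    r->sp = stack_top;           /* the stack pointer is the nonzero exception */
}
```

Calling demo_prepare_regs() with any entry address and stack top leaves ax through bp at zero, which mirrors what a statically linked binary would observe at _start.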
Dynamic linking
Note that executing a dynamically linked binary actually runs dynamic linker code in your process before execution reaches your _start (ELF entry point). This can and does leave garbage in registers, as allowed by the ABI, except of course for the stack pointer (ESP/RSP) and the atexit hook (EDX/RDX).
Initial state of program registers and stack on Linux ARM
Here's what I use to get a Linux/ARM program started with my compiler:
/** The initial entry point.
*/
asm(
" .text\n"
" .globl _start\n"
" .align 2\n"
"_start:\n"
" sub lr, lr, lr\n" // Clear the link register.
" ldr r0, [sp]\n" // Get argc...
" add r1, sp, #4\n" // ... and argv ...
" add r2, r1, r0, LSL #2\n" // ... and compute &argv[argc] ...
" add r2, r2, #4\n" // ... then skip argv's NULL terminator to reach environ.
" bl _estart\n" // Let's go!
" b .\n" // Never gets here.
" .size _start, .-_start\n"
);
As you can see, I just get the argc, argv, and environ stuff from the stack at [sp].
A little clarification: The stack pointer points to a valid area in the process's memory. r0, r1, and r2 hold the first three parameters to the function being called. I populate them with argc, argv, and environ, respectively.
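For readers who prefer C to pointer arithmetic in assembly, the environ calculation can be sketched as follows. This assumes the usual Linux process-entry layout (argc, then the argv pointers, a NULL terminator, then the environment pointers); demo_find_environ() is a hypothetical helper name, not part of any library.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given argc and argv as found on the initial stack, return the address of
 * the first environment pointer. argv[argc] is the NULL terminator, so the
 * environment array begins one slot after it.
 */
char **demo_find_environ(int argc, char **argv)
{
    assert(argc >= 0 && argv != NULL);
    assert(argv[argc] == NULL);  /* sanity check on the assumed layout */
    return argv + argc + 1;
}
```

The "+ 1" is the easy part to forget: argv + argc alone points at the NULL that ends argv, not at the first environment string.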
What's the values of all the general-purpose registers, when a program starts running?
If you ask about a C program - you can't know, it isn't your business.
For assembly, I also don't think they have meaningful values.
The information needed to execute main - the argument count, argument vector and environment pointer - is all on the stack.
See more info in this Linux Gazette article.
Are there any default values for registers?
Some instructions implicitly update the registers, even if the destinations aren't listed explicitly in the code. Some examples:
- cpuid returns values in eax, ebx, ecx and edx
- loop decrements ecx
- rep string instructions change ecx, edi and esi
- rdmsr changes eax and edx
- mul and div change eax and edx
And there are many other examples.
You can't assume just by seeing that eax isn't listed in the code that it's not changed.
Even assuming you know which registers are affected by which instructions, the only times you have any guarantee for a value are:
- after an instruction that you know updates it
- immediately after hardware reset
At any other time, you can never make assumptions on the values.
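As an illustration of such implicit updates, here is a rough C model of the 32-bit unsigned mul and div instructions. It is a sketch only: struct demo_cpu, demo_mul() and demo_div() are hypothetical names, and the model covers nothing but the two registers these instructions write without naming them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers that mul and div touch implicitly. */
struct demo_cpu {
    uint32_t eax, edx;
};

/* Models "mul src": EDX:EAX = EAX * src (unsigned). */
void demo_mul(struct demo_cpu *c, uint32_t src)
{
    uint64_t product = (uint64_t)c->eax * src;
    c->eax = (uint32_t)product;          /* low half of the product */
    c->edx = (uint32_t)(product >> 32);  /* high half overwrites EDX */
}

/* Models "div src": EAX = EDX:EAX / src, EDX = EDX:EAX % src (unsigned). */
void demo_div(struct demo_cpu *c, uint32_t src)
{
    uint64_t dividend = ((uint64_t)c->edx << 32) | c->eax;
    assert(src != 0);                      /* real hardware raises #DE here */
    assert(dividend / src <= UINT32_MAX);  /* quotient overflow is also #DE */
    c->eax = (uint32_t)(dividend / src);
    c->edx = (uint32_t)(dividend % src);
}
```

Note that demo_mul() destroys whatever was in edx even though the instruction only names one operand, which is exactly the trap described above.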
What registers are preserved through a linux x86-64 function call
Here's the complete table of registers and their use from the documentation [PDF Link]:
r12, r13, r14, r15, rbx, rsp, rbp are the callee-saved registers - they have a "Yes" in the "Preserved across function calls" column.
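If you want that list in a form a tool can query, a minimal lookup might look like the sketch below. demo_is_callee_saved() is a hypothetical helper; the register set is the one listed above for the x86-64 System V ABI, and any other name is treated as not preserved.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns 1 if the named x86-64 register is preserved across function
 * calls under the System V ABI, 0 otherwise.
 */
int demo_is_callee_saved(const char *reg)
{
    static const char *const preserved[] = {
        "rbx", "rsp", "rbp", "r12", "r13", "r14", "r15"
    };
    size_t i;

    assert(reg != NULL);
    for (i = 0; i < sizeof preserved / sizeof preserved[0]; i++) {
        if (strcmp(reg, preserved[i]) == 0)
            return 1;
    }
    return 0;
}
```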
x86 Linux assembler get program parameters from _start
On Linux, the familiar argc and argv variables from C are always passed on the stack by the kernel, available even to assembly programs that are completely standalone and don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).
At the ELF entry point (a.k.a. _start) of an x86 Linux executable:
- ESP points to argc
- ESP + 4 points to argv[0], the start of the array (i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4])
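The difference between taking the address (lea) and loading the value (mov) is easier to see in C. The sketch below treats the initial stack pointer as an array of pointer-sized words; struct demo_startup and demo_decode_stack() are hypothetical names, and the layout is the one described in the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the process-entry stack, decoded from the stack pointer. */
struct demo_startup {
    int argc;
    char **argv;
};

/*
 * sp is the stack pointer value at _start, viewed as an array of
 * pointer-sized words. The first word holds argc as an integer (it is not
 * a real pointer); the words after it are the argv pointers.
 */
struct demo_startup demo_decode_stack(char **sp)
{
    struct demo_startup s;

    assert(sp != NULL);
    s.argc = (int)(intptr_t)sp[0];  /* like: mov eax, [esp]   (load the value) */
    s.argv = sp + 1;                /* like: lea eax, [esp+4] (take the address) */
    /* sp[1] by itself would be argv[0]: what mov eax, [esp+4] loads. */
    return s;
}
```

Because the function works in units of whole words, the same code describes the x86-64 layout, where the slots are 8 bytes wide instead of 4.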
How a Minimal Assembly Program Obtains argc and argv
I'll show how to read argc and argv[0] in GDB.
cmdline-x86.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %eax
mov $0, %ebx
int $0x80
cmdline-x86.gdb
set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n", ((char**)($esp + 4))[0]
quit
Sample Session
$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8 mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86
Explanation
- I placed a software breakpoint (int $0x03) to cause the program to trap back into the debugger right after the ELF entry point (_start).
- I then used printf in the GDB script to print
  - argc with the expression *(int*)$esp
  - argv[0] with the expression ((char**)($esp + 4))[0]
x86-64 version
The differences are minimal:
- Replace ESP with RSP
- Change address size from 4 to 8
- Conform to the different Linux syscall calling convention when we call exit_group(0) to properly terminate the process
cmdline.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %rax
mov $0, %rdi
syscall
cmdline.gdb
set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n", ((char**)($rsp + 8))[0]
quit
How Regular C Programs Obtain argc and argv
You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them as it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:
$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x0000000000401580 <+0>: xor %ebp,%ebp
0x0000000000401582 <+2>: mov %rdx,%r9
0x0000000000401585 <+5>: pop %rsi
0x0000000000401586 <+6>: mov %rsp,%rdx
0x0000000000401589 <+9>: and $0xfffffffffffffff0,%rsp
0x000000000040158d <+13>: push %rax
0x000000000040158e <+14>: push %rsp
0x000000000040158f <+15>: mov $0x404040,%r8
0x0000000000401596 <+22>: mov $0x403fb0,%rcx
0x000000000040159d <+29>: mov $0x4014c0,%rdi
0x00000000004015a4 <+36>: callq 0x401310 <__libc_start_main@plt>
0x00000000004015a9 <+41>: hlt
0x00000000004015aa <+42>: xchg %ax,%ax
0x00000000004015ac <+44>: nopl 0x0(%rax)
The first three arguments to __libc_start_main() are:
- RDI: pointer to main()
- RSI: argc, you can see how it was the first thing popped off the stack
- RDX: argv, the value of RSP right after argc was popped (ubp_av in the GLIBC source)
The x86 _start is very similar:
Dump of assembler code for function _start:
0x0804842c <+0>: xor %ebp,%ebp
0x0804842e <+2>: pop %esi
0x0804842f <+3>: mov %esp,%ecx
0x08048431 <+5>: and $0xfffffff0,%esp
0x08048434 <+8>: push %eax
0x08048435 <+9>: push %esp
0x08048436 <+10>: push %edx
0x08048437 <+11>: push $0x80485e0
0x0804843c <+16>: push $0x8048570
0x08048441 <+21>: push %ecx
0x08048442 <+22>: push %esi
0x08048443 <+23>: push $0x80483d0
0x08048448 <+28>: call 0x80483b0 <__libc_start_main@plt>
0x0804844d <+33>: hlt
0x0804844e <+34>: xchg %ax,%ax
End of assembler dump.
CPU registers state on the very start of the app. PE executables
The offsets below pertain to x86 Windows XP SP3; for other OS versions, look up the CONTEXT structure in winnt.h / ntddk.h.
In OllyDbg: Ctrl+G -> type ntdll.ZwContinue -> OK -> F2 -> restart the exe.
OllyDbg will break at ZwContinue. Press Alt+F1 to open the command line plugin and type
follow [[esp+4]+b8] -> OK -> F2 -> F9. You will see a blank stack; single-step from there to see who fills it.
ZwContinue takes two arguments. The first is a pointer to a CONTEXT structure whose member Eip is at offset 0xb8 from the start of the structure. This Eip will be BaseProcessStartThunk, the function responsible for setting up the initial structured exception handler and calling the module entry point.
After entering _start, is rsp aligned?
The x86-64 ABI explicitly says (3.4.1 Initial Stack and Register State) :
%rsp
The stack pointer holds the address of the byte with lowest
address which is part of the stack. It is guaranteed to be 16-byte
aligned at process entry.
Since _start is the first code that runs when a process is entered, you can be entirely sure that RSP is 16-byte aligned when the OS transfers control to _start in your executable.
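What "16-byte aligned" means in practice can be checked with a couple of bit operations. The sketch below uses hypothetical helper names; demo_align_down16() performs the same masking as the and $0xfffffffffffffff0,%rsp instruction in the glibc _start disassembly shown earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if addr is a multiple of 16, as RSP is at process entry. */
int demo_is_aligned16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Rounds addr down to a 16-byte boundary by clearing the low four bits. */
uintptr_t demo_align_down16(uintptr_t addr)
{
    uintptr_t aligned = addr & ~(uintptr_t)0xF;
    assert(demo_is_aligned16(aligned));
    return aligned;
}
```

Rounding down rather than up is the safe direction for a stack pointer, since the stack grows toward lower addresses.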