What Is the Default Register State When Program Launches (Asm, Linux)

What is the default register state when program launches (asm, linux)?

This depends entirely on each platform's ABI. Since you mention eax and ebx, let's look at x86 (as of Linux v5.17.5). In fs/binfmt_elf.c, inside load_elf_binary(), the kernel checks whether the ABI imposes any requirements on register values at program load time:

/*
 * The ABI may specify that certain registers be set up in special
 * ways (on i386 %edx is the address of a DT_FINI function, for
 * example.  In addition, it may also specify (eg, PowerPC64 ELF)
 * that the e_entry field is the address of the function descriptor
 * for the startup routine, rather than the address of the startup
 * routine itself.  This macro performs whatever initialization to
 * the regs structure is required as well as any relocations to the
 * function descriptor entries when executing dynamically links apps.
 */

It then invokes ELF_PLAT_INIT, a macro defined per architecture in arch/<arch>/include/asm/elf.h. For 32-bit x86, it does the following:

#define ELF_PLAT_INIT(_r, load_addr)        \
    do {                                    \
        _r->bx = 0; _r->cx = 0; _r->dx = 0; \
        _r->si = 0; _r->di = 0; _r->bp = 0; \
        _r->ax = 0;                         \
    } while (0)

So when a statically linked ELF binary is loaded on 32-bit Linux x86, you could count on all of these general-purpose registers (everything except ESP) being zero. That doesn't mean you should, though. :-)
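To make the macro's effect concrete, here is a minimal user-space C sketch. The struct fake_regs type and the fake_exec() and fake_regs_gp_zero() helpers are hypothetical and exist only for illustration. The kernel's real struct pt_regs has more fields. Only the macro body is copied from the excerpt above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct pt_regs
   (32-bit x86 general-purpose registers only). */
struct fake_regs {
    unsigned long ax, bx, cx, dx, si, di, bp, sp;
};

/* Macro body copied from the kernel excerpt; load_addr is unused on x86. */
#define ELF_PLAT_INIT(_r, load_addr)        \
    do {                                    \
        _r->bx = 0; _r->cx = 0; _r->dx = 0; \
        _r->si = 0; _r->di = 0; _r->bp = 0; \
        _r->ax = 0;                         \
    } while (0)

/* Hypothetical helper: fill the registers with junk, set the stack
   pointer separately (as the loader does), then apply the macro. */
void fake_exec(struct fake_regs *regs, unsigned long stack_top)
{
    regs->ax = regs->bx = regs->cx = regs->dx = 0xdeadbeefUL;
    regs->si = regs->di = regs->bp = 0xdeadbeefUL;
    regs->sp = stack_top;
    ELF_PLAT_INIT(regs, 0);
}

/* True if every general-purpose register except sp is zero. */
bool fake_regs_gp_zero(const struct fake_regs *r)
{
    return r->ax == 0 && r->bx == 0 && r->cx == 0 && r->dx == 0 &&
           r->si == 0 && r->di == 0 && r->bp == 0;
}
```

After fake_exec(), every general-purpose field is zero, and sp keeps the stack-top value it was given.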

Dynamic linking

Note that a dynamically linked binary actually runs dynamic linker code in your process before execution reaches your _start (the ELF entry point). That code can and does leave garbage in registers, as the ABI allows. The exceptions are the stack pointer ESP/RSP and EDX/RDX. EDX/RDX holds the address of a function that the program should register with atexit.
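The EDX/RDX exception connects back to the kernel comment quoted earlier. Here is a minimal sketch, assuming a start-up routine that receives whatever value _start found in EDX/RDX. The names register_rtld_fini() and example_fini() are hypothetical. glibc itself uses its internal __cxa_atexit rather than the standard atexit() used here. In a static binary the kernel zeroed EDX, so there is nothing to register.

```c
#include <stdlib.h>

/* Hypothetical: what start-up code does with the EDX/RDX value from _start.
   Returns 1 if a finalizer was registered, 0 otherwise. */
int register_rtld_fini(void (*rtld_fini)(void))
{
    if (rtld_fini == NULL)
        return 0;                    /* static binary: kernel zeroed EDX/RDX */
    return atexit(rtld_fini) == 0;   /* atexit() returns 0 on success */
}

/* Hypothetical no-op finalizer, used only to exercise the helper. */
void example_fini(void) { }
```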

Initial state of program registers and stack on Linux ARM

Here's what I use to get a Linux/ARM program started with my compiler:

/** The initial entry point.
 */
asm(
"       .text\n"
"       .globl  _start\n"
"       .align  2\n"
"_start:\n"
"       sub     lr, lr, lr\n"           // Clear the link register.
"       ldr     r0, [sp]\n"             // Get argc...
"       add     r1, sp, #4\n"           // ... and argv ...
"       add     r2, r1, r0, LSL #2\n"   // ... and compute environ:
"       add     r2, r2, #4\n"           // skip argv's NULL terminator.
"       bl      _estart\n"              // Let's go!
"       b       .\n"                    // Never gets here.
"       .size   _start, .-_start\n"
);

As you can see, I just get the argc, argv, and environ stuff from the stack at [sp].

A little clarification: the stack pointer points to a valid area in the process's memory. Registers r0 through r3 carry the first four arguments to the called function. I populate the first three with argc, argv, and environ, respectively.
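The arithmetic in that _start can be mirrored in C on a simulated stack. This is a sketch under stated assumptions. The stack is modelled as an array of pointer-sized words (4 bytes on 32-bit ARM), and struct start_args and decode_start_stack() are hypothetical names. Note that environ starts at argv + argc + 1, not argv + argc, because argv[argc] is the NULL terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated initial stack, one pointer-sized word per slot:
     sp[0]              argc
     sp[1] .. sp[argc]  argv[0] .. argv[argc-1]
     sp[argc+1]         NULL (end of argv)
     sp[argc+2] ...     envp[0] ..., then NULL */
struct start_args {
    int argc;
    char **argv;
    char **envp;
};

/* Mirrors the ARM _start: r0 = [sp], r1 = sp + 4, r2 = r1 + argc*4 + 4. */
struct start_args decode_start_stack(char **sp)
{
    struct start_args a;
    a.argc = (int)(uintptr_t)sp[0];   /* argc occupies a full word */
    a.argv = sp + 1;                  /* address of argv[0], not its value */
    a.envp = a.argv + a.argc + 1;     /* skip argv[argc] == NULL */
    return a;
}
```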

What are the values of all the general-purpose registers when a program starts running?

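If you're asking about a C program, you can't know, and it isn't your business: the C library's start-up code runs before main and is free to change them.

For assembly, I don't think you should rely on them having meaningful values either.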

The information needed to execute main - the argument count, argument vector and environment pointer - is all on the stack.

See more info in this Linux Gazette article.

Are there any default values for registers?

Some instructions implicitly update the registers, even if the destinations aren't listed explicitly in the code. Some examples:

cpuid returns values in eax, ebx, ecx and edx
loop decrements ecx
rep string instructions change ecx, edi and esi
rdmsr changes eax and edx
mul and div change eax and edx

And there are many other examples.

You can't assume just by seeing that eax isn't listed in the code that it's not changed.
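One example of an implicit destination is the one-operand MUL, which writes its 64-bit product to EDX:EAX. The sketch below assumes GCC- or Clang-style inline assembly on x86. The instruction text never names EDX, yet the code still has to tell the compiler about it through an output constraint. mul32() is a hypothetical name, and the portable fallback computes the same result in plain C.

```c
#include <stdint.h>

/* Hypothetical helper: 32x32 -> 64-bit multiply.
   MUL reads EAX implicitly and writes the product to EDX:EAX. */
void mul32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
#if defined(__x86_64__) || defined(__i386__)
    uint32_t eax = a, edx;
    __asm__("mull %2"
            : "+a"(eax), "=d"(edx)   /* EAX in/out, EDX written implicitly */
            : "rm"(b)
            : "cc");                 /* MUL also changes the flags */
    *lo = eax;
    *hi = edx;
#else
    /* Portable fallback with the same result on other architectures. */
    uint64_t p = (uint64_t)a * b;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
#endif
}
```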

Even assuming you know which registers are affected by which instructions, the only times you have any guarantee for a value are:

after an instruction that you know updates it
immediately after hardware reset

At any other time, you can never make assumptions on the values.

What registers are preserved through a linux x86-64 function call

Here's the complete table of registers and their use, from the x86-64 System V ABI documentation (PDF):

[Table: register usage, from the ABI documentation]

r12, r13, r14, r15, rbx, rsp, rbp are the callee-saved registers - they have a "Yes" in the "Preserved across function calls" column.
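As a compact restatement of that column, here is a hypothetical lookup helper. is_callee_saved() is an illustrative name, not an existing API. The register names come from the sentence above. The comment lists the remaining integer registers, which are caller-saved.

```c
#include <stdbool.h>
#include <string.h>

/* Callee-saved ("Preserved across function calls" = Yes) general-purpose
   registers in the x86-64 System V ABI. */
static const char *const callee_saved[] = {
    "rbx", "rsp", "rbp", "r12", "r13", "r14", "r15",
};

bool is_callee_saved(const char *reg)
{
    for (size_t i = 0; i < sizeof callee_saved / sizeof callee_saved[0]; i++)
        if (strcmp(reg, callee_saved[i]) == 0)
            return true;
    return false;   /* rax, rcx, rdx, rsi, rdi, r8-r11 are caller-saved */
}
```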

x86 Linux assembler get program parameters from _start

On Linux, the familiar argc and argv variables from C are always passed on the stack by the kernel, available even to assembly programs that are completely standalone and don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).

At the ELF entry point (a.k.a. _start) of an x86 Linux executable:

ESP points to argc
ESP + 4 points to argv[0], the start of the array; i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4]

How a Minimal Assembly Program Obtains argc and argv

I'll show how to read argc and argv[0] in GDB.

cmdline-x86.S

#include <sys/syscall.h>

    .global _start
_start:
    /* Cause a breakpoint trap */
    int $0x03

    /* exit_group(0) */
    mov $SYS_exit_group, %eax
    mov $0, %ebx
    int $0x80

cmdline-x86.gdb

set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n",  ((char**)($esp + 4))[0]
quit

Sample Session

$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>  
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8   mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86

Explanation

I placed a software breakpoint (int $0x03) to cause the program to trap back into the debugger right after the ELF entry point (_start).
I then used printf in the GDB script to print
1. argc with the expression *(int*)$esp
2. argv[0] with the expression ((char**)($esp + 4))[0]

x86-64 version

The differences are minimal:

Replace ESP with RSP
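Change the pointer size (and thus the argv offset) from 4 to 8 bytes
Use the x86-64 Linux syscall convention (syscall number in RAX, first argument in RDI, the syscall instruction instead of int $0x80) when calling exit_group(0) to terminate the process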

cmdline.S

#include <sys/syscall.h>

    .global _start
_start:
    /* Cause a breakpoint trap */
    int $0x03

    /* exit_group(0) */
    mov $SYS_exit_group, %rax
    mov $0, %rdi
    syscall

cmdline.gdb

set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n",  ((char**)($rsp + 8))[0]
quit

How Regular C Programs Obtain argc and argv

You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them as it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:

$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
   0x0000000000401580 <+0>: xor    %ebp,%ebp
   0x0000000000401582 <+2>: mov    %rdx,%r9
   0x0000000000401585 <+5>: pop    %rsi
   0x0000000000401586 <+6>: mov    %rsp,%rdx
   0x0000000000401589 <+9>: and    $0xfffffffffffffff0,%rsp
   0x000000000040158d <+13>:    push   %rax
   0x000000000040158e <+14>:    push   %rsp
   0x000000000040158f <+15>:    mov    $0x404040,%r8
   0x0000000000401596 <+22>:    mov    $0x403fb0,%rcx
   0x000000000040159d <+29>:    mov    $0x4014c0,%rdi
   0x00000000004015a4 <+36>:    callq  0x401310 <__libc_start_main@plt>
   0x00000000004015a9 <+41>:    hlt    
   0x00000000004015aa <+42>:    xchg   %ax,%ax
   0x00000000004015ac <+44>:    nopl   0x0(%rax)

The first three arguments to __libc_start_main() are:

RDI: pointer to main()
RSI: argc; note that it is the first thing popped off the stack
RDX: argv, i.e. the value of RSP right after argc was popped (ubp_av in the glibc source)
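Here is how those registers line up with a C-level call. This is a hypothetical stand-in whose parameters follow the order used by older glibc releases of __libc_start_main: main, argc, argv, init, fini, rtld_fini, stack_end. On x86-64 the first six arrive in RDI, RSI, RDX, RCX, R8 and R9 (note mov %rdx,%r9 above). stack_end is the value pushed by push %rsp. This is a sketch, not glibc's implementation. fake_libc_start_main() and example_main() are invented names, and the real function also registers the finalizers and passes main's return value to exit().

```c
#include <stddef.h>

typedef int (*main_fn)(int, char **, char **);

/* Hypothetical stand-in; same parameter order as older glibc. */
int fake_libc_start_main(main_fn main_ptr, int argc, char **argv,
                         main_fn init, void (*fini)(void),
                         void (*rtld_fini)(void), void *stack_end)
{
    char **envp = &argv[argc + 1];   /* environment follows argv's NULL */
    (void)fini;                      /* ignored in this sketch */
    (void)rtld_fini;
    (void)stack_end;
    if (init != NULL)
        init(argc, argv, envp);
    return main_ptr(argc, argv, envp);
}

/* Hypothetical main: returns argc plus the number of environment strings. */
int example_main(int argc, char **argv, char **envp)
{
    int n = 0;
    (void)argv;
    while (envp[n] != NULL)
        n++;
    return argc + n;
}
```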

The x86 _start is very similar:

Dump of assembler code for function _start:
   0x0804842c <+0>: xor    %ebp,%ebp
   0x0804842e <+2>: pop    %esi
   0x0804842f <+3>: mov    %esp,%ecx
   0x08048431 <+5>: and    $0xfffffff0,%esp
   0x08048434 <+8>: push   %eax
   0x08048435 <+9>: push   %esp
   0x08048436 <+10>:    push   %edx
   0x08048437 <+11>:    push   $0x80485e0
   0x0804843c <+16>:    push   $0x8048570
   0x08048441 <+21>:    push   %ecx
   0x08048442 <+22>:    push   %esi
   0x08048443 <+23>:    push   $0x80483d0
   0x08048448 <+28>:    call   0x80483b0 <__libc_start_main@plt>
   0x0804844d <+33>:    hlt    
   0x0804844e <+34>:    xchg   %ax,%ax
End of assembler dump.

CPU register state at the very start of the app (PE executables)

The offsets below pertain to x86 Windows XP SP3; for other OS versions, look up the CONTEXT structure in winnt.h / ntddk.h.

In OllyDbg, press Ctrl+G, type ntdll.ZwContinue, click OK, press F2 to set a breakpoint there, then restart the executable.

OllyDbg will break at ZwContinue. Press Alt+F1 to open the command-line plugin and type
follow [[esp+4]+b8], then click OK. Press F2 to set a breakpoint at that address and F9 to run. You will see a blank stack; single-step to see what fills it.

ZwContinue takes two arguments. The first is a pointer to a CONTEXT structure whose Eip member is at offset 0xb8 from the start of the structure. That Eip will be BaseProcessStartThunk, the function responsible for installing the initial structured exception handler and calling the module entry point.
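To check the 0xb8 figure, here is a hand-written C mirror of the 32-bit x86 CONTEXT layout. struct context_x86, struct fsave_area_x86 and eip_at_offset() are illustrative names. On Windows you should use the real definition from winnt.h. The arithmetic works out as follows:

- ContextFlags: 4 bytes
- six debug registers: 24 bytes
- 112-byte floating-point save area
- eleven 4-byte registers (SegGs through Ebp): 44 bytes

That totals 184 bytes, which is 0xb8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct fsave_area_x86 {                         /* FLOATING_SAVE_AREA */
    uint32_t ControlWord, StatusWord, TagWord;
    uint32_t ErrorOffset, ErrorSelector;
    uint32_t DataOffset, DataSelector;
    uint8_t  RegisterArea[80];
    uint32_t Cr0NpxState;
};

struct context_x86 {                            /* mirrors 32-bit CONTEXT */
    uint32_t ContextFlags;                      /* 0x00 */
    uint32_t Dr0, Dr1, Dr2, Dr3, Dr6, Dr7;      /* 0x04 - 0x18 */
    struct fsave_area_x86 FloatSave;            /* 0x1c */
    uint32_t SegGs, SegFs, SegEs, SegDs;        /* 0x8c - 0x98 */
    uint32_t Edi, Esi, Ebx, Edx, Ecx, Eax;      /* 0x9c - 0xb0 */
    uint32_t Ebp;                               /* 0xb4 */
    uint32_t Eip;                               /* 0xb8 */
    uint32_t SegCs, EFlags, Esp, SegSs;
    uint8_t  ExtendedRegisters[512];
};

_Static_assert(sizeof(struct fsave_area_x86) == 112, "FLOATING_SAVE_AREA size");
_Static_assert(offsetof(struct context_x86, Eip) == 0xb8, "Eip offset");

/* What [[esp+4]+b8] evaluates to: the dword 0xb8 bytes into the CONTEXT
   that ZwContinue's first argument points to. */
uint32_t eip_at_offset(const void *ctx)
{
    uint32_t eip;
    memcpy(&eip, (const unsigned char *)ctx + 0xb8, sizeof eip);
    return eip;
}
```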

After entering _start, is rsp aligned?

The x86-64 ABI explicitly says (3.4.1 Initial Stack and Register State) :

%rsp The stack pointer holds the address of the byte with lowest
address which is part of the stack. It is guaranteed to be 16-byte
aligned at process entry.

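Since _start is where execution begins when a process is entered, you can be sure that RSP is 16-byte aligned on entry to _start. Note that _start is jumped to, not called, so no return address has been pushed. RSP % 16 == 0 there, whereas an ordinary function sees RSP % 16 == 8 on entry.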
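The alignment rule can be sketched as simple arithmetic on RSP values. So can the reason glibc's _start (disassembled earlier) re-aligns with and $0xfffffffffffffff0,%rsp after popping argc. The helper names are hypothetical and the example address is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* At process entry (_start) the ABI guarantees RSP % 16 == 0. */
bool aligned_at_process_entry(uint64_t rsp) { return rsp % 16 == 0; }

/* An ordinary function is entered via CALL, which pushes an 8-byte
   return address, so RSP % 16 == 8 on entry. */
bool aligned_at_function_entry(uint64_t rsp) { return rsp % 16 == 8; }

/* Equivalent of `and $0xfffffffffffffff0, %rsp`. */
uint64_t align_down16(uint64_t rsp) { return rsp & ~(uint64_t)0xf; }
```

For example, popping argc adds 8 to an aligned RSP, which breaks the alignment. align_down16() then moves RSP back down to the nearest 16-byte boundary.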
