Initial State of Program Registers and Stack on Linux Arm

Here's what I use to get a Linux/ARM program started with my compiler:

/** The initial entry point.
 */
asm(
"       .text\n"
"       .globl  _start\n"
"       .align  2\n"
"_start:\n"
"       sub     lr, lr, lr\n"           // Clear the link register.
"       ldr     r0, [sp]\n"             // Get argc...
"       add     r1, sp, #4\n"           // ... and argv ...
"       add     r2, r1, r0, LSL #2\n"   // ... point at argv's NULL terminator ...
"       add     r2, r2, #4\n"           // ... and step past it to environ.
"       bl      _estart\n"              // Let's go!
"       b       .\n"                    // Never gets here.
"       .size   _start, .-_start\n"
);

As you can see, I read argc from [sp], take argv from the word just above it, and derive environ from those two.

A little clarification: The stack pointer points to a valid area in the process's memory. r0, r1, and r2 hold the first three parameters to the function being called. I populate them with argc, argv, and environ, respectively.
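For readers who prefer C, here is a minimal sketch of the same decoding, assuming the usual Linux layout at process entry: argc, then the argv pointers, a NULL terminator, then the environment pointers. The names start_args and decode_initial_stack are made up for illustration and are not part of my startup code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the three values that _start passes along. */
struct start_args {
    long argc;
    char **argv;
    char **envp;
};

/*
 * Decode an initial stack image. Assumed layout, one pointer-sized slot each:
 *   sp[0]             argc
 *   sp[1] .. sp[argc] argv[0] .. argv[argc - 1]
 *   sp[argc + 1]      NULL terminator of argv
 *   sp[argc + 2] ...  environment pointers, NULL terminated
 */
static struct start_args decode_initial_stack(char **sp)
{
    struct start_args args;

    assert(sp != NULL);
    args.argc = (long)(uintptr_t)sp[0];    /* the first slot holds a count */
    args.argv = sp + 1;                    /* argv starts right after argc */
    args.envp = args.argv + args.argc + 1; /* skip argv's NULL terminator */
    return args;
}
```

The "+ 1" on the last line is the easy part to forget: argv + argc is the address of argv's NULL terminator, not of the first environment pointer.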

What is the default register state when a program launches (asm, linux)?

This depends entirely on the ABI for each platform. Since you mention eax and ebx, let's look at x86 (as of Linux v5.17.5). In fs/binfmt_elf.c, inside load_elf_binary(), the kernel checks whether the ABI specifies any requirements for register values at program load:

/*
 * The ABI may specify that certain registers be set up in special
 * ways (on i386 %edx is the address of a DT_FINI function, for
 * example.  In addition, it may also specify (eg, PowerPC64 ELF)
 * that the e_entry field is the address of the function descriptor
 * for the startup routine, rather than the address of the startup
 * routine itself.  This macro performs whatever initialization to
 * the regs structure is required as well as any relocations to the
 * function descriptor entries when executing dynamically links apps.
 */

It then calls ELF_PLAT_INIT, a macro defined for each architecture in arch/xxx/include/asm/elf.h. For 32-bit x86, it does the following:

#define ELF_PLAT_INIT(_r, load_addr)        \
    do {                                    \
        _r->bx = 0; _r->cx = 0; _r->dx = 0; \
        _r->si = 0; _r->di = 0; _r->bp = 0; \
        _r->ax = 0;                         \
    } while (0)

So, when your statically linked ELF binary is loaded on Linux x86, you could count on all of these general-purpose registers being zero. Doesn't mean you should, though. :-)
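To make the effect concrete, here is a user-space sketch that applies the same macro to a mock register structure. It assumes a trimmed stand-in for the kernel's struct pt_regs; mock_regs and mock_start_thread are hypothetical names, and the real kernel sets the instruction and stack pointers through its own start_thread() path rather than like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs (fields trimmed). */
struct mock_regs {
    unsigned long bx, cx, dx, si, di, bp, ax;
    unsigned long ip, sp;
};

/* Same body as the macro quoted above; load_addr is unused here as well. */
#define ELF_PLAT_INIT(_r, load_addr)        \
    do {                                    \
        _r->bx = 0; _r->cx = 0; _r->dx = 0; \
        _r->si = 0; _r->di = 0; _r->bp = 0; \
        _r->ax = 0;                         \
    } while (0)

/* Rough model of the load sequence: clear the registers, then set entry and stack. */
static void mock_start_thread(struct mock_regs *regs,
                              unsigned long entry, unsigned long stack_top)
{
    assert(regs != NULL);
    ELF_PLAT_INIT(regs, 0);
    regs->ip = entry;     /* where execution begins */
    regs->sp = stack_top; /* points at argc on the new stack */
}
```

The macro never touches ip or sp. Those two are the only registers a freshly loaded program can really depend on having meaningful, nonzero values.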

Dynamic linking

Note that executing a dynamically linked binary actually runs dynamic linker code in your process before execution reaches your _start (ELF entry point). This can and does leave garbage in registers, as allowed by the ABI; the exceptions are, of course, the stack pointer (ESP/RSP) and the atexit hook (EDX/RDX).

Why does Cortex-M require its first word to be the initial stack pointer?

The CPU does not technically need a stack pointer to execute instructions. It does need one to service interrupts and exceptions properly: on exception entry, a Cortex-M core pushes a stack frame (r0-r3, r12, lr, the return address, and xPSR) so that the interrupted code can resume afterwards. An exception could in principle occur on the very first instruction after reset, so the SP has to be valid before execution begins.

There is also the vector table to consider. The lowest addresses in the memory map are generally reserved for it anyway, so the initial stack pointer simply takes the first slot, with the reset handler address in the second. The remaining entries hold the handler addresses that the hardware looks up when servicing interrupts and exceptions.
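The reset behavior described above can be modeled on a host machine in a few lines of C. This is only a sketch: mock_core and mock_reset are hypothetical names, and on real hardware the core does this loading itself before any instruction runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two core registers that reset initializes. */
struct mock_core {
    uint32_t sp; /* main stack pointer */
    uint32_t pc; /* program counter */
};

/*
 * Model of what the hardware does at reset: load SP from word 0 of the
 * vector table and the PC from word 1 (the reset vector). Bit 0 of the
 * vector marks Thumb state and is not part of the fetch address.
 */
static struct mock_core mock_reset(const uint32_t *vector_table)
{
    struct mock_core core;

    assert(vector_table != NULL);
    core.sp = vector_table[0];
    core.pc = vector_table[1] & ~UINT32_C(1);
    return core;
}
```

Because the SP is loaded by hardware, the reset handler can be an ordinary C function that uses the stack from its first instruction, and a fault taken immediately after reset still has somewhere to push its frame.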