X86 Linux Assembler Get Program Parameters from _Start

On Linux, the familiar argc and argv variables from C are always passed on the stack by the kernel, available even to assembly programs that are completely standalone and don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).

At the ELF entry point (a.k.a. _start) of an x86 Linux executable:

  1. ESP points to argc
  2. ESP + 4 points to argv[0], the start of the array (i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4])
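To make the layout concrete, here is a minimal C sketch that treats the stack pointer value at the entry point as an array of pointer-sized words. The struct startup_info and the function parse_initial_stack are hypothetical names made up for this illustration; they are not part of any ABI or library. It assumes the layout described above: argc in the first word, then the argv pointers, a NULL, and then the environment pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of what the kernel leaves on the stack at _start. */
struct startup_info {
    long   argc;
    char **argv;   /* argv[argc] is NULL */
    char **envp;   /* NULL-terminated, starts right after argv's NULL */
};

/* sp is the stack pointer value at the entry point (ESP or RSP),
 * viewed as an array of pointer-sized words (4 bytes on i386, 8 on x86-64). */
static struct startup_info parse_initial_stack(char **sp)
{
    struct startup_info info;

    assert(sp != NULL);
    info.argc = (long)(intptr_t)sp[0];       /* like: mov (%esp), %eax    */
    info.argv = sp + 1;                      /* like: lea 4(%esp), %eax   */
    info.envp = info.argv + info.argc + 1;   /* skip argv and its NULL    */
    return info;
}
```

Note that argv is the address of the second word, not its contents, which is the lea versus mov distinction from the list above.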

How a Minimal Assembly Program Obtains argc and argv

I'll show how to read argc and argv[0] in GDB.

cmdline-x86.S

#include <sys/syscall.h>

.global _start
_start:
/* Cause a breakpoint trap */
int $0x03

/* exit_group(0) */
mov $SYS_exit_group, %eax
mov $0, %ebx
int $0x80

cmdline-x86.gdb

set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n", ((char**)($esp + 4))[0]
quit

Sample Session

$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8 mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86

Explanation

  • I placed a software breakpoint (int $0x03) to cause the program to trap back into the debugger right after the ELF entry point (_start).
  • I then used printf in the GDB script to print

    1. argc with the expression *(int*)$esp
    2. argv[0] with the expression ((char**)($esp + 4))[0]

x86-64 version

The differences are minimal:

  • Replace ESP with RSP
  • Change address size from 4 to 8
  • Conform to different Linux syscall calling conventions when we call exit_group(0) to properly terminate the process

cmdline.S

#include <sys/syscall.h>

.global _start
_start:
/* Cause a breakpoint trap */
int $0x03

/* exit_group(0) */
mov $SYS_exit_group, %rax
mov $0, %rdi
syscall

cmdline.gdb

set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n", ((char**)($rsp + 8))[0]
quit

How Regular C Programs Obtain argc and argv

You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them as it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:

$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x0000000000401580 <+0>: xor %ebp,%ebp
0x0000000000401582 <+2>: mov %rdx,%r9
0x0000000000401585 <+5>: pop %rsi
0x0000000000401586 <+6>: mov %rsp,%rdx
0x0000000000401589 <+9>: and $0xfffffffffffffff0,%rsp
0x000000000040158d <+13>: push %rax
0x000000000040158e <+14>: push %rsp
0x000000000040158f <+15>: mov $0x404040,%r8
0x0000000000401596 <+22>: mov $0x403fb0,%rcx
0x000000000040159d <+29>: mov $0x4014c0,%rdi
0x00000000004015a4 <+36>: callq 0x401310 <__libc_start_main@plt>
0x00000000004015a9 <+41>: hlt
0x00000000004015aa <+42>: xchg %ax,%ax
0x00000000004015ac <+44>: nopl 0x0(%rax)

The first three arguments to __libc_start_main() are:

  1. RDI: pointer to main()
  2. RSI: argc, you can see how it was the first thing popped off the stack
  3. RDX: argv, the value of RSP right after argc was popped. (ubp_av in the GLIBC source)

The x86 _start is very similar:

Dump of assembler code for function _start:
0x0804842c <+0>: xor %ebp,%ebp
0x0804842e <+2>: pop %esi
0x0804842f <+3>: mov %esp,%ecx
0x08048431 <+5>: and $0xfffffff0,%esp
0x08048434 <+8>: push %eax
0x08048435 <+9>: push %esp
0x08048436 <+10>: push %edx
0x08048437 <+11>: push $0x80485e0
0x0804843c <+16>: push $0x8048570
0x08048441 <+21>: push %ecx
0x08048442 <+22>: push %esi
0x08048443 <+23>: push $0x80483d0
0x08048448 <+28>: call 0x80483b0 <__libc_start_main@plt>
0x0804844d <+33>: hlt
0x0804844e <+34>: xchg %ax,%ax
End of assembler dump.

Linux getting terminal arguments from _start not working with inline assembly in C

This looks correct for a minimal _start, but you put it inside a non-naked C function. Compiler-generated code, e.g. push %rbp / mov %rsp, %rbp, will run before execution reaches the asm statement. To see this, look at gcc -S output, or single-step in a debugger such as GDB.

Put your asm statement at global scope (like in How Get arguments value using inline assembly in C without Glibc?) or use __attribute__((naked)) on your _start(). Note that _start isn't really a function: nothing called it, so there is no return address on the stack.

As a rule, never use GNU C Basic asm statements in a non-naked function. You might get this to work with -O3, though, because that would imply -fomit-frame-pointer, so the stack pointer would still be pointing at argc and argv when your code ran.

A dynamically linked executable on GNU/Linux will run libc startup code from dynamic linker hooks, so you actually can use printf from _start without manually calling those init functions. That would not be the case if this were statically linked.

However, your main tries to return to your _start, but you don't show _start calling exit. You should call exit instead of making an _exit system call directly, to make sure stdio buffers get flushed even if output is redirected to a file (making stdout fully buffered). Falling off the end of _start would be bad: the program would crash or get into an infinite loop, depending on what execution falls into.

Linux 64 command line parameters in Assembly

It looks like section 3.4 Process Initialization, and specifically figure 3.9, in the already mentioned System V AMD64 ABI describes precisely what you want to know.

How to get the first command-line argument and put it into static buffer in memory?

You don't have to copy the contents of the string itself into a data buffer. Save the value of 16(%rsp) in a QWORD sized variable and use it with syscalls all you want. In C terms, that would be the difference between

char lcomm[4];
strcpy(lcomm, argv[1]);
open(lcomm, ...);

and

char *plcomm;
plcomm = argv[1];
open(plcomm, ...);

The second one works just as well.

Also, your buffer has a fixed size of 4 bytes. If the command line argument is longer than that, your code will overflow the buffer, and potentially crash.


That said, if you're serious about learning assembly, you should eventually figure out how to write a strcpy-like loop. :)
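If you do want a copy, here is a minimal sketch in C of a strcpy-like loop that respects the buffer size, so that a long argument cannot overflow a small buffer like the 4-byte one above. The name copy_arg and its return convention are my own choices for this illustration, not a standard function.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst, which has room for cap bytes including the
 * terminating NUL. Returns 0 if everything fit, or -1 if src was too
 * long (dst then holds a truncated, still NUL-terminated prefix). */
static int copy_arg(char *dst, size_t cap, const char *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL && cap > 0);
    while (src[i] != '\0') {
        if (i + 1 >= cap) {     /* no room left for this byte plus the NUL */
            dst[i] = '\0';
            return -1;
        }
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return 0;
}
```

The same structure translates almost line by line into an assembly loop: load a byte, compare it with zero, store it, and bump the index.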


EDIT with some assembly code. Last time I checked, the file name goes into the syscall as RDI, not RSI:

mov 16(%rsp), %rdi # File name
mov $0, %rsi # Flags: O_RDONLY, but substitute your own
mov $0, %rdx # Mode: doesn't matter if the file exists
mov $2, %rax # Syscall number for open
syscall
# %rax is the file handle now

For future reference, the x86_64 syscall convention is:

  • Parameters go into %rdi, %rsi, %rdx, %r10, %r8, and %r9 in that order (%r10 takes the place that %rcx has in the function calling convention)
  • Syscall # goes into %rax
  • Then perform a syscall instruction
  • The return value is in %rax
  • %rcx and %r11 are clobbered, the rest of the registers are preserved
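The same convention can be written down as a small GNU C inline-assembly wrapper. This is only a sketch: raw_syscall3 is a hypothetical helper name, it covers just the first three parameters, and it assumes GCC or Clang on x86-64 Linux. On other targets it falls back to libc's syscall() wrapper so that it still compiles.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue a system call with up to three arguments. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;

    assert(nr >= 0);
#if defined(__x86_64__) && defined(__linux__)
    __asm__ volatile ("syscall"
                      : "=a"(ret)                           /* result in %rax */
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)  /* %rax, %rdi, %rsi, %rdx */
                      : "rcx", "r11", "memory");            /* clobbered by syscall */
    /* On failure the kernel returns a negative errno value in %rax. */
#else
    /* Fallback for other targets: libc wrapper, returns -1 and sets errno. */
    ret = syscall(nr, a1, a2, a3);
#endif
    return ret;
}
```

With this helper, the open call above would correspond to raw_syscall3(SYS_open, (long)path, 0, 0), where path is the pointer loaded from 16(%rsp).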

The reference of syscalls is here.

What does call _start in x86?

First of all, when using the same CPU (e.g. an x86-64 CPU), you need different crt0.S files for different operating systems.

And you need a different crt0.S for programs that are not started using an operating system (such as an operating system itself).

What is in stack before called _start which should be the only entry for linker?

This depends on the operating system. Linux would copy argc, the arguments (argv[n]) and the environment (environ[n]) somewhere on the stack.

The file from your question is intended for an operating system that places argc at rsp+0, followed by the arguments and the environment.

However, I remember a (32-bit) OS that put argc at esp+0x80 instead of esp+0, so this is also possible...

As far as I know, Windows does not put anything on the stack (at least not officially). The corresponding crt0.S code must call a function in a DLL file to get the command line arguments.

In the case of a device firmware which is started immediately after the CPU (microcontroller) start, the crt0.S code must even set the stack pointer to a valid value first. The memory (including the stack) is often completely uninitialized in this case.

Needless to say, the stack does not contain any useful values in this case.

This should be designed to provide initialization of variables from .data ...

In the case of a software started by an operating system, the operating system will initialize the .data section. This means that the crt0.S code does not have to do that.

In the case of a microcontroller program (device firmware), the crt0.S code has to do this.

Because your file is obviously intended for an operating system, it does not initialize the .data section.

Finally, how to compile this assembly, without stdlib ...

If you want to use the crt0.S file from your question, you'll definitely require the _exit() function.

And if you want to use the function puts() in your code, you'll also need this function.

If you don't use the standard library, you'll have to write these functions yourself:

    ...
main:
lea str(%rip), %rdi
call puts
ret

_exit:
...

puts:
...

The exact implementation depends on the operating system you use.

puts() will be a bit tricky to implement; write() would be easier.

Note:

Please also don't forget the ret at the end of the main() function; (alternatively you can jmp to puts() instead of calling it...)
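To illustrate why write() is the easier building block, here is a rough C sketch of a puts-like function layered on top of it. The names str_len, write_all and fd_puts are made up for this example. It calls the write() declared in unistd.h, so in a build without the standard library you would have to replace that call with your own system call stub, and it does not retry on EINTR.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* strlen replacement, since we assume no C library string functions. */
static size_t str_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* write() may write fewer bytes than requested, so loop until done. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Writes s followed by a newline to fd; returns 0 on success, -1 on error. */
static int fd_puts(int fd, const char *s)
{
    assert(s != NULL);
    if (write_all(fd, s, str_len(s)) < 0)
        return -1;
    return write_all(fd, "\n", 1);
}
```

A real puts() also has to cooperate with stdio buffering, which is a large part of what makes it trickier than this.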

Getting command line parameters from an assembly program

With the help from @Michael, I was able to track down the problem.

I used %ebp to hold argv, as @Michael suggested (he used %eax, though). Another problem was that I needed to compare the value of (%ebp) with 0 (the NULL pointer that terminates the argv array) and end the program at that point.

Code:

    movl 8(%esp), %ebp  /* Get argv.  */

pr_arg:
cmpl $0, (%ebp)
je endit

pushl %ecx
pushl (%ebp)
pushl $output2
call printf
addl $8, %esp /* remove output2 and current argument. */
addl $4, %ebp

popl %ecx
loop pr_arg

ret
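For comparison, the termination test in that loop is the same one you would write in C when walking argv without relying on argc. This is a small sketch; count_args is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries of a NULL-terminated pointer array such as argv. */
static size_t count_args(char *const *argv)
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)   /* like: cmpl $0, (%ebp) ; je endit */
        n++;                  /* like: addl $4, %ebp              */
    return n;
}
```

Because argv[argc] is always NULL, the count returned for a real argv equals argc.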

