x86 Linux assembler get program parameters from _start
On Linux, the familiar argc and argv variables from C are always passed on the stack by the kernel, available even to assembly programs that are completely standalone and don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).
At the ELF entry point (a.k.a. _start) of an x86 Linux executable:
- ESP points to argc
- ESP + 4 points to argv[0], the start of the array (i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4])
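The layout above can be sketched in C. This is only an illustration under stated assumptions: parse_initial_stack and struct startup_args are hypothetical names, and the envp line relies on the ABI placing the environment pointers right after the NULL that terminates argv, which the list above does not spell out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for what the entry point can recover. */
struct startup_args {
    long   argc;
    char **argv;
    char **envp;
};

/*
 * sp is the stack pointer value at the ELF entry point, viewed as an
 * array of pointer-sized words (4 bytes each on i386, 8 on x86-64).
 */
struct startup_args parse_initial_stack(char **sp)
{
    struct startup_args a;

    a.argc = (long)(intptr_t)sp[0];   /* word 0 holds argc itself          */
    a.argv = sp + 1;                  /* ADDRESS of word 1: the lea case   */
    a.envp = a.argv + a.argc + 1;     /* skip argv[0..argc-1] and its NULL */
    return a;
}
```

Note that argv is the address of the second word, not its contents. Loading the contents (the mov form) would yield argv[0], a char *, instead of the char ** that main expects.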
How a Minimal Assembly Program Obtains argc and argv
I'll show how to read argc and argv[0] in GDB.
cmdline-x86.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %eax
mov $0, %ebx
int $0x80
cmdline-x86.gdb
set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n", ((char**)($esp + 4))[0]
quit
Sample Session
$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8 mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86
Explanation
- I placed a software breakpoint (int $0x03) to cause the program to trap back into the debugger right after the ELF entry point (_start).
- I then used printf in the GDB script to print:
  - argc with the expression *(int*)$esp
  - argv[0] with the expression ((char**)($esp + 4))[0]
x86-64 version
The differences are minimal:
- Replace ESP with RSP
- Change address size from 4 to 8
- Conform to different Linux syscall calling conventions when we call exit_group(0) to properly terminate the process
cmdline.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %rax
mov $0, %rdi
syscall
cmdline.gdb
set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n", ((char**)($rsp + 8))[0]
quit
How Regular C Programs Obtain argc and argv
You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them as it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:
$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x0000000000401580 <+0>: xor %ebp,%ebp
0x0000000000401582 <+2>: mov %rdx,%r9
0x0000000000401585 <+5>: pop %rsi
0x0000000000401586 <+6>: mov %rsp,%rdx
0x0000000000401589 <+9>: and $0xfffffffffffffff0,%rsp
0x000000000040158d <+13>: push %rax
0x000000000040158e <+14>: push %rsp
0x000000000040158f <+15>: mov $0x404040,%r8
0x0000000000401596 <+22>: mov $0x403fb0,%rcx
0x000000000040159d <+29>: mov $0x4014c0,%rdi
0x00000000004015a4 <+36>: callq 0x401310 <__libc_start_main@plt>
0x00000000004015a9 <+41>: hlt
0x00000000004015aa <+42>: xchg %ax,%ax
0x00000000004015ac <+44>: nopl 0x0(%rax)
The first three arguments to __libc_start_main() are:
- RDI: pointer to main()
- RSI: argc, you can see how it was the first thing popped off the stack
- RDX: argv, the value of RSP right after argc was popped (ubp_av in the GLIBC source)
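As a rough C model of that hand-off (a sketch only: toy_start, toy_libc_start_main and demo_main are hypothetical names, and the real __libc_start_main takes further arguments and calls exit() with main's return value rather than returning):

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*main_fn)(int, char **, char **);

/* Hypothetical stand-in for __libc_start_main: only the first three
 * arguments are modelled, and it returns instead of calling exit(). */
int toy_libc_start_main(main_fn main_ptr, int argc, char **argv)
{
    char **envp = argv + argc + 1;      /* environment follows argv's NULL */
    return main_ptr(argc, argv, envp);
}

/* Mirrors the first instructions of the disassembled x86-64 _start. */
int toy_start(char **sp, main_fn main_ptr)
{
    int argc = (int)(intptr_t)*sp++;    /* pop %rsi      : argc            */
    char **argv = sp;                   /* mov %rsp,%rdx : argv = new RSP  */
    return toy_libc_start_main(main_ptr, argc, argv);
}

/* Hypothetical main used for demonstration: returns argc when the
 * vectors are laid out as expected, -1 otherwise. */
int demo_main(int argc, char **argv, char **envp)
{
    if (argv[argc] != NULL || envp != argv + argc + 1)
        return -1;
    return argc;
}
```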
The x86 _start is very similar:
Dump of assembler code for function _start:
0x0804842c <+0>: xor %ebp,%ebp
0x0804842e <+2>: pop %esi
0x0804842f <+3>: mov %esp,%ecx
0x08048431 <+5>: and $0xfffffff0,%esp
0x08048434 <+8>: push %eax
0x08048435 <+9>: push %esp
0x08048436 <+10>: push %edx
0x08048437 <+11>: push $0x80485e0
0x0804843c <+16>: push $0x8048570
0x08048441 <+21>: push %ecx
0x08048442 <+22>: push %esi
0x08048443 <+23>: push $0x80483d0
0x08048448 <+28>: call 0x80483b0 <__libc_start_main@plt>
0x0804844d <+33>: hlt
0x0804844e <+34>: xchg %ax,%ax
End of assembler dump.
Linux getting terminal arguments from _start not working with inline assembly in C
This looks correct for a minimal _start, but you put it inside a non-naked C function. Compiler-generated code will run, e.g. push %rbp / mov %rsp, %rbp, before execution enters the asm statement. To see this, look at gcc -S output, or single-step in a debugger such as GDB.
Put your asm statement at global scope (like in How Get arguments value using inline assembly in C without Glibc?) or use __attribute__((naked)) on your _start(). Note that _start isn't really a function: nothing calls it, so there is no return address on the stack and it must never return.
As a rule, never use GNU C Basic asm statements in a non-naked function. You might get this to work with -O3, because that implies -fomit-frame-pointer, so the stack pointer would still be pointing at argc and argv when your code ran, but that is not something to rely on.
A dynamically linked executable on GNU/Linux will run libc startup code from dynamic linker hooks, so you actually can use printf from _start without manually calling those init functions. That would not be the case if it were statically linked.
However, your main tries to return to your _start, but you don't show _start calling exit. You should call exit instead of making an _exit system call directly, to make sure stdio buffers get flushed even if output is redirected to a file (making stdout fully buffered). Falling off the end of _start would be bad, crashing or getting into an infinite loop depending on what execution falls into.
Linux 64 command line parameters in Assembly
It looks like section 3.4 Process Initialization, and specifically figure 3.9, in the already mentioned System V AMD64 ABI describes precisely what you want to know.
How to get the first command-line argument and put it into static buffer in memory?
You don't have to copy the contents of the string itself into a data buffer. Save the value of 16(%rsp) in a QWORD sized variable and use it with syscalls all you want. In C terms, that would be the difference between
char lcomm[4];
strcpy(lcomm, argv[1]);
open(lcomm, ...);
and
char *plcomm;
plcomm = argv[1];
open(plcomm, ...);
The second one works just as well.
Also, your buffer has a fixed size of 4 bytes. If the command line argument is longer than that, your code will overflow the buffer, and potentially crash.
That said, if you're serious about learning assembly, you should eventually figure out how to write a strcpy-like loop. :)
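For reference, a bounds-checked version of such a loop might look like this in C. It is a sketch under assumptions: copy_arg is a hypothetical name, and reporting truncation with -1 is just one possible convention.

```c
#include <stddef.h>

/*
 * Copy src into dst, which has room for cap bytes including the
 * terminating NUL. Returns the number of characters copied, or -1 if
 * src did not fit (dst then holds a NUL-terminated truncated prefix).
 */
long copy_arg(char *dst, size_t cap, const char *src)
{
    size_t i;

    if (cap == 0)
        return -1;                      /* no room even for the NUL */

    /* Stop one byte early so the terminator always fits. */
    for (i = 0; i + 1 < cap && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    return src[i] == '\0' ? (long)i : -1;
}
```

With a 4-byte buffer like lcomm, an argument of up to three characters fits. Anything longer is truncated and reported instead of overflowing the buffer.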
EDIT with some assembly code. Last time I checked, the file name goes into the syscall as RDI, not RSI:
mov 16(%rsp), %rdi # File name
mov $0, %rsi # Flags: O_RDONLY, but substitute your own
mov $0, %rdx # Mode: doesn't matter if the file exists
mov $2, %rax # Syscall number for open
syscall
# %rax is the file handle now
For future reference, the x86_64 syscall convention is:
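- Parameters go into %rdi, %rsi, %rdx, %r10, %r8, and %r9 in that order (the fourth one is %r10, not %rcx as in the function calling convention, because the syscall instruction overwrites %rcx)
- Syscall # goes into %rax
- Then perform a syscall instruction
- The return value is in %rax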
- %rcx and %r11 are clobbered, the rest of the registers are preserved
The reference of syscalls is here.
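The same convention can be expressed from C with GNU inline assembly. This is a minimal sketch for up to three arguments: raw_syscall3 is a hypothetical helper name, and the fallback to libc's syscall() on other targets is only there so the example builds elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() prototype in the fallback */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Issue a system call with up to three arguments.
 * On x86-64 Linux the result is the raw kernel return value
 * (a negative errno value on failure).
 */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* return value in %rax      */
                      : "a"(nr),                  /* syscall number in %rax    */
                        "D"(a1), "S"(a2), "d"(a3) /* args: %rdi, %rsi, %rdx    */
                      : "rcx", "r11", "memory");  /* clobbered by the syscall  */
    return ret;
#else
    /* Not x86-64 Linux: defer to libc (returns -1 and sets errno on error). */
    return syscall(nr, a1, a2, a3);
#endif
}
```

For example, raw_syscall3(SYS_getpid, 0, 0, 0) should return the same value as getpid().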
What does call _start in x86?
First of all, even when using the same CPU (e.g. an x86-64 CPU), you need different crt0.S files for different operating systems.
And you need a different crt0.S for programs that are not started by an operating system (such as an operating system itself).
What is in stack before called _start which should be the only entry for linker?
This depends on the operating system. Linux would copy argc, the arguments (argv[n]) and the environment (environ[n]) somewhere on the stack.
The file from your question is intended for an operating system that places argc at rsp+0, followed by the arguments and the environment.
However, I remember a (32-bit) OS that put argc at esp+0x80 instead of esp+0, so this is also possible...
As far as I know, Windows does not put anything on the stack (at least not officially). The corresponding crt0.S code must call a function in a DLL file to get the command line arguments.
In the case of device firmware that starts immediately after the CPU (microcontroller) starts, the crt0.S code must even set the stack pointer to a valid value first. The memory (including the stack) is often completely uninitialized in this case.
Needless to say, the stack does not contain any useful values in this case.
This should be designed to provide initialization of variables from .data ...
In the case of software started by an operating system, the operating system will initialize the .data section. This means that the crt0.S code does not have to do that.
In the case of a microcontroller program (device firmware), the crt0.S code has to do this.
Because your file is obviously intended for an operating system, it does not initialize the .data section.
Finally, how to compile this assembly, without stdlib ...
If you want to use the crt0.S file from your question, you'll definitely require the _exit() function.
And if you want to use the function puts() in your code, you'll also need this function.
If you don't use the standard library, you'll have to write these functions yourself:
...
main:
lea str(%rip), %rdi
call puts
ret
_exit:
...
puts:
...
The exact implementation depends on the operating system you use.
puts() will be a bit tricky to implement; write() would be easier.
Note:
Please also don't forget the ret at the end of the main() function (alternatively you can jmp to puts() instead of calling it...)
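To illustrate why write() makes this easier, here is a hedged C sketch of a puts-like routine built on it. The names fd_puts, write_all and str_len are hypothetical. The sketch also calls the libc write() wrapper as a stand-in: a truly library-free program would have to supply write() itself as a raw system call, which is OS-specific.

```c
#include <stddef.h>
#include <unistd.h>

/* Length of a NUL-terminated string, without relying on libc strlen(). */
static size_t str_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Write all len bytes, retrying after short writes.
 * Returns 0 on success, -1 on error (EINTR is not retried here). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* puts-like helper: writes s followed by a newline to fd. */
int fd_puts(int fd, const char *s)
{
    if (write_all(fd, s, str_len(s)) < 0)
        return -1;
    return write_all(fd, "\n", 1);
}
```

Calling fd_puts(1, "hello") would then behave roughly like puts("hello"), minus stdio buffering.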
Getting command line parameters from an assembly program
With the help from @Michael, I was able to track down the problem.
I now use %ebp to hold argv, as @Michael suggested (he used %eax, though). Another problem was that I needed to compare the value at (%ebp) with 0 (the NULL pointer that terminates argv) and end the program at that point.
Code:
movl 8(%esp), %ebp /* Get argv. */
pr_arg:
cmpl $0, (%ebp)
je endit
pushl %ecx
pushl (%ebp)
pushl $output2
call printf
addl $8, %esp /* remove output2 and current argument. */
addl $4, %ebp
popl %ecx
loop pr_arg
ret
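In C terms, the loop above walks the argv array until it reaches the terminating NULL pointer. A rough equivalent, assuming a plain "%s\n" format (the original output2 format string is not shown) and using the hypothetical name print_args:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Print every entry of a NULL-terminated argv-style array, one per line,
 * and return how many entries were printed.
 */
size_t print_args(char **argv)
{
    size_t n = 0;

    while (argv[n] != NULL) {       /* cmpl $0, (%ebp) / je endit */
        printf("%s\n", argv[n]);    /* pushl (%ebp) ... call printf */
        n++;                        /* addl $4, %ebp */
    }
    return n;
}
```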