Linux 64 command line parameters in Assembly
It looks like section 3.4 Process Initialization, and specifically figure 3.9, in the already mentioned System V AMD64 ABI describes precisely what you want to know.
Trouble getting command line arguments in x86 assembly
There are two issues with your code.
First, you use lea 8(%rsi), %rdi to retrieve the second argument. Note that rsi points to an array of pointers to the command line arguments, so to retrieve the pointer to the second argument you have to dereference 8(%rsi), using something like mov 8(%rsi), %rdi.
Second, you forgot the dollar sign in front of 0 in the cmp instruction, which should read cmp $0, %rax. Without the dollar sign, the assembler selects an absolute addressing mode for address 0, effectively dereferencing a null pointer. To fix this, add the missing dollar sign to select an immediate operand.
When I fix both issues, your code as far as you posted it seems to work just fine.
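In C terms, the first issue is the difference between taking the address of an array slot and loading the pointer stored in that slot. The following is only a sketch: the function names slot_address and slot_value are made up for illustration, and the 8-byte offset in the comments assumes 64-bit pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Like `lea 8(%rsi), %rdi`: computes the address of the argv[1] slot,
 * i.e. a pointer to a pointer. No memory is read. */
char **slot_address(char **argv)
{
    assert(argv != NULL);
    return &argv[1];
}

/* Like `mov 8(%rsi), %rdi`: loads the pointer stored in argv[1],
 * which is what a function expecting a string needs. */
char *slot_value(char **argv)
{
    assert(argv != NULL);
    return argv[1];
}
```

Dereferencing the result of slot_address once yields the same pointer that slot_value returns, which is exactly the extra memory access that lea leaves out.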
Linux Assembly x86_64 create a file using command line parameters
You attempted to use the 32 bit mechanism. If you have a 32 bit tutorial, you can of course create 32 bit programs and those will work as-is in compatibility mode.
If you want to write 64 bit code however, you will need to use the 64 bit conventions and interfaces. Here, that means the syscall instruction with the appropriate registers:
global _start
_start:
mov eax,85 ; syscall number for creat()
mov rdi,[rsp+16] ; argv[1], the file name
mov esi,00644Q ; rw,r,r
syscall ; call the kernel
xor edi, edi ; exit code 0
mov eax, 60 ; syscall number for exit()
syscall
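For comparison, here is a rough C equivalent of the creat() call made by the assembly above. It is a sketch under assumptions: the helper name create_empty_file is hypothetical, and unlike the assembly it checks the result and closes the descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) `path` with mode 0644
 * (rw-r--r--, before the umask is applied).
 * Returns 0 on success, -1 on failure. */
int create_empty_file(const char *path)
{
    assert(path != NULL);
    int fd = creat(path, 0644);   /* same call the assembly makes via syscall 85 */
    if (fd < 0)
        return -1;
    return close(fd);
}
```

A caller would pass argv[1] after checking that argc is at least 2; the assembly version skips that check.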
See also the x86-64 sysv abi on wikipedia or the abi pdf for more details.
x86_64 Assembly Command Line Arguments
You have it close.
argv is a pointer to the array, not the array itself. In C it is written char **argv, so you have to do two levels of dereferencing to get to the strings.
top of stack
                  <- rsp after alignment
return address    <- rsp at beginning (aligned rsp + 8)
[something]       <- rsp + 16
argc              <- rsp + 24
argv              <- rsp + 32
envp              <- rsp + 40 (in most Unix-compatible systems, the environment
...               ...          string array, char **envp)
bottom of stack
...
somewhere else:
argv[0]           <- argv+0: address of first parameter (program path or name)
argv[1]           <- argv+8: address of second parameter (first command line argument)
argv[2]           <- argv+16: address of third parameter (second command line argument)
...
argv[argc]        <- argv+argc*8: NULL
Getting command line parameters from an assembly program
With the help from @Michael, I was able to track down the problem.
I now use %ebp to hold argv, as @Michael suggested (he used %eax, though). The other problem was that I needed to compare the value at (%ebp) with 0 (the NULL pointer that terminates the argv array) and end the program at that point.
Code:
movl 8(%esp), %ebp /* Get argv. */
pr_arg:
cmpl $0, (%ebp)
je endit
pushl %ecx
pushl (%ebp)
pushl $output2
call printf
addl $8, %esp /* remove output2 and current argument. */
addl $4, %ebp
popl %ecx
loop pr_arg
ret
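The same loop can be written in C, which may make the termination condition easier to see. This is a sketch, not the original program: the name print_args and the "%s\n" format are assumptions, since the contents of output2 are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Walks argv until the terminating NULL pointer and prints each string.
 * Returns the number of strings printed. */
int print_args(char **argv)
{
    assert(argv != NULL);
    int count = 0;
    for (char **p = argv; *p != NULL; ++p) {  /* cmpl $0, (%ebp) ; je endit */
        printf("%s\n", *p);                   /* pushl (%ebp) ; call printf */
        ++count;                              /* ++p is addl $4, %ebp on i386 */
    }
    return count;
}
```

Because the loop stops at the NULL entry, it does not need argc at all.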
x86 Linux assembler get program parameters from _start
On Linux, the familiar argc and argv variables from C are always passed on the stack by the kernel, available even to assembly programs that are completely standalone and don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).
At the ELF entry point (a.k.a. _start) of an x86 Linux executable:
- ESP points to argc
- ESP + 4 points to argv[0], the start of the array (i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4])
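The layout described above can be modelled in C by treating the entry stack as an array of pointer-sized slots. This is a sketch for illustration only: struct entry_view and read_entry_stack are hypothetical names, and the "stack" is an ordinary array supplied by the caller rather than the real process stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the values found at process entry. */
struct entry_view {
    long   argc;
    char **argv;  /* address of the argv[0] slot, not its contents */
    char **envp;  /* first slot after argv's NULL terminator */
};

/* `sp` plays the role of ESP/RSP at _start: an array of pointer-sized slots. */
struct entry_view read_entry_stack(char **sp)
{
    assert(sp != NULL);
    struct entry_view v;
    v.argc = (long)(uintptr_t)sp[0];  /* the slot at SP holds argc */
    v.argv = sp + 1;                  /* lea: address of the next slot, no load */
    v.envp = v.argv + v.argc + 1;     /* skip argv[0..argc-1] and the NULL */
    return v;
}
```

Note that argv is computed by address arithmetic alone, which is the lea versus mov distinction made in the list above.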
How a Minimal Assembly Program Obtains argc and argv
I'll show how to read argc and argv[0] in GDB.
cmdline-x86.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %eax
mov $0, %ebx
int $0x80
cmdline-x86.gdb
set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n", ((char**)($esp + 4))[0]
quit
Sample Session
$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8 mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86
Explanation
- I placed a software breakpoint (int $0x03) to cause the program to trap back into the debugger right after the ELF entry point (_start).
- I then used printf in the GDB script to print
  - argc with the expression *(int*)$esp
  - argv[0] with the expression ((char**)($esp + 4))[0]
x86-64 version
The differences are minimal:
- Replace ESP with RSP
- Change the address size from 4 to 8 bytes
- Conform to the different Linux syscall calling convention when we call exit_group(0) to properly terminate the process
cmdline.S
#include <sys/syscall.h>
.global _start
_start:
/* Cause a breakpoint trap */
int $0x03
/* exit_group(0) */
mov $SYS_exit_group, %rax
mov $0, %rdi
syscall
cmdline.gdb
set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n", ((char**)($rsp + 8))[0]
quit
How Regular C Programs Obtain argc and argv
You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them as it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:
$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
0x0000000000401580 <+0>: xor %ebp,%ebp
0x0000000000401582 <+2>: mov %rdx,%r9
0x0000000000401585 <+5>: pop %rsi
0x0000000000401586 <+6>: mov %rsp,%rdx
0x0000000000401589 <+9>: and $0xfffffffffffffff0,%rsp
0x000000000040158d <+13>: push %rax
0x000000000040158e <+14>: push %rsp
0x000000000040158f <+15>: mov $0x404040,%r8
0x0000000000401596 <+22>: mov $0x403fb0,%rcx
0x000000000040159d <+29>: mov $0x4014c0,%rdi
0x00000000004015a4 <+36>: callq 0x401310 <__libc_start_main@plt>
0x00000000004015a9 <+41>: hlt
0x00000000004015aa <+42>: xchg %ax,%ax
0x00000000004015ac <+44>: nopl 0x0(%rax)
The first three arguments to __libc_start_main() are:
- RDI: pointer to main()
- RSI: argc, you can see how it was the first thing popped off the stack
- RDX: argv, the value of RSP right after argc was popped (ubp_av in the GLIBC source)
The x86 _start is very similar:
Dump of assembler code for function _start:
0x0804842c <+0>: xor %ebp,%ebp
0x0804842e <+2>: pop %esi
0x0804842f <+3>: mov %esp,%ecx
0x08048431 <+5>: and $0xfffffff0,%esp
0x08048434 <+8>: push %eax
0x08048435 <+9>: push %esp
0x08048436 <+10>: push %edx
0x08048437 <+11>: push $0x80485e0
0x0804843c <+16>: push $0x8048570
0x08048441 <+21>: push %ecx
0x08048442 <+22>: push %esi
0x08048443 <+23>: push $0x80483d0
0x08048448 <+28>: call 0x80483b0 <__libc_start_main@plt>
0x0804844d <+33>: hlt
0x0804844e <+34>: xchg %ax,%ax
End of assembler dump.
Process command line in Linux 64 bit
You are loading the correct address into %rcx.
int 0x80 then invokes the 32-bit syscall interface. That truncates the address to 32 bits, which makes it incorrect. (If you use a debugger and set a breakpoint just after the first int 0x80, you will see that it returns with -14 in %eax, which is -EFAULT.)
The second syscall, exit, works OK because the truncation to 32 bits doesn't do any harm in that case.
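The effect of the truncation can be illustrated with plain integer arithmetic. This is a sketch; the function names low32 and survives_int80 are made up, and the stack address in the usage note is just a typical-looking example value.

```c
#include <assert.h>
#include <stdint.h>

/* Keeps only the low 32 bits of a value, as happens to register
 * arguments passed through the 32-bit int 0x80 interface. */
uint64_t low32(uint64_t value)
{
    uint64_t truncated = (uint32_t)value;
    assert(truncated <= UINT32_MAX);  /* sanity check on the result */
    return truncated;
}

/* Returns 1 if the value is unchanged by truncation to 32 bits. */
int survives_int80(uint64_t value)
{
    return low32(value) == value;
}
```

A stack address such as 0x00007ffd12345678 becomes 0x12345678 and no longer points at the string, while a small integer such as an exit status of 0 passes through unchanged.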
If you want to pass a 64-bit address to a system call, you will have to use the 64-bit syscall interface:
- use syscall, not int 0x80;
- different registers are used: see here;
- the system call numbers are different as well: see here.
Here is a working version of your code:
.section .text
.globl _start
_start:
movq %rsp, %rbp
movq $1, %rax
movq $1, %rdi
movq 8(%rbp), %rsi # program name address ?
movq $5, %rdx
syscall
movq $60, %rax
movq $0, %rdi
syscall
Linux getting terminal arguments from _start not working with inline assembly in C
This looks correct for a minimal _start: but you put it inside a non-naked C function. Compiler-generated code will run, e.g. push %rbp / mov %rsp, %rbp, before execution enters the asm statement. To see this, look at gcc -S output, or single-step in a debugger such as GDB.
Put your asm statement at global scope (like in How Get arguments value using inline assembly in C without Glibc?) or use __attribute__((naked)) on your _start(). Note that _start isn't really a function.
As a rule, never use GNU C Basic asm statements in a non-naked function. You might get this to work with -O3, though, because that would imply -fomit-frame-pointer, so the stack pointer would still be pointing at argc and argv when your code ran.
A dynamically linked executable on GNU/Linux will run libc startup code from dynamic linker hooks, so you actually can use printf from _start without manually calling those init functions. That would not be the case if the executable were statically linked.
However, your main tries to return to your _start, but you don't show _start calling exit. You should call exit instead of making an _exit system call directly, to make sure stdio buffers get flushed even if output is redirected to a file (making stdout fully buffered). Falling off the end of _start would be bad: execution would crash or get into an infinite loop, depending on what it falls into.
How to get the first command-line argument and put it into static buffer in memory?
You don't have to copy the contents of the string itself into a data buffer. Save the value of 16(%rsp) in a QWORD-sized variable and use it with syscalls all you want. In C terms, that would be the difference between
char lcomm[4];
strcpy(lcomm, argv[1]);
open(lcomm, ...);
and
char *plcomm;
plcomm = argv[1];
open(plcomm, ...);
The second one works just as well.
Also, your buffer has a fixed size of 4 bytes. If the command line argument is longer than that, your code will overflow the buffer, and potentially crash.
That said, if you're serious about learning assembly, you should eventually figure out how to write a strcpy-like loop. :)
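As a starting point, here is what such a loop might look like in C, with a bounds check added so that a long argument cannot overflow a small buffer. It is a sketch: the name bounded_copy and its return convention are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst, writing at most dstsize bytes including the
 * terminating NUL. Returns 0 on success, or -1 if src did not fit
 * (dst then holds a truncated, still NUL-terminated prefix). */
int bounded_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    size_t i = 0;
    while (src[i] != '\0') {
        if (i + 1 >= dstsize) {  /* no room for this byte plus the NUL */
            dst[i] = '\0';
            return -1;
        }
        dst[i] = src[i];
        ++i;
    }
    dst[i] = '\0';
    return 0;
}
```

In assembly the same structure becomes a load, a compare against zero, a bounds compare, a store, and an increment per byte.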
EDIT with some assembly code. Last time I checked, the file name goes into the syscall as RDI, not RSI:
mov 16(%rsp), %rdi # File name
mov $0, %rsi # Flags: O_RDONLY, but substitute your own
mov $0, %rdx # Mode: doesn't matter if the file exists
mov $2, %rax # Syscall number for open
syscall
# %rax is the file handle now
For future reference, the x86_64 syscall convention is:
- Parameters go into %rdi, %rsi, %rdx, %r10, %r8, and %r9 in that order (the fourth is %r10, not %rcx as in the function calling convention, because the syscall instruction overwrites %rcx)
- Syscall # goes into %rax
- Then perform a syscall instruction
- The return value is in %rax
- %rcx and %r11 are clobbered, the rest of the registers are preserved
The reference of syscalls is here.
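From C, the same convention can be exercised through the syscall() wrapper that glibc provides on Linux. This is a Linux-specific sketch; the names raw_write and raw_getpid are hypothetical, and the register comments describe x86-64 only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw write(2): on x86-64 the wrapper places SYS_write in %rax and
 * fd, buf, len in %rdi, %rsi, %rdx before executing `syscall`. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return syscall(SYS_write, fd, buf, len);
}

/* Raw getpid(2): takes no arguments; the result comes back in %rax. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

One difference from hand-written assembly: on failure the wrapper returns -1 and sets errno, whereas the raw syscall instruction leaves a negative error number in %rax.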