Linux 64 Command Line Parameters in Assembly

It looks like section 3.4 Process Initialization, and specifically figure 3.9, in the already mentioned System V AMD64 ABI describes precisely what you want to know.

Trouble getting command line arguments in x86 assembly

There are two issues with your code.

First, you use lea 8(%rsi), %rdi to retrieve the second argument. Note that rsi points to an array of pointers to command line arguments, so to retrieve the pointer to the second argument, you have to dereference 8(%rsi) using something like mov 8(%rsi), %rdi.
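To make the difference concrete, here is a minimal C sketch of what the two instructions compute when %rsi holds argv. The function names are made up for illustration; they are not from the original question.

```c
#include <assert.h>
#include <stddef.h>

/* Rough equivalent of `lea 8(%rsi), %rdi`: computes the address of the
   second slot of the pointer array without reading memory. */
char **slot_address(char **argv) {
    assert(argv != NULL);
    return argv + 1;   /* argv + 8 bytes on x86-64 */
}

/* Rough equivalent of `mov 8(%rsi), %rdi`: loads the pointer stored in
   that slot, i.e. the address of the second argument string. */
char *slot_value(char **argv) {
    assert(argv != NULL);
    return argv[1];
}
```

Only slot_value() yields something that can be passed to a function expecting a char * string; slot_address() yields a char ** that still needs one more dereference.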

Second, you forgot the dollar sign in front of 0 in cmp 0, %rax. This causes an absolute addressing mode for address 0 to be selected, effectively dereferencing a null pointer. To fix this, add the missing dollar sign (cmp $0, %rax) to select an immediate operand.

When I fix both issues, your code as far as you posted it seems to work just fine.

Linux Assembly x86_64 create a file using command line parameters

You attempted to use the 32 bit mechanism. If you have a 32 bit tutorial, you can of course create 32 bit programs and those will work as-is in compatibility mode.
If you want to write 64 bit code however, you will need to use the 64 bit conventions and interfaces. Here, that means the syscall instruction with the appropriate registers:

  global _start

_start:
  mov eax,85       ; syscall number for creat()
  mov rdi,[rsp+16] ; argv[1], the file name
  mov esi,00644Q   ; rw,r,r
  syscall          ; call the kernel 
  xor edi, edi     ; exit code 0
  mov eax, 60      ; syscall number for exit()
  syscall

See also the x86-64 System V ABI article on Wikipedia or the ABI PDF for more details.

x86_64 Assembly Command Line Arguments

You are close.

argv is a pointer to the array, not the array itself. In C it is written char **argv, so you have to do two levels of dereferencing to get to the strings.

 top of stack
               <- rsp after alignment
return address <- rsp at beginning (aligned rsp + 8)
  [something]  <- rsp + 16
    argc       <- rsp + 24
   argv        <- rsp + 32
   envp        <- rsp + 40  (in most Unix-compatible systems, the environment
    ...            ...       string array, char **envp)
bottom of stack
 ...
somewhere else:
   argv[0]     <- argv+0:   address of first parameter (program path or name)
   argv[1]     <- argv+8:   address of second parameter (first command line argument)
   argv[2]     <- argv+16:  address of third parameter (second command line argument)
    ...
   argv[argc]  <-  argv+argc*8:  NULL
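The two levels of dereferencing can be sketched in C as follows. This is only an illustration under the layout drawn above; count_args and first_char are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Walks the pointer array until the terminating NULL slot and returns
   the number of argument strings found (this should equal argc). */
size_t count_args(char **argv) {
    assert(argv != NULL);
    size_t n = 0;
    while (argv[n] != NULL)   /* first dereference: load the pointer in slot n */
        n++;
    return n;
}

/* Returns the first character of argument i: the second dereference
   follows the pointer from the array into the string itself. */
char first_char(char **argv, size_t i) {
    assert(argv != NULL && argv[i] != NULL);
    return argv[i][0];
}
```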

Getting command line parameters from an assembly program

With the help from @Michael, I was able to track down the problem.

I used %ebp to hold argv, as @Michael suggested (he used %eax, though). The other problem was that I needed to compare the value at (%ebp) with 0 (the NULL pointer that terminates argv) and end the program at that point.

Code:

    movl 8(%esp), %ebp  /* Get argv.  */

pr_arg:
    cmpl $0, (%ebp)
    je endit

    pushl %ecx
    pushl (%ebp)
    pushl $output2
    call printf
    addl $8, %esp       /* remove output2 and current argument.  */
    addl $4, %ebp

    popl %ecx
    loop pr_arg

    ret

x86 Linux assembler get program parameters from _start

On Linux, the familiar argc and argv variables from C are always passed on the stack by the kernel, available even to assembly programs that are completely standalone and don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).

At the ELF entry point (a.k.a. _start) of an x86 Linux executable:

ESP points to argc
ESP + 4 points to argv[0], the start of the array (i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4])
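As a rough model of that layout, the C sketch below treats the entry-point stack as an array of pointer-sized words and recovers argc, argv and envp from it. The struct and function names are assumptions made for this example, and the position of envp (right after argv's NULL terminator) comes from the ABI rather than from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the values found at the entry-point stack pointer. */
struct startup_info {
    long argc;
    char **argv;
    char **envp;
};

/* sp stands for the stack pointer at _start: sp[0] holds argc, the argv
   pointers start at sp[1], and envp follows argv's NULL terminator. */
struct startup_info parse_initial_stack(char **sp) {
    struct startup_info info;
    assert(sp != NULL);
    info.argc = (long)(intptr_t)sp[0];      /* like mov eax, [esp] */
    info.argv = sp + 1;                     /* like lea eax, [esp+4] */
    info.envp = info.argv + info.argc + 1;  /* skip argv[argc] == NULL */
    return info;
}
```

Note that argv is obtained by address arithmetic alone, whereas argc requires a load; that is the lea versus mov distinction.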

How a Minimal Assembly Program Obtains argc and argv

I'll show how to read argc and argv[0] in GDB.

cmdline-x86.S

#include <sys/syscall.h>

    .global _start
_start:
    /* Cause a breakpoint trap */
    int $0x03

    /* exit_group(0) */
    mov $SYS_exit_group, %eax
    mov $0, %ebx
    int $0x80

cmdline-x86.gdb

set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n",  ((char**)($esp + 4))[0]
quit

Sample Session

$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>  
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8   mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86

Explanation

I placed a software breakpoint (int $0x03) to cause the program to trap back into the debugger right after the ELF entry point (_start).
I then used printf in the GDB script to print
1. argc with the expression *(int*)$esp
2. argv[0] with the expression ((char**)($esp + 4))[0]

x86-64 version

The differences are minimal:

Replace ESP with RSP
Change address size from 4 to 8
Conform to different Linux syscall calling conventions when we call exit_group(0) to properly terminate the process
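The effect of the word-size change on the offsets can be sketched with a small helper. argv_slot_offset is a hypothetical name used only here.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset from the entry-point stack pointer to the slot holding
   argv[i]: one word for argc, then i pointer-sized slots.
   word_size is 4 for i386 and 8 for x86-64. */
size_t argv_slot_offset(size_t word_size, size_t i) {
    assert(word_size == 4 || word_size == 8);
    return word_size * (1 + i);
}
```

For example, argv[0] sits at ESP + 4 on i386 and at RSP + 8 on x86-64, and argv[1] on x86-64 sits at RSP + 16.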

cmdline.S

#include <sys/syscall.h>

    .global _start
_start:
    /* Cause a breakpoint trap */
    int $0x03

    /* exit_group(0) */
    mov $SYS_exit_group, %rax
    mov $0, %rdi
    syscall

cmdline.gdb

set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n",  ((char**)($rsp + 8))[0]
quit

How Regular C Programs Obtain argc and argv

You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them as it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:

$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
   0x0000000000401580 <+0>: xor    %ebp,%ebp
   0x0000000000401582 <+2>: mov    %rdx,%r9
   0x0000000000401585 <+5>: pop    %rsi
   0x0000000000401586 <+6>: mov    %rsp,%rdx
   0x0000000000401589 <+9>: and    $0xfffffffffffffff0,%rsp
   0x000000000040158d <+13>:    push   %rax
   0x000000000040158e <+14>:    push   %rsp
   0x000000000040158f <+15>:    mov    $0x404040,%r8
   0x0000000000401596 <+22>:    mov    $0x403fb0,%rcx
   0x000000000040159d <+29>:    mov    $0x4014c0,%rdi
   0x00000000004015a4 <+36>:    callq  0x401310 <__libc_start_main@plt>
   0x00000000004015a9 <+41>:    hlt    
   0x00000000004015aa <+42>:    xchg   %ax,%ax
   0x00000000004015ac <+44>:    nopl   0x0(%rax)

The first three arguments to __libc_start_main() are:

RDI: pointer to main()
RSI: argc, you can see how it was the first thing popped off the stack
RDX: argv, the value of RSP right after argc was popped. (ubp_av in the GLIBC source)

The x86 _start is very similar:

Dump of assembler code for function _start:
   0x0804842c <+0>: xor    %ebp,%ebp
   0x0804842e <+2>: pop    %esi
   0x0804842f <+3>: mov    %esp,%ecx
   0x08048431 <+5>: and    $0xfffffff0,%esp
   0x08048434 <+8>: push   %eax
   0x08048435 <+9>: push   %esp
   0x08048436 <+10>:    push   %edx
   0x08048437 <+11>:    push   $0x80485e0
   0x0804843c <+16>:    push   $0x8048570
   0x08048441 <+21>:    push   %ecx
   0x08048442 <+22>:    push   %esi
   0x08048443 <+23>:    push   $0x80483d0
   0x08048448 <+28>:    call   0x80483b0 <__libc_start_main@plt>
   0x0804844d <+33>:    hlt    
   0x0804844e <+34>:    xchg   %ax,%ax
End of assembler dump.

Process command line in Linux 64 bit

You are loading the correct address into %rcx.

int 0x80 then invokes the 32-bit syscall interface. That truncates the address to 32 bits, which makes it incorrect. (If you use a debugger and set a breakpoint just after the first int 0x80, you will see that it returns with -14 in %eax, which is -EFAULT.)

The second syscall, exit, works OK because the truncation to 32 bits doesn't do any harm in that case.
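The truncation can be modelled in a few lines of C. This is only an arithmetic sketch, not kernel code, and the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Models what the 32-bit int 0x80 path sees of a 64-bit register:
   only the low 32 bits. */
uint32_t seen_by_int80(uint64_t reg) {
    return (uint32_t)(reg & 0xffffffffu);
}

/* Nonzero when the value survives the truncation unchanged. */
int survives_int80(uint64_t reg) {
    return (uint64_t)seen_by_int80(reg) == reg;
}
```

A made-up stack address such as 0x7ffd12345678 becomes 0x12345678 and no longer points at the string, while small values such as a syscall number or an exit status pass through unchanged.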

If you want to pass a 64-bit address to a system call, you will have to use the 64-bit syscall interface:

use syscall, not int 0x80;
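different registers are used: the system call number goes in %rax and the arguments in %rdi, %rsi, %rdx, %r10, %r8 and %r9;
the system call numbers are different as well: for example, write is 1 and exit is 60 on x86-64.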

Here is a working version of your code:

.section .text

.globl _start
_start:
 movq  %rsp, %rbp

 movq $1, %rax
 movq $1, %rdi
 movq 8(%rbp), %rsi       # program name address ?
 movq $5, %rdx
 syscall

 movq $60, %rax
 movq $0, %rdi
 syscall

Linux getting terminal arguments from _start not working with inline assembly in C

This looks correct for a minimal _start, but you put it inside a non-naked C function. Compiler-generated code, e.g. push %rbp / mov %rsp, %rbp, will run before execution reaches the asm statement. To see this, look at gcc -S output, or single-step in a debugger such as GDB.

Put your asm statement at global scope (like in How Get arguments value using inline assembly in C without Glibc?) or use __attribute__((naked)) on your _start(). Note that _start isn't really a function.

As a rule, never use GNU C Basic asm statements in a non-naked function. That said, you might get this to work with -O3, because that would imply -fomit-frame-pointer, so the stack pointer would still be pointing at argc and argv when your code ran.

A dynamically linked executable on GNU/Linux will run libc startup code from dynamic linker hooks, so you actually can use printf from _start without manually calling those init functions. That would not be the case if this were statically linked.

However, your main tries to return to your _start, but you don't show _start calling exit. You should call exit instead of making an _exit system call directly, to make sure stdio buffers get flushed even if output is redirected to a file (which makes stdout fully buffered). Falling off the end of _start would be bad: it would crash or get into an infinite loop, depending on what execution falls into.

How to get the first command-line argument and put it into static buffer in memory?

You don't have to copy the contents of the string itself into a data buffer. Save the value of 16(%rsp) in a QWORD sized variable and use it with syscalls all you want. In C terms, that would be the difference between

char lcomm[4];
strcpy(lcomm, argv[1]);
open(lcomm, ...);

and

char *plcomm;
plcomm = argv[1];
open(plcomm, ...);

The second one works just as well.

Also, your buffer has a fixed size of 4 bytes. If the command line argument is longer than that, your code will overflow the buffer, and potentially crash.

That said, if you're serious about learning assembly, you should eventually figure out how to write a strcpy-like loop. :)
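For reference, such a loop might look like the C sketch below before translating it to assembly. It is a bounded variant, so it also avoids the overflow mentioned above; copy_bounded is a hypothetical name, not a standard function.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst one byte at a time, like strcpy, but refuses to
   write past dst_size bytes. Returns the string length on success, or
   -1 if src (including its terminator) does not fit; in that case dst
   holds an unterminated prefix and should not be used as a string. */
long copy_bounded(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (;;) {
        if (i >= dst_size)
            return -1;          /* next byte would overflow the buffer */
        dst[i] = src[i];
        if (src[i] == '\0')
            return (long)i;     /* terminator copied, done */
        i++;
    }
}
```

With a 4-byte buffer, "abc" fits (three characters plus the terminator) and "abcd" is rejected.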

EDIT with some assembly code. Last time I checked, the file name goes into the syscall as RDI, not RSI:

mov   16(%rsp), %rdi # File name
mov   $0, %rsi        # Flags: O_RDONLY, but substitute your own
mov   $0, %rdx        # Mode: doesn't matter if the file exists
mov   $2, %rax        # Syscall number for open
syscall
# %rax is the file handle now

For future reference, the x86_64 syscall convention is:

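Parameters go into %rdi, %rsi, %rdx, %r10, %r8, and %r9 in that order (note %r10, not %rcx as in the function calling convention, because the syscall instruction overwrites %rcx)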
Syscall # goes into %rax
Then perform a syscall instruction
The return value is in %rax
%rcx and %r11 are clobbered, the rest of the registers are preserved
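The same convention can be exercised from C through the generic syscall() wrapper that glibc provides on Linux. The sketch below assumes a Linux system with glibc; raw_write is a name chosen for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issues write(fd, buf, len) through the generic syscall() wrapper.
   On x86-64 the wrapper places SYS_write in %rax and the three
   arguments in %rdi, %rsi and %rdx before executing syscall. */
long raw_write(int fd, const void *buf, size_t len) {
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

On success the result is the number of bytes written, matching the value the kernel leaves in %rax. On failure the wrapper returns -1 and sets errno, whereas the raw instruction returns a negative error code in %rax.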

The reference of syscalls is here.
