What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64
Further reading for any of the topics here: The Definitive Guide to Linux System Calls
I verified these using GNU Assembler (gas) on Linux.
Kernel Interface
x86-32 aka i386 Linux System Call convention:
On x86-32, parameters for a Linux system call are passed in registers. %eax holds the syscall number. %ebx, %ecx, %edx, %esi, %edi, %ebp are used for passing up to 6 parameters to system calls.
The return value is in %eax. All other registers (including EFLAGS) are preserved across the int $0x80.
I took the following snippet from the Linux Assembly Tutorial, but I'm doubtful about it. If anyone can show an example, that would be great.
If there are more than six arguments, %ebx must contain the memory location where the list of arguments is stored - but don't worry about this because it's unlikely that you'll use a syscall with more than six arguments.
For an example and a little more reading, refer to http://www.int80h.org/bsdasm/#alternate-calling-convention. Another example of a Hello World for i386 Linux using int 0x80: Hello, world in assembly language with Linux system calls?
There is a faster way to make 32-bit system calls: using sysenter. The kernel maps a page of memory into every process (the vDSO), with the user-space side of the sysenter dance, which has to cooperate with the kernel for it to be able to find the return address. Arg to register mapping is the same as for int $0x80. You should normally call into the vDSO instead of using sysenter directly. (See The Definitive Guide to Linux System Calls for info on linking and calling into the vDSO, and for more info on sysenter, and everything else to do with system calls.)
x86-32 [Free|Open|Net|DragonFly]BSD UNIX System Call convention:
Parameters are passed on the stack. Push the parameters (last parameter pushed first) onto the stack. Then push an additional 32 bits of dummy data (it's not actually dummy data; refer to the following link for more info) and then execute the system call instruction int $0x80.
http://www.int80h.org/bsdasm/#default-calling-convention
x86-64 Linux System Call convention:
(Note: x86-64 Mac OS X is similar but different from Linux. TODO: check what *BSD does)
Refer to section: "A.2 AMD64 Linux Kernel Conventions" of System V Application Binary Interface AMD64 Architecture Processor Supplement. The latest versions of the i386 and x86-64 System V psABIs can be found linked from this page in the ABI maintainer's repo. (See also the x86 tag wiki for up-to-date ABI links and lots of other good stuff about x86 asm.)
Here is the snippet from this section:
- User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
- A system-call is done via the syscall instruction. This clobbers %rcx and %r11 as well as the %rax return value, but other registers are preserved.
- The number of the syscall has to be passed in register %rax.
- System-calls are limited to six arguments, no argument is passed directly on the stack.
- Returning from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error, it is -errno.
- Only values of class INTEGER or class MEMORY are passed to the kernel.
Remember this is from the Linux-specific appendix to the ABI, and even for Linux it's informative not normative. (But it is in fact accurate.)
The 32-bit int $0x80 ABI is usable in 64-bit code (but strongly discouraged). See: What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? It still truncates its inputs to 32 bits, so it's unsuitable for pointers, and it zeros r8-r11.
User Interface: function calling
x86-32 Function Calling convention:
In x86-32, parameters are passed on the stack. The last parameter is pushed first, and so on until all parameters have been pushed, and then the call instruction is executed. This is used for calling C library (libc) functions on Linux from assembly.
Modern versions of the i386 System V ABI (used on Linux) require 16-byte alignment of %esp before a call, like the x86-64 System V ABI has always required. Callees are allowed to assume that and use SSE 16-byte loads/stores that fault on unaligned. But historically, Linux only required 4-byte stack alignment, so it took extra work to reserve naturally-aligned space even for an 8-byte double or something.
Some other modern 32-bit systems still don't require more than 4 byte stack alignment.
x86-64 System V user-space Function Calling convention:
x86-64 System V passes args in registers, which is more efficient than i386 System V's stack args convention. It avoids the latency and extra instructions of storing args to memory (cache) and then loading them back again in the callee. This works well because there are more registers available, and is better for modern high-performance CPUs where latency and out-of-order execution matter. (The i386 ABI is very old).
In this new mechanism: First the parameters are divided into classes. The class of each parameter determines the manner in which it is passed to the called function.
For complete information refer to : "3.2 Function Calling Sequence" of System V Application Binary Interface AMD64 Architecture Processor Supplement which reads, in part:
Once arguments are classified, the registers get assigned (in left-to-right order) for passing as follows:
- If the class is MEMORY, pass the argument on the stack.
- If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used
So %rdi, %rsi, %rdx, %rcx, %r8 and %r9 are the registers, in order, used to pass integer/pointer (i.e. INTEGER class) parameters to any libc function from assembly. %rdi is used for the first INTEGER parameter, %rsi for the 2nd, %rdx for the 3rd, and so on. Then the call instruction should be executed. The stack (%rsp) must be 16B-aligned when call executes.
If there are more than 6 INTEGER parameters, the 7th INTEGER parameter and later are passed on the stack. (Caller pops, same as x86-32.)
The first 8 floating point args are passed in %xmm0-7, later on the stack. There are no call-preserved vector registers. (A function with a mix of FP and integer arguments can have more than 8 total register arguments.)
Variadic functions (like printf) always need %al = the number of FP register args.
There are rules for when to pack structs into registers (rdx:rax on return) vs. in memory. See the ABI for details, and check compiler output to make sure your code agrees with compilers about how something should be passed/returned.
Note that the Windows x64 function calling convention has multiple significant differences from x86-64 System V, like shadow space that must be reserved by the caller (instead of a red-zone), and call-preserved xmm6-xmm15. And very different rules for which arg goes in which register.
Using ptrace to write a program supervisor in userspace
This looks like a good place to start.
http://www.linuxjournal.com/article/6100
What is the range for PTRACE_TRACEME?
See How Debuggers Work Part 1 for a better explanation, but in summary, no, it will not trace the do_something() function until the tracee receives a signal.
In the description of the ptrace call from that same man ptrace you quoted:
A process can initiate a trace by calling fork(2) and having the resulting child do a PTRACE_TRACEME, followed (typically) by an execve(2). Alternatively, one process may commence tracing another process using PTRACE_ATTACH or PTRACE_SEIZE.
While being traced, the tracee will stop each time a signal is delivered, even if the signal is being ignored. (An exception is SIGKILL, which has its usual effect.) The tracer will be notified at its next call to waitpid(2) (or one of the related "wait" system calls); that call will return a status value containing information that indicates the cause of the stop in the tracee.
When the tracee calls exec, it receives a signal (SIGTRAP); that is why it stops.
To illustrate, here is a tracer program, mainer.c, with no bells or whistles:
//mainer.c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <unistd.h>
int main(int argc, char ** argv)
{
    pid_t child_pid;
    char * programname;
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s program\n", argv[0]);
        return 1;
    }
    programname = argv[1];
    child_pid = fork();
    if (child_pid == 0)
    {
        /* Child: ask to be traced, then exec the target program. */
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        execl(programname, programname, NULL);
    }
    else if (child_pid > 0)
    {
        /* Parent: the first wait returns when exec stops the tracee. */
        int status;
        wait(&status);
        while (WIFSTOPPED(status))
        {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child_pid, 0, &regs);
            /* regs.eip exists only in 32-bit builds; on x86-64 the field is regs.rip. */
            unsigned instr = ptrace(PTRACE_PEEKTEXT, child_pid, regs.eip, 0);
            printf("EIP = 0x%08x, instr = 0x%08x\n", (unsigned)regs.eip, instr);
            ptrace(PTRACE_SINGLESTEP, child_pid, 0, 0);
            wait(&status);
        }
    }
    return 0;
}
When the tracee hits exec, it will receive a signal and pass control to the parent, who is waiting.
Here is the assembly program to trace; there is just way too much going on when tracing a C program to extract anything useful:
; hello.asm
section .text
global _start
_start:
mov edx,len1
mov ecx,hello1
mov ebx,1
mov eax,4
int 0x80
mov edx,len2
mov ecx,hello2
mov ebx,1
mov eax,4
int 0x80
mov eax,1
int 0x80
section .data
hello1 db "Hello",0xA
len1 equ $ - hello1
hello2 db "World",0xA
len2 equ $ - hello2
Running this with ./mainer hello gives:
EIP = 0x08048080, instr = 0x00000000
EIP = 0x08048085, instr = 0x00000000
EIP = 0x0804808a, instr = 0x00000000
EIP = 0x0804808f, instr = 0x00000000
EIP = 0x08048094, instr = 0x00000000
Hello
EIP = 0x08048096, instr = 0x00000000
EIP = 0x0804809b, instr = 0x00000000
EIP = 0x080480a0, instr = 0x00000000
EIP = 0x080480a5, instr = 0x00000000
EIP = 0x080480aa, instr = 0x00000000
World
EIP = 0x080480ac, instr = 0x00000000
EIP = 0x080480b1, instr = 0x00000000
If we modify mainer.c so the child process calls do_something() before its exec, the result of the trace is exactly the same. This is just how I modified it; you can confirm for yourself that the results are the same.
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <unistd.h>
int do_something(void) //added this function
{
    printf("Doing something");
    return 0;
}
int main(int argc, char ** argv)
{
    pid_t child_pid;
    char * programname;
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s program\n", argv[0]);
        return 1;
    }
    programname = argv[1];
    child_pid = fork();
    if (child_pid == 0)
    {
        ptrace(PTRACE_TRACEME, 0, 0, 0);
        do_something(); //added this function call
        execl(programname, programname, NULL);
    }
    else if (child_pid > 0)
    {
        int status;
        wait(&status);
        while (WIFSTOPPED(status))
        {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child_pid, 0, &regs);
            /* regs.eip exists only in 32-bit builds; on x86-64 the field is regs.rip. */
            unsigned instr = ptrace(PTRACE_PEEKTEXT, child_pid, regs.eip, 0);
            printf("EIP = 0x%08x, instr = 0x%08x\n", (unsigned)regs.eip, instr);
            ptrace(PTRACE_SINGLESTEP, child_pid, 0, 0);
            wait(&status);
        }
    }
    return 0;
}
So the tracee won't stop until it receives a signal, which is what happens when it calls exec. Calling functions doesn't generate a signal for the tracee. There are other ways to send a signal to the tracee and begin tracing, although they are not as tidy as exec and wait.