Can ptrace tell if an x86 system call used the 64-bit or 32-bit ABI?

Interesting, I hadn't realized that strace had no obviously smarter way to correctly decode int 0x80 from 64-bit processes. (This is being worked on; see this answer for links to a proposed kernel patch that adds PTRACE_GET_SYSCALL_INFO to the ptrace API. strace 4.26 already supports it on patched kernels.)

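Update: strace now supports per-syscall detection. I don't know offhand which mainline kernel version added the feature (the ptrace(2) man page lists PTRACE_GET_SYSCALL_INFO as available since Linux 5.3); I tested on Arch Linux with kernel version 5.5 and strace version 5.5.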

e.g. this NASM source assembled into a static executable:

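mov eax, 4      ; __NR_write in the 32-bit ABI
int 0x80        ; enter the kernel through the 32-bit ABI
mov eax, 60     ; __NR_exit in the 64-bit ABI
syscall         ; enter the kernel through the 64-bit ABI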

gives this trace (built and run with nasm -felf64 foo.asm && ld -o foo foo.o && strace ./foo):

execve("./foo", ["./foo"], 0x7ffcdc233180 /* 51 vars */) = 0
strace: [ Process PID=1262249 runs in 32 bit mode. ]
write(0, NULL, 0) = 0
strace: [ Process PID=1262249 runs in 64 bit mode. ]
exit(0) = ?
+++ exited with 0 +++

strace prints a message every time a system call uses a different ABI bitness than the previous one. Note that the "runs in 32 bit mode" message is completely wrong; the process is merely using the 32-bit ABI from 64-bit mode. "Mode" has a specific technical meaning for x86-64, and this is not it.
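For a tracer written directly against the ptrace API, the per-syscall check could look roughly like the following. This is a minimal sketch, not strace's actual code: it assumes Linux 5.3 or newer and a glibc that exposes struct __ptrace_syscall_info in <sys/ptrace.h> (2.30 or newer, as far as I know), and the function names syscall_abi_bits and tracee_syscall_abi_bits are made up for illustration. The arch field holds an AUDIT_ARCH_* value describing the ABI of the current system call, not the mode of the process.

```c
#include <linux/audit.h>   /* AUDIT_ARCH_X86_64, AUDIT_ARCH_I386 */
#include <stdint.h>
#include <sys/ptrace.h>    /* assumption: provides struct __ptrace_syscall_info */
#include <sys/types.h>

/* Hypothetical helper: map the reported audit architecture to an ABI width.
 * Returns 64, 32, or 0 for anything else. */
int syscall_abi_bits(uint32_t arch)
{
    switch (arch) {
    case AUDIT_ARCH_X86_64:
        return 64;   /* syscall instruction, 64-bit call numbers */
    case AUDIT_ARCH_I386:
        return 32;   /* int 0x80 / sysenter, 32-bit call numbers */
    default:
        return 0;
    }
}

/* Hypothetical helper: ask the kernel which ABI the tracee's current
 * system call used. The tracee must be in a syscall stop.
 * Returns 64, 32, 0 (unrecognized arch), or -1 if ptrace fails. */
int tracee_syscall_abi_bits(pid_t pid)
{
    struct __ptrace_syscall_info info;

    /* For this request, addr is the buffer size and data is the buffer. */
    if (ptrace(PTRACE_GET_SYSCALL_INFO, pid, (void *)sizeof(info), &info) == -1)
        return -1;

    return syscall_abi_bits(info.arch);
}
```

With the NASM example above, calling tracee_syscall_abi_bits(pid) at the stop for the int 0x80 should give 32, and at the stop for the syscall instruction it should give 64.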


With older kernels

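As a workaround, I think you could read the instruction bytes just before the saved RIP (by the time the tracer sees the stop, RIP already points past the 2-byte instruction that entered the kernel) and check whether they are the syscall instruction (0F 05) or not, because ptrace does let you read the target process's memory.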

But for a security use-case like disallowing some system calls, this would be vulnerable to a race condition: another thread in the traced process could rewrite the syscall bytes to int 0x80 after they execute, but before you can peek at them with ptrace.


You only need to do that if the process is running in 64-bit mode; otherwise only the 32-bit ABI is available and there is nothing to check. (The vDSO page can potentially use the syscall instruction in 32-bit mode on AMD CPUs that support it but not sysenter. Not checking in the first place for 32-bit processes avoids this corner case.) I think you're saying you at least have a reliable way to detect the mode.
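One way a tracer might detect the mode is to look at the CS selector from PTRACE_GETREGS (the cs field of struct user_regs_struct). The sketch below assumes the usual Linux x86-64 user-space selectors, 0x33 for 64-bit code and 0x23 for 32-bit compat code; other values can show up in unusual environments, so anything else is reported as unknown. The function name code_segment_bits is made up for illustration.

```c
/* Assumption: standard Linux x86-64 selector values for user-space code. */
#define USER_CS_64BIT 0x33   /* 64-bit mode code segment */
#define USER_CS_32BIT 0x23   /* 32-bit compatibility-mode code segment */

/* Hypothetical helper: returns 64, 32, or 0 if the selector is not recognized. */
int code_segment_bits(unsigned long long cs)
{
    switch (cs) {
    case USER_CS_64BIT:
        return 64;
    case USER_CS_32BIT:
        return 32;
    default:
        return 0;
    }
}
```

Only when this reports 64 is the instruction check needed; a result of 32 means the 32-bit ABI is the only one available to the process.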

(I haven't used the ptrace API directly, just the tools like strace that use it. So I hope this answer makes sense.)

Why does ptrace show a 32-bit execve system call having EAX = 59, the 64-bit call number? How do 32-bit system calls work on x86-64?

execve is special; it's the only system call that interacts specially with PTRACE_TRACEME. The way strace works, other system calls do show the 32-bit call number. (And modern strace needs special help to know whether that's a 32-bit call number for int 0x80 / sysenter, or a 64-bit call number, since 64-bit processes can still invoke int 0x80, although they normally shouldn't. This support was only added in 2019, with PTRACE_GET_SYSCALL_INFO.)


You're right, when the kernel is actually invoked, EAX holds 11, __NR_execve from unistd_32.h. It's set by mov $0xb,%eax before glibc's execve wrapper jumps to the VDSO page to enter the kernel via whatever efficient method is supported on this hardware (normally sysenter.)

But execution doesn't actually stop until it reaches some code in the main execve implementation that checks for PTRACE_TRACEME and raises SIGTRAP.

Apparently sometime before that happens, it calls void set_personality_64bit(void) in arch/x86/kernel/process_64.c, which includes

    /* Pretend that this comes from a 64bit execve */
    task_pt_regs(current)->orig_ax = __NR_execve;

I found that by searching for __NR_execve in a kernel source browser, and looking at the most likely file in arch/x86. I didn't keep cross-referencing to find where that's called from; the fact that it exists (and the assumption of a sane non-obfuscated design) points very strongly to this being the answer to your mystery.

Intercept only syscall with PTRACE_SINGLESTEP

If you cannot use PTRACE_SYSCALL to stop the child right before/after a syscall, then you will have to manually detect when one is about to happen. I doubt that checking the source code of strace would help, since strace most likely uses PTRACE_SYSCALL and has no reason to manually decode instructions.

Assuming you are working on x86-64, here's how you can do it:

  1. Using PTRACE_SINGLESTEP, keep stepping one instruction at a time.
  2. After each step, use PTRACE_PEEKTEXT to fetch the next instruction, the one pointed to by the instruction pointer.
  3. Check if the instruction is a syscall instruction by comparing the bytes with the opcode of syscall, which is two bytes: 0x0f 0x05. Since x86 is little endian, this means checking whether the return value of ptrace(PTRACE_PEEKTEXT, ...) has its two least significant bytes set to 0x050f.

NOTE: If you are on another architecture, or if you also want to detect 32-bit syscalls, you can simply check for different/more values on step 3. On Linux x86-64, there are multiple ways to issue a syscall, with different opcodes. For example, 32-bit syscalls on Linux are done through int 0x80 (opcode 0xcd 0x80). Check this other answer of mine for a list.

Here's an example:

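#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>

struct user_regs_struct regs;
long opcode;
int status;

// ... (child is the PID of a process that is already being traced)

waitpid(child, &status, 0);

while (WIFSTOPPED(status)) {
    ptrace(PTRACE_GETREGS, child, NULL, &regs);

    // PTRACE_PEEKTEXT returns the word it read, so -1 can be valid data:
    // clear errno first and treat -1 as an error only if errno was set.
    errno = 0;
    opcode = ptrace(PTRACE_PEEKTEXT, child, regs.rip, 0);
    if (opcode == -1 && errno != 0) {
        perror("ptrace(PTRACE_PEEKTEXT) failed");
        exit(1);
    }

    if (((unsigned long)opcode & 0xffff) == 0x050f) {
        // Child about to execute a syscall instruction,
        // check the registers to know more...
    }

    ptrace(PTRACE_SINGLESTEP, child, 0, 0);
    waitpid(child, &status, 0);
}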
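To also catch the other entry instructions mentioned in the note, the single comparison in step 3 could be replaced with a small table lookup. This is only a sketch: the table covers syscall (0f 05), int 0x80 (cd 80) and sysenter (0f 34), and the names entry_insn_names and match_entry_insn are hypothetical.

```c
#include <stddef.h>

/* Two-byte encodings of the instructions that enter the kernel on x86. */
static const unsigned char entry_insn_bytes[][2] = {
    {0x0f, 0x05},   /* syscall  */
    {0xcd, 0x80},   /* int 0x80 */
    {0x0f, 0x34},   /* sysenter */
};

/* Names in the same order as the table above. */
const char *const entry_insn_names[] = {"syscall", "int 0x80", "sysenter"};

/* Hypothetical helper: word is the value returned by PTRACE_PEEKTEXT at RIP.
 * Returns the index of the matching table entry, or -1 if the next
 * instruction is not one of the listed encodings. */
int match_entry_insn(long word)
{
    /* x86 is little endian: the byte at RIP is the least significant one. */
    unsigned char first = (unsigned long)word & 0xff;
    unsigned char second = ((unsigned long)word >> 8) & 0xff;

    for (size_t i = 0; i < sizeof(entry_insn_bytes) / sizeof(entry_insn_bytes[0]); i++) {
        if (entry_insn_bytes[i][0] == first && entry_insn_bytes[i][1] == second)
            return (int)i;
    }
    return -1;
}
```

In the loop above, the test on opcode would become a check for match_entry_insn(opcode) >= 0, with entry_insn_names giving the instruction that is about to run.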

