How to Find Which Type of System Call Is Used by a Program

victory:~ # gcc getpid.c -o getpid -g
victory:~ # gdb getpid
<snip>
(gdb) break main
Breakpoint 1 at 0x400540: file getpid.c, line 4.
(gdb) run
Starting program: /root/getpid 

Breakpoint 1, main () at getpid.c:4
4     getpid();
(gdb) disassemble
Dump of assembler code for function main:
0x000000000040053c <main+0>:    push   %rbp
0x000000000040053d <main+1>:    mov    %rsp,%rbp
0x0000000000400540 <main+4>:    mov    $0x0,%eax
0x0000000000400545 <main+9>:    callq  0x400440 <getpid@plt>
0x000000000040054a <main+14>:   mov    $0x0,%eax
0x000000000040054f <main+19>:   leaveq 
0x0000000000400550 <main+20>:   retq   
End of assembler dump.

Looks like our call to getpid() is actually a library call. Let's set a breakpoint there and continue.

(gdb) break getpid
Breakpoint 2 at 0x7ffff7b29c00
(gdb) cont
Continuing.

Breakpoint 2, 0x00007ffff7b29c00 in getpid () from /lib64/libc.so.6
(gdb) disassemble
Dump of assembler code for function getpid:
0x00007ffff7b29c00 <getpid+0>:  mov    %fs:0x94,%edx
0x00007ffff7b29c08 <getpid+8>:  cmp    $0x0,%edx
0x00007ffff7b29c0b <getpid+11>: mov    %edx,%eax
0x00007ffff7b29c0d <getpid+13>: jle    0x7ffff7b29c11 <getpid+17>
0x00007ffff7b29c0f <getpid+15>: repz retq 
0x00007ffff7b29c11 <getpid+17>: jne    0x7ffff7b29c1f <getpid+31>
0x00007ffff7b29c13 <getpid+19>: mov    %fs:0x90,%eax
0x00007ffff7b29c1b <getpid+27>: test   %eax,%eax
0x00007ffff7b29c1d <getpid+29>: jne    0x7ffff7b29c0f <getpid+15>
0x00007ffff7b29c1f <getpid+31>: mov    $0x27,%eax
0x00007ffff7b29c24 <getpid+36>: syscall 
0x00007ffff7b29c26 <getpid+38>: test   %edx,%edx
0x00007ffff7b29c28 <getpid+40>: mov    %rax,%rsi
0x00007ffff7b29c2b <getpid+43>: jne    0x7ffff7b29c0f <getpid+15>
0x00007ffff7b29c2d <getpid+45>: mov    %esi,%fs:0x90
0x00007ffff7b29c35 <getpid+53>: mov    %esi,%eax
0x00007ffff7b29c37 <getpid+55>: retq   
End of assembler dump.

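Buried in the getpid() library function is the syscall assembler instruction. Just before it, mov $0x27,%eax loads the system call number into eax (0x27 = 39, which is getpid on amd64). syscall is an AMD64 instruction that supports a fast transition to ring 0 for the purpose of system calls.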

What determines which system calls a program makes?

The source code of the program, meaning the entire program including all the compiled sources that have been linked together. That includes the ones you didn't specify explicitly, most notably the C system library, where most system calls directly originate. In the case of Ubuntu, that would be glibc. Of course, the other sources that call system library functions indirectly affect which system calls are made.

Is it only the compiler?

Not only the compiler, of course, but it too is responsible for some of these indirect calls. For example, if you write new X in C++, the compiler generates a call to operator new, which typically calls malloc, whose implementation calls brk/sbrk or mmap, which in turn perform system calls.

What is the type of system call arguments on Linux?

There's no general solution. If you want to make your code ultra-multiarch you can just do something like that:

#if ARCH_WITH_32BIT_REGS
typedef uint32_t reg_size_int_t;
#elif ARCH_WITH_64BIT_REGS
typedef uint64_t reg_size_int_t;
#elif ARCH_WITH_16BIT_REGS
typedef uint16_t reg_size_int_t;
....
#endif

reg_size_int_t syscall_1(reg_size_int_t nr, reg_size_int_t arg0);
...

But on most commonly used architectures, the size of a register is equal to the size of long.

How does Linux identify a particular file system to execute a system call?

This answer is based on kernel version 4.0. I traced out some of the code which handles a read syscall. I recommend you clone the Linux source repo and follow along in the source code.

Syscall handler for read, at fs/read_write.c:620 is called. It receives a file descriptor (integer) as an argument, and calls fdget_pos to convert it to a struct fd.
fdget_pos calls __fdget_pos calls __fdget calls __fget_light. __fget_light uses current->files, the file descriptor table for the current process, to look up the struct file which corresponds to the passed file descriptor number.
Back in the syscall handler, the file struct is passed to vfs_read, at fs/read_write.c:478.
vfs_read calls __vfs_read, which calls file->f_op->read. From here on, you are in filesystem-specific code.

So the VFS doesn't really bother "identifying" the filesystem which a file lives on; it simply uses the table of "file operation" function pointers which is stored in its struct file. When that struct file is initialized, it is given the correct f_op function pointer table which implements all the filesystem-specific operations for its filesystem.

How to see which system call a process is currently executing?

/proc offers some information about what the kernel is currently doing "for" a process:

/proc/${pid}/syscall
/proc/${pid}/stack

More information:

http://man7.org/linux/man-pages/man5/proc.5.html
http://blog.tanelpoder.com/2013/02/21/peeking-into-linux-kernel-land-using-proc-filesystem-for-quickndirty-troubleshooting/

Accessing a system call directly from user program

The manpage for _syscall(2) states:

Starting around kernel 2.6.18, the _syscall macros were removed from header files supplied to user space. Use syscall(2) instead. (Some architectures, notably ia64, never provided the _syscall macros; on those architectures, syscall(2) was always required.)

Thus, your desired approach can't work on more modern kernels. (You can see this clearly if you run the preprocessor on your code: it won't resolve the _syscall0 macro.) Use the syscall function instead.

Here is a usage example, cited from syscall(2):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

int
main(int argc, char *argv[])
{
    pid_t tid;
    tid = syscall(SYS_gettid);
}

As you asked for a direct way to call the Linux kernel without any userspace wrappers, I'll show you examples for the 80386 and the amd64 architecture.

First, you have to get the system call number from a table, such as this one. In the case of getpid, the system call number is 39 for amd64 and 20 for 80386. Next, we create a function that calls the system for us. On the 80386 processor you use interrupt 128 (0x80) to call the system; on amd64 we use the special syscall instruction. The system call number goes into register eax (rax on amd64), and the result is returned in the same register. To keep the program simple, we write it in assembly. You can later use strace to verify that it works correctly.

This is the code for 80386. It should return the lowest byte of its pid as exit status.

        .global _start
_start: mov $20,%eax       #system call number 20: getpid
        int $128           #call the system
        mov %eax,%ebx      #move pid into register ebx
        mov $1,%eax        #system call number 1: exit, argument in ebx
        int $128           #exit

Assemble with:

as --32 -o 80386.o 80386.s
ld -m elf_i386 -o 80386 80386.o

This is the same code for amd64:

        .global _start
_start: mov $39,%eax    #system call 39: getpid
        syscall         #call the system
        mov %eax,%edi   #move pid into register edi
        mov $60,%eax    #system call 60: exit
        syscall         #call the system

Assemble with:

as -o amd64.o amd64.s
ld -o amd64 amd64.o

Can a system call happen in a C program?

System call background

A system call, according to Wikipedia, is a "programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed".

Another way of understanding a system call is as a user space program making a request to the operating system kernel to perform some task on behalf of the user space program. The full set of system calls provided by the kernel is analogous (in some ways) to an API provided by the kernel to user space.

As system calls are a low level interface to the kernel, correctly providing their arguments can be error prone or even dangerous. For these reasons, C library authors provide simpler and safer wrapper functions for a significant portion of a kernel's set of system calls.

These wrapper functions take a simplified argument set and then derive the appropriate values to pass on to the kernel so the system call can be executed.

Example

Note: This example is based on compiling and running a C program with gcc on Linux. The system calls, library functions, and output may differ on other POSIX or non-POSIX operating systems.

I will attempt to show how to see when system calls are being made with a simple example.

#include <unistd.h>

int main() {
    write(1, "Hello world!\n", 13);
}

Above we have a very simple C program that writes the string Hello world!\n to stdout. If we compile and then execute this program with strace, we see the following (note the output may look different on other computers):

$ strace ./hello > /dev/null
execve("./hello", ["./hello"], 0x7fff083a0630 /* 58 vars */) = 0
<a bunch of output we aren't interested in>
write(1, "Hello world!\n", 13)          = 13
exit_group(0)                           = ?
+++ exited with 0 +++

strace is a Linux program that intercepts and displays all system calls made by a program, as well as the arguments provided to the system calls and their return values.

We can see here that, as expected, the write system call was made with the expected arguments. Nothing strange yet.

Another Linux tracing program is ltrace, which intercepts dynamic library calls made by a program, and displays their arguments and return values.

If we run the same program with ltrace, we see this:

$ ltrace ./hello > /dev/null
write(1, "Hello world!\n", 13)                                = 13
+++ exited (status 0) +++

This tells us that the write library function was executed. This means that the C code first called the write library function, which then in turn called the write system call.

Suppose now that we want to explicitly make a write system call without calling the write library function. (This is inadvisable in normal use, but useful for illustration.)

Here is the new code:

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

int main() {
    syscall(SYS_write, 1, "Hello world!\n", 13);
}

Here we directly call the syscall library function, telling it we want to execute the write system call.

After recompiling, here is the output of strace:

$ strace ./hello > /dev/null 
execve("./hello", ["./hello"], 0x7ffe3790a660 /* 58 vars */) = 0
<a bunch of output we aren't interested in>
write(1, "Hello world!\n", 13)          = 13
exit_group(0)                           = ?
+++ exited with 0 +++

As expected, the write system call is made just as before.

If we run ltrace we see the following:

$ ltrace ./hello > /dev/null 
syscall(1, 1, 0x560b30e4d704, 13)                             = 13
+++ exited (status 0) +++

So the write library function is no longer being called, but we are still making a library function call. Now we are making a call to the syscall library function instead of the write library function.

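It is possible to make a system call directly from a user space C program without calling any library function, for example with compiler-specific inline assembly that executes the architecture's system call instruction (syscall on amd64). Such code is architecture-specific and easy to get wrong, so it is rarely worth doing outside of special cases.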

Detecting when a C program makes system calls

In general, nearly every non-trivial C program makes at least one system call. This is because user space does not have direct access to kernel memory or to the computer's hardware. User space programs have indirect access to kernel memory and the hardware through system calls.

To identify if a compiled C program (or any other program on Linux) makes a system call, and to identify which system calls it makes, simply use strace.

Are there compiler options to prevent calling the library wrapper functions for system calls?

You can compile your C program (assuming you are using gcc) with the -nostdlib option. This prevents linking the C standard library and the standard startup files into your executable. However, you would then need to write your own code to make system calls, as well as your own entry point (_start).
