How to find which type of system call is used by a program
victory:~ # gcc getpid.c -o getpid -g
victory:~ # gdb getpid
<snip>
(gdb) break main
Breakpoint 1 at 0x400540: file getpid.c, line 4.
(gdb) run
Starting program: /root/getpid
Breakpoint 1, main () at getpid.c:4
4 getpid();
(gdb) disassemble
Dump of assembler code for function main:
0x000000000040053c <main+0>: push %rbp
0x000000000040053d <main+1>: mov %rsp,%rbp
0x0000000000400540 <main+4>: mov $0x0,%eax
0x0000000000400545 <main+9>: callq 0x400440 <getpid@plt>
0x000000000040054a <main+14>: mov $0x0,%eax
0x000000000040054f <main+19>: leaveq
0x0000000000400550 <main+20>: retq
End of assembler dump.
Looks like our call to getpid() is actually a library call. Let's set a breakpoint there and continue.
(gdb) break getpid
Breakpoint 2 at 0x7ffff7b29c00
(gdb) cont
Continuing.
Breakpoint 2, 0x00007ffff7b29c00 in getpid () from /lib64/libc.so.6
(gdb) disassemble
Dump of assembler code for function getpid:
0x00007ffff7b29c00 <getpid+0>: mov %fs:0x94,%edx
0x00007ffff7b29c08 <getpid+8>: cmp $0x0,%edx
0x00007ffff7b29c0b <getpid+11>: mov %edx,%eax
0x00007ffff7b29c0d <getpid+13>: jle 0x7ffff7b29c11 <getpid+17>
0x00007ffff7b29c0f <getpid+15>: repz retq
0x00007ffff7b29c11 <getpid+17>: jne 0x7ffff7b29c1f <getpid+31>
0x00007ffff7b29c13 <getpid+19>: mov %fs:0x90,%eax
0x00007ffff7b29c1b <getpid+27>: test %eax,%eax
0x00007ffff7b29c1d <getpid+29>: jne 0x7ffff7b29c0f <getpid+15>
0x00007ffff7b29c1f <getpid+31>: mov $0x27,%eax
0x00007ffff7b29c24 <getpid+36>: syscall
0x00007ffff7b29c26 <getpid+38>: test %edx,%edx
0x00007ffff7b29c28 <getpid+40>: mov %rax,%rsi
0x00007ffff7b29c2b <getpid+43>: jne 0x7ffff7b29c0f <getpid+15>
0x00007ffff7b29c2d <getpid+45>: mov %esi,%fs:0x90
0x00007ffff7b29c35 <getpid+53>: mov %esi,%eax
0x00007ffff7b29c37 <getpid+55>: retq
End of assembler dump.
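Buried in the getpid() library function is the syscall assembler instruction. This is an AMD64 instruction that supports a fast switch to ring 0 (kernel mode) for the purpose of system calls. The value 0x27 (decimal 39) loaded into %eax just before it is the system call number for getpid on this architecture.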
What determines which system calls a program makes?
The source code of the program: the entire program, including all the compiled sources that have been linked together, and including those that you didn't specify explicitly, most notably the C system library, where most system calls directly originate. In the case of Ubuntu, that would be glibc. Of course, the other sources that call system library functions indirectly affect the system calls.
Is it only the compiler?
Not only the compiler, of course, but it too is responsible for some of these indirect calls. For example, if you write new X, the compiler generates a call to the allocation function (operator new, which typically calls malloc), whose implementation calls sbrk or mmap, whose implementations do system calls.
What is the type of system call arguments on Linux?
There's no general solution. If you want to make your code ultra-multiarch, you can do something like this:
#if ARCH_WITH_32BIT_REGS
typedef uint32_t reg_size_int_t;
#elif ARCH_WITH_64BIT_REGS
typedef uint64_t reg_size_int_t;
#elif ARCH_WITH_16BIT_REGS
typedef uint16_t reg_size_int_t;
....
#endif
reg_size_int_t syscall_1(reg_size_int_t nr, reg_size_int_t arg0);
...
But on most commonly used architectures, the register size is equal to the size of long.
How does Linux identify a particular file system to execute a system call
This answer is based on kernel version 4.0. I traced out some of the code which handles a read syscall. I recommend you clone the Linux source repo and follow along in the source code.
- Syscall handler for read, at fs/read_write.c:620, is called. It receives a file descriptor (integer) as an argument, and calls fdget_pos to convert it to a struct fd. fdget_pos calls __fdget_pos calls __fdget calls __fget_light. __fget_light uses current->files, the file descriptor table for the current process, to look up the struct file which corresponds to the passed file descriptor number.
- Back in the syscall handler, the file struct is passed to vfs_read, at fs/read_write.c:478. vfs_read calls __vfs_read, which calls file->f_op->read. From here on, you are in filesystem-specific code.
So the VFS doesn't really bother "identifying" the filesystem which a file lives on; it simply uses the table of "file operation" function pointers which is stored in its struct file. When that struct file is initialized, it is given the correct f_op function pointer table which implements all the filesystem-specific operations for its filesystem.
How to see the system call a process is currently executing?
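proc offers some information about what the kernel is currently doing "for" a process:
- /proc/${pid}/syscall (the number of the system call being executed, followed by its argument registers)
- /proc/${pid}/stack (the kernel stack trace of the process)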
More information:
- http://man7.org/linux/man-pages/man5/proc.5.html
- http://blog.tanelpoder.com/2013/02/21/peeking-into-linux-kernel-land-using-proc-filesystem-for-quickndirty-troubleshooting/
Accessing a system call directly from user program
The manpage for _syscall(2) states:
Starting around kernel 2.6.18, the _syscall macros were removed from header files supplied to user space. Use syscall(2) instead. (Some architectures, notably ia64, never provided the _syscall macros; on those architectures, syscall(2) was always required.)
Thus, your desired approach can't work on more modern kernels. (You can clearly see that if you run the preprocessor on your code: it won't resolve the _syscall0 macro.) Try to use the syscall function instead.
Here is a usage example, cited from syscall(2):
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
int
main(int argc, char *argv[])
{
pid_t tid;
tid = syscall(SYS_gettid);
}
As you asked for a direct way to call the Linux kernel without any userspace wrappers, I'll show you examples for the 80386 and the amd64 architecture.
First, you have to get the system call number from a table, such as this one. In case of getpid, the system call number is 39 for amd64 and 20 for 80386. Next, we create a function that calls the system for us. On the 80386 processor you use interrupt 128 to call the system; on amd64 we use the special syscall instruction. The system call number goes into register eax, and the result is also written to this register. To keep the program simple, we write it in assembly. You can later use strace to verify it works correctly.
This is the code for 80386. It should return the lowest byte of its pid as exit status.
.global _start
_start: mov $20,%eax #system call number 20: getpid
int $128 #call the system
mov %eax,%ebx #move pid into register ebx
mov $1,%eax #system call number 1: exit, argument in ebx
int $128 #exit
Assemble with:
as --32 -o 80386.o 80386.s
ld -m elf_i386 -o 80386 80386.o
This is the same code for amd64:
.global _start
_start: mov $39,%eax #system call 39: getpid
syscall #call the system
mov %eax,%edi #move pid into register edi
mov $60,%eax #system call 60: exit
syscall #call the system
Assemble with:
as -o amd64.o amd64.s
ld -o amd64 amd64.o
Can a system call happen in a C program?
System call background
A system call, according to Wikipedia, is a "programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed".
Another way of understanding a system call is as a user space program making a request to the operating system kernel to perform some task on behalf of the user space program. The full set of system calls provided by the kernel is analogous (in some ways) to an API provided by the kernel to user space.
As system calls are a low level interface to the kernel, correctly providing their arguments can be error prone or even dangerous. For these reasons, C library authors provide simpler and safer wrapper functions for a significant portion of a kernel's set of system calls.
These wrapper functions take a simplified argument set and then derive the appropriate values to pass on to the kernel so the system call can be executed.
Example
Note: This example is based on compiling and running a C program with gcc on Linux. The system calls, library functions, and output may differ on other POSIX or non-POSIX operating systems.
I will attempt to show how to see when system calls are being made with a simple example.
#include <unistd.h>
int main() {
write(1, "Hello world!\n", 13);
}
Above we have a very simple C program that writes the string Hello world!\n to stdout. If we compile and then execute this program with strace, we see the following (note the output may look different on other computers):
$ strace ./hello > /dev/null
execve("./hello", ["./hello"], 0x7fff083a0630 /* 58 vars */) = 0
<a bunch of output we aren't interested in>
write(1, "Hello world!\n", 13) = 13
exit_group(0) = ?
+++ exited with 0 +++
strace is a Linux program that intercepts and displays all system calls made by a program, as well as the arguments provided to the system calls and their return values.
We can see here that, as expected, the write system call was made with the expected arguments. Nothing strange yet.
Another Linux tracing program is ltrace, which intercepts dynamic library calls made by a program, and displays their arguments and return values.
If we run the same program with ltrace, we see this:
$ ltrace ./hello > /dev/null
write(1, "Hello world!\n", 13) = 13
+++ exited (status 0) +++
This tells us that the write library function was executed. This means that the C code first called the write library function, which then in turn called the write system call.
Suppose now that we want to explicitly make a write system call without calling the write library function. (This is inadvisable in normal use, but useful for illustration.)
Here is the new code:
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
int main() {
syscall(SYS_write, 1, "Hello world!\n", 13);
}
Here we directly call the syscall library function, telling it we want to execute the write system call.
After recompiling, here is the output of strace:
$ strace ./hello > /dev/null
execve("./hello", ["./hello"], 0x7ffe3790a660 /* 58 vars */) = 0
<a bunch of output we aren't interested in>
write(1, "Hello world!\n", 13) = 13
exit_group(0) = ?
+++ exited with 0 +++
We can see that the write system call is made exactly as before.
If we run ltrace, we see the following:
$ ltrace ./hello > /dev/null
syscall(1, 1, 0x560b30e4d704, 13) = 13
+++ exited (status 0) +++
So the write library function is no longer being called, but we are still making a library function call. Now we are making a call to the syscall library function instead of the write library function.
It is possible to make a system call directly from a user space C program without calling any library function, but doing so requires architecture-specific assembly and is rarely worthwhile.
Detecting when a C program makes system calls
In general, nearly every non-trivial C program makes at least one system call. This is because user space does not have direct access to kernel memory or to the computer's hardware. User space programs have indirect access to kernel memory and the hardware through system calls.
To identify if a compiled C program (or any other program on Linux) makes a system call, and to identify which system calls it makes, simply use strace.
Are there compiler options to prevent calling the library wrapper functions for system calls?
You can compile your C program (assuming you are using gcc) with the -nostdlib option. This will prevent linking the C standard library as part of producing your executable. However, then you would need to write your own code to make system calls.