Linux system call implementation
A system call is mostly implemented inside the Linux kernel, with a small amount of glue code in the C standard library. But see also vdso(7).
From the user-land point of view, a system call (they are listed in syscalls(2)) is a single machine instruction (SYSCALL on x86-64; often SYSENTER or INT 0x80 on 32-bit x86) with some calling conventions, e.g. defining which machine register holds the syscall number (such as __NR_stat from /usr/include/asm/unistd_64.h) and which other registers contain the arguments to the system call.
Use strace(1) to understand which system calls are done by a given program or process.
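As a minimal sketch of the "syscall number plus arguments" convention described above, the following asks the kernel for the process ID by number, without going through the getpid() wrapper. It relies on the generic syscall(2) function from the C library; the helper name raw_getpid is made up for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid, i.e. __NR_getpid from the kernel headers */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: fetch our PID by syscall number instead of
 * calling the getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Running a program that calls raw_getpid() under strace(1) should show a getpid system call, the same as for the regular wrapper.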
The C standard library has a tiny wrapper function (which invokes the kernel, following the ABI, and deals with error reporting & errno).
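The error-reporting half of such a wrapper can be sketched as follows. On Linux the kernel signals failure by returning a negative errno value, and the library turns that into a -1 return plus errno. The function name wrap_syscall_result is an assumption for illustration; real libc wrappers do this in macros or assembly.

```c
#include <errno.h>

/* Hypothetical helper showing what a libc wrapper does with the raw
 * value left by the kernel: values in [-4095, -1] are negated errno
 * codes, everything else is a normal result. */
long wrap_syscall_result(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;   /* publish the error code to the caller */
        return -1;           /* the conventional failure return */
    }
    return raw;              /* success: pass the value through */
}
```

For example, a raw return of -ENOENT from the kernel becomes a return value of -1 with errno set to ENOENT.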
For stat(2), the C wrapping function is e.g. in stat/stat.c for musl-libc.
Inside the kernel code, most of the work happens in fs/stat.c (e.g. after line 207).
How to implement my own system call in Linux kernel 4.x?
You can edit your glibc to add a wrapper around your syscall. Simple wrappers are generated from entries in the syscalls.list files under glibc/sysdeps/unix (search for your platform):
https://github.com/lattera/glibc/blob/master/sysdeps/unix/syscalls.list
https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/x86_64/syscalls.list
# File name  Caller  Syscall name  Args    Strong name    Weak names
accept       -       accept        Ci:iBN  __libc_accept  accept
access       -       access        i:si    __access       access
close        -       close         Ci:i    __libc_close   __close close
open         -       open          Ci:siv  __libc_open    __open open
read         -       read          Ci:ibn  __libc_read    __read read
uname        -       uname         i:p     __uname        uname
write        -       write         Ci:ibn  __libc_write   __write write
To decode this format, read the comments in the script that processes this file, sysdeps/unix/make-syscalls.sh, as recommended in https://blog.packagecloud.io/eng/2016/04/05/the-definitive-guide-to-linux-system-calls/
# This script is used to process the syscall data encoded in the various
# syscalls.list files to produce thin assembly syscall wrappers around the
# appropriate OS syscall. See syscall-template.s for more details on the
# actual wrapper.
#
# Syscall Signature Prefixes:
#
# E: errno and return value are not set by the call
# V: errno is not set, but errno or zero (success) is returned from the call
#
# Syscall Signature Key Letters:
#
# a: unchecked address (e.g., 1st arg to mmap)
# b: non-NULL buffer (e.g., 2nd arg to read; return value from mmap)
# B: optionally-NULL buffer (e.g., 4th arg to getsockopt)
# f: buffer of 2 ints (e.g., 4th arg to socketpair)
# F: 3rd arg to fcntl
# i: scalar (any signedness & size: int, long, long long, enum, whatever)
# I: 3rd arg to ioctl
# n: scalar buffer length (e.g., 3rd arg to read)
# N: pointer to value/return scalar buffer length (e.g., 6th arg to recvfrom)
# p: non-NULL pointer to typed object (e.g., any non-void* arg)
# P: optionally-NULL pointer to typed object (e.g., 2nd argument to gettimeofday)
# s: non-NULL string (e.g., 1st arg to open)
# S: optionally-NULL string (e.g., 1st arg to acct)
# v: vararg scalar (e.g., optional 3rd arg to open)
# V: byte-per-page vector (3rd arg to mincore)
# W: wait status, optionally-NULL pointer to int (e.g., 2nd arg of wait4)
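To make the Args column easier to read, here is a rough sketch that decodes a signature such as Ci:ibn. It assumes that everything after the colon is one key letter per argument, which matches the entries listed above; the function names are hypothetical and only a few key letters are covered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of arguments encoded in a syscalls.list
 * signature, i.e. the count of key letters after the ':'.
 * Returns -1 if the signature has no ':'. */
int syscall_list_nargs(const char *sig)
{
    const char *colon = strchr(sig, ':');
    if (colon == NULL)
        return -1;
    return (int)strlen(colon + 1);
}

/* Hypothetical helper: describe a few of the key letters quoted above. */
const char *syscall_list_arg_kind(char key)
{
    switch (key) {
    case 'i': return "scalar";
    case 'b': return "non-NULL buffer";
    case 'n': return "scalar buffer length";
    case 's': return "non-NULL string";
    case 'p': return "non-NULL pointer to typed object";
    case 'v': return "vararg scalar";
    default:  return "other";
    }
}
```

With this reading, the read entry (Ci:ibn) takes three arguments: a scalar, a non-NULL buffer and a scalar buffer length.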
More information about glibc's syscall wrapper at official site: https://sourceware.org/glibc/wiki/SyscallWrappers
There are three types of OS kernel system call wrappers that are used by glibc: assembly, macro, and bespoke.
Assembly syscalls
Simple kernel system calls in glibc are translated from a list of names into an assembly wrapper that is then compiled. ... The list of syscalls that use wrappers is kept in the syscalls.list files: ... ./sysdeps/unix/sysv/linux/x86_64/syscalls.list
Don't forget to define the __NR number for your syscall in the Linux headers.
There are instructions on kernel.org, and in the Documentation/adding-syscalls.* files inside the Linux kernel sources:
https://www.kernel.org/doc/html/v4.10/process/adding-syscalls.html
https://github.com/torvalds/linux/blob/master/Documentation/process/adding-syscalls.rst
The method will be different for other OS like FreeBSD: https://wiki.freebsd.org/AddingSyscalls
Where is SYSCALL() implemented in Linux?
syscall is a wrapper that loads the registers and executes the syscall instruction on 64-bit x86, or int 80h or sysenter on 32-bit x86. It is part of the standard library.
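Example (x86-64):
syscall:
    endbr64
    mov rax,rdi                 ; syscall number (1st C argument) goes into rax
    mov rdi,rsi                 ; 1st syscall argument
    mov rsi,rdx                 ; 2nd syscall argument
    mov rdx,rcx                 ; 3rd syscall argument
    mov r10,r8                  ; 4th syscall argument: the kernel expects r10, not rcx
    mov r8,r9                   ; 5th syscall argument
    mov r9,QWORD PTR [rsp+0x8]  ; 6th syscall argument comes from the stack
    syscall                     ; enter the kernel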
So the answer is that the syscall function is in glibc.
In the kernel, the entry point for the syscall or sysenter instruction, or the int 80h interrupt handler (depending on the system implementation), lives in an assembly file. It does some stack magic, performs some checks and then calls the function that handles the particular system call. The addresses of those functions are kept in a special table of function pointers. But this part can hardly be called a "library".
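The "table of function pointers" idea can be illustrated with a toy user-space model. This is not kernel code: the names toy_table, toy_dispatch and the two handlers are invented, and the real kernel table and entry code are considerably more involved.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model: every "syscall" handler has the same signature. */
typedef long (*toy_syscall_fn)(long a, long b);

static long toy_add(long a, long b) { return a + b; }
static long toy_sub(long a, long b) { return a - b; }

/* The index in this array plays the role of the syscall number. */
static const toy_syscall_fn toy_table[] = { toy_add, toy_sub };

/* Look up the handler by number and call it; unknown numbers fail
 * with -ENOSYS, mirroring how the kernel reports a missing syscall. */
long toy_dispatch(unsigned long nr, long a, long b)
{
    size_t count = sizeof toy_table / sizeof toy_table[0];
    if (nr >= count)
        return -ENOSYS;
    return toy_table[nr](a, b);
}
```

Here toy_dispatch(0, 2, 3) reaches toy_add, while a number outside the table yields -ENOSYS.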
Simple System Call Implementation example?
This depends on which architecture you want to add a system call for, or if you want to add the system call for all architectures. I will explain one way to add a system call for ARM.
- Pick a name for your syscall, for example mysyscall.
- Choose a syscall number. In arch/arm/include/asm/unistd.h, take note of how each syscall has a specific number (__NR_SYSCALL_BASE+<number>) assigned to it. Choose an unused number for your syscall. Let us choose syscall number 223. Then add:
#define __NR_mysyscall (__NR_SYSCALL_BASE+223)
where the index 223 would be in that header file. This assigns the number 223 to your syscall on ARM architectures.
- Modify the architecture-specific syscall table. In linux/arch/arm/kernel/calls.S, change the line that corresponds to syscall 223 to:
CALL(sys_mysyscall)
- Add your function prototype. Suppose you wanted to add a non-architecture-specific syscall. Edit the file include/linux/syscalls.h and add your syscall's prototype:
asmlinkage long sys_mysyscall(struct dummy_struct *buf);
If you wanted to add it specifically for ARM, do the same, but in this file instead: arch/arm/kernel/sys_arm.c.
- Implement your syscall somewhere. Create a file wherever you please, for example in the kernel/ directory. You need to at least have:
#include <linux/syscalls.h>
...
SYSCALL_DEFINE1(mysyscall, struct dummy_struct __user *, buf)
{
    /* Implement your syscall; return 0 on success or a negative errno value */
    return 0;
}
Note the macro, SYSCALL_DEFINE1. The number at the end should correspond to how many input parameters your syscall has. In this case, our system call has only 1 parameter, so you use SYSCALL_DEFINE1. If it had two parameters, you would use SYSCALL_DEFINE2, etc.
Don't forget to add the object (.o) file to the Makefile in the directory where you put it.
- Compile your new kernel and test. You haven't modified your C libraries, so you cannot invoke your syscall with mysyscall(). You need to use the syscall() function, which takes a system call number as its first argument:
struct dummy_struct *buf = calloc(1, sizeof(*buf));
long res = syscall(223, buf);
Do note that this was for ARM. The process will be very similar for other architectures.
Edit: Don't forget to add your syscall file to the Makefile in kernel/.
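When testing, it helps to see why a call by number failed. The sketch below wraps syscall() and hands back the errno value; the helper name try_syscall is made up. A kernel that does not implement the requested number normally fails with ENOSYS, which is what you would expect to see if you booted the old kernel by mistake.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>   /* SYS_* numbers */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: invoke syscall `nr` with a single pointer
 * argument. Returns 0 on success, or the errno value on failure. */
int try_syscall(long nr, void *arg)
{
    errno = 0;
    long ret = syscall(nr, arg);
    return (ret == -1) ? errno : 0;
}
```

On the ARM kernel built above, try_syscall(223, buf) would exercise the new call. Only pass numbers you know the meaning of on the running architecture, since the same number maps to a different syscall elsewhere.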
How does a system call work
In short, here's how a system call works:
- First, the user application program sets up the arguments for the system call.
- After the arguments are all set up, the program executes the "system call" instruction.
This instruction causes an exception: an event that causes the processor to jump to a new address and start executing the code there.
- The instructions at the new address save your user program's state, figure out which system call you want, call the function in the kernel that implements that system call, restore your user program's state, and return control to the user program.
[Figure: a user application invoking the open() system call]
Note that the system call interface, which serves as the link to the system calls made available by the operating system, invokes the intended system call in the OS kernel and returns the status of the system call and any return values. The caller need know nothing about how the system call is implemented or what it does during execution.
Another example: a C program invoking the printf() library call, which calls the write() system call.
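The library-call-then-system-call chain can be observed with a small sketch: formatted output goes into a stdio stream layered on a pipe, and the bytes only become readable at the other end once the library flushes its buffer with write(2). The function name stdio_to_pipe is invented, and the message is assumed to be short enough to fit in the pipe buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send msg through a stdio stream that sits on
 * top of a pipe, then read it back from the pipe's other end.
 * Returns the number of bytes read, or -1 on failure. */
long stdio_to_pipe(const char *msg, char *out, size_t outlen)
{
    int fds[2];
    if (outlen == 0 || pipe(fds) != 0)
        return -1;

    FILE *stream = fdopen(fds[1], "w");
    if (stream == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    fprintf(stream, "%s", msg);  /* data sits in the user-space stdio buffer */
    fclose(stream);              /* flush: the library calls write(2), then close(2) */

    ssize_t n = read(fds[0], out, outlen - 1);
    close(fds[0]);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return (long)n;
}
```

Tracing a caller of this function with strace(1) should show the write call being issued at the fclose() step rather than at fprintf().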
For a more detailed explanation, read Section 1.5.1 in Chapter 1 and Section 2.3 in Chapter 2 of Operating System Concepts.
How does a syscall actually happen on linux?
Assuming we're talking about x86:
- The ID of the system call is deposited into the EAX register.
- Any arguments required by the system call are deposited into the registers fixed by the calling convention. On 32-bit x86 Linux these are EBX, ECX, EDX, ESI, EDI and EBP, in that order.
- An INT 0x80 interrupt is invoked.
- The Linux kernel services the system call identified by the ID in the EAX register and deposits the result in EAX.
- The calling code makes use of any results.
I may be a bit rusty at this, it's been a few years...
When implementing a system call, how do you expose the system call number to userland?
Well, I have a partial answer. Partial because it is Debian specific.
If you use the make deb-pkg target in the kernel sources, then .deb packages are created in the parent directory. If you then install these, your headers get installed into the system.
After doing this for my kernel described above:
$ grep -r krun /usr/include
/usr/include/asm/unistd_64.h:#define __NR_krun_read_msrs 317
/usr/include/asm/unistd_64.h:#define __NR_krun_reset_msrs 318
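Once the headers are installed, user code can pick the number up through <sys/syscall.h>. The sketch below guards on the macro so that it still compiles on a system whose headers lack the custom syscall; the function names are made up, and __NR_krun_read_msrs only exists with the patched headers shown above.

```c
#include <sys/syscall.h>   /* pulls in the __NR_* macros from the kernel headers */

/* Number the installed headers assign to krun_read_msrs, or -1 when
 * the headers do not define it (i.e. stock headers are installed). */
long krun_read_msrs_nr(void)
{
#ifdef __NR_krun_read_msrs
    return __NR_krun_read_msrs;
#else
    return -1;
#endif
}

/* For comparison: a syscall every Linux header set defines. */
long getpid_nr(void)
{
    return __NR_getpid;
}
```

A return of -1 from krun_read_msrs_nr() is a quick hint that the program was built against headers that predate the custom kernel.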