System Calls Implementation

How do the implementations of system calls and interrupts differ from each other?

On most systems, interrupts and system calls (and exception handlers) are implemented in the same way.

As soon as the program is executed, the system call informs the kernel of the request. What exactly happens here in terms of low-level programming?

Usually, system calls are wrappers around assembly language routines. The sequence of events is:

  1. Call to System Routine
  2. System Routine unpacks parameters and loads them into registers.
  3. System Routine forces an exception (identified by a number) by executing a change mode instruction (to some mode higher than user mode).
  4. The CPU handles the exception by dispatching to an exception handler in the system dispatch table.
  5. The handler performs the system service.
  6. The handler executes a return from exception or interrupt instruction, returning the process to user mode (or whatever mode was called from) and to the system service routine.
  7. The system service routine unpacks the return values from registers and updates the parameters.
  8. Return to the calling function.

Can an Interrupt be a System Call or vice versa?

No. They are distinct mechanisms that are merely dispatched in the same way.

Presumably an operating system could map system calls and interrupts to the same handler but that would be screwy.

When implementing a system call, how do you expose the system call number to userland?

Well, I have a partial answer. Partial because it is Debian specific.

If you use the make deb-pkg target in the kernel sources, then .deb packages are created in the parent directory. If you then install these, then your headers get installed into the system.

After doing this for my kernel described above:

$ grep -r krun /usr/include
/usr/include/asm/unistd_64.h:#define __NR_krun_read_msrs 317
/usr/include/asm/unistd_64.h:#define __NR_krun_reset_msrs 318

How to implement my own system call in Linux kernel 4.x?

You can edit your glibc to add a wrapper around your syscall. Wrappers of this kind are listed in the syscalls.list files under glibc/sysdeps/unix (look for the one for your platform):
https://github.com/lattera/glibc/blob/master/sysdeps/unix/syscalls.list
https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/x86_64/syscalls.list

# File name  Caller  Syscall name  Args     Strong name     Weak names

accept       -       accept        Ci:iBN   __libc_accept   accept
access       -       access        i:si     __access        access
close        -       close         Ci:i     __libc_close    __close close
open         -       open          Ci:siv   __libc_open     __open open
read         -       read          Ci:ibn   __libc_read     __read read
uname        -       uname         i:p      __uname         uname
write        -       write         Ci:ibn   __libc_write    __write write

To decode this format, see the comments in the script that processes this file, sysdeps/unix/make-syscalls.sh, as recommended in https://blog.packagecloud.io/eng/2016/04/05/the-definitive-guide-to-linux-system-calls/

# This script is used to process the syscall data encoded in the various
# syscalls.list files to produce thin assembly syscall wrappers around the
# appropriate OS syscall. See syscall-template.s for more details on the
# actual wrapper.
#
# Syscall Signature Prefixes:
#
# E: errno and return value are not set by the call
# V: errno is not set, but errno or zero (success) is returned from the call
#
# Syscall Signature Key Letters:
#
# a: unchecked address (e.g., 1st arg to mmap)
# b: non-NULL buffer (e.g., 2nd arg to read; return value from mmap)
# B: optionally-NULL buffer (e.g., 4th arg to getsockopt)
# f: buffer of 2 ints (e.g., 4th arg to socketpair)
# F: 3rd arg to fcntl
# i: scalar (any signedness & size: int, long, long long, enum, whatever)
# I: 3rd arg to ioctl
# n: scalar buffer length (e.g., 3rd arg to read)
# N: pointer to value/return scalar buffer length (e.g., 6th arg to recvfrom)
# p: non-NULL pointer to typed object (e.g., any non-void* arg)
# P: optionally-NULL pointer to typed object (e.g., 2nd argument to gettimeofday)
# s: non-NULL string (e.g., 1st arg to open)
# S: optionally-NULL string (e.g., 1st arg to acct)
# v: vararg scalar (e.g., optional 3rd arg to open)
# V: byte-per-page vector (3rd arg to mincore)
# W: wait status, optionally-NULL pointer to int (e.g., 2nd arg of wait4)

More information about glibc's syscall wrappers is available on the official site: https://sourceware.org/glibc/wiki/SyscallWrappers

There are three types of OS kernel system call wrappers that are used by glibc: assembly, macro, and bespoke.

Assembly syscalls
Simple kernel system calls in glibc are translated from a list of names into an assembly wrapper that is then compiled. ... The list of syscalls that use wrappers is kept in the syscalls.list files: ... ./sysdeps/unix/sysv/linux/x86_64/syscalls.list

Don't forget to define the __NR_ number for your syscall in the Linux headers.

There are instructions on kernel.org, and in the Documentation/adding-syscalls.* files inside the Linux kernel sources:
https://www.kernel.org/doc/html/v4.10/process/adding-syscalls.html
https://github.com/torvalds/linux/blob/master/Documentation/process/adding-syscalls.rst

The method will be different for other operating systems such as FreeBSD: https://wiki.freebsd.org/AddingSyscalls

Simple System Call Implementation example?

This depends on which architecture you want to add a system call for, or if you want to add the system call for all architectures. I will explain one way to add a system call for ARM.

  1. Pick a name for your syscall. For example, mysyscall.
  2. Choose a syscall number. In arch/arm/include/asm/unistd.h, take note of how each syscall has a specific number (__NR_SYSCALL_BASE+<number>) assigned to it. Choose an unused number for your syscall. Let us choose syscall number 223. Then add:

    #define __NR_mysyscall (__NR_SYSCALL_BASE+223)

    at the position where index 223 belongs in that header file. This assigns the number 223 to your syscall on ARM architectures.

  3. Modify architecture-specific syscall table. In linux/arch/arm/kernel/calls.S, change the line that corresponds to syscall 223 to:

    CALL(sys_mysyscall)

  4. Add your function prototype. Suppose you wanted to add a non-architecture-specific syscall. Edit the file include/linux/syscalls.h and add your syscall's prototype:

    asmlinkage long sys_mysyscall(struct dummy_struct __user *buf);

    If you wanted to add it specifically for ARM, do the same, but in this file instead: arch/arm/kernel/sys_arm.c.

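  5. Implement your syscall somewhere. Create a file wherever you please, for example in the kernel/ directory. You need to have at least:

#include <linux/syscalls.h>
...
SYSCALL_DEFINE1(mysyscall, struct dummy_struct __user *, buf)
{
    /* Implement your syscall; buf is a user-space pointer, so access it
     * only through copy_from_user()/copy_to_user(). */
    return 0;   /* 0 on success, a negative errno value on failure */
}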

Note the macro, SYSCALL_DEFINE1. The number at the end should correspond to how many input parameters your syscall has. In this case, our system call only has 1 parameter, so you use SYSCALL_DEFINE1. If it had two parameters, you would use SYSCALL_DEFINE2, etc.

Don't forget to add the object (.o) file to the Makefile in the directory where you put it.


  6. Compile your new kernel and test. You haven't modified your C library, so you cannot invoke your syscall with mysyscall(). You need to use the syscall() function, which takes a system call number as its first argument:

struct dummy_struct *buf = calloc(1, sizeof(*buf));   /* size of the struct, not of the pointer */
long res = syscall(223, buf);                         /* syscall() returns long */

Do note that this was for ARM. The process will be very similar for other architectures.

Edit: Don't forget to add your syscall file to the Makefile in kernel/.

Linux system call implementation

A system call is mostly implemented inside the Linux kernel, with a small amount of glue code in the C standard library. But see also vdso(7).

From the user-land point of view, a system call (they are listed in syscalls(2)...) is a single machine instruction (e.g. SYSCALL on x86-64, SYSENTER or int 0x80 on 32-bit x86) with some calling conventions (e.g. defining which machine register holds the syscall number, such as __NR_stat from /usr/include/asm/unistd_64.h, and which other registers contain the arguments to the system call).

Use strace(1) to understand which system calls are done by a given program or process.

The C standard library has a tiny wrapper function (which invokes the kernel, following the ABI, and deals with error reporting & errno).

For stat(2), the C wrapping function is e.g. in stat/stat.c for musl-libc.

Inside the kernel code, most of the work happens in fs/stat.c (e.g. after line 207).

Linux Kernel system call implementation with struct parameter

Place a header containing the new struct in include/uapi/linux.
Avoid namespace pollution by using the appropriate types e.g. __u16 instead of unsigned short/uint16_t, __kernel_time_t instead of time_t ...etc. Check out struct mii_ioctl_data for an example.

By adding a header-y += new_header.h entry to include/uapi/linux/Kbuild, you can then export the header with make headers_install.

By default, it installs the headers in ./usr. If you want it to install them as system headers, use make headers_install INSTALL_HDR_PATH=/usr instead. This results in the contents of the uapi directory being merged into /usr/include. You may then #include <linux/new_header.h> in your userspace program.


