Creating a System Call in Linux
But does this mean I'll have to recompile the whole kernel so that the kernel knows about my module?
Yes, you will need to recompile the kernel.
- Implementing Linux System Calls
Adding a simple system call to the Linux kernel
Your code is executed in kernel context, whilst the buffer with the data comes from user space. If you need to process a string from user space, copy it into kernel memory with the strncpy_from_user() function. If you don't follow this scheme and simply try to access the data directly, this will lead to a memory access violation.
A better solution (based on your code) would look somewhat like this:
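#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

asmlinkage long sys_hello(const char __user *name) {
	long nb_symbols;
	char *name_internal;
	long i;
	/*
	 * Estimate the buffer length sufficient
	 * to accommodate the string.
	 * strnlen_user() returns the length INCLUDING the terminating NUL,
	 * 0 on a fault, or a value larger than i if no NUL was found in i bytes.
	 */
	for (i = 1; ; ++i) {
		nb_symbols = strnlen_user(name, i);
		if (nb_symbols <= 0)
			return -EFAULT;
		if (nb_symbols <= i)
			break;
	}
	/* Allocate the storage (nb_symbols already counts the NUL) */
	name_internal = kmalloc(nb_symbols, GFP_KERNEL);
	if (name_internal == NULL)
		return -ENOMEM;
	/* On success strncpy_from_user() returns the length WITHOUT the NUL */
	if (strncpy_from_user(name_internal, name, nb_symbols) !=
	    nb_symbols - 1) {
		kfree(name_internal);
		return -EFAULT;
	}
	printk(KERN_INFO "The 'name' is '%s'\n", name_internal);
	kfree(name_internal);
	return 0;
}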
However, please note that such a loop (like the one in my example, which probes the length one byte at a time) might not be an acceptable way to estimate the buffer length. Ideally, you could drop it and use strncpy_from_user() with a fixed-length char array.
creat System Call in Unix
creat creates the file only if it doesn't exist. If it already exists, it is just truncated.
creat(filename, mode);
is equivalent to
open(filename, O_WRONLY|O_CREAT|O_TRUNC, mode);
And as specified in the open(2) documentation:
O_CREAT
If the file does not exist it will be created.
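A minimal user-space sketch of that equivalence, assuming /tmp is writable: it writes five bytes to a fresh temporary file, reopens it either with creat() or with open(O_WRONLY|O_CREAT|O_TRUNC), and returns the resulting size. The helper name size_after_reopen() is hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write 5 bytes to a fresh temporary file, then reopen it either with
 * creat() (use_creat != 0) or with the equivalent open() flags, and
 * return the size the file has afterwards (or -1 on error). */
long size_after_reopen(int use_creat)
{
    char path[] = "/tmp/creat-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "hello", 5);
    close(fd);
    if (n != 5) {
        unlink(path);
        return -1;
    }

    /* Both forms truncate an existing file to zero length. */
    fd = use_creat ? creat(path, 0644)
                   : open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        unlink(path);
        return -1;
    }
    close(fd);

    struct stat st;
    long size = (stat(path, &st) == 0) ? (long)st.st_size : -1;
    unlink(path);
    return size;
}
```

Both calls leave the existing file empty; creat() offers no way to open an existing file without truncating it.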
How to implement my own system call in Linux kernel 4.x?
You may edit your glibc to add a wrapper around your syscall. The existing wrappers are listed in the syscalls.list files under glibc/sysdeps/unix (search for your platform):
https://github.com/lattera/glibc/blob/master/sysdeps/unix/syscalls.list
https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/x86_64/syscalls.list
# File name  Caller  Syscall name  Args     Strong name     Weak names
accept       -       accept        Ci:iBN   __libc_accept   accept
access       -       access        i:si     __access        access
close        -       close         Ci:i     __libc_close    __close close
open         -       open          Ci:siv   __libc_open     __open open
read         -       read          Ci:ibn   __libc_read     __read read
uname        -       uname         i:p      __uname         uname
write        -       write         Ci:ibn   __libc_write    __write write
To decode this format, use the comments in the script that processes this file, sysdeps/unix/make-syscalls.sh, as recommended in https://blog.packagecloud.io/eng/2016/04/05/the-definitive-guide-to-linux-system-calls/
# This script is used to process the syscall data encoded in the various
# syscalls.list files to produce thin assembly syscall wrappers around the
# appropriate OS syscall. See syscall-template.s for more details on the
# actual wrapper.
#
# Syscall Signature Prefixes:
#
# E: errno and return value are not set by the call
# V: errno is not set, but errno or zero (success) is returned from the call
#
# Syscall Signature Key Letters:
#
# a: unchecked address (e.g., 1st arg to mmap)
# b: non-NULL buffer (e.g., 2nd arg to read; return value from mmap)
# B: optionally-NULL buffer (e.g., 4th arg to getsockopt)
# f: buffer of 2 ints (e.g., 4th arg to socketpair)
# F: 3rd arg to fcntl
# i: scalar (any signedness & size: int, long, long long, enum, whatever)
# I: 3rd arg to ioctl
# n: scalar buffer length (e.g., 3rd arg to read)
# N: pointer to value/return scalar buffer length (e.g., 6th arg to recvfrom)
# p: non-NULL pointer to typed object (e.g., any non-void* arg)
# P: optionally-NULL pointer to typed object (e.g., 2nd argument to gettimeofday)
# s: non-NULL string (e.g., 1st arg to open)
# S: optionally-NULL string (e.g., 1st arg to acct)
# v: vararg scalar (e.g., optional 3rd arg to open)
# V: byte-per-page vector (3rd arg to mincore)
# W: wait status, optionally-NULL pointer to int (e.g., 2nd arg of wait4)
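As a rough illustration of how the Args column reads, here is a small C decoder built only from the key letters quoted above. It assumes (an inference from the sample lines, not something the quoted comment states) that the part before the colon is zero or more prefix letters followed by one return-value letter, and that each letter after the colon describes one argument. The C prefix seen on accept, close, open, read and write is not explained by the quoted comment, so the sketch just reports it. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Meaning of each argument key letter, transcribed from the comment block
 * of sysdeps/unix/make-syscalls.sh quoted above. */
const char *syscall_arg_kind(char c)
{
    switch (c) {
    case 'a': return "unchecked address";
    case 'b': return "non-NULL buffer";
    case 'B': return "optionally-NULL buffer";
    case 'f': return "buffer of 2 ints";
    case 'F': return "3rd arg to fcntl";
    case 'i': return "scalar";
    case 'I': return "3rd arg to ioctl";
    case 'n': return "scalar buffer length";
    case 'N': return "pointer to value/return scalar buffer length";
    case 'p': return "non-NULL pointer to typed object";
    case 'P': return "optionally-NULL pointer to typed object";
    case 's': return "non-NULL string";
    case 'S': return "optionally-NULL string";
    case 'v': return "vararg scalar";
    case 'V': return "byte-per-page vector";
    case 'W': return "wait status, optionally-NULL pointer to int";
    default:  return NULL; /* letter not covered by the quoted table */
    }
}

/* Split a signature such as "Ci:ibn" at the colon. Assumption: everything
 * before the colon except its last character is a prefix flag (e.g. E, V),
 * the last character describes the return value, and each character after
 * the colon describes one argument. Copies the prefix letters into
 * `prefixes`, stores the return letter in *ret_kind and returns the number
 * of arguments, or -1 if there is no colon or nothing before it. */
int parse_syscall_signature(const char *sig, char *ret_kind,
                            char *prefixes, size_t prefixes_size)
{
    const char *colon = strchr(sig, ':');
    if (colon == NULL || colon == sig)
        return -1;
    size_t nprefix = (size_t)(colon - sig) - 1;
    if (prefixes_size > 0) {
        size_t copy = nprefix < prefixes_size - 1 ? nprefix : prefixes_size - 1;
        memcpy(prefixes, sig, copy);
        prefixes[copy] = '\0';
    }
    *ret_kind = colon[-1];
    return (int)strlen(colon + 1);
}
```

For example, open's signature Ci:siv decodes to the prefix C, a scalar return value, and three arguments: a non-NULL string, a scalar and a vararg scalar.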
More information about glibc's syscall wrappers is on the official wiki: https://sourceware.org/glibc/wiki/SyscallWrappers
There are three types of OS kernel system call wrappers that are used by glibc: assembly, macro, and bespoke.
Assembly syscalls
Simple kernel system calls in glibc are translated from a list of names into an assembly wrapper that is then compiled. ... The list of syscalls that use wrappers is kept in the syscalls.list files: ... ./sysdeps/unix/sysv/linux/x86_64/syscalls.list
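What a generated assembly wrapper does can be approximated in C with glibc's syscall(2) function: it loads the system call number and arguments into the registers the platform ABI expects, traps into the kernel, and turns a negative kernel return value into -1 plus errno. The sketch below is a hypothetical hand-written wrapper for write, not glibc's actual generated code.

```c
#define _DEFAULT_SOURCE /* syscall() declaration in glibc */
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counterpart of the generated write() wrapper. syscall(2)
 * performs the trap and, if the kernel returns a negative error code,
 * stores its negation in errno and returns -1. */
ssize_t my_write(int fd, const void *buf, size_t count)
{
    return (ssize_t)syscall(SYS_write, fd, buf, count);
}
```

Calling my_write(-1, "x", 1) returns -1 with errno set to EBADF, just like the real write().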
Don't forget to define the __NR_ number for your syscall in the Linux headers.
There are official instructions on kernel.org, the Linux kernel developer portal, which are also shipped inside the kernel sources as Documentation/adding-syscalls.* (Documentation/process/adding-syscalls.rst in newer trees):
https://www.kernel.org/doc/html/v4.10/process/adding-syscalls.html
https://github.com/torvalds/linux/blob/master/Documentation/process/adding-syscalls.rst
The method will be different for other OS like FreeBSD: https://wiki.freebsd.org/AddingSyscalls
How to write system calls on Debian/Ubuntu
This is just an example of how to write a simple kernel system call.
Consider the following C function system_strcpy(), which simply copies one string into another, similar to what strcpy() does:
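long system_strcpy(char *dest, const char *src)
{
	long i = 0;
	/* copy byte by byte; incrementing i separately avoids the undefined
	 * behavior of dest[i]=src[i++] */
	while (src[i] != '\0') {
		dest[i] = src[i];
		i++;
	}
	dest[i] = '\0';
	return i;
}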
Before you start, download a kernel source tarball and extract it to get a linux-x.x.x directory.
File 1: linux-x.x.x/test/system_strcpy.c
Create a directory named test within linux-x.x.x and save this code in it as system_strcpy.c.
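#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/uaccess.h>

asmlinkage long system_strcpy(char __user *dest, const char __user *src)
{
	char buf[256]; /* kernel-side copy; the 256-byte limit is arbitrary */
	long len = strncpy_from_user(buf, src, sizeof(buf));

	if (len < 0)
		return len;           /* -EFAULT: src is not readable */
	if (len == (long)sizeof(buf))
		return -ENAMETOOLONG; /* no NUL within the limit */
	/* Never dereference user pointers directly: copy back with the NUL. */
	if (copy_to_user(dest, buf, len + 1))
		return -EFAULT;
	return len;
}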
File 2: linux-x.x.x/test/Makefile
Create a Makefile within the same test directory you created above and put this line in it:
obj-y := system_strcpy.o
File 3: linux-x.x.x/arch/x86/kernel/syscall_table_32.S
Now, you have to add your system call to the system call table.
Append to the file the following line:
.long system_strcpy
NOTE: For kernel 3.3 and higher versions, refer to linux-3.3.xx/arch/x86/syscalls/syscall_64.tbl instead.
In that file, add a line at the end of the following series of lines:
310 64 process_vm_readv sys_process_vm_readv
311 64 process_vm_writev sys_process_vm_writev
312 64 kcmp sys_kcmp
313 64 system_strcpy system_strcpy
The format for the 3.3 version is: number, ABI, name, entry point.
File 4: linux-x.x.x/arch/x86/include/asm/unistd_32.h
NOTE: This step is not needed for kernel 3.3 and higher versions.
In this file, each system call name is associated with a unique number. After the last system call-number pair, add the line
#define __NR_system_strcpy 338
(assuming 337 was the number of the last system call).
Then update NR_syscalls, the total number of system calls, by incrementing the existing value by 1: in this case NR_syscalls was 338 and the new value is 339.
#define NR_syscalls 339
File 5: linux-x.x.x/include/linux/syscalls.h
Append the prototype of our function just before the #endif line in the file:
asmlinkage long system_strcpy(char __user *dest, const char __user *src);
File 6: Makefile at the root of source directory.
Open the Makefile, find the line where core-y is defined, and add the directory test to the end of that line:
core-y += kernel/ mm/ fs/ test/
Now compile the kernel: make bzImage -j4
Install the kernel by running the following command as root (or with root permissions): make install
Reboot the system.
To use the newly created system call, call syscall(338, dest, src); (or syscall(313, dest, src); for kernel 3.3+) instead of the regular strcpy library function:
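#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
int main(void)
{
	const char *src = "Hello";
	char *dest = malloc(strlen(src) + 1);
	if (dest == NULL)
		return 1;
	long ret = syscall(338, dest, src); /* syscall(313, dest, src); for kernel 3.3+ */
	if (ret < 0)
		perror("syscall");
	else
		printf("%s\n%s\n", src, dest);
	free(dest);
	return ret < 0;
}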
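Instead of numbers like 313 in syscall(), you can also use __NR_system_strcpy directly, provided the user-space headers you compile against were regenerated from your modified kernel (for example with make headers_install).
This is a generic example. You will need to experiment a little to see what works for your specific kernel version.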
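A minimal sketch of using the macro defensively, assuming the user-space headers may or may not have been regenerated from the modified kernel: if __NR_system_strcpy is defined, the hypothetical wrapper call_system_strcpy() uses it; otherwise it fails the way the kernel reports an unknown system call number, with -1 and errno set to ENOSYS. This also avoids guessing numbers: on mainline x86-64 kernels, number 313 was later assigned to finit_module, so a hard-coded syscall(313, ...) would invoke a different system call there.

```c
#define _DEFAULT_SOURCE /* syscall() declaration in glibc */
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: use the __NR_ macro when the installed headers
 * define it, otherwise report "no such system call" without trapping. */
long call_system_strcpy(char *dest, const char *src)
{
#ifdef __NR_system_strcpy
    return syscall(__NR_system_strcpy, dest, src);
#else
    (void)dest;
    (void)src;
    errno = ENOSYS;
    return -1;
#endif
}
```

On a stock kernel and headers the wrapper returns -1 with errno == ENOSYS and leaves dest untouched.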
How to define a system call in Linux with non-default return type?
Returning long is the only proper way for a system call in Linux. There is no way for a system call to return a value whose size differs from the size of long.
Do you expect ssize_t to have the same size as long on all platforms? If yes (which is a correct expectation), then there is no reason to prefer ssize_t over long. If you are not sure that your return type fits into long on every platform, then you simply cannot use that return type.
For example, you know from POSIX that the read function returns ssize_t. But the read system call has long as its return type (it is defined using the SYSCALL_DEFINE3 macro):
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
struct fd f = fdget(fd);
ssize_t ret = -EBADF;
if (f.file) {
loff_t pos = file_pos_read(f.file);
ret = vfs_read(f.file, buf, count, &pos);
file_pos_write(f.file, pos);
fdput(f);
}
return ret;
}
Note that despite long being the return type of the system call, the implementation above returns a value of type ssize_t.
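The size argument can be checked at compile time. This sketch, assuming a C11 compiler for _Static_assert, asserts that ssize_t and long have the same size on the build platform and reads through the raw read system call, whose syscall(2) return value is a long. The helper name raw_read() is hypothetical.

```c
#define _DEFAULT_SOURCE /* syscall() declaration in glibc */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Compilation fails here if ssize_t and long differ in size. */
_Static_assert(sizeof(ssize_t) == sizeof(long),
               "ssize_t and long differ in size");

/* Read through the raw system call: syscall(2) itself returns long. */
ssize_t raw_read(int fd, void *buf, size_t count)
{
    long ret = syscall(SYS_read, fd, buf, count);
    return (ssize_t)ret; /* lossless because of the assertion above */
}
```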
Adding new System Call to Linux Kernel 3.13 on 64 bit system
The problem was in the steps from step 6 to the last step (compiling the kernel).
After step 5, do the following:
6- Compile the kernel on your system
To configure the kernel, I used the following command:
# make menuconfig
A configuration window came up; I made sure that ext4 was selected and then saved.
Then, to create DEB packages from the new kernel:
# make -j 5 KDEB_PKGVERSION=1.arbitrary-name deb-pkg
This creates several .deb files in the parent directory of the kernel source tree (here /usr/src/).
After that, install them:
# dpkg -i linux*.deb
This installs the new kernel on your system.
Now reboot your system. After it has rebooted, you can check whether the new kernel is running:
$ uname -r
To check whether your new system call was added to the kernel, type:
$ cat /proc/kallsyms | grep <system call name>
In my case:
$ cat /proc/kallsyms | grep hello
The following output indicates that the system call was successfully added to the kernel:
0000000000000000 T sys_hello
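Note that grep hello also matches any symbol that merely contains the substring. Here is a small C sketch that looks for an exact symbol name in a kallsyms-format file (lines of address, type, name and an optional [module]); the function names are hypothetical. On x86-64 kernels since 4.17, system calls defined with SYSCALL_DEFINEx get an architecture-prefixed entry point such as __x64_sys_hello, so the exact name to look for depends on the kernel version and on how the call is defined.

```c
#include <stdio.h>
#include <string.h>

/* Scan an open kallsyms-format stream for a symbol whose name matches
 * `name` exactly. Returns 1 if found, 0 otherwise. */
int kallsyms_stream_has_symbol(FILE *f, const char *name)
{
    char line[512];
    while (fgets(line, sizeof line, f) != NULL) {
        char addr[64], type[8], sym[256];
        /* Fields: address, type letter, symbol name, optional [module]. */
        if (sscanf(line, "%63s %7s %255s", addr, type, sym) == 3 &&
            strcmp(sym, name) == 0)
            return 1;
    }
    return 0;
}

/* Same check on a file path such as /proc/kallsyms; -1 if it can't be opened. */
int kallsyms_has_symbol(const char *path, const char *name)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int found = kallsyms_stream_has_symbol(f, name);
    fclose(f);
    return found;
}
```

Usage: kallsyms_has_symbol("/proc/kallsyms", "sys_hello") returns 1 when the symbol is present.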
Difference between System call and System call service routines
A system call is something abstract: it is the way user space communicates with the kernel, by placing specific data in specific, platform-dependent registers and then trapping into the kernel.
The kernel is a program. The Linux kernel is written in the C programming language.
Here, "system call service routine" is the name of the C function that handles a specific system call.
So a user-space program invokes system call number __NR_execve. On x86-64, the program places the number 59 in the rax register and then executes the syscall instruction (on 32-bit x86, execve is number 11, placed in eax, followed by int 0x80). The kernel's system call handling code then causes the function sys_execve() to be executed.
What is __NR_execve?
A C macro that provides platform-agnostic access to the system call number of the execve system call. It lets C programs be portable across all Linux platforms, because the correct platform-specific system call number is chosen automatically at compile time. On 32-bit ARM __NR_execve is 11, but on x86-64 it is 59, etc. See https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#Cross_arch-Numbers
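A short sketch of that portability: the same source calls getpid through its __NR_ macro, and the preprocessor supplies whichever number the target architecture uses. The helper names raw_getpid() and execve_number() are hypothetical.

```c
#define _DEFAULT_SOURCE /* syscall() declaration in glibc */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* __NR_getpid expands to the number the target architecture uses
 * (for example 39 on x86-64 and 172 on arm64), so this source compiles
 * unchanged on each platform while the raw number differs. */
long raw_getpid(void)
{
    return syscall(__NR_getpid);
}

/* The execve number as seen by this build. */
long execve_number(void)
{
    return __NR_execve;
}
```

On x86-64, execve_number() returns 59.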
Where can I find the sys_call_table vector?
In the kernel sources. Search for it on Elixir: https://elixir.bootlin.com/linux/latest/ident/sys_call_table
Create system call function
I smell an XY problem; what is the actual problem you're trying to solve?
Why the heck do you want to create a new system call for that? Just open the directory, enumerate all its entries and filter out those that are not directories. The canonical way to do this is to use the opendir function: https://linux.die.net/man/3/opendir
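A minimal sketch of the opendir() approach that counts the subdirectories of a path; the function name count_subdirs() is hypothetical. It relies on d_type from readdir() and falls back to lstat() for file systems that report DT_UNKNOWN; symbolic links to directories are not counted.

```c
#define _DEFAULT_SOURCE /* d_type and DT_* constants in glibc */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count the subdirectories of `path`, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_subdirs(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        int is_dir;
        if (ent->d_type != DT_UNKNOWN) {
            is_dir = (ent->d_type == DT_DIR);
        } else {
            /* Some file systems do not fill d_type; ask lstat() instead. */
            char full[PATH_MAX];
            struct stat st;
            snprintf(full, sizeof full, "%s/%s", path, ent->d_name);
            is_dir = lstat(full, &st) == 0 && S_ISDIR(st.st_mode);
        }
        count += is_dir;
    }
    closedir(dir);
    return count;
}
```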
Also keep in mind that if you're writing code that is supposed to run inside the kernel, the usual file system mechanisms are difficult to reach from inside the kernel. The reason is that file system namespaces depend on the task context; the only robust way to access files from within the kernel is to have a user-space process open them and then hand the file descriptor to some kernel code. But this is strongly discouraged.