Creating a System Call in Linux

but does this mean I'll have to recompile the whole kernel, so the kernel knows about my module?

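Yes, you will need to recompile the kernel. The system call table is built into the kernel image, so a new system call cannot be added from a loadable module; you have to rebuild and boot the modified kernel.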

  • Implementing Linux System Calls

adding a simple system call to linux kernel

Your code is executed in kernel context, whereas the buffer with the data comes from user space. If you need to process a string from user space, first copy it into kernel memory with the strncpy_from_user() function. If you skip this step and access the data directly, you risk a memory access violation.

A better solution (based on your code) would look somewhat like this:

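#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/slab.h>      /* kmalloc(), kfree() */
#include <linux/uaccess.h>   /* strnlen_user(), strncpy_from_user() */

asmlinkage long sys_hello(const char __user *name)
{
    long nb_symbols;   /* string size, including the terminating NUL */
    char *name_internal;
    long i;

    /*
     * Estimate the buffer length sufficient
     * to accommodate the string
     */
    for (i = 1; ; ++i) {
        nb_symbols = strnlen_user(name, i);
        if (nb_symbols <= 0)
            return -EFAULT;

        if (nb_symbols < i)
            break;
    }

    /* Allocate the storage */
    name_internal = kmalloc(nb_symbols + 1, GFP_KERNEL);
    if (name_internal == NULL)
        return -ENOMEM;

    /*
     * strnlen_user() counts the terminating NUL, whereas
     * strncpy_from_user() returns the length without it.
     */
    if (strncpy_from_user(name_internal, name, nb_symbols + 1) !=
        nb_symbols - 1) {
        kfree(name_internal);
        return -EFAULT;
    }

    printk(KERN_INFO "The 'name' is '%s'\n", name_internal);
    kfree(name_internal);

    return 0;
}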

However, please note that a loop like the one in my example is not a good way to estimate the buffer length, because it rescans the user string on every iteration. Ideally, you would drop it and call strncpy_from_user() on a char array of fixed length.

creat System Call in Unix

creat only creates a file if it doesn't exist. If it already exists, it's just truncated.

creat(filename, mode);

is equivalent to

open(filename, O_WRONLY|O_CREAT|O_TRUNC, mode);

And as specified in the open(2) documentation:

O_CREAT
If the file does not exist it will be created.
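The equivalence can be sketched in a few lines of user-space C. This is a minimal illustration assuming a POSIX system; `creat_via_open` and `file_size` are hypothetical helper names introduced here, not part of any library:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Same effect as creat(path, mode), spelled out with open(2). */
int creat_via_open(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
}

/* Helper for the demonstration: size of a file in bytes, or -1 on error. */
long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

Calling `creat_via_open()` on a file that already holds data leaves the file in place but with a size of 0, which is what creat() does as well.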

how to implement my own system call in Linux kernel 4.x?

You can edit your glibc to add a wrapper around your syscall. Existing wrappers are declared in the syscalls.list files under glibc/sysdeps/unix (look for the one for your platform):
https://github.com/lattera/glibc/blob/master/sysdeps/unix/syscalls.list
https://github.com/lattera/glibc/blob/master/sysdeps/unix/sysv/linux/x86_64/syscalls.list

# File name  Caller  Syscall name  Args    Strong name    Weak names

accept       -       accept        Ci:iBN  __libc_accept  accept
access       -       access        i:si    __access       access
close        -       close         Ci:i    __libc_close   __close close
open         -       open          Ci:siv  __libc_open    __open open
read         -       read          Ci:ibn  __libc_read    __read read
uname        -       uname         i:p     __uname        uname
write        -       write         Ci:ibn  __libc_write   __write write

To decode this format, read the comments in the script that processes these files, sysdeps/unix/make-syscalls.sh, as recommended in https://blog.packagecloud.io/eng/2016/04/05/the-definitive-guide-to-linux-system-calls/

# This script is used to process the syscall data encoded in the various
# syscalls.list files to produce thin assembly syscall wrappers around the
# appropriate OS syscall. See syscall-template.s for more details on the
# actual wrapper.
#
# Syscall Signature Prefixes:
#
# E: errno and return value are not set by the call
# V: errno is not set, but errno or zero (success) is returned from the call
#
# Syscall Signature Key Letters:
#
# a: unchecked address (e.g., 1st arg to mmap)
# b: non-NULL buffer (e.g., 2nd arg to read; return value from mmap)
# B: optionally-NULL buffer (e.g., 4th arg to getsockopt)
# f: buffer of 2 ints (e.g., 4th arg to socketpair)
# F: 3rd arg to fcntl
# i: scalar (any signedness & size: int, long, long long, enum, whatever)
# I: 3rd arg to ioctl
# n: scalar buffer length (e.g., 3rd arg to read)
# N: pointer to value/return scalar buffer length (e.g., 6th arg to recvfrom)
# p: non-NULL pointer to typed object (e.g., any non-void* arg)
# P: optionally-NULL pointer to typed object (e.g., 2nd argument to gettimeofday)
# s: non-NULL string (e.g., 1st arg to open)
# S: optionally-NULL string (e.g., 1st arg to acct)
# v: vararg scalar (e.g., optional 3rd arg to open)
# V: byte-per-page vector (3rd arg to mincore)
# W: wait status, optionally-NULL pointer to int (e.g., 2nd arg of wait4)
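As a rough illustration of reading the Args column, the sketch below counts the argument letters of a signature and describes a few of them. It assumes that everything after the ':' is one key letter per argument, which matches the entries shown above (read takes three arguments and has `ibn`); the function names are hypothetical and only a handful of letters are covered:

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the argument key letters in a syscalls.list signature such as
 * "Ci:ibn". Assumption: the part after ':' holds one letter per argument.
 * Returns -1 if the signature contains no ':'.
 */
int count_signature_args(const char *signature)
{
    const char *colon = strchr(signature, ':');

    if (colon == NULL)
        return -1;
    return (int)strlen(colon + 1);
}

/* Describe a few of the key letters listed above; NULL for the rest. */
const char *describe_key_letter(char letter)
{
    switch (letter) {
    case 'i': return "scalar";
    case 'b': return "non-NULL buffer";
    case 'n': return "scalar buffer length";
    case 's': return "non-NULL string";
    case 'p': return "non-NULL pointer to typed object";
    case 'v': return "vararg scalar";
    default:  return NULL;
    }
}
```

For the read entry, `count_signature_args("Ci:ibn")` gives 3, and the letters decode as scalar, non-NULL buffer, and scalar buffer length.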

More information about glibc's syscall wrappers is available on the official wiki: https://sourceware.org/glibc/wiki/SyscallWrappers

There are three types of OS kernel system call wrappers that are used by glibc: assembly, macro, and bespoke.

Assembly syscalls
Simple kernel system calls in glibc are translated from a list of names into an assembly wrapper that is then compiled. ... The list of syscalls that use wrappers is kept in the syscalls.list files: ... ./sysdeps/unix/sysv/linux/x86_64/syscalls.list

Don't forget to define the __NR_ number for your syscall in the Linux headers.
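If patching glibc is more than you need, a program can also invoke a system call by number through the generic syscall() function. The sketch below assumes Linux with glibc and uses the existing getpid system call as a stand-in, because the number of a custom call depends on your kernel; `raw_getpid` is a hypothetical name:

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall() */

/*
 * Thin wrapper that invokes a system call by number.
 * SYS_getpid stands in for the number of your own system call.
 */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

A wrapper for your own call would have the same shape, with the call's __NR_ constant and arguments passed to syscall().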

Instructions are published on kernel.org and ship with the Linux kernel sources as the adding-syscalls document (Documentation/adding-syscalls.txt in older trees, Documentation/process/adding-syscalls.rst in newer ones):
https://www.kernel.org/doc/html/v4.10/process/adding-syscalls.html
https://github.com/torvalds/linux/blob/master/Documentation/process/adding-syscalls.rst

The method is different on other operating systems such as FreeBSD: https://wiki.freebsd.org/AddingSyscalls

How to write system calls on debian/ubuntu

This is just an example of how to write a simple kernel system call.
Consider the following C function system_strcpy(), which simply copies one string into another, similar to what strcpy() does.

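long system_strcpy(char *dest, const char *src)
{
    int i = 0;

    /* Copy characters up to, but not including, the terminating NUL */
    while (src[i] != 0) {
        dest[i] = src[i];
        i++;   /* separate statement: dest[i] = src[i++] is undefined behavior */
    }

    dest[i] = 0;
    return i;
}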

Before writing, get a kernel source tar and untar it to get a linux-x.x.x directory.

File 1: linux-x.x.x/test/system_strcpy.c
Create a directory named test inside linux-x.x.x and save the following code in it as system_strcpy.c.

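#include <linux/linkage.h>
#include <linux/kernel.h>
#include <linux/uaccess.h>   /* get_user(), put_user() */

asmlinkage long system_strcpy(char __user *dest, const char __user *src)
{
    long i = 0;
    char c;

    /*
     * dest and src point into user memory, so each byte is moved
     * with get_user()/put_user() instead of a direct dereference.
     */
    do {
        if (get_user(c, src + i) || put_user(c, dest + i))
            return -EFAULT;
        i++;
    } while (c != 0);

    return i - 1;   /* length without the terminating NUL */
}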

File 2: linux-x.x.x/test/Makefile
Create a Makefile within the same test directory you created above and put this line in it:

obj-y := system_strcpy.o

File 3: linux-x.x.x/arch/x86/kernel/syscall_table_32.S
Now, you have to add your system call to the system call table.
Append to the file the following line:

.long system_strcpy

NOTE: For kernel 3.3 and higher versions, edit this file instead:

linux-3.3.xx/arch/x86/syscalls/syscall_64.tbl

Add your entry (the last line below) at the end of the following series of lines:

310 64 process_vm_readv sys_process_vm_readv
311 64 process_vm_writev sys_process_vm_writev
312 64 kcmp sys_kcmp
313 64 system_strcpy system_strcpy

Each line has the format:
<number> <abi> <name> <entry point>

File 4: linux-x.x.x/arch/x86/include/asm/unistd_32.h

NOTE: This section is redundant for 3.3 and higher kernel versions

In this file, the names of all the system calls will be associated with a unique number. After the last system call-number pair, add a line

#define __NR_system_strcpy 338

(if 337 was the number associated with the last system call in the system call-number pair).

Then increment NR_syscalls, which states the total number of system calls, by 1. In this case the existing value would have been 338, so the new value is 339.

#define NR_syscalls 339

File 5: linux-x.x.x/include/linux/syscalls.h

Add the prototype of our function just before the #endif line at the end of the file. The parameter types must match the definition in system_strcpy.c:

asmlinkage long system_strcpy(char __user *dest, const char __user *src);

File 6: Makefile at the root of source directory.

Open Makefile and find the line where core-y is defined and add the directory test to the end of that line.

core-y += kernel/ mm/ fs/ test/

Now compile the kernel. Issue:
make bzImage -j4

Install the kernel by executing the following command as root (or with root permissions):
make install

Reboot the system.

To use the newly created system call, invoke it through syscall() instead of calling the regular strcpy library function:

syscall(338,dest,src); (or syscall(313,dest,src); for kernel 3.3+)

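#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    const char *src = "Hello";
    char *dest = malloc(strlen(src) + 1);

    /* 338 is the number assigned above; use 313 for kernel 3.3+ */
    if (dest == NULL || syscall(338, dest, src) < 0) {
        perror("system_strcpy");
        return 1;
    }
    printf("%s \n %s\n", src, dest);
    free(dest);
    return 0;
}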

Instead of hard-coding numbers such as 313 in syscall(), you can use __NR_system_strcpy directly, provided the headers your program is compiled against define it.

This is a generic example. You will need to do a little experimentation to see what works for your specific kernel version.

How to define a system call in Linux with non-default return type?

Returning long is the only proper way for a system call in Linux.

There is no way for a system call to return a value whose size differs from the size of long.

Do you expect ssize_t to have the same size as long on all platforms? If yes (this is a correct expectation), then there is no reason to prefer ssize_t over long. If you are not sure that your return type will fit into long on every platform, then you simply cannot use this return type.

For example, POSIX specifies that the read function returns ssize_t. But the read system call has long as its return type (it is defined using the SYSCALL_DEFINE3 macro):

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
    struct fd f = fdget(fd);
    ssize_t ret = -EBADF;

    if (f.file) {
        loff_t pos = file_pos_read(f.file);
        ret = vfs_read(f.file, buf, count, &pos);
        file_pos_write(f.file, pos);
        fdput(f);
    }
    return ret;
}

Note that although long is the return type of the system call, the above implementation returns a value of type ssize_t.
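From user space, the negative error code returned by the routine is not seen directly: the C library turns it into a return value of -1 and stores the positive code in errno. The sketch below assumes Linux with glibc; `errno_of_read_on_bad_fd` is a hypothetical helper, and the compile-time check restates the size expectation for the build platform:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* The size expectation discussed above, checked for the build platform. */
_Static_assert(sizeof(ssize_t) == sizeof(long),
               "ssize_t and long differ in size on this platform");

/*
 * Call read(2) by number on an invalid descriptor and report errno.
 * Inside the kernel the routine returns -EBADF; syscall() then
 * returns -1 and sets errno to EBADF.
 */
int errno_of_read_on_bad_fd(void)
{
    char byte;
    long ret = syscall(SYS_read, -1, &byte, (size_t)1);

    return (ret == -1) ? errno : 0;
}
```

The helper returns EBADF, the same code that the `ret = -EBADF` initializer in the kernel routine carries when the descriptor lookup fails.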

Adding new System Call to Linux Kernel 3.13 on 64 bit system

The problem was in the steps from step 6 to the last one (compiling the kernel).

After step 5, we have to do the following steps:

6- Compiling this kernel on my system

To configure the kernel I tried the following command :

# make menuconfig

This command opens a menu-based configuration screen; I made sure that ext4 was selected and then saved the configuration.

Then, to create DEB packages from the new kernel, run:

# make -j 5 KDEB_PKGVERSION=1.arbitrary-name deb-pkg

It will create several .deb files in the parent directory of the source tree (/usr/src/ in my case).

After that we need to install them:

# dpkg -i linux*.deb

This installs the new kernel on your system.

Now reboot your system. After the reboot you can check whether the new kernel is running:

$ uname -r

To check whether your new system call has been added to the kernel, type:

$ cat /proc/kallsyms | grep <system call name>

In my case :

$ cat /proc/kallsyms | grep hello

The following output indicates that your system call was successfully added to the kernel:

0000000000000000 T sys_hello

Difference between System call and System call service routines

A system call is something abstract. It is a way for user space to communicate with the kernel by placing specific data in specific registers, which differ from platform to platform.

The kernel is a program. The Linux kernel is written in the C programming language.

Here, "system call service routine" is the name of the C function that handles a specific system call.

So a user space program invokes system call number __NR_execve. On x86-64, the program places the number 59 in the rax register and then executes the syscall instruction (32-bit x86 instead uses number 11 in eax and the instruction int 0x80). The kernel's system call entry code then runs and causes the function sys_execve() in the kernel to be executed.

What is __NR_execve?

A C macro that provides platform-agnostic access to the system call number of the execve system call. It allows C programs to be portable across all Linux platforms, because the correct platform-specific number is chosen automatically at compile time. On arm __NR_execve is 11, but on x86-64 it is 59, etc. See https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md#Cross_arch-Numbers
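A minimal sketch of using the macro, assuming the Linux user-space headers are installed; `execve_syscall_number` is a hypothetical name:

```c
#include <sys/syscall.h>   /* pulls in the __NR_* constants for the target */

/* Return the execve system call number for the architecture compiled for. */
long execve_syscall_number(void)
{
    return __NR_execve;
}
```

The same source yields a different number on each architecture; when built for x86-64 the function returns 59.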

Where I can find sys_call_table vector

In the kernel sources. Search for it on elixir. https://elixir.bootlin.com/linux/latest/ident/sys_call_table

Create system call function

I smell an XY problem; what is the actual problem you're trying to solve?

Why the heck do you want to create a new system call for that? Just open the directory, enumerate all its entries, and filter out those that are not directory inodes. The canonical way to do this is to use the opendir function. https://linux.die.net/man/3/opendir
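A user-space sketch of that approach, assuming a POSIX system. `count_subdirectories` is a hypothetical helper; it counts the matching entries, and the same loop could print or collect their names instead:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count the subdirectories directly inside `path`, skipping "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
int count_subdirectories(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        char full[4096];
        struct stat st;

        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;

        /* stat() is used because not every file system fills in d_type */
        snprintf(full, sizeof(full), "%s/%s", path, entry->d_name);
        if (stat(full, &st) == 0 && S_ISDIR(st.st_mode))
            count++;
    }

    closedir(dir);
    return count;
}
```

Note that stat() follows symbolic links, so a link pointing to a directory is counted too; lstat() would be the choice if links should be excluded.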


Also, if you're writing code that is supposed to run inside the kernel, be aware that the usual file system mechanisms are difficult to reach from there. The reason is that file system access goes through namespaces that depend on the task context; the only robust way to access files from within the kernel is to have a userspace process open them and then hand the file descriptor to some kernel code. But this is strongly discouraged.


