How to Trace The Write System Call in The Linux Kernel

How to correctly intercept system calls in the Linux kernel 5.*?

There is no officially supported way to do this.

LSM (Linux Security Modules) doesn't support system call interception;
with an LSM you instead implement some of the hooks listed in include/linux/lsm_hook_defs.h.

There are two alternative ways to intercept system calls that I'm aware of:

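Hook the sys_call_table: obtain its address (for example with kallsyms_lookup_name), temporarily clear the CR0 write-protect bit, and overwrite the entry with a pointer to your new function. Note that kallsyms_lookup_name is no longer exported to modules since 5.7, and since 5.3 write_cr0 on x86 forces the WP bit back on, so this snippet only works as-is on older kernels: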

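#include <linux/kallsyms.h>  /* kallsyms_lookup_name() */
unsigned long *sys_call_table_ptr =
    (unsigned long *)kallsyms_lookup_name("sys_call_table");
unsigned long cr0 = read_cr0();
write_cr0(cr0 & ~X86_CR0_WP);   /* clear CR0.WP so the read-only table can be written */
sys_call_table_ptr[__NR_getpid] = (unsigned long)new_getpid;
write_cr0(cr0);                 /* restore write protection */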
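As a rough illustration of what overwriting a table entry does, the following userspace sketch (not kernel code) models sys_call_table as an array of function pointers indexed by syscall number. Everything in it is hypothetical: toy_table, TOY_NR_getpid and the handler names stand in for the real table, __NR_getpid and the kernel's handlers. Unlike the snippet above, it also saves the original entry so the hook can chain to it and be removed again, which a real module would need to do on unload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each "syscall handler" takes no args and returns long */
typedef long (*toy_syscall_fn)(void);

enum { TOY_NR_getpid = 0, TOY_NR_MAX };

/* Hypothetical stand-in for the kernel's real getpid handler */
static long orig_getpid(void)
{
    return 1234;
}

/* Hypothetical stand-in for sys_call_table: syscall number -> handler */
static toy_syscall_fn toy_table[TOY_NR_MAX] = { orig_getpid };

/* Original entry, saved so the hook can chain to it and be undone */
static toy_syscall_fn saved_getpid = NULL;

static long hooked_getpid(void)
{
    /* "do your logic" here, then call the original handler */
    return saved_getpid() + 1;
}

/* The syscall entry code performs the equivalent of this table lookup */
long toy_dispatch(int nr)
{
    assert(nr >= 0 && nr < TOY_NR_MAX);
    return toy_table[nr]();
}

/* Equivalent of table[__NR_getpid] = new_getpid, keeping the old entry */
void toy_install_hook(void)
{
    saved_getpid = toy_table[TOY_NR_getpid];
    toy_table[TOY_NR_getpid] = hooked_getpid;
}

/* Put the original handler back, as a module would on unload */
void toy_remove_hook(void)
{
    toy_table[TOY_NR_getpid] = saved_getpid;
}
```

Every caller goes through the table, so replacing one pointer redirects all of them; that is also why forgetting to restore the entry before the module's code is freed crashes the real kernel.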

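Use a kprobe: the SYSCALL_DEFINEx macros emit each syscall's body as a function whose name carries the __do_sys_ prefix (see __SYSCALL_DEFINEx). Because those functions are declared static inline, the compiler may inline them away, in which case register_kprobe fails to find the symbol; on x86_64 the __x64_sys_ entry points (which receive a struct pt_regs pointer holding the user's arguments) are a common alternative.
For example, to place a kprobe on __do_sys_finit_module (or any other syscall you want):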

#include <linux/module.h>
#include <linux/kprobes.h>

static struct kprobe kp = {
    .symbol_name    = "__do_sys_finit_module",
};

static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    // do your logic
    // obtain function arguments from registers (calling convention);
    // on x86_64 the first three are in regs->di, regs->si and regs->dx
    return 0; // 0 = let the probed instruction run normally
}

static int __init kprobe_init(void)
{
    int ret;

    kp.pre_handler = handler_pre;
    ret = register_kprobe(&kp);
    if (ret < 0) {
        printk(KERN_INFO "register_kprobe failed, returned %d\n", ret);
        return ret;
    }
    printk(KERN_INFO "Planted kprobe at %p\n", kp.addr);
    return 0;
}

static void __exit kprobe_exit(void)
{
    unregister_kprobe(&kp);
    printk(KERN_INFO "kprobe at %p unregistered\n", kp.addr);
}

module_init(kprobe_init);
module_exit(kprobe_exit);
MODULE_LICENSE("GPL");
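The __do_sys_ name is produced by plain preprocessor token pasting. The userspace sketch below imitates that with a hypothetical, heavily simplified TOY_SYSCALL_DEFINE2 macro: it defines an outer entry point and turns the body you write into a separately named function, which is what makes a symbol like __do_sys_finit_module exist in the first place. The toy_ prefixes are assumptions chosen to avoid reserved identifiers in userspace; the real __SYSCALL_DEFINEx (in include/linux/syscalls.h, overridden per architecture on x86_64) also generates an argument-casting __se_sys_ wrapper and, on x86_64, __x64_sys_ stubs, which this sketch omits.

```c
#include <assert.h>

/*
 * Hypothetical, heavily simplified imitation of SYSCALL_DEFINEx for a
 * two-argument syscall. Token pasting (##) builds the function names:
 *   toy_sys_<name>     outer entry point (kernel: sys_<name>, or
 *                      __x64_sys_<name> on x86_64)
 *   toy_do_sys_<name>  the body you write (kernel: __do_sys_<name>)
 */
#define TOY_SYSCALL_DEFINE2(name, t1, a1, t2, a2)  \
    static long toy_do_sys_##name(t1 a1, t2 a2);   \
    long toy_sys_##name(t1 a1, t2 a2)              \
    {                                              \
        return toy_do_sys_##name(a1, a2);          \
    }                                              \
    static long toy_do_sys_##name(t1 a1, t2 a2)

/* Expands to toy_sys_add() plus a body function named toy_do_sys_add() */
TOY_SYSCALL_DEFINE2(add, long, a, long, b)
{
    return a + b;
}
```

Calling toy_sys_add(2, 3) runs the body through the generated toy_do_sys_add(); a probe on the inner name sees the same call, provided the compiler kept it as a separate function.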

How does Linux identify a particular file system to execute a system call?

This answer is based on kernel version 4.0. I traced out some of the code which handles a read syscall. I recommend you clone the Linux source repo and follow along in the source code.

The syscall handler for read, at fs/read_write.c:620, is called. It receives a file descriptor (an integer) as an argument and calls fdget_pos to convert it to a struct fd.
fdget_pos calls __fdget_pos, which calls __fdget, which calls __fget_light. __fget_light uses current->files, the file descriptor table of the current process, to look up the struct file that corresponds to the passed file descriptor number.
Back in the syscall handler, the struct file is passed to vfs_read, at fs/read_write.c:478.
vfs_read calls __vfs_read, which calls file->f_op->read. From here on, you are in filesystem-specific code.

So the VFS doesn't really bother "identifying" the filesystem which a file lives on; it simply uses the table of "file operation" function pointers which is stored in its struct file. When that struct file is initialized, it is given the correct f_op function pointer table which implements all the filesystem-specific operations for its filesystem.
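This dispatch can be modeled in userspace. In the sketch below, all names (toy_file, toy_file_operations, toy_files_struct, toy_sys_read) are hypothetical simplifications of struct file, struct file_operations, current->files and the read syscall handler: the handler maps the descriptor to a file and calls whatever read function that file's f_op table points to, without ever asking which filesystem the file lives on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace model of the VFS read path; all names are toys. */
struct toy_file;

/* Stand-in for struct file_operations: one function pointer per operation */
struct toy_file_operations {
    ssize_t (*read)(struct toy_file *file, char *buf, size_t count);
};

/* Stand-in for struct file: carries the ops table chosen at open time */
struct toy_file {
    const struct toy_file_operations *f_op;
};

#define TOY_MAX_FDS 4

/* Stand-in for current->files: descriptor number -> open file */
struct toy_files_struct {
    struct toy_file *fd[TOY_MAX_FDS];
};

/* Copy at most count bytes of src into buf (no NUL added, like read(2)) */
static ssize_t copy_out(char *buf, size_t count, const char *src)
{
    size_t n = strlen(src);
    if (n > count)
        n = count;
    memcpy(buf, src, n);
    return (ssize_t)n;
}

/* Two hypothetical "filesystems" with different read implementations */
static ssize_t ext_read(struct toy_file *file, char *buf, size_t count)
{
    (void)file;
    return copy_out(buf, count, "ext");
}

static ssize_t proc_read(struct toy_file *file, char *buf, size_t count)
{
    (void)file;
    return copy_out(buf, count, "proc");
}

static const struct toy_file_operations ext_fops  = { .read = ext_read };
static const struct toy_file_operations proc_fops = { .read = proc_read };

/* Rough analogue of the read handler: fd -> struct file -> f_op->read */
ssize_t toy_sys_read(struct toy_files_struct *files, int fd,
                     char *buf, size_t count)
{
    if (fd < 0 || fd >= TOY_MAX_FDS || files->fd[fd] == NULL)
        return -EBADF;                   /* no such open descriptor */

    struct toy_file *file = files->fd[fd];
    assert(file->f_op != NULL && file->f_op->read != NULL);
    return file->f_op->read(file, buf, count);  /* no filesystem check */
}
```

In the kernel, f_op is filled in when the file is opened (from the inode's i_fop, in do_dentry_open), which is why the read path itself can stay filesystem-agnostic.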

Using system calls in a Linux kernel file

System calls are essentially wrappers around other kernel functions, so one way to use a syscall from inside the kernel is to find the underlying function that the syscall calls and call it directly. For example:

int open(const char *pathname, int flags, mode_t mode); -> filp_open

////////////////////////////////////////////////////////////////////////////////////////////////
#include <linux/fs.h>
#include <linux/err.h>

struct file* file_open(const char* path, int flags, int rights)
{
    struct file* filp;

    /* filp_open() expects a kernel-space path, so no get_fs()/set_fs()
       dance is needed (get_ds() and set_fs() are removed in newer kernels). */
    filp = filp_open(path, flags, rights);
    if (IS_ERR(filp))
        return NULL;   /* PTR_ERR(filp) would give the negative errno */

    return filp;
}

ssize_t write(int fd, const void *buf, size_t count); -> vfs_write

////////////////////////////////////////////////////////////////////////////////////////////////
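#include <linux/fs.h>

int file_write(struct file* file, unsigned long long offset, unsigned char* data, unsigned int size)
{
    loff_t pos = offset;   /* the write helpers take a loff_t * and advance it */

    /* kernel_write() (4.14+) is the kernel-buffer counterpart of vfs_write();
       the old get_fs()/set_fs(get_ds()) + vfs_write() pattern no longer builds
       on current kernels (get_ds() was removed in 5.1, set_fs() later). */
    return kernel_write(file, data, size, &pos);
}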
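The explicit offset argument of file_write mirrors pwrite(2) in userspace: the pwrite64 handler (ksys_pwrite64 in fs/read_write.c on 5.x kernels) likewise hands vfs_write a pointer to a local position rather than the file's own f_pos, so the file offset does not move. The sketch below shows that behaviour from userspace; write_at() and demo_positional_write() are hypothetical helper names, and tmpfile() is used only to get a scratch file.

```c
#define _XOPEN_SOURCE 700   /* pread(), pwrite(), fileno() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write at an explicit offset, like file_write() does,
   without moving the file's current position. */
ssize_t write_at(int fd, off_t offset, const void *data, size_t size)
{
    return pwrite(fd, data, size, offset);
}

/* Hypothetical demo: writes "hello" at offset 0, overwrites the first two
   bytes with "HE", checks the file position never moved, and reads the
   first 5 bytes back into out (NUL-terminated). Returns 0 on success. */
int demo_positional_write(char out[6])
{
    FILE *tmp = tmpfile();              /* scratch file, deleted on close */
    if (tmp == NULL)
        return -1;

    int fd = fileno(tmp);
    int ok = write_at(fd, 0, "hello", 5) == 5
          && write_at(fd, 0, "HE", 2) == 2
          && lseek(fd, 0, SEEK_CUR) == 0     /* pwrite() left the offset alone */
          && pread(fd, out, 5, 0) == 5;

    out[5] = '\0';
    fclose(tmp);
    return ok ? 0 : -1;
}
```

After the call, out holds "HEllo": the second write landed at the requested offset, and the file's own position stayed at 0 throughout.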

How can I use ftrace to get the in-kernel call graph called by a system call?

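First, you need to get the function name right: for example, the function to trace for open syscalls is sys_open (on x86_64 kernels 4.17 and later, syscall entry points carry an architecture prefix, e.g. __x64_sys_open).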

To do this the "proper" way, the kernel needs function_graph tracer support. On 32-bit x86 this requires CC_OPTIMIZE_FOR_SIZE to be disabled; on x86_64 it does not.

In my case I didn't bother compiling a custom kernel with CC_OPTIMIZE_FOR_SIZE disabled; I just ran

trace-cmd record -p function --func-stack

and added several -l options to filter on functions that looked like they might be called. This was enough to figure out what I wanted to know.
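To have something to trace, a small program that issues write syscalls both through the libc wrapper and through the raw syscall(2) interface can help; both paths enter the same kernel write handler (ksys_write, then vfs_write, in fs/read_write.c on 5.x kernels). The sketch below writes into a pipe so it produces no terminal output; the helper name do_writes() is hypothetical. A program built around it can be passed as the command for trace-cmd record to run, together with the options described above.

```c
#define _GNU_SOURCE          /* for syscall() */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>     /* SYS_write */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical workload: one write via the libc wrapper and one via the
   raw syscall() interface into a pipe, then read both back into out.
   Returns the number of bytes read back, or -1 on error. */
ssize_t do_writes(char *out, size_t out_size)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    ssize_t a = write(fds[1], "via-libc ", 9);                     /* glibc wrapper */
    long b = syscall(SYS_write, fds[1], "via-syscall", (size_t)11); /* raw syscall */
    close(fds[1]);                   /* so read() sees EOF after the data */

    ssize_t total = 0, n;
    while ((size_t)total < out_size &&
           (n = read(fds[0], out + total, out_size - (size_t)total)) > 0)
        total += n;
    close(fds[0]);

    return (a == 9 && b == 11) ? total : -1;
}
```

Both writes show up as separate entries into the kernel's write path, which makes it easy to check that a trace filter catches the syscall regardless of how userspace issued it.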
