Can Ebpf Modify the Return Value or Parameters of a Syscall

Can eBPF modify the return value or parameters of a syscall?

I believe that attaching eBPF to kprobes/kretprobes gives you read access to function arguments and return values, but that you cannot tamper with them. I am NOT 100% sure; good places to ask for confirmation would be the IO Visor project mailing list or IRC channel (#iovisor at irc.oftc.net).

As an alternative solution, I know you can at least change the return value of a syscall using strace's -e option. Quoting the manual page:

-e inject=set[:error=errno|:retval=value][:signal=sig][:when=expr]
       Perform syscall tampering for the specified set of syscalls.

Also, there was a presentation on this and on fault injection at FOSDEM 2017, if it is of any interest to you. Here is one example command from the slides:

strace -P precious.txt -efault=unlink:retval=0 unlink precious.txt

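Edit: As stated by Ben, eBPF programs attached to kprobes and tracepoints are definitely read-only with respect to arguments and registers; they are meant for tracing and monitoring use cases. I also got confirmation about this on IRC. (A later, narrow exception: since Linux 4.16, the bpf_override_return() helper lets a kprobe program override the return value of functions explicitly marked for error injection, provided the kernel is built with CONFIG_BPF_KPROBE_OVERRIDE.)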

ebpf: intercepting function calls

No, kprobe BPF programs have only read access to the syscall parameters and return value. They cannot modify registers and therefore cannot intercept function calls. This limitation is enforced by the BPF verifier.

Kernel modules, however, can intercept function calls using kprobes.

BPF_KPROBE macro provides unexpected value of the syscall argument

Solved by @anakryiko:

The catch is that __x64_sys_close() actually has only one input parameter, a struct pt_regs *, which holds all of the syscall's input arguments. This is the case on x86-64 since Linux 4.17, which introduced these pt_regs-based syscall wrappers. So you have to do something like this to access the input arguments:

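#include "vmlinux.h"            /* kernel types, generated with bpftool */
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>    /* BPF_KPROBE; needs -D__TARGET_ARCH_x86 */
#include <bpf/bpf_core_read.h>  /* used by the PT_REGS_PARM*_CORE macros */

/* bpf_printk() is GPL-only, so a GPL-compatible license is required */
char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/__x64_sys_close")
int BPF_KPROBE(do_sys_close, struct pt_regs *regs)
{
        pid_t pid;
        int fd;

        /* regs points to the user registers saved at syscall entry;
         * read the first syscall argument with a CO-RE probe read */
        fd = PT_REGS_PARM1_CORE(regs);

        /* upper 32 bits of pid_tgid: the TGID, i.e. the userspace PID */
        pid = bpf_get_current_pid_tgid() >> 32;
        bpf_printk("KPROBE ENTRY pid = %d, fd = %d\n", pid, fd);
        return 0;
}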
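To make the "one pointer, many arguments" point concrete, here is a plain userspace C sketch of what such a wrapper receives. This is not BPF code. struct fake_pt_regs and syscall_arg() are hypothetical stand-ins: the real struct pt_regs has more fields in a different layout, and in a BPF program you would use PT_REGS_PARM1_CORE() and friends instead. The sketch assumes the x86-64 syscall convention, where the six arguments arrive in rdi, rsi, rdx, r10, r8 and r9.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the saved x86-64 registers.
 * The real struct pt_regs has more fields and a different layout. */
struct fake_pt_regs {
    uint64_t di, si, dx, r10, r8, r9; /* syscall arguments 1..6 */
    uint64_t orig_ax;                 /* syscall number */
};

/* Return syscall argument n (1-based) from the saved registers,
 * following the x86-64 syscall convention. */
static uint64_t syscall_arg(const struct fake_pt_regs *regs, int n)
{
    assert(n >= 1 && n <= 6);
    switch (n) {
    case 1: return regs->di;
    case 2: return regs->si;
    case 3: return regs->dx;
    case 4: return regs->r10;
    case 5: return regs->r8;
    default: return regs->r9;
    }
}
```

For close(fd), the wrapper's only parameter is the register block, and fd is whatever syscall_arg(regs, 1) returns. The kprobe above has to do the same extraction, just through CO-RE reads.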

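It might be a good idea to add syscall-specific kprobe/kretprobe macros, as this is a common gotcha. I added libbpf/libbpf#425 to keep track of that. (Newer libbpf versions, 1.0 and later, provide the BPF_KSYSCALL macro together with SEC("ksyscall/<name>") for exactly this.)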

Why eBPF can be safer than an LKM

Yes, as far as I know, the BPF verifier is meant to prevent any sort of kernel crash. That, however, doesn't mean you can't break things unintentionally in production. You could, for example, freeze your system by attaching BPF programs to all kernel functions, or lose all connectivity by dropping all received packets. In those cases, the verifier has no way to know that you didn't mean to perform those actions, so it won't stop you.

That being said, any sort of verification is better than none at all, which is what you get with traditional kernel modules. With kernel modules, not only can you shoot yourself in the foot as described above, but you can also crash the whole system because of a subtle bug somewhere in the code.

Regardless of what you're using, you should obviously test it extensively before deploying to production.

eBPF: raw_tracepoint arguments

I think I worked it out, based on this article.

The ctx of a raw_tracepoint program is struct bpf_raw_tracepoint_args, which is defined in bpf.h (include/uapi/linux/bpf.h) as:

struct bpf_raw_tracepoint_args {
    __u64 args[0];
};

So it is basically just an array of numbers/pointers. The meaning of these arguments depends on how the tracepoint prototype is defined. Looking at the source code where the tracepoint is defined, we find:

TRACE_EVENT_FN(sys_enter,

    TP_PROTO(struct pt_regs *regs, long id),

    TP_ARGS(regs, id),

    TP_STRUCT__entry(
        __field(    long,       id      )
        __array(    unsigned long,  args,   6   )
    ),

    TP_fast_assign(
        __entry->id = id;
        syscall_get_arguments(current, regs, __entry->args);
    ),

    TP_printk("NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)",
          __entry->id,
          __entry->args[0], __entry->args[1], __entry->args[2],
          __entry->args[3], __entry->args[4], __entry->args[5]),

    syscall_regfunc, syscall_unregfunc
);

Let's focus on TP_PROTO(struct pt_regs *regs, long id). This means that args[0] is struct pt_regs *regs and args[1] is long id. struct pt_regs holds the user-space CPU registers saved when the syscall was entered, and id is the syscall number.
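Since args[] is just an array of __u64, the program has to cast each slot back to the type declared in TP_PROTO. The userspace C sketch below mimics that round trip. struct fake_raw_tp_args, struct sys_enter_view, encode_sys_enter() and decode_sys_enter() are hypothetical names, and encode_sys_enter() only approximates what the kernel's raw tracepoint glue does. In a real BPF program, the pointer in args[0] refers to kernel memory and must be read with bpf_probe_read_kernel() or CO-RE helpers, never dereferenced directly.

```c
#include <assert.h>
#include <stdint.h>

/* Each tracepoint argument is stored in a 64-bit slot, pointers included. */
static_assert(sizeof(void *) <= sizeof(uint64_t), "pointer must fit in a u64 slot");

struct fake_pt_regs { uint64_t di, si, dx; }; /* trimmed, hypothetical */

/* Hypothetical mirror of struct bpf_raw_tracepoint_args for sys_enter. */
struct fake_raw_tp_args {
    uint64_t args[2]; /* TP_PROTO(struct pt_regs *regs, long id) */
};

struct sys_enter_view {
    struct fake_pt_regs *regs; /* args[0] */
    long id;                   /* args[1] */
};

/* Roughly what the kernel side does: widen each argument to a u64. */
static struct fake_raw_tp_args encode_sys_enter(struct fake_pt_regs *regs, long id)
{
    struct fake_raw_tp_args a;
    a.args[0] = (uint64_t)(uintptr_t)regs;
    a.args[1] = (uint64_t)id;
    return a;
}

/* What the BPF program does: cast each slot back to its TP_PROTO type. */
static struct sys_enter_view decode_sys_enter(const struct fake_raw_tp_args *ctx)
{
    struct sys_enter_view v;
    v.regs = (struct fake_pt_regs *)(uintptr_t)ctx->args[0];
    v.id = (long)ctx->args[1];
    return v;
}
```

The order of the slots follows TP_PROTO, not TP_STRUCT__entry, which is why args[0] is the register pointer rather than the syscall id.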

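We can get to the arguments of the syscall by extracting them from the saved CPU registers. The calling convention specifies which argument lives in which CPU register. To make our lives easier, libbpf defines PT_REGS_PARM{1..5} macros in bpf_tracing.h. Note that these follow the x86-64 function-call (System V) ABI. For syscalls, the fourth argument is passed in r10 rather than rcx, which is why newer libbpf versions also provide PT_REGS_PARM4_SYSCALL and related _SYSCALL variants.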
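For reference, the x86-64 argument registers for ordinary function calls (System V ABI) and for syscalls can be written down as two small tables. Only the fourth slot differs, because the syscall instruction clobbers rcx. This is a userspace C sketch; enum abi, param_reg() and same_register() are hypothetical helpers for illustration, not part of libbpf.

```c
#include <assert.h>
#include <string.h>

/* Argument registers on x86-64, indexed by argument number - 1. */
static const char *const sysv_call_regs[6] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
static const char *const syscall_regs[6]   = { "rdi", "rsi", "rdx", "r10", "r8", "r9" };

enum abi { ABI_FUNCTION_CALL, ABI_SYSCALL }; /* hypothetical */

/* Name of the register holding argument n (1-based) under the given convention. */
static const char *param_reg(enum abi abi, int n)
{
    assert(n >= 1 && n <= 6);
    return abi == ABI_SYSCALL ? syscall_regs[n - 1] : sysv_call_regs[n - 1];
}

/* 1 if a function-call PARMn macro also reads the right register for syscalls. */
static int same_register(int n)
{
    return strcmp(param_reg(ABI_FUNCTION_CALL, n), param_reg(ABI_SYSCALL, n)) == 0;
}
```

In practice, reading the first three syscall arguments with the plain PT_REGS_PARM macros is safe on x86-64, while the fourth needs the syscall-specific register.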

So I believe this should be a correct program:

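#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>    /* PT_REGS_PARM*; needs -D__TARGET_ARCH_x86 */

char LICENSE[] SEC("license") = "GPL";    /* bpf_printk() is GPL-only */

SEC("raw_tracepoint/sys_enter")
int main_entry_raw(struct bpf_raw_tracepoint_args *ctx)
{
    unsigned long syscall_id = ctx->args[1];    /* TP_PROTO arg 2: long id */
    struct pt_regs *regs;

    if (syscall_id != 62)    /* __NR_kill on x86-64 */
        return 0;

    /* TP_PROTO arg 1: a kernel pointer, which must not be dereferenced directly */
    regs = (struct pt_regs *)ctx->args[0];

    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid >> 32;    /* upper 32 bits: TGID (the userspace PID) */
    bpf_printk("Caught function call; PID = %u\n", pid);
    bpf_printk("  id: %lu\n", syscall_id);

    /* kill(pid, sig) takes two arguments, so read the second one (the signal).
     * PT_REGS_PARM2(regs) expands to regs->si: take its address and copy the
     * value with a probe read instead of dereferencing regs. */
    u64 sig = 0;
    bpf_probe_read_kernel(&sig, sizeof(sig), &PT_REGS_PARM2(regs));
    bpf_printk("  sig: %llu\n", sig);
    return 0;    /* the verifier requires a return value */
}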
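bpf_get_current_pid_tgid() packs two IDs into a single u64. The thread group ID (what userspace calls the PID) is in the upper 32 bits, and the thread ID (the kernel's per-task pid) is in the lower 32 bits. So `>> 32` gives the process ID, while a plain truncation to u32 gives the thread ID. The userspace C sketch below shows the split; struct task_ids and split_pid_tgid() are hypothetical helpers, not libbpf APIs.

```c
#include <assert.h>
#include <stdint.h>

struct task_ids {
    uint32_t tgid; /* process ID as seen by userspace (getpid()) */
    uint32_t tid;  /* thread ID (gettid()); equals tgid for the main thread */
};

/* Split the value returned by bpf_get_current_pid_tgid(). */
static struct task_ids split_pid_tgid(uint64_t pid_tgid)
{
    struct task_ids ids = {
        .tgid = (uint32_t)(pid_tgid >> 32),
        .tid  = (uint32_t)pid_tgid,
    };
    return ids;
}
```

For a single-threaded process, both halves are equal, which is why the truncated value often looks correct until a multi-threaded program is traced.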
