Can eBPF modify the return value or parameters of a syscall?
I believe that attaching eBPF to kprobes/kretprobes gives you read access to function arguments and return values, but that you cannot tamper with them. I am NOT 100% sure; good places to ask for confirmation would be the IO Visor project mailing list or IRC channel (#iovisor at irc.oftc.net).
As an alternative solution, I know you can at least change the return value of a syscall with strace, with the -e option. Quoting the manual page:
-e inject=set[:error=errno|:retval=value][:signal=sig][:when=expr]
Perform syscall tampering for the specified set of syscalls.
Also, there was a presentation on this and on fault injection at FOSDEM 2017, if it is of any interest to you. Here is one example command from the slides:
strace -P precious.txt -efault=unlink:retval=0 unlink precious.txt
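To see the injection from the traced program's side, a small target can help. The sketch below is not from the slides: try_unlink is a hypothetical helper that only reports what unlink(2) told its caller. Called on precious.txt from a program run under an strace invocation like the one above, it should report success even though the file stays in place, because strace substituted the return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not part of strace or of the slides):
 * returns 0 if unlink(2) reported success, otherwise the errno
 * value the caller observed. */
int try_unlink(const char *path)
{
    assert(path != NULL);
    if (unlink(path) == 0)
        return 0;
    return errno;
}
```

Without strace, calling try_unlink on a path that does not exist returns ENOENT, which makes the difference under injection easy to spot.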
Edit: As stated by Ben, eBPF on kprobes and tracepoints is definitively read only, for tracing and monitoring use cases. I also got confirmation about this on IRC.
ebpf: intercepting function calls
No. kprobe BPF programs have only read access to the syscall parameters and return value; they cannot modify registers and therefore cannot intercept function calls. This is a limitation imposed by the BPF verifier.
Kernel modules, however, can intercept function calls using kprobes.
BPF_KPROBE macro provides unexpected value of the syscall argument
Solved by @anakryiko here
That __x64_sys_close() actually has only one input parameter, and that's struct pt_regs *, which contains all the syscall input arguments. So you have to do something like this to get access to input arguments:
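SEC("kprobe/__x64_sys_close")
int BPF_KPROBE(do_sys_close, struct pt_regs *regs)
{
    pid_t pid;
    int fd;

    /* regs is the wrapper's only argument; the fd is stored inside it */
    fd = PT_REGS_PARM1_CORE(regs);
    /* the upper 32 bits hold the tgid, i.e. the userspace PID */
    pid = bpf_get_current_pid_tgid() >> 32;
    bpf_printk("KPROBE ENTRY pid = %d, fd = %d\n", pid, fd);
    return 0;
}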
It might be a good idea to add syscall-specific kprobe/kretprobe macros, as this is a common gotcha. Added libbpf/libbpf#425 to keep track of that.
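The gotcha is easier to see outside the kernel. Below is a minimal userspace sketch, assuming a made-up struct mock_pt_regs with only three fields (it is not the kernel layout) and hypothetical function names. It mimics the shape of the syscall wrapper: the wrapper's first argument is the address of the saved registers, and the fd only appears after one more dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the x86-64 struct pt_regs. Only three
 * fields are modelled; this is an assumption for illustration. */
struct mock_pt_regs {
    unsigned long di;   /* 1st syscall argument */
    unsigned long si;   /* 2nd syscall argument */
    unsigned long dx;   /* 3rd syscall argument */
};

/* Same shape as __x64_sys_close(): the only parameter is a pointer
 * to the saved user registers, not the fd itself. */
long mock_x64_sys_close(const struct mock_pt_regs *regs)
{
    assert(regs != NULL);
    return (long)(int)regs->di;
}

/* What a probe on the wrapper sees: wrapper_arg1 is the wrapper's
 * first argument, i.e. an address. Treating it as the fd would be
 * wrong; it has to be dereferenced once more. */
int fd_from_wrapper_arg(uintptr_t wrapper_arg1)
{
    const struct mock_pt_regs *inner =
        (const struct mock_pt_regs *)wrapper_arg1;
    assert(inner != NULL);
    return (int)inner->di;
}
```

In a real BPF program that second dereference cannot be a plain pointer access, which is why the snippet above uses PT_REGS_PARM1_CORE.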
Why is eBPF safer than a loadable kernel module (LKM)?
Yes, as far as I know, the BPF verifier is meant to prevent any sort of kernel crash. That however doesn't mean you can't break things unintentionally in production. You could for example freeze your system by attaching BPF programs to all kernel functions or lose all connectivity by dropping all received packets. In those cases, the verifier has no way to know that you didn't mean to perform those actions; it won't stop you.
That being said, any sort of verification is better than no verification as in traditional kernel modules. With kernel modules, not only can you shoot yourself in the foot as I've described above, but you could also crash the whole system because of a subtle bug somewhere in the code.
Regardless of what you're using, you should obviously test it extensively before deploying to production.
eBPF: raw_tracepoint arguments
I think I worked it out, based on this article.
The ctx of a raw_tracepoint program is struct bpf_raw_tracepoint_args, which is defined in bpf.h as:
struct bpf_raw_tracepoint_args {
    __u64 args[0];
};
So it is basically just an array of numbers/pointers. The meaning of these arguments depends on how the tracepoint prototype is defined. Looking at the source code where the tracepoint is defined, we find:
TRACE_EVENT_FN(sys_enter,
    TP_PROTO(struct pt_regs *regs, long id),
    TP_ARGS(regs, id),
    TP_STRUCT__entry(
        __field( long, id )
        __array( unsigned long, args, 6 )
    ),
    TP_fast_assign(
        __entry->id = id;
        syscall_get_arguments(current, regs, __entry->args);
    ),
    TP_printk("NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)",
        __entry->id,
        __entry->args[0], __entry->args[1], __entry->args[2],
        __entry->args[3], __entry->args[4], __entry->args[5]),
    syscall_regfunc, syscall_unregfunc
);
Let's focus on TP_PROTO(struct pt_regs *regs, long id). This means that args[0] is struct pt_regs *regs and args[1] is long id. struct pt_regs is a copy of the CPU registers at the time sys_enter was called, and id is the ID of the syscall.
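The mapping from the untyped args array to the TP_PROTO parameters can be sketched in plain userspace C. Everything named mock_* or sys_enter_view below is hypothetical and only mirrors the shape of the kernel types; in a real BPF program the regs pointer must not be dereferenced directly like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct pt_regs (assumption: three fields only). */
struct mock_pt_regs {
    unsigned long di, si, dx;
};

/* Same idea as struct bpf_raw_tracepoint_args: untyped 64-bit slots.
 * The size is fixed at 2 here because sys_enter has two parameters. */
struct mock_raw_tp_args {
    uint64_t args[2];
};

/* Typed view of the slots, following
 * TP_PROTO(struct pt_regs *regs, long id). */
struct sys_enter_view {
    const struct mock_pt_regs *regs;
    long id;
};

struct sys_enter_view decode_sys_enter(const struct mock_raw_tp_args *ctx)
{
    struct sys_enter_view view;

    assert(ctx != NULL);
    /* args[0] carries the first prototype parameter: the regs pointer */
    view.regs = (const struct mock_pt_regs *)(uintptr_t)ctx->args[0];
    /* args[1] carries the second prototype parameter: the syscall ID */
    view.id = (long)ctx->args[1];
    return view;
}
```

The order of the slots always follows the order of the parameters in TP_PROTO, so a different tracepoint needs a different typed view.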
We can get to the arguments of the syscall by extracting them from the CPU registers. The System V ABI specifies which parameters should be present in which CPU registers. To make our lives easier, libbpf defines PT_REGS_PARM{1..5} macros in bpf_tracing.h.
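One detail worth keeping in mind on x86-64: the kernel's syscall convention matches the System V function-call convention for every argument except the fourth, which travels in r10 instead of rcx. The lookup below is a small illustrative sketch of the two orderings; the enum and function names are made up for this example and are not part of libbpf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum call_convention { CONV_FUNCTION, CONV_SYSCALL };

/* Returns the x86-64 register holding integer argument n (1..6)
 * under the given convention, or NULL if n is out of range. */
const char *x86_64_arg_register(enum call_convention conv, int n)
{
    static const char *const function_regs[6] =
        { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    static const char *const syscall_regs[6] =
        { "rdi", "rsi", "rdx", "r10", "r8", "r9" };

    assert(conv == CONV_FUNCTION || conv == CONV_SYSCALL);
    if (n < 1 || n > 6)
        return NULL;
    return conv == CONV_SYSCALL ? syscall_regs[n - 1]
                                : function_regs[n - 1];
}

/* Returns 1 if both conventions use the same register for argument n,
 * 0 if they differ or n is out of range. */
int conventions_agree(int n)
{
    const char *f = x86_64_arg_register(CONV_FUNCTION, n);
    const char *s = x86_64_arg_register(CONV_SYSCALL, n);

    return f != NULL && s != NULL && strcmp(f, s) == 0;
}
```

For the third argument, which the program in this answer reads, both conventions agree on rdx, so the distinction only matters when reading the fourth syscall argument.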
So, I believe this should be a correct program:
SEC("raw_tracepoint/sys_enter")
int main_entry_raw(struct bpf_raw_tracepoint_args *ctx)
{
    unsigned long syscall_id = ctx->args[1];
    struct pt_regs *regs;

    if (syscall_id != SYS_kill) // 62 on x86-64
        return 0;

    regs = (struct pt_regs *)ctx->args[0];

    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 pid = pid_tgid; // lower 32 bits: kernel pid (thread ID)
    bpf_printk("Caught function call; PID = %d.\n", pid);
    bpf_printk("  id: %lu\n", syscall_id);

    // regs is a kernel pointer, so pass the address of the register
    // slot to bpf_probe_read() instead of dereferencing it directly.
    uint64_t arg3 = 0;
    bpf_probe_read(&arg3, sizeof(arg3), &PT_REGS_PARM3(regs));
    bpf_printk("  Arg3: %llu\n", arg3);

    return 0;
}
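A note on the PID printed above: bpf_get_current_pid_tgid() packs two IDs into one 64-bit value, with the tgid (what userspace calls the PID) in the upper 32 bits and the kernel pid (the thread ID) in the lower 32 bits. The raw tracepoint program assigns the value to a 32-bit variable and so keeps the lower half, while the kprobe example earlier shifts right by 32. The split can be sketched in userspace like this; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct task_ids {
    uint32_t tgid;  /* thread group ID: the PID seen from userspace */
    uint32_t pid;   /* kernel pid: the thread ID */
};

/* Packs the two IDs the same way the helper's return value is laid out. */
uint64_t make_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Splits a packed value into its two halves. */
struct task_ids split_pid_tgid(uint64_t pid_tgid)
{
    struct task_ids ids;

    ids.tgid = (uint32_t)(pid_tgid >> 32);
    ids.pid = (uint32_t)pid_tgid;
    /* sanity check: the two halves rebuild the original value */
    assert(make_pid_tgid(ids.tgid, ids.pid) == pid_tgid);
    return ids;
}
```

For a single-threaded process both halves are equal, so the difference only shows up when the traced call comes from a secondary thread.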