How to Add a Custom Extended Attribute from Linux Kernel Space (i.e., from a Custom System Call)


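I was able to get extended attributes working with vfs_setxattr(struct dentry *, const char *, const void *, size_t, int). (Newer kernels add a leading argument for idmapped mounts, so check the prototype in your tree.)
The main stumbling block was the const void * value parameter, which simply takes a pointer to the bytes to store; passing a char * works. The code looked something like this: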

char *buf = "test"; /* the string literal already ends with a NUL byte */
int size = 5;       /* bytes to store: "test" plus the terminating NUL */
int flag = 0;       /* 0 creates the attribute or replaces an existing one */
int err;            /* 0 on success, negative errno value on failure */

err = vfs_setxattr(path_struct.dentry, "user.custom_attrib", buf, size, flag);

I was also able to get vfs_getxattr(struct dentry *, const char *, void *, size_t) working. The void * buffer was once again where I got stuck: I had to allocate a buffer for the attribute value to be copied into. My code looked something like this:

char buf[1024];
int size_buf = sizeof(buf);
int err; /* number of bytes copied on success, negative errno value on failure */

err = vfs_getxattr(path_struct.dentry, "user.custom_attrib", buf, size_buf);

So now buf holds the attribute value of the file that dentry refers to, and a non-negative return value is the number of bytes copied. The error codes are extremely helpful in figuring out what is going on, and so are the command line tools.

To install the command line tool:

sudo apt-get install attr

To set an attribute manually from the command line:

setfattr -n user.custom_attrib -v "test_if working" test.txt

To get an attribute manually from the command line:

getfattr -n user.custom_attrib test.txt  

I wasn't able to figure out whether you can pass other types, such as ints, into extended attributes, and my attempts bricked my kernel builds numerous times. Hope this helps some people out; if anyone has corrections, let me know.
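On the question of other types: the value parameter is a const void * plus a byte count, so the xattr interface treats the value as opaque bytes rather than as a string. One way to store an integer, sketched below under that assumption, is to serialize it into a fixed-width byte buffer with a fixed byte order and pass that buffer as the value. The helper names are made up for this example and are not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the serialized value that would be passed as the xattr size. */
#define XATTR_INT_SIZE 4

/* Hypothetical helper: write v into out[] in little-endian byte order,
 * so the stored attribute reads the same on any machine. */
void xattr_encode_u32(uint32_t v, unsigned char out[XATTR_INT_SIZE])
{
    assert(out != NULL);
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: rebuild the integer from a buffer returned by a
 * getxattr call. Returns 0 on success, -1 if the length is not what
 * xattr_encode_u32() produces. */
int xattr_decode_u32(const unsigned char *in, size_t len, uint32_t *v)
{
    if (in == NULL || v == NULL || len != XATTR_INT_SIZE)
        return -1;
    *v = (uint32_t)in[0] |
         ((uint32_t)in[1] << 8) |
         ((uint32_t)in[2] << 16) |
         ((uint32_t)in[3] << 24);
    return 0;
}
```

In the kernel code above, the encoded buffer and XATTR_INT_SIZE would take the place of buf and size in the vfs_setxattr() call. From the shell, getfattr -e hex shows such binary values in a readable form.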

How to mark some files from linux user space so as to apply some operation on them in kernel space

You could use extended file attributes. They allow you to set and get arbitrary metadata associated with a file.

Check man 5 attr and this question for how to set and get extended attributes from the kernel.
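The userspace half of this could look roughly like the sketch below, which uses the setxattr() and getxattr() calls from <sys/xattr.h>. The attribute name user.needs_processing and the function names are invented for illustration; the kernel side would look for the same attribute name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical attribute name, used only for this illustration. */
#define MARK_ATTR "user.needs_processing"

/* Tag a file. Returns 0 on success or a negative errno value
 * (for example -ENOTSUP if the filesystem has no user xattrs). */
int mark_file(const char *path)
{
    assert(path != NULL);
    if (setxattr(path, MARK_ATTR, "1", 1, 0) == -1)
        return -errno;
    return 0;
}

/* Returns 1 if the file is tagged, 0 if the attribute is absent,
 * or a negative errno value on any other failure. */
int is_marked(const char *path)
{
    char value;
    ssize_t n;

    assert(path != NULL);
    n = getxattr(path, MARK_ATTR, &value, sizeof value);
    if (n == -1)
        return errno == ENODATA ? 0 : -errno;
    return n == 1 && value == '1';
}
```

Both functions report failures as negative errno values, mirroring the convention of the kernel calls, so a missing file shows up as -ENOENT and an unsupported filesystem as -ENOTSUP.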

Call a userspace function from within a Linux kernel module

I think the simplest solution would be to create a character device in your kernel driver, with your own file operations for a virtual file. Then userspace can open this device O_RDWR. You have to implement two main file operations:

  • read -- this is how the kernel passes data back up to userspace. This function is run in the context of the userspace thread calling the read() system call, and in your case it should block until the kernel has another seed value that it needs to know the output for.

  • write -- this is how userspace passes data into the kernel. In your case, the kernel would just take the response to the previous read and pass it onto the hardware.

Then you end up with a simple loop in userspace:

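while (1) {
    if (read(fd, buf, sizeof buf) <= 0)        /* blocks until the kernel has a seed */
        break;
    calculate_output(buf, output);
    if (write(fd, output, sizeof output) < 0)  /* hand the result back to the driver */
        break;
}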

and no loop at all in the kernel -- everything runs in the context of the userspace process that is driving things, and the kernel driver is just responsible for moving the data to/from the hardware.
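A slightly more defensive version of one loop iteration is sketched below. It retries on EINTR and on short transfers, which may or may not matter depending on how the driver implements read and write. The seed size and the body of calculate_output() are placeholders, and the function takes separate input and output descriptors only so it can be tried out with pipes; with the character device both would be the same fd.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define SEED_LEN 4 /* assumed size of one seed and of one response */

/* Placeholder for the real computation: invert every byte. */
static void calculate_output(const unsigned char *seed, unsigned char *out)
{
    for (size_t i = 0; i < SEED_LEN; i++)
        out[i] = (unsigned char)~seed[i];
}

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n > 0)
            done += (size_t)n;
        else if (n == 0 || errno != EINTR)
            return -1;
    }
    return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0)
            done += (size_t)n;
        else if (n == 0 || errno != EINTR)
            return -1;
    }
    return 0;
}

/* One iteration: fetch a seed, compute, send the answer back.
 * Returns 0 on success, -1 if either transfer failed. */
int serve_one(int in_fd, int out_fd)
{
    unsigned char seed[SEED_LEN], output[SEED_LEN];

    assert(in_fd >= 0 && out_fd >= 0);
    if (read_full(in_fd, seed, sizeof seed) != 0)
        return -1;
    calculate_output(seed, output);
    return write_full(out_fd, output, sizeof output);
}
```

The userspace loop then reduces to calling serve_one(fd, fd) until it returns -1.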

Depending on what your "do some random stuff here" on the kernel side is, it might not be possible to do it quite so simply. If you really need the kernel loop, then you need to create a kernel thread to run that loop, and then have some variables along the lines of input_data, input_ready, output_data and output_ready, along with a couple of waitqueues and whatever locking you need.

When the kernel thread reads data, it puts the data in input_data, sets the input_ready flag, signals the input waitqueue, and then does wait_event(<output_ready is set>). The read file operation would do a wait_event(<input_ready is set>) and return the data to userspace when it becomes ready. Similarly, the write file operation would put the data it gets from userspace into output_data, set output_ready and signal the output waitqueue.

Another (uglier, less portable) way is to use something like ioperm, iopl or /dev/port to do everything completely in userspace, including the low-level hardware access.

How can I implement a custom iNode on linux?


So every directory, file, queue or whatever in Linux creates its own inodes that can be accessed in one way or another.

False. Directories, files, etc. do not create their own inodes. They are stored using inodes that belong to the filesystem holding them. The inodes are not even created for particular files -- all inodes are created as part of filesystem creation, before any files are stored on it.*

How would I go about implementing my own inode type that doesn't quite fit any of the existing descriptions?

It's unclear why you think you need a custom inode type, but if you do, then you need a whole custom filesystem. You will need to write either kernel drivers or FUSE drivers implementing it, plus all the needed utilities for formatting a device with that FS, mounting and unmounting it, checking it for errors, etc.

A custom something that is visible in the file system but isn't a file? Do I have to extend the kernel or is there some simpler approach?

Everything is a file. This is one of the principles of UNIX. But perhaps you mean something that isn't a regular file. Unfortunately for you, even a custom file system and inode wouldn't be enough to give you a custom file type. The partition of filesystem entries into regular files, directories, character and block special files, etc. is deeply ingrained in the kernel and in the standard file management APIs and utilities. You would not only have to extend the kernel (beyond writing filesystem drivers), but also modify the C standard library, several standard utilities, and probably a bunch of other libraries and utilities affected by those changes. In the end, you basically have your own whole operating system.

But maybe your premise is wrong. UNIX has been going along just fine with pretty much its current file model for a very long time. It's unclear why you want what you say you want, but there are at least two simpler options that might suit you:

  • Write a kernel driver for a character or block device with a filesystem interface, and use the system's existing facilities to link one or more device instances to the filesystem as a character or block special file.

  • Embed what you want to do in regular files / directories / etc.


*More or less. I ignore special administrative actions that may in some cases be able to expand a filesystem and add inodes to it in the process.

How does kernel_thread_helper pass parameters to kernel thread function using inline assembly?

A comment in an earlier Linux kernel version (v2.5.0) explains this (to be precise, the movl into eax was added in v2.1.131):

int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
    long retval, d0;

    __asm__ __volatile__(
        "movl %%esp,%%esi\n\t"
        "int $0x80\n\t" /* Linux/i386 system call */
        "cmpl %%esp,%%esi\n\t" /* child or parent? */
        "je 1f\n\t" /* parent - jump */
        /* Load the argument into eax, and push it. That way, it does
         * not matter whether the called function is compiled with
         * -mregparm or not. */
        "movl %4,%%eax\n\t"
        "pushl %%eax\n\t"
        "call *%5\n\t" /* call fn */
        "movl %3,%0\n\t" /* exit */
        "int $0x80\n"
        "1:\t"
        :"=&a" (retval), "=&S" (d0)
        :"0" (__NR_clone), "i" (__NR_exit),
         "r" (arg), "r" (fn),
         "b" (flags | CLONE_VM)
        : "memory");
    return retval;
}

So yes, as you suspect, the movl into eax is there to support both functions compiled with -mregparm (or an equivalent __attribute__) and functions compiled without it (i386 System V ABI calling convention, i.e. all parameters on the stack).

But this causes a problem: the previously pushed pointer by pushl %edx could remain in the stack if no one pops it off.

Well, true, but who cares about popping it off if the only thing we are doing right after calling the kernel thread function is pushl %eax; call do_exit? Doing movl %eax, 0(%esp) would work as well, but if there's enough stack space to run a kernel thread then surely there are 4 bytes to push a register after it returns. Additionally pushl %eax is only 1 byte vs 3 for the movl, and I'm not sure about latency or throughput of the two, but I don't really think that was a concern in such a piece of code.

asmlinkage defined to nothing in Linux code

System calls are not part of your userland program or of shared libraries; they are services provided by the kernel and executed in kernel space. To move from user space to kernel space, a program executes a few special assembly instructions that switch the processor into supervisor mode. Before that, it places a few pieces of information in registers so that the kernel side knows which system call to execute (the kernel has a table of function pointers to the functions that service each system call):

#include "SYS.h"

ENTRY(syscall)
pop %ecx /* rta */
pop %eax /* syscall number */
push %ecx
KERNCALL
push %ecx /* need to push a word to keep stack frame intact
upon return; the word must be the return address. */
jb 1f
ret

where KERNCALL is architecture-dependent and is a few assembly language instructions that tell the cpu to jump into supervisor mode in kernel space -

./lib/libc/amd64/SYS.h:#define  KERNCALL    movq %rcx, %r10; syscall
./lib/libc/i386/SYS.h:#define KERNCALL int $0x80

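So here's the thing: when you compile a program, the optimizer will occasionally pass a function's parameters in registers instead of putting them on the program's stack. This optimization works because the compiler emits code for both the caller and the callee, so both sides are aware of the sleight of hand. Not so at the system call boundary: the entry path is hand-written assembly, so the C function that services a system call must find its parameters in one fixed, agreed place. On i386 Linux that place is the stack, where the entry code has saved the caller's registers, and asmlinkage expands to __attribute__((regparm(0))) to force the compiler to read the arguments from there. On architectures where the normal calling convention already matches what the entry code does, asmlinkage is defined to nothing.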
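On Linux the same mechanism can be exercised from C without writing assembly: the C library's syscall() function takes the system call number followed by its arguments, places them where the kernel expects them and executes the trap instruction. A minimal sketch follows; raw_getpid is a name made up for this example.

```c
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the process ID through the generic syscall()
 * entry point, passing the system call number explicitly instead of
 * going through the getpid() wrapper. */
pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);

    assert(ret > 0); /* getpid cannot fail */
    return (pid_t)ret;
}
```

The result is the same value getpid() returns; the only difference is that the system call number, which the kernel uses to index its table of handlers, is visible in the source.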

How do I open a directory at kernel level using the file descriptor for that directory?

To add to caf's answer mentioning vfs_readdir(): reading and writing files from within the kernel is considered unsafe (except for /proc, which acts as an interface to internal data structures in the kernel).

The reasons are well described in this Linux Journal article, although it also provides a hack to access files. I don't think that method could easily be modified to work for directories. A more correct approach is to access the kernel's filesystem inode entries, which is what vfs_readdir() does (since kernel 3.11 the equivalent entry point is iterate_dir()).

Inodes are filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems)
or in the memory (for pseudo filesystems).

Notice that vfs_readdir() expects a file * parameter. To obtain a file structure pointer from a user space file descriptor, you should utilize the kernel's file descriptor table.

The kernel.org files documentation says the following on doing so safely:

To look up the file structure given an fd, a reader
must use either fcheck() or fcheck_files() APIs. These
take care of barrier requirements due to lock-free lookup.
An example :

    rcu_read_lock();
    file = fcheck_files(files, fd);
    if (file) {
        // Handling of the file structures is special.
        // Since the look-up of the fd (fget() / fget_light())
        // are lock-free, it is possible that look-up may race with
        // the last put() operation on the file structure.
        // This is avoided using atomic_long_inc_not_zero() on ->f_count
        if (atomic_long_inc_not_zero(&file->f_count))
            *fput_needed = 1;
        else
            /* Didn't get the reference, someone's freed */
            file = NULL;
    }
    rcu_read_unlock();
    ....
    return file;

atomic_long_inc_not_zero() detects if refcounts is already zero or
goes to zero during increment. If it does, we fail fget() / fget_light().

Finally, take a look at filldir_t, the type of vfs_readdir()'s second parameter: it is the callback that gets invoked for each directory entry.

Shared semaphore between user and kernel spaces

One solution I can think of is to have a /proc (or /sys or whatever) file on a main kernel module where writing 0/1 to it (or read from/write to it) would cause it to issue an up/down on a semaphore. Exporting that semaphore allows other kernel modules to directly access it while user applications would go through the /proc file system.

I'd still wait to see if the original question has an answer.
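For what it's worth, the userspace side of the /proc idea could be as small as the sketch below. Everything specific in it is an assumption: the control file path is supplied by the caller, and the convention that '1' means up and '0' means down is just one possible choice that the kernel module would have to implement.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a single command byte to the module's control file.
 * Assumed convention: '1' requests an up, '0' requests a down.
 * Returns 0 on success or a negative errno value. */
int sem_ctl(const char *path, int up)
{
    char cmd = up ? '1' : '0';
    int rc = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY);
    if (fd == -1)
        return -errno;

    n = write(fd, &cmd, 1);
    if (n == -1)
        rc = -errno;
    else if (n != 1)
        rc = -EIO;

    if (close(fd) == -1 && rc == 0)
        rc = -errno;
    return rc;
}
```

If the module performs the down inside its write handler, the calling process would simply sleep in write() until the semaphore becomes available, which gives userspace the blocking behaviour for free.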


