How to add a custom Extended Attribute from Linux kernel space (i.e., from a custom system call)
I was able to get extended attributes working with vfs_setxattr(struct dentry *, const char *, const void *, size_t, int);
The main sticking point was the const void * value parameter: it can simply be given a char * buffer holding the value. That code looked something like this:
const char *buf = "test";        // value to store
size_t size = strlen(buf) + 1;   // number of bytes in the value, including the trailing NUL
int flags = 0;                   // 0: create or replace; XATTR_CREATE / XATTR_REPLACE restrict this
int err;                         // 0 on success, negative errno on failure
err = vfs_setxattr(path_struct.dentry, "user.custom_attrib", buf, size, flags);
I also got vfs_getxattr(struct dentry *, const char *, void *, size_t); working. Here the value parameter is a plain void * that receives the data, and once again the buffer was where I got stuck: the caller has to allocate a buffer large enough to hold the extended attribute's value. So my code looked something like this:
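char buf[1024];                  // receives the value (large for the small kernel stack; kmalloc() may be preferable)
size_t size_buf = sizeof(buf);
ssize_t len;                     // length of the value on success, negative errno on failure
len = vfs_getxattr(path_struct.dentry, "user.custom_attrib", buf, size_buf);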
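So now buf holds the value of the attribute on the file referred to by the dentry. Note that the value comes back as raw bytes: it is NUL-terminated only if the stored value included the terminator, as the setxattr example above does by counting the trailing NUL in its size. The returned error codes are extremely helpful in figuring out what is going on, and so are the command-line tools.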
To install the command-line tools:
sudo apt-get install attr
To set an attribute manually from the command line:
setfattr -n user.custom_attrib -v "test_if working" test.txt
To get an attribute manually from the command line:
getfattr -n user.custom_attrib test.txt
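The same round trip can also be done from a C program in user space, which is handy for checking what the kernel code wrote. This is a minimal sketch using setxattr(2) and getxattr(2) from <sys/xattr.h>; the helper name xattr_roundtrip is hypothetical, and it assumes a filesystem with user.* attributes enabled (otherwise the calls fail with ENOTSUP).

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: store `value` under `name` on `path`, then read it
 * back into `out` (at most `out_size` bytes). Returns the length read, or
 * a negative errno value on failure (e.g. -ENOTSUP if the filesystem does
 * not support user.* attributes). */
ssize_t xattr_roundtrip(const char *path, const char *name,
                        const char *value, char *out, size_t out_size)
{
    /* flags = 0: create the attribute or replace an existing one */
    if (setxattr(path, name, value, strlen(value), 0) != 0)
        return -errno;

    ssize_t len = getxattr(path, name, out, out_size);
    if (len < 0)
        return -errno;
    return len; /* the value is NOT NUL-terminated by getxattr() */
}
```

For example, xattr_roundtrip("test.txt", "user.custom_attrib", "test_if working", out, sizeof out) should return 15, the length of the value, and leave those bytes in out, matching what the getfattr command above prints.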
I wasn't able to figure out whether other types, such as ints, can be passed into extended attributes, and my trials caused me to brick my kernel builds numerous times. Hope this helps some people out; if anyone has corrections, let me know.
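On the question of ints: an xattr value is just an uninterpreted array of bytes, so fixed-size data can be stored by passing a pointer to it and its size. The sketch below (hypothetical helper names) serializes a 32-bit integer in a fixed little-endian byte order, so the stored value means the same thing on every architecture; inside the kernel, cpu_to_le32() and le32_to_cpu() do the same job. The resulting 4-byte buffer is what you would pass as the value and size arguments of vfs_setxattr().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: serialize a 32-bit integer in little-endian order
 * so the stored xattr value is independent of the host's endianness. */
void xattr_encode_u32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: decode a value written by xattr_encode_u32().
 * Returns 0 on success, -1 if the stored value has the wrong size
 * (e.g. it was written by someone else as a string). */
int xattr_decode_u32(const unsigned char *buf, size_t len, uint32_t *v)
{
    if (len != 4)
        return -1;
    *v = (uint32_t)buf[0]
       | ((uint32_t)buf[1] << 8)
       | ((uint32_t)buf[2] << 16)
       | ((uint32_t)buf[3] << 24);
    return 0;
}
```

For example, after xattr_encode_u32(42, raw), calling vfs_setxattr(path_struct.dentry, "user.custom_attrib", raw, sizeof(raw), 0) would store the number, and getfattr -e hex -n user.custom_attrib test.txt should show it as 0x2a000000.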
How to mark some files from linux user space so as to apply some operation on them in kernel space
You could use extended file attributes. They allow you to set and get arbitrary metadata associated with a file.
Check man 5 attr and this question for how to set and get extended attributes from the kernel.
Call a userspace function from within a Linux kernel module
I think the simplest solution would be to create a character device in your kernel driver, with your own file operations for a virtual file. Then userspace can open this device with O_RDWR. You have to implement two main file operations:
read -- this is how the kernel passes data back up to userspace. This function runs in the context of the userspace thread calling the read() system call, and in your case it should block until the kernel has another seed value that it needs to know the output for.
write -- this is how userspace passes data into the kernel. In your case, the kernel would just take the response to the previous read and pass it on to the hardware.
Then you end up with a simple loop in userspace:
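while (1) {
    ssize_t n = read(fd, buf, sizeof buf);  // blocks until the driver has a new seed
    if (n <= 0)
        break;                              // error, or the device was closed
    calculate_output(buf, output);          // your computation
    if (write(fd, output, sizeof output) != (ssize_t)sizeof output)
        break;                              // hand the result back to the driver
}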
and no loop at all in the kernel -- everything runs in the context of the userspace process that is driving things, and the kernel driver is just responsible for moving the data to/from the hardware.
Depending on what your "do some random stuff here" on the kernel side is, it might not be possible to do it quite so simply. If you really need the kernel loop, then you need to create a kernel thread to run that loop, and then have some variables along the lines of input_data, input_ready, output_data and output_ready, along with a couple of waitqueues and whatever locking you need.
When the kernel thread reads data, you put the data in input_data, set the input_ready flag, signal the input waitqueue, and then do wait_event(<output_ready is set>). The read file operation would do a wait_event(<input_ready is set>) and return the data to userspace when it becomes ready. Similarly, the write file operation would put the data it gets from userspace into output_data, set output_ready, and signal the output waitqueue.
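The handshake just described can be prototyped in user space before writing the driver. In this sketch a pthread plays the kernel thread, pthread condition variables stand in for the two waitqueues (pthread_cond_wait ≈ wait_event, pthread_cond_signal ≈ wake_up), and dev_read()/dev_write() play the read and write file operations. All names are hypothetical; a real driver would use wait_event_interruptible() and copy_to_user()/copy_from_user() instead.

```c
#include <pthread.h>
#include <stdbool.h>

/* Shared state from the description: data slots, ready flags,
 * one lock, and one "waitqueue" per direction. */
struct handshake {
    pthread_mutex_t lock;
    pthread_cond_t input_wq, output_wq;
    int input_data, output_data;
    bool input_ready, output_ready;
};

#define HANDSHAKE_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, \
                         PTHREAD_COND_INITIALIZER, 0, 0, false, false }

/* "Kernel thread" side: publish a seed, then wait for the answer. */
int kthread_exchange(struct handshake *h, int seed)
{
    pthread_mutex_lock(&h->lock);
    h->input_data = seed;
    h->input_ready = true;
    pthread_cond_signal(&h->input_wq);           /* wake the reader */
    while (!h->output_ready)                     /* wait_event(output_ready) */
        pthread_cond_wait(&h->output_wq, &h->lock);
    h->output_ready = false;
    int result = h->output_data;
    pthread_mutex_unlock(&h->lock);
    return result;
}

/* read() file operation: block until input is ready, then hand it up. */
int dev_read(struct handshake *h)
{
    pthread_mutex_lock(&h->lock);
    while (!h->input_ready)                      /* wait_event(input_ready) */
        pthread_cond_wait(&h->input_wq, &h->lock);
    h->input_ready = false;
    int seed = h->input_data;
    pthread_mutex_unlock(&h->lock);
    return seed;
}

/* write() file operation: store the answer and wake the kernel thread. */
void dev_write(struct handshake *h, int output)
{
    pthread_mutex_lock(&h->lock);
    h->output_data = output;
    h->output_ready = true;
    pthread_cond_signal(&h->output_wq);
    pthread_mutex_unlock(&h->lock);
}

/* Demo "kernel thread": sends seeds 1..n and sums the answers. */
struct kthread_demo { struct handshake *h; int n; int sum; };

void *kthread_demo_fn(void *arg)
{
    struct kthread_demo *d = arg;
    for (int seed = 1; seed <= d->n; seed++)
        d->sum += kthread_exchange(d->h, seed);
    return NULL;
}
```

The userspace loop then becomes: seed = dev_read(&h); compute; dev_write(&h, result). The while loops around pthread_cond_wait() re-check the flags, which is also what wait_event() does with its condition.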
Another (uglier, less portable) way is to use something like ioperm, iopl or /dev/port to do everything completely in userspace, including the low-level hardware access.
How can I implement a custom iNode on linux?
So every directory, file, queue or whatever in Linux creates its own inodes that can be accessed in one way or another.
False. Directories, files, etc. do not create their own inodes. They are stored using inodes belonging to the filesystem on which they reside. The inodes are not even created specifically for particular files: on traditional filesystems such as ext2/ext3/ext4, all inodes are created as part of filesystem creation, before any files are stored on it.*
How would I go about implementing my own inode type that doesn't quite fit any of the existing descriptions?
It's unclear why you think you need a custom inode type, but if you do, then you need a whole custom filesystem. You will need to write either kernel drivers or FUSE drivers implementing it, plus all the needed utilities for formatting a device with that FS, mounting and unmounting it, checking it for errors, etc.
A custom something that is visible in the file system but isn't a file? Do I have to extend the kernel or is there some simpler approach?
Everything is a file. This is one of the principles of UNIX. But perhaps you mean something that isn't a regular file. Unfortunately for you, even a custom filesystem and inode wouldn't be enough to give you a custom file type. The partition of filesystem entries into regular files, directories, character and block special files, etc. is deeply ingrained in the kernel and in the standard file management APIs and utilities. You would not only have to extend the kernel (beyond writing filesystem drivers), but also modify the C standard library, several standard utilities, and probably a bunch of other libraries and utilities affected by those changes. In the end, you basically have your own whole operating system.
But maybe your premise is wrong. UNIX has been going along just fine with pretty much its current file model for a very long time. It's unclear why you want what you say you want, but there are at least two simpler options that might suit you:
Write a kernel driver for a character or block device with a filesystem interface, and use the system's existing facilities to link one or more device instances to the filesystem as a character or block special file.
Embed what you want to do in regular files / directories / etc.
*More or less. I ignore special administrative actions that may in some cases be able to expand a filesystem and add inodes to it in the process.
How does kernel_thread_helper pass parameters to kernel thread function using inline assembly?
A comment in an earlier Linux kernel version (v2.5.0) explains this (to be precise, the movl into eax was added in v2.1.131):
int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
long retval, d0;
__asm__ __volatile__(
"movl %%esp,%%esi\n\t"
"int $0x80\n\t" /* Linux/i386 system call */
"cmpl %%esp,%%esi\n\t" /* child or parent? */
"je 1f\n\t" /* parent - jump */
/* Load the argument into eax, and push it. That way, it does
* not matter whether the called function is compiled with
* -mregparm or not. */
"movl %4,%%eax\n\t"
"pushl %%eax\n\t"
"call *%5\n\t" /* call fn */
"movl %3,%0\n\t" /* exit */
"int $0x80\n"
"1:\t"
:"=&a" (retval), "=&S" (d0)
:"0" (__NR_clone), "i" (__NR_exit),
"r" (arg), "r" (fn),
"b" (flags | CLONE_VM)
: "memory");
return retval;
}
So yes, as you suspect, the movl into eax is there exactly to provide compatibility both for functions compiled with -mregparm (or an equivalent __attribute__) and for those compiled without it (the i386 System V ABI calling convention, i.e. all parameters on the stack).
But this causes a problem: the pointer previously pushed by pushl %edx could remain on the stack if no one pops it off.
Well, true, but who cares about popping it off if the only thing we do right after calling the kernel thread function is pushl %eax; call do_exit? Doing movl %eax, 0(%esp) would work as well, but if there's enough stack space to run a kernel thread, then surely there are 4 bytes to push a register after it returns. Additionally, pushl %eax is only 1 byte versus 3 for the movl, and while I'm not sure about the latency or throughput of the two, I don't really think that was a concern in such a piece of code.
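In C terms, what the helper does after the clone is simply do_exit(fn(arg)): the child runs the thread function and its return value goes straight to the exit path. The same shape can be reproduced in user space with fork(2) and _exit(2). This is only an analogue for illustration (spawn_fn and return_arg are hypothetical names), not how kernel threads are created.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical user-space analogue of kernel_thread(): the child calls
 * fn(arg) and passes its return value straight to the exit path, as the
 * helper's "pushl %eax; call do_exit" does. Returns the child's PID to
 * the parent, or -1 if fork() fails. */
pid_t spawn_fn(int (*fn)(void *), void *arg)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(fn(arg));   /* child: never returns to the caller */
    return pid;           /* parent (or -1 on error) */
}

/* Example thread function: returns the int it is given. */
int return_arg(void *arg)
{
    return *(int *)arg;
}
```

Unlike do_exit(), a user-space exit status is truncated to 8 bits, so waitpid() only sees the low byte of fn's return value.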
asmlinkage defined to nothing in Linux code
System calls are not part of your userland program or of shared libraries; they are services provided by the kernel and executed in kernel space. To move from user space to kernel space, a program executes a few special assembly instructions that tell the processor to switch into supervisor mode. Before that, it places some information in registers so that the kernel knows which system call to execute (the kernel has a table of function pointers to the functions that 'service' each system call). For example, a libc syscall() wrapper looks like this:
#include "SYS.h"
ENTRY(syscall)
pop %ecx /* rta */
pop %eax /* syscall number */
push %ecx
KERNCALL
push %ecx /* need to push a word to keep stack frame intact
upon return; the word must be the return address. */
jb 1f
ret
where KERNCALL is architecture-dependent: a few assembly instructions that tell the CPU to switch into supervisor mode and enter kernel space:
./lib/libc/amd64/SYS.h:#define KERNCALL movq %rcx, %r10; syscall
./lib/libc/i386/SYS.h:#define KERNCALL int $0x80
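So here's the thing: when you compile a program, the compiler may pass a function's parameters in registers instead of putting them on the stack. This works because the compiler emits code for both the caller and the callee, so both sides are aware of the sleight of hand. Not so for the kernel, which is compiled separately and has no idea which parameter the compiler put in which register; with the i386 convention used by the wrapper above, all parameters intended for a system call must therefore be on the userland program's stack. On the kernel side, Linux uses asmlinkage to pin down the calling convention of functions called from assembly: on 32-bit x86 it expands to __attribute__((regparm(0))), so the handler reads its arguments from the kernel stack, where the entry code has saved the user's registers. On most other architectures the normal C calling convention already matches what the entry code provides, which is why asmlinkage is defined to nothing there.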
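On Linux, the C library offers the same kind of generic entry point as the syscall wrapper above: syscall(2) takes the system call number followed by the arguments and performs the trap. On x86-64 the arguments travel in registers; the movq %rcx, %r10 in the amd64 KERNCALL exists because the syscall instruction overwrites %rcx, so the fourth argument has to go in %r10. A minimal sketch comparing raw calls with the ordinary wrappers; raw_getpid and raw_write are hypothetical names.

```c
#define _GNU_SOURCE            /* for syscall() in <unistd.h> */
#include <sys/syscall.h>       /* SYS_* system call numbers */
#include <unistd.h>

/* Hypothetical helper: invoke getpid by number through syscall(2). */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* write(2) by number: fd, buffer and length follow SYS_write.
 * Like the libc wrappers, syscall() returns -1 and sets errno on error. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```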
How do I open a directory at kernel level using the file descriptor for that directory?
To add to caf's answer mentioning vfs_readdir(): reading and writing files from within the kernel is considered unsafe (except for /proc, which acts as an interface to internal data structures in the kernel).
The reasons are well described in this linuxjournal article, although it also provides a hack to access files. I don't think their method could easily be modified to work for directories. A more correct approach is accessing the kernel's filesystem inode entries, which is what vfs_readdir does.
Inodes are filesystem objects such as regular files, directories, FIFOs and other
beasts. They live either on the disc (for block device filesystems)
or in the memory (for pseudo filesystems).
Notice that vfs_readdir() expects a file * parameter. To obtain a file structure pointer from a user space file descriptor, you should use the kernel's file descriptor table.
The kernel.org files documentation says the following on doing so safely:
To look up the file structure given an fd, a reader must use either fcheck() or fcheck_files() APIs. These take care of barrier requirements due to lock-free lookup.
An example:
rcu_read_lock();
file = fcheck_files(files, fd);
if (file) {
// Handling of the file structures is special.
// Since the look-up of the fd (fget() / fget_light())
// are lock-free, it is possible that look-up may race with
// the last put() operation on the file structure.
// This is avoided using atomic_long_inc_not_zero() on ->f_count
if (atomic_long_inc_not_zero(&file->f_count))
*fput_needed = 1;
else
/* Didn't get the reference, someone's freed */
file = NULL;
}
rcu_read_unlock();
....
return file;
atomic_long_inc_not_zero() detects if refcounts is already zero or goes to zero during increment. If it does, we fail fget()/fget_light().
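The "increment only if not already zero" rule is what makes the lock-free lookup safe: once the count has reached zero the file is being freed and must not be revived. This user-space sketch built on a C11 compare-and-swap loop has the same semantics as the kernel's atomic_long_inc_not_zero() (the function name inc_not_zero is hypothetical):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space equivalent of atomic_long_inc_not_zero():
 * add one to *count unless it is zero; report whether we did. */
bool inc_not_zero(atomic_long *count)
{
    long old = atomic_load(count);
    while (old != 0) {
        /* On failure, `old` is reloaded with the current value and the
         * loop re-checks it, so a concurrent drop to zero is noticed. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}
```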
Finally, take a look at filldir_t, the type of vfs_readdir()'s second parameter.
Shared semaphore between user and kernel spaces
One solution I can think of is to have a /proc (or /sys or whatever) file in a main kernel module where writing 0/1 to it (or reading from/writing to it) would cause it to issue an up/down on a semaphore. Exporting that semaphore allows other kernel modules to access it directly, while user applications would go through the /proc file system.
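The dispatch the /proc write handler would perform is small enough to prototype in user space. In this sketch a POSIX unnamed semaphore stands in for the kernel semaphore (sem_post ≈ up, sem_wait ≈ down), and the "0/1 ... up/down" pairing from the text is taken literally: writing "0" issues an up and writing "1" a down. The handler name sem_proc_write is hypothetical; a real handler would first copy the byte in with copy_from_user().

```c
#include <errno.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical stand-in for the /proc write handler: the first byte
 * selects the operation. Returns len on success (as a write handler
 * would), -EINVAL for unknown input, or -errno if the down fails. */
long sem_proc_write(sem_t *sem, const char *buf, size_t len)
{
    if (len == 0)
        return -EINVAL;
    switch (buf[0]) {
    case '0':
        sem_post(sem);                 /* up(): may wake a waiter */
        return (long)len;
    case '1':
        if (sem_wait(sem) != 0)        /* down(): blocks while count is 0 */
            return -errno;             /* e.g. -EINTR, like down_interruptible() */
        return (long)len;
    default:
        return -EINVAL;
    }
}
```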
I'd still wait to see if the original question has an answer.