Why Is the Close Function Called Release in 'struct file_operations' in the Linux Kernel

Why is the close function called release in `struct file_operations` in the Linux kernel?

Because an open file may be referenced by several descriptors (for example after dup() or fork()), closing one descriptor does not necessarily invoke release: only the close that drops the last reference to the file does. So there is a difference between close and release.

release: called at the last close(2) of this file, i.e. when file->f_count reaches 0. Although defined as returning int, the return value is ignored by VFS (see fs/file_table.c:__fput()).
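The last-close rule can be observed from user space with any file type whose release handler has a visible effect. A minimal sketch, assuming Linux: the write end of a pipe is duplicated with dup(), so two descriptors reference the same struct file. The reader only sees end-of-file once the pipe's release handler runs, which happens at the last close(), not the first. The names last_close_demo() and try_read() are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* One non-blocking read(): 0 means EOF, -1 with errno set otherwise */
static ssize_t try_read(int fd)
{
    char c;
    return read(fd, &c, 1);
}

/*
 * Two descriptors (from dup()) share one struct file for the write end.
 * The first close() only drops f_count from 2 to 1, so the pipe still has
 * a writer; the second close() drops it to 0 and the release handler runs.
 */
int last_close_demo(ssize_t *after_first_close, int *errno_after_first,
                    ssize_t *after_last_close)
{
    int p[2];

    if (pipe(p) < 0)
        return -1;
    /* Non-blocking reader: an empty pipe that still has writers gives EAGAIN */
    if (fcntl(p[0], F_SETFL, O_NONBLOCK) < 0)
        return -1;

    int w2 = dup(p[1]);              /* second reference to the same file */
    if (w2 < 0)
        return -1;

    close(p[1]);                     /* not the last reference: no release */
    *after_first_close = try_read(p[0]);
    *errno_after_first = errno;

    close(w2);                       /* last reference: release runs */
    *after_last_close = try_read(p[0]);

    close(p[0]);
    return 0;
}
```

After the first close() the read fails with EAGAIN (a writer still exists); after the second it returns 0 (EOF).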

Problem with .release behavior in file_operations

The release() handler is not allowed to make close() fail.

You could require your userspace programs to call fsync() on the file descriptor before close(), if they want to find out about all possible errors; then implement your final error checking in the fsync() handler.
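On the user-space side, that convention could look like the following sketch. The helper name fsync_and_close() is hypothetical; it reports the first error from either call, so an error returned by the driver's fsync handler reaches the caller.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Flush with fsync() so that errors reported by the driver's fsync
 * handler reach the caller, then close(). Returns 0 on success, or -1
 * with errno set from the first call that failed.
 */
int fsync_and_close(int fd)
{
    int saved_errno = 0;

    if (fsync(fd) < 0)
        saved_errno = errno;          /* remember the first error */

    /* On Linux the descriptor is freed even if close() fails: never retry */
    if (close(fd) < 0 && saved_errno == 0)
        saved_errno = errno;

    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```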

Why do we actually use open and release in kernel module programming? What is the calling pattern of these methods?

Open method:
The open method is implemented by a device driver to perform any initialization needed by later operations.
It mainly performs the following tasks:
1. Check that the device is ready (e.g. no hardware problems).
2. Initialize the device if it is opened for the first time.
3. Allocate and fill any data structure the driver needs (typically stored in filp->private_data).
The calling pattern is the same as for a regular file: when a program calls open() on the device's special file, the kernel invokes the driver's open method.

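Release method:
The release method is the reverse of open: it frees whatever open allocated in filp->private_data and shuts the device down on the last close.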
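The pairing can be illustrated with a small user-space model. This is not kernel code: struct model_device, struct model_file, model_open() and model_release() are hypothetical stand-ins, with private_data playing the role of struct file's field of the same name. Open checks the device, allocates per-open data and initializes the device on the first open; release undoes those steps in reverse.

```c
#include <errno.h>
#include <stdlib.h>

struct model_device {
    int present;      /* stands in for "hardware detected" */
    int hw_ready;     /* stands in for "device initialized" */
    int open_count;   /* number of currently open files */
};

struct model_file {
    void *private_data;  /* like struct file::private_data */
};

struct per_open_state {
    char buffer[64];     /* hypothetical per-open data */
};

int model_open(struct model_device *dev, struct model_file *file)
{
    struct per_open_state *st;

    /* Check that the device is ready */
    if (!dev->present)
        return -ENODEV;

    /* Allocate and fill the per-open data structure */
    st = calloc(1, sizeof(*st));
    if (!st)
        return -ENOMEM;

    /* Initialize the device on the first open only */
    if (dev->open_count == 0)
        dev->hw_ready = 1;

    dev->open_count++;
    file->private_data = st;
    return 0;
}

void model_release(struct model_device *dev, struct model_file *file)
{
    /* Reverse of the allocation done in model_open() */
    free(file->private_data);
    file->private_data = NULL;

    /* Shut the device down when the last open file is released */
    if (--dev->open_count == 0)
        dev->hw_ready = 0;
}
```

The model is single-threaded; a real driver would protect open_count with a lock or an atomic, since open() can run concurrently.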

Does a kernel driver's `release` file-operations handler wait for other fops to finish?

Could one thread of a process be inside the ioctl handler for a file (or fd) while another thread of the same process is inside the release handler?

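No. The release entry point is called when the reference counter on the file entry drops to 0. ioctl() holds a reference on the file for the whole call: fdget() increments the counter whenever the descriptor table is shared, as it is between the threads of a process. So the release entry point will not be called while an ioctl() is in progress.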

Foreword

The source code discussed below is:

  • GLIBC 2.31
  • Linux 5.4

GLIBC's pthread management

GLIBC's pthread_create() actually invokes the clone() system call with the following flags:

CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID

According to the clone() manual, the CLONE_FILES flag makes the threads of a process share the same file descriptor table. Any file descriptor created by one thread is also valid in the other threads. Similarly, if one thread closes a file descriptor, or changes its associated flags (using the fcntl() F_SETFD operation), the other threads are also affected.
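This sharing can be checked from user space. A minimal sketch, assuming Linux with POSIX threads (link with -lpthread on older glibc); shared_table_demo() and closer() are hypothetical names. A secondary thread closes a descriptor, and the main thread then finds it invalid.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

/* Thread body: close the descriptor passed in arg */
static void *closer(void *arg)
{
    close(*(int *)arg);
    return NULL;
}

/*
 * Returns the errno seen by fcntl(F_GETFD) in the main thread after the
 * other thread closed the descriptor (EBADF when the table is shared),
 * 0 if the descriptor unexpectedly survived, -1 on setup error.
 */
int shared_table_demo(void)
{
    pthread_t tid;
    int fd = open("/dev/null", O_RDONLY);

    if (fd < 0)
        return -1;
    if (pthread_create(&tid, NULL, closer, &fd) != 0) {
        close(fd);
        return -1;
    }
    pthread_join(tid, NULL);

    if (fcntl(fd, F_GETFD) == -1)
        return errno;
    close(fd);
    return 0;
}
```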

clone() on the kernel side

When clone() is passed CLONE_FILES, the files_struct is not duplicated; instead, its reference counter is incremented. As a consequence, the task structures of both threads point to the same files_struct (files field):

The task structure is defined in include/linux/sched.h:

struct task_struct {
	[...]
	/* Open file information: */
	struct files_struct *files; /// <==== Table of open files shared between threads
	[...]
};

In kernel/fork.c, the clone() service calls copy_files() which, when CLONE_FILES is set, increments the reference counter on the files_struct:

static int copy_files(unsigned long clone_flags, struct task_struct *tsk)
{
	struct files_struct *oldf, *newf;
	int error = 0;

	/*
	 * A background process may not have any files ...
	 */
	oldf = current->files;
	if (!oldf)
		goto out;

	if (clone_flags & CLONE_FILES) {
		atomic_inc(&oldf->count); // <==== Ref counter incremented: files_struct is shared
		goto out;
	}

	newf = dup_fd(oldf, &error);
	if (!newf)
		goto out;

	tsk->files = newf;
	error = 0;
out:
	return error;
}
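For contrast, fork() does not pass CLONE_FILES, so copy_files() takes the dup_fd() path and the child receives its own copy of the descriptor table. A minimal sketch, assuming Linux; private_table_demo() is a hypothetical name. The child closes a descriptor and the parent's copy stays valid.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent's descriptor survived the child's close(),
 * 0 if it did not, -1 on error.
 */
int private_table_demo(void)
{
    int status;
    int fd = open("/dev/null", O_RDONLY);

    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        close(fd);      /* affects only the child's copy of the table */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }

    int alive = fcntl(fd, F_GETFD) != -1;
    close(fd);
    return alive;
}
```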

The files_struct is defined in include/linux/fdtable.h:

/*
 * Open file table structure
 */
struct files_struct {
	/*
	 * read mostly part
	 */
	atomic_t count; // <==== Reference counter
	bool resize_in_progress;
	wait_queue_head_t resize_wait;

	struct fdtable __rcu *fdt;
	struct fdtable fdtab;
	/*
	 * written part on a separate cache line in SMP
	 */
	spinlock_t file_lock ____cacheline_aligned_in_smp;
	unsigned int next_fd;
	unsigned long close_on_exec_init[1];
	unsigned long open_fds_init[1];
	unsigned long full_fds_bits_init[1];
	struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};

ioctl() operation

The ioctl() system call is defined in fs/ioctl.c. It first calls fdget() to take a reference on the file entry, performs the requested operation, and then calls fdput() to drop that reference:

int ksys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
{
	int error;
	struct fd f = fdget(fd);

	if (!f.file)
		return -EBADF;
	error = security_file_ioctl(f.file, cmd, arg);
	if (!error)
		error = do_vfs_ioctl(f.file, fd, cmd, arg);
	fdput(f);
	return error;
}

SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
	return ksys_ioctl(fd, cmd, arg);
}
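The reference counting that keeps release from running during an ioctl() can be modeled in a few lines of user-space C. This is a simplified sketch, not kernel code: struct fake_file and the fake_* functions are hypothetical stand-ins for struct file, the reference taken by fdget(), the drop done by fput()/fdput() and the release handler (the real fput() defers the final __fput() rather than calling release inline). Note that in Linux 5.4, fdget() only takes an extra reference when the descriptor table is shared, which is exactly the multithreaded case.

```c
#include <stdatomic.h>

struct fake_file {
    atomic_long f_count;   /* like struct file::f_count */
    int released;          /* set by the model release handler */
};

static void fake_release(struct fake_file *f)
{
    f->released = 1;
}

/* open(): the descriptor table holds the first reference */
static void fake_open(struct fake_file *f)
{
    atomic_init(&f->f_count, 1);
    f->released = 0;
}

/* Take a reference, as fdget() does when the table is shared */
static void fake_get(struct fake_file *f)
{
    atomic_fetch_add(&f->f_count, 1);
}

/* Drop a reference; the holder of the last one triggers release */
static void fake_put(struct fake_file *f)
{
    /* atomic_fetch_sub() returns the previous value */
    if (atomic_fetch_sub(&f->f_count, 1) == 1)
        fake_release(f);
}
```

Replaying the scenario from the question: fake_open() (count 1), the ioctl() thread calls fake_get() (count 2), the other thread's close() calls fake_put() (count 1, no release), and only the fake_put() at the end of the ioctl() brings the count to 0 and triggers release.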

The file entry is defined in include/linux/fs.h. Its reference counter is the f_count field:

struct file {
	union {
		struct llist_node fu_llist;
		struct rcu_head fu_rcuhead;
	} f_u;
	struct path f_path;
	struct inode *f_inode; /* cached value */
	const struct file_operations *f_op;

	/*
	 * Protects f_ep_links, f_flags.
	 * Must not be taken from IRQ context.
	 */
	spinlock_t f_lock;
	enum rw_hint f_write_hint;
	atomic_long_t f_count; // <===== Reference counter
	unsigned int f_flags;
	[...]
} __randomize_layout
  __attribute__((aligned(4)));

Example

Here is a simple device driver in which the file operations merely display a message when they are triggered. The ioctl() entry makes the caller sleep 5 seconds:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/cdev.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
#include <linux/delay.h>

MODULE_LICENSE("GPL");

#define DEVICE_NAME "device"

static int device_open(struct inode *, struct file *);
static int device_release(struct inode *, struct file *);
static ssize_t device_read(struct file *, char __user *, size_t, loff_t *);
static ssize_t device_write(struct file *, const char __user *, size_t, loff_t *);
static long int device_ioctl(struct file *, unsigned int, unsigned long);
static int device_flush(struct file *, fl_owner_t);

static const struct file_operations fops = {
	.owner = THIS_MODULE,
	.read = device_read,
	.write = device_write,
	.unlocked_ioctl = device_ioctl,
	.open = device_open,
	.flush = device_flush,
	.release = device_release
};

static struct cdev *device_cdev;
static dev_t deviceNumbers;

static int __init init(void)
{
	// This returns the major number chosen dynamically in deviceNumbers
	int ret = alloc_chrdev_region(&deviceNumbers, 0, 1, DEVICE_NAME);

	if (ret < 0) {
		printk(KERN_ALERT "Error registering: %d\n", ret);
		return ret;
	}

	// cdev_alloc() returns an initialized cdev: set its ops directly
	// (cdev_init() on it would discard its dynamic release and leak it)
	device_cdev = cdev_alloc();
	if (!device_cdev) {
		unregister_chrdev_region(deviceNumbers, 1);
		return -ENOMEM;
	}
	device_cdev->ops = &fops;
	device_cdev->owner = THIS_MODULE;

	ret = cdev_add(device_cdev, deviceNumbers, 1);
	if (ret < 0) {
		printk(KERN_ALERT "Error adding device: %d\n", ret);
		kobject_put(&device_cdev->kobj); // frees the cdev allocated above
		unregister_chrdev_region(deviceNumbers, 1);
		return ret;
	}

	printk(KERN_INFO "Device initialized (major number is %d)\n", MAJOR(deviceNumbers));

	return 0;
}

static void __exit cleanup(void)
{
	// Remove the device before giving back its numbers
	cdev_del(device_cdev);

	unregister_chrdev_region(deviceNumbers, 1);

	printk(KERN_INFO "Device unloaded\n");
}

static int device_open(struct inode *inode, struct file *file)
{
	printk(KERN_INFO "Device open\n");
	return 0;
}

static int device_flush(struct file *file, fl_owner_t id)
{
	printk(KERN_INFO "Device flush\n");
	return 0;
}

static int device_release(struct inode *inode, struct file *file)
{
	printk(KERN_INFO "Device released\n");
	return 0;
}

static ssize_t device_write(struct file *filp, const char __user *buff, size_t len, loff_t *off)
{
	printk(KERN_INFO "Device write\n");
	return len;
}

static ssize_t device_read(struct file *filp, char __user *buff, size_t len, loff_t *off)
{
	printk(KERN_INFO "Device read\n");
	return 0;
}

static long int device_ioctl(struct file *file, unsigned int ioctl_num, unsigned long ioctl_param)
{
	printk(KERN_INFO "Device ioctl enter\n");
	msleep_interruptible(5000);
	printk(KERN_INFO "Device ioctl out\n");
	return 0;
}

module_init(init);
module_exit(cleanup);

Here is a user-space program which involves the main thread and a secondary one. The main thread opens the above device and waits for the secondary thread to start (barrier) before closing the device after 1 second. Meanwhile, the secondary thread calls ioctl() on the above device, which makes it sleep 5 seconds. Then it calls ioctl() a second time before exiting.

The goal is to have the main thread close the device file while the secondary thread is running the ioctl().

#include <stdio.h>
#include <pthread.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <errno.h>

static int dev_fd;

static pthread_barrier_t barrier;

void *entry(void *arg)
{
    int rc;

    printf("Thread running...\n");

    // Rendez-vous with main thread
    pthread_barrier_wait(&barrier);

    // 1st call: sleeps 5 s in the driver while main() closes dev_fd
    errno = 0;
    rc = ioctl(dev_fd, 0);
    printf("rc = %d, errno = %d\n", rc, errno);

    // 2nd call: dev_fd has been closed by main() in the meantime
    errno = 0;
    rc = ioctl(dev_fd, 0);
    printf("rc = %d, errno = %d\n", rc, errno);

    return NULL;
}

int main(void)
{
    pthread_t tid;

    dev_fd = open("/dev/device", O_RDWR);
    if (dev_fd < 0) {
        perror("open(/dev/device)");
        return 1;
    }

    pthread_barrier_init(&barrier, NULL, 2);

    pthread_create(&tid, NULL, entry, NULL);

    pthread_barrier_wait(&barrier);

    sleep(1);

    close(dev_fd);

    pthread_join(tid, NULL);

    pthread_barrier_destroy(&barrier);

    return 0;
}

Installation of the kernel module:

$ sudo insmod ./device.ko
$ dmesg
[13270.589766] Device initialized (major number is 237)
$ sudo mknod /dev/device c 237 0
$ sudo chmod 666 /dev/device
$ ls -l /dev/device
crw-rw-rw- 1 root root 237, 0 janv. 27 10:55 /dev/device

Running the program shows that the first ioctl() makes the thread wait 5 seconds, while the second one fails with EBADF (9) because the main thread closed the device file in the meantime:

$ gcc p1.c -lpthread
$ ./a.out
Thread running...
rc = 0, errno = 0
rc = -1, errno = 9

In the kernel log, we can see that the close() in the main thread merely triggered a flush() operation on the device while the first ioctl() was still in progress in the secondary thread. close() removed the descriptor from the shared table and dropped the table's reference, but the in-flight ioctl() still held its own reference on the file entry. Once the first ioctl() returned, fdput() dropped the last reference (the counter reached 0), so the kernel called release(). The second ioctl() never reached the device because the descriptor no longer existed in the table, hence the EBADF error:

[13270.589766] Device initialized (major number is 237)
[13656.862951] Device open <==== Open() in the main thread
[13656.863315] Device ioctl enter <==== 1st ioctl() in secondary thread
[13657.863523] Device flush <==== 1 s later, flush() = close() in the main thread
[13661.941238] Device ioctl out <==== 5 s later, the 1st ioctl() returns
[13661.941244] Device released <==== The file is released because the reference counter reached 0

Linux Device Driver -- file operations not working

You should add terminating "\n" at the end of your prints to force their flush into the kernel log buffer. Here is your module with some enhancement suggestions:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/fs.h>
#include <linux/kdev_t.h>
#include <linux/cdev.h>
#include <linux/uaccess.h>
#include <linux/slab.h>

MODULE_LICENSE("Dual BSD/GPL");

#define DEVICE_NAME "device"

static int device_open(struct inode *, struct file *);
static int device_release(struct inode *, struct file *);
static ssize_t device_read(struct file *, char __user *, size_t, loff_t *);
static ssize_t device_write(struct file *, const char __user *, size_t, loff_t *);
static long int device_ioctl(struct file *, unsigned int, unsigned long);

static const struct file_operations fops =
{
	.owner = THIS_MODULE,
	.read = device_read,
	.write = device_write,
	.unlocked_ioctl = device_ioctl,
	.open = device_open,
	.release = device_release
};

static struct cdev *device_cdev;
static dev_t deviceNumbers;

static int __init init(void) // <------ Add __init keyword for kernel cleanups
{
	// This returns the major number chosen dynamically in deviceNumbers
	int ret = alloc_chrdev_region(&deviceNumbers, 0, 1, DEVICE_NAME);

	if (ret < 0) {
		printk(KERN_ALERT "Error registering: %d\n", ret);
		return ret; // <------ Propagate the error code
	}

	// cdev_alloc() returns an initialized cdev: set its ops directly
	// (cdev_init() on it would discard its dynamic release and leak it)
	device_cdev = cdev_alloc();
	if (!device_cdev) {
		unregister_chrdev_region(deviceNumbers, 1);
		return -ENOMEM;
	}
	device_cdev->ops = &fops;
	device_cdev->owner = THIS_MODULE;

	ret = cdev_add(device_cdev, deviceNumbers, 1);
	if (ret < 0) {
		printk(KERN_ALERT "Error adding device: %d\n", ret);
		kobject_put(&device_cdev->kobj); // frees the cdev allocated above
		unregister_chrdev_region(deviceNumbers, 1);
		return ret;
	}

	printk(KERN_INFO "Device initialized (major number is %d)\n", MAJOR(deviceNumbers));

	return 0;
}

static void __exit cleanup(void) // <------ Add __exit keyword for kernel cleanups
{
	// Remove the device before giving back its numbers
	cdev_del(device_cdev);

	unregister_chrdev_region(deviceNumbers, 1);

	printk(KERN_INFO "Device unloaded\n");
}

static int device_open(struct inode *inode, struct file *file)
{
	printk(KERN_INFO "Device open\n");
	return 0;
}

static int device_release(struct inode *inode, struct file *file)
{
	printk(KERN_INFO "Device released\n");
	return 0;
}

static ssize_t device_write(struct file *filp, const char __user *buff, size_t len, loff_t *off)
{
	printk(KERN_INFO "Device write\n");
	return len; // <-------------- All bytes consumed: stops the write
}

static ssize_t device_read(struct file *filp, char __user *buff, size_t len, loff_t *off)
{
	printk(KERN_INFO "Device read\n");
	return 0; // <-------------- 0 means end of file: stops the read
}

static long int device_ioctl(struct file *file, unsigned int ioctl_num, unsigned long ioctl_param)
{
	printk(KERN_INFO "Device IOCTL\n");
	return 0;
}

module_init(init);
module_exit(cleanup);

Check the current settings for the kernel log level.

$ cat /proc/sys/kernel/printk
4 4 1 7

In the output above, the first field (the console log level) means that only messages with a log level lower than 4 are printed on the console.

The values accepted by printk() are:

Level          Value  Meaning
KERN_EMERG     0      System is unusable
KERN_ALERT     1      Action must be taken immediately
KERN_CRIT      2      Critical conditions
KERN_ERR       3      Error conditions
KERN_WARNING   4      Warning conditions
KERN_NOTICE    5      Normal but significant condition
KERN_INFO      6      Informational
KERN_DEBUG     7      Debug-level messages

So KERN_INFO (6) is greater than 4, and such messages are not printed on the console!
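The console filtering rule amounts to a single comparison. A small sketch: parse_printk_levels() and shown_on_console() are hypothetical helpers, and the text they parse would be the content of /proc/sys/kernel/printk (console, default message, minimum console and default console log levels).

```c
#include <stdio.h>

/* Parse the four whitespace-separated fields; returns 0 on success */
int parse_printk_levels(const char *text, int levels[4])
{
    return sscanf(text, "%d %d %d %d", &levels[0], &levels[1],
                  &levels[2], &levels[3]) == 4 ? 0 : -1;
}

/* A message reaches the console only if its level is lower than the console level */
int shown_on_console(int msg_level, int console_level)
{
    return msg_level < console_level;
}
```

With "4 4 1 7", shown_on_console(6, 4) is false (KERN_INFO hidden); after writing "7 4 1 7", shown_on_console(6, 7) is true.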

We modify the configuration:

$ sudo sh -c "echo 7 4 1 7 > /proc/sys/kernel/printk"
$ cat /proc/sys/kernel/printk
7 4 1 7

I built your module with the suggested modifications and tried it on Linux 5.4.0-58:


$ uname -a
Linux xxxx 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ sudo insmod ./device.ko
$ dmesg
[...]
[ 7244.516706] Device initialized (major number is 235)
$ lsmod
Module Size Used by
device 16384 0
[...]
$ cat /proc/devices
Character devices:
[...]
235 device
$ sudo mknod /dev/device c 235 0
$ ls -l /dev/device
crw-r--r-- 1 root root 235, 0 janv. 3 10:33 /dev/device
$ sudo sh -c "echo foo > /dev/device"
$ dmesg
[...]
[ 7244.516706] Device initialized (major number is 235)
[ 7311.507652] Device open
[ 7311.507672] Device write
[ 7311.507677] Device released
$ sudo rmmod device
$ dmesg
[...]
[ 7244.516706] Device initialized (major number is 235)
[ 7311.507652] Device open
[ 7311.507672] Device write
[ 7311.507677] Device released
[ 7361.523964] Device unloaded
$ sudo rm /dev/device

