Userspace vs. Kernel Space Driver

Embedded Linux: kernel drivers vs user space drivers?

Off the top of my head:

Userland approach pros:

  • faster to develop / easier to debug
  • if it is buggy and crashes, it cannot take down your whole system

Userland approach cons:

  • "performances" - which I'll leave as a very vague concept here and
    today...

In the case of your application, putting that in perspective:

  • since we can safely bet that humidity does not change dramatically
    in a short amount of time,
  • and/or your sensor has some non-negligible hysteresis anyway (if
    only for mechanical reasons - e.g. a drop of water falling on it
    will not disappear in a millisecond),
  • ...and you probably don't plan to send humidity measurements every
    millisecond - do you?
  • ...and even if you did, most of the latency (as opposed to
    "performance") will come from the part that builds the JSON and
    sends it to the server (both clearly jobs for userland), and -
    though that may be none of your business, it is still part of the
    use case - from network conditions and processing time on the
    server,

...all in all, I would 200% go with the userland approach.

Kernel space may be technically more "fun" or "rewarding", but engineering puts "pragmatic" before "fun". :-)

Is mmap() the same in kernel space and user space?

The mmap(2) implementation is not as illustrative because it carries compatibility artefacts, so here's an example with truncate(2):

  1. Userspace calls into kernel space, which starts with a specially linked function.
  2. That function calls an internal function.
  3. The internal function, e.g. do_sys_truncate(), calls into an actual kernel API.
  4. The kernel API, e.g. vfs_truncate(), does the heavy lifting.
  5. That kernel API is what's being exported to the rest of the kernel code.

So, essentially, the userspace mmap() is just another, more complex path to an internal kernel API that is accessible to the rest of the kernel via another, simpler path.
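
To make the chain concrete, here is a simplified sketch of those five steps, roughly as they appear in fs/open.c (the helper names do_sys_truncate() and vfs_truncate() are from the answer above; exact signatures and error handling vary by kernel version):

    #include <linux/fcntl.h>     /* AT_FDCWD */
    #include <linux/fs.h>        /* vfs_truncate() */
    #include <linux/namei.h>     /* user_path_at(), LOOKUP_FOLLOW */
    #include <linux/syscalls.h>  /* SYSCALL_DEFINE2() */

    static long do_sys_truncate(const char __user *pathname, loff_t length);

    /* Steps 1+2: the specially linked entry point, generated by a macro,
     * immediately hands off to an internal function. */
    SYSCALL_DEFINE2(truncate, const char __user *, path, long, length)
    {
            return do_sys_truncate(path, length);
    }

    /* Step 3: the internal function resolves the userspace path string,
     * then calls the actual kernel API. */
    static long do_sys_truncate(const char __user *pathname, loff_t length)
    {
            struct path path;
            int error = user_path_at(AT_FDCWD, pathname, LOOKUP_FOLLOW, &path);

            if (!error) {
                    error = vfs_truncate(&path, length);
                    path_put(&path);
            }
            return error;
    }

    /* Steps 4+5: the heavy lifting, exported to the rest of the kernel. */
    long vfs_truncate(const struct path *path, loff_t length)
    {
            /* permission checks, security hooks, then do_truncate() ... */
            return 0;  /* real body elided */
    }
    EXPORT_SYMBOL_GPL(vfs_truncate);

The rest of the kernel simply calls vfs_truncate() directly, skipping steps 1-3 entirely.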

Linux kernel space and user space

You've got the general idea mostly right, but make this adjustment: there's only one "kernelspace" for the whole machine, and all processes share it.

When a process is active, it can either be running in "user mode" or "kernel mode".

In user mode, the instructions being executed by the CPU are in the userspace side of the memory map. The program is running its own code, or code from a userspace library. In user mode, a process has limited abilities. There is a flag in the CPU which tells it not to allow the use of privileged instructions, and kernel memory, although it exists in the process's memory map, is inaccessible. (You wouldn't want to let any program just read and write the kernel's memory - all security would be gone.)

When a process wants to do something other than move data around in its own (userspace) virtual memory, like open a file for example, it must make a syscall. Each CPU architecture has its own unique quirky method of making syscalls, but they all boil down to this: a magic instruction is executed, the CPU turns on the "privileged mode" flag, and jumps to a special address in kernelspace, the "syscall entry point".
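
To make the "magic instruction" concrete, here is a minimal userspace sketch that enters the kernel through the raw syscall path, using glibc's syscall(2) wrapper instead of the usual libc open() wrapper:

    #include <fcntl.h>        /* AT_FDCWD, O_RDONLY */
    #include <stdio.h>        /* perror() */
    #include <sys/syscall.h>  /* SYS_openat */
    #include <unistd.h>       /* syscall(), close() */

    int main(void)
    {
            /* Same kernel entry the libc open() wrapper would reach. On
             * x86-64, syscall() loads the number into rax and executes the
             * `syscall` instruction, which flips the CPU into privileged
             * mode and jumps to the kernel's syscall entry point. */
            long fd = syscall(SYS_openat, AT_FDCWD, "foo", O_RDONLY);

            if (fd < 0)
                    perror("openat");
            else
                    close((int)fd);
            return 0;
    }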

Now the process is running in kernel mode. Instructions being executed are located in kernel memory, and they can read and write any memory they want to. The kernel examines the request that the process just made and decides what to do with it.

In the open example, the kernel receives 2 or 3 parameters corresponding to the arguments of int open(const char *filename, int flags[, int mode]). The first argument provides an example of when kernelspace needs access to userspace. You said open("foo", O_RDONLY) so the string "foo" is part of your program in userspace. The syscall mechanism only passed a pointer, not a string, so the kernel must read the string from user memory.
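
A hedged sketch of that user-to-kernel copy: the real open(2) path does it inside getname(), but the underlying primitive is strncpy_from_user(); fetch_user_path() below is a made-up helper for illustration:

    #include <linux/errno.h>
    #include <linux/limits.h>   /* PATH_MAX */
    #include <linux/uaccess.h>  /* strncpy_from_user() */

    /* Hypothetical helper: `upath` points into the calling process's
     * address space; `kbuf` is a kernel buffer of PATH_MAX bytes. */
    static long fetch_user_path(const char __user *upath, char *kbuf)
    {
            /* Copies at most PATH_MAX bytes, faulting safely if the
             * user pointer is bad. */
            long n = strncpy_from_user(kbuf, upath, PATH_MAX);

            if (n < 0)
                    return n;               /* -EFAULT: bad user pointer */
            if (n == PATH_MAX)
                    return -ENAMETOOLONG;   /* no NUL terminator in range */
            return 0;
    }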

To find the requested file, the kernel may consult with filesystem drivers (to figure out where the file is) and block device drivers (to load the necessary blocks from disk) or network device drivers and protocols (to load the file from a remote source). All of those things are part of the kernel, i.e. in kernelspace, regardless of whether they are built-in or were loaded as modules.

If the request can't be satisfied immediately, the kernel may put the process to sleep. That means the process will be taken off the CPU until a response is received from the disk or network. Another process may get a chance to run now. Later, when the response comes in, your process starts running again (still in kernel mode). Now that it's found the file, the open syscall can finish up (check the permissions, create a file descriptor) and return to userspace.
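
As a sketch of what "put the process to sleep" looks like from a driver's side (mydev and its fields are hypothetical), a read handler typically blocks on a wait queue until the interrupt handler signals that data has arrived:

    #include <linux/fs.h>
    #include <linux/wait.h>

    /* Hypothetical device state; `ready` is set by the IRQ handler. */
    struct mydev {
            wait_queue_head_t wq;
            bool ready;
    };

    static ssize_t mydev_read(struct file *f, char __user *buf,
                              size_t len, loff_t *off)
    {
            struct mydev *dev = f->private_data;

            /* Takes the process off the CPU until the interrupt handler
             * sets dev->ready and calls wake_up_interruptible(&dev->wq). */
            if (wait_event_interruptible(dev->wq, dev->ready))
                    return -ERESTARTSYS;    /* a signal woke us instead */

            /* ... now copy the device data to `buf` with copy_to_user() ... */
            return 0;
    }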

Returning to userspace is a simple matter of putting the CPU back in non-privileged mode and restoring the registers to what they were before the user->kernel transition, with the instruction pointer pointing at the instruction after the magic syscall instruction.

Besides syscalls, there are other things that can cause a transition from user mode to kernel mode, including:

  1. page faults - if your process accesses a virtual memory address that doesn't have a physical address assigned to it, the CPU enters kernel mode and jumps to the page fault handler. The kernel then decides whether the virtual address is valid or not, and it either creates a physical page and resumes the process in userspace where it left off, or sends a SIGSEGV (a minimal demo follows this list).
  2. interrupts - some hardware (network, disk, serial port, etc.) notifies the CPU that it requires attention. The CPU enters kernel mode and jumps to a handler, the kernel responds to it and then resumes the userspace process that was running before the interrupt.
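
Here is the promised demo of the first case, seen from userspace: the invalid access below traps into the kernel's page fault handler, which finds no valid mapping and delivers SIGSEGV:

    #include <signal.h>
    #include <unistd.h>

    static void on_segv(int sig)
    {
            /* We arrive here after the CPU trapped into kernel mode and
             * the page fault handler decided the address was invalid. */
            static const char msg[] = "kernel delivered SIGSEGV\n";

            (void)sig;
            write(2, msg, sizeof msg - 1);
            _exit(1);
    }

    int main(void)
    {
            signal(SIGSEGV, on_segv);
            /* No mapping exists at this address, so the access faults. */
            *(volatile int *)1 = 42;
            return 0;
    }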

Loading a module is done with a syscall that asks the kernel to copy the module's code and data into kernelspace and run its initialization code in kernel mode.
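
For illustration, the classic minimal module: insmod hands it to the kernel via the init_module(2)/finit_module(2) syscall, and hello_init() then runs in kernel mode:

    #include <linux/init.h>
    #include <linux/module.h>

    /* Runs in kernel mode, right after insmod's init_module(2) /
     * finit_module(2) syscall has copied the module into kernelspace. */
    static int __init hello_init(void)
    {
            pr_info("hello: loaded\n");
            return 0;
    }

    static void __exit hello_exit(void)
    {
            pr_info("hello: unloaded\n");
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");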

This is pretty long, so I'm stopping. I hope the walk-through focusing on user-kernel transitions has provided enough examples to solidify the idea.

In what situations is data read out of kernel space into user space?

The operating system's job is to allow a lot of components, both hardware and software, to play nicely with each other. In general, userland programs can't directly manipulate peripherals or interfere with each other. I'm not familiar with the specific setup that you're citing, but it doesn't sound unusual.

The USB camera notifies the operating system that it has a new frame. When the kernel (driver) notices this, it will copy the frame into RAM with I/O commands. Since this RAM was allocated by the driver, userland programs won't be able to see or read it, due to virtual memory: the address 0x1000 in the kernel and the address 0x1000 in a program are actually physically distinct locations in RAM. The kernel will then copy the frame into the memory of any process that is expecting input from the camera and then notify it (in this case catusb).
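
A hedged sketch of that last copy, written as a driver read() handler (cam_dev and its fields are made up for illustration); copy_to_user() is the primitive that bridges the two distinct address spaces:

    #include <linux/fs.h>
    #include <linux/uaccess.h>  /* copy_to_user() */

    /* Hypothetical: `frame` is the buffer the driver filled from the
     * USB transfer; both names are invented for this example. */
    struct cam_dev {
            void   *frame;
            size_t  frame_len;
    };

    static ssize_t cam_read(struct file *f, char __user *ubuf,
                            size_t len, loff_t *off)
    {
            struct cam_dev *cam = f->private_data;
            size_t n = len < cam->frame_len ? len : cam->frame_len;

            /* The kernel buffer and `ubuf` live in different mappings
             * even if the numeric addresses look alike; this is the
             * explicit kernel-to-user copy. */
            if (copy_to_user(ubuf, cam->frame, n))
                    return -EFAULT;
            return n;
    }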

Likewise, since xform, detect and hdinput exist as separate processes, they must use inter-process communication, and because the operating system must ensure the isolation of the programs, every such exchange goes through the kernel.
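
For example, a pipe is one such kernel-mediated channel: every byte written by one process is copied into kernelspace and copied back out by the reader's syscall:

    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
            int fds[2];
            static const char msg[] = "frame-ready\n";

            if (pipe(fds) < 0)      /* the pipe object lives in kernelspace */
                    return 1;

            if (fork() == 0) {      /* child, playing the "xform" role */
                    char buf[64];
                    /* read(2): the kernel copies the data back out of
                     * kernelspace into this process's buffer */
                    ssize_t n = read(fds[0], buf, sizeof buf);

                    if (n > 0)
                            write(1, buf, (size_t)n);
                    _exit(0);
            }

            /* write(2): the kernel copies the bytes into kernelspace */
            write(fds[1], msg, sizeof msg - 1);
            wait(NULL);
            return 0;
    }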

There's nothing unusual here. I imagine they are just spelling it out because gesture recognition is time-critical and doing it this way has some overhead.

Kernel module or user space application

Using /dev/mem is quite straightforward, but it also causes some serious security issues. You either have to run your application as root or make the /dev/mem file accessible to other users, both of which are unwelcome in designs that will at some point become products. If a malicious process can access the /dev/mem file, it can possibly read any secret stored in RAM or corrupt any application - including the kernel itself. Even if your application is the only one able to access this file, any security flaw in your code becomes a security flaw of the whole system.
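
For reference, this is the "straightforward" /dev/mem approach being warned about (REG_BASE is a made-up physical address); note that it only runs as root, and the open file descriptor can reach any physical memory:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REG_BASE 0x40000000UL  /* made-up physical address of the device */

    int main(void)
    {
            /* Needs root (or relaxed permissions on /dev/mem) - and that
             * is the problem: this fd can reach *any* physical memory. */
            int fd = open("/dev/mem", O_RDWR | O_SYNC);

            if (fd < 0) {
                    perror("open /dev/mem");
                    return 1;
            }

            volatile uint32_t *regs = mmap(NULL, 4096,
                                           PROT_READ | PROT_WRITE,
                                           MAP_SHARED, fd, REG_BASE);
            if (regs == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            printf("reg0 = 0x%08x\n", (unsigned)regs[0]);  /* direct register read */
            munmap((void *)regs, 4096);
            close(fd);
            return 0;
    }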

Preparing the driver is obviously not an easy task, but it allows you to separate the (usually simple) privileged code from the applications in user space. In the simplest case you only have to provide some register read and write methods (through ioctl). These should check that the address is properly aligned and constrained to the device's address space. Additionally, the driver usually performs any necessary address translation, so the client application does not need to know at which physical address your device was mapped (which is the case e.g. with PCI Express).
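
A minimal sketch of such an ioctl handler, with hypothetical names throughout (mydev, reg_io, the ioctl numbers), showing the alignment and range checks plus the address translation through the ioremap()ed base:

    #include <linux/fs.h>
    #include <linux/io.h>        /* readl(), writel() */
    #include <linux/types.h>
    #include <linux/uaccess.h>

    /* Hypothetical layout: `base` is the ioremap()ed register window, so
     * the client never sees the physical address of the device. */
    struct mydev { void __iomem *base; };
    struct reg_io { u32 offset; u32 value; };

    #define MYDEV_REG_SPAN  0x1000  /* size of the register window */
    #define MYDEV_READ_REG  _IOWR('m', 1, struct reg_io)
    #define MYDEV_WRITE_REG _IOW('m', 2, struct reg_io)

    static long mydev_ioctl(struct file *f, unsigned int cmd,
                            unsigned long arg)
    {
            struct mydev *dev = f->private_data;
            struct reg_io io;

            if (copy_from_user(&io, (void __user *)arg, sizeof io))
                    return -EFAULT;

            /* The checks described above: 32-bit aligned and inside the
             * device's own register window - nothing else is reachable. */
            if ((io.offset & 0x3) || io.offset >= MYDEV_REG_SPAN)
                    return -EINVAL;

            switch (cmd) {
            case MYDEV_READ_REG:
                    io.value = readl(dev->base + io.offset);
                    if (copy_to_user((void __user *)arg, &io, sizeof io))
                            return -EFAULT;
                    return 0;
            case MYDEV_WRITE_REG:
                    writel(io.value, dev->base + io.offset);
                    return 0;
            }
            return -ENOTTY;
    }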

I would not recommend writing the driver from scratch, but rather repurposing some existing code. In the mentioned case of PCI Express I have used two sources of inspiration: the Xilinx driver described here: https://www.xilinx.com/support/answers/65444.html (sources included) and the more complicated 'pcieuni' and 'gpcieuni' drivers from the ChimeraTK project (https://github.com/ChimeraTK).


