Identifying Kernel Threads

How can I tell from `bash` whether a thread is a kernel thread?

You can determine whether a particular task is a kthread by looking at /proc/<PID>/stat. More precisely, according to man 5 proc, the 9th field of this virtual file contains the kernel flags for the process. For a kthread, the flags will have PF_KTHREAD (0x00200000) set.

Here's an example of a Bash script that takes a PID as argument and checks if it's a kthread or not:

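#!/bin/bash

# The comm field (2nd) may contain spaces or parentheses, so drop
# everything up to the last ") " before splitting into fields.
stat=$(< "/proc/$1/stat") || exit 1
read -r -a stats <<< "${stat##*) }"
flags=${stats[6]}   # the array now starts at field 3, so field 9 is index 6

# 0x00200000 is PF_KTHREAD
if (( (flags & 0x00200000) != 0 )); then
    echo 'KTHREAD'
else
    echo 'NOT KTHREAD'
fi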

This isn't really simpler than just running ps u -p <PID> and checking whether the command is shown in [], but it's a pure Bash solution nonetheless.

There are also some more "tricks" that can be used to identify a kernel thread; it's easy to modify the above script to use one of the methods highlighted in this post.
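One commonly used heuristic (not necessarily one from the linked post) is that a kernel thread has no user-space command line, so /proc/<PID>/cmdline reads as empty. The sketch below assumes Linux with /proc mounted; the function name looks_like_kthread is made up for illustration. It is only a heuristic: zombies and processes that have wiped their own argv also show an empty cmdline.

```bash
#!/bin/bash

# Hypothetical helper: prints KTHREAD if /proc/<PID>/cmdline is empty.
# Heuristic only: zombies and processes that cleared argv also look empty.
looks_like_kthread() {
    local pid=$1 cmdline
    # cmdline is NUL-separated; strip the NULs before storing it.
    cmdline=$(tr -d '\0' 2>/dev/null < "/proc/$pid/cmdline") || return 2
    if [[ -z $cmdline ]]; then
        echo 'KTHREAD'
    else
        echo 'NOT KTHREAD'
    fi
}

# Check the PID given as first argument, or this shell if none is given.
looks_like_kthread "${1:-$$}" || echo "cannot read /proc/${1:-$$}/cmdline" >&2
```

Compared with the flags check, this avoids parsing the stat line, at the cost of the false positives mentioned above.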

What is a Kernel thread?

A kernel thread is a kernel task running only in kernel mode; it is usually not created through the fork() or clone() system calls. Examples are kworker and kswapd.
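To see which kernel threads (kworker, kswapd and so on) exist on a given Linux machine, one option is to scan /proc and test the PF_KTHREAD bit (0x00200000) in field 9 of each stat file. This is a minimal sketch, assuming /proc is mounted; the function names stat_line_is_kthread and list_kthreads are made up for illustration.

```bash
#!/bin/bash

# Hypothetical helper: succeeds if a /proc/<PID>/stat line has PF_KTHREAD set.
stat_line_is_kthread() {
    local line=$1 fields
    # comm (field 2) may contain spaces or ')', so cut at the last ") ".
    read -r -a fields <<< "${line##*) }"
    # fields[0] is now field 3 (state), so field 9 (flags) is fields[6].
    (( (${fields[6]:-0} & 0x00200000) != 0 ))
}

# Hypothetical helper: print "PID<TAB>NAME" for every kernel thread in /proc.
list_kthreads() {
    local dir line name
    for dir in /proc/[0-9]*; do
        # A task may exit while we scan; skip entries we cannot read.
        { line=$(< "$dir/stat"); } 2>/dev/null || continue
        if stat_line_is_kthread "$line"; then
            name=${line#*\(}    # drop the leading "pid ("
            name=${name%\)*}    # drop everything from the last ")"
            printf '%s\t%s\n' "${dir#/proc/}" "$name"
        fi
    done
}

list_kthreads
```

Inside a container with its own PID namespace the list may well be empty, since the host's kernel threads are typically not visible there.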

You probably should not implement kernel threads if you don't know what they are.

Google gives many pages about kernel threads, e.g. Frey's page.

Identifying a thread as a Remote thread

If you are willing to go to kernel mode, PsSetCreateThreadNotifyRoutine may be of interest to you. According to Uninformed, it is called in the context of the process 'that is creating or terminating the thread'. I have also seen this suggested elsewhere.

I got round to testing this, and it works fine. You should note that you will see a number of false positives, as Windows (unsurprisingly) does some injection itself. <EDIT> This actually happens because, on process creation, the first thread is created in the context of the parent process. Simply ignore the first thread creation and this gives a pretty good indication. </EDIT>

The main drawback (other than having to write a driver) is that you need to see the creation happen, so your process needs to start first.

Alternatively, as mentioned, heuristics involving stack traces, loaded modules and all that good stuff come into play.

nodejs spawns threads implicitly by delegating the I/O to the kernel. How is this different from a server that makes a thread per request?

This would be true if node created one thread for each I/O request. But, of course, it doesn't do that. It has an I/O engine that understands the best way to do I/O on each platform.

What nodejs hides from you is not some naive implementation where a scheduling entity waits for each request to complete, but a sophisticated implementation that understands the optimal way to do I/O on every platform on which it is implemented.

Updates:

If both approaches need the kernel for I/O, aren't they both creating a kernel thread per request?

No. There are lots of ways to use the kernel for I/O that don't require a kernel thread per request. They differ from platform to platform. Windows has IOCP. Linux has epoll. And so on.
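One rough way to check this on Linux is to count the kernel-visible threads of a running server while it is under load. Every thread of a process appears as a directory under /proc/<PID>/task. The sketch below assumes Linux with /proc mounted, and the helper name thread_count is hypothetical.

```bash
#!/bin/bash

# Hypothetical helper: print how many kernel-visible threads a process has.
# On Linux each thread shows up as a directory under /proc/<PID>/task.
thread_count() {
    local pid=$1 tasks
    tasks=(/proc/"$pid"/task/[0-9]*)
    # If the glob did not match, the process is gone or /proc is unavailable.
    [[ -d ${tasks[0]} ]] || return 1
    echo "${#tasks[@]}"
}

# Usage: pass the PID of the server; defaults to this shell.
thread_count "${1:-$$}" || echo "no such process: ${1:-$$}" >&2
```

Under these assumptions, a thread-per-request server should show the number rising with the requests in flight, whereas a node process would be expected to stay at a small, roughly constant count.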

If nodejs is somehow using a fixed number of threads and queueing the I/O operations, isn't that slower than a thread per request?

No, it's typically much faster for a variety of reasons that depend on the specifics of each platform. Here are a few advantages:

You can avoid "thundering herds" when lots of I/O completes at once. Instead, you can wake just the number of threads that can usefully run at the same time.
You can avoid needing lots of context switches to get all the different threads to execute. Instead, each thread can handle completion after completion.
You don't have to put each thread on a wait queue for each I/O operation. Instead, you can use a single wait queue for the group of threads.

Just to give you an idea of how significant it can be, consider the difference between using a thread per I/O and using epoll on Linux. If you use a thread per I/O, that means each I/O operation requires a thread to place itself on a wait queue, that thread to block, that thread to be unblocked, a context switch to occur to that thread, and that thread to remove itself from the wait queue.

By contrast, with epoll, a single thread can service any number of I/O completions without having to be rescheduled or added to or removed from a wait queue for each I/O. Similarly, a thread can issue a number of I/O requests without being descheduled. This difference is massive.

How to list threads that were killed by the kernel?

The opposite of do_fork is do_exit, here:
do_exit kernel source

I'm not able to find any other place where threads exit, apart from:

release_task

I believe "task" and "thread" are (almost) synonymous in Linux.

When doing asynchronous I/O, how does the kernel determine if an I/O operation is completed?

"Kernel" below means the kernel side: the OS kernel code plus loaded drivers.

Suppose you have a TCP connection to a remote server. Here is an example of how the kernel handles an asynchronous write to, or read from, the TCP stream.

When you send a byte array to the TCP stream, the kernel places the buffer in RAM and programs the DMA engine to copy it to the network card. When the DMA transfer is done, an interrupt is raised on the CPU. An interrupt handler registered by the kernel turns that signal from the DMA into a completion callback for the write to the TCP stream. Of course, the actual TCP stack is much more complex; these sentences only sketch how the thing works.

For a read from the TCP stream, when a packet arrives at the network card, another interrupt is raised. Another handler registered by the kernel turns that interrupt into an event on the golang side.

Again, the real case is very, very complex. There are many OSes, many versions, many kinds of I/O operations, and many hardware devices.