When and How Are System Calls Interrupted

When and how are system calls interrupted?

Blocking system calls can be interrupted by any signal that the process catches with a handler; this includes signals such as SIGINT (generated by Ctrl-C), SIGHUP, etc.

When SA_RESTART is set, an interrupted send() behaves as follows: if any data was transmitted before the signal was received, it returns the number of bytes sent; if a send timeout was set (SO_SNDTIMEO), it fails with EINTR, because such calls cannot be restarted; otherwise, the send() is restarted.
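A caller therefore has to cope with both a short count and EINTR. The sketch below is a common idiom rather than code from the original answer; `send_all()` is a hypothetical helper name, and it assumes a blocking POSIX socket.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends all len bytes of buf on a blocking socket.
 * It handles both outcomes described above: a short count (the signal
 * arrived after some data was transmitted) and EINTR (the signal arrived
 * before any data was transmitted). Returns 0 on success, or -1 with
 * errno set on any other error. */
int send_all(int sockfd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(sockfd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before sending anything: retry */
            return -1;         /* real error, e.g. EPIPE, ECONNRESET, EAGAIN */
        }
        p += n;                /* partial send: skip the bytes already sent */
        len -= (size_t)n;
    }
    return 0;
}
```

On a socket with SO_SNDTIMEO set, an expired timeout shows up as EAGAIN or EWOULDBLOCK and is returned to the caller here, while EINTR is simply retried.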

System call restarting is implemented in the kernel's signal handling code. The system call internally returns -ERESTARTSYS upon detecting a pending signal (or having a wait interrupted by a signal). If the handler was installed with SA_RESTART, the signal handling code then restores the instruction pointer and relevant registers to their state before the call, making the syscall repeat; otherwise, the error is converted to EINTR before returning to user space.
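The restart is invisible to user code: the handler runs, and the interrupted call simply carries on. The following sketch, assuming Linux and POSIX, makes this observable. The helper `read_with_alarm()` is hypothetical: it blocks in read() on an empty pipe, arms a one-shot 100 ms SIGALRM with setitimer(), and has a forked child write one byte after 500 ms.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran = 0;

static void on_alarm(int sig)
{
    (void)sig;
    handler_ran = 1;                  /* only async-signal-safe work here */
}

/* Hypothetical demo helper. Returns read()'s result; *err receives errno
 * if read() failed, 0 otherwise. Returns -2 if setup fails. */
ssize_t read_with_alarm(int sa_flags, int *err)
{
    int fds[2];
    char c;
    struct sigaction sa;
    struct itimerval it;

    if (pipe(fds) != 0)
        return -2;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;           /* SA_RESTART or 0 */
    sigaction(SIGALRM, &sa, NULL);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -2;
    }
    if (pid == 0) {                   /* child: write well after the signal */
        close(fds[0]);
        usleep(500000);
        if (write(fds[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);

    handler_ran = 0;
    memset(&it, 0, sizeof it);
    it.it_value.tv_usec = 100000;     /* one-shot timer: SIGALRM in 100 ms */
    setitimer(ITIMER_REAL, &it, NULL);

    ssize_t n = read(fds[0], &c, 1);  /* blocks; SIGALRM arrives meanwhile */
    *err = (n < 0) ? errno : 0;

    waitpid(pid, NULL, 0);
    close(fds[0]);
    return n;
}
```

With SA_RESTART, read() returns 1 after the handler has run; with sa_flags set to 0, it fails with EINTR as soon as the signal arrives. The timings only make the intended ordering very likely; they do not guarantee it.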

Why are system calls handled using interrupts?

Linking separately compiled pieces of code together is a minor problem. Shared libraries have had a workaround for it for quite some time (relocatable code, export tables, etc.). You typically pay the cost just once, when the library is loaded into the program.

The bigger problem is that you need to switch the CPU from unprivileged user mode into privileged kernel mode, and you need to do it in a controlled way, without letting user code escape and wreak havoc on the kernel. That is typically done with special or designated instructions. You may also benefit from interrupts being disabled automatically on entry to the kernel, which the x86 int instruction can do for you (when the handler is installed behind an interrupt gate). Most CPUs have a similar instruction, and it is a common way of implementing the system call interface, although not the only one.
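On x86-64 Linux, the dedicated syscall instruction has replaced int $0x80 as the usual way in, but it plays the same role: a designated, controlled entry into kernel mode. A minimal sketch, assuming GCC or Clang inline assembly and the x86-64 Linux calling convention (number in rax, arguments in rdi, rsi, rdx), with a fallback to libc's generic syscall() on other architectures; `raw_write()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invokes write(2) without the libc write() wrapper.
 * Returns the byte count, or a negative errno value on failure. */
long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__)
    long ret;
    /* The syscall instruction jumps to the entry point the kernel
     * registered; the CPU clobbers rcx and r11 on the way. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;                       /* kernel returns -errno on failure */
#else
    /* Other architectures: let libc emit the trap instruction. */
    long ret = syscall(SYS_write, fd, buf, len);
    return ret < 0 ? -errno : ret;    /* match the raw kernel convention */
#endif
}
```

Whatever the instruction, user code can only enter the kernel at the address the kernel itself configured, which is what keeps the transition controlled.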

If you were asking about MS-DOS or the original MINIX, both of which ran on the i8086 in real address mode, things would be different. There the kernel couldn't protect itself or other programs from anything, because all memory and system resources were accessible to all code. With only one mode instead of two, there would be less reason to use a special instruction like int; in that respect, int would be largely equivalent to a simple (far) call.

Also noteworthy is that CPUs often handle the following three types of events in a very similar fashion:

hardware interrupts from I/O devices
exceptions raised by code execution (e.g. division by zero, page faults)
system calls

That makes something like the int instruction a natural choice, since the entry and exit paths of all the above handlers would be largely, if not fully, identical.

How can signals interrupt a system call?

How is the system call interrupted? Does the child send the signal to the parent process, or to the system call sleep(3)?

When the child process executes the exit(2) system call, the kernel calls do_exit(), which calls exit_notify(), which in turn calls do_notify_parent(); that function sends SIGCHLD to the parent and calls __wake_up_parent().
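From user space, the parent sees this chain only as a SIGCHLD. A common way to react to it is a handler that reaps every exited child with waitpid(WNOHANG), since several exits can be merged into a single pending SIGCHLD. The sketch assumes POSIX; `reap_children()` is a hypothetical driver, not part of the original answer.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;

/* Runs when SIGCHLD is delivered; reaps all exited children. */
static void on_sigchld(int sig)
{
    int saved_errno = errno;          /* waitpid() may overwrite errno */
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Hypothetical demo: forks nchildren that exit immediately and waits
 * until the handler has reaped all of them. Returns the number reaped. */
int reap_children(int nchildren)
{
    struct sigaction sa;
    sigset_t block, old, waitmask;
    int started = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);

    /* Block SIGCHLD while forking so no notification slips in before
     * sigsuspend() starts waiting for it. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);
    waitmask = old;
    sigdelset(&waitmask, SIGCHLD);

    reaped = 0;
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                 /* child: exit(2) -> SIGCHLD to parent */
        if (pid > 0)
            started++;
    }

    while (reaped < started)
        sigsuspend(&waitmask);        /* atomically unblock SIGCHLD and sleep */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```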

I don't get how the system call stops execution and passes control to the parent process. Is the system call like another process?

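The system call is not a separate process: it runs in kernel mode, in the context of the process that made it. The system call underlying sleep() puts the calling process into the TASK_INTERRUPTIBLE state (see, e.g., Sleeping in the Kernel). After that, the process is simply not scheduled until it is woken up, either by delivery of the SIGCHLD sent by do_notify_parent() above or when the timeout expires. (__wake_up_parent() wakes a parent that is blocked in wait() rather than in sleep().)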
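The sleep() case from the question can be reproduced with a sketch like the following, assuming Linux and POSIX. A handler must be installed: with the default disposition, SIGCHLD is discarded and sleep() runs to completion. `sleep_until_child_exits()` is a hypothetical helper, and the child's 200 ms delay only makes it likely that the parent is already asleep.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The handler does nothing; its mere presence makes SIGCHLD wake sleep(). */
static void on_sigchld(int sig)
{
    (void)sig;
}

/* Hypothetical demo: the parent sleeps for `seconds` while a child exits
 * shortly afterwards. Returns sleep()'s result (the unslept seconds). */
unsigned int sleep_until_child_exits(unsigned int seconds)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);

    pid_t pid = fork();
    if (pid == 0) {
        usleep(200000);               /* give the parent time to fall asleep */
        _exit(0);                     /* kernel sends SIGCHLD to the parent */
    }

    unsigned int left = sleep(seconds);  /* interruptible sleep */
    if (pid > 0)
        waitpid(pid, NULL, 0);        /* reap the zombie */
    return left;
}
```

sleep() returns the number of unslept seconds, so a value greater than 0 means the sleep was cut short by the signal.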

What does "wait and waitpid are always interrupted when a signal is caught" mean?

The original behaviour of signal() (System V semantics) was to interrupt any system call in which the process was sleeping, execute the signal handler, and have the system call return -EINTR (seen in user space as -1 with errno set to EINTR). Then 4.3BSD introduced a restart mechanism that automatically restarts a system call after it has been interrupted. This spares you from writing a retry loop around each system call when signal handlers are involved.
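Under System V semantics, that per-call loop looks like this. A minimal sketch; `read_eintr()` is a hypothetical name, and the same pattern applies to any blocking call that can fail with EINTR.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: reissues read() for as long as it fails with
 * EINTR, i.e. was interrupted by a signal handler before any data
 * arrived. Any other result, including a real error, is returned as is. */
ssize_t read_eintr(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

glibc also offers the TEMP_FAILURE_RETRY() macro in <unistd.h> (with _GNU_SOURCE) for the same purpose.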

Linux kept the System V semantics for the signal() system call. However, the glibc signal() wrapper function nowadays calls sigaction() with the SA_RESTART flag by default (as long as _DEFAULT_SOURCE is in effect, which it is unless you request a strict standard). So, if you do not want the restart behaviour, you have to call sigaction() yourself and omit that flag.
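This can be checked directly by querying the installed disposition with sigaction(sig, NULL, &old). A sketch assuming glibc (or musl) with the default feature test macros in effect; `has_sa_restart()` and `install_without_restart()` are hypothetical helpers.

```c
#define _GNU_SOURCE                   /* implies _DEFAULT_SOURCE: BSD signal() */
#include <signal.h>
#include <string.h>

static void handler(int sig)
{
    (void)sig;
}

/* Returns nonzero if the disposition currently installed for sig has
 * SA_RESTART set. Passing NULL as the new action only queries. */
int has_sa_restart(int sig)
{
    struct sigaction cur;

    if (sigaction(sig, NULL, &cur) != 0)
        return 0;
    return (cur.sa_flags & SA_RESTART) != 0;
}

/* Installs handler via sigaction() with SA_RESTART deliberately omitted,
 * so blocking calls interrupted by sig fail with EINTR instead. */
int install_without_restart(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                  /* no SA_RESTART */
    return sigaction(sig, &sa, NULL);
}
```

After `signal(SIGUSR1, handler)`, `has_sa_restart(SIGUSR1)` reports true on glibc; after `install_without_restart(SIGUSR1)`, it reports false.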

So your code does indeed make use of the restart mechanism on both BSD and Linux.

Do system calls execute entirely inside a software interrupt handler?

On most kernels, system calls run inside an ISR (interrupt service routine). Take a quick look at an older release of Linux and you will notice int $0x80 being used to invoke the kernel. This solution, which is probably the simplest from a kernel development point of view, has a strong drawback if interrupts stay disabled for as long as the ISR runs. Disabling interrupts for too long is bad because the system becomes unresponsive: external events are delayed, rescheduling does not happen on time, and so on.

Preemption, as Adel explained in his answer, is a smart solution. But whenever the kernel chooses to preempt a thread because a resource is unavailable, it has generally already spent a lot of time with interrupts disabled.

Are system calls offloaded to other threads?

You're right. Interrupt threads and/or a threaded kernel are an even smarter solution. Kernels like Solaris and Mac OS X prefer very simple ISRs that just wake up high-priority interrupt threads. The ISRs are thereby reduced to the minimum processing, and the time the system runs with interrupts disabled is greatly decreased. Because these interrupt threads have a high priority, they are likely to run as soon as the ISR returns. The nice part is that interrupts are enabled again by then, so even higher-priority work is not delayed. With a threaded kernel, such as recent releases of Linux, multiple things can happen inside the kernel at once: even if one thread blocks, other processes are still able to enter the kernel.
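The same split can be mimicked in user space, as an analogy rather than kernel code: a signal handler plays the minimal ISR and only writes one byte to a pipe, while a worker thread plays the interrupt thread and does the real work. A sketch assuming POSIX threads (link with -pthread); all names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int wake_pipe[2];

/* "ISR": does the minimum (one async-signal-safe write) and returns. */
static void on_event(int sig)
{
    int saved = errno;
    unsigned char b = (unsigned char)sig;
    ssize_t r = write(wake_pipe[1], &b, 1);
    (void)r;                          /* nothing useful to do on failure */
    errno = saved;
}

/* "Interrupt thread": sleeps until woken, then does the deferred work. */
static void *worker(void *arg)
{
    int *handled = arg;
    unsigned char b;

    for (;;) {
        ssize_t n = read(wake_pipe[0], &b, 1);
        if (n == 1)
            (*handled)++;             /* stands in for slow processing */
        else if (n == 0)
            break;                    /* write end closed: shut down */
        else if (errno != EINTR)
            break;
    }
    return NULL;
}

/* Hypothetical demo: raises SIGUSR1 `events` times and returns how many
 * events the worker thread processed, or -1 on setup failure. */
int run_deferred_demo(int events)
{
    pthread_t tid;
    int handled = 0;
    struct sigaction sa;

    if (pipe(wake_pipe) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_event;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, NULL);

    if (pthread_create(&tid, NULL, worker, &handled) != 0) {
        close(wake_pipe[0]);
        close(wake_pipe[1]);
        return -1;
    }
    for (int i = 0; i < events; i++)
        raise(SIGUSR1);               /* handler runs in this thread */
    close(wake_pipe[1]);              /* lets the worker see EOF */
    pthread_join(tid, NULL);
    close(wake_pipe[0]);
    return handled;
}
```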

Hope this helps!
