How Does the Linux Kernel Prevent BIOS System Calls?

How does the Linux kernel prevent programs from making BIOS system calls?

The INT n instruction generates a call to the interrupt or exception handler specified by the destination operand. The destination operand specifies an interrupt vector number from 0 to 255, encoded as an 8-bit unsigned immediate value. Each interrupt vector number provides an index to a gate descriptor in the IDT.

The selected interrupt descriptor in turn contains a pointer to an interrupt or exception handler procedure. In protected mode (Linux runs in protected mode only), the IDT contains an array of 8-byte descriptors, each of which is an interrupt gate, trap gate, or task gate.

This IDT is set by the OS. Linux sets it up so that descriptors point to its own handlers, not the BIOS handlers at all.

How does the Implementation of System Calls and Interrupts differ from each other?

On most systems, interrupts and system calls (and exception handlers) are implemented in the same way.

As soon as the program is executed, the system call informs the kernel of the request. What exactly happens here in terms of low-level programming?

Usually, system calls are wrappers around assembly language routines. The sequence of events is:

  1. Call to System Routine
  2. System Routine unpacks parameters and loads them into registers.
  3. System Routine forces an exception (identified by a number) by executing a change mode instruction (to some mode higher than user mode).
  4. The CPU handles the exception by dispatching to an exception handler in the system dispatch table.
  5. The handler performs the system service.
  6. The handler executes a return-from-exception (or return-from-interrupt) instruction, returning the process to user mode (or whatever mode it was called from) and to the system service routine.
  7. The system service routine unpacks the return values from registers and updates the parameters.
  8. Return to the calling function.

Can an Interrupt be a System Call or vice versa?

No, they are distinct things, even though they are dispatched in the same way.

Presumably an operating system could map system calls and interrupts to the same handler but that would be screwy.

Linux system call implementation

A system call is mostly implemented inside the Linux kernel, with tiny glue code in the C standard library. But see also vdso(7).

From the user-land point of view, a system call (they are listed in syscalls(2)) is a single machine instruction (SYSCALL on x86-64, or SYSENTER) with some calling conventions: e.g. they define which machine register holds the syscall number (such as __NR_stat from /usr/include/asm/unistd_64.h) and which other registers contain the arguments to the system call.

Use strace(1) to understand which system calls are done by a given program or process.

The C standard library has a tiny wrapper function (which invokes the kernel, following the ABI, and deals with error reporting & errno).

For stat(2), the C wrapping function is e.g. in stat/stat.c for musl-libc.

Inside the kernel code, most of the work happens in fs/stat.c (e.g. after line 207).


Relationship between system calls API, syscall instruction and exception mechanism (interrupts)

TL;DR

The syscall instruction itself acts like a glorified jump: it's a hardware-supported way to efficiently and safely jump from unprivileged user-space into the kernel.

The syscall instruction jumps to a kernel entry-point that dispatches the call.

Before x86_64 two other mechanisms were used: the int instruction and the sysenter instruction.

They have different entry-points (still present today in 32-bit kernels, and 64-bit kernels that can run 32-bit user-space programs).

The former uses the x86 interrupt machinery and can be confused with exception dispatching (which also uses the interrupt machinery).

However, exceptions are unexpected events, while int is used to deliberately generate a software interrupt: again, a glorified jump.


The C language doesn't concern itself with system calls, it relies on the C runtime to perform all the interactions with the environment of the future program.

The C runtime implements the above-mentioned interactions through an environment specific mechanism.

There could be various layers of software abstractions but in the end the OS APIs get called.

The term API denotes a contract; strictly speaking, using an API doesn't require invoking a piece of kernel code (the trend is to implement non-critical functions in userspace to limit the exploitable code). Here we are only interested in the subset of the API that requires a privilege switch.

Under Linux, the kernel exposes a set of services accessible from userspace; these entry-points are called system calls.

Under Windows, the kernel services (which are accessed with the same mechanism as their Linux analogues) are considered private, in the sense that they are not required to be stable across versions.

A set of DLL/EXE exported functions are used as entry-points instead (e.g. ntoskrnl.exe, hal.dll, kernel32.dll, user32.dll); these in turn use the kernel services through a (private) system call.

Note that under Linux, most system calls have a POSIX wrapper around them, so it's possible to use these wrappers, which are ordinary C functions, to invoke a system call.

The underlying ABI is different, as is the error reporting; the wrapper translates between the two worlds.

The C runtime calls the OS APIs. In the case of Linux, the system calls are used directly because they are public (in the sense that they are stable across versions), while for Windows the usual DLLs, like kernel32.dll, are marked as dependencies and used.

We are reduced to the point where a user-mode program, whether it is part of the C runtime (Linux) or part of an API DLL (Windows), needs to invoke code in the kernel.

The x86 architecture historically offered different ways to do so, for example, a call gate.

Another way is through the int instruction, which has a few advantages:

  • It is what the BIOS and the DOS did in their times.

    In real mode, using an int instruction is suitable because a vector number (e.g. 21h) is easier to remember than a far address (e.g. 0f000h:0fff0h).
  • It saves the flags.
  • It is easy to set up (setting up ISR is relatively easy).

With the modernization of the architecture this mechanism turned out to have a big disadvantage: it is slow.
Before the introduction of the sysenter (note, sysenter not syscall) instruction there was no faster alternative (a call gate would be equally slow).

With the advent of the Pentium Pro/II, a new pair of instructions, sysenter and sysexit, were introduced to make system calls faster.

Linux started using them in version 2.5, and I believe they are still used today on 32-bit systems.

I won't explain the whole mechanism of the sysenter instruction and the companion VDSO necessary to use it; suffice it to say that it was faster than the int mechanism. (I can't find the article by Andy Glew where he says that sysenter turned out to be slow on the Pentium III; I don't know how it performs nowadays.)

With the advent of x86-64, the AMD response to sysenter, i.e. the syscall/sysret pair, became the de facto way to switch from user mode to kernel mode.

This is due to the fact that syscall is actually fast and very simple (it copies rip and rflags into rcx and r11 respectively, masks rflags, and jumps to the address set in IA32_LSTAR).

64-bit versions of both Linux and Windows use syscall.

To recap, control can be given to the kernel through three mechanisms:

  • Software interrupts.

    This was int 80h for 32-bit Linux (the only mechanism before 2.5) and int 2eh for 32-bit Windows.
  • Via sysenter.

    Used by 32-bit versions of Linux since 2.5.
  • Via syscall.

    Used by 64-bit versions of Linux and Windows.


The C runtime is usually a static library, thus pre-compiled, that uses one of the three methods above.

The syscall instruction transfers control to a kernel entry-point (see entry_64.s) directly.

It is an instruction that just does so, it is not implemented by the OS, it is used by the OS.

The term exception is overloaded in CS, C++ has exceptions, so do Java and C#.

The OS can have a language-agnostic exception trapping mechanism (under Windows it was once called SEH, and it has since been rewritten).

The CPU also has exceptions.

I believe we are talking about the last meaning.

Exceptions are dispatched through interrupts, they are a kind of interrupt.

It goes without saying that while exceptions are synchronous (they happen at specific, replayable points), they are "unwanted": they are exceptional, in the sense that programmers tend to avoid them, and when they happen it is due to either a bug, an unhandled corner case, or a bad situation.

Thus, they are not used to transfer control to the kernel (though they could be).

Software interrupts (which are synchronous too) were used instead; the mechanism is almost exactly the same (exceptions can have a status code pushed on the kernel stack), but the semantics are different.

We never dereferenced a null pointer or accessed an unmapped page to invoke a system call; we used the int instruction instead.

Understanding the hardware of printf

Linux:

printf() ---> printf() in the C library ---> write() in C library ---> write() system call in kernel.

To understand the interface between user space and kernel space, you will need to have some knowledge of how system calls work.

To understand what is going on at the lowest levels, you will need to analyze the source code in the kernel.

The Linux system call quick reference (PDF) may be useful, as it identifies where in the kernel source you might begin looking.

What does int 0x80 mean in assembly code?

It passes control to interrupt vector 0x80.

See http://en.wikipedia.org/wiki/Interrupt_vector

On Linux, have a look at the kernel's system call entry code: interrupt vector 0x80 was used to dispatch system_call. Of course, on another OS this could mean something totally different.


