How to Take Advantage of the VDSO Object with Your Own Programming Language

How to take advantage of the VDSO object with your own programming language?

It depends on whether or not your implementation uses the C interface for low-level utilities.

If your language implementation gives direct access to syscalls without going through the C wrappers, you don't need to use the VDSO (you could, for instance, generate the appropriate SYSCALL or SYSENTER machine instruction to perform the syscall), but you could decide to use the VDSO and then take advantage of it. In that case, your language doesn't even need to follow all the ABI conventions, just the conventions of the kernel (for instance, you don't need the caller-saved/callee-saved register distinction that the ABI provides, and you could even avoid using any stack).

An example of a language implementation not even using libc.so is Bones Scheme. You can find a few others.

My understanding of the VDSO is that it is an abstraction, provided by the kernel, to hide the various small differences (related to user-land -> kernel transitions) in implementing syscalls between the various families of x86 processors. If you have chosen a particular processor target, you don't need the VDSO, and you can always avoid it.

AFAIU, the VDSO is an ELF shared object mapped into each process (on my Debian/AMD64 box with a recently compiled 3.8.3 kernel it is the segment tagged [vdso] in the maps output below; check exactly where it sits in your process with cat /proc/self/maps). So you just need to understand the organization of ELF shared objects and retrieve the symbols from it. The VDSO uses the C calling conventions documented in the x86-64 ABI specification.

That is, if you extract the VDSO from your process space and write it to a disk file, the result is a well-formed ELF shared object.

ELF is a well-documented format, and so are the x86-64 ABI conventions
(which define precisely the C calling conventions and how exactly a process image starts; see also the execve(2) man page, and of course the kernel documentation), so I don't understand what your issue is. I agree that understanding ELF takes time (I did that 10 years ago, but my memory is rusty). Read also the <elf.h> header file on your machine.

For instance, running (under zsh on 64-bit Debian x86-64):

% file $(which sash)
/bin/sash: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
statically linked, for GNU/Linux 2.6.26,
BuildID[sha1]=0x0347fcc08fba2f811f58af99f26111d0f579a9f6, stripped

% ldd $(which sash)
not a dynamic executable

% sash
Stand-alone shell (version 3.7)
> ps |grep sash
21635 pts/3 00:00:00 sash
> cat /proc/21635/maps
00400000-004da000 r-xp 00000000 08:01 4985590 /bin/sash
006da000-006dc000 rw-p 000da000 08:01 4985590 /bin/sash
006dc000-006e1000 rw-p 00000000 00:00 0
017e3000-01806000 rw-p 00000000 00:00 0 [heap]
7fe4950e5000-7fe4950e7000 rw-p 00000000 00:00 0
7fff3f130000-7fff3f151000 rw-p 00000000 00:00 0 [stack]
7fff3f173000-7fff3f175000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

See also this answer.

You probably want inside your runtime a minimal version of a dynamic linker able to simply parse the VDSO. You certainly want to understand the exact state in which a process is started, and in particular the role of auxv, the auxiliary vector (I forget the details, but I remember they are important). See e.g. this article.

Actually, starting reliably your runtime is probably harder than the VDSO issue.

You may also want to read the Linux Assembly HOWTO, which also explains some things (but is more about x86 than x86-64).

BTW, the code of http://musl-libc.org/ (which is an alternative libc) is much easier to read and understand (and you'll easily learn how they do dynamic linking, pthreads, etc.).

Object code relocation and Intel Pin interaction

Relocations are processor-specific, so ARM, x86-64 and x86 have different relocations (because their instruction sets are different).

Relocations are also operating-system-specific, but some related OSes try to have the same relocations, e.g. Solaris and Linux for x86-64.

They are described in detail in the ABI (application binary interface) specification, "System V Application Binary Interface AMD64 Architecture Processor Supplement". The original x86-64 ABI used to be on http://www.x86-64.org/documentation.html,
but that site has not been responding for several weeks. An old copy is on this link and a newer one is here.

There is also the X32 ABI.

See also this question.

How to let gcc compile syscalls the int 0x80 way?

You should compile with gcc -Wall (and also perhaps the -g flag).

This will give you more warnings and will pinpoint some easy mistakes, like the lack of an appropriate #include.

The exit(3) function is a library function (which is what makes atexit possible). The corresponding syscall is _exit(2), but on recent Linux exit actually calls exit_group(2). So your example is missing an #include <stdlib.h>.

Current Linux implementations often do not use int 0x80 but go through the VDSO, or at least use the SYSENTER or SYSCALL machine instructions. YMMV.

You could, as Jeremy answered, use asm (you might define your own headers for all the syscalls you are using, and have these syscalls be static inline functions doing some asm); beware that for other syscalls you want to catch their failure and the errno error code.

Why are you asking? The libc (and its startup routines, crt0.o, ...) does complex tricks to call main.

See also this answer

Relationship between system calls API, syscall instruction and exception mechanism (interrupts)

TL;DR

The syscall instruction itself acts like a glorified jump: it's a hardware-supported way to efficiently and safely jump from unprivileged user-space into the kernel.

The syscall instruction jumps to a kernel entry-point that dispatches the call.

Before x86_64 two other mechanisms were used: the int instruction and the sysenter instruction.

They have different entry-points (still present today in 32-bit kernels, and 64-bit kernels that can run 32-bit user-space programs).

The former uses the x86 interrupt machinery and can be confused with exception dispatching (which also uses the interrupt machinery).

However, exceptions are unintended events, while int is used deliberately to generate a software interrupt: again, a glorified jump.


The C language doesn't concern itself with system calls, it relies on the C runtime to perform all the interactions with the environment of the future program.

The C runtime implements the above-mentioned interactions through an environment specific mechanism.

There could be various layers of software abstractions but in the end the OS APIs get called.

The term API is used to denote a contract; strictly speaking, using an API doesn't require invoking a piece of kernel code (the trend is to implement non-critical functions in user-space to limit the exploitable code). Here we are only interested in the subset of the API that requires a privilege switch.

Under Linux, the kernel exposes a set of services accessible from userspace; these entry points are called system calls.

Under Windows, the kernel services (which are accessed with the same mechanism as their Linux analogues) are considered private, in the sense that they are not required to be stable across versions.

A set of DLL/EXE-exported functions are used as entry points instead (e.g. in ntoskrnl.exe, hal.dll, kernel32.dll, user32.dll), which in turn use the kernel services through a (private) system call.

Note that under Linux, most system calls have a POSIX wrapper around them, so it's possible to use these wrappers, which are ordinary C functions, to invoke a system call.

The underlying ABI is different, as is the error reporting; the wrapper translates between the two worlds.

The C runtime calls the OS APIs: in the case of Linux, the system calls are used directly, because they are public (in the sense that they are stable across versions), while on Windows the usual DLLs, like kernel32.dll, are marked as dependencies and used.

We are reduced to the point where a user-mode program, be it part of the C runtime (Linux) or of an API DLL (Windows), needs to invoke code in the kernel.

The x86 architecture historically offered different ways to do so, for example, a call gate.

Another way is through the int instruction, which has a few advantages:

  • It is what the BIOS and the DOS did in their times.

    In real mode, using an int instruction is suitable because a vector number (e.g. 21h) is easier to remember than a far address (e.g. 0f000h:0fff0h).
  • It saves the flags.
  • It is easy to set up (setting up an ISR is relatively easy).

With the modernization of the architecture this mechanism turned out to have a big disadvantage: it is slow.
Before the introduction of the sysenter instruction (note: sysenter, not syscall) there was no faster alternative (a call gate would be equally slow).

With the advent of the Pentium Pro/II[1], a new pair of instructions, sysenter and sysexit, was introduced to make system calls faster.

Linux has used them since version 2.5, and they are still used today on 32-bit systems, I believe.

I won't explain the whole mechanism of the sysenter instruction and the companion VDSO necessary to use it; suffice it to say that it was faster than the int mechanism (I can't find an article by Andy Glew where he says that sysenter turned out to be slow on the Pentium III; I don't know how it performs nowadays).

With the advent of x86-64, the AMD response to sysenter, i.e. the syscall/sysret pair, became the de facto way to switch from user mode to kernel mode.

This is due to the fact that syscall is actually fast and very simple (it copies rip and rflags into rcx and r11 respectively, masks rflags and jumps to an address set in IA32_LSTAR).

64-bit versions of both Linux and Windows use syscall.

To recap, control can be given to the kernel through three mechanisms:

  • Software interrupts.

    This was int 80h for 32-bit Linux (pre 2.5) and int 2eh for 32-bit Windows.
  • Via sysenter.

    Used by 32-bit versions of Linux since 2.5.
  • Via syscall.

    Used by 64-bit versions of Linux and Windows.

Here is a nice page putting it in better shape.

The C runtime is usually a static library, thus pre-compiled, that uses one of the three methods above.

The syscall instruction transfers control directly to a kernel entry point (see entry_64.S).

It is an instruction that just does so; it is not implemented by the OS, it is used by the OS.

The term exception is overloaded in CS, C++ has exceptions, so do Java and C#.

The OS can have a language-agnostic exception-trapping mechanism (under Windows it was once called SEH, and it has now been rewritten).

The CPU also has exceptions.

I believe we are talking about the last meaning.

Exceptions are dispatched through interrupts; they are a kind of interrupt.

It goes without saying that while exceptions are synchronous (they happen at specific, replayable points), they are "unwanted": they are exceptional, in the sense that programmers tend to avoid them, and when they happen it is due to either a bug, an unhandled corner case or a bad situation.

They are thus not used to transfer control to the kernel (though they could be).

Software interrupts (which are synchronous too) were used instead; the mechanism is almost exactly the same (exceptions can have a status code pushed on the kernel stack), but the semantics are different.

We never dereferenced a null pointer, accessed an unmapped page or the like to invoke a system call; we used the int instruction instead.

Where does GCC find printf? My code worked without any #include

gcc will link your program, by default, with the C library, libc, which implements printf:

$ ldd ./a.out
linux-vdso.so.1 (0x00007ffd5d7d3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdf2d307000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdf2d4f0000)

$ nm -D /lib/x86_64-linux-gnu/libc.so.6 | grep ' printf' | head -1
0000000000056cf0 T printf@@GLIBC_2.2.5

If you build your program with -nolibc you have to satisfy a few symbols on your own (see Compiling without libc):

$ gcc -nolibc ./1.c 
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/10/../../../x86_64-linux-gnu/Scrt1.o: in function `_start':
(.text+0x12): undefined reference to `__libc_csu_fini'
/usr/bin/ld: (.text+0x19): undefined reference to `__libc_csu_init'
/usr/bin/ld: (.text+0x26): undefined reference to `__libc_start_main'
/usr/bin/ld: /tmp/user/1000/ccCFGFhf.o: in function `main':
1.c:(.text+0xc): undefined reference to `puts'
collect2: error: ld returned 1 exit status

