Why doesn't time() from time.h have a syscall to sys_time?

Read time(7). Your call to time(2) probably goes through the vDSO (see vdso(7)), perhaps via clock_gettime(2) or via __vdso_time. If the vDSO is used, then, as vdso(7) says:

When tracing systems calls with strace(1), symbols (system calls)
that are exported by the vDSO will not appear in the trace output.

Details could be kernel and libc specific (and of course architecture specific).

For the same vDSO reasons, strace date doesn't show any time-related syscalls.

And the vDSO is a really handy feature (its mapping address is subject to ASLR). Thanks to it, timing calls (e.g. clock_gettime(2)) are really quick (about 40 nanoseconds on my i5-4690S). AFAIU, no context switch (or user-to-kernel mode transition) happens.

So your 0x7ffff7ff80a8 is probably sitting in the vDSO (and the kernel ensures it contains the current time). You might check by using proc(5) (e.g. reading and showing /proc/self/maps from your program), or perhaps by using ldd(1) and pmap(1).

glibc time function implementation

This complicated-looking inline assembly just causes the following assembly instructions to be emitted by the compiler:

mov     eax, 201
syscall

So, the entire time function is just:

time:
mov eax, 201
syscall
ret

The immediate value 201 (0xC9 in hexadecimal notation) is moved into the EAX register, and then the syscall instruction is executed. This instruction does just what the name suggests: it makes a system call. This is basically the way you call platform API functions on Linux. See also section A.2 ("AMD64 Linux Kernel Conventions") of the System V AMD64 ABI.

In brief:

  • The system call ID number is placed into rax.

    (In this case, the number is just 32 bits, so the assembly code places it into eax. The upper 32 bits are implicitly zeroed, saving some bytes in the size of the mov instruction.)

  • The arguments for the system call, if any, are placed in registers: rdi, rsi, rdx, r10, r8, and r9.

    (In this case, for system call #201, there are no arguments that need to be specified, so none of these registers are initialized by the time function.)

  • After syscall is invoked, its result is contained in rax. Conventionally, negative values (−4095 to −1) indicate an error, corresponding to −errno.

  • For system calls, the rcx and r11 registers are treated as volatile, which means that their contents are subject to being clobbered. If the caller cares about those values, it needs to preserve them. All other registers' values are saved across the system call.

    (This is why the clobbers are there in the extended inline asm syntax.)

There is a reference for 64-bit Linux system calls available here (32-bit Linux system calls are here). You can see that 201 (0xC9) corresponds to sys_time.

sys_time interprets the RDI register as a time_t* value. This code:

long int __arg1 = (long int) (t);
register long int _a1 asm ("rdi") = __arg1;

causes the function's parameter, t, to be stored in the RDI register. That doesn't actually cause any machine instructions to be generated, though, because the System V AMD64 calling convention already passes the first parameter of a function in RDI, so t is already in RDI.

The sys_time system call just stores the current time at the address it finds in RDI, which is the same as the time function's t argument (a null pointer is allowed, in which case nothing is stored). It also returns its result (the time, or a negative error code) in RAX, which is always used for the return value of a function under the System V AMD64 calling convention, so no machine instructions are required there, either.

Perhaps more clearly:

# inputs:  RDI is a pointer to time_t that will be filled in
# returns: result is left in RAX
time:
mov eax, 201
syscall
ret

Why does this ptrace program say syscall returned -38?

The code doesn't account for the notification of the exec from the child, and so ends up handling syscall entry as syscall exit, and syscall exit as syscall entry. That's why you see "syscall 12 returned" before "syscall 12 called", etc. (-38 is ENOSYS which is put into RAX as a default return value by the kernel's syscall entry code.)

As the ptrace(2) man page states:

PTRACE_TRACEME

Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. [...]

You said that the original code you were running was "the same as this one except that I'm running execl("/bin/ls", "ls", NULL);". Well, it clearly isn't, because you're working with x86_64 rather than 32-bit and have changed the messages at least.

But, assuming you didn't change too much else, the first time the wait() wakes up the parent, it's not for syscall entry or exit - the parent hasn't executed ptrace(PTRACE_SYSCALL,...) yet. Instead, you're seeing this notification that the child has performed an exec (on x86_64, syscall 59 is execve).

The code incorrectly interprets that as syscall entry. Then it calls ptrace(PTRACE_SYSCALL,...), and the next time the parent is woken it is for a syscall entry (syscall 12), but the code reports it as syscall exit.

Note that in this original case, you never see the execve syscall entry/exit - only the additional notification - because the parent does not execute ptrace(PTRACE_SYSCALL,...) until after it happens.

If you do arrange the code so that the execve syscall entry/exit are caught, you will see the new behaviour that you observe. The parent will be woken three times: once for execve syscall entry (due to use of ptrace(PTRACE_SYSCALL,...)), once for execve syscall exit (also due to use of ptrace(PTRACE_SYSCALL,...)), and a third time for the exec notification (which happens anyway).


Here is a complete example (for x86 or x86_64) which takes care to show the behaviour of the exec itself by stopping the child first:

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/reg.h>

#ifdef __x86_64__
#define SC_NUMBER (8 * ORIG_RAX)
#define SC_RETCODE (8 * RAX)
#else
#define SC_NUMBER (4 * ORIG_EAX)
#define SC_RETCODE (4 * EAX)
#endif

static void child(void)
{
    /* Request tracing by parent: */
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);

    /* Stop before doing anything, giving parent a chance to catch the exec: */
    kill(getpid(), SIGSTOP);

    /* Now exec: */
    execl("/bin/ls", "ls", NULL);
}

static void parent(pid_t child_pid)
{
    int status;
    long sc_number, sc_retcode;

    while (1)
    {
        /* Wait for child status to change: */
        wait(&status);

        if (WIFEXITED(status)) {
            printf("Child exit with status %d\n", WEXITSTATUS(status));
            exit(0);
        }
        if (WIFSIGNALED(status)) {
            printf("Child exit due to signal %d\n", WTERMSIG(status));
            exit(0);
        }
        if (!WIFSTOPPED(status)) {
            printf("wait() returned unhandled status 0x%x\n", status);
            exit(0);
        }
        if (WSTOPSIG(status) == SIGTRAP) {
            /* Note that there are *three* reasons why the child might stop
             * with SIGTRAP:
             *  1) syscall entry
             *  2) syscall exit
             *  3) child calls exec
             */
            sc_number = ptrace(PTRACE_PEEKUSER, child_pid, SC_NUMBER, NULL);
            sc_retcode = ptrace(PTRACE_PEEKUSER, child_pid, SC_RETCODE, NULL);
            printf("SIGTRAP: syscall %ld, rc = %ld\n", sc_number, sc_retcode);
        } else {
            printf("Child stopped due to signal %d\n", WSTOPSIG(status));
        }
        fflush(stdout);

        /* Resume child, requesting that it stops again on syscall enter/exit
         * (in addition to any other reason why it might stop):
         */
        ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
    }
}

int main(void)
{
    pid_t pid = fork();

    if (pid == 0)
        child();
    else
        parent(pid);

    return 0;
}

which gives something like this (this is for 64-bit - system call numbers are different for 32-bit; in particular execve is 11, rather than 59):

Child stopped due to signal 19
SIGTRAP: syscall 59, rc = -38
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 63, rc = -38
SIGTRAP: syscall 63, rc = 0
SIGTRAP: syscall 12, rc = -38
SIGTRAP: syscall 12, rc = 5324800
...

Signal 19 is the explicit SIGSTOP; the child stops three times for the execve as just described above; then twice (entry and exit) for other system calls.

If you're really interested in all the gory details of ptrace(), the best documentation I'm aware of is the README-linux-ptrace file in the strace source. As it says, the "API is complex and has subtle quirks"....

NASM printing out time - code doesn't output anything

This is your example translated to C. You are copying the pointer to the time into eax instead of copying eax into the buffer. Even with that fixed it wouldn't work, because write needs a character array; a raw integer will print as garbage.

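#include <stdlib.h>
#include <time.h>
#include <unistd.h>

char b[255];

int
main(void)
{
    /* You wanted to do this, which doesn't work
     * because write won't format an int; it expects char data:
     * *(int*)b=time(NULL);
     */

    /* Instead you did */
    time(NULL);       /* result is discarded */
    b;                /* loads the buffer address; has no effect */
    write(1, b, 255); /* writes 255 zero bytes */
    exit(1);
}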

C segmentation fault after accessing the system time by inline assembly

You have a lot of bugs here, of which the most important are:

  • You pushed things onto the stack, but you didn't pop them off again before leaving the inline assembly block. The compiler doesn't know you did that, so it will look for everything on the stack (such as the return address) in the wrong place afterward. This is very likely to be what caused the crash.

  • More generally, compilers that use this style of inline assembly don't interpret the assembly instructions at all. They trust you to have used the input, output, and clobber annotations correctly. If you neglect to mention even one register or memory area that has been modified, the compiler will generate incorrect code surrounding the assembly insert and the program won't work.

  • "Ubuntu 16.04" is a distribution of Linux, so you are using the wrong calling convention. Linux takes system call arguments in registers, not on the stack, as documented here, and gettimeofday is not system call number 116 on x86-32/Linux. (Always use the SYS_foo constants, from sys/syscall.h, for system call numbers.)

Also, it is best to do as little as possible in the actual inserted assembly. In this case, that means just the int instruction itself. Set up arguments using the input and output constraints, instead. This gives the compiler maximum leeway to optimize. (If you are writing assembly by hand because the compiler is failing to do a sufficiently good job of optimizing, you should write an entire ".s" file of pure assembly, rather than a .c file with gigantic assembly inserts; this is more maintainable.)

Correct code for this task would be something like

#include <assert.h>
#include <sys/time.h>
#include <sys/syscall.h>

/* x86-32 Linux only: int $0x80 takes the call number in eax
 * and the arguments in ebx and ecx. */
struct timeval
call_gettimeofday(void)
{
    struct timeval ret;
    int dummy;
    asm("int $0x80"
        : "=m" (ret), "=a" (dummy)
        : "1" (SYS_gettimeofday), "b" (&ret), "c" (0));
    assert(!dummy); // gettimeofday should never fail
    return ret;
}

As a final note, it is almost always a mistake to use inline assembly to make system calls. The C library's wrapper functions may be doing more work than is apparent to you, and they know how to use a more efficient trap sequence (using sysenter or syscall instead of int) when possible. In the case of gettimeofday, the difference is even more profound: the C library knows how to do a gettimeofday operation without trapping into the kernel at all! (Read up on the vDSO to understand how this is possible.)


