How Come _Exit(0) (Exiting by Syscall) Prevents Me from Receiving Any Stdout Content

How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?

when you pipe stdout to wc, stdout becomes fully buffered.

_exit terminates the process immediately and does not run atexit() and other cleanup handlers. The runtime will register such handlers to be run on exit that flushes open FILE*, such as stdout. When those handlers does not get executed on exit, buffered data will be lost.

You should see output if you call fflush(stdout) after the printf call, or if you just run the program in a consol without piping the output to another program - in which case stdout will normally be line buffered, so stdout is flushed whenever you write a \n

Syscall implementation of exit()

The Linux and glibc man pages document all of this (See especially the "C library/kernel differences" in the NOTES section).

  • _exit(2): In glibc 2.3 and later, this wrapper function actually uses the Linux SYS_exit_group system call to exit all threads. Before glibc2.3, it was a wrapper for SYS_exit to exit just the current thread.

  • exit_group(2): glibc wrapper for SYS_exit_group, which exits all threads.

  • exit(3): The ISO C89 function which flushes buffers and then exits the whole process. (It always uses exit_group() because there's no benefit to checking if the process was single-threaded and deciding to use SYS_exit vs. SYS_exit_group). As @Matteo points out, recent ISO C / POSIX standards are thread-aware and one or both probably require this behaviour.

    But apparently exit(3) itself is not thread-safe (in the C library cleanup parts), so I guess don't call it from multiple threads at once.

  • syscall / int 0x80 with SYS_exit: terminates just the current thread, leaving others running. AFAIK, modern glibc has no thin wrapper function for this Linux system call, but I think pthread_exit() uses it if this isn't the last thread. (Otherwise exit(3) -> exit_group(2).)

Only exit(), not _exit() or exit_group(), flushes stdout, leading to "printf doesn't print anything" problems in newbie asm programs if writing to a pipe (which makes stdout full-buffered instead of line-buffered), or if you forgot the \n in the format string. For example, How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?. If you use any buffered I/O functions, or at_exit, or anything like that, it's usually a good idea to call the libc exit(3) function instead of the system call directly. But of course you can call fflush before SYS_exit_group.

(Also related: On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program? - ret from main is equivalent to calling exit(3))


It's not of course the compiler that chose anything, it's libc. When you include headers and write read(fd, buf, 123) or exit(1), the C compiler just sees an ordinary function call.

Some C libraries (e.g. musl, but not glibc) may use inline asm to inline a syscall instruction into your binary, but still the headers are part of the C library, not the compiler.

Using printf in assembly leads to empty output when piping, but works on the terminal

Use call exit instead of a raw _exit syscall after using stdio functions like printf. This flushes stdio buffers (write system call) before making an exit_group system call).

(Or if your program defines a main instead of _start, returning from main is equivalent to calling exit. You can't ret from _start.) Calling fflush(NULL) should also work.


As Michael explained, it is OK to link the C-library dynamically. This is also how it is introduced in the "Programming bottom up" book (see chapter 8).

However it is important to call exit from the C-library in order to end the program and not to bypass it, which was what I wrongly did by calling exit-syscall. As hinted by Michael, exit does a lot of clean up like flushing streams.

That is what happened: As explained here, the C-library buffers the the standard streams as follows:

  1. No buffering for standard error.
  2. If standard out/in is a terminal, it is line-buffered.
  3. If standard out/in is a not a terminal, it is fully-buffered and thus flush is needed before a raw exit system call.

Which case applies is decided when printf is called for the first time for a stream.

So if printf_try is called directly in the terminal, the output of the program can be seen because hello has \n at the end (which triggers the flush in the line-buffered mode) and it is a terminal, also the 2. case.

Calling printf_try via $(./printf_try) means that the stdout is no longer a terminal (actually I don't know whether is is a temp file or a memory file) and thus the 3. case is in effect - there is need for an explicit flush i.e. call to C-exit.

Capture both exit status and output from a system call in R

As of R 2.15, system2 will give the return value as an attribute when stdout and/or stderr are TRUE. This makes it easy to get the text output and return value.

In this example, ret ends up being a string with an attribute "status":

> ret <- system2("ls","xx", stdout=TRUE, stderr=TRUE)
Warning message:
running command ''ls' xx 2>&1' had status 1
> ret
[1] "ls: xx: No such file or directory"
attr(,"status")
[1] 1
> attr(ret, "status")
[1] 1

Nasm segmentation fault on RET in _start

Because ret is NOT the proper way to exit a program in Linux, Windows, or Mac!!!!

_start is not a function, there is no return address on the stack because there is no user-space caller to return to. Execution in user-space started here (in a static executable), at the process entry point. (Or with dynamic linking, it jumped here after the dynamic linker finished, but same result).

On Linux / OS X, the stack pointer is pointing at argc on entry to _start (see the i386 or x86-64 System V ABI doc for more details on the process startup environment); the kernel puts command line args into user-space stack memory before starting user-space. (So if you do try to ret, EIP/RIP = argc = a small integer, not a valid address. If your debugger shows a fault at address 0x00000001 or something, that's why.)


For Windows it is ExitProcess and Linux is is system call -
int 80H using sys_exit, for x86 or using syscall using 60 for 64-bit or a call to exit from the C Library if you are linking to it.

32-bit Linux (i386)

%define  SYS_exit  1   ; call number __NR_exit from <asm/unistd_32.h>

mov eax, SYS_exit ; use the NASM macro we defined earlier
xor ebx, ebx ; ebx = 0 exit status
int 80H ; _exit(0)

64-bit Linux (amd64)

mov     rax, 60        ; SYS_exit aka __NR_exit from asm/unistd_64.h
xor rdi, rdi ; edi = 0 first arg to 64-bit system calls
syscall ; _exit(0)

(In GAS you can actually #include <sys/syscall.h> or <asm/unistd.h> to get the right numbers for the mode you're assembling a .S for, but NASM can't easily use the C preprocessor.
See Polygot include file for nasm/yasm and C for hints.)

32-bit Windows (x86)

push    0
call ExitProcess


Or Windows/Linux linking against the C Library

; pass an int exit_status as appropriate for the calling convention
; push 0 / xor edi,edi / xor ecx,ecx
call exit

(Or for 32-bit x86 Windows, call _exit, because C names get prepended with an underscore, unlike in x86-64 Windows. The POSIX _exit function would be call __exit, if Windows had one.)

Windows x64's calling convention includes shadow space which the caller has to reserve, but exit isn't going to return so it's ok to let it step on that space above its return address. Also, 16-byte stack alignment is required by the calling convention before call exit except for 32-bit Windows, but often won't actually crash for a simple function like exit().


call exit (unlike a raw exit system call or libc _exit) will flush stdio buffers first. If you used printf from _start, use exit to make sure all output is printed before you exit, even if stdout is redirected to a file (making stdout full-buffered, not line-buffered).

It's generally recommended that if you use libc functions, you write a main function and link with gcc so it's called by the normal CRT start functions which you can ret to.

See also

  • Syscall implementation of exit()
  • How come _exit(0) (exiting by syscall) prevents me from receiving any stdout content?

Defining main as something that _start falls through into doesn't make it special, it's just confusing to use a main label if it's not like a C main function called by a _start that's prepared to exit after main returns.

Printf without newline in assembly

fflush() flushes buffered output in line or full-buffered stdio streams:

extern fflush
...
xor edi, edi ; RDI = 0
call fflush ; fflush(NULL) flushes all streams
...

Alternatively, mov rdi, [stdout] / call fflush also works to flush only that stream. (Use default rel for efficient RIP-relative addressing, and you'll need extern stdout as well.)

On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program?

If you use printf or other libc functions, it's best to ret from main or call exit. (Which are equivalent; main's caller will call the libc exit function.)

If not, if you were only making other raw system calls like write with syscall, it's also appropriate and consistent to exit that way, but either way, or call exit are 100% fine in main.

If you want to work without libc at all, e.g. put your code under _start: instead of main: and link with ld or gcc -static -nostdlib, then you can't use ret. Use mov eax, 231 (__NR_exit_group) / syscall.

main is a real & normal function like any other (called with a valid return address), but _start (the process entry point) isn't. On entry to _start, the stack holds argc and argv, so trying to ret would set RIP=argc, and then code-fetch would segfault on that unmapped address. Nasm segmentation fault on RET in _start



System call vs. ret-from-main

Exiting via a system call is like calling _exit() in C - skip atexit() and libc cleanup, notably not flushing any buffered stdout output (line buffered on a terminal, full-buffered otherwise).
This leads to symptoms such as Using printf in assembly leads to empty output when piping, but works on the terminal (or if your output doesn't end with \n, even on a terminal.)

main is a function, called (indirectly) from CRT startup code. (Assuming you link your program normally, like you would a C program.) Your hand-written main works exactly like a compiler-generate C main function would. Its caller (__libc_start_main) really does do something like int result = main(argc, argv); exit(result);,

e.g. call rax (pointer passed by _start) / mov edi, eax / call exit.

So returning from main is exactly1 like calling exit.

  • Syscall implementation of exit() for a comparison of the relevant C functions, exit vs. _exit vs. exit_group and the underlying asm system calls.

  • C question: What is the difference between exit and return? is primarily about exit() vs. return, although there is mention of calling _exit() directly, i.e. just making a system call. It's applicable because C main compiles to an asm main just like you'd write by hand.

Footnote 1: You can invent a hypothetical intentionally weird case where it's different. e.g. you used stack space in main as your stdio buffer with sub rsp, 1024 / mov rsi, rsp / ... / call setvbuf. Then returning from main would involve putting RSP above that buffer, and __libc_start_main's call to exit could overwrite some of that buffer with return addresses and locals before execution reached the fflush cleanup. This mistake is more obvious in asm than C because you need leave or mov rsp, rbp or add rsp, 1024 or something to point RSP at your return address.

In C++, return from main runs destructors for its locals (before global/static exit stuff), exit doesn't. But that just means the compiler makes asm that does more stuff before actually running the ret, so it's all manual in asm, like in C.

The other difference is of course the asm / calling-convention details: exit status in EAX (return value) or EDI (first arg), and of course to ret you have to have RSP pointing at your return address, like it was on function entry. With call exit you don't, and you can even do a conditional tailcall of exit like jne exit. Since it's a noreturn function, you don't really need RSP pointing at a valid return address. (RSP should be aligned by 16 before a call, though, or RSP%16 = 8 before a tailcall, matching the alignment after call pushes a return address. It's unlikely that exit / fflush cleanup will do any alignment-required stores/loads to the stack, but it's a good habit to get this right.)

(This whole footnote is about ret vs. call exit, not syscall, so it's a bit of a tangent from the rest of the answer. You can also run syscall without caring where the stack-pointer points.)



SYS_exit vs. SYS_exit_group raw system calls

The raw SYS_exit system call is for exiting the current thread, like pthread_exit().

(eax=60 / syscall, or eax=1 / int 0x80).

SYS_exit_group is for exiting the whole program, like _exit.

(eax=231 / syscall, or eax=252 / int 0x80).

In a single-threaded program you can use either, but conceptually exit_group makes more sense to me if you're going to use raw system calls. glibc's _exit() wrapper function actually uses the exit_group system call (since glibc 2.3). See Syscall implementation of exit() for more details.

However, nearly all the hand-written asm you'll ever see uses SYS_exit1. It's not "wrong", and SYS_exit is perfectly acceptable for a program that didn't start more threads. Especially if you're trying to save code size with xor eax,eax / inc eax (3 bytes in 32-bit mode) or push 60 / pop rax (3 bytes in 64-bit mode), while push 231/pop rax would be even larger than mov eax,231 because it doesn't fit in a signed imm8.

Note 1: (Usually actually hard-coding the number, not using __NR_... constants from asm/unistd.h or their SYS_... names from sys/syscall.h)

And historically, it's all there was. Note that in unistd_32.h, __NR_exit has call number 1, but __NR_exit_group = 252 wasn't added until years later when the kernel gained support for tasks that share virtual address space with their parent, aka threads started by clone(2). This is when SYS_exit conceptually became "exit current thread". (But one could easily and convincingly argue that in a single-threaded program, SYS_exit does still mean exit the whole program, because it only differs from exit_group if there are multiple threads.)

To be honest, I've never used eax=252 / int 0x80 in anything, only ever eax=1. It's only in 64-bit code where I often use mov eax,231 instead of mov eax,60 because neither number is "simple" or memorable the way 1 is, so might as well be a cool guy and use the "modern" exit_group way in my single-threaded toy program / experiment / microbenchmark / SO answer. :P (If I didn't enjoy tilting at windmills, I wouldn't spend so much time on assembly, especially on SO.)

And BTW, I usually use NASM for one-off experiments so it's inconvenient to use pre-defined symbolic constants for call numbers; with GCC to preprocess a .S before running GAS you can make your code self-documenting with #include <sys/syscall.h> so you can use mov $SYS_exit_group, %eax (or $__NR_exit_group), or mov eax, __NR_exit_group with .intel_syntax noprefix.



Don't use the 32-bit int 0x80 ABI in 64-bit code:

What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains what happens if you use the COMPAT_IA32_EMULATION int 0x80 ABI in 64-bit code.

It's totally fine for just exiting, as long as your kernel has that support compiled in, otherwise it will segfault just like any other random int number like int 0x7f. (e.g. on WSL1, or people that built custom kernels and disabled that support.)

But the only reason you'd do it that way in asm would be so you could build the same source file with nasm -felf32 or nasm -felf64. (You can't use syscall in 32-bit code, except on some AMD CPUs which have a 32-bit version of syscall. And the 32-bit ABI uses different call numbers anyway so this wouldn't let the same source be useful for both modes.)


Related:

  • Why am I allowed to exit main using ret? (CRT startup code calls main, you're not returning directly to the kernel.)
  • Nasm segmentation fault on RET in _start - you can't ret from _start
  • Using printf in assembly leads to empty output when piping, but works on the terminal stdout buffer (not) flushing with raw system call exit
  • Syscall implementation of exit() call exit vs. mov eax,60/syscall (_exit) vs. mov eax,231/syscall (exit_group).
  • Can't call C standard library function on 64-bit Linux from assembly (yasm) code - modern Linux distros config GCC in a way that call exit or call puts won't link with nasm -felf64 foo.asm && gcc foo.o.
  • Is main() really start of a C++ program? - Ciro's answer is a deep dive into how glibc + its CRT startup code actually call main (including x86-64 asm disassembly in GDB), and shows the glibc source code for __libc_start_main.
  • Linux x86 Program Start Up
    or - How the heck do we get to main()? 32-bit asm, and more detail than you'll probably want until you're a lot more comfortable with asm, but if you've ever wondered why CRT runs so much code before getting to main, that covers what's happening at a level that's a couple steps up from using GDB with starti (stop at the process entry point, e.g. in the dynamic linker's _start) and stepi until you get to your own _start or main.
  • https://stackoverflow.com/tags/x86/info lots of good links about this and everything else.

Linux x86-64 fork syscall strange behavior against C standard libc FILE I/O (keywords: fork, fclose, linux)

Thanks to @Jonathan Leffler!

This problem is a duplicate of Why does forking my process cause the file to be read infinitely

The missing knowledge, why does the bug not occur on Linux 5.10.0-051000-generic turned out that it is not related to the kernel.


It turned out that the parent process competes with the child processes (not related to kernel).

  • Note: change the offset of file handle from child process will also change the offset in parent process if the handle is created by the parent.
  • If there is no fclose(3) in the childs, the child processes will call lseek(2) as soon as they call exit(3). This will cause the parent re-read the same offset, because the childs call lseek(2) with negative offset + SEEK_CUR.

(I don't know why it is necessary to call lseek(2) before exit, it might have been explained in @Jonathan Leffler's answer, I did not read the whole answer carefully).

  • If the parent finishes to read the entire file before the childs call lseek(2). Then there is no problem at all.

Also, as @iBug has mentioned But keep in mind that process scheduling may make the result non-predictable, unless you implement some kind of "syncing".

The parent process on Linux 5.10.0-051000-generic machine I used was just a lucky process that always won to read the entire file first before the childs call lseek(2).

I tried to add more lines to the file (to be 150 lines), so the parent will mostly be slower than reading 6 lines, and the undefined behavior happens.

Test result: https://gist.githubusercontent.com/ammarfaizi2/b72bd03fcc13779f96b8bbeef9253e66/raw/da1eff4ed5434aa51929e5c810d54de8ffe15548/test2_fix.txt

Why does this ptrace program say syscall returned -38?

The code doesn't account for the notification of the exec from the child, and so ends up handling syscall entry as syscall exit, and syscall exit as syscall entry. That's why you see "syscall 12 returned" before "syscall 12 called", etc. (-38 is ENOSYS which is put into RAX as a default return value by the kernel's syscall entry code.)

As the ptrace(2) man page states:

PTRACE_TRACEME

Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. [...]

You said that the original code you were running was "the same as this one except that I'm running execl("/bin/ls", "ls", NULL);". Well, it clearly isn't, because you're working with x86_64 rather than 32-bit and have changed the messages at least.

But, assuming you didn't change too much else, the first time the wait() wakes up the parent, it's not for syscall entry or exit - the parent hasn't executed ptrace(PTRACE_SYSCALL,...) yet. Instead, you're seeing this notification that the child has performed an exec (on x86_64, syscall 59 is execve).

The code incorrectly interprets that as syscall entry. Then it calls ptrace(PTRACE_SYSCALL,...), and the next time the parent is woken it is for a syscall entry (syscall 12), but the code reports it as syscall exit.

Note that in this original case, you never see the execve syscall entry/exit - only the additional notification - because the parent does not execute ptrace(PTRACE_SYSCALL,...) until after it happens.

If you do arrange the code so that the execve syscall entry/exit are caught, you will see the new behaviour that you observe. The parent will be woken three times: once for execve syscall entry (due to use of ptrace(PTRACE_SYSCALL,...), once for execve syscall exit (also due to use of ptrace(PTRACE_SYSCALL,...), and a third time for the exec notification (which happens anyway).


Here is a complete example (for x86 or x86_64) which takes care to show the behaviour of the exec itself by stopping the child first:

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/reg.h>

#ifdef __x86_64__
#define SC_NUMBER (8 * ORIG_RAX)
#define SC_RETCODE (8 * RAX)
#else
#define SC_NUMBER (4 * ORIG_EAX)
#define SC_RETCODE (4 * EAX)
#endif

static void child(void)
{
/* Request tracing by parent: */
ptrace(PTRACE_TRACEME, 0, NULL, NULL);

/* Stop before doing anything, giving parent a chance to catch the exec: */
kill(getpid(), SIGSTOP);

/* Now exec: */
execl("/bin/ls", "ls", NULL);
}

static void parent(pid_t child_pid)
{
int status;
long sc_number, sc_retcode;

while (1)
{
/* Wait for child status to change: */
wait(&status);

if (WIFEXITED(status)) {
printf("Child exit with status %d\n", WEXITSTATUS(status));
exit(0);
}
if (WIFSIGNALED(status)) {
printf("Child exit due to signal %d\n", WTERMSIG(status));
exit(0);
}
if (!WIFSTOPPED(status)) {
printf("wait() returned unhandled status 0x%x\n", status);
exit(0);
}
if (WSTOPSIG(status) == SIGTRAP) {
/* Note that there are *three* reasons why the child might stop
* with SIGTRAP:
* 1) syscall entry
* 2) syscall exit
* 3) child calls exec
*/
sc_number = ptrace(PTRACE_PEEKUSER, child_pid, SC_NUMBER, NULL);
sc_retcode = ptrace(PTRACE_PEEKUSER, child_pid, SC_RETCODE, NULL);
printf("SIGTRAP: syscall %ld, rc = %ld\n", sc_number, sc_retcode);
} else {
printf("Child stopped due to signal %d\n", WSTOPSIG(status));
}
fflush(stdout);

/* Resume child, requesting that it stops again on syscall enter/exit
* (in addition to any other reason why it might stop):
*/
ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
}
}

int main(void)
{
pid_t pid = fork();

if (pid == 0)
child();
else
parent(pid);

return 0;
}

which gives something like this (this is for 64-bit - system call numbers are different for 32-bit; in particular execve is 11, rather than 59):

Child stopped due to signal 19
SIGTRAP: syscall 59, rc = -38
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 63, rc = -38
SIGTRAP: syscall 63, rc = 0
SIGTRAP: syscall 12, rc = -38
SIGTRAP: syscall 12, rc = 5324800
...

Signal 19 is the explicit SIGSTOP; the child stops three times for the execve as just described above; then twice (entry and exit) for other system calls.

If you're really interesting in all the gory details of ptrace(), the best documentation I'm aware of is the
README-linux-ptrace file in the strace source. As it says, the "API is complex and has subtle quirks"....



Related Topics



Leave a reply



Submit