Separately Redirecting and Recombining Stderr/Stdout Without Losing Ordering

Preserving perfect order while performing separate redirections is not even theoretically possible without some ugly hackery. Ordering is only preserved in writes (in O_APPEND mode) directly to the same file; as soon as you put something like tee in one process but not the other, ordering guarantees go out the window and can't be retrieved without keeping information about which syscalls were invoked in what order.
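
For contrast, the only arrangements in which ordering survives with plain redirection are those where both streams land in the same place (a quick illustration, writing to an arbitrary combined.log):

# both writes go through one shared open file description, so their ordering is preserved
./your-program >combined.log 2>&1

# two separate opens of the same file in O_APPEND mode also preserve ordering
./your-program >>combined.log 2>>combined.log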

So, what would that hackery look like? It might look something like this:

# eat our initialization time *before* we start the background process
sudo sysdig-probe-loader

# now, start monitoring syscalls made by children of this shell that write to fd 1 or 2
# ...funnel content into our logs.log file
sudo sysdig -s 32768 -b -p '%evt.buffer' \
"proc.apid=$$ and evt.type=write and (fd.num=1 or fd.num=2)" \
> >(base64 -i -d >logs.log) \
& sysdig_pid=$!

# Run your-program, with stderr going both to console and to errors.log
./your-program >/dev/null 2> >(tee errors.log)

That said, this remains ugly hackery: It only catches writes made directly to FDs 1 and 2, and doesn't track any further redirections that may take place. (This could be improved by performing the writes to FIFOs, and using sysdig to track writes to those FIFOs; that way dup() and similar operations would work as expected; but the above suffices to prove the concept.)
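
A sketch of that FIFO-based refinement might look like the following (untested; it assumes sysdig accepts fd.filename as a filter field, and uses arbitrary FIFO names out.fifo and err.fifo):

# create FIFOs with fixed names for sysdig to key on
mkfifo out.fifo err.fifo

# forward the FIFO contents on to the original stdout/stderr so output stays visible
cat out.fifo & cat err.fifo >&2 &

# track writes to the FIFOs rather than to fixed FD numbers, so later dup()s still get caught
sudo sysdig -s 32768 -b -p '%evt.buffer' \
    "proc.apid=$$ and evt.type=write and (fd.filename=out.fifo or fd.filename=err.fifo)" \
    > >(base64 -i -d >logs.log) \
    & sysdig_pid=$!

./your-program >out.fifo 2>err.fifo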


Making Separate Handling Explicit

Here we demonstrate how to use this to colorize only stderr, and leave stdout alone -- by telling sysdig to generate a stream of JSON as output, and then iterating over that:

exec {colorizer_fd}> >(
  jq --unbuffered --arg startColor "$(tput setaf 1)" --arg endColor "$(tput sgr0)" -r '
    if .["fd.filename"] == "stdout" then
      ("STDOUT: " + .["evt.buffer"])
    else
      ("STDERR: " + $startColor + .["evt.buffer"] + $endColor)
    end
  '
)

sudo sysdig -s 32768 -j -p '%fd.filename %evt.buffer' \
"proc.apid=$$ and evt.type=write and proc.name != jq and (fd.num=1 or fd.num=2)" \
>&$colorizer_fd \
& sysdig_pid=$!

# Run your-program, with stdout and stderr going to two separately-named destinations
./your-program >stdout 2>stderr

Because we're keying off the output filenames (stdout and stderr), those names need to stay constant for the above code to work; the files themselves can live in any temporary directory you like.
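
For instance, a scratch directory from mktemp works fine, since fd.filename matches only the filename portion of the path (a small, hypothetical variation; the sysdig invocation above is unchanged):

workdir=$(mktemp -d)
./your-program >"$workdir/stdout" 2>"$workdir/stderr"
rm -rf "$workdir"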


Obviously, you shouldn't actually do any of this. Update your program to support whatever logging infrastructure is available in its native language (Log4j in Java, the Python logging module, etc.) so that its logging can be configured explicitly.

redirect stdout and stderr to file and stderr to screen while preserving output order

Standard UNIX semantics do not permit this; it cannot be reliably done using only facilities built into bash or specified in POSIX.

Writes only have a guaranteed order relative to each other when they go to the same destination. As soon as you redirect stdout and stderr to different destinations, the operating system no longer offers any guarantees around the ordering of writes performed to them; doubly so when those writes go to a separate process (such as tee), which must then read the content and perform its own further writes, subject to OS-level scheduling.

It may be possible to use syscall-level tracing facilities provided by (or available for) your operating system to generate an absolute ordering of system calls, and thus to generate definitively-ordered output; an example of this is given in the answer to Separately redirecting and recombining stderr/stdout without losing ordering.

Using bash itself, however, you can only control where file descriptors of a subprocess are connected, and not how content written to those descriptors is buffered and flushed, so there simply isn't enough control to make what you're asking for possible without writes being subject to reordering.

How to unify stdout and stderr, yet be able to distinguish between them?

You can use select() to multiplex the output. Assuming you have the child's stdout and stderr captured in pipes, code along these lines will work:

import select

# pipe_stdout and pipe_stderr are assumed to already be the pipes attached to the
# child process's stdout and stderr (for example, the .stdout and .stderr attributes
# of a subprocess.Popen object created with stdout=PIPE, stderr=PIPE).
inputs = {pipe_stdout, pipe_stderr}

while inputs:
    readable, _, _ = select.select(inputs, [], [])
    for x in readable:
        line = x.readline()
        if len(line) == 0:
            # EOF on this pipe; stop watching it
            inputs.discard(x)
            continue
        if x == pipe_stdout:
            print('STDOUT', line, end='')
        if x == pipe_stderr:
            print('STDERR', line, end='')

Bash redirect stdout and stderr to separate files with timestamps

The trick is to make sure that tee, and the process substitution running your log function, exits before the script as a whole does -- so that when the shell that started the script prints its prompt, there isn't any backgrounded process that might write more output after it's done.

As a working example (albeit one focused more on being explicit than on being terse):

#!/usr/bin/env bash
stdout_log=stdout.log; stderr_log=stderr.log

log () {
  file=$1; shift
  while IFS= read -r line; do
    printf '%(%s)T %s\n' -1 "$line"
  done >> "$file"
}

# first, make backups of your original stdout and stderr
exec {stdout_orig_fd}>&1 {stderr_orig_fd}>&2
# for stdout: start your process substitution, record its PID, start tee, record *its* PID
exec {stdout_log_fd}> >(log "$stdout_log"); stdout_log_pid=$!
exec {stdout_tee_fd}> >(tee "/dev/fd/$stdout_log_fd"); stdout_tee_pid=$!
exec {stdout_log_fd}>&- # close stdout_log_fd so the log process can exit when tee does
# for stderr: likewise
exec {stderr_log_fd}> >(log "$stderr_log"); stderr_log_pid=$!
exec {stderr_tee_fd}> >(tee "/dev/fd/$stderr_log_fd" >&2); stderr_tee_pid=$!
exec {stderr_log_fd}>&- # close stderr_log_fd so the log process can exit when tee does
# now actually swap out stdout and stderr for the processes we started
exec 1>&$stdout_tee_fd 2>&$stderr_tee_fd {stdout_tee_fd}>&- {stderr_tee_fd}>&-

# ...do the things you want to log here...
echo "this goes to stdout"; echo "this goes to stderr" >&2

# now, replace the FDs going to tee with the backups...
exec >&"$stdout_orig_fd" 2>&"$stderr_orig_fd"

# ...and wait for the associated processes to exit.
while :; do
  ready_to_exit=1
  for pid_var in stderr_tee_pid stderr_log_pid stdout_tee_pid stdout_log_pid; do
    # kill -0 just checks whether a PID exists; it doesn't actually send a signal
    kill -0 "${!pid_var}" &>/dev/null && ready_to_exit=0
  done
  (( ready_to_exit )) && break
  sleep 0.1 # avoid a busy-loop eating unnecessary CPU by sleeping before next poll
done

So What's With The File Descriptor Manipulation?

A few key concepts to make sure we have clear:

  • All subshells have their own copies of the file descriptor table as created when they were fork()ed off from their parent. From that point forward, each file descriptor table is effectively independent.
  • A process reading from (the read end of) a FIFO (or pipe) won't see an EOF until all programs writing to (the write end of) that FIFO have closed their copies of the descriptor.

...so, if you create a FIFO pair, fork() off a child process, and let the child process write to the write end of the FIFO, whatever's reading from the read end will never see an EOF until not just the child, but also the parent, closes their copies.
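
A minimal demonstration of that EOF behavior, using a throwaway FIFO named demo.fifo (nothing here is specific to the logging script above):

mkfifo demo.fifo
cat demo.fifo &                 # reader: exits only when every writer has closed
exec 3>demo.fifo                # the parent shell now holds a write handle on FD 3
( echo "from the child" >&3 )   # a child process writes and exits...
# ...but cat still hasn't seen EOF, because the parent's FD 3 remains open
exec 3>&-                       # closing the parent's copy delivers EOF; cat now exits
wait; rm demo.fifo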

Thus, the gymnastics you see here:

  • When we run exec {stdout_log_fd}>&-, we're closing the parent shell's copy of the write end of the FIFO feeding the log function for stdout, so the only remaining copy is the one held by the tee child process -- meaning that when tee exits, the subshell running log exits too.
  • When we run exec 1>&$stdout_tee_fd {stdout_tee_fd}>&-, we're doing two things: First, we make FD 1 a copy of the file descriptor whose number is stored in the variable stdout_tee_fd; second, we delete the stdout_tee_fd entry from the file descriptor table, so only the copy on FD 1 remains. This ensures that later, when we run exec >&"$stdout_orig_fd", we're deleting the last remaining write handle to the stdout tee function, causing tee to get an EOF on stdin (so it exits, thus closing the handle it holds on the log function's subshell and letting that subshell exit as well).

Some Final Notes On Process Management

Unfortunately, how bash handles subshells created for process substitutions has changed substantially between still-actively-deployed releases; so while in theory it's possible to use wait "$pid" to let a process substitution exit and collect its exit status, this isn't always reliable -- hence the use of kill -0.

However, if wait "$pid" worked, it would be strongly preferable, because the wait() syscall is what removes a previously-exited process's entry from the process table: It is guaranteed that a PID will not be reused (and a zombie process-table entry is left as a placeholder) if no wait() or waitpid() invocation has taken place.
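
On a bash release where waiting on process-substitution PIDs does work reliably, the polling loop above could collapse to a single wait (a sketch only; whether it behaves this way depends on your bash version):

# after restoring the original stdout/stderr (as above), just wait for all four children:
wait "$stdout_tee_pid" "$stdout_log_pid" "$stderr_tee_pid" "$stderr_log_pid"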

Modern operating systems try fairly hard to avoid short-term PID reuse, so wraparound is not an active concern in most scenarios. However, if you're worried about this, consider using the flock-based mechanism discussed in https://stackoverflow.com/a/31552333/14122 for waiting for your process substitutions to exit, instead of kill -0.

redirect stdout and stderr to one file, copy of just stderr to another

You can use an additional file descriptor and tee:

{ foo.sh 2>&1 1>&3- | tee stderr.txt; } > stdout_and_stderr.txt 3>&1

Be aware that line buffering may cause the stdout output to appear out of order. If this is a problem, there are ways to overcome it, including the use of unbuffer.
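
One such option is stdbuf from GNU coreutils, which can force line buffering on both streams; this is only a sketch, and assumes foo.sh is a dynamically linked program using C stdio, since stdbuf works via LD_PRELOAD:

{ stdbuf -oL -eL foo.sh 2>&1 1>&3- | tee stderr.txt; } > stdout_and_stderr.txt 3>&1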


