Can You Fool Isatty and Log Stdout and Stderr Separately

Output stdout and stderr to both file and screen, plus stderr to its own file, in a limited environment

Finally, I reached the goal. I must say I was inspired by @WilliamPursell's answer.

{ "$0" "${mainArgs[@]}" 2>&1 1>&3 | tee -a "$logPath/$logFileName.err" 1>&3 & } 3>&1 | tee -a "$logPath/$logFileName.log" &

Explanation

  • Relaunch the script with...
  • stderr redirected to stdout (2>&1) and...
  • stdout redirected to a new file descriptor (1>&3).
  • Pipe it to tee, which now receives only the errors; it duplicates them into a file and onto file descriptor #1, with...
  • tee's stdout redirected to the new file descriptor (1>&3)...
  • and & appended so nothing blocks.
  • Then group the preceding commands using curly brackets,
  • redirecting the group's new file descriptor back to stdout (3>&1),
  • and pipe it to a second tee, which receives the combined errors and normal output, writes them to a file, and displays them on screen,
  • again with & appended so nothing blocks.

Below is the full code of my activateLogs function, for those interested. I also included its dependencies, even though they could be inlined into activateLogs.

m=0
declare -a mainArgs
if [ ! "$#" = "0" ]; then
    for arg in "$@"; do
        mainArgs[$m]=$arg
        m=$(($m + 1))
    done
fi

function containsElement()
# $1 string to find
# $2 array to search in
# return 0 if there is a match, otherwise 1
{
    local e match="$1"
    shift
    for e; do [[ "$e" == "$match" ]] && return 0; done
    return 1
}

function hasMainArg()
# $1 string to find
# return 0 if there is a match, otherwise 1
{
    local match="$1"
    containsElement "$1" "${mainArgs[@]}"
    return $?
}

function activateLogs()
# $1 = logOutput: where the logs go: SCREEN, DISK, or BOTH. Default is DISK. Optional parameter.
{
    local logOutput=$1
    if [ "$logOutput" != "SCREEN" ] && [ "$logOutput" != "BOTH" ]; then
        logOutput="DISK"
    fi

    if [ "$logOutput" = "SCREEN" ]; then
        echo "Logs will only be output to screen"
        return
    fi

    hasMainArg "--force-log"
    local forceLog=$?

    local isFileDescriptor3Exist=$(command 2>/dev/null >&3 && echo "Y")

    if [ "$isFileDescriptor3Exist" = "Y" ]; then
        echo "Logs are configured"
    elif [ "$forceLog" = "1" ] && ([ ! -t 1 ] || [ ! -t 2 ]); then
        # Use the externally-set file descriptors, except when "--force-log" is given
        echo "Logs are configured externally"
    else
        echo "Relaunching with log files"
        local logPath="logs"
        if [ ! -d $logPath ]; then mkdir $logPath; fi

        # %H instead of %k: %k pads single-digit hours with a space, which would put a space in the file name
        local logFileName=$(basename "$0")"."$(date +%Y-%m-%d.%H-%M-%S)

        exec 4<> "$logPath/$logFileName.log" # File descriptor created only to get the underlying file in any output option
        if [ "$logOutput" = "DISK" ]; then
            # FROM: https://stackoverflow.com/a/45426547/214898
            exec 3<> "$logPath/$logFileName.log"
            "$0" "${mainArgs[@]}" 2>&1 1>&3 | tee -a "$logPath/$logFileName.err" 1>&3 &
        else
            # FROM: https://stackoverflow.com/a/70790574/214898
            { "$0" "${mainArgs[@]}" 2>&1 1>&3 | tee -a "$logPath/$logFileName.err" 1>&3 & } 3>&1 | tee -a "$logPath/$logFileName.log" &
        fi

        exit
    fi
}

#activateLogs "DISK"
#activateLogs "SCREEN"
activateLogs "BOTH"

echo "FIRST"
echo "ERROR" >&2
echo "LAST"
echo "LAST2"

Truncated output log files when logging stdout and stderr separately

I think I've finally figured it out.
The problem here is that the StreamLogger code can't explicitly check that stdout has been fully written. Once the main thread running the subprocess receives a return code, it exits the context manager, calling the StreamLogger's __exit__() method, which is inherited from IOBase (source code here). That calls close(), which sets the self._run attribute to False. This makes the thread polling the pipe stop looping, regardless of what is still coming through the pipe.

This works fine for most commands with small to moderate output on stdout, where there is no lag between the return code arriving and the output being fully written. But in my case I'm running a program through subprocess that writes a substantial amount of text to stdout, so there's a race to get the pipe drained before the main thread tells the child thread to stop polling it.

The two critical variables here are the size of the buffer being read from the pipe and the frequency at which the pipe is polled. I fixed my problem by increasing the buffer size in os.read() to 4096 and removing the time.sleep() in the while loop of the _flusher() method. This maximizes the amount of data that can be read from the pipe and, in my case, ensures the output is fully logged before the logging loop stops.
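
For concreteness, here is a minimal sketch of what the fixed polling loop might look like. The StreamLogger internals (self._run, the pipe's read fd, the target stream) are assumptions reconstructed from the description above, not the actual code:

    import fcntl
    import os

    def _flusher(self):
        # Hypothetical reconstruction: make the read end non-blocking so
        # the loop can keep re-checking self._run between reads.
        flags = fcntl.fcntl(self._read_fd, fcntl.F_GETFL)
        fcntl.fcntl(self._read_fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        while self._run:
            try:
                # 4096-byte reads instead of a smaller buffer, and no
                # time.sleep() in the loop, so the pipe drains as fast
                # as possible before close() flips self._run to False.
                data = os.read(self._read_fd, 4096)
            except BlockingIOError:
                continue
            if data:
                self._target.write(data)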

Run command and get its stdout, stderr separately in near real time like in a terminal

The stdout and stderr of the program being run can be logged separately.

You can't use pexpect because both stdout and stderr go to the same pty and there is no way to separate them after that.

The stdout and stderr of the program being run can be viewed in near-real time, such that if the child process hangs, the user can see it. (I.e., we do not wait for execution to complete before printing the stdout/stderr to the user.)

If the output of a subprocess is not a tty then it is likely that it uses block buffering, and therefore if it doesn't produce much output it won't appear in "real time"; e.g., if the buffer is 4K then your parent Python process won't see anything until the child process prints 4K chars and the buffer overflows, or until it is flushed explicitly (inside the subprocess). This buffer is inside the child process and there are no standard ways to manage it from outside. Here's a picture that shows the stdio buffers and the pipe buffer for the command1 | command2 shell pipeline:

[image: pipe/stdio buffers]

The program being run does not know it is being run via python, and thus will not do unexpected things (like chunk its output instead of printing it in real-time, or exit because it demands a terminal to view its output).

It seems you meant the opposite, i.e., it is likely that your child process chunks its output instead of flushing each output line as soon as possible when its output is redirected to a pipe (when you use stdout=PIPE in Python). This means the default threading or asyncio solutions won't work as-is in your case.
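
You can see this buffering behavior directly. The following small demo is my illustration, not from the original answer: the same child program's first line arrives only when its buffer is flushed at exit, unless -u disables buffering:

    #!/usr/bin/env python3
    # Illustration: a child whose stdout is a pipe block-buffers its
    # output, so the first line arrives ~3s late; python -u disables that.
    import subprocess
    import sys
    import time

    child_src = "import time; print('hello'); time.sleep(3); print('world')"

    for extra, label in ([], 'block-buffered (default)'), (['-u'], 'unbuffered (-u)'):
        start = time.monotonic()
        with subprocess.Popen([sys.executable, *extra, '-c', child_src],
                              stdout=subprocess.PIPE) as p:
            p.stdout.readline()  # blocks until 'hello\n' actually arrives
            print(f"{label}: first line after {time.monotonic() - start:.1f}s")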

There are several options to work around it:

  • the command may accept a command-line argument such as grep --line-buffered or python -u to disable block buffering.

  • stdbuf works for some programs, i.e., you could run ['stdbuf', '-oL', '-eL'] + command using the threading or asyncio solution mentioned above, and you should get stdout and stderr separately, with lines appearing in near-real time:

    #!/usr/bin/env python3
    import os
    import sys
    from select import select
    from subprocess import Popen, PIPE

    with Popen(['stdbuf', '-oL', '-e0', 'curl', 'www.google.com'],
               stdout=PIPE, stderr=PIPE) as p:
        readable = {
            p.stdout.fileno(): sys.stdout.buffer,  # log separately
            p.stderr.fileno(): sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                data = os.read(fd, 1024)  # read available
                if not data:  # EOF
                    del readable[fd]
                else:
                    readable[fd].write(data)
                    readable[fd].flush()
  • finally, you could try a pty + select solution with two ptys:

    #!/usr/bin/env python3
    import errno
    import os
    import pty
    import sys
    from select import select
    from subprocess import Popen

    masters, slaves = zip(pty.openpty(), pty.openpty())
    with Popen([sys.executable, '-c', r'''import sys, time
    print('stdout', 1) # no explicit flush
    time.sleep(.5)
    print('stderr', 2, file=sys.stderr)
    time.sleep(.5)
    print('stdout', 3)
    time.sleep(.5)
    print('stderr', 4, file=sys.stderr)
    '''],
               stdin=slaves[0], stdout=slaves[0], stderr=slaves[1]):
        for fd in slaves:
            os.close(fd)  # no input
        readable = {
            masters[0]: sys.stdout.buffer,  # log separately
            masters[1]: sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                try:
                    data = os.read(fd, 1024)  # read available
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise  # XXX cleanup
                    del readable[fd]  # EIO means EOF on some systems
                else:
                    if not data:  # EOF
                        del readable[fd]
                    else:
                        readable[fd].write(data)
                        readable[fd].flush()
    for fd in masters:
        os.close(fd)

    I don't know what the side effects of using different ptys for stdout and stderr are. You could try whether a single pty is enough in your case, e.g., set stderr=PIPE and use p.stderr.fileno() instead of masters[1]. A comment in the sh source suggests that there are issues if stderr is not in {STDOUT, pipe}.
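
Here's a rough, untested sketch of that single-pty variant (my illustration, not from the original answer): one pty for stdout so the child line-buffers it, a regular pipe for stderr, and the same read loop as above:

    #!/usr/bin/env python3
    # Single-pty variant (illustrative sketch): one pty for stdout so the
    # child line-buffers it, and a plain pipe for stderr.
    import errno
    import os
    import pty
    import sys
    from select import select
    from subprocess import Popen, PIPE

    master, slave = pty.openpty()
    with Popen(['sh', '-c', 'echo out; echo err >&2; sleep 1; echo out2'],
               stdin=slave, stdout=slave, stderr=PIPE) as p:
        os.close(slave)  # no input
        readable = {
            master: sys.stdout.buffer,
            p.stderr.fileno(): sys.stderr.buffer,
        }
        while readable:
            for fd in select(readable, [], [])[0]:
                try:
                    data = os.read(fd, 1024)
                except OSError as e:
                    if e.errno != errno.EIO:
                        raise
                    del readable[fd]  # EIO means EOF on some systems
                else:
                    if not data:  # EOF
                        del readable[fd]
                    else:
                        readable[fd].write(data)
                        readable[fd].flush()
    os.close(master)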


