Bash Pipe Handling


I decided to write a slightly more detailed explanation.

The "magic" here lies in the operating system. Both programs start up at roughly the same time and then run concurrently, alongside every other process on your computer (including the terminal application and the kernel). The operating system gives each of them slices of time on the processor. So, before any data gets passed, the processes are doing whatever initialization they need. In your example, tail is parsing the '-20' argument and cat is parsing the 'file.txt' argument and opening the file.

At some point tail will need input, and it will tell the operating system that it is waiting for input. At some other point (before or after, it doesn't matter) cat will start handing data to the operating system by writing to its stdout. This data goes into a buffer in the operating system (the pipe). The next time tail gets a time slice after cat has put some data into the buffer, it will retrieve some or all of that data, which removes it from the buffer. When the buffer is empty, tail has to wait for cat to output more data. If cat is outputting data much faster than tail is consuming it, the buffer fills up. The buffer does not grow without limit (on Linux its default capacity is 64 KiB), so once it is full the operating system makes cat wait until tail has read some data.

Eventually cat is done outputting data and exits, closing its end of the pipe, while tail may still be working through what is left in the buffer. Once the buffer is drained and no writer remains, the operating system reports end-of-file (EOF) to tail. In this case, tail is probably just keeping the data in a circular buffer of 20 lines, and when it sees EOF it dumps those last twenty lines to its own stdout, which is displayed in the terminal. Since tail does very little work per line, it will likely spend most of its time waiting for cat to put data into the buffer.
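To make the circular-buffer idea concrete, here is a toy model in Bash. last_lines is a hypothetical function written for illustration, not how tail is actually implemented: it keeps only the most recent N lines in a ring buffer while each read blocks waiting for the pipe, and prints them once read reports EOF.

```shell
#!/usr/bin/env bash
# last_lines N: hypothetical toy model of `tail -n N` (not tail's real source).
# Assumes N >= 1 and newline-terminated input lines.
last_lines() {
    local n=$1 count=0 line i start=0
    local -a ring=()
    # Each read blocks until the writer puts data in the pipe; it fails at EOF.
    while IFS= read -r line; do
        ring[count % n]=$line        # overwrite the oldest slot
        count=$((count + 1))
    done
    # EOF reached: print the retained lines, oldest first.
    if (( count > n )); then
        start=$((count - n))
    fi
    for (( i = start; i < count; i++ )); do
        printf '%s\n' "${ring[i % n]}"
    done
}

out=$(printf '%s\n' {1..100} | last_lines 3)
printf '%s\n' "$out"
```

Running it prints 98, 99 and 100: only three slots are ever kept, no matter how much data flows through the pipe.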

On a system with multiple processors, the two programs will not just be sharing alternating time slices on the same processor core, but likely running at the same time on separate cores.
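You can check that the stages of a pipeline run at the same time, rather than one after another, with a quick timing test. This sketch uses Bash's SECONDS counter, which only has one-second resolution, so the figure is approximate; it also doesn't show which core each process lands on, only that the two processes overlap.

```shell
#!/usr/bin/env bash
# Each stage sleeps for 2 seconds. Run one after the other they would take
# about 4 seconds; because the shell starts both at once, it takes about 2.
start=$SECONDS
sleep 2 | sleep 2
elapsed=$((SECONDS - start))
echo "pipeline took about ${elapsed}s"
```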

To get into a little more detail, if you open a process monitor (which one is operating-system specific), such as 'top' on Linux, you will see a whole list of running processes, most of which are using effectively 0% of the processor. Most applications, unless they are crunching data, spend most of their time doing nothing. This is good, because it leaves the processor free for the processes that actually need it. A process stops using the processor in basically three ways. It can make a sleep-style system call (such as sleep() or nanosleep()), asking the kernel not to give it another time slice until a given amount of time has passed. Most commonly, a program needs to wait for something from another program or device, like 'tail' waiting for more data to enter the buffer; in this case the operating system wakes the process up when more data is available. Lastly, the kernel can preempt a process in the middle of execution, giving processor time slices to other processes.

'cat' and 'tail' are simple programs. In this example, tail spends most of its time waiting for more data in the buffer, and cat spends most of its time waiting for the operating system to retrieve data from the hard drive. The bottleneck is the speed (or slowness) of the physical medium that the file is stored on. On a spinning hard drive, the perceptible delay you might notice when you run this command for the first time is largely the time it takes for the read heads to seek to the position on the disk where 'file.txt' is stored. If you run the command a second time, the operating system will likely have the contents of file.txt cached in memory, and you will probably not see any perceptible delay (unless file.txt is very large, or the file is no longer cached).
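On Linux you can see the "waiting for data" state directly. This sketch assumes a Linux /proc filesystem (other systems expose process state differently, for example through ps): it starts a pipeline whose reader, cat, has nothing to read for a second, then looks up cat's scheduler state. S means sleeping, that is, blocked and using no processor time.

```shell
#!/usr/bin/env bash
# Linux-only assumption: process state is read from /proc/<pid>/status.
sleep 1 | cat &         # cat gets no input until sleep exits
reader=$!               # for a background pipeline, $! is its last command (cat)
sleep 0.3               # give cat time to start and block in read()

state=
while read -r key value _; do
    if [[ $key == State: ]]; then
        state=$value    # e.g. "S" from the line "State:  S (sleeping)"
    fi
done < "/proc/$reader/status"

echo "cat's state while waiting for input: $state"
wait                    # sleep exits, cat sees EOF and exits too
```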

Most operations you do on your computer are I/O-bound, which is to say that you are usually waiting for data to come from your hard drive, from a network device, and so on.

Specific pipe command in Ubuntu's shell handling in C

Why do I get a prompt before the output?

Your main process doesn't wait for the children to finish. What you see is:

  1. Main starts
  2. Main creates children
  3. Main exits
  4. Bash prints its prompt
  5. Children start their work

To prevent this, you need to wait for the children. See "How to wait until all child processes called by fork() complete?"

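In your case, it's enough to add

 while (waitpid(-1, NULL, 0) > 0)
     ;  /* reap children until waitpid() fails (ECHILD once none are left) */

after the loop (waitpid() is declared in <sys/wait.h>). A single waitpid(-1, NULL, 0) returns as soon as any one child exits, so with several children the prompt could still appear before the last of them has finished.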
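The same thing happens in a shell script that starts background jobs and exits without waiting for them. As a Bash analogue of the fix above (a sketch, not the C program from the question), the wait builtin with no arguments waits for every background child, so the final message is printed only after all children are done.

```shell
#!/usr/bin/env bash
# Bash analogue of fork() + waitpid(): start children, then wait for all of them.
run_children() {
    local n
    for n in 1 2 3; do
        ( sleep 0.1; echo "child $n done" ) &   # each child runs in the background
    done
    wait                                        # no arguments: wait for every child
    echo "all children finished"
}

out=$(run_children)
printf '%s\n' "$out"
```

Without the wait, "all children finished" would typically come out first, which is the shell-script version of the prompt appearing before the output.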


Pipe output and capture exit status in Bash

There is an internal Bash variable called $PIPESTATUS; it’s an array that holds the exit status of each command in your last foreground pipeline of commands.

<command> | tee out.txt ; test ${PIPESTATUS[0]} -eq 0
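A small demonstration of PIPESTATUS. Every new command overwrites the array, so this sketch copies it immediately after the pipeline; the stages are just stand-ins that exit with fixed statuses.

```shell
#!/usr/bin/env bash
# Stand-in stages that exit with known statuses: 2, 1 and 0.
(exit 2) | (exit 1) | true
statuses=("${PIPESTATUS[@]}")   # copy before any other command runs
echo "per-stage exit statuses: ${statuses[*]}"
echo "first stage: ${statuses[0]}"
```

This prints "per-stage exit statuses: 2 1 0", even though the pipeline as a whole reports 0 because its last stage succeeded.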

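Another alternative, which also works in other shells such as zsh, is to enable pipefail. The pipeline's exit status then becomes that of the rightmost command that exited with a non-zero status (or zero if every command succeeded):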

set -o pipefail
...

The first option does not work in zsh because its syntax differs a little: zsh keeps the statuses in a lowercase pipestatus array, which is indexed from 1 by default.
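A minimal sketch of what pipefail changes in Bash. The failing stages are stand-ins (false and (exit N)), and each status is collected with || so the example also behaves if the surrounding script uses set -e.

```shell
#!/usr/bin/env bash
without=0
false | true || without=$?                   # default: status of the last command

set -o pipefail
with=0
false | true || with=$?                      # now the failing first stage counts
rightmost=0
(exit 2) | (exit 3) | true || rightmost=$?   # rightmost non-zero status wins
set +o pipefail

echo "without pipefail: $without"
echo "with pipefail: $with (and $rightmost for the three-stage pipeline)"
```

The output is 0 without pipefail, then 1 and 3 with it.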

errexit and pipe in bash

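Add set -o pipefail. A pipeline's exit status is then that of the rightmost command that exited with a non-zero status (zero if all succeeded), so with set -e (errexit) the script stops when any stage of a pipeline fails, not only when the last one does.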
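The effect of pipefail on errexit can be checked by running each variant in a fresh Bash process ("$BASH" is the path of the running shell), so that set -e only aborts that child process. In this sketch false stands in for a failing first stage.

```shell
#!/usr/bin/env bash
# Each variant runs in its own bash process; || true keeps the outer script going.
plain=$("$BASH" -c 'set -e; false | cat; echo reached') || true
strict=$("$BASH" -c 'set -e -o pipefail; false | cat; echo reached') || true

echo "errexit only:       '$plain'"    # the pipeline "succeeds" because cat does
echo "errexit + pipefail: '$strict'"   # the failure of false stops the script
```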

How do you catch error codes in a shell pipe?

If you really don't want the second command to proceed until the first is known to be successful, then you probably need to use temporary files. The simple version of that is:

tmp=${TMPDIR:-/tmp}/mine.$$
if ./a > "$tmp.1"
then
    if ./b < "$tmp.1" > "$tmp.2"
    then
        if ./c < "$tmp.2"
        then : OK
        else echo "./c failed" 1>&2
        fi
    else echo "./b failed" 1>&2
    fi
else echo "./a failed" 1>&2
fi
rm -f "$tmp".[12]
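The same structure can be tried out without real programs. In this sketch a, b and c are hypothetical stand-in functions for ./a, ./b and ./c, and the name staged is made up too. It also swaps the predictable mine.$$ name for a private directory from mktemp -d; that is a deliberate change, not part of the approach above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for ./a, ./b and ./c.
a() { printf '3\n1\n2\n'; }
b() { sort; }
c() { tail -n 1; }

# staged (hypothetical name): run a, b and c one after another through
# temporary files, so each stage starts only if the previous one succeeded.
staged() {
    local tmp
    tmp=$(mktemp -d) || return 1    # private directory instead of mine.$$
    if a > "$tmp/1"; then
        if b < "$tmp/1" > "$tmp/2"; then
            c < "$tmp/2" || echo "c failed" 1>&2
        else
            echo "b failed" 1>&2
        fi
    else
        echo "a failed" 1>&2
    fi
    rm -rf "$tmp"
}

out=$(staged)
echo "result: $out"
```

With these stand-ins the result is 3; if b is redefined to fail, only "b failed" is reported and c never runs.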

The '1>&2' redirection can also be abbreviated '>&2'; however, an old version of the MKS shell mishandled the error redirection without the preceding '1' so I've used that unambiguous notation for reliability for ages.

This leaks files if you interrupt something. Bomb-proof (more or less) shell programming uses:

tmp=${TMPDIR:-/tmp}/mine.$$
trap 'rm -f $tmp.[12]; exit 1' 0 1 2 3 13 15
...if statement as before...
rm -f $tmp.[12]
trap 0 1 2 3 13 15

The first trap line says: run the commands 'rm -f $tmp.[12]; exit 1' when any of the signals 1 (SIGHUP), 2 (SIGINT), 3 (SIGQUIT), 13 (SIGPIPE), or 15 (SIGTERM) occurs, or on 0 (when the shell exits for any reason).
If you're writing a shell script, the final trap only needs to remove the trap on 0, which is the shell exit trap (you can leave the other signals in place since the process is about to terminate anyway).
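A Bash variant of the same idea, shown as a sketch rather than a replacement: cleanup goes on EXIT (Bash's name for the 0 "signal"), and the listed signals simply call exit, which in turn runs the EXIT trap. Because the EXIT trap doesn't call exit 1 itself, a normal run keeps its own exit status. Here a command substitution stands in for a whole script so the test can check that the file is gone afterwards.

```shell
#!/usr/bin/env bash
# The $( ... ) subshell stands in for a script: its EXIT trap runs when it ends.
leftover=$(
    tmp=$(mktemp) || exit 1
    trap 'rm -f "$tmp"' EXIT              # cleanup whenever the shell exits
    trap 'exit 1' HUP INT QUIT PIPE TERM  # on these signals, exit (runs the EXIT trap)
    echo "scratch data" > "$tmp"
    printf '%s\n' "$tmp"                  # report the path so the caller can check it
)
if [ -e "$leftover" ]; then echo "leaked $leftover"; else echo "cleaned up"; fi
```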

In the original pipeline, it is possible for 'c' to be reading data from 'b' before 'a' has finished; this is usually desirable (it gives multiple cores work to do, for example), and it is lost when the stages go through temporary files. If 'b' is a 'sort' phase, this doesn't apply anyway: 'b' has to see all its input before it can generate any of its output.
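The difference between a streaming stage and one that must see all its input shows up when the second line of input is delayed. In this sketch producer is a hypothetical stand-in writer: cat passes its first line straight through, so head can print it without waiting, whereas sort cannot produce anything until the writer has finished, so the line that arrived a second later comes out first.

```shell
#!/usr/bin/env bash
# Hypothetical writer: sends "banana", pauses, then sends "apple".
producer() { echo banana; sleep 1; echo apple; }

streamed=$(producer | cat | head -n 1)    # cat forwards lines as they arrive
sorted=$(producer | sort | head -n 1)     # sort waits for EOF before writing anything

echo "through cat:  $streamed"
echo "through sort: $sorted"
```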

If you want to detect which command(s) fail, you can use:

(./a || echo "./a exited with $?" 1>&2) |
(./b || echo "./b exited with $?" 1>&2) |
(./c || echo "./c exited with $?" 1>&2)

This is simple and symmetric: it is trivial to extend to a 4-part or N-part pipeline.
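To try the per-stage reporting without real programs, this sketch uses hypothetical stand-in functions a, b and c, where b deliberately fails with status 2. The pipeline's standard output is discarded and only the diagnostics on standard error are captured.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for ./a, ./b and ./c; b consumes its input, then fails.
a() { printf 'some data\n'; }
b() { cat > /dev/null; return 2; }
c() { cat; }

# Keep only stderr (the diagnostics); send the pipeline's stdout to /dev/null.
report=$( { (a || echo "a exited with $?" 1>&2) |
            (b || echo "b exited with $?" 1>&2) |
            (c || echo "c exited with $?" 1>&2); } 2>&1 >/dev/null )
echo "$report"
```

Only the failing stage speaks up, so the report is "b exited with 2".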

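Simple experimentation with 'set -e' didn't help: without pipefail, a pipeline's exit status is that of its last command, so a failure in an earlier stage does not trigger errexit.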

How can I pipe output, from a command in an if statement, to a function?

The Print function doesn't read standard input, so there's no point piping data to it. One possible way to do what you want with the current implementation of Print is:

if ! occ_output=$(sudo -u "$web_user" "$nextcloud_dir/occ" files:scan --all 2>&1); then
    Print "Error: Failed to scan files. Are you in maintenance mode?"
fi

Print "'occ' output: $occ_output"

Since there is only one line in the body of the if statement, you could use || instead:

occ_output=$(sudo -u "$web_user" "$nextcloud_dir/occ" files:scan --all 2>&1) \
|| Print "Error: Failed to scan files. Are you in maintenance mode?"

Print "'occ' output: $occ_output"

The 2>&1 causes both standard output and error output of occ to be captured to occ_output.
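Here is a self-contained version of the if ! form that runs without Nextcloud. fake_occ is a hypothetical stand-in for the real occ command that writes to both streams and fails, and this Print is a one-line stand-in for the question's function. It shows that the status of var=$(cmd) is the status of cmd, so the assignment itself can be tested.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: fake_occ imitates a failing occ; Print just echoes.
fake_occ() {
    echo "Scanning files"
    echo "Nextcloud is in maintenance mode" >&2
    return 1
}
Print() { printf '%s\n' "$1"; }

# The assignment's exit status is that of the command substitution.
if ! occ_output=$(fake_occ 2>&1); then
    Print "Error: Failed to scan files. Are you in maintenance mode?"
fi

Print "'occ' output: $occ_output"
```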

Note that the body of the Print function could be simplified to:

[[ $quiet_mode == No ]] && printf '%s\n' "$1"
(( logging )) && printf '%s\n' "$1" >> "$log_file"
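For context, this is one way the simplified body might sit inside the function. quiet_mode, logging and log_file are the question's global variables, and the values set here are assumptions for the demo. The trailing return 0 is not in the suggestion above: without it, Print returns 1 whenever logging is off, because the status of the last && list becomes the function's status, which can stop a caller running under set -e.

```shell
#!/usr/bin/env bash
# quiet_mode, logging and log_file come from the question; values are assumed.
Print() {
    [[ $quiet_mode == No ]] && printf '%s\n' "$1"
    (( logging )) && printf '%s\n' "$1" >> "$log_file"
    return 0    # otherwise the status of the last && list leaks out
}

quiet_mode=No
logging=1
log_file=$(mktemp)
Print "hello"    # printed to the terminal and appended to the log
```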

See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I replaced echo "$1" with printf '%s\n' "$1".
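A quick illustration of one reason behind that change: Bash's builtin echo treats an argument such as -n as an option, so data that happens to look like an option is silently lost, while printf '%s\n' prints its argument unchanged.

```shell
#!/usr/bin/env bash
value="-n"    # data that happens to look like an echo option
from_echo=$(echo "$value")              # bash's echo takes -n as "no newline": prints nothing
from_printf=$(printf '%s\n' "$value")   # printf prints the data as-is
echo "echo gave: '$from_echo', printf gave: '$from_printf'"
```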


