Bash: Split Stdout from Multiple Concurrent Commands into Columns

Regrettably answering my own question.

None of the supplied solutions were exactly what I was looking for. So I developed my own command line utility: multiview. Maybe others will benefit?

It works by piping processes' stdout/stderr to a command interface and then by launching a "viewer" to see their outputs in columns:

fooProcess | multiview -s & \
barProcess | multiview -s & \
bazProcess | multiview -s & \
multiview

This will display a neatly organized column view of their outputs. You can name each process as well by adding a string after the -s flag:

fooProcess | multiview -s "foo" & \
barProcess | multiview -s "bar" & \
bazProcess | multiview -s "baz" & \
multiview

There are a few other options, but that's the gist of it.

Hope this helps!

How to redirect output in multiple columns in bash?

With pr:

pr -2 -t -s file

or from stdin:

cat file | pr -2 -t -s

Output:

Run1    Run2
1       1
2       2
3       3
4       4

See: man pr
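
If you need more than two columns, or a visible separator between them, pr also takes a column count and an optional separator character; a small sketch, assuming GNU pr:

seq 1 12 | pr -3 -t -s,

which prints something like:

1,5,9
2,6,10
3,7,11
4,8,12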

How can I send the stdout of one process to multiple processes using (preferably unnamed) pipes in Unix (or Windows)?

Editor's note:

- >(…) is a process substitution, a nonstandard shell feature supported by some POSIX-compatible shells: bash, ksh, and zsh.

- As originally written, this answer accidentally also sent the output process substitution's output through the pipeline: echo 123 | tee >(tr 1 a) | tr 1 b.

- Output from the process substitutions will be unpredictably interleaved, and, except in zsh, the pipeline may terminate before the commands inside >(…) do (see the sketch after this note).
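
Regarding the last point: one common bash workaround is to pipe the whole construct into cat, which forces the outer command to wait until tee and the >(…) processes have closed their output (a sketch added for illustration, not part of the original answer):

{ echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null; } | cat

cat only reaches end-of-file once all writers have exited, so the prompt does not return in the middle of their output; it does not, however, stop the two substitutions' outputs from interleaving with each other.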

In Unix (or on a Mac), use the tee command:

$ echo 123 | tee >(tr 1 a) >(tr 1 b) >/dev/null
b23
a23

Usually you would use tee to redirect output to multiple files, but using >(...) you can
redirect to another process. So, in general,

$ proc1 | tee >(proc2) ... >(procN-1) >(procN) >/dev/null

will do what you want.
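
For example (a sketch; line_count.txt and listing.gz are just placeholder names), you can count lines and compress the same stream in a single pass:

$ ls -l | tee >(wc -l > line_count.txt) >(gzip > listing.gz) >/dev/null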

Under Windows, I don't think the built-in shell has an equivalent. Microsoft's Windows PowerShell has a tee command, though.

Can I chain multiple commands and make all of them take the same input from stdin?

I like Stephen's idea of using awk instead of grep.

It ain't pretty, but here's a command that uses output redirection to keep all data flowing through stdout:

cat food.txt |
awk '/coffee/ {print $0 > "/dev/stderr"} {print $0}' 2> coffee.txt |
awk '/tea/ {print $0 > "/dev/stderr"} {print $0}' 2> tea.txt

As you can see, it uses awk to send all lines matching 'coffee' to stderr, and all lines regardless of content to stdout. Then stderr is fed to a file, and the process repeats with 'tea'.

If you wanted to filter out content at each step, you might use this:

cat food.txt |
awk '/coffee/ {print $0 > "/dev/stderr"} $0 !~ /coffee/ {print $0}' 2> coffee.txt |
awk '/tea/ {print $0 > "/dev/stderr"} $0 !~ /tea/ {print $0}' 2> tea.txt
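
To see the difference concretely, here is a quick run against a made-up food.txt (the contents below are purely illustrative):

printf '%s\n' 'iced coffee' 'green tea' 'plain water' > food.txt

cat food.txt |
awk '/coffee/ {print $0 > "/dev/stderr"} $0 !~ /coffee/ {print $0}' 2> coffee.txt |
awk '/tea/ {print $0 > "/dev/stderr"} $0 !~ /tea/ {print $0}' 2> tea.txt > other.txt

cat coffee.txt   # iced coffee
cat tea.txt      # green tea
cat other.txt    # plain water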

Why is the output from these parallel processes not messed up?

Background jobs are not threads. With a multi-threaded process you can get that effect: each process has just one standard output stream (stdout), and in a multi-threaded program all threads share that stream, so an unprotected write to stdout can lead to garbled output as you describe. But you do not have a multi-threaded program.

When you use the & operator, bash creates a new child process with its own stdout stream. Generally (it depends on implementation details) that stream is flushed on a newline. So even though the file might be shared, the granularity is by line.

There is a slim chance that two processes could flush to the file at exactly the same time, but your code, with subprocesses and a sleep, makes it highly unlikely.

You could try taking out the newline from the printf, but given the inefficiency of the rest of the code, and the small dataset, it is still unlikely. It is quite possible that each process is complete before the next starts.
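
If you want to see interleaving actually happen, one way (a rough sketch; the exact mix depends on the scheduler and the terminal) is to have two background jobs make many small writes with no newlines at all:

( for i in $(seq 2000); do printf a; done ) &
( for i in $(seq 2000); do printf b; done ) &
wait
echo

You will typically see runs of a's and b's mixed together, whereas whole-line writes generally come out intact.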

How to split a file and keep the first line in each of the pieces?

This is robhruska's script cleaned up a bit:

tail -n +2 file.txt | split -l 4 - split_
for file in split_*
do
    head -n 1 file.txt > tmp_file
    cat "$file" >> tmp_file
    mv -f tmp_file "$file"
done

I removed wc, cut, ls and echo in the places where they're unnecessary. I changed some of the filenames to make them a little more meaningful. I broke it out onto multiple lines only to make it easier to read.

If you want to get fancy, you could use mktemp or tempfile to create a temporary filename instead of using a hard coded one.
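
For instance, the same loop with mktemp might look like this (a sketch, not tested against your data):

tail -n +2 file.txt | split -l 4 - split_
for file in split_*
do
    tmp_file=$(mktemp) || exit 1
    head -n 1 file.txt > "$tmp_file"
    cat "$file" >> "$tmp_file"
    mv -f "$tmp_file" "$file"
done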

Edit

Using GNU split it's possible to do this:

split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }; export -f split_filter; tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_

Broken out for readability:

split_filter () { { head -n 1 file.txt; cat; } > "$FILE"; }
export -f split_filter
tail -n +2 file.txt | split --lines=4 --filter=split_filter - split_

When --filter is specified, split runs the command (a function in this case, which must be exported) for each output file and sets the variable FILE, in the command's environment, to the filename.

A filter script or function could do any manipulation it wanted to the output contents or even the filename. An example of the latter might be to output to a fixed filename in a variable directory: > "$FILE/data.dat".
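
The filter does not have to be an exported shell function; a plain quoted command works too. For instance, a sketch (assuming GNU split and gzip) that writes each headed chunk straight to a compressed file:

tail -n +2 file.txt | split --lines=4 --filter='{ head -n 1 file.txt; cat; } | gzip > "$FILE.gz"' - split_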


