Parallel processes: appending outputs to an array in a bash script
GNU Parallel is good at doing stuff in parallel :-)
task (){ sleep 1;echo "hello $1"; }
# Make "task" known to sub shells
export -f task
# Do tasks in parallel
parallel -k task ::: {1..3}
Sample Output
hello 1
hello 2
hello 3
I was going to suggest capturing the output like this, but Charles kindly points out that it is a known bash
pitfall (the unquoted command substitution is word-split and glob-expanded):
array=( $(parallel -k task ::: {1..3}) )
Charles' suggested solution is:
IFS=$'\n' read -r -d '' -a array < <(parallel -k task ::: 1 2 3 && printf '\0')
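The difference between the two forms can be seen without GNU Parallel installed. The sketch below is only an illustration: produce is a hypothetical stand-in for the parallel command, and the scratch directory with files a and b exists only so that the glob has something to match.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for `parallel -k task ::: 1 2 3`; one line contains a glob character.
produce() { printf '%s\n' 'hello 1' 'hello *' 'hello 3'; }

# Work in a scratch directory so glob expansion is predictable.
workdir=$(mktemp -d)
cd "$workdir"
touch a b

# Pitfall: the unquoted substitution is split on whitespace and '*' is glob-expanded.
naive=( $(produce) )

# Safer: split on newlines only, with no glob expansion.
IFS=$'\n' read -r -d '' -a safe < <(produce && printf '\0')

printf 'naive: %d elements\n' "${#naive[@]}"
printf 'safe:  %d elements\n' "${#safe[@]}"

# Clean up the scratch directory.
cd - >/dev/null
rm -rf "$workdir"
```

In the scratch directory the naive array ends up with seven words (the * becomes a and b), while the safe array keeps the three original lines.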
running each element in array in parallel in bash script
The convenient thing to do is to push your background code into a separate script -- or an exported function. That way xargs
can create a new shell, and access the function from its parent. (Be sure to export
any other variables that need to be available in the child as well).
array=( 1 2 3 4 5 6 )
max_proc_count=8
log_file=out.txt
run_for_each() {
  local each=$1
  echo "Processing: $each" >&2
  IFS=$' \t\n' read -r -d '' -a lags < <(yourcommand --arg1 "$each" && printf '\0')
  for result in "${lags[@]}"; do
    printf '%(%Y-%m-%dT%H:%M:%S)T\t%s\t%s\n' -1 "$each" "$result"
  done >>"$log_file"
}
export -f run_for_each
export log_file # make log_file visible to subprocesses
# The trailing _ fills $0 of the child shell, so each item arrives as $1 in "$@"
printf '%s\0' "${array[@]}" |
  xargs -P "$max_proc_count" -n 1 -0 bash -c 'run_for_each "$@"' _
Some notes:
- Using echo -e is bad form. See the APPLICATION USAGE and RATIONALE sections in the POSIX spec for echo, which explicitly advise using printf instead (the spec defines no -e option and states that echo must not accept any options other than -n).
- We're including the each value in the log file so it can be extracted from there later.
- You haven't specified whether the output of yourcommand is space-delimited, tab-delimited, line-delimited, or otherwise. I'm thus accepting all of these for now; modify the value of IFS passed to the read to taste.
- Using printf '%(...)T' to get a timestamp without external tools such as date requires bash 4.2 or newer. Replace with your own code if you see fit.
- read -r -a arrayname < <(...) is much more robust than arrayname=( $(...) ). In particular, it avoids treating emitted values as globs -- replacing *s with a list of files in the current directory, or Foo[Bar] with FooB should any file by that name exist (or, if the failglob or nullglob options are set, triggering a failure or emitting no value at all in that case).
- Redirecting stdout to your log_file once for the entire loop is somewhat more efficient than redirecting it every time you run printf. Note that having multiple processes writing to the same file at the same time is only safe if all of them opened it with O_APPEND (which >> will do), and if they're writing in chunks small enough to individually complete as single syscalls (which is probably happening unless the individual lags values are quite large).
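The xargs pattern can be tried end to end with a stand-in for the real command. This is a minimal sketch, not the original answer's code: square and results_file are hypothetical names, and mapfile assumes bash 4.0 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real per-item command.
square() { echo $(( $1 * $1 )); }
export -f square   # make the function visible to the bash started by xargs

results_file=$(mktemp)

# NUL-delimit the inputs, run up to 4 children at once, one item per child.
# The trailing _ becomes $0, so the item itself is "$1".
printf '%s\0' 1 2 3 4 |
  xargs -0 -n 1 -P 4 bash -c 'square "$1"' _ >>"$results_file"

# Completion order is not guaranteed, so sort before loading into an array.
mapfile -t results < <(sort -n "$results_file")
rm -f "$results_file"

printf '%s\n' "${results[@]}"
```

Each child writes one short line through a file descriptor opened in append mode, which matches the conditions in the last note for concurrent writers.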
Add data to Bash array over multiple scripts
The arrays are not shared between the different shells. Each script runs as a separate process and builds its own private arrays, which are lost when the process exits. @Upasana Shukla's suggestion of running the scripts with source will work (because it runs them in the main shell process, rather than as subshells/different processes), but will not allow you to run the scripts in parallel. If you want to run them in parallel, the simplest way is probably to have them output to temporary files instead of arrays:
# Assign first, then export: "export var=$(cmd) || ..." would test export's status, not mktemp's
tmpdir="$(mktemp -d "/tmp/$(basename "$0").XXXXXX")" || {
  echo "Error creating temporary directory" >&2
  exit 1
}
export tmpdir
for z in scripts/*; do # Please don't parse ls
  sh "$z" &
done
wait
echo "Validating Script Output"
cat "$tmpdir/exeSuccess"
rm -R "$tmpdir"
And in the individual scripts:
echo "$OUTPUT" >>"$tmpdir/exeSuccess"
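If the parent needs the results back in an array, the temporary files can be read after wait returns. The following is a sketch under assumptions: worker is a hypothetical function standing in for the individual scripts, and each job writes to its own file so that no two writers share one.

```bash
#!/usr/bin/env bash
tmpdir=$(mktemp -d) || exit 1

# Hypothetical worker: each job writes its result to its own file.
worker() { echo "result from job $1" >"$tmpdir/$1.out"; }

for i in 1 2 3; do
  worker "$i" &
done
wait   # all background jobs have finished writing after this point

# Collect the files into an array in the parent shell.
collected=()
for f in "$tmpdir"/*.out; do
  collected+=( "$(<"$f")" )
done
rm -r "$tmpdir"

printf '%s\n' "${collected[@]}"
```

The glob expands in sorted order, so the array order follows the file names rather than the order in which the jobs finished.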
How do you run multiple programs in parallel from a bash script?
To run multiple programs in parallel:
prog1 &
prog2 &
If you need your script to wait for the programs to finish, you can add:
wait
at the point where you want the script to wait for them.
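A small self-contained version of this, for experimenting: prog1 and prog2 here are hypothetical shell functions standing in for real programs, and the temporary directory is only used to show that both finished before the script continued.

```bash
#!/usr/bin/env bash
outdir=$(mktemp -d)

# Hypothetical stand-ins for prog1 and prog2.
prog1() { sleep 0.2; echo "prog1 done" >"$outdir/prog1.txt"; }
prog2() { sleep 0.1; echo "prog2 done" >"$outdir/prog2.txt"; }

prog1 &
prog2 &
wait   # returns only after both background jobs have exited

# Both output files exist by now because wait blocked until the jobs ended.
first=$(<"$outdir/prog1.txt")
second=$(<"$outdir/prog2.txt")
rm -r "$outdir"

echo "$first / $second"
```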
Collecting process ids of parallel process in bash file
Don't send the append operation itself to the background. Putting an & after the content you want to background but before the append suffices: the sleep and echo are still backgrounded, but the append is not.
process_ids=( )
append() { process_ids+=( "$1" ); } # POSIX-standard function declaration syntax
{ sleep 1 && echo 'one'; } & append "$!"
{ sleep 5 && echo 'two'; } & append "$!"
{ sleep 1 && echo 'three'; } & append "$!"
{ sleep 5 && echo 'four'; } & append "$!"
echo "Background processes:" # Demonstrate that our array was populated
printf ' - %s\n' "${process_ids[@]}"
wait
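Once the PIDs are in an array, each one can be waited on individually to recover its exit status. This is a sketch rather than part of the original answer; the exit_codes array and the subshells with fixed exit codes are assumptions made for illustration.

```bash
#!/usr/bin/env bash
process_ids=()
exit_codes=()

# Background three jobs; the append runs in the foreground, as in the answer above.
( sleep 0.2; exit 0 ) & process_ids+=( "$!" )
( sleep 0.1; exit 3 ) & process_ids+=( "$!" )
( sleep 0.1; exit 0 ) & process_ids+=( "$!" )

# wait PID returns that job's exit status, even if it already finished.
for pid in "${process_ids[@]}"; do
  code=0
  wait "$pid" || code=$?
  exit_codes+=( "$code" )
done

printf 'exit codes: %s\n' "${exit_codes[*]}"
```

Waiting per PID keeps the statuses in launch order, independent of which job finished first.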
How do I assign the output of a command into an array?
To assign the output of a command to an array, you need to use a command substitution inside of an array assignment. For a generic command named command, this looks like:
arr=( $(command) )
In the example of the OP, this would read:
arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
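The inner $() runs the command while the outer () causes the output to be an array. The problem with this is that it will not work when the output of the command contains spaces, because every whitespace-separated word becomes its own element. To handle this, you can set IFS to \n so that only newlines separate elements. Because the line below consists only of assignments, IFS stays changed for the rest of the script unless you restore it, and glob characters in the output are still expanded.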
IFS=$'\n' arr=($(grep -n "search term" file.txt | sed 's/:.*//'))
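You can also cut out the need for sed by performing an expansion on each element of the array (IFS is still set to a newline here, so that a matching line containing spaces stays a single element):
IFS=$'\n' arr=($(grep -n "search term" file.txt))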
arr=("${arr[@]%%:*}")
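An alternative that avoids both the IFS change and glob expansion is the mapfile builtin, which needs bash 4.0 or newer. The sketch below assumes a throwaway sample file in place of file.txt; the file contents are invented for illustration.

```bash
#!/usr/bin/env bash
# Throwaway sample file standing in for file.txt.
sample=$(mktemp)
printf '%s\n' 'alpha' 'search term here' 'beta' 'another search term' >"$sample"

# mapfile -t stores one line per element and strips the trailing newline.
mapfile -t arr < <(grep -n "search term" "$sample")

# Keep only the line number, i.e. everything before the first colon.
arr=( "${arr[@]%%:*}" )
rm -f "$sample"

printf '%s\n' "${arr[@]}"
```

Because nothing is word-split here, lines containing spaces or * stay intact before the line numbers are extracted.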