How to execute 4 shell scripts in parallel when I can't use GNU parallel?
The easiest way to do this is to background all four scripts. You could wrap them in another script, "run_parallel.sh", that looks like this:
./dog.sh &
./bird.sh &
./cow.sh &
./fox.sh &
The ampersand backgrounds the invoked process without blocking, so all four scripts execute at the same time.
As an example, here's a script called "one_two_three.sh":
echo 'One'
sleep 1
echo 'Two'
sleep 1
echo 'Three'
sleep 1
echo 'Done'
and a wrapper "wrapper.sh":
./one_two_three.sh &
./one_two_three.sh &
./one_two_three.sh &
./one_two_three.sh &
echo 'Four running at once!'
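If the wrapper itself should not exit until all four background copies have finished, append a wait. A minimal self-contained sketch, with a shell function standing in for the real scripts:

```shell
#!/bin/bash
# Stand-in for the real scripts; each "task" sleeps briefly and reports.
task() { sleep 0.2; echo "task $1 done"; }

task 1 &   # & backgrounds each invocation,
task 2 &   # so all four run concurrently
task 3 &
task 4 &

wait       # block until every background job has exited
echo 'All four finished'
```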
Running Multiple Bash Scripts in Parallel
You could use xargs:
echo "~/path/test1.sh $1 ~/path/test2.sh $1" | xargs -P0 -n2 /bin/bash
-P0 says "run all in parallel"; -n2 passes two arguments to /bin/bash, in this case the script and its parameter.
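A self-contained variant of the same idea, using echo as a stand-in for the real scripts: -P0 (a GNU xargs extension) runs as many jobs in parallel as possible, and -n1 hands one argument to each invocation.

```shell
# Each input line becomes one parallel job; echo stands in for a real script.
printf '%s\n' dog bird cow fox | xargs -P0 -n1 echo
```

Because the jobs run concurrently, the output order is not guaranteed.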
How do you run multiple programs in parallel from a bash script?
To run multiple programs in parallel:
prog1 &
prog2 &
If you need your script to wait for the programs to finish, you can add:
wait
at the point where you want the script to wait for them.
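wait with no arguments blocks until all background jobs have exited; wait PID blocks on one job and also returns that job's exit status. A small sketch (prog1 and prog2 are stand-ins for real programs):

```shell
#!/bin/bash
# Stand-ins for real programs, one succeeding and one failing.
prog1() { sleep 0.1; return 0; }
prog2() { sleep 0.1; return 3; }

prog1 & pid1=$!   # $! is the PID of the most recent background job
prog2 & pid2=$!

wait "$pid1"; echo "prog1 exited with $?"
wait "$pid2"; echo "prog2 exited with $?"
```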
GNU parallel and script not starting
GNU Parallel is not magic: You cannot tell it to parallelize any script.
Instead you need to tell it what to parallelize and how.
In general you need to generate a list of the commands you want run in parallel and then give this list to GNU Parallel.
In the script you have 2 for loops and a pipe. All three can be parallelized by using GNU Parallel. It is, however, not certain it will make sense: there is an overhead in parallelizing, and if the current implementation already utilizes the CPU and disk resources optimally, then you will not see a speedup.
A for loop like this:
for x in x-value1 x-value2 x-value3 ... x-valueN; do
    # do something to $x
done
is parallelized by:
myfunc() {
    x="$1"
    # do something to $x
}
export -f myfunc
parallel myfunc ::: x-value1 x-value2 x-value3 ... x-valueN
A pipe of the form A | B | C where B is slow is parallelized by:
A | parallel --pipe B | C
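As a concrete sketch of the template (assuming GNU Parallel is installed, and with grep as a hypothetical slow middle step): --pipe splits stdin into blocks, runs one copy of the command per block, and the downstream stage merges the results.

```shell
# A | parallel --pipe B | C:
# seq produces input, one grep runs per block of input, awk sums the per-block counts.
seq 100000 | parallel --pipe grep -c 7 | awk '{s+=$1} END {print s}'
```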
So start by identifying the bottleneck.
For this, top is really useful. If you see a single process running at 100% in top, that is a good candidate for parallelizing.
If not, then you may be limited by how fast your disk is, and that can rarely be sped up by GNU Parallel.
You have not included test data, so I cannot run your script and identify the bottleneck for you. But I have experience with samtools, and samtools view was always the bottleneck in my scripts. So let us assume that is also the case here.
samtools ... | awk ...
This does not fit the A | B | C template where B is slow, so we cannot use parallel --pipe to speed it up. If, however, awk is the bottleneck, then we can use parallel --pipe.
So let us instead look at the two for loops.
It is easy to parallelize the outer loop:
#!/bin/bash

files_chrM_ID="concat_chrM_*"

do_chrM() {
    ID_file="$1"
    bam_directory="../bam/"
    echo "$(date +%H:%M:%S) $ID_file is being treated"
    sample=${ID_file: -12}
    sample=${sample:0:8}
    echo "$(date +%H:%M:%S) $sample is being treated"
    for bam_file_target in "${bam_directory}"*"${sample}"*".bam"
    do
        echo "$bam_file_target // $sample"
        out_file=${ID_file:0:-4}_ON_${bam_file_target:8:-4}.sam
        echo "$out_file will be created"
        echo "samtools and awk starting"
        samtools view -@ 6 "$bam_file_target" | awk -v st="$ID_file" 'BEGIN {OFS="\t"; ORS="\r\n"; while (getline < st) {st_array[$1]=$2}} {if ($1 in st_array) {print $0, st_array[$1], "target"}}' >> "$out_file"
        echo "$out_file done."
    done
}
export -f do_chrM

parallel do_chrM ::: ${files_chrM_ID}
This is great if there are more files matching ${files_chrM_ID} than there are CPU threads. But if that is not the case, we also need to parallelize the inner loop. This is slightly trickier, because we need to export a few variables to make them visible to do_bam, which is called by parallel:
#!/bin/bash

files_chrM_ID="concat_chrM_*"

do_chrM() {
    ID_file="$1"
    bam_directory="../bam/"
    echo "$(date +%H:%M:%S) $ID_file is being treated"
    sample=${ID_file: -12}
    sample=${sample:0:8}
    # Export $sample and $ID_file to make them visible to do_bam()
    export sample
    export ID_file
    echo "$(date +%H:%M:%S) $sample is being treated"

    do_bam() {
        bam_file_target="$1"
        echo "$bam_file_target // $sample"
        out_file=${ID_file:0:-4}_ON_${bam_file_target:8:-4}.sam
        echo "$out_file will be created"
        echo "samtools and awk starting"
        samtools view -@ 6 "$bam_file_target" |
            awk -v st="$ID_file" 'BEGIN {OFS="\t"; ORS="\r\n"; while (getline < st) {st_array[$1]=$2}} {if ($1 in st_array) {print $0, st_array[$1], "target"}}' >> "$out_file"
        echo "$out_file done."
    }
    export -f do_bam

    parallel do_bam ::: "${bam_directory}"*"${sample}"*".bam"
}
export -f do_chrM

parallel do_chrM ::: ${files_chrM_ID}
This, however, may overload your server: the inner parallel does not communicate with the outer parallel, so if you run this on a 64-core machine you risk running 64*64 jobs in parallel (but only if there are enough files matching concat_chrM_* and "${bam_directory}"*"${sample}"*".bam").
In that case it makes sense to limit the outer parallel to 1 or 2 jobs in parallel:
parallel -j2 do_chrM ::: ${files_chrM_ID}
This will at most run 2*64 jobs in parallel on a 64-core machine.
If, however, you want to run 64 jobs in parallel all the time then it becomes quite a bit trickier: It would have been fairly simple if the values of the inner loop did not depend on the outer loop, because then you could simply have done something like:
parallel do_stuff ::: chrM_1 ... chrM_100 ::: bam1.bam ... bam100.bam
which would generate all combinations of chrM_X,bamY.bam and run those in parallel - 64 at a time on a 64-core machine.
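The combination behaviour can be seen with a toy invocation (assuming GNU Parallel is installed; the argument values are hypothetical): two ::: lists are crossed, so 2 x 2 arguments yield 4 jobs.

```shell
# Two ::: lists produce the Cartesian product; {1} and {2} are the
# placeholders for the first and second input source.
parallel echo {1} {2} ::: chrM_1 chrM_2 ::: bam1.bam bam2.bam
```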
But in your case the values in the inner loop do depend on the values in the outer loop. This means you need to compute the values before starting any jobs. This also means you cannot have your script output information in the outer loop.
#!/bin/bash

sam_awk() {
    bam_file_target="$1"
    sample="$2"
    ID_file="$3"
    echo "$(date +%H:%M:%S) $ID_file is being treated"
    echo "$(date +%H:%M:%S) $sample is being treated"
    echo "$bam_file_target // $sample"
    out_file=${ID_file:0:-4}_ON_${bam_file_target:8:-4}.sam
    echo "$out_file will be created"
    echo "samtools and awk starting"
    samtools view -@ 6 "$bam_file_target" |
        awk -v st="$ID_file" 'BEGIN {OFS="\t"; ORS="\r\n"; while (getline < st) {st_array[$1]=$2}} {if ($1 in st_array) {print $0, st_array[$1], "target"}}' >> "$out_file"
    echo "$out_file done."
}
export -f sam_awk

files_chrM_ID="concat_chrM_*"
bam_directory="../bam/"

for ID_file in ${files_chrM_ID}
do
    # Moved to inner
    # echo "$(date +%H:%M:%S) $ID_file is being treated"
    sample=${ID_file: -12}
    sample=${sample:0:8}
    # Moved to inner
    # echo "$(date +%H:%M:%S) $sample is being treated"
    for bam_file_target in "${bam_directory}"*"${sample}"*".bam"
    do
        echo "$bam_file_target"
        echo "$sample"
        echo "$ID_file"
    done
done | parallel -n3 sam_awk
Given that you have not given us any test data, I cannot test whether these scripts will actually run, so there may be errors in them.
If you have not already done so, read at least chapter 1+2 of "GNU Parallel 2018" (available at http://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html, or download it at https://doi.org/10.5281/zenodo.1146014).
It should take you less than 20 minutes and your command line will love you for it.
How to use parallel execution in a shell script?
Convert this into a Makefile with proper dependencies. Then you can use make -j to have Make run everything possible in parallel.
Note that all the indents in a Makefile must be TABs. TAB shows Make where the commands to run are.
Also note that this Makefile is now using GNU Make extensions (the wildcard and subst functions).
It might look like this:
export PATH := .:${PATH}

FILES=$(wildcard file*)
RFILES=$(subst file,r,${FILES})

final: combine ${RFILES}
	combine ${RFILES} final
	rm ${RFILES}

ex: example.c

combine: combine.c

r%: file% ex
	ex $< $@
Use GNU parallel to parallelise a bash for loop
Replace echo $folders | parallel ... with echo "$folders" | parallel ...
Without the double quotes, the shell word-splits $folders and passes the folders as separate arguments to echo, which prints them all on one line. parallel provides each input line as the argument to one job.
To avoid such quoting issues altogether, it is always a good idea to pipe find directly to parallel, using the null character as the delimiter:
find ... -print0 | parallel -0 ...
This will work even when encountering file names that contain multiple spaces or a newline character.
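A self-contained sketch of the null-delimited pattern (assuming GNU Parallel is installed; the directory and file names are made up for the demonstration):

```shell
# Create files whose names contain spaces, then hand them to parallel safely:
# -print0 separates names with NUL, and -0 tells parallel to split on NUL.
dir=$(mktemp -d)
touch "$dir/a b.txt" "$dir/c.txt"
find "$dir" -name '*.txt' -print0 | parallel -0 echo "processing: {}"
rm -r "$dir"
```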
How to run program in Bash script for as long as other program runs in parallel?
#!/bin/bash

execProgram(){
    case $1 in
        server)
            sleep 5 &  # <-- change "sleep 5" to your server command.
                       # use "&" for background process
            SERVER_PID=$!
            echo "server started with pid $SERVER_PID"
            ;;
        client)
            sleep 18 & # <-- change "sleep 18" to your client command
                       # use "&" for background process
            CLIENT_PID=$!
            echo "client started with pid $CLIENT_PID"
            ;;
    esac
}

waitForServer(){
    echo "waiting for server"
    wait $SERVER_PID
    echo "server prog is done"
}

terminateClient(){
    echo "killing client pid $CLIENT_PID after 5 seconds"
    sleep 5
    kill -15 $CLIENT_PID >/dev/null 2>&1
    wait $CLIENT_PID >/dev/null 2>&1
    echo "client terminated"
}
execProgram server && execProgram client
waitForServer && terminateClient
How to send arguments to bash script in GNU parallel
bash_script.sh
parallel scp "$1" xxx@{}.com: ::: {1..5}
Usage:
bash bash_script.sh argument