How to Execute Parallel "For" Loops in Bash

How can I execute parallel for loops in Bash?

Replace

/usr/bin/sshpass ...

with

/usr/bin/sshpass ... &
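
The trailing & sends each command to the background, so the loop launches the next one immediately instead of waiting. A minimal sketch (the hosts and the ssh command are hypothetical, since the original command was elided):

#!/bin/bash
# Launch one sshpass/ssh per host in the background.
for host in host1 host2 host3; do
    /usr/bin/sshpass -p "$PASSWORD" ssh "$host" uptime &   # hypothetical command
done
wait   # block until every background job has finished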

How can I run this simple for loop in parallel in Bash?

For your script to work, you need to

  1. use either the variable names a, b, c, etc. or $dist $rlen $trans $meta $init, but not both;
  2. end the script with wait, otherwise Slurm will consider the job finished as soon as the script exits, even while the background processes are still running.

So:

#!/usr/bin/bash

#SBATCH --time=48:00:00
#SBATCH --mem=10G
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=my@mail.com

cd my_dir

for dist in 1 2; do
  for rlen in 1 2 3 4; do
    for trans in 1 2 3; do
      for meta in 1 2 3 4; do
        for init in 5 10 15 20 30 40 50 75 100 200 300 400 500 750 1000 1250 1500 1750 2000; do
          Rscript /hpc/someRscript.R $dist $rlen $trans $meta $init &
        done
      done
    done
  done
done
wait
echo done

Now one issue is that this will create 1824 processes (2 × 4 × 3 × 4 × 19 combinations) and try to run them all at the same time, which will be highly inefficient. So you should use srun to "micro-schedule" all these processes on the available CPUs. Note that you might need to explicitly request a certain number of CPUs with --ntasks.

#!/usr/bin/bash

#SBATCH --time=48:00:00
#SBATCH --mem=10G
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=my@mail.com
#SBATCH --ntasks=<SOME NUMBER>

cd my_dir

for dist in 1 2; do
  for rlen in 1 2 3 4; do
    for trans in 1 2 3; do
      for meta in 1 2 3 4; do
        for init in 5 10 15 20 30 40 50 75 100 200 300 400 500 750 1000 1250 1500 1750 2000; do
          srun -n 1 -c 1 --exclusive Rscript /hpc/someRscript.R $dist $rlen $trans $meta $init &
        done
      done
    done
  done
done
wait
echo done

Furthermore, if GNU Parallel is available, you can simplify the script as follows:

#!/usr/bin/bash

#SBATCH --time=48:00:00
#SBATCH --mem=10G
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=my@mail.com
#SBATCH --ntasks=<SOME NUMBER>

cd my_dir

parallel -P $SLURM_NTASKS srun -n 1 -c 1 --exclusive Rscript /hpc/someRscript.R ::: 1 2 ::: 1 2 3 4 ::: 1 2 3 ::: 1 2 3 4 ::: 5 10 15 20 30 40 50 75 100 200 300 400 500 750 1000 1250 1500 1750 2000

echo done
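
If the arguments ever need to be reordered or reused, GNU Parallel's positional placeholders {1} through {5} can be used instead of relying on the automatic appending. A sketch of the same command with explicit placeholders (the INIT variable is just a readability shortcut):

INIT="5 10 15 20 30 40 50 75 100 200 300 400 500 750 1000 1250 1500 1750 2000"
# {1}..{5} refer to the first..fifth ::: input sources; the unquoted $INIT
# word-splits into the 19 values of the last source.
parallel -P $SLURM_NTASKS srun -n 1 -c 1 --exclusive \
    Rscript /hpc/someRscript.R {1} {2} {3} {4} {5} \
    ::: 1 2 ::: 1 2 3 4 ::: 1 2 3 ::: 1 2 3 4 ::: $INIT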

Transforming this into a job array is not trivial. A small first step is to take one loop, for instance the innermost (and largest) one, and run the array over that parameter:

#!/usr/bin/bash

#SBATCH --time=48:00:00
#SBATCH --mem=10G
#SBATCH --mail-type=END
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=my@mail.com
#SBATCH --ntasks=<SOME NUMBER>
#SBATCH --array=0-18

cd my_dir

INIT=(5 10 15 20 30 40 50 75 100 200 300 400 500 750 1000 1250 1500 1750 2000)

parallel -P $SLURM_NTASKS srun -n 1 -c 1 --exclusive Rscript /hpc/someRscript.R {} ${INIT[$SLURM_ARRAY_TASK_ID]} ::: 1 2 ::: 1 2 3 4 ::: 1 2 3 ::: 1 2 3 4

echo done

You can run the array over all combinations, provided you create them all in a Bash array and use $SLURM_ARRAY_TASK_ID to index them, as in the sketch below.
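
A sketch of that approach with the same parameters as above (note that 1824 array tasks may exceed your cluster's MaxArraySize limit):

#!/usr/bin/bash

#SBATCH --time=48:00:00
#SBATCH --mem=10G
#SBATCH --array=0-1823

cd my_dir

# Enumerate every parameter combination into one Bash array.
COMBOS=()
for dist in 1 2; do
  for rlen in 1 2 3 4; do
    for trans in 1 2 3; do
      for meta in 1 2 3 4; do
        for init in 5 10 15 20 30 40 50 75 100 200 300 400 500 750 1000 1250 1500 1750 2000; do
          COMBOS+=("$dist $rlen $trans $meta $init")
        done
      done
    done
  done
done

# Each array task runs exactly one combination; the unquoted expansion
# deliberately word-splits the entry into the five separate arguments.
Rscript /hpc/someRscript.R ${COMBOS[$SLURM_ARRAY_TASK_ID]}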

Using a Bash for loop and GNU Parallel

This can be solved with the parallel command:

cat test.txt | parallel ./MY-BASH-SCRIPT.sh {}
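
Note that the cat is not strictly necessary; GNU Parallel can read the argument file directly with -a (--arg-file):

parallel -a test.txt ./MY-BASH-SCRIPT.sh {}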

How to run a Bash for loop using GNU Parallel?

Try like this:

parallel ./SCRIPT -n {} ::: FILE1 FILE2 FILE3

Or, more succinctly, if your files are really named like that:

parallel ./SCRIPT -n {} ::: FILE*
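
If you are unsure what will actually be executed, the --dry-run option prints the generated commands without running them:

parallel --dry-run ./SCRIPT -n {} ::: FILE*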

Running shell script loop in parallel

When you have a shell loop that does some setup and invokes an expensive command, the way to parallelize it is to use sem from GNU parallel:

for i in {1..10}; do
    echo "Doing some stuff"
    sem -j +0 sleep 2
done
sem --wait

This lets the loop run its setup steps sequentially as normal, while sem schedules the expensive commands to run in parallel (-j +0 runs one job per CPU core).
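
If one job per core is too aggressive (for instance, when each command is memory-hungry), -j also accepts a fixed limit. A sketch with a hypothetical expensive command, capped at 4 concurrent jobs:

for i in {1..10}; do
    echo "Doing some stuff"
    sem -j 4 ./expensive_command "$i"   # hypothetical command; at most 4 run at once
done
sem --wait   # block until all queued jobs have finished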

Parallelize a bash script and wait for each loop to finish

You don't need to keep track of PIDs, because if you call wait without any argument, the script will wait for all the child processes to finish.

#!/bin/bash
for X in $(seq $1); do
    nohup ./script.sh $(( X + ($2 - 1) * $1 )) &
done
wait
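
One limitation of a bare wait is that it does not report whether any child failed. If you need per-job exit statuses, collect the PIDs and wait on each one individually; a sketch:

#!/bin/bash
pids=()
for X in $(seq $1); do
    nohup ./script.sh $(( X + ($2 - 1) * $1 )) &
    pids+=($!)                              # remember each child's PID
done
for pid in "${pids[@]}"; do
    wait "$pid" || echo "job $pid failed" >&2
done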

