Bash Script Processing Limited Number of Commands in Parallel

Bash script processing limited number of commands in parallel

Use the wait built-in:

process1 &
process2 &
process3 &
process4 &
wait
process5 &
process6 &
process7 &
process8 &
wait

In the example above, the four processes process1 ... process4 are started in the background, and the shell waits until they have all completed before starting the next set.
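One drawback of this batch pattern is that each group runs only as fast as its slowest member. On bash 4.3 or newer, `wait -n` returns as soon as any single background job exits, which keeps a fixed number of slots busy instead. A minimal sketch (the limit n=4 and the sleep workload are placeholders):

```shell
#!/usr/bin/env bash
# Keep at most n jobs running at once (requires bash 4.3+ for 'wait -n').
n=4
for i in {1..10}; do
    # if n jobs are already running, wait for any one of them to finish
    while (( $(jobs -rp | wc -l) >= n )); do
        wait -n
    done
    sleep 1 &   # placeholder: replace with the real command
done
wait   # wait for whatever is still running
```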

From the GNU manual:

wait [jobspec or pid ...]

Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
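The per-PID behaviour described above is easy to demonstrate by saving `$!` after launching a job and passing it to wait. A small sketch, with `false` standing in for a failing command:

```shell
#!/usr/bin/env bash
sleep 1 &              # a background job that succeeds
ok_pid=$!
{ sleep 1; false; } &  # a background job that fails ('false' is a stand-in)
bad_pid=$!

wait "$ok_pid";  echo "first job exited with status $?"   # status 0
wait "$bad_pid"; echo "second job exited with status $?"  # status 1
```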

Run a limited number of commands in parallel from a bash for loop

Try it like this:

a=1  # counter
n=10 # desired number of simultaneous processes
for i in {1..100}; do
    ((a % n == 0)) && wait   # after every n jobs, wait for the whole batch
    ((a++))
    sleep 5 &
done

Running a limited number of child processes in parallel in bash?

#! /usr/bin/env bash

set -o monitor
# enable job control: background jobs run in their own process groups
# and the shell is notified (SIGCHLD) when one of them terminates
trap add_next_job CHLD
# execute add_next_job whenever a child-terminated signal is received

todo_array=($(find . -type f)) # place the file list into an array
# (note: this word-splits, so it breaks on file names containing whitespace)

index=0
max_jobs=2

function add_next_job {
    # if there are still jobs to do, add one
    if [[ $index -lt ${#todo_array[*]} ]]
    # the hash inside ${#...} is not a comment - it is bash's
    # (awkward) syntax for the length of the array
    then
        echo "adding job ${todo_array[$index]}"
        do_job "${todo_array[$index]}" &
        # replace the line above with the command you want
        index=$((index + 1))
    fi
}

function do_job {
    echo "starting job $1"
    sleep 2
}

# add the initial set of jobs
while [[ $index -lt $max_jobs ]]
do
    add_next_job
done

# wait for all jobs to complete
wait
echo "done"

Having said that, Fredrik makes the excellent point that xargs does exactly what you want...

Bash: limit the number of concurrent jobs?

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:

parallel gzip ::: *.log

which will run one gzip per CPU core until all logfiles are gzipped.

If it is part of a larger loop you can use sem instead:

for i in *.log ; do
    echo "$i"
    # ... do more stuff here ...
    sem -j+0 gzip "$i" ";" echo done
done
sem --wait

It will do the same, but give you a chance to do more stuff for each file.

If GNU Parallel is not packaged for your distribution you can install GNU Parallel simply by:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

It will download, check signature, and do a personal installation if it cannot install globally.

Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

running a limited number of parallel programs using a script

Try this one:

#!/bin/bash

todo_array[1]="echo start1;sleep 3;echo done1"
todo_array[2]="echo start2;sleep 3;echo done2"
todo_array[3]="echo start3;sleep 3;echo done3"
todo_array[4]="echo start4;sleep 3;echo done4"
todo_array[5]="echo start5;sleep 3;echo done5"
todo_array[6]="echo start6;sleep 3;echo done6"
todo_array[7]="echo start7;sleep 3;echo done7"
todo_array[8]="echo start8;sleep 3;echo done8"
todo_array[9]="echo start9;sleep 3;echo done9"

max_jobs=4

for i in "${todo_array[@]}"
do
    echo "$i"
done | xargs -IX --max-procs="$max_jobs" bash -c "X"
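If the command strings may contain awkward whitespace, a NUL-delimited variant of the same pattern is safer to pipe (this assumes GNU xargs for -0; the array contents below are placeholders):

```shell
#!/usr/bin/env bash
todo_array=("echo start1; sleep 3; echo done1"
            "echo start2; sleep 3; echo done2")
max_jobs=4

# NUL-delimit the commands so each one survives the pipe intact
printf '%s\0' "${todo_array[@]}" |
    xargs -0 -I X --max-procs="$max_jobs" bash -c 'X'
```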

Parallelize Bash script with maximum number of processes

Depending on what you want to do xargs also can help (here: converting documents with pdf2ps):

cpus=$( ls -d /sys/devices/system/cpu/cpu[[:digit:]]* | wc -w )

find . -name \*.pdf | xargs --max-args=1 --max-procs=$cpus pdf2ps

From the docs:

--max-procs=max-procs
-P max-procs

Run up to max-procs processes at a time; the default is 1. If max-procs is 0, xargs will run as many processes as possible at a time. Use the -n option with -P; otherwise chances are that only one exec will be done.
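The note about combining -n with -P is worth seeing concretely: without -n, xargs may pack every argument into a single invocation, leaving nothing to run in parallel. A small sketch, with echo standing in for the real command:

```shell
# Four inputs, one argument per process (-n 1), at most two at a time (-P 2).
# Dropping '-n 1' would let xargs pass all four names to a single 'echo'.
printf '%s\n' a b c d | xargs -n 1 -P 2 echo processing
```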

How do you run multiple programs in parallel from a bash script?

To run multiple programs in parallel:

prog1 &
prog2 &

If you need your script to wait for the programs to finish, you can add:

wait

at the point where you want the script to wait for them.

How to run group of commands in parallel followed by an other command

How can I wait for all three commands before executing cmdD?

Use wait.

(cmdA & cmdB & cmdC ; wait) ; cmdD
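Note that the parentheses create a subshell, and cmdC actually runs in the foreground inside it. An equivalent form without the subshell backgrounds all three commands explicitly; here sleep and echo are placeholders for cmdA..cmdD:

```shell
sleep 3 & sleep 2 & sleep 1 &   # cmdA, cmdB, cmdC stand-ins
wait                            # returns once all three have exited
echo "cmdD runs here"           # cmdD stand-in
```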

