How Can a Shell Have More Than One Job in Linux

How can a shell have more than one job in Linux?

There is quite a bit to say on this topic, some of which can fit in an answer, and most of which will require further reading.

For Q1, I would say conceptually yes, but jobs are not automatic, and job tracking and control are not magical. I don't see any logic in the code snippets you've shown that e.g. establishes and maintains a jobs table. I understand it's just a sample, so maybe the job control logic is elsewhere. Job control is a feature of common, existing Unix shells, but if a person writes a new Unix shell from scratch, job control features would need to be added, as code / logic.
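To make that concrete, here is a minimal sketch (my own illustration, not code from your sample) of the kind of bookkeeping such a shell would need before it can be said to track jobs; the names add_job, delete_job, and the state constants are all invented for the example:

#include <string.h>
#include <sys/types.h>

#define MAXJOBS 16
enum { FREE = 0, FG, BG, STOPPED };      /* hypothetical job states */

struct job {
    pid_t pid;                           /* PID of the job's leading process */
    int   state;                         /* FREE, FG, BG, or STOPPED         */
    char  cmdline[128];                  /* what the user typed              */
};

static struct job jobs[MAXJOBS];         /* the jobs table itself */

/* Record a new job; returns 1 on success, 0 if the table is full. */
int add_job(pid_t pid, int state, const char *cmdline)
{
    for (int i = 0; i < MAXJOBS; i++) {
        if (jobs[i].state == FREE) {
            jobs[i].pid   = pid;
            jobs[i].state = state;
            strncpy(jobs[i].cmdline, cmdline, sizeof jobs[i].cmdline - 1);
            jobs[i].cmdline[sizeof jobs[i].cmdline - 1] = '\0';
            return 1;
        }
    }
    return 0;
}

/* Free a slot once waitpid() reports that the job has terminated. */
void delete_job(pid_t pid)
{
    for (int i = 0; i < MAXJOBS; i++)
        if (jobs[i].pid == pid)
            jobs[i].state = FREE;
}

A real shell would also update this table from a SIGCHLD handler and consult it to implement built-ins like jobs, fg, and bg.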

For Q2, the way you've put it is not how I would put it. After the first call to fork(), yes there is a p1 and a c1, but recognize that at first, p1 and c1 are different instances of the same program (shellex); only after the call to execve() is exampleProgram running. fork() creates a child instance of shellex, and execve() causes the child instance of shellex to be replaced (in RAM) by exampleProgram (assuming that's the value of argv[0]).

There is no real sense in which the parent is "executing" the child, or the program that replaces the child upon execve(), beyond getting them started. The parent starts the child and might wait for the child's execution to complete, but really a parent and its whole hierarchy of child processes all execute independently, each scheduled by the kernel.

But yes, if told that the program to run should be run in the background, then shellex will accept further input, and upon the next call to fork(), there will be the parent shellex with two child processes. And again, at first the child c2 will be an instance of shellex, quickly replaced via execve() by whatever program has been named.
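Put as code, that sequence looks roughly like this; it's a simplified sketch rather than your shellex, error handling is omitted, and the bg flag stands in for whatever the parsing step reports:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] as a child process, in the foreground unless bg is set. */
void run_command(char **argv, char **envp, int bg)
{
    pid_t pid = fork();                  /* parent and child both continue from here    */
    if (pid == 0) {
        execve(argv[0], argv, envp);     /* child is replaced in RAM by the new program */
        perror("execve");                /* reached only if execve() failed             */
        _exit(127);
    }
    if (!bg) {
        int status;
        waitpid(pid, &status, 0);        /* foreground: the shell waits for the child   */
    } else {
        printf("[bg] started %d\n", (int)pid);   /* background: go read the next command */
    }
}

If bg is set, a real shell would also record the child in its jobs table (per Q1) rather than just printing the PID.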

(Regarding running in the background, whether or not & has that effect depends upon the logic inside the function named parseline() in the sample code. Shells I'm familiar with use & to say "run this in the background", but there is nothing special or magical about that. A newly-written Unix shell can do it some other way, with a trailing +, or a leading BG:, or whatever the shell author decides to do.)
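I don't know what that parseline() does internally, but detecting a trailing & can be as simple as this sketch (the function name is made up for illustration):

#include <ctype.h>
#include <string.h>

/* Return 1 and strip the '&' if the command line ends with one. */
int ends_in_ampersand(char *cmdline)
{
    size_t len = strlen(cmdline);
    while (len > 0 && isspace((unsigned char)cmdline[len - 1]))
        len--;                           /* ignore trailing whitespace            */
    if (len > 0 && cmdline[len - 1] == '&') {
        cmdline[len - 1] = '\0';         /* drop the '&' before tokenizing        */
        return 1;                        /* caller runs the job in the background */
    }
    return 0;
}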

For Q3 and Q4, the first thing to recognize is that the parent you are calling p1 is the shell program that you've shown. So, no, p1 would not be part of the job.

In Unix, a job is a collection of processes that execute as part of a single pipeline. Thus a job can consist of one process or many. Such processes remain attached to the terminal from which they are run, but might be in the foreground (running and interactive), suspended, or in the background (running, not interactive).

one process, foreground    : ls -lR
one process, background    : ls -lR &
one process, background    : ls -lR, then CTRL-Z, then bg
many processes, foreground : ls -lR | grep perl | sed 's/^.*\.//'
many processes, background : ls -lR | grep perl | sed 's/^.*\.//' &

To see jobs vs. processes empirically, run a pipeline in the background (the 5th of the 5 examples above), and while it is running use ps to show you the process IDs and the process group IDs. e.g., on my Mac's version of bash, that's:

$ ls -lR | grep perl | sed 's/^.*\.//' &
[1] 2454 <-- job 1, PID of the sed is 2454

$ ps -o command,pid,pgid
COMMAND           PID  PGID
vim              2450  2450    <-- running in a different tab
ls -lR           2452  2452    }
grep perl        2453  2452    }-- 3 PIDs, 1 PGID
sed s/^.*\.//    2454  2452    }

In contrast to this attachment to the shell and the terminal, a daemon detaches from both. When starting a daemon, the parent uses fork() to start a child process, but then exits, leaving only the child running, now with a parent of PID 1. The child closes down stdin, stdout, and stderr, since those are meaningless to a daemon running "headless".
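A rough sketch of that daemonizing sequence (simplified: stdin, stdout, and stderr are pointed at /dev/null rather than closed outright, which is the more common idiom, and real daemons often also chdir("/") and fork a second time after setsid()):

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Detach the calling process so it keeps running as a daemon. */
void daemonize(void)
{
    pid_t pid = fork();
    if (pid < 0)
        exit(1);                         /* fork failed                              */
    if (pid > 0)
        exit(0);                         /* parent exits; child is adopted by PID 1  */

    setsid();                            /* child leaves the controlling terminal    */

    /* stdin, stdout, and stderr are meaningless for a headless process */
    int fd = open("/dev/null", O_RDWR);
    dup2(fd, STDIN_FILENO);
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    if (fd > STDERR_FILENO)
        close(fd);
}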

But in a shell, the parent -- which, again, is the shell -- stays running either wait()ing (foreground child program), or not wait()ing (background child program), and the child typically retains use of stdin, stdout, and stderr (although, these might be redirected to files, etc.)

And, a shell can invoke sub-shells, and of course any program that is run can fork() its own child processes, and so on. So the hierarchy of processes can become quite deep. Without specific action otherwise, a child process will be in the same process group as its parent.
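The ps output above showed the three pipeline processes sharing one PGID; that grouping is something the shell has to ask for explicitly, typically with setpgid() right after fork(). A sketch (the function name is mine):

#include <sys/types.h>
#include <unistd.h>

/* Place a freshly fork()ed child into the job's process group.
   Pass pgid = 0 for the first process in a pipeline (it becomes the
   group leader); pass the leader's PID for the remaining processes.
   Shells typically call this in both parent and child to avoid a race. */
void join_job_group(pid_t child_pid, pid_t pgid)
{
    if (pgid == 0)
        pgid = child_pid;                /* first child becomes the group leader        */
    setpgid(child_pid, pgid);            /* override the default of inheriting the parent's group */
}

With the whole pipeline in one process group, the shell can signal the group as a unit, which is what CTRL-Z, fg, and bg rely on.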

Here are some articles for further reading:

What is the difference between a job and a process in Unix?

https://unix.stackexchange.com/questions/4214/what-is-the-difference-between-a-job-and-a-process

https://unix.stackexchange.com/questions/363126/why-is-process-not-part-of-expected-process-group

Bash Reference Manual; Job Control

Bash Reference Manual; Job Control Basics

How do you run multiple programs in parallel from a bash script?

To run multiple programs in parallel:

prog1 &
prog2 &

If you need your script to wait for the programs to finish, you can add:

wait

at the point where you want the script to wait for them.

Run several jobs in parallel and efficiently

As Mark Setchell says: GNU Parallel.

find scripts/ -type f | parallel

If you insist on keeping 8 CPUs free:

find scripts/ -type f | parallel -j-8

But usually it is more efficient simply to use nice as that will give you all 48 cores when no one else needs them:

find scripts/ -type f | nice -n 15 parallel

To learn more:

  • Watch the intro video for a quick introduction:
    https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
  • Walk through the tutorial (man parallel_tutorial). Your command line
    will love you for it.

Multiple jobs on a server using a script

You must set the executable bit on your scripts:

printf "#!/bin/bash\ncd /home/PATH/\n./nvt inpt/%b" "$f" > run-script$i.sh
chmod +x run-script$i.sh

To be sure that it is not a formatting problem (or any problem with printf) you can try to use echo:

echo '#!/bin/bash' > run-script$i.sh
echo cd /home/PATH/ >> run-script$i.sh
echo ./nvt "inpt/$f" >> run-script$i.sh

Bash: limit the number of concurrent jobs?

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:

parallel gzip ::: *.log

which will run one gzip per CPU core until all logfiles are gzipped.

If it is part of a larger loop you can use sem instead:

for i in *.log ; do
  echo $i Do more stuff here
  sem -j+0 gzip $i ";" echo done
done
sem --wait

It will do the same, but give you a chance to do more stuff for each file.

If GNU Parallel is not packaged for your distribution you can install GNU Parallel simply by:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

It will download, check signature, and do a personal installation if it cannot install globally.

Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Multiple sequential jobs in a shell script to be cronned in a conditional manner

So you have a series of runnerX.sh scripts that you need to invoke sequentially.

for i in $(seq 1 31); do
  sh runner$i.sh || break
done

This covers running them in sequential order, and not running the next one if the prior one fails.

To make sure that runner1.sh is not already running, you do:

pid=$(ps -fe | grep '[r]unner1.sh' | awk '{print $2}')

or

pid=$(pgrep -f runner1.sh)

or any of a wide variety of mechanisms for detecting the pid of runner1.sh

Combined, you surround the for loop with the if, like this:

if [ -z "$pid" ]; then
  for i in $(seq 1 31); do
    sh runner$i.sh || break
  done
fi

Running multiple jobs in Spark

Yes, your command will run both sh commands simultaneously. You can check with a simple example: if sleep.sh contains sleep $1, then running

sh -x sleep.sh 2 & sh -x sleep.sh 3

results in both commands finishing after approximately 3 seconds.

Whether the scripts complete in roughly the time it takes to run the longest of them depends on the resources of the cluster.


