How do I write a bash script to restart a process if it dies?
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
    echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
    sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
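To watch the loop behave as described without a real server, here is a minimal sketch. The function fake_server is a hypothetical stand-in for myserver that crashes twice and then exits cleanly, and the sleep is shortened to 0.1 seconds (fractional sleep is an assumption that holds for GNU and BSD sleep, not plain POSIX) so the demo finishes quickly.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `myserver`: crashes twice, then shuts down cleanly.
attempts=0
fake_server() {
    attempts=$((attempts + 1))
    if [ "$attempts" -lt 3 ]; then
        return 42   # simulate a crash
    fi
    return 0        # simulate a graceful shutdown
}

restarts=0
until fake_server; do
    # $? still holds the exit status of fake_server here
    echo "Server 'fake_server' crashed with exit code $?. Respawning.." >&2
    restarts=$((restarts + 1))
    sleep 0.1   # shortened from 1 second for the demo
done
echo "exited cleanly after $restarts restarts"
```

The function runs in the current shell, so the counter survives between iterations; a real server binary would of course keep no such state.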
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:
crontab -e
Then add a rule to start your monitor script:
@reboot /usr/local/bin/myservermonitor
Alternatively, look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
Edit.
Let me add some information on why not to use PID files. While they are very popular, they are also very flawed, and there's no reason not to just do it the correct way.
Consider this:
1. PID recycling (killing the wrong process):
   - /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
   - A while later: foo dies somehow.
   - A while later: any random process that starts (call it bar) takes a random PID, imagine it taking foo's old PID.
   - You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.
3. What if you don't even have write access or are in a read-only environment?
4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
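The PID-recycling scenario can be simulated in a few lines. This is only a sketch: real recycling depends on the kernel's PID allocation, so here a background sleep plays the unrelated process bar, and its PID is written into a temporary file that stands in for a stale /var/run/foo.pid. The variable names are made up for the illustration.

```shell
#!/usr/bin/env bash
pidfile=$(mktemp)   # stand-in for a stale /var/run/foo.pid

# "bar" is an unrelated process that, in this simulation, ends up
# with the PID that the stale file still records for foo.
sleep 30 &
bar_pid=$!
echo "$bar_pid" > "$pidfile"

# The usual liveness check only asks "does some process have this PID?"
verdict="foo looks dead"
if kill -0 "$(cat "$pidfile")" 2>/dev/null; then
    verdict="foo looks alive"
fi
echo "$verdict (but PID $(cat "$pidfile") really belongs to bar)"

# Clean up the simulated process and the temporary file.
kill "$bar_pid"
wait "$bar_pid" 2>/dev/null || true
rm -f "$pidfile"
```

The check succeeds even though foo is not running at all, which is exactly how a restart script ends up killing bar.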
See also: Are PID-files still flawed when doing it 'right'?
By the way; even worse than PID files is parsing ps! Don't ever do this.
- ps is very unportable. While you find it on almost every UNIX system, its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
- Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
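The false-positive problem is easy to reproduce. Because real ps output differs between systems, the sketch below greps a hard-coded, made-up process listing instead; the user names, PIDs and commands are purely illustrative.

```shell
#!/usr/bin/env bash
# Made-up listing in the spirit of `ps aux` output (not real data).
fake_ps_output='alice  1234  0.0  0.1 myserver --config /etc/myserver.conf
bob    5678  0.0  0.1 backup --job-id 1234
carol  9012  0.0  0.1 sshd'

# Looking for "the process with PID 1234" by grepping for the number.
matches=$(printf '%s\n' "$fake_ps_output" | grep -c 1234)
echo "lines matching 1234: $matches"
```

Two lines match, because bob's backup job merely has 1234 as an argument. A script that kills whatever this grep finds would take down the wrong process.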
If you don't want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
How do I write a bash script to restart a process if it exits gracefully?
A simple way:
while sleep 1; do
    echo "success"
done
Seems to work fine for me.
Replace sleep 1 with the command to start your process.
edit: this is an answer for the question in the title, I'm not sure what /etc/init or the code you gave has to do with the question
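As a rough illustration of how this while loop differs from an until loop, the sketch below uses a hypothetical function fake_job that exits with status 0 three times and then fails. The loop restarts it after every graceful exit and stops on the first failure.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real process: succeeds three times, then fails.
runs=0
fake_job() {
    runs=$((runs + 1))
    [ "$runs" -le 3 ]   # exit status 0 for runs 1-3, non-zero on run 4
}

while fake_job; do
    echo "run $runs exited gracefully, restarting"
done
echo "run $runs failed, loop stopped"
```

So while restarts on success and gives up on failure, which is the mirror image of the until loop.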
Shell script: How to restart a process (with pipe) if it dies
The until loop itself can be piped into logger:
until myserver 2>&1; do
    echo "..."
    sleep 1
done | /usr/bin/logger -p local0.info &
since myserver inherits its standard output and error from the loop (which inherits from the shell).
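To see that everything the loop and the server write ends up in the pipe, here is a small sketch that can be run without syslog. It assumes a hypothetical fake_server function in place of myserver and uses sed writing to a temporary file as a stand-in for logger; it also runs in the foreground rather than with a trailing &.

```shell
#!/usr/bin/env bash
log=$(mktemp)   # stand-in destination instead of syslog

# Hypothetical server: writes to stdout and stderr, crashes once, then exits cleanly.
attempts=0
fake_server() {
    attempts=$((attempts + 1))
    echo "stdout from attempt $attempts"
    echo "stderr from attempt $attempts" >&2
    [ "$attempts" -ge 2 ]
}

# The whole loop is one side of the pipeline, so the server's output
# (stderr merged via 2>&1) and the loop's own messages share the pipe.
until fake_server 2>&1; do
    echo "crashed, respawning"
    sleep 0.1
done | sed 's/^/local0.info: /' > "$log"

cat "$log"
logged_lines=$(grep -c '^local0.info: ' "$log")
rm -f "$log"
```

One thing to keep in mind with this pattern: the loop runs in a subshell because it is part of a pipeline, so variables it changes (such as attempts here) are not visible to the rest of the script afterwards.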
How to restart a process in bash or kill it on command?
From the manual:
If Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When Bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
Emphasis is mine.
So in your case, while your command is executing, Bash will wait until it ends before it triggers the trap.
To fix this, you need to run your program as a job, and wait for it. If your program never exits with a return code greater than 128, you could simplify the following code, but I'm not making this assumption:
#!/bin/bash
desc="Foo Manager"
to_exec=( python "/myPath/bin/FooManager.pyc" )
trap 'trap_triggered=true' SIGHUP SIGINT SIGTERM
trap_triggered=false
while ! $trap_triggered; do
    "${to_exec[@]}" &
    job_pid=$!
    wait "$job_pid"
    job_ret=$?
    if [[ $job_ret = 0 ]]; then
        echo >&2 "Job ended gracefully with no errors... quitting..."
        break
    elif ! $trap_triggered; then
        echo >&2 "Server $desc crashed with exit code $job_ret. Restarting..."
    else
        printf >&2 "Received fatal signal... "
        if kill -0 "$job_pid" >&/dev/null; then
            printf >&2 "killing job $job_pid... "
            kill "$job_pid"
            wait "$job_pid"
        fi
        printf >&2 "quitting...\n"
    fi
done
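The behaviour quoted from the manual can be checked in isolation. In this sketch a background sleep plays the long-running job and a helper subshell sends SIGUSR1 to the script after 0.2 seconds. Both the choice of signal and the timings are arbitrary assumptions for the demo, and the exact status depends on the platform's signal numbering.

```shell
#!/usr/bin/env bash
trap 'trap_triggered=true' USR1
trap_triggered=false

sleep 5 &            # stand-in for the long-running job
job_pid=$!

# Helper: signal this script shortly after it starts waiting.
# Inside the subshell, $$ still expands to the main script's PID.
( sleep 0.2; kill -USR1 $$ ) &

wait "$job_pid"      # returns as soon as the trapped signal arrives
wait_status=$?
echo "wait returned $wait_status, trap_triggered=$trap_triggered"

# The job itself is still running; stop it and reap it.
kill "$job_pid" 2>/dev/null
wait "$job_pid" 2>/dev/null || true
trap - USR1
```

wait comes back well before the five seconds are up, with a status above 128, and the trap has already run. That is why the script above has to consult trap_triggered rather than rely on the exit status alone.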
Notes.
- I used lowercase variable names, since uppercase names are considered bad practice: they can clash with Bash's reserved names or with environment variables.
- I didn't use a string to store the command, but an array. With a string, you'll have a lot of problems if you want to have funny characters like spaces passed as arguments. With a properly quoted array, you won't have any problems. (Some would argue that it would be even better to use a function.)
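The difference between the two ways of storing a command shows up as soon as an argument contains a space. The following sketch needs bash (arrays are not available in plain sh) and uses a hypothetical helper, count_args, that simply reports how many arguments it received.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the number of arguments it was called with.
count_args() { echo "$#"; }

cmd_string='count_args "hello world" two'
cmd_array=( count_args "hello world" two )

# Unquoted string: the shell splits on whitespace and treats the quote
# characters as literal text, so "hello world" becomes two arguments.
from_string=$($cmd_string)

# Quoted array expansion: each element stays exactly one argument.
from_array=$("${cmd_array[@]}")

echo "string: $from_string arguments, array: $from_array arguments"
```

The string version hands three arguments to the command and the array version hands it the intended two.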
How do I write a bash script to restart a service if it dies?
I think it would be better to manage your process with supervisord or another process control system.
How can a bash script restart a process on non-0 exit while sending signals to child
Don't write a shell script. Use systemd, supervisor, docker or any available service manager to manage the docker/script process directly. This is the job service managers were built to do; they live for it.
A systemd service would run docker run {image} python test.py and you would need to set it to run indefinitely.
A systemd config would look like:
[Unit]
Description=My Super Script
Requires=docker.service
After=docker.service
[Service]
ExecStart=/bin/docker run --name={container} --rm=true {image} python test.py
ExecStop=/bin/docker stop --time=10 {container}
TimeoutStopSec=11
KillMode=control-group
Restart=on-failure
RestartSec=5
TimeoutStartSec=5
[Install]
WantedBy=multi-user.target
The Restart=on-failure setting matches your requirement of only restarting the process when a non-zero exit code is returned, so you can still kill the process underneath systemd if required.
If you want to run and manage your python process inside an already running container, it might be easier to run supervisord as the main container process and have it manage python test.py. Supervisor is not as feature-complete as systemd, but it can do all the basic service management tasks.