Reliably Kill Sleep Process After USR1 Signal

As the background job is a fork of the foreground one, they share the same name (trap-test.sh), so pkill matches and signals both. In an unpredictable order, this kills the background process (leaving sleep alive, as explained below) and triggers the trap in the foreground one; hence the race condition.

Besides, in the examples you linked, the background job is always a mere sleep x, but in your script it is sleep 10 && echo 'doing some work', which requires the forked subshell to wait for sleep to terminate and then conditionally execute echo. Compare these two:

$ sleep 10 &
[1] 9401
$ pstree 9401
sleep
$
$ sleep 10 && echo foo &
[2] 9410
$ pstree 9410
bash───sleep

So let's start from scratch and reproduce the principal issue in a terminal.

$ set +m
$ sleep 100 && echo 'doing some work' &
[1] 9923
$ pstree -pg $$
bash(9871,9871)─┬─bash(9923,9871)───sleep(9924,9871)
                └─pstree(9927,9871)
$ kill $!
$ pgrep sleep
9924
$ pkill -e sleep
sleep killed (pid 9924)

I disabled job control to partly emulate a non-interactive shell's behavior.

Killing the background job didn't kill sleep; I had to terminate it manually. This happened because a signal sent to a process is not automatically propagated to that process's children; i.e. sleep didn't receive the TERM signal at all.

To kill sleep as well as the subshell, I need to put the background job into a separate process group and send the TERM signal to that whole group, as shown below. This requires job control to be enabled; otherwise all jobs are put into the main shell's process group, as seen in pstree's output above.

$ set -m
$ sleep 100 && echo 'doing some work' &
[1] 10058
$ pstree -pg $$
bash(9871,9871)─┬─bash(10058,10058)───sleep(10059,10058)
                └─pstree(10067,10067)
$ kill -- -$!
$
[1]+ Terminated sleep 100 && echo 'doing some work'
$ pgrep sleep
$
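The difference between the two sessions can also be checked from a script. Since sleep inherits the job's standard output, a reader of that pipe only sees end-of-file once every writer (the subshell and sleep) has exited. This is a minimal sketch under my own assumptions: the FIFO trick and the cancel_job helper are hypothetical, not part of the answer, and it needs bash with set -m usable in a script, plus mkfifo and mktemp.

```shell
#!/bin/bash
# Sketch (hypothetical helper, not from the original answer): compare
# `kill $job` with `kill -- -$job` on a job like `sleep N && echo ...`.
set -m   # job control: each background job runs in its own process group

# cancel_job MODE: start the job, signal it, then set $elapsed to the number of
# seconds until the job's output pipe closes (i.e. until sleep is gone too).
#   MODE=pid    -> kill "$job"      signals the subshell only
#   MODE=group  -> kill -- -"$job"  signals the whole process group
cancel_job() {
  local mode=$1 fifo job start
  fifo=$(mktemp -u) && mkfifo "$fifo" || return 1

  # sleep inherits the FIFO as stdout, so the pipe stays open while it runs
  { sleep 3 && echo 'doing some work'; } > "$fifo" &
  job=$!
  exec 3< "$fifo"   # open the read end; this also unblocks the job's open()
  sleep 0.5         # give the subshell time to fork sleep

  if [ "$mode" = group ]; then
    kill -- -"$job"
  else
    kill "$job"
  fi

  start=$SECONDS
  cat <&3 > /dev/null   # EOF arrives only once every writer has exited
  elapsed=$((SECONDS - start))

  exec 3<&-
  wait "$job" 2> /dev/null
  rm -f "$fifo"
}

cancel_job pid;   pid_elapsed=$elapsed
cancel_job group; group_elapsed=$elapsed
echo "kill \$job:      pipe closed after ~${pid_elapsed}s (sleep outlived the subshell)"
echo "kill -- -\$job:  pipe closed after ~${group_elapsed}s"
```

With kill "$job" the pipe stays open until sleep finishes on its own, whereas signalling the group closes it almost immediately. Note that cancel_job must not be run inside $( ), because bash turns job control off in command substitutions.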

With some refinement and adaptation of this concept, your script would look like this:

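#!/bin/bash -
set -m  # job control: each background job gets its own process group

usr1_handler() {
  kill -- -$!  # signal the whole process group of the last background job
  echo 'doing some work'
}

do_something() {
  trap '' USR1  # the job ignores USR1, so e.g. pkill cannot kill it directly
  sleep 10 && echo 'doing some work'
}

trap usr1_handler USR1 EXIT  # EXIT: also kill the running job when the script exits

echo "my PID is $$"

while true; do
  do_something &
  wait
done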

This will print my PID is xxx (where xxx is the PID of the foreground process) and start looping. Sending a USR1 signal to xxx (i.e. kill -USR1 xxx) triggers the trap, which terminates the background process and its children. Thus wait returns and the loop continues.

If you use pkill instead, it will still work, as the background process ignores USR1.
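To try this without a second terminal, here is a bounded sketch under my own assumptions: a helper job (hypothetical, standing in for kill -USR1 xxx) signals the main shell after one second, the infinite loop is replaced by a single round, the EXIT trap is dropped, and the handler counts interruptions instead of echoing.

```shell
#!/bin/bash
# Sketch: one round of the script above, signalled by a hypothetical helper job.
set -m                            # background jobs get their own process groups
interrupted=0

usr1_handler() {
  kill -- -"$!" 2> /dev/null      # $! is do_something: kill it and its sleep
  interrupted=$((interrupted + 1))
}

do_something() {
  trap '' USR1                    # the job itself ignores USR1
  sleep 10 && echo 'doing some work'
}

trap usr1_handler USR1

start=$SECONDS
{ sleep 1; kill -USR1 $$; } &     # hypothetical sender: signals the main shell after 1 s
do_something &                    # started last, so $! refers to it
until wait; do :; done            # wait returns > 128 when the trap fires; keep waiting
elapsed=$((SECONDS - start))
echo "interrupted $interrupted time(s) after ${elapsed}s"
```

The round ends after about one second instead of ten, and 'doing some work' is never printed because the whole process group of do_something was terminated.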

For further information, see:

  • Bash Reference Manual § Special Parameters ($$ and $!),
  • POSIX kill specification (-$! usage),
  • POSIX Definitions § Job Control (how job control is implemented in POSIX shells),
  • Bash Reference Manual § Job Control Basics (how job control works in bash),
  • POSIX Shell Command Language § Signals And Error Handling,
  • POSIX wait specification.

Why may USR1 signals sent from background jobs in a Bash script not be reliably received by the parent shell process waiting for their completion?

This is explained in the Bash Reference Manual as follows.

When bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.

So, you need to repeat wait until it returns 0 to make sure all background jobs have terminated, e.g.:

until wait; do
  :
done
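A minimal sketch of that behavior, assuming a helper job (hypothetical) sends USR1 to the script after one second while a two-second job is still running: the first wait is cut short with a status above 128, and the until loop keeps waiting until the job is really gone.

```shell
#!/bin/bash
# Sketch: `wait` is interrupted by a trapped signal, so loop until it returns 0.
trap 'echo "USR1 trapped"' USR1

sleep 2 &                         # the background work
job=$!
{ sleep 1; kill -USR1 $$; } &     # hypothetical sender, fires after 1 s

wait
first_status=$?                   # 128 + signal number (e.g. 138 for USR1 on Linux)
until wait; do :; done            # resume waiting until every job has finished
echo "first wait returned $first_status"
```

Without the until loop, the script would continue (and possibly exit) while sleep 2 was still running.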

It was my understanding that it is impossible for the parent process to terminate before all background jobs do, due to the wait call.

That is a misunderstanding: wait may return because a signal for which a trap has been set was received while jobs are still running in the background. That may lead to normal completion of the program, with the side effect of leaving those jobs orphaned.

Bash script catch signal but wait afterwards for processes to terminate

Something rewritten

In order to avoid some useless forks:

clock(){  local prefix=C interval=2
    trap : RTMIN{,+{{,1}{1,2,3,4,5},6,7,8,9,10}}
    while :;do
        printf "%s: %(%d.%m %H:%M:%S)T\n" $prefix -1
        sleep $interval
    done
}

volume(){ local prefix=V vol=() field playback val foo
    while IFS=':[]' read field playback val foo;do
        [ "$playback" ] && [ -z "${playback//*Playback*}" ] && [ "$val" ] &&
            vol+=(${val%\%})
    done < <(amixer get Master)
    suffix='%%'
    if [ "$vol" = "off" ] ;then
        icon="&" #alternative: deaf:  mute:
        suffix=''
    elif (( vol > 50 )) ;then icon="("
    elif (( vol > 30 )) ;then icon="("
    else icon="'"
    fi
    printf -v values "%3s$suffix " ${vol[@]}
    printf "%s%s %s\n" $prefix "$icon" "$values"
}

clock & volume &

trap volume RTMIN+2
trap : RTMIN{,+{{,1}{1,3,4,5},6,7,8,9,10,12}}
echo -e "To get status, run:\n kill -RTMIN+2 $$"

while :;do wait ;done
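The brace expansions used as trap arguments above are dense. A small sketch that only expands them (no signals are involved) shows that clock() covers RTMIN through RTMIN+15, and that the main shell's list is the same minus RTMIN+2, which it traps with volume instead.

```shell
#!/bin/bash
# Sketch: expand the two signal lists used above (clock's trap and the main
# shell's trap) to see which RTMIN offsets each covers. Nothing is trapped here.
clock_sigs=( RTMIN{,+{{,1}{1,2,3,4,5},6,7,8,9,10}} )
main_sigs=( RTMIN{,+{{,1}{1,3,4,5},6,7,8,9,10,12}} )

echo "clock: ${#clock_sigs[@]} names: ${clock_sigs[*]}"
echo "main:  ${#main_sigs[@]} names: ${main_sigs[*]}"

# Names in clock's list that the main shell does not map to ':'
missing=()
for s in "${clock_sigs[@]}"; do
  [[ " ${main_sigs[*]} " == *" $s "* ]] || missing+=("$s")
done
echo "handled differently by the main shell: ${missing[*]}"
```

The {,1} part prefixes each of 1 to 5 with nothing or with 1, which yields both 1..5 and 11..15 in a single expression.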

Regarding my last comment about the stereo bug, here is a volume function that works for stereo, mono or even quadraphonic output:

volume(){
    local prefix=V vol=() field playback val foo
    local -i overallvol=0
    while IFS=':[]' read field playback val foo ;do
        [ "$playback" ] && [ -z "${playback//*Playback*}" ] && [ "$val" ] && {
            vol+=($val)
            val=${val%\%}
            overallvol+=${val//off/0}
        }
    done < <(
        amixer get Master
    )
    overallvol=$overallvol/${#vol[@]}
    if (( overallvol == 0 )) ;then
        icon="&"
    elif (( overallvol > 50 )) ;then
        icon="("
    elif (( overallvol > 30 )) ;then
        icon="("
    else
        icon="'"
    fi
    printf "%s%s %s\n" $prefix "$icon" "${vol[*]}"
}
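The read loop is the least obvious part. Here is a sketch of it, wrapped in a hypothetical parse_volume function that reads from standard input and is fed with an assumed sample of amixer get Master output (real output differs between sound cards and drivers). It also skips the average when no channel was found, to avoid a division by zero.

```shell
#!/bin/bash
# Sketch: the IFS=':[]' parsing used by volume(), on a hypothetical amixer sample.
parse_volume() {
  local field playback val foo vol=()
  local -i overallvol=0
  while IFS=':[]' read -r field playback val foo; do
    # keep lines whose 2nd field mentions Playback and that carry a [..] value,
    # e.g. "  Front Left: Playback 42597 [65%] [on]" -> val="65%"
    [ "$playback" ] && [ -z "${playback//*Playback*}" ] && [ "$val" ] && {
      vol+=("$val")
      val=${val%\%}                # strip the trailing percent sign
      overallvol+=${val//off/0}    # integer attribute: += adds arithmetically
    }
  done
  (( ${#vol[@]} )) && overallvol=$overallvol/${#vol[@]}   # integer average
  echo "$overallvol ${vol[*]}"
}

sample="Simple mixer control 'Master',0
  Capabilities: pvolume pswitch pswitch-joined
  Playback channels: Front Left - Front Right
  Limits: Playback 0 - 65536
  Mono:
  Front Left: Playback 42597 [65%] [on]
  Front Right: Playback 45875 [70%] [on]"

parse_volume <<< "$sample"        # prints: 67 65% 70%
```

Lines such as Limits: Playback 0 - 65536 pass the Playback test but have no bracketed value, so the [ "$val" ] check drops them.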

or even:

volume(){
    local prefix=V vol=() field playback val foo icons=(⏻ ¼ ¼ ¼ ½ ½ ¾ ¾ ¾ ¾ ¾)
    local -i overallvol=0
    while IFS=':[]' read field playback val foo ;do
        [ "$playback" ] && [ -z "${playback//*Playback*}" ] && [ "$val" ] && {
            vol+=($val)
            val=${val%\%}
            overallvol+=${val//off/0}
        }
    done < <(
        amixer get Master
    )
    overallvol=$overallvol/${#vol[@]}
    # (9+overallvol)/10 maps 0 -> icons[0], 1..10 -> icons[1], ..., 91..100 -> icons[10]
    printf "%s%s %s\n" $prefix "${icons[(9+overallvol)/10]}" "${vol[*]}"
}
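The icon table replaces the if/elif chain by indexing icons with (9 + volume) / 10, where volume is the averaged overallvol. A sketch that checks, for every volume from 0 to 100, that the table gives the same result as thresholds like those of the previous version (0, 1 to 30, 31 to 50, 51 to 100), assuming the glyphs lost from the earlier version correspond to ¼, ½ and ¾; icon_by_table and icon_by_threshold are hypothetical helper names.

```shell
#!/bin/bash
# Sketch: compare the table lookup with if/elif thresholds over 0..100.
icons=(⏻ ¼ ¼ ¼ ½ ½ ¾ ¾ ¾ ¾ ¾)

icon_by_table() {
  printf '%s' "${icons[(9 + $1) / 10]}"   # 0 -> 0, 1..10 -> 1, ..., 91..100 -> 10
}

icon_by_threshold() {
  if (( $1 == 0 )); then printf '⏻'
  elif (( $1 > 50 )); then printf '¾'
  elif (( $1 > 30 )); then printf '½'
  else printf '¼'
  fi
}

mismatches=0
for (( v = 0; v <= 100; v++ )); do
  [ "$(icon_by_table "$v")" = "$(icon_by_threshold "$v")" ] || mismatches=$((mismatches + 1))
done
echo "mismatches over 0..100: $mismatches"
```

Adding 9 before the integer division rounds up, so only a volume of exactly 0 lands on the first icon.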

Some explanations

Regarding useless forks in volume() function

I've posted some ideas above to improve the job: they reduce resource usage while still choosing an icon as a function of the current volume.

About the while :;do wait;done loop

As the sample in the question uses an infinite loop in the backgrounded function, the main script uses the same kind of infinite loop.

But as the question title asks to wait afterwards for processes to terminate, I have to agree with oguz-ismail's comment.

In fact, the last line would be better written as:

until wait;do :;done

For more information on how the wait command works and on good practice, please have a look at oguz-ismail's good answer.

Why is the KILL signal handler not executing when my child process dies

I couldn't find anything in the bash documentation that would explain the observed behavior, so I turned to the source code. Debugging led to the function notify_of_job_status(). The line that prints the message about a killed subprocess can be reached only if all of the following conditions hold:

  1. the subprocess is registered in the job table (i.e. has not been disown-ed)
  2. the shell was NOT started in interactive mode
  3. the signal that terminated the child process is NOT trapped in the parent shell (see the signal_is_trapped (termsig) == 0 check)

Demonstration:

$ cat test.sh 
echo Starting a subprocess
LC_ALL=C sleep 100 &
Active_pid=$!
case "$1" in
    disown) disown ;;
    trapsigkill) trap "echo Signal SIGKILL caught" 9 ;;
esac
sleep 1
kill -9 $Active_pid
sleep 1
echo End of script

$ # Demonstrate the undesired message
$ bash test.sh
Starting a subprocess
test.sh: line 14: 15269 Killed LC_ALL=C sleep 100
End of script

$ # Suppress the undesired message by disowning the child process
$ bash test.sh disown
Starting a subprocess
End of script

$ # Suppress the undesired message by trapping SIGKILL in the parent shell
$ bash test.sh trapsigkill
Starting a subprocess
End of script

$ # Suppress the undesired message by using an interactive shell
$ bash -i test.sh
Starting a subprocess
End of script

How does this remove the trace in the first test without executing echo Signal SIGKILL caught?

The trap is not executed since the KILL signal is received by the sub-process rather than the shell process for which the trap has been set. The effect of the trap on the diagnostics is in the (somewhat arguable) logic in the notify_of_job_status() function.

Interrupt sleep in bash with a signal trap

#!/bin/bash

trap 'echo "Caught SIGUSR1"' SIGUSR1

echo "Sleeping. Pid=$$"
while :
do
    sleep 10 &
    wait $!
    echo "Sleep over"
done
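Note that each interrupted sleep 10 keeps running in the background after wait returns. A sketch of a variant that also kills it from the trap, assuming that is what you want, with a helper job (hypothetical) sending USR1 after one second instead of a user:

```shell
#!/bin/bash
# Sketch: interrupt a backgrounded sleep with a trapped signal and clean it up.
caught=0
trap 'caught=$((caught + 1)); kill "$sleep_pid" 2> /dev/null' USR1

{ sleep 1; kill -USR1 $$; } &     # hypothetical sender standing in for `kill -USR1 <pid>`

start=$SECONDS
sleep 10 &
sleep_pid=$!
wait "$sleep_pid"                 # returns early once the trapped USR1 arrives
status=$?                         # > 128
wait "$sleep_pid" 2> /dev/null    # reap the sleep the trap just killed
elapsed=$((SECONDS - start))
echo "caught=$caught status=$status elapsed=${elapsed}s"
```

Because the trap runs right after wait returns, $sleep_pid still refers to the sleep that was interrupted.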

kill -INT $pid won't kill process, but ctrl+c will

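Prepend the process ID with a dash ("-"), so that kill signals the whole process group, just as Ctrl+C sends SIGINT to the terminal's foreground process group: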

kill -SIGINT -<pid>

This will kill the process with exit code 130.
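A sketch of the same idea from inside a script, assuming set -m so that the background job gets its own process group. Per the Bash manual, when job control is not in effect, asynchronous commands ignore SIGINT, which is one reason a plain kill -INT can appear to do nothing on a background job there.

```shell
#!/bin/bash
# Sketch: send SIGINT to a background job's process group and read its status.
set -m                     # job control: the job gets its own process group
sleep 10 &
job=$!
sleep 0.2                  # let the job start
kill -INT -- -"$job"       # negative PID: signal every process in the job's group
wait "$job"
status=$?
echo "status $status = 128 + $((status - 128)) (SIGINT)"
```

The status 130 is 128 plus the SIGINT signal number, 2.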

Update: Why not use SIGTERM (15), which is what the kill command sends by default (with no signal number or name)?


