Prevent a bash script from terminating after handling a SIGINT - linux

I am writing a bash wrapper for an application. This wrapper is responsible for changing the user, running the software and logging its output.
I also want it to propagate the SIGINT signal.
Here is my code so far:
#!/bin/bash
set -e; set -u
function child_of {
    ps --ppid $1 -o "pid" --no-headers | head -n1
}
function handle_int {
    echo "Received SIGINT"
    kill -int $(child_of $SU_PID)
}
su myuser -p -c "bash /opt/loop.sh 2>&1 | tee -i >(logger -t mytag)" &
SU_PID=$!
trap "handle_int" SIGINT
wait $SU_PID
echo "This is the end."
My problem is that when I send a SIGINT to this wrapper, handle_int gets called, but then the script exits, while I want it to continue waiting for $SU_PID.
Is there a way to catch the INT signal, do something, and then prevent the script from terminating?

You have hit a gotcha: after Ctrl+C, "This is the end." is expected but never printed, because the script has exited prematurely. The reason is that wait (perhaps unexpectedly) returns non-zero while running under set -e.
According to "man bash":
If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
You should wrap your wait call in set +e so that your program can continue running after handling a trapped signal while waiting for an asynchronous command.
Like this:
# wait function that handles trapped signals on asynchronous commands.
function safe_async_wait {
    set +e
    wait $1  # returns >128 when a trapped signal is received
    set -e
}
#...
safe_async_wait $SU_PID
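Putting it together, a minimal sketch of the corrected wrapper (the kill -0 re-wait loop is an addition, not part of the answer above: it keeps the script waiting after each handled signal, since wait returns every time a trapped signal arrives):
#!/bin/bash
set -e; set -u

function child_of {
    ps --ppid $1 -o "pid" --no-headers | head -n1
}

function handle_int {
    echo "Received SIGINT"
    kill -INT $(child_of $SU_PID)
}

# Re-wait in a loop: wait returns >128 every time a trapped signal
# arrives, even though the child may still be running.
function safe_async_wait {
    set +e
    while kill -0 $1 2>/dev/null; do
        wait $1
    done
    set -e
}

su myuser -p -c "bash /opt/loop.sh 2>&1 | tee -i >(logger -t mytag)" &
SU_PID=$!
trap "handle_int" SIGINT
safe_async_wait $SU_PID
echo "This is the end."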

Related

How can I send a timeout signal to a wrapped command in sbatch?

I have a program that, when it receives a SIGUSR1, writes some output and quits. I'm trying to get sbatch to notify this program before timing out.
I enqueue the program using:
sbatch -t 06:00:00 --signal=USR1 ... --wrap my_program
but my_program never receives the signal. I've tried sending signals while the program is running with scancel -s USR1 <JOBID>, but without any success. I also tried scancel --full, but it kills the wrapper and my_program is not notified.
One option is to write a bash file that wraps my_program and traps the signal, forwarding it to my_program (similar to this example), but I don't need this cumbersome bash file for anything else. Also, the sbatch --signal documentation very clearly says that, when you want to notify the enveloping batch script, you need to specify --signal=B:, so I believe that the bash wrapper is not really necessary.
So, is there a way to send a SIGUSR1 signal to a program enqueued using sbatch --wrap?
Your command is sending the USR1 to the shell created by the --wrap. However, if you want the signal to be caught and processed, you're going to need to write the shell functions to handle the signal, and that's probably too much for a --wrap command.
These folks are doing it but you can't see into their setup.sh script to see what they are defining. https://docs.nersc.gov/jobs/examples/#annotated-example-automated-variable-time-jobs
Note they use "." to run the code in setup.sh in the same process instead of spawning a sub-shell. You need that.
These folks describe a nice method of creating the functions you need: Is it possible to detect *which* trap signal in bash?
The only thing they don't show there is the function that would actually take action on receiving the signal. Here's what I wrote that does it. Put this in a file that can be included from any user's sbatch submit script, and show them how to use it together with the --signal option:
trap_with_arg() {
    func="$1"; shift
    for sig; do
        echo "setting trap for $sig"
        trap "$func $sig" "$sig"
    done
}

func_trap() {
    echo "called with sig $1"
    case $1 in
        USR1)
            echo "caught SIGUSR1, making ABORT file"
            date
            cd $WORKDIR
            touch ABORT
            ls -l ABORT
            ;;
        *) echo "something else" ;;
    esac
}

trap_with_arg func_trap USR1 USR2
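For illustration, a submit script that sources such a file might look like this (a sketch, not from the original answer; the file name trap_functions.sh and the program name are placeholders, and the @300 lead time is arbitrary):
#!/bin/bash
#SBATCH -t 06:00:00
#SBATCH --signal=B:USR1@300   # B: delivers SIGUSR1 to the batch shell, 300s before the limit

. ./trap_functions.sh         # "." sources the traps in this shell, no sub-shell
trap_with_arg func_trap USR1 USR2

my_program &                  # run in the background so the shell can execute the trap
wait $!                       # wait is interrupted when the trapped signal arrives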

Why can't I catch process signals when jobs control is enabled in a script?

When I enable job control in a shell script (using set -m), I am no longer able to catch process signals. Take a look at the following code:
#!/bin/bash
set -m
for i in `seq 15`; do
    trap 'echo " Signal $i caught"' $i
done
while true; do
    echo " Waiting for a process signal"
    sleep 999
done
When I run the above code and I press for example Ctrl + C, nothing happens:
Waiting for a process signal
^CWaiting for a process signal
However, when I run the same code with set -m removed, I do get a response:
Waiting for a process signal
^C Signal 15 caught
My questions are:
Why is it not working?
Is it possible to catch process signals when jobs control is enabled?
Note: it doesn't happen with all processes; if I use read instead of sleep, it does work.
The solution is to put sleep in the background and use the bash builtin wait to wait until sleep completes. Thus, try this:
#!/bin/bash
set -m
for i in `seq 15`; do
    trap 'echo " Signal $i caught"' $i
done
while true; do
    echo " Waiting for a process signal"
    sleep 999 & wait $!
done
Sample run:
$ bash script
Waiting for a process signal
^C Signal 15 caught
Waiting for a process signal
^C Signal 15 caught
Waiting for a process signal
For more details, see Greg's FAQ.

Don't show the output of kill command in a Linux bash script [duplicate]

How can you suppress the Terminated message that comes up after you kill a process in a bash script?
I tried set +bm, but that doesn't work.
I know another solution involves calling exec 2> /dev/null, but is that reliable? How do I reset it back so that I can continue to see stderr?
In order to silence the message, you must be redirecting stderr at the time the message is generated. Because the kill command sends a signal and doesn't wait for the target process to respond, redirecting stderr of the kill command does you no good. The bash builtin wait was made specifically for this purpose.
Here is a very simple example that kills the most recent background command. (Learn more about $! here.)
kill $!
wait $! 2>/dev/null
Because both kill and wait accept multiple pids, you can also do batch kills. Here is an example that kills all background processes (of the current process/script of course).
kill $(jobs -rp)
wait $(jobs -rp) 2>/dev/null
I was led here from bash: silently kill background function process.
The short answer is that you can't. Bash always prints the status of foreground jobs. The monitoring flag only applies for background jobs, and only for interactive shells, not scripts.
see notify_of_job_status() in jobs.c.
As you say, you can redirect standard error to /dev/null, but then you miss any other error messages. You can make it temporary by doing the redirection in a subshell which runs the script. This leaves the original environment alone.
(script 2> /dev/null)
which will lose all error messages, but just from that script, not from anything else run in that shell.
You can save and restore standard error, by redirecting a new filedescriptor to point there:
exec 3>&2 # 3 is now a copy of 2
exec 2> /dev/null # 2 now points to /dev/null
script # run script with redirected stderr
exec 2>&3 # restore stderr to saved
exec 3>&- # close saved version
But I wouldn't recommend this -- the only upside over the first approach is that it saves a sub-shell invocation, while being more complicated and possibly even altering the behavior of the script, if the script alters file descriptors.
EDIT: For a more appropriate answer, check the answer given by Mark Edgar.
Solution: use SIGINT (works only in non-interactive shells)
Demo:
cat > silent.sh <<"EOF"
sleep 100 &
kill -INT $!
sleep 1
EOF
sh silent.sh
http://thread.gmane.org/gmane.comp.shells.bash.bugs/15798
Maybe detach the process from the current shell process by calling disown?
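For example (a sketch; the sleep stands in for any background job):
sleep 30 &
pid=$!
disown $pid     # remove the job from the shell's job table
kill $pid       # the shell no longer reports "Terminated" for it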
The "Terminated" message is logged by the default signal handler of bash 3.x and 4.x. Just trap the TERM signal at the very start of the child process:
#!/bin/sh
## assume script name is test.sh
foo() {
    trap 'exit 0' TERM ## here is the key
    while true; do sleep 1; done
}
echo before child
ps aux | grep 'test\.s[h]\|slee[p]'
foo &
pid=$!
sleep 1 # wait until the trap is installed
echo before kill
ps aux | grep 'test\.s[h]\|slee[p]'
kill $pid ## no need to redirect stdin/stderr
sleep 1 # wait until the kill has taken effect
echo after kill
ps aux | grep 'test\.s[h]\|slee[p]'
Is this what we are all looking for?
Not wanted:
$ sleep 3 &
[1] 234
<pressing enter a few times....>
$
$
[1]+ Done sleep 3
$
Wanted:
$ (set +m; sleep 3 &)
<again, pressing enter several times....>
$
$
$
$
$
As you can see, no job end message. Works for me in bash scripts as well, also for killed background processes.
'set +m' disables job control (see 'help set') for the current shell. So if you enter your command in a subshell (as done here in parentheses) you will not influence the job control settings of the current shell. The only disadvantage is that you need to get the pid of your background process back to the current shell if you want to check whether it has terminated, or to evaluate its return code.
This also works for killall (for those who prefer it):
killall -s SIGINT (yourprogram)
suppresses the message... I was running mpg123 in background mode.
It could only be killed silently by sending a Ctrl+C (SIGINT) instead of a SIGTERM (the default).
disown did exactly the right thing for me. The exec 3>&2 is risky for a lot of reasons, and set +bm didn't seem to work inside a script, only at the command prompt.
I had success with adding jobs 2>&1 >/dev/null to the script. I'm not certain it will help anyone else's script, but here is a sample:
while true; do echo $RANDOM; done | while read line
do
    echo Random is $line the last jobid is $(jobs -lp)
    jobs 2>&1 >/dev/null
    sleep 3
done
Another way to disable job notifications is to place your command to be backgrounded in a sh -c 'cmd &' construct.
#!/bin/bash
# ...
pid="`sh -c 'sleep 30 & echo ${!}' | head -1`"
kill "$pid"
# ...
# or put several cmds in sh -c '...' construct
sh -c '
    sleep 30 &
    pid="${!}"
    sleep 5
    kill "${pid}"
'
I found that putting the kill command in a function and then backgrounding the function suppresses the termination output:
function killCmd() {
    kill $1
}
killCmd $somePID &
Simple:
{ kill $!; } 2>/dev/null
Advantage? You can use any signal, e.g.:
{ kill -9 $PID; } 2>/dev/null
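Note that in a script the shell often prints the message only when it later reaps the job, so you may need a wait inside the redirected group too, combining this with the wait trick from the first answer (a sketch):
sleep 30 &
pid=$!
{ kill -9 $pid && wait $pid; } 2>/dev/null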

sh trap SIGINT fails, but trap SIGQUIT succeeds

I want to trap Ctrl+C and Ctrl+\, so I added the command below to my script:
trap _trapException SIGINT SIGQUIT
function _trapException(){
    echo "The job is canceled!"
    exit
}
However, this traps Ctrl+\ but cannot trap Ctrl+C. Even after I delete SIGQUIT, it still does not trap Ctrl+C. Note that I also use tee in my script at the same time.
Your handler function and trap call are fine. The function will be called when you raise either SIGINT or SIGQUIT for the first time. However, in the signal handler, you are also calling exit. That means it's going to kill the process.
Try removing the exit call from the function _trapException.
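For instance, a handler that only reports the signal and returns (a sketch):
function _trapException(){
    echo "The job is canceled!"
    # no exit here: the script resumes where it was interrupted
}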
@Blue Moon I found what causes the problem when I rewrote a demo to reproduce it. The demo is below:
test.sh
#!/bin/sh
#encoding:UTF-8
trap _trapException SIGINT SIGQUIT
function _trapException(){
    echo "INFO: The job is canceled!"
    exit 1
}
sh trap_test.sh | tee -a test.log
trap_test.sh also has a trap handler, like below:
#!/bin/sh
trap test SIGINT SIGQUIT
function test(){
    echo "trap test!"
    exit 1
}
while true
do
    echo "test"
    sleep 10
done
When I run sh test.sh, trapping Ctrl+C fails but trapping Ctrl+\ succeeds; when I delete the trap code in trap_test.sh, test.sh can trap both signals. The underlying reason is still unknown.

Bash not trapping interrupts during rsync/subshell exec statements

Context:
I have a bash script that contains a subshell and a trap for the EXIT pseudosignal, and it's not properly trapping interrupts during an rsync. Here's an example:
#!/bin/bash
logfile=/path/to/file;
directory1=/path/to/dir
directory2=/path/to/dir
cleanup () {
    echo "Cleaning up!"
    #do stuff
    trap - EXIT
}
trap '{
    (cleanup;) | 2>&1 tee -a $logfile
}' EXIT
(
    #main script logic, including the following lines:
    (exec sleep 10;);
    (exec rsync --progress -av --delete $directory1 /var/tmp/$directory2;);
) | 2>&1 tee -a $logfile
trap - EXIT #just in case cleanup isn't called for some reason
The idea of the script is this: most of the important logic runs in a subshell which is piped through tee and to a logfile, so I don't have to tee every single line of the main logic to get it all logged. Whenever the subshell ends, or the script is stopped for any reason (the EXIT pseudosignal should capture all of these cases), the trap will intercept it and run the cleanup() function, and then remove the trap. The rsync and sleep commands (the sleep is just an example) are run through exec to prevent the creation of zombie processes if I kill the parent script while they're running, and each potentially-long-running command is wrapped in its own subshell so that when exec finishes, it won't terminate the whole script.
The problem:
If I interrupt the script (via kill or CTRL+C) during the exec/subshell wrapped sleep command, the trap works properly, and I see "Cleaning up!" echoed and logged. If I interrupt the script during the rsync command, I see rsync end, and write rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(544) [sender=3.0.6] to the screen, and then the script just dies; no cleanup, no trapping. Why doesn't an interrupting/killing of rsync trigger the trap?
I've tried using the --no-detach switch with rsync, but it didn't change anything.
I have bash 4.1.2, rsync 3.0.6, centOS 6.2.
How about just having all the output from some point onward redirected to tee, without having to repeat it everywhere and mess with all the sub-shells and execs? (I hope I didn't miss something.)
#!/bin/bash
logfile=/path/to/file;
directory1=/path/to/dir
directory2=/path/to/dir
exec > >(exec tee -a $logfile) 2>&1
cleanup () {
    echo "Cleaning up!"
    #do stuff
    trap - EXIT
}
trap cleanup EXIT
sleep 10
rsync --progress -av --delete $directory1 /var/tmp/$directory2
In addition to set -e, I think you want set -E:
If set, any trap on ERR is inherited by shell functions, command substitutions, and commands executed in a subshell environment. The ERR trap is normally not inherited in such cases.
Alternatively, instead of wrapping your commands in subshells, use curly braces, which will still give you the ability to redirect command output but will execute them in the current shell.
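For example, the subshell-wrapped commands from the question could become a single brace group piped to tee (a sketch):
{
    sleep 10
    rsync --progress -av --delete $directory1 /var/tmp/$directory2
} 2>&1 | tee -a $logfile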
The interrupt will be properly caught if you add INT to the trap:
trap '{
    (cleanup;) | 2>&1 tee -a $logfile
}' EXIT INT
Bash is trapping interrupts correctly. However, this does not answer the question of why the EXIT trap fires when sleep is interrupted but not when rsync is; it just makes the script work as it is supposed to. Hope this helps.
Your shell might be configured to exit on error:
bash # enter subshell
set -e
trap "echo woah" EXIT
sleep 4
If you interrupt sleep (^C) then the subshell will exit due to set -e and print woah in the process.
Also, slightly unrelated: your trap - EXIT is in a subshell (explicitly), so it won't have any effect after the cleanup function returns.
It's pretty clear from experimentation that rsync behaves like other tools such as ping and does not inherit signals from the calling Bash parent.
So you have to get a little creative and do something like the following:
$ cat rsync.bash
#!/bin/sh
set -m
trap '' SIGINT SIGTERM EXIT
rsync -avz LargeTestFile.500M root@host.mydom.com:/tmp/. &
wait
echo FIN
Now when I run it:
$ ./rsync.bash
X11 forwarding request failed
building file list ... done
LargeTestFile.500M
^C^C^C^C^C^C^C^C^C^C
sent 509984 bytes received 42 bytes 92732.00 bytes/sec
total size is 524288000 speedup is 1027.96
FIN
And we can see the file did fully transfer:
$ ll -h | grep Large
-rw-------. 1 501 games 500M Jul 9 21:44 LargeTestFile.500M
How it works
The trick here is that we're telling Bash via set -m to enable job control, which puts background jobs into their own process groups, so the terminal's Ctrl+C never reaches the rsync. We then background the rsync and run the wait builtin, which waits on the last-run command, the rsync, until it completes.
We then guard the entire script with the trap '' SIGINT SIGTERM EXIT.
References
https://access.redhat.com/solutions/360713
https://access.redhat.com/solutions/1539283
