How can I send a timeout signal to a wrapped command in sbatch? - slurm

I have a program that, when it receives a SIGUSR1, writes some output and quits. I'm trying to get sbatch to notify this program before timing out.
I enqueue the program using:
sbatch -t 06:00:00 --signal=USR1 ... --wrap my_program
but my_program never receives the signal. I've tried sending signals manually while the program is running, with scancel -s USR1 <JOBID>, but without any success. I also tried scancel --full, but it kills the wrapper and my_program is never notified.
One option is to write a bash file that wraps my_program and traps the signal, forwarding it to my_program (similar to this example), but I have no other use for such a cumbersome bash file. Besides, the sbatch --signal documentation clearly says that the B: prefix (--signal=B:USR1) is only needed when you want to notify the enveloping batch shell, so I believe the bash wrapper should not really be necessary.
So, is there a way to send a SIGUSR1 signal to a program enqueued using sbatch --wrap?

Your command is sending the USR1 to the shell created by the --wrap. However, if you want the signal to be caught and processed, you're going to need to write the shell functions to handle the signal and that's probably too much for a --wrap command.
These folks are doing it, but you can't see into their setup.sh script to see what they are defining: https://docs.nersc.gov/jobs/examples/#annotated-example-automated-variable-time-jobs
Note they use "." to run the code in setup.sh in the same process instead of spawning a sub-shell. You need that.
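In other words, in the submit script itself (the path is illustrative):
. ./setup.sh    # sourced with ".", so the traps are installed in this same shell
# by contrast, ./setup.sh would spawn a sub-shell and the traps would vanish with it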
These folks describe a nice method of creating the functions you need: Is it possible to detect *which* trap signal in bash?
The only thing they don't show there is the function that would actually take action on receiving the signal. Here's what I wrote to do it - put this in a file that can be included from any user's sbatch submit script, and show them how to use it together with the --signal option:
trap_with_arg() {
    # install "$func <signame>" as the handler for each signal named in the arguments
    func="$1" ; shift
    for sig ; do
        echo "setting trap for $sig"
        trap "$func $sig" "$sig"
    done
}
func_trap () {
    # dispatch on the signal name passed in by trap_with_arg
    echo "called with sig $1"
    case $1 in
        USR1)
            echo "caught SIGUSR1, making ABORT file"
            date
            cd $WORKDIR
            touch ABORT
            ls -l ABORT
            ;;
        *) echo "something else" ;;
    esac
}
trap_with_arg func_trap USR1 USR2
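For example, a submit script using that include file might look like this (the include path is hypothetical; --signal=B:USR1@300 asks Slurm to send SIGUSR1 to the batch shell 300 seconds before the time limit, and backgrounding the program lets bash run the trap promptly):
#!/bin/bash
#SBATCH -t 06:00:00
#SBATCH --signal=B:USR1@300
. /path/to/trap_functions.sh   # hypothetical include file holding the functions above
trap_with_arg func_trap USR1 USR2
my_program &    # run in the background; bash delays traps while a foreground child runs
wait            # wait returns when the signal arrives, right after the trap has run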

Related

Shell Script get CTRL+Z with Trap

I am trying to catch the CTRL+Z (SIGSTOP) signal in my script's trap.
When my script is executing, if I temporarily suspend it with CTRL+Z (sending a SIGSTOP signal), it needs to remove the files I create in it and kill the execution.
I don't understand why the following script doesn't work. But, more importantly, what is the correct way to do it?
#!/bin/bash
DIR="temp_folder"
trap "rm -r $DIR; kill -SIGINT $$" SIGSTP
if [ -d $DIR ]
then
    rm -r $DIR
else
    mkdir $DIR
fi
sleep 5
EDIT:
SIGSTOP cannot be trapped; however, SIGTSTP can be, and from what I understood after searching the internet and reading the answer below, it is the correct signal to trap for CTRL+Z. However, when I press CTRL+Z while running the script, it gets stuck: the script keeps executing endlessly no matter what signals I send afterwards.
The problem here is you are trying to suspend a process that is already sleeping.
It is also good practice to use DIR=$(mktemp -d) in shell scripts to create temp directories.
CTRL-C sends signal 2 (SIGINT) / CTRL-Z sends signal 20 (SIGTSTP):
catch_exits() {
    printf "\n$(basename $0): exiting\n" 1>&2
    rm -rf $DIR
    exit 1
}
trap catch_exits 1 2 3 15 20
DIR="$(mktemp -d)"
read -p "not sleeping" test
If you send a function to the background (such as for a cursor spinner), you need to disable CTRL-Z while the long process is running with:
trap "" SIGTSTP
There are two signals you can't trap*, SIGKILL and SIGSTOP. Use another signal.
*: without modifying the kernel
IEEE standard:
Setting a trap for SIGKILL or SIGSTOP produces undefined results.

defer pipe process to background after text match

So I have a bash command that starts a server; it outputs some lines before getting to the point where it prints something like "Server started, Press Control+C to exit". How do I pipe this output so that, when this line occurs, I put the process in the background and continue with another script/function (i.e. to do stuff that needs to wait until the server starts, such as running tests)?
I want to end up with 3 functions
start_server
run_tests
stop_server
I've got something along the lines of:
function read_server_output {
    while read data; do
        printf "$data"
        if [[ $data == "Server started, Press Control+C to exit" ]]; then
            : # do something here to put server process in the background
              # so I can run another function
        fi
    done
}
function start_server {
    # start the server and pipe its output to another function to check it's running
    start-server-command | read_server_output
}
function run_tests {
    : # do some stuff
}
function stop_server {
    : # stop the server
}
# run the bash script code
start_server
run_tests
stop_server
A possibly related question: SH/BASH - Scan a log file until some text occurs, then exit. How?
Thanks in advance; I'm pretty new to this.
First, a note on terminology...
"Background" and "foreground" are controlling-terminal concepts, i.e., they have to do with what happens when you type ctrl+C, ctrl+Z, etc. (which process gets the signal), whether a process can read from the terminal device (a "background" process gets a SIGTTIN that by default causes it to stop), and so on.
It seems clear that this has little to do with what you want to achieve. Instead, you have an ill-behaved program (or suite of programs) that needs some special coddling: when the server is first started, it needs some hand-holding up to some point, after which it's OK. The hand-holding can stop once it outputs some text string (see your related question for that, or the technique below).
There's a big potential problem here: a lot of programs, when their output is redirected to a pipe or file, produce no output until they have printed a "block" worth of output, or are exiting. If this is the case, a simple:
start-server-command | cat
won't print the line you're looking for (so that's a quick way to tell whether you will have to work around this issue as well). If so, you'll need something like expect, which is an entirely different way to achieve what you want.
Assuming that's not a problem, though, let's try an entirely-in-shell approach.
What you need is to run the start-server-command and save the process-ID so that you can (eventually) send it a SIGINT signal (as ctrl+C would if the process were "in the foreground", but you're doing this from a script, not from a controlling terminal, so there's no key the script can press). Fortunately sh has a syntax just for this.
First let's make a temporary file:
#! /bin/sh
# myscript - script to run server, check for startup, then run tests
TMPFILE=$(mktemp -t myscript) || exit 1 # create /tmp/myscript.<unique>
trap "rm -f $TMPFILE" 0 1 2 3 15 # arrange to clean up when done
Now start the server and save its PID:
start-server-command > $TMPFILE & # start server, save output in file
SERVER_PID=$! # and save its PID so we can end it
trap "kill -INT $SERVER_PID; rm -f $TMPFILE" 0 1 2 3 15 # adjust cleanup
Now you'll want to scan through $TMPFILE until the desired output appears, as in the other question. Because this requires a certain amount of polling you should insert a delay. It's also probably wise to check whether the server has failed and terminated without ever getting to the "started" point.
while ! grep '^Server started, Press Control+C to exit$' $TMPFILE >/dev/null; do
    # message has not yet appeared, is server still starting?
    if kill -0 $SERVER_PID 2>/dev/null; then
        # server is running; let's wait a bit and try grepping again
        sleep 1 # or other delay interval
    else
        echo "ERROR: server terminated without starting properly" 1>&2
        exit 1
    fi
done
(Here kill -0 is used to test whether the process still exists; if not, it has exited. The "cleanup" kill -INT will produce an error message, but that's probably OK. If not, either redirect that kill command's error-output, or adjust the cleanup or do it manually, as seen below.)
At this point, the server is running and you can do your tests. When you want it to exit as if the user hit ctrl+C, send it a SIGINT with kill -INT.
Since there's a kill -INT in the trap set for when the script exits (0) as well as when it's terminated by SIGHUP (1), SIGINT (2), SIGQUIT (3), and SIGTERM (15)—that's the:
trap "do some stuff" 0 1 2 3 15
part—you can simply let your script exit at this point, unless you want to specifically wait for the server to exit too. If you want that, perhaps:
kill -INT $SERVER_PID; rm -f $TMPFILE # do the pre-arranged cleanup now
trap - 0 1 2 3 15 # don't need it arranged anymore
wait $SERVER_PID # wait for the server to finish exiting
would be appropriate.
(Obviously none of the above is tested, but that's the general framework.)
Probably the easiest thing to do is to start it in the background and block on reading its output. Something like:
{ start-server-command & } | {
    while read -r line; do
        echo "$line"
        echo "$line" | grep -q 'Server started' && break
    done
    cat &    # keep draining the server's remaining output in the background
}
echo script continues here after server outputs 'Server started' message
But this is a pretty ugly hack. It would be better if the server could be modified to perform a more specific action which the script could wait for.
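For instance, if the server could be taught to create a marker file once it is ready (the --ready-file option here is hypothetical), the script could poll for that instead of parsing output:
start-server-command --ready-file /tmp/server.ready &   # hypothetical option
SERVER_PID=$!
until [ -e /tmp/server.ready ]; do
    kill -0 $SERVER_PID 2>/dev/null || { echo "server died before starting" 1>&2; exit 1; }
    sleep 1
done
# server is ready: run the tests, then stop it with: kill -INT $SERVER_PID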

Prevent a bash script from terminating after handling a SIGINT

I am writing a bash wrapper for an application. This wrapper is responsible for changing the user, running the software and logging its output.
I also want it to propagate the SIGINT signal.
Here is my code so far:
#!/bin/bash
set -e; set -u
function child_of {
    ps --ppid $1 -o "pid" --no-headers | head -n1
}
function handle_int {
    echo "Received SIGINT"
    kill -int $(child_of $SU_PID)
}
su myuser -p -c "bash /opt/loop.sh 2>&1 | tee -i >(logger -t mytag)" &
SU_PID=$!
trap "handle_int" SIGINT
wait $SU_PID
echo "This is the end."
My problem is that when I send a SIGINT to this wrapper, handle_int gets called but then the script is over, while I want it to continue to wait for $SU_PID.
Is there a way to catch the int signal, do something and then prevent the script from terminating ?
You have a gotcha: after Ctrl-C, "This is the end." is expected but never comes, because the script has exited prematurely. The reason is that wait has (unexpectedly) returned non-zero while running under set -e.
According to "man bash":
If bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap will not be executed until the command completes. When bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.
You should wrap your wait call in set +e so that your program can continue running after handling a trapped signal while waiting for an asynchronous command.
Like this:
# wait function that handles trapped signals during asynchronous commands
function safe_async_wait {
    set +e
    wait $1 # returns >128 when interrupted by a trapped signal
    set -e
}
#...
safe_async_wait $SU_PID
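Note that wait still returns as soon as the trap has run. If the script should keep waiting for the child after handling a signal (as the question asks), a loop around wait does the trick; this variant is a sketch, not part of the original answer:
# keep waiting until the child has really exited, re-entering wait
# each time a trapped signal interrupts it
function wait_until_dead {
    while kill -0 $1 2>/dev/null; do
        set +e
        wait $1
        set -e
    done
}
#...
wait_until_dead $SU_PID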

Setting variables in a KSH spawned process

I have a lengthy menu script that relies on a few command outputs for its variables. Each of these commands takes several seconds to run, and I would like to spawn new processes to set these variables. It would look something like this:
VAR1=`somecommand` &
VAR2=`somecommand` &
...
wait
echo $VAR1 $VAR2
The problem is that the processes are spawned and die with those variables they set. I realize that I can do this by sending these to a file and then reading that but I would like to do it without a temp file. Any ideas?
You can get the whole process's output using command substitution, like:
VAR1=$(somecommand &)
VAR2=$(somecommand &)
...
wait
echo $VAR1 $VAR2
This is rather clunky, but works for me. I have three scripts.
cmd.sh is your "somecommand"; it is a test script only:
#!/bin/ksh
sleep 10
echo "End of job $1"
Below is wrapper.sh, which runs a single command, captures the output, signals the parent when done, then writes the result to stdout:
#!/bin/ksh
sig=$1
shift
var=$("$@")   # run the remaining arguments as a command and capture its output
kill -$sig $PPID
echo $var
and here is the parent script:
#!/bin/ksh
trap "read -u3 out1" SIGUSR1
trap "read -p out2" SIGUSR2
./wrapper.sh SIGUSR1 ./cmd.sh one |&
exec 3<&p
exec 4>&p
./wrapper.sh SIGUSR2 ./cmd.sh two |&
wait
wait
echo "out1: $out1, out2: $out2"
echo "Ended"
There are two waits because the first one will be interrupted by the signal.
In the parent script I am running the wrapper twice, once for each job, passing in the command to be run and any arguments. The |& means "pipe to background" - run as a co-process.
The two exec commands copy the pipe file descriptors to fds 3 and 4. When the jobs are finished, the wrappers signal the main process to read the pipes. The signals are caught using the traps, which read the pipe for the appropriate child process and gather the resulting data.
Rather convoluted and clunky, but it appears to work.

Bash not trapping interrupts during rsync/subshell exec statements

Context:
I have a bash script that contains a subshell and a trap for the EXIT pseudosignal, and it's not properly trapping interrupts during an rsync. Here's an example:
#!/bin/bash
logfile=/path/to/file;
directory1=/path/to/dir
directory2=/path/to/dir
cleanup () {
    echo "Cleaning up!"
    #do stuff
    trap - EXIT
}
trap '{
    (cleanup;) | 2>&1 tee -a $logfile
}' EXIT
(
    #main script logic, including the following lines:
    (exec sleep 10;);
    (exec rsync --progress -av --delete $directory1 /var/tmp/$directory2;);
) | 2>&1 tee -a $logfile
trap - EXIT #just in case cleanup isn't called for some reason
The idea of the script is this: most of the important logic runs in a subshell which is piped through tee and to a logfile, so I don't have to tee every single line of the main logic to get it all logged. Whenever the subshell ends, or the script is stopped for any reason (the EXIT pseudosignal should capture all of these cases), the trap will intercept it and run the cleanup() function, and then remove the trap. The rsync and sleep commands (the sleep is just an example) are run through exec to prevent the creation of zombie processes if I kill the parent script while they're running, and each potentially-long-running command is wrapped in its own subshell so that when exec finishes, it won't terminate the whole script.
The problem:
If I interrupt the script (via kill or CTRL+C) during the exec/subshell wrapped sleep command, the trap works properly, and I see "Cleaning up!" echoed and logged. If I interrupt the script during the rsync command, I see rsync end, and write rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(544) [sender=3.0.6] to the screen, and then the script just dies; no cleanup, no trapping. Why doesn't an interrupting/killing of rsync trigger the trap?
I've tried using the --no-detach switch with rsync, but it didn't change anything.
I have bash 4.1.2, rsync 3.0.6, centOS 6.2.
How about just having all the output from a given point onward redirected to tee, without having to repeat it everywhere and mess with all the sub-shells and execs... (hope I didn't miss something)
#!/bin/bash
logfile=/path/to/file;
directory1=/path/to/dir
directory2=/path/to/dir
exec > >(exec tee -a $logfile) 2>&1
cleanup () {
    echo "Cleaning up!"
    #do stuff
    trap - EXIT
}
trap cleanup EXIT
sleep 10
rsync --progress -av --delete $directory1 /var/tmp/$directory2
In addition to set -e, I think you want set -E:
If set, any trap on ERR is inherited by shell functions, command substitutions, and commands executed in a sub-shell environment. The ERR trap is normally not inherited in such cases.
Alternatively, instead of wrapping your commands in subshells use curly braces which will still give you the ability to redirect command outputs but will execute them in the current shell.
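A minimal sketch of that brace-group variant, reusing the question's placeholders; note that redirecting the group keeps it in the current shell, whereas piping it to tee would put it back in a subshell:
{
    sleep 10
    rsync --progress -av --delete $directory1 /var/tmp/$directory2
} >> $logfile 2>&1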
The interrupt will be properly caught if you add INT to the trap:
trap '{
    (cleanup;) | 2>&1 tee -a $logfile
}' EXIT INT
Bash is then trapping interrupts correctly. This does not answer the question of why the trap fires when sleep is interrupted but not when rsync is, but it does make the script work as it is supposed to. Hope this helps.
Your shell might be configured to exit on error:
bash # enter subshell
set -e
trap "echo woah" EXIT
sleep 4
If you interrupt sleep (^C) then the subshell will exit due to set -e and print woah in the process.
Also, slightly unrelated: your trap - EXIT runs in a subshell (explicitly), so it won't have any effect after the cleanup function returns.
It's pretty clear from experimentation that rsync behaves like other tools such as ping and does not inherit signals from the calling Bash parent.
So you have to get a little creative with this and do something like the following:
$ cat rsync.bash
#!/bin/sh
set -m
trap '' SIGINT SIGTERM EXIT
rsync -avz LargeTestFile.500M root@host.mydom.com:/tmp/. &
wait
echo FIN
Now when I run it:
$ ./rsync.bash
X11 forwarding request failed
building file list ... done
LargeTestFile.500M
^C^C^C^C^C^C^C^C^C^C
sent 509984 bytes received 42 bytes 92732.00 bytes/sec
total size is 524288000 speedup is 1027.96
FIN
And we can see the file did fully transfer:
$ ll -h | grep Large
-rw-------. 1 501 games 500M Jul 9 21:44 LargeTestFile.500M
How it works
The trick here is that we're telling Bash via set -m to enable job control, so the backgrounded rsync runs in its own process group and terminal-generated signals don't reach it. We then background the rsync and run a wait command, which waits on the last run command, rsync, until it's complete.
We then guard the entire script with the trap '' SIGINT SIGTERM EXIT.
References
https://access.redhat.com/solutions/360713
https://access.redhat.com/solutions/1539283
