Bash script: can not properly handle SIGTSTP

Bash script: can not properly handle SIGTSTP - linux

I have a bash script that mounts and unmounts a device, which performing some read operations in between. Since the device is very slow, the script takes about 15 seconds to complete (the mount taking atleast 5-6 seconds). Since leaving this device mounted can cause other problems, I don't want this script to be interrupted.
Having said that, I can correctly handle SIGINT (Ctrl+c), but when I try to handle SIGTSTP (Ctrl+z), the script freezes. Which means the signal is trapped but the handler doesn't run.
#!/bin/sh
cleanup()
{
# Don't worry about unmounting yet. Just checking if trap works.
echo "Quitting..." > /dev/tty
exit 0
}
trap 'cleanup' SIGTSTP
...
I manually have to send the KILL signal to the process. Any idea why this is happening and how I can fix it?

The shell does not execute the trap until the currently executing process terminates. (at least, that is the behavior of bash 3.00.15). If you send SIGINT via ^c, it is sent to all processes in the foreground process group; if the program currently executing receives it and terminates then bash can execute the trap. Similarly with SIGTSTP via ^z; bash receives the signal but does not execute the trap until the program that was being run terminates, which it does not do if it takes the default behavior and is suspended. Try replacing ... with a simple read f and note that the trap executes immediately.

Related

`trap`-ing Linux Signals - SIGSTOP?

The Docker docs note the following (w/ code) on how to run clean up upon a container's shutdown:
Lastly, if you need to do some extra cleanup ... on shutdown, ..., you
may need to ensure that the ENTRYPOINT script receives the Unix
signals, passes them on, and then does some more work
#!/bin/sh
trap "echo TRAPed signal" HUP INT QUIT KILL TERM
/usr/sbin/apachectl start
I think that this trap will catch a KILL signal. However, I read in this post:
All signals except for SIGKILL and SIGSTOP can be intercepted by the process.
However, another post states:
There is one signal that you cannot trap: SIGKILL or signal 9.
Which is it?

You cannot catch SIGSTOP. The page you referenced was mistaken.
Yes, the above script tries to catch the KILL signal. It fails. You can verify this yourself quite easily with the following script:
#!/bin/sh
echo "Running shell in pid $$"
trap "echo TRAPed signal" HUP INT QUIT KILL TERM STOP
sleep 20
Try sending it the KILL and STOP signals, you will see that the process dies and halts, respectively, without the message being printed. If you try any other caught signal, you will see those are handled as expected.

Does linux kill background processes if we close the terminal from which it has started?

I have an embedded system, on which I do telnet and then I run an application in background:
./app_name &
Now if I close my terminal and do telnet from other terminal and if I check then I can see this process is still running.
To check this I have written a small program:
#include<stdio.h>
main()
{
while(1);
}
I ran this program in my local linux pc in background and I closed the terminal.
Now, when I checked for this process from other terminal then I found that this process was also killed.
My question is:
Why undefined behavior for same type of process?
On which it is dependent?
Is it dependent on version of Linux?

Who should kill jobs?
Normally, foreground and background jobs are killed by SIGHUP sent by kernel or shell in different circumstances.
When does kernel send SIGHUP?
Kernel sends SIGHUP to controlling process:
for real (hardware) terminal: when disconnect is detected in a terminal driver, e.g. on hang-up on modem line;
for pseudoterminal (pty): when last descriptor referencing master side of pty is closed, e.g. when you close terminal window.
Kernel sends SIGHUP to other process groups:
to foreground process group, when controlling process terminates;
to orphaned process group, when it becomes orphaned and it has stopped members.
Controlling process is the session leader that established the connection to the controlling terminal.
Typically, the controlling process is your shell. So, to sum up:
kernel sends SIGHUP to the shell when real or pseudoterminal is disconnected/closed;
kernel sends SIGHUP to foreground process group when the shell terminates;
kernel sends SIGHUP to orphaned process group if it contains stopped processes.
Note that kernel does not send SIGHUP to background process group if it contains no stopped processes.
When does bash send SIGHUP?
Bash sends SIGHUP to all jobs (foreground and background):
when it receives SIGHUP, and it is an interactive shell (and job control support is enabled at compile-time);
when it exits, it is an interactive login shell, and huponexit option is set (and job control support is enabled at compile-time).
See more details here.
Notes:
bash does not send SIGHUP to jobs removed from job list using disown;
processes started using nohup ignore SIGHUP.
More details here.
What about other shells?
Usually, shells propagate SIGHUP. Generating SIGHUP at normal exit is less common.
Telnet or SSH
Under telnet or SSH, the following should happen when connection is closed (e.g. when you close telnet window on PC):
client is killed;
server detects that client connection is closed;
server closes master side of pty;
kernel detects that master pty is closed and sends SIGHUP to bash;
bash receives SIGHUP, sends SIGHUP to all jobs and terminates;
each job receives SIGHUP and terminates.
Problem
I can reproduce your issue using bash and telnetd from busybox or dropbear SSH server: sometimes, background job doesn't receive SIGHUP (and doesn't terminate) when client connection is closed.
It seems that a race condition occurs when server (telnetd or dropbear) closes master side of pty:
normally, bash receives SIGHUP and immediately kills background jobs (as expected) and terminates;
but sometimes, bash detects EOF on slave side of pty before handling SIGHUP.
When bash detects EOF, it by default terminates immediately without sending SIGHUP. And background job remains running!
Solution
It is possible to configure bash to send SIGHUP on normal exit (including EOF) too:
Ensure that bash is started as login shell. The huponexit works only for login shells, AFAIK.
Login shell is enabled by -l option or leading hyphen in argv[0]. You can configure telnetd to run /bin/bash -l or better /bin/login which invokes /bin/sh in login shell mode.
E.g.:
telnetd -l /bin/login
Enable huponexit option.
E.g.:
shopt -s huponexit
Type this in bash session every time or add it to .bashrc or /etc/profile.
Why does the race occur?
bash unblocks signals only when it's safe, and blocks them when some code section can't be safely interrupted by a signal handler.
Such critical sections invoke interruption points from time to time, and if signal is received when a critical section is executed, it's handler is delayed until next interruption point happens or critical section is exited.
You can start digging from quit.h in the source code.
Thus, it seems that in our case bash sometimes receives SIGHUP when it's in a critical section. SIGHUP handler execution is delayed, and bash reads EOF and terminates before exiting critical section or calling next interruption point.
Reference
"Job Control" section in official Glibc manual.
Chapter 34 "Process Groups, Sessions, and Job Control" of "The Linux Programming Interface" book.

When you close the terminal, shell sends SIGHUP to all background processes – and that kills them. This can be suppressed in several ways, most notably:
nohup
When you run program with nohup it catches SIGHUP and redirect program output.
$ nohup app &
disown
disown tells shell not to send SIGHUP
$ app &
$ disown
Is it dependent on version of linux?
It is dependent on your shell. Above applies at least for bash.

AFAIK in both cases the process should be killed. In order to avoid this you have to issue a nohup like the following:
> nohup ./my_app &
This way your process will continue executing. Probably the telnet part is due to a BUG similar to this one:
https://bugzilla.redhat.com/show_bug.cgi?id=89653

In order completely understand whats happening you need to get into unix internals a little bit.
When you are running a command like this
./app_name &
The app_name is sent to background process group. You can check about unix process groups here
When you close bash with normal exit it triggers SIGHUP hangup signal to all its jobs. Some information on unix job control is here.
In order to keep your app running when you exit bash you need to make your app immune to hangup signal with nohup utility.
nohup - run a command immune to hangups, with output to a non-tty
And finally this is how you need to do it.
nohup app_name & 2> /dev/null;

In modern Linux--that is, Linux with systemd--there is an additional reason this might happen which you should be aware of: "linger".
systemd kills processes left running from a login shell, even if the process is properly daemonized and protected from HUP. This is the default behavior in modern configurations of systemd.
If you run
loginctl enable-linger $USER
you can disable this behavior, allowing background processes to keep running. The mechanisms covered by the other answers still apply, however, and you should also protect your process against them.
enable-linger is permanent until it is re-disabled. You can check it with
ls /var/lib/systemd/linger
This may have files, one per username, for users who have enable-linger. Any user listed in the directory has the ability to leave background processes running at logout.

ctrl+c to kill a bash script with child processes

I have a script whose internals boil down to:
trap "exit" SIGINT SIGTERM
while :
do
mplayer sound.mp3
sleep 3
done
(yes, it is a bit more meaningful than the above, but that's not relevant to the problem). Several instances of the script may be running at the same time.
Sometimes I want to ^C the script... but that does not succeed. As I understand, when ^C kills mplayer, it continues to sleep, and when ^C kills sleep, it continues to mplayer, and I never happen to catch it in between. As I understand, trap just never works.
How do I terminate the script?

You can get the PID of mplayer and upon trapping send the kill signal to mplayer's PID.
function clean_up {
# Perform program exit housekeeping
KILL $MPLAYER_PID
exit
}
trap clean_up SIGHUP SIGINT SIGTERM
mplayer sound.mp3 &
MPLAYER_PID=$!
wait $MPLAYER_PID

mplayer returns 1 when it is stopped with Ctrl-C so:
mplayer sound.mp3 || break
will do the work.
One issue of this method is that if mplayer exits 1 for another reason (i.e., sound file has a bad format), it will exit anyway, and it's maybe not the desired behaviour.

Signal handling in a shell script

Following is a shell script (myscript.sh) I have:
#!/bin/bash
sleep 500 &
Aprogram arg1 arg2 # Aprogram is a program which runs for an hour.
echo "done"
I launched this in one terminal, and from another terminal I issued 'kill -INT 12345'. 12345 is the pid of myscript.sh.
After a while I can see that both myscript.sh and Aprogram have been dead. However 'sleep 500 &' is still running.
Can anyone explain why is this behavior?
Also, when I issued SIGINT signal to the 'myscript.sh' what exactly is happening? Why is 'Aprogram' getting killed and why not 'sleep' ? How is the signal INT getting transmitted to it's child processes?

You need to use trap to catch signals:
To just ignore SIGINT use:
trap '' 2
if you want to specify some special action for this you can make it that in line:
trap 'some commands here' 2
or better wrap it into a function
function do_for_sigint() {
...
}
trap 'do_for_sigint' 2
and if you wish to allow your script to finish all it's tasks first:
keep_running="yes"
trap 'keep_running="no"' 2
while [ $keep_running=="yes" ]; do
# main body of your script here
done

You start sleep in the background. As such, it is not killed when you kill the script.
If you want to kill sleep too when the script is terminated, you'd need to trap it.
sleep 500 &
sid=($!) # Capture the PID of sleep
trap "kill ${sid[#]}" INT # Define handler for SIGINT
Aprogram arg1 arg2 & # Aprogram is a program which runs for an hour.
sid+=($!)
echo "done"
Now sending SIGINT to your script would cause sleep to terminate as well.

After a while I can see that both myscript.sh and Aprogram have been dead. However 'sleep 500 &' is still running.
As soon as Aprogram is finished myscript.sh prints "Done" and is also finised. sleep 500 gets process with PID 1 as a parent. That is it.
Can anyone explain why is this behavior?
SIGINT is not deliverd to Aprogram when myscript.sh gets it. Use strace to make sure that Aprogram does not receive a signal.
Also, when I issued SIGINT signal to the 'myscript.sh' what exactly is happening?
I first thought that it is the situation like when a user presses Ctrl-C and read this http://www.cons.org/cracauer/sigint.html. But it is not exactly the same situation. In your case shell received SIGINT but the child process didn't. However, shell had at that moment a child process and it did not do anything and kept waiting for a child. This is strace output on my computer after sending SIGINT to a shell script waiting for a child process:
>strace -p 30484
Process 30484 attached - interrupt to quit
wait4(-1, 0x7fffc0cd9abc, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) # 0 (0) ---
rt_sigreturn(0x2) = -1 EINTR (Interrupted system call)
wait4(-1,
Why is 'Aprogram' getting killed and why not 'sleep' ? How is the signal INT getting transmitted to it's child processes?
As far as I can see with strace a child program like your Aprogram is not getting killed. It did not receive SIGINT and finished normally. As soon as it finished your shell script also finished.

How to get right PID of a group of background command and kill it?

Ok, just like in this thread, How to get PID of background process?, I know how to get the PID of background process. However, what I need to do countains more than one operation.
{
sleep 300;
echo "Still running after 5 min, killing process manualy.";
COMMAND COMMAND COMMAND
echo "Shutdown complete"
}&
PID_CHECK_STOP=$!
some stuff...
kill -9 $PID_CHECK_STOP
But it doesn't work. It seems i get either a bad PID or I just can't kill it. I tried to run ps | grep sleep and the pid it gives is always right next to the one i get in PID_CHECK_STOP. Is there a way to make it work? Can i wrap those commands an other way so i can kill them all when i need to?
Thx guys!

kill -9 kills the process before it can do anything else, including signalling its children to exit. Use a gentler signal (kill by itself, which sends a TERM, should be sufficient). You do need to have the process signal its children to exit (if any) explicitly, though, via a trap command.
I'm assuming sleep is a placeholder for the real command. sleep is tricky, however, as it ignores any signals until it returns (i.e., it is non-interruptible). To make your example work, put sleep itself in the background and immediately wait on it. When you kill the "outer" background process, it will interrupt the wait call, which will allow sleep to be killed as well.
{
trap 'kill $(jobs -p)' EXIT
sleep 300 & wait
echo "Still running after 5 min, killing process manualy.";
COMMAND COMMAND COMMAND
echo "Shutdown complete"
}&
PID_CHECK_STOP=$!
some stuff...
kill $PID_CHECK_STOP
UPDATE: COMMAND COMMAND COMMAND includes a command that runs via sudo. To kill that process, kill must also be run via sudo. Keep in mind that doing so will run the external kill program, not the shell built-in (there is little difference between the two; the built-in exists to allow you to kill a process when your process quota has been reached).

You can have another script containing those commands and kill that script. If you are dynamically generating code for the block, just write out a script, execute it and kill when you are done.

The { ... } surrounding the statements starts a new shell, and you get its PID afterwards. sleep and other commands within the block get separate PIDs.
To illustrate, look for your process in ps afux | less - the parent shell process (above the sleep) has the PID you were just given.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string