I have the following process tree
test1.sh
\- test2.sh
\- sleep 600
Normally, if I kill the test1.sh process, the child processes test2.sh and sleep 600 continue running. But if I suspend the sleep 600 process by sending it a signal (SIGSTOP or SIGTSTP) and then kill the test1.sh process, the children test2.sh and sleep 600 exit. Why?
Here is my test program:
test1.sh
#!/bin/sh
./test2.sh
test2.sh
#!/bin/sh
sleep 600
Test steps:
run test1.sh
$ ./test1.sh
open new console and suspend the child process.
$ kill -19 < sleep pid > or kill -20 < sleep pid >
kill the parent process test1.sh
$ kill < test1.sh pid >
You will find that after step 3, test2.sh and sleep 600 have exited.
But if I run only step 1 and step 3, skipping step 2, the test2.sh and sleep 600 processes do not exit.
Can anyone explain this? Many thanks.
When you kill the test1.sh process, you leave test2.sh orphaned, so you need to know what happens to orphaned processes in your operating system.
When test2.sh is running and its parent dies, the OS reparents it to the init process and lets it keep running. So both test2.sh and sleep are still up even though you have killed test1.sh.
When the sleep process is stopped (signal 19 or 20) and test1.sh dies, the process group becomes orphaned while one of its members is stopped. Since there is no longer a shell capable of resuming the stopped process, the OS does not leave it that way. POSIX requires the system to send SIGHUP followed by SIGCONT to every process in a newly orphaned process group that contains a stopped member, and Linux does so: both test2.sh and sleep receive SIGHUP and exit. Other systems may act differently; on GNU/Hurd, according to the manual quoted below, the process dies with SIGKILL to avoid the problem of many stopped, orphaned processes lying around the system.
From the GNU C Library manual:
While a process is stopped, no more signals can be delivered to it
until it is continued, except SIGKILL signals and (obviously) SIGCONT
signals. The signals are marked as pending, but not delivered until
the process is continued. The SIGKILL signal always causes termination
of the process and can’t be blocked, handled or ignored. You can
ignore SIGCONT, but it always causes the process to be continued
anyway if it is stopped. Sending a SIGCONT signal to a process causes
any pending stop signals for that process to be discarded. Likewise,
any pending SIGCONT signals for a process are discarded when it
receives a stop signal.
When a process in an orphaned process group (see Orphaned Process
Groups) receives a SIGTSTP, SIGTTIN, or SIGTTOU signal and does not
handle it, the process does not stop. Stopping the process would
probably not be very useful, since there is no shell program that will
notice it stop and allow the user to continue it. What happens instead
depends on the operating system you are using. Some systems may do
nothing; others may deliver another signal instead, such as SIGKILL or
SIGHUP. On GNU/Hurd systems, the process dies with SIGKILL; this
avoids the problem of many stopped, orphaned processes lying around
the system.
By the way, if you always want the children to be killed, you can add a trap to the main process that catches signals and terminates the children properly.
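A minimal sketch of that trap-based approach follows. The helper name run_guarded and the exit status 143 are illustrative assumptions, not something from the question. The child is started in the background because the shell only runs trap handlers between commands; waiting with wait lets a trapped signal be handled right away.

```shell
#!/bin/sh
# run_guarded CMD...: hypothetical helper. Runs CMD as a background child and
# terminates it if this shell receives SIGTERM, SIGINT or SIGHUP.
run_guarded() {
    # Install the handler first so a signal cannot slip in before it is set.
    # The handler body is expanded when the signal arrives, not here.
    trap 'kill "$child" 2>/dev/null; wait "$child" 2>/dev/null; exit 143' TERM INT HUP
    "$@" &
    child=$!
    # wait returns as soon as a trapped signal arrives, unlike a foreground
    # command, which would delay the handler until the command finished.
    wait "$child"
}

# Usage inside test1.sh (layout assumed from the question):
#   run_guarded ./test2.sh
```

Note that this only forwards the signal one level down. With the tree from the question, test2.sh would need the same treatment, because a plain SIGTERM sent to test2.sh does not reach its own sleep child.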
The POSIX spec says
The system() function shall ignore the SIGINT and SIGQUIT signals, and shall block the SIGCHLD signal, while waiting for the command to terminate. If this might cause the application to miss a signal that would have killed it, then the application should examine the return value from system() and take whatever action is appropriate to the application if the command terminated due to receipt of a signal.
This means that a program that starts a long-running sub-process will have SIGINT and SIGQUIT ignored for a long time. Here is a test program compiled on my Ubuntu 18.10 laptop:
$ cat > test_system.c << EOF
#include <stdlib.h>
int main() {
system("sleep 86400"); // Sleep for 24 hours
}
EOF
$ gcc test_system.c -o test_system
If I start this test program running in the background...
$ ./test_system &
[1] 7489
...then I can see that SIGINT (2) and SIGQUIT (3) are marked as ignored in the bitmask.
$ ps -H -o pid,pgrp,cmd,ignored
PID PGRP CMD IGNORED
6956 6956 -bash 0000000000380004
7489 7489 ./test_system 0000000000000006
7491 7489 sh -c sleep 86400 0000000000000000
7492 7489 sleep 86400 0000000000000000
Trying to kill test_system with SIGINT has no effect...
$ kill -SIGINT 7489
...but sending SIGINT to the process group does kill it (this is expected: every process in the process group receives the signal, so sleep exits and system returns).
$ kill -SIGINT -7489
[1]+ Done ./test_system
Questions
What is the purpose of ignoring SIGINT and SIGQUIT, given that the process can still be killed via the process group (which is what happens when you press ^C in the terminal)?
Bonus question: Why does POSIX demand that SIGCHLD be blocked?
Update: If SIGINT and SIGQUIT are ignored to ensure we don't leave children behind, then why is there no handling for SIGTERM? It's the default signal sent by kill!
SIGINT and SIGQUIT are terminal generated signals. By default, they're sent to the foreground process group when you press Ctrl+C or Ctrl+\ respectively.
I believe the idea for ignoring them while running a child via system is that the terminal should be as if it was temporarily owned by the child and Ctrl+C or Ctrl+\ should temporarily only affect the child and its descendants, not the parent.
SIGCHLD is blocked so that the SIGCHLD caused by system's child terminating won't trigger a SIGCHLD handler if you have one, because such a handler might reap the child started by system before system itself reaps it.
I have a script whose internals boil down to:
trap "exit" SIGINT SIGTERM
while :
do
mplayer sound.mp3
sleep 3
done
(yes, it is a bit more meaningful than the above, but that's not relevant to the problem). Several instances of the script may be running at the same time.
Sometimes I want to ^C the script, but that does not succeed. As I understand it, when ^C kills mplayer, the script continues to sleep, and when ^C kills sleep, it continues to mplayer; I never manage to catch it in between. As far as I can tell, the trap just never fires.
How do I terminate the script?
You can get the PID of mplayer and upon trapping send the kill signal to mplayer's PID.
function clean_up {
# Perform program exit housekeeping
kill "$MPLAYER_PID"
exit
}
trap clean_up SIGHUP SIGINT SIGTERM
mplayer sound.mp3 &
MPLAYER_PID=$!
wait $MPLAYER_PID
mplayer returns 1 when it is stopped with Ctrl-C so:
mplayer sound.mp3 || break
will do the work.
One issue with this method is that if mplayer exits with status 1 for another reason (e.g., the sound file has a bad format), the loop will exit anyway, which may not be the desired behaviour.
Following is a shell script (myscript.sh) I have:
#!/bin/bash
sleep 500 &
Aprogram arg1 arg2 # Aprogram is a program which runs for an hour.
echo "done"
I launched this in one terminal, and from another terminal I issued 'kill -INT 12345', where 12345 is the PID of myscript.sh.
After a while I can see that both myscript.sh and Aprogram are dead. However, 'sleep 500 &' is still running.
Can anyone explain this behavior?
Also, when I sent SIGINT to 'myscript.sh', what exactly happened? Why is 'Aprogram' getting killed but not 'sleep'? How is SIGINT transmitted to its child processes?
You need to use trap to catch signals:
To just ignore SIGINT use:
trap '' 2
If you want to specify some special action for it, you can write it inline:
trap 'some commands here' 2
or better wrap it into a function
function do_for_sigint() {
...
}
trap 'do_for_sigint' 2
and if you wish to allow your script to finish all its tasks first:
keep_running="yes"
trap 'keep_running="no"' 2
while [ "$keep_running" = "yes" ]; do
# main body of your script here
done
You start sleep in the background. As such, it is not killed when you kill the script.
If you want to kill sleep too when the script is terminated, you'd need to trap it.
sleep 500 &
sid=($!) # Capture the PID of sleep
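trap 'kill "${sid[@]}"' INT # Define handler for SIGINT; single quotes defer expansion until the signal arrives
Aprogram arg1 arg2 & # Aprogram is a program which runs for an hour.
sid+=($!)
wait # Wait for the background jobs; a trapped signal interrupts the wait
echo "done"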
Now sending SIGINT to your script would cause sleep to terminate as well.
After a while I can see that both myscript.sh and Aprogram are dead. However, 'sleep 500 &' is still running.
As soon as Aprogram finishes, myscript.sh prints "done" and finishes too. sleep 500 is then adopted by the process with PID 1. That is all.
Can anyone explain this behavior?
SIGINT is not delivered to Aprogram when myscript.sh gets it. Use strace to confirm that Aprogram does not receive the signal.
Also, when I sent SIGINT to 'myscript.sh', what exactly happened?
I first thought this was the same situation as when a user presses Ctrl-C, and read http://www.cons.org/cracauer/sigint.html. But it is not exactly the same situation. In your case the shell received SIGINT but the child process did not. However, the shell had a child process at that moment, so it did nothing and kept waiting for the child. This is the strace output on my computer after sending SIGINT to a shell script waiting for a child process:
>strace -p 30484
Process 30484 attached - interrupt to quit
wait4(-1, 0x7fffc0cd9abc, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2) = -1 EINTR (Interrupted system call)
wait4(-1,
Why is 'Aprogram' getting killed but not 'sleep'? How is SIGINT transmitted to its child processes?
As far as I can see with strace, a child program like your Aprogram is not killed. It did not receive SIGINT and finished normally. As soon as it finished, your shell script finished as well.
I am writing a shell script in which a parent process has child processes created with sleep &. I want killing the parent process to kill the child processes as well. I was able to do this with the commands below:
trap "kill $$" SIGINT
trap 'kill -HUP 0' EXIT
trap 'kill $(jobs -p)' EXIT
These commands work with kill [parent_process_ID], but if I use kill -9 [parent_process_ID], only the parent process is killed.
Please guide me on how to achieve this, so that the child processes are also killed no matter which command I use to kill the parent.
When you kill a process alone, it will not kill the children.
You have to send the signal to the process group if you want all processes for a given group to receive the signal.
kill -9 -parentpid
Otherwise, orphans will be linked to init.
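A sketch of signalling a whole process group from a script, assuming the setsid utility from util-linux is available. kill_group and the temporary file are hypothetical helpers used only for illustration. Running the job under setsid gives it its own session and process group, so the negative ID cannot hit the calling shell by accident.

```shell
#!/bin/sh
# kill_group PGID [SIGNAL]: hypothetical helper that signals every process in
# process group PGID. The "--" keeps the negative ID from being parsed as an
# option.
kill_group() {
    kill -s "${2:-TERM}" -- "-$1"
}

# Demo: start a parent with two children in a new session. setsid makes the
# inner sh a group leader, so its PID is also the process group ID.
pgfile=$(mktemp)
setsid sh -c 'echo "$$" > "$1"; sleep 300 & sleep 300 & wait' sh "$pgfile" &
sleep 1                      # crude: give the job time to record its PID
kill_group "$(cat "$pgfile")" TERM
rm -f "$pgfile"
```

The same call with KILL as the second argument would correspond to the kill -9 form above; in both cases the ID must be that of the process group, which equals the parent's PID only when the parent is the group leader.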
On Linux, a child can ask the kernel to deliver SIGHUP (or another signal) when its parent dies by passing the PR_SET_PDEATHSIG option to the prctl() syscall, like this:
prctl(PR_SET_PDEATHSIG, SIGHUP);
See man 2 prctl for details.
Sending the -9 signal (SIGKILL) to a program gives no chance for it to execute its own signal handlers (e.g., your trap statements). That is why the children don't get killed automatically. (In general, -9 gives no chance for the app to clean up after itself.) You have to use a weaker signal to kill it (such as SIGTERM.)
See man 7 signal for details.
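Since SIGKILL cannot be trapped, any cleanup after kill -9 has to be driven from the child's side. One possible workaround in plain shell is to poll the parent with signal 0; this is a swapped-in technique for illustration, not something the answers above propose. The name run_until_parent_dies is hypothetical. kill -0 also fails when the caller lacks permission, and PIDs can be reused, so treat this as a sketch under those assumptions.

```shell
#!/bin/sh
# run_until_parent_dies PARENT_PID CMD...: hypothetical helper. Runs CMD and
# terminates it once the process PARENT_PID no longer exists.
run_until_parent_dies() {
    parent=$1
    shift
    "$@" &
    cmd=$!
    # Watchdog: poll the parent once a second; send SIGTERM to the command
    # when the parent has disappeared.
    (
        while kill -0 "$parent" 2>/dev/null; do
            sleep 1
        done
        kill "$cmd" 2>/dev/null
    ) &
    watchdog=$!
    status=0
    wait "$cmd" || status=$?
    kill "$watchdog" 2>/dev/null   # harmless if the watchdog already finished
    return "$status"
}

# Usage in a child script (hypothetical):
#   run_until_parent_dies "$PPID" sleep 600
```

Polling reacts with up to a second of delay. The prctl() approach mentioned above avoids that, but it is not directly available from a shell script.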
I am trying to kill a process with the kill command in Linux (without -9 as an argument).
I need to make sure that the process is really killed.
As far as I know, the kill command runs asynchronously, and it can take some time until the process has finished.
I need to make sure, using bash, that my process has died after I run kill.
Can you please assist?
Thanks!!!
Sending signal 0 to a process checks whether it is still running without actually killing it. Just check the return code.
Assuming $PID holds the pid of your process, you could do something like this:
kill "$PID"
while kill -0 "$PID" 2>/dev/null; do
sleep 1
done
echo "Process is killed"
kill is used to send signals to processes. It doesn't necessarily terminate the process (but usually does). kill without an explicitly specified signal sends SIGTERM to the process. The default action on SIGTERM is to terminate the process, but a process can set up a different signal handler, in which case it might not be terminated.
What I think you need is a way to find out whether the process has handled the signal. This can be done using ps s $PID. If this shows all zeros as the pending mask, the process has received the signal and processed it.
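Putting the two answers together, here is a sketch of a helper that sends SIGTERM, polls with signal 0, and falls back to SIGKILL if the process installed its own handler or ignores the signal. The name terminate, the default timeout of 10 seconds, and the return codes are assumptions made for illustration.

```shell
#!/bin/sh
# terminate PID [TIMEOUT]: hypothetical helper. Sends SIGTERM, polls with
# signal 0 for up to TIMEOUT seconds (default 10), then falls back to SIGKILL.
# Returns 0 if SIGTERM was enough, 1 if SIGKILL had to be sent.
terminate() {
    pid=$1
    remaining=${2:-10}
    # If the signal cannot be sent, the process is already gone (or is not
    # ours to signal); either way there is nothing left to wait for.
    kill "$pid" 2>/dev/null || return 0
    while [ "$remaining" -gt 0 ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0
        remaining=$((remaining - 1))
    done
    kill -s KILL "$pid" 2>/dev/null
    return 1
}

# Usage (hypothetical): terminate "$PID" 5 && echo "Process is killed"
```

A process that has exited but has not yet been reaped by its parent still answers to signal 0, so the loop can report it as alive slightly longer than it actually ran.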