kill -s SIGTERM kills parent process and one level child process only - linux

I've been doing some experimenting with writing a command to kill parent and all it's children recursively. I've a script as below
parent.sh:
#!/bin/bash
/home/oracle/child.sh &
sleep infinity
child.sh:
#!/bin/bash
sleep infinity
Started command using
su oracle -c parent.sh &
I see a process tree like below
[root#source ~]# ps -ef | grep "/home/oracle"
root 14129 1171 0 12:39 pts/1 00:00:00 su oracle -c /home/oracle/parent.sh
oracle 14130 14129 0 12:39 ? 00:00:00 /bin/bash /home/oracle/parent.sh
oracle 14131 14130 0 12:39 ? 00:00:00 /bin/bash /home/oracle/child.sh
When I send sigterm to 14129 using kill -s SIGTERM 14129 it appears to kill 14129 and then 14130 goes down as well immediately; but 14131 stays up for a very long time. The last level child appears to have been reparented and has become a zombie.
oracle 14131 1 0 12:39 ? 00:00:00 /bin/bash /home/oracle/child.sh
If kill doesn't terminate any child processes why did 14130 get killed when I sent a SIGTERM to 14129?
If kill can kill child processes, why would does it go only one level down? Is the behavior here guaranteed?

The relevant part of what pilcrow provided is this:
SIGNALS top
Upon receiving either SIGINT, SIGQUIT or SIGTERM, su terminates
its child and afterwards terminates itself with the received
signal.
>> The child is terminated by SIGTERM,
>> (then) after unsuccessful attempt (to kill with SIGTERM) and
>> (after) 2 seconds of delay (,) the child is (then) killed
>> by SIGKILL [a second, harsher method].
That harsher method, SIGKILL, prevents that child process from attempting to kill its own children, hence the zombie state.
I haven't used it myself, but it seems that something like
killall --process-group parent.sh
would kill all processes tied to the process group associated with the "parent.sh" script. BUT ... not sure if "--wait" will serve you well, if the method used in the attempt to terminate is not being accepted.

Related

How to launch a process outside a systemd control group

I have a server process (launched from systemd) which can launch an update process. The update process self-daemonizes itself and then (in theory) kills the server with SIGTERM. My problem is that the SIGTERM propagates to the update process and it's children.
For debugging purposes, the update process just sleeps, and I send the kill by hand.
Sample PS output before the kill:
1 1869 1869 1869 ? -1 Ss 0 0:00 /usr/local/bin/state_controller --start
1869 1873 1869 1869 ? -1 Sl 0 0:00 \_ ProcessWebController --start
1869 1886 1869 1869 ? -1 Z 0 0:00 \_ [UpdateSystem] <defunct>
1 1900 1900 1900 ? -1 Ss 0 0:00 /bin/bash /usr/local/bin/UpdateSystem refork /var/ttm/update.bin
1900 1905 1900 1900 ? -1 S 0 0:00 \_ sleep 10000
Note that UpdateSystem is in a separate PGID and TPGID. (The <defunct> process is a result of the daemonization, and is not (I think) a problem.)
UpdateSystem is a bash script (although I can easily make it a C program if that will help). After the daemonization code taken from https://stackoverflow.com/a/29107686/771073, the interesting bit is:
#############################################
trap "echo Ignoring SIGTERM" SIGTERM
sleep 10000
echo Awoken from sleep - presumably by the SIGTERM
exit 0
When I kill 1869 (which sends SIGTERM to the state_controller server process, my logfile contains:
Terminating
Ignoring SIGTERM
Awoken from sleep - presumably by the SIGTERM
I really want to prevent SIGTERM being sent to the sleep process.
(Actually, I really want to stop it being sent to apt-get upgrade which is stopping the system via the moral equivalent of systemctl stop ttm.service and the ExecStop is specified as /bin/kill $MAINPID - just in case that changes anyone's answer.)
This question is similar, but the accepted answer (use KillMode=process) doesn't work well for me - I want to kill some of the child processes, just not the update process:
Can't detach child process when main process is started from systemd
A completely different approach is for the upgrade process to remove itself from the service group by updating the /sys/fs/cgroup/systemd filesystem. Specifically in bash:
echo $$ > /sys/fs/cgroup/systemd/tasks
A process belongs to exactly one control group. Writing its PID to the root tasks file adds it to the other control group, and removes it from the service control group.
We were having exactly the same problem. What we ended up doing is launching the update process as transient cgroup with systemd-run:
systemd-run --unit=my_system_upgrade --scope --slice=my_system_upgrade_slice -E setsid nohup start-the-upgrade &> /tmp/some-logs.log &
That way, the update process will run in a different cgroup and will not be terminated. Additionally, we use setsid + nohup to make sure the process has its own group and session and that the parent process is the init process.
The approach we have decided to take is to launch the update process in a separate (single-shot) service. As such, it automatically belongs to a separate control group, so killing the main service doesn't kill it.
There is a wrinkle to this though. The package installs ttm.service and ttm.template.update.service. To run the updater, we copy ttm.template.update.service to ttm.update.service, run systemctl daemon-reload, and then run systemctl start ttm.update.service. Why the copy? Because when the updater installs a new version of ttm.template.update.service, it will forcibly terminate any processes running as that service. KillMode=None appears to offer a way round that, but although it appears to work, a subsequent call to apt-get yields a nasty error about dpkg having been interrupted.
Are you sure it is not systemd sending the TERM signal to the child process?
Depending on the service type, if your main process dies, systemd will do a cleanup and terminate all the child processes under the same cgroup.
This is defined by KillMode= property which is by default set to control-group. You could set it to "none" or "process". https://www.freedesktop.org/software/systemd/man/systemd.kill.html
I have same sitation with you.
Upgrade process is a child process of parent process. The parent process is call by a service.
The main point is not Cgroup, is MAINPID.
If you use PIDFILE to sepecify the MAINPID, when the service type = forking, then the situation solved.
[Service]
Type=forking
PIDFile=/run/test.pid

How to kill all child processes spawned by a process started from a script on kill or kill -9

I have a shell script called Launcher.sh that gets executed by a java process. The java process internally uses ProcessBuilder to execute the bash script.
Inside the Launcher.sh, I have the following code
#!/bin/bash
trap "kill -- -$$ && kill -INT -$PID" SIGINT SIGTERM SIGKILL
bash Process_A.sh &
pid=$!
echo $pid
The Process_A script will spawn another child process called Process_B.
I want to kill both Process_A and Process_B when the Launcher.sh script receives a "kill" command or "kill -9" command from its parent the java process.
So I added a trap command to trap SIGINT, SIGTERM and SIGKILL interrupts.
But when I do
kill $pid
it only kills Process_A but not the child Process_B. Both have the same PGID.
How can I kill all the child and grandchild processes spawned from my launcher.sh script correctly?
Here is an actual output of "ps j" before and after kill.
Inside my script I do "dse spark" which inturn spawns a java process. I want this java process to be killed when the launcer script gets a kill signal
root#WeveJobs01:~# ps j
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
2380 2381 2381 2281 pts/1 59265 S 0 0:00 /bin/bash
1 58917 58916 1152 pts/0 1236 S 0 0:00 bash /usr/bin/dse spark
58917 59041 58916 1152 pts/0 1236 Sl 0 0:07 /usr/lib/jvm/java-8-oracle/jre//bin/java -cp /etc/dse/spark/:/usr/share/dse/dse-
2381 59265 59265 2281 pts/1 59265 R+ 0 0:00 ps j
root#WeveJobs01:~# kill 58917
root#WeveJobs01:~# ps j
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1152 1235 1235 1152 pts/0 1236 S 0 0:00 sudo -s
1235 1236 1236 1152 pts/0 1236 S+ 0 0:00 /bin/bash
1 59041 58916 1152 pts/0 1236 Sl 0 0:23 /usr/bin/java -cp /etc/dse/spark/:/usr/share/dse/dse-
2381 59513 59513 2281 pts/1 59513 R+ 0 0:00 ps j
I tried this..and when I do "kill pid" where pid is that of the script. I get segmentation fault as it goes to infinite loop
trap 'echo "Kill All"; kill -TERM -$$' TERM INT
bash child.sh &
PID=$!
wait $PID
trap - TERM INT
wait $PID
EXIT_STATUS=$?
I need to reset kill -term in the trap statement to prevent infinite loop. This worked
trap "trap -INT && kill -- -$$"
There are a few different ways to kill a looping Bash script, depending on how the script is set up and how you want to stop it. Here are a few options:
Use the kill command to send a signal to the script process. For example, you can use kill -9 to send a SIGKILL signal to the script, which will immediately terminate it. Replace with the process ID of the script. You can find the process ID by running ps aux | grep .
If the script is running in a separate terminal window, you can simply close the window to terminate the script.
If the script is running in the background, you can use the fg command to bring it back to the foreground, then press CTRL+C to terminate the script.
If the script is running in a loop that can be interrupted with a break statement, you can include a signal handler in the script that catches a signal and breaks out of the loop. For example:
trap 'break' INT
while true; do
# script code goes here
done
This will cause the script to break out of the loop and terminate when it receives an INT signal (e.g., when you press CTRL+C).

In unix I used kill command by providing a ppid then it close the terminal . why? kill -9 ppid

sleep 5000
In one terminal and in second terminal I'm running:
ps -ef | grep sleep
Then I'm killing this process in second terminal by using the ppid. Then it will close the first terminal where I run the sleep command. It will not create sleep command as an orphan.
$ ps -ef | grep sleep
trainee 4887 4864 0 17:05 pts/0 00:00:00 sleep 5000
trainee 4889 4264 0 17:05 pts/1 00:00:00 grep --color=auto sleep
kill -9 4864
Why?
Presumably the parent of the sleep is your shell. When you kill that your login is terminated and your terminal closes.
The Wikipedia article on Orphan process reads (in part),
An orphan process is a computer process whose parent process has finished or terminated, though it remains running itself.
and
A process can be orphaned unintentionally, such as when the parent process terminates or crashes. The process group mechanism in most Unix-like operation systems can be used to help protect against accidental orphaning, where in coordination with the user's shell will try to terminate all the child processes with the SIGHUP process signal, rather than letting them continue to run as orphans.

Why SIGINT can stop bash in terminal but not via kill -INT?

I noticed that when I am running a hanging process via bash script like this
foo.sh:
sleep 999
If I run it via command, and press Ctrl+C
./foo.sh
^C
The sleep will be interrupted. However, when I try to kill it with SIGINT
ps aux | grep foo
kill -INT 12345 # the /bin/bash ./foo.sh process
Then it looks like bash and sleep ignores the SIGINT and keep running. This surprises me. I thought Ctrl + C is actually sending SIGINT to the foreground process, so why is that behaviors differently for Ctrl + C in terminal and kill -INT?
CtrlC actually sends SIGINT to the foreground process group (which consists of a bash process, and a sleep process). To do the same with a kill command, send the signal to the process group, e.g:
kill -INT -12345
Your script is executing "sleep 999" and when you hit CTRL-C the shell that is running the sleep command will send the SIGINT to its foreground process, sleep. However, when you tried to kill the shell script from another window with kill, you didn't target "sleep" process, you targetted the parent shell process, which is catching SIGINT. Instead, find the process ID for the "sleep 999" and kill -2 it, it should exit.
In short, you are killing 2 different processes in your test cases, and comparing apples to oranges.
root 27979 27977 0 03:33 pts/0 00:00:00 -bash <-- CTRL-C is interpreted by shell
root 28001 27999 0 03:33 pts/1 00:00:00 -bash
root 28078 27979 0 03:49 pts/0 00:00:00 /bin/bash ./foo.sh
root 28079 28078 0 03:49 pts/0 00:00:00 sleep 100 <-- this is what you should try killing

Who does the daemonizing?

There are various tricks to daemonize a linux process, i.e. to make a command running after the terminal is closed.
nohup is used for this purpose, and fork()/setsid() combination can be used in a C program to make itself a daemon process.
The above was my knowledge about linux daemon, but today I noticed that exiting the terminal doesn't really terminate processes started with & at the end of the command.
$ while :; do echo "hi" >> temp.log ; done &
[1] 11108
$ ps -ef | grep 11108
username 11108 11076 83 15:25 pts/0 00:00:05 /bin/sh
username 11116 11076 0 15:25 pts/0 00:00:00 grep 11108
$ exit
(after reconnecting)
$ ps -ef | grep 11108
username 11108 1 91 15:25 pts/0 00:00:17 /bin/sh
username 11130 11540 0 15:25 pts/0 00:00:00 grep 11108
So apparently, the process's PPID changed to 1, meaning that it got daemonized somehow.
This contradicts my knowledge, that & is not enough and one must use nohup or some other tricks to a process 'daemon'.
Does anyone know who is doing this daemonizing?
I'm using a CentOS 6.3 host and putty/cygwin/sshclient produced the same result.
You can daemonize a process if that doesn't respond to SIGHUP signal.
When bash shell is terminated while it is running background tasks, bash shell sends SIGHUP
(hangup signal) to all tasks. However bash won't wait until child processes are completely
terminated. If child process doesn't respond to SIGHUP signal, that process becomes an orphan
process. (its parent pid is changed to 1 - init process - to prevent becoming a useless zombie process)
Subshell executions basically do not responds to SIGHUP signals, thus your command will still be running after logging out from the first shell.

Resources