How does Linux kill D state processes during reboot?

I know that D state processes are processes in uninterruptible sleep.
Many people say that the way to kill D state processes is to reboot the system.
But how can the reboot operation kill the D state processes?
I found that "init 0" will "kill -9" all remaining processes at the end. But "kill -9" cannot kill a D state process.
Can someone tell me how this works?

It does not kill them at all. Processes in D state do not respond to any signal. kill generates signals, and a signal sent to such a process stays pending: it is not acted on while the process remains in D state. So, no kill.
Processes are kernel objects, so when the kernel stops running at reboot, all process context is lost and nothing persists. The D state processes become history at that point.
If you see this often, it usually means some kind of hardware problem, for example with a CD-ROM/DVD device. The D state means the process is blocked in an uninterruptible operation, typically I/O on a device.
This is a good question!
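To see whether a machine currently has processes stuck in D state, the state column of ps can be filtered. This is a minimal sketch, assuming a procps-style ps; the helper names filter_d_state and list_d_state are made up for this illustration.

```shell
# Keep only "PID STAT COMMAND" lines whose STAT field starts with D
# (uninterruptible sleep). Reads from standard input.
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# List D state processes on the live system.
# Separate -o options with an empty header suppress the header line.
list_d_state() {
    ps -e -o pid= -o stat= -o comm= | filter_d_state
}
```

Running list_d_state on a healthy system usually prints nothing, or only processes that are briefly waiting on disk I/O.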

I use the yum update command to kill the D state process.

Related

Soft kill vs hard kill in unix

Does anyone know the internals of, or the difference between, executing these two commands in Unix? I have been told that a soft kill will wait for all threads started by the process to terminate. My process is a Tomcat server.
kill -9 pid
kill pid
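Invoking the kill command sends a signal to a process or process group.
When we invoke kill -9 PID, the signal sent to the process or process group is KILL (SIGKILL), which cannot be caught, blocked, or ignored: the kernel terminates the process without giving it a chance to clean up.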
When no signal is passed to kill viz. kill PID, the default signal is passed to kill.
The default signal for kill is TERM, and in such cases the command is interpreted as kill -15 PID.
More detailed information on kill is surely available in Linux man pages.
Another good description is available in this document, which says:
The command kill sends the specified signal to the specified process
or process group. If no signal is specified, the TERM signal is sent.
The TERM signal will kill processes which do not catch this signal.
For other processes, it may be necessary to use the KILL (9) signal,
since this signal cannot be caught.
This means the usual kill PID does the job for every process that does not catch or ignore the TERM signal. Using -9 becomes necessary only when the process catches or ignores TERM and still does not exit.
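The difference can be observed with a throwaway process that ignores TERM. This is a small sketch, assuming a POSIX sh and a sleep command; the variable names are only for illustration.

```shell
# Start a process that ignores SIGTERM. An ignored signal stays ignored
# across exec, so the resulting sleep process ignores TERM as well.
sh -c 'trap "" TERM; exec sleep 30' &
pid=$!
sleep 1                      # give the shell time to install the trap

kill "$pid"                  # default signal: TERM (same as kill -15)
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    after_term="alive"       # TERM was ignored
else
    after_term="gone"
fi

kill -9 "$pid"               # KILL cannot be caught or ignored
kill_status=0
wait "$pid" 2>/dev/null || kill_status=$?

echo "after TERM: $after_term; wait status after KILL: $kill_status"
```

In sh, bash, and dash the wait status of a process terminated by a signal is 128 plus the signal number, so KILL shows up as 137.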

Can not kill process by kill -9?

I try to kill a process using the command "kill -9 pid", but it does not succeed. Does anybody know how I can kill such a process, and why I can't kill it?
Could the process be a zombie? It's a good idea to check the process state with the ps command, if you have permission.
If your process is in an uninterruptible sleep (D) because it hangs in some hardware access, you indeed cannot terminate that process.
Here is another explanation.
Personally, I have seen such D states when accessing files on an SD card or USB stick that had a hardware problem. But there are many other scenarios where such a state might occur.
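Checking the state first tells you which of the two cases above applies. A rough sketch follows, assuming a procps-style ps; explain_state and explain_pid are hypothetical helper names, not standard commands.

```shell
# Map a ps STAT value to the reason kill -9 may appear to do nothing.
explain_state() {
    case "$1" in
        "") echo "no such process" ;;
        D*) echo "uninterruptible sleep: the signal stays pending until the blocking operation returns" ;;
        Z*) echo "zombie: already dead, waiting for its parent to reap it" ;;
        *)  echo "signalable: kill -9 should terminate it" ;;
    esac
}

# Look up the state of one PID and explain it.
explain_pid() {
    explain_state "$(ps -o stat= -p "$1" | tr -d ' ')"
}
```

For example, explain_pid 1234 would print the zombie line if PID 1234 shows Z in the STAT column of ps.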

What special precautions must I make for docker apps running as pid 1?

From what I gather, programs that run as pid 1 may need to take special precautions such as capturing certain signals.
It's not altogether clear how to correctly write a pid 1. I'd rather not use runit or supervisor in my case. For example, supervisor is written in Python, and if you install that, it'll result in a much larger container. I'm not a fan of runit.
Looking at the source code for runit is interesting, but as usual, comments are virtually non-existent and don't explain what's being done or why.
There is a good discussion here:
When the process with pid 1 dies for any reason, all other processes are killed with the KILL signal
When any process having children dies for any reason, its children are reparented to process with PID 1
Many signals which have default action of Term do not have one for PID 1.
The relevant part for your question:
you can't stop the process by sending SIGTERM or SIGINT if the process has not installed a signal handler
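Because PID 1 gets no default action for TERM or INT, an entrypoint has to install handlers itself. The following is only a sketch of the signal-forwarding part, assuming a POSIX sh; run_as_init is a hypothetical name, and the sketch does not attempt to reap orphans that get reparented to PID 1.

```shell
# Run a command as a child, forward TERM/INT to it, and return its exit status.
run_as_init() {
    "$@" &
    child=$!
    signalled=0
    # Without this trap, TERM and INT sent to PID 1 would have no effect.
    trap 'signalled=1; kill -TERM "$child" 2>/dev/null || :' TERM INT

    status=0
    wait "$child" || status=$?
    # A trapped signal interrupts wait early; wait again for the real status.
    while [ "$signalled" -eq 1 ]; do
        signalled=0
        status=0
        wait "$child" || status=$?
    done

    trap - TERM INT
    return "$status"
}

# Typical use as the last line of an entrypoint script (illustrative):
#   run_as_init my-server --some-flag; exit $?
```

If pulling in a small dedicated init is acceptable, Docker's --init option for docker run covers both signal forwarding and zombie reaping without a custom script.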

Using appropriate POSIX signals

I am currently working on a project which has a daemon process that looks at a queue of tasks, runs those tasks, and then collects information about those tasks. In some cases, the daemon must "kill" a task if it has taken too long to run.
The explanation for SIGTERM is "termination signal" but that's not very informative. I would like to use the most appropriate signal for this.
What is the most appropriate POSIX signal number to use for telling a process "you took too much time to run so you need to stop now"?
If you're in control of the child processes, you can pretty much do as you please, but SIGTERM is the self-documenting signal for this. It asks a process to terminate, politely: the process chooses how to handle the signal and may perform cleanup actions before actually exiting (or may ignore the signal).
The standard way to kill a process, then, is to first send a SIGTERM; then wait for it to terminate with a grace period of, say, five seconds (longer if termination can take a long time, e.g. because of massive disk I/O). If the grace period has expired, send a SIGKILL. That's the "hard" version of SIGTERM and cannot be ignored, but also leaves the process no chance of neatly cleaning up after itself. Having to send a SIGKILL should be considered an issue with the child process and reported as such.
Usually you'll first send SIGTERM to a process. When the process receives this signal, it is able to clean up and then terminate itself:
kill -15 PID_OF_PROCESS # 15 means SIGTERM
You can check whether the process is still running by sending signal 0 to its PID. Signal 0 delivers nothing; kill only reports whether the PID exists and you are allowed to signal it.
if kill -0 PID_OF_PROCESS 2>/dev/null ; then
    echo "the process is still running"
fi
However, you'll need some grace period to let the process clean up. If the process hasn't terminated after the grace period, you kill it using SIGKILL. This signal can't be handled by the process, and the OS will terminate the process immediately.
kill -9 PID_OF_PROCESS # 9 means SIGKILL, means DIE!
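The TERM, grace period, KILL sequence described above can be wrapped in one function. This is a sketch, assuming a POSIX sh; terminate_with_grace and its return convention (0 for a clean exit, 1 when KILL was needed) are choices made for this example.

```shell
# terminate_with_grace PID [GRACE_SECONDS]
# Returns 0 if the process exited after TERM, 1 if KILL had to be sent.
terminate_with_grace() {
    _pid=$1
    _grace=${2:-5}

    # Ask politely first; if the PID is already gone there is nothing to do.
    kill -TERM "$_pid" 2>/dev/null || return 0

    # Poll once per second. Note that kill -0 also succeeds for a dead
    # child that has not been reaped yet, so at least one sleep may pass.
    while [ "$_grace" -gt 0 ]; do
        kill -0 "$_pid" 2>/dev/null || return 0
        sleep 1
        _grace=$((_grace - 1))
    done
    kill -0 "$_pid" 2>/dev/null || return 0

    # Grace period expired: force termination.
    kill -KILL "$_pid" 2>/dev/null
    return 1
}
```

A daemon could treat a return value of 1 as the cue to log the task as misbehaving, in line with the advice that needing SIGKILL should be reported as an issue with the child.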

Why child process still alive after parent process was killed in Linux?

Someone told me that when you kill a parent process in Linux, the child dies too.
But I doubted it, so I wrote two bash scripts, where father.sh invokes child.sh.
Here is my script:
Now I run bash father.sh; you can check it with ps -alf.
Then I killed father.sh with kill -9 24588, and I guessed the child process would be terminated, but unfortunately I was wrong.
Could anyone explain why?
thx
No, killing a process on its own does not kill its children.
You have to send the signal to the process group if you want all processes in that group to receive the signal.
For example, if the parent is the leader of process group 1234 (so its PID and PGID are both 1234), pass the group to kill as a negative number, that is, a minus sign followed by the ID:
kill -9 -1234
Otherwise, orphans will be linked to init, as shown by your third screenshot (PPID of the child has become 1).
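The behaviour from the question can be reproduced without the two script files. This is a sketch, assuming a POSIX sh, mktemp, and a procps-style ps; the inline sh -c command stands in for father.sh and the sleep for child.sh.

```shell
tmp=$(mktemp)

# "Father": starts a child, records the child's PID, then waits for it.
sh -c 'sleep 30 & echo "$!" > "$1"; wait' father "$tmp" &
father=$!
sleep 1                               # let the father start its child
child=$(cat "$tmp")

kill -9 "$father"                     # kill only the father
wait "$father" 2>/dev/null || true

# The child was never signalled, so it is still running.
if kill -0 "$child" 2>/dev/null; then child_alive=yes; else child_alive=no; fi

# Its parent is now init (PID 1) or another reaper process.
new_ppid=$(ps -o ppid= -p "$child" | tr -d ' ')
echo "child alive: $child_alive, new PPID: $new_ppid"

kill "$child"                         # clean up
rm -f "$tmp"
```

On a plain Linux system the new PPID is typically 1; inside containers or under a session manager it can be the PID of whatever process acts as the reaper there.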
-bash: kill: (-123) - No such process
In an interactive Terminal.app session the foreground process group id number and background process group id number are different by design when job control/monitor mode is enabled. In other words, if you background a command in a job-control enabled Terminal.app session, the $! pid of the backgrounded process is in fact a new process group id number (pgid).
In a script with no job control enabled, however, this may not be the case! The pid of the backgrounded process may not be a new pgid but an ordinary pid. This is what causes the error message -bash: kill: (-123) - No such process: the kill command is asked to signal a process group but is given an ordinary pid instead of a pgid.
# the following code works in Terminal.app because $! == $pgid
{
sleep 100 &
IFS=" " read -r pgid <<EOF
$(ps -p $! -o pgid=)
EOF
echo $$ $! $pgid
sleep 10
kill -HUP -- -$!
#kill -HUP -- -${pgid} # use in script
}
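pkill -TERM -P <ProcessID>
This sends TERM to the children of <ProcessID> (-P matches on the parent PID). It does not signal the parent itself; send it a separate kill <ProcessID> if it should stop too.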
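Where pkill is not available, the same selection can be done with ps. This is a sketch, assuming a procps-style ps and awk; children_of and kill_family are hypothetical helper names, and only direct children are covered, not grandchildren.

```shell
# Print the PIDs of all direct children of the given parent PID.
children_of() {
    ps -e -o pid= -o ppid= | awk -v parent="$1" '$2 == parent { print $1 }'
}

# Send TERM to every direct child, then to the parent itself.
kill_family() {
    _parent=$1
    for _c in $(children_of "$_parent"); do
        kill -TERM "$_c" 2>/dev/null || :
    done
    kill -TERM "$_parent" 2>/dev/null || :
}
```

The loop over children_of corresponds to what pkill -TERM -P does; the final kill handles the parent.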
Killing the parent does not, in general, kill the child.
The reason you see the child still alive after killing the father is that kill -9 24588 sent SIGKILL only to the father's PID; the child was never sent any signal. SIGKILL is not something a process "chooses" to handle: the kernel acts on it as soon as the target is no longer in uninterruptible sleep. An ordinary sleep, like the one in your script, is an interruptible sleep, so a signal sent to the child itself would take effect immediately.
Why is the PPID 1? The parent has died and is no longer in the process table, so the kernel reparented the orphaned child.sh to init (PID 1). This does not give init any role in shutting the child down, and it does not mean that killing a parent makes the grandparent the owner of the child. The child process still exists in the process table and keeps running normally; it is not a zombie. A zombie is a process that has already exited but has not yet been reaped by its parent.
Killing by process group is different: it is used to signal the parent and all of its children at once, by process group ID. It's probably also important to note that "killing a process" just sends a particular signal, among many, to the process. For most signals, such as TERM, the process decides how to handle it and may clean up, or even ignore it; the OS does not follow up later and force the issue. KILL is the exception: the process never gets to handle it, and the kernel tears the process down itself.
Termination may not happen right away when the target is in uninterruptible sleep, for example because the child (or even the parent) has written something to disk and is waiting for the I/O to complete, or is in the middle of some other kernel operation whose interruption could compromise system stability or file integrity.

Resources