Someone told me that when you kill a parent process in Linux, its children die too.
But I doubt it, so I wrote two bash scripts, where father.sh invokes child.sh.
Here is my script:
Now I run bash father.sh; you can check it with ps -alf.
Then I killed father.sh with kill -9 24588. I expected the child process to be terminated too, but I was wrong.
Could anyone explain why?
thx
No, when you kill a process alone, it will not kill the children.
You have to send the signal to the process group if you want all processes for a given group to receive the signal
For example, if your parent's process ID is 1234 and it is also the process group ID, pass it to kill with a leading minus sign:
kill -9 -1234
Otherwise, orphans will be linked to init, as shown by your third screenshot (PPID of the child has become 1).
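A minimal sketch of the behaviour described above, assuming bash and a procps-style ps. The inline bash -c parent stands in for father.sh and sleep stands in for child.sh; the variable names are illustrative only. Note that the new PPID is 1 on many systems, but it can also be a "subreaper" such as a per-user systemd instance.

```shell
#!/bin/bash
# Parent: a shell that starts a child (sleep), records its PID, then waits.
tmp=$(mktemp)
bash -c 'sleep 30 & echo $! > "$1"; wait' _ "$tmp" &
parent=$!

# Wait until the child PID has been written.
while [ ! -s "$tmp" ]; do sleep 0.1; done
child=$(cat "$tmp")

kill -9 "$parent"
wait "$parent" 2>/dev/null      # reap the parent so it is really gone

# Signal 0 only tests whether the process exists; nothing is delivered.
if kill -0 "$child" 2>/dev/null; then child_alive=yes; else child_alive=no; fi
new_ppid=$(ps -o ppid= -p "$child" | tr -d ' ')
echo "child $child alive=$child_alive, new PPID=$new_ppid"

kill "$child"                   # clean up the orphan
rm -f "$tmp"
```

The child survives the SIGKILL sent to its parent and is simply re-parented.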
-bash: kill: (-123) - No such process
In an interactive Terminal.app session the foreground process group id number and background process group id number are different by design when job control/monitor mode is enabled. In other words, if you background a command in a job-control enabled Terminal.app session, the $! pid of the backgrounded process is in fact a new process group id number (pgid).
In a script with job control disabled, however, this may not be the case. The PID of the backgrounded process may be an ordinary PID rather than a new PGID. That is what causes the error message -bash: kill: (-123) - No such process: kill was asked to signal a process group but was given an ordinary PID instead of a PGID.
# the following code works in Terminal.app because $! == $pgid
{
sleep 100 &
IFS=" " read -r pgid <<EOF
$(ps -p $! -o pgid=)
EOF
echo $$ $! $pgid
sleep 10
kill -HUP -- -$!
#kill -HUP -- -${pgid} # use in script
}
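As a hedged sketch of one way around this inside a script: turning on monitor mode with set -m just for the launch makes bash put the background job into its own process group, and the PGID is then looked up with ps rather than assumed. The inline bash -c command is a stand-in for a real job, and the variable names are illustrative.

```shell
#!/bin/bash
set -m                                   # job control on: the next background job gets its own process group
bash -c 'sleep 100 & sleep 100 & wait' &
leader=$!
set +m                                   # back to the script default

# Look the group ID up instead of assuming it equals $!.
pgid=$(ps -o pgid= -p "$leader" | tr -d ' ')
echo "leader pid=$leader pgid=$pgid"

sleep 0.5                                # let the inner shell start its children
kill -TERM -- "-$pgid"                   # negative argument = whole process group
wait "$leader" 2>/dev/null
```

The inner shell runs without job control, so its two sleep children stay in the leader's group and receive the same signal.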
pkill -TERM -P <ProcessID>
This sends TERM to the children of <ProcessID> only. The parent itself is not matched by -P, so signal it separately if it should die too.
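A small sketch of pkill -P in a script, assuming the procps pkill and pgrep tools are installed. The -P option selects processes by their parent PID, so the example signals the parent explicitly afterwards; the bash -c parent and its sleep children are stand-ins.

```shell
#!/bin/bash
bash -c 'sleep 100 & sleep 100 & wait' &
parent=$!
sleep 0.5                                 # give the parent time to start its children

before=$(pgrep -P "$parent" | wc -l | tr -d ' ')
echo "children before: $before"

pkill -TERM -P "$parent"                  # matches processes whose PPID is $parent
kill -TERM "$parent" 2>/dev/null          # the parent is not matched by -P
wait "$parent" 2>/dev/null                # reap it
```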
Generally killing the parent also kills the child.
The reason that you are seeing the child still alive after killing the father is because the child only will die after it "chooses" (the kernel chooses) to handle the SIGKILL event. It doesn't have to handle it right away. Your script is running a sleep() command (i.e. in the kernel), which will not wake up to handle any events whatsoever until the sleep is completed.
Why is PPID #1? The parent has died and is no longer in the process table. child.sh isn't linked inexplicably to init now. It simply has no running parent. Saying it is linked to init creates the impression that if we somehow leave init, that init has control over shutting down the process. It also creates the impression that killing a parent will make the grandparent the owner of a child. Both are not true. That child process still exists in the process table and is running, but no new events based upon its process ID will be handled until it handles SIGKILL. Which means that the child is a pre-zombie, walking dead, in danger of being labeled <defunct>.
Killing in the process group is different, and is used to kill the siblings, and the parent by the process group #. It's probably also important to note that "killing a process" is not "killing" per se, in the human way, where you expect the process to be destroyed and all memory returned as though it never was. It just sends a particular event, among many, to the process for it to handle. If the process does not handle it properly, then after a while the OS will come along and "clean it up" forcibly.
It (killing) doesn't happen right away because the child (or even the parent) could have written something to disk and be waiting for I/O to complete or doing some other critical task that could compromise system stability or file integrity.
Does anyone know the internals of, or the difference between, these two commands in Unix? I have been told that a soft kill waits for all threads started by the process to terminate. My process is a Tomcat server.
kill -9 pid
kill pid
Invoking kill command sends a signal to the process or process group.
When we invoke kill -9 PID, the signal sent to the process or process group is SIGKILL, which terminates it immediately and cannot be caught, blocked, or ignored.
When no signal is passed to kill viz. kill PID, the default signal is passed to kill.
The default signal for kill is TERM, and in such cases the command is interpreted as kill -15 PID.
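The mapping between signal names and numbers can be checked from the shell itself. A quick sketch, assuming bash's builtin kill on Linux:

```shell
#!/bin/bash
term_num=$(kill -l TERM)     # signal name to number
kill_name=$(kill -l 9)       # signal number to name
echo "TERM is signal $term_num; signal 9 is $kill_name"
```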
More detailed information on kill is surely available in Linux man pages.
Another good description is available in this document, which says:
The command kill sends the specified signal to the specified process
or process group. If no signal is specified, the TERM signal is sent.
The TERM signal will kill processes which do not catch this signal.
For other processes, it may be necessary to use the KILL (9) signal,
since this signal cannot be caught.
This means a plain kill PID is enough for any process that does not catch TERM, or that catches it and shuts down cleanly. Using -9 becomes necessary only when the process catches or ignores TERM and keeps running.
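A common pattern that follows from this is to send TERM first and fall back to KILL only after a grace period. The sketch below assumes bash and a procps-style ps; is_running and stop_gracefully are hypothetical helper names, and the demo process deliberately ignores TERM.

```shell
#!/bin/bash
# True while the PID exists and is not a zombie waiting to be reaped.
is_running() {
    local stat
    stat=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    [[ -n $stat && $stat != Z* ]]
}

# Send TERM, poll for up to $2 seconds (default 5), then send KILL.
stop_gracefully() {
    local pid=$1 grace=${2:-5} i
    kill -TERM "$pid" 2>/dev/null || return 0
    for ((i = 0; i < grace * 10; i++)); do
        is_running "$pid" || return 0
        sleep 0.1
    done
    kill -KILL "$pid" 2>/dev/null
}

# Demo: a process that ignores TERM, so the fallback is needed.
bash -c 'trap "" TERM; exec sleep 30' &
stubborn=$!
sleep 0.5                        # let the trap be installed first
stop_gracefully "$stubborn" 1
wait "$stubborn"
status=$?
echo "exit status: $status"      # 128 + signal number when killed by a signal
```

The zombie check matters when the target is your own child: an unreaped child still answers kill -0 even though it has already exited.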
I am working on a project in which I need to loosely recreate supervisord (a job control system) in D. I am using spawnShell() as opposed to spawnProcess() for ease of configuring arguments etc. This has the effect of running sh -c "command". However, it returns the PID of sh, NOT of the child process (for obvious reasons). This becomes a problem because my program needs to be able to send a SIGKILL to the process if it doesn't respond to a SIGTERM after a certain period of time. I am able to send a SIGTERM with no problem (presumably because sh catches the SIGTERM and passes it to its child process or processes before exiting). However, for again obvious reasons, SIGKILL stops sh before it gets a chance to send a signal to the child process, and the child is left orphaned. Which brings me to my questions:
A: Can I safely assume that the PID of the spawned process will always be one higher than the PID of sh? It has behaved as such in all my testing so far.
B: If not, then is there a more elegant way(system call or such) to get a child process's PID knowing only the parent process's PID than having my program just execute pgrep -P <sh PID>?
You just need:
sh -c 'exec command'
The shell replaces itself with your command and gets out of the way, so there is no intermediate process.
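A quick way to see the effect, sketched with sleep as a stand-in for the real command: after the exec, the PID returned for sh belongs to the command itself, so SIGKILL reaches it directly.

```shell
#!/bin/bash
sh -c 'exec sleep 30' &
pid=$!
sleep 0.2                                     # give sh a moment to exec
comm=$(ps -o comm= -p "$pid" | tr -d ' ')
echo "PID $pid is running: $comm"

kill -KILL "$pid"                             # hits the command itself, nothing is orphaned
wait "$pid" 2>/dev/null
```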
No, you cannot assume pids will differ by one.
Can I safely assume that the PID of the spawned process will always be one higher than the PID of sh? It has behaved as such in all my testing so far.
No. Linux is a multitasking OS. While rare, other processes could start in between. Don't rely on a race condition.
If not, then is there a more elegant way (system call or such) to get a child process's PID knowing only the parent process's PID than having my program just execute pgrep -P <sh PID>?
Not really. Trying to navigate the process tree is a sign that your approach is wrong.
You're solving the wrong problem. Get rid of the shell middle man.
I'm working with parallel processing and rather than dealing with cvars and locks I've found it's much easier to run a few commands in a shell script in sequence to avoid race conditions in one place. The new problem is that one of these commands calls another program, which the OS has decided to put into a new process. I need to kill this process from the parent program, but the parent program only knows the pid of the parent (shell script), so this process keeps executing on its own.
Is there a way in bash to set a subprocess to die when the parent dies? I've tried to figure out how to execute it as a daemon because I read daemons exit when the parent dies, but it's tricky and I can't quite get it right. Thanks!
Found the problem, and this fixed it (except for some pesky messages that somehow cannot be redirected to /dev/null).
trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT
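To illustrate how that trap behaves, here is a hedged sketch. It assumes the script is the leader of its own process group, which the example arranges by launching it under set -m; without that, kill -- -$$ may fail with "No such process". The temporary file names and the sleep child are illustrative.

```shell
#!/bin/bash
script=$(mktemp)
childfile=$(mktemp)

cat > "$script" <<'EOF'
# On INT, TERM or exit, signal the whole process group.
# The inner "trap - TERM" stops the handler from re-triggering itself.
trap 'trap - TERM; kill -- -$$' INT TERM EXIT
sleep 100 &
echo $! > "$1"
wait
EOF

set -m                            # job control: the script becomes a group leader
bash "$script" "$childfile" &
leader=$!
set +m

while [ ! -s "$childfile" ]; do sleep 0.1; done
child=$(cat "$childfile")

kill -TERM "$leader"              # the trap forwards this to the whole group
wait "$leader" 2>/dev/null
sleep 0.5                         # let the child finish dying

# Empty means the child is gone; Z means it exited and awaits reaping.
child_state=$(ps -o stat= -p "$child" 2>/dev/null | tr -d ' ')
echo "child state after TERM: '${child_state}'"
rm -f "$script" "$childfile"
```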
Is it possible to stop a PID from being reused?
For example if I run a job myjob in the background with myjob &, and get the PID using PID=$!, is it possible to prevent the linux system from re-using that PID until I have checked that the PID no longer exists (the process has finished)?
In other words I want to do something like:
myjob &
PID=$!
do_not_use_this_pid $PID
wait $PID
allow_use_of_this_pid $PID
The reasons for wanting to do this do not make much sense in the example given above, but consider launching multiple background jobs in series and then waiting for them all to finish.
Some programmer dude rightly points out that no 2 processes may share the same PID. That is correct, but not what I am asking here. I am asking for a method of preventing a PID from being re-used after a process has been launched with a particular PID. And then also a method of re-enabling its use later after I have finished using it to check whether my original process finished.
Since it has been asked for, here is a use case:
launch multiple background jobs
get PID's of background jobs
prevent PID's from being re-used by another process after background job terminates
check for PID's of "background jobs" - ie, to ensure background jobs finish
[note if disabled PID re-use for the PID's of the background jobs those PIDs could not be used by a new process which was launched after a background process terminated]*
re-enable PID of background jobs
repeat
*Further explanation:
Assume 10 jobs launched
Job 5 exits
New process started by another user, for example, they login to a tty
New process has same PID as Job 5!
Now our script checks for Job 5 termination, but sees PID in use by tty!
You can't "block" a PID from being reused by the kernel. However, I am inclined to think this isn't really a problem for you.
but consider launching multiple background jobs in series and then waiting for them all to finish.
A simple wait (without arguments) would wait for all the child processes to complete. So, you don't need to worry about the PIDs being reused.
When you launch several background process, it's indeed possible that PIDs may be reused by other processes.
But it's not a problem because you can't wait on a process unless it's your child process.
Otherwise, checking whether one of the background jobs you started has completed by any means other than wait is always going to be unreliable.
Until you have retrieved the exit status of the child process, it continues to exist in the kernel. That also means its PID is bound to it and can't be reused during that time.
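A short sketch of the wait-based approach for several background jobs, assuming bash. The subshells are stand-ins for real jobs and simply exit with a known status after a short delay.

```shell
#!/bin/bash
pids=()
for n in 1 2 3; do
    ( sleep "0.$n"; exit "$n" ) &      # stand-in job that exits with status n
    pids+=("$!")
done

statuses=()
for pid in "${pids[@]}"; do
    wait "$pid"                        # collects this child's exit status, even if it finished earlier
    statuses+=("$?")
done
echo "exit statuses: ${statuses[*]}"
```

Because only wait is used, the script never has to ask the process table whether a PID is still "its" process.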
Further suggestion to work around this - if you suspect that a PID assigned to one of your background jobs is reassigned, check it in ps to see if it still is your process with your executable and has PPID (parent PID) 1.
If you are afraid of reusing PID's, which won't happen if you wait as other answers explain, you can use
echo 4194303 > /proc/sys/kernel/pid_max
to decrease your fear ;-)
Restarting a service is often implemented via a PID file, i.e. the process ID is written to some file, and based on that number the stop command (or the stop step of a restart) kills the process.
When you think about it (or, if you prefer, search for it) you'll find that this is problematic, as every PID could be reused. Imagine a complete server restart where you call './your-script.sh start' at startup (e.g. @reboot in crontab). Now your-script.sh will kill an arbitrary PID because it has stored the PID from the life before the restart.
One workaround I can imagine is to store an additional piece of information, so that you could do 'ps -pid | grep ' and only kill the process if this returns something. Or are there better options in terms of reliability and/or simplicity?
#!/bin/bash
# PID_FILE is not set in the original snippet; this default is an assumption.
PID_FILE=${PID_FILE:-./somejar.pid}

function start() {
    nohup java -jar somejar.jar >> file.log 2>&1 &
    PID=$!
    # one could even store the full "ps -p $PID" output, but that makes the
    # killing too specific, e.g. if some arguments are added later
    echo "$PID somejar.jar" > "$PID_FILE"
}

function stop() {
    if [[ -f "$PID_FILE" ]]; then
        PID=$(cut -f1 -d' ' "$PID_FILE")
        # now get the second field and grep the process's command line for it
        PID_INFO=$(cut -f2 -d' ' "$PID_FILE")
        RES=$(ps -p "$PID" -o args= | grep -F -- "$PID_INFO")
        if [[ -n "$RES" ]]; then
            kill "$PID"
        fi
    fi
}
The problem with PID files is multifold, not just limited to recycling and reboot.
The bigger issue is the fact that there is an unavoidable disconnect/race between the information in the PID file and the state of the process.
This is the flow of using PID files:
1. You fork & exec a process. The "parent" process knows the PID of the fork and has guarantees that this PID is reserved exclusively for his fork.
2. Your parent writes the PID of the fork to a file.
3. Your parent dies, along with it the guarantee about PID exclusivity.
4. A different process reads the number in the PID file.
5. The different process checks whether there is a process on the system with the same PID as the one he read.
6. The different process sends a signal to the process with the PID he read.
In (1) everything is fine and dandy. We have a PID and we are guaranteed by the kernel that the number is reserved for our intended process.
In (2) you are yielding control of the PID to other processes that do not have this guarantee. In itself not an issue, but such an act is rarely if ever without fault.
In (3) your parent process dies. It alone had the kernel guarantee on PID exclusivity. It may or may not have done a wait(2) on the PID. The true status of the intended process is lost, all we have left is an identifier in the PID file which may or may not refer to the intended process.
In (4) a process without any guarantees reads the PID file, any use of this number has only arbitrary success.
In (5) a process without any guarantees actually uses the identifier for something, this is the first point where we're actually doing something bad: we're querying the kernel using a process identifier that may or may not refer to the intended process. The answer we'll get back will be on the state of the process with that PID, not necessarily of our intended process at all.
In (6) we make the worst mistake: we're actually performing a mutating action, intended to impact our initially started process but by no means guaranteeing that intent. We could be signalling any random system process instead.
Why is this? What kind of stuff can happen to mess with the PID?
Anywhere after (1), the real process may die. So long as the parent retains his guarantee on the PID's exclusivity, the kernel will not recycle the PID. It will still exist and refer to what used to be your process (we call this a "zombie" process, your real process died but the PID is still reserved for it alone). No other process can use this PID and signalling it will not reach any process at all.
As soon as the parent releases his guarantee or after (3), the kernel recycles the PID of the dead process. The zombie is gone and the PID now free to be used by any other new process that is forked. Say you're compiling something, thousands of small processes get spawned. The kernel picks random or sequential (depending on its configuration) new PIDs for each. You're done, now you restart apache. The kernel reuses the freed PID of your dead process for something important.
The PID file still contains the PID, though. Any process that reads the PID file (4) is assuming that this number refers to your long dead process.
Any action (5) (6) you take with the number you read will target the new process, not the old one.
Not only that, but you cannot perform any check prior to your action since there is an unavoidable race between any check you can perform and any action you can perform. If you first look at ps to see what the "name" of your process is (not that this is a really awesome guarantee of anything, please don't do this), and then signal it, the time between your ps check and your signal could still have seen the process die, and/or get recycled by a new process. The root of all of these problems is that the kernel is not giving you any exclusive use guarantees on the PID, since you are not its parent.
Moral of the story: Do NOT give the PID of your children to anyone else. The parent and only the parent should use it, because he is the only one on the system (save the kernel) with any guarantees on its existence and identity.
This usually means keeping the parent alive and instead of signalling something to terminate the process, talking to the parent instead; by means of sockets or the like. See http://smarden.org/runit/ et al.
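As a rough illustration of "keep the parent alive", here is a minimal bash sketch in which the parent is the only process that ever signals the child. The supervise function is a hypothetical name, sleep stands in for the real service, and stop requests are delivered as TERM to the supervisor rather than over a socket, which is a simplification of what tools like runit do.

```shell
#!/bin/bash
# Start the command as our own child and forward stop requests to it.
supervise() {
    "$@" &
    child=$!
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT
    wait "$child"; status=$?
    # wait returns early when a trapped signal arrives; wait again to reap the child.
    if kill -0 "$child" 2>/dev/null; then
        wait "$child"; status=$?
    fi
    trap - TERM INT
    return "$status"
}

# Demo: run the supervisor in the background and ask it to stop its child.
supervise sleep 100 &
sup=$!
sleep 0.5                      # let the supervisor install its trap
kill -TERM "$sup"              # talk to the parent, never to the child's PID
wait "$sup"
status=$?
echo "supervisor exit status: $status"
```

No PID ever leaves the parent here, so there is no stale number for another process to act on.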
As an alternative to runit there is the daemon command from the libslack library that can automatically respawn the client program when it terminates - without using a PID file.
Using a named daemon with the daemon command allows you to manually restart the client program; this, however, will create a PID file which may lead to race conditions as already pointed out by lhunath.
# daemon example without PID file
daemon --respawn --acceptable=10 --delay=10 bash -- -c 'sleep 30'
# from: man daemon
# "If started with the --respawn option, the client process
# will be restarted after it is killed by the SIGTERM signal."
#
# (Problem would be to reliably get e.g. the bash pid in the daemon example above.)