Kill a process and wait for the process to exit - linux

When I start my tcp server from my bash script, I need to kill the previous instance (which may still be listening to the same port) right before the current instance starts listening.
I could use something like pkill <previous_pid>. If I understand it correctly, this just sends SIGTERM to the target pid. When pkill returns, the target process may still be alive. Is there a way to let pkill wait until it exits?

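No, pkill cannot wait. What you can do is poll in a loop with kill -0 $PID, which sends no signal and only checks whether the process still exists. Once this call fails ($? -ne 0), the process has terminated (after your normal kill):
while kill -0 "$PID" 2>/dev/null; do
    sleep 1
done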
(kudos to qbolec for the code)
Related:
What does `kill -0 $pid` in a shell script do?
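The polling loop above can be wrapped in a function with an upper bound on the wait. The following is only a sketch under stated assumptions: the function name terminate_and_wait, the 10-second default timeout, and the escalation to SIGKILL are illustrative choices, not part of the original answer. Polling by PID also cannot rule out the PID being reused by an unrelated process in the meantime.

```shell
# Hypothetical helper: send SIGTERM, then poll with `kill -0` until the
# process is gone or the timeout (in seconds) expires.
# Returns 0 if the process exited in time, 1 if SIGKILL had to be sent.
terminate_and_wait() {
    local pid=$1 timeout=${2:-10} waited=0

    # If the signal cannot be delivered, the process is already gone
    # (or belongs to another user); there is nothing to wait for.
    kill "$pid" 2>/dev/null || return 0

    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null   # assumption: escalate after the timeout
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A start script could call terminate_and_wait "$previous_pid" 10 right before launching the new server instance.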

Use wait (a bash builtin) to wait for the process to finish. Note that wait only works if the process is a child of the current shell:
kill <previous_pid>
wait <previous_pid>

Related

How to catch SIGINT within a Bash subshell

If I run a command, such as grep, at the command line and hit ^C, the command is properly killed (with SIGINT I think). And if I run the grep in background and then run a kill SIGINT on its PID, it similarly gets terminated. But if I'm inside a script and run grep in background from the script, get its PID and then use 'kill -s SIGINT $PID', grep does not get killed. Why? If I use SIGTERM, instead of SIGINT, the kill does work.
#!/bin/bash
grep -rqa shazam /usr &
PID=$!
kill -s SIGINT $PID
Even if I put the grep in a subprocess, preceded by a SIGINT handler (in the subprocess), and hit the subprocess with SIGINT, the handler is not invoked.
#!/bin/bash
( trap 'echo "caught signal"' SIGINT; grep -rqa shazam /usr ) &
PID=$!
kill -s SIGINT $PID
The trap handler is invoked if I use SIGTERM, instead of SIGINT, but does not interrupt grep. If I add '/bin/kill -s SIGTERM 0' to the trap handler, there is an indication that the grep process gets terminated, but grep has already completed its work by then. I realize that Bash may have different default behaviors for the different signals, but I don't understand why my call to kill SIGINT is different than a ^C, why the trap call works for SIGTERM, but not for SIGINT, nor why SIGTERM isn't handled by the subprocess immediately.
Well, with further digging, I figured out 2 of my 3 questions. When I backgrounded grep within the script, the shell told it to ignore SIGINT. And Bash says it will wait to handle the signal until the subcommand is complete in some situations (which I don't fully follow at the moment), but the signal is handled immediately if I hit the grep process directly with pkill.
"Actually bash will disable SIGINT (and SIGQUIT) on background processes and they can't be enabled" Background process and signals How SIGINT works
"Further background jobs are not supposed to be tied to the shell that started them. If you exit a shell, they will continue running. As such they shouldn't be interrupted by SIGINT, not by default. When job control is enabled, that is fulfilled automatically, since background jobs are running in separate process groups. When job control is disabled (generally in non-interactive shells), bash makes the asynchronous commands ignore SIGINT." Independent Program
Reason why SIGTERM works

kill & wait in one step

To kill a child process from a script and wait for its termination, I use
kill $PID
wait $PID
If the process exits immediately, the wait will fail, since the pid is not running anymore.
Is there a way to combine both statements into a single one to avoid the error?
Edit: The process I have to kill uses a tempfile; thus it has to be closed (and not just signaled to close) to start it again. Checking the return value of kill does not help, since this indicates whether the signal was delivered successfully.
It's not a one-liner, but would you be willing to consider spawning off the kill with a short sleep, then waiting in the main thread? Something like:
(sleep 1; kill $PID) &
wait $PID
This addresses your concern of the PID being reused after the kill. Even if you reduce the sleep to something much smaller, it introduces idle time, but it should at least ensure that you wait on the correct process.
Effectively, there is not an atomic kill and wait since they are two separate system calls. Some wrapper must be written to execute both functions.
If you do not care about the status,
kill $PID 2> /dev/null
wait $PID 2> /dev/null
If you do care about the status, but do not want an error message, do something like
if ! kill $PID 2> /dev/null; then
    # Error logic here
fi
The fancy way for both is a function
killAndWait() {
    # Quote the argument so an empty or malformed PID is not word-split
    kill "$1" 2> /dev/null && wait "$1" 2> /dev/null
}
Then, do
killAndWait $PID
kill $PID
wait $PID
If the process exits immediately, the wait will fail, since the pid is not running anymore.
As long as $PID really points to a child process of the shell, I don't think wait will fail. I don't see an error with your code.
Experiment:
bash-3.2$ while : ; do ls > /dev/null ; done &
[1] 44908
bash-3.2$ kill 44908
[1]+ Terminated: [...]
bash-3.2$ wait 44908
bash-3.2$ echo $?
143
143 is 128 + 15, the exit status the shell reports for a process terminated by SIGTERM (signal 15), so the kill worked as expected, and Bash could wait for the dead process.
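The status returned by wait can be decoded back into a signal name with kill -l. A minimal sketch, assuming the PID belongs to a child of the current shell; the function name kill_and_report is hypothetical.

```shell
# Hypothetical helper: kill a child, wait for it, and report how it ended.
# A wait status above 128 means "terminated by signal (status - 128)".
kill_and_report() {
    local pid=$1 status

    kill "$pid" 2>/dev/null
    wait "$pid" 2>/dev/null
    status=$?

    if [ "$status" -gt 128 ]; then
        # `kill -l` maps an exit status back to the signal name
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
}
```

The function has to be called in the shell that started the process (for example sleep 30 & followed by kill_and_report "$!"), not inside a command substitution, because wait cannot see children of a different shell.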

Killing a terminal-attached process that doesn't respond to SIGINT, SIGQUIT

Sometimes both Ctrl-C (SIGINT) and Ctrl-\ (SIGQUIT) are too weak. Is there a way to do a more aggressive kill (e.g. kill -9) on the currently-attached process using a quick keyboard shortcut?
If you are a zsh user, you can send SIGTERM with this in .zshrc
function terminate-current-job() { kill -s TERM %+ ; }
zle -N terminate-current-job terminate-current-job
bindkey "^T" terminate-current-job
That binds CTRL-T to the previously defined widget/function.
If you are having problems with a specific command not responding to CTRL-C (because it ignores SIGINT, or because it asked the terminal driver to no longer recognise it as an interrupt character), you can try wrapping it in rlwrap:
rlwrap -a -I <command>
rlwrap will catch the SIGINT sent by the terminal driver when you press CTRL-C and send a SIGTERM to <command> instead.
Of course, <command> may catch, or even ignore SIGTERM as well, but many commands that ignore SIGINT will respond to SIGTERM - while still being able to clean up before they terminate, in contrast to what happens when you use SIGKILL (kill -9)
As proposed in the other answers, you can try to kill the process by catching some other signal. This can also be done with the bash builtin trap command, which executes a command when the shell receives a given signal.
To KILL your executable when SIGINT (CTRL-C) is caught, you need to start it like this:
yourexecutable & pid=$! ; trap 'echo KILL ; kill -9 $pid' INT ; echo WAIT $pid ; wait $pid ; echo DONE
Note that the echos are just for debugging purposes, they can simply be removed if you don't need them.

How to make sure that a process was killed? (using kill command)

I try to kill a process with the kill command in linux. (not using -9 as argument)
I need to make sure that the process is really killed.
As far as I know, the kill command runs asynchronously and it can take some time till it is finished.
I need to make sure, after I run the kill that my process has died using bash
Can you please assist?
Thanks!!!
Killing a process with signal 0 will check if the process is still running, and not actually kill it. Just check the return code.
Assuming $PID holds the pid of your process, you could do something like this:
kill "$PID"
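# Test kill's exit status directly; it prints nothing on success, so
# wrapping it in [ $(...) ] would always be false and skip the loop
while kill -0 "$PID" 2>/dev/null; do
    sleep 1
done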
echo "Process is killed"
kill is used to send signals to processes. It doesn't necessarily terminate the process (but it usually does). kill without an explicitly mentioned signal sends SIGTERM to the process. The default action on SIGTERM is to terminate the process, but a process can set up its own signal handler, in which case it might not be terminated.
What I think you need is a way to find out whether the process has handled the signal or not. This can be done using ps s $PID. If this shows 0s as the pending mask, the process has received the signal and processed it.
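On Linux, whether a process ignores or catches a given signal can also be read from the SigIgn and SigCgt bit masks in /proc/PID/status. The following is a sketch that assumes a Linux /proc filesystem; the function name signal_disposition is hypothetical, and the signal is passed by number (15 for SIGTERM).

```shell
# Hypothetical helper (Linux only): print "ignored", "caught" or "default"
# for the given signal number, based on the hex masks in /proc/PID/status.
# Bit (n - 1) of each mask corresponds to signal number n.
signal_disposition() {
    local pid=$1 sig=$2 status_file ignored caught
    status_file="/proc/$pid/status"

    [ -r "$status_file" ] || { echo "unknown"; return 1; }

    ignored=$(awk '/^SigIgn:/ {print $2}' "$status_file")
    caught=$(awk '/^SigCgt:/ {print $2}' "$status_file")

    if [ $(( (0x$ignored >> (sig - 1)) & 1 )) -eq 1 ]; then
        echo "ignored"
    elif [ $(( (0x$caught >> (sig - 1)) & 1 )) -eq 1 ]; then
        echo "caught"
    else
        echo "default"
    fi
}
```

For example, signal_disposition "$PID" 15 indicates whether a plain kill "$PID" can be expected to terminate the process ("default") or whether the process may survive it ("ignored" or "caught").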

Sleep in a while loop gets its own pid

I have a bash script that does some parallel processing in a loop. I don't want the parallel process to spike the CPU, so I use a sleep command. Here's a simplified version.
(while true;do sleep 99999;done)&
So I execute the above line from a bash prompt and get something like:
[1] 12345
Where [1] is the job number and 12345 is the process ID (pid) of the while loop. I do a kill 12345 and get:
[1]+ Terminated ( while true; do
sleep 99999;
done )
It looks like the entire script was terminated. However, I do a ps aux|grep sleep and find the sleep command is still going strong but with its own pid! I can kill the sleep and everything seems fine. However, if I were to kill the sleep first, the while loop starts a new sleep pid. This is such a surprise to me since the sleep is not parallel to the while loop. The loop itself is a single path of execution.
So I have two questions:
Why did the sleep command get its own process ID?
How do I easily kill the while loop and the sleep?
Sleep gets its own PID because it is a process running and just waiting. Try which sleep to see where it is.
You can use ps -uf to see the process tree on your system. From there you can determine the PPID (parent PID) of the sleep, which is the PID of the subshell running the loop.
Because "sleep" is a process, not a built-in function or similar
You could do the following:
(while true;do sleep 99999;done)&
whilepid=$!
kill -- -$whilepid
The above code kills the process group, because the PID is specified as a negative number (e.g. -123 instead of 123). In addition, it uses the variable $!, which stores the PID of the most recently started background job.
Note:
When you execute any process in the background in interactive mode (i.e. using the command line prompt), it creates a new process group, which is what is happening to you. That way, it's relatively easy to "kill 'em all", because you just have to kill the whole process group. However, when the same is done within a script, it doesn't create any new group, because all new processes stay in the script's process group, even if they are executed in the background (job control is disabled by default). To enable job control in a script, you just have to put the following at the beginning of the script:
#!/bin/bash
set -m
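Putting the note and the kill -- -PID call together, a script could look roughly like this. It is a sketch under the assumption that bash is used and that enabling job control inside a subshell is acceptable; the function name kill_loop_and_children is hypothetical.

```shell
# Hypothetical demo: with job control enabled, the background loop gets its
# own process group, so one kill on the negative PID reaches the loop and
# its sleep. The function body is a subshell so `set -m` stays local to it.
kill_loop_and_children() (
    set -m
    ( while true; do sleep 99999; done ) &
    loop_pid=$!

    kill -- "-$loop_pid"            # negative PID: signal the whole group
    wait "$loop_pid" 2>/dev/null    # reap the loop and collect its status
    echo "loop wait status: $?"
)
```

Afterwards, ps aux | grep sleep should no longer list the sleep 99999 that the loop had started.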
Have you tried doing kill %1, where 1 is the number you get after launching the command in background?
I did it right now after launching (while true;do sleep 99999;done)& and it correctly terminated it.
"ps --ppid" selects all processes with the specified parent pid, eg:
$ (while true;do sleep 99999;done)&
[1] 12345
$ ppid=12345 ; kill -9 $ppid $(ps --ppid $ppid -o pid --no-heading)
You can kill the process group.
To find the process group of your process run:
ps --no-headers -o "%r" -p 15864
Then kill the process group using:
kill -- -[PGID]
You can do it all in one command. Let's try it out:
$ (while true;do sleep 99999;done)&
[1] 16151
$ kill -- -$(ps --no-headers -o "%r" -p 16151)
[1]+ Terminated ( while true; do
sleep 99999;
done )
To kill the while loop and the sleep using $! you can also use a trap signal handler inside the subshell.
(trap 'kill ${!}; exit' TERM; while true; do sleep 99999 & wait ${!}; done)&
kill -TERM ${!}

Resources