kill & wait in one step - linux

If I want to kill a child process in a script and wait for its termination, I use
kill $PID
wait $PID
If the process exits immediately, the wait will fail, since the PID is not running anymore.
Is there a way to combine both statements into a single one to avoid the error?
Edit: The process I have to kill uses a tempfile, so it has to have actually exited (and not just been signaled to exit) before it can be started again. Checking the return value of kill does not help, since that only indicates whether the signal was delivered successfully.

It's not a one-liner, but would you be willing to consider spawning off the kill with a short sleep, then waiting in the main thread? Something like:
(sleep 1; kill $PID) &
wait $PID
This addresses your concern about the PID being reused after the kill. Even if you reduce the sleep to something much smaller it introduces some idle time, but it should at least ensure that you wait on the correct process.

Effectively, there is no atomic kill-and-wait, since they are two separate system calls; some wrapper has to be written to execute both.
If you do not care about the status,
kill $PID 2> /dev/null
wait $PID 2> /dev/null
If you do care about the status, but do not want an error message, do something like
if ! kill $PID 2> /dev/null; then
    # Error logic here
fi
The fancy way for both is a function
killAndWait() {
    kill $1 2> /dev/null && wait $1 2> /dev/null
}
Then, do
killAndWait $PID

kill $PID
wait $PID
If the process exits immediately, the wait will fail, since the pid is not running anymore.
As long as $PID really points to a child process of the shell, I don't think wait will fail. I don't see an error with your code.
Experiment:
bash-3.2$ while : ; do ls > /dev/null ; done &
[1] 44908
bash-3.2$ kill 44908
[1]+ Terminated: [...]
bash-3.2$ wait 44908
bash-3.2$ echo $?
143
143 is the exit status of a process terminated by SIGTERM (128 + 15), so the kill worked as expected, and Bash could still wait for the dead process.
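As an aside, bash's kill -l builtin can map such an exit status back to a signal name, which is handy when checking $? after wait:
status=143
kill -l "$status"   # prints TERM, because 143 = 128 + 15 (SIGTERM)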

Related

Killing process started by bash script but not script itself

So basically I have one script that keeps a server alive. It starts the server process and then starts it again after the process stops. However, sometimes the server becomes unresponsive. For that I want to have another script which would ping the server and kill the process if it doesn't respond within 60 seconds.
The problem is that if I kill the server process the bash script also gets terminated.
The start script is basically just a while loop around sh Server.sh. That script in turn calls another shell script with additional parameters for starting the server. The server is written in Java, so it starts a java process. If the server hangs I use kill -9 pid because nothing else stops it. If the server doesn't hang and does the usual restart, it stops gracefully and the bash script starts the next loop iteration.
Doing The Right Thing
Use a real process supervision system -- your Linux distribution almost certainly includes one.
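For instance, a minimal sketch of a systemd service unit that restarts the server whenever it exits (assuming systemd is your init system; the unit name, description, and install path are placeholders, not taken from the question):
# /etc/systemd/system/myserver.service -- hypothetical name and path
[Unit]
Description=Game server

[Service]
ExecStart=/opt/server/Server.sh
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
With something like that in place, stopping or restarting the server is just systemctl stop myserver / systemctl restart myserver, and no pidfile bookkeeping is needed.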
Directly monitoring the supervised process by PID
An awful, ugly, moderately buggy approach (for instance, able to kill the wrong process in the event of a PID collision) is the following:
while :; do
    ./Server.sh & server_pid=$!
    echo "$server_pid" > server.pid
    wait "$server_pid"
done
...and, to kill the process:
#!/bin/bash
# ^^^^ - DO NOT run this with "sh scriptname"; it must be "bash scriptname".
server_pid="$(<server.pid)"; [[ $server_pid ]] || exit
# allow 5 seconds for clean shutdown -- adjust to taste
for (( i=0; i<5; i++ )); do
    if kill -0 "$server_pid"; then
        sleep 1
    else
        exit 0 # server exited gracefully, nothing else to do
    fi
done
# escalate to a SIGKILL
kill -9 "$server_pid"
Note that we're storing the PID of the server in our pidfile, and killing that directly -- thus, avoiding inadvertently targeting the supervision script.
Monitoring the supervised process and all children via lockfile
Note that this uses some Linux-specific tools -- but your question is tagged linux.
A more robust approach -- which will work across reboots even in the case of pidfile reuse -- is to use a lockfile:
while :; do
    flock -x Server.lock sh Server.sh
done
...and, on the other end:
#!/bin/bash
# kill all programs having a handle on Server.lock
fuser -k Server.lock
for ((i=0; i<5; i++)); do
    if fuser -s Server.lock; then
        sleep 1
    else
        exit 0
    fi
done
fuser -k -KILL Server.lock

Kill a process and wait for the process to exit

When I start my tcp server from my bash script, I need to kill the previous instance (which may still be listening to the same port) right before the current instance starts listening.
I could use something like pkill <previous_pid>. If I understand it correctly, this just sends SIGTERM to the target pid. When pkill returns, the target process may still be alive. Is there a way to let pkill wait until it exits?
No. What you can do is write a loop with kill -0 $PID. If this call fails ($? -ne 0), the process has terminated (after your normal kill):
while kill -0 $PID; do
    sleep 1
done
(kudos to qbolec for the code)
Related:
What does `kill -0 $pid` in a shell script do?
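In short: signal 0 is never actually delivered; kill -0 only reports, through its exit status, whether the PID exists and you are allowed to signal it. A minimal sketch:
if kill -0 "$PID" 2>/dev/null; then
    echo "process $PID is still running"
else
    echo "process $PID has exited (or is not ours to signal)"
fi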
Use wait (a bash builtin) to wait for the process to finish (note that wait only works on children of the current shell):
pkill <previous_pid>
wait <previous_pid>
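If the previous instance is not a child of the current shell, wait will not accept its PID, so you have to fall back to polling. A hedged sketch combining both answers (the function name is mine):
kill_and_await() {
    # send SIGTERM, then poll until the process is really gone
    local pid=$1
    kill "$pid" 2>/dev/null || return 0       # already gone (or not ours to signal)
    while kill -0 "$pid" 2>/dev/null; do      # wait only works for children, so poll instead
        sleep 1
    done
}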

How to kill a child process after a given timeout in Bash?

I have a bash script that launches a child process that crashes (actually, hangs) from time to time, for no apparent reason (it's closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process for a given amount of time, and kill it if it does not return successfully within that time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
If you don't mind installing something, use timeout, which is part of GNU coreutils (most systems have it installed already; otherwise use sudo apt-get install coreutils), and use it like:
timeout 10 ping www.goooooogle.com
If you don't want to download something, do what timeout does internally:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
In case that you want to do a timeout for longer bash code, use the second option as such:
( cmdpid=$BASHPID;
  (sleep 10; kill $cmdpid) &
  while ! ping -w 1 www.goooooogle.com
  do
      echo crap;
  done )
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or to get the exit codes as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and return the exitcode
exitcode=$(wait $pid && echo $?)
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, cause that means the process finished before the waiter
finished_gracefully=$?
sleep 999 &       # the long-running child (stand-in for the real command)
t=$!
sleep 10          # give it 10 seconds
kill $t
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS = 10 ]; then kill $PING_PID; fi; done
As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)
(BTW: ping is a bad example anyway, you just would use the built-in "-t" (timeout) option.)
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
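A rough sketch of that idea; everything here (paths, the pidfile name, and the assumption that the child touches its pidfile as a heartbeat) is hypothetical rather than taken from the question:
#!/bin/bash
# watchdog.sh -- hypothetical: kill/respawn the child if its pidfile looks stale
PIDFILE=/var/run/mychild.pid            # assumed location
MAX_AGE=60                              # seconds without a heartbeat before we consider it hung

restart_child() {
    /usr/local/bin/mychild &            # assumed command for the child process
    echo $! > "$PIDFILE"
}

if [ -f "$PIDFILE" ]; then
    pid=$(cat "$PIDFILE")
    age=$(( $(date +%s) - $(stat -c %Y "$PIDFILE") ))   # GNU stat; use stat -f %m on BSD/macOS
    if [ "$age" -gt "$MAX_AGE" ] || ! kill -0 "$pid" 2>/dev/null; then
        kill "$pid" 2>/dev/null
        restart_child
    fi
else
    restart_child
fi
Then a crontab entry such as * * * * * /usr/local/bin/watchdog.sh runs the check roughly once a minute.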
One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (which possibly only works on Linux). There is also a problem with using a pipe, in that a process opening a pipe for reading will hang until it is also opened for writing, and vice versa. So to keep the read command from hanging, I've "wedged" the pipe open with a background subshell that holds a write end open. (Another way to prevent the freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe - however, that also may not work except on Linux.)
rm -f finished.pipe
mkfifo finished.pipe
{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!
# Get command PID
while : ; do
    PID=$( pgrep -P $SUBSHELL yes )
    test "$PID" = "" || break
    sleep 1
done
# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done } &
read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
echo 'Subprocess finished'
else
echo 'Subprocess timed out'
kill $PID
fi
rm finished.pipe
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
    t=$1
    shift
    echo "running \"$*\" with timeout $t"
    (
        # first, run process in background
        (exec sh -c "$*") &
        pid=$!
        echo $pid
        # the timeout shell
        (sleep $t ; echo timeout) &
        waiter=$!
        echo $waiter
        # finally, allow process to end naturally
        wait $pid
        echo $?
    ) \
    | (
        read pid
        read waiter
        if test $waiter != timeout ; then
            read status
        else
            status=timeout
        fi
        # if we timed out, kill the process
        if test $status = timeout ; then
            kill $pid
            exit 99
        else
            # if the program exited normally, kill the waiting shell
            kill $waiter
            exit $status
        fi
    )
}
Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
    t=$1 ; shift
    trap cleanup 2
    F=$$.fifo ; rm -f $F ; mkfifo $F
    # first, run main process in background
    "$@" & pid=$!
    # sleeper process to time out
    ( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
    read sleeper <$F
    # control shell. read from fifo.
    # final input is "finished". after that
    # we clean up. we can get a timeout or a
    # signal first.
    ( exec 0<$F
      while : ; do
          read input
          case $input in
              finished)
                  test $sleeper != 0 && kill $sleeper
                  rm -f $F
                  exit 0
                  ;;
              timeout)
                  test $pid != 0 && kill $pid
                  sleeper=0
                  ;;
              signal)
                  test $pid != 0 && kill $pid
                  ;;
          esac
      done
    ) &
    # wait for process to end
    wait $pid
    status=$?
    echo finished >$F
    return $status
}
cleanup ()
{
    echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
#Kill command after 10 seconds
timeout 10 command
#If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
#The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'

Sleep in a while loop gets its own pid

I have a bash script that does some parallel processing in a loop. I don't want the parallel process to spike the CPU, so I use a sleep command. Here's a simplified version.
(while true;do sleep 99999;done)&
So I execute the above line from a bash prompt and get something like:
[1] 12345
Where [1] is the job number and 12345 is the process ID (pid) of the while loop. I do a kill 12345 and get:
[1]+ Terminated ( while true; do
sleep 99999;
done )
It looks like the entire script was terminated. However, I do a ps aux|grep sleep and find the sleep command is still going strong but with its own pid! I can kill the sleep and everything seems fine. However, if I were to kill the sleep first, the while loop starts a new sleep pid. This is such a surprise to me since the sleep is not parallel to the while loop. The loop itself is a single path of execution.
So I have two questions:
Why did the sleep command get its own process ID?
How do I easily kill the while loop and the sleep?
sleep gets its own PID because it is a separate program that runs as its own process and just waits. Try which sleep to see where the binary lives.
You can use ps -uf to see the process tree on your system. From there you can determine the PPID (parent PID) of the sleep, which is the shell running the loop.
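You can also ask the shell itself: type reports whether a name is a builtin or an external binary (the exact path shown varies by system):
type sleep   # typically "sleep is /bin/sleep" - an external program, hence its own PID
type wait    # "wait is a shell builtin" - runs inside the shell, no new process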
Because "sleep" is a process, not a build-in function or similar
You could do the following:
(while true;do sleep 99999;done)&
whilepid=$!
kill -- -$whilepid
The above code kills the process group, because the PID is specified as a negative number (e.g. -123 instead of 123). In addition, it uses the variable $!, which stores the PID of the most recently started background process.
Note:
When you execute a process in the background in interactive mode (i.e. from the command-line prompt), it is placed in a new process group, which is what is happening to you. In that case it's relatively easy to "kill 'em all", because you just have to kill the whole process group. However, when the same is done within a script, no new group is created, because all new processes belong to the script's process group, even if they are executed in the background (job control is disabled by default). To enable job control in a script, just put the following at the beginning of the script:
#!/bin/bash
set -m
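A short sketch of how that looks inside a script (the variable name is mine):
#!/bin/bash
set -m                                    # job control on: each background job gets its own process group
(while true; do sleep 99999; done) &
loop_pgid=$!                              # with set -m, the subshell's PID is also its process-group ID
# ... later, kill the loop and its current sleep in one go:
kill -- -"$loop_pgid"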
Have you tried doing kill %1, where 1 is the number you get after launching the command in background?
I did it right now after launching (while true;do sleep 99999;done)& and it correctly terminated it.
"ps --ppid" selects all processes with the specified parent pid, eg:
$ (while true;do sleep 99999;done)&
[1] 12345
$ ppid=12345 ; kill -9 $ppid $(ps --ppid $ppid -o pid --no-heading)
You can kill the process group.
To find the process group of your process run:
ps --no-headers -o "%r" -p 15864
Then kill the process group using:
kill -- -[PGID]
You can do it all in one command. Let's try it out:
$ (while true;do sleep 99999;done)&
[1] 16151
$ kill -- -$(ps --no-headers -o "%r" -p 16151)
[1]+ Terminated ( while true; do
sleep 99999;
done )
To kill the while loop and the sleep using $! you can also use a trap signal handler inside the subshell.
(trap 'kill ${!}; exit' TERM; while true; do sleep 99999 & wait ${!}; done)&
kill -TERM ${!}

Limiting the time a program runs in Linux

In Linux I would like to run a program but only for a limited time, like 1 second. If the program exceeds this running time I would like to kill the process and show an error message.
Ah well. timeout(1).
DESCRIPTION
Start COMMAND, and kill it if still running after DURATION.
StackOverflow won't allow me to delete my answer since it's the accepted one. It's garnering down-votes since it's at the top of the list with a better solution below it. If you're on a GNU system, please use timeout instead, as suggested by @wRAR. So in the hopes that you'll stop down-voting, here's how it works:
timeout 1s ./myProgram
You can use s, m, h or d for seconds (the default if omitted), minutes, hours or days. A nifty feature here is that you may specify another option -k 30s (before the 1s above) in order to kill it with a SIGKILL after another 30 seconds, should it not respond to the original SIGTERM.
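Putting the two together:
timeout -k 30s 1s ./myProgram   # SIGTERM after 1 second; escalate to SIGKILL 30 seconds later if it is still alive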
A very useful tool. Now scroll down and up-vote @wRAR's answer.
For posterity, this was my original - inferior - suggestion; it might still be of some use for someone.
A simple bash-script should be able to do that for you
./myProgram &
sleep 1
kill $! 2>/dev/null && echo "myProgram didn't finish"
That ought to do it.
$! expands to the PID of the last backgrounded process (started with &), and kill returns false if it didn't kill any process, so the echo is only executed if it actually killed something.
2>/dev/null redirects kill's stderr, otherwise it would print something telling you it was unable to kill the process.
You might want to add a -KILL or whichever signal you want to use to get rid of your process too.
EDIT
As ephemient pointed out, there's a race here: if your program finishes and some other process snatches its PID, that process will get killed instead. To reduce the probability of that happening, you can react to SIGCHLD and not attempt the kill if the child has already exited. There's still a chance of killing the wrong process, but it's very remote.
trapped=""
trap 'trapped=yes' SIGCHLD
./myProgram &
sleep 1
[ -z "$trapped" ] && kill $! 2>/dev/null && echo '...'
Maybe CPU time limit (ulimit -t/setrlimit(RLIMIT_CPU)) will help?
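For completeness, a sketch of what that looks like from the shell; note that this limits CPU time, not wall-clock time, so a program that sleeps or blocks on I/O will not be affected:
( ulimit -t 1; ./myProgram )   # run in a subshell so the limit applies only to this command;
                               # the kernel signals the process (SIGXCPU, then SIGKILL) once it has used 1 CPU-second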
you could launch it in a shell script using &
your_program &
pid=$!
sleep 1
if kill -0 $pid 2>/dev/null
then
    kill $pid
    echo "killed $pid because it took too long."
fi
Hope you get the idea; I'm not sure this is correct, my shell skills need some refreshing :)
tail -f file & pid=$!
sleep 10
kill $pid 2>/dev/null && echo '...'
If you have the sources, you can fork() early in main() and then have the parent process measure the time and possibly kill the child process. Just use standard system calls fork(), waitpid(), kill(), ... maybe some standard Unix signal handling. Not too complicated but takes some effort.
You can also script something on the shell although I doubt it will be as accurate with respect to the time of 1 second.
If you just want to measure the time, type time <cmd ...> on the shell.
Ok, so just write a short C program that forks, calls execlp or something similar in the child, measures the time in the parent and kills the child. Should be easy ...
