Is there any builtin feature in Bash to wait for a process to finish?
The wait command only allows one to wait for child processes to finish.
I would like to know if there is any way to wait for any process to finish before proceeding in any script.
A mechanical way to do this is as follows but I would like to know if there is any builtin feature in Bash.
while ps -p `cat $PID_FILE` > /dev/null; do sleep 1; done
To wait for any process to finish
Linux (doesn't work on Alpine, where the BusyBox tail doesn't support --pid):
tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1 &>/dev/null
With timeout (seconds)
Linux:
timeout $timeout tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
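For the Linux variant, the exit status of timeout tells you which of the two events ended the wait. A minimal sketch, assuming GNU coreutils (timeout, plus a tail that supports --pid); the name wait_pid_gnu is made up for this example:

```shell
# Sketch: wait for a PID with a deadline, assuming GNU coreutils.
# wait_pid_gnu is a hypothetical helper name, not a standard command.
wait_pid_gnu() {
    local pid=$1 limit=$2
    # tail exits once $pid is gone; timeout stops tail after $limit seconds.
    timeout "$limit" tail --pid="$pid" -f /dev/null
    case $? in
        0)   return 0 ;;   # the process ended before the deadline
        124) return 1 ;;   # GNU timeout exits with 124 when the limit is hit
        *)   return 2 ;;   # tail or timeout itself failed
    esac
}
```

It could be used as: wait_pid_gnu "$pid" 30 || echo "still running (or the wait failed)"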
There's no builtin. Use kill -0 in a loop for a workable solution:
anywait(){
for pid in "$@"; do
while kill -0 "$pid"; do
sleep 0.5
done
done
}
Or as a simpler oneliner for easy one time usage:
while kill -0 PIDS 2> /dev/null; do sleep 1; done;
As noted by several commentators, if you want to wait for processes that you do not have the privilege to send signals to, you have to find some other way to detect whether the process is running to replace the kill -0 $pid call. On Linux, test -d "/proc/$pid" works; on other systems you might have to use pgrep (if available) or something like ps | grep "^$pid ".
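The two ideas above (poll with kill -0, fall back to /proc when signalling is not permitted) can be combined with a deadline. This is only a sketch: pid_alive and wait_pid are hypothetical names, the 60-second default is arbitrary, and the /proc fallback assumes Linux.

```shell
# Sketch: poll until a PID disappears or a deadline passes.
# pid_alive and wait_pid are hypothetical helper names.

pid_alive() {
    # kill -0 sends no signal; it only checks that the PID can be signalled.
    kill -0 "$1" 2>/dev/null && return 0
    # Fallback for processes owned by other users (Linux procfs only).
    [ -d "/proc/$1" ]
}

wait_pid() {
    local pid=$1 limit=${2:-60} waited=0
    while pid_alive "$pid"; do
        [ "$waited" -ge "$limit" ] && return 1   # gave up: still running
        sleep 1
        waited=$((waited + 1))
    done
    return 0                                     # the process is gone
}
```

Example: wait_pid "$(cat "$PID_FILE")" 30 || echo "timed out"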
I found "kill -0" does not work if the process is owned by root (or other), so I used pgrep and came up with:
while pgrep -u root process_name > /dev/null; do sleep 1; done
This would have the disadvantage of probably matching zombie processes.
This bash script loop ends if the process does not exist, or it's a zombie.
PID=<pid to watch>
while s=`ps -p $PID -o s=` && [[ "$s" && "$s" != 'Z' ]]; do
sleep 1
done
EDIT: The above script was given below by Rockallite. Thanks!
My original answer below works on Linux, relying on procfs, i.e. /proc/. I don't know how portable it is:
while [[ ( -d /proc/$PID ) && ( -z `grep zombie /proc/$PID/status` ) ]]; do
sleep 1
done
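On Linux the same check can read just the State: field instead of grepping the whole status file. A sketch under that assumption (procfs mounted at /proc); proc_state and is_running are made-up helper names:

```shell
# Sketch for Linux only: read the one-letter state from /proc/PID/status.
# proc_state and is_running are hypothetical helper names.

proc_state() {
    # Prints e.g. R, S, D, T or Z; prints nothing if the PID does not exist.
    awk '/^State:/ { print $2; exit }' "/proc/$1/status" 2>/dev/null
}

is_running() {
    local state
    state=$(proc_state "$1")
    # Treat "no such process" and "zombie" the same way: not running.
    [ -n "$state" ] && [ "$state" != "Z" ]
}
```

The waiting loop then becomes: while is_running "$PID"; do sleep 1; done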
This is not limited to the shell: the operating systems themselves do not have system calls to watch for the termination of a non-child process.
FreeBSD and Solaris have the handy pwait(1) utility, which does exactly what you want.
I believe other modern OSes have the necessary system calls too (macOS, for example, implements BSD's kqueue), but not all of them make this available from the command line.
From the bash manpage
wait [n ...]
Wait for each specified process and return its termination status
Each n may be a process ID or a job specification; if a
job spec is given, all processes in that job's pipeline are
waited for. If n is not given, all currently active child processes
are waited for, and the return status is zero. If n
specifies a non-existent process or job, the return status is
127. Otherwise, the return status is the exit status of the
last process or job waited for.
Okay, so it seems the answer is -- no, there is no built in tool.
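The behaviour described in the manpage excerpt is easy to check. The short script below is only an illustration; it uses $PPID (the parent of the current shell) as a PID that is certainly not a child of the shell:

```shell
# Sketch: wait returns a child's exit status, but 127 for a non-child PID.

(exit 3) &            # a child of this shell that exits with status 3
wait "$!"
child_status=$?       # 3: the exit status of the child

# The parent of this shell is certainly not one of its children.
wait "$PPID" 2>/dev/null
other_status=$?       # 127: wait cannot be used on this PID

echo "child: $child_status, non-child: $other_status"
```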
After setting /proc/sys/kernel/yama/ptrace_scope to 0, it is possible to use the strace program. Further switches can be used to make it silent, so that it really waits passively:
strace -qqe '' -p <PID>
All these solutions are tested in Ubuntu 14.04:
Solution 1 (by using ps command):
Just to add to Pierz's answer, I would suggest:
while ps axg | grep -vw grep | grep -w process_name > /dev/null; do sleep 1; done
In this case, grep -vw grep ensures that grep matches only process_name and not grep itself. It has the advantage of supporting the cases where the process_name is not at the end of a line at ps axg.
Solution 2 (by using top command and process name):
while [[ $(awk '$12=="process_name" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_name with the process name that appears in top -n 1 -b. Please keep the quotation marks.
To see the list of processes you are waiting for while they are still running, you can run:
while : ; do p=$(awk '$12=="process_name" {print $0}' <(top -n 1 -b)); [[ $p ]] || break; echo "$p"; sleep 1; done
Solution 3 (by using top command and process ID):
while [[ $(awk '$1=="process_id" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_id with the process ID of your program.
Blocking solution
Use wait in a loop to wait for all the processes to terminate:
function anywait()
{
for pid in "$@"
do
wait $pid
echo "Process $pid terminated"
done
echo 'All processes terminated'
}
This function returns as soon as all the processes have terminated, which makes it the most efficient solution. Note that wait only works for child processes of the current shell.
Non-blocking solution
Use kill -0 in a loop to wait for all the processes to terminate, with the option of doing something between checks:
function anywait_w_status()
{
for pid in "$@"
do
while kill -0 "$pid"
do
echo "Process $pid still running..."
sleep 1
done
done
echo 'All processes terminated'
}
The reaction time is limited by the sleep interval, which is needed to prevent high CPU usage.
A realistic usage:
Wait for all the processes to terminate and inform the user about the PIDs that are still running.
function anywait_w_status2()
{
while true
do
alive_pids=()
for pid in "$@"
do
kill -0 "$pid" 2>/dev/null \
&& alive_pids+=("$pid")
done
if [ ${#alive_pids[@]} -eq 0 ]
then
break
fi
echo "Process(es) still running... ${alive_pids[@]}"
sleep 1
done
echo 'All processes terminated'
}
Notes
These functions receive the PIDs as arguments, via "$@".
I had the same issue. I solved it by killing the process and then waiting for it to finish, using the proc filesystem:
while [ -e /proc/${pid} ]; do sleep 0.1; done
There is no builtin feature to wait for any process to finish.
You could send kill -0 to any PID found, so you don't get puzzled by zombies and stuff that will still be visible in ps (while still retrieving the PID list using ps).
If you need to both kill a process and wait for it to finish, this can be achieved with killall(1) (based on process names) and start-stop-daemon(8) (based on a pidfile).
To kill all processes matching someproc and wait for them to die:
killall someproc --wait # wait forever until matching processes die
timeout 10s killall someproc --wait # timeout after 10 seconds
(Unfortunately, there's no direct equivalent of --wait with kill for a specific pid).
To kill a process based on a pidfile /var/run/someproc.pid using signal SIGINT, while waiting for it to finish, with SIGKILL being sent after 20 seconds of timeout, use:
start-stop-daemon --stop --signal INT --retry 20 --pidfile /var/run/someproc.pid
Use inotifywait to monitor some file that gets closed, when your process terminates. Example (on Linux):
yourproc >logfile.log & disown
inotifywait -q -e close logfile.log
-e specifies the event to wait for, -q means minimal output only on termination. In this case it will be:
logfile.log CLOSE_WRITE,CLOSE
A single inotifywait command can be used to wait for multiple processes:
yourproc1 >logfile1.log & disown
yourproc2 >logfile2.log & disown
yourproc3 >logfile3.log & disown
inotifywait -q -e close logfile1.log logfile2.log logfile3.log
The output string of inotifywait will tell you, which process terminated. This only works with 'real' files, not with something in /proc/
Rauno Palosaari's Darwin solution for a timeout in seconds is an excellent workaround for any UNIX-like OS that does not have GNU tail (it is not specific to Darwin). However, depending on the age of the UNIX-like operating system, the command line offered is more complex than necessary and can fail:
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
On at least one old UNIX, the lsof argument +r 1m%s fails (even for a superuser):
lsof: can't read kernel name list.
The m%s is an output format specification. A simpler post-processor does not require it. For example, the following command waits on PID 5959 for up to five seconds:
lsof -p 5959 +r 1 | awk '/^=/ { if (T++ >= 5) { exit 1 } }'
In this example, if PID 5959 exits of its own accord before the five seconds elapse, ${?} is 0. If not, ${?} is 1 after five seconds.
It may be worth expressly noting that in +r 1, the 1 is the poll interval (in seconds), so it may be changed to suit the situation.
On a system like OSX you might not have pgrep, so you can try this approach when looking for processes by name:
while ps axg | grep process_name$ > /dev/null; do sleep 1; done
The $ symbol at the end of the process name ensures that grep matches only process_name to the end of line in the ps output and not itself.
I have a process that I would like to kill, and then restart a service. Someone has written code to kill the process using the following set of scripts:
ps -ef |grep "process_name" | awk '{print "kill -15 " $2}'> /projects/test/kill.sh
#run the kill script
/projects/test/kill.sh
and then again
ps -ef |grep "process_name" | awk '{print "kill -9 " $2}'> /projects/test/kill.sh
#run the kill script
/projects/test/kill.sh
#finally
service restart command here
# the problem here is that the service sometimes does not restart properly,
# as it thinks that the process is still running.
As I understand it, kill -15 gracefully kills the process. But then right away they have the kill -9 as well.
So if a process was getting killed in the first command, what happens when kill -9 is also run on the same process? Or will the ps -ef even list out that process since it has been marked for kill?
Thanks!
You are correct that kill -15 is meant to stop a process gracefully. But killing a process is something that happens instantaneously. So the program above checks for the pid and attempts to kill it gracefully. If the kill -15 fails, the kill -9 is performed. The way it knows that kill -15 failed is the grep command: if kill -15 was successful, that pid should no longer exist, making the following grep return empty.
So really, kill -9 only runs if kill -15 failed to gracefully stop the program. The problem with this approach is that gracefully stopping a process can take some time, depending on the program. So IMHO there needs to be a wait period, or a sleep for a few seconds, to allow kill -15 to stop the process gracefully. With the approach above, kill -9 is almost always invoked, since the script doesn't allow much time for the process to shut down properly. If the kill -15 shutdown is still in progress, kill -9 will just override it and stop the process instantly.
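A wait period of that kind could look like the sketch below. It is not the script from the question: stop_pid is a hypothetical helper, the 10-second default grace period is arbitrary, and it polls with kill -0, so it only works for processes you are allowed to signal.

```shell
# Sketch: ask a process to stop, wait a grace period, then force it.
# stop_pid is a hypothetical helper name; the default grace period is arbitrary.

stop_pid() {
    local pid=$1 grace=${2:-10} waited=0
    # Could not signal at all: no such process, or no permission.
    kill -TERM "$pid" 2>/dev/null || return 2
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            kill -KILL "$pid" 2>/dev/null        # grace period is over
            return 1                             # had to use SIGKILL
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0                                     # exited after SIGTERM
}
```

The service restart would then run only after stop_pid has returned, for example: stop_pid "$pid" 20; followed by the restart command.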
If you have the option to refactor, you can use /proc/$PID as a more efficient way to detect if a process is running.
stopSvc() { local svc=$1
read x pid x < <( ps -fu "$App_user" | grep -E " ($App_baseDIR/$1/|)$svc.jar$" ||: )
local -i starting="$(date +%s)" # linux epoch timestamp in seconds
while [[ -d "/proc/$pid" ]]
do ps -fp "$pid"
kill -term "$pid"
if (( ( $(date +%s) - starting ) < 20 )) # been trying for less than 20s
then sleep 2
date
else echo "$svc is hung - using a hard stop"
kill -KILL "$pid"
break
fi
done
sleep 2
[[ -d "/proc/$pid" ]] && return 1 || return 0 # flip the return
}
Basically, the kill -15 is a term signal, which the process could catch to trigger a graceful shutdown, closing pipes, sockets, & files, cleaning up temp space, etc, so to be useful it should give some time. The -9 is a kill and can't be caught. It's the Big Hammer that you use to squish the jobs that are misbehaving, and should be reserved for those cases.
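To illustrate what "catching" the term signal means, here is a small sketch of a process that cleans up when it receives SIGTERM. The worker function and its marker-file argument are invented for the illustration:

```shell
# Sketch: a worker that catches SIGTERM and cleans up before exiting.
# worker and the marker-file argument are made up for this illustration.

worker() {
    marker=$1
    # SIGTERM can be trapped. SIGKILL (-9) cannot, so with kill -9
    # this cleanup code would never run.
    trap 'echo "cleaned up" > "$marker"; exit 0' TERM
    while :; do
        sleep 1       # the trap runs once the current sleep returns
    done
}
```

Started with worker /tmp/worker.done & and stopped with kill -TERM $!, it writes the marker file before exiting; stopped with kill -9, it leaves nothing behind.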
You are totally right, this makes little sense. If you're going to use the -9 so soon, might as well skip the careless attempt at better practice and just remove the -15.
I am currently doing an exercise that requires me to write a script that kills the "sleep" process based on the nice value of it. So in one terminal, a sleep command of 100 (with the default niceness value of 0) would be terminated immediately when I run my script in another terminal. However, I'm having trouble writing the script for it. Here is what I have so far:
#!/usr/local/bin/bash
nice="$(ps eo pid,user,nice,command | grep sleep)"
if nice <= 4
then
kill -9 sleep
fi
My question is: How do I get the nice value from a command into a simple variable that I can run through my if statement?
Also, I'm running into trouble running my scripts. When I have a sleep command run in one terminal, and try to input sh kill_sleep.sh, it insists that it can't open it. What could be going wrong?
The command below kills all sleep processes with niceness <= 4:
ps -o pid= -o nice= -C sleep | awk '$2<=4{system("kill " $1)}'
The option -C sleep tells ps to select only sleep commands.
The options -o pid= -o nice= specify that ps should output the process ID and the nice value while omitting the header.
In the awk command, $2<=4 selects only those lines that have nice less than or equal to 4. (Since nice is the second value on each line of ps output, awk refers to it as $2.)
For those selected lines, the awk command system("kill " $1) is run. This runs the shell command kill on the pid. (Since PID is the first value on each line of ps output, awk refers to it as $1.)
The kill pid command sends the process the default signal which is TERM. This signal allows the process to shut down properly. kill -9 should almost always be avoided.
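If you would rather keep the selection logic in the shell, so the nice value ends up in an ordinary variable, something like the following sketch could work. pids_with_nice_at_most is a hypothetical helper that reads "PID NICE" lines on standard input, which is the format the ps command above prints:

```shell
# Sketch: pick PIDs whose nice value is at most a threshold.
# pids_with_nice_at_most is a hypothetical helper; it reads "PID NICE"
# lines on standard input, as printed by: ps -o pid= -o nice= -C sleep
pids_with_nice_at_most() {
    local max=$1 pid nice
    while read -r pid nice; do
        # Skip blank or malformed lines (e.g. a nice value shown as "-").
        case $nice in ''|-|*[!0-9-]*) continue ;; esac
        [ "$nice" -le "$max" ] && echo "$pid"
    done
    return 0
}
```

Usage: for pid in $(ps -o pid= -o nice= -C sleep | pids_with_nice_at_most 4); do kill "$pid"; done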
You can also do it simply even without awk:
read pid nice < <(ps -C sleep ho pid,nice)
if (( $nice <= 4 ))
then kill $pid
fi
-C filters only sleep commands in ps output
h in ps -C sleep ho suppresses output of names of columns (header)
read assigns the according values to variables pid and nice
kill might be with -9 if you prefer
The <(...) construct is process substitution; it lets you read a process's output as if it were a file
If you want to reflect the possibility of several running sleep instances (and kill all which are not nice), you can read ps output in while loop:
while read pid nice; do
if (($nice <= 4))
then kill $pid;
fi
done < <(ps -C sleep ho pid,nice)
You can use awk to match one column and return another column:
sleep=$(ps eo pid,nice,command | awk '$3 == "sleep" && $2 <= 4 {print $1}')
if [ "$sleep" ]
then kill $sleep
fi
I made a simple script which runs in an infinite loop. It looks like this:
while :
do
#operations
sleep 5
done
and I added it to autorun programs like this.
Everything works fine, but after logging out and back in I have 2 instances of this script running (3 after the next logout, and so on). Only one of them shows notifications, but each of them runs its own sleep process.
What can I do to solve this problem?
Logging out doesn't kill all processes. You need to kill that process yourself. One way is to add a conditional kill inside your script.
Example:
#!/bin/bash
for proc in $(pgrep $(basename "$0"));do
[[ $proc -ne $$ ]] && kill $proc
done
while :
do
#operations
sleep 5
done
If you run this script twice, the second one will kill the previous one/s and make sure only one instance of this script is running at a time.
If more than one user runs that script, you might want the check to be user specific. For that, change the line:
[[ $proc -ne $$ ]] && kill $proc
to:
[[ $(echo $(pgrep -u $USER) | grep -o $proc) -ne $$ ]] && kill $proc
Note: Sometimes a process does not react to the normal kill command. Use kill -9 in those cases. (A process that is truly defunct, i.e. a zombie, has already exited and only disappears once its parent reaps it.)
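A different technique from the pgrep loop above is to record the PID in a file and let a new instance exit if the recorded process is still alive. This is only a sketch: claim_pidfile and the file path are hypothetical, and the check-then-write sequence is not atomic, so two instances started at the same instant could both get through.

```shell
# Sketch: allow only one running instance by recording the PID in a file.
# claim_pidfile and the file path are hypothetical; the check-then-write
# sequence is not atomic, so two simultaneous starts could both succeed.

claim_pidfile() {
    local file=$1 old
    if [ -f "$file" ]; then
        read -r old < "$file"
        # A live PID in the file means another instance is still running.
        if [ -n "$old" ] && [ "$old" != "$$" ] && kill -0 "$old" 2>/dev/null; then
            return 1
        fi
    fi
    echo "$$" > "$file"     # first start, or the old PID was stale
}
```

At the top of the looping script this would be used as: claim_pidfile "$HOME/.myloop.pid" || exit 0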
I want to write a shell script (.sh file) to get a given process id. What I'm trying to do here is once I get the process ID, I want to kill that process. I'm running on Ubuntu (Linux).
I was able to do it with a command like
ps -aux|grep ruby
kill -9 <pid>
but I'm not sure how to do it through a shell script.
Using grep on the results of ps is a bad idea in a script, since some proportion of the time it will also match the grep process you've just invoked. The command pgrep avoids this problem, so if you need to know the process ID, that's a better option. (Note that, of course, there may be many processes matched.)
However, in your example, you could just use the similar command pkill to kill all matching processes:
pkill ruby
Incidentally, you should be aware that using -9 is overkill (ho ho) in almost every case - there's some useful advice about that in the text of the "Useless Use of kill -9 form letter":
No no no. Don't use kill -9.
It doesn't give the process a chance to cleanly:
shut down socket connections
clean up temp files
inform its children that it is going away
reset its terminal characteristics
and so on and so on and so on.
Generally, send 15, and wait a second or two, and if that doesn't
work, send 2, and if that doesn't work, send 1. If that doesn't,
REMOVE THE BINARY because the program is badly behaved!
Don't use kill -9. Don't bring out the combine harvester just to tidy
up the flower pot.
If you are going to use ps and grep then you should do it this way:
ps aux|grep r[u]by
Those square brackets will cause grep to skip the line for the grep command itself. So to use this in a script do:
output=`ps aux|grep r\[u\]by`
set -- $output
pid=$2
kill $pid
sleep 2
kill -9 $pid >/dev/null 2>&1
The backticks allow you to capture the output of a command in a shell variable. The set -- parses the ps output into words, and $2 is the second word on the line, which happens to be the pid. Then you send a TERM signal, wait a couple of seconds for ruby to shut itself down, then kill it mercilessly if it still exists, but throw away any output because most of the time kill -9 will complain that the process is already dead.
I know that I have used this without the backslashes before the square brackets but just now I checked it on Ubuntu 12 and it seems to require them. This probably has something to do with bash's many options and the default config on different Linux distros. Hopefully the [ and ] will work anywhere but I no longer have access to the servers where I know that it worked without backslash so I cannot be sure.
One comment suggests grep -v, and that is what I used to do, but when I learned of the [] variant, I decided it was better to spawn one fewer process in the pipeline.
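Why the square brackets work can be shown without running ps at all. In this sketch the two lines of "ps output" are faked and the PIDs are made up:

```shell
# Sketch: why the pattern 'r[u]by' skips the grep process itself.
# The two lines below imitate ps output; the PIDs are made up.
fake_ps='1234 ruby server.rb
5678 grep r[u]by'

# The regex r[u]by matches the text "ruby", but the grep command line
# contains the literal text "r[u]by", which the regex does not match.
matches=$(printf '%s\n' "$fake_ps" | grep 'r[u]by')
echo "$matches"
```

Only the ruby line is printed. Note that the pattern is quoted here: unquoted, the shell treats r[u]by as a glob and will replace it with ruby if a file of that name exists in the current directory, which may be what the backslashes above were working around.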
To start with, there is no need for ps -aux | grep ...; the pidof command is far better to use. And you should almost never use kill -9; see here.
To get the output of a command in bash, use something like:
pid=$(pidof ruby)
or use pkill directly.
The -v option is very important. It can exclude the grep process itself,
e.g.
ps -w | grep sshd | grep -v grep | awk '{print $1}' to get sshd id
This works in Cygwin but it should be effective in Linux as well.
ps -W | awk '/ruby/,NF=1' | xargs kill -f
or
ps -W | awk '$0~z,NF=1' z=ruby | xargs kill -f
Bash Pitfalls
You can use the command killall:
$ killall ruby
It's pretty simple.
Simply run any program like this: gedit & x=$! (the special parameter $! holds the PID of the most recent background process, so echo $x prints it).
Then do this: kill -9 $x
To kill the process in shell
getprocess=`ps -ef|grep servername`
#echo $getprocess
set $getprocess
pid=$2
#echo $pid
kill -9 $pid
If you already know the process then this will be useful:
PID=`ps -eaf | grep <process> | grep -v grep | awk '{print $2}'`
if [[ "" != "$PID" ]]; then
echo "killing $PID"
kill -9 $PID
fi