how to kill process groups in trap? - linux

In a bash script I normally use trap to clean up spawned processes:
function cleanup()
{
jobs -l
jobs -p | xargs -r -I {} kill -TERM {}
jobs -l
echo "do something after kill all jobs."
}
trap cleanup EXIT
However this does not work for process groups:
function cleanup()
{
jobs -l
jobs -p | xargs -r -I {} kill -TERM {}
jobs -l
echo "do something after kill all jobs."
}
trap cleanup EXIT
(sleep 100 | tee /tmp/sleep_test.log) | tee sleep_test2.log &
ps -ax -o pid,pgid,ppid,args | grep sleep
jobs -l
sleep 1
jobs -p gives the parent PID of the process group for (sleep 100 | tee ...) and the PID of the tee ... process. The process group cannot be killed as above; that requires kill -TERM -PGID. Is there an easy way to make jobs output the process group ID (PGID)? Or is there a command that can kill a process group via its PPID and a single process via its PID through a uniform interface?
Update:
kill -TERM 0 does not work here, since it also kills the script itself, and I still need to do something after killing all jobs.

The only way I found is killing sub-processes directly.
#!/usr/bin/env bash
function cleanup()
{
jobs -l
for p in $(jobs -p); do
kill $(pgrep -P $p)
done
jobs -l
echo "do something after kill all jobs."
}
trap cleanup EXIT
(sleep 100 | tee /tmp/sleep_test.log) | tee sleep_test2.log &
ps -ax -o pid,pgid,ppid,args | grep sleep
jobs -l
sleep 1
I also tried killing by PGID, which didn't work for me.
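One possible way to get a group that can be signalled without hitting the script itself is to enable job control with set -m, so that each background job runs in its own process group; jobs -p then prints the PID of each job's process group leader, which can be used as the PGID. The following is a minimal sketch under that assumption (the /dev/null targets are placeholders for the log files used above), not a fix verified for every setup:

```bash
#!/usr/bin/env bash
# Assumption: job control is enabled, so every background job gets its own
# process group and `jobs -p` prints the group leader's PID (usable as PGID).
set -m

cleanup() {
    local pgid
    for pgid in $(jobs -p); do
        # A negative PID tells kill to signal the whole process group.
        kill -TERM -- "-$pgid" 2>/dev/null
    done
    wait 2>/dev/null        # reap the jobs before continuing
    echo "do something after kill all jobs."
}
trap cleanup EXIT

(sleep 100 | tee /dev/null) | tee /dev/null &
sleep 1
```

Because the script itself stays in its original process group, the group kill does not reach it, and the code after the loop still runs.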

Related

How to wait on a backgrounded sub-process with `wait` command [duplicate]

Is there any builtin feature in Bash to wait for a process to finish?
The wait command only allows one to wait for child processes to finish.
I would like to know if there is any way to wait for any process to finish before proceeding in any script.
A mechanical way to do this is as follows but I would like to know if there is any builtin feature in Bash.
while ps -p `cat $PID_FILE` > /dev/null; do sleep 1; done
To wait for any process to finish
Linux (doesn't work on Alpine, where ash doesn't support tail --pid):
tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1 &>/dev/null
With timeout (seconds)
Linux:
timeout $timeout tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
There's no builtin. Use kill -0 in a loop for a workable solution:
anywait(){
    for pid in "$@"; do
        while kill -0 "$pid"; do
            sleep 0.5
        done
    done
}
Or as a simpler oneliner for easy one time usage:
while kill -0 PIDS 2> /dev/null; do sleep 1; done;
As noted by several commentators, if you want to wait for processes that you do not have the privilege to send signals to, you have to find some other way to detect whether the process is running to replace the kill -0 $pid call. On Linux, test -d "/proc/$pid" works; on other systems you might have to use pgrep (if available) or something like ps | grep "^$pid ".
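The polling loop and the /proc fallback described above could be combined roughly as follows. This is only a sketch: is_running and wait_for_pid are hypothetical helper names, and the /proc check assumes Linux.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of any standard tool.

# Succeeds while the process exists. kill -0 fails for processes we may
# not signal, so fall back to /proc (Linux only).
is_running() {
    kill -0 "$1" 2>/dev/null || [ -d "/proc/$1" ]
}

# wait_for_pid PID [TIMEOUT_SECONDS]
# Returns 0 once PID is gone, 1 if TIMEOUT_SECONDS elapsed first.
# A timeout of 0 (the default) means wait forever.
wait_for_pid() {
    local pid=$1 timeout=${2:-0} waited=0
    while is_running "$pid"; do
        if [ "$timeout" -gt 0 ] && [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Usage would look like wait_for_pid "$pid" 30 || echo "still running". Note that the /proc check also matches zombies, so the zombie caveat discussed elsewhere in this thread still applies.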
I found "kill -0" does not work if the process is owned by root (or other), so I used pgrep and came up with:
while pgrep -u root process_name > /dev/null; do sleep 1; done
This would have the disadvantage of probably matching zombie processes.
This bash script loop ends if the process does not exist, or it's a zombie.
PID=<pid to watch>
while s=`ps -p $PID -o s=` && [[ "$s" && "$s" != 'Z' ]]; do
sleep 1
done
EDIT: The above script was given below by Rockallite. Thanks!
My original answer below works on Linux, relying on procfs, i.e. /proc/. I don't know how portable it is:
while [[ ( -d /proc/$PID ) && ( -z `grep zombie /proc/$PID/status` ) ]]; do
sleep 1
done
It's not limited to the shell; operating systems themselves generally do not have system calls to watch non-child process termination.
FreeBSD and Solaris have the handy pwait(1) utility, which does exactly what you want.
I believe other modern OSes have the necessary system calls too (MacOS, for example, implements BSD's kqueue), but not all make them available from the command line.
From the bash manpage
wait [n ...]
Wait for each specified process and return its termination status
Each n may be a process ID or a job specification; if a
job spec is given, all processes in that job's pipeline are
waited for. If n is not given, all currently active child processes
are waited for, and the return status is zero. If n
specifies a non-existent process or job, the return status is
127. Otherwise, the return status is the exit status of the
last process or job waited for.
Okay, so it seems the answer is -- no, there is no built in tool.
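The return statuses described in the manpage excerpt can be observed with a short experiment. This is just an illustrative sketch; $PPID is used as an example of a PID that exists but can never be a child of the current shell.

```bash
#!/usr/bin/env bash
# wait returns the exit status of a child it waited for ...
(exit 3) &
wait "$!"
child_status=$?

# ... but refuses PIDs that are not children of this shell. The parent
# process ($PPID) exists, yet it can never be our child.
wait "$PPID" 2>/dev/null
non_child_status=$?

echo "child: $child_status, non-child: $non_child_status"
# prints "child: 3, non-child: 127"
```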
After setting /proc/sys/kernel/yama/ptrace_scope to 0, it is possible to use the strace program. Further switches can be used to make it silent, so that it really waits passively:
strace -qqe '' -p <PID>
All these solutions are tested in Ubuntu 14.04:
Solution 1 (by using ps command):
Just to add up to Pierz answer, I would suggest:
while ps axg | grep -vw grep | grep -w process_name > /dev/null; do sleep 1; done
In this case, grep -vw grep ensures that grep matches only process_name and not grep itself. It has the advantage of supporting the cases where the process_name is not at the end of a line at ps axg.
Solution 2 (by using top command and process name):
while [[ $(awk '$12=="process_name" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_name with the process name that appears in top -n 1 -b. Please keep the quotation marks.
To list the processes you are still waiting on, you can run:
while : ; do p=$(awk '$12=="process_name" {print $0}' <(top -n 1 -b)); [[ $p ]] || break; echo "$p"; sleep 1; done
Solution 3 (by using top command and process ID):
while [[ $(awk '$1=="process_id" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_id with the process ID of your program.
Blocking solution
Use wait in a loop to wait for all processes to terminate:
function anywait()
{
    for pid in "$@"
    do
        wait $pid
        echo "Process $pid terminated"
    done
    echo 'All processes terminated'
}
This function returns as soon as all processes have terminated. It is the most efficient solution, but wait only works for children of the current shell.
Non-blocking solution
Use kill -0 in a loop to wait for all processes to terminate, with room to do something between checks:
function anywait_w_status()
{
    for pid in "$@"
    do
        while kill -0 "$pid"
        do
            echo "Process $pid still running..."
            sleep 1
        done
    done
    echo 'All processes terminated'
}
The reaction time is bounded by the sleep interval, which is needed to prevent high CPU usage.
A realistic usage:
Wait for all processes to terminate and inform the user about the PIDs that are still running.
function anywait_w_status2()
{
    while true
    do
        alive_pids=()
        for pid in "$@"
        do
            kill -0 "$pid" 2>/dev/null \
                && alive_pids+=("$pid")
        done
        if [ ${#alive_pids[@]} -eq 0 ]
        then
            break
        fi
        echo "Process(es) still running... ${alive_pids[@]}"
        sleep 1
    done
    echo 'All processes terminated'
}
Notes
These functions receive the PIDs as arguments and iterate over them with "$@".
Had the same issue, I solved the issue killing the process and then waiting for each process to finish using the PROC filesystem:
while [ -e /proc/${pid} ]; do sleep 0.1; done
There is no builtin feature to wait for any process to finish.
You could send kill -0 to any PID found, so you don't get puzzled by zombies and stuff that will still be visible in ps (while still retrieving the PID list using ps).
If you need to both kill a process and wait for it to finish, this can be achieved with killall(1) (based on process names), and start-stop-daemon(8) (based on a pidfile).
To kill all processes matching someproc and wait for them to die:
killall someproc --wait # wait forever until matching processes die
timeout 10s killall someproc --wait # timeout after 10 seconds
(Unfortunately, there's no direct equivalent of --wait with kill for a specific pid).
To kill a process based on a pidfile /var/run/someproc.pid using signal SIGINT, while waiting for it to finish, with SIGKILL being sent after 20 seconds of timeout, use:
start-stop-daemon --stop --signal INT --retry 20 --pidfile /var/run/someproc.pid
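For a single PID, a similar "signal, wait, then escalate" behaviour can be approximated in plain bash. The sketch below is an assumption-laden illustration, not an equivalent of start-stop-daemon: kill_and_wait is a hypothetical helper name, and it relies on kill -0 polling, so it shares the permission caveats mentioned earlier.

```bash
#!/usr/bin/env bash
# kill_and_wait PID [TIMEOUT_SECONDS]   (hypothetical helper)
# Sends SIGTERM, polls until PID is gone, and escalates to SIGKILL once
# TIMEOUT_SECONDS (default 20) have passed.
kill_and_wait() {
    local pid=$1 timeout=${2:-20} waited=0
    # If the signal cannot be sent, the process is already gone
    # (or is not ours to signal), so there is nothing to wait for.
    kill -TERM "$pid" 2>/dev/null || return 0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

A call such as kill_and_wait "$(cat /var/run/someproc.pid)" 20 would mirror the pidfile example above, with SIGTERM in place of SIGINT.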
Use inotifywait to monitor some file that gets closed, when your process terminates. Example (on Linux):
yourproc >logfile.log & disown
inotifywait -q -e close logfile.log
-e specifies the event to wait for, -q means minimal output only on termination. In this case it will be:
logfile.log CLOSE_WRITE,CLOSE
A single inotifywait command can be used to wait for multiple processes:
yourproc1 >logfile1.log & disown
yourproc2 >logfile2.log & disown
yourproc3 >logfile3.log & disown
inotifywait -q -e close logfile1.log logfile2.log logfile3.log
The output string of inotifywait will tell you, which process terminated. This only works with 'real' files, not with something in /proc/
Rauno Palosaari's solution for Timeout in Seconds Darwin, is an excellent workaround for a UNIX-like OS that does not have GNU tail (it is not specific to Darwin). But, depending on the age of the UNIX-like operating system, the command-line offered is more complex than necessary, and can fail:
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
On at least one old UNIX, the lsof argument +r 1m%s fails (even for a superuser):
lsof: can't read kernel name list.
The m%s is an output format specification. A simpler post-processor does not require it. For example, the following command waits on PID 5959 for up to five seconds:
lsof -p 5959 +r 1 | awk '/^=/ { if (T++ >= 5) { exit 1 } }'
In this example, if PID 5959 exits of its own accord before the five seconds elapse, ${?} is 0. If not, ${?} is 1 after five seconds.
It may be worth expressly noting that in +r 1, the 1 is the poll interval (in seconds), so it may be changed to suit the situation.
On a system like OSX you might not have pgrep, so you can try this approach when looking for processes by name:
while ps axg | grep process_name$ > /dev/null; do sleep 1; done
The $ symbol at the end of the process name ensures that grep matches only process_name to the end of line in the ps output and not itself.

Bash: Killing all processes in subprocess

In bash I can get the process ID (pid) of the last subprocess through the $! variable. I can then kill this subprocess before it finishes:
(sleep 5) & pid=$!
kill -9 $pid
This works as advertised. If I now extend the subprocess with more commands after the sleep, the sleep command continues after the subprocess is killed, even though the other commands never get executed.
As an example, consider the following, which spins up a subprocess and monitors its assassination using ps:
# Start subprocess and get its pid
(sleep 5; echo done) & pid=$!
# grep for subprocess
echo "grep before kill:"
ps aux | grep "$pid\|sleep 5"
# Kill the subprocess
echo
echo "Killing process $pid"
kill -9 $pid
# grep for subprocess
echo
echo "grep after kill:"
ps aux | grep "$pid\|sleep 5"
# Wait for sleep to finish
sleep 6
# grep for subprocess
echo
echo "grep after sleep is finished:"
ps aux | grep "$pid\|sleep 5"
If I save this to a file named filename and run it, I get this printout:
grep before kill:
username 7464 <...> bash filename
username 7466 <...> sleep 5
username 7467 <...> grep 7464\|sleep 5
Killing process 7464
grep after kill:
username 7466 <...> sleep 5
username 7469 <...> grep 7464\|sleep 5
grep after sleep is finished:
username 7472 <...> grep 7464\|sleep 5
where unimportant information from the ps command is replaced with <...>. It looks like the kill has killed the overall bash execution of filename, while leaving sleep running.
How can I correctly kill the entire subprocess?
You can set a trap in the subshell to kill any active jobs before exiting:
(trap 'kill $(jobs -p)' EXIT; sleep 5; echo done ) & pid=$!
I don't know exactly why that sleep process gets orphaned. Anyway, instead of kill you can use pkill with the -P flag to kill all the children:
pkill -TERM -P $pid
EDIT:
That means that in order to kill a process and all its children you should instead use
CPIDS=`pgrep -P $pid` # gets pids of child processes
kill -9 $pid
for cpid in $CPIDS ; do kill -9 $cpid ; done
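The snippet above reaches only direct children. If grandchildren matter as well, the same idea can be applied recursively. The following is a sketch only: kill_tree is a hypothetical helper name, it assumes pgrep is installed, and a process that forks between the pgrep call and the kill can still escape.

```bash
#!/usr/bin/env bash
# kill_tree PID [SIGNAL]   (hypothetical helper; needs pgrep)
# Signals PID and, recursively, every descendant found via pgrep -P.
kill_tree() {
    local pid=$1 sig=${2:-TERM} children child
    # List the children before signalling the parent: once it dies they
    # are re-parented and pgrep -P "$pid" no longer finds them.
    children=$(pgrep -P "$pid")
    kill -s "$sig" "$pid" 2>/dev/null
    for child in $children; do
        kill_tree "$child" "$sig"
    done
}

(sleep 30; echo done) >/dev/null &
pid=$!
sleep 1                     # give the subshell time to start sleep
kill_tree "$pid"
wait "$pid" 2>/dev/null     # reap the subshell
```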
You can have a look at rkill that seems to meet your requirements :
http://www.unix.com/man-page/debian/1/rkill/
rkill [-SIG] pid/name...
When invoked as rkill, this utility does not display information about the processes, but
sends them all a signal instead. If not specified on the command line, a terminate
(SIGTERM) signal is sent.

Daemon won't kill children that are reading from a named pipe

I've written this bash daemon that keeps an eye on a named pipe, logs everything it sees on a file named $LOG_FILE_BASENAME.$DATE, and it also creates a filtered version of it in $ACTIONABLE_LOG_FILE:
while true
do
DATE=`date +%Y%m%d`
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE"
done
pkill -P $$ # This is where it should kill its children
exit 0
When the daemon is running, this is how the process table looks:
/bin/sh the_daemon.sh
\_ cat the_fifo_queue
\_ tee -a log_file.20150807
\_ grep -P -v "regexp" > filtered_log_file
The problem is that when I kill the daemon (SIGTERM), the cat, tee, and grep processes that were spawned by the daemon are not collected by the parent. Instead, they become orphans and keep on waiting for input on the named pipe.
Once the FIFO receives some input, then they process that input as instructed and die.
How can I make the daemon kill its children before dying? Why aren't they dying with pkill -P $$?
You want to set up a signal handler in your script that kills its children (pkill -P $$) when the script itself gets signalled:
#!/bin/bash
function handle_sigterm()
{
pkill -P $$
exit 0
}
trap handle_sigterm SIGTERM
while true
do
DATE=`date +%Y%m%d`
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE"
done
handle_sigterm
exit 0
Update:
As per pilcrow's comment replace
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE"
by
cat $NAMED_PIPE | tee -a "$LOG_FILE_BASENAME.$DATE" | grep -P -v "$EXCEPTIONS" >> "$ACTIONABLE_LOG_FILE" &
wait $!
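The reason the background-plus-wait form helps is that bash postpones a trap until the current foreground command finishes, whereas the wait builtin returns as soon as a trapped signal arrives. The sketch below illustrates this with stand-ins: sleep 30 plays the role of the pipeline, and a subshell simulates the external kill; the names on_term, handled and child are made up for the example.

```bash
#!/usr/bin/env bash
# bash postpones a trap until the current foreground command returns,
# but the wait builtin is interrupted as soon as a trapped signal arrives.
handled=0
on_term() {
    handled=1
    kill "$child" 2>/dev/null    # stop the background worker
}
trap on_term TERM

sleep 30 &                       # stands in for the cat | tee | grep pipeline
child=$!

( sleep 1; kill -TERM "$$" ) &   # simulate an external `kill` after 1 second
wait "$child"                    # returns early when SIGTERM is trapped
echo "handled=$handled after ${SECONDS}s"
```

Had sleep 30 been run in the foreground instead, the handler would only have run after the full 30 seconds.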

How to terminate a job dispatcher in back ground in Linux?

I have a job dispatcher bash shell script containing the code below:
for (( i=0; i<$toBeDoneNum; i=i+1 ))
do
while true
do
processNum=`ps aux | grep Checking | wc -l`
if [ $processNum -lt $maxProcessNum ]; then
break
fi
echo "Too many processes: Max process is $maxProcessNum."
sleep $sleepSec
done
java -classpath ".:./conf:./lib/*" odx.comm.cwv.main.Checking $i
done
I run the script like this to be in the background:
./dispatcher.sh &
I want to terminate this dispatcher process with kill -9, but I didn't record its PID when I started it. I tried jobs to list the processes, but it shows nothing, and fg cannot bring the process to the foreground either:
fg
bash: fg: current: no such job
But I think the dispatcher process is still running, because it keeps launching the java program. How should I terminate this job dispatcher script?
Edit: I used jobs, jobs -l, jobs -r and jobs -s. Nothing showed.
create test.sh with content
sleep 60
then
jobs -l | grep 'test.sh &' | grep -v grep | awk '{print $2}'
this gives me the process id on Ubuntu and OSX
you can assign it to a variable and then kill it
pid=`jobs -l | grep 'test.sh &' | grep -v grep | awk '{print $2}'`
kill -9 $pid
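jobs only knows about jobs started from the current shell, so it prints nothing when run from another terminal or a new login. In that situation the PID can be looked up by command line instead. A rough sketch, assuming pgrep is available and using a throwaway script as a stand-in for dispatcher.sh:

```bash
#!/usr/bin/env bash
# Stand-in for dispatcher.sh: a throwaway script that just loops.
script=$(mktemp /tmp/dispatcher.XXXXXX)
printf '%s\n' 'while :; do sleep 1; done' > "$script"
bash "$script" &
started_pid=$!
sleep 1

# pgrep -f matches against the full command line, so the PID can be
# recovered without $! and from any shell.
found_pid=$(pgrep -f "bash $script")
echo "started: $started_pid, found: $found_pid"

kill -TERM "$found_pid"
wait "$started_pid" 2>/dev/null
rm -f "$script"
```

For the real dispatcher, something like pgrep -f dispatcher.sh would be the analogous lookup. Killing the dispatcher does not stop a java process it has already launched; that one would have to be killed separately.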

bash: silently kill background function process

shell gurus,
I have a bash shell script, in which I launch a background function, say foo(), to display a progress bar for a boring and long command:
foo()
{
while [ 1 ]
do
#massively cool progress bar display code
sleep 1
done
}
foo &
foo_pid=$!
boring_and_long_command
kill $foo_pid >/dev/null 2>&1
sleep 10
now, when foo dies, I see the following text:
/home/user/script: line XXX: 30290 Killed foo
This totally destroys the awesomeness of my, otherwise massively cool, progress bar display.
How do I get rid of this message?
kill $foo_pid
wait $foo_pid 2>/dev/null
BTW, I don't know about your massively cool progress bar, but have you seen Pipe Viewer (pv)? http://www.ivarch.com/programs/pv.shtml
Just came across this myself, and realised "disown" is what we are looking for.
foo &
foo_pid=$!
disown
boring_and_long_command
kill $foo_pid
sleep 10
The death message is printed because the process is still in the shell's list of watched "jobs". The disown command removes the most recently spawned process from this list, so no message is generated when it is killed, even with SIGKILL (-9).
Try to replace your line kill $foo_pid >/dev/null 2>&1 with the line:
(kill $foo_pid 2>&1) >/dev/null
Update:
This answer is not correct, for the reason explained by @mklement0 in his comment:
The reason this answer isn't effective with background jobs is that
Bash itself asynchronously, after the kill command has completed,
outputs a status message about the killed job, which you cannot
suppress directly - unless you use wait, as in the accepted answer.
This "hack" seems to work:
# Some trickery to hide killed message
exec 3>&2 # 3 is now a copy of 2
exec 2> /dev/null # 2 now points to /dev/null
kill $foo_pid >/dev/null 2>&1
sleep 1 # sleep to wait for process to die
exec 2>&3 # restore stderr to saved
exec 3>&- # close saved version
and it was inspired from here. World order has been restored.
This is a solution I came up with for a similar problem (wanted to display a timestamp during long running processes). This implements a killsub function that allows you to kill any subshell quietly as long as you know the pid. Note, that the trap instructions are important to include: in case the script is interrupted, the subshell will not continue to run.
foo()
{
while [ 1 ]
do
#massively cool progress bar display code
sleep 1
done
}
#Kills the sub process quietly
function killsub()
{
kill -9 ${1} 2>/dev/null
wait ${1} 2>/dev/null
}
foo &
foo_pid=$!
#Add a trap in case of unexpected interruptions
trap 'killsub ${foo_pid}; exit' INT TERM EXIT
boring_and_long_command
#Kill foo after finished
killsub ${foo_pid}
#Reset trap
trap - INT TERM EXIT
Add at the start of the function:
trap 'exit 0' TERM
You can run set +m beforehand to suppress that message.
Another way to do it:
func_terminate_service(){
[[ "$(pidof ${1})" ]] && killall ${1}
sleep 2
[[ "$(pidof ${1})" ]] && kill -9 "$(pidof ${1})"
}
call it with
func_terminate_service "firefox"
Yet another way to disable job notifications is to put your command to be backgrounded in a sh -c 'cmd &' construct.
#!/bin/bash
foo()
{
while [ 1 ]
do
sleep 1
done
}
#foo &
#foo_pid=$!
export -f foo
foo_pid=`sh -c 'foo & echo ${!}' | head -1`
# if shell does not support exporting functions (export -f foo)
#arg1='foo() { while [ 1 ]; do sleep 1; done; }'
#foo_pid=`sh -c 'eval "$1"; foo & echo ${!}' _ "$arg1" | head -1`
sleep 3
echo kill ${foo_pid}
kill ${foo_pid}
sleep 3
exit
The error message should come from the default signal handler, which reports the signal source in the script. I ran into similar errors only on bash 3.x and 4.x. To always kill the child process quietly everywhere (tested on bash 3/4/5, dash, ash, zsh), we can trap the TERM signal at the very start of the child process:
#!/bin/sh
## assume script name is test.sh
foo() {
trap 'exit 0' TERM ## here is the key
while true; do sleep 1; done
}
echo before child
ps aux | grep 'test\.s[h]\|slee[p]'
foo &
foo_pid=$!
sleep 1 # wait trap is done
echo before kill
ps aux | grep 'test\.s[h]\|slee[p]'
kill $foo_pid
sleep 1 # wait kill is done
echo after kill
ps aux | grep 'test\.s[h]\|slee[p]'
