Wait for a program (non-child) to finish and execute a command - linux

I have a program running on a remote computer which shouldn't be stopped. I need to track when this program is stopped and immediately execute a command. PID is known. How can I do that?

You cannot wait for non-child processes.
Probably the most efficient way in a shell would be to poll using the exit code of kill -0 <pid> to check if the process still exists:
while kill -0 $PID 2>/dev/null; do sleep 1; done
This is both simpler and more efficient than any approaches involving ps and grep. However, it only works if your user has permission to send signals to that process.

Code like this can do the work (to be run on remote computer)
while true
do
if [ "$(ps -efl|grep $PIDN|grep -v grep|wc -l)" -lt 1 ]
then <exec code>; break
fi
sleep 5
done
It expect the variable PIDN to contain the PID.
P.S. I know the code is ugly and power hungry
EDIT: it is possible to use -p in ps to avoid one grep
while true
do
if [ "$(ps -p $PIDN|wc -l)" -lt 2 ]
then <exec code>; break
fi
sleep 5
done

Here's a fairly simple way to wait for a process to terminate using the ps -p PID strategy:
if ps -p "$PID" >/dev/null 2>&1; then
echo "Process $PID is running ..."
while ps -p "$PID" >/dev/null 2>&1; do
sleep 5
done
echo "Process $PID is not running anymore."
fi
Checking for a process by PID
In general, to check for process ownership or permission to kill (send signals to) a proccess, you can use a combination of ps -p PID and kill -0:
if ps -p "$PID" >/dev/null 2>&1; then
echo "Process $PID exists!"
if kill -0 "$PID" >/dev/null 2>&1; then
echo "You can send signals to process $PID, e.g. with 'kill $PID'"
else
echo "You do not have permission to send signals to process $PID"
fi
else
echo "Process $PID does not exist."
fi

You can use exitsnoop to achieve this.
The bcc toolkit implements many excellent monitoring capabilities based on eBPF. Among them, exitsnoop traces process termination, showing the command name and reason for termination,
either an exit or a fatal signal.
It catches processes of all users, processes in containers, as well as processes that
become zombie.
This works by tracing the kernel sched_process_exit() function using dynamic tracing, and
will need updating to match any changes to this function.
Since this uses BPF, only the root user can use this tool.
exitsnoop examples:
Trace all process termination
# exitsnoop
Trace all process termination, and include timestamps:
# exitsnoop -t
Exclude successful exits, only include non-zero exit codes and fatal signals:
# exitsnoop -x
Trace PID 181 only:
# exitsnoop -p 181
Label each output line with 'EXIT':
# exitsnoop --label EXIT
You can get more information about this tool from the link below:
Github repo: tools/exitsnoop: Trace process termination (exit and fatal signals). Examples.
Linux Extended BPF (eBPF) Tracing Tools
ubuntu manpages: exitsnoop-bpfcc
Another option
use this project:
https://github.com/stormc/waitforpid

Related

Bash - use 1 stop script for multiple similar services, and kill the correct process only

I have multiple processes running as services on a machine
Before moving from 1 process/service to multiple ones, I used the following script to stop my service
#!/bin/sh
SIGNAL=${SIGNAL:-TERM}
PIDS=$(ps ax | grep -i 'datastream' | grep java | grep -v grep | awk '{print $1}')
if [ -z "$PIDS" ]; then
echo "No Brooklin server to stop"
exit 1
else
kill -s $SIGNAL $PIDS
fi
The issue now is that this script kills all processes of this type if invoked as a service stop command
My services are called for example service-A, service-B, service-C. If I send a service service-C stop command, the current script will stop all 3 processes.
I would like to make the script use the provided service name to determine which process to stop (I can grep A/B/C from the process output to ps, but I haven't managed to tell it how to stop only the process given in the service stop command.
Does anyone have experience handling something similar?
You can try something like below while starting your application which can store your PID in a static file and then you can use the same file to kill the process.
Pasting below one of my start - stop script which I have used in past for churning up multiple processes.
Start Script :-
#!/bin/bash
export PORT=$1
. /application/setEnv.sh
/java/jdk1.8.0_152/bin/java -Xms512m -Xmx2G -XX:+DisableExplicitGC -jar /application/api-1.0-0-all.jar </dev/null >>$LOGDIR/service$PORT.log 2>&1 &
echo $! > /application/service$PORT.pid
disown $!
Stop Script :-
#!/bin/bash
PORT=$1
PID=`cat /application/service$PORT.pid`
if [ ! -z "$PID" ]; then
rm /application/service$PORT.pid
kill -9 $PID >/dev/null 2>&1
if [ $? -gt 0 ]; then
echo "PID file found but no matching process was found. Stop aborted."
exit 1
fi
else
echo "PID file is empty and has been ignored."
fi
mv /application/logs/service$PORT.log /application/logs/service$PORT.log`date +%d%m%Y%H%M%S`
Only change which I can think of is the replace my port utilisation logic viz. $PORT with your service names viz. A/B/C.

How to wait on a backgrounded sub-process with `wait` command [duplicate]

Is there any builtin feature in Bash to wait for a process to finish?
The wait command only allows one to wait for child processes to finish.
I would like to know if there is any way to wait for any process to finish before proceeding in any script.
A mechanical way to do this is as follows but I would like to know if there is any builtin feature in Bash.
while ps -p `cat $PID_FILE` > /dev/null; do sleep 1; done
To wait for any process to finish
Linux (doesn't work on Alpine, where ash doesn't support tail --pid):
tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1 &>/dev/null
With timeout (seconds)
Linux:
timeout $timeout tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
There's no builtin. Use kill -0 in a loop for a workable solution:
anywait(){
for pid in "$#"; do
while kill -0 "$pid"; do
sleep 0.5
done
done
}
Or as a simpler oneliner for easy one time usage:
while kill -0 PIDS 2> /dev/null; do sleep 1; done;
As noted by several commentators, if you want to wait for processes that you do not have the privilege to send signals to, you have find some other way to detect if the process is running to replace the kill -0 $pid call. On Linux, test -d "/proc/$pid" works, on other systems you might have to use pgrep (if available) or something like ps | grep "^$pid ".
I found "kill -0" does not work if the process is owned by root (or other), so I used pgrep and came up with:
while pgrep -u root process_name > /dev/null; do sleep 1; done
This would have the disadvantage of probably matching zombie processes.
This bash script loop ends if the process does not exist, or it's a zombie.
PID=<pid to watch>
while s=`ps -p $PID -o s=` && [[ "$s" && "$s" != 'Z' ]]; do
sleep 1
done
EDIT: The above script was given below by Rockallite. Thanks!
My orignal answer below works for Linux, relying on procfs i.e. /proc/. I don't know its portability:
while [[ ( -d /proc/$PID ) && ( -z `grep zombie /proc/$PID/status` ) ]]; do
sleep 1
done
It's not limited to shell, but OS's themselves do not have system calls to watch non-child process termination.
FreeBSD and Solaris have this handy pwait(1) utility, which does exactly, what you want.
I believe, other modern OSes also have the necessary system calls too (MacOS, for example, implements BSD's kqueue), but not all make it available from command-line.
From the bash manpage
wait [n ...]
Wait for each specified process and return its termination status
Each n may be a process ID or a job specification; if a
job spec is given, all processes in that job's pipeline are
waited for. If n is not given, all currently active child processes
are waited for, and the return status is zero. If n
specifies a non-existent process or job, the return status is
127. Otherwise, the return status is the exit status of the
last process or job waited for.
Okay, so it seems the answer is -- no, there is no built in tool.
After setting /proc/sys/kernel/yama/ptrace_scope to 0, it is possible to use the strace program. Further switches can be used to make it silent, so that it really waits passively:
strace -qqe '' -p <PID>
All these solutions are tested in Ubuntu 14.04:
Solution 1 (by using ps command):
Just to add up to Pierz answer, I would suggest:
while ps axg | grep -vw grep | grep -w process_name > /dev/null; do sleep 1; done
In this case, grep -vw grep ensures that grep matches only process_name and not grep itself. It has the advantage of supporting the cases where the process_name is not at the end of a line at ps axg.
Solution 2 (by using top command and process name):
while [[ $(awk '$12=="process_name" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_name with the process name that appears in top -n 1 -b. Please keep the quotation marks.
To see the list of processes that you wait for them to be finished, you can run:
while : ; do p=$(awk '$12=="process_name" {print $0}' <(top -n 1 -b)); [[ $b ]] || break; echo $p; sleep 1; done
Solution 3 (by using top command and process ID):
while [[ $(awk '$1=="process_id" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_id with the process ID of your program.
Blocking solution
Use the wait in a loop, for waiting for terminate all processes:
function anywait()
{
for pid in "$#"
do
wait $pid
echo "Process $pid terminated"
done
echo 'All processes terminated'
}
This function will exits immediately, when all processes was terminated. This is the most efficient solution.
Non-blocking solution
Use the kill -0 in a loop, for waiting for terminate all processes + do anything between checks:
function anywait_w_status()
{
for pid in "$#"
do
while kill -0 "$pid"
do
echo "Process $pid still running..."
sleep 1
done
done
echo 'All processes terminated'
}
The reaction time decreased to sleep time, because have to prevent high CPU usage.
A realistic usage:
Waiting for terminate all processes + inform user about all running PIDs.
function anywait_w_status2()
{
while true
do
alive_pids=()
for pid in "$#"
do
kill -0 "$pid" 2>/dev/null \
&& alive_pids+="$pid "
done
if [ ${#alive_pids[#]} -eq 0 ]
then
break
fi
echo "Process(es) still running... ${alive_pids[#]}"
sleep 1
done
echo 'All processes terminated'
}
Notes
These functions getting PIDs via arguments by $# as BASH array.
Had the same issue, I solved the issue killing the process and then waiting for each process to finish using the PROC filesystem:
while [ -e /proc/${pid} ]; do sleep 0.1; done
There is no builtin feature to wait for any process to finish.
You could send kill -0 to any PID found, so you don't get puzzled by zombies and stuff that will still be visible in ps (while still retrieving the PID list using ps).
If you need to both kill a process and wait for it finish, this can be achieved with killall(1) (based on process names), and start-stop-daemon(8) (based on a pidfile).
To kill all processes matching someproc and wait for them to die:
killall someproc --wait # wait forever until matching processes die
timeout 10s killall someproc --wait # timeout after 10 seconds
(Unfortunately, there's no direct equivalent of --wait with kill for a specific pid).
To kill a process based on a pidfile /var/run/someproc.pid using signal SIGINT, while waiting for it to finish, with SIGKILL being sent after 20 seconds of timeout, use:
start-stop-daemon --stop --signal INT --retry 20 --pidfile /var/run/someproc.pid
Use inotifywait to monitor some file that gets closed, when your process terminates. Example (on Linux):
yourproc >logfile.log & disown
inotifywait -q -e close logfile.log
-e specifies the event to wait for, -q means minimal output only on termination. In this case it will be:
logfile.log CLOSE_WRITE,CLOSE
A single wait command can be used to wait for multiple processes:
yourproc1 >logfile1.log & disown
yourproc2 >logfile2.log & disown
yourproc3 >logfile3.log & disown
inotifywait -q -e close logfile1.log logfile2.log logfile3.log
The output string of inotifywait will tell you, which process terminated. This only works with 'real' files, not with something in /proc/
Rauno Palosaari's solution for Timeout in Seconds Darwin, is an excellent workaround for a UNIX-like OS that does not have GNU tail (it is not specific to Darwin). But, depending on the age of the UNIX-like operating system, the command-line offered is more complex than necessary, and can fail:
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
On at least one old UNIX, the lsof argument +r 1m%s fails (even for a superuser):
lsof: can't read kernel name list.
The m%s is an output format specification. A simpler post-processor does not require it. For example, the following command waits on PID 5959 for up to five seconds:
lsof -p 5959 +r 1 | awk '/^=/ { if (T++ >= 5) { exit 1 } }'
In this example, if PID 5959 exits of its own accord before the five seconds elapses, ${?} is 0. If not ${?} returns 1 after five seconds.
It may be worth expressly noting that in +r 1, the 1 is the poll interval (in seconds), so it may be changed to suit the situation.
On a system like OSX you might not have pgrep so you can try this appraoch, when looking for processes by name:
while ps axg | grep process_name$ > /dev/null; do sleep 1; done
The $ symbol at the end of the process name ensures that grep matches only process_name to the end of line in the ps output and not itself.

Shell scripts and how to avoid running the same script at the same time on a Linux machine

I have Linux centralize server – Linux 5.X.
In some cases on my Linux server the get_hosts.ksh script could be run from some other different hosts.
For example get_hosts.ksh could run on my Linux machine three or more times at the same time.
My question:
How to avoid running multiple instances of process/script?
A common solution for your problem on *nix systems is to check for a lock file existence.
Usually lock file contains current process PID.
This is an example ksh script:
#!/bin/ksh
pid="/var/run/get_hosts.pid"
trap "rm -f $pid" SIGSEGV
trap "rm -f $pid" SIGINT
if [ -e $pid ]; then
exit # pid file exists, another instance is running, so now we politely exit
else
echo $$ > $pid # pid file doesn't exit, create one and go on
fi
# your normal workflow here...
rm -f $pid # remove pid file just before exiting
exit
UPDATE: Answering to OP comment, I add handling program interruptions and segfaults with trap command.
The normal way of doing this is to write the process id into a file. The first thing the script does is check for the existence of the file, read the pid, check if a process with that pid exists, and for extra paranoia points, if that process actually runs the script. If yes, the script exits.
Here's a simple example. The process in question is a binary, and this script makes sure the binary runs only once. This is not exactly what you need, but you should be able to adapt this:
RUNNING=0
PIDFILE=$PATH_TO/var/run/example.pid
if [ -f $PIDFILE ]
then
PID=`cat $PIDFILE`
ps -eo pid | grep $PID >/dev/null 2>&1
if [ $? -eq 0 ]
then
RUNNING=1
fi
fi
if [ $RUNNING -ne 1 ]
then
run_binary
PID=$!
echo $PID > $PIDFILE
fi
This is not very elaborate but should get you on the right track.
You can use a pid file to keep track of when the process is running. At the top of the script, check for the existence of the pid file and if it doesn't exist, create it and run the script, otherwise return.
Some sample code can be seen in this answer to a similar question.
You might consider using the (optional) lockfile(1) command (provided by procmail package on Debian).
I have a lot of scripts, and using this below code for prevent multiple/simulate run:
PID="/var/scripts/PID.txt" # Temp file
if [ ! -f "$PID" ]; then
echo $$ > "$PID" # Print actual PID into a file
else
ps -p $(cat "$PID") > /dev/null && exit || echo $$ > "$PID"
fi
Building on wallenborn's answer I also added a "staleness" check just in case the PID lock file is beyond a certain expected age in seconds.
# prevent simultaneous executions within an hourish
pid_file="$HOME/.harness.pid"
max_stale_seconds=3600
if [ -f $pid_file ]; then
pid="$(cat "$pid_file")"
let age_in_seconds="$(date +%s) - $(date -r "$pid_file" +%s)"
if ps $pid >/dev/null && [ $age_in_seconds -lt $max_stale_seconds ]; then
exit 1
fi
fi
echo $$>"$pid_file"
trap "rm -f \"$pid_file\"" SIGSEGV
trap "rm -f \"$pid_file\"" SIGINT
This could be made "smarter" to kill off the other executions should the PID be valid but this would be dangerous. Consider a sudden power failure and reset situation where the PID file contains a number that may now reference a completely different process.

Wait for arbitrary process and get its exit code in Linux

Is there a way to wait until a process finishes if I'm not the one who started it?
e.g. if I ran "ps -ef" and pick any PID (assuming I have rights to access process information) - is there a way I can wait until the PID completes and get its exit code?
You could use strace, which tracks signals and system calls. The following command waits until a program is done, then prints its exit code:
$ strace -e none -e exit_group -p $PID # process calls exit(1)
Process 23541 attached - interrupt to quit
exit_group(1) = ?
Process 23541 detached
$ strace -e none -e exit_group -p $PID # ^C at the keyboard
Process 22979 attached - interrupt to quit
--- SIGINT (Interrupt) # 0 (0) ---
Process 22979 detached
$ strace -e none -e exit_group -p $PID # kill -9 $PID
Process 22983 attached - interrupt to quit
+++ killed by SIGKILL +++
Signals from ^Z, fg and kill -USR1 get printed too. Either way, you'll need to use sed if you want to use the exit code in a shell script.
If that's too much shell code, you can use a program I hacked together in C a while back. It uses ptrace() to catch signals and exit codes of pids. (It has rough edges and may not work in all situations.)
I hope that helps!
is there a way I can wait until the PID completes and get its exit code
Yes, if the process is not being ptraced by somebody else, you can PTRACE_ATTACH to it, and get notified about various events (e.g. signals received), and about its exit.
Beware, this is quite complicated to handle properly.
If you can live without the exit code:
tail --pid=$pid -f /dev/null
If you know the process ID you can make use of the wait command which is a bash builtin:
wait PID
You can get the PID of the last command run in bash using $!. Or, you can grep for it with from the output of ps.
In fact, the wait command is a useful way to run parralel command in bash. Here's an example:
# Start the processes in parallel...
./script1.sh 1>/dev/null 2>&1 &
pid1=$!
./script2.sh 1>/dev/null 2>&1 &
pid2=$!
./script3.sh 1>/dev/null 2>&1 &
pid3=$!
./script4.sh 1>/dev/null 2>&1 &
pid4=$!
# Wait for processes to finish...
echo -ne "Commands sent... "
wait $pid1
err1=$?
wait $pid2
err2=$?
wait $pid3
err3=$?
wait $pid4
err4=$?
# Do something useful with the return codes...
if [ $err1 -eq 0 -a $err2 -eq 0 -a $err3 -eq 0 -a $err4 -eq 0 ]
then
echo "pass"
else
echo "fail"
fi

wait child process but get error: 'pid is not a child of this shell'

I write a script to get data from HDFS parrallel,then I wait these child processes in a for loop, but sometimes it returns "pid is not a child of this shell". sometimes, it works well。It's so puzzled. I use "jobs -l" to show all the jobs run in the background. I am sure these pid is the child process of the shell process, and I use "ps aux" to make sure these pids is note assign to other process. Here is my script.
PID=()
FILE=()
let serial=0
while read index_tar
do
echo $index_tar | grep index > /dev/null 2>&1
if [[ $? -ne 0 ]]
then
continue
fi
suffix=`printf '%03d' $serial`
mkdir input/output_$suffix
$HADOOP_HOME/bin/hadoop fs -cat $index_tar | tar zxf - -C input/output_$suffix \
&& mv input/output_$suffix/index_* input/output_$suffix/index &
PID[$serial]=$!
FILE[$serial]=$index_tar
let serial++
done < file.list
for((i=0;i<$serial;i++))
do
wait ${PID[$i]}
if [[ $? -ne 0 ]]
then
LOG "get ${FILE[$i]} failed, PID:${PID[$i]}"
exit -1
else
LOG "get ${FILE[$i]} success, PID:${PID[$i]}"
fi
done
Just find the process id of the process you want to wait for and replace that with 12345 in below script. Further changes can be made as per your requirement.
#!/bin/sh
PID=12345
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running" >> /home/parv/waitAndRun.log
sleep .6
done
echo "Process $PID has finished" >> /home/parv/waitAndRun.log
/usr/bin/waitingScript.sh
http://iamparv.blogspot.in/2013/10/unix-wait-for-running-process-not-child.html
Either your while loop or the for loop runs in a subshell, which is why you cannot await a child of the (parent, outer) shell.
Edit this might happen if the while loop or for loop is actually
(a) in a {...} block
(b) participating in a piper (e.g. for....done|somepipe)
If you're running this in a container of some sort, the condition apparently can be caused by a bug in bash that is easier to encounter in a containerized envrionment.
From my reading of the bash source (specifically see comments around RECYCLES_PIDS and CHILD_MAX in bash-4.2/jobs.c), it looks like in their effort to optimize their tracking of background jobs, they leave themselves vulnerable to PID aliasing (where a new process might obscure the status of an old one); to mitigate that, they prune their background process history (apparently as mandated by POSIX?). If you should happen to want to wait on a pruned process, the shell can't find it in the history and assumes this to mean that it never knew about it (i.e., that it "is not a child of this shell").

Resources