Wait for arbitrary process and get its exit code in Linux


Is there a way to wait until a process finishes if I'm not the one who started it?
e.g. if I ran "ps -ef" and pick any PID (assuming I have rights to access process information) - is there a way I can wait until the PID completes and get its exit code?

You could use strace, which tracks signals and system calls. The following command waits until a program is done, then prints its exit code:
$ strace -e none -e exit_group -p $PID # process calls exit(1)
Process 23541 attached - interrupt to quit
exit_group(1) = ?
Process 23541 detached
$ strace -e none -e exit_group -p $PID # ^C at the keyboard
Process 22979 attached - interrupt to quit
--- SIGINT (Interrupt) @ 0 (0) ---
Process 22979 detached
$ strace -e none -e exit_group -p $PID # kill -9 $PID
Process 22983 attached - interrupt to quit
+++ killed by SIGKILL +++
Signals from ^Z, fg and kill -USR1 get printed too. Either way, you'll need to use sed if you want to use the exit code in a shell script.
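The sed step could look something like the sketch below. The function name parse_strace_exit is made up for illustration, and it assumes strace prints an exit_group(N) line in the form shown in the first transcript above; other strace versions may format the line differently.

```shell
# parse_strace_exit (hypothetical helper): read strace output on stdin and
# print the numeric argument of the first exit_group() call it sees.
parse_strace_exit() {
    sed -n 's/.*exit_group(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Demonstration on canned text copied from the transcript above.
sample='Process 23541 attached - interrupt to quit
exit_group(1) = ?
Process 23541 detached'
code=$(printf '%s\n' "$sample" | parse_strace_exit)
echo "exit code: $code"
```

Against a live process, strace writes its trace to stderr, so the call would be along the lines of strace -e none -e exit_group -p "$PID" 2>&1 | parse_strace_exit. An empty result means no exit_group line was seen, for example because the process was killed by a signal.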
If that's too much shell code, you can use a program I hacked together in C a while back. It uses ptrace() to catch signals and exit codes of pids. (It has rough edges and may not work in all situations.)
I hope that helps!

is there a way I can wait until the PID completes and get its exit code
Yes, if the process is not being ptraced by somebody else, you can PTRACE_ATTACH to it, and get notified about various events (e.g. signals received), and about its exit.
Beware, this is quite complicated to handle properly.

If you can live without the exit code:
tail --pid=$pid -f /dev/null

If the process is a child of the current shell and you know its process ID, you can use the wait command, which is a bash builtin (it cannot wait for processes that were started elsewhere):
wait PID
You can get the PID of the last background command started in bash using $!. Or you can grep for it in the output of ps.
In fact, the wait command is a useful way to run commands in parallel in bash. Here's an example:
# Start the processes in parallel...
./script1.sh 1>/dev/null 2>&1 &
pid1=$!
./script2.sh 1>/dev/null 2>&1 &
pid2=$!
./script3.sh 1>/dev/null 2>&1 &
pid3=$!
./script4.sh 1>/dev/null 2>&1 &
pid4=$!
# Wait for processes to finish...
echo -ne "Commands sent... "
wait $pid1
err1=$?
wait $pid2
err2=$?
wait $pid3
err3=$?
wait $pid4
err4=$?
# Do something useful with the return codes...
if [ $err1 -eq 0 -a $err2 -eq 0 -a $err3 -eq 0 -a $err4 -eq 0 ]
then
echo "pass"
else
echo "fail"
fi

Related

Wait for a program (non-child) to finish and execute a command

I have a program running on a remote computer which shouldn't be stopped. I need to track when this program is stopped and immediately execute a command. PID is known. How can I do that?

You cannot wait for non-child processes.
Probably the most efficient way in a shell would be to poll using the exit code of kill -0 <pid> to check if the process still exists:
while kill -0 $PID 2>/dev/null; do sleep 1; done
This is both simpler and more efficient than any approaches involving ps and grep. However, it only works if your user has permission to send signals to that process.
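To also cover the "immediately execute a command" part of the question, the polling loop can be wrapped in a small function. This is only a sketch: wait_then_run is a made-up name, and the same caveat applies, since kill -0 also fails when you lack permission to signal the process, in which case the loop ends right away.

```shell
# wait_then_run (hypothetical helper): poll PID with kill -0, then run a command.
# Usage: wait_then_run PID COMMAND [ARGS...]
wait_then_run() {
    target_pid=$1
    shift
    # kill -0 sends no signal; it only reports whether signalling is possible.
    while kill -0 "$target_pid" 2>/dev/null; do
        sleep 1
    done
    "$@"
}

# Demonstration: a short-lived background job stands in for the real program.
sleep 2 &
wait_then_run "$!" echo "process finished, running follow-up command"
```

The follow-up command runs at most one polling interval (here one second) after the process disappears.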

Code like this can do the work (to be run on remote computer)
while true
do
if [ "$(ps -efl|grep $PIDN|grep -v grep|wc -l)" -lt 1 ]
then <exec code>; break
fi
sleep 5
done
It expects the variable PIDN to contain the PID.
P.S. I know the code is ugly and power hungry.
EDIT: it is possible to use -p in ps to avoid the grep calls:
while true
do
if [ "$(ps -p $PIDN|wc -l)" -lt 2 ]
then <exec code>; break
fi
sleep 5
done

Here's a fairly simple way to wait for a process to terminate using the ps -p PID strategy:
if ps -p "$PID" >/dev/null 2>&1; then
echo "Process $PID is running ..."
while ps -p "$PID" >/dev/null 2>&1; do
sleep 5
done
echo "Process $PID is not running anymore."
fi
Checking for a process by PID
In general, to check for process ownership or permission to kill (send signals to) a process, you can use a combination of ps -p PID and kill -0:
if ps -p "$PID" >/dev/null 2>&1; then
echo "Process $PID exists!"
if kill -0 "$PID" >/dev/null 2>&1; then
echo "You can send signals to process $PID, e.g. with 'kill $PID'"
else
echo "You do not have permission to send signals to process $PID"
fi
else
echo "Process $PID does not exist."
fi

You can use exitsnoop to achieve this.
The bcc toolkit implements many excellent monitoring capabilities based on eBPF. Among them, exitsnoop traces process termination, showing the command name and reason for termination, either an exit or a fatal signal.
It catches processes of all users, processes in containers, as well as processes that become zombie.
This works by tracing the kernel sched_process_exit() function using dynamic tracing, and will need updating to match any changes to this function.
Since this uses BPF, only the root user can use this tool.
exitsnoop examples:
Trace all process termination
# exitsnoop
Trace all process termination, and include timestamps:
# exitsnoop -t
Exclude successful exits, only include non-zero exit codes and fatal signals:
# exitsnoop -x
Trace PID 181 only:
# exitsnoop -p 181
Label each output line with 'EXIT':
# exitsnoop --label EXIT
You can get more information about this tool from the links below:
Github repo: tools/exitsnoop: Trace process termination (exit and fatal signals). Examples.
Linux Extended BPF (eBPF) Tracing Tools
ubuntu manpages: exitsnoop-bpfcc
Another option
use this project:
https://github.com/stormc/waitforpid

How to wait on a backgrounded sub-process with `wait` command [duplicate]

Is there any builtin feature in Bash to wait for a process to finish?
The wait command only allows one to wait for child processes to finish.
I would like to know if there is any way to wait for any process to finish before proceeding in any script.
A mechanical way to do this is as follows but I would like to know if there is any builtin feature in Bash.
while ps -p `cat $PID_FILE` > /dev/null; do sleep 1; done

To wait for any process to finish
Linux (doesn't work on Alpine, where the BusyBox tail doesn't support --pid):
tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1 &>/dev/null
With timeout (seconds)
Linux:
timeout $timeout tail --pid=$pid -f /dev/null
Darwin (requires that $pid has open files):
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
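For the Linux variant, the timeout case can be told apart from a normal exit by checking the return status. A minimal sketch, assuming GNU coreutils (tail with --pid, and timeout, which returns 124 when the limit is reached); the function name wait_pid_timeout is invented here:

```shell
# wait_pid_timeout (hypothetical helper): wait up to LIMIT seconds for PID to exit.
# Usage: wait_pid_timeout PID LIMIT
# Returns 0 if the process ended in time, 124 if timeout(1) gave up first.
wait_pid_timeout() {
    timeout "$2" tail --pid="$1" -f /dev/null
}

# Demonstration with a one-second background job and a ten-second limit.
sleep 1 &
if wait_pid_timeout "$!" 10; then
    echo "process ended within the limit"
else
    echo "gave up waiting"
fi
```

As with plain tail --pid, this reports only that the process is gone, not its exit code.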

There's no builtin. Use kill -0 in a loop for a workable solution:
anywait(){
for pid in "$@"; do
while kill -0 "$pid" 2>/dev/null; do
sleep 0.5
done
done
}
Or as a simpler oneliner for easy one time usage:
while kill -0 PIDS 2> /dev/null; do sleep 1; done;
As noted by several commentators, if you want to wait for processes that you do not have the privilege to send signals to, you have to find some other way to detect whether the process is running to replace the kill -0 $pid call. On Linux, test -d "/proc/$pid" works; on other systems you might have to use pgrep (if available) or something like ps | grep "^$pid ".
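The two checks can be combined so that the loop works whether or not you are allowed to signal the process. This is a Linux-only sketch (it relies on /proc), and pid_alive is a made-up name; note that a zombie still counts as alive for both checks.

```shell
# pid_alive (hypothetical helper): succeed if PID appears to be running.
pid_alive() {
    # Fast path: works when we are allowed to signal the process.
    kill -0 "$1" 2>/dev/null && return 0
    # Fallback for processes owned by other users (Linux procfs only).
    [ -d "/proc/$1" ]
}

# Demonstration: the current shell is certainly running.
if pid_alive "$$"; then
    echo "shell $$ is alive"
fi
```

It would then replace the kill -0 call in the loop, for example: while pid_alive "$pid"; do sleep 1; done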

I found "kill -0" does not work if the process is owned by root (or other), so I used pgrep and came up with:
while pgrep -u root process_name > /dev/null; do sleep 1; done
This would have the disadvantage of probably matching zombie processes.

This bash script loop ends if the process does not exist, or it's a zombie.
PID=<pid to watch>
while s=`ps -p $PID -o s=` && [[ "$s" && "$s" != 'Z' ]]; do
sleep 1
done
EDIT: The above script was given below by Rockallite. Thanks!
My original answer below works on Linux, relying on procfs, i.e. /proc/. I don't know how portable it is:
while [[ ( -d /proc/$PID ) && ( -z `grep zombie /proc/$PID/status` ) ]]; do
sleep 1
done
This limitation is not specific to the shell: traditional Unix systems offer no portable system call for waiting on the termination of a non-child process.

FreeBSD and Solaris have the handy pwait(1) utility, which does exactly what you want.
I believe other modern OSes have the necessary system calls too (macOS, for example, implements BSD's kqueue), but not all of them make it available from the command line.

From the bash manpage
wait [n ...]
Wait for each specified process and return its termination status
Each n may be a process ID or a job specification; if a
job spec is given, all processes in that job's pipeline are
waited for. If n is not given, all currently active child processes
are waited for, and the return status is zero. If n
specifies a non-existent process or job, the return status is
127. Otherwise, the return status is the exit status of the
last process or job waited for.
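The two return-status cases described in the manpage excerpt can be seen side by side in a short sketch. PID 1 is used here only as an example of a process that is never a child of the current shell.

```shell
# A child of this shell: wait returns its exit status.
sh -c 'exit 3' &
child_status=0
wait "$!" || child_status=$?

# PID 1 is not a child of this shell, so wait cannot collect its status.
nonchild_status=0
wait 1 2>/dev/null || nonchild_status=$?

echo "child: $child_status, non-child: $nonchild_status"
```

This is why wait cannot answer the question for arbitrary processes: for anything that is not a child, the shell has no exit status to hand back.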

Okay, so it seems the answer is -- no, there is no built in tool.
After setting /proc/sys/kernel/yama/ptrace_scope to 0, it is possible to use the strace program. Further switches can be used to make it silent, so that it really waits passively:
strace -qqe '' -p <PID>

All these solutions are tested in Ubuntu 14.04:
Solution 1 (by using ps command):
Just to add to Pierz's answer, I would suggest:
while ps axg | grep -vw grep | grep -w process_name > /dev/null; do sleep 1; done
In this case, grep -vw grep ensures that grep matches only process_name and not grep itself. It has the advantage of supporting the cases where the process_name is not at the end of a line at ps axg.
Solution 2 (by using top command and process name):
while [[ $(awk '$12=="process_name" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_name with the process name that appears in top -n 1 -b. Please keep the quotation marks.
To see the list of processes you are waiting for, you can run:
while : ; do p=$(awk '$12=="process_name" {print $0}' <(top -n 1 -b)); [[ $p ]] || break; echo "$p"; sleep 1; done
Solution 3 (by using top command and process ID):
while [[ $(awk '$1=="process_id" {print $0}' <(top -n 1 -b)) ]]; do sleep 1; done
Replace process_id with the process ID of your program.

Blocking solution
Use wait in a loop to wait for all processes to terminate (this only works for child processes of the current shell):
function anywait()
{
for pid in "$@"
do
wait $pid
echo "Process $pid terminated"
done
echo 'All processes terminated'
}
This function returns as soon as all processes have terminated. For child processes, this is the most efficient solution.
Non-blocking solution
Use kill -0 in a loop to wait for all processes to terminate and do something between checks:
function anywait_w_status()
{
for pid in "$@"
do
while kill -0 "$pid" 2>/dev/null
do
echo "Process $pid still running..."
sleep 1
done
done
echo 'All processes terminated'
}
The reaction time is limited by the sleep interval, which is needed to prevent high CPU usage.
A realistic usage:
Wait for all processes to terminate and inform the user about the PIDs that are still running.
function anywait_w_status2()
{
while true
do
alive_pids=()
for pid in "$@"
do
kill -0 "$pid" 2>/dev/null \
&& alive_pids+=("$pid")
done
if [ ${#alive_pids[@]} -eq 0 ]
then
break
fi
echo "Process(es) still running... ${alive_pids[@]}"
sleep 1
done
echo 'All processes terminated'
}
Notes
These functions receive the PIDs as arguments, accessed through "$@".

I had the same issue and solved it by killing the process and then waiting for it to finish using the proc filesystem:
while [ -e /proc/${pid} ]; do sleep 0.1; done

There is no builtin feature to wait for any process to finish.
You could send kill -0 to any PID found, so you don't get puzzled by zombies and stuff that will still be visible in ps (while still retrieving the PID list using ps).

If you need to both kill a process and wait for it to finish, this can be achieved with killall(1) (based on process names), and start-stop-daemon(8) (based on a pidfile).
To kill all processes matching someproc and wait for them to die:
killall someproc --wait # wait forever until matching processes die
timeout 10s killall someproc --wait # timeout after 10 seconds
(Unfortunately, there's no direct equivalent of --wait with kill for a specific pid).
To kill a process based on a pidfile /var/run/someproc.pid using signal SIGINT, while waiting for it to finish, with SIGKILL being sent after 20 seconds of timeout, use:
start-stop-daemon --stop --signal INT --retry 20 --pidfile /var/run/someproc.pid
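For a single PID, a similar "signal, wait, then escalate" behaviour can be approximated with kill and a polling loop. This is a rough sketch rather than an equivalent of --wait or --retry: kill_and_wait is a made-up name, the ten-second default is arbitrary, and it needs permission to signal the process.

```shell
# kill_and_wait (hypothetical helper): send SIGTERM to PID, wait up to LIMIT
# seconds (default 10) for it to disappear, then fall back to SIGKILL.
# Returns 0 if the process went away after SIGTERM, 1 if SIGKILL was needed,
# 2 if the process could not be signalled at all.
kill_and_wait() {
    victim=$1
    limit=${2:-10}
    waited=0
    kill -TERM "$victim" 2>/dev/null || return 2   # no such process, or no permission
    while kill -0 "$victim" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill -KILL "$victim" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Demonstration: terminate a long sleep, allowing it five seconds to exit.
sleep 60 &
kill_and_wait "$!" 5 && echo "terminated cleanly"
```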

Use inotifywait to monitor some file that gets closed, when your process terminates. Example (on Linux):
yourproc >logfile.log & disown
inotifywait -q -e close logfile.log
-e specifies the event to wait for, -q means minimal output only on termination. In this case it will be:
logfile.log CLOSE_WRITE,CLOSE
A single inotifywait command can be used to wait for multiple processes:
yourproc1 >logfile1.log & disown
yourproc2 >logfile2.log & disown
yourproc3 >logfile3.log & disown
inotifywait -q -e close logfile1.log logfile2.log logfile3.log
The output string of inotifywait will tell you which process terminated. This only works with 'real' files, not with something in /proc/.

Rauno Palosaari's solution for Timeout in Seconds Darwin, is an excellent workaround for a UNIX-like OS that does not have GNU tail (it is not specific to Darwin). But, depending on the age of the UNIX-like operating system, the command-line offered is more complex than necessary, and can fail:
lsof -p $pid +r 1m%s -t | grep -qm1 $(date -v+${timeout}S +%s 2>/dev/null || echo INF)
On at least one old UNIX, the lsof argument +r 1m%s fails (even for a superuser):
lsof: can't read kernel name list.
The m%s is an output format specification. A simpler post-processor does not require it. For example, the following command waits on PID 5959 for up to five seconds:
lsof -p 5959 +r 1 | awk '/^=/ { if (T++ >= 5) { exit 1 } }'
In this example, if PID 5959 exits of its own accord before the five seconds elapse, ${?} is 0. If not, ${?} is 1 after five seconds.
It may be worth expressly noting that in +r 1, the 1 is the poll interval (in seconds), so it may be changed to suit the situation.

On a system like OSX you might not have pgrep, so you can try this approach when looking for processes by name:
while ps axg | grep process_name$ > /dev/null; do sleep 1; done
The $ symbol at the end of the process name ensures that grep matches only process_name to the end of line in the ps output and not itself.

How is wait behaving in this script?

I have this:
#!/bin/bash
trap 'echo $? $?' SIGINT
for i in `seq 10`; do
echo hello from for
sleep 10
done &
bgproc=$!
echo bgproc is $bgproc
ps -o pid,ppid,cmd
echo "waiting now"
wait $bgproc
I do
kill -2 <pid>
and get
0 0
as output.
Question:
When I send SIGINT to this script, why does it terminate? I know it's because of the wait statement at the end, but what is happening there?

From the Bash Reference Manual:
When Bash is waiting for an asynchronous command via the wait
builtin, the reception of a signal for which a trap has been set will
cause the wait builtin to return immediately with an exit status
greater than 128, immediately after which the trap is executed.
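The behaviour quoted above can be reproduced in a few lines. This sketch uses USR1 instead of INT and a helper subshell that signals the script itself after one second; the variable names are illustrative only.

```shell
trapped=no
trap 'trapped=yes' USR1

sleep 5 &
bg=$!

# Send USR1 to this shell after one second, while it is blocked in wait.
( sleep 1; kill -USR1 "$$" ) &

wait_status=0
wait "$bg" || wait_status=$?
echo "wait returned $wait_status, trapped=$trapped"

# The background job is still running at this point; clean it up.
kill "$bg" 2>/dev/null
wait "$bg" 2>/dev/null || true
trap - USR1
```

In the script from the question, wait is the last command, so once it returns early the script simply reaches its end while the background loop keeps running.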

bash: silently kill background function process

shell gurus,
I have a bash shell script, in which I launch a background function, say foo(), to display a progress bar for a boring and long command:
foo()
{
while [ 1 ]
do
#massively cool progress bar display code
sleep 1
done
}
foo &
foo_pid=$!
boring_and_long_command
kill $foo_pid >/dev/null 2>&1
sleep 10
now, when foo dies, I see the following text:
/home/user/script: line XXX: 30290 Killed foo
This totally destroys the awesomeness of my, otherwise massively cool, progress bar display.
How do I get rid of this message?

kill $foo_pid
wait $foo_pid 2>/dev/null
BTW, I don't know about your massively cool progress bar, but have you seen Pipe Viewer (pv)? http://www.ivarch.com/programs/pv.shtml

Just came across this myself, and realised "disown" is what we are looking for.
foo &
foo_pid=$!
disown
boring_and_long_command
kill $foo_pid
sleep 10
The death message is being printed because the process is still in the shell's list of watched "jobs". The disown command will remove the most recently spawned process from this list so that no debug message will be generated when it is killed, even with SIGKILL (-9).

Try to replace your line kill $foo_pid >/dev/null 2>&1 with the line:
(kill $foo_pid 2>&1) >/dev/null
Update:
This answer is not correct for the reason explained by @mklement0 in his comment:
The reason this answer isn't effective with background jobs is that
Bash itself asynchronously, after the kill command has completed,
outputs a status message about the killed job, which you cannot
suppress directly - unless you use wait, as in the accepted answer.

This "hack" seems to work:
# Some trickery to hide killed message
exec 3>&2 # 3 is now a copy of 2
exec 2> /dev/null # 2 now points to /dev/null
kill $foo_pid >/dev/null 2>&1
sleep 1 # sleep to wait for process to die
exec 2>&3 # restore stderr to saved
exec 3>&- # close saved version
and it was inspired from here. World order has been restored.

This is a solution I came up with for a similar problem (wanted to display a timestamp during long running processes). This implements a killsub function that allows you to kill any subshell quietly as long as you know the pid. Note, that the trap instructions are important to include: in case the script is interrupted, the subshell will not continue to run.
foo()
{
while [ 1 ]
do
#massively cool progress bar display code
sleep 1
done
}
#Kills the sub process quietly
function killsub()
{
kill -9 ${1} 2>/dev/null
wait ${1} 2>/dev/null
}
foo &
foo_pid=$!
#Add a trap in case of unexpected interruptions
trap 'killsub ${foo_pid}; exit' INT TERM EXIT
boring_and_long_command
#Kill foo after finished
killsub ${foo_pid}
#Reset trap
trap - INT TERM EXIT

Add at the start of the function:
trap 'exit 0' TERM

You can use set +m before to suppress that. More information on that here

Another way to do it:
func_terminate_service(){
[[ "$(pidof "${1}")" ]] && killall "${1}"
sleep 2
# pidof may print several PIDs, so the substitution is left unquoted on purpose
[[ "$(pidof "${1}")" ]] && kill -9 $(pidof "${1}")
}
call it with
func_terminate_service "firefox"

Yet another way to disable job notifications is to put your command to be backgrounded in a sh -c 'cmd &' construct.
#!/bin/bash
foo()
{
while [ 1 ]
do
sleep 1
done
}
#foo &
#foo_pid=$!
export -f foo
foo_pid=`sh -c 'foo & echo ${!}' | head -1`
# if shell does not support exporting functions (export -f foo)
#arg1='foo() { while [ 1 ]; do sleep 1; done; }'
#foo_pid=`sh -c 'eval "$1"; foo & echo ${!}' _ "$arg1" | head -1`
sleep 3
echo kill ${foo_pid}
kill ${foo_pid}
sleep 3
exit

The error message most likely comes from the shell's default handling of the signal, which reports the killed job in the script. I saw similar errors only on bash 3.x and 4.x. To kill the child process quietly everywhere (tested on bash 3/4/5, dash, ash, zsh), we can trap the TERM signal at the very start of the child process:
#!/bin/sh
## assume script name is test.sh
foo() {
trap 'exit 0' TERM ## here is the key
while true; do sleep 1; done
}
echo before child
ps aux | grep 'test\.s[h]\|slee[p]'
foo &
foo_pid=$!
sleep 1 # wait trap is done
echo before kill
ps aux | grep 'test\.s[h]\|slee[p]'
kill $foo_pid
sleep 1 # wait kill is done
echo after kill
ps aux | grep 'test\.s[h]\|slee[p]'

Get exit code of a background process

I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?

1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status

This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done

The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
With the wait command you can wait for all child processes to finish, get the exit status of each child process via $?, and store it in STATUS[]. Then you can act on each status.
I have tried the following two solutions and both work well. solution01 is more concise, while solution02 is a little more complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in "${process[@]}"; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in "${PIDS[@]}"; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in "${STATUS[@]}"; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in "${process[@]}"; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in "${PIDS[@]}"; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in "${STATUS[@]}"; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done

As far as I can see, almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more Unix-like solution: catching the SIGCHLD signal. The signal handler has to check which child process has stopped. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux specific), or by using the jobs built-in (bash specific; jobs -l also reports the PID, and in this case the 3rd field of the output can be Stopped|Running|Done|Exit).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x it exits with exit code 1. For a number it waits num*5 seconds. Every 5 seconds it prints its PID.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for((i=0;i<${#pids[@]};++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=(${tmp[@]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed

#!/bin/bash
#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?

I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi

Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}

A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5

Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
local pid=$!
while [ -d "/proc/$pid" ]; do # While the directory exists, the process is running
doSomethingElse
....
done
# When the directory is removed from /proc, the process has ended
wait $pid
local exit_status=$?
....
Now you can use the $exit_status variable however you like.

With this method, your script doesn't have to wait for the background process; you only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
Now your script can do anything else, as long as it keeps monitoring the contents of retFile (which can also contain any other information you want, such as the exit time).
P.S. I wrote this with bash in mind.
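The monitoring side is not shown above; one possible shape for it is sketched here. The names status_file and my_cmd are illustrative stand-ins, and the write-then-rename step is an addition so that the reader never sees a half-written file.

```shell
status_file=$(mktemp)

my_cmd() { sleep 1; return 6; }   # stand-in for the real long-running command

# Run the command in the background and record its exit status. The status is
# written to a scratch name first and then renamed into place.
{
    rc=0
    my_cmd || rc=$?
    echo "$rc" > "$status_file.tmp"
    mv -f "$status_file.tmp" "$status_file"
} &

# The main script is free to do other work; here it just polls once a second
# until the status file is no longer empty.
while [ ! -s "$status_file" ]; do
    sleep 1
done
status=$(cat "$status_file")
rm -f "$status_file"
echo "background command exited with status $status"
```

Because the status travels through a file, the reader does not have to be the parent of the command, which is the main advantage over wait.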

My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done

how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) 2>&1 &
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[@]}"
do
( wait "$pid"; echo "process $pid completed with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[@]}"
do
wait "$pid" && ((++SUCCESS)) && echo "$pid OK" || echo "$pid returned $?"
done
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))

This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.
