Bash - get process pid and parse output line by line - linux

I am trying to write a bash script that runs a process and parses its output line by line (like here).
I would also like to get the process PID so for each line I can run ps and get CPU and memory usage (and print them with the output line).
I know I can get the PID with $! if I run the process in the background, but then I don't know how to read its output.
Thanks in advance

You can create a FIFO and read from it while the background process is running. For each line you read, you can do whatever you want with $child_pid.
First we need a small sample program:
bgp() { sleep 1; echo 1; sleep 1; echo 2; sleep 1; echo 3; }
Then create a fifo (maybe choose some path in /tmp)
tmp_fifo="/tmp/my_fifo"
rm "${tmp_fifo}" &>/dev/null
mkfifo "${tmp_fifo}"
Start your process in the background and redirect the output to the FIFO:
bgp > "${tmp_fifo}" &
child_pid=$!
Now read the output until the child process dies.
while true; do
    if jobs %% >&/dev/null; then
        if read -r -u 9 line; then
            # Do whatever with $child_pid
            echo -n "output from [$child_pid]: "
            echo "$line"
        fi
    else
        echo "info: child finished or died"
        break
    fi
done 9< "${tmp_fifo}"
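For the per-line CPU and memory part of the question, the "Do whatever" comment is where a ps call could go. A minimal sketch (assuming a Linux procps-style ps; the trailing = in %cpu=,%mem= suppresses the header):

        if read -r -u 9 line; then
            # query CPU and memory of the child for this particular line
            stats=$(ps -o %cpu=,%mem= -p "$child_pid")
            echo "[$child_pid] (cpu/mem: ${stats:-gone}) $line"
        fi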

Related

Why can the command 'exec' remove the blocking state of a fifo file?

I am studying how to process tasks with multiple concurrent jobs, and I noticed that a FIFO file can help with that. Here is the effect:
#!/bin/bash
my_cmd(){
    echo "process $1"
    sleep 3
}
ff="d:/myfifo/$$.fifo"
mkfifo $ff
exec 7<>$ff
for i in {1..10};do echo;done >&7
for i in {1..1000};do {
    read -u 7
    my_cmd $i
    echo >&7
}& done
rm $ff
wait
echo "end"
This shell script runs normally (it processes 1000 commands, 10 at a time). Then I modified the script slightly:
#!/bin/bash
my_cmd(){
    echo "process $1"
    sleep 3
}
ff="d:/myfifo/$$.fifo"
mkfifo $ff
exec 7<>$ff
for i in {1..10};do echo;done >$ff # modified
for i in {1..1000};do {
    read <$ff # modified
    my_cmd $i
    echo >$ff # modified
}& done
wait
rm $ff
echo "end"
As expected, the second script also runs normally. But I ran into an error when I modified it again:
#!/bin/bash
my_cmd(){
    echo "process $1"
    sleep 3
}
ff="d:/myfifo/$$.fifo"
mkfifo $ff
# exec 7<>$ff   # modified (commented out)
for i in {1..10};do echo;done >$ff
for i in {1..1000};do {
    read <$ff
    my_cmd $i
    echo >$ff
}& done
wait
rm $ff
echo "end"
The script now waits for input to the FIFO file, because the FIFO has entered a blocking state. It seems that the command 'exec 7<>$ff' lifted the blocking state of the FIFO. Is this the case?
On Linux, at least (not sure about other OSes, and POSIX doesn't define the behavior), opening a fifo for both reading and writing succeeds at once, without blocking until the other end of the pipe is opened.
So when you commented out the exec 7<>$ff line, the next line, for i in {1..10};do echo;done >$ff, opens the fifo for writing only, and blocks waiting for something else to open it for reading before going on. With the original version using the exec, the fifo was already opened for reading, so there was no need to block.
The Linux fifo(7) documentation does note
A process that uses both ends of the connection in order to communicate with itself should be very careful to avoid deadlocks.
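A quick way to observe the difference (just a sketch, using a hypothetical path in /tmp): the read-write open returns at once, while a write-only open blocks until a reader shows up.

mkfifo /tmp/demo.fifo
exec 7<>/tmp/demo.fifo            # read-write open: returns immediately, no other end needed
echo token >&7                    # doesn't block either; the data sits in the pipe buffer
read -u 7 t && echo "got: $t"
exec 7>&-                         # close fd 7
( echo token > /tmp/demo.fifo ) & # write-only open: blocks until a reader opens the fifo
sleep 1                           # the background writer is still stuck in open()
read t < /tmp/demo.fifo           # opening the fifo for reading releases the writer
echo "got: $t"
rm /tmp/demo.fifo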

Launch two processes simultaneously and collect results from the process finished earlier

Suppose I want to run two commands c1 and c2, which essentially process (but do not modify) the same piece of data on Linux.
Right now I would like to launch them simultaneously and see which one finishes quicker. Once one process has finished, I will collect its output (it could be dumped into a file with c1 >> log1.txt) and terminate the other process.
Note that the processing times of the two processes could be quite different and hence observable, say one takes ten seconds while the other takes 60 seconds.
Update:
I tried the following set of scripts, but it causes an infinite loop on my computer:
import os
os.system("./launch.sh")
launch.sh
#!/usr/bin/env bash
rm /tmp/smack-checker2
mkfifo /tmp/smack-checker2
setsid bash -c "./sleep60.sh ; echo 1 > /tmp/run-checker2" &
pid0=$!
setsid bash -c "./sleep10.sh ; echo 2 > /tmp/run-checker2" &
pid1=$!
read line </tmp/smack-checker2
printf "Process %d finished earlier\n" "$line"
rm /tmp/smack-checker2
eval kill -- -\$"pid$((line ^ 1))"
sleep60.sh
#!/usr/bin/env bash
sleep 60
sleep10.sh
#!/usr/bin/env bash
sleep 10
Use wait -n to wait for either process to exit. Ignoring race conditions and pid number wrapping,
c1 & P1=$!
c2 & P2=$!
wait -n   # wait for either one to exit
if ! kill $P1; then
    # failure to kill $P1 indicates c1 finished first
    kill $P2
    # collect c1 results...
else
    # c2 finished first
    kill $P1
    # collect c2 results...
fi
See help wait or man bash for documentation.
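If all you need is the exit status of whichever command finished first, note that wait -n itself returns that status; a small sketch:

c1 & P1=$!
c2 & P2=$!
wait -n                    # returns the exit status of the first job to finish
first_status=$?
echo "first finisher exited with status $first_status"
kill $P1 $P2 2>/dev/null   # stop whichever one is still running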
I would run the 2 processes and make them write to the shared named pipe after they finish. Reading from a named pipe is a blocking operation, so you don't need funny sleep instructions inside a loop. It would be:
#!/usr/bin/env bash
mkfifo /tmp/run-checker
(./sleep60.sh ; echo 0 > /tmp/run-checker) &
(./sleep10.sh ; echo 1 > /tmp/run-checker) &
read line </tmp/run-checker
printf "Process %d finished earlier\n" "$line"
rm /tmp/run-checker
kill -- -$$
sleep60.sh:
#!/usr/bin/env bash
sleep 60
sleep10.sh:
#!/usr/bin/env bash
sleep 10
EDIT:
If you're going to call the script from a Python script like this:
#!/usr/bin/env python3
import os
os.system("./parallel.sh")
print("Done")
you'll get:
Process 1 finished earlier
./parallel.sh: line 11: kill: (-13807) - No such process
Done
This is because kill -- -$$ tries to send the TERM signal to the process
group, as specified in man 1 kill:
-n
where n is larger than 1. All processes in process group n are
signaled. When an argument of the form '-n' is given, and it
is meant to denote a process group, either a signal must be
specified first, or the argument must be preceded by a '--'
option, otherwise it will be taken as the signal to send.
It works when you run parallel.sh from the terminal because $$ is the
PID of the subshell and also of the process group. I used it because
it's very convenient to kill parallel.sh, process0 or process1 and all
their children in one shot. However, when parallel.sh is called from a
Python script, $$ no longer denotes a process group and kill --
fails.
You could modify parallel.sh like that:
#!/usr/bin/env bash
mkfifo /tmp/run-checker
setsid bash -c "./sleep60.sh ; echo 0 > /tmp/run-checker" &
pid0=$!
setsid bash -c "./sleep10.sh ; echo 1 > /tmp/run-checker" &
pid1=$!
read line </tmp/run-checker
printf "Process %d finished earlier\n" "$line"
rm /tmp/run-checker
eval kill -- -\$"pid$((line ^ 1))"
It will now also work when called from a Python script. The last line
eval kill -- -\$"pid$((line ^ 1))"
kills pid1 if process 0 finished earlier, or pid0 if process 1 finished earlier,
using the binary ^ operator to convert 0 to 1 and vice versa. If you
don't like it you can use a slightly more verbose form:
if [ "$line" -eq "$pid0" ]
then
echo kill "$pid1"
kill -- -"$pid1"
else
echo kill "$pid0"
kill -- -"$pid0"
fi
Can this snippet give you some idea?
#!/bin/sh
runproc1() {
    sleep 5
    touch proc1 # file created when terminated
    exit
}
runproc2() {
    sleep 10
    touch proc2 # file created when terminated
    exit
}
# remove flags
rm -f proc1 proc2
# run processes concurrently
runproc1 &
runproc2 &
# wait until one of them is finished
while [ ! -f proc1 -a ! -f proc2 ]; do
    sleep 1
    echo -n "."
done
The idea is to enclose the two processes in two functions which, at the end, touch a file to signal that the computation is finished. The functions are executed in the background, after removing the files used as flags. The last step is to watch for either file to show up. At that point, anything can be done: continue to wait for the other process, or kill it.
Launching this precise script, it takes about 5 seconds, then terminates. I see that the file "proc1" is created, with no proc2. After a few seconds (5, to be precise), "proc2" is also created. This means that even when the script has terminated, any unfinished job keeps running.
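If you want the "kill it" option from the last paragraph, a hedged variation is to also record the PIDs when launching (using the runproc1/runproc2 functions above). Note that killing the subshell does not necessarily stop a sleep it is currently running; for a full cleanup you would need a process-group kill as in the other answers.

runproc1 & pid1=$!
runproc2 & pid2=$!
# wait for the first flag file to appear
while [ ! -f proc1 ] && [ ! -f proc2 ]; do
    sleep 1
done
# kill whichever process has not finished yet
if [ -f proc1 ]; then
    kill "$pid2" 2>/dev/null
else
    kill "$pid1" 2>/dev/null
fi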

How to send to stdin in bash 10 seconds after starting the process?

What I want to do is:
run a process
wait 10 seconds
send a string to the stdin of the process
This should be done in a bash script.
I've tried:
./script&
pid=$!
sleep 10
echo $string > /proc/${pid}/fd/0
It does work in an interactive shell, but not when I run it in a script.
( sleep 10; echo "how you doin?" ) | ./script
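If the process should keep receiving your own keyboard input after the delayed string, a variation on the same idea (just a sketch) lets cat forward the terminal afterwards:

( sleep 10; echo "how you doin?"; cat ) | ./script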
Your approach might work on Linux if, e.g., your script's stdin is something like a FIFO:
myscript(){ tr a-z A-Z; }
rm -f p
mkfifo p
exec 3<>p
myscript <&3 &
pid=$!
echo :waiting
sleep 0.5
echo :writing
file /proc/$pid/fd/0
echo hi > /proc/$pid/fd/0
exec 3>&-
But this /proc/$pid/fd stuff behaves differently on different Unices.
It doesn't work in your case because your script's stdin is a terminal.
With default terminal settings, the terminal driver will put background processes that try to read from it to sleep (by sending them the SIGTTIN signal), and writes to a terminal file descriptor will just get echoed -- they won't wake up the background process that was put to sleep trying to read from the terminal.
What about this (as OP requested it to be done in the script):
#! /bin/bash
./script&
pid=$!
sleep 10
echo $string > /proc/${pid}/fd/0
Just proposing the missing element, not commenting on coding style ;-)

Using named pipes with bash - Problem with data loss

I did some searching online and found simple 'tutorials' on using named pipes. However, when I do anything with background jobs I seem to lose a lot of data.
[[Edit: found a much simpler solution, see reply to post. So the question I put forward is now academic - in case one might want a job server]]
Using Ubuntu 10.04 with Linux 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:52:42 UTC 2010 x86_64 GNU/Linux
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu).
My bash function is:
function jqs
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT SIGKILL
    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi
    while true
    do
        if read txt <"$pipe"
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            fi
        fi
    done
}
I run this in the background:
> jqs&
[1] 5336
And now I feed it:
for i in 1 2 3 4 5 6 7 8
do
    (echo aaa$i > /tmp/__job_control_manager__ && echo success$i &)
done
The output is inconsistent.
I frequently don't get all success echoes.
I get at most as many 'new text' echoes as 'success' echoes, sometimes fewer.
If I remove the '&' from the 'feed', it seems to work, but I am blocked until the output is read. Hence my wanting to let sub-processes get blocked, but not the main process.
The aim being to write a simple job control script so I can run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run.
Full job manager below:
function jq_manage
{
    export __gn__="$1"
    pipe=/tmp/__job_control_manager_"$__gn__"__
    trap "rm -f $pipe" EXIT
    trap "break" SIGKILL
    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi
    while true
    do
        date
        jobs
        if (($(jobs | egrep "Running.*echo '%#_Group_#%_$__gn__'" | wc -l) < $__jN__))
        then
            echo "Waiting for new job"
            if read new_job <"$pipe"
            then
                echo "new job is [[$new_job]]"
                if [[ "$new_job" == 'quit' ]]
                then
                    break
                fi
                echo "In group $__gn__, starting job $new_job"
                eval "(echo '%#_Group_#%_$__gn__' > /dev/null; $new_job) &"
            fi
        else
            sleep 3
        fi
    done
}
function jq
{
    # __gn__ = first parameter to this function, the job group name (the pool within which to allocate __jN__ jobs)
    # __jN__ = second parameter to this function, the maximum number of jobs to run concurrently
    export __gn__="$1"
    shift
    export __jN__="$1"
    shift
    export __jq__=$(jobs | egrep "Running.*echo '%#_GroupQueue_#%_$__gn__'" | wc -l)
    if (($__jq__ '<' 1))
    then
        eval "(echo '%#_GroupQueue_#%_$__gn__' > /dev/null; jq_manage $__gn__) &"
    fi
    pipe=/tmp/__job_control_manager_"$__gn__"__
    echo $# >$pipe
}
Calling
jq <name> <max processes> <command>
jq abc 2 sleep 20
will start one process.
That part works fine. Start a second one, fine.
One by one by hand they seem to work fine.
But starting 10 in a loop seems to lose the system, as in the simpler example above.
Any hints as to what I can do to solve this apparent loss of IPC data would be greatly appreciated.
Regards,
Alain.
Your problem is the if statement below:
while true
do
    if read txt <"$pipe"
    ....
done
What is happening is that your job queue server is opening and closing the pipe each time around the loop. This means that some of the clients are getting a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.
To fix this, change the loop in the server so that it opens the pipe once for the entire loop:
while true
do
    if read txt
    ....
done < "$pipe"
Done this way, the pipe is opened once and kept open.
You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.
Edit: With the problem now being that you are getting EOF on your reads when the last client closes the pipe, you can use jilles' method of duping the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:
while true
do
    if read txt
    ....
done < "$pipe" 3> "$pipe"
This will hold the write side of the pipe open on fd 3. The same caveat applies with this file descriptor as with stdin. You will need to close it so any child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.
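A small sketch of that caveat (some_job is a hypothetical worker): start jobs from inside the loop with their own stdin and with fd 3 closed, so they neither consume the queue data nor hold the pipe's write side open.

while read -r job
do
    # hypothetical worker: detach its stdin from the fifo and close fd 3
    some_job "$job" </dev/null 3>&- &
done < "$pipe" 3> "$pipe"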
As said in other answers, you need to keep the fifo open at all times to avoid losing data.
However, once all writers have gone away after the fifo has been opened (so there was at least one writer), reads return immediately (and poll() returns POLLHUP). The only way to clear this state is to reopen the fifo.
POSIX does not provide a solution to this but at least Linux and FreeBSD do: if reads start failing, open the fifo again while keeping the original descriptor open. This works because in Linux and FreeBSD the "hangup" state is local to a particular open file description, while in POSIX it is global to the fifo.
This can be done in a shell script like this:
while :; do
    exec 3<tmp/testfifo
    exec 4<&-
    while read x; do
        echo "input: $x"
    done <&3
    exec 4<&3
    exec 3<&-
done
Just for those that might be interested, [[re-edited]] following comments by camh and jilles, here are two new versions of the test server script.
Both versions now work exactly as hoped.
camh's version for pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM
    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi
    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        fi
    done 3< "$pipe" 4> "$pipe" # 4 is just to keep the pipe opened so any real client does not end up causing read to return EOF
}
jilles' version for pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM
    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi
    exec 3< "$pipe"
    exec 4<&-
    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        else
            # Close the pipe and reconnect it so that the next read does not end up returning EOF
            exec 4<&3
            exec 3<&-
            exec 3< "$pipe"
            exec 4<&-
        fi
    done
}
Thanks to all for your help.
Like camh & Dennis Williamson say don't break the pipe.
Now I have smaller examples, direct on the command line:
Server:
(
    for i in {0,1,2,3,4}{0,1,2,3,4,5,6,7,8,9};
    do
        if read s;
        then echo ">>$i--$s//";
        else
            echo "<<$i";
        fi;
    done < tst-fifo
)&
Client:
(
    for i in {%a,#b}{1,2}{0,1};
    do
        echo "Test-$i" > tst-fifo;
    done
)&
One can replace the key line with:
(echo "Test-$i" > tst-fifo&);
All client data sent to the pipe gets read, though with option two of the client one may need to start the server a couple of times before all data is read.
But although the read waits for data in the pipe to start with, once data has been pushed, it reads the empty string forever.
Any way to stop this?
Thanks for any insights again.
On the one hand the problem is worse than I thought:
Now there seems to be a case in my more complex example (jq_manage) where the same data is being read over and over again from the pipe (even though no new data is being written to it).
On the other hand, I found a simple solution (edited following Dennis' comment):
function jqn # compute the number of jobs running in that group
{
    __jqty__=$(jobs | egrep "Running.*echo '%#_Group_#%_$__groupn__'" | wc -l)
}
function jq
{
    __groupn__="$1"; shift # job group name (the pool within which to allocate $__jmax__ jobs)
    __jmax__="$1"; shift # maximum number of jobs to run concurrently
    jqn
    while (($__jqty__ '>=' $__jmax__))
    do
        sleep 1
        jqn
    done
    eval "(echo '%#_Group_#%_$__groupn__' > /dev/null; $#) &"
}
Works like a charm.
No socket or pipe involved.
Simple.
run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run
You can do this with GNU Parallel. You will not need to write this scripting yourself.
http://www.gnu.org/software/parallel/man.html#options
You can set max-procs ("Number of jobslots. Run up to N jobs in parallel."). There is an option to set the number of CPU cores you want to use. You can save the list of executed jobs to a log file, but that is a beta feature.
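For example (a sketch; the command file and log path are hypothetical), you could feed Parallel a file of commands with 10 job slots and a job log:

# run at most 10 of the commands listed in commands.txt at a time,
# recording each job's runtime and exit status in the job log
parallel -j 10 --joblog /tmp/parallel.log < commands.txt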

Get exit code of a background process

I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
1. Run the command CMD in parallel as a background process (CMD &).
2. In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
3. Exit the loop when the spawned command terminates.
4. Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
    echo $my_pid is still in the ps output. Must still be running.
    sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
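As a side note on the "more robust way" mentioned above: a hedged alternative to ps | grep is kill -0, which sends no signal and only tests whether the PID is still alive and signalable (like any PID-based check, it is still subject to PID reuse):

while kill -0 "$my_pid" 2>/dev/null; do
    echo "$my_pid is still running"
    sleep 3
done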
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
    ( longprocess ) &
    # store PID of process
    pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
    if wait $p; then
        echo "Process $p success"
    else
        echo "Process $p fail"
    fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Using the wait command you can wait for all child processes to finish; meanwhile you can get the exit status of each child process via $? and store the statuses into STATUS[]. Then you can do something depending on the status.
I have tried the following 2 solutions and they run well. solution01 is more concise, while solution02 is a little more complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
    ./${app} &
    PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[@]}; do
    echo "pid=${pid}"
    wait ${pid}
    STATUS+=($?)
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
    if [[ ${st} -ne 0 ]]; then
        echo "$i failed"
    else
        echo "$i finish"
    fi
    ((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
    ./${app} &
    pid=$!
    PIDS[$i]=${pid}
    ((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[@]}; do
    echo "pid=${pid}"
    wait ${pid}
    STATUS[$i]=$?
    ((i+=1))
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
    if [[ ${st} -ne 0 ]]; then
        echo "$i failed"
    else
        echo "$i finish"
    fi
    ((i+=1))
done
As I see it, almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more Unix-ish solution: catching the SIGCHLD signal. In the signal handler you check which child process has stopped. This can be done with the kill -0 <PID> built-in (universal), by checking the existence of the /proc/<PID> directory (Linux specific), or by using the jobs built-in (bash specific; jobs -l also reports the pid, and in that case the 3rd field of the output can be Stopped|Running|Done|Exit).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x it exits with exit code 1. For a number it waits num*5 seconds, printing its PID every 5 seconds.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
    local tmp=()
    for((i=0;i<${#pids[@]};++i)); do
        if [ ! -d /proc/${pids[i]} ]; then
            wait ${pids[i]}
            echo "Stopped ${pids[i]}; exit code: $?"
        else tmp+=(${pids[i]})
        fi
    done
    pids=(${tmp[@]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#!/bin/bash
# pgm to monitor
tail -f /var/log/messages >> /tmp/log &
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
    ps ax | grep $pid | grep -v grep
    ret=$?
    if test "$ret" != "0"
    then
        echo "Monitored pid ended"
        break
    fi
    sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
    sleep 1
    if ((++elapsed % 600 == 0)); then
        echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
    fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than the ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be:
#!/bin/bash
....
doSomething &
local pid=$!   # (inside a function, hence 'local')
while [ -d /proc/$pid ]; do   # while the directory exists, the process is running
    doSomethingElse
    ....
done
# when the directory is removed from /proc, the process has ended
wait $pid
local exit_status=$?
....
Now you can use the $exit_status variable however you like.
With this method, your script doesn't have to wait for the background process; you only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
Now your script can do anything else while you just keep monitoring the contents of retFile (it can also contain any other information you want, like the exit time).
P.S.: BTW, I wrote this with bash in mind.
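One possible monitoring loop for this approach (just a sketch): poll retFile until the background function has written the exit code, then read it back.

while [ ! -s "$retFile" ]; do
    # do other useful work here while the background command runs ...
    sleep 1
done
read -r exitCode < "$retFile"
echo "background command exited with code $exitCode"
rm -f "$retFile"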
My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status, so there is nothing to clean up. If you are uncertain about the number of background jobs, the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
    echo "stat: ${STAT}; code: ${CODE}"
    if [ "${STAT}" = "sleep/exit" ] ; then
        break
    fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
    ( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) & 2>&1
    PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[@]}"
do
    ( wait "$pid"; echo "process $pid completed with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
    sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[@]}"
do
    wait "$pid" && ((++SUCCESS)) && echo "$pid OK" || echo "$pid returned $?"
done
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))
This may be extending beyond your question; however, if you're concerned about how long the processes have been running, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, but I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
    echo -n "interval-$timeout: "
    sleep $((timeout-lasttimeout))
    # you can only wait on a pid once
    remainingpids=()
    for pid in ${pids[*]}; do
        if ! ps -p $pid >/dev/null ; then
            wait $pid
            echo -n "pid-$pid:exited($?); "
        else
            echo -n "pid-$pid:running; "
            remainingpids+=("$pid")
        fi
    done
    pids=( ${remainingpids[*]} )
    lasttimeout=$timeout
    echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.
