Why does wait generate “<pid> is not a child of this shell” error if a pipe is used afterwards? - linux

In the following I create a background process and wait for it to complete.
$ bash -c "sleep 5 | false" & wait $!
[1] 46950
[1]+ Exit 1 bash -c "sleep 5 | false"
$ echo $?
1
This works and the prompt returns after 5 seconds.
However, wait returns an error if I use one more pipe after it.
$ bash -c "sleep 5 | false" & wait $! | true
[1] 49493
-bash: wait: pid 49493 is not a child of this shell
hbaba@mbp-005063:~/misc$ echo $?
0
hbaba@mbp-005063:~/misc$ ps -T -f
UID PID PPID C STIME TTY TIME CMD
980771313 49493 69771 0 12:56AM ttys056 0:00.00 bash -c sleep 5 | false
980771313 49498 49493 0 12:56AM ttys056 0:00.00 sleep 5
0 49555 69771 0 12:56AM ttys056 0:00.01 ps -T -f
What is happening here?
I am using bash version GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin15)
I can reproduce the wait error every time.
I think it has something to do with each pipe being a separate subshell. https://unix.stackexchange.com/a/127346/212862
Maybe the wait $! command looks for the child process in the wrong shell.
The error message mentions the 49493 pid. That is indeed the right pid for the bash -c … Command. The ps -T shows that.
There are relevant questions q1 and q2. But in them there is no pipe usage after the wait built-in.
Update
I had a misunderstanding about operator precedence in bash between & and |. @randomir pointed that out in his answer. Adding curly braces makes wait wait on the previously backgrounded process. For example:
{ bash -c "sleep 5 | false" & wait $! ; } | true
This does not return the same wait error.

There are two key points to observe here:
wait (a shell built-in) can wait only for the shell's own children
each command in a pipeline runs in a separate subshell
So, when you say:
cmd & wait $!
then cmd is run in the background by your current shell, and wait (being a built-in of that same shell) can wait on cmd's PID, since cmd is a child of the shell process in which wait runs.
On the other hand, when you say:
cmd & wait $! | cmd2
then cmd is still run by your current shell, but the pipe induces a new subshell for wait (a new bash process) of which cmd is not a child, and wait cannot wait on its sibling (a child of its parent).
As an additional clarification of shell's grammar - the & operator (along with ;, && and ||) separates pipelines, forming lists. So, a list is a sequence of pipelines, and a pipeline is a sequence of commands separated by |.
That means the last example above is equivalent to:
cmd & { wait $! | cmd2; }
and not to this:
{ cmd & wait $! ; } | cmd2
which is what you expected.
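The difference between the two groupings can be checked with a small script. This is a minimal sketch, assuming bash; the variable names (bg_pid, piped_status, parent_status, grouped_status) are made up for illustration, and the trailing "| cat" only serves to force a pipeline.

```shell
#!/usr/bin/env bash
# Start a short background job in the current shell.
sleep 1 &
bg_pid=$!

# wait inside a pipeline element runs in a subshell; the job is not that subshell's child.
piped_status=$( { wait "$bg_pid" 2>/dev/null; echo "$?"; } | cat )

# The shell that started the job can still wait on it.
wait "$bg_pid"
parent_status=$?

# Grouping the background command together with wait puts both in the same subshell.
grouped_status=$( { sh -c 'exit 3' & wait "$!"; echo "$?"; } | cat )

echo "piped=$piped_status parent=$parent_status grouped=$grouped_status"
```

In the first case wait reports a failure status because the PID does not belong to a child of the subshell; in the grouped case it returns the exit status of the backgrounded command.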

Bash manual:
Each command in a pipeline is executed as a separate process (i.e., in
a subshell).
You can check this with:
$ echo "me1: $BASHPID"
me1: 34438
$ sleep 100 &
[1] 34989
$ echo "me2: $BASHPID, child: $!"
me2: 34438, child: 34989
$ true | echo "me3: $BASHPID, child: $!"
me3: 34991, child: 34989
The first and second echo (me1 and me2) are executed in the context of the top shell (34438). The sleep child process (34989) is one of its children. The third echo (me3) is executed in the context of a subshell (34991). The sleep process is not one of its children any more.

Related

Bash run a group of two children in the background and kill them later

Let's group two commands (cd and bash ..) together like this:
#!/bin/bash
C="directory"
SH="bash process.sh"
(cd ${C}; ${SH})&
PID=$!
sleep 1
KILL=`kill ${PID}`
process.sh prints out the date (each second and five times):
C=0
while true
do
date
sleep 1
if [ ${C} -eq 4 ]; then
break
fi
C=$((C+1))
done
Now I actually would expect the background subprocess to be killed right after 1 second, but it just continues like nothing happens. INB4: "Why don't you just bash directory/process.sh" No, this cd is just an example.
What am I doing wrong?
Use exec when you want a process to replace itself in-place, rather than creating a new subprocess with its own PID.
That is to say, this code can create two subprocesses, storing the PID of the first one in $! but then using the second one to execute process.sh:
# store the subshell that runs cd in $!; not necessarily the shell that runs process.sh
# ...as the shell that runs cd is allowed to fork off a child and run process.sh there.
(cd "$dir" && bash process.sh) & pid=$!
...whereas this code creates only one subprocess, because it uses exec to make the first process replace itself with the second:
# explicitly replace the shell that runs cd with the one that runs process.sh
# so $! is guaranteed to have the right thing
(cd "$dir" && exec bash process.sh) &
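The effect of exec on $! can be observed directly. The following is a rough sketch, assuming bash and a POSIX sh; the directory, the temporary file and the variable names are placeholders chosen for this example, not part of the original script.

```shell
#!/usr/bin/env bash
dir=/                      # hypothetical working directory
pid_file=$(mktemp)         # scratch file for the PID the child reports

# With exec, the subshell that ran cd becomes the sh process, so sh's own $$ equals our $!.
( cd "$dir" && exec sh -c 'echo "$$"' ) >"$pid_file" &
first_pid=$!
wait "$first_pid"
reported_pid=$(cat "$pid_file")
rm -f "$pid_file"
echo "\$! was $first_pid, the exec'd process reported $reported_pid"

# Killing $! therefore reaches the long-running command itself.
( cd "$dir" && exec sleep 30 ) &
second_pid=$!
kill "$second_pid"
wait "$second_pid" 2>/dev/null
kill_status=$?             # 128 + signal number when the child died from a signal
echo "wait returned $kill_status"
```

Because the PID stored in $! is the process that actually does the work, a plain kill is enough and no process-group handling is needed.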
You can list all child processes of the script with "ps --ppid $$", so:
#!/bin/bash
C="directory"
SH="bash process.sh"
(cd ${C}; ${SH})&
PID=$!
sleep 1
ps -o pid= --ppid $$|xargs kill

Launch two processes simultaneously and collect results from the process finished earlier

Suppose I want to run two commands c1 and c2, which essentially process (but not modify) the same piece of data on Linux.
I would like to launch them simultaneously and see which one finishes quicker. Once one process has finished, I will collect its output (which could be dumped into a file with c1 >> log1.txt) and terminate the other process.
Note that the processing time of two process could be largely different and hence observable, say one takes ten seconds, while the other takes 60 seconds.
=======================update
I tried the following script set but it causes infinite loop on my computer:
import os
os.system("./launch.sh")
launch.sh
#!/usr/bin/env bash
rm /tmp/smack-checker2
mkfifo /tmp/smack-checker2
setsid bash -c "./sleep60.sh ; echo 1 > /tmp/run-checker2" &
pid0=$!
setsid bash -c "./sleep10.sh ; echo 2 > /tmp/run-checker2" &
pid1=$!
read line </tmp/smack-checker2
printf "Process %d finished earlier\n" "$line"
rm /tmp/smack-checker2
eval kill -- -\$"pid$((line ^ 1))"
sleep60.sh
#!/usr/bin/env bash
sleep 60
sleep10.sh
#!/usr/bin/env bash
sleep 10
Use wait -n to wait for either process to exit. Ignoring race conditions and pid number wrapping,
c1 & P1=$!
c2 & P2=$!
wait -n # wait for either one to exit
if ! kill $P1; then
# failure to kill $P1 indicates c1 finished first
kill $P2
# collect c1 results...
else
# c2 finished first
kill $P1
# collect c2 results...
fi
See help wait or man bash for documentation.
I would run 2 processes and make them write to the shared named pipe
after they finish. Reading from a named pipe is a blocking operation
so you don't need funny sleep instructions inside a loop. It would
be:
#!/usr/bin/env bash
mkfifo /tmp/run-checker
(./sleep60.sh ; echo 0 > /tmp/run-checker) &
(./sleep10.sh ; echo 1 > /tmp/run-checker) &
read line </tmp/run-checker
printf "Process %d finished earlier\n" "$line"
rm /tmp/run-checker
kill -- -$$
sleep60.sh:
#!/usr/bin/env bash
sleep 60
sleep10.sh:
#!/usr/bin/env bash
sleep 10
EDIT:
If you're going to call the script from a Python script like this:
#!/usr/bin/env python3
import os
os.system("./parallel.sh")
print("Done")
you'll get:
Process 1 finished earlier
./parallel.sh: line 11: kill: (-13807) - No such process
Done
This is because kill -- -$$ tries to send TERM signal to the process
group as specified in man 1 kill:
-n
where n is larger than 1. All processes in process group n are
signaled. When an argument of the form '-n' is given, and it
is meant to denote a process group, either a signal must be
specified first, or the argument must be preceded by a '--'
option, otherwise it will be taken as the signal to send.
It works when you run parallel.sh from the terminal because $$ is a
PID of the subshell and also of the process group. I used it because
it's very convenient to kill parallel.sh, process0 or process1 and all
their children in one shot. However, when parallel.sh is called from a
Python script, $$ no longer denotes a process group and kill -- -$$
fails.
You could modify parallel.sh like that:
#!/usr/bin/env bash
mkfifo /tmp/run-checker
setsid bash -c "./sleep60.sh ; echo 0 > /tmp/run-checker" &
pid0=$!
setsid bash -c "./sleep10.sh ; echo 1 > /tmp/run-checker" &
pid1=$!
read line </tmp/run-checker
printf "Process %d finished earlier\n" "$line"
rm /tmp/run-checker
eval kill -- -\$"pid$((line ^ 1))"
It will now work also when called from Python script. The last line
eval kill -- -\$"pid$((line ^ 1))"
kills pid0 if pid1 finished earlier, or pid1 if pid0 finished earlier,
using the ^ (bitwise XOR) operator to convert 0 to 1 and vice versa. If you
don't like it you can use a slightly more verbose form:
if [ "$line" -eq 0 ]
then
echo kill "$pid1"
kill -- -"$pid1"
else
echo kill "$pid0"
kill -- -"$pid0"
fi
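If the eval is the part you dislike, the same XOR trick also works with bash's indirect expansion. This is only a sketch: the PIDs below are made-up values and the kill command is printed rather than executed.

```shell
#!/usr/bin/env bash
pid0=111   # hypothetical PIDs, for illustration only
pid1=222
line=1     # value read from the FIFO: index of the process that finished first

other=$((line ^ 1))        # XOR with 1 flips 0 <-> 1
loser_var="pid$other"      # name of the variable holding the other PID
loser_pid=${!loser_var}    # bash indirect expansion, no eval needed
echo "would run: kill -- -$loser_pid"
```

The ${!name} form is specific to bash, so this variant is not suitable for a plain /bin/sh script.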
Can this snippet give you some idea?
#!/bin/sh
runproc1() {
sleep 5
touch proc1 # file created when terminated
exit
}
runproc2() {
sleep 10
touch proc2 # file created when terminated
exit
}
# remove flags
rm proc1
rm proc2
# run processes concurrently
runproc1 &
runproc2 &
# wait until one of them is finished
while [ ! -f proc1 -a ! -f proc2 ]; do
sleep 1
echo -n "."
done
The idea is to enclose two processes into two functions which, at the end, touch a file to signal that computing is terminated. The functions are executed in background, after having removed the files used as flags. The last step is to watch for either file to show up. At that point, anything can be done: continue to wait for the other process, or kill it.
Launching this exact script, it takes about 5 seconds and then terminates. I see that the file "proc1" is created, with no proc2. After a few more seconds (5, to be precise), "proc2" is also created. This means that even after the script has terminated, any unfinished job keeps running.

How to kill a process by reading from pid file using bash script in Jenkins?

Inside Jenkins, I have to run 2 separate scripts: start.sh and stop.sh. These scripts are inside my application, which is fetched from an SCM. They are in the same directory.
The start.sh script runs a process in the background using nohup, and writes the process ID to save_pid.pid. This script works fine. It successfully starts my application.
Then, inside stop.sh, I am trying to read the process ID from save_pid.pid to kill the process. But I am unable to kill the process, and the application keeps running until I kill it manually using: sudo kill {processId}.
Here are the approaches that I have tried so far inside stop.sh but none of these work:
kill $(cat /path/to/save_pid.pid)
kill `cat /path/to/save_pid.pid`
kill -9 $(cat /path/to/save_pid.pid)
kill -9 `cat /path/to/save_pid.pid`
pkill -F /path/to/save_pid.pid
I have also tried all of these steps with sudo as well. But, it just doesn't work. I have kept an echo statement inside stop.sh, which prints and then there is nothing.
What am I doing wrong here ?
UPDATE:
The nohup command that I am using inside start.sh is something like this:
nohup deploy_script > $WORKSPACE/app.log 2>&1 & echo $! > $WORKSPACE/save_pid.pid
Please Note:
In my case, the value written to save_pid.pid is, surprisingly,
always 1 less than the actual process ID!
I think the reason why this happens is because you are not getting the PID of the process that you are interested in, but the PID of the shell executing your command.
Look:
$ echo "/bin/sleep 10" > /tmp/foo
$ chmod +x /tmp/foo
$ nohup /tmp/foo & echo $!
[1] 26787
26787
nohup: ignoring input and appending output to 'nohup.out'
$ pgrep sleep
26789
So 'nohup' will exec the 'shell', the 'shell' will fork a second 'shell' to exec 'sleep' in, however I can only count two processes here, so I am unable to account for one created PID.
Note that, if you put the nohup and the pgrep on one line, then pgrep will apparently be started faster than the shell that 'exec's 'sleep' and thus pgrep will yield nothing, which somewhat confirms my theory:
$ nohup /tmp/foo & echo $! ; pgrep sleep
[2] 26899
nohup: ignoring input and appending output to 'nohup.out'
$
If you launch your process directly, then nohup will "exec" your process and thus keep the same PID for the process as nohup itself had (see http://sources.debian.net/src/coreutils/8.23-4/src/nohup.c/#L225):
$ nohup /bin/sleep 10 & echo "$!"; pgrep sleep
[1] 27130
27130
nohup: ignoring input and appending output to 'nohup.out'
27130
Also, if you 'exec' 'sleep' inside the script, then there's only one process that's created (as expected):
$ echo "exec /bin/sleep 10" > /tmp/foo
$ nohup /tmp/foo & echo "$!"; pgrep sleep
[1] 27309
27309
nohup: ignoring input and appending output to 'nohup.out'
27309
Thus, according to my theory, if you'd 'exec' your process inside the script, then you'd be getting the correct PID.
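A start/stop pair built on that theory might look like the sketch below. It assumes bash plus the usual nohup and sleep utilities; the temporary workspace directory and the one-line deploy_script are stand-ins invented for this example. The script is run through sh here, so nohup execs sh and sh execs sleep, and the PID stays the same throughout.

```shell
#!/usr/bin/env bash
workspace=$(mktemp -d)    # stand-in for $WORKSPACE

# Hypothetical deploy script: exec replaces the shell with the real process.
cat >"$workspace/deploy_script" <<'EOF'
exec sleep 30
EOF

# start.sh part: the PID written to the file stays valid after both execs.
nohup sh "$workspace/deploy_script" >"$workspace/app.log" 2>&1 &
echo "$!" >"$workspace/save_pid.pid"

# stop.sh part: read the PID back and terminate the process.
saved_pid=$(cat "$workspace/save_pid.pid")
kill "$saved_pid"
wait "$saved_pid" 2>/dev/null   # only possible here because the same shell started the job
stop_status=$?

if kill -0 "$saved_pid" 2>/dev/null; then still_alive=yes; else still_alive=no; fi
echo "stop status: $stop_status, still alive: $still_alive"
rm -rf "$workspace"
```

In a real stop.sh, which runs as a separate process, the wait line would have to go, since the application is not a child of that script.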

bash: silently kill background function process

shell gurus,
I have a bash shell script, in which I launch a background function, say foo(), to display a progress bar for a boring and long command:
foo()
{
while [ 1 ]
do
#massively cool progress bar display code
sleep 1
done
}
foo &
foo_pid=$!
boring_and_long_command
kill $foo_pid >/dev/null 2>&1
sleep 10
now, when foo dies, I see the following text:
/home/user/script: line XXX: 30290 Killed foo
This totally destroys the awesomeness of my, otherwise massively cool, progress bar display.
How do I get rid of this message?
kill $foo_pid
wait $foo_pid 2>/dev/null
BTW, I don't know about your massively cool progress bar, but have you seen Pipe Viewer (pv)? http://www.ivarch.com/programs/pv.shtml
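Put together as a complete script, the kill-then-wait pattern above could look like this. It is a minimal sketch, assuming bash; the sleep calls are placeholders for the progress bar code and for boring_and_long_command.

```shell
#!/usr/bin/env bash
foo() {
    while true; do
        sleep 1   # placeholder for the progress bar code
    done
}

foo &
foo_pid=$!

sleep 1                        # placeholder for boring_and_long_command
kill "$foo_pid"
wait "$foo_pid" 2>/dev/null    # reap the job here; its termination notice goes to /dev/null
foo_status=$?
echo "foo ended with status $foo_status"
```

The status returned by wait is 128 plus the signal number, which also lets the script confirm that foo really was terminated by the kill.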
Just came across this myself, and realised "disown" is what we are looking for.
foo &
foo_pid=$!
disown
boring_and_long_command
kill $foo_pid
sleep 10
The death message is being printed because the process is still in the shell's list of watched "jobs". The disown command will remove the most recently spawned process from this list so that no debug message will be generated when it is killed, even with SIGKILL (-9).
Try to replace your line kill $foo_pid >/dev/null 2>&1 with the line:
(kill $foo_pid 2>&1) >/dev/null
Update:
This answer is not correct for the reason explained by @mklement0 in his comment:
The reason this answer isn't effective with background jobs is that
Bash itself asynchronously, after the kill command has completed,
outputs a status message about the killed job, which you cannot
suppress directly - unless you use wait, as in the accepted answer.
This "hack" seems to work:
# Some trickery to hide killed message
exec 3>&2 # 3 is now a copy of 2
exec 2> /dev/null # 2 now points to /dev/null
kill $foo_pid >/dev/null 2>&1
sleep 1 # sleep to wait for process to die
exec 2>&3 # restore stderr to saved
exec 3>&- # close saved version
and it was inspired from here. World order has been restored.
This is a solution I came up with for a similar problem (wanted to display a timestamp during long running processes). This implements a killsub function that allows you to kill any subshell quietly as long as you know the pid. Note, that the trap instructions are important to include: in case the script is interrupted, the subshell will not continue to run.
foo()
{
while [ 1 ]
do
#massively cool progress bar display code
sleep 1
done
}
#Kills the sub process quietly
function killsub()
{
kill -9 ${1} 2>/dev/null
wait ${1} 2>/dev/null
}
foo &
foo_pid=$!
#Add a trap in case of unexpected interruptions
trap 'killsub ${foo_pid}; exit' INT TERM EXIT
boring_and_long_command
#Kill foo after finished
killsub ${foo_pid}
#Reset trap
trap - INT TERM EXIT
Add at the start of the function:
trap 'exit 0' TERM
You can use set +m before to suppress that. More information on that here
Another way to do it:
func_terminate_service(){
[[ "$(pidof ${1})" ]] && killall ${1}
sleep 2
[[ "$(pidof ${1})" ]] && kill -9 "$(pidof ${1})"
}
call it with
func_terminate_service "firefox"
Yet another way to disable job notifications is to put your command to be backgrounded in a sh -c 'cmd &' construct.
#!/bin/bash
foo()
{
while [ 1 ]
do
sleep 1
done
}
#foo &
#foo_pid=$!
export -f foo
foo_pid=`sh -c 'foo & echo ${!}' | head -1`
# if shell does not support exporting functions (export -f foo)
#arg1='foo() { while [ 1 ]; do sleep 1; done; }'
#foo_pid=`sh -c 'eval "$1"; foo & echo ${!}' _ "$arg1" | head -1`
sleep 3
echo kill ${foo_pid}
kill ${foo_pid}
sleep 3
exit
The message most likely comes from the shell's default handling of the signal, which reports the killed job in the script's output. I have seen similar messages only on bash 3.x and 4.x. To always kill the child process quietly everywhere (tested on bash 3/4/5, dash, ash, zsh), we can trap the TERM signal at the very start of the child process:
#!/bin/sh
## assume script name is test.sh
foo() {
trap 'exit 0' TERM ## here is the key
while true; do sleep 1; done
}
echo before child
ps aux | grep 'test\.s[h]\|slee[p]'
foo &
foo_pid=$!
sleep 1 # wait trap is done
echo before kill
ps aux | grep 'test\.s[h]\|slee[p]'
kill $foo_pid
sleep 1 # wait kill is done
echo after kill
ps aux | grep 'test\.s[h]\|slee[p]'

Get exit code of a background process

I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
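One way to avoid parsing ps output is to poll with kill -0, which sends no signal and only tests whether the PID can be signalled. The following variant of the skeleton is a sketch under that assumption; it is written for bash, the polls counter is added purely for illustration, and it shares the usual caveat that a PID could in principle be reused by an unrelated process.

```shell
#!/usr/bin/env bash
# simulate a long process that will have an identifiable exit code
( sleep 2; exit 7 ) &
my_pid=$!

polls=0
# kill -0 succeeds as long as the process still exists
while kill -0 "$my_pid" 2>/dev/null; do
    polls=$((polls + 1))
    echo "$my_pid is still running (poll $polls)"
    sleep 1
done

# the shell remembers the status of its background child, so wait returns it
wait "$my_pid"
my_status=$?
echo "The exit status of the process was $my_status"
```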
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Using the wait command, you can wait for all child processes to finish; meanwhile, you can get the exit status of each child process via $? and store it in STATUS[]. Then you can act depending on the status.
I have tried the following 2 solutions and they run well. solution01 is
more concise, while solution02 is a little complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
./${app} &
PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[@]}; do
echo "pid=${pid}"
wait ${pid}
STATUS+=($?)
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
./${app} &
pid=$!
PIDS[$i]=${pid}
((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[@]}; do
echo "pid=${pid}"
wait ${pid}
STATUS[$i]=$?
((i+=1))
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
if [[ ${st} -ne 0 ]]; then
echo "$i failed"
else
echo "$i finish"
fi
((i+=1))
done
As far as I can see, almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more Unix-like solution: catching the SIGCHLD signal. The signal handler has to check which child process stopped. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux-specific), or with the jobs built-in (bash-specific; jobs -l also reports the PID, and in this case the 3rd field of the output can be Stopped|Running|Done|Exit).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x it exits with exit code 1. For a number it waits num*5 seconds. Every 5 seconds it prints its PID.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
local tmp=()
for((i=0;i<${#pids[@]};++i)); do
if [ ! -d /proc/${pids[i]} ]; then
wait ${pids[i]}
echo "Stopped ${pids[i]}; exit code: $?"
else tmp+=(${pids[i]})
fi
done
pids=(${tmp[@]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#!/bin/bash
#pgm to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than ps/grep combo); when you start a process it has a corresponding folder in /proc/$pid, so the solution could be
#!/bin/bash
....
doSomething &
pid=$!
while [ -d /proc/$pid ]; do # While directory exists, the process is running
doSomethingElse
....
done
# when directory is removed from /proc, process has ended
wait $pid
exit_status=$?
....
Now you can use the $exit_status variable however you like.
With this method, your script doesn't have to wait for the background process; you only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
Now your script can do anything else while you just keep monitoring the contents of retFile (it can also contain any other information you want, such as the exit time).
PS.: btw, I coded thinking in bash
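The monitoring side is not shown above, so here is one possible sketch of it, assuming bash. The names my_cmd, ret_file and status are invented for this example, and the polling loop stands in for whatever other work the main script would do.

```shell
#!/usr/bin/env bash
my_cmd() { sleep 1; return 6; }     # stand-in for FUNCmyCmd
ret_file=$(mktemp)

# Run the command in the background and record its exit status when it ends.
{ my_cmd; echo "$?" >"$ret_file"; } &

# The main script is free to do other work; here it just polls until the file is non-empty.
until [ -s "$ret_file" ]; do
    sleep 1
done
status=$(cat "$ret_file")
rm -f "$ret_file"
echo "background command exited with status $status"
```

Testing for a non-empty file with -s, rather than mere existence, matters here because mktemp creates the file before the background command has written anything to it.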
My solution was to use an anonymous pipe to pass the status to a monitoring loop. There are no temporary files used to exchange status so nothing to cleanup. If you were uncertain about the number of background jobs the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) & 2>&1
PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[@]}"
do
( wait "$pid"; echo "process $pid completed with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
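for pid in "${PID[@]}"
do
# count with an assignment: ((SUCCESS++)) returns non-zero when SUCCESS is 0
if wait "$pid"; then SUCCESS=$((SUCCESS+1)); echo "$pid OK"; else echo "$pid returned $?"; fi
done
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))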
This may be extending beyond your question, however if you're concerned about the length of time processes are running for, you may be interested in checking the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$, however I came up with the following solution to check the exit status of those PIDs that have already expired:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.

Resources