ftp to multiple servers at same time [duplicate] - multithreading

Let's say I have a loop in Bash:
for foo in `some-command`
do
do-something $foo
done
do-something is cpu bound and I have a nice shiny 4 core processor. I'd like to be able to run up to 4 do-something's at once.
The naive approach seems to be:
for foo in `some-command`
do
do-something $foo &
done
This will run all do-somethings at once, but there are a couple of downsides. First, do-something may also do significant I/O, and running every instance at once might slow things down. Second, this code block returns immediately, so there is no way to do other work once all the do-somethings have finished.
How would you write this loop so there are always X do-somethings running at once?

Depending on what you want to do xargs also can help (here: converting documents with pdf2ps):
cpus=$( ls -d /sys/devices/system/cpu/cpu[[:digit:]]* | wc -w )
find . -name \*.pdf | xargs --max-args=1 --max-procs=$cpus pdf2ps
From the docs:
--max-procs=max-procs
-P max-procs
Run up to max-procs processes at a time; the default is 1.
If max-procs is 0, xargs will run as many processes as possible at a
time. Use the -n option with -P; otherwise chances are that only one
exec will be done.
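A slightly more defensive variant of the same idea, as a sketch only: it assumes an xargs that accepts -0, -n and -P (GNU and BSD xargs both do) and uses a hypothetical process_one function as a stand-in for pdf2ps. Passing the names NUL-delimited keeps file names containing spaces intact.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real CPU-bound command (e.g. pdf2ps).
process_one() {
    printf 'done: %s\n' "$1"
}
export -f process_one   # make the function visible to the bash started by xargs

# Run process_one on every remaining argument, at most $1 processes at a time.
run_with_xargs() {
    local max_procs=$1
    shift
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P "$max_procs" bash -c 'process_one "$1"' _
}

run_with_xargs 4 "a file.pdf" "b.pdf" "c.pdf"
```

The order of the output lines is not guaranteed, since the jobs run concurrently.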

With GNU Parallel http://www.gnu.org/software/parallel/ you can write:
some-command | parallel do-something
GNU Parallel also supports running jobs on remote computers. This will run one per CPU core on the remote computers - even if they have different number of cores:
some-command | parallel -S server1,server2 do-something
A more advanced example: here we have a list of files that we want my_script to run on. The files have an extension (maybe .jpeg). We want the output of my_script to be put next to the files in basename.out (e.g. foo.jpeg -> foo.out). We want to run my_script once for each core the computer has, and we want to run it on the local computer, too. For the remote computers, we want the file being processed to be transferred to the given computer. When my_script finishes, we want foo.out transferred back, and we then want foo.jpeg and foo.out removed from the remote computer:
cat list_of_files | \
parallel --trc {.}.out -S server1,server2,: \
"my_script {} > {.}.out"
GNU Parallel makes sure the output from each job does not mix, so you can use the output as input for another program:
some-command | parallel do-something | postprocess
See the videos for more examples: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

maxjobs=4
parallelize () {
while [ $# -gt 0 ] ; do
jobcnt=(`jobs -p`)
if [ ${#jobcnt[@]} -lt $maxjobs ] ; then
do-something "$1" &
shift
else
sleep 1
fi
done
wait
}
parallelize arg1 arg2 "5 args to third job" arg4 ...

Here is an alternative solution that can be inserted into .bashrc and used for everyday one-liners:
function pwait() {
while [ $(jobs -p | wc -l) -ge $1 ]; do
sleep 1
done
}
To use it, put & after each job and follow it with a pwait call; the parameter gives the number of parallel processes:
for i in *; do
do_something $i &
pwait 10
done
It would be nicer to use wait instead of busy-waiting on the output of jobs -p, but there doesn't seem to be an obvious way to wait until any one of the given jobs has finished instead of all of them.
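Newer versions of bash (4.3 and later) add wait -n, which returns as soon as any one background job exits. Assuming such a bash is available, the same helper could be sketched without the sleep loop. The name pwait_n is hypothetical and not part of the answer above.

```shell
#!/usr/bin/env bash
# Requires bash >= 4.3 for `wait -n`.
# Block until fewer than $1 background jobs of this shell are still running.
pwait_n() {
    local max=$1
    while (( $(jobs -pr | wc -l) >= max )); do
        wait -n    # returns when any single background job exits
    done
}

# Demo: six short jobs, never more than two at a time.
for i in 1 2 3 4 5 6; do
    { sleep 0.2; echo "job $i finished"; } &
    pwait_n 2
done
wait    # collect the jobs that are still running
```

Usage is the same as for pwait: background the job, then call pwait_n with the limit.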

Instead of plain bash, use a Makefile, then specify the number of simultaneous jobs with make -jX, where X is the number of jobs to run at once.
Or you can use wait ("man wait"): launch several child processes, then call wait; it returns when the child processes finish.
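# Placeholder for the real work done on each line
job () {
    : # ... the real work for "$1" goes here
}
maxjobs=10
jobsrunning=0
while IFS= read -r line; do
    job "$line" &
    jobsrunning=$((jobsrunning + 1))
    # Once maxjobs are in flight, wait for the whole batch before continuing
    if [ "$jobsrunning" -ge "$maxjobs" ]; then
        wait
        jobsrunning=0
    fi
done < file.txt
wait   # wait for the final, possibly partial, batch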
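If you need a job's result, note that a background job runs in a subshell and cannot set variables in the parent shell. Save its PID from $! and call wait with that PID to get its exit status, or have the job write its result to a file and read the file after wait.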

If you're familiar with the make command, most of the time you can express the list of commands you want to run as a makefile. For example, if you need to run $SOME_COMMAND on files *.input, each of which produces *.output, you can use the makefile
INPUT = a.input b.input
OUTPUT = $(INPUT:.input=.output)
%.output : %.input
	$(SOME_COMMAND) $< $@
all: $(OUTPUT)
and then just run
make -j<NUMBER>
to run at most NUMBER commands in parallel.

While doing this right in bash is probably impossible, you can do a semi-right fairly easily. bstark gave a fair approximation of right but his has the following flaws:
Word splitting: You can't pass any jobs to it that use any of the following characters in their arguments: spaces, tabs, newlines, stars, question marks. If you do, things will break, possibly unexpectedly.
It relies on the rest of your script to not background anything. If you do, or later you add something to the script that gets sent in the background because you forgot you weren't allowed to use backgrounded jobs because of his snippet, things will break.
Another approximation which doesn't have these flaws is the following:
scheduleAll() {
    local job max=4 pids=()
    for job; do
        # Once a full batch is running, wait for it before starting more
        (( ${#pids[@]} >= max )) && {
            wait "${pids[@]}"
            pids=()
        }
        bash -c "$job" & pids+=("$!")
    done
    wait "${pids[@]}"
}
Note that this one is easily adaptable to also check the exit code of each job as it ends, so you can warn the user if a job fails or set an exit code for scheduleAll according to the number of jobs that failed.
The problem with this code is just that:
It schedules four (in this case) jobs at a time and then waits for all four to end. Some might be done sooner than others which will cause the next batch of four jobs to wait until the longest of the previous batch is done.
A solution that takes care of this last issue would have to use kill -0 to poll whether any of the processes have disappeared instead of the wait and schedule the next job. However, that introduces a small new problem: you have a race condition between a job ending, and the kill -0 checking whether it's ended. If the job ended and another process on your system starts up at the same time, taking a random PID which happens to be that of the job that just finished, the kill -0 won't notice your job having finished and things will break again.
A perfect solution isn't possible in bash.
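The exit-code adaptation mentioned earlier in this answer could look roughly like the following. It is only a sketch of the same batch approach (so it shares the batching limitation), with a hypothetical name scheduleAllChecked and the batch size passed as the first argument. The return value is the number of failed jobs.

```shell
#!/usr/bin/env bash
# Run each remaining argument as a bash command, at most $1 at a time
# (in batches), and return the number of jobs that exited non-zero.
scheduleAllChecked() {
    local max=$1 job pid failed=0 pids=()
    shift
    for job; do
        if (( ${#pids[@]} >= max )); then
            # A full batch is running: collect every exit status first
            for pid in "${pids[@]}"; do
                wait "$pid" || failed=$((failed + 1))
            done
            pids=()
        fi
        bash -c "$job" & pids+=("$!")
    done
    # Collect the final, possibly partial, batch
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

scheduleAllChecked 2 'sleep 0.1' 'exit 3' 'echo "hello world"' 'false'
echo "failed jobs: $?"
```

Because each job is passed as a single quoted string, arguments containing spaces survive intact.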

Maybe try a parallelizing utility instead of rewriting the loop? I'm a big fan of xjobs. I use xjobs all the time to mass copy files across our network, usually when setting up a new database server.
http://www.maier-komor.de/xjobs.html

function for bash:
parallel ()
{
awk "BEGIN{print \"all: ALL_TARGETS\\n\"}{print \"TARGET_\"NR\":\\n\\t@-\"\$0\"\\n\"}END{printf \"ALL_TARGETS:\";for(i=1;i<=NR;i++){printf \" TARGET_%d\",i};print\"\\n\"}" | make "$@" -f - all
}
using:
cat my_commands | parallel -j 4

Really late to the party here, but here's another solution.
A lot of solutions don't handle spaces/special characters in the commands, don't keep N jobs running at all times, eat cpu in busy loops, or rely on external dependencies (e.g. GNU parallel).
With inspiration for dead/zombie process handling, here's a pure bash solution:
function run_parallel_jobs {
local concurrent_max=$1
local callback=$2
local cmds=("${@:3}")
local jobs=( )
while [[ "${#cmds[@]}" -gt 0 ]] || [[ "${#jobs[@]}" -gt 0 ]]; do
while [[ "${#jobs[@]}" -lt $concurrent_max ]] && [[ "${#cmds[@]}" -gt 0 ]]; do
local cmd="${cmds[0]}"
cmds=("${cmds[@]:1}")
bash -c "$cmd" &
jobs+=($!)
done
local job="${jobs[0]}"
jobs=("${jobs[@]:1}")
local state="$(ps -p $job -o state= 2>/dev/null)"
if [[ "$state" == "D" ]] || [[ "$state" == "Z" ]]; then
$callback $job
else
wait $job
$callback $job $?
fi
done
}
And sample usage:
function job_done {
if [[ $# -lt 2 ]]; then
echo "PID $1 died unexpectedly"
else
echo "PID $1 exited $2"
fi
}
cmds=( \
"echo 1; sleep 1; exit 1" \
"echo 2; sleep 2; exit 2" \
"echo 3; sleep 3; exit 3" \
"echo 4; sleep 4; exit 4" \
"echo 5; sleep 5; exit 5" \
)
# cpus="$(getconf _NPROCESSORS_ONLN)"
cpus=3
run_parallel_jobs $cpus "job_done" "${cmds[@]}"
The output:
1
2
3
PID 56712 exited 1
4
PID 56713 exited 2
5
PID 56714 exited 3
PID 56720 exited 4
PID 56724 exited 5
For per-process output handling $$ could be used to log to a file, for example:
function job_done {
cat "$1.log"
}
cmds=( \
"echo 1 \$\$ >\$\$.log" \
"echo 2 \$\$ >\$\$.log" \
)
run_parallel_jobs 2 "job_done" "${cmds[@]}"
Output:
1 56871
2 56872

The project I work on uses the wait command to control parallel shell (ksh actually) processes. To address your concerns about IO, on a modern OS, it's possible parallel execution will actually increase efficiency. If all processes are reading the same blocks on disk, only the first process will have to hit the physical hardware. The other processes will often be able to retrieve the block from OS's disk cache in memory. Obviously, reading from memory is several orders of magnitude quicker than reading from disk. Also, the benefit requires no coding changes.

This might be good enough for most purposes, but is not optimal.
#!/bin/bash
n=0
maxjobs=10
for i in *.m4a ; do
# ( DO SOMETHING ) &
# limit jobs
if (( $(($((++n)) % $maxjobs)) == 0 )) ; then
wait # wait until all have finished (not optimal, but most times good enough)
echo $n wait
fi
done

Here is how I managed to solve this issue in a bash script:
#! /bin/bash
MAX_JOBS=32
FILE_LIST=($(cat ${1}))
echo Length ${#FILE_LIST[@]}
for ((INDEX=0; INDEX < ${#FILE_LIST[@]}; INDEX=$((${INDEX}+${MAX_JOBS})) ));
do
JOBS_RUNNING=0
while ((JOBS_RUNNING < MAX_JOBS))
do
I=$((${INDEX}+${JOBS_RUNNING}))
FILE=${FILE_LIST[${I}]}
if [ "$FILE" != "" ];then
echo $JOBS_RUNNING $FILE
./M22Checker ${FILE} &
else
echo $JOBS_RUNNING NULL &
fi
JOBS_RUNNING=$((JOBS_RUNNING+1))
done
wait
done

You can use a simple nested for loop (substitute appropriate integers for N and M below):
for i in {1..N}; do
(for j in {1..M}; do do_something; done & );
done
This will execute do_something N*M times in M rounds, each round executing N jobs in parallel. You can make N equal the number of CPUs you have.

My solution to always keep a given number of processes running, keep track of errors, and handle uninterruptible / zombie processes:
function log {
echo "$1"
}
# Takes a list of commands and runs them with at most numberOfProcesses running simultaneously
# Returns the number of commands that exited with a non-zero code
function ParallelExec {
local numberOfProcesses="${1}" # Number of simultaneous commands to run
local commandsArg="${2}" # Semi-colon separated list of commands
local pid
local runningPids=0
local counter=0
local commandsArray
local pidsArray
local newPidsArray
local retval
local retvalAll=0
local pidState
local commandsArrayPid
IFS=';' read -r -a commandsArray <<< "$commandsArg"
log "Running ${#commandsArray[@]} commands in $numberOfProcesses simultaneous processes."
while [ $counter -lt "${#commandsArray[@]}" ] || [ ${#pidsArray[@]} -gt 0 ]; do
while [ $counter -lt "${#commandsArray[@]}" ] && [ ${#pidsArray[@]} -lt $numberOfProcesses ]; do
log "Running command [${commandsArray[$counter]}]."
eval "${commandsArray[$counter]}" &
pid=$!
pidsArray+=($pid)
commandsArrayPid[$pid]="${commandsArray[$counter]}"
counter=$((counter+1))
done
newPidsArray=()
for pid in "${pidsArray[@]}"; do
# Handle uninterruptible sleep state or zombies by omitting them from the running process array (how to kill what is already dead? :)
if kill -0 $pid > /dev/null 2>&1; then
pidState=$(ps -p$pid -o state= 2>/dev/null)
if [ "$pidState" != "D" ] && [ "$pidState" != "Z" ]; then
newPidsArray+=($pid)
fi
else
# pid is dead, get its exit code from the wait command
wait $pid
retval=$?
if [ $retval -ne 0 ]; then
log "Command [${commandsArrayPid[$pid]}] failed with exit code [$retval]."
retvalAll=$((retvalAll+1))
fi
fi
done
pidsArray=("${newPidsArray[@]}")
# Add a trivial sleep time so bash won't eat all CPU
sleep .05
done
return $retvalAll
}
Usage:
cmds="du -csh /var;du -csh /tmp;sleep 3;du -csh /root;sleep 10; du -csh /home"
# Execute 2 processes at a time
ParallelExec 2 "$cmds"
# Execute 4 processes at a time
ParallelExec 4 "$cmds"

DOMAINS="list of some domain in commands"
i=1
for foo in $DOMAINS
do
eval `some-command for $foo` &
job[$i]=$!
i=$(( i + 1 ))
done
Ndomains=$(echo $DOMAINS | wc -w)
for i in $(seq 1 1 $Ndomains)
do
echo "wait for ${job[$i]}"
wait "${job[$i]}"
done
This approach works for parallelizing. The important part is that the eval line ends with '&',
which puts each command in the background.

Related

Asynchronous programming using shell scripting [duplicate]

How to wait in a bash script for several subprocesses spawned from that script to finish, and then return exit code !=0 when any of the subprocesses ends with code !=0?
Simple script:
#!/bin/bash
for i in `seq 0 9`; do
doCalculations $i &
done
wait
The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so it will discover exit statuses of spawned subprocesses and return exit code 1 when any of subprocesses ends with code !=0?
Is there any better solution for that than collecting PIDs of the subprocesses, wait for them in order and sum exit statuses?
wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background.
Modify the loop to store the PID of each spawned sub-process into an array, and then loop again waiting on each PID.
# run processes and store pids in array
for i in $n_procs; do
./procs[${i}] &
pids[${i}]=$!
done
# wait for all pids
for pid in ${pids[*]}; do
wait $pid
done
http://jeremy.zawodny.com/blog/archives/010717.html :
#!/bin/bash
FAIL=0
echo "starting"
./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &
for job in `jobs -p`
do
echo $job
wait $job || let "FAIL+=1"
done
echo $FAIL
if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi
Here is a simple example using wait.
Run some processes:
$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &
Then wait for them with the wait command:
$ wait $(jobs -p)
Or just wait (without arguments) for all.
This waits until all background jobs have completed.
If the -n option is supplied, waits for the next job to terminate and returns its exit status.
See: help wait and help jobs for syntax.
However, the downside is that this returns only the status of the last ID, so you need to check the status of each subprocess and store it in a variable.
Or have your calculation function create a file on failure (empty or containing a failure log), then check whether that file exists, e.g.
$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait $(jobs -p)
$ test -f fail && echo Calculation failed.
How about simply:
#!/bin/bash
pids=""
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
wait $pids
...code continued here ...
Update:
As pointed out by multiple commenters, the above waits for all processes to complete before continuing, but does not exit and fail if one of them fails. It can be made to do so with the following modification suggested by @Bryan, @SamBrightman, and others:
#!/bin/bash
pids=""
RESULT=0
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
for pid in $pids; do
wait $pid || let "RESULT=1"
done
if [ "$RESULT" == "1" ];
then
exit 1
fi
...code continued here ...
If you have GNU Parallel installed you can do:
# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}
GNU Parallel will give you exit code:
0 - All jobs ran without error.
1-253 - Some of the jobs failed. The exit status gives the number of failed jobs
254 - More than 253 jobs failed.
255 - Other error.
Watch the intro videos to learn more: http://pi.dk/1
Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.
waitall() { # PID...
## Wait for children to exit and indicate whether all exited with 0 status.
local errors=0
while :; do
debug "Processes remaining: $*"
for pid in "$@"; do
shift
if kill -0 "$pid" 2>/dev/null; then
debug "$pid is still alive."
set -- "$@" "$pid"
elif wait "$pid"; then
debug "$pid exited with zero exit status."
else
debug "$pid exited with non-zero exit status."
((++errors))
fi
done
(("$#" > 0)) || break
# TODO: how to interrupt this sleep when a child terminates?
sleep ${WAITALL_DELAY:-1}
done
((errors == 0))
}
debug() { echo "DEBUG: $*" >&2; }
pids=""
for t in 3 5 4; do
sleep "$t" &
pids="$pids $!"
done
waitall $pids
To parallelize this...
for i in $(whatever_list) ; do
do_something $i
done
Translate it to this...
for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
(
export -f do_something ## export functions (if needed)
export PATH ## export any variables that are required
xargs -I{} --max-procs 0 bash -c ' ## process in batches...
{
echo "processing {}" ## optional
do_something {}
}'
)
If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.
Exporting functions and variables may or may not be necessary, in any particular case.
You can set --max-procs based on how much parallelism you want (0 means "all at once").
GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.
The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.
You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.
Here's a simplified working example...
for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
{
echo sleep {}
sleep 2s
}'
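To check the claim about the overall exit code, here is a small sketch. It assumes an xargs that supports -P (GNU and BSD xargs both do), and the names do_something and run_all are hypothetical. The item is passed to bash as a positional argument rather than spliced into the script text, which avoids quoting surprises.

```shell
#!/usr/bin/env bash
# Hypothetical worker: fails for the input "2", succeeds otherwise.
do_something() {
    [ "$1" != "2" ]
}
export -f do_something   # make the function visible to the bash started by xargs

# Read items from stdin (one per line) and run do_something on each,
# with up to $1 running in parallel. The exit status is that of xargs,
# which is non-zero if any invocation failed.
run_all() {
    xargs -I{} -P "$1" bash -c 'do_something "$1"' _ {}
}

if printf '%s\n' 0 1 2 3 | run_all 2; then
    echo "all jobs succeeded"
else
    echo "at least one job failed"
fi
```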
This is something that I use:
#wait for jobs
for job in `jobs -p`; do wait ${job}; done
This is an expansion on the most-upvoted answer, by @Luca Tettamanti, to make a fully-runnable example.
That answer left me wondering:
What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.
...and also:
How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?
Anyway, I figured it out, so here is a fully-runnable example.
Notes:
$! is how to obtain the PID (Process ID) of the last-executed sub-process.
Running any command with & after it, like cmd &, causes it to run in the background as a subprocess, in parallel with the main process.
myarray=() is how to create an array in bash.
To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.
Full, runnable program: wait for all processes to end
multi_process_program.sh (from my eRCaGuy_hello_world repo):
#!/usr/bin/env bash
# This is a special sleep function which returns the number of seconds slept as
# the "error code" or "return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
seconds_to_sleep="$1"
sleep "$seconds_to_sleep"
return "$seconds_to_sleep"
}
# Create an array of whatever commands you want to run as subprocesses
procs=() # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")
num_procs=${#procs[@]} # number of processes
echo "num_procs = $num_procs"
# run commands as subprocesses and store pids in an array
pids=() # bash array
for (( i=0; i<"$num_procs"; i++ )); do
echo "cmd = ${procs[$i]}"
${procs[$i]} & # run the cmd as a subprocess
# store pid of last subprocess started; see:
# https://unix.stackexchange.com/a/30371/114401
pids+=("$!")
echo " pid = ${pids[$i]}"
done
# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
wait "$pid"
return_code="$?"
echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."
Change the file above to be executable by running chmod +x multi_process_program.sh, then run it like this:
time ./multi_process_program.sh
Sample output. See how the output of the time command in the call shows it took 5.084sec to run. We were also able to successfully retrieve the return code from each subprocess.
eRCaGuy_hello_world/bash$ time ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 21694
cmd = my_sleep 2
pid = 21695
cmd = my_sleep 3
pid = 21697
cmd = my_sleep 4
pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.
real 0m5.084s
user 0m0.025s
sys 0m0.061s
Going further: determine live when each individual process ends
If you'd like to do some action as each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever action you want.
Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block instead:
# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
for i in "${!pids[@]}"; do
pid="${pids[$i]}"
# echo "pid = $pid" # debugging
# See if PID is still running; see my answer here:
# https://stackoverflow.com/a/71134379/4561887
ps --pid "$pid" > /dev/null
if [ "$?" -ne 0 ]; then
# PID doesn't exist anymore, meaning it terminated
# 1st, read its return code
wait "$pid"
return_code="$?"
# 2nd, remove this PID from the `pids` array by `unset`ting the
# element at this index; NB: bash arrays are sparse, so this removes
# the element without renumbering the others. The index disappears
# from the `"${!pids[@]}"` list of indices and the array count
# (`"${#pids[@]}"`) drops by one.
unset "pids[$i]"
num_pids="${#pids[@]}"
echo "PID $pid is done; return_code = $return_code;" \
"$num_pids PIDs remaining."
fi
done
# exit the while loop if the `pids` array is empty
if [ "${#pids[@]}" -eq 0 ]; then
break
fi
# Do some small sleep here to keep your polling loop from sucking up
# 100% of one of your CPUs unnecessarily. Sleeping allows other processes
# to run during this time.
sleep 0.1
done
Sample run and output of the full program with Option 1 commented out and Option 2 in-use:
eRCaGuy_hello_world/bash$ ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 22275
cmd = my_sleep 2
pid = 22276
cmd = my_sleep 3
pid = 22277
cmd = my_sleep 4
pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.
Each of those PID XXXXX is done lines prints out live right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was run first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.
Other References:
*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added):
wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!
How to check if a process id (PID) exists
my answer
Remove an element from a Bash array - note that elements in a bash array aren't actually deleted, they are just "unset". See my comments in the code above for what that means.
How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/
I see lots of good examples listed on here, wanted to throw mine in as well.
#! /bin/bash
items="1 2 3 4 5 6"
pids=""
for item in $items; do
sleep $item &
pids+="$! "
done
for pid in $pids; do
wait $pid
status=$? # save it: $? is overwritten by the test below
if [ $status -eq 0 ]; then
echo "SUCCESS - Job $pid exited with a status of $status"
else
echo "FAILED - Job $pid exited with a status of $status"
fi
done
I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!
I don't believe it's possible with Bash's builtin functionality.
You can get notification when a child exits:
#!/bin/sh
set -o monitor # enable script job control
trap 'echo "child died"' CHLD
However there's no apparent way to get the child's exit status in the signal handler.
Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.
What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.
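Bash 4.3 and later do offer something close to waitpid(-1): wait -n blocks until any one child exits and returns that child's status. Assuming such a version is available, a minimal sketch might look like this. The function name first_to_finish is hypothetical, and the jobs are assumed to still be running when wait -n is called.

```shell
#!/usr/bin/env bash
# Requires bash >= 4.3 for `wait -n`.
# Start each argument as a background bash command, block until the first
# one exits, stop the others, and return the status of that first job.
first_to_finish() {
    local cmd status
    for cmd; do
        bash -c "$cmd" &
    done
    wait -n                        # blocks until any one child exits
    status=$?
    kill $(jobs -pr) 2>/dev/null   # stop the jobs that are still running
    wait 2>/dev/null               # reap them
    return "$status"
}

first_to_finish 'sleep 2' 'sleep 0.2; exit 7' 'sleep 3'
echo "first child to finish exited with status $?"
```

This covers only the "wait for any child" part. Mapping the returned status back to a particular PID needs extra bookkeeping that this sketch does not attempt.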
The following code will wait for completion of all calculations and return exit status 1 if any of doCalculations fails.
#!/bin/bash
for i in $(seq 0 9); do
(doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qvx 0 && exit 1
Here's my version that works for multiple pids, logs warnings if execution takes too long, and stops the subprocesses if execution takes longer than a given value.
[EDIT] I have uploaded my newer implementation of WaitForTaskCompletion, called ExecTasks at https://github.com/deajan/ofunctions.
There's also a compat layer for WaitForTaskCompletion
[/EDIT]
function WaitForTaskCompletion {
local pids="${1}" # pids to wait for, separated by semi-colon
local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
local caller_name="${4}" # Who called this function
local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors
Logger "${FUNCNAME[0]} called by [$caller_name]."
local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once
local log_ttime=0 # local time instance for comparison
local seconds_begin=$SECONDS # Seconds since the beginning of the script
local exec_time=0 # Seconds since the beginning of this function
local retval=0 # return value of monitored pid process
local errorcount=0 # Number of pids that finished with errors
local pidCount # number of given pids
IFS=';' read -a pidsArray <<< "$pids"
pidCount=${#pidsArray[@]}
while [ ${#pidsArray[@]} -gt 0 ]; do
newPidsArray=()
for pid in "${pidsArray[@]}"; do
if kill -0 $pid > /dev/null 2>&1; then
newPidsArray+=($pid)
else
wait $pid
result=$?
if [ $result -ne 0 ]; then
errorcount=$((errorcount+1))
Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
fi
fi
done
## Log a standby message every hour
exec_time=$(($SECONDS - $seconds_begin))
if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
if [ $log_ttime -ne $exec_time ]; then
log_ttime=$exec_time
Logger "Current tasks still running with pids [${pidsArray[@]}]."
fi
fi
if [ $exec_time -gt $soft_max_time ]; then
if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
soft_alert=1
SendAlert
fi
if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
kill -SIGTERM "${pidsArray[@]}"
if [ $? == 0 ]; then
Logger "Task stopped successfully"
else
errorcount=$((errorcount+1))
fi
fi
fi
pidsArray=("${newPidsArray[@]}")
sleep 1
done
Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
Logger "Stopping execution."
exit 1337
else
return $errorcount
fi
}
# Just a plain stupid logging function to be replaced by yours
function Logger {
local value="${1}"
echo $value
}
Example: wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Don't exit the program on failures.
function something {
sleep 10 &
pids="$!"
sleep 12 &
pids="$pids;$!"
sleep 9 &
pids="$pids;$!"
WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something
If you have bash 4.2 or later available the following might be useful to you. It uses associative arrays to store task names and their "code" as well as task names and their pids. I have also built in a simple rate-limiting method which might come handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.
The script launches all tasks in the first loop and consumes the results in the second one.
This is a bit overkill for simple cases but it allows for pretty neat stuff. For example one can store error messages for each task in another associative array and print them after everything has settled down.
#! /bin/bash
main () {
local -A pids=()
local -A tasks=([task1]="echo 1"
[task2]="echo 2"
[task3]="echo 3"
[task4]="false"
[task5]="echo 5"
[task6]="false")
local max_concurrent_tasks=2
for key in "${!tasks[@]}"; do
while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
sleep 1 # gnu sleep allows floating point here...
done
${tasks[$key]} &
pids+=(["$key"]="$!")
done
errors=0
for key in "${!tasks[@]}"; do
pid=${pids[$key]}
local cur_ret=0
if [ -z "$pid" ]; then
echo "No Job ID known for the $key process" # should never happen
cur_ret=1
else
wait $pid
cur_ret=$?
fi
if [ "$cur_ret" -ne 0 ]; then
errors=$(($errors + 1))
echo "$key (${tasks[$key]}) failed."
fi
done
return $errors
}
main
I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.
#!/bin/bash
set -o monitor
sleep 2 &
sleep 4 && exit 1 &
sleep 6 &
pids=`jobs -p`
checkpids() {
for pid in $pids; do
if kill -0 $pid 2>/dev/null; then
echo $pid is still alive.
elif wait $pid; then
echo $pid exited with zero exit status.
else
echo $pid exited with non-zero exit status.
fi
done
echo
}
trap checkpids CHLD
wait
#!/bin/bash
set -m
for i in `seq 0 9`; do
doCalculations $i &
done
while fg; do true; done
set -m allows you to use fg & bg in a script
fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds
while fg will stop looping when any fg exits with a non-zero exit status
Unfortunately, this won't handle the case where a process in the background exits with a non-zero exit status: the loop won't terminate immediately, but will wait for the previous processes to complete.
Wait for all jobs and return the exit code of the last failing job. Unlike solutions above, this does not require pid saving, or modifying inner loops of scripts. Just bg away, and wait.
function wait_ex {
# this waits for all jobs and returns the exit code of the last failing job
ecode=0
while true; do
[ -z "$(jobs)" ] && break
wait -n
err="$?"
[ "$err" != "0" ] && ecode="$err"
done
return $ecode
}
EDIT: Fixed the bug where this could be fooled by a script that ran a command that didn't exist.
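A possible way to use wait_ex, assuming bash 4.3 or newer (for wait -n). The function is repeated here, lightly condensed, so the snippet runs on its own; the three background subshells are made-up stand-ins for real jobs.

```bash
#!/usr/bin/env bash
# wait_ex as in the answer above, repeated so this sketch is self-contained.
wait_ex() {
    local ecode=0 err
    while [ -n "$(jobs)" ]; do
        wait -n
        err=$?
        [ "$err" != "0" ] && ecode=$err
    done
    return "$ecode"
}

# Hypothetical jobs: two succeed, one exits with status 3.
(sleep 0.1; exit 0) &
(sleep 0.2; exit 3) &
(sleep 0.3; exit 0) &

wait_ex
status=$?
echo "overall status: $status"
```

Because only the last non-zero status is kept, the result says that something failed but not which job it was.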
Just store the results out of the shell, e.g. in a file.
#!/bin/bash
tmp=/tmp/results
: > $tmp #clean the file
for i in `seq 0 9`; do
(doCalculations $i; echo $i:$?>>$tmp)&
done #iterate
wait #wait until all ready
sort $tmp | grep -v ':0' #... handle as required
I've just been modifying a script to background and parallelise a process.
I did some experimenting (on Solaris with both bash and ksh) and discovered that 'wait' outputs the exit status if it's not zero, or a list of jobs that returned a non-zero exit status when no PID argument is provided. E.g.
Bash:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]- Exit 2 sleep 20 && exit 2
[2]+ Exit 1 sleep 10 && exit 1
Ksh:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+ Done(2) sleep 20 && exit 2
[2]+ Done(1) sleep 10 && exit 1
This output is written to stderr, so a simple solution to the OPs example could be:
#!/bin/bash
trap "rm -f /tmp/x.$$" EXIT
for i in `seq 0 9`; do
doCalculations $i &
done
wait 2> /tmp/x.$$
if [ "$(wc -l < /tmp/x.$$)" -gt 0 ] ; then
exit 1
fi
While this:
wait 2> >(wc -l)
will also return a count but without the tmp file. This might also be used this way, for example:
wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)
But this isn't much more useful than the tmp file IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the "wait" in a subshell, which won't work at all.
I needed this, but the target process wasn't a child of current shell, in which case wait $PID doesn't work. I did find the following alternative instead:
while [ -e /proc/$PID ]; do sleep 0.1 ; done
That relies on the presence of procfs, which may not be available (Mac doesn't provide it for example). So for portability, you could use this instead:
while ps -p $PID >/dev/null ; do sleep 0.1 ; done
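Either polling loop can be wrapped in a small function with a timeout so the script cannot hang forever. A minimal sketch, assuming kill -0 is acceptable as the liveness check (it needs permission to signal the target) and using a hypothetical function name, wait_for_pid:

```bash
#!/usr/bin/env bash
# wait_for_pid PID [TIMEOUT_SECONDS]
# Polls until PID no longer exists. Returns 0 when it is gone,
# 1 if it is still alive after TIMEOUT_SECONDS (default 10).
# Hypothetical helper; works for processes that are not children of this shell.
wait_for_pid() {
    local pid=$1 timeout=${2:-10}
    local deadline=$(( SECONDS + timeout ))
    while kill -0 "$pid" 2>/dev/null; do
        (( SECONDS >= deadline )) && return 1
        sleep 0.1
    done
    return 0
}
```

Usage would look like wait_for_pid "$PID" 30 || echo "still running after 30s". Unlike wait, this cannot report the exit status of the target, only that it has gone away.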
There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here's what I did - this might be useful to some in the future.
n=10 # run 10 jobs
c=0
PIDS=()
while true
my_function_or_command &
PID=$!
echo "Launched job as PID=$PID"
PIDS+=($PID)
(( c+=1 ))
# required to prevent any exit due to error
# caused by additional commands run which you
# may add when modifying this example
true
do
if (( c < n ))
then
continue
else
break
fi
done
# collect launched jobs
for pid in "${PIDS[@]}"
do
wait $pid || echo "failed job PID=$pid"
done
This works, and should be just as good as, if not better than, @HoverHell's answer!
#!/usr/bin/env bash
set -m # allow for job control
EXIT_CODE=0; # exit code of overall script
function foo() {
echo "CHLD exit code is $1"
echo "CHLD pid is $2"
echo $(jobs -l)
for job in `jobs -p`; do
echo "PID => ${job}"
wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
done
}
trap 'foo $? $$' CHLD
DIRN=$(dirname "$0");
commands=(
"{ echo "foo" && exit 4; }"
"{ echo "bar" && exit 3; }"
"{ echo "baz" && exit 5; }"
)
clen=`expr "${#commands[@]}" - 1` # get length of commands - 1
for i in `seq 0 "$clen"`; do
(echo "${commands[$i]}" | bash) & # run the command via bash in subshell
echo "$i ith command has been issued as a background job"
done
# wait for all to finish
wait;
echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end
and of course, I have immortalized this script, in an NPM project which allows you to run bash commands in parallel, useful for testing:
https://github.com/ORESoftware/generic-subshell
Exactly for this purpose I wrote a bash function called :for.
Note: :for not only preserves and returns the exit code of the failing function, but also terminates all parallel running instance. Which might not be needed in this case.
#!/usr/bin/env bash
# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
local pids=("$@")
[ ${#pids} -eq 0 ] && return $?
trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM
for pid in "${pids[@]}"; do
wait "${pid}" || return $?
done
trap - INT RETURN TERM
}
# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
local f="${1}" && shift
local i=0
local pids=()
for arg in "$@"; do
( ${f} "${arg}" ) &
pids+=("$!")
if [ ! -z ${FOR_PARALLEL+x} ]; then
(( i=(i+1)%${FOR_PARALLEL} ))
if (( i==0 )) ;then
:wait "${pids[@]}" || return $?
pids=()
fi
fi
done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}
usage
for.sh:
#!/usr/bin/env bash
set -e
# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)
msg="You should see this three times"
:(){
i="${1}" && shift
echo "${msg}"
sleep 1
if [ "$i" == "1" ]; then sleep 1
elif [ "$i" == "2" ]; then false
elif [ "$i" == "3" ]; then
sleep 3
echo "You should never see this"
fi
} && :for : 1 2 3 || exit $?
echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1
set -e
fail () {
touch .failure
}
expect () {
wait
if [ -f .failure ]; then
rm -f .failure
exit 1
fi
}
sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect
The set -e at top makes your script stop on failure.
expect will return 1 if any subjob failed.
There can be a case where the process is complete before waiting for the process. If we trigger wait for a process that is already finished, it will trigger an error like pid is not a child of this shell. To avoid such cases, the following function can be used to find whether the process is complete or not:
isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running"
sleep 5
done
echo "Process $PID has finished"
}
Starting with Bash 5.1, there is a nice new way of waiting for and handling the results of multiple background jobs thanks to the introduction of wait -p:
#!/usr/bin/env bash
# Spawn background jobs
for ((i=0; i < 10; i++)); do
secs=$((RANDOM % 10)); code=$((RANDOM % 256))
(sleep ${secs}; exit ${code}) &
echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done
# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
wait -n -p pid; code=$?
[[ -z "${pid}" ]] && break
echo "Background job ${pid} finished with code ${code}"
(( ${code} != 0 )) && result=1
done
# Return overall result
exit ${result}
I used this recently (thanks to Alnitak):
#!/bin/bash
# activate child monitoring
set -o monitor
# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!
# count, and kill when all done
c=0
function kill_on_count() {
# you could kill on whatever criterion you wish for
# I just counted to simulate bash's wait with no args
[ $c -eq 9 ] && kill $pid
c=$((c+1))
echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD
function save_status() {
local i=$1;
local rc=$2;
# do whatever, and here you know which one stopped
# but remember, you're called from a subshell
# so vars have their values at fork time
}
# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
(doCalculations $i; save_status $i $?) &
done
# wait for locking subprocess to be killed
wait $pid
echo
From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non zero rc, just kill the lock from save_status.
Trapping the CHLD signal may not work because you can lose some signals if they arrive simultaneously.
#!/bin/bash
trap 'rm -f $tmpfile' EXIT
tmpfile=$(mktemp)
doCalculations() {
echo start job $i...
sleep $((RANDOM % 5))
echo ...end job $i
exit $((RANDOM % 10))
}
number_of_jobs=10
for i in $( seq 1 $number_of_jobs )
do
( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done
wait
i=0
while read res; do
echo "$res"
let i++
done < "$tmpfile"
echo $i jobs done !!!
A solution to wait for several subprocesses and to exit when any one of them exits with a non-zero status code is to use 'wait -n'.
#!/bin/bash
wait_for_pids()
{
for (( i = 1; i <= $#; i++ )) do
wait -n "$@"
status=$?
echo "received status: "$status
if [ $status -ne 0 ] && [ $status -ne 127 ]; then
exit 1
fi
done
}
sleep_for_10()
{
sleep 10
exit 10
}
sleep_for_20()
{
sleep 20
}
sleep_for_10 &
pid1=$!
sleep_for_20 &
pid2=$!
wait_for_pids $pid2 $pid1
Status code '127' is for a non-existing process, which means the child might have already exited.
I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.
#!/usr/bin/env bash
sleeper() {
echo "Sleeper $1"
sleep $2
echo "Exiting $1"
return $3
}
start_sleepers() {
sleeper 1 1 0 &
sleeper 2 2 $1 &
sleeper 3 5 0 &
sleeper 4 6 0 &
sleep 4
}
echo "Using jobs"
start_sleepers 1
pids=( $(jobs -p) )
echo "PIDS: ${pids[*]}"
for pid in "${pids[@]}"; do
wait "$pid"
echo "Exit code $?"
done
echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"
echo "Waiting for N processes"
start_sleepers 2
for ignored in $(seq 1 4); do
wait -n
echo "Exit code $?"
done
Output:
Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0

Bash script catch signal but wait afterwards for processes to terminate

currently I'm writing a bash script like this:
foo(){
while true
do
sleep 10
done
}
bar(){
while true
do
sleep 20
done
}
foo &
bar &
wait
(I know there is no point in such a script, it's just about the structure)
Now I want to add signal handling with trap -- <doSomething> RTMIN+1. This works at first. When the script receives the rtmin+1 signal it does doSomething but afterwards it exits (with exit code 163, which is 128 plus the number of the signal being sent).
This is not the behavior I want. I want that after receiving the signal, the script continues to wait for the processes (in this case the two functions) to terminate (which of course will not happen in this case, but the script should wait).
I tried it with adding a ; wait to the things that should be done when receiving the signal, but this does not help (or I'm doing something wrong).
Does anyone know how to achieve the desired behavior?
Thanks in advance and with best wishes.
EDIT: Maybe a more precise example helps:
clock(){
local prefix=C
local interval=1
while true
do
printf "${prefix} $(date '+%d.%m %H:%M:%S')\n"
sleep $interval
done
}
volume(){
prefix=V
volstat="$(amixer get Master 2>/dev/null)"
echo "$volstat" | grep "\[off\]" >/dev/null && icon="" #alternative: deaf:  mute: 
vol=$(echo "$volstat" | grep -o "\[[0-9]\+%\]" | sed "s/[^0-9]*//g;1q")
if [ -z "$icon" ] ; then
if [ "$vol" -gt "50" ]; then
icon=""
#elif [ "$vol" -gt "30" ]; then
# icon=""
else
icon=""
fi
fi
printf "${prefix}%s %3s%%\n" "$icon" "$vol"
}
clock &
volume &
trap -- "volume" RTMIN+2
wait
Now the RTMIN+2 signal should rerun the volume function, but the clock process should not be interrupted. (Up to now, the whole script (with all subprocesses) is terminated upon the receiving of the signal)
When invoked with no operands, wait exits with either 0, which means all process IDs known by the invoking shell have terminated, or a value greater than 128, which means a signal for which a trap has been set was received. So, looping until wait exits with 0 is enough to keep the script alive after receiving a signal. You may also check whether its exit status is less than 128 within the loop, but I don't think that's necessary.
However, if you're sending those signals using pkill, the jobs started at the background will receive them too since child processes inherit the process name from their parent but not the custom signal handlers, and pkill signals all processes whose names match. You need to handle that case as well.
A minimal, working example would be:
#! /bin/bash -
# ignore RTMIN+1 and RTMIN+2 temporarily
# to prevent pkill from killing jobs
trap '' RTMIN+1 RTMIN+2
# start jobs
sleep 20 && echo slept for 20 seconds &
sleep 30 && echo slept for 30 seconds &
# set traps
trap 'echo received rtmin+1' RTMIN+1
trap 'echo received rtmin+2' RTMIN+2
# wait for jobs to terminate
until wait; do
echo still waiting
done
It's also worth noting that ignoring RTMIN+1 and RTMIN+2 before starting jobs prevents shells descending from them from trapping/resetting those signals. If that is a problem, you may set an empty trap within jobs as F. Hauri suggested; or you may drop the ignoring entirely and use pkill with the -o option to send the signal to the oldest matching process.
References:
Shell Command Language § Signals and Error Handling
wait spec. § EXIT STATUS
Related:
Reliably kill sleep process after USR1 signal
Something rewritten
In order to avoid some useless forks.
clock(){ local prefix=C interval=2
trap : RTMIN{,+{{,1}{1,2,3,4,5},6,7,8,9,10}}
while :;do
printf "%s: %(%d.%m %H:%M:%S)T\n" $prefix -1
sleep $interval
done
}
volume(){ local prefix=V vol=() field playback val foo
while IFS=':[]' read field playback val foo;do
[ "$playback" ] && [ -z "${playback//*Playback*}" ] && [ "$val" ] &&
vol+=(${val%\%})
done < <(amixer get Master)
suffix='%%'
if [ "$vol" = "off" ] ;then
icon="" #alternative: deaf:  mute: 
suffix=''
elif (( vol > 50 )) ;then icon=""
elif (( vol > 30 )) ;then icon=""
else icon=""
fi
printf -v values "%3s$suffix " ${vol[@]}
printf "%s%s %s\n" $prefix "$icon" "$values"
}
clock & volume &
trap volume RTMIN+2
trap : RTMIN{,+{{,1}{1,3,4,5},6,7,8,9,10,12}}
echo -e "To get status, run:\n kill -RTMIN+2 $$"
while :;do wait ;done
Regarding my last comment about stereo bug, there is a volume function working for stereo, mono or even quadra:
volume(){
local prefix=V vol=() field playback val foo
local -i overallvol=0
while IFS=':[]' read field playback val foo ;do
[ "$playback" ] && [ -z "${playback//*Playback*}" ] && [ "$val" ] && {
vol+=($val)
val=${val%\%}
overallvol+=${val//off/0}
}
done < <(
amixer get Master
)
overallvol=$overallvol/${#vol[@]}
if (( overallvol == 0 )) ;then
icon=""
elif (( overallvol > 50 )) ;then
icon=""
elif (( overallvol > 30 )) ;then
icon=""
else
icon=""
fi
printf "%s%s %s\n" $prefix "$icon" "${vol[*]}"
}
or even:
volume(){
local prefix=V vol=() field playback val foo icons=(⏻ ¼ ¼ ¼ ½ ½ ¾ ¾ ¾ ¾ ¾)
local -i overallvol=0
while IFS=':[]' read field playback val foo ;do
[ "$playback" ] && [ -z "${playback//*Playback*}" ] && [ "$val" ] && {
vol+=($val)
val=${val%\%}
overallvol+=${val//off/0}
}
done < <(
amixer get Master
)
overallvol=$overallvol/${#vol[@]}
printf "%s%s %s\n" $prefix "${icons[(9+overallvol)/10]}" "${vol[*]}"
}
Some explanations
Regarding useless forks in volume() function
I've posted some ideas there for improving the script: they reduce resource usage while still choosing an icon based on the current volume.
About while :;do wait;done loop
Since the requested sample uses infinite loops in the backgrounded functions, the main script uses an infinite loop as well.
But since the question title asks to wait afterwards for processes to terminate, I have to agree with oguz-ismail's comment.
In fact, the last line would be better written as:
until wait;do :;done
For more information on how the wait command works and on good practice, please have a look at oguz-ismail's answer.

Run many background tasks and get a final failure exit code if ANY failed? [duplicate]

How to wait in a bash script for several subprocesses spawned from that script to finish, and then return exit code !=0 when any of the subprocesses ends with code !=0?
Simple script:
#!/bin/bash
for i in `seq 0 9`; do
doCalculations $i &
done
wait
The above script will wait for all 10 spawned subprocesses, but it will always give exit status 0 (see help wait). How can I modify this script so it will discover exit statuses of spawned subprocesses and return exit code 1 when any of subprocesses ends with code !=0?
Is there any better solution for that than collecting PIDs of the subprocesses, wait for them in order and sum exit statuses?
wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background.
Modify the loop to store the PID of each spawned sub-process into an array, and then loop again waiting on each PID.
# run processes and store pids in array
for i in $n_procs; do
./procs[${i}] &
pids[${i}]=$!
done
# wait for all pids
for pid in ${pids[*]}; do
wait $pid
done
http://jeremy.zawodny.com/blog/archives/010717.html :
#!/bin/bash
FAIL=0
echo "starting"
./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &
for job in `jobs -p`
do
echo $job
wait $job || let "FAIL+=1"
done
echo $FAIL
if [ "$FAIL" == "0" ];
then
echo "YAY!"
else
echo "FAIL! ($FAIL)"
fi
Here is simple example using wait.
Run some processes:
$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &
Then wait for them with wait command:
$ wait $(jobs -p)
Or just wait (without arguments) for all.
This waits until all background jobs have completed.
If the -n option is supplied, waits for the next job to terminate and returns its exit status.
See: help wait and help jobs for syntax.
However, the downside is that this will return only the status of the last ID, so you need to check the status of each subprocess and store it in a variable.
Or make your calculation function create some file on failure (empty or with a failure log), then check whether that file exists, e.g.
$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait $(jobs -p)
$ test -f fail && echo Calculation failed.
How about simply:
#!/bin/bash
pids=""
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
wait $pids
...code continued here ...
Update:
As pointed out by multiple commenters, the above waits for all processes to complete before continuing, but does not exit and fail if one of them fails. It can be made to do so with the following modification suggested by @Bryan, @SamBrightman, and others:
#!/bin/bash
pids=""
RESULT=0
for i in `seq 0 9`; do
doCalculations $i &
pids="$pids $!"
done
for pid in $pids; do
wait $pid || let "RESULT=1"
done
if [ "$RESULT" == "1" ];
then
exit 1
fi
...code continued here ...
If you have GNU Parallel installed you can do:
# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}
GNU Parallel will give you exit code:
0 - All jobs ran without error.
1-253 - Some of the jobs failed. The exit status gives the number of failed jobs
254 - More than 253 jobs failed.
255 - Other error.
Watch the intro videos to learn more: http://pi.dk/1
Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY to one's usage.
waitall() { # PID...
## Wait for children to exit and indicate whether all exited with 0 status.
local errors=0
while :; do
debug "Processes remaining: $*"
for pid in "$@"; do
shift
if kill -0 "$pid" 2>/dev/null; then
debug "$pid is still alive."
set -- "$@" "$pid"
elif wait "$pid"; then
debug "$pid exited with zero exit status."
else
debug "$pid exited with non-zero exit status."
((++errors))
fi
done
(("$#" > 0)) || break
# TODO: how to interrupt this sleep when a child terminates?
sleep ${WAITALL_DELAY:-1}
done
((errors == 0))
}
debug() { echo "DEBUG: $*" >&2; }
pids=""
for t in 3 5 4; do
sleep "$t" &
pids="$pids $!"
done
waitall $pids
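On bash 4.3 and later, wait -n blocks until the next child exits, which removes the need for the sleep and for tuning WAITALL_DELAY. The following is a rough sketch rather than a drop-in replacement: waitall_n is a made-up name, and it takes a job count instead of a PID list because plain wait -n does not report which child finished.

```bash
#!/usr/bin/env bash
# waitall_n COUNT
# Calls `wait -n` COUNT times and succeeds only if every child exited with 0.
# Requires bash >= 4.3. Hypothetical helper, not part of the answer above.
waitall_n() {
    local remaining=$1 errors=0
    while (( remaining > 0 )); do
        wait -n || (( ++errors ))
        (( remaining-- ))
    done
    (( errors == 0 ))
}

count=0
for t in 0.3 0.1 0.2; do
    sleep "$t" &
    (( ++count ))
done
( sleep 0.1; exit 7 ) &    # one deliberately failing child
(( ++count ))

if waitall_n "$count"; then
    echo "all children succeeded"
else
    echo "at least one child failed"
fi
```

The trade-off is that the caller has to know how many children were started; in exchange there is no polling interval at all.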
To parallelize this...
for i in $(whatever_list) ; do
do_something $i
done
Translate it to this...
for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
(
export -f do_something ## export functions (if needed)
export PATH ## export any variables that are required
xargs -I{} --max-procs 0 bash -c ' ## process in batches...
{
echo "processing {}" ## optional
do_something {}
}'
)
If an error occurs in one process, it won't interrupt the other processes, but it will result in a non-zero exit code from the sequence as a whole.
Exporting functions and variables may or may not be necessary, in any particular case.
You can set --max-procs based on how much parallelism you want (0 means "all at once").
GNU Parallel offers some additional features when used in place of xargs -- but it isn't always installed by default.
The for loop isn't strictly necessary in this example since echo $i is basically just regenerating the output of $(whatever_list). I just think the use of the for keyword makes it a little easier to see what is going on.
Bash string handling can be confusing -- I have found that using single quotes works best for wrapping non-trivial scripts.
You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.
Here's a simplified working example...
for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
{
echo sleep {}
sleep 2s
}'
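To act on the overall result, check the exit status of xargs itself. A small sketch follows; GNU xargs documents status 123 when any invocation exits with a status from 1 to 125, but other implementations may use a different non-zero value, so the check below only distinguishes zero from non-zero. The short options -n and -P are used here in place of the long GNU spellings.

```bash
#!/usr/bin/env bash
# Run three tiny jobs, two at a time; the second one fails on purpose.
# `sh -c 'exit "$1"' _ N` is just a stand-in for real work.
if printf '%s\n' 0 1 0 | xargs -n 1 -P 2 sh -c 'exit "$1"' _; then
    result=ok
else
    result=failed
fi
echo "$result"
```

All three jobs still run to completion; the failure only shows up in the final status.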
This is something that I use:
#wait for jobs
for job in `jobs -p`; do wait ${job}; done
This is an expansion on the most-upvoted answer, by @Luca Tettamanti, to make a fully-runnable example.
That answer left me wondering:
What type of variable is n_procs, and what does it contain? What type of variable is procs, and what does it contain? Can someone please update this answer to make it runnable by adding definitions for those variables? I don't understand how.
...and also:
How do you get the return code from the subprocess when it has completed (which is the whole crux of this question)?
Anyway, I figured it out, so here is a fully-runnable example.
Notes:
$! is how to obtain the PID (Process ID) of the last-executed sub-process.
Running any command with the & after it, like cmd &, for example, causes it to run in the background as a parallel subprocess with the main process.
myarray=() is how to create an array in bash.
To learn a tiny bit more about the wait built-in command, see help wait. See also, and especially, the official Bash user manual on Job Control built-ins, such as wait and jobs, here: https://www.gnu.org/software/bash/manual/html_node/Job-Control-Builtins.html#index-wait.
Full, runnable program: wait for all processes to end
multi_process_program.sh (from my eRCaGuy_hello_world repo):
#!/usr/bin/env bash
# This is a special sleep function which returns the number of seconds slept as
# the "error code" or "return code" so that we can easily see that we are in
# fact actually obtaining the return code of each process as it finishes.
my_sleep() {
seconds_to_sleep="$1"
sleep "$seconds_to_sleep"
return "$seconds_to_sleep"
}
# Create an array of whatever commands you want to run as subprocesses
procs=() # bash array
procs+=("my_sleep 5")
procs+=("my_sleep 2")
procs+=("my_sleep 3")
procs+=("my_sleep 4")
num_procs=${#procs[@]} # number of processes
echo "num_procs = $num_procs"
# run commands as subprocesses and store pids in an array
pids=() # bash array
for (( i=0; i<"$num_procs"; i++ )); do
echo "cmd = ${procs[$i]}"
${procs[$i]} & # run the cmd as a subprocess
# store pid of last subprocess started; see:
# https://unix.stackexchange.com/a/30371/114401
pids+=("$!")
echo " pid = ${pids[$i]}"
done
# OPTION 1 (comment this option out if using Option 2 below): wait for all pids
for pid in "${pids[@]}"; do
wait "$pid"
return_code="$?"
echo "PID = $pid; return_code = $return_code"
done
echo "All $num_procs processes have ended."
Change the file above to be executable by running chmod +x multi_process_program.sh, then run it like this:
time ./multi_process_program.sh
Sample output. See how the output of the time command in the call shows it took 5.084sec to run. We were also able to successfully retrieve the return code from each subprocess.
eRCaGuy_hello_world/bash$ time ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 21694
cmd = my_sleep 2
pid = 21695
cmd = my_sleep 3
pid = 21697
cmd = my_sleep 4
pid = 21699
PID = 21694; return_code = 5
PID = 21695; return_code = 2
PID = 21697; return_code = 3
PID = 21699; return_code = 4
All 4 processes have ended.
PID 21694 is done; return_code = 5; 3 PIDs remaining.
PID 21695 is done; return_code = 2; 2 PIDs remaining.
PID 21697 is done; return_code = 3; 1 PIDs remaining.
PID 21699 is done; return_code = 4; 0 PIDs remaining.
real 0m5.084s
user 0m0.025s
sys 0m0.061s
Going further: determine live when each individual process ends
If you'd like to do some action as each process finishes, and you don't know when they will finish, you can poll in an infinite while loop to see when each process terminates, then do whatever action you want.
Simply comment out the "OPTION 1" block of code above, and replace it with this "OPTION 2" block instead:
# OR OPTION 2 (comment out Option 1 above if using Option 2): poll to detect
# when each process terminates, and print out when each process finishes!
while true; do
for i in "${!pids[@]}"; do
pid="${pids[$i]}"
# echo "pid = $pid" # debugging
# See if PID is still running; see my answer here:
# https://stackoverflow.com/a/71134379/4561887
ps --pid "$pid" > /dev/null
if [ "$?" -ne 0 ]; then
# PID doesn't exist anymore, meaning it terminated
# 1st, read its return code
wait "$pid"
return_code="$?"
# 2nd, remove this PID from the `pids` array by `unset`ting the
# element at this index; NB: this deletes the element but does NOT
# renumber the remaining ones, so the array becomes sparse: the
# index disappears from the `"${!pids[@]}"` list of indices and the
# array count (`"${#pids[@]}"`) drops by one, while the other
# elements keep their original indices.
unset "pids[$i]"
num_pids="${#pids[@]}"
echo "PID $pid is done; return_code = $return_code;" \
"$num_pids PIDs remaining."
fi
done
# exit the while loop if the `pids` array is empty
if [ "${#pids[@]}" -eq 0 ]; then
break
fi
# Do some small sleep here to keep your polling loop from sucking up
# 100% of one of your CPUs unnecessarily. Sleeping allows other processes
# to run during this time.
sleep 0.1
done
Sample run and output of the full program with Option 1 commented out and Option 2 in-use:
eRCaGuy_hello_world/bash$ ./multi_process_program.sh
num_procs = 4
cmd = my_sleep 5
pid = 22275
cmd = my_sleep 2
pid = 22276
cmd = my_sleep 3
pid = 22277
cmd = my_sleep 4
pid = 22280
PID 22276 is done; return_code = 2; 3 PIDs remaining.
PID 22277 is done; return_code = 3; 2 PIDs remaining.
PID 22280 is done; return_code = 4; 1 PIDs remaining.
PID 22275 is done; return_code = 5; 0 PIDs remaining.
Each of those PID XXXXX is done lines prints out live right after that process has terminated! Notice that even though the process for sleep 5 (PID 22275 in this case) was run first, it finished last, and we successfully detected each process right after it terminated. We also successfully detected each return code, just like in Option 1.
Other References:
*****+ [VERY HELPFUL] Get exit code of a background process - this answer taught me the key principle that (emphasis added):
wait <n> waits until the process with PID is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
In other words, it helped me know that even after the process is complete, you can still call wait on it to get its return code!
How to check if a process id (PID) exists
my answer
Remove an element from a Bash array - note that unsetting an element of a bash array does not renumber the remaining elements; the array just becomes sparse. See my comments in the code above for what that means.
How to use the command-line executable true to make an infinite while loop in bash: https://www.cyberciti.biz/faq/bash-infinite-loop/
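The behavior of unset on an indexed array can be checked directly. In this small sketch (the array contents are arbitrary), the element is removed, the remaining indices are not renumbered, and the array becomes sparse:

```bash
#!/usr/bin/env bash
pids=(111 222 333)
unset 'pids[1]'                 # remove the middle element

count=${#pids[@]}               # number of elements still set
indices="${!pids[*]}"           # indices that are still set
values="${pids[*]}"

echo "count=$count indices=$indices values=$values"
# prints: count=2 indices=0 2 values=111 333
```

This is why the polling loop iterates over the indices rather than counting from 0 to the array length.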
I see lots of good examples listed on here, wanted to throw mine in as well.
#! /bin/bash
items="1 2 3 4 5 6"
pids=""
for item in $items; do
sleep $item &
pids+="$! "
done
for pid in $pids; do
wait $pid
status=$?
if [ $status -eq 0 ]; then
echo "SUCCESS - Job $pid exited with a status of $status"
else
echo "FAILED - Job $pid exited with a status of $status"
fi
done
I use something very similar to start/stop servers/services in parallel and check each exit status. Works great for me. Hope this helps someone out!
I don't believe it's possible with Bash's builtin functionality.
You can get notification when a child exits:
#!/bin/sh
set -o monitor # enable script job control
trap 'echo "child died"' CHLD
However there's no apparent way to get the child's exit status in the signal handler.
Getting that child status is usually the job of the wait family of functions in the lower level POSIX APIs. Unfortunately Bash's support for that is limited - you can wait for one specific child process (and get its exit status) or you can wait for all of them, and always get a 0 result.
What it appears impossible to do is the equivalent of waitpid(-1), which blocks until any child process returns.
The following code will wait for completion of all calculations and return exit status 1 if any of doCalculations fails.
#!/bin/bash
for i in $(seq 0 9); do
(doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qv 0 && exit 1
Here's my version that works for multiple pids, logs warnings if execution takes too long, and stops the subprocesses if execution takes longer than a given value.
[EDIT] I have uploaded my newer implementation of WaitForTaskCompletion, called ExecTasks at https://github.com/deajan/ofunctions.
There's also a compat layer for WaitForTaskCompletion
[/EDIT]
function WaitForTaskCompletion {
local pids="${1}" # pids to wait for, separated by semi-colon
local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
local caller_name="${4}" # Who called this function
local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors
Logger "${FUNCNAME[0]} called by [$caller_name]."
local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once
local log_ttime=0 # local time instance for comparison
local seconds_begin=$SECONDS # Seconds since the beginning of the script
local exec_time=0 # Seconds since the beginning of this function
local retval=0 # return value of monitored pid process
local errorcount=0 # Number of pids that finished with errors
local pidCount # number of given pids
IFS=';' read -a pidsArray <<< "$pids"
pidCount=${#pidsArray[@]}
while [ ${#pidsArray[@]} -gt 0 ]; do
newPidsArray=()
for pid in "${pidsArray[@]}"; do
if kill -0 $pid > /dev/null 2>&1; then
newPidsArray+=($pid)
else
wait $pid
result=$?
if [ $result -ne 0 ]; then
errorcount=$((errorcount+1))
Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
fi
fi
done
## Log a standby message every hour
exec_time=$(($SECONDS - $seconds_begin))
if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
if [ $log_ttime -ne $exec_time ]; then
log_ttime=$exec_time
Logger "Current tasks still running with pids [${pidsArray[@]}]."
fi
fi
if [ $exec_time -gt $soft_max_time ]; then
if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
soft_alert=1
SendAlert
fi
if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
kill -SIGTERM $pid
if [ $? == 0 ]; then
Logger "Task stopped successfully"
else
errorcount=$((errorcount+1))
fi
fi
fi
pidsArray=("${newPidsArray[@]}")
sleep 1
done
Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
Logger "Stopping execution."
exit 1337
else
return $errorcount
fi
}
# Just a plain stupid logging function to be replaced by yours
function Logger {
local value="${1}"
echo $value
}
Example: wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds. Don't exit the program on failures.
function something {
sleep 10 &
pids="$!"
sleep 12 &
pids="$pids;$!"
sleep 9 &
pids="$pids;$!"
WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}
# Launch the function
something
If you have bash 4.2 or later available, the following might be useful to you. It uses associative arrays to store task names and their "code" as well as task names and their pids. I have also built in a simple rate-limiting method, which might come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.
The script launches all tasks in the first loop and consumes the results in the second one.
This is a bit overkill for simple cases but it allows for pretty neat stuff. For example one can store error messages for each task in another associative array and print them after everything has settled down.
#! /bin/bash
main () {
local -A pids=()
local -A tasks=([task1]="echo 1"
[task2]="echo 2"
[task3]="echo 3"
[task4]="false"
[task5]="echo 5"
[task6]="false")
local max_concurrent_tasks=2
for key in "${!tasks[@]}"; do
while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
sleep 1 # gnu sleep allows floating point here...
done
${tasks[$key]} &
pids+=(["$key"]="$!")
done
errors=0
for key in "${!tasks[@]}"; do
pid=${pids[$key]}
local cur_ret=0
if [ -z "$pid" ]; then
echo "No Job ID known for the $key process" # should never happen
cur_ret=1
else
wait $pid
cur_ret=$?
fi
if [ "$cur_ret" -ne 0 ]; then
errors=$(($errors + 1))
echo "$key (${tasks[$key]}) failed."
fi
done
return $errors
}
main
I've had a go at this and combined all the best parts from the other examples here. This script will execute the checkpids function when any background process exits, and output the exit status without resorting to polling.
#!/bin/bash
set -o monitor
sleep 2 &
sleep 4 && exit 1 &
sleep 6 &
pids=`jobs -p`
checkpids() {
for pid in $pids; do
if kill -0 $pid 2>/dev/null; then
echo $pid is still alive.
elif wait $pid; then
echo $pid exited with zero exit status.
else
echo $pid exited with non-zero exit status.
fi
done
echo
}
trap checkpids CHLD
wait
#!/bin/bash
set -m
for i in `seq 0 9`; do
doCalculations $i &
done
while fg; do true; done
set -m allows you to use fg & bg in a script
fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds
while fg will stop looping when any fg exits with a non-zero exit status
unfortunately this won't handle the case when a process in the background exits with a non-zero exit status. (the loop won't terminate immediately. it will wait for the previous processes to complete.)
Wait for all jobs and return the exit code of the last failing job. Unlike solutions above, this does not require pid saving, or modifying inner loops of scripts. Just bg away, and wait.
function wait_ex {
# this waits for all jobs and returns the exit code of the last failing job
ecode=0
while true; do
[ -z "$(jobs)" ] && break
wait -n
err="$?"
[ "$err" != "0" ] && ecode="$err"
done
return $ecode
}
EDIT: Fixed the bug where this could be fooled by a script that ran a command that didn't exist.
Just store the results out of the shell, e.g. in a file.
#!/bin/bash
tmp=/tmp/results
: > $tmp #clean the file
for i in `seq 0 9`; do
(doCalculations $i; echo $i:$?>>$tmp)&
done #iterate
wait #wait until all ready
sort $tmp | grep -v ':0' #... handle as required
I've just been modifying a script to background and parallelise a process.
I did some experimenting (on Solaris with both bash and ksh) and discovered that 'wait' outputs the exit status if it's not zero, or a list of jobs that returned a non-zero exit status when no PID argument is provided. E.g.
Bash:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]- Exit 2 sleep 20 && exit 2
[2]+ Exit 1 sleep 10 && exit 1
Ksh:
$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+ Done(2) sleep 20 && exit 2
[2]+ Done(1) sleep 10 && exit 1
This output is written to stderr, so a simple solution to the OPs example could be:
#!/bin/bash
trap "rm -f /tmp/x.$$" EXIT
for i in `seq 0 9`; do
doCalculations $i &
done
wait 2> /tmp/x.$$
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
exit 1
fi
While this:
wait 2> >(wc -l)
will also return a count but without the tmp file. This might also be used this way, for example:
wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)
But this isn't very much more useful than the tmp file IMO. I couldn't find a useful way to avoid the tmp file whilst also avoiding running the "wait" in a subshell, which won't work at all.
I needed this, but the target process wasn't a child of current shell, in which case wait $PID doesn't work. I did find the following alternative instead:
while [ -e /proc/$PID ]; do sleep 0.1 ; done
That relies on the presence of procfs, which may not be available (Mac doesn't provide it for example). So for portability, you could use this instead:
while ps -p $PID >/dev/null ; do sleep 0.1 ; done
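If GNU coreutils is available, tail --pid can do the polling for you, and wrapping it in timeout puts an upper bound on the wait. This is only a minimal sketch, assuming GNU tail and GNU timeout; wait_for_foreign_pid is a made-up helper name, not something from the answers above:

```bash
#!/usr/bin/env bash
# wait_for_foreign_pid PID LIMIT   (hypothetical helper name)
# Blocks until PID no longer exists or LIMIT (a timeout duration) has passed.
# Returns 0 when the process is gone, 124 (timeout's own status) otherwise.
wait_for_foreign_pid() {
  local pid=$1 limit=$2
  # tail follows /dev/null, which never produces data, and exits by itself
  # once the process given with --pid has disappeared.
  timeout "$limit" tail --pid="$pid" -f /dev/null
}

# Usage (12345 is a placeholder PID):
#   wait_for_foreign_pid 12345 60 || echo "still running after 60 seconds"
```

tail checks the PID roughly once per second by default, so the exit may be noticed slightly late.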
There are already a lot of answers here, but I am surprised no one seems to have suggested using arrays... So here's what I did - this might be useful to some in the future.
n=10 # run 10 jobs
c=0
PIDS=()
while true
my_function_or_command &
PID=$!
echo "Launched job as PID=$PID"
PIDS+=($PID)
(( c+=1 ))
# required to prevent any exit due to error
# caused by additional commands run which you
# may add when modifying this example
true
do
if (( c < n ))
then
continue
else
break
fi
done
# collect launched jobs
for pid in "${PIDS[@]}"
do
wait $pid || echo "failed job PID=$pid"
done
This works, and should be just as good as, if not better than, @HoverHell's answer!
#!/usr/bin/env bash
set -m # allow for job control
EXIT_CODE=0; # exit code of overall script
function foo() {
echo "CHLD exit code is $1"
echo "CHLD pid is $2"
echo $(jobs -l)
for job in `jobs -p`; do
echo "PID => ${job}"
wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
done
}
trap 'foo $? $$' CHLD
DIRN=$(dirname "$0");
commands=(
"{ echo "foo" && exit 4; }"
"{ echo "bar" && exit 3; }"
"{ echo "baz" && exit 5; }"
)
clen=`expr "${#commands[@]}" - 1` # get length of commands - 1
for i in `seq 0 "$clen"`; do
(echo "${commands[$i]}" | bash) & # run the command via bash in subshell
echo "$i ith command has been issued as a background job"
done
# wait for all to finish
wait;
echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"
# end
and of course, I have immortalized this script, in an NPM project which allows you to run bash commands in parallel, useful for testing:
https://github.com/ORESoftware/generic-subshell
Exactly for this purpose I wrote a bash function called :for.
Note: :for not only preserves and returns the exit code of the failing function, but also terminates all parallel running instance. Which might not be needed in this case.
#!/usr/bin/env bash
# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
local pids=("$@")
[ ${#pids} -eq 0 ] && return $?
trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM
for pid in "${pids[@]}"; do
wait "${pid}" || return $?
done
trap - INT RETURN TERM
}
# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
local f="${1}" && shift
local i=0
local pids=()
for arg in "$@"; do
( ${f} "${arg}" ) &
pids+=("$!")
if [ ! -z ${FOR_PARALLEL+x} ]; then
(( i=(i+1)%${FOR_PARALLEL} ))
if (( i==0 )) ;then
:wait "${pids[@]}" || return $?
pids=()
fi
fi
done && [ ${#pids} -eq 0 ] || :wait "${pids[@]}" || return $?
}
usage
for.sh:
#!/usr/bin/env bash
set -e
# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)
msg="You should see this three times"
:(){
i="${1}" && shift
echo "${msg}"
sleep 1
if [ "$i" == "1" ]; then sleep 1
elif [ "$i" == "2" ]; then false
elif [ "$i" == "3" ]; then
sleep 3
echo "You should never see this"
fi
} && :for : 1 2 3 || exit $?
echo "You should never see this"
$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times
1
set -e
fail () {
touch .failure
}
expect () {
wait
if [ -f .failure ]; then
rm -f .failure
exit 1
fi
}
sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect
The set -e at top makes your script stop on failure.
expect will return 1 if any subjob failed.
There can be a case where the process is complete before waiting for the process. If we trigger wait for a process that is already finished, it will trigger an error like pid is not a child of this shell. To avoid such cases, the following function can be used to find whether the process is complete or not:
isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running"
sleep 5
done
echo "Process $PID has finished"
}
Starting with Bash 5.1, there is a nice new way of waiting for and handling the results of multiple background jobs thanks to the introduction of wait -p:
#!/usr/bin/env bash
# Spawn background jobs
for ((i=0; i < 10; i++)); do
secs=$((RANDOM % 10)); code=$((RANDOM % 256))
(sleep ${secs}; exit ${code}) &
echo "Started background job (pid: $!, sleep: ${secs}, code: ${code})"
done
# Wait for background jobs, print individual results, determine overall result
result=0
while true; do
wait -n -p pid; code=$?
[[ -z "${pid}" ]] && break
echo "Background job ${pid} finished with code ${code}"
(( ${code} != 0 )) && result=1
done
# Return overall result
exit ${result}
I used this recently (thanks to Alnitak):
#!/bin/bash
# activate child monitoring
set -o monitor
# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!
# count, and kill when all done
c=0
function kill_on_count() {
# you could kill on whatever criterion you wish for
# I just counted to simulate bash's wait with no args
[ $c -eq 9 ] && kill $pid
c=$((c+1))
echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD
function save_status() {
local i=$1;
local rc=$2;
# do whatever, and here you know which one stopped
# but remember, you're called from a subshell
# so vars have their values at fork time
}
# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
(doCalculations $i; save_status $i $?) &
done
# wait for locking subprocess to be killed
wait $pid
echo
From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non zero rc, just kill the lock from save_status.
Trapping CHLD signal may not work because you can lose some signals if they arrived simultaneously.
#!/bin/bash
trap 'rm -f $tmpfile' EXIT
tmpfile=$(mktemp)
doCalculations() {
echo start job $i...
sleep $((RANDOM % 5))
echo ...end job $i
exit $((RANDOM % 10))
}
number_of_jobs=10
for i in $( seq 1 $number_of_jobs )
do
( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done
wait
i=0
while read res; do
echo "$res"
let i++
done < "$tmpfile"
echo $i jobs done !!!
One way to wait for several subprocesses and exit as soon as any of them exits with a non-zero status code is to use 'wait -n':
#!/bin/bash
wait_for_pids()
{
for (( i = 1; i <= $#; i++ )) do
wait -n $@
status=$?
echo "received status: "$status
if [ $status -ne 0 ] && [ $status -ne 127 ]; then
exit 1
fi
done
}
sleep_for_10()
{
sleep 10
exit 10
}
sleep_for_20()
{
sleep 20
}
sleep_for_10 &
pid1=$!
sleep_for_20 &
pid2=$!
wait_for_pids $pid2 $pid1
Status code '127' means the process does not exist, which means the child may already have exited.
I almost fell into the trap of using jobs -p to collect PIDs, which does not work if the child has already exited, as shown in the script below. The solution I picked was simply calling wait -n N times, where N is the number of children I have, which I happen to know deterministically.
#!/usr/bin/env bash
sleeper() {
echo "Sleeper $1"
sleep $2
echo "Exiting $1"
return $3
}
start_sleepers() {
sleeper 1 1 0 &
sleeper 2 2 $1 &
sleeper 3 5 0 &
sleeper 4 6 0 &
sleep 4
}
echo "Using jobs"
start_sleepers 1
pids=( $(jobs -p) )
echo "PIDS: ${pids[*]}"
for pid in "${pids[@]}"; do
wait "$pid"
echo "Exit code $?"
done
echo "Clearing other children"
wait -n; echo "Exit code $?"
wait -n; echo "Exit code $?"
echo "Waiting for N processes"
start_sleepers 2
for ignored in $(seq 1 4); do
wait -n
echo "Exit code $?"
done
Output:
Using jobs
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
PIDS: 56496 56497
Exiting 3
Exit code 0
Exiting 4
Exit code 0
Clearing other children
Exit code 0
Exit code 1
Waiting for N processes
Sleeper 1
Sleeper 2
Sleeper 3
Sleeper 4
Exiting 1
Exiting 2
Exit code 0
Exit code 2
Exiting 3
Exit code 0
Exiting 4
Exit code 0
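The same counting idea can also cap how many children run at once: call wait -n whenever the number of started-but-not-yet-collected children reaches the limit. The following is only a sketch, assuming bash 4.3 or later (for wait -n) and a shell with no other background jobs; run_limited and work are made-up names:

```bash
#!/usr/bin/env bash
# run_limited MAX CMD ARG...   (hypothetical helper name)
# Runs CMD once per ARG, keeping at most MAX children alive at a time.
# Returns 0 only if every child exited with status 0.
run_limited() {
  local max=$1 cmd=$2
  shift 2
  local running=0 failures=0 arg
  for arg in "$@"; do
    if (( running >= max )); then
      # Block until any one child finishes and collect its status.
      wait -n || failures=$(( failures + 1 ))
      running=$(( running - 1 ))
    fi
    "$cmd" "$arg" &
    running=$(( running + 1 ))
  done
  # Collect the children that are still outstanding.
  while (( running > 0 )); do
    wait -n || failures=$(( failures + 1 ))
    running=$(( running - 1 ))
  done
  (( failures == 0 ))
}

# Usage with a stand-in worker function:
#   work() { sleep "$1"; }
#   run_limited 2 work 1 2 3 || echo "at least one job failed"
```

Because the children are counted rather than looked up with jobs -p, the loop never depends on the job table still listing a child that has already exited.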

Bash concurrent jobs gets stuck

I've implemented a way to have concurrent jobs in bash, as seen here.
I'm looping through a file with around 13000 lines. I'm just testing and printing each line, as such:
#!/bin/bash
max_bg_procs(){
if [[ $# -eq 0 ]] ; then
echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
echo " bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
return
fi
local max_number=$((0 + ${1:-0}))
while true; do
local current_number=$(jobs -pr | wc -l)
if [[ $current_number -lt $max_number ]]; then
echo "success in if"
break
fi
echo "has to wait"
sleep 4
done
}
download_data(){
echo "link #" $2 "["$1"]"
}
mapfile -t myArray < $1
i=1
for url in "${myArray[@]}"
do
max_bg_procs 6
download_data $url $i &
((i++))
done
echo "finito!"
I've also tried other solutions such as this and this, but my issue is persistent:
At a "random" given step, usually between the 2000th and the 5000th iteration, it simply gets stuck. I've put those various echo statements in the middle of the code to see where it would get stuck, but the last thing it prints is the $url $i.
I've done the simple test to remove any parallelism and just loop the file contents: all went fine and it looped till the end.
So it makes me think I'm missing some limitation on the parallelism, and I wonder if anyone could help me out figuring it out.
Many thanks!
Here, we have up to 6 parallel bash processes calling download_data, each of which is passed up to 16 URLs per invocation. Adjust per your own tuning.
Note that this expects both bash (for exported function support) and GNU xargs.
#!/usr/bin/env bash
# ^^^^- not /bin/sh
download_data() {
echo "link #$2 [$1]" # TODO: replace this with a job that actually takes some time
}
export -f download_data
<input.txt xargs -d $'\n' -P 6 -n 16 -- bash -c 'for arg; do download_data "$arg"; done' _
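With this approach, failures can be detected from the exit status of xargs itself: GNU xargs exits with 123 if any invocation of the command exited with a status between 1 and 125. A small sketch, assuming GNU xargs; check_item is a made-up stand-in for the real job and the input is generated inline instead of being read from input.txt:

```bash
#!/usr/bin/env bash
# check_item is a hypothetical job that fails for the input line "bad".
check_item() {
  [ "$1" != "bad" ]
}
export -f check_item

xargs_status=0
printf '%s\n' ok ok bad ok |
  xargs -d $'\n' -P 2 -n 1 -- bash -c 'check_item "$1"' _ || xargs_status=$?

# 0 means every invocation succeeded; 123 means at least one failed.
echo "xargs exit status: $xargs_status"
```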
Using GNU Parallel it looks like this
cat input.txt | parallel echo link '\#{#} [{}]'
{#} = the job number
{} = the argument
It will spawn one process per CPU. If you instead want 6 in parallel use -j:
cat input.txt | parallel -j6 echo link '\#{#} [{}]'
If you prefer running a function:
download_data(){
echo "link #" $2 "["$1"]"
}
export -f download_data
cat input.txt | parallel -j6 download_data {} {#}

Bash: wait with timeout

In a Bash script, I would like to do something like:
app1 &
pidApp1=$!
app2 &
pidApp2=$!
timeout 60 wait $pidApp1 $pidApp2
kill -9 $pidApp1 $pidApp2
I.e., launch two applications in the background, and give them 60 seconds to complete their work. Then, if they don't finish within that interval, kill them.
Unfortunately, the above does not work, since timeout is an executable, while wait is a shell command. I tried changing it to:
timeout 60 bash -c wait $pidApp1 $pidApp2
But this still does not work, since wait can only be called on a PID launched within the same shell.
Any ideas?
Both your example and the accepted answer are overly complicated. Why not just use timeout, since that is exactly its use case? The timeout command even has a built-in option (-k) that sends SIGKILL if the command is still running a given time after the initial signal (SIGTERM by default) was sent (see man timeout).
If the script doesn't need to wait and resume control flow afterwards, it's simply a matter of
timeout -k 60s 60s app1 &
timeout -k 60s 60s app2 &
# [...]
If it does, however, that's just as easy by saving the timeout PIDs instead:
pids=()
timeout -k 60s 60s app1 &
pids+=($!)
timeout -k 60s 60s app2 &
pids+=($!)
wait "${pids[@]}"
# [...]
E.g.
$ cat t.sh
#!/bin/bash
echo "$(date +%H:%M:%S): start"
pids=()
timeout 10 bash -c 'sleep 5; echo "$(date +%H:%M:%S): job 1 terminated successfully"' &
pids+=($!)
timeout 2 bash -c 'sleep 5; echo "$(date +%H:%M:%S): job 2 terminated successfully"' &
pids+=($!)
wait "${pids[@]}"
echo "$(date +%H:%M:%S): done waiting. both jobs terminated on their own or via timeout; resuming script"
.
$ ./t.sh
08:59:42: start
08:59:47: job 1 terminated successfully
08:59:47: done waiting. both jobs terminated on their own or via timeout; resuming script
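To tell a job that was cut off from one that finished by itself, wait for each PID separately and look at the status: GNU timeout exits with 124 when the time limit was hit (unless --preserve-status is given), and otherwise passes through the command's own status. A short sketch along those lines, using sleep as a stand-in for the real applications:

```bash
#!/usr/bin/env bash
pids=()
statuses=()

timeout 1 sleep 5 &     # stand-in job that gets cut off after 1 second
pids+=($!)
timeout 5 sleep 0.2 &   # stand-in job that finishes well within its limit
pids+=($!)

# Wait for each PID on its own so that every status can be inspected.
for pid in "${pids[@]}"; do
  rc=0
  wait "$pid" || rc=$?
  statuses+=("$rc")
done

echo "statuses: ${statuses[*]}"   # 124 marks the job that timed out
```

If -k is used and timeout ends up sending SIGKILL, the status is 137 (128+9) rather than 124.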
Write the PIDs to files and start the apps like this:
pidFile=...
( app ; rm $pidFile ; ) &
pid=$!
echo $pid > $pidFile
( sleep 60 ; if [[ -e $pidFile ]]; then killChildrenOf $pid ; fi ; ) &
killerPid=$!
wait $pid
kill $killerPid
That would create another process that sleeps for the timeout and kills the process if it hasn't completed so far.
If the process completes faster, the PID file is deleted and the killer process is terminated.
killChildrenOf is a script that fetches all processes and kills all children of a certain PID. See the answers of this question for different ways to implement this functionality: Best way to kill all child processes
If you want to step outside of BASH, you could write PIDs and timeouts into a directory and watch that directory. Every minute or so, read the entries and check which processes are still around and whether they have timed out.
EDIT If you want to know whether the process has died successfully, you can use kill -0 $pid
EDIT2 Or you can try process groups. kevinarpe said: To get PGID for a PID(146322):
ps -fjww -p 146322 | tail -n 1 | awk '{ print $4 }'
In my case: 145974. Then PGID can be used with a special option of kill to terminate all processes in a group: kill -- -145974
Here's a simplified version of Aaron Digulla's answer, which uses the kill -0 trick that Aaron Digulla leaves in a comment:
app &
pidApp=$!
( sleep 60 ; echo 'timeout'; kill $pidApp ) &
killerPid=$!
wait $pidApp
kill -0 $killerPid && kill $killerPid
In my case, I wanted to be both set -e -x safe and return the status code, so I used:
set -e -x
app &
pidApp=$!
( sleep 45 ; echo 'timeout'; kill $pidApp ) &
killerPid=$!
wait $pidApp
status=$?
(kill -0 $killerPid && kill $killerPid) || true
exit $status
An exit status of 143 indicates SIGTERM, almost certainly from our timeout.
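The 143 comes from 128 + 15, where 15 is the signal number of SIGTERM. Bash's kill -l accepts such an exit status and prints the signal name, which makes the status easier to report. A sketch, with describe_status as a made-up helper name; note that a process can also choose to exit with a value above 128 by itself, so this is a heuristic:

```bash
#!/usr/bin/env bash
# describe_status STATUS   (hypothetical helper name)
# Prints a human-readable description of a status returned by wait.
describe_status() {
  local status=$1 name
  # kill -l maps 128+N back to the name of signal N and fails for
  # values that do not correspond to a signal.
  if (( status > 128 )) && name=$(kill -l "$status" 2>/dev/null); then
    echo "terminated by SIG$name"
  else
    echo "exited with status $status"
  fi
}

sleep 30 &              # stand-in for app
pid=$!
kill -TERM "$pid"
rc=0
wait "$pid" || rc=$?
describe_status "$rc"
```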
I wrote a bash function that waits until the PIDs have finished or until a timeout elapses. It returns non-zero if the timeout was exceeded and prints all the PIDs that have not finished.
function wait_timeout {
local limit=${@:1:1}
local pids=${@:2}
local count=0
while true
do
local have_to_wait=false
for pid in ${pids}; do
if kill -0 ${pid} &>/dev/null; then
have_to_wait=true
else
pids=`echo ${pids} | sed -e "s/${pid}//g"`
fi
done
if ! ${have_to_wait}; then
return 0 # every PID has finished
elif (( count < limit )); then
count=$(( count + 1 ))
sleep 1
else
echo ${pids} # PIDs still running when the timeout expired
return 1
fi
done
return 0
}
To use this is just wait_timeout $timeout $PID1 $PID2 ...
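A variant of the same polling idea keeps the PIDs in an array instead of a string, so dropping a finished PID cannot damage another PID that happens to contain the same digits (removing 12 from "12 123" with sed leaves "3"). This is only a sketch; wait_all_or_timeout is a made-up name, and the limit is counted in whole seconds via bash's SECONDS variable:

```bash
#!/usr/bin/env bash
# wait_all_or_timeout LIMIT PID...   (hypothetical helper name)
# Returns 0 once none of the PIDs exist any more. If LIMIT seconds pass
# first, prints the PIDs that are still alive and returns 1.
wait_all_or_timeout() {
  local limit=$1
  shift
  local -a remaining=("$@") alive
  local pid deadline=$(( SECONDS + limit ))
  while (( ${#remaining[@]} > 0 )); do
    alive=()
    for pid in "${remaining[@]}"; do
      # kill -0 sends no signal; it only tests whether the PID exists.
      kill -0 "$pid" 2>/dev/null && alive+=("$pid")
    done
    remaining=("${alive[@]}")
    (( ${#remaining[@]} == 0 )) && break
    if (( SECONDS >= deadline )); then
      echo "${remaining[*]}"
      return 1
    fi
    sleep 0.2
  done
  return 0
}

# Usage:
#   app1 & p1=$!
#   app2 & p2=$!
#   left=$(wait_all_or_timeout 60 "$p1" "$p2") || kill $left
```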
To put in my 2c, we can boil down Teixeira's solution to:
try_wait() {
# Usage: [PID]...
for ((i = 0; i < $#; i += 1)); do
kill -0 $@ && sleep 0.001 || return 0
done
return 1 # timeout or no PIDs
} &>/dev/null
Bash's sleep accepts fractional seconds, and 0.001s = 1 ms = 1 KHz = plenty of time. However, UNIX has no loopholes when it comes to files and processes. try_wait accomplishes very little.
$ cat &
[1] 16574
$ try_wait %1 && echo 'exited' || echo 'timeout'
timeout
$ kill %1
$ try_wait %1 && echo 'exited' || echo 'timeout'
exited
We have to answer some hard questions to get further.
Why has wait no timeout parameter? Maybe because the timeout, kill -0, wait and wait -n commands can tell the machine more precisely what we want.
Why is wait builtin to Bash in the first place, so that timeout wait PID is not working? Maybe only so Bash can implement proper signal handling.
Consider:
$ timeout 30s cat &
[1] 6680
$ jobs
[1]+ Running timeout 30s cat &
$ kill -0 %1 && echo 'running'
running
$ # now meditate a bit and then...
$ kill -0 %1 && echo 'running' || echo 'vanished'
bash: kill: (NNN) - No such process
vanished
Whether in the material world or in machines, as we require some
ground on which to run, we require some ground on which to wait too.
When kill fails you hardly know why. Unless you wrote
the process, or its manual names the circumstances, there is no way
to determine a reasonable timeout value.
When you have written the process, you can implement a proper TERM handler or even respond to "Auf Wiedersehen!" sent to it through a named pipe. Then you have some ground even for a spell like try_wait :-)
You could use the timeout of the 'read' internal command.
The following will kill unterminated jobs and display the names of the completed jobs after at most 60 seconds:
( (job1; echo -n "job1 ")& (job2; echo -n "job2 ")&) | (read -t 60 -a jobarr; echo ${jobarr[*]} ${#jobarr[*]} )
It works by making a sub shell containing all the background jobs. The output of this sub shell is read into a bash array variable, which can be used as desired (in this example by printing the array + element count).
Be sure to reference the ${jobarr} in the same sub shell as the read command (hence the parenthesis), otherwise ${jobarr} will be empty.
All sub shells will automatically be muted (not killed) after the read command terminates. You have to kill them yourself.
app1 &
app2 &
sleep 60 &
wait -n
Yet another bash timeout script
Running many subprocesses with an overall timeout. Using recent bash features, I wrote this:
#!/bin/bash
maxTime=5.0 jobs=() pids=() cnt=1 Started=${EPOCHREALTIME/.}
if [[ $1 == -m ]] ;then maxTime=$2; shift 2; fi
for cmd ;do # $cmd is unquoted in order to use strings as command + args
$cmd &
jobs[$!]=$cnt pids[cnt++]=$!
done
printf -v endTime %.6f $maxTime
endTime=$(( Started + 10#${endTime/.} ))
exec {pio}<> <(:) # Pseudo FD for "builtin sleep" by using "read -t"
while ((${#jobs[@]})) && (( ${EPOCHREALTIME/.} < endTime ));do
for cnt in ${jobs[@]};do
if ! jobs $cnt &>/dev/null;then
Elap=00000$(( ${EPOCHREALTIME/.} - Started ))
printf 'Job %d (%d) ended after %.4f secs.\n' \
$cnt ${pids[cnt]} ${Elap::-6}.${Elap: -6}
unset jobs[${pids[cnt]}] pids[cnt]
fi
done
read -ru $pio -t .02 _
done
if ((${#jobs[@]})) ;then
Elap=00000$(( ${EPOCHREALTIME/.} - Started ))
for cnt in ${jobs[@]};do
printf 'Job %d (%d) killed after %.4f secs.\n' \
$cnt ${pids[cnt]} ${Elap::-6}.${Elap: -6}
done
kill ${pids[@]}
fi
Sample run:
Commands with arguments can be submitted as strings.
The -m switch lets you choose a float as the maximum time in seconds.
$ ./execTimeout.sh -m 2.3 "sleep 1" 'sleep 2' sleep\ {3,4} 'cat /dev/tty'
Job 1 (460668) ended after 1.0223 secs.
Job 2 (460669) ended after 2.0424 secs.
Job 3 (460670) killed after 2.3100 secs.
Job 4 (460671) killed after 2.3100 secs.
Job 5 (460672) killed after 2.3100 secs.
For testing this, I wrote a script that:
chooses a random duration between 1.0000 and 9.9999 seconds,
outputs a random number of lines between 0 and 8 (so it may not output anything at all),
prints lines containing the process id ($$), the number of lines left to print and the total duration in seconds.
#!/bin/bash
tslp=$RANDOM lnes=${RANDOM: -1}
printf -v tslp %.6f ${tslp::1}.${tslp:1}
slp=00$((${tslp/.}/($lnes?$lnes:1)))
printf -v slp %.6f ${slp::-6}.${slp: -6}
# echo >&2 Slp $lnes x $slp == $tslp
exec {dummy}<> <(: -O)
while read -rt $slp -u $dummy; ((--lnes>0)); do
echo $$ $lnes $tslp
done
Running this script 5 times at once, with a timeout of 5.0 seconds:
$ ./execTimeout.sh -m 5.0 ./tstscript.sh{,,,,}
2869814 6 2.416700
2869815 5 3.645000
2869814 5 2.416700
2869814 4 2.416700
2869815 4 3.645000
2869814 3 2.416700
2869813 5 8.414000
2869812 1 3.408000
2869814 2 2.416700
2869815 3 3.645000
2869814 1 2.416700
2869815 2 3.645000
Job 3 (2869814) ended after 2.4511 secs.
2869813 4 8.414000
2869815 1 3.645000
Job 1 (2869812) ended after 3.4518 secs.
Job 4 (2869815) ended after 3.6757 secs.
2869813 3 8.414000
Job 2 (2869813) killed after 5.0159 secs.
Job 5 (2869816) killed after 5.0159 secs.

Resources