I have a script that runs batches of 4 processes at a time; I don't care about getting the return code of each proc, and I never want more than 4 procs running concurrently. The issue with the approach below is that it does not keep 4 procs running at all times. For example, if proc 2 and proc 3 finish early, I would like procs 5 and 6 to start, rather than having them wait until procs 1-4 are all complete. How can I achieve this in bash?
run_func_1 &
run_func_2 &
run_func_3 &
run_func_4 &
wait
run_func_5 &
run_func_6 &
run_func_7 &
run_func_8 &
wait
I tried a custom implementation with a pool of workers and a queue of jobs.
A worker takes the next job from the queue as soon as it finishes its previous one.
You can adapt this script to whatever you need, but I hope it shows the intent.
Here's the script:
#!/bin/bash
f1() { echo Started f1; sleep 10; echo Finished f1; }
f2() { echo Started f2; sleep 8; echo Finished f2; }
f3() { echo Started f3; sleep 12; echo Finished f3; }
f4() { echo Started f4; sleep 14; echo Finished f4; }
f5() { echo Started f5; sleep 7; echo Finished f5; }
declare -r MAX_WORKERS=2
declare -a worker_pids
declare -a jobs=('f1' 'f2' 'f3' 'f4' 'f5')
available_worker_index() {
    # If the number of workers is less than MAX_WORKERS,
    # we still have workers that are idle
    declare worker_count="${#worker_pids[@]}"
    if [[ $worker_count -lt $MAX_WORKERS ]]; then
        echo "$worker_count"
        return 0
    fi
    # If we reached this code it means
    # all workers are already created and executing a job.
    # We should check which of them has finished and return its index as available
    declare -i index=0
    for pid in "${worker_pids[@]}"; do
        is_running=$(ps -p "$pid" > /dev/null; echo "$?")
        if [[ $is_running != 0 ]]; then
            echo "$index"
            return 0
        fi
        index+=1
    done
    echo "None"
}
for job in "${jobs[@]}"; do
    declare worker_index
    worker_index=$(available_worker_index)
    while [[ $worker_index == "None" ]]; do
        # Wait for an available worker
        sleep 3
        worker_index=$(available_worker_index)
    done
    # Run the job in the background
    "$job" &
    # Save its pid for later
    pid="$!"
    worker_pids["$worker_index"]="$pid"
done
# Wait for all workers to finish
wait
You can change the size of the worker pool just by changing the MAX_WORKERS variable.
With GNU Parallel it is as simple as:
parallel -j4 ::: run_func_{1..8}
Just remember to export -f the functions.
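For instance, a minimal runnable sketch, scaled down to 5 placeholder functions and 2 slots (the function bodies are stand-ins for your real run_func_N):
#!/usr/bin/env bash
# Stand-in definitions; swap in your real functions.
run_func_1() { echo "start 1"; sleep 3; echo "done 1"; }
run_func_2() { echo "start 2"; sleep 1; echo "done 2"; }
run_func_3() { echo "start 3"; sleep 2; echo "done 3"; }
run_func_4() { echo "start 4"; sleep 1; echo "done 4"; }
run_func_5() { echo "start 5"; sleep 2; echo "done 5"; }
# export every function so the bash child shells spawned by parallel can see them
export -f run_func_{1..5}
# at most 2 run concurrently here; a queued function starts as soon as a slot frees up
parallel -j2 ::: run_func_{1..5}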
If GNU Parallel is not installed on the machine where the script will run, you can generate a shell script which embeds GNU Parallel (run this on a machine that has it):
parallel --embed > new_script
You then simply change the end of new_script.
By default it will run one job per CPU core. This can be adjusted with --jobs.
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU and wait for each batch to finish. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
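A tiny way to watch that behaviour, using sleeps of different lengths as stand-ins for jobs:
# 8 "jobs" of varying length, never more than 4 at once; as soon as one
# sleep exits, the next queued one starts instead of waiting for the batch.
parallel -j4 sleep ::: 3 1 4 1 5 9 2 6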
Installation
For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Related
I am using the jobs command to control the number of compute-intensive processes. I want to run no more than max_cnt processes at a time, and to stop only when all of them have finished.
I use the bash script below to accomplish this. However, it always lists one process as running even after everything has executed and stopped.
Moreover, I can't find that process in htop's process list. Where should I look for the process reported by echo $(jobs -p), and how should I fix the script so it exits once everything has stopped?
#!/usr/bin/env bash
SLEEP=5
max_cnt=8
# generate a random number less than or equal to $1
function random {
    rand=$RANDOM
    while [ "$rand" -gt $1 ]
    do
        rand=$RANDOM
    done
}
function job {
    # resource-intensive job simulated by random sleeping
    echo param1="$1",param2="$2"
    random 20
    echo Sleeping for $rand
    sleep $rand
}
for param1 in 1e-6 1e-5 1e-4 1e-3 1e-2
do
    for param2 in "ones" "random"
    do
        echo starting job with $param1 $param2
        job $param1 $param2 &
        while [ "$(jobs -p|wc -l)" -ge "$max_cnt" ]
        do
            echo "current running jobs.. $(jobs -p|wc -l) ... sleeping"
            sleep $SLEEP
        done
    done
done
while [ "$(jobs -p|wc -l)" -ge 1 ]
do
    echo "current running jobs.. $(jobs -p|wc -l) ... sleeping"
    sleep $SLEEP
    echo $(jobs -p)
done
As mentioned in the comments, you may want to consider using GNU Parallel; it makes life easier when managing parallel jobs. Your code could look like this:
#!/usr/bin/env bash
function job {
    # resource-intensive job simulated by random sleeping
    echo param1="$1",param2="$2"
    ((s=(RANDOM%5)+1))
    echo Sleeping for $s
    sleep $s
}
# export function to subshells
export -f job
parallel -j8 job {1} {2} ::: 1e-6 1e-5 1e-4 1e-3 1e-2 ::: "ones" "random"
Sample Output
param1=1e-6,param2=ones
Sleeping for 1
param1=1e-5,param2=ones
Sleeping for 1
param1=1e-2,param2=ones
Sleeping for 1
param1=1e-4,param2=ones
Sleeping for 2
param1=1e-4,param2=random
Sleeping for 2
param1=1e-6,param2=random
Sleeping for 4
param1=1e-2,param2=random
Sleeping for 3
param1=1e-3,param2=random
Sleeping for 4
param1=1e-5,param2=random
Sleeping for 5
param1=1e-3,param2=ones
Sleeping for 5
There are many other switches and parameters:
parallel --dry-run ... will show you what it would do, without actually doing anything
parallel --eta ... which gives you an "Estimated Time of Arrival"
parallel --bar ... which gives you a progress bar
parallel -k ... which keeps output in order
parallel -j 8 ... which runs 8 jobs at a time rather than the default of 1 job per CPU core
parallel --pipepart ... which will split the contents of a massive file across subprocesses
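For instance, a hedged combination of a few of the switches above, reusing the job function from earlier:
# Preview the command lines without running anything:
parallel --dry-run job {1} {2} ::: 1e-6 1e-5 ::: ones random
# Actually run them, 8 at a time, with a progress bar and output kept in input order:
parallel --bar -k -j8 job {1} {2} ::: 1e-6 1e-5 ::: ones random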
Note also that GNU Parallel can distribute work across other machines in your network, and it has fail and retry handling, output tagging and so on...
I have a while loop reading lines from a file, $hosts:
while read line
do
    ip=$line
    check
done < $hosts
My question is: is there a way to speed this up by running the check on 10 hosts at a time, each check on a different IP, and finishing when all IPs in $hosts have been checked?
Thanks
You can send tasks to the background with &.
If you intend to wait for all of them to finish, use the wait command:
process_to_background &
echo Processing ...
wait
echo Done
You can get the pid of a task started in the background if you want to wait for one (or a few) specific tasks:
important_process_to_background &
important_pid=$!
for i in {1..10}; do
    less_important_process_to_background $i &
done
wait $important_pid
echo Important task finished
wait
echo All tasks finished
One note though: the background processes can mess up the output, as they run asynchronously. You might want to use a named pipe to collect the output from them.
edit
As asked in the comments, there might be a need to limit the number of forked background processes. In this case you can keep track of how many background processes you've started and communicate with them through a named pipe.
mkfifo tmp # create named pipe
counter=0
while read ip
do
    if [ $counter -lt 10 ]; then # we are under the limit
        { check $ip; echo 'done' > tmp; } &
        ((counter++))
    else
        read x < tmp # wait for a process to finish
        { check $ip; echo 'done' > tmp; } &
    fi
done < "$hosts"
cat tmp > /dev/null # let all the background processes end
rm tmp # remove the fifo
You can start multiple processes, each calling the function check and wait for them to finish.
while read line
do
    ip=$line
    check &
done < $hosts
wait # wait for all child processes to finish
Whether this increases the speed depends on available processors and the function check's implementation. You have to ensure there's no data dependency in check between iterations.
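For illustration, a hypothetical check function (the real one isn't shown in the question); each backgrounded call runs in its own subshell with its own copy of $ip, so the iterations stay independent:
check() {
    # one ICMP probe with a 2-second wait; adjust to whatever your real check does
    if ping -c 1 -W 2 "$ip" > /dev/null 2>&1; then
        echo "$ip is up"
    else
        echo "$ip is down"
    fi
}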
Use GNU Parallel; since $hosts is a file, use :::: to read the arguments from it (and export -f check so the function is visible to the subshells):
export -f check
parallel -j10 check :::: "$hosts"
I have access to a machine where I have access to 10 of the cores -- and I would like to actually use them. What I am used to doing on my own machine would be something like this:
for f in *.fa; do
    myProgram (options) "./$f" "./$f.tmp"
done
I have 10 files I'd like to do this on -- let's call them blah00.fa, blah01.fa, ... blah09.fa.
The problem with this approach is that myProgram only uses 1 core at a time, and doing it like this on the multi-core machine means using 1 core at a time 10 times over, so I wouldn't be using my machine to its full capability.
How could I change my script so that it runs all 10 of my .fa files at the same time? I looked at Run a looped process in bash across multiple cores but I couldn't get the command from that to do what I wanted exactly.
You could use
for f in *.fa; do
    myProgram (options) "./$f" "./$f.tmp" &
done
wait
which would start all of your jobs in parallel, then wait until they all complete before moving on. In the case where you have more jobs than cores, you would start all of them and let your OS scheduler worry about swapping processes in and out.
One modification is to start 10 jobs at a time
count=0
for f in *.fa; do
    myProgram (options) "./$f" "./$f.tmp" &
    (( count++ ))
    if (( count == 10 )); then
        wait
        count=0
    fi
done
but this is inferior to using parallel because you can't start new jobs as old ones finish, and you also can't detect if an older job finished before you manage to start 10 jobs. A plain wait lets you wait on one particular process or on all background processes, but doesn't tell you when any one of an arbitrary set of background processes completes (see the wait -n sketch below).
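If your bash is 4.3 or newer, wait -n fills exactly that gap: it returns as soon as any one background job exits, so you can top the pool back up immediately. A rough sketch, keeping the question's myProgram placeholder:
#!/usr/bin/env bash
# Keep at most 10 jobs in flight; start a new one the moment any job exits.
# Requires bash 4.3+ for "wait -n".
max_jobs=10
running=0
for f in *.fa; do
    myProgram "./$f" "./$f.tmp" &    # insert your real options here
    (( running++ ))
    if (( running >= max_jobs )); then
        wait -n                      # blocks until ANY background job finishes
        (( running-- ))
    fi
done
wait    # wait for the remaining jobs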
With GNU Parallel you can do:
parallel myProgram (options) {} {.}.tmp ::: *.fa
From: http://git.savannah.gnu.org/cgit/parallel.git/tree/README
= Full installation =
Full installation of GNU Parallel is as simple as:
./configure && make && make install
If you are not root you can add ~/bin to your path and install in
~/bin and ~/share:
./configure --prefix=$HOME && make && make install
Or if your system lacks 'make' you can simply copy src/parallel
src/sem src/niceload src/sql to a dir in your path.
= Minimal installation =
If you just need parallel and do not have 'make' installed (maybe the
system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
Watch the intro videos to learn more: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
# Wait until fewer than $3 instances of $1 are running, then start one more and return
function runParallel () {
    cmd=$1
    args=$2
    number=$3
    currNumber="1024"
    while true ; do
        currNumber=$(ps -e | grep -v "grep" | grep " $cmd$" | wc -l)
        if [ $currNumber -lt $number ] ; then
            break
        fi
        sleep 1
    done
    echo "run: $cmd $args"
    $cmd $args &
}

loop=0
# We will run 12 sleep commands for 10 seconds each,
# and only five of them will run simultaneously
while [ $loop -ne 12 ] ; do
    runParallel "sleep" 10 5
    loop=$(expr $loop + 1)
done
I have a bash script that launches a child process that crashes (actually, hangs) from time to time, for no apparent reason (it's closed source, so there isn't much I can do about it). As a result, I would like to be able to launch this process and kill it if it has not returned successfully within a given amount of time.
Is there a simple and robust way to achieve that using bash?
P.S.: tell me if this question is better suited to serverfault or superuser.
(As seen in:
BASH FAQ entry #68: "How do I run a command, and have it abort (timeout) after N seconds?")
If you don't mind downloading something, use timeout and use it like this (most systems already have it installed as part of coreutils; otherwise use sudo apt-get install coreutils):
timeout 10 ping www.goooooogle.com
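If you also want to know whether the command was killed by the timeout, GNU coreutils timeout reports that with exit status 124; for example:
timeout 10 ping -c 100 www.goooooogle.com
status=$?
if [ "$status" -eq 124 ]; then
    echo "timed out after 10 seconds"
else
    echo "finished on its own with exit status $status"
fi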
If you don't want to download something, do what timeout does internally:
( cmdpid=$BASHPID; (sleep 10; kill $cmdpid) & exec ping www.goooooogle.com )
In case that you want to do a timeout for longer bash code, use the second option as such:
( cmdpid=$BASHPID;
(sleep 10; kill $cmdpid) \
& while ! ping -w 1 www.goooooogle.com
do
echo crap;
done )
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) &
or to get the exit codes as well:
# Spawn a child process:
(dosmth) & pid=$!
# in the background, sleep for 10 secs then kill that process
(sleep 10 && kill -9 $pid) & waiter=$!
# wait on our worker process and capture its exit code
# (wait must run in the current shell, not a $( ) subshell, or it can't see $pid)
wait $pid
exitcode=$?
# kill the waiter subshell, if it still runs
kill -9 $waiter 2>/dev/null
# 0 if we killed the waiter, cause that means the process finished before the waiter
finished_gracefully=$?
sleep 999&
t=$!
sleep 10
kill $t
I also had this question and found two more things very useful:
The SECONDS variable in bash.
The command "pgrep".
So I use something like this on the command line (OSX 10.9):
ping www.goooooogle.com & PING_PID=$(pgrep 'ping'); SECONDS=0; while pgrep -q 'ping'; do sleep 0.2; if [ $SECONDS = 10 ]; then kill $PING_PID; fi; done
As this is a loop I included a "sleep 0.2" to keep the CPU cool. ;-)
(BTW: ping is a bad example anyway, you just would use the built-in "-t" (timeout) option.)
Assuming you have (or can easily make) a pid file for tracking the child's pid, you could then create a script that checks the modtime of the pid file and kills/respawns the process as needed. Then just put the script in crontab to run at approximately the period you need.
Let me know if you need more details. If that doesn't sound like it'd suit your needs, what about upstart?
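A rough sketch of that idea, run from cron every minute or so; the pid-file path, the heartbeat age limit, and the respawn command are all hypothetical placeholders:
#!/usr/bin/env bash
PIDFILE=/var/run/mychild.pid          # hypothetical: your child writes/touches this
MAX_AGE=300                           # consider the child hung after 5 minutes
START_CMD=/usr/local/bin/mychild      # hypothetical respawn command

now=$(date +%s)
mtime=$(stat -c %Y "$PIDFILE" 2>/dev/null || echo 0)   # GNU stat syntax

if (( now - mtime > MAX_AGE )); then
    # heartbeat too old (or pid file missing): kill the old child and respawn
    [ -f "$PIDFILE" ] && kill -9 "$(cat "$PIDFILE")" 2>/dev/null
    "$START_CMD" &
    echo $! > "$PIDFILE"
fi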
One way is to run the program in a subshell, and communicate with the subshell through a named pipe with the read command. This way you can check the exit status of the process being run and communicate this back through the pipe.
Here's an example of timing out the yes command after 3 seconds. It gets the PID of the process using pgrep (which possibly only works on Linux). There is also a problem with using a pipe: a process opening a pipe for read will hang until it is also opened for write, and vice versa. So to prevent the read command hanging, I've "wedged" open the pipe with a background subshell. (Another way to prevent a freeze is to open the pipe read-write, i.e. read -t 5 <>finished.pipe; however, that also may not work except on Linux.)
rm -f finished.pipe
mkfifo finished.pipe
{ yes >/dev/null; echo finished >finished.pipe ; } &
SUBSHELL=$!
# Get command PID
while : ; do
    PID=$( pgrep -P $SUBSHELL yes )
    test "$PID" = "" || break
    sleep 1
done
# Open pipe for writing
{ exec 4>finished.pipe ; while : ; do sleep 1000; done ; } &
read -t 3 FINISHED <finished.pipe
if [ "$FINISHED" = finished ] ; then
    echo 'Subprocess finished'
else
    echo 'Subprocess timed out'
    kill $PID
fi
rm finished.pipe
Here's an attempt which tries to avoid killing a process after it has already exited, which reduces the chance of killing another process with the same process ID (although it's probably impossible to avoid this kind of error completely).
run_with_timeout ()
{
    t=$1
    shift
    echo "running \"$*\" with timeout $t"
    (
        # first, run the process in the background
        (exec sh -c "$*") &
        pid=$!
        echo $pid
        # the timeout shell
        (sleep $t ; echo timeout) &
        waiter=$!
        echo $waiter
        # finally, allow the process to end naturally
        wait $pid
        echo $?
    ) \
    | (read pid
       read waiter
       if test $waiter != timeout ; then
           read status
       else
           status=timeout
       fi
       # if we timed out, kill the process
       if test $status = timeout ; then
           kill $pid
           exit 99
       else
           # if the program exited normally, kill the waiting shell
           kill $waiter
           exit $status
       fi
    )
}
Use like run_with_timeout 3 sleep 10000, which runs sleep 10000 but ends it after 3 seconds.
This is like other answers which use a background timeout process to kill the child process after a delay. I think this is almost the same as Dan's extended answer (https://stackoverflow.com/a/5161274/1351983), except the timeout shell will not be killed if it has already ended.
After this program has ended, there will still be a few lingering "sleep" processes running, but they should be harmless.
This may be a better solution than my other answer because it does not use the non-portable shell feature read -t and does not use pgrep.
Here's the third answer I've submitted here. This one handles signal interrupts and cleans up background processes when SIGINT is received. It uses the $BASHPID and exec trick used in the top answer to get the PID of a process (in this case $$ in a sh invocation). It uses a FIFO to communicate with a subshell that is responsible for killing and cleanup. (This is like the pipe in my second answer, but having a named pipe means that the signal handler can write into it too.)
run_with_timeout ()
{
    t=$1 ; shift
    trap cleanup 2
    F=$$.fifo ; rm -f $F ; mkfifo $F
    # first, run the main process in the background
    "$@" & pid=$!
    # sleeper process to time out
    ( sh -c "echo \$\$ >$F ; exec sleep $t" ; echo timeout >$F ) &
    read sleeper <$F
    # control shell. read from the fifo.
    # final input is "finished". after that
    # we clean up. we can get a timeout or a
    # signal first.
    ( exec 0<$F
      while : ; do
          read input
          case $input in
              finished)
                  test $sleeper != 0 && kill $sleeper
                  rm -f $F
                  exit 0
                  ;;
              timeout)
                  test $pid != 0 && kill $pid
                  sleeper=0
                  ;;
              signal)
                  test $pid != 0 && kill $pid
                  ;;
          esac
      done
    ) &
    # wait for the process to end
    wait $pid
    status=$?
    echo finished >$F
    return $status
}
cleanup ()
{
    echo signal >$$.fifo
}
I've tried to avoid race conditions as far as I can. However, one source of error I couldn't remove is when the process ends near the same time as the timeout. For example, run_with_timeout 2 sleep 2 or run_with_timeout 0 sleep 0. For me, the latter gives an error:
timeout.sh: line 250: kill: (23248) - No such process
as it is trying to kill a process that has already exited by itself.
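One simple, if blunt, way to tolerate that race is to silence the error and ignore the failure, since at that point the child has already exited on its own:
# kill may race with the child's own exit; "No such process" is harmless here
kill $pid 2>/dev/null || true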
#Kill command after 10 seconds
timeout 10 command
#If you don't have timeout installed, this is almost the same:
sh -c '(sleep 10; kill "$$") & command'
#The same as above, with muted duplicate messages:
sh -c '(sleep 10; kill "$$" 2>/dev/null) & command'
I want to limit the execution time of a program I am running under Linux. I put in my scons script a line like:
Command("com","","ulimit -t 1; myprogram")
and tested it with an infinite loop program: it did not work and the program ran forever.
Am I missing something?
-- tsf
ulimit -t 1 means that the limit is set to 1 second of CPU time. If your infinite loop program uses any sort of sleep in its inner loop then it will use practically no CPU time. This means it will not get killed in 1 second of real, on the clock time. In fact it may take minutes or hours to use up its 1 second allocation.
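A quick way to see the difference, run in throwaway shells so the limit doesn't stick to your interactive session:
# CPU-bound loop: killed (SIGXCPU) after roughly 1 second of CPU time
bash -c 'ulimit -t 1; while :; do :; done'
# Mostly-sleeping loop: consumes almost no CPU time, so it outlives the limit by far
bash -c 'ulimit -t 1; while :; do sleep 1; done'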
What happens if you run the command outside of SCons? Perhaps you don't have permission to change the limit at all...
ulimit -t 1; ./myprogram
For example, it may say the following if the limit is already set to 0:
bash: ulimit: cpu time: cannot modify limit: Operation not permitted
Edit: it seems that the -t option is broken on Ubuntu 9.04. A fix has been committed 05 June 2009, but it may take a while to trickle into the updates - it may not be fixed until 9.10.
As an historical note, this problem no longer exists in Ubuntu 10.04.
You can also use this script:
(taken from http://newsgroups.derkeiler.com/Archive/Comp/comp.sys.mac.system/2005-12/msg00247.html)
#!/bin/sh
# timeout script
#
usage()
{
    echo "usage: timeout seconds command args ..."
    exit 1
}

[ $# -lt 2 ] && usage

seconds=$1; shift

timeout()
{
    sleep $seconds
    kill -9 $pid >/dev/null 2>/dev/null
}

eval "$@" &
pid=$!
timeout &
wait $pid