Bash script - execute commands in parallel [duplicate] - linux

I have a bash script similar to:
NUM_PROCS=$1
NUM_ITERS=$2
for ((i=0; i<$NUM_ITERS; i++)); do
    python foo.py $i arg2 &
done
What's the most straightforward way to limit the number of parallel processes to NUM_PROCS? I'm looking for a solution that doesn't require packages/installations/modules (like GNU Parallel) if possible.
When I tried Charles Duffy's latest approach, I got the following trace output from bash -x:
+ python run.py args 1
+ python run.py ... 3
+ python run.py ... 4
+ python run.py ... 2
+ read -r line
+ python run.py ... 1
+ read -r line
+ python run.py ... 4
+ read -r line
+ python run.py ... 2
+ read -r line
+ python run.py ... 3
+ read -r line
+ python run.py ... 0
+ read -r line
... continuing with other numbers between 0 and 5, until too many processes were started for the system to handle and the bash script was shut down.

bash 4.4 will have an interesting new type of parameter expansion that simplifies Charles Duffy's answer.
#!/bin/bash
num_procs=$1
num_iters=$2
num_jobs="\j"   # The prompt escape for number of jobs currently running
for ((i=0; i<num_iters; i++)); do
    while (( ${num_jobs@P} >= num_procs )); do
        wait -n
    done
    python foo.py "$i" arg2 &
done
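A quick sketch of what the @P transformation does on its own (requires bash >= 4.4; the sleeps are just stand-in jobs):
#!/bin/bash
num_jobs="\j"          # \j is the prompt escape for the current job count
sleep 10 & sleep 10 &  # two background jobs
echo "${num_jobs@P}"   # @P expands the string as a prompt would: prints 2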

The GNU, macOS/OSX, FreeBSD and NetBSD versions of xargs can all do this with xargs -P, no new bash versions or package installs required. Here's 4 processes at a time:
printf "%s\0" {1..10} | xargs -0 -I{} -P 4 python foo.py {} arg2

As a very simple implementation, depending on a version of bash new enough to have wait -n (to wait until only the next job exits, as opposed to waiting for all jobs):
#!/bin/bash
# ^^^^ - NOT /bin/sh!
num_procs=$1
num_iters=$2
declare -A pids=( )
for ((i=0; i<num_iters; i++)); do
    while (( ${#pids[@]} >= num_procs )); do
        wait -n
        for pid in "${!pids[@]}"; do
            kill -0 "$pid" &>/dev/null || unset "pids[$pid]"
        done
    done
    python foo.py "$i" arg2 & pids["$!"]=1
done
If running on a shell without wait -n, one can (very inefficiently) replace it with a command such as sleep 0.2, to poll every 1/5th of a second.
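A rough sketch of that fallback, reusing the pids bookkeeping from the loop above:
while (( ${#pids[@]} >= num_procs )); do
    sleep 0.2   # poll instead of blocking in wait -n (inefficient but portable)
    for pid in "${!pids[@]}"; do
        kill -0 "$pid" &>/dev/null || unset "pids[$pid]"
    done
done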
Since you're actually reading input from a file, another approach is to start N subprocesses, each of which processes only the lines where (linenum % N == threadnum):
num_procs=$1
infile=$2
for ((i=0; i<num_procs; i++)); do
    (
        while read -r line; do
            echo "Thread $i: processing $line"
        done < <(awk -v num_procs="$num_procs" -v i="$i" \
                     'NR % num_procs == i { print }' <"$infile")
    ) &
done
wait # wait for all the $num_procs subprocesses to finish

Here's a relatively simple way to accomplish this with only two additional lines of code; the explanation is inline.
NUM_PROCS=$1
NUM_ITERS=$2
for ((i=0; i<$NUM_ITERS; i++)); do
    python foo.py $i arg2 &
    let 'i>=NUM_PROCS' && wait -n # wait for one process at a time once we've spawned $NUM_PROCS workers
done
wait # wait for all remaining workers

Are you aware that if you are allowed to write and run your own scripts, then you can also use GNU Parallel? In essence it is a Perl script in one single file.
From the README:
= Minimal installation =
If you just need parallel and do not have 'make' installed (maybe the
system is old or Microsoft Windows):
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem
mv parallel sem dir-in-your-$PATH/bin/
seq $2 | parallel -j$1 python foo.py {} arg2
parallel --embed (available since 20180322) even makes it possible to distribute GNU Parallel as part of a shell script (i.e. no extra files needed):
parallel --embed >newscript
Then edit the end of newscript.
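A hedged sketch of that workflow (assuming the generated script ends with example invocations that you replace with your own job, such as the one above):
parallel --embed > newscript
# edit the end of newscript: swap the trailing example calls for your job, e.g.
#     seq "$2" | parallel -j"$1" python foo.py {} arg2
bash newscript 4 100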

This isn't the simplest solution, but if your version of bash doesn't have "wait -n" and you don't want to use other programs like parallel or awk, here is a solution using while and for loops.
num_iters=10
total_threads=4
iter=1
while [[ "$iter" -le "$num_iters" ]]; do
    iters_remainder=$(( num_iters - iter + 1 ))
    if [[ "$iters_remainder" -lt "$total_threads" ]]; then
        threads=$iters_remainder
    else
        threads=$total_threads
    fi
    for ((t=1; t<=threads; t++)); do
        (
            : # do stuff
        ) &
        ((++iter))
    done
    wait
done

Related

Script to check if vim is open or another script is running?

I'm making a background script that requires a user to input a certain string (a function) to continue. The script runs fine, but will interrupt anything else that is open in vim or any script that is running. Is there a way I can test in my script if the command line is waiting for input to avoid interrupting something?
I'm running the script enclosed in parenthesis to hide the job completion message, so I'm using (. nightFall &)
Here is the script so far:
#!/bin/bash
# nightFall
clear
text=""
echo "Night begins to fall... Now might be a good time to rest."
while [[ "$text" != "rest" ]]
do
    read -p "" text
done
Thank you in advance!
If you launch nightFall from the shell you are monitoring, you can use "ps" with the parent PID to see how many processes are launched by the shell as well:
# bg.sh
for k in `seq 1 15`; do
    N=$(ps -ef | grep -sw $PPID | grep -v $$ | wc -l)
    (( N -= 2 ))
    [ "$N" -eq 0 ] && echo "At prompt"
    [ "$N" -ne 0 ] && echo "Child processes: $N"
    sleep 1
done
Note that I subtract 2 from N: one for the shell process itself and one for the bg.sh script. The remainder is the number of other child processes the shell has.
Launch the above script from a shell in background:
bash bg.sh &
Then start any command (for example "sleep 15") and it will detect if you are at the prompt or in a command.

Speed up dig -x in bash script

As an exercise at my university, I have to write a bash script that does a reverse lookup of all the DNS entries for a class B network block they own.
This is the fastest I have got, but it takes forever. Any help optimising this code?
#!/bin/bash
network="a.b"
CMD=/usr/bin/dig
for i in $(seq 1 254); do
    for y in $(seq 1 254); do
        answer=`$CMD -x $network.$i.$y +short`;
        echo $network.$i.$y ' resolves to ' $answer >> hosts_a_b.txt;
    done
done
Using GNU xargs to run 64 processes at a time might look like:
#!/usr/bin/env bash
lookupArgs() {
    for arg; do
        # echo entire line together to ensure atomicity
        echo "$arg resolves to $(dig -x "$arg" +short)"
    done
}
export -f lookupArgs
network="a.b"
for (( x=1; x<=254; x++ )); do
    for (( y=1; y<=254; y++ )); do
        printf '%s.%s.%s\0' "$network" "$x" "$y"
    done
done | xargs -0 -P64 bash -c 'lookupArgs "$@"' _ >hosts_a_b.txt
Note that this doesn't guarantee order of output (and relies on the lookupArgs function doing one write() syscall per result) -- but output is sortable so you should be able to reorder. Otherwise, one could get ordered output (and ensure atomicity of results) by switching to GNU parallel -- a large perl script, vs GNU xargs' small, simple, relatively low-feature implementation.
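If order matters, a hedged sketch of reordering afterwards (assuming every output line begins with the a.b.x.y address, as above):
# numeric sort on the 3rd and 4th dot-separated fields (the x and y octets)
sort -t. -k3,3n -k4,4n hosts_a_b.txt > hosts_a_b.sorted.txt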

Shutdown computer when all instances of a given program have finished

I use the following script to check whether wget has finished downloading. To check for this, I'm looking for its PID, and when it is not found the computer shutdowns. This works fine for a single instance of wget, however, I'd like the script to look for all already running wget programs.
#!/bin/bash
while kill -0 $(pidof wget) 2> /dev/null; do
for i in '-' '/' '|' '\'
do
echo -ne "\b$i"
sleep 0.1
done
done
poweroff
EDIT: It would be great if the script checked that at least one instance of wget is running, and only then watched for wget to finish and shut down the computer.
In addition to the other answers, you can satisfy your check for at least one wget pid by initially reading the result of pidof wget into an array, for example:
pids=($(pidof wget))
if ((${#pids[@]} > 0)); then
    # do your loop
fi
This also brings up a way to routinely monitor the remaining pids as each wget operation completes, for example,
npids=${#pids[@]}   ## save original number of pids
while (( ${#pids[@]} > 0 )); do   ## while pids remain
    for ((i = 0; i < npids; i++)); do   ## loop, checking remaining pids
        [[ ${pids[i]} ]] || continue   ## skip entries already removed
        kill -0 "${pids[i]}" 2>/dev/null || unset 'pids[i]'   ## remove pid once its process exits
    done
    ## do your sleep and spin
done
poweroff
There are probably many more ways to do it. This is just one that came to mind.
I don't think kill is the right idea; maybe something along these lines:
while true
do
    live_wgets=0
    for pid in `ps -ef | grep '[w]get' | awk '{print $2}'` ; # the [w] keeps grep from matching itself
    do
        live_wgets=$((live_wgets+1))
    done
    if test $live_wgets -eq 0; then # shutdown
        sudo poweroff; # or whatever that suits
    fi
    sleep 5; # wait for sometime
done
You can adapt your script in the following way:
#!/bin/bash
spin[0]="-"
spin[1]="\\"
spin[2]="|"
spin[3]="/"
DOWNLOAD=`ps -ef | grep wget | grep -v grep`
while [ -n "$DOWNLOAD" ]; do
    for i in "${spin[@]}"
    do
        DOWNLOAD=`ps -ef | grep wget | grep -v grep`
        echo -ne "\b$i"
        sleep 0.1
    done
done
sudo poweroff
However, I would recommend using cron instead of an active-waiting approach, or even using wait:
How to wait in bash for several subprocesses to finish and return exit code !=0 when any subprocess ends with code !=0?
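A minimal sketch of the wait approach, assuming the script itself starts the downloads (wait can only see the shell's own children; the URLs are hypothetical):
#!/bin/bash
wget "http://example.com/a.iso" &
wget "http://example.com/b.iso" &
wait            # blocks until every background wget has exited
sudo poweroff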

How to add threading to the bash script?

#!/bin/bash
cat input.txt | while read ips
do
    cmd="$(snmpwalk -v2c -c abc#123 $ips sysUpTimeInstance)"
    echo "$ips ---> $cmd"
    echo "$ips $cmd" >> out_uptime.txt
done
How can I add threading to this bash script? I have around 80,000 inputs and it takes a lot of time.
Simple method. Assuming the order of the output is unimportant, and that snmpwalk's output is of no interest if it should fail, put a && at the end of each of the commands to background, except the last command which should have a & at the end:
#!/bin/bash
while read ips
do
    cmd="$(nice snmpwalk -v2c -c abc#123 $ips sysUpTimeInstance)" &&
    echo "$ips ---> $cmd" &&
    echo "$ips $cmd" >> out_uptime.txt &
done < input.txt
Less simple. If snmpwalk can fail, and that output is also needed, lose the && and surround the code with curly braces,{}, followed by &. To redirect the appended output to include standard error use &>>:
#!/bin/bash
while read ips
do
    {
        cmd="$(nice snmpwalk -v2c -c abc#123 $ips sysUpTimeInstance)"
        echo "$ips ---> $cmd"
        echo "$ips $cmd" &>> out_uptime.txt
    } &
done < input.txt
The braces can contain more complex if ... then ... else ... fi statements, all of which would be backgrounded.
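For instance, a hedged sketch with error handling inside the backgrounded braces:
#!/bin/bash
while read ips
do
    {
        if cmd="$(nice snmpwalk -v2c -c abc#123 $ips sysUpTimeInstance)"; then
            echo "$ips ---> $cmd" >> out_uptime.txt
        else
            echo "$ips ---> snmpwalk failed" >> out_uptime.txt
        fi
    } &
done < input.txt
wait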
For those who don't have a complex snmpwalk command to test, here's a similar loop, which prints one through five but sleeps for random durations between echo commands:
for f in {1..5}; do
    RANDOM=$f &&
    sleep $((RANDOM/6000)) &&
    echo $f &
done 2> /dev/null | cat
Output will be the same every time (remove the RANDOM=$f && for varying output), and the loop requires three seconds to run:
2
4
1
3
5
Compare that to code without the &&s and &:
for f in {1..5}; do
    RANDOM=$f
    sleep $((RANDOM/6000))
    echo $f
done 2> /dev/null | cat
This version requires seven seconds to run, with this output:
1
2
3
4
5
You can send tasks to the background with &. If you intend to wait for all of them to finish, you can use the wait command:
process_to_background &
echo Processing ...
wait
echo Done
You can get the pid of a task started in the background if you want to wait for one (or a few) specific tasks.
important_process_to_background &
important_pid=$!
for i in {1..10}; do
    less_important_process_to_background $i &
done
wait $important_pid
echo Important task finished
wait
echo All tasks finished
One note though: the background processes can mess up the output, as they run asynchronously. You might want to use a named pipe to collect the output from them.
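A minimal sketch of the named-pipe idea, with a hypothetical worker command called task:
#!/bin/bash
fifo=$(mktemp -u) && mkfifo "$fifo"
cat "$fifo" > results.txt &    # a single reader serializes the workers' output
reader=$!
exec 3>"$fifo"                 # hold a write end open while workers come and go
pids=()
for i in {1..4}; do
    task "$i" >"$fifo" &       # task stands in for your real command
    pids+=($!)
done
wait "${pids[@]}"              # wait for the workers only
exec 3>&-                      # close our write end so the reader sees EOF
wait "$reader"
rm -f "$fifo"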

setting a variable on completion of an bg task in ubuntu linux 11.10 64bit shell

so I have this code:
uthreads=4
x=1
cd ./wxWidgets/wxWidgets/build-win
for upxs in $(ls ./lib/*.dll)
do
    while [ true ] ; do
        if [ $x -lt $uthreads ] ; then
            x=$((x+1))
            (upx -qqq --best --color $upxs; x=$((x-1))) &
            break
        fi
        sleep 1
    done
done
x=$((x=1))
the problem lies in the variable being modified in parentheses. Naturally this does not work as intended, as the variable never gets sent back to the parent shell. So my question is: how do you do this? The variable must be incremented after the upx command finishes, regardless of its exit status, so a simple && or || won't work here. And you can't use a single & after upx, because then the process runs in the background and instantly changes the variable...
So I'm effectively stuck and could use some help...
first of all, this: while [ true ] is wrong. true is a command and so is [; you don't need both here, just do while true; do (and use [[ in bash, [ is just for POSIX sh).
this: for upxs in $(ls ./lib/*.dll) is also wrong, and it's more serious; I'll give you some links with information about all that but for now just do: for upxs in ./lib/*.dll
another important mistake is that you're not quoting your variables, EVERY SINGLE VARIABLE MUST BE DOUBLE QUOTED, if there's anything you learn from this post please let it be that.
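a quick hedged illustration of why (unquoted expansions undergo word splitting and globbing):
f='file with spaces.dll'
ls $f     # wrong: ls receives three separate arguments
ls "$f"   # right: ls receives a single argument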
now regarding your particular problem: the whole approach is wrong. you can't do what you're trying to do, because you don't know which process will end first; you can only wait for a particular one or wait for all of them.
also notice you're running four processes with the same file; I'm sure your intention was to process 4 different files in parallel. and you don't wait for them to finish either, so chances are that after your script ends your processes get killed.
the suboptimal but easy solution would be to have 4 processes running in parallel and wait for all of them to finish before starting 4 more, that is accomplished by this (which is similar to your current approach):
#!/bin/bash
uthreads=4
cd ./wxWidgets/wxWidgets/build-win
upxs=(lib/*.dll)
i=0
while ((i<${#upxs[@]})); do
    for ((x=0; x<uthreads; x++)); do
        [[ ${upxs[i+x]} ]] || break
        upx -qqq --best --color "${upxs[i+x]}" &
    done
    i=$((i+x))
    wait
done
now, this might not be enough to satisfy you :) so the way to implement 4 running processes at all times would be this (using PIDs):
#!/bin/bash
uthreads=4
cd ./wxWidgets/wxWidgets/build-win
upxs=(lib/*.dll)
i=0 pids=()
while [[ ${upxs[i]} ]]; do
    while ((${#pids[@]}<uthreads)); do
        upx -qqq --best --color "${upxs[i]}" & pids+=($!)
        [[ ${upxs[++i]} ]] || break
    done
    sleep .1
    pids=($(for pid in "${pids[@]}"; do kill -0 "$pid" 2>/dev/null && echo "$pid"; done))
done
wait
now here's the links I promised:
http://mywiki.wooledge.org/BashPitfalls#if_.5Bgrep_foo_myfile.5D
http://mywiki.wooledge.org/BashFAQ/001
http://mywiki.wooledge.org/Quotes
http://mywiki.wooledge.org/ProcessManagement
Assuming that you're trying to 'gate' the activity of a program by controlling how many copies of it are running in the background, check out the xargs command. My system doesn't have upx installed, so the best I can recommend as a code sample is
uthreads=4
cd ./wxWidgets/wxWidgets/build-win
ls ./lib/*.dll | xargs -I{} -P $uthreads upx -qqq --best --color {}
# ls ./lib/*.dll      -> your list
# -I{}                -> xargs option to indicate what string to replace with
# -P $uthreads        -> xargs option for # of processes
# upx                 -> your cmd
# -qqq --best --color -> your cmd's opts
# {}                  -> item from your list
use xargs --help to see all the options, man xargs (if you're lucky) for in-depth detail on usage, and search the web and this site for more examples. The Wikipedia article is a good start.
I hope this helps
