bash while loop threading - multithreading

I have a while loop reading lines from a file, $hosts:
while read line
do
ip=$line
check
done < $hosts
My question is: is there a way to speed this up, e.g. run the check on 10 hosts at a time, each check on a different IP, and finish when all IPs in $hosts have been checked?
Thanks

You can send tasks to the background by appending &.
If you intend to wait for all of them to finish you can use the wait command:
process_to_background &
echo Processing ...
wait
echo Done
You can get the PID of a task started in the background if you want to wait for one (or a few) specific tasks:
important_process_to_background &
important_pid=$!
for i in {1..10}; do
less_important_process_to_background $i &
done
wait $important_pid
echo Important task finished
wait
echo All tasks finished
One note though: the background processes can mess up the output, as they run asynchronously. You might want to use a named pipe to collect the output from them.
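For example, one way to funnel the output through a named pipe could look like this. This is only a sketch: the fifo path and the addresses are placeholders, check is the function from the question, and it assumes each job prints short, line-sized messages (which the pipe delivers without splitting them).
mkfifo /tmp/results.fifo
cat /tmp/results.fifo &            # a single reader serializes all the output
reader=$!
exec 3> /tmp/results.fifo          # hold a write end open so the reader does not see EOF early

pids=()
for ip in 10.0.0.1 10.0.0.2 10.0.0.3; do   # placeholder addresses
    check "$ip" > /tmp/results.fifo &
    pids+=("$!")
done

wait "${pids[@]}"                  # wait only for the workers
exec 3>&-                          # close our write end so the reader sees EOF
wait "$reader"
rm /tmp/results.fifo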
Edit:
As asked in the comments, there might be a need to limit the number of background processes forked. In this case you can keep track of how many background processes you've started and communicate with them through a named pipe:
mkfifo tmp # creating named pipe
counter=0
while read ip
do
if [ $counter -lt 10 ]; then # we are under the limit
{ check "$ip"; echo 'done' > tmp; } &
((counter++))
else
read x < tmp # waiting for a process to finish
{ check "$ip"; echo 'done' > tmp; } &
fi
done < "$hosts"
cat tmp > /dev/null # let all the background processes end
rm tmp # remove fifo
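Alternatively, GNU xargs can do the 10-at-a-time limiting for you. A sketch, assuming check is the function from the question (it has to be exported so the child shells can see it); note that -a and -P are GNU extensions:
export -f check
xargs -a "$hosts" -n 1 -P 10 bash -c 'check "$1"' _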

You can start multiple processes, each calling the function check, and then wait for them to finish:
while read line
do
ip=$line
check &
done < $hosts
wait # wait for all child processes to finish
Whether this increases the speed depends on available processors and the function check's implementation. You have to ensure there's no data dependency in check between iterations.
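Purely for illustration (the real check is whatever your script defines), a version that takes the address as an argument avoids relying on the global ip variable:
# hypothetical check: one ping per host, result tagged with the address
check() {
    local ip=$1
    if ping -c 1 -W 2 "$ip" > /dev/null 2>&1; then
        echo "$ip is up"
    else
        echo "$ip is down"
    fi
}

while read -r ip
do
    check "$ip" &
done < "$hosts"
wait   # wait for all child processes to finish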

Use GNU Parallel:
parallel check :::: "$hosts"
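Here :::: tells Parallel to read the arguments from the file that $hosts names (with ::: the filename itself would be passed as the argument). Assuming check is a shell function and you want at most 10 hosts at a time, a slightly fuller sketch:
export -f check                     # make the function visible to the shells Parallel starts
parallel -j 10 check :::: "$hosts"  # -j 10 caps it at 10 concurrent jobs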

Related

Fill up four slots of parallel processes constantly even when some finish

I have a script that runs batches of 4 processes at a time; I don't care about getting the return codes of each proc. I never want to run more than 4 procs concurrently. The issue with the approach below is that it does not keep 4 procs running at a time. For example, if proc2 and proc3 finish early, I would like proc5 and proc6 to start, rather than starting only once procs 1-4 are all complete. How can I achieve this in bash?
run_func_1 &
run_func_2 &
run_func_3 &
run_func_4 &
wait
run_func_5 &
run_func_6 &
run_func_7 &
run_func_8 &
wait
I tried a custom implementation with a pool of workers and a queue of jobs.
A worker takes the next job from the queue as soon as it finishes the previous one.
You can probably adapt this script to whatever you need, but I hope you can see my intentions.
Here's the script:
#!/bin/bash
f1() { echo Started f1; sleep 10; echo Finished f1; }
f2() { echo Started f2; sleep 8; echo Finished f2; }
f3() { echo Started f3; sleep 12; echo Finished f3; }
f4() { echo Started f4; sleep 14; echo Finished f4; }
f5() { echo Started f5; sleep 7; echo Finished f5; }
declare -r MAX_WORKERS=2
declare -a worker_pids
declare -a jobs=('f1' 'f2' 'f3' 'f4' 'f5')
available_worker_index() {
# If number of workers is less than MAX_WORKERS
# We still have workers that are idle
declare worker_count="${#worker_pids[@]}"
if [[ $worker_count -lt $MAX_WORKERS ]]; then
echo "$worker_count"
return 0
fi
# If we reached this code it means
# All workers are already created and executing a job
# We should check which of them finished and return its index as available
declare -i index=0
for pid in "${worker_pids[@]}"; do
is_running=$(ps -p "$pid" > /dev/null; echo "$?")
if [[ $is_running != 0 ]]; then
echo "$index"
return 0
fi
index+=1
done
echo "None"
}
for job in "${jobs[#]}"; do
declare worker_index
worker_index=$(available_worker_index)
while [[ $worker_index == "None" ]]; do
# Wait for available worker
sleep 3
worker_index=$(available_worker_index)
done
# Run the job in background
"$job" &
# Save its pid for later
pid="$!"
worker_pids["$worker_index"]="$pid"
done
# Wait all workers to finish
wait
You can easily change the size of the worker pool just by changing the MAX_WORKERS variable.
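As a side note, on bash 4.3 and newer the ps polling plus sleep can be avoided entirely with wait -n, which blocks until any one background job exits. A minimal sketch of the same pool idea, reusing the f1..f5 functions and MAX_WORKERS from the script above:
running=0
for job in f1 f2 f3 f4 f5; do
    "$job" &
    (( ++running ))
    if (( running >= MAX_WORKERS )); then
        wait -n            # returns as soon as any one job finishes
        (( running-- ))
    fi
done
wait                       # wait for the remaining jobs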
With GNU Parallel it is as simple as:
parallel -j4 ::: run_func_{1..8}
Just remember to export -f the functions.
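For example (the function bodies here are just placeholders):
run_func_1() { echo "started 1"; sleep 2; echo "done 1"; }
run_func_2() { echo "started 2"; sleep 3; echo "done 2"; }
# ...define run_func_3 .. run_func_8 the same way...
export -f run_func_1 run_func_2            # export every function you call
parallel -j4 ::: run_func_1 run_func_2     # with all eight defined: parallel -j4 ::: run_func_{1..8}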
If GNU Parallel is not installed, use
parallel --embed > new_script
to generate a shell script that embeds GNU Parallel. You then simply change the end of new_script.
By default it will run one job per cpu-core. This can be adjusted with --jobs.
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
Installation
For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Linux: launch a specific action when another process is terminated

I have a script foo.sh that launches 5 processes of bfoo.sh in the background like this:
for i in {1..5}
do
./bfoo.sh &
done
wait
echo ok
and I use it like this:
./foo.sh
In foo.sh, after the for loop, I want to do something for each bfoo.sh process that terminates, e.g.
echo $PID_Terminated
To achieve this, you need to store the PID of each background bfoo.sh process. $! contains the process ID of the job most recently placed in the background by the shell. We append the PIDs one at a time to an array and iterate over it later.
Remember this waits for your background processes one after the other, since you wait on each process ID separately.
#!/usr/bin/env bash
pidArray=()
for i in {1..5}; do
./bfoo.sh &
pidArray+=( "$!" )
done
Now wait on each of the processes in a loop:
for pid in "${pidArray[@]}"; do
wait "$pid"
printf 'process-id: %d finished with code %d\n' "$pid" "$?"
done
I have additionally added the exit code ($?) of each background process when it finishes, so that any abnormal exit can be debugged.
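If instead you want to react to whichever bfoo.sh exits first, rather than waiting in array order, newer bash can do that too. A sketch, assuming bash 5.1+ (wait -n alone only needs 4.3; the -p option needs 5.1):
for i in {1..5}; do
    ./bfoo.sh &
done

for i in {1..5}; do
    wait -n -p finished_pid        # blocks until any remaining child exits
    printf 'process-id: %d finished with code %d\n' "$finished_pid" "$?"
done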

Delaying not preventing Bash function from simultaneous execution

I need to block the simultaneous calling of highCpuFunction function. I have tried to create a blocking mechanism, but it is not working. How can I do this?
nameOftheScript="$(basename $0)"
pidOftheScript="$$"
highCpuFunction()
{
# Function with code causing high CPU usage. Like tar, zip, etc.
while [ -f /tmp/"$nameOftheScript"* ];
do
sleep 5;
done
touch /tmp/"$nameOftheScript"_"$pidOftheScript"
echo "$(date +%s) I am a Bad function you do not want to call me simultaniously..."
# Real high CPU usage code for reaching the database and
# parsing logs. It takes the heck out of the CPU.
rm -rf /tmp/"$nameOftheScript"_"$pidOftheScript" 2>/dev/null
}
while true
do
sleep 2
highCpuFunction
done
# The rest of the code...
In short, I want to run highCpuFunction with a gap of at least 5 seconds, regardless of the instance/user/terminal. I need to allow other users to run this function, but in proper sequence and with a gap of at least 5 seconds.
Use the flock tool. Consider this code (let's call it 'onlyoneofme.sh'):
#!/bin/sh
exec 9>/var/lock/myexclusivelock
flock 9
echo start
sleep 10
echo stop
It will open the file /var/lock/myexclusivelock on descriptor 9 and then try to lock it exclusively. Only one instance of the script will be allowed past the flock 9 command. The rest of them will wait for the other script to finish (so the descriptor is closed and the lock freed). After this, the next script will acquire the lock and execute, and so on.
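flock can also fail fast or give up after a timeout instead of blocking, and it has a one-shot form that avoids the exec/file-descriptor bookkeeping. A few variations, assuming the same lock file and descriptor 9 as above:
# skip the work entirely if another instance already holds the lock
flock -n 9 || { echo "another instance is running"; exit 1; }

# or give up after waiting 30 seconds
flock -w 30 9 || { echo "timed out waiting for the lock"; exit 1; }

# one-liner form: run a whole command under the lock
flock /var/lock/myexclusivelock -c 'echo start; sleep 10; echo stop'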
In the following solution the # rest of the script part can be executed only by one process at a time. The test-and-set is atomic, so there is no race condition, whereas with test -f file ... touch file two processes could both pass the test and touch the file.
try_acquire_lock() {
local lock_file=$1
# Noclobber option to fail if the file already exists
# in a sub-shell to avoid modifying current shell options
( set -o noclobber; : >"$lock_file")
}
# All instances must share the same lock file, so the PID is not part of its name
lock_file=/tmp/"$nameOftheScript".lock

while ! try_acquire_lock "$lock_file"
do
echo "failed to acquire lock, sleeping 5sec.."
sleep 5
done

# Trap to remove the file when the process exits (set once the lock is ours)
trap 'rm -f "$lock_file"' EXIT

# The rest of the script
It's not optimal, because the waiting is done in a loop with sleep. To improve it, one can use inter-process communication (a FIFO), or operating-system notifications or signals, for example:
# Block current shell process
kill -STOP $BASHPID
# Unblock blocked shell process (where <pid> is the id of the blocked process)
kill -CONT <pid>
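A rough sketch of the FIFO idea (the path is made up; this only shows the handoff between two instances, not a complete lock):
fifo=/tmp/"$nameOftheScript".fifo
[ -p "$fifo" ] || mkfifo "$fifo"

# waiting side: replaces the sleep-5 polling; blocks until a token arrives
read -r _ < "$fifo"

# releasing side: run when leaving the critical section; the write blocks
# until some waiter opens the FIFO for reading, hence the trailing &
echo done > "$fifo" &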

Run while loop until background process finishes its job

In a Linux shell script I have:
vlc /some/file/path.mkv &
wait
The wait call blocks until my background process returns. But I want to run a loop that continuously prints some data while the background process is running, and break out of that loop whenever the background process returns/exits.
How can I do that in a shell script?
See if this works for you
vlc /some/file/path.mkv &
while [[ -n $(jobs -r) ]]; do echo -n "some data"; sleep 1; done
jobs -r lists the running jobs, so the loop keeps printing while the vlc process is running and stops when it is done.
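An equivalent approach that does not rely on job control is to remember the PID and probe it with kill -0:
vlc /some/file/path.mkv &
vlc_pid=$!

# kill -0 sends no signal; it only checks whether the process still exists
while kill -0 "$vlc_pid" 2>/dev/null; do
    echo -n "some data"
    sleep 1
done
wait "$vlc_pid"    # collect the exit status once the loop ends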

Launch the same program with different arguments in parallel via bash

I have a program with very long computation times. I need to call it with different arguments. I want to run the calls on a server with a lot of processors, so I'd like to launch them in parallel in order to save time. (One program instance uses only one processor.)
I have tried my best to write a bash script which looks like this:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5
# fill the PID array with nonsense pid numbers
for (( PAR=1; PAR<=MAXPAR; PAR++ ))
do
PID[$PAR]=-18
done
# loop over the arguments
for ARG in 50 60 70 90
do
# endless loop that checks, if one of the parallel jobs has finished
while true
do
# check if PID[PAR] is still running, suppress error output of kill
if ! kill -0 ${PID[PAR]} 2> /dev/null
then
# if PID[PAR] is not running, the next job
# can run as parellel job number PAR
break
fi
# if it is still running, check the next parallel job
if [ $PAR -eq $MAXPAR ]
then
PAR=1
else
PAR=$[$PAR+1]
fi
# but sleep 10 seconds before going on
sleep 10
done
# call to the actual program (here sleep for example)
#./complicated_program $ARG &
sleep $ARG &
# get the pid of the process we just started and save it as PID[PAR]
PID[$PAR]=$!
# give some output, so we know where we are
echo ARG=$ARG, par=$PAR, pid=${PID[PAR]}
done
Now, this script works, but I don't quite like it.
Is there any better way to deal with the beginning? (Setting PID[*]=-18 looks wrong to me)
How do I wait for the first job to finish without the ugly infinite loop and sleeping some seconds? I know there is wait, but I'm not sure how to use it here.
I'd be grateful for any comments on how to improve style and conciseness.
I have much more complicated code that, more or less, does the same thing.
The things you need to consider:
Does the user need to approve the spawning of a new thread
Does the user need to approve the killing of an old thread
Does the thread terminate on its own or does it need to be killed
Does the user want the script to run endlessly, as long as it has MAXPAR threads
If so, does the user need an escape sequence to stop further spawning
Here is some code for you:
spawn() #function that spawns a thread
{ #usage: spawn 1 ls -l
i=$1 #save the thread index
shift 1 #shift arguments to the left
[ "${thread[$i]:-0}" -eq 0 ] && #if the thread is not already running
[ "${#thread[@]}" -lt "$threads" ] && #and if we didn't reach the maximum number of threads,
"$@" & #run the thread in the background, with all the arguments
thread[$i]=$! #associate thread id with thread index
}
terminate() #function that terminates threads
{ #usage: terminate 1
[ your condition ] && #if your condition is met,
kill "${thread[$1]}" && #kill the thread and, if that succeeds,
thread[$1]=0 #mark the thread as terminated
}
Now, the rest of the code depends on your needs (things to consider), so you will either loop through the input arguments and call spawn, and then after some time loop through the thread indexes and call terminate. Or, if the threads end on their own, loop through the input arguments and call both spawn and terminate, but the condition for terminate is then:
ps aux 2>/dev/null | grep -q " ${thread[$i]} "
# look for the thread id in the process list (note the spaces around the id)
Or something along those lines; you get the point.
Using the tips #theotherguy gave in the comments, I rewrote the script in a better way using the sem command that comes with GNU Parallel:
#!/bin/bash
# set maximal number of parallel jobs
MAXPAR=5
# loop over the arguments
for ARG in 50 60 70 90
do
# call to the actual program (here sleep for example)
# prefixed by sem -j $MAXPAR
#sem -j $MAXPAR ./complicated_program $ARG
sem -j $MAXPAR sleep $ARG
# give some output, so we know where we are
echo ARG=$ARG
done
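One detail worth keeping in mind: sem returns as soon as the job has been queued, so if the script must not exit before the last job has finished, end it with:
sem --wait   # block until all jobs started with sem have finished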
