Run Serial inside Parallel Bash - linux

I have added to my explanation a bit. Conceptually, I am running a script that processes a .txt file in a loop, calling shells that use each line's content as an input parameter. (FYI: "a" kicks off an execution and "b" monitors that execution.)
I need 1a and 1b to run first, in parallel, for the first two $param values.
Next, 2a and 2b need to run serially for those $params once step 1 is complete.
3a and 3b kick off once 2a and 2b are complete (serial or parallel is irrelevant).
The loop then continues with the next two lines from the input .txt.
I can't get the second step to run serially; everything runs in parallel. What I need is the following:
cat filename | while read line
do
    export param=$line
    ./script1a.sh "$param" > process.log && ./script1b.sh > monitor.log &&
    ## wait for processes to finish, running 2 in parallel in script1.sh
    ./script2a.sh "$param" > process2.log && ./script2b.sh > monitor2.log &&
    ## run each of the 2 in serial for script2.sh
    ./script3a.sh && ./script3b.sh
I tried adding in wait, and tried an if statement containing script2a.sh and script2b.sh that would run in serial, but to no avail.
    if (( ++i % 2 == 0 )); then wait; fi
done
# only run two lines at a time, then cycle back through the loop
How on earth can I get script2.sh to run serially after script1 runs in parallel?

Locking!
If you want to parallelize script1 and script3, but need all invocations of script2 to be serialized, continue to use:
./script1.sh && ./script2.sh && ./script3.sh &
...but modify script2 to grab a lock before it does anything else:
#!/bin/bash
exec 3>.lock2
flock -x 3
# ... continue with script2's business here.
Note that you must not delete the .lock2 file used here; otherwise you risk allowing multiple processes to believe they hold the lock concurrently.
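If you would rather not modify script2 at all, flock(1) also has a command form that takes the lock file and the command to run; every caller then queues on the same lock. A minimal self-contained demonstration of the serialization (here `critical` is a hypothetical stand-in for script2's body, and `.lock2` is just an arbitrary lock-file name):

```shell
#!/usr/bin/env bash
# Two background pipelines run concurrently, but the "critical" step is
# serialized through flock's command form on .lock2: the start/stop pairs
# written to serial.log never interleave.
critical() {
    echo start >> serial.log
    sleep 0.2
    echo stop >> serial.log
}
export -f critical            # so the flock-spawned bash can see it

rm -f serial.log .lock2
flock .lock2 bash -c critical &
flock .lock2 bash -c critical &
wait
cat serial.log                # start, stop, start, stop
```

The same pattern drops into the original pipeline as `./script1.sh && flock .lock2 ./script2.sh && ./script3.sh &`.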

You are not showing us how the lines you read from the file are being consumed.
If I understand your question correctly, you want to run script1 on two lines of filename, each in parallel, and then serially run script2 when both are done?
while read first; do
    echo "$first" | ./script1.sh &
    read second
    echo "$second" | ./script1.sh &
    wait
    script2.sh &   # optionally don't background here?
    script3.sh
done <filename &
The while loop contains two read statements, so each iteration reads two lines from filename and feeds each to a separate instance of script1. Then we wait until they both are done before we run script2. I background it so that script3 can start while it runs, and background the whole while loop; but you probably don't actually need to background the entire job by default (development will be much easier if you write it as a regular foreground job, then when it works, background the whole thing when you start it if you need to).
I can think of a number of variations on this depending on how you actually want your data to flow; here is an update in response to your recently updated question.
export param   # is this really necessary?
while read param; do
    # First instance
    ./script1a.sh "$param" > process.log && ./script1b.sh > monitor.log &
    # Second instance
    read param
    ./script2a.sh "$param" > process2.log && ./script2b.sh > monitor2.log &
    # Wait for both to finish
    wait
    ./script3a.sh && ./script3b.sh
done <filename
If this still doesn't help, maybe you should post a third question where you really explain what you actually want...

I am not 100% sure what you mean by your question, but now I think you mean something like this in your inner loop:
(
    # run script1 and script2 in parallel
    script1 &
    s1pid=$!
    # start no more than one script2, using GNU Parallel's sem as a mutex
    sem --fg script2
    # when they are both done...
    wait $s1pid
    # run script3
    script3
) &   # and do that lot in parallel with the previous/next loop iteration

@tripleee I put together the following, if interested. (Note: I changed some variable names for the post, so sorry if there are inconsistencies anywhere. Also, the exports have their reasons; I think there is a better way than exporting, but for now it works.)
cat input.txt | while read first; do
    export step=${first//\"/}
    export stepem=EM_${step//,/_}
    export steptd=TD_${step//,/_}
    export stepeg=EG_${step//,/_}
    echo "$step" | "$directory/ws_client.sh" processOptions "$appName" "$step" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$Folder/${stepem}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepem" > "$Folder/${stepem}_Monitor.log" &

    read second
    export step2=${second//\"/}
    export stepem2=ExecuteModel_${step2//,/_}
    export steptd2=TransferData_${step2//,/_}
    export stepeg2=ExecuteGeneology_${step2//,/_}
    echo "$step2" | "$directory/ws_client.sh" processOptions "$appName" "$step2" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$Folder/${stepem2}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepem2" > "$Folder/${stepem2}_Monitor.log" &
    wait

    "$directory/ws_client.sh" processOptions "$appName" "$step" "$layers" "" "" "$stage_final" "" TRUE > "$appLogFolder/${steptd}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$steptd" > "$Folder/${steptd}_Monitor.log" &&
        "$directory/ws_client.sh" processOptions "$appName" "$step2" "$layers" "" "" "$stage_final" "" TRUE > "$appLogFolder/${steptd2}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$steptd2" > "$Folder/${steptd2}_Monitor.log" &
    wait

    "$directory/ws_client.sh" processPaths "$appName" "$step" "$layers" "$genPath_01" > "$appLogFolder/${stepeg}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepeg" > "$Folder/${stepeg}_Monitor.log" &&
        "$directory/ws_client.sh" processPaths "$appName" "$step2" "$layers" "$genPath_01" > "$appLogFolder/${stepeg2}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepeg2" > "$Folder/${stepeg2}_Monitor.log" &
    wait

    if (( ++i % 2 == 0 )); then
        echo "Waiting..."
        wait
    fi
done

I understand your question as follows:
You have a list of models. These models need to be run, and after a model is run its result has to be transferred. The simple solution is:
run_model model1
transfer_result model1
run_model model2
transfer_result model2
But to make this go faster, we want to parallelize parts of it. Unfortunately, transfer_result cannot be parallelized.
run_model model1
run_model model2
transfer_result model1
transfer_result model2
model1 and model2 are read from a text file. run_model can be run in parallel, and you would like 2 of those running in parallel. transfer_result can only be run one at a time, and you can only transfer a result when it has been computed.
This can be done like this:
cat models.txt | parallel -j2 'run_model {} && sem --id transfer transfer_model {}'
run_model {} && sem --id transfer transfer_model {} will run one model and, if it succeeds, transfer it. The transfer will only start when no other transfer is running.
parallel -j2 will run two of these jobs in parallel.
If a transfer takes less time than computing a model, there should be no surprises: each transfer will at most be swapped with the next one. If a transfer takes longer than running a model, you might see models transferred completely out of order (e.g. the transfer of job 10 before the transfer of job 2). But they will all be transferred eventually.
You can see the execution sequence exemplified with this:
seq 10 | parallel -uj2 'echo ran model {} && sem --id transfer "sleep .{};echo transferred {}"'
This solution is better than the wait-based solution because you can run model3 while model1 and model2 are being transferred.

Related

Write output of subprocess launched by a `screen` CLI Command to a log file?

I am launching a bunch of instances of the same script (generate_records.php) into screens. I am doing this to easily parallelize the processes. I would like to write the output of each of the PHP processes to a log file, using something like &> log_$i (stdout and stderr).
My shell scripting is weak sauce, and I can't get the syntax correct. I keep getting the output of the screen, which is empty.
Example: launch_processes_in_screens.sh
max_record_id=300000000
# number of parallel processors to run
total_processors=10
# max staging companies per processor
(( num_records_per_processor = $max_record_id / $total_processors ))
i=0
while [ $i -lt $total_processors ]
do
    (( starting_id = $i * $num_records_per_processor + 1 ))
    (( ending_id = $starting_id + $num_records_per_processor - 1 ))
    printf "\n - Starting processor #%s starting at ID:%s and ending at ID: %s" "$i" "$starting_id" "$ending_id"
    screen -d -m -S "process_$i" php generate_records.php "$starting_id" "$num_records_per_processor" "FALSE"
    ((i++))
done
If the only reason you're using screen is to launch many processes in parallel, you can avoid it entirely and use & to start them in the background:
php generate_records.php "$starting_id" "$num_records_per_processor" FALSE &
You may also be able to remove some code by using parallel.
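As a sketch of the screen-free version, each iteration of the loop can background its job with stdout and stderr redirected to a per-processor log, then wait for all of them at the end. Here a stub `worker` function stands in for the `php generate_records.php …` command from the question:

```shell
#!/usr/bin/env bash
# Replace `screen -d -m -S ...` with a plain background job whose
# combined stdout+stderr land in log_$i.
worker() { echo "processing from $1"; }   # stand-in for the PHP script

total_processors=3
for ((i = 0; i < total_processors; i++)); do
    worker "$i" &> "log_$i" &             # &> captures stdout and stderr
done
wait   # block until every background worker has finished
```

After the loop, log_0, log_1, … each hold one worker's output, which is exactly what the `&> log_$i` idea in the question was after.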

Bash, run multiple commands simultaneously, wait for N to finish before spawning more

Alright, so I've tried GNU parallel, and there are some quirks about getting it to work that make it impossible for me to use.
Ultimately I'd love to just be able to do something like this:
for modelplot in /${rfolder}/${mplotsfolder}/${mname}/$mscript.gs
do
    for regionplot in $mregions
    do
        opengrads -bclx "${modelplot}.gs $regionplot ${plotimage} ${date} ${run} ${gribfile} ${modelplot}" && wait -n
    done
done
But I can't seem to find a way to limit the spawning of background processes to a specific number. Someone mentioned doing:
for i in {1..10}; do echo "${i}" & (( count++ > 5 )) && wait -n; done
That should do it, but I can't really verify whether it is working that way. It seems like it just spawns them all instantly. I'm assuming the output of that in the terminal should be: echo 1, echo 2, echo 3, echo 4, echo 5; then echo 6, echo 7, echo 8, echo 9, echo 10.
I'm just trying to spawn, say 5, iterations of a loop, wait for those to finish and then spawn the next 5, wait for those to finish, spawn the next 5, etc until the loop is done.
Each time you start a background job, increment a count. When that count reaches 5 (or whatever), wait for all background jobs to complete, then reset the count to 0 and resume starting background jobs.
p_count=0
for modelplot in /${rfolder}/${mplotsfolder}/${mname}/$mscript.gs; do
    for regionplot in $mregions; do
        opengrads -bclx "${modelplot}.gs $regionplot ${plotimage} ${date} ${run} ${gribfile} ${modelplot}" &
        if (( ++p_count == 5 )); then
            wait
            p_count=0
        fi
    done
done
It is surprisingly tricky to keep exactly 5, rather than at most 5, jobs running in the background in shell. (wait -n lets you know when a job has finished, but not how many have finished.) To keep the machine busy, a tool like xargs or parallel is more appropriate.
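For instance, xargs with -P keeps exactly 5 workers busy for as long as input remains: the moment one job exits, the next one starts, with no batch boundaries. A minimal sketch, with a trivial echo standing in for the opengrads invocation:

```shell
#!/usr/bin/env bash
# -n1 passes one line per invocation, -P5 keeps 5 of them running at once.
# With `sh -c '...'`, the argument appended by xargs becomes $0.
seq 10 | xargs -n1 -P5 sh -c 'echo "plotting $0"'
```

Note the output order is nondeterministic, since whichever of the 5 concurrent jobs finishes first prints first.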
From your comments it is pretty unclear what you want.
But using the {= =} construct you can get almost anything in the arguments.
Append .gs only on the first run:
parallel echo {}'{= if($job->seq() == 1) { $_ = ".gs" } else { $_="" } =}' ::: a b c
Append .gs only on the last run:
parallel echo {}'{= if($job->seq() == $Global::JobQueue->total_jobs() ) { $_ = ".gs" } else { $_="" } =}' ::: a b c
Disregarding the comments and looking only at the loop in the question the solution is:
parallel --header : opengrads -bclx "{modelplot}.gs {regionplot} ${plotimage} ${date} ${run} ${gribfile} {modelplot}" ::: modelplot /${rfolder}/${mplotsfolder}/${mname}/$mscript.gs ::: regionplot $mregions

Wait inside loop bash

I have created a loop to read through a .txt and execute another shell using each line as input.
PROBLEM: I need the loop to execute the first two lines from the txt in parallel, wait for them to finish, then execute the next two lines in parallel.
ATTEMPT AT SOLUTION: I thought of adding in a wait command, just not sure how to structure so it waits for every two lines, as opposed to each churn of the loop.
My current loop:
cat input.txt | while read line; do
    export step=${line//\"/}
    export step=ExecuteModel_${step//,/_}
    export pov=$line
    "$owsdirectory/hpm_ws_client.sh" processCalcScriptOptions "$appName" "$pov" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$appLogFolder/${step}_ProcessID.log"
    /app/dev2/batch/hpcm/shellexe/rate_tool2/model_automation/check_process_status.sh "$appLogFolder" "$step" > "$appLogFolder/${step}_Monitor.log"
done
Input txt:
SEQ010,FY15
SEQ010,FY16
SEQ020,FY15
SEQ020,FY16
SEQ030,FY15
SEQ030,FY16
SEQ030,FY15
SEQ030,FY16
SEQ040,FY15
SEQ040,FY16
SEQ050,FY15
SEQ050,FY16
Normally you'd use sem, xargs or parallel to parallelize a loop, but all these tools optimize throughput by always having 2 (or N) jobs running in parallel, and starting new ones as old ones finish.
To instead run pairs of jobs and wait for both to finish before considering starting more, you can just run them in the background and keep a counter to wait every N iterations:
printf "%s\n" {1..10} | while IFS= read -r line
do
    echo "Starting command"
    sleep 2 &
    if (( ++i % 2 == 0 ))
    then
        echo "Waiting..."
        wait
    fi
done
Output:
Starting command
Starting command
Waiting...
Starting command
Starting command
Waiting...
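The same two-at-a-time behaviour can also be written without a counter, by reading both lines of a pair up front and waiting on exactly that pair. A self-contained sketch, where a stub `job` function stands in for the real worker script and the input lines mirror the question's .txt:

```shell
#!/usr/bin/env bash
# Read two lines per iteration; launch a job for each, wait for the pair.
printf '%s\n' SEQ010,FY15 SEQ010,FY16 SEQ020,FY15 SEQ020,FY16 > input.txt
job() { echo "ran $1" >> pairs.log; }   # stand-in for the real script

rm -f pairs.log
while IFS= read -r first && IFS= read -r second; do
    job "$first"  & p1=$!
    job "$second" & p2=$!
    wait "$p1" "$p2"    # block until exactly this pair has finished
done < input.txt
```

Waiting on the two recorded PIDs rather than a bare `wait` means any unrelated background jobs the script may have are left alone.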

Whiptail Gauge: Variable in loop not being set

I'm new to bash and whiptail, so excuse the ignorance.
When assigning a variable in the for loop, the new value of 20 is never set when using a whiptail dialog. Any suggestions why?
andy="10"
{
    for ((i = 0 ; i <= 100 ; i += 50)); do
        andy="20"
        echo $i
        sleep 1
    done
} | whiptail --gauge "Please wait" 5 50 0
echo "My val $andy"
A command inside a pipeline (that is, a series of commands separated by |) is always executed in a subshell, which means that each command has its own variable environment. The same is true of the commands inside the compound command (…), but not the compound command {…}, which can normally be used for grouping without creating a subshell.
In bash or zsh, you can solve this problem using process substitution instead of a pipeline. For example:
andy="10"
for ((i = 0 ; i <= 100 ; i += 50)); do
    andy="20"
    echo $i
    sleep 1
done > >(whiptail --gauge "Please wait" 6 50 0)
echo "My val $andy"
>(whiptail ...) causes a subshell to be created to execute whiptail; the entire expression is substituted by a filename connected to that subshell's standard input (on Linux it will be something like /dev/fd/63, but it could be a FIFO on other OSs). > >(...) redirects standard output to the subshell's standard input; the first > is just a normal stdout redirection.
The statements inside {} are not ordinarily executed in a subshell. However, when you attach a pipe (|) to the group, it is executed in a subshell.
If you remove the pipe to whiptail, you will see the updated value of andy.
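The scoping difference is easy to demonstrate in isolation, with `cat` standing in for whiptail:

```shell
#!/usr/bin/env bash
andy=10
{ andy=20; echo progress; } | cat >/dev/null    # {} is the left side of a
echo "after pipeline: $andy"                    # pipeline: subshell, still 10

andy=10
{ andy=20; echo progress; } > >(cat >/dev/null) # process substitution: {}
echo "after procsub: $andy"                     # runs in this shell, now 20
```

Only the reader (`cat`) lives in a separate process in the second form, so the assignment inside the braces survives.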

Bash: Split stdout from multiple concurrent commands into columns

I am running multiple commands in a bash script using single ampersands like so:
commandA & commandB & commandC
They each have their own stdout output but they are all mixed together and flood the console in an incoherent mess.
I'm wondering if there is an easy way to pipe their outputs into their own columns... using the column command or something similar. ie. something like:
commandA | column -1 & commandB | column -2 & commandC | column -3
New to this kind of thing, but from initial digging it seems something like pr might be the ticket? or the column command...?
Regrettably answering my own question.
None of the supplied solutions were exactly what I was looking for. So I developed my own command line utility: multiview. Maybe others will benefit?
It works by piping processes' stdout/stderr to a command interface and then by launching a "viewer" to see their outputs in columns:
fooProcess | multiview -s & \
barProcess | multiview -s & \
bazProcess | multiview -s & \
multiview
This will display a neatly organized column view of their outputs. You can name each process as well by adding a string after the -s flag:
fooProcess | multiview -s "foo" & \
barProcess | multiview -s "bar" & \
bazProcess | multiview -s "baz" & \
multiview
There are a few other options, but that's the gist of it.
Hope this helps!
pr is a solution, but not a perfect one. Consider this, which uses process substitution (<(command) syntax):
pr -m -t <(while true; do echo 12; sleep 1; done) \
<(while true; do echo 34; sleep 2; done)
This produces a marching column of the following:
12 34
12 34
12 34
12 34
Though this trivially provides the output you want, the columns do not advance individually—they advance together when all files have provided the same output. This is tricky, because in theory the first column should produce twice as much output as the second one.
You may want to investigate invoking tmux or screen in a tiled mode to allow the columns to scroll separately. A terminal multiplexer will provide the necessary machinery to buffer output and scroll it independently, which is important when showing output side-by-side without allowing excessive output from commandB to scroll commandA and commandC off-screen. Remember that scrolling each column separately will require a lot of screen redrawing, and the only way to avoid screen redraws is to have all three columns produce output simultaneously.
As a last-ditch solution, consider piping each output to a command that indents each column by a different number of characters:

this is something that commandA outputs and is
                    and here is something that commandB outputs
interleaved with the other output, but visually
you might have an easier time distinguishing one
                                        here is something that commandC outputs
                    which is also interleaved with the others
from the other
This script prints out three vertical columns, each containing the output from a single script, plus a timer.
Comment on anything you don't understand and I'll add to my answer as needed.
Hope this helps :)
#!/bin/bash
# Script by jidder

count=0
Elapsed=0

control_c()
{
    tput rmcup
    rm -f tail.tmp tail2.tmp tail3.tmp
    stty sane
    exit 1
}

Draw()
{
    tput clear
    echo "SCRIPT 1    Elapsed time = $Elapsed seconds"
    echo "------------------------------------------------------------------------------------------------------------------------------------------------------"
    tail -n10 tail.tmp
    tput cup 25 0
    echo "Script 2"
    echo "------------------------------------------------------------------------------------------------------------------------------------------------------"
    tail -n10 tail2.tmp
    tput cup 50 0
    echo "Script 3"
    echo "------------------------------------------------------------------------------------------------------------------------------------------------------"
    tail -n10 tail3.tmp
}

Timer()
{
    if [[ $count -eq 10 ]]; then
        Draw
        (( Elapsed = Elapsed + 1 ))
        count=0
    fi
}

main()
{
    stty -icanon time 0 min 0
    tput smcup
    Draw
    count=0
    keypress=''
    MYSCRIPT1.sh > tail.tmp &
    MYSCRIPT2.sh > tail2.tmp &
    MYSCRIPT3.sh > tail3.tmp &
    while [ "$keypress" != "q" ]; do
        sleep 0.1
        read keypress
        (( count = count + 2 ))
        Timer
    done
    stty sane
    tput rmcup
    rm tail.tmp tail2.tmp tail3.tmp
    echo "Thanks for using this script."
    exit 0
}

# Install the handler before main runs, so Ctrl-C cleans up properly.
trap control_c SIGINT
main
