I have created a loop to read through a .txt and execute another shell script using each line as input.
PROBLEM: I need the loop to execute the first two lines from the txt in parallel, wait for them to finish, then execute the next two lines in parallel.
ATTEMPT AT SOLUTION: I thought of adding a wait command; I'm just not sure how to structure it so that it waits after every two lines, as opposed to after each iteration of the loop.
My current loop:
cat input.txt | while IFS= read -r line; do
    export step=${line//\"/}
    export step=ExecuteModel_${step//,/_}
    export pov=$line
    "$owsdirectory/hpm_ws_client.sh" processCalcScriptOptions "$appName" "$pov" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$appLogFolder/${step}_ProcessID.log"
    /app/dev2/batch/hpcm/shellexe/rate_tool2/model_automation/check_process_status.sh "$appLogFolder" "$step" > "$appLogFolder/${step}_Monitor.log"
done
Input txt:
SEQ010,FY15
SEQ010,FY16
SEQ020,FY15
SEQ020,FY16
SEQ030,FY15
SEQ030,FY16
SEQ030,FY15
SEQ030,FY16
SEQ040,FY15
SEQ040,FY16
SEQ050,FY15
SEQ050,FY16
Normally you'd use sem, xargs or parallel to parallelize a loop, but all these tools optimize throughput by always having 2 (or N) jobs running in parallel, and starting new ones as old ones finish.
To instead run pairs of jobs and wait for both to finish before considering starting more, you can just run them in the background and keep a counter to wait every N iterations:
printf "%s\n" {1..10} | while IFS= read -r line
do
echo "Starting command"
sleep 2 &
if (( ++i % 2 == 0 ))
then
echo "Waiting..."
wait
fi
done
Output:
Starting command
Starting command
Waiting...
Starting command
Starting command
Waiting...
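Applied to the loop in the question, a minimal sketch might look like the following (it reuses the variables and paths from the question and is untested against that environment, so treat it as a starting point rather than a drop-in replacement). Each submit+monitor pair is backgrounded as one unit, and the loop waits after every second line:
i=0
while IFS= read -r line; do
    export step=${line//\"/}
    export step=ExecuteModel_${step//,/_}
    export pov=$line
    {
        # submit the model, then monitor it, as one background unit
        "$owsdirectory/hpm_ws_client.sh" processCalcScriptOptions "$appName" "$pov" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$appLogFolder/${step}_ProcessID.log"
        /app/dev2/batch/hpcm/shellexe/rate_tool2/model_automation/check_process_status.sh "$appLogFolder" "$step" > "$appLogFolder/${step}_Monitor.log"
    } &
    # after every second line, wait for both background units to finish
    if (( ++i % 2 == 0 )); then
        wait
    fi
done < input.txt
wait   # in case the file has an odd number of lines
Reading with done < input.txt instead of cat input.txt | while ... keeps the loop in the current shell, so the counter (and the exported variables) survive after the loop.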
Related
I have a DB error log file, and it grows continuously.
Now I want to set up error monitoring on that file every 5 minutes.
The problem is that I don't want to scan the whole file every 5 minutes (when the monitoring cron job runs), because it may grow very big in the future. Scanning the whole (big) file every 5 minutes would consume more resources than necessary.
So I just want to scan only the lines that were inserted/written to the log during the last 5-minute interval.
Each error recorded in the log has a timestamp prepended to it, like below:
180418 23:45:00 [ERROR] mysql got signal 11.
So I want to search for the pattern [ERROR] only in the lines added during the last 5 minutes (not the whole file) and place the output in another file.
Please help me here.
Feel free to ask if you need more clarification on my question.
I'm using RHEL 7 and I'm trying to implement the above monitoring through a bash shell script.
Serializing the Byte Offset
This picks up where the last instance left off. If you run it every 5 minutes, each run scans roughly 5 minutes of data.
Note that this implementation knowingly can scan data added during an invocation's run twice. This is a little sloppy, but it's much safer to scan overlapping data twice than to never read it at all, which is a risk you run if you rely on cron to start your program exactly on schedule (likewise, sleeps can overrun the requested time if the system is busy).
#!/usr/bin/env bash
file=$1; shift                     # first input: filename
grep_opts=( "$@" )                 # remaining inputs: grep options

dir=$(dirname -- "$file")          # extract directory name to use for offset storage
basename=${file##*/}               # pick up file name w/o directory
size_file="$dir/.$basename.size"   # generate filename to use to store offset

if [[ -s $size_file ]]; then       # ...if we already have a file with an offset...
    old_size=$(<"$size_file")      # ...read it from that file
else
    old_size=0                     # ...otherwise start at the front.
fi

new_size=$(stat --format=%s -- "$file") || exit  # Figure out current size
if (( new_size < old_size )); then
    old_size=0                     # file was truncated, so we can't trust old_size
elif (( new_size == old_size )); then
    exit 0                         # no new contents, so no point in trying to search
fi

# read starting at old_size and grep only that content
dd iflag=skip_bytes skip="$old_size" if="$file" | grep "${grep_opts[@]}"
pipe_status=( "${PIPESTATUS[@]}" ); grep_retval=${pipe_status[1]}

# if the read failed, don't store an updated offset
(( pipe_status[0] != 0 )) && exit 1

# create a new tempfile to store offset in
tempfile=$(mktemp -- "${size_file}.XXXXXX") || exit

# write to that temporary file...
printf '%s\n' "$new_size" > "$tempfile" || { rm -f "$tempfile"; exit 1; }

# ...and if that write succeeded, rename it over the old offset file.
mv -- "$tempfile" "$size_file" || exit
exit "$grep_retval"
Alternate Mode: Bisect For The Timestamp
Note that this can miss content if you're relying on, say, cron to invoke your code every 5 minutes on-the-dot; storing byte offsets can thus be more accurate.
Using the bsearch tool by Ole Tange:
#!/usr/bin/env bash
file=$1; shift
start_date=$(date -d 'now - 5 minutes' '+%y%m%d %H:%M:%S')
byte_offset=$(bsearch --byte-offset "$file" "$start_date")
dd iflag=skip_bytes skip="$byte_offset" if="$file" | grep "$@"
Another approach could be something like this:
DB_FILE="FULL_PATH_TO_YOUR_DB_FILE"
current_db_size=$(du -b "$DB_FILE" | cut -f 1)

if [[ ! -a SOME_PATH_OF_YOUR_CHOICE/last_size_db_file ]] ; then
    tail --bytes "$current_db_size" "$DB_FILE" > SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S)
else
    if [[ $(cat SOME_PATH_OF_YOUR_CHOICE/last_size_db_file) -gt $current_db_size ]] ; then
        previously_read_bytes=0
    else
        previously_read_bytes=$(cat SOME_PATH_OF_YOUR_CHOICE/last_size_db_file)
    fi
    new_bytes=$(( current_db_size - previously_read_bytes ))
    tail --bytes "$new_bytes" "$DB_FILE" > SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S)
fi

printf '%s' "$current_db_size" > SOME_PATH_OF_YOUR_CHOICE/last_size_db_file
This prints all bytes of DB_FILE that have not previously been printed to SOME_PATH_OF_YOUR_CHOICE/log-file_$(date +%Y-%m-%d_%H-%M-%S).
Note that $(date +%Y-%m-%d_%H-%M-%S) will be the current 'full' date at the time the log file is created.
You can make this a script and use cron to execute it every five minutes; something like this:
*/5 * * * * PATH_TO_YOUR_SCRIPT
Here is my approach:
First, read the whole log written so far, once.
When you reach the end, collect and read new lines for a timespan (in my example 9 seconds, for faster testing, while my dummy server appends to the logfile every 3 seconds).
After the timespan, echo the cache, clear the cache (an array arr), then loop and sleep for some time, so that this process doesn't consume all CPU time.
First, my dummy logfile writer:
#!/bin/bash
#
# dummy logfile writer
#
while true
do
    s=$(( $(date +%s) % 3600 ))
    echo $s server msg
    sleep 3
done >> seconds.log
Started via ./seconds-out.sh &.
Now the more complicated part:
#!/bin/bash
#
# consume a logfile as written so far. Then, collect every new line
# and show it in an interval of $interval
#
interval=9 # 9 seconds
#
printf -v secnow '%(%s)T' -1
start=$(( secnow % (3600*24*365) ))

declare -a arr
init=0   # 0 = still consuming the backlog, 1 = caught up

while true
do
    read -r line
    printf -v secnow '%(%s)T' -1
    now=$(( secnow % (3600*24*365) ))

    # consume every line created in the past
    if (( ! init ))
    then
        # assume reading a line might not take longer than a second (rounded to whole seconds)
        while (( ${#line} > 0 && (now - start) < 2 ))
        do
            read -r line
            start=$now
            echo -n "."                    # for debugging purposes, remove
            printf -v secnow '%(%s)T' -1
            now=$(( secnow % (3600*24*365) ))
        done
        init=1
        echo "init=$init"                  # for debugging purposes, remove
    # collect new lines, display them every $interval seconds
    else
        if (( ${#line} > 0 ))
        then
            echo -n "-"                    # for debugging purposes, remove
            arr+=("read: $line \n")
        fi
        if (( (now - start) > interval ))
        then
            echo -e "${arr[@]}"
            arr=()
            start=$now
        fi
    fi
    sleep .1
done < seconds.log
Output with logfile generator in 3 seconds, running for some time, then starting the read-seconds.sh script, with debugging output activated:
./read-seconds.sh
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................init=1
---read: 1688 server msg
read: 1691 server msg
read: 1694 server msg
---read: 1697 server msg
read: 1700 server msg
read: 1703 server msg
----read: 1706 server msg
read: 1709 server msg
read: 1712 server msg
read: 1715 server msg
^C
Every dot represents a logfile line from the past that was therefore skipped.
Every dash represents a logfile line that was collected.
I am launching a bunch of instances of the same script (generate_records.php) into screens. I am doing this to easily parallelize the processes. I would like to write the output of each of the PHP processes to a log file using something like &> log_$i (stdout and stderr).
My shell scripting is weak sauce, and I can't get the syntax right. I keep getting the output of the screen, which is empty.
Example: launch_processes_in_screens.sh
max_record_id=300000000
# number of parallel processors to run
total_processors=10
# max staging companies per processor
(( num_records_per_processor = $max_record_id / $total_processors ))
i=0
while [ $i -lt $total_processors ]
do
    (( starting_id = $i * $num_records_per_processor + 1 ))
    (( ending_id = $starting_id + $num_records_per_processor - 1 ))
    printf "\n - Starting processor #%s starting at ID:%s and ending at ID: %s" "$i" "$starting_id" "$ending_id"
    screen -d -m -S "process_$i" php generate_records.php "$starting_id" "$num_records_per_processor" "FALSE"
    ((i++))
done
If the only reason you're using screen is to launch many processes in parallel, you can avoid it entirely and use & to start them in the background:
php generate_records.php "$starting_id" "$num_records_per_processor" FALSE &
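For example, the loop from the question could drop screen entirely and background each worker with its own log file (a sketch reusing the question's variables; log_$i matches the naming from the question):
i=0
while [ $i -lt $total_processors ]
do
    (( starting_id = $i * $num_records_per_processor + 1 ))
    (( ending_id = $starting_id + $num_records_per_processor - 1 ))
    printf "\n - Starting processor #%s starting at ID:%s and ending at ID: %s" "$i" "$starting_id" "$ending_id"
    # run in the background; &> captures both stdout and stderr
    php generate_records.php "$starting_id" "$num_records_per_processor" FALSE &> "log_$i" &
    ((i++))
done
wait   # block until every background worker has finished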
You may also be able to remove some code by using parallel.
Alright, so I've tried GNU parallel, and there are some quirks about getting it to work that make it not possible for me to use.
Ultimately I'd love to just be able to do something like this:
for modelplot in /${rfolder}/${mplotsfolder}/${mname}/$mscript.gs
do
    for regionplot in $mregions
    do
        opengrads -bclx "${modelplot}.gs $regionplot ${plotimage} ${date} ${run} ${gribfile} ${modelplot}" && wait -n
    done
done
But I can't seem to find a way to limit the spawning of background processes to a specific number. Someone mentioned doing:
for i in {1..10}; do echo "${i}" & (( count++ > 5 )) && wait -n; done
Should do it, but I can't really verify if it is working that way. It seems like it just spawns them all instantly. I'm assuming the output in terminal of that should be: echo 1, echo 2, echo 3, echo 4, echo 5. Then echo 6, echo 7, echo 8, echo 9, echo 10.
I'm just trying to spawn, say 5, iterations of a loop, wait for those to finish and then spawn the next 5, wait for those to finish, spawn the next 5, etc until the loop is done.
Each time you start a background job, increment a count. When that count reaches 5 (or whatever), wait for all background jobs to complete, then reset the count to 0 and resume starting background jobs.
p_count=0
for modelplot in /${rfolder}/${mplotsfolder}/${mname}/$mscript.gs; do
    for regionplot in $mregions; do
        opengrads -bclx "${modelplot}.gs $regionplot ${plotimage} ${date} ${run} ${gribfile} ${modelplot}" &
        if (( ++p_count == 5 )); then
            wait
            p_count=0
        fi
    done
done
It is surprisingly tricky to keep exactly 5, rather than at most 5, jobs running in the background in shell. (wait -n lets you know when a job has finished, but not how many have finished.) To keep the machine busy, a tool like xargs or parallel is more appropriate.
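For comparison, a minimal sketch of the xargs approach, which always keeps up to 5 jobs running and starts a new one as soon as any job finishes (the echo/sleep payload is a stand-in, not the opengrads command from the question):
# -n 1: one argument per job, -P 5: at most 5 jobs running at a time
printf '%s\n' {1..10} | xargs -n 1 -P 5 sh -c 'echo "job $1"; sleep 2' sh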
From your comments it is pretty unclear what you want.
But using the {= =} construct you can get almost anything in the arguments.
Append .gs only on the first run:
parallel echo {}'{= if($job->seq() == 1) { $_ = ".gs" } else { $_="" } =}' ::: a b c
Append .gs only on the last run:
parallel echo {}'{= if($job->seq() == $Global::JobQueue->total_jobs() ) { $_ = ".gs" } else { $_="" } =}' ::: a b c
Disregarding the comments and looking only at the loop in the question the solution is:
parallel --header : opengrads -bclx "{modelplot}.gs {regionplot} ${plotimage} ${date} ${run} ${gribfile} {modelplot}" ::: modelplot /${rfolder}/${mplotsfolder}/${mname}/$mscript.gs ::: regionplot $mregions
I have added to my explanation a bit. Conceptually, I am running a script that processes in a loop, calling shell scripts that use the line content as an input parameter. (FYI: a kicks off an execution and b monitors that execution.)
I need 1a and 1b to run first, in parallel, for the first two $param values.
Next, 2a and 2b need to run in serial for those $param values once step 1 is complete.
3a and 3b will kick off once 2a and 2b are complete (irrelevant whether serial or parallel).
The loop continues with the next 2 lines from the input .txt.
I can't get it to process the second step in serial, only all in parallel. What I need is the following:
cat filename | while read -r line
do
    export param=$line
    ./script1a.sh "$param" > process.log && ./script2b.sh > monitor.log &&
    ## wait for processes to finish, running 2 in parallel in script1.sh
    ./script2a.sh "$param" > process2.log && ./script2b.sh > monitor2.log &&
    ## run each of the 2 in serial for script2.sh
    ./script3a.sh && ./script3b.sh
I tried adding in wait, and tried an if statement containing script2a.sh and script2b.sh that would run in serial, but to no avail.
if (( ++i % 2 == 0 )); then wait; fi
done
#only run two lines at a time, then cycle back through loop
How on earth can I get the script2.sh to run in serial as a result of script1 in parallel??
Locking!
If you want to parallelize script1 and script3, but need all invocations of script2 to be serialized, continue to use:
./script1.sh && ./script2.sh && ./script3.sh &
...but modify script2 to grab a lock before it does anything else:
#!/bin/bash
exec 3>.lock2
flock -x 3
# ... continue with script2's business here.
Note that you must not delete the .lock2 file used here, at risk of allowing multiple processes to think they hold the lock concurrently.
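Putting the two together, the calling side might look like this (a sketch; the argument passed to each script is only illustrative):
# start several pipelines in parallel; the flock inside script2.sh
# ensures only one script2 instance runs its body at any given time
for input in a b c d; do
    ./script1.sh "$input" && ./script2.sh "$input" && ./script3.sh "$input" &
done
wait   # wait for every pipeline to finish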
You are not showing us how the lines you read from the file are being consumed.
If I understand your question correctly, you want to run script1 on two lines of filename, each in parallel, and then serially run script2 when both are done?
while read first; do
    echo "$first" | ./script1.sh &
    read second
    echo "$second" | ./script1.sh &
    wait
    ./script2.sh &   # optionally don't background here?
    ./script3.sh
done <filename &
The while loop contains two read statements, so each iteration reads two lines from filename and feeds each to a separate instance of script1. Then we wait until they both are done before we run script2. I background it so that script3 can start while it runs, and background the whole while loop; but you probably don't actually need to background the entire job by default (development will be much easier if you write it as a regular foreground job, then when it works, background the whole thing when you start it if you need to).
I can think of a number of variations on this depending on how you actually want your data to flow; here is an update in response to your recently updated question.
export param # is this really necessary?
while read param; do
    # First instance
    ./script1a.sh "$param" > process.log &&
        ./script2b.sh > monitor.log &
    # Second instance
    read param
    ./script2a.sh "$param" > process2.log && ./script2b.sh > monitor2.log &
    # Wait for both to finish
    wait
    ./script3a.sh && ./script3b.sh
done <filename
If this still doesn't help, maybe you should post a third question where you really actually explain what you want...
I am not 100% sure what you mean by your question, but now I think you mean something like this in your inner loop:
(
    # run script1 and script2 in parallel
    script1 &
    s1pid=$!
    # start no more than one script2 using GNU Parallel as a mutex
    sem --fg script2
    # when they are both done...
    wait $s1pid
    # run script3
    script3
) & # and do that lot in parallel with previous/next loop iteration
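In context, that block would sit inside the reading loop, roughly like this (a sketch; the input file name and the arguments passed to the scripts are assumptions):
while IFS= read -r line; do
    (
        script1 "$line" &
        s1pid=$!
        sem --fg script2 "$line"   # at most one script2 at a time across all iterations
        wait $s1pid
        script3 "$line"
    ) &
done < input.txt
wait   # wait for every iteration's subshell to finish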
@tripleee I put together the following, if interested (note: I changed some variables for the post, so sorry if there are inconsistencies anywhere... also the exports have their reasons. I think there is a better way than exporting, but for now it works).
cat input.txt | while read -r first; do
    export step=${first//\"/}
    export stepem=EM_${step//,/_}
    export steptd=TD_${step//,/_}
    export stepeg=EG_${step//,/_}

    echo "$step" | "$directory/ws_client.sh" processOptions "$appName" "$step" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$Folder/${stepem}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepem" > "$Folder/${stepem}_Monitor.log" &

    read -r second
    export step2=${second//\"/}
    export stepem2=ExecuteModel_${step2//,/_}
    export steptd2=TransferData_${step2//,/_}
    export stepeg2=ExecuteGeneology_${step2//,/_}

    echo "$step2" | "$directory/ws_client.sh" processOptions "$appName" "$step2" "$layers" "$stages" "" "$stages" "$stages" FALSE > "$Folder/${stepem2}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepem2" > "$Folder/${stepem2}_Monitor.log" &
    wait

    "$directory/ws_client.sh" processOptions "$appName" "$step" "$layers" "" "" "$stage_final" "" TRUE > "$appLogFolder/${steptd}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$steptd" > "$Folder/${steptd}_Monitor.log" &&
        "$directory/ws_client.sh" processOptions "$appName" "$step2" "$layers" "" "" "$stage_final" "" TRUE > "$appLogFolder/${steptd2}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$steptd2" > "$Folder/${steptd2}_Monitor.log" &
    wait

    "$directory/ws_client.sh" processPaths "$appName" "$step" "$layers" "$genPath_01" > "$appLogFolder/${stepeg}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepeg" > "$Folder/${stepeg}_Monitor.log" &&
        "$directory/ws_client.sh" processPaths "$appName" "$step2" "$layers" "$genPath_01" > "$appLogFolder/${stepeg2}_ProcessID.log" &&
        "$dir_model/check_status.sh" "$Folder" "$stepeg2" > "$Folder/${stepeg2}_Monitor.log" &
    wait

    if (( ++i % 2 == 0 ))
    then
        echo "Waiting..."
        wait
    fi
done
I understand your question like this:
You have a list of models. These models need to be run. After they are run, the results have to be transferred. The simple solution is:
run_model model1
transfer_result model1
run_model model2
transfer_result model2
But to make this go faster, we want to parallelize parts. Unfortunately transfer_result cannot be parallelized.
run_model model1
run_model model2
transfer_result model1
transfer_result model2
model1 and model2 are read from a text file. run_model can be run in parallel, and you would like 2 of those running in parallel. transfer_result can only be run one at a time, and you can only transfer a result when it has been computed.
This can be done like this:
cat models.txt | parallel -j2 'run_model {} && sem --id transfer transfer_model {}'
run_model {} && sem --id transfer transfer_model {} will run one model and if it succeeds transfer it. Transferring will only start if no other transfer is running.
parallel -j2 will run two of the these jobs in parallel.
If transferring takes less time than computing a model, you should get no surprises: the transfers will at most be swapped with the next transfer. If transferring takes longer than running a model, you might see that the models are transferred completely out of order (e.g. you might see the transfer of job 10 before the transfer of job 2). But they will all be transferred eventually.
You can see the execution sequence exemplified with this:
seq 10 | parallel -uj2 'echo ran model {} && sem --id transfer "sleep .{};echo transferred {}"'
This solution is better than the wait based solution because you can run model3 while model1+2 is being transferred.
I'm new to bash and whiptail, so excuse the ignorance.
When assigning a variable in the for loop, the new value of 20 is never set when using a whiptail dialog. Any suggestions why?
andy="10"
{
for ((i = 0 ; i <= 100 ; i+=50)); do
andy="20"
echo $i
sleep 1
done
} | whiptail --gauge "Please wait" 5 50 0
# }
echo "My val $andy
A command inside a pipeline (that is, a series of commands separated by |) is always executed in a subshell, which means that each command has its own variable environment. The same is true of the commands inside the compound command (…), but not the compound command {…}, which can normally be used for grouping without creating a subshell.
In bash or zsh, you can solve this problem using process substitution instead of a pipeline. For example:
andy="10"
for ((i=0 ; i <= 100 ; i+=50)); do
andy="20"
echo $i
sleep 1
done > >(whiptail --gauge "Please wait" 6 50 0)
echo "My val $andy
>(whiptail ...) will cause a subshell to be created to execute whiptail; the entire expression will be substituted by the name of this subshell's standard input (on Linux, it will be something like /dev/fd/63, but it could be a FIFO on other OSs). > >(...) causes standard output to be redirected to the subshell's standard input; the first > is just a normal stdout redirect.
The statements inside {} are not ordinarily executed in a sub-shell. However, when you add a pipe (|) to it, they are executed in a sub-shell.
If you remove the pipe to whiptail, you will see the updated value of andy.
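A minimal sketch illustrating the difference described in both answers (whiptail is replaced with cat so the snippet can be run anywhere):
andy="10"
{ andy="20"; } | cat           # the group runs in a subshell because of the pipe
echo "pipeline:      $andy"    # prints 10

andy="10"
{ andy="20"; } > >(cat)        # process substitution: the group stays in the current shell
echo "process subst: $andy"    # prints 20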