Bash concurrent jobs gets stuck - linux

I've implemented a way to have concurrent jobs in bash, as seen here.
I'm looping through a file with around 13000 lines. I'm just testing and printing each line, as such:
#!/bin/bash

max_bg_procs(){
    if [[ $# -eq 0 ]] ; then
        echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
        echo "       bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
        return
    fi
    local max_number=$((0 + ${1:-0}))
    while true; do
        local current_number=$(jobs -pr | wc -l)
        if [[ $current_number -lt $max_number ]]; then
            echo "success in if"
            break
        fi
        echo "has to wait"
        sleep 4
    done
}

download_data(){
    echo "link #" $2 "["$1"]"
}

mapfile -t myArray < $1

i=1
for url in "${myArray[@]}"
do
    max_bg_procs 6
    download_data $url $i &
    ((i++))
done
echo "finito!"
I've also tried other solutions such as this and this, but the issue persists:
At a "random" step, usually between the 2000th and the 5000th iteration, it simply gets stuck. I've put various echo statements throughout the code to see where it gets stuck, but the last thing it prints is the $url $i line.
I've done the simple test to remove any parallelism and just loop the file contents: all went fine and it looped till the end.
So it makes me think I'm hitting some limitation on the parallelism, and I'd appreciate any help figuring it out.
Many thanks!

Here, we have up to 6 parallel bash processes calling download_data, each of which is passed up to 16 URLs per invocation. Adjust per your own tuning.
Note that this expects both bash (for exported function support) and GNU xargs.
#!/usr/bin/env bash
# ^^^^- not /bin/sh
download_data() {
echo "link #$2 [$1]" # TODO: replace this with a job that actually takes some time
}
export -f download_data
<input.txt xargs -d $'\n' -P 6 -n 16 -- bash -c 'for arg; do download_data "$arg"; done' _

Using GNU Parallel it looks like this
cat input.txt | parallel echo link '\#{#} [{}]'
{#} = the job number
{} = the argument
It will spawn one process per CPU. If you instead want 6 in parallel use -j:
cat input.txt | parallel -j6 echo link '\#{#} [{}]'
If you prefer running a function:
download_data(){
echo "link #" $2 "["$1"]"
}
export -f download_data
cat input.txt | parallel -j6 download_data {} {#}
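As a side note (my addition, not part of the answer above): GNU Parallel can also read the arguments straight from the file with ::::, which avoids the cat:
parallel -j6 download_data {} {#} :::: input.txt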

Related

bash: set variable inside loop when piping find and grep [duplicate]

I want to compute all *.bin files inside a given directory. Initially I was working with a for-loop:
var=0
for i in `ls *.bin`
do
perform computations on $i ....
var+=1
done
echo $var
However, in some directories there are too many files resulting in an error: Argument list too long
Therefore, I was trying it with a piped while-loop:
var=0
ls *.bin | while read i;
do
perform computations on $i
var+=1
done
echo $var
The problem now is that by using the pipe, subshells are created. Thus, echo $var returns 0.
How can I deal with this problem?
The original Code:
#!/bin/bash
function entropyImpl {
if [[ -n "$1" ]]
then
if [[ -e "$1" ]]
then
echo "scale = 4; $(gzip -c ${1} | wc -c) / $(cat ${1} | wc -c)" | bc
else
echo "file ($1) not found"
fi
else
datafile="$(mktemp entropy.XXXXX)"
cat - > "$datafile"
entropy "$datafile"
rm "$datafile"
fi
return 1
}
declare acc_entropy=0
declare count=0
ls *.bin | while read i ;
do
echo "Computing $i" | tee -a entropy.txt
curr_entropy=`entropyImpl $i`
curr_entropy=`echo $curr_entropy | bc`
echo -e "\tEntropy: $curr_entropy" | tee -a entropy.txt
acc_entropy=`echo $acc_entropy + $curr_entropy | bc`
let count+=1
done
echo "Out of function: $count | $acc_entropy"
acc_entropy=`echo "scale=4; $acc_entropy / $count" | bc`
echo -e "===================================================\n" | tee -a entropy.txt
echo -e "Accumulated Entropy:\t$acc_entropy ($count files processed)\n" | tee -a entropy.txt
The problem is that the while loop is part of a pipeline. In a bash pipeline, every element of the pipeline is executed in its own subshell. So after the while loop terminates, the while loop subshell's copy of var is discarded, and the original var of the parent (whose value is unchanged) is echoed.
One way to fix this is by using Process Substitution as shown below:
var=0
while read i;
do
# perform computations on $i
((var++))
done < <(find . -type f -name "*.bin" -maxdepth 1)
Take a look at BashFAQ/024 for other workarounds.
Notice that I have also replaced ls with find because it is not good practice to parse ls.
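Another of the BashFAQ/024 workarounds, for bash 4.2 and newer, is shopt -s lastpipe, which runs the last stage of a pipeline in the current shell. A minimal sketch, assuming a non-interactive script (where job control is off, as lastpipe requires):
#!/usr/bin/env bash
shopt -s lastpipe    # bash 4.2+: run the last pipeline stage in this shell
var=0
find . -maxdepth 1 -type f -name "*.bin" | while read -r i; do
    # perform computations on "$i"
    ((var++))
done
echo "$var"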
A POSIX-compliant solution would be to use a named pipe (a FIFO). This solution is very nice, portable, and POSIX, but it writes something to the hard disk.
mkfifo mypipe
find . -type f -name "*.bin" -maxdepth 1 > mypipe &
while read line
do
# action
done < mypipe
rm mypipe
Your pipe is a file on your hard disk. If you want to avoid having useless files, do not forget to remove it.
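If you want the FIFO cleaned up even when the script exits early, a small variation (a sketch; the temporary directory name is illustrative) is to create it under mktemp -d and remove it from an EXIT trap:
pipe_dir=$(mktemp -d)
mkfifo "$pipe_dir/mypipe"
trap 'rm -rf "$pipe_dir"' EXIT

find . -type f -name "*.bin" -maxdepth 1 > "$pipe_dir/mypipe" &
while read line
do
    # action
    :
done < "$pipe_dir/mypipe"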
So, researching the generic issue of passing variables from a subshelled while loop back to the parent, one solution I found missing here was to use a here-string. As that is bash-ish and I preferred a POSIX solution, I found that a here-string is really just a shortcut for a here-document. With that knowledge at hand, I came up with the following, which avoids the subshell and thus allows variables to be set in the loop.
#!/bin/sh
set -eu
passwd="username,password,uid,gid
root,admin,0,0
john,appleseed,1,1
jane,doe,2,2"
main()
{
while IFS="," read -r _user _pass _uid _gid; do
if [ "${_user}" = "${1:-}" ]; then
password="${_pass}"
fi
done <<-EOT
${passwd}
EOT
if [ -z "${password:-}" ]; then
echo "No password found."
exit 1
fi
echo "The password is '${password}'."
}
main "${#}"
exit 0
One important note to all copy-pasters: the here-document is set up using the hyphen (<<-), indicating that leading tabs are to be stripped. This is needed to keep the layout somewhat nice. It is important to note because Stack Overflow doesn't render tabs in code blocks and replaces them with spaces. Grmbl. SO, don't mangle my code just because you favor spaces over tabs; it's irrelevant in this case!
This probably breaks with different editors (or editor settings) and whatnot, so the alternative would be to have it as:
done <<-EOT
${passwd}
EOT
This could be done with a for loop, too:
var=0;
for file in `find . -type f -name "*.bin" -maxdepth 1`; do
# perform computations on "$file"
((var++))
done
echo $var
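A further sketch (my addition, not from the answers above): since the "Argument list too long" error comes from handing the expanded glob to an external command such as ls, iterating over the glob directly in the for loop avoids both that error and the subshell problem:
var=0
for file in ./*.bin; do
    [ -e "$file" ] || continue   # skip the literal pattern when nothing matches
    # perform computations on "$file"
    var=$((var + 1))
done
echo "$var"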

bash script run to send process to background

Hi, I'm making a script to run some rsync processes. Our sysadmin has created the rsync script; when it runs, it asks you to select options, so I want to create a wrapper script that passes those answers to it and runs from cron.
The list of directories to rsync is taken from a file.
filelist=$(cat filelist.txt)
for i in $filelist; do
echo -e "3\nY" | ./rsync.sh $i
#This will create a rsync log file
Then I check a value from the log file, and if it is zero I move on to the next file. If it is not zero, I have to start the real rsync process as below, which will take more than 2 hours.
if [ "$a" != 0 ]; then
echo -e "3\nN" | ./rsync.sh $i
The rsync process above needs to be sent to the background so the loop can take the next file. I checked the screen command, but screen is not working on the server. I also need to get the duration of each run and write it to the log; when I use the time command I am unable to pipe the echo into it. Any suggestions to get this task done are appreciated.
Questions:
1. How do I send the piped input together with the time command?
echo -e "3\nY" | time ./rsync.sh $i
The above is not working.
2. How do I send this to the background and move on to the next file while the previous rsync process is still running?
Full Code
#!/bin/bash
filelist=$(cat filelist.txt)
Lpath=/opt/sas/sas_control/scripts/Logs/rsync_logs
date=$(date +"%m-%d-%Y")
timelog="time_result/rsync_time.log-$date"
for i in $filelist;do
#echo $i
b_i=$(basename $i)
echo $b_i
echo -e "3\nY" | ./rsync.sh $i
f=$(cat $Lpath/$(ls -tr $Lpath| grep rsync-dry-run-$b_i | tail -1) | grep 'transferred:' | cut -d':' -f2)
echo $f
if [ $f != 0 ]; then
#date=$(date +"%D : %r")
start_time=`date +%s`
echo "$b_i-start:$start_time" >> $timelog
#time ./rsync.sh $i < echo -e "3\nY" 2> "./time_result/$b_i-$date" &
time { echo -e "3\nY" | ./rsync.sh $i; } 2> "./time_result/$b_i-$date"
end_time=`date +%s`
s_time=$(cat $timelog|grep "$b_i-start" |cut -d ':' -f2)
duration=$(($end_time-$s_time))
echo "$b_i duration:$duration" >> $timelog
fi
done
Your question is not very clear, but I'll try:
(1) If I understand you correctly, you want to time the rsync.
My first attempt would be to use echo xxxx | time rsync. On my bash, this was however broken (or not supposed to work?). I normally use Zsh instead of bash, and on zsh this indeed runs fine.
If it is important for you to use bash, an alternative (since the time for the echo can likely be neglected) would be to time the whole pipe, i.e. time ( echo xxxx | rsync ), or even simpler time rsync < <(echo xxxx)
(2) To send a process to the background, add an & to the line. However, the time command of course produces output (that's its purpose), and you don't want output from a background program mixed into your terminal. The solution is to redirect the output:
(time rsync < <(echo xxxx) >output.txt 2>error.txt) &
If you want to time something, you can use:
time sleep 3
If you want to time two things, you can do a compound statement like this (note semicolon after second sleep):
time { sleep 3; sleep 4; }
So, you can do this to time your echo (which will take no time at all) and your rsync:
time { echo "something" | rsync something ; }
If you want to do that in the background:
time { echo "something" | rsync something ; } &

ftp to multiple servers at same time [duplicate]

Lets say I have a loop in Bash:
for foo in `some-command`
do
do-something $foo
done
do-something is cpu bound and I have a nice shiny 4 core processor. I'd like to be able to run up to 4 do-something's at once.
The naive approach seems to be:
for foo in `some-command`
do
do-something $foo &
done
This will run all the do-somethings at once, but there are a couple of downsides, mainly that do-something may also have some significant I/O, which performing all at once might slow down a bit. The other problem is that this code block returns immediately, so there's no way to do other work once all the do-somethings are finished.
How would you write this loop so there are always X do-somethings running at once?
Depending on what you want to do xargs also can help (here: converting documents with pdf2ps):
cpus=$( ls -d /sys/devices/system/cpu/cpu[[:digit:]]* | wc -w )
find . -name \*.pdf | xargs --max-args=1 --max-procs=$cpus pdf2ps
From the docs:
--max-procs=max-procs
-P max-procs
Run up to max-procs processes at a time; the default is 1.
If max-procs is 0, xargs will run as many processes as possible at a
time. Use the -n option with -P; otherwise chances are that only one
exec will be done.
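If the file names may contain spaces, a null-delimited variant of the same idea (a sketch, assuming GNU find and xargs, and using nproc instead of counting /sys entries):
find . -name '*.pdf' -print0 | xargs -0 --max-args=1 --max-procs="$(nproc)" pdf2ps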
With GNU Parallel http://www.gnu.org/software/parallel/ you can write:
some-command | parallel do-something
GNU Parallel also supports running jobs on remote computers. This will run one per CPU core on the remote computers - even if they have different number of cores:
some-command | parallel -S server1,server2 do-something
A more advanced example: here we have a list of files that we want my_script to run on. The files have an extension (maybe .jpeg). We want the output of my_script to be put next to the files as basename.out (e.g. foo.jpeg -> foo.out). We want to run my_script once for each core the computer has, and we want to run it on the local computer, too. For the remote computers we want the file to be processed transferred to the given computer. When my_script finishes, we want foo.out transferred back, and we then want foo.jpeg and foo.out removed from the remote computer:
cat list_of_files | \
parallel --trc {.}.out -S server1,server2,: \
"my_script {} > {.}.out"
GNU Parallel makes sure the output from each job does not mix, so you can use the output as input for another program:
some-command | parallel do-something | postprocess
See the videos for more examples: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
maxjobs=4

parallelize () {
    while [ $# -gt 0 ] ; do
        jobcnt=(`jobs -p`)
        if [ ${#jobcnt[@]} -lt $maxjobs ] ; then
            do-something $1 &
            shift
        else
            sleep 1
        fi
    done
    wait
}

parallelize arg1 arg2 "5 args to third job" arg4 ...
Here is an alternative solution that can be inserted into .bashrc and used for everyday one-liners:
function pwait() {
while [ $(jobs -p | wc -l) -ge $1 ]; do
sleep 1
done
}
To use it, all one has to do is put & after the jobs and a pwait call, the parameter gives the number of parallel processes:
for i in *; do
do_something $i &
pwait 10
done
It would be nicer to use wait instead of busy-waiting on the output of jobs -p, but there doesn't seem to be an obvious way to wait until any one of the given jobs is finished instead of all of them.
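Newer bash versions (4.3 and later) do have wait -n, which returns as soon as any single background job finishes; a sketch using it in place of the polling pwait:
for i in *; do
    do_something "$i" &
    while (( $(jobs -pr | wc -l) >= 10 )); do
        wait -n   # bash 4.3+: block until any one background job exits
    done
done
wait   # then wait for whatever is still running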
Instead of a plain bash, use a Makefile, then specify number of simultaneous jobs with make -jX where X is the number of jobs to run at once.
Or you can use wait ("man wait"): launch several child processes, call wait - it will exit when the child processes finish.
maxjobs=10

job() {
    # ... do the actual work for one line here ...
    :
}

jobsrunning=0
while read -r line; do
    job "$line" &
    jobsrunning=$((jobsrunning + 1))
    if [ "$jobsrunning" -ge "$maxjobs" ]; then
        wait          # wait for the whole batch before starting the next one
        jobsrunning=0
    fi
done < file.txt
wait
If you need to store the job's result, then assign their result to a variable. After wait you just check what the variable contains.
If you're familiar with the make command, most of the time you can express the list of commands you want to run as a makefile. For example, if you need to run $SOME_COMMAND on files *.input, each of which produces *.output, you can use the makefile below (note that the command line under the %.output rule must be indented with a literal tab):
INPUT = a.input b.input
OUTPUT = $(INPUT:.input=.output)

%.output : %.input
	$(SOME_COMMAND) $< $@

all: $(OUTPUT)
and then just run
make -j<NUMBER>
to run at most NUMBER commands in parallel.
While doing this right in bash is probably impossible, you can get it semi-right fairly easily. bstark gave a fair approximation of right, but his approach has the following flaws:
Word splitting: You can't pass any jobs to it that use any of the following characters in their arguments: spaces, tabs, newlines, stars, question marks. If you do, things will break, possibly unexpectedly.
It relies on the rest of your script to not background anything. If you do, or later you add something to the script that gets sent in the background because you forgot you weren't allowed to use backgrounded jobs because of his snippet, things will break.
Another approximation which doesn't have these flaws is the following:
scheduleAll() {
    local job i=0 max=4 pids=()

    for job; do
        (( ++i % max == 0 )) && {
            wait "${pids[@]}"
            pids=()
        }

        bash -c "$job" & pids+=("$!")
    done

    wait "${pids[@]}"
}
Note that this one is easily adaptable to also check the exit code of each job as it ends so you can warn the user if a job fails or set an exit code for scheduleAll according to the amount of jobs that failed, or something.
The problem with this code is just that:
It schedules four (in this case) jobs at a time and then waits for all four to end. Some might be done sooner than others which will cause the next batch of four jobs to wait until the longest of the previous batch is done.
A solution that takes care of this last issue would have to use kill -0 to poll whether any of the processes have disappeared instead of the wait and schedule the next job. However, that introduces a small new problem: you have a race condition between a job ending, and the kill -0 checking whether it's ended. If the job ended and another process on your system starts up at the same time, taking a random PID which happens to be that of the job that just finished, the kill -0 won't notice your job having finished and things will break again.
A perfect solution isn't possible in bash.
Maybe try a parallelizing utility instead of rewriting the loop? I'm a big fan of xjobs. I use xjobs all the time to mass copy files across our network, usually when setting up a new database server.
http://www.maier-komor.de/xjobs.html
function for bash:
parallel ()
{
    awk "BEGIN{print \"all: ALL_TARGETS\\n\"}{print \"TARGET_\"NR\":\\n\\t@-\"\$0\"\\n\"}END{printf \"ALL_TARGETS:\";for(i=1;i<=NR;i++){printf \" TARGET_%d\",i};print\"\\n\"}" | make $@ -f - all
}
using:
cat my_commands | parallel -j 4
Really late to the party here, but here's another solution.
A lot of solutions don't handle spaces/special characters in the commands, don't keep N jobs running at all times, eat cpu in busy loops, or rely on external dependencies (e.g. GNU parallel).
With inspiration for dead/zombie process handling, here's a pure bash solution:
function run_parallel_jobs {
    local concurrent_max=$1
    local callback=$2
    local cmds=("${@:3}")
    local jobs=( )

    while [[ "${#cmds[@]}" -gt 0 ]] || [[ "${#jobs[@]}" -gt 0 ]]; do
        while [[ "${#jobs[@]}" -lt $concurrent_max ]] && [[ "${#cmds[@]}" -gt 0 ]]; do
            local cmd="${cmds[0]}"
            cmds=("${cmds[@]:1}")

            bash -c "$cmd" &
            jobs+=($!)
        done

        local job="${jobs[0]}"
        jobs=("${jobs[@]:1}")

        local state="$(ps -p $job -o state= 2>/dev/null)"

        if [[ "$state" == "D" ]] || [[ "$state" == "Z" ]]; then
            $callback $job
        else
            wait $job
            $callback $job $?
        fi
    done
}
And sample usage:
function job_done {
    if [[ $# -lt 2 ]]; then
        echo "PID $1 died unexpectedly"
    else
        echo "PID $1 exited $2"
    fi
}

cmds=( \
    "echo 1; sleep 1; exit 1" \
    "echo 2; sleep 2; exit 2" \
    "echo 3; sleep 3; exit 3" \
    "echo 4; sleep 4; exit 4" \
    "echo 5; sleep 5; exit 5" \
)

# cpus="$(getconf _NPROCESSORS_ONLN)"
cpus=3
run_parallel_jobs $cpus "job_done" "${cmds[@]}"
The output:
1
2
3
PID 56712 exited 1
4
PID 56713 exited 2
5
PID 56714 exited 3
PID 56720 exited 4
PID 56724 exited 5
For per-process output handling $$ could be used to log to a file, for example:
function job_done {
cat "$1.log"
}
cmds=( \
"echo 1 \$\$ >\$\$.log" \
"echo 2 \$\$ >\$\$.log" \
)
run_parallel_jobs 2 "job_done" "${cmds[@]}"
Output:
1 56871
2 56872
The project I work on uses the wait command to control parallel shell (ksh actually) processes. To address your concerns about IO, on a modern OS, it's possible parallel execution will actually increase efficiency. If all processes are reading the same blocks on disk, only the first process will have to hit the physical hardware. The other processes will often be able to retrieve the block from OS's disk cache in memory. Obviously, reading from memory is several orders of magnitude quicker than reading from disk. Also, the benefit requires no coding changes.
This might be good enough for most purposes, but is not optimal.
#!/bin/bash

n=0
maxjobs=10

for i in *.m4a ; do
    # ( DO SOMETHING ) &

    # limit jobs
    if (( ++n % maxjobs == 0 )) ; then
        wait # wait until all have finished (not optimal, but most times good enough)
        echo $n wait
    fi
done
Here is how I managed to solve this issue in a bash script:
#! /bin/bash

MAX_JOBS=32
FILE_LIST=($(cat ${1}))

echo Length ${#FILE_LIST[@]}

for ((INDEX=0; INDEX < ${#FILE_LIST[@]}; INDEX=$((${INDEX}+${MAX_JOBS})) ));
do
    JOBS_RUNNING=0
    while ((JOBS_RUNNING < MAX_JOBS))
    do
        I=$((${INDEX}+${JOBS_RUNNING}))
        FILE=${FILE_LIST[${I}]}
        if [ "$FILE" != "" ];then
            echo $JOBS_RUNNING $FILE
            ./M22Checker ${FILE} &
        else
            echo $JOBS_RUNNING NULL &
        fi
        JOBS_RUNNING=$((JOBS_RUNNING+1))
    done
    wait
done
You can use a simple nested for loop (substitute appropriate integers for N and M below):
for i in {1..N}; do
(for j in {1..M}; do do_something; done & );
done
This will run N background subshells, each executing do_something M times sequentially, so do_something runs N*M times with at most N instances in parallel. You can make N equal the number of CPUs you have.
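If you also need the script to block until everything has finished (one of the concerns in the question), add a wait after the loop; a minimal sketch with illustrative values N=4 and M=8:
for i in {1..4}; do
    ( for j in {1..8}; do do_something; done ) &
done
wait   # block here until all 4 streams are done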
My solution to always keep a given number of processes running, keep track of errors, and handle uninterruptible/zombie processes:
function log {
    echo "$1"
}

# Takes a semicolon-separated list of commands and runs at most numberOfProcesses of them simultaneously
# Returns the number of commands that exited with a non-zero code
function ParallelExec {
    local numberOfProcesses="${1}" # Number of simultaneous commands to run
    local commandsArg="${2}" # Semi-colon separated list of commands

    local pid
    local runningPids=0
    local counter=0
    local commandsArray
    local pidsArray
    local newPidsArray
    local retval
    local retvalAll=0
    local pidState
    local commandsArrayPid

    IFS=';' read -r -a commandsArray <<< "$commandsArg"

    log "Running ${#commandsArray[@]} commands in $numberOfProcesses simultaneous processes."

    while [ $counter -lt "${#commandsArray[@]}" ] || [ ${#pidsArray[@]} -gt 0 ]; do

        while [ $counter -lt "${#commandsArray[@]}" ] && [ ${#pidsArray[@]} -lt $numberOfProcesses ]; do
            log "Running command [${commandsArray[$counter]}]."
            eval "${commandsArray[$counter]}" &
            pid=$!
            pidsArray+=($pid)
            commandsArrayPid[$pid]="${commandsArray[$counter]}"
            counter=$((counter+1))
        done

        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            # Handle uninterruptible sleep state or zombies by omitting them from the running process array (how to kill what is already dead? :)
            if kill -0 $pid > /dev/null 2>&1; then
                pidState=$(ps -p$pid -o state= 2>/dev/null)
                if [ "$pidState" != "D" ] && [ "$pidState" != "Z" ]; then
                    newPidsArray+=($pid)
                fi
            else
                # pid is dead, get its exit code from the wait command
                wait $pid
                retval=$?
                if [ $retval -ne 0 ]; then
                    log "Command [${commandsArrayPid[$pid]}] failed with exit code [$retval]."
                    retvalAll=$((retvalAll+1))
                fi
            fi
        done
        pidsArray=("${newPidsArray[@]}")

        # Add a trivial sleep time so bash won't eat all CPU
        sleep .05
    done

    return $retvalAll
}
Usage:
cmds="du -csh /var;du -csh /tmp;sleep 3;du -csh /root;sleep 10; du -csh /home"
# Execute 2 processes at a time
ParallelExec 2 "$cmds"
# Execute 4 processes at a time
ParallelExec 4 "$cmds"
DOMAINS="list of some domains from the commands"
i=1
for domain in $DOMAINS
do
    eval "some-command $domain" &
    job[$i]=$!
    i=$(( i + 1 ))
done

Ndomains=$(echo $DOMAINS | wc -w)
for i in $(seq 1 1 $Ndomains)
do
    echo "wait for ${job[$i]}"
    wait "${job[$i]}"
done
This concept works for parallelizing; the important thing is that the eval line ends with '&', which puts the commands into the background.

Shell function to tail a log file for a specific string for a specific time

I need to do the following things to make sure my application server is up:
Tail a log file for a specific string
Remain blocked until that string is printed
However, if the string is not printed within about 20 mins, quit and throw an exception message like "Server took more than 20 mins to be up"
If the string is printed in the log file, quit the loop and proceed.
Is there a way to include timeouts in a while loop?
#!/bin/bash
tail -f logfile | grep 'certain_word' | read -t 1200 dummy_var
[ $? -eq 0 ] && echo 'ok' || echo 'server not up'
This reads anything written to logfile, searches for certain_word, and echoes ok if all is good; otherwise, after waiting 1200 seconds (20 minutes), it complains.
You can do it like this:
start_time=$(date +"%s")
while true
do
elapsed_time=$(($(date +"%s") - $start_time))
if [[ "$elapsed_time" -gt 1200 ]]; then
break
fi
sleep 1
if [[ $(grep -c "specific string" /path/to/log/file.log) -ge 1 ]]; then
break
fi
done
You can use signal handlers from shell scripts (see http://www.ibm.com/developerworks/aix/library/au-usingtraps/index.html).
Basically, you'd define a function to be called on, say, signal 17, then put a sub-script in the background that will send that signal at some later time:
timeout() {
    sleep 1200
    kill -SIGUSR1 "$1"
}

watch_for_input() {
    tail -f file | grep item
}

trap 'echo "Not found"; exit' SIGUSR1

timeout $$ &
watch_for_input
Then if you reach 1200 seconds, your function is called and you can choose what to do (like signal your tail/grep combo that is watching for your pattern in order to kill it)
time=0
found=0
while [ $time -lt 1200 ]; do
out=$(tail logfile)
if [[ $out =~ specificString ]]; then
found=1
break;
fi
let time++
sleep 1
done
echo $found
The accepted answer doesn't work and will never exit (because although read -t exits, the prior pipe commands (tail -f | grep) will only be notified of read -t's exit when they try to write output, which never happens until the string matches).
A one-liner is probably feasible, but here are scripted (working) approaches.
Logic is the same for each one, they use kill to terminate the current script after the timeout.
Perl is probably more widely available than gawk/read -t
#!/bin/bash
FILE="$1"
MATCH="$2"
# Uses read -t, kill after timeout
#tail -f "$FILE" | grep "$MATCH" | (read -t 1 a ; kill $$)
# Uses gawk read timeout ability (not available in awk)
#tail -f "$FILE" | grep "$MATCH" | gawk "BEGIN {PROCINFO[\"/dev/stdin\", \"READ_TIMEOUT\"] = 1000;getline < \"/dev/stdin\"; system(\"kill $$\")}"
# Uses perl & alarm signal
#tail -f "$FILE" | grep "$MATCH" | perl -e "\$SIG{ALRM} = sub { `kill $$`;exit; };alarm(1);<>;"

looking for a command to tentatively execute a command based on criteria

I am looking for a command (or way of doing) the following:
echo -n 6 | doif -criteria "isgreaterthan 4" -command 'do some stuff'
The echo part would obviously come from a more complicated string of bash commands. Essentially I am taking a piece of text from each line of a file, and if it appears in another set of files more than x times (say 100) then it will be appended to another file.
Is there a way to perform such trickery with awk somehow? Or is there another command.. I'm hoping that there is some sort of xargs style command to do this in the sense that the -I% portion would be the value with which to check the criteria and whatever follows would be the command to execute.
Thanks for thy insight.
It's possible, though I don't see the reason why you would do that...
function doif
{
    read val1
    op=$1
    val2="$2"
    shift 2
    if [ "$val1" "$op" "$val2" ]; then
        "$@"
    fi
}
echo -n 6 | doif -gt 3 ls /
if test 6 -gt 4; then
# do some stuff
fi
or
if test $( echo 6 ) -gt 4; then : ;fi
or
output=$( some cmds that generate text)
# this will be an error if $output is ill-formed
if test "$output" -gt 4; then : ; fi
