How to re-run a while loop via a yes/no prompt, to search a file until a phrase appears - Linux

I'm trying to write a script which helps me to search in log for a phrase "started".
The script that I have until now looks like this:
echo "Web server is going to be polled"
i=`expr $i - 1`
echo "Polling Nr. $i"
grep -q '^started$' log
#test, if it goes on
test "$greperg" -gt 0 -a "$i" -gt 0
echo 'waiting ...'
sleep 1
if test "$greperg" -eq 0
echo "Web server has started"
echo -n "Web server is not started"
while ((!valid)); do
echo - "Do you want to poll again? (J/N)"
read -t 5 answer
case "$answer" in
[Jj]) result=1; valid=1;;
[Nn]) result=0; valid=1;;
"") result=0; valid=1;;
*) valid=0 ;;
if ((result));then
: # ...............(repeat the process again, if its not found ask max 5 times)
echo "Timeout"
exit 0
From line 38, I don't know how to re-run it, can anybody help?
What I'm looking for:
The polling should be extended: if the word (started) is still not there after the 3 attempts, ask the user with a (Y / N) query whether to poll 3 more times.
The user should be asked a maximum of 5 times
(so the log is polled at most 3 × 6 = 18 times).
At the very end, please state which status was reached (see example below).
polling ...
Web server has not started yet.
Wait ...
polling ...
Web server has not started yet.
Wait ...
polling ...
Web server has not started yet.
Should it be polled again (Y / N)? _ Y
polling ...
Web server has not started yet.
Wait ...
polling ...
Web server has not started yet.
Wait ...
polling ...
Web server has not started yet.
Should it be polled again (Y / N)? _ N
As requested, no further attempts are made.
Bottom line: web server has not started.

Your code has several odd designs. Bash should generally not need to use expr at all (the shell has built-in facilities for integer arithmetic and substring matching) and you usually want to avoid explicitly testing $?. I would break this up into functions to "divide and conquer" the problem space.
# Print diagnostics to standard error; include script name
echo "$0: Web server is going to be polled" >&2
poll () {
    local i
    for ((i=1; i<=$1; ++i)); do
        # Move wait to beginning of loop
        if (($i > 1)); then
            echo "$0: waiting ..." >&2
            sleep 1
        fi
        echo "$0: Polling Nr. $i" >&2
        # Just check
        if grep -q '^started$' log; then
            # return success
            return 0
        fi
    done
    # If we fall through to here, return failure
    return 1
}
again () {
    while true; do
        # Notice read -p for prompt
        read -t 5 -p "Do you want to poll again? (J/N) " answer
        case "$answer" in
            [Jj]*) return 0;;
            "") continue;;
            *) return 1;;
        esac
    done
}
while true; do
    if poll 3
    then
        echo "$0: Web server has started" >&2
        status=0
        break
    else
        echo "$0: Web server is not started" >&2
        status=1
    fi
    again || break
done
# Actually return the status to the caller
exit "$status"
The while true loop in the main script could easily be adapted to a for loop just like in the poll function if you want to restrict how many times the user is allowed to restart the polling. I wanted to show two different designs just to exhibit the options available to you.
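As a rough illustration of the for-loop variant mentioned above, here is a minimal sketch that caps the number of prompts. The function names (poll_once, ask_again, poll_rounds) and the optional delay argument are made up for this example; the messages follow the sample output in the question.

```shell
#!/usr/bin/env bash
# Sketch only: poll_once, ask_again and poll_rounds are illustrative names.

# Succeeds if the log file contains a line that is exactly "started".
poll_once() {
    grep -q '^started$' "$1" 2>/dev/null
}

# Reads one answer from stdin; succeeds only for Y/y (or J/j).
ask_again() {
    local answer
    read -r -p "Should it be polled again (Y/N)? " answer || return 1
    [[ $answer == [YyJj]* ]]
}

# poll_rounds LOGFILE ATTEMPTS MAX_PROMPTS [DELAY]
# Polls ATTEMPTS times per round and asks at most MAX_PROMPTS times.
poll_rounds() {
    local log=$1 attempts=$2 max_prompts=$3 delay=${4:-1}
    local round i
    for ((round = 0; round <= max_prompts; ++round)); do
        for ((i = 1; i <= attempts; ++i)); do
            ((i > 1)) && sleep "$delay"
            if poll_once "$log"; then
                echo "Bottom line: web server has started."
                return 0
            fi
            echo "Web server has not started yet."
        done
        # No prompt after the final round.
        ((round < max_prompts)) || break
        if ! ask_again; then
            echo "As requested, no further attempts are made."
            break
        fi
    done
    echo "Bottom line: web server has not started."
    return 1
}
```

Calling poll_rounds log 3 5 would give the 3 × 6 = 18 polls at most that the question asks for.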
In a real script, I would probably replace several of the simple if tests with the this || that shorthand. In brief,
this && that || other
is roughly equivalent to
if this; then that; else other; fi
with the difference that if that fails, you will also trigger other in the shorthand case.
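A small sketch of that difference, using true and false as stand-ins for this and that (the function names are only for illustration):

```shell
#!/usr/bin/env bash
# "that" is the command false here, chosen on purpose so that it fails.

shorthand() {
    true && false || echo "other ran"
}

longhand() {
    if true; then
        false
    else
        echo "other ran"
    fi
}
```

Running shorthand prints "other ran" because the failing middle command falls through to the || branch, while longhand prints nothing.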
Perhaps notice also how ((...)) in Bash is an arithmetic context. The three-place for loop is also a Bash extension.


How to run bash script while it returns code 0?

I have a bash script with many lines of code and I need to run it while it returns $? == 0, but if it hits an error I need to stop it and exit with code 1.
The question is how to do it?
I tried to use the set -e command, but Jenkins does not mark the build as failed; to Jenkins it looks like Success.
I also need to get the error message to show it in my Jenkins log.
I managed to get the error code (in my case it will be 126), but how do I get the error message?
main file
rc=$?; if [[ $rc != 0 ]]; then
echo "exit {$rc} ";
set -e
echo "Test"
echo "Test2"
echo "Test3"
Just add the command set -e to the beginning of the file
This should look something similar to this
set -e
#...Your code...
I think you just want:
while; do
sleep 1;
echo failed!! >&2
Note that if the script is written well, then the echo is
redundant, as the script should have written a decent
error message already. Also, the sleep may not be needed, but is useful to prevent a fast loop if the script succeeds quickly.
You can get the explicit return value, but it requires a bit of refactoring.
while test $? = 0; do; done
echo failed with status $?!! >&2
since in the first construction the return value of the while loop will be the
return value of sleep.
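One possible way to do that refactoring is to save the status inside the loop body. This is only a sketch: run_until_failure and the RETRY_DELAY variable are names invented here, and the command to rerun is passed as arguments.

```shell
#!/usr/bin/env bash
# Sketch: rerun a command until it fails, then report its real exit status.
# run_until_failure and RETRY_DELAY are illustrative names.

run_until_failure() {
    local rc
    while true; do
        "$@"
        rc=$?                      # save the status before anything overwrites it
        ((rc == 0)) || break
        sleep "${RETRY_DELAY:-1}"  # avoid a fast loop if the command succeeds quickly
    done
    echo "failed with status $rc" >&2
    return "$rc"
}
```

The status is stored in rc immediately, so it is not replaced by the status of sleep or of the loop itself.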
It's not quite easy to get an error code only.
How about this ...
Msg=$( 2>&1) # redirect all error messages to stdout
if [ "$?" -ne 0 ] # Not Equal
then
    echo "$Msg"
    exit 1
fi
exit 0
This captures all messages created by the script, so if the program returned an error code, you already have the error message saved in a variable.
But this has a disadvantage: you temporarily store all messages created by the script until the error appears.
You can filter the error message with echo "$Msg" |tail -n 1, but it's not 100% safe.
You should also make some changes in the script:
Replace set -e with trap "exit 1" ERR. This will exit the script on errors.
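A variation on the capture idea, sketched under the assumption that only stderr is worth keeping: stdout passes straight through, and the error text goes to a temporary file that is shown only on failure. The name run_capturing_stderr is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: keep stdout untouched, capture stderr, report it only on failure.
# run_capturing_stderr is an illustrative name.

run_capturing_stderr() {
    local errfile rc
    errfile=$(mktemp) || return 1
    "$@" 2>"$errfile"
    rc=$?   # declared separately above, so "local" cannot mask this status
    if ((rc != 0)); then
        echo "command failed with status $rc:" >&2
        cat "$errfile" >&2
        rm -f "$errfile"
        return 1
    fi
    rm -f "$errfile"
}
```

Because the function returns 1 on any failure, a caller such as a Jenkins shell step sees a non-zero exit code along with the original error text.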
Hope this will help.

Can I detect early exit from a long-running, backgrounded process?

I'm trying to improve the startup scripts for several servers running in a cluster environment. The server processes should run indefinitely but occasionally fail on startup, issuing e.g. Address already in use exceptions.
I'd like the exit code for the startup script to reflect these early terminations by, say, waiting for 1 second and telling me if the server seems to have started okay. I also need the server PID echoed.
Here's my best shot so far:
$ cat
# start the server in the bg but if it fails in the first second,
# then kill
CMD="start_server -option1 foo -option2 bar"
eval "($CMD >> cc.log 2>&1 || kill -9 $$ &)"
# the `kill` above only has 1 second to kill me-- otherwise my exit code is 0
sleep 1
The exit code works fine but two problems remain:
If the server is long-running but eventually encounters an error, the parent will have exited already and the $$ PID may have been reused by an unrelated process which this script will then kill off.
The SERVER_PID isn't correct since it's the PID of the subshell rather than the start_server command (which in this case is a grandchild of the script).
Is there a simpler way to background the start_server process, get its PID, and use a timeout'ed check for error codes? I looked into bash builtins wait and timeout but they don't seem to work for processes that shouldn't exit in the end.
I can't change the server code and the startup script should not run indefinitely.
You can also use coproc (and look, I'm putting the command in an array, and also with proper quoting!):
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; }
server_pid=$!
sleep 1
if [[ -z "${mycoprocfd[@]}" ]]; then
    echo >&2 "Failure detected when starting server! Server died before 1 second."
    exit 1
else
    echo $server_pid
fi
The trick is that coproc puts the file descriptors of the redirections of stdin and stdout in a prescribed array (here mycoprocfd) and empties the array when the process exits. So you don't need to do clumsy stuff with the PID itself.
You can hence check for the server to never exit as so:
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; }
server_pid=$!
read -u "${mycoprocfd[0]}"
echo >&2 "Oh dear, the server with PID $server_pid died after $SECONDS seconds."
exit 1
That's because read will read on the file descriptor given by coproc (but nothing is ever read here, since the stdout of your command has been redirected to a file!), and read exits when the file descriptor is closed, i.e., when the command launched by coproc exits.
I'd say this is a really elegant solution!
Now, this script will live as long as the coproc lives. I understood that's not what you want. In this case, you can time out the read with its -t option, and then use the fact that read's exit status is greater than 128 if it timed out. E.g., for a 4.5-second timeout:
timeout=4.5
cmd=( start_server -option1 foo -option2 bar )
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; }
server_pid=$!
read -t $timeout -u "${mycoprocfd[0]}"
if (($?>128)); then
    echo "$server_pid <-- all is good, it's still alive after $timeout seconds."
else
    echo >&2 "Oh dear, the server with PID $server_pid died after $timeout seconds."
    exit 1
fi
exit 0 # Yay
This is also very elegant :).
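For reuse, the read -t technique can be wrapped in a function. The following is a sketch, not a drop-in replacement: started_ok, its argument order and the SERVER coproc name are invented here. The explicit exit after the command is an assumption on my part, added so that the coproc shell itself (which holds the pipe open) never replaces itself with the command.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a command is still running after TIMEOUT seconds.
# started_ok and the coproc name SERVER are illustrative.

# started_ok TIMEOUT LOGFILE CMD [ARGS...]
started_ok() {
    local timeout=$1 log=$2 fd
    shift 2
    # The coproc shell keeps the write end of the pipe open until CMD exits.
    coproc SERVER { "$@" >>"$log" 2>&1; exit "$?"; }
    server_pid=$!            # PID of the coproc shell wrapping CMD
    fd=${SERVER[0]:-}
    # If the array is already empty, the command has exited.
    [[ -n $fd ]] || return 1
    read -r -t "$timeout" -u "$fd" 2>/dev/null
    (( $? > 128 ))           # read timed out, so the command is still alive
}
```

Note that server_pid here is the PID of the wrapping coproc shell, not of the server process itself.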
Use, extend, and adapt to your needs! (but with good practices!)
Hope this helps!
coproc is a Bash reserved word that appeared in bash 4.0. The solutions shown here are 100% pure bash (except the first one, with sleep, which is not the best one at all!).
The use of coproc in scripts is almost always superior to putting jobs in background with & and doing clumsy and awkward stuff with sleep and checking $!.
If you want coproc to keep quiet, whatever happens (e.g., if there's an error launching the command, which is fine here since you're handling everything yourself), do:
coproc mycoprocfd { "${cmd[@]}" >> cc.log 2>&1 ; } > /dev/null 2>&1
Another 20 minutes of googling turned up kill -0 $PID.
So it seems I can use:
$ cat
CMD="start_server -option1 foo -option2 bar"
eval "$CMD >> cc.log 2>&1 &"
SERVER_PID=$!
sleep 1
kill -0 $SERVER_PID
if [ $? != 0 ]; then
    echo "Failure detected when starting server! PID $SERVER_PID doesn't exist!" 1>&2
    exit 1
fi
This wouldn't work for processes that I can't send signals to, but it works well enough in my case (where the startup script starts the server itself).
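The same kill -0 check can be written without eval by passing the command as arguments. This is a sketch; start_and_check, server_pid and server_status are names chosen for the example. Since the command is backgrounded directly, $! is the PID of the command itself rather than of a subshell.

```shell
#!/usr/bin/env bash
# Sketch: background a command, wait briefly, then check it is still alive.
# start_and_check, server_pid and server_status are illustrative names.

# start_and_check DELAY LOGFILE CMD [ARGS...]
start_and_check() {
    local delay=$1 log=$2
    shift 2
    "$@" >>"$log" 2>&1 &
    server_pid=$!
    sleep "$delay"
    if kill -0 "$server_pid" 2>/dev/null; then
        return 0                 # still running after the delay
    fi
    wait "$server_pid"           # the child is gone; collect its exit status
    server_status=$?
    return 1
}
```

On failure, server_status holds the exit code of the command that died, which can go into the error message.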

Bash infinite loop sleep having strange behavior (NGINX/PHP-FPM/PGSQL)

I'm not sure it should be in stackoverflow or serverfault. I post here because it may be a programming problem.
I have this infinite loop:
RESULT=`curl "http://somepage.php?thread=0"`
while :
if [[ "$RESULT" == "DONE" ]]
RESULT=`curl "http://somepage.php?thread=0"`
elif [[ "$RESULT" == "NONE" ]]
sleep 5
RESULT=`curl "http://somepage.php?thread=0"`
printf "%s %s\n" "$(date --rfc-3339='seconds'): ELSE1-" "$RESULT" >> /var/log/XXX/loopXXX-`date --rfc-3339='date'`
sleep 5
RESULT=`curl "http://somepage.php?thread=0"`
if [[ "$RESULT" == "DONE" ]]
RESULT=`curl "http://jsomepage.php?thread=0"`
elif [[ "$RESULT" == "NONE" ]]
sleep 5
RESULT=`curl "http://somepage.php?thread=0"`
printf "STOP"
I have 3 loops doing the same job, requesting threads 0 to 2. The DB table that the PHP page queries has a thread column, so the three loops query the same table (read/write) but never the same rows.
The problem is that on some nights (with almost no activity), one loop doesn't request a page for several hours (I checked the NGINX access log). This only happens sometimes, and the server is far more powerful than needed.
Are there problems with using an infinite loop with curl? In total I have around 10 loops (different pages/tables) but they have a 10s sleep instead of 5s.
Is there a problem in my script with memory/curl? Have you ever experienced something similar?
One of the curl lines is probably taking much longer than you expect to execute.
You should use curl's --max-time parameter in order to limit the duration of any single execution to something sane. It expects seconds.
RESULT=`curl --max-time 10 "http://somepage.php?thread=0"`
Note that you may now encounter failures where instead you had been seeing long delays. Checking the output might be satisfactory for your application, but return codes are the path to enlightenment. You may even want to use the "-e" option in your shebang and/or create a handler to be used with a trap for ERR.
Try setting a maximum timeout for every curl command to prevent them from hanging. Example:
curl -m 50 ...
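To combine the timeout with a check of curl's return code, one step of the loop could look roughly like this. It is a sketch: fetch, poll_step and the MAX_TIME variable are invented names, and the DONE/NONE handling only mirrors the loop in the question.

```shell
#!/usr/bin/env bash
# Sketch: one polling step with a bounded curl call and an exit-status check.
# fetch, poll_step and MAX_TIME are illustrative names.

fetch() {
    curl --silent --show-error --max-time "${MAX_TIME:-10}" "$1"
}

# poll_step URL
# Prints how many seconds to wait before the next request.
# Returns curl's exit status if the request failed (28 means it timed out).
poll_step() {
    local result rc
    result=$(fetch "$1")
    rc=$?
    if ((rc != 0)); then
        echo "$(date '+%F %T'): curl failed with status $rc" >&2
        echo 5
        return "$rc"
    fi
    case $result in
        DONE) echo 0 ;;   # more work is waiting, ask again right away
        NONE) echo 5 ;;   # nothing to do, back off
        *)    echo "$(date '+%F %T'): unexpected reply: $result" >&2
              echo 5 ;;
    esac
}
```

The main loop then reduces to reading the delay from poll_step and sleeping for that long.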

Linux Single Instance Kill if running too long

I am using the following to keep a single instance of a script running on my server. I have a cronjob to run this every minute.
How do I daemonize an arbitrary script in unix?
if [[ $# < 1 ]]; then
echo "Name of pid file not given."
# Get the pid file's name.
if [[ $# < 1 ]]; then
echo "No command given."
echo "Checking pid in file $PIDFILE."
#Check to see if process running.
PID=$(cat $PIDFILE 2>/dev/null)
if [[ $? = 0 ]]; then
ps -p $PID >/dev/null 2>&1
if [[ $? = 0 ]]; then
echo "Command $1 already running."
# Write our pid to file.
echo $$ >$PIDFILE
# Get command.
# Run command
Now I found out that my script had hung for some reason and therefore it was stuck. I'd like a way to check if the $PIDFILE is "old" and if so, kill the process. I know that's possible (check the timestamp on the file) but I don't know the syntax or if this is even a good idea. Also, when this script is running, the CPU should be pretty heavily used. If it hangs (rare but it happened at least once so far), the CPU usage drops to 0%. It would be nice if I could check that the process is really hung/not active, but I don't know if there's an easy way to do that (and I don't want to have many false positives where it gets killed but it's running fine).
To answer the question in your title, which seems quite different from your problem, use timeout.
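For the syntax asked about, here is a hedged sketch of both ideas: checking whether the PID file is older than some number of minutes, and capping the run time with timeout. It assumes GNU coreutils timeout and a find that supports -mmin; the function names are made up.

```shell
#!/usr/bin/env bash
# Sketch: detect an old pid file, and cap a command's run time.
# pidfile_is_stale and run_with_limit are illustrative names.

# pidfile_is_stale FILE MINUTES
# Succeeds if FILE exists and was last modified more than MINUTES minutes ago.
pidfile_is_stale() {
    local file=$1 max_minutes=$2
    [[ -n $(find "$file" -maxdepth 0 -mmin +"$max_minutes" 2>/dev/null) ]]
}

# run_with_limit DURATION CMD [ARGS...]
# Runs CMD under timeout; the exit status is 124 if the limit was hit.
run_with_limit() {
    local limit=$1
    shift
    timeout "$limit" "$@"
}
```

A stale file only shows that the script started long ago, not that it is hung, so the age threshold has to be well above the longest normal run.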
Now, for your problem, I don't see where it could hang, unless you gave it a fifo queue for the pid file. Now, to run and respawn, you can just run this script once, on startup:
while /bin/true; do "$@"; done
Which brings up another bug in the code you got from the other question: "$*" will pass all the arguments to the script as a single argument; without the quotes it'll split arguments on white space. "$@" will pass them individually and handle white space properly.
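A quick way to see the three behaviours side by side (the helper names are only for this demonstration):

```shell
#!/usr/bin/env bash
# Each demo forwards its arguments differently; count_args reports how many arrive.

count_args()    { echo "$#"; }
demo_star()     { count_args "$*"; }   # everything joined into one argument
demo_at()       { count_args "$@"; }   # each argument preserved as given
demo_unquoted() { count_args $*; }     # arguments re-split on white space
```

Called with the two arguments "a b" and c, demo_star reports 1, demo_at reports 2 and demo_unquoted reports 3.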
Call with /path/to/script command [argument]....

Using named pipes with bash - Problem with data loss

Did some search online, found simple 'tutorials' to use named pipes. However when I do anything with background jobs I seem to lose a lot of data.
[[Edit: found a much simpler solution, see reply to post. So the question I put forward is now academic - in case one might want a job server]]
Using Ubuntu 10.04 with Linux 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:52:42 UTC 2010 x86_64 GNU/Linux
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu).
My bash function is:
function jqs {
  pipe=/tmp/__job_control_manager__
  trap "rm -f $pipe; exit" EXIT SIGKILL
  if [[ ! -p "$pipe" ]]; then
    mkfifo "$pipe"
  fi
  while true; do
    if read txt <"$pipe"; then
      echo "$(date +'%Y'): new text is [[$txt]]"
      if [[ "$txt" == 'quit' ]]; then
        break
      fi
    fi
  done
}
I run this in the background:
> jqs&
[1] 5336
And now I feed it:
for i in 1 2 3 4 5 6 7 8
do
  (echo aaa$i > /tmp/__job_control_manager__ && echo success$i &)
done
The output is inconsistent.
I frequently don't get all success echoes.
I get at most as many new text echos as success echoes, sometimes less.
If I remove the '&' from the 'feed', it seems to work, but I am blocked until the output is read. Hence me wanting to let sub-processes get blocked, but not the main process.
The aim being to write a simple job control script so I can run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run.
Full job manager below:
function jq_manage
export __gn__="$1"
trap "rm -f $pipe" EXIT
trap "break" SIGKILL
if [[ ! -p "$pipe" ]]; then
mkfifo "$pipe"
while true
if (($(jobs | egrep "Running.*echo '%#_Group_#%_$__gn__'" | wc -l) < $__jN__))
echo "Waiting for new job"
if read new_job <"$pipe"
echo "new job is [[$new_job]]"
if [[ "$new_job" == 'quit' ]]
echo "In group $__gn__, starting job $new_job"
eval "(echo '%#_Group_#%_$__gn__' > /dev/null; $new_job) &"
sleep 3
function jq
# __gn__ = first parameter to this function, the job group name (the pool within which to allocate __jN__ jobs)
# __jN__ = second parameter to this function, the maximum of job numbers to run concurrently
export __gn__="$1"
export __jN__="$1"
export __jq__=$(jobs | egrep "Running.*echo '%#_GroupQueue_#%_$__gn__'" | wc -l)
if (($__jq__ '<' 1))
eval "(echo '%#_GroupQueue_#%_$__gn__' > /dev/null; jq_manage $__gn__) &"
echo $@ >$pipe
jq <name> <max processes> <command>
jq abc 2 sleep 20
will start one process.
That part works fine. Start a second one, fine.
One by one by hand seem to work fine.
But starting 10 in a loop seems to lose the system, as in the simpler example above.
Any hints as to what I can do to solve this apparent loss of IPC data would be greatly appreciated.
Your problem is the if statement below:
while true
if read txt <"$pipe"
What is happening is that your job queue server is opening and closing the pipe each time around the loop. This means that some of the clients are getting a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.
To fix this, change the loop in the server to open the pipe once for the entire loop:
while true
if read txt
done < "$pipe"
Done this way, the pipe is opened once and kept open.
You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.
Edit: With the problem now being that you get EOF on your reads when the last client closes the pipe, you can use jilles' method of duping the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:
while true
if read txt
done < "$pipe" 3> "$pipe"
This will hold the write side of the pipe open on fd 3. The same caveat applies with this file descriptor as with stdin. You will need to close it so that child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.
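Put together as something runnable, the open-once idea might look like the sketch below. The function name serve_fifo is invented, and fds 3 and 4 are an arbitrary choice that keeps stdin free for anything run inside the loop.

```shell
#!/usr/bin/env bash
# Sketch: a FIFO reader that opens the pipe once and also holds a write end,
# so clients coming and going cause neither lost data nor EOF.
# serve_fifo is an illustrative name.

# serve_fifo PIPE
# Echoes each line received; stops when a client sends "quit".
serve_fifo() {
    local pipe=$1 line
    [[ -p $pipe ]] || mkfifo "$pipe" || return 1
    while read -r -u 3 line; do
        [[ $line == quit ]] && break
        echo "new text is [[$line]]"
    done 3<"$pipe" 4>"$pipe"   # fd 3 reads; fd 4 only keeps a writer attached
}
```

With the server running in the background, the eight backgrounded writers from the question should all get through, because the read end never goes away between their writes.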
As said in other answers you need to keep the fifo open at all times to avoid losing data.
However, once all writers have left after the fifo has been open (so there was a writer), reads return immediately (and poll() returns POLLHUP). The only way to clear this state is to reopen the fifo.
POSIX does not provide a solution to this but at least Linux and FreeBSD do: if reads start failing, open the fifo again while keeping the original descriptor open. This works because in Linux and FreeBSD the "hangup" state is local to a particular open file description, while in POSIX it is global to the fifo.
This can be done in a shell script like this:
while :; do
    exec 3<tmp/testfifo
    exec 4<&-
    while read x; do
        echo "input: $x"
    done <&3
    exec 4<&3
    exec 3<&-
done
Just for those that might be interested, [[re-edited]] following comments by camh and jilles, here are two new versions of the test server script.
Both versions now work exactly as hoped.
camh's version for pipe management:
function jqs # Job queue manager
{
  pipe=/tmp/__job_control_manager__
  trap "rm -f $pipe; exit" EXIT TERM
  if [[ ! -p "$pipe" ]]; then
    mkfifo "$pipe"
  fi
  while true
  do
    if read -u 3 txt
    then
      echo "$(date +'%Y'): new text is [[$txt]]"
      if [[ "$txt" == 'quit' ]]
      then
        break
      fi
      sleep 1
      # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
    fi
  done 3< "$pipe" 4> "$pipe" # 4 is just to keep the pipe opened so any real client does not end up causing read to return EOF
}
jilles' version for pipe management:
function jqs # Job queue manager
{
  pipe=/tmp/__job_control_manager__
  trap "rm -f $pipe; exit" EXIT TERM
  if [[ ! -p "$pipe" ]]; then
    mkfifo "$pipe"
  fi
  exec 3< "$pipe"
  exec 4<&-
  while true
  do
    if read -u 3 txt
    then
      echo "$(date +'%Y'): new text is [[$txt]]"
      if [[ "$txt" == 'quit' ]]
      then
        break
      fi
      sleep 1
      # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
    else
      # Close the pipe and reconnect it so that the next read does not end up returning EOF
      exec 4<&3
      exec 3<&-
      exec 3< "$pipe"
      exec 4<&-
    fi
  done
}
Thanks to all for your help.
Like camh & Dennis Williamson say, don't break the pipe.
Now I have smaller examples, direct on the command line:
for i in {0,1,2,3,4}{0,1,2,3,4,5,6,7,8,9};
do
    if read s;
    then echo ">>$i--$s//";
    else echo "<<$i";
    fi;
done < tst-fifo
for i in {%a,#b}{1,2}{0,1};
do
    echo "Test-$i" > tst-fifo;
done
Can replace the key line with:
(echo "Test-$i" > tst-fifo&);
All client data sent to the pipe gets read, though with option two of the client one may need to start the server a couple of times before all data is read.
But although the read waits for data in the pipe to start with, once data has been pushed, it reads the empty string forever.
Any way to stop this?
Thanks for any insights again.
On the one hand the problem is worse than I thought:
Now there seems to be a case in my more complex example (jq_manage) where the same data is being read over and over again from the pipe (even though no new data is being written to it).
On the other hand, I found a simple solution (edited following Dennis' comment):
function jqn # compute the number of jobs running in that group
{
  __jqty__=$(jobs | egrep "Running.*echo '%#_Group_#%_$__groupn__'" | wc -l)
}
function jq
{
  __groupn__="$1"; shift # job group name (the pool within which to allocate $__jmax__ jobs)
  __jmax__="$1"; shift # maximum of job numbers to run concurrently
  jqn
  while (($__jqty__ >= $__jmax__))
  do
    sleep 1
    jqn
  done
  eval "(echo '%#_Group_#%_$__groupn__' > /dev/null; $@) &"
}
Works like a charm.
No socket or pipe involved.
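For comparison, a shorter variant of the same idea, sketched with wait -n (which needs bash 4.3 or later) instead of the sleep loop. It uses a single pool rather than named groups, and run_limited is a name made up here.

```shell
#!/usr/bin/env bash
# Sketch: cap the number of concurrent background jobs without pipes.
# run_limited is an illustrative name; wait -n requires bash 4.3+.

# run_limited MAX CMD [ARGS...]
# Blocks until fewer than MAX background jobs are running, then starts CMD.
run_limited() {
    local max=$1
    shift
    while (( $(jobs -rp | wc -l) >= max )); do
        wait -n    # returns as soon as any one background job finishes
    done
    "$@" &
}
```

After queueing everything, a plain wait at the end of the script blocks until the remaining jobs have finished.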
run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run
You can do this with GNU Parallel. You will not need any of this scripting.
You can set max-procs "Number of jobslots. Run up to N jobs in parallel." There is an option to set the number of CPU cores you want to use. You can save the list of executed jobs to a log file, but that is a beta feature.
