How to properly time out a tail pipeline in a Linux shell

I am implementing a monitor_log function that tails the most recent lines of a running log and checks each one for a required string in a while loop. The timeout requirement is: once the tail has been running for more than 300 seconds, both the tail and the while loop pipeline must be closed.
The big issue I found is that on some servers the running log does NOT keep growing, which means tail -n 1 -f "running.log" produces no output for the while loop to consume, so the timeout check if [[ $(($SECONDS - start_timer)) -gt 300 ]] is never reached.
For example, with a 300-second timeout: if running.log stops producing new lines before the 300 seconds are up and then stays silent for 30 minutes, tail emits nothing for 30 minutes, so the timeout check inside the loop is not evaluated for 30 minutes. The loop keeps tailing well past the 300-second mark, and if no new line ever arrives, the timeout check is never hit at all.
function monitor_log() {
    if [[ -f "running.log" ]]; then
        # Timer start
        start_timer=$SECONDS
        # Tail the running log last line and keep checking for the required string
        tail -n 1 -f "running.log" | while read tail_line
        do
            if [[ $(($SECONDS - start_timer)) -gt 300 ]]; then
                break;
            fi
            if [[ "$tail_line" == "required string" ]]; then
                capture_flag=1
            fi
            if [[ $capture_flag -eq 1 ]]; then
                break;
            fi
        done
    fi
}
Could you help me figure out the proper way to time out the tail and the while loop after 300 seconds? Thank you.

Two options worth considering for an inactivity timeout. Usually, option #1 works better.
Option 1: Use a timeout (read -t timeout).
It caps the time each read may take; see the excerpt from the bash man page below. The timeout causes read to fail, breaking the while loop.
In the code above, replace
tail -n 1 -f "running.log" | while read tail_line
with
tail -n 1 -f "running.log" | while read -t 300 tail_line
Option 2: TMOUT variable
It's possible to get the same effect by setting the TMOUT shell variable, which bash treats as the default timeout for read.
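A minimal sketch (note the caveat from the follow-up below: if the loop sits on the subshell side of a pipe, its timing out does not terminate tail, so this combines best with the process-substitution form shown later):
TMOUT=300
while read tail_line
do
    # read inherits TMOUT as its default timeout and fails after 300s of silence
    [[ "$tail_line" == "required string" ]] && break
done < <(tail -n 1 -f "running.log")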
From the bash man page, on the read builtin:
-t timeout
    Cause read to time out and return failure if a complete line of input (or a specified number of characters) is not read within timeout seconds. timeout may be a decimal number with a fractional portion following the decimal point. This option is only effective if read is reading input from a terminal, pipe, or other special file; it has no effect when reading from regular files. If read times out, read saves any partial input read into the specified variable name. If timeout is 0, read returns immediately, without trying to read any data. The exit status is 0 if input is available on the specified file descriptor, non-zero otherwise. The exit status is greater than 128 if the timeout is exceeded.

Based on dash-o's answer I tested option 1. The -t option of read works fine only when the while read loop runs in the main shell and tail runs in a subshell. In my question it is the other way around: tail runs in the main shell and the while read loop consumes its output in a subshell after the pipe. In that arrangement, even with -t set on read, the script does not stop when the time is used up. Refer to
Monitoring a file until a string is found, Bash tail -f with while-read and pipe hangs and How to [constantly] read the last line of a file?
The working code based on dash-o's solution:
function monitor_log() {
    if [[ -f "running.log" ]]; then
        # Tail the running log last line and keep checking for the required string
        while read -t 300 tail_line
        do
            if [[ "$tail_line" == "required string" ]]; then
                capture_flag=1
            fi
            if [[ $capture_flag -eq 1 ]]; then
                break;
            fi
        done < <(tail -n 1 -f "running.log")
        # Silently kill the remaining tail process; match its full command line
        # instead of parsing ps output by hand (grep 'tail' | cut picks the wrong field)
        pkill -PIPE -f 'tail -n 1 -f running.log'
    fi
}
But as tested, this function leaves the tail process alive after it terminates on timeout (you can observe its PID with ps -ef on the console), so the tail PID needs to be killed separately.
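Alternatively, current bash releases set $! to the PID of a process substitution, so the exact tail process can be recorded and signalled without any pattern matching. A sketch (assumption: the $! behaviour for process substitutions, present in recent bash):
while read -t 300 tail_line; do
    [[ "$tail_line" == "required string" ]] && break
done < <(tail -n 1 -f "running.log")
# $! here is the PID of the <(tail ...) process substitution
kill -PIPE "$!" 2> /dev/null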
I also tested another solution that keeps tail and the while read loop in their original positions (tail in the main shell, the while read loop in a subshell after the | pipeline). The only change is putting GNU coreutils' timeout command in front of tail; it works perfectly and leaves no tail process behind after the timeout fires:
function monitor_log() {
    if [[ -f "running.log" ]]; then
        # Tail the running log last line and keep checking for the required string
        timeout 300 tail -n 1 -f "running.log" | while read tail_line
        do
            if [[ "$tail_line" == "required string" ]]; then
                capture_flag=1
            fi
            if [[ $capture_flag -eq 1 ]]; then
                break;
            fi
        done
    fi
}
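One caveat with this version: the while loop runs in a subshell, so capture_flag is not visible to the caller once the pipeline ends. If only a yes/no result is needed, a minimal sketch that relies on exit status instead (assuming coreutils timeout and grep):
function monitor_log() {
    if [[ -f "running.log" ]]; then
        # grep -q exits on the first match; timeout caps the wait at 300 seconds
        if timeout 300 tail -n 1 -f "running.log" | grep -q "required string"; then
            return 0    # required string seen
        fi
        return 1        # timed out without seeing the string
    fi
}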

Related

Trying to make a live /proc/ reader using bash script for live process monitoring

I'm trying to make a little side-project script to sit and monitor all of the /proc/ directories. For the most part I have the concept running and it works (to a degree). What I'm aiming for is to scan through all the directories, cat their status files, pull out the appropriate info, and then run this in an infinite loop to give me live updates of when something is running on and dropping off the scheduler. Right now, every time I run the script it prints 50+ blank lines, and every time it hits the proper regex it prints correctly, but I'm aiming for it not to roll down the screen the way it does. Any help at all would be appreciated.
regex="[0-9]"
temp=""
for f in /proc/*; do
    if [[ -d $f && $f =~ /proc/$regex ]]; then
        output=$(cat $f/status | grep "^State") #> /dev/null
        process_id=$(cut -b 7- <<< $f)
        state=$(cut -b 10-19 <<< $output)
        tabs 4
        if [[ $state =~ "(running)" ]]; then
            echo -e "$process_id:$state\n" | sort >> temp
        fi
    fi
done
cat temp
rm temp
To get the PID and state of all running processes, try (GNU awk, which provides ENDFILE):
awk -F':[[:space:]]*' '/State:/{s=$2} /Pid:/{p=$2} ENDFILE{if (s~/running/) print p,s; p="X"; s="X"}' OFS=: /proc/*/status
To get this output updated every second:
while sleep 1; do awk -F':[[:space:]]*' '/State:/{s=$2} /Pid:/{p=$2} ENDFILE{if (s~/running/) print p,s; p="X"; s="X"}' OFS=: /proc/*/status; done

Bash sizeout script

I very much like the style of how bash handles shells.
I am looking for a native solution that wraps a bash command, tests the size of a result file, and exits if the file becomes too big.
I am thinking about a command like
sizeout $fileName $maxSize otherBashCommand
It would be useful in a backup script like:
sizeout $fileName $maxSize timeout 600s ionice nice sudo rear mkbackup
To make it one step more complicated, I would call it over ssh:
ssh $remoteuser@$remoteServer sizeout $fileName $maxSize timeout 600s ionice nice sudo rear mkbackup
What kind of design pattern should I use for this?
Solution
I have modified Socowi's code a little
#! /bin/bash
# shell script to stop an encapsulated script when
# a checked file reaches the file size limit
# usage:
#   sizeout.sh filename filesize[Bytes] encapsulated_command arguments
fileName=$1 # file we are checking
maxSize=$2  # max. file size (in bytes) at which to stop the pid
shift 2
echo "fileName: $fileName"
echo "maxSize: $maxSize"
function limitReached() {
    if [[ ! -f $fileName ]]; then
        return 1 # file doesn't exist yet, return false
    fi
    actSize=$(stat --format %s "$fileName")
    if [[ $actSize -lt $maxSize ]]; then
        return 1 # file size under maxSize, return false
    fi
    return 0
}
# run the command as a background job
"$@" &
pid=$!
# monitor the file size while the job is running
while kill -0 $pid; do
    limitReached && kill $pid
    sleep 1
done 2> /dev/null
wait $pid # return with the exit code of the $pid
I added wait $pid at the end, so the script returns the exit code of the background process instead of its own.
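Usage then mirrors the call from the question, e.g. (file name and size are illustrative):
./sizeout.sh backup.tar.gz 5000000000 timeout 600s ionice nice sudo rear mkbackup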
Monitor the File Size Every n Time Units
I don't know whether there is a design pattern for your problem, but you could write the sizeout script as follows:
#! /bin/bash
filename="$1"
maxsize="$2" # max. file size (in bytes)
shift 2
limitReached() {
    [[ -e "$filename" ]] &&
        (( "$(stat --printf="%s" "$filename")" >= maxsize ))
}
limitReached && exit 0
# run the command as a background job
"$@" &
pid="$!"
# monitor the file size while the job is running
while kill -0 "$pid"; do
    limitReached && kill "$pid"
    sleep 0.2
done 2> /dev/null
This script checks the file size every 200ms and kills your command if the file size exceeds the maximum. Since we only check every 200ms, the file may end up with (yourWriteSpeed Bytes/s × 0.2s) more than the specified maximum size.
The following points can be improved:
Validate parameters.
Set a trap to kill the background job in every case, for instance when pressing Ctrl+C (a sketch follows this list).
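A minimal sketch of such a trap, assuming the pid variable from the script above:
# Kill the background job whenever the script exits
trap '[[ -n "$pid" ]] && kill "$pid" 2> /dev/null' EXIT
# Route Ctrl+C and termination through the EXIT trap
trap 'exit 130' INT TERM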
Monitor File Changes
The script above is not very efficient, since it checks the file size every 200ms even if the file does not change at all. inotifywait allows you to wait until the file changes. See this answer for more information.
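A rough sketch of the event-driven variant, assuming inotify-tools is installed (Linux only):
# Wait for writes instead of polling; recheck the size after each event
while inotifywait -qq -e modify "$filename" 2> /dev/null; do
    limitReached && { kill "$pid"; break; }
    kill -0 "$pid" 2> /dev/null || break  # stop once the job has finished
done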
A Word on SSH
You just need to copy the sizeout script over to your remote server; then you can use it just like on your local machine:
ssh $remoteuser@$remoteServer path/to/sizeout filename maxSize ... mkbackup

Bash script optimization for waiting for a particular string in log files

I am using a bash script that calls multiple processes which have to start up in a particular order, and certain actions have to be completed (they then print certain messages to the logs) before the next one can be started. The bash script has the following code, which works really well for most cases:
tail -Fn +1 "$log_file" | while read line; do
    if echo "$line" | grep -qEi "$search_text"; then
        echo "[INFO] $process_name process started up successfully"
        pkill -9 -P $$ tail
        return 0
    elif echo "$line" | grep -qEi '^error\b'; then
        echo "[INFO] ERROR or Exception is thrown listed below. $process_name process startup aborted"
        echo " ($line) "
        echo "[INFO] Please check $process_name process log file=$log_file for problems"
        pkill -9 -P $$ tail
        return 1
    fi
done
However, when we set the processes to print logging in DEBUG mode, they print so much that this script cannot keep up, and it takes about 15 minutes after a process completes for the bash script to catch up. Is there a way of optimizing this, like changing 'while read line' to 'while read 100 lines', or something like that?
How about not forking up to two grep processes per log line?
tail -Fn +1 "$log_file" | grep -Ei "$search_text|^error\b" | while read line; do
So one long-running grep process does the preprocessing, if you will.
Edit: As noted in the comments, it is safer to add --line-buffered to the grep invocation.
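With the buffering fix applied, the head of the pipeline would look like this (the loop body from the question stays unchanged):
tail -Fn +1 "$log_file" | grep --line-buffered -Ei "$search_text|^error\b" | while read line; do
    # original matching logic goes here
    :
done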
Some tips relevant for this script:
Checking that the service is doing its job is a much better check for daemon startup than looking at the log output
You can use grep ... <<<"$line" to execute fewer echo calls.
You can use tail -f | grep -q ... to avoid the while loop by stopping as soon as there is a matching line (see the sketch after this list).
If you can avoid -i on grep it might be significantly faster to process the input.
Thou shalt not kill -9.
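A sketch of the grep -q tip for the simple success case (note that tail only notices grep has gone, via SIGPIPE, on its next write):
if tail -Fn +1 "$log_file" | grep -qEi "$search_text"; then
    echo "[INFO] $process_name process started up successfully"
fi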

Terminating half of a pipe on Linux does not terminate the other half

I have a filewatch program:
#!/bin/sh
# On Linux, uses inotifywait -mre close_write; on OS X uses fswatch.
set -e
[[ "$#" -ne 1 ]] && echo "args count" && exit 2
if [[ `uname` = "Linux" ]]; then
    inotifywait -mcre close_write "$1" | sed 's/,".*",//'
elif [[ `uname` = "Darwin" ]]; then
    # sed on OSX/BSD wants -l for line-buffering
    fswatch "$1" | sed -l 's/^[a-f0-9]\{1,\} //'
fi
echo "fswatch: $$ exiting"
And a construct I'm trying to use from a script (and which I am testing on the command line on CentOS now):
filewatch . | while read line; do echo "file $line has changed\!\!"; done &
So what I am hoping this does is let me process, one line at a time, the output of inotify, which of course emits one line for each file it has detected a change on.
Now, for my script to clean things up properly, I need to be able to kill this whole backgrounded pipeline when the script exits.
So I run it, and then if I kill either the first part of the pipe or the second part, the other part does not terminate.
I think that if I kill the while read line part (which should be sh, or zsh in the case of running on the command line), then filewatch should receive a SIGPIPE. Okay, I am not handling that, so I guess it can keep running.
If I kill filewatch, though, it looks like zsh continues with its while read line. Why?

Using named pipes with bash - Problem with data loss

I did some searching online and found simple 'tutorials' on using named pipes. However, when I do anything with background jobs I seem to lose a lot of data.
[[Edit: found a much simpler solution, see reply to post. So the question I put forward is now academic - in case one might want a job server]]
Using Ubuntu 10.04 with Linux 2.6.32-25-generic #45-Ubuntu SMP Sat Oct 16 19:52:42 UTC 2010 x86_64 GNU/Linux
GNU bash, version 4.1.5(1)-release (x86_64-pc-linux-gnu).
My bash function is:
function jqs
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT SIGKILL

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        if read txt <"$pipe"
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            fi
        fi
    done
}
I run this in the background:
> jqs&
[1] 5336
And now I feed it:
for i in 1 2 3 4 5 6 7 8
do
    (echo aaa$i > /tmp/__job_control_manager__ && echo success$i &)
done
The output is inconsistent.
I frequently don't get all the success echoes.
I get at most as many 'new text' echoes as success echoes, sometimes fewer.
If I remove the '&' from the 'feed', it seems to work, but I am blocked until the output is read. Hence my wanting to let sub-processes get blocked, but not the main process.
The aim is to write a simple job-control script so I can run, say, at most 10 jobs in parallel and queue the rest for later processing, while reliably knowing that they do run.
Full job manager below:
function jq_manage
{
    export __gn__="$1"
    pipe=/tmp/__job_control_manager_"$__gn__"__

    trap "rm -f $pipe" EXIT
    trap "break" SIGKILL

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        date
        jobs
        if (($(jobs | egrep "Running.*echo '%#_Group_#%_$__gn__'" | wc -l) < $__jN__))
        then
            echo "Waiting for new job"
            if read new_job <"$pipe"
            then
                echo "new job is [[$new_job]]"
                if [[ "$new_job" == 'quit' ]]
                then
                    break
                fi
                echo "In group $__gn__, starting job $new_job"
                eval "(echo '%#_Group_#%_$__gn__' > /dev/null; $new_job) &"
            fi
        else
            sleep 3
        fi
    done
}
function jq
{
    # __gn__ = first parameter to this function, the job group name (the pool within which to allocate __jN__ jobs)
    # __jN__ = second parameter to this function, the maximum number of jobs to run concurrently
    export __gn__="$1"
    shift
    export __jN__="$1"
    shift

    export __jq__=$(jobs | egrep "Running.*echo '%#_GroupQueue_#%_$__gn__'" | wc -l)
    if (( __jq__ < 1 ))
    then
        eval "(echo '%#_GroupQueue_#%_$__gn__' > /dev/null; jq_manage $__gn__) &"
    fi

    pipe=/tmp/__job_control_manager_"$__gn__"__
    echo $@ >$pipe
}
Calling
jq <name> <max processes> <command>
jq abc 2 sleep 20
will start one process.
That part works fine. Starting a second one: fine.
One by one by hand, they seem to work fine.
But starting 10 in a loop seems to lose data, as in the simpler example above.
Any hints as to what I can do to solve this apparent loss of IPC data would be greatly appreciated.
Regards,
Alain.
Your problem is the if statement below:
while true
do
    if read txt <"$pipe"
    ....
done
What is happening is that your job queue server opens and closes the pipe each time around the loop. This means that some of the clients get a "broken pipe" error when they try to write to the pipe - that is, the reader of the pipe goes away after the writer opens it.
To fix this, change the loop in the server to open the pipe once for the entire loop:
while true
do
    if read txt
    ....
done < "$pipe"
Done this way, the pipe is opened once and kept open.
You will need to be careful of what you run inside the loop, as all processing inside the loop will have stdin attached to the named pipe. You will want to make sure you redirect stdin of all your processes inside the loop from somewhere else, otherwise they may consume the data from the pipe.
Edit: With the problem now being that you get EOF on your reads when the last client closes the pipe, you can use jilles' method of duplicating the file descriptors, or you can just make sure you are a client too and keep the write side of the pipe open:
while true
do
    if read txt
    ....
done < "$pipe" 3> "$pipe"
This will hold the write side of the pipe open on fd 3. The same caveat applies to this file descriptor as to stdin: you will need to close it so any child processes don't inherit it. It probably matters less than with stdin, but it would be cleaner.
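A sketch of that caveat, with a hypothetical some_job standing in for the real work:
while read txt
do
    # close fd 3 and redirect stdin so the child cannot drain the fifo
    some_job < /dev/null 3<&- &
done < "$pipe" 3> "$pipe"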
As said in other answers, you need to keep the fifo open at all times to avoid losing data.
However, once all writers have gone away after the fifo has been opened (so there was a writer), reads return immediately (and poll() returns POLLHUP). The only way to clear this state is to reopen the fifo.
POSIX does not provide a solution to this, but at least Linux and FreeBSD do: if reads start failing, open the fifo again while keeping the original descriptor open. This works because on Linux and FreeBSD the "hangup" state is local to a particular open file description, while in POSIX it is global to the fifo.
This can be done in a shell script like this:
while :; do
exec 3<tmp/testfifo
exec 4<&-
while read x; do
echo "input: $x"
done <&3
exec 4<&3
exec 3<&-
done
Just for those who might be interested, [[re-edited]] following comments by camh and jilles, here are two new versions of the test server script.
Both versions now work exactly as hoped.
camh's version of pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        fi
    done 3< "$pipe" 4> "$pipe" # 4 is just to keep the pipe open so a real client does not cause read to return EOF
}
jilles' version of pipe management:
function jqs # Job queue manager
{
    pipe=/tmp/__job_control_manager__
    trap "rm -f $pipe; exit" EXIT TERM

    if [[ ! -p "$pipe" ]]; then
        mkfifo "$pipe"
    fi

    exec 3< "$pipe"
    exec 4<&-

    while true
    do
        if read -u 3 txt
        then
            echo "$(date +'%Y'): new text is [[$txt]]"
            if [[ "$txt" == 'quit' ]]
            then
                break
            else
                sleep 1
                # process $txt - remember that if this is to be a spawned job, we should close fd 3 and 4 beforehand
            fi
        else
            # Close the pipe and reconnect it so that the next read does not end up returning EOF
            exec 4<&3
            exec 3<&-
            exec 3< "$pipe"
            exec 4<&-
        fi
    done
}
Thanks to all for your help.
As camh & Dennis Williamson say, don't break the pipe.
Now I have smaller examples, direct on the command line:
Server:
(
    for i in {0,1,2,3,4}{0,1,2,3,4,5,6,7,8,9};
    do
        if read s;
        then echo ">>$i--$s//";
        else
            echo "<<$i";
        fi;
    done < tst-fifo
)&
Client:
(
    for i in {%a,#b}{1,2}{0,1};
    do
        echo "Test-$i" > tst-fifo;
    done
)&
The key line can be replaced with:
(echo "Test-$i" > tst-fifo&);
All client data sent to the pipe gets read, though with option two of the client one may need to start the server a couple of times before all data is read.
But although the read waits for data in the pipe to start with, once data has been pushed, it reads the empty string forever.
Any way to stop this?
Thanks again for any insights.
On the one hand, the problem is worse than I thought:
there now seems to be a case in my more complex example (jq_manage) where the same data is read over and over again from the pipe (even though no new data is being written to it).
On the other hand, I found a simple solution (edited following Dennis' comment):
function jqn # compute the number of jobs running in that group
{
    __jqty__=$(jobs | egrep "Running.*echo '%#_Group_#%_$__groupn__'" | wc -l)
}

function jq
{
    __groupn__="$1"; shift # job group name (the pool within which to allocate $__jmax__ jobs)
    __jmax__="$1"; shift   # maximum number of jobs to run concurrently

    jqn
    while (( __jqty__ >= __jmax__ ))
    do
        sleep 1
        jqn
    done

    eval "(echo '%#_Group_#%_$__groupn__' > /dev/null; $@) &"
}
Works like a charm.
No socket or pipe involved.
Simple.
run say 10 jobs in parallel at most and queue the rest for later processing, but reliably know that they do run
You can do this with GNU Parallel. You will not need all this scripting.
http://www.gnu.org/software/parallel/man.html#options
You can set max-procs ("Number of jobslots. Run up to N jobs in parallel."). There is an option to set the number of CPU cores you want to use. You can save the list of executed jobs to a log file, but that is a beta feature.
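For instance, a sketch of the queueing behaviour described above (the job commands are illustrative):
# Run at most 10 jobs at a time; the rest queue until a slot frees up
parallel --jobs 10 --joblog /tmp/parallel.log 'sleep {}; echo job {} done' ::: 20 15 10 5 4 3 2 2 1 1 1 1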
