Perl script log to file, output lag - linux

I have a Perl script that parses log files and sometimes executes a shell command:
$messagePath = `ls -t -d -1 $dir | head -n 5 | xargs grep -l "$messageSearchString"`;
I start my Perl script like this: ./perlscript.pl > logfile.log
Then I tail the logfile to watch the progress, but the output gets stuck every time at the line shown above.
The output stops there for a few seconds and then continues.
To profile the problem I wrapped it like this:
print `date`;
$messagePath = `ls -t -d -1 $dir | head -n 5 | xargs grep -l "$messageSearchString"`;
print `date`;
The output shows that the command does not consume a lot of time:
So 6. Okt 22:35:04 CEST 2013
So 6. Okt 22:35:04 CEST 2013
If I run the script without redirecting the output to a file, there is no lag.
Any idea why?

I haven't tried to duplicate your behaviour, but it might be a stdout buffering problem. Try with:
$| = 1;
$messagePath = `ls -t -d -1 $dir | head -n 5 | xargs grep -l "$messageSearchString"`;
Update
I have tried to duplicate the behaviour you observe: I've had to make some assumptions but I believe my suspicion was correct. Here I'm piping, but it's the same as redirecting to a file and tailing that file:
./test.pl | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; }'
Without $| = 1, output is buffered and aggregated:
2013-10-06 23:08:27 Saluton, mondo: /home/lserni/test.sh
2013-10-06 23:08:27
2013-10-06 23:08:27 Waiting 10s...
2013-10-06 23:08:27 Saluton denove!
With the modification, each line is printed as it is generated:
2013-10-06 23:09:09 Saluton, mondo: /home/lserni/test.sh
2013-10-06 23:09:09
2013-10-06 23:09:09 Waiting 10s...
2013-10-06 23:09:19 Saluton denove!
I expect that your script is doing something else that takes a few seconds, something other than generating $messagePath. Because the output is held back until Perl has a sizeable chunk of data to write, it only looks as if that line is the one stalling.
I forgot: the timing pipe comes from here.

In situations like yours, I've had some success using the unbuffer command. It runs a command in an environment that looks to the command like it's outputting to a tty so it doesn't buffer its output. I don't know how to apply it exactly in your case, so if you want to try it, you will have to experiment a little.

Related

How to "tail -f" with "grep" save outputs to an another file which name is time varying?

STEP1
As the title says,
I would like to save the output of
tail -f example | grep "DESIRED"
to a different file.
I have tried
tail -f example | grep "DESIRED" | tee -a different
tail -f example | grep "DESIRED" >> different
but neither of them works.
I have searched similar questions and read several answers suggesting line-buffered output,
but I cannot use it.
Is there any other way I can do it?
STEP2
Once the above is done, I would like the name of "different" (the file from above) to vary with time: I want its name to change every 30 minutes.
For example:
20221203133000
20221203140000
20221203143000
...
I have tried
tail -f example | grep "DESIRED" | tee -a $(date +%Y%m%d%H)$([ $(date +%M) -lt 30 ] && echo 00 || echo 30)00
The problem is that, since I have not even solved the first step, I could not test the second one. I also think this command will only ever create one file, named after the time at which I ran the command. Could I please get some advice?
The code below should do what you want.
Some explanation: since you want bash to execute some code (in your case, choosing which file name to write to), you need two things running in parallel: the tail + grep pipeline, and the code that decides where to write.
To connect the two processes I use a named FIFO (created with mkfifo): what tail + grep writes (using > tmp_fifo) is read by the while loop (using < tmp_fifo). Inside the while loop, you are free to write to whatever file name you want.
Note: without --line-buffered (as in your question) grep still works; it just waits until it has accumulated more data (probably 8k) before writing it out. So if "example" does not produce much data, nothing is written until enough has accumulated.
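rm -f tmp_fifo
mkfifo tmp_fifo
(tail -f input | grep --line-buffered TEXT_TO_CHECK > tmp_fifo &)
# Open the FIFO once for the whole loop: reopening it for every line
# risks grep getting SIGPIPE while no reader is attached.
while IFS= read -r LINE; do
    CURRENT_NAME=$(date +%Y%m%d%H)
    # or any other code that determines which file to write to ...
    printf '%s\n' "$LINE" >> "$CURRENT_NAME"
done < tmp_fifo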
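For the 30-minute file names asked about in the question, the "any other code" placeholder could look roughly like the sketch below. It is one possible approach, not the answerer's: slot_name and split_by_slot are hypothetical helper names, it assumes GNU date (for date -d @EPOCH), and it reads from a plain pipe instead of the FIFO.

```bash
#!/bin/bash
# Sketch, assuming GNU date. Helper names are made up for illustration.

# Print the name of the 30-minute slot containing the given epoch time
# (default: now), e.g. 20221203133000. Rounding the epoch down to a
# multiple of 1800 s matches wall-clock half hours only in time zones
# whose UTC offset is a multiple of 30 minutes.
slot_name() {
    local epoch=${1:-$(date +%s)}
    date -d "@$(( epoch - epoch % 1800 ))" +%Y%m%d%H%M%S
}

# Append every line read from stdin to the file of the current slot.
# The name is recomputed per line, so the file changes on its own.
split_by_slot() {
    local outdir=$1 line
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$outdir/$(slot_name)"
    done
}

# Demo with a finite input instead of "tail -f example".
outdir=$(mktemp -d)
printf '%s\n' 'DESIRED one' 'skip me' 'DESIRED two' |
    grep --line-buffered DESIRED | split_by_slot "$outdir"
written=$(cat "$outdir"/* | wc -l)
echo "lines written: $written"
```

With a live log this would be used as tail -f example | grep --line-buffered "DESIRED" | split_by_slot . and left running.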

Problems with tail -f and awk? [duplicate]

Is it possible to use grep on a continuous stream?
What I mean is sort of a tail -f <file> command, but with grep on the output in order to keep only the lines that interest me.
I've tried tail -f <file> | grep pattern but it seems that grep can only be executed once tail finishes, that is to say never.
Turn on grep's line buffering mode when using BSD grep (FreeBSD, Mac OS X etc.)
tail -f file | grep --line-buffered my_pattern
It looks like a while ago --line-buffered didn't matter for GNU grep (used on pretty much any Linux) as it flushed by default (YMMV for other Unix-likes such as SmartOS, AIX or QNX). However, as of November 2020, --line-buffered is needed (at least with GNU grep 3.5 in openSUSE, but it seems generally needed based on comments below).
I use the tail -f <file> | grep <pattern> all the time.
It will wait till grep flushes, not till it finishes (I'm using Ubuntu).
I think that your problem is that grep uses some output buffering. Try
tail -f file | stdbuf -o0 grep my_pattern
it will set output buffering mode of grep to unbuffered.
If you want to find matches in the entire file (not just the tail), and you want it to sit and wait for any new matches, this works nicely:
tail -c +0 -f <file> | grep --line-buffered <pattern>
The -c +0 flag says that the output should start 0 bytes (-c) from the beginning (+) of the file.
In most cases, you can tail -f /var/log/some.log |grep foo and it will work just fine.
If you need to use multiple greps on a running log file and you find that you get no output, you may need to stick the --line-buffered switch into your middle grep(s), like so:
tail -f /var/log/some.log | grep --line-buffered foo | grep bar
You may consider this answer an enhancement. I usually use
tail -F <fileName> | grep --line-buffered <pattern> -A 3 -B 5
-F is better when the file gets rotated (-f will not work properly if the file is rotated).
-A and -B are useful to get the lines just after and before each occurrence of the pattern; these blocks appear between dashed separator lines.
Personally, though, I prefer the following:
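A small sketch of what the -A/-B context output looks like, using made-up input lines and one line of context on each side instead of the 3 and 5 used above. It assumes GNU grep, which prints "--" between non-adjacent context groups.

```bash
#!/bin/bash
# Sketch: context lines around each match; the input is invented.
context=$(printf '%s\n' a1 a2 MATCH a3 a4 a5 a6 MATCH b1 |
    grep -A 1 -B 1 MATCH)
printf '%s\n' "$context"

# Count the "--" separator lines between the two context groups.
separators=$(printf '%s\n' "$context" | grep -c -x -e '--')
echo "separator lines: $separators"
```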
tail -F <file> | less
This is very useful if you want to search inside streamed logs, that is, go back and forth and look at them in depth.
Didn't see anyone offer my usual go-to for this:
less +F <file>
ctrl + c
/<search term>
<enter>
shift + f
I prefer this, because you can use ctrl + c to stop and navigate through the file whenever, and then just hit shift + f to return to the live, streaming search.
sed would be a better choice (stream editor)
tail -n0 -f <file> | sed -n '/search string/p'
and then if you wanted the tail command to exit once you found a particular string:
tail --pid=$(($BASHPID+1)) -n0 -f <file> | sed -n '/search string/{p; q}'
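Obviously a bashism: $BASHPID is expanded in the subshell that becomes the tail command, so it is tail's process ID. sed is the next command in the pipeline, so its process ID will usually be $BASHPID+1. Note that this relies on PIDs being handed out sequentially, which is not guaranteed.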
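If guessing the PID feels too fragile, a different technique (not the one from the answer above) is to start tail in the background, remember its PID from $!, and kill it once sed has found the line. The sketch below assumes GNU tail; app.log, the FIFO name and the READY marker are hypothetical. The demo uses -n +1 rather than -n0 so that it does not depend on whether tail starts before the line is written.

```bash
#!/bin/bash
# Sketch: stop "tail -f" after the first match without guessing PIDs.
# All file names and the READY marker are made up for the demo.
tmp=$(mktemp -d)
log="$tmp/app.log"
: > "$log"
mkfifo "$tmp/fifo"

# tail writes into the FIFO; $! gives us its real PID.
tail -n +1 -f "$log" > "$tmp/fifo" &
tail_pid=$!

# Simulated application that reports READY a little later.
( sleep 1; printf '%s\n' booting READY more >> "$log" ) &

# sed prints the first matching line and quits.
found=$(sed -n '/READY/{p;q}' < "$tmp/fifo")

# tail would otherwise keep running until its next write fails.
kill "$tail_pid" 2>/dev/null || true
wait
rm -rf "$tmp"
echo "found: $found"
```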
Yes, this will actually work just fine. Grep and most Unix commands operate on streams one line at a time. Each line that comes out of tail will be analyzed and passed on if it matches.
This command works for me (SUSE):
mail-srv:/var/log # tail -f /var/log/mail.info |grep --line-buffered LOGIN >> logins_to_mail
It collects logins to the mail service.
Coming a bit late to this question, and considering this kind of work an important part of monitoring, here is my (not so short) answer...
Following logs using bash
1. Command tail
This command is a little more powerful than the already published answers suggest.
Difference between the follow options tail -f and tail -F, from the man page:
-f, --follow[={name|descriptor}]
output appended data as the file grows;
...
-F same as --follow=name --retry
...
--retry
keep trying to open a file if it is inaccessible
This means: by using -F instead of -f, tail will re-open the file(s) when they are removed (on log rotation, for example).
This is useful for watching a logfile over many days.
Ability to follow more than one file simultaneously
I've already used:
tail -F /var/www/clients/client*/web*/log/{error,access}.log /var/log/{mail,auth}.log \
/var/log/apache2/{,ssl_,other_vhosts_}access.log \
/var/log/pure-ftpd/transfer.log
to follow events across hundreds of files... (read the rest of this answer to see how to make that readable... ;)
Using the -n switch (don't use -c for line buffering!). By default tail shows the last 10 lines. This can be tuned:
tail -n 0 -F file
Will follow file, but only new lines will be printed
tail -n +0 -F file
Will print the whole file before following its progress.
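The effect of the -n values above can be checked without following anything. This is a small sketch assuming GNU tail and seq; the 12-line temporary file is made up for the demo.

```bash
#!/bin/bash
# Sketch: how many lines tail prints for the -n variants shown above.
f=$(mktemp)
seq 1 12 > "$f"                             # a 12-line sample file

default_count=$(tail "$f" | wc -l)          # default: last 10 lines
none_count=$(tail -n 0 "$f" | wc -l)        # -n 0: nothing from the past
all_count=$(tail -n +0 "$f" | wc -l)        # -n +0: from the first line on
rm -f "$f"

echo "default=$default_count  -n 0=$none_count  -n +0=$all_count"
```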
2. Buffer issues when piping:
If you plan to filter outputs, consider buffering! See the -u option for sed, --line-buffered for grep, or the stdbuf command:
tail -F /some/files | sed -une '/Regular Expression/p'
is (a lot more efficient than using grep and) a lot more reactive than if you don't use the -u switch in the sed command.
tail -F /some/files |
sed -une '/Regular Expression/p' |
stdbuf -i0 -o0 tee /some/resultfile
3. Recent journaling system
On recent systems, instead of tail -f /var/log/syslog you have to run journalctl -xf, in nearly the same way...
journalctl -axf | sed -une '/Regular Expression/p'
But read the man page: this tool was built for log analysis!
4. Integrating this in a bash script
Colored output of two files (or more)
Here is a sample script that watches many files, coloring the output of the first file differently from the others:
#!/bin/bash
# Follow every file given on the command line.
tail -F "$@" |
    sed -une "
        /^==> /{h;};
        //!{
            G;
            s/^\\(.*\\)\\n==>.*${1//\//\\\/}.*<==/\\o33[47m\\1\\o33[0m/;
            s/^\\(.*\\)\\n==> .* <==/\\o33[47;31m\\1\\o33[0m/;
            p;}"
They work fine on my host, running:
sudo ./myColoredTail /var/log/{kern.,sys}log
Interactive script
Maybe you are watching logs in order to react to events?
Here is a little script that plays a sound when a USB device appears or disappears, but the same script could send mail or trigger any other action, like powering on the coffee machine...
#!/bin/bash
exec {tailF}< <(tail -F /var/log/kern.log)
tailPid=$!
while :;do
    read -rsn 1 -t .3 keyboard
    [ "${keyboard,}" = "q" ] && break
    if read -ru $tailF -t 0 _ ;then
        read -ru $tailF line
        case $line in
            *New\ USB\ device\ found* ) play /some/sound.ogg ;;
            *USB\ disconnect* ) play /some/othersound.ogg ;;
        esac
        printf "\r%s\e[K" "$line"
    fi
done
echo
exec {tailF}<&-
kill $tailPid
You can quit by pressing the Q key.
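The core of the script above, reading a stream through its own file descriptor and reacting to patterns, can be tried without a real kernel log. The following is a minimal sketch with made-up log lines standing in for tail -F /var/log/kern.log; it needs bash 4.1 or later for the {feed} syntax, and it records events in an array instead of playing sounds.

```bash
#!/bin/bash
# Sketch: react to lines arriving on a dedicated file descriptor.
# The three log lines are invented sample data.
exec {feed}< <(printf '%s\n' \
    'usb 1-1: New USB device found' \
    'eth0: link up' \
    'usb 1-1: USB disconnect')

events=()
while IFS= read -r -u "$feed" line; do
    case $line in
        *New\ USB\ device\ found*) events+=("plugged") ;;
        *USB\ disconnect*)         events+=("unplugged") ;;
    esac
done
exec {feed}<&-      # close the descriptor again

echo "events: ${events[*]}"
```

Keeping the log on its own descriptor is what leaves standard input free for the keyboard in the original script.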
You certainly won't succeed with
tail -f /var/log/foo.log |grep --line-buffered string2search
if tail is an alias for "colortail", e.g. in bash:
alias tail='colortail -n 30'
You can check with
type tail
If this outputs something like
tail is an alias of colortail -n 30.
then you have your culprit :)
Solution:
remove the alias with
unalias tail
ensure that you're using the 'real' tail binary by this command
type tail
which should output something like:
tail is /usr/bin/tail
and then you can run your command
tail -f foo.log |grep --line-buffered something
Good luck.
Use awk (another great Unix utility) instead of grep where you don't have the line-buffered option! It will continuously stream your data from tail.
This is how you use grep:
tail -f <file> | grep pattern
This is how you would use awk
tail -f <file> | awk '/pattern/{print $0}'
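One caveat worth keeping in mind: awk also buffers its output when it writes to a pipe or a file rather than a terminal. A hedged sketch of the usual workaround is to call fflush() after each print; the input lines here are invented and the pipeline is finite so that it terminates.

```bash
#!/bin/bash
# Sketch: awk as a line filter that flushes after every matching line.
# The sample lines are made up; with a live log the left side would be
# "tail -f <file>".
matches=$(printf '%s\n' 'ok 1' 'error 2' 'ok 3' 'error 4' |
    awk '/error/ { print; fflush() }')
printf '%s\n' "$matches"
```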

bash standard output can not be redirected into file

I am reading the 'Advanced Bash-Scripting Guide', and in Chapter 31 there is a problem I cannot figure out.
tail -f /var/log/msg | grep 'error' >> logfile
Why is nothing written to logfile?
Can you offer me an explanation?
Thank you in advance.
As @chepner comments, grep is using a larger buffer (perhaps 4k or more) to buffer its stdout. Most of the standard utilities do this when piping or redirecting to a file. They typically only switch to line-buffered mode when outputting directly to the terminal.
You can use the stdbuf utility to force grep to do line buffering of its output:
tail -f /var/log/msg | stdbuf -oL grep 'error' >> logfile
As an easily observable demonstration of this effect, you can try the following two commands:
for ((i=0;;i++)); do echo $i; sleep 0.001; done | grep . | cat
and
for ((i=0;;i++)); do echo $i; sleep 0.001; done | stdbuf -oL grep . | cat
In the first command, the output from grep . (i.e. match all lines) will be buffered going into the pipe to cat. On mine the buffer appears to be about 4k. You will see the ascending numbers output in chunks as the buffer gets filled and then flushed.
In the second command, grep's output into the pipe to cat is line-buffered, so you should see terminal output for every line, i.e. more-or-less continuous output.
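For a variant of this demonstration that terminates by itself and writes to files, as in the original question, something like the following sketch can be used. It assumes GNU grep and stdbuf from coreutils; the producer function, the file names and the timings are made up.

```bash
#!/bin/bash
# Sketch: compare plain grep with "stdbuf -oL grep" when writing to a file.
tmpdir=$(mktemp -d)

# Hypothetical producer: one matching line, a pause, then another.
producer() { echo "error one"; sleep 2; echo "error two"; }

producer | grep error > "$tmpdir/plain.log" &
producer | stdbuf -oL grep error > "$tmpdir/linebuf.log" &

sleep 1   # look at the files while both producers are still pausing
plain_midway=$(cat "$tmpdir/plain.log")
linebuf_midway=$(cat "$tmpdir/linebuf.log")

wait      # after grep exits, everything has been flushed
plain_final=$(cat "$tmpdir/plain.log")
rm -rf "$tmpdir"

echo "midway, plain grep:      '$plain_midway'"
echo "midway, stdbuf -oL grep: '$linebuf_midway'"
```

With tail -f the producer never ends, so the plain grep case only writes once its buffer fills up.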

How do I get an output from Linux Top in Batch Mode on every iteration?

I'm trying to log CPU and memory stats into a file by using top on Arch Linux. I'm only interested in one specific process and get the wanted parameters as shown below:
top -b -n1 -p 310 | tail -fn 1 | awk '{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}'
This gives me an output to command line like:
310,name,0.0,10.5
So now, if I want to run this command like 10 times with a delay of 1s and write the output to a logfile I use:
top -b -n10 -p 310 -d 1 | tail -fn 1 | awk '{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}' >> log.txt
But instead of printing line by line to the logfile, I only get the last output. So my logfile contains only 1 line, although top must have been executed 10 times.
What am I doing wrong here?
PS: Printing to command line instead into a logfile produces only 1 line (the last output) as well...
The problem is the tail command you use: tail -n 1 prints only the last line of its entire input, so you only ever see the final iteration. Try something like this:
top -p 310 -b -n2 -d 1 | grep -w 310 | awk '{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}'
I use grep -w to keep only the lines containing the info you are interested in.
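The difference between the two filters can be seen without running top at all. The sketch below uses a hypothetical fake_top function that prints three simplified batch iterations (the column values are invented) and pushes them through both pipelines.

```bash
#!/bin/bash
# Sketch: why "tail -n 1" yields one line and "grep -w 310" one per iteration.
# fake_top stands in for "top -b -n3 -p 310"; its output is simplified.
fake_top() {
    local i
    for i in 1 2 3; do
        printf 'top - 12:00:0%d up 1 day\n' "$i"
        printf '  PID USER      PR  NI    VIRT    RES    SHR S  %%CPU  %%MEM     TIME+ COMMAND\n'
        printf '  310 root      20   0  100000  50000   3000 S   0.%d  10.5   0:01.23 name\n' "$i"
    done
}

fmt='{printf "%s,%s,%s,%s\n",$1,$12,$9,$10}'
with_tail=$(fake_top | tail -n 1 | awk "$fmt")
with_grep=$(fake_top | grep -w 310 | awk "$fmt")

echo "tail -n 1:"; printf '%s\n' "$with_tail"
echo "grep -w 310:"; printf '%s\n' "$with_grep"
```

Note that grep -w 310 would also match a line in which 310 appears as a whole word in another column, so a stricter pattern may be needed for real data.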

Why `read -t` is not timing out in bash on RHEL?

Why doesn't read -t time out when reading from a pipe on RHEL5 or RHEL6?
Here is my example, which doesn't time out on my RHEL boxes while reading from the pipe:
tail -f logfile.log | grep 'something' | read -t 3 variable
If I'm correct, read -t 3 should time out after 3 seconds?
Many thanks in advance.
Chris
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
The solution given by chepner should work.
An explanation why your version doesn't is simple: When you construct a pipe like yours, the data flows through the pipe from the left to the right. When your read times out however, the programs on the left side will keep running until they notice that the pipe is broken, and that happens only when they try to write to the pipe.
A simple example is this:
cat | sleep 5
After five seconds the pipe will be broken because sleep will exit, but cat will nevertheless keep running until you press return.
In your case that means, until grep produces a result, your command will keep running despite the timeout.
While not a direct answer to your specific question, you will need to run something like
read -t 3 variable < <( tail -f logfile.log | grep "something" )
in order for the newly set value of variable to be visible after the pipeline completes. See if this times out as expected.
Since you are simply using read as a way of exiting the pipeline after a fixed amount of time, you don't have to worry about the scope of variable. However, grep may find a match without printing it within your timeout due to its own internal buffering. You can disable that (with GNU grep, at least), using the --line-buffered option:
tail -f logfile.log | grep --line-buffered "something" | read -t 3
Another option, if available, is the timeout command as a replacement for the read:
timeout 3 tail -f logfile.log | grep -q --line-buffered "something"
Here, we kill tail after 3 seconds, and use the exit status of grep in the usual way.
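Both points made above, the scope of the variable and the timeout itself, can be checked with a short sketch. It assumes bash with its default settings (lastpipe off); the variable names and the sleep/echo commands standing in for tail and grep are made up.

```bash
#!/bin/bash
# Sketch: variable scope and timeout behaviour of "read" in bash.
unset from_pipe from_procsub

# In a pipeline, read runs in a subshell, so the value is lost afterwards.
echo "hello" | read -r from_pipe
# With process substitution, read runs in the current shell.
read -r from_procsub < <(echo "hello")

# The producer is too slow here, so read -t gives up after 1 second
# and returns an exit status greater than 128.
if read -r -t 1 late < <(sleep 2; echo "too late"); then
    timeout_status=0
else
    timeout_status=$?
fi

echo "from_pipe='$from_pipe' from_procsub='$from_procsub' status=$timeout_status"
```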
I don't have a RHEL server to test your script on right now, but I would bet that read is exiting on timeout and working as it should. Try running:
grep 'something' | strace bash -c "read -t 3 variable"
and you can confirm that.

Resources