Buffering problem when piping output between CLI programs

Buffering problem when piping output between CLI programs - linux

I'm trying to tail apache error logs through a few filters.
This works perfectly:
tail -fn0 /var/log/apache2/error.log | egrep -v "PHP Notice|File does not exist"
but there are some literal "\n" in the output which I want to replace with an actual new line so I pipe into perl:
tail -fn0 /var/log/apache2/error.log | egrep -v "PHP Notice|File does not exist" | perl -ne 's/\\n/\n/g; print"$_"'
This seems to have some caching issue (first page hit produces nothing, second page hit and two loads of debugging info comes out), It also seems a bit tempramental.
So I tried sed:
tail -fn0 /var/log/apache2/error.log | egrep -v "PHP Notice|File does not exist" | sed 's/\\n/\n/g'
which seems to suffer the same problem.

Correct, when you use most programs to file or pipe they buffer output. You can control this in some cases: the GNU grep family accepts the --line-buffered option, specifically for use in pipelines like this. Also, in Perl you can use $| = 1; for the same effect. (sed doesn't have any such option that I'm aware of.)
It's the stuff at the beginning or middle of the pipeline that will be buffering, not the end (which is talking to your terminal so it will be line buffered) so you want to use egrep --line-buffered.

Looks like you can use -u for sed as in:
tail -f myLog | sed -u "s/\(joelog\)/^[[46;1m\1^[[0m/g" | sed -u 's/\\n/\n/g'
which tails the log, highlights 'joelog', and then adds linebreaks where there are '\n'
source:
http://www-01.ibm.com/support/docview.wss?uid=isg1IZ42070

Related

Loop to filter out lines from apache log files

I have several apache access files that I would like to clean up a bit before I analyze them. I am trying to use grep in the following way:
grep -v term_to_grep apache_access_log
I have several terms that I want to grep, so I am piping every grep action as follow:
grep -v term_to_grep_1 apache_access_log | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > apache_access_log_cleaned
Until here my rudimentary script works as expected! But I have many apache access logs, and I don't want to do that for every file. I have started to write a bash script but so far I couldn't make it work. This is my try:
for logs in ./access_logs/*;
do
cat $logs | grep -v term_to_grep | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > $logs_clean
done;
Could anyone point me out what I am doing wrong?

If you have a variable and you append _clean to its name, that's a new variable, and not the value of the old one with _clean appended. To fix that, use curly braces:
$ var=file.log
$ echo "<$var>"
<file.log>
$ echo "<$var_clean>"
<>
$ echo "<${var}_clean>"
<file.log_clean>
Without it, your pipeline tries to redirect to the empty string, which results in an error. Note that "$file"_clean would also work.
As for your pipeline, you could combine that into a single grep command:
grep -Ev 'term_to_grep|term_to_grep_2|term_to_grep_3|term_to_grep_n' "$logs" > "${logs}_clean"
No cat needed, only a single invocation of grep.
Or you could stick all your terms into a file:
$ cat excludes
term_to_grep_1
term_to_grep_2
term_to_grep_3
term_to_grep_n
and then use the -f option:
grep -vf excludes "$logs" > "${logs}_clean"
If your terms are strings and not regular expressions, you might be able to speed this up by using -F ("fixed strings"):
grep -vFf excludes "$logs" > "${logs}_clean"
I think GNU grep checks that for you on its own, though.

You are looping over several files, but in your loop you constantly overwrite your result file, so it will only contain the last result from the last file.
You don't need a loop, use this instead:
egrep -v 'term_to_grep|term_to_grep_2|term_to_grep_3' ./access_logs/* > "$logs_clean"
Note, it is always helpful to start a Bash script with set -eEuCo pipefail. This catches most common errors -- it would have stopped with an error when you tried to clobber the $logs_clean file.

Problems with tail -f and awk? [duplicate]

Is that possible to use grep on a continuous stream?
What I mean is sort of a tail -f <file> command, but with grep on the output in order to keep only the lines that interest me.
I've tried tail -f <file> | grep pattern but it seems that grep can only be executed once tail finishes, that is to say never.

Turn on grep's line buffering mode when using BSD grep (FreeBSD, Mac OS X etc.)
tail -f file | grep --line-buffered my_pattern
It looks like a while ago --line-buffered didn't matter for GNU grep (used on pretty much any Linux) as it flushed by default (YMMV for other Unix-likes such as SmartOS, AIX or QNX). However, as of November 2020, --line-buffered is needed (at least with GNU grep 3.5 in openSUSE, but it seems generally needed based on comments below).

I use the tail -f <file> | grep <pattern> all the time.
It will wait till grep flushes, not till it finishes (I'm using Ubuntu).

I think that your problem is that grep uses some output buffering. Try
tail -f file | stdbuf -o0 grep my_pattern
it will set output buffering mode of grep to unbuffered.

If you want to find matches in the entire file (not just the tail), and you want it to sit and wait for any new matches, this works nicely:
tail -c +0 -f <file> | grep --line-buffered <pattern>
The -c +0 flag says that the output should start 0 bytes (-c) from the beginning (+) of the file.

In most cases, you can tail -f /var/log/some.log |grep foo and it will work just fine.
If you need to use multiple greps on a running log file and you find that you get no output, you may need to stick the --line-buffered switch into your middle grep(s), like so:
tail -f /var/log/some.log | grep --line-buffered foo | grep bar

you may consider this answer as enhancement .. usually I am using
tail -F <fileName> | grep --line-buffered <pattern> -A 3 -B 5
-F is better in case of file rotate (-f will not work properly if file rotated)
-A and -B is useful to get lines just before and after the pattern occurrence .. these blocks will appeared between dashed line separators
But For me I prefer doing the following
tail -F <file> | less
this is very useful if you want to search inside streamed logs. I mean go back and forward and look deeply

Didn't see anyone offer my usual go-to for this:
less +F <file>
ctrl + c
/<search term>
<enter>
shift + f
I prefer this, because you can use ctrl + c to stop and navigate through the file whenever, and then just hit shift + f to return to the live, streaming search.

sed would be a better choice (stream editor)
tail -n0 -f <file> | sed -n '/search string/p'
and then if you wanted the tail command to exit once you found a particular string:
tail --pid=$(($BASHPID+1)) -n0 -f <file> | sed -n '/search string/{p; q}'
Obviously a bashism: $BASHPID will be the process id of the tail command. The sed command is next after tail in the pipe, so the sed process id will be $BASHPID+1.

Yes, this will actually work just fine. Grep and most Unix commands operate on streams one line at a time. Each line that comes out of tail will be analyzed and passed on if it matches.

This one command workes for me (Suse):
mail-srv:/var/log # tail -f /var/log/mail.info |grep --line-buffered LOGIN >> logins_to_mail
collecting logins to mail service

Coming some late on this question, considering this kind of work as an important part of monitoring job, here is my (not so short) answer...
Following logs using bash
1. Command tail
This command is a little more porewfull than read on already published answer
Difference between follow option tail -f and tail -F, from manpage:
-f, --follow[={name|descriptor}]
output appended data as the file grows;
...
-F same as --follow=name --retry
...
--retry
keep trying to open a file if it is inaccessible
This mean: by using -F instead of -f, tail will re-open file(s) when removed (on log rotation, for sample).
This is usefull for watching logfile over many days.
Ability of following more than one file simultaneously
I've already used:
tail -F /var/www/clients/client*/web*/log/{error,access}.log /var/log/{mail,auth}.log \
/var/log/apache2/{,ssl_,other_vhosts_}access.log \
/var/log/pure-ftpd/transfer.log
For following events through hundreds of files... (consider rest of this answer to understand how to make it readable... ;)
Using switches -n (Don't use -c for line buffering!).By default tail will show 10 last lines. This can be tunned:
tail -n 0 -F file
Will follow file, but only new lines will be printed
tail -n +0 -F file
Will print whole file before following his progression.
2. Buffer issues when piping:
If you plan to filter ouptuts, consider buffering! See -u option for sed, --line-buffered for grep, or stdbuf command:
tail -F /some/files | sed -une '/Regular Expression/p'
Is (a lot more efficient than using grep) a lot more reactive than if you does'nt use -u switch in sed command.
tail -F /some/files |
sed -une '/Regular Expression/p' |
stdbuf -i0 -o0 tee /some/resultfile
3. Recent journaling system
On recent system, instead of tail -f /var/log/syslog you have to run journalctl -xf, in near same way...
journalctl -axf | sed -une '/Regular Expression/p'
But read man page, this tool was built for log analyses!
4. Integrating this in a bash script
Colored output of two files (or more)
Here is a sample of script watching for many files, coloring ouptut differently for 1st file than others:
#!/bin/bash
tail -F "$#" |
sed -une "
/^==> /{h;};
//!{
G;
s/^\\(.*\\)\\n==>.*${1//\//\\\/}.*<==/\\o33[47m\\1\\o33[0m/;
s/^\\(.*\\)\\n==> .* <==/\\o33[47;31m\\1\\o33[0m/;
p;}"
They work fine on my host, running:
sudo ./myColoredTail /var/log/{kern.,sys}log
Interactive script
You may be watching logs for reacting on events?
Here is a little script playing some sound when some USB device appear or disappear, but same script could send mail, or any other interaction, like powering on coffe machine...
#!/bin/bash
exec {tailF}< <(tail -F /var/log/kern.log)
tailPid=$!
while :;do
read -rsn 1 -t .3 keyboard
[ "${keyboard,}" = "q" ] && break
if read -ru $tailF -t 0 _ ;then
read -ru $tailF line
case $line in
*New\ USB\ device\ found* ) play /some/sound.ogg ;;
*USB\ disconnect* ) play /some/othersound.ogg ;;
esac
printf "\r%s\e[K" "$line"
fi
done
echo
exec {tailF}<&-
kill $tailPid
You could quit by pressing Q key.

you certainly won't succeed with
tail -f /var/log/foo.log |grep --line-buffered string2search
when you use "colortail" as an alias for tail, eg. in bash
alias tail='colortail -n 30'
you can check by
type alias
if this outputs something like
tail isan alias of colortail -n 30.
then you have your culprit :)
Solution:
remove the alias with
unalias tail
ensure that you're using the 'real' tail binary by this command
type tail
which should output something like:
tail is /usr/bin/tail
and then you can run your command
tail -f foo.log |grep --line-buffered something
Good luck.

Use awk(another great bash utility) instead of grep where you dont have the line buffered option! It will continuously stream your data from tail.
this is how you use grep
tail -f <file> | grep pattern
This is how you would use awk
tail -f <file> | awk '/pattern/{print $0}'

Multiple grep piping (include+exclude) results in not showing anything

Currently I'm trying to tail a log but only showing the lines that has some keywords. Currently I'm using
tail -F file.log | grep -ie 'error\|fatal\|exception\|shutdown\|started'
and I'm getting the expected results: (for example)
10:22 This is an error
10:23 RuntimeException: uncaught problem
I also want to exclude lines that contain a <DATATAG>, even if the keywords slipped into it, because it contains a lot of binary data that clutters my log. I'm then trying to add to the pipe another grep that excludes the tag:
tail -F file.log | grep -ie 'error\|fatal\|exception\|shutdown\|started' | grep -vF '<DATATAG>'
However, this time no lines appear, not even the previous ones that has 'error'/'exception' but not <DATATAG>. When I tried the excluding grep alone:
tail -F file.log | grep -vF '<DATATAG>'
all lines appear, including those that have 'error'/'exception'.
Am I doing something wrong?

Your problem is one of buffering. grep is a tricky tool when it comes to that. From the man page:
By default, output is line buffered when standard output is a terminal and block buffered otherwise.
In your example, the first grep is buffering at the block level, so it will not turn an output to the 2nd grep for a while. The solution is to use the --line-buffered option to look like:
tail -F file.log | grep --line-buffered -ie 'error\|fatal\|exception\|shutdown\|started' | grep -vF '<DATATAG>'

What is this Bash (and/or other shell?) construct called?

What is the construct in bash called where you can take wrap a command that outputs to stdout, such that the output itself is treated like a stream? In case I'm not describing that so well, maybe an example will do best, and this is what I typically use it for: applying diff to output that does not come from a file, but from other commands, where
cmd
is wrapped as
<(cmd)
By wrapping a command in such a manner, in the example below I determine that there a difference of one between the two commands that I am running, and then I am able to determine that one precise difference. What is the construct/technique of wrapping a command as <(cmd) called? Thanks
[builder#george v6.5 html]$ git status | egrep modified | awk '{print $3}' | wc -l
51
[builder#george v6.5 html]$ git status | egrep modified | awk '{print $3}' | xargs grep -l 'Ext\.define' | wc -l
50
[builder#george v6.5 html]$ diff <(git status | egrep modified | awk '{print $3}') <(git status | egrep modified | awk '{print $3}' | xargs grep -l 'Ext\.define')
39d38
< javascript/reports/report_initiator.js
ADDENDUM
The revised command using the advice for using git's ls-file should be as follows (untested):
diff <(git ls-files -m) <(git ls-files -m | xargs grep -l 'Ext\.define')

It is called process substitution.

This is called Process Substitution

This is process substitution, as you have been told. I'd just like to point out that this also works in the other direction. Process substitution with >(cmd) allows you to take a command that writes to a file and instead have that output redirected to another command's stdin. It's very useful for inserting something into a pipeline that takes an output filename as an argument. You don't see it as much because pretty much every standard command will write to stdout already, but I have used it often with custom stuff. Here is a contrived example:
$ echo "hello world" | tee >(wc)
hello world
1 2 12

Why no output is shown when using grep twice?

Basically I'm wondering why this doesn't output anything:
tail --follow=name file.txt | grep something | grep something_else
You can assume that it should produce output I have run another line to confirm
cat file.txt | grep something | grep something_else
It seems like you can't pipe the output of tail more than once!? Anyone know what the deal is and is there a solution?
EDIT:
To answer the questions so far, the file definitely has contents that should be displayed by the grep. As evidence if the grep is done like so:
tail --follow=name file.txt | grep something
Output shows up correctly, but if this is used instead:
tail --follow=name file.txt | grep something | grep something
No output is shown.
If at all helpful I am running ubuntu 10.04

You might also run into a problem with grep buffering when inside a pipe.
ie, you don't see the output from
tail --follow=name file.txt | grep something > output.txt
since grep will buffer its own output.
Use the --line-buffered switch for grep to work around this:
tail --follow=name file.txt | grep --line-buffered something > output.txt
This is useful if you want to get the results of the follow into the output.txt file as rapidly as possible.

Figured out what was going on here. It turns out that the command is working it's just that the output takes a long time to reach the console (approx 120 seconds in my case). This is because the buffer on the standard out is not written each line but rather each block. So instead of getting every line from the file as it was being written I would get a giant block every 2 minutes or so.
It should be noted that this works correctly:
tail file.txt | grep something | grep something
It is the following of the file with --follow=name that is problematic.
For my purposes I found a way around it, what I was intending to do was capture the output of the first grep to a file, so the command would be:
tail --follow=name file.txt | grep something > output.txt
A way around this is to use the script command like so:
script -c 'tail --follow=name file.txt | grep something' output.txt
Script captures the output of the command and writes it to file, thus avoiding the second pipe.
This has effectively worked around the issue for me, and I have explained why the command wasn't working as I expected, problem solved.
FYI, These other stackoverflow questions are related:
Trick an application into thinking its stdin is interactive, not a pipe
Force another program's standard output to be unbuffered using Python

You do know that tail starts by default with the last ten lines of the file? My guess is everything the cat version found is well into the past. Try tail -n+1 --follow=name file.txt to start from the beginning of the file.

works for me on Mac without --follow=name
bash-3.2$ tail delme.txt | grep po
position.bin
position.lrn
bash-3.2$ tail delme.txt | grep po | grep lr
position.lrn

grep pattern filename | grep pattern | grep pattern | grep pattern ......

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Buffering problem when piping output between CLI programs - linux

Looks like you can use -u for sed as in: tail -f myLog | sed -u "s/\(joelog\)/^[[46;1m\1^[[0m/g" | sed -u 's/\\n/\n/g' which tails the log, highlights 'joelog', and then adds linebreaks where there are '\n' source: http://www-01.ibm.com/support/docview.wss?uid=isg1IZ42070

Related

Loop to filter out lines from apache log files

Problems with tail -f and awk? [duplicate]

Multiple grep piping (include+exclude) results in not showing anything

What is this Bash (and/or other shell?) construct called?

Why no output is shown when using grep twice?

Categories

Resources