Grep function not stopping with head pipe - linux

So I'm currently trying to grep a single result from a random file in a specific directory. The grepping works just fine and the output file is populated as expected, but for some reason, even after the output file has been filled, the process won't stop. This is the grep command where the program seems to be getting stuck.
searchFILE(){
    case $2 in
    pref)
        echo "Populating output file: $3-$1.data.out"
        dataOutputFile="$3-$1.data.out"
        zgrep -a "\"someParameter\"\:\"$1\"" /folder/anotherFolder/filetemplate.log.* | zgrep -a "\"parameter2\"\:\"$3\"" | head -1 > $dataOutputFile
        ;;
    *)
        echo "Unrecognized command"
        ;;
    esac
    echo "Query finished"
}
What is currently happening is that the output file is being populated as expected with the head pipe, but for some reason I'm not getting the "Query finished" message, and the process seems not to stop at all.

grep does not know that head -n1 is no longer reading from the pipe until it attempts to write to the pipe, which it will only do if another match is found. There is no direct communication between the processes. It will eventually stop, but only once all the input has been read, a second match is found and the write fails (SIGPIPE/EPIPE), or some other error occurs.
You can watch this happen in a simple pipeline like this:
cat /dev/urandom | grep -ao "12[0-9]" | head -n1
With a sufficiently rare pattern, you will observe a delay between output and exit.
One solution is to change your stop condition. Instead of waiting for SIGPIPE as your pipeline does, wait for grep to match once using the -m1 option:
cat /dev/urandom | grep -ao -m1 "12[0-9]"
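Applied to the function in the question, that might look like this (a sketch: the second filter now exits right after its first match, so head is no longer needed; the upstream zgrep may still run briefly until its next write fails):
zgrep -a "\"someParameter\"\:\"$1\"" /folder/anotherFolder/filetemplate.log.* | zgrep -a -m1 "\"parameter2\"\:\"$3\"" > "$dataOutputFile"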

I saw better performance with the zcat myZippedFile | grep whatever paradigm...
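Applied to the question's pipeline, that paradigm would look something like this (a sketch; zcat -f decompresses the logs once and both filters become plain greps):
zcat -f /folder/anotherFolder/filetemplate.log.* | grep -a "\"someParameter\"\:\"$1\"" | grep -a -m1 "\"parameter2\"\:\"$3\"" > "$dataOutputFile"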

The first difference you need to try is piping through | head -z --lines=1.
The reason is null-terminated lines instead of newlines (just in case).
My example script below worked (drop the case statement to make it simpler). If I hold onto $1 $2 inside functions, things go wrong. I use named $parameters and read $1 $2 $# only once, because it also goes wrong for me if I don't; in any case you can then shift over the arguments and catch them. The $# in the script itself is not the same as the arguments inside bash functions.
Grep searching for two or more parameters in any order means using grep twice; in your case zgrep | grep. The second grep is a normal grep! You only need the first grep to be zgrep to do the unzip. Your question is simpler if you drop the case statement, as bash case scares people off: bash was always an ugly lady that works well for short scripts.
zgrep searches text or compressed text, but Linux-style and Windows-style newlines are not the same, so use dos2unix to convert files so that newline matching works. I use a compressed file simply because it is strange and rare to see zgrep, so it is demonstrated in a shell script with a compressed file! It works for me. I changed a few things, like >> and sort -u, but you can obviously change them back.
#!/usr/bin/env bash
# Search for egA AND egB using option go
# COMMAND LINE: ./zgrp egA go egB
A="$1"
cOPT="$2" # expecting case go
B="$3"
LOG="./filetemplate.log" # use parameters for long names.
# Generate some compressed data (tar czf gzips it) and delete the temporary file.
echo "\"pramA\":\"$A\" \"pramB\":\"$B\"" >> $B$A.tmp
rm -f ${LOG}.A; tar czf ${LOG}.A $B$A.tmp
rm -f $B$A.tmp
# Use parameterised $names, not $1 etc., because you may want to shift etc.
searchFILE()
{
    outFile="$B-$A.data.out"
    case $cOPT in
    go) # This is zgrep | grep, NOT zgrep | zgrep
        zgrep -a "\"pramA\":\"$A\"" ${LOG}.* | grep -a "\"pramB\":\"$B\"" | head -z --lines=1 >> $outFile
        sort -u $outFile > ${outFile}.sorted # sort unique on your output.
        ;;
    *)  echo -e "ERROR: second argument must be go.\n Usage: ./zgrp egA go egB"
        exit 9
        ;;
    esac
    echo -e "\n ============ Done: $0 $# Fin. ============="
}
searchFILE "$@"
cat ${outFile}.sorted

Related

How to "tail -f" with "grep" save outputs to an another file which name is time varying?

STEP1
Like I said in the title, I would like to save the output of
tail -f example | grep "DESIRED"
to a different file.
I have tried
tail -f example | grep "DESIRED" | tee -a different
tail -f example | grep "DESIRED" >> different
and none of them work.
I have searched similar questions and read several experts suggesting buffering options, but I cannot use them...
Is there any other way I can do it?
STEP2
Once the above is done, I would like to make "different" (the filename from above) time-varying. I want to change its name every 30 minutes, for example:
20221203133000
20221203140000
20221203143000
...
I have tried
tail -f example | grep "DESIRED" | tee -a $(date +%Y%m%d%H)$([ $(date +%M) -lt 30 ] && echo 00 || echo 30)00
The problem is that since I have not even solved the first step, I could not test the second. But I think this command will only create one file, named for the time I run the command. Could I please get some advice?
The code below should do what you want.
Some explanations: since you want bash to execute some "code" (in your case, dumping to a different file name), you need two things running in parallel: the tail + grep, and the code that decides where to dump.
To connect the two processes I use a named fifo (created with mkfifo), in which whatever is written by tail + grep (using > tmp_fifo) is read by the while loop (using < tmp_fifo). Once inside the while loop, you are free to output to whatever file name you want.
Note: without --line-buffered (as in your question) grep will still work; it will just wait until it has accumulated more data (probably several kilobytes) before dumping it to the file. So if "example" does not generate much data, grep will not write anything until there is enough.
rm -f tmp_fifo
mkfifo tmp_fifo
(tail -f input | grep --line-buffered TEXT_TO_CHECK > tmp_fifo &)
while read LINE < tmp_fifo; do
    CURRENT_NAME=$(date +%Y%m%d%H)
    # or any other code that determines to what file to dump ...
    echo "$LINE" >> "${CURRENT_NAME}"
done
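For STEP2, the natural place to compute the time-varying name is inside that loop, using the same 30-minute bucketing the question attempted (a sketch):
while read LINE < tmp_fifo; do
    # YYYYmmddHH + 00|30 (the half-hour bucket) + 00 seconds, e.g. 20221203133000
    if [ "$(date +%M)" -lt 30 ]; then
        CURRENT_NAME="$(date +%Y%m%d%H)0000"
    else
        CURRENT_NAME="$(date +%Y%m%d%H)3000"
    fi
    echo "$LINE" >> "$CURRENT_NAME"
done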

Grep into variable and maintain stdout?

I've got a long running process and I want to capture a tiny bit of data from the big swath of output.
I can do this by piping it through grep, but then I can't watch it spew out all the other info.
I basically want grep to save what it finds into a variable and leave stdout alone. How can I do that?
With tee, process substitution, and I/O redirection:
{ var=$(cmd | tee >(grep regexp) >&3); } 3>&1
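For example (a self-contained sketch: the full output reaches the terminal through fd 3, while only grep's matches land in the variable):
{ hits=$(printf 'ok 1\nerr 2\nok 3\n' | tee >(grep err) >&3); } 3>&1
echo "captured: $hits"   # prints: captured: err 2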
There isn't any way to use a variable like that. It sounds like you want to write your own filter:
./some_long_program | tee >(my_do_stuff.sh)
Then, my_do_stuff.sh is:
#!/bin/bash
while read line; do
    echo "$line" | grep -q 'pattern' || continue
    VARIABLE=$line # Now it's in a variable
done
If you have the storage space, this is probably more like what you want:
./some_long_program | tee /tmp/slp.log
Then, simply:
grep 'pattern' /tmp/slp.log && VARIABLE=true
or:
VARIABLE=$(grep 'pattern' /tmp/slp.log)
This will let you run the grep at any time. I don't think the variable really adds anything though.
EDIT:
@mpen Based on your last answer above, it sounds like you want to use xargs. Try:
(echo 1 ; sleep 5 ; echo 2) | xargs -L1 echo got:
The -L1 will run the command once for every line found; otherwise xargs grabs many lines of stdin and passes them all (up to some maximum) to the command at once. You'll still want to use this via tee if you want to see all the command output as well:
./some_long_program | tee >(grep 'pattern' | xargs -L1 ./my_do_stuff.sh)
I think this can be done a little more simply if we tee to /dev/tty:
❯ BAR=$(echo foobar | tee /dev/tty | grep -Pom1 b..); echo "[$BAR]"
foobar
[bar]

Problems with tail -f and awk? [duplicate]

Is that possible to use grep on a continuous stream?
What I mean is sort of a tail -f <file> command, but with grep on the output in order to keep only the lines that interest me.
I've tried tail -f <file> | grep pattern but it seems that grep can only be executed once tail finishes, that is to say never.
Turn on grep's line buffering mode when using BSD grep (FreeBSD, Mac OS X etc.)
tail -f file | grep --line-buffered my_pattern
It looks like a while ago --line-buffered didn't matter for GNU grep (used on pretty much any Linux) as it flushed by default (YMMV for other Unix-likes such as SmartOS, AIX or QNX). However, as of November 2020, --line-buffered is needed (at least with GNU grep 3.5 in openSUSE, but it seems generally needed based on comments below).
I use the tail -f <file> | grep <pattern> all the time.
It will wait till grep flushes, not till it finishes (I'm using Ubuntu).
I think that your problem is that grep uses some output buffering. Try
tail -f file | stdbuf -o0 grep my_pattern
This sets grep's output buffering mode to unbuffered.
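The same trick works on other stages of a pipeline. For a producer that block-buffers its own output (and uses C stdio), something like this line-buffers it at the source (a sketch with a hypothetical program name):
stdbuf -oL ./my_chatty_producer | grep --line-buffered my_pattern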
If you want to find matches in the entire file (not just the tail), and you want it to sit and wait for any new matches, this works nicely:
tail -c +0 -f <file> | grep --line-buffered <pattern>
The -c +0 flag says that the output should start 0 bytes (-c) from the beginning (+) of the file.
In most cases, you can tail -f /var/log/some.log |grep foo and it will work just fine.
If you need to use multiple greps on a running log file and you find that you get no output, you may need to stick the --line-buffered switch into your middle grep(s), like so:
tail -f /var/log/some.log | grep --line-buffered foo | grep bar
You may consider this answer an enhancement. Usually I am using
tail -F <fileName> | grep --line-buffered <pattern> -A 3 -B 5
-F is better in case the file gets rotated (-f will not work properly if the file is rotated).
-A and -B are useful for getting the lines just before and after the pattern occurrence; these blocks will appear between dashed-line separators.
But personally I prefer doing the following:
tail -F <file> | less
This is very useful if you want to search inside streamed logs, i.e. go back and forth and look deeply.
Didn't see anyone offer my usual go-to for this:
less +F <file>
ctrl + c
/<search term>
<enter>
shift + f
I prefer this, because you can use ctrl + c to stop and navigate through the file whenever, and then just hit shift + f to return to the live, streaming search.
sed would be a better choice (stream editor)
tail -n0 -f <file> | sed -n '/search string/p'
and then if you wanted the tail command to exit once you found a particular string:
tail --pid=$(($BASHPID+1)) -n0 -f <file> | sed -n '/search string/{p; q}'
Obviously a bashism: $BASHPID will be the process id of the tail command. The sed command is next after tail in the pipe, so the sed process id will be $BASHPID+1.
Yes, this will actually work just fine. Grep and most Unix commands operate on streams one line at a time. Each line that comes out of tail will be analyzed and passed on if it matches.
This one command works for me (SUSE):
mail-srv:/var/log # tail -f /var/log/mail.info |grep --line-buffered LOGIN >> logins_to_mail
collecting logins to mail service
Coming somewhat late to this question, and considering this kind of work an important part of the monitoring job, here is my (not so short) answer...
Following logs using bash
1. Command tail
This command is a little more powerful than the already published answers suggest.
The difference between the follow options tail -f and tail -F, from the man page:
-f, --follow[={name|descriptor}]
output appended data as the file grows;
...
-F same as --follow=name --retry
...
--retry
keep trying to open a file if it is inaccessible
This means: by using -F instead of -f, tail will re-open the file(s) when they are removed (on log rotation, for example).
This is useful for watching a logfile over many days.
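You can watch the difference during a simulated rotation (a sketch; run the tail in one shell and the rotation in another):
# shell 1: follow by name; -F survives the rotation below
tail -F demo.log
# shell 2: rotate the file; plain -f would keep following the renamed
# demo.log.1, while -F notices and re-opens the new demo.log
mv demo.log demo.log.1 && touch demo.log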
Ability to follow more than one file simultaneously
I've already used:
tail -F /var/www/clients/client*/web*/log/{error,access}.log /var/log/{mail,auth}.log \
/var/log/apache2/{,ssl_,other_vhosts_}access.log \
/var/log/pure-ftpd/transfer.log
For following events through hundreds of files... (consider the rest of this answer to understand how to make this readable... ;)
Using the -n switch (don't use -c, which counts bytes, for line control!). By default tail will show the last 10 lines. This can be tuned:
tail -n 0 -F file
will follow the file, but only new lines will be printed.
tail -n +0 -F file
will print the whole file before following its growth.
2. Buffer issues when piping:
If you plan to filter the output, consider buffering! See the -u option for sed, --line-buffered for grep, or the stdbuf command:
tail -F /some/files | sed -une '/Regular Expression/p'
is (apart from being a lot more efficient than using grep) a lot more reactive than if you don't use the -u switch in the sed command.
tail -F /some/files |
sed -une '/Regular Expression/p' |
stdbuf -i0 -o0 tee /some/resultfile
3. Recent journaling system
On recent systems, instead of tail -f /var/log/syslog you have to run journalctl -xf, in nearly the same way...
journalctl -axf | sed -une '/Regular Expression/p'
But read the man page: this tool was built for log analysis!
4. Integrating this in a bash script
Colored output of two files (or more)
Here is a sample script watching many files, coloring the output of the 1st file differently from the others:
#!/bin/bash
tail -F "$#" |
sed -une "
/^==> /{h;};
//!{
G;
s/^\\(.*\\)\\n==>.*${1//\//\\\/}.*<==/\\o33[47m\\1\\o33[0m/;
s/^\\(.*\\)\\n==> .* <==/\\o33[47;31m\\1\\o33[0m/;
p;}"
This works fine on my host, running:
sudo ./myColoredTail /var/log/{kern.,sys}log
Interactive script
You may be watching logs in order to react to events?
Here is a little script that plays a sound when some USB device appears or disappears, but the same script could send mail, or do any other interaction, like powering on the coffee machine...
#!/bin/bash
exec {tailF}< <(tail -F /var/log/kern.log)
tailPid=$!
while :; do
    read -rsn 1 -t .3 keyboard
    [ "${keyboard,}" = "q" ] && break
    if read -ru $tailF -t 0 _; then
        read -ru $tailF line
        case $line in
            *New\ USB\ device\ found* ) play /some/sound.ogg ;;
            *USB\ disconnect* )         play /some/othersound.ogg ;;
        esac
        printf "\r%s\e[K" "$line"
    fi
done
echo
exec {tailF}<&-
kill $tailPid
You can quit by pressing the Q key.
you certainly won't succeed with
tail -f /var/log/foo.log |grep --line-buffered string2search
when you use "colortail" as an alias for tail, eg. in bash
alias tail='colortail -n 30'
you can check with
type tail
If this outputs something like
tail is aliased to `colortail -n 30'
then you have your culprit :)
Solution:
remove the alias with
unalias tail
ensure that you're using the 'real' tail binary by this command
type tail
which should output something like:
tail is /usr/bin/tail
and then you can run your command
tail -f foo.log |grep --line-buffered something
Good luck.
Use awk (another great shell utility) instead of grep if you don't have the --line-buffered option! It will continuously stream your data from tail.
This is how you would use grep:
tail -f <file> | grep pattern
This is how you would use awk
tail -f <file> | awk '/pattern/{print $0}'

Bash - Piping output of command into while loop

I'm writing a Bash script where I need to look through the output of a command and do certain actions based on that output. For clarity, this command will output a few million lines of text and it may take roughly an hour or so to do so.
Currently, I'm executing the command and piping it into a while loop that reads a line at a time and looks for a certain criterion. If that criterion is met, the script updates a .dat file and reprints the screen. Below is a snippet of the script.
eval "$command"| while read line ; do
if grep -Fq "Specific :: Criterion"; then
#pull the sixth word from the line which will have the data I need
temp=$(echo "$line" | awk '{ printf $6 }')
#sanity check the data
echo "\$line = $line"
echo "\$temp = $temp"
#then push $temp through a case statement that does what I need it to do.
fi
done
So here's the problem: the sanity check on the data is showing weird results. It is printing lines that don't contain the grep criteria.
To make sure that my grep statement is working properly, I grep the log file that contains a record of the text output by the command, and it outputs only the lines that contain the specified criteria.
I'm still fairly new to Bash so I'm not sure what's going on. Could it be that the command is force-feeding the while loop a new $line before it can process the $line that met the grep criteria?
Any ideas would be much appreciated!
How does grep know what $line looks like? It doesn't: as written, grep reads from the same stdin as read, so it swallows the input your loop is supposed to process. Feed it the line explicitly:
if ( printf '%s\n' "$line" | grep -Fq "Specific :: Criterion"); then
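With that one-line change the original loop behaves as intended (a sketch assembled from the snippet in the question):
eval "$command" | while read -r line; do
    if printf '%s\n' "$line" | grep -Fq "Specific :: Criterion"; then
        temp=$(echo "$line" | awk '{ print $6 }')
        echo "\$line = $line"
        echo "\$temp = $temp"
    fi
done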
But I can't help feeling that you are overcomplicating this a lot.
function process() {
    echo "I can do anything I want"
    echo " per element $1"
    echo " that I want here"
}
export -f process
$command | grep -F "Specific :: Criterion" | awk '{print $6}' | xargs -I % -n 1 bash -c "process %"
Run the command, filter only the matching lines, and pull out the sixth field. Then, if you need to run arbitrary code on it, send it to a function via xargs (exporting the function makes it visible in subprocesses).
What are you applying the grep to?
Modify
if grep -Fq "Specific :: Criterion"; then
as below
if ( echo "$line" | grep -Fq "Specific :: Criterion" ); then

Backticks can't handle pipes in variable

I have a problem with a bash script that uses the cat command.
This works:
#!/bin/bash
fil="| grep LSmonitor";
log="/var/log/sys.log ";
lines=`cat $log | grep LSmonitor | wc -l`;
echo $lines;
Output: 139
This does not:
#!/bin/bash
fil="| grep LSmonitor";
log="/var/log/sys.log ";
string="cat $log $fil | wc -l";
echo $string;
`$string`;
Output:
cat /var/log/sys.log | grep LSmonitor | wc -l
cat: invalid option -- 'l'
Try 'cat --help' for more information.
$fil is static in this example, but in the real script the parameter comes from an HTML form POST, and if I print it I can see that the content of $fil is correct.
In this case, since you're building a pipeline as a string, you would need:
eval "$string"
But DON'T DO THIS!!!! -- someone can easily enter the filter
; rm -rf *
and then you're hosed.
If you want a regex-based filter, get the user to just enter the regex, and then you'll do:
grep "$fil" "$log" | wc -l
Firstly, allow me to say that this sounds like a really bad idea:
[…] in real script, parameter is get from html form POST, […]
You should not be allowing the content of POST requests to be run by your shell. This is a massive attack vector, and whatever mechanisms you have in place to try to protect it are probably not as effective as you think.
Secondly, | inside variables are not treated as special. This isn't specific to backticks. Parameter expansion (e.g., replacing $fil with | grep LSmonitor) happens after the command is parsed and mostly processed. There's a little bit of post-processing that's done on the results of parameter expansion (including "word splitting", which is why $fil is equivalent to the three arguments '|' grep LSmonitor rather than to the single argument '| grep LSmonitor'), but nothing as dramatic as you describe. So, for example, this:
pipe='|'
echo $pipe cat
prints this:
| cat
Since your use-case is so frightening, I'm half-tempted to not explain how you can do what you want — I think you'll be better off not doing this — but since Stack Overflow answers are intended to be useful for more people than just the original poster, an example of how you can do this is below. I encourage the OP not to read on.
fil='| grep LSmonitor'
log=/var/log/sys.log
string="cat $log $fil | wc -l"
lines="$(eval "$string")"
echo "$lines"
Try using eval (taken from https://stackoverflow.com/a/11531431/2687324).
It looks like it's interpreting | as a string, not a pipe, so when it reaches -l, it treats it as if you're trying to pass in -l to cat instead of wc.
The other answers outline why you shouldn't do it this way.
grep LSmonitor /var/log/syslog | wc -l will do what you're looking for.
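Equivalently, grep can do the counting itself, so the wc can be dropped:
grep -c LSmonitor /var/log/syslog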
