What's an alternative to echo grep in parsing a running log? - linux

I'm currently figuring my way around a bash script (sorry, can't use other languages like perl) to keep track of a running log during a server startup. Basically, I have to trigger certain events depending on whether or not I run into certain strings or patterns while the log is being written. Currently, I have this code:
LOG=path_to_logfile
LINE1="[1-9][0-9]* some string"
LINE2="another string"
LINE3="third string"
tail -fn0 $LOG | \
while read line
do
echo $line | grep "$LINE1" || echo $line | grep "$LINE2" || echo $line | grep "$LINE3"
if [ $? = 0 ]
then
TMP=<echo line above>
... bunch of conditional statements...
fi
done
However, this is kinda slow; by the time the line I need to track is detected by the echo/grep combinations using or, it's waaaay after the server already started up. What's a good alternative to the above? I've read awk should be used but when I tried writing it in awk, either I wrote it wrong or the processing was also taking too much time to finish.
Any help will be appreciated. Thanks!

Rather than calling grep (potentially several times) on each line, let bash do the regular expression matching.
LOG=path_to_logfile
LINE1="[1-9][0-9]* some string"
LINE2="another string"
LINE3="third string"
tail -fn0 $LOG | while read line
do
if [[ $line =~ $LINE1|$LINE2|$LINE3 ]]; then
TMP=<echo line above>
... bunch of conditional statements...
fi
done
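As a rough sketch of the same idea, the loop below tests each pattern separately with [[ =~ ]] so the script can tell which one fired without starting any external process. The classify function name and the sample lines are made up for illustration; in real use the input would come from tail -fn0 "$LOG".

```shell
#!/usr/bin/env bash
# Patterns taken from the question.
LINE1='[1-9][0-9]* some string'
LINE2='another string'
LINE3='third string'

# classify (hypothetical name): read lines on stdin and report which pattern matched.
classify() {
  local line
  while IFS= read -r line; do
    if [[ $line =~ $LINE1 ]]; then
      # BASH_REMATCH[0] holds the part of the line that matched the regex.
      printf 'first: %s\n' "${BASH_REMATCH[0]}"
    elif [[ $line =~ $LINE2 ]]; then
      printf 'second: %s\n' "$line"
    elif [[ $line =~ $LINE3 ]]; then
      printf 'third: %s\n' "$line"
    fi
  done
}

# Sample input standing in for the live log.
report=$(printf '%s\n' 'boot 42 some string here' 'noise' 'saw another string' | classify)
printf '%s\n' "$report"
```

Leaving the pattern variables unquoted on the right of =~ matters: quoting them would make bash treat the contents as a literal string rather than a regular expression.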

I'd try something like this instead:
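tail -fn0 "$LOG" | grep -E --line-buffered "$LINE1|$LINE2|$LINE3" | \
while read -r TMP
do
...
done
That way, the while read loop, which at a guess is going to be the slowest part of this whole operation, is only invoked when grep actually finds a matching line in the input log. (egrep is a deprecated alias for grep -E. The --line-buffered option, available in GNU grep, stops grep from block-buffering its output when it writes to a pipe, so matches reach the loop as soon as they appear.)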

You can have multiple match statements, which are ORed together to see if the line matches:
tail -f -n0 "$LOG" | grep --line-buffered -e "$LINE1" -e "$LINE2" -e "$LINE3" | while IFS= read -r line
do
# Do something with each matching $line
done
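A minimal sketch of how the loop body might dispatch on the matching line, assuming the three patterns from the question. The event labels and the sample input are invented for the example; printf stands in for tail -f so the snippet terminates, and --line-buffered (supported by GNU grep) only matters once the input is a live stream.

```shell
#!/usr/bin/env bash
matches=$(
  # printf replaces `tail -f -n0 "$LOG"` here so the example finishes.
  printf '%s\n' 'server starting' '12 some string' 'noise' 'third string seen' |
  grep --line-buffered -e '[1-9][0-9]* some string' -e 'another string' -e 'third string' |
  while IFS= read -r line; do
    # Only matching lines arrive here, so a cheap case statement is enough
    # to decide which event to trigger.
    case $line in
      *'some string'*)    printf 'event A: %s\n' "$line" ;;
      *'another string'*) printf 'event B: %s\n' "$line" ;;
      *'third string'*)   printf 'event C: %s\n' "$line" ;;
    esac
  done
)
printf '%s\n' "$matches"
```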

Related

Grep function not stopping with head pipe

So i'm currently trying to grep a single result from a random file in a specific directory. The grepping works just fine and the expected output file is populated as expected, but for some reason, even after the output file has already been filled, the process won't stop. This is the grep command where the program seems to be getting stuck.
searchFILE(){
case $2 in
pref)
echo "Populating output file: $3-$1.data.out"
dataOutputFile="$3-$1.data.out"
zgrep -a "\"someParameter\"\:\"$1\"" /folder/anotherFolder/filetemplate.log.* | zgrep -a "\"parameter2\"\:\"$3\"" | head -1 > $dataOutputFile
;;
*)
echo "Unrecognized command"
;;
esac
echo "Query finished"
}
What is currently happening is that the output file is being populated as expected with the head pipe, but for some reason I'm not getting the "Query finished" message, and the process seems not to stop at all.
grep does not know that head -n1 is no longer reading from the pipe until it attempts to write to the pipe, which it will only do if another match is found. There is no direct communication between the processes. It will eventually stop, but only once all the data is read, a second match is found and write fails with EPIPE, or some other error occurs.
You can watch this happen in a simple pipeline like this:
cat /dev/urandom | grep -ao "12[0-9]" | head -n1
With a sufficiently rare pattern, you will observe a delay between output and exit.
One solution is to change your stop condition. Instead of waiting for SIGPIPE as your pipeline does, wait for grep to match once using the -m1 option:
cat /dev/urandom | grep -ao -m1 "12[0-9]"
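The same behaviour can be seen with a bounded, repeatable input. In this sketch seq stands in for the large data source (an assumption made purely so the result is deterministic); grep exits right after its first match instead of reading the remaining lines.

```shell
#!/usr/bin/env bash
# -m1 makes grep stop after the first matching line, so the pipeline
# ends as soon as a result exists rather than when the input runs out.
first=$(seq 1 1000000 | grep -m1 '12[0-9]')
printf 'first match: %s\n' "$first"
```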
I saw better performance results with zcat myZippedFile | grep whatever paradigm...
The first difference you need to try is pipe with | head -z --lines=1
The reason is null terminated lines instead of newlines (just in case).
My example script below works (drop the case statement to make it simpler). If I rely on $1 and $2 inside functions, things go wrong, so I assign the script's arguments to named variables and use $1, $2 and $@ only once; you can then shift over $@ and catch the remaining arguments. The positional parameters of the script itself are not the same as the arguments inside bash functions.
Searching for two or more parameters in any order means using grep twice; in your case zgrep | grep. The second grep is a normal grep! Only the first one needs to be zgrep, to do the decompression. Your question would also be simpler without the case statement, since bash case statements scare people off; bash has always been awkward, though it works well for short scripts.
zgrep searches plain or compressed text, but Linux and Windows line endings are not the same, so use dos2unix to convert files so that newlines work. I use a compressed file simply because zgrep is rarely seen, so it is demonstrated in a shell script with a compressed file. It works for me. I changed a few things, like >> and "sort -u", but you can obviously change them back.
#!/usr/bin/env bash
# Search for egA AND egB using option go
# COMMAND LINE: ./zgrp egA go egB
A="$1"
cOPT="$2" # expecting case go
B="$3"
LOG="./filetemplate.log" # use parameters for long names.
# Generate some data with gzip and delete the temporary file.
echo "\"pramA\":\"$A\" \"pramB\":\"$B\"" >> $B$A.tmp
rm -f ${LOG}.A; tar czf ${LOG}.A $B$A.tmp
rm -f $B$A.tmp
# Use paramaterise $names not $1 etc because you may want to do shift etc
searchFILE()
{
outFile="$B-$A.data.out"
case $cOPT in
go) # This is zgrep | grep NOT zgrep | zgrep
zgrep -a "\"pramA\":\"$A\"" ${LOG}.* | grep -a "\"pramB\":\"$B\"" | head -z --lines=1 >> $outFile
sort -u $outFile > ${outFile}.sorted # sort unique on your output.
;;
*) echo -e "ERROR second argument must be go.\n Usage: ./zgrp egA go egB"
exit 9
;;
esac
echo -e "\n ============ Done: $0 $# Fin. ============="
}
searchFILE "$@"
cat ${outFile}.sorted

Bash - Piping output of command into while loop

I'm writing a Bash script where I need to look through the output of a command and do certain actions based on that output. For clarity, this command will output a few million lines of text and it may take roughly an hour or so to do so.
Currently, I'm executing the command and piping it into a while loop that reads a line at a time then looks for certain criteria. If that criterion exists, then update a .dat file and reprint the screen. Below is a snippet of the script.
eval "$command"| while read line ; do
if grep -Fq "Specific :: Criterion"; then
#pull the sixth word from the line which will have the data I need
temp=$(echo "$line" | awk '{ printf $6 }')
#sanity check the data
echo "\$line = $line"
echo "\$temp = $temp"
#then push $temp through a case statement that does what I need it to do.
fi
done
So here's the problem, the sanity check on the data is showing weird results. It is printing lines that don't contain the grep criteria.
To make sure that my grep statement is working properly, I grep the log file that contains a record of the text that is output by the command and it outputs only the lines that contain the specified criteria.
I'm still fairly new to Bash so I'm not sure what's going on. Could it be that the command is force feeding the while loop a new $line before it can process the $line that met the grep criteria?
Any ideas would be much appreciated!
How does grep know what line looks like?
if ( printf '%s\n' "$line" | grep -Fq "Specific :: Criterion"); then
But I can't help feeling that you are overcomplicating this a lot.
function process() {
echo "I can do anything I want"
echo " per element $1"
echo " that I want here"
}
export -f process
$command | grep -F "Specific :: Criterion" | awk '{print $6}' | xargs -I % bash -c 'process "$1"' _ %
Run the command, filter only matching lines, and pull the sixth element. Then if you need to run an arbitrary code on it, send it to a function (you export to make it visible in subprocesses) via xargs.
What are you applying the grep on ?
Modify
if grep -Fq "Specific :: Criterion"; then
as below
if echo "$line" | grep -Fq "Specific :: Criterion"; then
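For a command that prints millions of lines, starting grep and awk once per line adds up. One possible alternative, sketched here with an invented function name (extract_sixth) and made-up sample lines, keeps both the test and the word extraction inside bash:

```shell
#!/usr/bin/env bash
# extract_sixth (hypothetical name): print the sixth word of every stdin
# line that contains the fixed string from the question.
extract_sixth() {
  local line
  local -a words
  while IFS= read -r line; do
    # Glob match against the current line; no external grep needed.
    [[ $line == *'Specific :: Criterion'* ]] || continue
    # Split the line on whitespace; index 5 is the sixth word.
    read -r -a words <<< "$line"
    printf '%s\n' "${words[5]}"
  done
}

# Sample input standing in for the output of "$command".
values=$(printf '%s\n' \
  'a b Specific :: Criterion 17 tail' \
  'one two three four five six' \
  'x y Specific :: Criterion 99' | extract_sixth)
printf '%s\n' "$values"
```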

Why is my shell command working at the prompt, but not as a bash script?

New to bash scripting, though I'm getting pretty familiar with shell scripting. I wrote this text transform script for a feed for a client. It extracts the URLs I want, and the titles of articles. Awesome.
echo $(var=$(curl -L website.com/news)) |
grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var |
result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g') ; let this=0 ; echo "$result" | while read line ; do if ((this % 2 == 0 )) ; then echo website.com/news$line ; else echo $line ; fi ; let this+=1 ; done
When I try to extract it to a file and run it with bash OR sh myThing.sh, it doesn't work at all. The only thing that gets echoed is 'website.com/news', and when I try to echo $this, all I get is 1. What am I doing wrong?
#!/bin/bash
echo $(var=$(curl -L website.com/news)) |
grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var |
result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
let this=0
echo "$result" | while read line
do
if ((this % 2 == 0 ))
then
echo website.com/news$line
else
echo $line
fi
let this+=1
done
edit:
#!/bin/bash
var=$(curl -L linux.com/news)
select=$(grep -Po '<h3 class="article-list__title"><a href="\K[^<]+' <<< $var)
result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
let this=0
echo "$result" | while read line
do
if ((this % 2 == 0 ))
then
echo website.com/news$line
else
echo $line
fi
let this+=1
done
This answer solves the OP's specific problem, but to address the question "Why is my shell command working at the prompt, but not as a bash script?" generally, Etan Reisner provides an excellent answer in the comments:
"You are either not running that exact command or it "works" because you have shell state that is affecting things in ways you take to be "working" and your script doesn't have that state. Try launching an entirely new shell session and see if that command, on its own, works for you there."
echo $(var=...) will assign a value to variable $var, but will not output anything, so the echo command will simply print a newline.
Furthermore, because the assignment to $var happens inside $(...) (a command substitution), it is confined to the subshell that the command inside the substitution ran in, so $var will not be defined in the calling shell.
(A subshell is a child process that contains a duplicate of the current shell's environment, without being able to modify the current shell's environment).
More generally, you cannot meaningfully define variables inside a pipeline - they will neither be visible to other pipeline segments, nor after the pipeline finishes.[1]
The only reason your [original] command could ever have worked is if $var had a preexisting value in your shell.
In fact, given that you provide input to grep via a here-string (<<<), the first segment of your pipeline (echo ...) is entirely ignored.
To pass the output of curl through the pipeline to grep and then to sed, no intermediate variables are needed at all.
Furthermore, your sed command is lacking input: you probably meant to feed it $var in your first attempt, and $select in the 2nd (your 2nd attempt came close to a correct solution).
What you were probably ultimately looking for:
result=$(curl -L website.com/news |
grep -Po '<h3 class="article-link"><a href="\K[^<]+' |
sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
# ... processing of "$result"
Some additional notes:
You could combine the 3 sed calls into a single one.
You could feed the pipeline output directly into your while loop, without the need for intermediate variable $result.
You should generally double-quote variable references (e.g., use "$line" instead of $line) to protect them from interpretation by the shell (word-splitting, globbing).
let this+=1 is better expressed as (( ++this )) in modern Bash.
This answer of mine contains links to resources for learning about bash.
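The notes above might be combined roughly as follows. This is only a sketch: the format_links function name and the sample input (standing in for what grep -Po would emit) are invented, and \n in the sed replacement is a GNU sed extension.

```shell
#!/usr/bin/env bash
# Hypothetical sample of the text left over after the grep -Po step.
sample='/first-story">First Story
/second-story">Second Story'

# format_links (hypothetical name): turn each href/title pair into two lines,
# prefixing the URL line with the site address.
format_links() {
  local n=0 line
  # One sed process with three expressions replaces the three chained calls.
  sed -e 's/"/\n/g' -e 's/ \//\n\//g' -e 's/>//g' |
  while IFS= read -r line; do
    if (( n % 2 == 0 )); then
      printf 'website.com/news%s\n' "$line"   # even lines are URL paths
    else
      printf '%s\n' "$line"                   # odd lines are titles
    fi
    (( ++n ))
  done
}

links=$(printf '%s\n' "$sample" | format_links)
printf '%s\n' "$links"
```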
[1] All commands involved in a pipeline by default run in a subshell in bash, so they all see copies of the parent shell's variables. Bash 4.2+ offers the lastpipe option (off by default) to allow you to create variables in the current shell instead of in a subshell, by running the last pipeline segment (only) in the current shell instead of in a subshell, to facilitate scenarios such as ... | while read -r line ... and have $line continue to exist after the pipeline finishes.
Note that this still doesn't enable defining a variable in an earlier pipeline segment in the hopes that a later segment will see it - this can never work, because the commands that make up a pipeline are launched at the same time, and it is only through coordination of the input and output streams that effective left-to-right processing happens.
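The subshell behaviour is easy to check with a counter. The sketch below, using invented variable names, runs the same loop twice: once as the last segment of a pipeline, and once with its input supplied by process substitution so that the loop stays in the current shell. It assumes bash with the lastpipe option left at its default (off).

```shell
#!/usr/bin/env bash
# Loop as a pipeline segment: it runs in a subshell, so the increments are lost.
count_in_pipeline=0
printf '%s\n' a b c | while IFS= read -r item; do
  count_in_pipeline=$((count_in_pipeline + 1))
done

# Loop fed by process substitution: it runs in the current shell,
# so the variable keeps its final value.
count_redirected=0
while IFS= read -r item; do
  count_redirected=$((count_redirected + 1))
done < <(printf '%s\n' a b c)

printf 'pipeline: %s, redirected: %s\n' "$count_in_pipeline" "$count_redirected"
```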
This line is totally wrong. You are attempting to pass thru pipes the standard output of each process when none of them ever prints anything except standard error.
echo $(var=$(curl -L website.com/news)) | grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var | result=$(sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
I'll break down what I believe you are attempting to do.
echo $(var=$(curl -L website.com/news))
The above code will only print the standard error, which is a separate stream from standard output. The standard output is assigned to $var. However, you are attempting to pass the standard output to the next process, and at this point that is nothing but a newline.
grep -Po '<h3 class="article-link"><a href="\K[^<]+' <<< $var
The here-string <<< takes precedence over pipe. But variable $var is lost as it was defined inside a sub-shell and not in the parent shell. Thanks to @mklement0.
The proper way to accomplish all this is to not use $var. All you wanted is the value stored in $result.
result=$(curl -L website.com/news | grep -Po '<h3 class="article-link"><a href="\K[^<]+'| sed 's/"/\n/g' | sed 's/ \//\n\//g' | sed 's/>//g')
I don't intend to optimize your script. This is more of a suggested solution. A more comprehensive answer to your question Why is my shell command working at the prompt, but not as a bash script? is answered by mklement0 here.

passing grep into a variable in bash

I have a file named email.txt like this one:
Subject:My test
From:my email <myemail@gmail.com>
this is third test
I want to extract only the email address from this file using a bash script, so I put this in my bash script named myscript:
#!/bin/bash
file=$(myscript)
var1=$(awk 'NR==2' $file)
var2=$("$var1" | (grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'))
echo $var2
But the script fails to run. When I run this command manually in bash, I can obtain the email address:
echo $var1 | grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
I need to store the email address in a variable so I can use it in other functions. Can someone show me how to solve this problem?
Thanks.
I think this is an overly complicated way to go about things, but if you just want to get your script to work, try this:
#!/bin/bash
file="email.txt"
var1=$(awk 'NR==2' $file)
var2=$(echo "$var1" | grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b')
echo $var2
I'm not sure what file=$(myscript) was supposed to do, but on the next line you want a file name as argument to awk, so you should just assign email.txt as a string value to file, not execute a command called myscript. $var1 isn't a command (it's just a line from your text file), so you have to echo it to give grep anything useful to work with. The additional parentheses around grep are redundant.
What is happening is this:
var2=$("$var1" | (grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'))
^^^^^^^ Execute the program named (what is in variable var1).
You need to do something like this:
var2=$(echo "$var1" | grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b')
or even
var2=$(awk 'NR==2' $file | grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b')
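An end-to-end sketch of that last form, run against a throwaway copy of the sample file. The temporary file and the final check are additions for illustration, and the pattern is the one from the question with the \b anchors left out, since they are a GNU extension.

```shell
#!/usr/bin/env bash
# Recreate the sample file from the question in a temporary location.
file=$(mktemp)
printf '%s\n' 'Subject:My test' 'From:my email <myemail@gmail.com>' 'this is third test' > "$file"

# Take line 2 and keep only the part that looks like an email address.
email=$(awk 'NR==2' "$file" | grep -Eio '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}')
rm -f "$file"

# The variable is empty when grep found nothing, so test it before use.
if [ -n "$email" ]; then
  printf 'address: %s\n' "$email"
else
  printf 'no address found\n' >&2
fi
```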
There are very helpful flags for bash: -xv
The line with
var2=$("$var1" | (grep...
should be
var2=$(echo "$var1" | (grep...
Also my version of grep doesn't have -o flag.
And since grep patterns are "greedy", even though the following code runs, its output is not exactly what you want.
#!/bin/bash -xv
file=test.txt
var1=$(awk 'NR==2' $file)
var2=$(echo "$var1" | (grep -Ei '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b'))
echo $var2
Use Bash parameter expansion,
var2="${var1#*:}"
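On its own, that expansion leaves the display name and the angle brackets in place. A sketch of how two further expansions could isolate the address, assuming the line always has the name <address> shape shown in the question (the variable names after_colon, inside and email are made up):

```shell
#!/usr/bin/env bash
# Sample value in the format of line 2 of email.txt.
var1='From:my email <myemail@gmail.com>'

after_colon=${var1#*:}   # remove the shortest prefix ending in ':'
inside=${var1#*<}        # remove everything up to and including '<'
email=${inside%%>*}      # remove everything from the first '>' onward

printf '%s\n' "$after_colon" "$email"
```

No external process is started here, which makes it the cheapest option when the line format is predictable.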
There's a cruder way:
cat $file | grep @ | tr '<>' '\012\012' | grep @
That is, extract the line(s) with @ signs, turn the angle brackets into newlines, then grep again for anything left with an @ sign.
Refine as needed...

Shellscript to monitor a log file if keyword triggers then execute a command?

Is there a cheap way to monitor a log file like tail -f log.txt, then if something like [error] appears, execute a command?
Thank you.
tail -fn0 logfile | \
while read line ; do
echo "$line" | grep "pattern"
if [ $? = 0 ]
then
... do something ...
fi
done
I also found that you can use awk to monitor for pattern and perform some action when pattern is found:
tail -fn0 logfile | awk '/pattern/ { print | "command" }'
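This starts command the first time the pattern is found and pipes each matching line to its standard input; awk keeps that one pipe open rather than launching the command again for every match. To run the command once per matching line, use system("command") in the action instead. Command can be any unix command including shell scripts or anything else.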
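A small sketch of running a command once per matching line with awk's system() function. The sample lines and the echo used as the handler are placeholders; printf stands in for tail -fn0 logfile so the example terminates.

```shell
#!/usr/bin/env bash
# system() launches its command each time the action runs,
# i.e. once for every line that matches the pattern.
alerts=$(
  printf '%s\n' 'startup ok' '[error] disk full' 'still ok' '[error] again' |
  awk '/\[error\]/ { system("echo handler ran") }'
)
printf '%s\n' "$alerts"
```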
An even more robust approach is monit. This tool can monitor very many things, but one of them is that it will easily tail one or more logs, match against regex and then trigger a script. This is particularly useful if you have a collection of log files to watch or more than one event to trigger.
Better and simple:
tail -f log.txt | grep -E -m 1 "error"
echo "Found error, do sth."
...

Resources