Loop to filter out lines from apache log files

Loop to filter out lines from apache log files - linux

I have several apache access files that I would like to clean up a bit before I analyze them. I am trying to use grep in the following way:
grep -v term_to_grep apache_access_log
I have several terms that I want to grep, so I am piping every grep action as follow:
grep -v term_to_grep_1 apache_access_log | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > apache_access_log_cleaned
Until here my rudimentary script works as expected! But I have many apache access logs, and I don't want to do that for every file. I have started to write a bash script but so far I couldn't make it work. This is my try:
for logs in ./access_logs/*;
do
cat $logs | grep -v term_to_grep | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > $logs_clean
done;
Could anyone point me out what I am doing wrong?

If you have a variable and you append _clean to its name, that's a new variable, and not the value of the old one with _clean appended. To fix that, use curly braces:
$ var=file.log
$ echo "<$var>"
<file.log>
$ echo "<$var_clean>"
<>
$ echo "<${var}_clean>"
<file.log_clean>
Without it, your pipeline tries to redirect to the empty string, which results in an error. Note that "$file"_clean would also work.
As for your pipeline, you could combine that into a single grep command:
grep -Ev 'term_to_grep|term_to_grep_2|term_to_grep_3|term_to_grep_n' "$logs" > "${logs}_clean"
No cat needed, only a single invocation of grep.
Or you could stick all your terms into a file:
$ cat excludes
term_to_grep_1
term_to_grep_2
term_to_grep_3
term_to_grep_n
and then use the -f option:
grep -vf excludes "$logs" > "${logs}_clean"
If your terms are strings and not regular expressions, you might be able to speed this up by using -F ("fixed strings"):
grep -vFf excludes "$logs" > "${logs}_clean"
I think GNU grep checks that for you on its own, though.

You are looping over several files, but in your loop you constantly overwrite your result file, so it will only contain the last result from the last file.
You don't need a loop, use this instead:
egrep -v 'term_to_grep|term_to_grep_2|term_to_grep_3' ./access_logs/* > "$logs_clean"
Note, it is always helpful to start a Bash script with set -eEuCo pipefail. This catches most common errors -- it would have stopped with an error when you tried to clobber the $logs_clean file.

Related

How to grep while excluding some words?

I wanted to grep the word "force" but most of the output listed is from the command -force.
When I did grep -v "-force" filename , it says grep : orce most probably because of the -f command.
I just want to find a force signal from files using grep. How?

use grep -v -- "-force" - the double - signals that there are no more options being expected.

If you want to grep specific word from file then we can use cat command
# cat filename.txt | grep force
For other basic Commands

this line maybe simpler:
grep '[^-]force' tmp
it says: grep "force", but only if it does not has a prefix - by using [^]. See some simple regular expression examples here.

Use [-] to remove the special significance. Check this out:
> cat rand_file.txt
1. list items of random text
2. -force
3. look similar as the first batch
4. force
5. some random text
> grep -v "-force" rand_file.txt
grep: orce: No such file or directory
> grep -v "[-]force" rand_file.txt | grep force
4. force
>

grep 2 words at if statements in Bash

I am trying to see if my nohup file contains the words that I am looking for. If it does, then I need to put that into tmp file.
So I am currently using:
if grep -q "Started|missing" $DIR3/$dirName/nohup.out
then
grep -E "Started|missing" "$DIR3/$dirName/nohup.out" > tmp
fi
But it never goes into the if statement even if there are words that I am looking for.
How can I fix this?

Since basic sed uses BRE, regex alternation operator is represented by \| . | matches a literal | symbol. And you don't need to touch | symbol in the grep which uses ERE.
if grep -q "Started\|missing" $DIR3/$dirName/nohup.out

You should use egrep instead of grep (Avinash Raj has explained that in other words already in his answer).
I would generally recommend using egrep as a default for everyday use (even though many expressions only contain the basic regular expression syntax). From a practical point the standard grep is only interesting for performance reasons.
Details about the advantages of grep vs. egrep can be found in that superuser question.

When you only put the grep results into the tmp-file, you do not want to grep the file twice.
You can not use
egrep "Started|missing" $DIR3/$dirName/nohup.out > tmp
since that would create an empty tmp file when nothing is found.
You can remove empty files with if [ ! -s tmp ] or use another solution:
Redirectong the grep results without grepping again can be done with
rm -f tmp 2>/dev/null
egrep "Started|missing" $DIR3/$dirName/nohup.out | while read -r strange_line; do
echo "${strange_line}" >> tmp
done

How do I grep multiple lines (output from another command) at the same time?

I have a Linux driver running in the background that is able to return the current system data/stats. I view the data by running a console utility (let's call it dump-data) in a console. All data is dumped every time I run dump-data. The output of the utility is like below
Output:
- A=reading1
- B=reading2
- C=reading3
- D=reading4
- E=reading5
...
- variableX=readingX
...
The list of readings returned by the utility can be really long. Depending on the scenario, certain readings would be useful while everything else would be useless.
I need a way to grep only the useful readings whose names might have have nothing in common (via a bash script). I.e. Sometimes I'll need to collect A,D,E; and other times I'll need C,D,E.
I'm attempting to graph the readings over time to look for trends, so I can't run something like this:
# forgive my pseudocode
Loop
dump-data | grep A
dump-data | grep D
dump-data | grep E
End Loop
to collect A,D,E as that would actually give me readings from 3 separate calls of dump-data as that would not be accurate.

If you want to save all result of grep in the same file, you can just join all expressions in one:
grep -E 'expr1|expr2|expr3'
But if you want to have results (for expr1, expr2 and expr3) in separate files, things are getting more interesting.
You can do this using tee >(command).
For example, here I process the same pipe with thre different commands:
$ echo abc | tee >(sed s/a/_a_/ > file1) | tee >(sed s/b/_b_/ > file2) | sed s/c/_c_/ > file3
$ grep "" file[123]
file1:_a_bc
file2:a_b_c
file3:ab_c_
But the command seems to be too complex.
I would better save dump-data results to a file and then grep it.
TEMP=$(mktemp /tmp/dump-data-XXXXXXXX)
dump-data > ${TEMP}
grep A ${TEMP}
grep B ${TEMP}
grep C ${TEMP}

You can use dump-data | grep -E "A|D|E". Note the -E option of grep. Alternatively you could use egrep without the -E option.

you can simply use:
dump-data | grep -E 'A|D|E'

awk '/MY PATTERN/{print > "matches-"FILENAME;}' myfile{1,3}
thx Guru at Stack Exchange

Multiple greps in pipeline not terminating after completion

I'm seem to be having a problem with a simple grep statement not finishing/terminating after it's been completed.
For example:
grep -v -E 'syslogd [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}: restart' |
grep -v 'printStats: total reads from cache:' /var/log/customlog.log >\
/tmp/filtered_log.tmp
The above statement will strip out the contents and save them into a temp file, however after the grep finishes processing the entire file, the shell script hangs and cannot proceed anymore. This behavior is also triggered when manually running the command within the command line. Essentially combining multiple grep statements causes a PAGER like action (more/less).
Does anyone have any suggestions to overcome this limitation? Ideally I wouldn't want to do the following giving that the customlog.log file might get huge at times.
cat /var/log/customlog.log |
grep -v -E 'syslogd [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}: restart' |
grep -v 'printStats: total reads from cache:' > /tmp/filtered_log.tmp
Thanks,
Tony

As explained above, you need to move here your file name:
grep -v -E \
'syslogd [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}: restart' /var/log/customlog.log
| grep -v 'printStats: total reads from cache:' > /tmp/filtered_log.tmp
But you can also combine the two greps:
grep -v -E \
-e 'syslogd [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}: restart' \
-e 'printStats: total reads from cache:' /var/log/customlog.log > \
/tmp/filtered_log.tmp
Saves a bit of CPU and will fix your error at the same time.
BTW, another possible issue: What if two instances of this script are run at the same time? Both will be using the same temp file. This probably isn't an issue in this particular case, but you might as well get used to developing scripts for that situation. I recommend that you use $$ to put the process ID in your temporary file:
tempFileName="/tmp/filtered_log.$$.tmp"
grep -v -E -e [blah... blah.. blah] /var/log/customlog.log > $tempFileName
Now, if two different people are running this process, you won't get them using the same temp file.
Appended
As pointed out by uwe-kleine-konig, you're actually better off using mktemp:
tempFileName=$(mktemp filtered_log.XXXXX)
grep -v -E -e [blah... blah.. blah] /var/log/customlog.log > $tempFileName
Thanks for the suggestion.

Building grep strings dynamically

I am writing a shell script to do a "tail" on a growing log file. This script accepts parameters to search for, and incrementally greps the output on them.
For example, if the script is invoked as follows:
logs.sh string1 string2
it should translate to:
tail -f logs.txt | grep string1 | grep string2
I am building the list of grep strings like this:
full_grep_string=""
for grep_string in $*
do
full_grep_string="$full_grep_string | grep $grep_string"
done
The string is built correctly, but when I try to finally tag it to the tail command, like so...
tail -f $LOG_FILE_PATH $full_grep_string
...the grep does not apply, and I get the unfiltered logs.
Am I missing something here? Or is there an easier way to do this?

eval tail -f $LOG_FILE_PATH $full_grep_string

grep buffers the line it found. So modifying your code to
full_grep_string="$full_grep_string | grep --line-buffered $grep_string"
shall work. I tested it on debian lenny (with bash).
And use the tip of an0
eval tail -f ...
(All this works for whole words)

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Loop to filter out lines from apache log files - linux

Related

How to grep while excluding some words?

grep 2 words at if statements in Bash

How do I grep multiple lines (output from another command) at the same time?

Multiple greps in pipeline not terminating after completion

Building grep strings dynamically

Categories

Resources