Bash exclude a line from output if the line matches a certain text - linux

I am looking to create a cronjob that will alert us if a certain directory has sent out a certain number of emails, by scanning a log file. The one-liner I am using is:
awk '$3 ~ /^cwd/{print $3}' /var/log/exim_mainlog | sort | uniq -c | sed "s|^ *||g" | sort -nr | head --lines 5
Before I go any further, I need to exclude some locations from the output, for example:
50992 cwd=/var/spool/exim
21960 cwd=/home/USER1/public_html/wp-content/cache/object/000000/746
2717 cwd=/etc/csf
2063 cwd=/home/USER2
1072 cwd=/
I need to exclude:
1072 cwd=/
2717 cwd=/etc/csf
50992 cwd=/var/spool/exim
Would I need to append the output to a txt file and then use sed, or is there an easier method?

Pipe through grep -v to exclude matches:
egrep -v ' cwd=(/$|/etc/csf|/var/spool/exim)'
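For example, dropped into your existing pipeline just before the final head, the exclusion could look like this (same pattern as above, untested against your log):
awk '$3 ~ /^cwd/{print $3}' /var/log/exim_mainlog | sort | uniq -c | sed "s|^ *||g" | sort -nr | egrep -v ' cwd=(/$|/etc/csf|/var/spool/exim)' | head --lines 5
Filtering before head means the excluded directories do not use up any of the top-5 slots.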

Related

Find duplicate entries in a text file using shell

I am trying to find duplicate *.sh entries mentioned in a text file (test.log) and delete them, using a shell program. Since the paths are different, uniq -u always prints the duplicate entry even though there are two first_prog.sh entries in the text file.
cat test.log
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/first_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
output:
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
I tried a couple of approaches using a few commands, but I have no idea how to get the above output.
rev test.log | cut -f1 -d/ | rev | sort | uniq -d
Any clue on this?
You can use awk for this by splitting fields on / and using $NF (last field) in an associative array:
awk -F/ '!seen[$NF]++' test.log
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
awk shines for these kinds of tasks, but here is a non-awk solution:
$ sed 's|.*/|& |' file | sort -k2 -u | sed 's|/ |/|'
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
or, if your paths are balanced (the same number of parent directories for all files):
$ sort -t/ -k5 -u file
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
And if you just want to drop that one specific duplicate, you can filter it out directly:
awk '!/my_shellprog\/test\/first/' file
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh

Count lines of CLI output in Linux

Hi, I have the following command:
lsscsi | grep HITACHI | awk '{print $6}'
I want the output to be the number of lines in the original output.
For example, if the original output is:
/dev/sda
/dev/sdb
/dev/sdc
The final output will be 3.
Basically, the command wc -l can be used to count the lines in a file or pipe. However, since you want to count the number of lines after a filter has been applied, I would recommend using grep for that:
lsscsi | grep -c 'HITACHI'
-c just prints the number of matching lines.
Another thing: in your example you are using grep .. | awk. That's a useless use of grep. It should be:
lsscsi | awk '/HITACHI/{print $6}'
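If you prefer the generic wc -l approach mentioned above, the filtered pipeline can simply be piped into it:
lsscsi | awk '/HITACHI/{print $6}' | wc -l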

How to filter multiple files and eliminate duplicate entries to select a single entry while using linux shell

I have a folder that contains several files. These files have identical columns.
Let us say file1 and file2 have the following contents (there can be more than two files):
$cat file1.txt
9999999999|1200
8888888888|1400
7777777777|1255
6666666666|1788
7777777777|1289
9999999999|1300
$cat file2.txt
9999999999|2500
8888888888|2450
6666666666|2788
9999999999|3000
2222222222|3001
In my files, the 1st column is a mobile number and the 2nd is a count. The same mobile number can appear in multiple files. Now I want to get the records into a file with unique mobile numbers, keeping the highest count for each.
The output should be as follows:
$cat output.txt
7777777777|1289
8888888888|2450
6666666666|2788
9999999999|3000
2222222222|3001
Any help would be appreciated.
That's probably not very efficient but it does the job:
Put this into phones.sh and run sh phones.sh:
#!/bin/bash
# List of input files; add more file names here as needed.
files="
file1.txt
file2.txt
"
# Collect all unique mobile numbers across the files.
phones=$(cat $files | cut -d'|' -f1 | sort -u)
# For each number, keep the record with the highest count, then sort the result by count.
for phone in $phones; do grep -h "^$phone|" $files | sort -t'|' -k 2 -nr | head -n1; done | sort -t'|' -k 2 -n
What it does, basically, is: extract all the phone numbers from the files, iterate over them, grep for each one in all the files, and select the record with the highest count. Then I also sorted the final result by count, which is what your expected result suggests. sort -t'|' -k 2 -nr means: sort on the second column, using | as the delimiter, in decreasing numerical order. head -n1 selects the first line. You can add other files to the files variable.
Another way of doing this is to use the power of sort and awk:
cat file1.txt file2.txt | sort -t '|' -k1,1 -k2,2nr | awk -F"|" '!_[$1]++' | sort -t '|' -k2,2n
I think the one-liner is pretty self-explanatory, except for the awk part. What it does is a uniq by the first column: it keeps only the first line seen for each mobile number, which, thanks to the preceding sort, is the one with the highest count. The last sort is just to get the final order that you wanted.
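Along the same lines, an awk-only variant that tracks the maximum count per number should also work (a sketch not taken from the original answers; it assumes the same file names as the example):
awk -F'|' '$2 > max[$1] {max[$1] = $2} END {for (n in max) print n "|" max[n]}' file1.txt file2.txt | sort -t'|' -k2,2n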

grep command not working as expected

I have a text file like mentioned below, and along with that I will pass an input for which I want a corresponding output.
Input file: test.txt
abc:abc_1
abcd:abcd_1
1_abcd:1_abcd_bkp
xyz:xyz_2
so if I use abc with the above test.txt file, I want abc_1; and if I pass abcd, I need abcd_1 as output.
I tried cat test.txt | grep abc | cut -d":" -f2,2, but I am getting the output
abc_1
abcd_1
1_abcd_bkp
when I want only abc_1.
With GNU grep:
grep -Po "^abc:\K.*" file
Output:
abc_1
\K keeps the text matched so far out of the overall regex match.
You want to use a regular expression with the -e switch.
In particular, regular expressions allow you to use caret (^) to express the start of a line.
Since you only care about abc when it's at the start of a line and it's followed by :, you want:
cat test.txt | grep -e "^abc:" | cut -d":" -f2,2
Output:
abc_1
awk to the rescue!
awk -F: -v key="abc" '$1==key{print $2}'
Using : as the delimiter, do the lookup for key on field 1 and return field 2.
Or, with the key written directly in the script:
awk -F: '$1=="abc"{print $2}'
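For completeness, run against the sample file (assuming it is still named test.txt as in the question):
awk -F: -v key="abc" '$1==key{print $2}' test.txt
abc_1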
You can try excluding with -v:
cat test.txt | grep abc | grep -vi "abc[a-z]"
Not sure if that would work exactly, but try something along those lines.
Without specifying the second field to print, the whole line (or, in other cases, multiple lines) will be printed.
awk -F: '/abc_/{print $2}' file
abc_1
awk -F: 'NR==1,/abc/{print $2}' file
abc_1

Extract and count value from standard .gz log files on an hourly basis

I'm trying to count the number of occurrences of a particular string from a bunch of .gz logfiles on an hourly basis. Each logfile statement starts with the following time format:
2013-11-21;09:07:23.433.
For example, to be clearer: find the count of occurrences of the string "abc" between 8am and 9am, then 9am and 10am, and so on. Any ideas on how to do it?
Since you just want to count occurrences, you may simply zcat the contents of the files, grep for what you're looking for -- the word and the time interval -- and finally sort and count (sort | uniq -c) the entries. The following would probably suffice:
zcat *.gz | grep <word> | grep -oP "^\d{4}-\d{2}-\d{2};\d{2}" | sort | uniq -c
The above command finds the lines in your logfiles that contain the <word> you're looking for, extracts the date and hour from those entries, and then counts the occurrences.
In case you don't want to take into account days/months/years, you may use:
zcat *.gz | grep <word> | grep -oP "^\d{4}-\d{2}-\d{2};\K\d{2}" | sort | uniq -c
The \K added in the grep expression is a PCRE (Perl Compatible Regular Expression) feature that keeps the text matched so far out of the reported match, so only the hour is printed.
Try this:
zgrep -c '2013-11-21;0[89]:.*abc' file.gz
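That pattern covers 8am to 10am of that one day; if you want a separate count per hour, one way is to loop over the hours (the date, hour list, and file name here are placeholders):
for h in 08 09 10 11; do
    printf '%s ' "$h"
    zgrep -c "2013-11-21;$h:.*abc" file.gz
done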
Or awk (gawk on Linux) will work:
zcat *.gz | awk -F'[\.;:]' '{arr[$2]++} END{for(i in arr){print i, arr[i]} }' 2>/dev/null
The redirection is there because some awks, notably gawk, will complain about the escape \. (the dot is not a metacharacter inside a bracket expression, so the escape is unnecessary).
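Note that the awk above counts every log line per hour; if you only want lines containing the string from the question (here "abc"), a small variation of the same idea should do it (a sketch, untested against your logs):
zcat *.gz | awk -F'[.;:]' '/abc/{arr[$2]++} END{for(i in arr){print i, arr[i]}}'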
