Find all lines with keyword and extract number - linux

I would like to find the line that starts with the word "ERRORS" and extract the number from that line.
Part of the file:
...
[ERROR] No keywords and test cases defined in file
File path: libraries_instances.robot
TEST SUITES SUMMARY:
ERRORS: 148
WARNINGS: 89
CS VIOLATIONS: 201
My solution is:
grep ERRORS .validation.log | grep -o -E '[0-9]+'
Is it possible to make this better and use only one grep?
Finally, I would like to assign that value to a variable in my bash script.

Since the linux tag is present in the question, this assumes GNU grep with the -P option is available:
$ grep -oP 'ERRORS.*\h\K\d+' .validation.log
148
ERRORS.*\h\K: here \K resets the start of the reported match, so the text matched up to this point (ERRORS, anything, then a horizontal whitespace character \h) is not part of the output.
Also note that man grep warns that -P is experimental, but I haven't faced any issues so far (see https://debbugs.gnu.org/cgi/pkgreport.cgi?package=grep for known GNU grep issues).
Alternate solution using awk
$ awk '/ERRORS:/ && NF==2{print $NF}' .validation.log
148
/ERRORS:/ && NF==2: match lines that contain ERRORS: and have exactly two fields (by default, runs of one or more whitespace characters separate fields)
print $NF: print the last field
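To cover the last part of the question, the extracted value can be captured with command substitution. A minimal sketch, assuming a POSIX shell; the temporary file stands in for .validation.log and the variable name errors is only illustrative:

```shell
#!/bin/sh
# Hypothetical stand-in for .validation.log, recreated from the excerpt above
log=$(mktemp)
cat > "$log" <<'EOF'
[ERROR] No keywords and test cases defined in file
File path: libraries_instances.robot
TEST SUITES SUMMARY:
ERRORS: 148
WARNINGS: 89
CS VIOLATIONS: 201
EOF

# $(...) captures the command's standard output, minus trailing newlines
errors=$(awk '/ERRORS:/ && NF==2 {print $NF}' "$log")

# Equivalent one-grep form, if GNU grep was built with -P support:
#   errors=$(grep -oP 'ERRORS.*\h\K\d+' "$log")

echo "Found $errors errors"
rm -f "$log"
```

If several lines matched, the variable would hold all the numbers separated by newlines, so it is worth checking that the log contains a single ERRORS: line.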

Related

grep and cut a specific pattern

Is there a way to make grep output "words" from files that match the search expression?
If I want to find all the instances of, say, "th" in a number of files, I can do:
grep "th" *
but the output will be something like this:
some-text-file : the cat sat on the mat
some-other-text-file : the quick brown fox
yet-another-text-file : i hope this explains it thoroughly
What I want it to output, using the same search, is:
the
the
the
this
thoroughly
Is this possible using grep? Or using another combination of tools?
Try grep -o:
grep -oh "\w*th\w*" *
Edit: matching pattern taken from Phil's comment.
From the docs:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default
when there is only one file (or only standard input) to search.
-o, --only-matching
Print only the matched (non-empty) parts of a matching line,
with each such part on a separate output line.
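A small sketch to try this out, assuming a scratch directory holding the three sample files from the question (names and contents copied from the example). It uses the POSIX class form discussed below, so it does not depend on \w support:

```shell
#!/bin/sh
# Work in a throwaway directory so the * glob only sees the sample files
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'the cat sat on the mat\n' > some-text-file
printf 'the quick brown fox\n' > some-other-text-file
printf 'i hope this explains it thoroughly\n' > yet-another-text-file

# -o: print each match on its own line; -h: omit the file name prefix
words=$(grep -oh '[[:alpha:]]*th[[:alpha:]]*' *)
printf '%s\n' "$words"

cd - >/dev/null || exit 1
rm -rf "$dir"
```

This prints the, the, the, this and thoroughly, one per line, matching the output asked for in the question.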
Cross-distribution-safe answer (including Windows MinGW?)
grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"
If you're using an older version of grep (such as 2.4.2) that lacks the -o option, use the above. Otherwise, use the simpler-to-maintain version below.
Linux cross-distribution-safe answer
grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'
To summarize: -oh outputs only the regular expression matches from the file contents (not the file name), just as you would expect a regular expression to work in vim etc. Which word or regular expression you then search for is up to you, as long as you stay with POSIX and not Perl syntax (see below).
More from the manual for grep
-o Print each match, but only the match, not the entire line.
-h Never print filename headers (i.e. filenames) with output lines.
-w The expression is searched for as a word (as if surrounded by
`[[:<:]]' and `[[:>:]]';
The reason why the original answer does not work for everyone
Support for \w varies from platform to platform, as it is an extended "Perl" syntax. Grep installations limited to POSIX character classes need [[:alpha:]] instead (strictly, \w corresponds to [[:alnum:]_], which also includes digits and the underscore). See the Wikipedia page on regular expressions for more.
Ultimately, the POSIX answer above will be a lot more reliable with grep, regardless of platform.
As for grep versions without the -o option: the first grep outputs the relevant lines, tr turns the spaces into newlines, and the final grep keeps only the matching words.
(PS: I know most platforms have been patched for \w by now, but there are always those that lag behind.)
Credit for the -o workaround goes to @AdamRosenfield's answer.
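The three stages of the fallback pipeline for greps without -o can be inspected one at a time. A minimal sketch, using the third sample line from the question and feeding it through stdin rather than a file (so -h is not needed):

```shell
#!/bin/sh
line='i hope this explains it thoroughly'

# Stage 1: keep only lines that contain a word with "th"
stage1=$(printf '%s\n' "$line" | grep '[[:alpha:]]*th[[:alpha:]]*')
# Stage 2: put every space-separated word on its own line
stage2=$(printf '%s\n' "$stage1" | tr ' ' '\n')
# Stage 3: keep only the words that contain "th"
stage3=$(printf '%s\n' "$stage2" | grep '[[:alpha:]]*th[[:alpha:]]*')

printf '%s\n' "$stage3"
```

Because only spaces are split, punctuation stays attached to the words (a line ending in "thoroughly." would yield "thoroughly."), which the -o version avoids.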
It's simpler than you think. Try this:
egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)
egrep -iwo 'th.[a-z]*' filename.txt ### (Case Insensitive)
Where:
egrep: grep with extended regular expressions (equivalent to grep -E; newer GNU grep versions warn that egrep is obsolescent).
w: match only whole words instead of substrings.
o: display only the matched pattern instead of the whole line.
i: ignore case.
You could translate spaces to newlines and then grep, e.g.:
cat * | tr ' ' '\n' | grep th
Just awk; no need to combine tools:
# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
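grep with only-matching and Perl-compatible regular expressions; the lookahead ends each match at the next whitespace or at the end of the line, so a last word such as "thoroughly" is found too and no trailing space is printed:
grep -o -P 'th.*?(?=\s|$)' filename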
I was unsatisfied with awk's hard-to-remember syntax, but I liked the idea of using one utility to do this.
It seems like ack (or ack-grep if you use Ubuntu) can do this easily:
# ack-grep -ho "\bth.*?\b" *
the
the
the
this
thoroughly
If you omit the -h flag you get:
# ack-grep -o "\bth.*?\b" *
some-other-text-file
1:the
some-text-file
1:the
the
yet-another-text-file
1:this
thoroughly
As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:
# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file
1, 5, 12/27/2010
cat *-text-file | grep -Eio "th[a-z]+"
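You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected: -w only treats letters, digits and the underscore as word characters, so a hyphen counts as a word boundary.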
From Wikipedia:
cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple
grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
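If "word" should mean a whitespace-delimited token rather than grep's notion of word characters, one option is to spell the boundaries out in an extended regex. A sketch reusing the fruit list above (written to a temporary file here):

```shell
#!/bin/sh
list=$(mktemp)
printf '%s\n' apple apples pineapple apple- apple-fruit fruit-apple > "$list"

# -w treats "-" as a boundary, so apple-, apple-fruit and fruit-apple also match
loose=$(grep -w apple "$list")
# Require start-of-line or whitespace on both sides instead
strict=$(grep -E '(^|[[:space:]])apple($|[[:space:]])' "$list")

printf 'grep -w:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$list"
```

The strict pattern selects only the plain apple line. Combined with -o it would include the surrounding whitespace in the match, so it suits line selection better than word extraction.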
I had a similar problem: I was looking for a grep pattern/regex with only the matched text as output.
In the end I used egrep with the option -o (the same regex with plain grep, which uses basic regular expressions, didn't give me the same result, because in a basic regex | is not special unless escaped).
So I think it could be something similar to this (I'm NOT a regex master):
egrep -o "the|this|thoroughly" filename
To search for all words containing "icon-", the following command works perfectly. I am using ack here, which is similar to grep but has better options and nicer formatting.
ack -oh --type=html "\w*icon-\w*" | sort | uniq
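You could pipe your grep output into Perl like this (-h stops grep from prefixing file names, which could otherwise contribute matches of their own, such as "another" from yet-another-text-file):
grep -h "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'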
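grep --color -o -E "Begin.{0,}?End" file.txt
?: match as few characters as possible until End.
Tested in the macOS terminal. This non-greedy use of ? is not part of POSIX extended regular expressions, and GNU grep -E does not treat it that way; with GNU grep, use grep -oP 'Begin.*?End' file.txt instead.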
$ grep -w
Excerpt from grep man page:
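-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.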
ripgrep
Here are the example using ripgrep:
rg -o "(\w+)?th(\w+)?"
It matches every word that contains th.

How to grep for a matching word, not the surrounding line, with a wildcard?

Maybe an odd question, but I'm attempting to grep the output of a command to select just the matching word and not the line. This word also has a wildcard in it.
git log --format=%aD <file> | tail -1 | grep -oh 201
The first and second parts of the command read the git log for a file and grab the line with the date and time of creation. I'm attempting to write a bash script that does something with the year the file was created, so I need to grab just that one word (the year).
Looking at the grep documentation, -o prints only the matching part (and -h suppresses file names). I can't find anything that lets it match the rest of the word, though (I could just be missing it).
So the output of that previous command is:
201
And I need it to be (as an example):
2017
Help would be much appreciated!
You can use . as a wildcard character:
$ echo 'before2017after' | grep -o '201.'
2017
Or, better yet, specify that the fourth character be a digit:
$ echo 'before2017after' | grep -o '201[[:digit:]]'
2017
Notes:
Since the input comes from stdin, there are no file names, so in this case -h changes nothing.
[[:digit:]] is a locale-safe way of specifying a digit: it always means 0 to 9, whereas the meaning of a range such as [0-9] can depend on the locale's collation order.
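For the original goal (the year of the oldest commit), matching a pattern is not the only option: %aD prints an RFC 2822 date such as "Fri, 14 Jul 2017 10:00:00 +0200", in which the year is the fourth whitespace-separated field. A sketch assuming that layout, with a hard-coded sample date so it runs outside a git repository:

```shell
#!/bin/sh
# Hypothetical sample of: git log --format=%aD -- <file> | tail -1
created='Fri, 14 Jul 2017 10:00:00 +0200'

# Field 4 of an RFC 2822 date is the four-digit year
year=$(printf '%s\n' "$created" | awk '{print $4}')
echo "$year"

# In a real script the pipeline would be something like:
#   year=$(git log --format=%aD -- "$file" | tail -1 | awk '{print $4}')
```

Reasonably recent git versions can also format the date directly (git log --format=%ad --date=format:%Y), which avoids parsing altogether; check git log --help for your version.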

Shell - How to find a word at a certain point in a message

I want to change my command:
anzahl=`cat $1 | grep -i "error" | wc -l`
This command also counts messages like this one:
2017-07-15 03:07:02,746 [INFO] blabla:123 #blabla:123 - rhsmd started. Error.
But that line is tagged [INFO], so I don't want it counted.
I just want messages like this one:
2017-07-15 06:12:45,362 [ERROR] blabla:123 #blabla:123- Either the consumer is not registered or the certificates are corrupted. Certificate update using daemon failed.
Any tips on how I can do this?
Generally you want:
anzahl=$(grep -c '\[ERROR\]' "$1")
This searches for the literal string [ERROR] in the log file (the brackets are escaped so they are not read as a bracket expression); -c returns the number of matching lines, which makes wc -l superfluous.
However, this would still match [ERROR] at any position in the line. While that should be good enough in most cases, this awk command is more precise:
anzahl=$(awk '$3=="[ERROR]"{c++}END{print c+0}' "$1")
This checks whether [ERROR] appears exactly as the third column of a line and counts those lines. At the end of input it prints the count (c+0 makes it print 0 rather than an empty line when nothing matched).
By the way, German variable names are not ideal for an international audience like Stack Overflow's. I recommend English variable names, e.g. count instead of anzahl.
If you don't actually want a regular expression but really just want to count a string, there are grep options for that:
-c, --count
Suppress normal output; instead print a count of matching lines
for each input file.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by new-
lines, any of which is to be matched.
So your command should be:
anzahl=$(grep -c -F '[ERROR]' "$1")
Of course, even that string might appear some place other than the third whitespace-delimited field of the line. If you want to stick with grep rather than switching to a tool like awk for your counting, you can do so by going back to what is perhaps an awkward regular expression:
anzahl=$(grep -c -E '^[^ ]+ [^ ]+ [[]ERROR[]]' "$1")
This uses grep's -E option to specify that you're using an Extended regular expression. The expression consists of two strings of not-space, each followed by a space, all of which is followed by your error tag.
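To check that the stricter patterns really skip the INFO line, here is a sketch that writes the two sample lines from the question to a temporary file and compares the original case-insensitive count with the fixed-string and field-based counts:

```shell
#!/bin/sh
log=$(mktemp)
cat > "$log" <<'EOF'
2017-07-15 03:07:02,746 [INFO] blabla:123 #blabla:123 - rhsmd started. Error.
2017-07-15 06:12:45,362 [ERROR] blabla:123 #blabla:123- Either the consumer is not registered or the certificates are corrupted. Certificate update using daemon failed.
EOF

# Original approach: counts any line containing "error" in any case
loose=$(grep -ci 'error' "$log")
# Literal [ERROR] anywhere on the line
fixed=$(grep -c -F '[ERROR]' "$log")
# [ERROR] only as the third whitespace-separated field; c+0 prints 0 if nothing matched
field=$(awk '$3=="[ERROR]"{c++} END{print c+0}' "$log")

echo "loose=$loose fixed=$fixed field=$field"
rm -f "$log"
```

The case-insensitive search counts both lines, while the two stricter versions count only the [ERROR] line.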

how to grep range of numbers

in a text file I have the following entries:
10.1.0.10-15
10.1.0.20-25
10.1.0.30-35
10.1.0.40-45
I would like to print 10.1.0.10, 15, 20, 25 and 30.
cat file | grep 10.1.0.[1,2,3][0.5] prints 10, 15, 20, 25, 30 and 35.
How do I suppress 35?
I do not want to use grep -v .35; I just want to print specific IPs or numbers.
You can use:
grep -E '10\.1\.0\.([12][05]|30)' file
However awk will be more readable:
awk -F '[.-]' '$4%5 == 0 && $4 >= 10 && $4 <= 30' file
10.1.0.10-15
10.1.0.20-25
10.1.0.30-35
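If the goal is instead to print the individual numbers 10, 15, 20, 25 and 30 rather than whole lines (one possible reading of the question), the same field splitting can be applied to both ends of each range. A sketch under that assumption, with the bounds 10 and 30 hard-coded:

```shell
#!/bin/sh
file=$(mktemp)
printf '%s\n' 10.1.0.10-15 10.1.0.20-25 10.1.0.30-35 10.1.0.40-45 > "$file"

# With -F '[.-]' each line splits into 10 1 0 <start> <end>,
# so $4 and $5 are the two ends of the range
nums=$(awk -F '[.-]' '{ for (i = 4; i <= 5; i++) if ($i >= 10 && $i <= 30) print $i }' "$file")

printf '%s\n' "$nums"
rm -f "$file"
```

This prints 10, 15, 20, 25 and 30 on separate lines; 35 is dropped because it falls outside the bounds.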
Note that the , and . in the character classes are not needed — in fact, they match data that you don't want the pattern to match. Also, the . outside the character classes match any character (digit, letter, or . as you intend) — you need to escape them with a backslash so that they only match an actual ..
Also, you are making Useless Use of cat (UUoC) errors; grep can perfectly well read from a file.
As to what to do, probably use:
grep -E '10\.1\.0\.([12][05]|30)' file
This uses the extended regular expressions (formerly for egrep, now grep -E). It also avoids the dots from matching any character.
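I'm not sure whether what you want is just printing the first two ranges, excluding the one with 35. In that case grep -E '10\.1\.0\.[1-3]0-(15|25)' file does the job.
Remember that with grep -E you can use alternation such as (15|25). Inside a bracket expression, however, | is just a literal character: [15|25] matches a single 1, 5, | or 2, not 15 or 25.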

How to do something like grep -B to select only one line?

Everything is in the title. Basically, let's say I have this text:
some text lalala
another line
much funny wow grep
I grep funny and I want my output to be "lalala"
Thank you
One possible answer is to use either ed or ex to do this (it is trivial in them):
ed - yourfile <<< 'g/funny/.-2p'
(Or replace ed with ex. You might have red, the restricted editor, too; it can't modify files.) This looks for the pattern /funny/ globally, and whenever it is found, prints the line 2 before the matching line (that's the .-2p part). Or, if you want the most recent line containing 'lalala' before the line matching 'funny':
ed - yourfile <<< 'g/funny/?lalala?p'
The only problem is if you're trying to process standard input rather than a file; then you have to save the standard input to a file and process that file, which spoils the concurrency.
You can't do negative offsets in sed (though GNU sed allows you to do positive offsets, so you could use sed -n '/lalala/,+2p' file to get the 'lalala' to 'funny' lines (which isn't quite what you want) based on finding 'lalala', but you cannot find the 'lalala' lines based on finding 'funny'). Standard sed does not allow offsets at all.
If you need to print just the IP address found on a line 8 lines before the pattern-matching line, you need a slightly more involved ed script, but it is still doable:
ed - yourfile <<< 'g/funny/.-8s/.* //p'
This uses the same basic mechanism to find the right line, then runs a substitute command to remove everything up to the last space on the line and print the modified version. Since there isn't a w command, it doesn't actually modify the file.
Since grep -B prints all N lines before the match rather than just the Nth one, you'll have to pipe the output into something like grep or Awk.
grep -B 2 "funny" file|awk 'NR==1{print $NF; exit}'
You could also just use Awk.
awk -v s="funny" '/[[:space:]]lalala$/{n=NR+2; o=$NF}NR==n && $0~s{print o}' file
For the specific example of an IP address 8 lines before the match as mentioned in your comment:
awk -v s="funny" '
/[[:space:]][0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/ {
n=NR+8
ip=$NF
}
NR==n && $0~s {
print ip
}' file
These Awk solutions first find the output field you might want, then print the output only if the word you want exists in the nth following line.
Here's an attempt at a slightly generalized Awk solution. It maintains a circular queue of the last q lines and prints the line at the head of the queue when it sees a match.
#!/bin/sh
: ${q=8}
e=$1
shift
awk -v q="$q" -v e="$e" '{ m[(NR%q)+1] = $0 }
$0 ~ e { print m[((NR+1)%q)+1] }' "${@--}"
Adapting it to a different default (I set it to 8), adding proper option handling (currently you'd run it like q=3 ./qgrep regex file), and remembering (and hence printing) the entire line should be easy enough. Note that q counts the matching line itself, so q=3 prints the line two lines before the match.
(I also didn't bother to make it work correctly if you see a match in the first q-1 lines. It will just print an empty line then.)
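A variant of the same circular-buffer idea, sketched with a hypothetical variable n that counts only the lines before the match (so n=2 prints the line two above it) and that skips matches with fewer than n preceding lines instead of printing an empty line:

```shell
#!/bin/sh
# Sample input from the question
input='some text lalala
another line
much funny wow grep'

# buf keeps the last n+1 lines, indexed by line number modulo n+1
before=$(printf '%s\n' "$input" | awk -v n=2 -v e='funny' '
  { buf[NR % (n + 1)] = $0 }
  $0 ~ e && NR > n { print buf[(NR - n) % (n + 1)] }')

printf '%s\n' "$before"
# Last field only, as asked in the question ("lalala"):
printf '%s\n' "$before" | awk '{ print $NF }'
```

Unlike grep -B piped into awk, this handles every match in the input, not just the first, and it reads from stdin, so it also works in the middle of a pipeline.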
