Why doesn't ' grep -w -e "[SVC]" *.* ' return anything? - linux

Why does grep -w -e "[SVC]" *.* (with upper-case characters) return nothing, while
grep -w -e "[svc]" *.* (with lower-case characters) returns the expected result?

Well, the first thing you have to ascertain is whether there are actually any words in the file made up of a single uppercase letter drawn from the set {S, V, C}.
The -w flag restricts matches to words of that form, so it would, for example, find V.x. It would not find ValidParentheses. The details can be found in the manpage:
The -w option selects only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character.
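A quick way to see this rule in action is to run both patterns against a few made-up lines. The sample lines below are hypothetical and are not taken from the asker's files; this is only a sketch of the behaviour described in the manpage.

```shell
# Hypothetical sample lines, not taken from the question's files.
sample=$(printf '%s\n' 'ValidParentheses' 'V.x = 1' 'a c b')

# Upper case: only the line where V stands alone as a word matches.
upper=$(printf '%s\n' "$sample" | grep -w -e "[SVC]")
echo "$upper"

# Lower case: only the line with a standalone c matches.
lower=$(printf '%s\n' "$sample" | grep -w -e "[svc]")
echo "$lower"
```

ValidParentheses is skipped both times because every S, V, C, s, v or c in it is attached to other letters.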

Related

grep obtains pattern from a file but printing not only the whole match word

I've got file.txt to extract lines containing the exact words listed in check.txt file.
# file.txt
CA1C 2637 green
CA1C-S1 2561 green
CA1C-S2 2371 green
# check.txt
CA1C
I tried
grep -wFf check.txt file.txt
but I'm not getting the desired output, i.e. all the three lines were printed.
Instead, I'd like to get only the first line,
CA1C 2637 green
I searched and found this post being relevant, it's easy to do it when doing only one word matching. But how can I improve my code to let grep obtain patterns from check.txt file and print only the whole word matched lines?
A lot of thanks!
The man page for grep says the following about the -w switch:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
In your case, all three lines start with "CA1C", which meets the conditions of being at the beginning of the line and being followed by a non-word constituent character (a space in the first line, a hyphen in the other two).
I would do this with a loop, reading lines manually from check.txt:
while IFS= read -r line; do grep "^$line " file.txt; done < check.txt
CA1C 2637 green
This loop reads the lines from check.txt, and searches for each one at the start of a line in file.txt, with a following space.
There may be a better way to do this, but I couldn't get -f to actually consider whitespace at the end of a line of the input file.
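One possible alternative to the loop is to compare whole fields with awk instead of patterns with grep. This is only a sketch, and it assumes (as in the sample data) that the word to match is always the first whitespace-separated field of file.txt; the scratch directory and the variable names are made up for the example.

```shell
# Recreate the two files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'CA1C 2637 green' 'CA1C-S1 2561 green' 'CA1C-S2 2371 green' > "$tmpdir/file.txt"
printf '%s\n' 'CA1C' > "$tmpdir/check.txt"

# First file (NR==FNR): remember every word from check.txt as an array key.
# Second file: print a line only if its first field is one of those words.
matched=$(awk 'NR==FNR { want[$1]; next } $1 in want' "$tmpdir/check.txt" "$tmpdir/file.txt")
echo "$matched"

rm -r "$tmpdir"
```

Because the comparison is on the whole field, CA1C-S1 and CA1C-S2 are not confused with CA1C, and the words in check.txt are treated as plain strings rather than regular expressions.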

Removing number of dots with grep using regular expression

How can I remove lines that contain more or fewer than 5 dots (simply put: keep only lines with exactly 5 dots)?
How can I write a regex that will detect it in bash using grep?
INPUT:
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EXPECTED OUTPUT:
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
Tried:
grep -P '[.]{5}' stuff.txt
grep -P '[\.]{5}' stuff.txt
grep -P '([\.]{5})' stuff.txt
grep -P '\.{5}' stuff.txt
grep -E '([\.]{5}' stuff.txt
You can display only the lines that contain exactly 5 dots as follows:
grep '^[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*$' stuff.txt
or, if you want to factor it:
grep -E '^([^.]*\.){5}[^.]*$' stuff.txt
Using -E (ERE) in the second one avoids having to escape the group and the quantifier as \(\) and \{\}; in the first one, grep's default BRE regex flavour is sufficient.
^ and $ are anchors representing respectively the start and end of the line that make sure we match the whole line and not just a part of it that contains 5 dots.
[^.] is a negated character class that will match anything but a dot.
They are quantified with * so that any number of non-dot characters can happen between each dot (you might want to change that to + if consecutive dots shouldn't be matched).
\. matches a literal dot (rather than any character, which the meta-character . outside of a character class would).
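As a sanity check, the factored pattern can be run against the two sample lines from the question. The sketch below also cross-checks it with awk, on the reasoning that a line with exactly 5 dots splits into 6 dot-separated fields; the scratch directory and variable names are assumptions made for the example.

```shell
# Recreate the sample input from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json' \
  'yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json' \
  > "$tmpdir/stuff.txt"

# Keep only lines with exactly 5 dots (the ERE from the answer).
kept=$(grep -E '^([^.]*\.){5}[^.]*$' "$tmpdir/stuff.txt")
echo "$kept"

# Cross-check: 5 dots means 6 fields when splitting on a literal dot.
by_awk=$(awk -F. 'NF == 6' "$tmpdir/stuff.txt")

rm -r "$tmpdir"
```

Only the second line survives: its IP address contributes 3 dots and the URL contributes 2.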
To detect the bad IP address specifically:
Can you be certain that the IP address is always surrounded by commas and does not contain spaces - i.e. is never the first or last field?
Then, you might get away with:
grep -E ',\w+((\.\w+){1,2}|(\.\w+){4,}),' stuff.txt
If not, it is quite difficult to distinguish between a broken IP form with spaces and an ordinary sentence, so you might have to specify the column.
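If the column is known, one way to specify it is to let awk split on commas and test only that field. The sketch below assumes the address is always the second comma-separated field, as in the sample lines, and it only checks the shape (four dot-separated groups of digits), not whether each number is in the 0-255 range.

```shell
# The two sample lines from the question.
sample=$(printf '%s\n' \
  'yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json' \
  'yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json')

# Print lines whose second field is NOT four dot-separated groups of digits.
bad=$(printf '%s\n' "$sample" |
  awk -F, '$2 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/')
echo "$bad"
```

Dropping the ! from !~ would print the well-formed lines instead.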
Using a Perl one-liner to print only the lines where the number of "." exceeds 5:
> cat five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
> perl -ne '{ while(/\./g){$count++} print if $count > 5; $count=0 } ' five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json

Why does "grep -w" match strings ending with "." or "$"? [duplicate]

I am using Debian 8.4 in VirtualBox, and let's say I have a text file named sample.txt containing:
Linux.
Linux$
Then I ran the command grep -w Linux sample.txt and the output was
Linux.
Linux$
So I was wondering why it matches those lines, since I specified the -w option, which is supposed to match the exact string only?
Both $ and . are non-word constituent characters, so -w matches Linux in both lines, nothing else.
man grep states that:
-w, --word-regexp
Select only those lines containing matches that form whole words. The
test is that the matching substring must either be at the beginning of
the line, or preceded by a non-word constituent character. Similarly,
it must be either at the end of the line or followed by a non-word
constituent character. Word-constituent characters are letters,
digits, and the underscore. This option has no effect if -x is also
specified.
This means that Linux will be matched in all cases where this text is surrounded by anything but letters, digits and the underscore.
To see exactly what grep is matching, use -o to print only the matched part:
$ echo "Linux.
Linux$" | grep -wo Linux
Linux
Linux
So it is just Linux that gets matched.
Option -w has the semantics of matching "whole words". A word boundary occurs wherever a word-constituent character (a letter, digit, or underscore) meets a non-word character such as a symbol or punctuation mark, so x$ contains a word boundary between its two characters, and so does "x.".
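The difference between word characters and everything else is easy to check with a few extra lines. The sample below is hypothetical and extends the two lines from the question; it also shows -x, which matches whole lines and is closer to "the exact string only".

```shell
# Hypothetical sample: the bare word, plus the word followed by
# punctuation, a symbol, an underscore, and a digit.
sample=$(printf '%s\n' 'Linux' 'Linux.' 'Linux$' 'Linux_' 'Linux2')

# -w: the match may touch punctuation or symbols, but not word characters.
words=$(printf '%s\n' "$sample" | grep -w Linux)
echo "$words"

# -x: the pattern must match the entire line.
exact=$(printf '%s\n' "$sample" | grep -x Linux)
echo "$exact"
```

Linux_ and Linux2 are rejected by -w because the underscore and the digit are word constituents, while only the bare Linux line passes -x.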

How to grep for the exact word if the string has got dot in it

I was trying to search for a particular word, BML.I, in the current directory.
When I tried with the below command:
grep -l "BML.I" *
It displays every file that contains the word BML.
Is it possible to grep for the exact match BML.I?
You need to escape the . (period) since by default it matches against any character, and specify -w to match a specific word e.g.
grep -w -l "BML\.I" *
Note there are two levels of escaping in the above. The quotes ensure that the shell passes BML\.I to grep. The \ then escapes the period for grep. If you omit the quotes, then the shell interprets the \ as an escape for the period (and would simply pass the unescaped period to grep)
try grep -wF
from man page:
-w, --word-regexp
Select only those lines containing matches that form whole words. The
test is that the matching substring must either be at the beginning of
the line, or preceded by a non-word constituent character. Similarly, it
must be either at the end of the line or followed by a non-word
constituent character. Word-constituent characters are letters, digits,
and the underscore.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any
of which is to be matched. (-F is specified by POSIX.)
I use fgrep, which is the same as grep -F
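The effect of the escaped dot and of -F can be compared side by side. The two files below are made up for the illustration (a.txt contains BML.I, b.txt contains BMLXI); this is a sketch, not the asker's data.

```shell
# Two hypothetical files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'call BML.I here' > "$tmpdir/a.txt"
printf '%s\n' 'call BMLXI here' > "$tmpdir/b.txt"

# Unescaped dot: matches any character, so both files are listed.
loose=$(cd "$tmpdir" && grep -l "BML.I" *)

# Escaped dot with -w: only the literal BML.I as a whole word.
strict=$(cd "$tmpdir" && grep -w -l "BML\.I" *)

# -F treats the pattern as a fixed string, so no escaping is needed.
fixed=$(cd "$tmpdir" && grep -wF -l "BML.I" *)

echo "$loose"; echo "$strict"; echo "$fixed"
rm -r "$tmpdir"
```

The first command lists both files; the second and third list only a.txt.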
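Use this command (it matches file names in the current directory rather than file contents; the dot is escaped so that it matches literally):
ls | grep -x "BML\.I"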

grep is not working as expected

I have tried to grep in a file.
The file has 5 entries:
vivek
vivek.a
a.vivek
vivek_a
a_vivek
When I run grep -iw vivek filename, it should give me
vivek only, but it gives
vivek
vivek.a
a.vivek
Looks fine to me. . is a non-word character. If you meant something else then you should have used a more-specific regex instead of using -w.
It does that because the definition of a word (which is what the -w option uses) permits . to separate words, though _ is considered part of a word. This definition is useful for programming languages, but not so useful for English text.
A sequence of letters, digits and underscores is considered a word, so any character outside that set marks a word boundary. Therefore, in the line "vivek.a", the dot marks the end of a word, and the characters before it form the word "vivek", which matches the word you are searching for with -w.
So, one way is to define your own word boundaries like this:
$ grep -i -e "[[:space:]]vivek[[:space:]]" -e "^vivek[[:space:]]" -e "[[:space:]]vivek$" -e "^vivek$" file
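The four -e patterns can also be folded into a single extended regular expression. The sketch below runs it against the five entries from the question, next to the plain -w version for comparison; the variable names are made up for the example.

```shell
# The five entries from the question.
sample=$(printf '%s\n' 'vivek' 'vivek.a' 'a.vivek' 'vivek_a' 'a_vivek')

# Plain -w: the dot counts as a word boundary, so three lines match.
with_w=$(printf '%s\n' "$sample" | grep -iw vivek)
echo "$with_w"

# Custom boundaries: only line start/end or whitespace may surround the word.
strict=$(printf '%s\n' "$sample" | grep -iE '(^|[[:space:]])vivek([[:space:]]|$)')
echo "$strict"
```

With whitespace as the only allowed delimiter, vivek.a and a.vivek no longer match.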
