How to grep for an exact word when the string contains a dot - linux

I was trying to search for a particular word, BML.I, in the current directory.
When I tried the command below:
grep -l "BML.I" *
it listed every file that contains the word BML.
Is it possible to grep for the exact match BML.I?

You need to escape the . (period), since by default it matches any character, and specify -w to match a whole word, e.g.
grep -w -l "BML\.I" *
Note there are two levels of escaping in the above. The quotes ensure that the shell passes BML\.I to grep. The \ then escapes the period for grep. If you omit the quotes, the shell interprets the \ as an escape for the period and simply passes the unescaped period to grep.
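A small sketch of the difference. The file names and contents are hypothetical and are created in a scratch directory; none of them come from the question.

```shell
# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'code BML.I ok\n' > "$dir/exact.txt"   # contains the literal word BML.I
printf 'code BMLXI ok\n' > "$dir/near.txt"    # an unescaped . matches the X here
printf 'code BML only\n' > "$dir/short.txt"   # contains BML but not BML.I

# Unescaped dot: it matches any character, so near.txt is listed too.
loose=$(cd "$dir" && grep -l "BML.I" *)

# Escaped dot plus -w: only the file with the literal word BML.I.
strict=$(cd "$dir" && grep -w -l "BML\.I" *)

# What the shell hands to a command with and without the quotes.
quoted=$(printf '%s' "BML\.I")
unquoted=$(printf '%s' BML\.I)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
printf 'quoted -> %s, unquoted -> %s\n' "$quoted" "$unquoted"
rm -r "$dir"
```

With the quotes the backslash survives and reaches grep; without them the shell consumes it.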

Try grep -wF.
From the man page:
-w, --word-regexp
Select only those lines containing matches that form whole words. The
test is that the matching substring must either be at the beginning of
the line, or preceded by a non-word constituent character. Similarly, it
must be either at the end of the line or followed by a non-word
constituent character. Word-constituent characters are letters, digits,
and the underscore.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any
of which is to be matched. (-F is specified by POSIX.)
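To see what the two options do together, here is a minimal sketch on made-up input lines (the sample text is an assumption, not from the question):

```shell
# Hypothetical input; only the first line contains BML.I as a whole word.
sample='code BML.I ok
code BMLXI ok
code xBML.Iy ok'

# -F: the pattern is a fixed string, so the dot needs no escaping.
fixed=$(printf '%s\n' "$sample" | grep -F 'BML.I')

# -w in addition: the match must not touch letters, digits or underscores.
fixed_word=$(printf '%s\n' "$sample" | grep -wF 'BML.I')

printf '%s\n' '-F only:' "$fixed" '-wF:' "$fixed_word"
```

With -F alone the line containing xBML.Iy still matches, because the fixed string occurs inside a longer word; adding -w drops it.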

I use fgrep, which is the same as grep -F (recent GNU grep releases warn that fgrep is obsolescent and suggest grep -F instead).

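Use this command (note that it filters the file names printed by ls, not the contents of the files; -x requires the whole line to match and -F keeps the dot literal):
ls | grep -xF "BML.I"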

Related

Why doesn't grep -w -e "[SVC]" *.* return anything?

Why does grep -w -e "[SVC]" *.* (with upper-case characters) return nothing, while with lower-case characters
grep -w -e "[svc]" *.* returns results as expected?
Well, the first thing you have to ascertain is whether the files actually contain any words made up of a single uppercase letter drawn from the set {S, V, C}.
The -w flag restricts matches to words of that form, so it would, for example, find S.x. It would not find ValidParentheses. The details can be found in the manpage:
The -w option selects only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character.
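A quick way to check this is to run both patterns on a few lines. The lines below are made up for illustration and are not from the asker's files:

```shell
# Hypothetical input lines.
sample='class ValidParentheses
call S.x here
svc restart
see section c'

# Upper case: only a stand-alone S, V or C counts as a whole-word match.
upper=$(printf '%s\n' "$sample" | grep -w -e '[SVC]')

# Lower case: only a stand-alone s, v or c counts.
lower=$(printf '%s\n' "$sample" | grep -w -e '[svc]')

printf 'upper: %s\nlower: %s\n' "$upper" "$lower"
```

Neither ValidParentheses nor svc matches, because every candidate letter there is attached to other word characters. Only the S in S.x and the final c stand alone.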

Removing number of dots with grep using regular expression

How can I remove lines that contain more or fewer than 5 dots (simply put: keep only the lines with exactly 5 dots)?
How can I write a regex that detects this in bash using grep?
INPUT:
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
EXPECTED OUTPUT:
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
Tried:
grep -P '[.]{5}' stuff.txt
grep -P '[\.]{5}' stuff.txt
grep -P '([\.]{5})' stuff.txt
grep -P '\.{5}' stuff.txt
grep -E '([\.]{5}' stuff.txt
You can display only the lines that contain exactly 5 dots as follows:
grep '^[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*\.[^.]*$' stuff.txt
or, if you want to factor it:
grep -E '^([^.]*\.){5}[^.]*$' stuff.txt
Using -E (ERE) in the second one avoids having to escape the group and the quantifier as \(\) and \{\}; in the first one, grep's default BRE regex flavour is sufficient.
^ and $ are anchors that represent the start and the end of the line respectively; they make sure we match the whole line and not just a part of it that contains 5 dots.
[^.] is a negated character class that matches anything but a dot.
It is quantified with * so that any number of non-dot characters can occur between dots (you might want to change that to + if consecutive dots shouldn't be matched).
\. matches a literal dot (rather than any character, which the meta-character . would match outside of a character class).
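As a rough check, the factored pattern can be run against the two input lines from the question. The variable names are only for illustration, and the unanchored variant is added here just to show why the anchors matter:

```shell
# The two input lines from the question (8 dots and 5 dots respectively).
input='yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json'

# Anchored: the whole line must consist of exactly five dot-terminated runs.
exactly_five=$(printf '%s\n' "$input" | grep -E '^([^.]*\.){5}[^.]*$')

# Unanchored: any line with at least five dots matches; -c counts such lines.
at_least_five=$(printf '%s\n' "$input" | grep -cE '([^.]*\.){5}')

printf 'exactly five dots:\n%s\n' "$exactly_five"
printf 'lines with five or more dots: %s\n' "$at_least_five"
```

Without ^ and $ both lines match, since the first line also contains five dots somewhere within it.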
To detect the bad IP address specifically:
Can you be certain that the IP address is always surrounded by commas and contains no spaces, i.e. that it is never the first or last field?
Then you might get away with:
grep -E ',\w+((\.\w+){1,2}|(\.\w+){4,}),' stuff.txt
This matches a comma-delimited field made of fewer or more than four dot-separated parts.
If not, it is quite difficult to distinguish a broken IP address containing spaces from an ordinary sentence, so you might have to specify the column.
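If you do specify the column, one possible sketch uses awk instead of grep. It assumes the address is always the second comma-separated field, which holds for the sample lines but is not stated in the question:

```shell
# The two input lines from the question.
input='yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json'

# Assumption: field 2 holds the address. split() returns the number of
# dot-separated parts; print the line when that number is not 4.
bad=$(printf '%s\n' "$input" | awk -F, 'split($2, part, ".") != 4')

printf 'lines with a malformed address:\n%s\n' "$bad"
```

This only counts the parts; it does not check that each part is a number between 0 and 255.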
Using a Perl one-liner to print a line only if its number of "." exceeds 5:
> cat five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
yGEtfWYBCBKtvxTbHxwK,126.221.42.21,10,Bad stuff is happening,http://mystuff.com/file.json
> perl -ne '{ while(/\./g){$count++} print if $count > 5; $count=0 } ' five_dots.txt
yGEtfWYBCBKtvxTbHxMK,126.221.42.321.0.147.30,10,Bad stuff is happening,http://mystuff.com/file.json
>

Do bash and grep treat single quotes differently?

In the Advanced Bash-Scripting Guide, I find
Within single quotes, every special character except ' gets
interpreted literally.
So I thought grep '\<the\>' file.txt would search for \<the\> instead of the word the. But it does indeed search for the.
#!/bin/bash
grep '\<the\>' file.txt
Added
Maybe I didn't describe my question clearly. The man page says:
Enclosing characters in single quotes preserves the literal value of each character within the quotes.
So my question is: since bash treats characters enclosed in single quotes literally, why is '\<the\>' treated as the by grep? Is this grep's own behaviour, separate from bash?
Indeed, bash will pass your string literally.
It is grep that interprets the string (as a regular expression). If you want to avoid that, use grep -F. With that option, grep will search literally for the given string.
To match the literal text \<the\> with a regular expression, you need to put another backslash in front of each backslash ('\\<the\\>'), as the symbols \< and \> are special to grep. Quoting man grep:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end
of a word.
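The two layers can be seen side by side in a short sketch. The sample text is hypothetical, and \< and \> are assumed to be supported by the installed grep, as they are in GNU grep:

```shell
# Hypothetical input: two ordinary lines and one holding the literal text \<the\>.
sample='over the hill
other things
literal \<the\> here'

# Bash does nothing to the single-quoted string: these 7 characters reach grep.
arg=$(printf '%s' '\<the\>')

# grep then reads them as a regex: the word "the" between word boundaries.
# (The third line matches too, since "the" is a whole word there as well.)
as_regex=$(printf '%s\n' "$sample" | grep '\<the\>')

# -F switches regex interpretation off; only the literal text matches.
as_fixed=$(printf '%s\n' "$sample" | grep -F '\<the\>')

# Doubling each backslash matches the literal text in regex mode.
as_escaped=$(printf '%s\n' "$sample" | grep '\\<the\\>')

printf '%s\n' "$arg" '--- regex:' "$as_regex" '--- fixed:' "$as_fixed" '--- escaped:' "$as_escaped"
```

The line "other things" never matches, because the inside other is not a whole word.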

How to print the longest word in a file by using a combination of grep and wc

I am trying to find the longest word in a text file.
I found the number of characters in the longest word in the file
by using the command
wc -L
I need to print the longest word by using this number and the grep command.
If you must use the two commands given, I'd suggest:
grep -E ".{$(wc -L < test.txt)}" test.txt
The command substitution is used to build the correct brace expression to match the line(s) with exactly the given number of characters. Since wc -L reports the length of the longest line, this finds the longest word only if the file holds one word per line. -E is needed to enable extended regular expression support; otherwise, the braces need to be escaped: grep ".\{...\}" test.txt.
Using an awk command that makes a single pass through the file may be faster.
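One way such a single-pass awk command might look, as a sketch only: it assumes words are separated by whitespace and keeps the first word on a tie. The sample text is made up.

```shell
# Hypothetical input with several words per line.
text='a quick brown fox
jumps over extraordinarily lazy dogs'

# Visit every whitespace-separated field once and remember the longest so far.
longest=$(printf '%s\n' "$text" | awk '
  { for (i = 1; i <= NF; i++) if (length($i) > max) { max = length($i); word = $i } }
  END { print word }')

printf 'longest word: %s\n' "$longest"
```

Unlike the grep and wc combination, this does not need one word per line, and the file is read only once.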

Why does "grep -w" match strings ending with "." or "$"?

I am using Debian 8.4 on VirtualBox, and let's say I have a text file named sample.txt containing:
Linux.
Linux$
Then I ran the command grep -w Linux sample.txt and the output was
Linux.
Linux$
So I was wondering why it matched those lines, since I specified the -w option, which is supposed to match the exact string only?
Both $ and . are non-word constituent characters, so -w matches Linux in both lines, nothing else.
man grep states that:
-w, --word-regexp
Select only those lines containing matches that form whole words. The
test is that the matching substring must either be at the beginning of
the line, or preceded by a non-word constituent character. Similarly,
it must be either at the end of the line or followed by a non-word
constituent character. Word-constituent characters are letters,
digits, and the underscore. This option has no effect if -x is also
specified.
This means that Linux will be matched in all cases where this text is surrounded by anything but letters, digits and the underscore.
To see exactly what grep is matching, use -o to print only the matched part:
$ echo "Linux.
Linux$" | grep -wo Linux
Linux
Linux
So it is just Linux that gets matched.
Option -w has the semantics of matching "whole words". A word boundary is a change of character class, e.g. from a letter to a symbol or a punctuation mark, so x$ contains a word boundary between the two characters, and so does "x.".
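A small sketch that contrasts -w with -x, which is the option that matches the whole line only. The extra lines beyond the two from the question are made up:

```shell
# Linux. and Linux$ come from the question; the other lines are hypothetical.
sample='Linux
Linux.
Linux$
Linux_kernel
Linux2'

# -w: Linux must not be adjacent to a letter, digit or underscore.
whole_word=$(printf '%s\n' "$sample" | grep -w Linux)

# -x: the entire line has to be Linux.
whole_line=$(printf '%s\n' "$sample" | grep -x Linux)

printf '%s\n' '-w:' "$whole_word" '-x:' "$whole_line"
```

The underscore and the digit are word constituents, so Linux_kernel and Linux2 are rejected by -w, while the lines ending in . and $ are accepted.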
