Why does "grep -w" match strings ending with "." or "$"?

I am using Debian 8.4 in a VirtualBox VM. Say I have a text file named sample.txt containing:
Linux.
Linux$
Then I ran the command grep -w Linux sample.txt and the output was
Linux.
Linux$
So I was wondering why it matches those lines, since I specified the -w option, which is supposed to match the exact string only?

Both $ and . are non-word constituent characters, so -w matches Linux in both lines, nothing else.
man grep states that:
-w, --word-regexp
Select only those lines containing matches that form whole words. The
test is that the matching substring must either be at the beginning of
the line, or preceded by a non-word constituent character. Similarly,
it must be either at the end of the line or followed by a non-word
constituent character. Word-constituent characters are letters,
digits, and the underscore. This option has no effect if -x is also
specified.
This means that Linux will be matched in all cases where this text is surrounded by anything but letters, digits and the underscore.
To see exactly what grep is matching, use -o to print only the matched part:
$ echo "Linux.
Linux$" | grep -wo Linux
Linux
Linux
So only Linux itself is matched.
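The rule quoted from the man page can be checked with a small experiment. This is a minimal sketch: the temporary file and the three extra lines (Linux_, Linux2, MyLinux) are made up for illustration and are not part of the original question.

```shell
# Hypothetical sample data: the two lines from the question plus three extra cases.
sample=$(mktemp)
printf '%s\n' 'Linux.' 'Linux$' 'Linux_' 'Linux2' 'MyLinux' > "$sample"

# -w accepts a match only when it is not touching a letter, digit, or underscore.
whole_words=$(grep -w 'Linux' "$sample")
printf '%s\n' "$whole_words"

rm -f "$sample"
```

Only the lines where Linux is followed by "." or "$" should be printed; the underscore, the digit, and the leading "My" all count as word-constituent characters and block the match.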

Option -w has the semantics of matching "whole words". A word boundary occurs wherever a word-constituent character (letter, digit, or underscore) meets a non-word character such as a symbol or punctuation mark. So x$ contains a word boundary between its two characters, and so does x followed by a period.

Related

Output the names of all files from file.txt, having the .conf extension

I need to output from a file file.txt the names of all files with the .conf extension.
grep .conf file.txt
But the output also includes a file called dconf and a file with the .config extension. How can I list only the .conf files, without these two?
The '.' has a special meaning in a regular expression: it matches any single character. To match only a literal dot, escape it with a backslash:
grep "\.conf" file.txt
The backslash must in turn be protected from the shell, which is why the pattern is quoted.
To experiment with regular expressions, you can try an online regex tester.
Addendum:
From the comments: how do I exclude a file in the list that is named xyz.config?
Answer: You have to tell grep that the regular expression ends at the end of the word with:
grep "\.conf\>" file.txt
TL;DR: you should instead do:
grep "\.conf\>" file.txt
grep uses regular expressions. The . character in a regex is a metacharacter that means "match any one character." So your command means "match any string which contains one character followed by c o n f, in that order."
So your regular expression will match what you are looking for, but it will also match strings that have things after your match (your .config example) as well as any character followed by "conf" (your dconf example).
So instead you want to tell grep that you are looking for a literal "." by escaping that character in your regular expression with a preceding backslash (\), and you want to describe what the end of your string looks like, which may be a newline or simply a space.
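The difference between the three patterns can be seen side by side. This is a sketch under assumptions: the file list below is invented (including app.conf.bak, which is not in the question), \> is a GNU grep extension, and the $ variant assumes one file name per line.

```shell
# Hypothetical list of file names, one per line.
list=$(mktemp)
printf '%s\n' 'app.conf' 'dconf' 'settings.config' 'app.conf.bak' 'nginx.conf' > "$list"

# Unescaped dot: "any character" followed by conf, so every line matches.
loose=$(grep '.conf' "$list")

# Escaped dot plus end-of-word: drops dconf and settings.config,
# but still accepts app.conf.bak because "." ends the word "conf".
word_end=$(grep '\.conf\>' "$list")

# Escaped dot plus end-of-line: only names that actually end in .conf.
strict=$(grep '\.conf$' "$list")

printf 'loose:\n%s\n\nword_end:\n%s\n\nstrict:\n%s\n' "$loose" "$word_end" "$strict"
rm -f "$list"
```

If each line holds exactly one file name, anchoring with $ is probably the closer fit to "files with the .conf extension"; \> is more useful when the name is embedded in a longer line.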

why ' grep -w -e "[SVC]" *.* ' doesn't return anything?

Why does grep -w -e "[SVC]" *.* (with uppercase characters) return nothing, while with lowercase characters
grep -w -e "[svc]" *.* returns the result as expected?
Well, the first thing you have to ascertain is whether the files actually contain any words made up of a single uppercase letter drawn from the set {S, V, C}.
The -w flag restricts matches to words of that form, so it would, for example, find the S in S.x. It would not find ValidParentheses. The details can be found in the manpage:
The -w option selects only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character.
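A quick way to test this is to run both patterns against a tiny file. The three lines below are hypothetical stand-ins for the asker's source files, which are not shown in the question.

```shell
# Hypothetical file contents chosen to exercise both bracket expressions.
src=$(mktemp)
printf '%s\n' 'S.x = 1' 'ValidParentheses' 'int c = 0' > "$src"

# Uppercase set: only a standalone S, V, or C counts as a whole word.
upper=$(grep -w -e '[SVC]' "$src")

# Lowercase set: only a standalone s, v, or c counts as a whole word.
lower=$(grep -w -e '[svc]' "$src")

printf 'upper: %s\nlower: %s\n' "$upper" "$lower"
rm -f "$src"
```

The V in ValidParentheses is followed by a letter, so it never forms a whole word on its own. If the real files contain no isolated uppercase S, V, or C, the uppercase command printing nothing is the expected result.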

grep obtains pattern from a file but printing not only the whole match word

I have file.txt and want to extract the lines containing the exact words listed in check.txt.
# file.txt
CA1C 2637 green
CA1C-S1 2561 green
CA1C-S2 2371 green
# check.txt
CA1C
I tried
grep -wFf check.txt file.txt
but I'm not getting the desired output: all three lines were printed.
Instead, I'd like to get only the first line,
CA1C 2637 green
I searched and found this post to be relevant; it's easy to do when matching only one word. But how can I improve my code so that grep takes its patterns from check.txt and prints only the lines where the whole word matches?
A lot of thanks!
The man page for grep says the following about the -w switch:
-w, --word-regexp
Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
In your case, all three lines start with "CA1C", which is at the beginning of the line and is followed by a non-word constituent character (a space in the first line, a hyphen in the other two), so all three lines pass the -w test.
I would do this with a loop, reading lines manually from check.txt:
while IFS= read -r line; do grep -- "^$line " file.txt; done < check.txt
CA1C 2637 green
This loop reads the lines from check.txt, and searches for each one at the start of a line in file.txt, with a following space.
There may be a better way to do this, but I couldn't get -f to actually consider whitespace at the end of a line of the input file.
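One possible alternative is to compare the first field exactly instead of relying on word boundaries. This is a sketch, not the answerer's method: it swaps grep for awk and assumes the key is always the first whitespace-separated field of file.txt and that check.txt is not empty.

```shell
# Recreate the two files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'CA1C 2637 green' 'CA1C-S1 2561 green' 'CA1C-S2 2371 green' > "$workdir/file.txt"
printf '%s\n' 'CA1C' > "$workdir/check.txt"

# First file (NR == FNR): remember each key from check.txt.
# Second file: print lines whose first field is exactly one of those keys.
matched=$(awk 'NR == FNR { want[$1]; next } $1 in want' \
    "$workdir/check.txt" "$workdir/file.txt")

printf '%s\n' "$matched"
rm -rf "$workdir"
```

Because the comparison is a plain string lookup, the keys are never interpreted as regular expressions, and check.txt is read only once rather than spawning one grep per key.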

Do bash script and command grep treat single quote differently

In Advanced Bash-Scripting Guide, I find
Within single quotes, every special character except ' gets
interpreted literally.
So I thought grep '\<the\>' file.txt would search for the literal text \<the\> instead of the word the. But it actually searches for the word the.
#!/bin/bash
grep '\<the\>' file.txt
Added
Maybe I didn't describe my question clearly. The bash man page says:
Enclosing characters in single quotes preserves the literal value of each character within the quotes.
So my question is: given that bash treats the characters enclosed in single quotes as literal values, why is '\<the\>' treated as the word the by grep? Is this a characteristic of grep itself, separate from bash?
Indeed, bash will pass your string literally.
It is grep that interprets the string (as a regular expression). If you want to avoid that, use grep -F. With that option, grep will search literally for the given string.
If you do want to match the literal text \<the\> with a regular expression, escape each backslash with another backslash, because the symbols \< and \> are special to grep. Quoting man grep:
The Backslash Character and Special Expressions
The symbols \< and \> respectively match the empty string at the beginning and end
of a word.
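The two layers (shell quoting, then grep's own regex parsing) can be separated with a short experiment. The file contents are hypothetical, and the behavior of \< and \> assumes GNU grep, as in the man page quoted above.

```shell
# Hypothetical test file; %s keeps printf from touching the backslashes.
file=$(mktemp)
printf '%s\n' 'the cat' 'other' 'literal \<the\> text' > "$file"

# The shell hands \<the\> to grep unchanged; grep reads it as "the word the".
as_regex=$(grep '\<the\>' "$file")

# -F turns off regex parsing, so the backslashes and brackets must appear verbatim.
as_fixed=$(grep -F '\<the\>' "$file")

# Doubling each backslash makes the regex itself match a literal backslash.
as_escaped=$(grep '\\<the\\>' "$file")

printf 'regex:\n%s\n\nfixed:\n%s\n\nescaped:\n%s\n' "$as_regex" "$as_fixed" "$as_escaped"
rm -f "$file"
```

Note that the regex form also matches the third line, since the "the" inside \<the\> is itself surrounded by non-word characters. The word "other" is skipped because its "the" is preceded by a letter.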

How to grep for the exact word if the string has got dot in it

I was trying to search for a particular word, BML.I, in the current directory.
When I tried the command below:
grep -l "BML.I" *
it listed every file containing the word BML.
Is it possible to grep for the exact match BML.I?
You need to escape the . (period) since by default it matches against any character, and specify -w to match a specific word e.g.
grep -w -l "BML\.I" *
Note there are two levels of escaping in the above. The quotes ensure that the shell passes BML\.I to grep. The \ then escapes the period for grep. If you omit the quotes, then the shell interprets the \ as an escape for the period (and would simply pass the unescaped period to grep)
Try grep -wF. From the man page:
-w, --word-regexp
Select only those lines containing matches that form whole words. The
test is that the matching substring must either be at the beginning of
the line, or preceded by a non-word constituent character. Similarly, it
must be either at the end of the line or followed by a non-word
constituent character. Word-constituent characters are letters, digits,
and the underscore.
-F, --fixed-strings
Interpret PATTERN as a list of fixed strings, separated by newlines, any
of which is to be matched. (-F is specified by POSIX.)
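Combining the two options can be tried on a few throwaway files. This is a minimal sketch: the file names a.txt, b.txt, c.txt and their contents are invented for illustration.

```shell
# Hypothetical directory with one file per test case.
workdir=$(mktemp -d)
printf '%s\n' 'BML.I'  > "$workdir/a.txt"
printf '%s\n' 'BMLXI'  > "$workdir/b.txt"
printf '%s\n' 'BML.IX' > "$workdir/c.txt"

# Regex with -w: the dot matches any character, so BMLXI is also reported.
regex_hits=$(cd "$workdir" && grep -lw 'BML.I' -- *)

# Fixed string with -w: the dot must be a real dot and the word must end after I.
fixed_hits=$(cd "$workdir" && grep -lwF 'BML.I' -- *)

printf 'regex: %s\nfixed: %s\n' "$(echo $regex_hits)" "$fixed_hits"
rm -rf "$workdir"
```

With -F there is nothing to escape, which avoids the two levels of quoting needed for the regex form.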
I use fgrep, which is the same as grep -F (note that recent GNU grep releases warn that fgrep is obsolescent and recommend grep -F instead).
If you are looking for a file whose name is exactly BML.I, rather than searching file contents, use this command:
ls | grep -xF "BML.I"
