Problem with using grep to match the whole word - linux

I am trying to match a whole string in a list of new line separated strings. Here is my example:
[hemanth.a#gateway ~]$ echo $snapshottableDirs
/user/hemanth.a/dummy1 /user/hemanth.a/dummy3
[hemanth.a#gateway ~]$ echo $snapshottableDirs | tr -s ' ' '\n'
/user/hemanth.a/dummy1
/user/hemanth.a/dummy3
[hemanth.a#gateway ~]$ echo $snapshottableDirs | tr -s ' ' '\n' | grep -w '/user/hemanth.a'
/user/hemanth.a/dummy1
/user/hemanth.a/dummy3
My aim is to only find a match if and only if the string /user/hemanth.a exists as a whole word(in a new line) in the list of strings. But the above command is also returning strings that contain /user/hemanth.a.
This is a sample scenario. There is no guarantee that all the strings that I would want to match will be in the form of /user/xxxxxx.x. Ideally I would want to match the exact string if it exists in a new line as a whole word in the list.
Any help would be appreciated. thank you.

Update: Using fgrep -x '/user/hemanth.a' is probably a better solution here, as it avoids having to escape characters such as $ to prevent grep from interpreting them as meta-characters. fgrep performs a literal string match as opposed to a regular expression match, and the -x option tells it to only match whole lines.
Example:
> cat testfile.txt
foo
foobar
barfoo
barfoobaz
> fgrep foo testfile.txt
foo
foobar
barfoo
barfoobaz
> fgrep -x foo testfile.txt
foo
Original answer:
Try adding the $ regex metacharacter to the end of your grep expression, as in:
echo $snapshottableDirs | tr -s ' ' '\n' | grep -w '/user/hemanth.a$'.
The $ metacharacter matches the end of the line.
While you're at it, you might also want to use the ^ metacharacter, which matches the beginning of the line, so that grep '/user/hemanth.a$' doesn't accidentally also match something like /user/foo/user/hemanth.a.
So you'd have this:
echo $snapshottableDirs | tr -s ' ' '\n' | grep '^/user/hemanth\.a$'.
Edit: You probably don't actually want the -w here, so I've removed that from my answer.
Edit 2: #U. Windl brings up a good point. The . character in a regular expression is a metacharacter that matches any character, so grep /user/hemanth.a might end up matching things you're not expecting, such as /user/hemanthxa, etc. Or perhaps more likely, it would also match the line /user/hemanth/a. To fix that, you need to escape the . character. I've updated the grep line above to reflect this.
Update: In response to your question in the comments about how to escape a string so that it can be used in a grep regular expression...
Yes, you can escape a string so that it should be able to be used in a regular expression. I'll explain how to do so, but first I should say that attempting to escape strings for use in a regex can become very complicated with lots of weird edge cases. For example, an escaped string that works with grep won't necessarily work with sed, awk, perl, bash's =~ operator, or even grep -e.
On top of that, if you change from single quotes to double quotes, you might then have to add another level of escaping so that bash will expand your string properly.
For example, if you wanted to search for the literal string 'foo [bar]* baz$'using grep, you'd have to escape the [, *, and $ characters, resulting in the regular expression:
'foo \[bar]\* baz\$'
But if for some reason you decided to pass that expression to grep as a double-quoted string, you would then have to escape the escapes. Otherwise, bash would interpret some of them as escapes. You can see this if you do:
echo "foo \[bar]\* baz\$"
foo \[bar]\* baz$
You can see that bash interpreted \$ as an escape sequence representing the character $, and thus swallowed the \ character. This is because normally, in double quoted strings $ is a special character that begins a parameter expansion. But it left \[ and \* alone because [ and * aren't special inside a double-quoted string, so it interpreted the backslashes as literal \ characters. To get this expression to work as an argument to grep in a double-quoted string, then, you would have to escape the last backslash:
# This command prints nothing, because bash expands `\$` to just `$`,
# which grep then interprets as an end-of-line anchor.
> echo 'foo [bar]* baz$' | grep "foo \[bar]\* baz\$"
# Escaping the last backslash causes bash to expand `\\$` to `\$`,
# which grep then interprets as matching a literal $ character
> echo 'foo [bar]* baz$' | grep "foo \[bar]\* baz\\$"
foo [bar]* baz$
But note that "foo \[bar]\* baz \\$" will not work with sed, because sed uses a different regex syntax in which escaping a [ causes it to become a meta-character, whereas in grep you have to escape it to prevent it from being interpreted as a meta-character.
So again, yes, you can escape a literal string for use as a grep regular expression. But if you need to match literal strings containing characters that will need to be escaped, it turns out there's a better way: fgrep.
The fgrep command is really just shorthand for grep -F, where the -F tells grep to match "fixed strings" instead of regular expression. For example:
> echo '[(*\^]$' | fgrep '[(*\^]$'
[(*\^]$
This works because fgrep doesn't know or care about regular expressions. It's just looking for the exact literal string '[(*\^]$'. However, this sort of puts you back at square one, because fgrep will match on substrings:
> echo '/users/hemanth/dummy' | fgrep '/users/hemanth'
/users/hemanth/dummy
Thankfully, there's a way around this, which it turns out was probably a better approach than my initial answer, considering your specific needs. The -x option to fgrep tells it to only match the entire line. Note that -x is not specific to fgrep (since fgrep is really just grep -F anyway). For example:
> echo '/users/hemanth/dummy' | fgrep -x '/users/hemanth' # prints nothing
This is equivalent to what you would have gotten by escaping the grep regex, and is almost certainly a better answer than my previous answer of enclosing your regex in ^ and $.
Now, as promised, just in case you want to go this route, here's how you would escape a fixed string to use as a grep regex:
# Suppose we want to match the literal string '^foo.\ [bar]* baz$'
# It contains lots of stuff that grep would normally interpret as
# regular expression meta-characters. We need to escape those characters
# so grep will interpret them as literals.
> str='^foo.\ [bar]* baz$'
> echo "$str"
^foo.\ [bar]* baz$
> regex=$(sed -E 's,[.*^$\\[],\\&' <<< "$str")
> echo "$regex"
\^foo\.\\ \[bar]\* baz\$
> echo "$str" | grep "$regex"
^foo.\ [bar]* baz$
# Success
Again, for the reasons cited above, I don't recommend this approach, especially not when fgrep -x exists.

Read "Anchoring" in man grep:
Anchoring
The caret ^ and the dollar sign $ are meta-characters that respectively
match the empty string at the beginning and end of a line.
Also be aware that . matches any character (from said manual page):
The period . matches any single character.

Related

Grep words with spaces [duplicate]

This question already has answers here:
grep with regexp: whitespace doesn't match unless I add an assertion
(2 answers)
Closed 3 years ago.
I have a text file that contains quotes, comma and spaces.
"'x','a b c'"
"'x','a b c','1','2 3'"
"'x','a b c','22'"
"'x','a b z'"
"'x','s d 2'"
However, when I try using grep to pull the exact match, it doesn't display the results. Below is the command I'm trying to use.
grep -E "\"\'x\'\,\'a\s\+b\s\+c\'\"" test.txt
Expected output: "'x','a b c'"
Am I missing anything? Any help would be really appreciated.
You were close! Couple of notes:
Don't use \s. It is a gnu extension, not available everywhere. It's better to use character classes [[:space:]], or really just match a space.
The \+ may be misleading - in -E mode, it matches a literal +, while without -E the \+ matches one or more preceding characters. The escaping depends on the mode you are using.
You don't need to escape everything! When in " doublequotes, escape doublequotes "\"", don't escape singlequotes and commas in doublequotes, "\'\," is interpreted as just "',".
If you meant only to match spaces with grep -E:
grep -E "\"'x','a +b +c'\""
This is simple enough without -E, just \+ instead of +:
grep "\"'x','a \+b \+c'\""
I like to put things in front of + inside braces, helps me read:
grep "\"'x','a[ ]\+b[ ]\+c'\""
grep -E "\"'x','a[ ]+b[ ]+c'\""
If you want to match spaces and tabs between a and b, you can insert a literal tab character inside [ ] with $'\t':
grep "\"'x','a[ "$'\t'"]\+b[ "$'\t'"]\+c'\""
grep -E "\"'x','a[ "$'\t'"]+b[ "$'\t'"]+c'\""
But with grep -P that would just become:
grep -P "\"'x','a[ \t]+b[ \t]+c'\""
But the best is to forget about \s and use character classes [[:space:]]:
grep "\"'x','a[[:space:]]\+b[[:space:]]\+c'\""
grep -E "\"'x','a[[:space:]]+b[[:space:]]+c'\""

Linux Bash. Delete line if field exactly matches

I have something like this in a file named file.txt
AA.201610.pancake.Paul
AA.201610.hello.Robert
A.201610.hello.Mark
Now, i ONLY get the first three fields in 3 variables like:
field1="A"
field2="201610"
field3='hello'.
I'd like to remove a line, if it contains exactly the first 3 fields, like , in the case described above, i want only the third line to be removed from the file.txt . Is there a way to do that? And is there a way to do that in the same file?
I tried with:
sed -i /$field1"."$field2"."$field3"."/Id file.txt
but of course this removes both the second and the third line
I suggest using awk for this as sed can only do regex search and that requires escaping all special meta-chars and anchors, word boundaries etc to avoid false matches.
Suggested awk with non-regex matching:
awk -F '[.]' -v f1="$field1" -v f2="$field2" -v f3="$field3" '
!($1==f1 && $2==f2 && $3==f3)' file
AA.201610.pancake.Paul
AA.201610.hello.Robert
Use ^ to anchor the pattern at the beginning of the line. Also note that . in a regex means "any character" and not a literal peridio. You have to escape it: either \. (be careful with shell escaping and the difference between single and double quotes) or [.]
Sed cannot do string matches, only regexp matches which becomes horrendously complicated to work around when you simply want to match a literal string (see Is it possible to escape regex metacharacters reliably with sed). Just use awk:
$ awk -v str="${field1}.${field2}.${field3}." 'index($0,str)!=1' file
AA.201610.pancake.Paul
AA.201610.hello.Robert
The question was about bash so in bash:
#!/usr/bin/env bash
field1="A"
field2="201610"
field3='hello'
IFS=
while read -r i
do
case "$i" in
"${field1}.${field2}.${field3}."*) ;;
*) echo -E "$i"
esac
done < file.txt

how to replace a special characters by character using shell

I have a string variable x=tmp/variable/custom-sqr-sample/test/example
in the script, what I want to do is to replace all the “-” with the /,
after that,I should get the following string
x=tmp/variable/custom/sqr/sample/test/example
Can anyone help me?
I tried the following syntax
it didnot work
exa=tmp/variable/custom-sqr-sample/test/example
exa=$(echo $exa|sed 's/-///g')
sed basically supports any delimiter, which comes in handy when one tries to match a /, most common are |, # and #, pick one that's not in the string you need to work on.
$ echo $x
tmp/variable/custom-sqr-sample/test/example
$ sed 's#-#/#g' <<< $x
tmp/variable/custom/sqr/sample/test/example
In the commend you tried above, all you need is to escape the slash, i.e.
echo $exa | sed 's/-/\//g'
but choosing a different delimiter is nicer.
The tr tool may be a better choice than sed in this case:
x=tmp/variable/custom-sqr-sample/test/example
echo "$x" | tr -- - /
(The -- isn't strictly necessary, but keeps tr (and humans) from mistaking - for an option.)
In bash, you can use parameter substitution:
$ exa=tmp/variable/custom-sqr-sample/test/example
$ exa=${exa//-/\/}
$ echo $exa
tmp/variable/custom/sqr/sample/test/example

What is wrong with 'echo 'a\\b' | grep "\\"'

When I am running this
echo 'a\\b' | grep '\\'
it correctly identifies the backslashes
but when I used the double quotes
echo 'a\\b' | grep "\\"
It is not running and it is returning a trailing backlash error, I can't figure out why that happens. It should work exactly similar to single quotes as I have escaped the backslashes.
When using double quotes the \\ gets evaluated into \. Single quotes leaves things as-is. Do an echo "\\" to see what I mean.
There are multiple places where backslash escapes can be processed:
In your shell: This happens when you use double quotes and not single quotes. Note that this is a very limited processing and escapes like "\n" will not work. This means when you do echo "a\\b" echo will get a\b as first argument while when you're executing echo 'a\\b' echo will get a\\b as first argument.
In the program you're calling: grep will parse the input as POSIX regular expression which has its own set of escapes. echo may or may not handle escape codes by default. My /bin/echo will not process escape codes by default, my bash bulitin echo also won't but my zsh builtin echo will. If a certain behaviour is wanted this can be specified by -e to enable escapes or by -E to disable escapes.
So if you're calling grep "\\" the shell will process the \\ escape sequence and pass \ to grep which will try to parse that as a regular expression and correclty complains about a trailing backslash.
So when you use double quotes you have to double escape everything resulting in the rather clumsy echo 'a\\b' | grep "\\\\".
To get the least amount of escaping you can use echo 'a\\b' | grep '\\' or even echo 'a\\b' | grep -F '\', where the -F flag tells grep to interpret the pattern as verbatim string and not as a regular expression. If you happen to have an echo that processes escapes by default (this will result in echo 'a\\b' printing a\b) you also need to add -E to the echo command.
grep does a bit of basic regular expression matching.
When in double quotes, the shell parses \\, so \\ will resolve to a single backslash as a parameter.
Additionally, grep does a bit of basic regular expression matching, so to match a single backslash you need to pass two backslashes to grep.
So by calling grep "\\" you actually get grep \, which does not parse as a regular expression, therefore makes grep fail.
To correctly match only double backslashes, do grep -F '\\' (which means fixed string matching instead of regex), grep '\\\\' or grep "\\\\\\\\".

Complex shell wildcard

I want to use echo to display(not content) directories that start with atleast 2 characters but can't begin with "an"
For example if had the following in the directory:
a as an23 an23 blue
I would only get
as blue back
I tried echo ^an* but that returns the directory with 1 charcter too.
Is there any way i can do this in the form of echo globalpattern
You can use the shells extended globbing feature, in bash:
bash$ setsh -s extglob
bash$ echo !(#(?|an*))
The !() construct inverts its internal expression, see this for more.
In zsh:
zsh$ setopt extendedglob
zsh$ print *~(?|an*)
In this case the ~ negates the pattern before the tilde. See the manual for more.
Since you want at least two characters in the names, you can use printf '%s\n' ??* to echo each such name on a separate line. You can then eliminate those names that start with an with grep -v '^an', leading to:
printf '%s\n' ??* | grep -v '^an'
The quotes aren't strictly necessary in the grep command with modern shells. Once upon a quarter of a century or so ago, the Bourne shell had ^ as a synonym for | so I still use quotes around carets.
If you absolutely must use echo instead of printf, then you'll have to map white space to newlines (assuming you don't have any names that contain white space).
I'm trying with just the echo command, no grep either?
What about:
echo [!a]?* a[!n]*
The first term lists all the two-plus character names not beginning with a; the second lists all the two-plus character names where the first is a and the second is not n.
This should do it, but you'd likely be better off with ls or even find:
echo * | tr ' ' '\012' | egrep '..' | egrep -v '^an'
Shell globbing is a form of regex, but it's not as powerful as egrep regex's.

Resources