Grep words with spaces [duplicate] - linux

I have a text file that contains quotes, commas, and spaces.
"'x','a b c'"
"'x','a b c','1','2 3'"
"'x','a b c','22'"
"'x','a b z'"
"'x','s d 2'"
However, when I try using grep to pull the exact match, it doesn't display the results. Below is the command I'm trying to use.
grep -E "\"\'x\'\,\'a\s\+b\s\+c\'\"" test.txt
Expected output: "'x','a b c'"
Am I missing anything? Any help would be really appreciated.

You were close! Couple of notes:
Don't use \s. It is a GNU extension, not available everywhere. It's better to use a character class such as [[:space:]], or really just match a space.
The \+ may be misleading: in -E mode it matches a literal +, while without -E the \+ matches one or more of the preceding character. The escaping depends on the mode you are using.
You don't need to escape everything! When in " doublequotes, escape doublequotes "\"", don't escape singlequotes and commas in doublequotes, "\'\," is interpreted as just "',".
If you meant only to match spaces with grep -E:
grep -E "\"'x','a +b +c'\""
This is simple enough without -E, just \+ instead of +:
grep "\"'x','a \+b \+c'\""
I like to put the thing in front of + inside brackets, which helps me read it:
grep "\"'x','a[ ]\+b[ ]\+c'\""
grep -E "\"'x','a[ ]+b[ ]+c'\""
If you want to match spaces and tabs between a and b, you can insert a literal tab character inside [ ] with $'\t':
grep "\"'x','a[ "$'\t'"]\+b[ "$'\t'"]\+c'\""
grep -E "\"'x','a[ "$'\t'"]+b[ "$'\t'"]+c'\""
But with grep -P that would just become:
grep -P "\"'x','a[ \t]+b[ \t]+c'\""
But the best is to forget about \s and use character classes [[:space:]]:
grep "\"'x','a[[:space:]]\+b[[:space:]]\+c'\""
grep -E "\"'x','a[[:space:]]+b[[:space:]]+c'\""
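As a quick check of the advice above, here is a minimal sketch that recreates the sample data from the question in a temporary file and runs the [[:space:]] variant against it. The variable names tmpfile and match are only illustrative, and mktemp is assumed to be available.

```shell
# Recreate the sample file from the question in a temporary location.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
"'x','a b c'"
"'x','a b c','1','2 3'"
"'x','a b c','22'"
"'x','a b z'"
"'x','s d 2'"
EOF

# ERE with a POSIX character class. Inside the double-quoted shell string
# only the double quotes themselves need a backslash.
match=$(grep -E "\"'x','a[[:space:]]+b[[:space:]]+c'\"" "$tmpfile")
printf '%s\n' "$match"

rm -f "$tmpfile"
```

Only the first line of the file should be printed, because the pattern requires the closing '" directly after the c.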

Related

Problem with using grep to match the whole word

I am trying to match a whole string in a list of new line separated strings. Here is my example:
[hemanth.a@gateway ~]$ echo $snapshottableDirs
/user/hemanth.a/dummy1 /user/hemanth.a/dummy3
[hemanth.a@gateway ~]$ echo $snapshottableDirs | tr -s ' ' '\n'
/user/hemanth.a/dummy1
/user/hemanth.a/dummy3
[hemanth.a@gateway ~]$ echo $snapshottableDirs | tr -s ' ' '\n' | grep -w '/user/hemanth.a'
/user/hemanth.a/dummy1
/user/hemanth.a/dummy3
My aim is to find a match if and only if the string /user/hemanth.a exists as a whole word (on a line of its own) in the list of strings. But the above command also returns strings that merely contain /user/hemanth.a.
This is a sample scenario. There is no guarantee that all the strings that I would want to match will be in the form of /user/xxxxxx.x. Ideally I would want to match the exact string if it exists in a new line as a whole word in the list.
Any help would be appreciated. thank you.
Update: Using fgrep -x '/user/hemanth.a' is probably a better solution here, as it avoids having to escape characters such as $ to prevent grep from interpreting them as meta-characters. fgrep performs a literal string match as opposed to a regular expression match, and the -x option tells it to only match whole lines.
Example:
> cat testfile.txt
foo
foobar
barfoo
barfoobaz
> fgrep foo testfile.txt
foo
foobar
barfoo
barfoobaz
> fgrep -x foo testfile.txt
foo
Original answer:
Try adding the $ regex metacharacter to the end of your grep expression, as in:
echo $snapshottableDirs | tr -s ' ' '\n' | grep -w '/user/hemanth.a$'
The $ metacharacter matches the end of the line.
While you're at it, you might also want to use the ^ metacharacter, which matches the beginning of the line, so that grep '/user/hemanth.a$' doesn't accidentally also match something like /user/foo/user/hemanth.a.
So you'd have this:
echo $snapshottableDirs | tr -s ' ' '\n' | grep '^/user/hemanth\.a$'
Edit: You probably don't actually want the -w here, so I've removed that from my answer.
Edit 2: @U. Windl brings up a good point. The . character in a regular expression is a metacharacter that matches any character, so grep /user/hemanth.a might end up matching things you're not expecting, such as /user/hemanthxa, etc. Or perhaps more likely, it would also match the line /user/hemanth/a. To fix that, you need to escape the . character. I've updated the grep line above to reflect this.
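The effect of the unescaped dot can be seen in a small sketch. The three sample paths and the variable names lines, loose, and strict are made up for illustration; grep -c simply counts matching lines.

```shell
# Three candidate lines; only the first is the exact path we want.
lines=$(printf '%s\n' '/user/hemanth.a' '/user/hemanthxa' '/user/hemanth.a/dummy1')

# Unescaped dot: "." matches any character, so /user/hemanthxa also matches.
loose=$(printf '%s\n' "$lines" | grep -c '^/user/hemanth.a$')

# Escaped dot plus both anchors: only the exact line matches.
strict=$(printf '%s\n' "$lines" | grep -c '^/user/hemanth\.a$')

printf 'loose=%s strict=%s\n' "$loose" "$strict"
```

The anchors already exclude /user/hemanth.a/dummy1 in both cases; the difference between the two counts comes from the dot alone.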
Update: In response to your question in the comments about how to escape a string so that it can be used in a grep regular expression...
Yes, you can escape a string so that it can be used in a regular expression. I'll explain how to do so, but first I should say that attempting to escape strings for use in a regex can become very complicated, with lots of weird edge cases. For example, an escaped string that works with grep won't necessarily work with sed, awk, perl, bash's =~ operator, or even grep -E.
On top of that, if you change from single quotes to double quotes, you might then have to add another level of escaping so that bash will expand your string properly.
For example, if you wanted to search for the literal string 'foo [bar]* baz$' using grep, you'd have to escape the [, *, and $ characters, resulting in the regular expression:
'foo \[bar]\* baz\$'
But if for some reason you decided to pass that expression to grep as a double-quoted string, you would then have to escape the escapes. Otherwise, bash would interpret some of them as escapes. You can see this if you do:
echo "foo \[bar]\* baz\$"
foo \[bar]\* baz$
You can see that bash interpreted \$ as an escape sequence representing the character $, and thus swallowed the \ character. This is because normally, in double quoted strings $ is a special character that begins a parameter expansion. But it left \[ and \* alone because [ and * aren't special inside a double-quoted string, so it interpreted the backslashes as literal \ characters. To get this expression to work as an argument to grep in a double-quoted string, then, you would have to escape the last backslash:
# This command prints nothing, because bash expands `\$` to just `$`,
# which grep then interprets as an end-of-line anchor.
> echo 'foo [bar]* baz$' | grep "foo \[bar]\* baz\$"
# Escaping the last backslash causes bash to expand `\\$` to `\$`,
# which grep then interprets as matching a literal $ character
> echo 'foo [bar]* baz$' | grep "foo \[bar]\* baz\\$"
foo [bar]* baz$
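But note that an escaping scheme tuned for grep's default syntax is not guaranteed to carry over to other tools. grep without options uses basic regular expressions (BRE), where characters such as (, ), {, + and ? are literal unless escaped, whereas grep -E, sed -E, awk, and perl use syntaxes in which those same characters are meta-characters when left unescaped and become literal when escaped.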
So again, yes, you can escape a literal string for use as a grep regular expression. But if you need to match literal strings containing characters that will need to be escaped, it turns out there's a better way: fgrep.
The fgrep command is really just shorthand for grep -F, where the -F tells grep to match "fixed strings" instead of regular expression. For example:
> echo '[(*\^]$' | fgrep '[(*\^]$'
[(*\^]$
This works because fgrep doesn't know or care about regular expressions. It's just looking for the exact literal string '[(*\^]$'. However, this sort of puts you back at square one, because fgrep will match on substrings:
> echo '/users/hemanth/dummy' | fgrep '/users/hemanth'
/users/hemanth/dummy
Thankfully, there's a way around this, which it turns out was probably a better approach than my initial answer, considering your specific needs. The -x option to fgrep tells it to only match the entire line. Note that -x is not specific to fgrep (since fgrep is really just grep -F anyway). For example:
> echo '/users/hemanth/dummy' | fgrep -x '/users/hemanth' # prints nothing
This is equivalent to what you would have gotten by escaping the grep regex, and is almost certainly a better answer than my previous answer of enclosing your regex in ^ and $.
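Applied to the list from the question, the fixed-string, whole-line match could look like the sketch below. It uses grep -F -x, which the answer describes as equivalent to fgrep -x; the extra third entry in dirs and the variable names are assumptions added for illustration.

```shell
# Sample list; the third entry is the exact path we are looking for.
dirs='/user/hemanth.a/dummy1 /user/hemanth.a/dummy3 /user/hemanth.a'

# -F: treat the pattern as a fixed string; -x: it must match the entire line.
exact=$(printf '%s\n' "$dirs" | tr -s ' ' '\n' | grep -Fx '/user/hemanth.a')
printf '%s\n' "$exact"

# With -q grep prints nothing and only sets its exit status,
# which is convenient inside an if statement.
if printf '%s\n' "$dirs" | tr -s ' ' '\n' | grep -Fxq '/user/hemanth'; then
  found=yes
else
  found=no
fi
printf '%s\n' "$found"
```

The first command prints only /user/hemanth.a, and the second reports no, since /user/hemanth never appears as a complete line.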
Now, as promised, just in case you want to go this route, here's how you would escape a fixed string to use as a grep regex:
# Suppose we want to match the literal string '^foo.\ [bar]* baz$'
# It contains lots of stuff that grep would normally interpret as
# regular expression meta-characters. We need to escape those characters
# so grep will interpret them as literals.
> str='^foo.\ [bar]* baz$'
> echo "$str"
^foo.\ [bar]* baz$
> regex=$(sed -E 's,[.*^$\\[],\\&,g' <<< "$str")
> echo "$regex"
\^foo\.\\ \[bar]\* baz\$
> echo "$str" | grep "$regex"
^foo.\ [bar]* baz$
# Success
Again, for the reasons cited above, I don't recommend this approach, especially not when fgrep -x exists.
Read "Anchoring" in man grep:
Anchoring
The caret ^ and the dollar sign $ are meta-characters that respectively
match the empty string at the beginning and end of a line.
Also be aware that . matches any character (from said manual page):
The period . matches any single character.

Bash grep mac address unix/linux with semicolon

Why doesn't this work? Shouldn't this output 30:84:A9:9B:2A:67 from my text file?
grep [A-F0-9]\:{5}[A-F0-9] textfile.txt
try this:
$ echo 30:84:A9:9B:2A:67 | grep -P "([A-F0-9]{2}:){5}[A-F0-9]{2}"
30:84:A9:9B:2A:67
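In your question, the {5} applies only to the colon directly before it, not to the [A-F0-9] in front of it. So even where {5} is treated as a repetition count, the pattern looks for one hex character followed by five colons (X:::::X), not for five X: pairs. The parentheses in ([A-F0-9]{2}:){5} are what make the whole pair repeat.
Also, grep uses basic regular expressions (BRE) by default, so braces and parentheses must be escaped (\{5\}, \( \)) to get their special meaning, unless you switch to -E or -P.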
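For systems where grep -P is not available, the same match can be sketched with -E or with plain BRE. The variable names below are illustrative only, and the || true just keeps the count of 0 from being treated as a failure.

```shell
mac='30:84:A9:9B:2A:67'

# ERE: unescaped braces and parentheses are operators.
ere=$(printf '%s\n' "$mac" | grep -E '([A-F0-9]{2}:){5}[A-F0-9]{2}')

# BRE (grep's default): the same operators need a backslash.
bre=$(printf '%s\n' "$mac" | grep '\([A-F0-9]\{2\}:\)\{5\}[A-F0-9]\{2\}')

# The pattern from the question, read as an ERE: one hex character followed
# by five colons. It does not occur in the address, so the count is 0.
orig_count=$(printf '%s\n' "$mac" | grep -Ec '[A-F0-9]:{5}[A-F0-9]' || true)

printf '%s\n%s\n%s\n' "$ere" "$bre" "$orig_count"
```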

sed is replacing matched text with output of another command, but that command's output contains expansion characters [duplicate]

I'm trying to replace text in a file with the output of another command. Unfortunately, the outputted text contains characters bash expands. For example, I'm running the following script to change the file (somestring references output that would break the sed command):
#!/bin/bash
somestring='$6$sPnfj/lnXwZVrec7$fCnL9uy1oWIMZduInKTHBAxhsQxGCsBpm2XfVFFqDPHKidrd93yfjbYvKgYexXHVcvkKdu9lbfy16Ek5GvKy/1'
sed '0,/^title/s/^title*/'"$somestring"'\n&/' $HOME/example.txt
sed fails with this error:
sed: -e expression #1, char 30: unknown option to `s'
I think bash is substituting the contents of $somestring when building the sed command, but is then trying to expand the resulting text. I can't put the entire sed script in single quotes; I need bash to expand it the first time, just not the second. Any suggestions? Thanks
Here the forward slash / is the problem. If it's the only issue, you can tell sed to use a different delimiter.
For example:
$ somestring="abc/def"; echo xxx | sed 's/xxx/'"$somestring"'/'
sed: -e expression #1, char 11: unknown option to `s'
$ somestring="abc/def"; echo xxx | sed 's_xxx_'"$somestring"'_'
abc/def
You also need to worry about & and \ characters, and escape them if they can appear in the replacement text.
If you can't control the replacement string, you either have to sanitize it with another sed script or, alternatively, use the r command to read it from a file. For example,
$ seq 5 | sed -e '/3/{r replace' -e 'd}'
1
2
3slashes///1ampersand&and2backslashes\\end
4
5
where
$ cat replace
3slashes///1ampersand&and2backslashes\\end
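The sanitizing step mentioned above could be sketched as follows. The helper name escape_replacement, the choice of | as the delimiter, and the shortened sample string are assumptions made for illustration; the sketch handles single-line strings only and does not deal with embedded newlines.

```shell
# Hypothetical helper: put a backslash in front of every backslash,
# ampersand, and delimiter character (| here) so that sed takes the
# string literally on the replacement side of s|...|...|.
escape_replacement() {
  printf '%s\n' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Shortened stand-in for the hash in the question, with /, & and \ added.
somestring='$6$abc/def&ghi\jkl'
safe=$(escape_replacement "$somestring")

# The slash needs no escaping because | is the delimiter.
result=$(echo 'title' | sed "s|^title|$safe|")
printf '%s\n' "$result"
```

If the escaping is right, the printed line is identical to the original value of somestring.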
You have several errors here:
The string somestring contains characters that are significant to the sed command (the most important being /, which you are using as the delimiter). You can escape it with a preliminary substitution:
somestring=$(echo "$somestring" | sed -e 's/\//\\\//g')
that will convert your / chars to \/ sequences.
You are using sed '0,/^title/s/^title*/'"$somestring"'\n&/' $HOME/example.txt, which substitutes the string titl followed by any number of e characters with the value of $somestring, followed by a newline and the original match. Unfortunately, POSIX sed(1) doesn't allow \n for a newline on the replacement side of the s command (GNU sed accepts it as an extension), but you can achieve the result with the i command, giving it your string as the text (preceding any newline with a \ so that it is taken literally):
Finally the script leads to:
#!/bin/bash
somestring='$6$sPnfj/lnXwZVrec7$fCnL9uy1oWIMZduInKTHBAxhsQxGCsBpm2XfVFFqDPHKidrd93yfjbYvKgYexXHVcvkKdu9lbfy16Ek5GvKy/1'
somestring=$(echo "$somestring" | sed -e 's/\//\\\//g')
sed '/^title/i\
'"$somestring\\
" $HOME/example.txt
If your shell is Bash, you can use parameter substitution to replace the problematic /:
somestring="${somestring//\//\\/}"
That looks scary, but is easier to understand if you look at the version that replaces x with __:
somestring="${somestring//x/__}"
It might be easier to use (say) underscore as the delimiter for your sed s command, and then the substitution above would be
somestring="${somestring//_/\\_}"
If you already have backslashes, you'll need to first replace those:
somestring="${somestring//\\/\\\\}"
somestring="${somestring//\//\\/}"
If there were other characters that needed escaping (e.g. on the search side of s///), then you could extend the above appropriately.
This URL provides the cleanest answer:
Command to escape a string in bash
printf "%q" "$someVariable"
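will escape any characters that are special to the shell. Keep in mind that this is shell quoting: it is not aware of sed, so a / in the string is left as it is and still has to be handled by one of the approaches above.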

A good way to use sed to find and replace characters with 2 delimiters

I am trying to find and replace items using bash. I was able to use sed to grab out some of the characters, but I think I might be using it in the wrong manner.
I am basically trying to remove the characters after ";" and before "," including removing ","
sed -e 's/\(;\).*\(,\)/\1\2/'
That is what I used to replace it with nothing. However, it ends up replacing everything in the middle so my output came out like this:
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;,reboot -f"
This is the original text of what I need to replace
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f"
Is there any way to make it look like this output?
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;reboot -f
If there is any way to make this happen other than bash, I am fine with any type of language.
Non-greedy search can (mostly) be simulated in programs that don't support it by replacing match-any (dot .) with a negated character class.
Your original command is
sed -e 's/\(;\).*\(,\)/\1\2/'
You want to match everything in between the semi-colon and the comma, but not another comma (non-greedy). Replace .* with [^,]*
sed -e 's/\(;\)[^,]*\(,\)/\1\2/'
You may also want to exclude semi-colons themselves, making the expression
sed -e 's/\(;\)[^,;]*\(,\)/\1\2/'
Note this would treat a string like "asdf;zxcv;1234,qwer" differently, since one would match ;zxcv;1234, and the other would match only ;1234,
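Putting the negated character class together with the g flag gets close to the output the question asks for. The sketch below is one possible combination, not the original author's command: the second expression, which drops the leading BMC, label, is an addition, and the cmd2 value is the string from the question without the surrounding cmd2="..." wrapper.

```shell
cmd2='BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f'

# 1st expression: after every semicolon, drop the label up to and including
#                 the next comma ([^,;]* cannot run past a comma or semicolon).
# 2nd expression: drop the very first label, which has no semicolon before it.
result=$(printf '%s\n' "$cmd2" | sed -e 's/;[^,;]*,/;/g' -e 's/^[^,]*,//')
printf '%s\n' "$result"
```

As with the other answers, the DB,2 entry leaves a bare 2 in the result, so it would need separate handling if it has to disappear entirely.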
In perl:
perl -pe 's/;.*?,/;/g;' -pe 's/^[^,]*,//' foo.txt
will output:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f
The .*? is non greedy matching before the comma. The second command is to remove from the beginning to the comma.
Something like:
echo $cmd2 | tr ';' '\n' | cut -d',' -f2- | tr '\n' ';' ; echo
result is:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f;
However, I think your requirements are a bit more complex, because 'DB,2' seems to be a special case. After the "tr" command, insert a "grep" or "grep -v" to include or exclude such cases.

how do you specify non-capturing groups in sed?

is it possible to specify non-capturing groups in sed?
if so, how?
Parentheses in sed have two functions, grouping, and capturing.
So I'm asking about using parentheses to do the grouping, but without capturing: non-capturing grouping parentheses, that is, parentheses that neither capture nor are literal. These are called non-capturing groups. I've seen the syntax (?:regex) for non-capturing groups, but it doesn't work in sed.
Linguistic Note- in the UK, the term brackets is used generally, for "round brackets" or "square brackets". In the UK, brackets usually refers to "( )", since "( )" are so common. And in the UK the term parentheses is hardly used. In the USA the term brackets are specifically "[ ]". So to prevent confusion to anybody in the USA, i've not used the words brackets in the question.
Parentheses can be used for grouping alternatives. For example:
sed 's/a\(bc\|de\)f/X/'
says to replace "abcf" or "adef" with "X", but the parentheses also capture. There is not a facility in sed to do such grouping without also capturing. If you have a complex regex that does both alternative grouping and capturing, you will simply have to be careful in selecting the correct capture group in your replacement.
Perhaps you could say more about what it is you're trying to accomplish (what your need for non-capturing groups is) and why you want to avoid capture groups.
Edit:
There is a type of non-capturing brackets ((?:pattern)) that are part of Perl-Compatible Regular Expressions (PCRE). They are not supported in sed (but are when using grep -P).
The answer is that, as of writing, you can't: sed does not support it.
Non-capturing groups have the syntax (?:a) and are a PCRE syntax.
Sed supports BRE (basic regular expressions, aka POSIX BRE), and GNU sed has the -r option, which makes it support ERE (extended regular expressions, aka POSIX ERE), but still not PCRE.
Perl will work, for windows or linux
examples here
https://superuser.com/questions/416419/perl-for-matching-with-regular-expressions-in-terminal
e.g. this from cygwin in windows
$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\1/s'
a
$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)/\2/s'
c
There is a program albeit for Windows, which can do search and replace on the command line, and does support PCRE. It's called rxrepl. It's not sed of course, but it does search and replace with PCRE support.
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\1"
a
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r "\3"
c
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(?:c)" -r "\3"
Invalid match group requested.
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(?:b)(c)" -r "\2"
c
C:\blah\rxrepl>
The author(not me), mentioned his program in an answer over here https://superuser.com/questions/339118/regex-replace-from-command-line
It has a really good syntax.
The standard thing to use would be perl, or almost any other programming language that people use.
I'll assume you are speaking of the backreference syntax, which uses parentheses ( ), not brackets [ ].
By default, sed will interpret ( ) literally and not attempt to make a backreference from them. You will need to escape them to make them special, as in \( \). Only when you use the GNU sed -r option is the escaping reversed: with sed -r, unescaped ( ) produce backreferences and escaped \( \) are treated as literal. Examples follow:
POSIX sed
$ echo "foo(###)bar" | sed 's/foo(.*)bar/####/'
####
$ echo "foo(###)bar" | sed 's/foo(.*)bar/\1/'
sed: -e expression #1, char 16: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe
$ echo "foo(###)bar" | sed 's/foo\(.*\)bar/\1/'
(###)
GNU sed -r
$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/####/'
####
$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/\1/'
(###)
$ echo "foo(###)bar" | sed -r 's/foo\(.*\)bar/\1/'
sed: -e expression #1, char 18: invalid reference \1 on `s' command's RHS
-bash: echo: write error: Broken pipe
Update
From the comments:
Group-only, non-capturing parentheses ( ), which would let you use something like an interval {n,m} without creating a backreference \1, don't exist. First, in POSIX (BRE) sed an interval has to be written \{n,m\}; the unescaped {n,m} form needs the GNU -r option. Second, as soon as you group with parentheses, in either syntax, the group is also captured for backreference use. Examples:
$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###/'
###789
$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###\1/'
###456.789
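For comparison, the same substitution can be sketched in both syntaxes. This assumes a sed that accepts -E as the spelling of the extended-regex switch (GNU and BSD sed both do); the variable names are illustrative.

```shell
input='123.456.789'

# BRE (no option): grouping is \( \) and the interval is \{2\}.
bre=$(printf '%s\n' "$input" | sed 's/\([0-9]\{3\}\.\)\{2\}/###/')

# ERE (-E): the same operators are written without backslashes.
ere=$(printf '%s\n' "$input" | sed -E 's/([0-9]{3}\.){2}/###/')

# In both syntaxes the group still captures; \1 holds the last repetition.
captured=$(printf '%s\n' "$input" | sed 's/\([0-9]\{3\}\.\)\{2\}/###\1/')

printf '%s\n%s\n%s\n' "$bre" "$ere" "$captured"
```

Both spellings give the same result, and in both the parentheses capture whether or not \1 is ever used.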
As said, it is not possible to have non-capturing groups in sed.
It may be obvious, but non-capturing groups are not a necessity (unless you run into the backreference limit, e.g. \9).
One can just use the desired capturing ones and ignore the non-desired ones as if they were non-capturing.
So e.g. of the two capturings here \1 and \2 you can ignore the \1 and just use the \2
$ echo blahblahblahc | sed -r "s/(blah){1,10}(.)/\2/"
c
For reference, nested capturing groups are numbered by the position-order of "(".
E.g.,
echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\1x/g"
applex and bananasx and monkeys (note: "s" in bananas, first bigger group)
vs
echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/\2x/g"
applex and bananax and monkeys (note: no "s" in bananas, second smaller group)

Resources