Why do sed a and sed s commands behave differently with respect to escape characters under single quotes and double quotes? - linux

I know there are differences between single quotes and double quotes in a sed expression, but I didn't know there are differences between sed a and sed s expressions.
For sed s expressions, \t is translated as a tab correctly both in single and double quotes. \\t also does the same thing in double quotes.
# '\t' works for single quotes
$ echo -e "abc\n123" | sed 's|abc|&\n\tdef|'
abc
def
123
# '\\t' fails for single quotes
$ echo -e "abc\n123" | sed 's|abc|&\\n\\tdef|'
abc\n\tdef
123
# '\t' works for double quotes
$ echo -e "abc\n123" | sed "s|abc|&\n\tdef|"
abc
def
123
# '\\t' also works for double quotes
$ echo -e "abc\n123" | sed "s|abc|&\\n\\tdef|"
abc
def
123
However, in sed a expressions, I have to use \\t in a single quotes expression and \\\t in one with double quotes.
# '\t' fails for single quotes
$ echo -e "abc\n123" | sed '/abc/a\tdef'
abc
tdef
123
# '\\t' works for single quotes
$ echo -e "abc\n123" | sed '/abc/a\\tdef'
abc
def
123
# '\t' fails for double quotes
$ echo -e "abc\n123" | sed "/abc/a\tdef"
abc
tdef
123
# '\\t' fails for double quotes
$ echo -e "abc\n123" | sed "/abc/a\\tdef"
abc
tdef
123
# '\\\t' works for double quotes
$ echo -e "abc\n123" | sed "/abc/a\\\tdef"
abc
def
123
I had to change my sed a expressions to sed s ones to unify the outputs due to this phenomenon. Things work perfectly, but I'd like an explanation.
The commands above are executed on Ubuntu 20.04.

sed has no idea which quotes you use. The shell parses and removes the quotes. Inside single quotes, text is preserved completely verbatim; inside double quotes, the shell performs variable substitution, command substitution, and backslash processing. The rules are simple, but sometimes surprising: in brief, a backslash quotes the next character as literal, and so, a pair of backslashes gets translated to a single backslash. However, backslashes in front of characters which do not need escaping are preserved. So for example, \t is equivalent to \\t inside double quotes.
sed for its part performs another round of backslash processing. In some contexts, some versions of sed understand \t to represent a literal tab character, but generally not in the text after the a, c, or i commands.
The actual question here is probably actually about the formatting of the a command. This differs between sed versions, but out of the box on Ubuntu, it simply outputs the literal text after the command. A backslash in this context is just a literal backslash, which again escapes the next character to ensure it is interpreted literally. Unlike the shell, sed simply removes this backslash.
In Bash, you can use $'...' "C-style" strings which let you encode a literal tab symbolically. However, you need to add literal backslashes in a couple of places: sed doesn't accept an unescaped literal newline in the s command, and the tab after a needs a backslash in order for it not to be skipped as insignificant whitespace.
printf '%s\n' abc 123 ghi |
sed -e $'s/abc/&\\\n\tdef/' \
-e $'/ghi/a\\\tjkl'
To reiterate, inside $'...' a \t gets replaced by the shell with a literal tab character, \n with a newline, etc, before the string gets passed on to the command (sed in this case).
In the grand scheme of things, you may be better off using a less haphazard tool than sed for this. Awk is much easier to read, write, and debug.

Related

How to accomodate single quotes in sed bash [duplicate]

How to escape a single quote in a sed expression that is already surrounded by quotes?
For example:
sed 's/ones/one's/' <<< 'ones thing'
Quote sed codes with double quotes:
$ sed "s/ones/one's/"<<<"ones thing"
one's thing
I don't like escaping codes with hundreds of backslashes – hurts my eyes. Usually I do in this way:
$ sed 's/ones/one\x27s/'<<<"ones thing"
one's thing
One trick is to use shell string concatenation of adjacent strings and escape the embedded quote using shell escaping:
sed 's/ones/two'\''s/' <<< 'ones thing'
two's thing
There are 3 strings in the sed expression, which the shell then stitches together:
sed 's/ones/two'
\'
's/'
Escaping single quote in sed: 3 different ways:
From fragile to solid...
Note: This answer is based on GNU sed!!
1. Using double-quotes to enclose sed script:
Simpliest way:
sed "s/ones/one's/" <<< 'ones thing'
But using double-quote lead to shell variables expansion and backslashes to be considered as shell escape before running sed.
1.1. Specific case without space and special chars
In this specific case, you could avoid enclosing at shell level (command line):
sed s/ones/one\'s/ <<<'ones thing'
will work until whole sedscript don't contain spaces, semicolons, special characters and so on... (fragile!)
2. Using octal or hexadecimal representation:
This way is simple and efficient, if not as readable as next one.
sed 's/ones/one\o047s/' <<< 'ones thing'
sed 's/ones/one\x27s/' <<< 'ones thing'
And as following character (s) is not a digit, you coul write octal with only 2 digits:
sed 's/ones/one\o47s/' <<< 'ones thing'
3. Creating a dedicated sed script
cat <<eosedscript >sampleSedWithQuotes.sed
#!$(which sed) -f
s/ones/one's/;
eosedscript
chmod +x sampleSedWithQuotes.sed
From there, you could run:
./sampleSedWithQuotes.sed <<<'ones thing'
one's thing
This is the strongest and simpliest solution as your script is the most readable:$ cat sampleSedWithQuotes.sed
#!/bin/sed -f
s/ones/one's/;
3.1 You coud use -i sed flag:
As this script use sed in shebang, you could use sed flags on command line. For editing file.txt in place, with the -i flag:
echo >file.txt 'ones thing'
./sampleSedWithQuotes.sed -i file.txt
cat file.txt
one's thing
3.2 Mixing quotes AND double quotes
Using dedicated script may simplify mixing quotes and double quotes in same script.
Adding a new operation in our script to enclose the word thing in double quotes:
echo >>sampleSedWithQuotes.sed 's/\bthing\b/"&"/;'
( now our script look like:
#!/bin/sed -f
s/ones/one's/;
s/\bthing\b/"&"/;
)
then
./sampleSedWithQuotes.sed <<<'ones thing'
one's "thing"
The best way is to use $'some string with \' quotes \''
eg:
sed $'s/ones/two\'s/' <<< 'ones thing'
Just use double quotes on the outside of the sed command.
$ sed "s/ones/one's/" <<< 'ones thing'
one's thing
It works with files too.
$ echo 'ones thing' > testfile
$ sed -i "s/ones/one's/" testfile
$ cat testfile
one's thing
If you have single and double quotes inside the string, that's ok too. Just escape the double quotes.
For example, this file contains a string with both single and double quotes. I'll use sed to add a single quote and remove some double quotes.
$ cat testfile
"it's more than ones thing"
$ sed -i "s/\"it's more than ones thing\"/it's more than one's thing/" testfile
$ cat testfile
it's more than one's thing
This is kind of absurd but I couldn't get \' in sed 's/ones/one\'s/' to work. I was looking this up to make a shell script that will automatically add import 'hammerjs'; to my src/main.ts file with Angular.
What I did get to work is this:
apost=\'
sed -i '' '/environments/a\
import '$apost'hammerjs'$apost';' src/main.ts
So for the example above, it would be:
apost=\'
sed 's/ones/one'$apost's/'
I have no idea why \' wouldn't work by itself, but there it is.
Some escapes on AppleMacOSX terminals fail so:
sed 's|ones|one'$(echo -e "\x27")'s|1' <<<'ones thing'
I know this is going to sound like a cop out but I could never get sed working when there were both single and double quotes in the string. To help any newbies like me that are having trouble, one option is to split up the string. I had to replace code in over 100 index.hmtl files. The strings had both single and double quotes so I just split up the string and replaced the first block with
<!-- and the second block with -->. It made a mess of my index.html files but it worked.
use an alternative string seperator like ":" to avoid confusion with different slashes
sed "s:ones:one's:" <<< 'ones thing'
or if you wish to highligh the single quote
sed "s:ones:one\'s:" <<< 'ones thing'
both return
one's thing

Replace a word that has a specific format with another format

I want to replace the space between c and 2 ('C 2',) with two spaces as 'C 2', I tried with sed -i but it does not work
sed -i 's/'C 2',/'C 2',/g' test.dat
The quotes stop the quoting. Either change the quoting or escape it.
You could do it like so:
sed -i 's/'\''C 2'\'',/'\''C 2'\'',/g' test.dat
The '\'' stop the quoting, escape a single quote and then continue with quoting.
But for your specific case, you could just use double quotes:
sed -i "s/'C 2',/'C 2',/g" test.dat

Problem with using grep to match the whole word

I am trying to match a whole string in a list of new line separated strings. Here is my example:
[hemanth.a#gateway ~]$ echo $snapshottableDirs
/user/hemanth.a/dummy1 /user/hemanth.a/dummy3
[hemanth.a#gateway ~]$ echo $snapshottableDirs | tr -s ' ' '\n'
/user/hemanth.a/dummy1
/user/hemanth.a/dummy3
[hemanth.a#gateway ~]$ echo $snapshottableDirs | tr -s ' ' '\n' | grep -w '/user/hemanth.a'
/user/hemanth.a/dummy1
/user/hemanth.a/dummy3
My aim is to only find a match if and only if the string /user/hemanth.a exists as a whole word(in a new line) in the list of strings. But the above command is also returning strings that contain /user/hemanth.a.
This is a sample scenario. There is no guarantee that all the strings that I would want to match will be in the form of /user/xxxxxx.x. Ideally I would want to match the exact string if it exists in a new line as a whole word in the list.
Any help would be appreciated. thank you.
Update: Using fgrep -x '/user/hemanth.a' is probably a better solution here, as it avoids having to escape characters such as $ to prevent grep from interpreting them as meta-characters. fgrep performs a literal string match as opposed to a regular expression match, and the -x option tells it to only match whole lines.
Example:
> cat testfile.txt
foo
foobar
barfoo
barfoobaz
> fgrep foo testfile.txt
foo
foobar
barfoo
barfoobaz
> fgrep -x foo testfile.txt
foo
Original answer:
Try adding the $ regex metacharacter to the end of your grep expression, as in:
echo $snapshottableDirs | tr -s ' ' '\n' | grep -w '/user/hemanth.a$'.
The $ metacharacter matches the end of the line.
While you're at it, you might also want to use the ^ metacharacter, which matches the beginning of the line, so that grep '/user/hemanth.a$' doesn't accidentally also match something like /user/foo/user/hemanth.a.
So you'd have this:
echo $snapshottableDirs | tr -s ' ' '\n' | grep '^/user/hemanth\.a$'.
Edit: You probably don't actually want the -w here, so I've removed that from my answer.
Edit 2: #U. Windl brings up a good point. The . character in a regular expression is a metacharacter that matches any character, so grep /user/hemanth.a might end up matching things you're not expecting, such as /user/hemanthxa, etc. Or perhaps more likely, it would also match the line /user/hemanth/a. To fix that, you need to escape the . character. I've updated the grep line above to reflect this.
Update: In response to your question in the comments about how to escape a string so that it can be used in a grep regular expression...
Yes, you can escape a string so that it should be able to be used in a regular expression. I'll explain how to do so, but first I should say that attempting to escape strings for use in a regex can become very complicated with lots of weird edge cases. For example, an escaped string that works with grep won't necessarily work with sed, awk, perl, bash's =~ operator, or even grep -e.
On top of that, if you change from single quotes to double quotes, you might then have to add another level of escaping so that bash will expand your string properly.
For example, if you wanted to search for the literal string 'foo [bar]* baz$'using grep, you'd have to escape the [, *, and $ characters, resulting in the regular expression:
'foo \[bar]\* baz\$'
But if for some reason you decided to pass that expression to grep as a double-quoted string, you would then have to escape the escapes. Otherwise, bash would interpret some of them as escapes. You can see this if you do:
echo "foo \[bar]\* baz\$"
foo \[bar]\* baz$
You can see that bash interpreted \$ as an escape sequence representing the character $, and thus swallowed the \ character. This is because normally, in double quoted strings $ is a special character that begins a parameter expansion. But it left \[ and \* alone because [ and * aren't special inside a double-quoted string, so it interpreted the backslashes as literal \ characters. To get this expression to work as an argument to grep in a double-quoted string, then, you would have to escape the last backslash:
# This command prints nothing, because bash expands `\$` to just `$`,
# which grep then interprets as an end-of-line anchor.
> echo 'foo [bar]* baz$' | grep "foo \[bar]\* baz\$"
# Escaping the last backslash causes bash to expand `\\$` to `\$`,
# which grep then interprets as matching a literal $ character
> echo 'foo [bar]* baz$' | grep "foo \[bar]\* baz\\$"
foo [bar]* baz$
But note that "foo \[bar]\* baz \\$" will not work with sed, because sed uses a different regex syntax in which escaping a [ causes it to become a meta-character, whereas in grep you have to escape it to prevent it from being interpreted as a meta-character.
So again, yes, you can escape a literal string for use as a grep regular expression. But if you need to match literal strings containing characters that will need to be escaped, it turns out there's a better way: fgrep.
The fgrep command is really just shorthand for grep -F, where the -F tells grep to match "fixed strings" instead of regular expression. For example:
> echo '[(*\^]$' | fgrep '[(*\^]$'
[(*\^]$
This works because fgrep doesn't know or care about regular expressions. It's just looking for the exact literal string '[(*\^]$'. However, this sort of puts you back at square one, because fgrep will match on substrings:
> echo '/users/hemanth/dummy' | fgrep '/users/hemanth'
/users/hemanth/dummy
Thankfully, there's a way around this, which it turns out was probably a better approach than my initial answer, considering your specific needs. The -x option to fgrep tells it to only match the entire line. Note that -x is not specific to fgrep (since fgrep is really just grep -F anyway). For example:
> echo '/users/hemanth/dummy' | fgrep -x '/users/hemanth' # prints nothing
This is equivalent to what you would have gotten by escaping the grep regex, and is almost certainly a better answer than my previous answer of enclosing your regex in ^ and $.
Now, as promised, just in case you want to go this route, here's how you would escape a fixed string to use as a grep regex:
# Suppose we want to match the literal string '^foo.\ [bar]* baz$'
# It contains lots of stuff that grep would normally interpret as
# regular expression meta-characters. We need to escape those characters
# so grep will interpret them as literals.
> str='^foo.\ [bar]* baz$'
> echo "$str"
^foo.\ [bar]* baz$
> regex=$(sed -E 's,[.*^$\\[],\\&' <<< "$str")
> echo "$regex"
\^foo\.\\ \[bar]\* baz\$
> echo "$str" | grep "$regex"
^foo.\ [bar]* baz$
# Success
Again, for the reasons cited above, I don't recommend this approach, especially not when fgrep -x exists.
Read "Anchoring" in man grep:
Anchoring
The caret ^ and the dollar sign $ are meta-characters that respectively
match the empty string at the beginning and end of a line.
Also be aware that . matches any character (from said manual page):
The period . matches any single character.

Replace forward slash with double backslash enclosed in double quotes

I'm desperately trying to replace a forward slash (/) with double backslash enclosed in double quotes ("\\")
but
a=`echo "$var" | sed 's/^\///' | sed 's/\//\"\\\\\"/g'`
does not work, and I have no idea why. It always replaces with just one backslash and not two
When / is part of a regular expression that you want to replace with the s (substitute) command of sed, you can use an other character instead of slash in the command's syntax, so you write, for example:
sed 's,/,\\\\,g'
above , was used instead of the usual slash to delimit two parameters of the s command: the regular expression describing part to be replaced and the string to be used as the replacement.
The above will replace every slash with two backslashes. A backslash is a special (quoting) character, so it must be quoted, here it's quoted with itself, that's why we need 4 backslashes to represent two backslashes.
$ echo /etc/passwd| sed 's,/,\\\\,g'
\\etc\\passwd
How about this?
a=${var//\//\\\\}
Demo in a shell:
$ var=a/b/c
$ a=${var//\//\\\\}
$ echo "$a"
a\\b\\c
Another way of doing it: tr '/' '\'
$ var=a/b/c
$ echo "$var"
a/b/c
$ tr '/' '\' <<< "$var"
a\b\c

What is wrong with 'echo 'a\\b' | grep "\\"'

When I am running this
echo 'a\\b' | grep '\\'
it correctly identifies the backslashes
but when I used the double quotes
echo 'a\\b' | grep "\\"
It is not running and it is returning a trailing backlash error, I can't figure out why that happens. It should work exactly similar to single quotes as I have escaped the backslashes.
When using double quotes the \\ gets evaluated into \. Single quotes leaves things as-is. Do an echo "\\" to see what I mean.
There are multiple places where backslash escapes can be processed:
In your shell: This happens when you use double quotes and not single quotes. Note that this is a very limited processing and escapes like "\n" will not work. This means when you do echo "a\\b" echo will get a\b as first argument while when you're executing echo 'a\\b' echo will get a\\b as first argument.
In the program you're calling: grep will parse the input as POSIX regular expression which has its own set of escapes. echo may or may not handle escape codes by default. My /bin/echo will not process escape codes by default, my bash bulitin echo also won't but my zsh builtin echo will. If a certain behaviour is wanted this can be specified by -e to enable escapes or by -E to disable escapes.
So if you're calling grep "\\" the shell will process the \\ escape sequence and pass \ to grep which will try to parse that as a regular expression and correclty complains about a trailing backslash.
So when you use double quotes you have to double escape everything resulting in the rather clumsy echo 'a\\b' | grep "\\\\".
To get the least amount of escaping you can use echo 'a\\b' | grep '\\' or even echo 'a\\b' | grep -F '\', where the -F flag tells grep to interpret the pattern as verbatim string and not as a regular expression. If you happen to have an echo that processes escapes by default (this will result in echo 'a\\b' printing a\b) you also need to add -E to the echo command.
grep does a bit of basic regular expression matching.
When in double quotes, the shell parses \\, so \\ will resolve to a single backslash as a parameter.
Additionally, grep does a bit of basic regular expression matching, so to match a single backslash you need to pass two backslashes to grep.
So by calling grep "\\" you actually get grep \, which does not parse as a regular expression, therefore makes grep fail.
To correctly match only double backslashes, do grep -F '\\' (which means fixed string matching instead of regex), grep '\\\\' or grep "\\\\\\\\".

Resources