How to correctly detect and replace an apostrophe (') with sed?

I have a directory with many files whose names contain special characters and spaces. I want to perform an operation on all these files, so I'm trying to store all the filenames in list.txt and then run the command on this list.
The special characters in my list are &, space, [, ] and '.
So basically I want to use sed to replace each occurrence with a backslash followed by the character in question.
E.g.: filename .txt => filename\ .txt, etc.
The thing is, I have trouble handling apostrophes.
Here is my command so far:
ls | sed 's/\ /\\ /g' | sed 's/\&/\\&/g' | sed "s/\'/\\'/g" | sed 's/\[/\\[/g' | sed 's/\]/\\]/g'
At first I had issues, I believe because the apostrophe inside the sed expression conflicted with the apostrophes surrounding it. So I used double quotes instead, but it still doesn't work.
I've tried all of these and none of them worked:
sed "s/\'/\\'/g" (escaping the apostrophe)
sed "s/'/\'/g" (escaping nothing)
sed "s/'/\\'/g" (escaping the backslash)
sed 's/"'"/\"'"/g' (double quoting single quote)
As a disclaimer, I'm completely new to sed. I ran my first sed command today, so maybe I'm doing something wrong that I haven't noticed.
PS: I've seen these threads, but no answer worked for me:
https://unix.stackexchange.com/questions/157076/how-to-remove-the-apostrophe-and-delete-the-space
How to replace to apostrophe ' inside a file using SED

This may do:
cat file
avbadf
test&rr
more [ yes
this ]
and'df
sed -r 's/(\x27|&|\[|\])/\\\1/g' file
avbadf
test\&rr
more \[ yes
this \]
and\'df
In GNU sed, \x27 is equal to a single quote '
and \x22 is equal to a double quote "
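Because \x27 is decoded by sed itself rather than by the shell, the whole script can stay inside single quotes. A minimal sketch, assuming GNU sed (\xHH is a GNU extension; -E is the same as -r there), with a space added to the alternation since the question also wants spaces escaped:

```shell
# Escape space, &, [, ] and ' with a backslash; \x27 stands for the single quote.
# In the replacement, \\ is a literal backslash and \1 is the matched character.
printf '%s\n' "more [ yes" "this ]" "and'df" "test&rr" |
  sed -E 's/(\x27|&|\[|\]| )/\\\1/g'
```

This prints more\ \[\ yes, this\ \], and\'df and test\&rr.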

Whoops, I found the answer to my question. Here is the working command:
sed "s/'/\\\'/g"
This will effectively replace any ' with \'.
However, I'm having trouble understanding exactly what's happening here.
If I understand correctly, we are escaping the backslash and the apostrophe in the replacement string. If somebody could answer some of these questions, I would be grateful:
Why don't we need to escape the first quote (the one in the pattern to find) ?
Why do we have to escape the backslash whereas for the other characters, there's no need ?
Why do we need to escape the second quote (the one in the replacement string) ?
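One way to see what is going on is to print the script text that the shell actually hands to sed. Inside double quotes the shell, not sed, removes a backslash only when it comes before $, `, ", \ or a newline: \\ becomes \, while \' is passed through unchanged. The sketch below uses plain printf (nothing sed-specific) to show that two of the failed attempts give sed the same script, and that the working form gives sed \\' , i.e. sed's escaped backslash followed by an ordinary quote. GNU sed appears to read \' in the replacement as a plain quote, which would explain why those attempts replaced ' with itself.

```shell
# Print the exact script text sed would receive for each double-quoted attempt.
printf '%s\n' "s/\'/\\'/g"    # sed would see: s/\'/\'/g
printf '%s\n' "s/'/\'/g"      # sed would see: s/'/\'/g
printf '%s\n' "s/'/\\'/g"     # sed would see: s/'/\'/g  (same as the line above)
printf '%s\n' "s/'/\\\'/g"    # sed would see: s/'/\\'/g
printf '%s\n' "s/'/\\\\'/g"   # sed would see: s/'/\\'/g  (same, quote left alone)

# Only a script containing \\ gives sed a literal backslash before the quote.
printf '%s\n' "and'df" | sed "s/'/\\\'/g"   # prints: and\'df
```

So the quote itself never needs escaping, neither for the shell inside double quotes nor for sed; the backslash has to survive two rounds of unescaping, one by the shell and one by sed.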

I think all of your sed commands actually need that same replacement pattern. This one seems to work for all examples:
ls | sed "s/\ /\\\ /g" | sed "s/\&/\\\&/g" | sed "s/\[/\\\[/g" | sed "s/\]/\\\]/g" | sed "s/'/\\\'/g"
So the form is s/regex/replacement/flags, and the regex and the replacement have different sets of special characters.
The only one that's different is s/'/\\\'/g, and only because I don't believe ' is a special character on the regex side. According to the docs, GNU sed does have an obscure \' special sequence, which matches the end of the buffer in multi-line mode. That might be why it needs an escape on the replacement side, but not on the regex side.
For example, \5 is special in the replacement (it is a back-reference), so to replace:
filename5.txt -> filename\5.txt
you would also need, as with the apostrophe:
sed "s/5/\\\5/g"
It probably has to do with the mysterious inner workings of sed's parsing; maybe it reads from right to left or something.
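A quick way to check the \5 case is to compare the double-quoted script with a single-quoted one. In single quotes the shell leaves every backslash alone, so sed only needs its own \\ for a literal backslash. A sketch, assuming GNU sed:

```shell
# Double quotes: the shell turns \\\5 into \\5, which sed reads as backslash + 5.
echo "filename5.txt" | sed "s/5/\\\5/g"    # prints: filename\5.txt
# Single quotes: no shell processing, so sed's \\5 is written directly.
echo "filename5.txt" | sed 's/5/\\5/g'     # prints: filename\5.txt
```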

Please try the following:
sed 's/[][ &'\'']/\\&/g' file
Using the same example as @Jotne, the result will be:
avbadf
test\&rr
more\ \[\ yes
this\ \]
and\'df
[How it works]
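The regex part in the sed s command above just defines a character
class of &, space, [, ] and ', each of which should be escaped with
a backslash. In the replacement, \\ produces a literal backslash and
& stands for the matched character.
The right square bracket ] does not need escaping when put
immediately after the left square bracket [.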
The confusing part is the handling of the single quote.
We cannot put a single quote within single quotes even if we escape it.
The workaround is as follows. Say we have an assignment str='aaabbb'.
To put a single quote between "aaa" and "bbb", we can write
str='aaa'\''bbb'.
It may look puzzling, but it just concatenates three sequences:
1) to close the single-quoted string as 'aaa'.
2) to put a single quote with an escaping backslash as \'.
3) to restart the single-quoted string as 'bbb'.
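The concatenation can be checked directly, and the same trick is what builds the sed script above. A small sketch (the variable name str comes from the explanation; the sample lines are from the example):

```shell
# 'aaa', \' and 'bbb' are adjacent pieces, which the shell joins into one word.
str='aaa'\''bbb'
printf '%s\n' "$str"          # prints: aaa'bbb

# Same trick inside the sed script: sed receives  s/[][ &']/\\&/g
printf '%s\n' "and'df" "more [ yes" | sed 's/[][ &'\'']/\\&/g'
```

The second command prints and\'df and more\ \[\ yes.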
Hope this helps.

Related

Remove double quotes within the column value using Unix

I am processing a 90-column, semicolon-separated (;) CSV file. (Case can be ignored, and I am aware the file format is a mess, but I am helpless in that regard.)
Input rows:
"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"
Expected output:
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
(The stray double quote can be replaced by a space or removed.) Kindly note: even though this is a ';'-separated file, some rows have ';' within the quoted data of a column.
Issue: in some rows I am getting an extra double quote within the quoted data.
Please advise me on how to handle this in Unix.
One trick you can use is to remove any " that is not at a field boundary. A simple sed script could be:
$ sed -E 's/([^;])"([^;])/\1 \2/g' file
Note that if you allow escaped quote marks in your fields, this will remove them as well.
Note also that some inputs are not fully handled by a single pass of the sed. Matches cannot overlap, so a single character cannot serve as the context for two matches, and "a"b"c"; won't be fixed correctly.
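One common way around the single-pass limit, sketched here for GNU sed, is to drop the g flag and loop with a label and t until nothing more changes, so each pass sees the text already rewritten by the previous one:

```shell
# :a is a label; ta jumps back to it whenever the s command made a substitution,
# so overlapping cases like "a"b"c" are fixed one stray quote per pass.
printf '%s\n' '"a"b"c";"x"' '"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"' |
  sed -E ':a;s/([^;])"([^;])/\1 \2/;ta'
```

This prints "a b c";"x" and "AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456". Each pass removes one quote, so the loop always terminates.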
What do you think of the following solution:
Replace all ";" by ;
Remove all remaining "
Replace all ; back into ";"
Add a " character at the beginning and at the end of every line.
The whole thing can be done with tr or sed or whatever command you prefer.
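A minimal sed sketch of those four steps, assuming GNU sed. One deliberate change from the list: because some fields legitimately contain ;, the sketch uses the control character \x01 as the temporary separator (assumed never to occur in the data) instead of a plain ;. Otherwise step 3 would also split 123;2343;asa"dwd into separate fields. The stray quotes are removed rather than replaced by a space, which the question allows.

```shell
# 1) ";" -> \x01   2) drop every remaining "   3) \x01 -> ";"   4) re-add outer quotes
printf '%s\n' '"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"' |
  sed -e 's/";"/\x01/g' \
      -e 's/"//g' \
      -e 's/\x01/";"/g' \
      -e 's/^/"/' -e 's/$/"/'
# prints: "Lmnop";"asdasads";"mer";"123;2343;asadwd";"456"
```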
mawk 'NF*(gsub(__," ",$!(NF=NF))^_ +gsub(OFS,FS) +gsub("^ | $",__))' \
__='\42' FS='\442\73\42' OFS='\31\17'
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
This transformation is easy to do with a tool that provides regular expressions with zero-length assertions (lookbehind and lookahead). As you applied the unix tag, there is a good chance you have the perl command, so I propose the following solution. Let file.txt content be
"AAAAA";"ABABDBDA";"ASDASDA"asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd"asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa"dwd";"456"
then
perl -p -e 's/(?<=[[:alnum:]])"(?=[[:alnum:]])/ /g' file.txt
gives output
"AAAAA";"ABABDBDA";"ASDASDA asads";"123";"456"
"AAAAA";"ABABDBDA";"12322AAasd asads";"123";"456"
"Lmnop";"asdasads";"mer";"123;2343;asa dwd";"456"
Explanation: I tell perl that I want to use it sed-style via -p -e, then provide a substitution (s): a " that comes after an alphanumeric character (letter or digit) and before an alphanumeric character should be replaced with a space. This is applied to all such " characters, that is, globally (g).
Note: you might elect to port this answer to any other tool that can replace regular expressions with zero-length assertions.
(tested in perl 5, version 26, subversion 3)
If you consider the combination ";" as the delimiter, you can use
awk -F '";"' '{
printf "\"";
for (i=1;i<NF;i++) {
gsub("\"","", $i);
printf("%s\";\"",$i)
};
print $NF
}' inputfile
This might work for you (GNU sed):
sed -E ':a;s/^(("[^"]*";)*"[^"]*)"([^;])/\1 \3/;ta' file
Starting from the beginning of the line, match zero or more correctly double-quoted fields followed by an incorrect double quote, and replace that double quote with a space; the loop (:a ... ta) repeats until no such quote remains.

remove backslash only from quote character using sed

I have the string: this is a [\"sample\"] sample\'s.
What is the correct way to remove the backslashes before the double quotes, while preserving the double quotes?
Expected output: this is a ["sample"] sample\'s.
I've tried sed -i 's/\\\"//g' file.txt, but it also removes the ".
Your command is almost correct. You just forgot to substitute with a double quote:
sed -i 's/\\"/"/g' file.txt
sed's s command replaces whatever matches the first part (between the / here) with the second part. The g flag repeats the substitution throughout the line. Your mistake was that the second part was empty, which effectively deleted the whole \" string.
BTW, escaping the double quote is not necessary since your sed command is inside single quotes.
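To try the fix without fighting the shell over the input's own backslashes and quotes, the sample line can be fed through a quoted here-document (<<'EOF' disables all expansion in its body). A small sketch; -i and the file name are dropped because the text comes from standard input:

```shell
# \\" in the single-quoted sed regex matches a literal backslash followed by "
sed 's/\\"/"/g' <<'EOF'
this is a [\"sample\"] sample\'s.
EOF
# prints: this is a ["sample"] sample\'s.
```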

How to escape certain characters for sed? [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 2 years ago.
I want to remove a line from a file and understand this can be done with sed.
The command below fails with an "unterminated address regex" error:
sed -i '/$settings['file_temp_path'] = '../tmp';/d' file.php
Having read the answer at "unterminated address regex while using sed", I now understand this is because the characters [ and / must be escaped.
Having tried this, the code below still fails:
sed -i '/$settings\['file_temp_path'] = '..\/tmp';/d' file.php
What is wrong with this? What am I missing?
This might work for you (GNU sed):
sed -i '\#\$settings\['\''file_temp_path'\''\] = '\''\.\./tmp'\'';#d' file
When the regexp/replacement contains the match delimiter (usually /), it can either be escaped/quoted or the delimiters can be changed, e.g. /\/tmp/ or \#/tmp#. Note that in the case of the substitution command, s/\/tmp/replacement/ can also be written s#/tmp#replacement#, and the leading delimiter does not need to be escaped/quoted.
Metacharacters, i.e. ^, $, [, ], *, ., \ and & (the last only in the replacement), must be escaped/quoted with \ or placed in a character class, e.g. . should be \. or [.].
As a rule of thumb, sed commands should be enclosed in single quotes. To include a single quote in the regexp, replace it with '\'', which closes the current single-quoted string, adds a shell-escaped ', and reopens the single-quoted string for the rest of the sed command.
Double quotes " can also be used, but they may have unexpected side effects because the shell interpolates inside them.
N.B. If the regexp/substitution delimiter is put inside a character class, it does not need to be escaped/quoted, i.e. if / is the delimiter then [/] is the same as \/. Also note that {, }, |, ? and + should not be escaped/quoted if they are to represent their literal value, unless the -E or -r command-line option is used, in which case they should be. For example, without -E, + is a literal plus sign and \+ means one or more of the preceding character/group; with -E or -r it is the other way round.
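The plus-sign rule is easy to verify. A quick check, assuming GNU sed (\+ in a basic regex is a GNU extension):

```shell
# Basic regex (default): + is a literal plus, \+ means "one or more".
echo 'a+b aab' | sed 's/a+/X/g'       # prints: Xb aab
echo 'a+b aab' | sed 's/a\+/X/g'      # prints: X+b Xb
# Extended regex (-E / -r): the meanings are swapped.
echo 'a+b aab' | sed -E 's/a+/X/g'    # prints: X+b Xb
echo 'a+b aab' | sed -E 's/a\+/X/g'   # prints: Xb aab
```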
You need to escape several special characters in your pattern, including $.
Example content of file.php:
foo
$settings['file_temp_path'] = '../tmp';
bar
Example code:
$ sed -i "/\$settings\['file_temp_path'\] = '..\/tmp';/d" file.php
$ cat file.php
foo
bar

fedora sed command replace special characters

I am totally new to sed, and as part of writing a script I am trying to replace a specific string in a file. I know special characters need to be escaped with a backslash, but the problem is that if the special character is the first character of the word, it is not replaced.
For example, my file contains
sldgfkls $bdxcv sldflksd
If I run the code below
sed -i 's/\b\$bdxcv\b/abcd/' filename
then the word is not replaced. But if the file contains
sldgfkls a$bdxcv sldflksd
and I run the code below
sed -i 's/\ba\$bdxcv\b/abcd/' filename
then the word is replaced.
Please help me here.
Clearly, \b does not consider a dollar sign to be a word character, so there is no word boundary for it to match between space and $.
Perhaps you want this instead:
sed -i 's/\(^\|[\t ]\)\$bdxcv\b/\1abcd/' filename
Assuming yours is GNU sed, see https://www.gnu.org/software/sed/manual/sed.html which contains this definition:
A “word” character is any letter or digit or the underscore character.
and thus not dollar sign.
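Both behaviours can be reproduced on the sample line without touching a file. A sketch, assuming GNU sed (\b, \| and \t in a bracket expression are GNU extensions):

```shell
line='sldgfkls $bdxcv sldflksd'
# No word boundary between " " and "$" (both non-word characters): nothing is replaced.
printf '%s\n' "$line" | sed 's/\b\$bdxcv\b/abcd/'
# Match start-of-line or a blank instead, and put it back with \1.
printf '%s\n' "$line" | sed 's/\(^\|[\t ]\)\$bdxcv\b/\1abcd/'
```

The first command prints the line unchanged; the second prints sldgfkls abcd sldflksd.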
sed cannot operate on strings, only on regular expressions. Trying to figure out which characters need to be escaped to disable their regexp (or sed delimiter, or sed back-reference) meaning, so that a regexp in sed behaves as if it were a string, is a fool's errand; just use a tool that can operate on strings, e.g. awk.
$ awk '{for (i=1;i<=NF;i++) if ($i == "$bdxcv") $i="abcd"} 1' file
sldgfkls abcd sldflksd
The above uses string comparison and string assignment - no need to escape anything unless one of the strings contained the string delimiter, ".

Escape file name for use in sed substitution

How can I fix this:
abc="a/b/c"; echo porc | sed -r "s/^/$abc/"
sed: -e expression #1, char 7: unknown option to `s'
The substitution of variable $abc is done correctly, but the problem is that $abc contains slashes, which confuse sed. Can I somehow escape these slashes?
Note that sed(1) allows you to use different characters for your s/// delimiters:
$ abc="a/b/c"
$ echo porc | sed -r "s|^|$abc|"
a/b/cporc
$
Of course, if you go this route, you need to make sure that the delimiters you choose aren't used elsewhere in your input.
The GNU manual for sed states that "The / characters may be uniformly replaced by any other single character within any given s command."
Therefore, just use another character instead of /, for example :
abc="a/b/c"; echo porc | sed -r "s:^:$abc:"
Do not use a character that can be found in your input. We can use : above, since we know that the input (a/b/c) doesn't contain :.
Be careful of character-escaping.
If using "", Bash will interpret some characters specially, e.g. ` (used for inline execution), ! (used for accessing Bash history), $ (used for accessing variables).
If using '', Bash will take all characters literally, even $.
The two approaches can be combined, depending on whether you need escaping or not, e.g.:
abc="a/b/c"; echo porc | sed 's!^!'"$abc"'!'
You don't have to use / as the pattern and replacement separator, as others have already told you. I'd go with :, as it is rarely used in paths (it's the separator in the PATH environment variable). Stick to one separator and use the shell's built-in string replacement to make it more robust, e.g. ${abc//:/\\:} (which replaces every : in ${abc} with \:) when : is the separator.
$ abc="a/b/c"; echo porc | sed -r "s:^:${abc//:/\\:}:"
a/b/cporc
Alternatively, escape the slashes with backslashes:
abc='a\/b\/c'
As for the escaping part of the question, I had the same issue and solved it with a double sed that could probably be optimized.
escaped_abc=$(echo $abc | sed "s/\//\\\AAA\//g" | sed "s/AAA//g")
The triple A is used because, inside double quotes, I could not get the forward slash that follows its escaping backslash to appear in the output, however many backslashes I put in front of it.
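The extra AAA step seems to be needed only because of the double quotes: the shell turns \\\/ into \\/, sed then reads \\ as a complete escaped backslash, and the following / ends the replacement early (in double quotes, five backslashes, \\\\\/, would be required). With the script in single quotes, sed receives the backslashes unchanged and one pass is enough. A sketch that escapes only slashes, which is all the question needs; a value containing \, & or newlines would need more care:

```shell
abc="a/b/c"
# sed receives s/\//\\\//g : replace every / with \/ (\\ = literal backslash, \/ = slash).
escaped_abc=$(printf '%s\n' "$abc" | sed 's/\//\\\//g')
printf '%s\n' "$escaped_abc"          # prints: a\/b\/c
echo porc | sed "s/^/$escaped_abc/"   # prints: a/b/cporc
```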
