sed is matching passed variable subsets, not exact matches

I'm using sed to replace variables in a text file. It mostly works, but I'm stuck on one case.
A script reads input from a list; say $roll_symbol is C20.
sed replaces C20, GC20, and KC20, because C20 matches part of each string.
I searched the web and tried the variations I found, without success:
escaping the reserved character $
escaping the braces
escaping both
using double quotes instead of single quotes
The best version so far (which only partially works):
sed -i 's/'${roll_symbol}'/'${roll_symbol}\,${contract_month}'/g' $OUTPUT_DIRECTORY/$OUTPUT_FILE;

You need to tell sed which characters may come before the start of your match, to limit where it can match. To match only at the start of a word, try \< (a GNU sed extension), so that C20 no longer matches inside GC20 or KC20.
sed -i "s/\<${roll_symbol}/${roll_symbol},${contract_month}/g" "$OUTPUT_DIRECTORY/$OUTPUT_FILE";
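A small sketch of the effect, assuming GNU sed (which supports \< and \>). The sample line and the value of contract_month are made up for illustration. It also adds \> (end of word), which is not in the command above but stops C20 from matching inside a longer symbol such as C200.

```shell
# Hypothetical values standing in for the script's loop variables.
roll_symbol='C20'
contract_month='MAR'

# Made-up sample line containing the symbol on its own and inside longer symbols.
sample='C20 GC20 KC20 C200'

# \< anchors at the start of a word, \> at the end (GNU sed extensions).
result=$(printf '%s\n' "$sample" |
  sed "s/\<${roll_symbol}\>/${roll_symbol},${contract_month}/g")

printf '%s\n' "$result"
# → C20,MAR GC20 KC20 C200
```

Double quotes are needed around the sed script here so the shell expands the variables; the backslashes in \< and \> survive because the shell only treats a backslash specially before a few characters inside double quotes.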

Related

Finding and replacing text within a file

I have a large taxonomy file that I need to edit. There is an issue with the file as "Candida" is listed as both Candida and [Candida]. What I want to do is change every case of [Candida] to Candida within the file.
I have tried doing this several ways but never get the output I am after. This is the first few lines of the taxonomy file:
Penicillium;marneffei;NW_002197112.1
Penicillium;marneffei;NW_002197111.1
Penicillium;marneffei;NW_002197110.1
Penicillium;marneffei;NW_002197109.1
Penicillium;marneffei;NW_002197108.1
Using sed gives me this output:
$ sed -i -e 's/[Candida]/Candida/g' Full_HMS_Taxonomy.txt
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197112.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197111.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197110.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197109.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197108.1
Using awk gives me this output:
$ awk '{gsub(/[Candida]/,"Candida")}1' Full_HMS_Taxonomy.txt
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197112.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197111.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197110.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197109.1
PeCandidaCandidacCandidallCandidaum;mCandidarCandidaeffeCandida;NW_002197108.1
In both cases it is adding Candida to multiple places and multiple lines, instead of just replacing each instance of [Candida]. Any ideas on what I am doing wrong?

[ and ] are special characters in regular expressions, so you should escape them like this:
's/\[Candida\]/Candida/g'

Brackets are treated specially by regular expression parsers: a bracket expression matches any single one of the characters listed inside it. So [Candida] matches any one of the characters C, a, n, d, or i. That's why you get so many substitutions.
You need to tell those utilities that you want literal brackets by escaping them with backslashes, e.g. with sed:
sed -i 's/\[Candida\]/Candida/g' Full_HMS_Taxonomy.txt
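To check the escaped pattern without touching the real file, here is a minimal sketch that runs it on two lines from standard input. The second input line is made up for illustration (EXAMPLE_ID is not a real accession).

```shell
# First line is from the question; the second is a made-up line containing [Candida].
result=$(printf '%s\n' \
  'Penicillium;marneffei;NW_002197112.1' \
  '[Candida];albicans;EXAMPLE_ID' |
  sed 's/\[Candida\]/Candida/g')   # \[ and \] match literal brackets

printf '%s\n' "$result"
# → Penicillium;marneffei;NW_002197112.1
# → Candida;albicans;EXAMPLE_ID
```

The Penicillium line comes through unchanged, because the pattern now only matches the literal text [Candida].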

sed is misbehaving when replacing certain regular expressions

I am trying to remove numbers - but only when they immediately follow periods. Similar replaces seem to work correctly, but not with periods.
I have tried the following which was given as a solution in another post:
echo "fr.r1.1.0" | sed s/\.[0-9][0-9]*/\./g
I get fr..... It seems that even though I escape the period it is matching arbitrary characters instead of only periods.
This expression seems to work for the previous example:
echo "fr.r1.1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g
and gives me fr.r1.. but then for
echo "ge.s1_1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g
I get ge.s1.. instead of ge.s1_1.

You have to put the sed instructions between single quotes. Otherwise your shell interprets some of the special characters itself; here it strips the backslash, so sed sees an unescaped . that matches any character:
echo "fr.r1.1.0" | sed 's/\.[0-9][0-9]*/\./g'
fr.r1..
Also, you do not need to escape the dot in the replacement part, and [0-9][0-9]* can be shortened to [0-9]\+ (a GNU sed extension), giving the simplified command:
echo "fr.r1.1.0" | sed 's/\.[0-9]\+/./g'
fr.r1..
Last but not least, the POSIX [:punct:] character class is defined as
punctuation (all graphic characters except letters and digits)
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions
so it also matches the underscore (and a lot of other characters). Therefore, if you want to limit your matches to a . followed by digits, you need to match the dot explicitly (escaped as \. or via its ASCII value).
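As a quick check, this sketch runs the single-quoted, explicitly escaped dot against both inputs from the question. It uses the portable [0-9][0-9]* form rather than \+, so it does not depend on GNU sed.

```shell
# Both test strings are taken from the question.
result=$(
  for s in 'fr.r1.1.0' 'ge.s1_1.0'; do
    # \. matches only a literal dot, so the underscore in s1_1 is left alone.
    printf '%s\n' "$s" | sed 's/\.[0-9][0-9]*/./g'
  done
)

printf '%s\n' "$result"
# → fr.r1..
# → ge.s1_1.
```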

Sed - delete all chars before dash

I have a file with the following contents:
devtools-cloudformation
devtools-common-rpm
I want to remove devtools- and keep the rest of the characters unchanged. I tried the command below, but it removes everything up to the last dash:
sed 's/.*-//' projects.txt

This worked for me
sed 's/-/\n/;s/.*\n//' projects.txt

If I understood correctly, you want to delete everything up to the first dash.
Try this:
sed 's/[^-]*-//'
This deletes the first dash as well.
In case you want to keep that first dash:
sed 's/[^-]*-/-/'
(The dash needs no backslash inside a bracket expression; in POSIX a backslash there is a literal character, so [^\-] would also exclude backslashes.)
The reason your solution doesn't work is that .* is greedy: the regular expression .*- means 'anything followed by a dash', matched as far to the right as possible.
The string devtools-common- matches this criterion and is therefore removed.
The expression I suggest says 'anything but a dash, followed by a dash'.
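A side-by-side sketch of the two patterns on the lines from the question, reading from standard input instead of projects.txt. It uses [^-], where the dash needs no backslash inside the bracket expression.

```shell
# The two sample lines from the question.
input=$(printf '%s\n' devtools-cloudformation devtools-common-rpm)

# Greedy .* runs up to the LAST dash on each line.
greedy=$(printf '%s\n' "$input" | sed 's/.*-//')

# [^-]* cannot cross a dash, so the match stops at the FIRST dash.
first_only=$(printf '%s\n' "$input" | sed 's/[^-]*-//')

printf 'greedy:\n%s\n' "$greedy"
# → cloudformation
# → rpm
printf 'first dash only:\n%s\n' "$first_only"
# → cloudformation
# → common-rpm
```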

Can I retrieve a single character from 'sed'?

With the sed command, is it possible to do internal string operations on the matched text? In this case the actual lines are:
s/9G /9F6 09999F7 09999F8 09999F9 09999G /g
s/0G /0F6 09999F7 09999F8 09999F9 09999G /g
The number can be set using [09] but I didn't know if I could retrieve it from, say, & and use it before the F6 in something like the following:
s/[09]G /(&:0:1)F6 09999F7 09999F8 09999F9 09999G /g
This actual code does not work, by the way.

You are looking for a so-called subexpression (capture group) of the form \(SUB_PATTERN\), which you can refer to in the replacement as \1:
sed 's/\([09]\)G /\1F6 09999F7 09999F8 09999F9 09999G /g' file
From man sed:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement.
The replacement may contain the special character & to refer to that portion of the pattern space which matched, and
the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
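To see the back-reference in action, here is a small sketch that feeds a made-up input line through the command instead of a file. The input '9G 0G x' is only for illustration.

```shell
# Made-up input: both variants from the question, followed by unrelated text.
result=$(printf '%s\n' '9G 0G x' |
  sed 's/\([09]\)G /\1F6 09999F7 09999F8 09999F9 09999G /g')
# \([09]\) captures the single digit; \1 puts it back in front of F6.

printf '%s\n' "$result"
# → 9F6 09999F7 09999F8 09999F9 09999G 0F6 09999F7 09999F8 09999F9 09999G x
```

One expression now covers both of the original s commands, since \1 carries whichever digit was matched.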

linux bash replace placeholder with unknown text which can contain any characters

If I want to replace for example the placeholder {{VALUE}} with another string which can contain any characters, what's the best way to do it?
Using sed s/{{VALUE}}/$(value)/g might fail if $(value) contains a slash...

oldValue='{{VALUE}}'
newValue='new/value'
echo "${var//$oldValue/$newValue}"
Note that here oldValue is treated as a glob pattern, not as a regular expression. Otherwise, with sed, escape the slashes in the new value first:
echo "$var" | sed 's/{{VALUE}}/'"${newValue//\//\/}"'/g'

sed also accepts other delimiters, as in 's|something|someotherthing|g', but if you can't control the input string, you'll have to escape it with some function before passing it to sed.
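One possible shape for such an escaping function, as a sketch rather than a complete solution. The name escape_sed_replacement and the sample value are hypothetical. It assumes | is the delimiter and that the value contains no newlines, which would need separate handling.

```shell
# Hypothetical helper: backslash-escape the characters that are special in a
# sed replacement (backslash and &) plus the chosen delimiter (|).
escape_sed_replacement() {
  printf '%s' "$1" | sed 's/[\\&|]/\\&/g'
}

# Made-up value containing a slash, an ampersand, double quotes,
# a backslash, and the delimiter itself.
newValue='new/value & "more" \ stuff|x'
escaped=$(escape_sed_replacement "$newValue")

result=$(printf '%s\n' 'key={{VALUE}}' | sed "s|{{VALUE}}|${escaped}|g")
printf '%s\n' "$result"
# → key=new/value & "more" \ stuff|x
```

The double quotes inside the value are harmless here because the shell does not re-parse the contents of ${escaped} after expanding it.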

The question asked basically duplicates How can I escape forward slashes in a user input variable in bash?, Escape a string for sed search pattern, Using sed in a makefile; how to escape variables?, Use slashes in sed replace, and many other questions. “Use a different delimiter” is the usual answer. Pianosaurus's answer and Ben Blank's answer list characters (backslash and ampersand) that need to be escaped in the shell, besides whatever character is used as an alternate delimiter. However, they don't address the quoting-a-quote problem that will occur if your “string which can contain any characters” contains a double quote. The same kind of problem can affect the ${parameter/pattern/string} shell variable expansion mentioned in a previous answer.
Some other questions besides the few mentioned above suggest using awk, and that is usually a good approach for changes that are too complicated to do easily with sed. Also consider perl and python. Besides single- and double-quoted strings, python has u'...' unicode quoting, r'...' raw quoting, ur'...' quoting, and triple quoting with ''' or """ delimiters. The question as stated doesn't provide enough context for specific awk/perl/python solutions.
