Conditional replace using sed - linux

My question is probably rather simple. I'm trying to replace character sequences at the beginning of lines in a file. For example, I would like to replace "GN" with "N" or "WR" with "R", but only if they are the first two characters of the line. Suppose I had a file with the following content:
WRONG
RIGHT
GNOME
I would like to transform it into:
RONG
RIGHT
NOME
I know I can use the following to replace every instance of these patterns:
sed -i 's/GN/N/g' file.txt
sed -i 's/WR/R/g' file.txt
The issue is that I want this to happen only when these patterns are the first two characters of a line. Possibly an if statement would do it, although I'm not sure what the condition would look like. Any pointers in the right direction would be much appreciated, thanks.

Just add the circumflex (^) anchor and remove the g suffix (it is unnecessary, since you want at most one replacement per line). You can also combine both substitutions in one script:
sed -i 's/^GN/N/;s/^WR/R/' file.txt

Use the start-of-string regexp anchor ^:
sed -i 's/^GN/N/' file.txt
sed -i 's/^WR/R/' file.txt
Since sed is line-oriented, start-of-string == start-of-line.
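To check the anchored commands before running them with -i, you can pipe the sample lines from the question through the combined script. This is a minimal sketch: fix_prefixes is a hypothetical function name, and the SIGNAL line is extra input I added to show that a GN in the middle of a line is left alone.

```shell
#!/bin/sh
# fix_prefixes is a hypothetical helper name used only for this example.
# ^ anchors each pattern to the start of the line; no g flag is needed
# because each prefix can occur at the start of a line at most once.
fix_prefixes() {
    sed 's/^GN/N/;s/^WR/R/'
}

# Sample lines from the question plus SIGNAL (assumed extra input),
# whose unanchored GN should stay untouched.
printf 'WRONG\nRIGHT\nGNOME\nSIGNAL\n' | fix_prefixes
# prints RONG, RIGHT, NOME, SIGNAL (one per line)
```

Once the output looks right, the same script can be applied in place with sed -i.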

Related

How to make GNU sed remove certain characters from a line

I have the following line:
�5=?�#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
and would like to remove the characters �5=?� in front of #, so that the desired output looks as follows:
#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
I used GNU sed (v4.8) with the following command:
sed "s/.*#/#/"
but this did not remove �5=?�, though it worked in the GNU sed live editor.
I would really appreciate any help on this.
My system's kernel is 3.10.0-1160.71.1.el7.x86_64.
Using sed, remove everything up to the first occurrence of #:
$ sed 's/^[^#]*//' input_file
#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
This might work for you (GNU sed):
sed -E 's/(\o357\o277\o275)5=\?\1//g' file
This removes all occurrences of �5=?�.
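N.B. To see the octal codes, use sed -n l file, which displays the file unambiguously, with non-printable characters shown as octal escapes. The triplet \357\277\275 (the UTF-8 encoding of U+FFFD, the replacement character displayed as �) can then be matched in the LHS of the substitute command with \o357\o277\o275.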
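A sketch of the first answer that rebuilds the problem line byte for byte and then cleans it. It assumes the junk bytes are \357\277\275, as the sed -n l tip suggests; strip_to_hash is a hypothetical name, and running sed with LC_ALL=C is my own addition so that [^#]* matches raw bytes whatever the current locale is.

```shell
#!/bin/sh
# Hypothetical helper: delete everything before the first # on each line.
# LC_ALL=C (an assumption, not part of the original answer) makes sed
# treat the input as plain bytes, so every byte except # matches [^#].
strip_to_hash() {
    LC_ALL=C sed 's/^[^#]*//'
}

# Rebuild the line from the question with printf octal escapes.
# Note: \2755 is the byte \275 followed by the character 5.
line=$(printf '\357\277\2755=?\357\277\275#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG')

# Show the raw bytes as octal escapes, then strip them.
printf '%s\n' "$line" | LC_ALL=C sed -n l
printf '%s\n' "$line" | strip_to_hash
# prints #A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
```

Keep in mind that a line with no # at all is emptied completely, because [^#]* then matches the whole line.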

Sed: Extracting regex pattern from lines

I have an input stream of many lines which look like this:
path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'
...
I'd like to just extract the *.proto filenames from these lines, and I have tried:
[INPUT] | sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/'
I know that part of my problem is that .* is greedy and I'm going to get things like e.proto, o.proto and 2.proto, but I can't even get that far: it just outputs the same lines as the input. Any help would be greatly appreciated.
I find it helpful to use extended regex for this purpose (-r), in which case you need not escape the parentheses, and + works as "one or more" without a backslash.
sed -r 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'
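The added [^a-zA-Z0-9_] does not make .* non-greedy; it forces the greedy .* to stop before a non-word character (here the opening quote), so the group captures the whole filename rather than only its last character.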
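The effect of the extra class is easy to see by running both versions over the sample lines. A minimal sketch: greedy_extract and delimited_extract are hypothetical names, and -E is used instead of -r (GNU sed accepts both for extended regex).

```shell
#!/bin/sh
# Sample lines from the question.
input="path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'"

# Hypothetical helper: without a delimiter, the greedy .* keeps all it
# can, leaving the group a single character before .proto.
greedy_extract() {
    sed -E 's/^.*([a-zA-Z0-9_]+\.proto).*$/\1/'
}

# Hypothetical helper: with [^a-zA-Z0-9_], .* must give back characters
# until a non-word character (the opening quote) precedes the group.
delimited_extract() {
    sed -E 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'
}

printf '%s\n' "$input" | greedy_extract     # e.proto, o.proto, 2.proto
printf '%s\n' "$input" | delimited_extract  # extract_me.proto, me_too.proto, and_me_2.proto
```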
Since you tagged the question linux, I'll assume you have GNU grep. Pick one of:
grep -oP '\w+\.proto' file
grep -oE "[^']+\.proto" file
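Here is the extended-regex form of the grep approach, checked against the question's sample lines. It is a sketch assuming GNU grep; the -E flag matters because in grep's default basic syntax a bare + is a literal plus sign rather than "one or more". The -P variant needs a grep built with PCRE support, so it is left out here.

```shell
#!/bin/sh
input="path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'"

# -o prints only the matched text, one match per line.
# [^']+ cannot cross a quote, so each match starts just after the
# opening quote and runs up to .proto.
printf '%s\n' "$input" | grep -oE "[^']+\.proto"
# prints extract_me.proto, me_too.proto, and_me_2.proto
```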
One way to do it:
sed 's/^.*[^a-zA-Z0-9_]\([a-zA-Z0-9_]\+\.proto\).*$/\1/'
Escaped the + character.
Put a negated class before the alphanumeric/underscore class to delimit the leading characters.
Another way: use the single quotes as delimiters; after all, that's what they are there for:
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\)'.*\$/\1/"
Use this sed:
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\).*$/\1/"
+ is an Extended-RegEx operator, so in sed's default basic regex you need to escape it (\+) to get its special meaning: the preceding item is matched one or more times.
Another way:
sed "s/^.*'\([^']\+\.proto\)'.*$/\1/"
With GNU sed:
sed -E "s/.*'([^']+)'$/\1/"
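If the stream can contain lines of a different form, a variant of this last answer with -n and the p flag prints only the lines where the substitution succeeded, instead of passing the others through unchanged. This is my own sketch assuming GNU sed: extract_quoted is a hypothetical name, the middle input line is made up, and requiring .proto inside the quotes is an extra restriction I added.

```shell
#!/bin/sh
input="path/to/file: example: 'extract_me.proto'
some other line without a filename
path/to/something/else: example: 'and_me_2.proto'"

# Hypothetical helper. -n suppresses automatic printing; the p flag
# prints a line only if the s command matched. \$ keeps the shell
# from treating the end-of-line anchor as a variable.
extract_quoted() {
    sed -nE "s/.*'([^']+\.proto)'\$/\1/p"
}

printf '%s\n' "$input" | extract_quoted
# prints extract_me.proto and and_me_2.proto
```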

I am having trouble with Sed

I am trying to use the sed command to replace this line:
charmm.c36a4.20140107.newcali4.fixhcali.grange.b
with:
charmm.20140911.c36a4.3rd.ghost2.model3rd
When I use:
sed -i '/s/firstline/secondline/g'
It doesn't work. I think the periods are messing it up. How do I get around this?
sed uses regular expressions, so . matches any character. If you want to match only a literal . character, tell sed to look for \.
So, to change the first line into the second line:
sed -e 's/charmm\.c36a4\.20140107\.newcali4\.fixhcali\.grange\.b/charmm.20140911.c36a4.3rd.ghost2.model3rd/g' < filetochange > newfile
Here, I added g so that the replacement is global, i.e., if there are several instances on the same line, all of them are changed. If you remove the g, only the first occurrence on each line is changed.
It reads from filetochange and writes to newfile.
If you instead run:
sed -i -e 's/charmm\.c36a4\.20140107\.newcali4\.fixhcali\.grange\.b/charmm.20140911.c36a4.3rd.ghost2.model3rd/g' filetochange
it makes the change directly in "filetochange". Be careful, though: a badly written sed -i command can mangle the file and make it unusable.
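To see why the dots matter, compare an unescaped pattern with an escaped one on a look-alike line. This is a sketch with made-up test input (X in place of the first dot); escaping only the dots is enough here because this particular string has no other characters that are special to sed, such as /, [, * or &.

```shell
#!/bin/sh
old='charmm.c36a4.20140107.newcali4.fixhcali.grange.b'
new='charmm.20140911.c36a4.3rd.ghost2.model3rd'
lookalike='charmmXc36a4.20140107.newcali4.fixhcali.grange.b'  # made-up input

# Unescaped: each . matches any character, so the look-alike is rewritten too.
printf '%s\n' "$lookalike" | sed "s/$old/$new/"

# Escape every dot (s/\./\\./g turns each . into \.) so they match literally.
old_re=$(printf '%s\n' "$old" | sed 's/\./\\./g')
printf '%s\n' "$lookalike" | sed "s/$old_re/$new/"   # left unchanged
printf '%s\n' "$old" | sed "s/$old_re/$new/"         # replaced
```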
The s command follows this syntax:
s/pattern/replacement/
You need to drop the / in front of the s command.

Replace string within a file from a bash script

I need to replace a string inside a file from a little bash script, but I am getting weird results.
Let's say I want to replace:
<tag><![CDATA[text]]></tag>
With:
<tag><![CDATA[replaced_text]]></tag>
Should I use sed? I think the / and [ ] characters are what cause the weird results...
What would be the best way of approaching this?
Perl with the -p option works much like sed, and its regexes support the \Q (quote) escape, which makes everything after it (up to \E) match literally:
perl -pe 's{\Q<tag><![CDATA[text]]></tag>}
{<tag><![CDATA[replaced_text]]></tag>}' YOUR_FILE
In Perl you can also use different punctuation to delimit your expressions (s{...}{...} in my example).
Yes, you need to escape the brackets, and either escape slashes or use different delimiters.
sed 's,<tag><!\[CDATA\[text\]\]></tag>,<tag><![CDATA[replaced_text]]></tag>,'
That said, SGML and XML are not actually any better than HTML when it comes to using regexes; don't expect this to generalize.
This should be enough:
$ echo '<tag><![CDATA[text]]></tag>' | sed 's/\[text\]/\[replaced_text\]/'
<tag><![CDATA[replaced_text]]></tag>
You can also change the / separator in sed to a different character, such as a comma, | or %.
Just use a delimiter other than /, here I use #:
sed -i 's#<tag><!\[CDATA\[text\]\]></tag>#<tag><![CDATA[replaced_text]]></tag>#g' filename
-i to have sed change the file instead of printing it out.
g is for matching more than once (global).
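A quick check of the # delimiter version, run as a filter instead of with -i so no file is modified. It is a sketch assuming GNU or POSIX sed, and replace_cdata is a hypothetical name; the closing brackets are left unescaped here because ] is only special inside a bracket expression, while [ must still be escaped.

```shell
#!/bin/sh
# Hypothetical helper. With # as the delimiter, the / in </tag> needs
# no escaping. \[ matches a literal [; a ] outside a bracket
# expression is already literal.
replace_cdata() {
    sed 's#<tag><!\[CDATA\[text]]></tag>#<tag><![CDATA[replaced_text]]></tag>#g'
}

echo '<tag><![CDATA[text]]></tag>' | replace_cdata
# prints <tag><![CDATA[replaced_text]]></tag>
```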
But do you know the exact string you want to match, both the tag and the text?
For instance, if you want to replace the text inside all such tags with your replaced_text:
perl -i -pe 's#(<tag><!\[CDATA\[)(.*?)(\]\]></tag>)#\1replaced_text\3#g' filename
I switched to Perl because sed doesn't support non-greedy quantifiers (the *?).
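If you would rather stay with sed, a common substitute for a non-greedy .*? is a negated bracket expression that cannot run past the closing delimiter. A sketch with a hypothetical replace_any_cdata helper; it only behaves like the Perl version as long as the CDATA text itself contains no ] character.

```shell
#!/bin/sh
# Hypothetical helper. [^]]* matches any run of characters except ],
# so it stops at the first ] instead of running on to the last ]]
# on the line, as a greedy .* would.
replace_any_cdata() {
    sed 's#\(<tag><!\[CDATA\[\)[^]]*\(]]></tag>\)#\1replaced_text\2#g'
}

# Made-up input with two tags on one line.
echo '<tag><![CDATA[foo]]></tag> <tag><![CDATA[bar]]></tag>' | replace_any_cdata
# prints <tag><![CDATA[replaced_text]]></tag> <tag><![CDATA[replaced_text]]></tag>
```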

Removing Parts of String With Sed

I have lines of data that look like this:
sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta
sp_A0A342_ATPB_COFAR_9_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta
sp_A0A373_RK16_COFAR_8_+_contigs_full.fasta
sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta
How can I use sed to delete everything after the 4th field (fields are separated by _) on each line, finally yielding:
sp_A0A342_ATPB_COFAR
sp_A0A342_ATPB_COFAR
sp_A0A373_RK16_COFAR
sp_A0A373_RK16_COFAR
sp_A0A4W3_SPEA_GEOSL
cut is a better fit.
cut -d_ -f 1-4 old_file
This simply means use _ as delimiter, and keep fields 1-4.
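For example, on two of the sample lines (a sketch; first_four is a hypothetical wrapper name):

```shell
#!/bin/sh
# Hypothetical wrapper: keep fields 1 through 4, using _ as the delimiter.
first_four() {
    cut -d_ -f 1-4
}

printf '%s\n' 'sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta' \
              'sp_A0A4W3_SPEA_GEOSL_15_-_contigs_full.fasta' | first_four
# prints sp_A0A342_ATPB_COFAR and sp_A0A4W3_SPEA_GEOSL
```

One behaviour to be aware of: cut prints a line that contains no _ at all unchanged, unless you add -s to suppress such lines.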
If you insist on sed:
sed 's/\(_[^_]*\)\{4\}$//'
The left-hand side matches exactly four repetitions of a group consisting of an underscore followed by zero or more non-underscores. After that, we must be at the end of the line. The whole match is replaced by nothing.
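Because this expression counts fields from the end of the line, it keeps the first four only when a line has exactly eight fields, as all the sample lines do. A sketch showing this with a made-up nine-field line (drop_last_four is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: remove the last four _-separated fields,
# anchored at the end of the line.
drop_last_four() {
    sed 's/\(_[^_]*\)\{4\}$//'
}

printf '%s\n' 'sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta' | drop_last_four
# prints sp_A0A342_ATPB_COFAR

# Made-up line with an extra field: five fields survive, not four.
printf '%s\n' 'sp_A0A342_ATPB_COFAR_extra_6_+_contigs_full.fasta' | drop_last_four
# prints sp_A0A342_ATPB_COFAR_extra
```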
sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/' infile > outfile
Match "any number of not '_'", saving what was matched between \( and \), followed by '_'. Do this 4 times, then match anything for the rest of the line (to be ignored). Substitute with each of the matches separated by '_'.
Here's another possibility:
sed -E -e 's|^([^_]+(_[^_]+){3}).*$|\1|'
where -E, like -r in GNU sed, turns on extended regular expressions for readability.
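Counting from the start of the line, this expression keeps four fields no matter how many follow. A sketch using one sample line and a made-up nine-field line for comparison (keep_first_four is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper. ^([^_]+(_[^_]+){3}) captures the first field
# plus three more "_field" groups; .*$ consumes the rest, which the
# replacement \1 discards.
keep_first_four() {
    sed -E 's|^([^_]+(_[^_]+){3}).*$|\1|'
}

printf '%s\n' 'sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta' \
              'sp_A0A342_ATPB_COFAR_extra_6_+_contigs_full.fasta' | keep_first_four
# prints sp_A0A373_RK16_COFAR and sp_A0A342_ATPB_COFAR
```

Lines with fewer than four fields do not match at all, so they pass through unchanged.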
Just because you can do it in sed, though, doesn't mean you should. I like cut much much better for this.
AWK likes to play in the fields:
awk 'BEGIN{FS=OFS="_"}{print $1,$2,$3,$4}' inputfile
or, more generally:
awk -v count=4 'BEGIN{FS="_"}{sep=""; for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS}; printf "\n"}' inputfile
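A sketch of the general version wrapped in a hypothetical first_fields function. It resets sep at the start of every record; if sep is only initialized once, every line after the first starts with a stray _. The i <= NF guard is my own addition so that short lines do not get trailing underscores.

```shell
#!/bin/sh
# Hypothetical helper: print the first N _-separated fields of each line.
first_fields() {
    awk -v count="$1" 'BEGIN { FS = "_" }
    {
        sep = ""                      # reset for every line
        for (i = 1; i <= count && i <= NF; i++) {
            printf "%s%s", sep, $i
            sep = FS
        }
        printf "\n"
    }'
}

printf '%s\n' 'sp_A0A342_ATPB_COFAR_6_+_contigs_full.fasta' \
              'sp_A0A373_RK16_COFAR_10_-_contigs_full.fasta' | first_fields 4
# prints sp_A0A342_ATPB_COFAR and sp_A0A373_RK16_COFAR
```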
sed -e 's/_[0-9][0-9]*_[+-]_contigs_full.fasta$//g'
Still, the cut answer is probably faster and just generally better.
Yes, cut is way better, and yes, matching the end of each line is easier.
I finally got a match using the beginning of each line:
sed -r 's/(([^_]*_){3}([^_]*)).*/\1/' oldFile > newFile
