substitute strings with special characters in a huge file using sed

I'm stuck on what I hope is an easy problem.
I need to substitute several strings that contain special characters in a huge file.
I'm using sed and bash because I'm a Linux user, but so far I've only used sed on "standard" strings.
These are the kinds of strings I'm trying to manipulate:
(alpha[1],alpha[2]) and diff(A45(i,j),alpha[1])
and the replacement strings would be
(i,j) and dzA45(i,j)
I tried sed -i 's/(alpha[1],alpha[2])/(i,j)/g' $filetowork and
sed -i 's/\(alpha\[1\],alpha\[2\]\)/i,j/g' $filetowork without any success.
The second option seems to work for the first kind of string but not for the second one. Why?
Could you please help me? I looked through old Stack Overflow questions without finding an answer, unfortunately :(

I just tried this on the command line, and
echo "(alpha[1],alpha[2])" | sed 's/(alpha\[1\],alpha\[2\])/(i,j)/'
worked for the first case. Please note that you should not escape ( or ): in sed's default (basic) regex syntax a bare parenthesis is a literal character, and \( \) is how you activate groups. Only [ and ] need a backslash here.
For the second one
echo "diff(A45(i,j),alpha[1])" | sed 's/diff(A45(i,j),alpha\[1\])/dzA45(i,j)/'
worked for me. Same story: don't escape the parentheses!
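
To tie the two cases together, here is a minimal sketch that applies both substitutions in one sed call with the g flag. The sample line is made up for illustration; on the real file you would presumably use sed -i with the same two -e expressions on "$filetowork".

```shell
# Sample line containing both kinds of strings from the question.
input='f = (alpha[1],alpha[2]) + diff(A45(i,j),alpha[1])'

# Parentheses stay unescaped (literal in basic regex); only [ and ] are escaped.
result=$(printf '%s\n' "$input" | sed \
  -e 's/(alpha\[1\],alpha\[2\])/(i,j)/g' \
  -e 's/diff(A45(i,j),alpha\[1\])/dzA45(i,j)/g')

printf '%s\n' "$result"
```

The single quotes around each expression keep the shell from touching the backslashes, so sed sees them exactly as written.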

Related

sed doesn't accept my variable to parse a file

I'm trying to read this config file :
#[TABLE]
pattern1
DISK_MAIN
PART_EFI
PART_SWAP
PART_ROOT
pattern2
#[END_TABLE]
... rest of the file
I figured I had to use sed, so I researched how to do it and found this example:
val1="pattern1"
val2="pattern2"
sed -n "/^$val1/,/^$val2/p;/^$val2/q" $file
But once I change val1 and val2 to something else, it doesn't work anymore. I thought the special characters were the problem, so I removed the #[], but that did nothing. Can someone help me? I'm terrible at understanding regex with my dyslexia (it's also hard for me to get through heavy documentation, which is why I ask this kind of question).
Thanks in advance.

You were correct that a [ in the pattern affects the match, because it is a regex metacharacter. (# is not a problem, and ] is only special after [.) But you can't just remove them, because your pattern starts with ^, which means that it must match at the beginning of the line, and the beginning of the line is precisely # followed by [.
So you need to tell sed to ignore the special meaning of the [, which you do by placing a \ before it. The catch is that \ can also be special to the shell (where, just as in sed, it removes the special meaning of the next character), so how many backslashes you type depends on the quoting.
Inside single quotes the shell leaves \ untouched, so a single \ before the [ is exactly what sed needs to see; a doubled \\ there would reach sed as an escaped literal backslash followed by an unescaped [. That leaves you with:
pattern1='#\[TABLE]'
pattern2='#\[END_TABLE]'
sed -n "/^$pattern1/,/^$pattern2/p;/$pattern2/q;" "$file"
That might need to be adjusted if there are other special characters in the patterns. I'm just taking the patterns out of your comment to a different answer, although it would be better if you put the real patterns in your question, which would make it possible to answer.
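
A small self-contained check of the idea, as a sketch: the sample file contents are invented, and the patterns are single-quoted so that sed receives exactly one backslash before the [.

```shell
# Build a throwaway config file (contents are illustrative only).
tmp=$(mktemp)
printf '%s\n' 'header' '#[TABLE]' 'DISK_MAIN' 'PART_EFI' '#[END_TABLE]' 'rest of the file' > "$tmp"

# Single quotes: the shell passes the backslash through unchanged.
pattern1='#\[TABLE]'
pattern2='#\[END_TABLE]'

# Print from the opening marker to the closing marker, then quit.
block=$(sed -n "/^$pattern1/,/^$pattern2/p;/^$pattern2/q" "$tmp")
printf '%s\n' "$block"

rm -f "$tmp"
```

Printing the variable with printf '%s\n' "$pattern1" before running sed is a quick way to see what sed will actually receive.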

Try quoting within sed like this:
sed -n '/^'"$val1"'/,/^'"$val2"'/p;/^'"$val2"'/q' "$file"
You might think it's a lot of quotes, which it is, but that's likely your issue.
shellcheck.net is helpful in these situations.

Using SED to replace capture group with regex pattern

I need some help with a sed command that I thought would solve an issue I have. I basically have long text files that look something like this:
>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT
The ACT text blocks continue on, and then there are more blocks of text that follow the same pattern, except that the text after the > differs slightly in its numbers. I want to replace only this header part (the part that follows the >) with everything up to the very last "_". The sed command I thought seemed logical is the following:
sed -i 's/>.*/TRINITY.*_/'
However, sed is literally changing each header to TRINITY.*_ rather than capturing the block I thought it would. Any help is appreciated!
Also, just to make things clear, I thought that my sed command would convert the top header block into this:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT

This might help:
sed '/^>/s/[^_]*$//' file
Output:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
See: The Stack Overflow Regular Expressions FAQ
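
Why this works, as a short sketch: the address /^>/ restricts the substitution to header lines, and [^_]*$ can only match the run of non-underscore characters that reaches the end of the line, which is everything after the last "_". The sample data is the record from the question.

```shell
# One header line plus one sequence line, as in the question.
fasta=$(printf '%s\n' '>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]' 'ACTCACGCCCACATAAT')

# On header lines only, delete everything after the last underscore.
trimmed=$(printf '%s\n' "$fasta" | sed '/^>/s/[^_]*$//')
printf '%s\n' "$trimmed"
```

The original attempt failed because the right-hand side of s/// is replacement text, not a pattern, so .* there is inserted literally.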

Linux: Replace first string in file with contents of other file containing quotes and slashes.

I have spent all day today trying to find a proper solution, but I am not able to. My problem:
I have an XML file in which the same tag appears multiple times.
Example:
<TASK INSTANCE />
<WORKFLOWLINK CONDITION=""/>
<WORKFLOWLINK CONDITION=""/>
I want to add the contents of another XML file before the first <WORKFLOWLINK. The issue I've run into is that this file is full of double quotes and slashes. I've tried replacing them and escaping them, but to no avail.
My attempts mainly culminated in something like:
sed -e "0,/<WORKFLOWLINK/ /<WORKFLOWLINK/{ r ${filename}" -e "}" ${sourcefile}
If this isn't clear enough I'll get the exact data so you can see.

For the fun of sed:
sed -e "0,/<WORKFLOWLINK/{/<WORKFLOWLINK/{r ${sourcefile}" -e"}}"
The trick is to start a new "pattern/command" pair after your first address range condition 0,/<WORKFLOWLINK/.
sed does not accept two addresses stacked directly one after the other; a command must follow the first address. Wrapping the second address in an additional pair of curly braces {} turns it into that command.
Apart from the brain exercise of doing it in sed, @EdMorton is right in recommending an XML processor. His request for an MCVE is also appropriate. I had to do some guessing to see what you want, and I hope I guessed right.
The MCVE should at least have included:
the error message or problem description defining your problem
the initialisation of your environment variables
some sample input; not the original data
You surely would have had an answer earlier and (in case mine does not satisfy you) probably a better one by now.
So, before your next question, please take the tour: https://stackoverflow.com/tour
GNU sed version 4.2.1
GNU bash, version 3.1.17(1)-release (i686-pc-msys)
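
A runnable sketch of the nested-braces command, assuming GNU sed (the 0,/re/ address is a GNU extension). The file names and contents are hypothetical. Note that r queues the file to be written after the matching line, so the snippet lands after the first <WORKFLOWLINK rather than before it.

```shell
# Throwaway input files (names and contents are illustrative only).
tmp=$(mktemp -d)
printf '%s\n' '<TASK INSTANCE />' '<WORKFLOWLINK CONDITION=""/>' '<WORKFLOWLINK CONDITION=""/>' > "$tmp/workflow.xml"
printf '%s\n' '<SNIPPET PATH="a/b" NAME="x"/>' > "$tmp/snippet.xml"

# Outer range: line 1 up to the first match. Inner address: the matching line itself.
# The r command ends at the newline that separates the two -e scripts.
inserted=$(sed -e "0,/<WORKFLOWLINK/{/<WORKFLOWLINK/{r $tmp/snippet.xml" -e '}}' "$tmp/workflow.xml")
printf '%s\n' "$inserted"

rm -rf "$tmp"
```

Because r reads the file verbatim, the quotes and slashes inside the snippet need no escaping at all.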

Everyone,
Thank you for thinking with me, even if I apparently broke some rules.
I have figured out a solution, granted it is not as pretty as can be, but for a one time action it is good enough.
I have moved from a single command to a combination, first marking the location where I want to put my data:
sed -e "0,/<WORKFLOWLINK/ s/<WORKFLOWLINK/##MARKER##\n\t<WORKFLOWLINK/" which puts the marker string on its own line at the desired location.
After this I replace the marker with the contents of the file I have. I had already got the individual statements working while I was trying to do it all in a single statement, so I just execute them separately.
sed -e "/##MARKER##/{r ${sourcefile}" -e 'd}'
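
The two steps can be sketched end to end as follows, assuming GNU sed (for 0,/re/ and for \n in the replacement). File names and contents are hypothetical, and the \t indentation is left out to keep the output easy to read.

```shell
# Throwaway input files (names and contents are illustrative only).
tmp=$(mktemp -d)
printf '%s\n' '<TASK INSTANCE />' '<WORKFLOWLINK CONDITION=""/>' '<WORKFLOWLINK CONDITION=""/>' > "$tmp/workflow.xml"
printf '%s\n' '<SNIPPET PATH="a/b"/>' > "$tmp/snippet.xml"

# Step 1: put a marker line in front of the first <WORKFLOWLINK only.
sed -e '0,/<WORKFLOWLINK/ s/<WORKFLOWLINK/##MARKER##\n<WORKFLOWLINK/' "$tmp/workflow.xml" > "$tmp/marked.xml"

# Step 2: queue the snippet at the marker line, then delete the marker itself.
merged=$(sed -e "/##MARKER##/{r $tmp/snippet.xml" -e 'd}' "$tmp/marked.xml")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

Unlike a bare r on the tag line, this places the snippet before the first <WORKFLOWLINK, which is what the question asked for.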

Embedding quotation marks in command string generated by AWK?

I need to match all instances of strings in one file, with a master list in another. However, if my string is abc I want only that, not abcdef, abc1234 and so on.
So, a word boundary for the regex? Right now, I'm using a simple awk one liner:
cat results_file| sort -k 1| awk -F" " '{ print $1" /home/owner/file_2_search"}'|
xargs -L 1 /bin/grep -i
However, to force a word boundary, I'd need to grep string\b and the quotes (single or double) seem to be required.
In awk, \b is a special character, so you need \\b ... and then the quoted quotes on top of that (argh) ... Or am I missing something and overdoing this?
This is a Linux box, so presumably gawk. I have gone over the quoting rules for awk and realize this has got to be simple, but I am not seeing it.

Had meant to post as an answer, not a comment. Will try to pose a more readable question, but confess to having second thoughts about doing this as a one-liner in the first place -- may be best to follow an alternate method. Appreciate the willingness to help.
--Joe
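
One possible alternate method, sketched under the assumption that GNU grep is available: let grep enforce the word boundary itself with -w and read all search strings at once with -f, so no quotes have to be generated by awk. The file contents are made up, and -F (treat the strings as literal text rather than regexes) is an assumption about the data.

```shell
# Throwaway files standing in for results_file and file_2_search.
tmp=$(mktemp -d)
printf '%s\n' 'abc 1' 'xyz 2' > "$tmp/results_file"
printf '%s\n' 'abc' 'abcdef' 'abc1234' 'ABC here' 'xyz' > "$tmp/file_2_search"

# awk emits the first column; grep reads those strings from stdin (-f -)
# and matches them as whole words (-w), ignoring case (-i).
matches=$(awk '{ print $1 }' "$tmp/results_file" | grep -w -i -F -f - "$tmp/file_2_search")
printf '%s\n' "$matches"

rm -rf "$tmp"
```

This runs grep once over the master list instead of once per string, which also avoids the xargs step.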

sed regex with variables to replace numbers in a file

I'm trying to replace the numbers in my text file by adding one to them, i.e.
sed 's/3/4/g' path.txt
sed 's/2/3/g' path.txt
sed 's/1/2/g' path.txt
Instead of this, can I automate it, i.e. find a \d and add one to it in the replacement?
Something like
sed 's/\([0-8]\)/\1+1/g' path.txt
I also want to capture more than one digit, i.e. ([0-9])\t([0-9]), and change each one while keeping the tab in between.
Thanks
edited #2
Using the perl example,
I also would like it to work with more digits i.e.
perl -pi~ -e 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/ ($1+1)\.($2+1)\.($3+1)\.($4+1) /ge' output.txt
Any tips on making the above work?

There is no support for arithmetic in sed, but you can easily do this in Perl.
perl -pe 's/(\d+)/ $1+1 /ge'
With the /e option, the replacement expression needs to be valid Perl code. So to handle your final updated example, you need
perl -pi~ -e 's/(\d+)\.(\d+)\.(\d+)\.(\d+)/ ($1+1) . "." . ($2+1) . "." . ($3+1) . "." . ($4+1) /ge' output.txt
where strings are properly quoted and adjacent strings are concatenated with the . Perl string concatenation operator. The parentheses around each sum are needed because + and . have the same precedence in Perl and are evaluated left to right. (The numbers are coerced into strings when they are concatenated with a string.)
... Though of course, the first script already does that more elegantly, since with the /g flag it already increments every sequence of digits by one, anywhere in the string.

Triplee's perl solution is the more generic answer, but Michal's sed solution works well for this particular case. However, Michal's sed solution is more easily written:
sed y/12345678/23456789/ path.txt
and is better implemented as
tr 12345678 23456789 < path.txt
Note that this utterly fails to handle two-digit numbers (as in the edited question).

You can do it with sed but it's not easy, see this thread.
And it's hard with awk too, see this.
I'd rather use perl for this (something like this can be seen in action on ideone):
perl -pe 's/([0-8])/$1+1/e'
(The ideone.com example must do some looping itself, as ideone does not set -pe by default.)

You can't do addition directly in sed. You could do it in awk by matching numbers with a regex in each line and increasing the value, but it's quite complicated. If you do not need to handle arbitrary numbers but only a limited set, like single-digit numbers from 0 to 8, you can just put several replacement commands on a single sed command line, separated by semicolons. Run them from the highest digit down, so that a digit that has just been incremented is not incremented again by a later command:
sed 's/8/9/g ; s/7/8/g; s/6/7/g; s/5/6/g; s/4/5/g; s/3/4/g; s/2/3/g; s/1/2/g; s/0/1/g' path.txt
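
The order of the commands matters, which the following sketch makes visible by running the same substitutions in both directions on a made-up input line.

```shell
# Highest digit first: each digit is bumped exactly once.
descending=$(printf '0 1 2 8\n' | sed 's/8/9/g; s/7/8/g; s/6/7/g; s/5/6/g; s/4/5/g; s/3/4/g; s/2/3/g; s/1/2/g; s/0/1/g')

# Lowest digit first: every result is picked up again by the next command.
ascending=$(printf '0 1 2 8\n' | sed 's/0/1/g; s/1/2/g; s/2/3/g; s/3/4/g; s/4/5/g; s/5/6/g; s/6/7/g; s/7/8/g; s/8/9/g')

printf '%s\n' "$descending" "$ascending"
```

In the ascending order every digit cascades all the way up to 9, so only the descending order gives the intended result.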

This might work for you (GNU sed & Bash):
sed 's/[0-9]/$((&+1))/g;s/.*/echo "&"/e' file
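That adds one to every individual digit. To increment whole numbers instead:
sed 's/[0-9]\+/$((&+1))/g;s/.*/echo "&"/e' file
N.B. This method is fraught with problems and may cause unexpected results: each line is handed to the shell for evaluation, so any shell metacharacters in the input are interpreted as well.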
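
For a feel of what happens, here is a sketch on a harmless, made-up input line, assuming GNU sed (the e flag is a GNU extension). The first substitution rewrites each number into a shell arithmetic expansion; the second wraps the line in an echo command and executes it.

```shell
# After the first s command the line reads: a $((9+1)) b $((41+1))
# The e flag then runs: echo "a $((9+1)) b $((41+1))"
bumped=$(printf 'a 9 b 41\n' | sed 's/[0-9]\+/$((&+1))/g;s/.*/echo "&"/e')
printf '%s\n' "$bumped"
```

Only run this on input you trust, since anything that looks like a command substitution or a variable in the data would be expanded by the shell too.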
