Linux: Replace first string in file with contents of other file containing quotes and slashes. - linux

I have spent all day today trying to find a proper solution, but I am not able to. My problem:
I have an XML file in which the same tag occurs multiple times.
Example:
<TASK INSTANCE />
<WORKFLOWLINK CONDITION=""/>
<WORKFLOWLINK CONDITION=""/>
I want to add the contents of another XML file before the first <WORKFLOWLINK. The issue I've run into is that this file is full of double quotes and slashes. I've tried replacing them and escaping them, but to no avail.
My attempts mostly ended up as something like:
sed -e "0,/<WORKFLOWLINK/ /<WORKFLOWLINK/{ r ${filename}" -e "}" ${sourcefile}
If this isn't clear enough I'll get the exact data so you can see.

For the fun of sed:
sed -e "0,/<WORKFLOWLINK/{/<WORKFLOWLINK/{r ${sourcefile}" -e"}}"
The trick is to start a new "pattern/command" pair after your first address range condition 0,/<WORKFLOWLINK/.
Two nested patterns/addresses are not understood, there must be a command after the first pattern. Using an additional pair of curlies {} does that for you.
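To see the nested-braces form in action, here is a small self-contained sketch. It assumes GNU sed (the 0,/re/ address is a GNU extension); the file names source.xml and insert.xml and the <EXTRA .../> line are made up for the demo. Note that r appends the file after the matching line, so the new content lands after the first <WORKFLOWLINK line, not before it.

```shell
# Demo of the nested-address form; assumes GNU sed.
# source.xml, insert.xml and the <EXTRA .../> line are invented sample data.
tmpdir=$(mktemp -d)

cat > "$tmpdir/source.xml" <<'EOF'
<TASK INSTANCE />
<WORKFLOWLINK CONDITION=""/>
<WORKFLOWLINK CONDITION=""/>
EOF

cat > "$tmpdir/insert.xml" <<'EOF'
<EXTRA PATH="/a/b" NAME="x"/>
EOF

# Outer address: from the start of the file up to the first <WORKFLOWLINK line.
# Inner address: only the <WORKFLOWLINK line itself, where r queues the file.
# The file name given to r runs to the end of its -e fragment, so the closing
# braces go into a second -e.
result=$(sed -e "0,/<WORKFLOWLINK/{/<WORKFLOWLINK/{r $tmpdir/insert.xml" \
             -e "}}" "$tmpdir/source.xml")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The quotes and slashes in insert.xml need no escaping here, because r copies the file verbatim instead of parsing it as part of the sed script.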
Apart from the brain exercise of doing it in sed, @EdMorton is right to recommend using an XML processor. His request for an MCVE is also appropriate. I had to do some guessing to see what you want and I hope I guessed right.
The MCVE should at least have included:
- the error message or a problem description defining your problem
- the initialisation of your environment variables
- some sample input; not the original data
You surely would have had an answer earlier and (in case mine does not satisfy you) probably a better one by now.
So, before your next question, please take the tour: https://stackoverflow.com/tour
GNU sed version 4.2.1
GNU bash, version 3.1.17(1)-release (i686-pc-msys)

Everyone,
Thank you for thinking with me, even if I apparently broke some rules.
I have figured out a solution, granted it is not as pretty as can be, but for a one time action it is good enough.
I have moved from a single command to a combination of first detecting the location I want to put my data:
sed -e "0,/<WORKFLOWLINK/ s/<WORKFLOWLINK/##MARKER##\n\t<WORKFLOWLINK/"
This puts the marker string in the desired location.
After this I replace the marker with the contents of the file I have. I had already got the individual statements working while trying to do it all in a single statement, so I just execute them separately.
sed -e "/##MARKER##/{r ${sourcefile}" -e 'd}'
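Put together, the two steps might look like the sketch below. It assumes GNU sed (for 0,/re/ and for \n in the replacement); source.xml, insert.xml and the <EXTRA .../> line are invented sample data, and the \t from the command above is left out because the sample has no indentation.

```shell
# Two-step marker approach; assumes GNU sed.
# source.xml, insert.xml and the <EXTRA .../> line are invented sample data.
tmpdir=$(mktemp -d)

cat > "$tmpdir/source.xml" <<'EOF'
<TASK INSTANCE />
<WORKFLOWLINK CONDITION=""/>
<WORKFLOWLINK CONDITION=""/>
EOF

cat > "$tmpdir/insert.xml" <<'EOF'
<EXTRA PATH="/a/b" NAME="x"/>
EOF

# Step 1: put a marker line in front of the first <WORKFLOWLINK only.
# Step 2: on the marker line, queue the file with r and delete the marker with d.
result=$(sed -e '0,/<WORKFLOWLINK/ s/<WORKFLOWLINK/##MARKER##\n<WORKFLOWLINK/' \
             "$tmpdir/source.xml" \
         | sed -e "/##MARKER##/{r $tmpdir/insert.xml" -e 'd}')

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the marker line sits before the first <WORKFLOWLINK, the inserted content ends up before it as well.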

Related

Usage of sed to add a prefix for a string in linux

In my problem statement I would like to replace a word with a prefix
sed 's/hello-world/'"$1"'-hello-world/g' test.sql
Here $1 is any prefix passed as parameter to the shell script
In this case in the first go it works absolutely fine.
Let's assume "prefix=new"
It replaces as new-hello-world which is a perfect output.
If I re-run the command, I get new-new-hello-world, which is not intended.
Running it again gives new-new-new-hello-world, and so on.
How can we search and replace it as new-hello-world no matter how many times it is run? Using a regex is also fine.
To make it idempotent, just check first that it doesn't already match. eg:
sed "/$1-hello-world/!s/hello-world/$1-hello-world/g" test.sql
This is not particularly robust, and will fail if the original document contains the line new-hello-world hello-world, but it is probably sufficient for your needs. (You need to worry more about / characters in the prefix, so if you want a robust solution there's a fair bit of work to be done.)
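A quick way to convince yourself that the guarded command is idempotent is to run it twice on the same text. This is only a sketch: add_prefix is a hypothetical helper name, the SQL line is made-up sample input, and the prefix is assumed to contain no "/" or regex metacharacters.

```shell
# Idempotent prefixing; add_prefix is a hypothetical helper name.
# Assumes the prefix ($1) contains no "/" and no regex metacharacters.
add_prefix() {
    # The "!" is kept in single quotes so that an interactive bash does not
    # attempt history expansion on it.
    sed "/$1-hello-world/"'!'"s/hello-world/$1-hello-world/g"
}

once=$(printf 'select * from hello-world;\n' | add_prefix new)
twice=$(printf '%s\n' "$once" | add_prefix new)

printf '%s\n' "$once" "$twice"
```

The second run leaves the line alone because it already matches new-hello-world, so the substitution is skipped.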

Using SED to replace capture group with regex pattern

I need some help with a sed command that I thought would solve an issue I have. I basically have long text files that look something like this:
>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT
The ACT text blocks continue on, and then there are more blocks of text that follow the same pattern, except that the text after the > differs slightly in its numbers. I want to trim only this header part (the part following the >) so that it keeps everything up to the very last "_". The sed command I thought seemed logical is the following:
sed -i 's/>.*/TRINITY.*_/'
However, sed is literally changing each header to TRINITY.*_ rather than capturing the block I thought it would. Any help is appreciated!
Also, just to make things clear, I thought that my sed command would convert the top header block into this:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
This might help:
sed '/^>/s/[^_]*$//' file
Output:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
See: The Stack Overflow Regular Expressions FAQ
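The original attempt printed TRINITY.*_ literally because the replacement half of s/// is plain text, not a pattern. The sketch below runs the command above on the sample from the question and, as an alternative not taken from the answer, shows a capture-group variant that is closer to what the question was reaching for.

```shell
# Sample data taken from the question.
header='>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]'
sequence='ACTCACGCCCACATAAT'

# Approach from the answer: on header lines, delete the trailing run of
# non-underscore characters.
trimmed=$(printf '%s\n' "$header" "$sequence" | sed '/^>/s/[^_]*$//')

# Capture-group variant: the greedy .*_ runs up to the last underscore,
# and \1 in the replacement puts the captured text back.
captured=$(printf '%s\n' "$header" "$sequence" | sed 's/^\(>.*_\).*/\1/')

printf '%s\n' "$trimmed" "$captured"
```

Both commands leave the sequence line untouched, since it does not start with >.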

Trying to Delete Certain Lines in a Range Using sed

In a large file, I need to edit text and remove comments inside a particular range. For this simplified example, let's assume the range begins with _start_ and finishes at _end_.
I'm able to edit the text with no problem using a command like:
sed -i -r "/_start_/,/_end_/ s/SearchText/ReplaceText/" FileName
Please note the following (and let me know, of course, if any of my statements are inaccurate or misguided):
I used -i so that it would edit "FileName" in place, rather than write to a different file.
I used -r so that it would recognize extended regular expressions (which are not shown in the simplified example above, but which seem to be working correctly).
I used double-quotes so that it would correctly handle variables (also not shown in the simplified example above, but also working as expected).
That command above is doing exactly what I expect it to do. So I moved on to the second step of my process: a very similar command to remove comment lines within this range:
sed -i -r "/_start_/,/_end_/ /^#/ d" FileName
This, however, is having no effect: The lines that begin with # are not removed. Indeed, when I execute this command alone, absolutely nothing in my file is changed or deleted -- neither inside the range nor elsewhere.
In my searches on this site and elsewhere, I've found a lot of instructions on deleting lines using sed (instructions that I think I'm following correctly) -- but nothing about a failure such as I'm experiencing.
Can anyone advise me what I'm doing wrong here?
I'm very new to the UNIX/Linux environment, so I'm definitely open to alternate suggestions as to how to handle the issue. But just to satisfy my frustration, I'd love to know what's wrong with my sed command above.
The best source of information is often the man page. You can reach it with the command man sed.
d takes an address range according to the man page. An address can be a number, a /regexp/, or a number of other things. An address range is either one address or two addresses, separated by comma.
You have been trying to use an address range and then an address.
As 1_CR pointed out, you can work around this by using a block instead:
sed -i -r "/_start_/,/_end_/ {/^#/ d}" FileName
A block accepts an address range, and every command inside the block accepts an address range of its own, so this is how you combine the two regexps.
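A small sketch of the block form on made-up input, without -i so nothing on disk is touched. It assumes GNU sed, which accepts the closing brace directly after d; a strictly POSIX sed may want a ; or a newline before the }.

```shell
# Invented sample input: comments inside and outside the _start_/_end_ range.
input='# keep me
_start_
# drop me
value=1
_end_
# keep me too'

# Outer range selects the lines from _start_ to _end_; inside the block,
# /^#/ narrows that down to comment lines, which d deletes.
result=$(printf '%s\n' "$input" | sed '/_start_/,/_end_/{/^#/d}')

printf '%s\n' "$result"
```

Only the comment between _start_ and _end_ disappears; the comments outside the range survive.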
You need to change
sed -i -r "/_start_/,/_end_/ /^#/ d" FileName
to
sed -i -r "/_start_/,/_end_/{/^#/d}" FileName
In terms of doing exactly what your question asks, you can also do the same thing with a range of line numbers. It doesn't use regular expressions, but you might find doing this is easier if looking at the line numbers is convenient for you:
sed -i '<start-of-range>,<end-of-range>d' FileName
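For instance, with the placeholders filled in with the (arbitrarily chosen) line numbers 2 and 4, and reading from a pipe instead of editing a file in place:

```shell
# Delete lines 2 through 4 of a five-line input; the numbers are arbitrary.
result=$(printf '%s\n' one two three four five | sed '2,4d')

printf '%s\n' "$result"
```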

substitute strings with special characters in a huge file using sed

I'm stuck on this very easy problem (I hope it is easy for you).
I need to substitute several strings containing special characters in a huge file.
I'm trying to use sed and bash because I'm a Linux user, but I've only used sed for "standard" strings so far.
These are the kind of strings that I'm trying to manipulate
(alpha[1],alpha[2]) and diff(A45(i,j),alpha[1])
and the substituting strings would be
(i,j) and dzA45(i,j)
I tried sed -i 's/(alpha[1],alpha[2])/(i,j)/g' $filetowork and
sed -i 's/\(alpha\[1\],alpha\[2\]\)/i,j/g' $filetowork without any success
The second option seems to work for the first kind of string but it doesn't for the second one, why?
Could you please help me? I looked through old Stack Overflow questions without finding any help, unfortunately :(
I just tried this on the command line, and
echo "(alpha[1],alpha[2])" | sed 's/(alpha\[1\],alpha\[2\])/(i,j)/'
worked for the first case. Please note that you should not escape ( or ): in sed's default (basic) regular expressions, \( and \) are how you activate groups, while plain ( and ) match literally.
For the second one
echo "diff(A45(i,j),alpha[1])" | sed 's/diff(A45(i,j),alpha\[1\])/dzA45(i,j)/'
worked for me. Same as before: don't escape the parentheses! Only the square brackets need a backslash.
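Here are both substitutions as a sketch, with the g flag from the question added and the matches embedded in slightly longer, made-up lines. The last command is an addition of mine, not part of the answer: it shows how the rule for parentheses flips when sed is run with -E (extended regular expressions).

```shell
# Basic regular expressions (sed's default): ( and ) are literal,
# while [ and ] must be escaped to be literal.
first=$(printf '%s\n' 'x = f(alpha[1],alpha[2])' \
        | sed 's/(alpha\[1\],alpha\[2\])/(i,j)/g')
second=$(printf '%s\n' 'y = diff(A45(i,j),alpha[1])' \
         | sed 's/diff(A45(i,j),alpha\[1\])/dzA45(i,j)/g')

# Extended regular expressions (-E): literal parentheses now need a
# backslash, because unescaped ( and ) delimit groups.
first_ere=$(printf '%s\n' 'x = f(alpha[1],alpha[2])' \
            | sed -E 's/\(alpha\[1\],alpha\[2\]\)/(i,j)/g')

printf '%s\n' "$first" "$second" "$first_ere"
```

In the replacement part, parentheses and brackets are always plain text, so (i,j) needs no escaping in either mode.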

How do I grep for entire, possibly wrapped, lines of code?

When searching code for strings, I constantly run into the problem that I get meaningless, context-less results. For example, if a function call is split across 3 lines, and I search for the name of a parameter, I get the parameter on a line by itself and not the name of the function.
For example, in a file containing
...
someFunctionCall ("test",
MY_CONSTANT,
(some *really) - long / expression);
grepping for MY_CONSTANT would return a line that looked like this:
MY_CONSTANT,
Likewise, in a comment block:
/////////////////////////////////////////
// FIXMESOON, do..while is the wrong choice here, because
// it makes the wrong thing happen
/////////////////////////////////////////
Grepping for FIXMESOON gives the very frustrating answer:
// FIXMESOON, do..while is the wrong choice here, because
When there are thousands of hits, single line results are a little meaningless. What I would like to do is have grep be aware of the start and stop points of source code lines, something as simple as having it consider ";" as the line separator would be a good start.
Bonus points if you can make it return the entire comment block if the hit is inside a comment.
I know you can't do this with grep alone. I am also aware of the option to have grep return a certain number of lines of context. Any suggestions on how to accomplish this under Linux? FYI my preferred languages are C and Perl.
I'm sure I could write something, but I know that somebody must have already done this.
Thanks!
You can use pcregrep with the -M option (multiline matching; pcregrep is grep with Perl-compatible regular expressions). Something like:
pcregrep -M ";*\R*.*thingtosearchfor*\R*.*;.*"
Here's an example using awk.
$ cat file
blah1
blah2
function1 ("test",
MY_CONSTANT,
(some *really) - long / expression);
function2( one , two )
blah3
blah4
$ awk -vRS=")" '/function1/{gsub(".*function1","function1");print $0RT}' file
function1 ("test",
MY_CONSTANT,
(some *really)
The concept behind it: RS is the record separator. By setting it to ")", every record in your file is separated by ")" instead of newline. This makes it easy to find your "function1", since you can then "grep" for it. (RT, the text that actually matched RS, is a gawk extension.) If you don't use awk, the same concept can be applied by "splitting" on ")".
You could write a command line using grep with the options that give you the line number and the filename, then pipe these results through xargs into awk to parse those columns, and then use a little script of your own to display the N lines surrounding that line? :)
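A variation on the same idea that avoids the gawk-only RT: use ";" as the record separator, as the question itself suggests, so that every C statement becomes one record. This is a sketch on invented input, not a complete C-aware search; a ";" inside a string or comment would still split a record.

```shell
# Treat ";" as the record separator so a wrapped statement is one record.
# The input lines are invented sample data.
result=$(printf '%s\n' \
    'int x = 1;' \
    'someFunctionCall ("test",' \
    '    MY_CONSTANT,' \
    '    (some *really) - long / expression);' \
    'int y = 2;' \
  | awk -v RS=';' '/MY_CONSTANT/ {
        sub(/^\n+/, "")   # drop the newline left over from the previous statement
        print $0 ";"      # put back the separator that awk consumed
    }')

printf '%s\n' "$result"
```

The whole three-line call is printed, not just the line containing MY_CONSTANT.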
If this isn't an academic endeavour you could just use cscope (for C code only though). If you are willing to drop the requirement to search in comments ctags should be enough (and it also supports Perl).
I had a situation in which I had an XML file full of the names of zip files in an XML-style format, that is, with angle brackets ("carrots") around the names of the files, say example.zip<\stuff>
I used awk to change all the angle brackets into newlines and then used grep :)

Resources