Trying to Delete Certain Lines in a Range Using sed - linux

In a large file, I need to edit text and remove comments inside a particular range. For this simplified example, let's assume the range begins with _start_ and finishes at _end_.
I'm able to edit the text with no problem using a command like:
sed -i -r "/_start_/,/_end_/ s/SearchText/ReplaceText/" FileName
Please note the following (and let me know, of course, if any of my statements are inaccurate or misguided):
I used -i so that it would edit "FileName" in place, rather than write to a different file.
I used -r so that it would recognize extended regular expressions (which are not shown in the simplified example above, but which seem to be working correctly).
I used double-quotes so that it would correctly handle variables (also not shown in the simplified example above, but also working as expected).
That command above is doing exactly what I expect it to do. So I moved on to the second step of my process: a very similar command to remove comment lines within this range:
sed -i -r "/_start_/,/_end_/ /^#/ d" FileName
This, however, is having no effect: The lines that begin with # are not removed. Indeed, when I execute this command alone, absolutely nothing in my file is changed or deleted -- neither inside the range nor elsewhere.
In my searches on this site and elsewhere, I've found a lot of instructions on deleting lines using sed (instructions that I think I'm following correctly) -- but nothing about a failure such as I'm experiencing.
Can anyone advise me what I'm doing wrong here?
I'm very new to the UNIX/Linux environment, so I'm definitely open to alternate suggestions as to how to handle the issue. But just to satisfy my frustration, I'd love to know what's wrong with my sed command above.

The best source of information is often the man page. You can reach it with the command man sed.
d takes an address range according to the man page. An address can be a number, a /regexp/, or a number of other things. An address range is either one address or two addresses, separated by comma.
You have been trying to use an address range and then an address.
As 1_CR pointed out, you can work around this by using a block instead:
sed -i -r "/_start_/,/_end_/ {/^#/ d}" FileName
A block accepts an address range, and every command inside it accepts its own address range, so you can combine the regexps.
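As a minimal sketch of the block form, assuming a made-up sample input (the lines below are hypothetical, not taken from the question), the following deletes comment lines only between the two markers. The -i flag is left out so the result goes to standard output, and a ; is placed before the closing } because some sed implementations (BSD sed, for example) expect it.

```shell
# Hypothetical sample input: comments both inside and outside the range.
input=$(printf '%s\n' '# header comment' '_start_' '# inner comment' 'setting=1' '_end_' '# trailer comment')

# Outer address: the _start_.._end_ range. Inner address: lines starting with '#'.
# Only lines matching both are deleted.
result=$(printf '%s\n' "$input" | sed '/_start_/,/_end_/{/^#/d;}')

printf '%s\n' "$result"
```

Only the comment between _start_ and _end_ should disappear; the comments before and after the range are kept.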

You need to change
sed -i -r "/_start_/,/_end_/ /^#/ d" FileName
to
sed -i -r "/_start_/,/_end_/{/^#/d}" FileName

In terms of doing exactly what your question asks, you can also do the same thing with a range of line numbers. It doesn't use regular expressions, but you might find doing this is easier if looking at the line numbers is convenient for you:
sed -i '<start-of-range>,<end-of-range>d' FileName
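For illustration, here is a small sketch with concrete numbers in place of the placeholders. The five-line input and the range 2,4 are assumptions chosen for the example, and -i is omitted so nothing on disk is modified.

```shell
# Hypothetical five-line input; delete lines 2 through 4 by line number.
result=$(printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' 'line 5' | sed '2,4d')

printf '%s\n' "$result"
```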

Related

Wildcard sed search/remove within other text in the same line

I'm trying to remove a matching string with partial wildcards using sed, and the searches I've done for answers on this site either don't seem to apply or I can't convert them to my situation.
Below is the string of text I need to remove:
www.foo.com.cp123.bar.com
It is in a file with other entries on the same line. The line that has my entries always starts with serveralias:, however, as below:
serveralias: www.domain.com mail.domain.com www.foo.com.cp123.bar.com domain.com
I can identify what I need to remove via the 'cp123.bar.com' text as that always stays the same. It's the preceding 'www.foo.com' that changes. It can appear just once or multiple times within the line, but it will always end in 'cp123.bar.com'. I've tried the following two commands based on my research:
sed 's/\ .*cp123.bar.com\ //g' file.txt
sed 's/\ [^:]+$cp123.bar.com\ //g' file.txt
I'm using the spaces between each entry as the start and stop point for the find/replace(delete), but that's a band-aid and not always going to work since the entry I need to delete is occasionally at the end of the line (without a space afterward). If I don't include the spaces, though, everything gets removed since I'm using wildcards, including the www.domain.com, mail.domain.com, etc. text I need to keep there. Running either of the sed commands above doesn't do anything, just prints what's currently in the file.
Any ideas on what I need to change? I'm happy to clarify anything if need be.
sed needs the -r flag to use extended regular expressions. Without -r, the + is not treated as a quantifier. Thus, a
sed -r 's/ +[^ ]+\.cp123\.bar\.com//g'
will do what you want. It removes the following substrings:
one or more spaces
followed by one or more non-space characters
followed by .cp123.bar.com
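A quick way to check this is to run the expression on the sample line from the question, plus a variant where the unwanted entry sits at the end of the line. This is only a sketch: the variable names are made up for the example, and -E is used as the spelling of -r that both GNU and BSD sed accept.

```shell
# The substitution from the answer, kept in a variable for reuse.
pattern='s/ +[^ ]+\.cp123\.bar\.com//g'

# Entry in the middle of the line (the sample from the question).
middle=$(printf '%s\n' 'serveralias: www.domain.com mail.domain.com www.foo.com.cp123.bar.com domain.com' | sed -E "$pattern")

# Entry at the end of the line, with no trailing space.
at_end=$(printf '%s\n' 'serveralias: www.domain.com www.foo.com.cp123.bar.com' | sed -E "$pattern")

printf '%s\n' "$middle" "$at_end"
```

Because the leading space is part of the match and no trailing space is required, both positions are handled by the same expression.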

Using SED to replace capture group with regex pattern

I need some help with a sed command that I thought would solve an issue I have. I basically have long text files that look something like this:
>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT
The ACT text blocks continue on, and then there are more blocks of text that follow the same pattern, except that the text after the > differs slightly in its numbers. I want to trim only this header part (the line starting with >) so that it keeps everything up to the very last "_". The sed command I thought seemed logical is the following:
sed -i 's/>.*/TRINITY.*_/'
However, sed is literally changing each header to TRINITY.*_ rather than capturing the block I thought it would. Any help is appreciated!
Also, just to make things clear, I thought that my sed command would convert the top header block into this:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
This might help:
sed '/^>/s/[^_]*$//' file
Output:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
See: The Stack Overflow Regular Expressions FAQ
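Since the question asks about capture groups, here is a sketch that runs the answer's command next to an equivalent capture-group form. The second command is an alternative added for comparison, not part of the original answer; it assumes extended regular expressions via -E, and the variable names are hypothetical.

```shell
header='>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]'

# Answer's approach: on header lines, delete the trailing run of non-underscore characters.
trimmed=$(printf '%s\n' "$header" 'ACTCACGCCCACATAAT' | sed '/^>/s/[^_]*$//')

# Capture-group alternative: keep everything up to the last underscore (\1), drop the rest.
captured=$(printf '%s\n' "$header" 'ACTCACGCCCACATAAT' | sed -E '/^>/s/^(.*_).*$/\1/')

printf '%s\n' "$trimmed"
```

In the question's attempt, the .* in the replacement was emitted literally because regex syntax is only interpreted in the pattern half of s///; the replacement half can only refer back to matched text through \1, \2 and &.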

Linux: Replace first string in file with contents of other file containing quotes and slashes.

I have spent all day today trying to find a proper solution, but I am not able to. My problem:
I have an XML file in which the same tag occurs multiple times.
Example:
<TASK INSTANCE />
<WORKFLOWLINK CONDITION=""/>
<WORKFLOWLINK CONDITION=""/>
I want to add the contents of another XML file before the first <WORKFLOWLINK. The issue I've run into is that this file is full of double quotes and slashes. I've tried replacing them and escaping them, but to no avail.
My tries mainly culminated on something like:
sed -e "0,/<WORKFLOWLINK/ /<WORKFLOWLINK/{ r ${filename}" -e "}" ${sourcefile}
If this isn't clear enough I'll get the exact data so you can see.
For the fun of sed:
sed -e "0,/<WORKFLOWLINK/{/<WORKFLOWLINK/{r ${sourcefile}" -e"}}"
The trick is to start a new "pattern/command" pair after your first address range condition 0,/<WORKFLOWLINK/.
Two nested patterns/addresses are not understood, there must be a command after the first pattern. Using an additional pair of curlies {} does that for you.
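The nested form can be tried on a tiny input. This is a sketch that assumes GNU sed (the 0,/re/ address is a GNU extension) and uses hypothetical file names and contents. Note that r queues the file to be written after the matching line, so the inserted content lands after the first <WORKFLOWLINK line rather than before it.

```shell
# Hypothetical working files in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '<TASK INSTANCE />' '<WORKFLOWLINK CONDITION=""/>' '<WORKFLOWLINK CONDITION=""/>' > "$workdir/source.xml"
printf '%s\n' '<EXTRA NAME="a/b"/>' > "$workdir/insert.xml"

# Outer range: start of file up to the first <WORKFLOWLINK line (GNU sed only).
# Inner address: that <WORKFLOWLINK line itself, where r reads the other file.
# The r file name runs to the end of its -e chunk, so the braces close in a second -e.
result=$(sed -e "0,/<WORKFLOWLINK/{/<WORKFLOWLINK/{r $workdir/insert.xml" -e '}}' "$workdir/source.xml")

printf '%s\n' "$result"
rm -rf "$workdir"
```

The quotes and slashes inside the inserted file need no escaping here, because r copies the file verbatim instead of passing its contents through the sed script.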
Apart from the brain exercise to do it in sed, @EdMorton is right in recommending to use an XML-processor. Also his request for an MCVE is appropriate. I had to do some guessing to see what you want and I hope I guessed right.
The MCVE should at least have included
the error message or problem description defining your problem
the initialisation of your environment variables
some sample input; not the original data
You surely would have had an answer earlier and (in case mine does not satisfy you) probably a better one by now.
So, before your next question, please take the https://stackoverflow.com/tour
GNU sed version 4.2.1
GNU bash, version 3.1.17(1)-release (i686-pc-msys)
Everyone,
Thank you for thinking with me, even if I apparently broke some rules.
I have figured out a solution, granted it is not as pretty as can be, but for a one time action it is good enough.
I have moved from a single command to a combination of first detecting the location I want to put my data:
sed -e "0,/<WORKFLOWLINK/ s/<WORKFLOWLINK/##MARKER##\n\t<WORKFLOWLINK/"
which will put the marker string in the desired location.
After this I replace the marker with the contents of the file I have. I had already managed to get the individual statements working while I was trying to do it all in a single statement, so I just execute them separately.
sed -e "/##MARKER##/{r ${sourcefile}" -e 'd}'
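Put together, the two steps might look like the sketch below. It assumes GNU sed (for the 0,/re/ address and for \n in the replacement), and the file names and contents are hypothetical stand-ins for the real data. The tab from the original command is dropped to keep the output easy to read.

```shell
# Hypothetical working files in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '<TASK INSTANCE />' '<WORKFLOWLINK CONDITION=""/>' '<WORKFLOWLINK CONDITION=""/>' > "$workdir/source.xml"
printf '%s\n' '<EXTRA NAME="a/b"/>' > "$workdir/insert.xml"

# Step 1: put a marker line in front of the first <WORKFLOWLINK only.
sed -e '0,/<WORKFLOWLINK/ s/<WORKFLOWLINK/##MARKER##\n<WORKFLOWLINK/' "$workdir/source.xml" > "$workdir/marked.xml"

# Step 2: on the marker line, queue the other file for output, then delete the marker.
result=$(sed -e "/##MARKER##/{r $workdir/insert.xml" -e 'd}' "$workdir/marked.xml")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Because the marker line is deleted and the file is written in its place, the inserted content ends up before the first <WORKFLOWLINK line, which is what the question asked for.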

Delete some lines from text using Linux command

I know how to match text using regex patterns but not how to manipulate them.
I have used grep to match and extract lines from a text file, but I want to remove those lines from the text. How can I achieve this without having to write a python or bash shell script?
I have searched on Google and was recommended to use sed, but I am new to it and don't know how it works.
Can anyone point me in the right direction or help me achieve this goal?
The -v option to grep inverts the search, reporting only the lines that don't match the pattern.
Since you know how to use grep to find the lines to be deleted, using grep -v and the same pattern will give you all the lines to be kept. You can write that to a temporary file and then copy or move the temporary file over the original.
grep -v pattern original.file > tmp.file
mv tmp.file original.file
You can also use sed, as shown in shellfish's answer.
There are multiple possible refinements for the grep solution, but for most people most of the time, what is shown is more or less adequate (it would be a good idea to use a per process intermediate file name, preferably with a random name such as the mktemp command gives you). You can add code to remove the intermediate file on an interrupt; suppress interrupts while moving back; use copy and remove instead of move if the original file has multiple hard links or is a symlink; etc. The sed command more or less works around these issues for you, but it is not cognizant of multiple hard links or symlinks.
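One of those refinements, the mktemp-based intermediate file, can be sketched as follows. The file names, contents and the pattern drop are assumptions made up for the example.

```shell
# Hypothetical original file in a throwaway directory.
workdir=$(mktemp -d)
original="$workdir/original.file"
printf '%s\n' 'keep 1' 'drop me' 'keep 2' > "$original"

# Create a uniquely named intermediate file next to the original.
tmp=$(mktemp "$workdir/tmp.XXXXXX")

# Keep only the lines that do NOT match, then replace the original.
# grep exits non-zero when no line is kept (or on error); in that case
# the && skips the mv and the original is left untouched.
grep -v 'drop' "$original" > "$tmp" && mv "$tmp" "$original"

result=$(cat "$original")
printf '%s\n' "$result"
rm -rf "$workdir"
```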
Create the pattern which matches the lines using grep. Then create a sed script as follows:
sed -i '/pattern/d' file
Explanation:
The -i option means overwrite the input file, thus removing the lines matching pattern.
pattern is the pattern you created for grep, e.g. ^a*b\+.
d is the sed command for delete; it deletes the lines matching the pattern.
file is the input file; it can be a relative or absolute path.
For more information see man sed.
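A cautious variant, shown here only as a sketch, attaches a suffix to -i so that sed keeps a backup copy of the original file. The file name, its contents and the pattern remove are hypothetical.

```shell
# Hypothetical file in a throwaway directory.
workdir=$(mktemp -d)
file="$workdir/notes.txt"
printf '%s\n' 'alpha' 'beta: remove' 'gamma' > "$file"

# Delete matching lines in place; the untouched original is saved as notes.txt.bak.
sed -i.bak '/remove/d' "$file"

result=$(cat "$file")
backup=$(cat "$file.bak")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Writing the suffix directly after -i (no space) is accepted by both GNU and BSD sed, whereas a bare -i is handled differently by the two.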

Copy a section within two keywords into a target file

I have thousands of files in a directory, and each file contains a number of defined variables, each starting with the keyword DEFINE and ending with a semicolon (;). I want to copy all occurrences of the data between these delimiters (inclusive) into a target file.
Example: Below is the content of the text file:
/* This code is for lookup */
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
END.
Now, from the above content I just want to copy the section starting with DEFINE and ending with ; into a target file, i.e. the output should be:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
This needs to be done for thousands of scripts and multiple occurrences. Please help out.
Thanks a lot, the provided code works, but only to a limited extent: it works when the whole statement is on a single line, but the data is not always on one line. It can be spread over multiple lines, like below:
/* This code is for lookup */
DEFINE variable as a1 expr= if branchno > 55
then
extract (n123f1 using brach, code)
else
branchno = null
;
END.
The code is also in the above fashion. I need to capture all the data between DEFINE and the semicolon (;). After every DEFINE there will be an ending semicolon; this is the pattern.
It sounds like you want grep(1):
grep '^DEFINE.*;$' input > output
Try using grep. Let's say you have files with extension .txt in present directory,
grep -ho 'DEFINE.*;' *.txt > outfile
Output:
DEFINE variable as a1 expr= extract (n123f1 using brach, code);
Short Description
-o will give you only the matching string rather than the whole line, in case the line also contains something else that you want to omit.
-h will suppress file names before matching result
Read man page of grep by typing man grep on your terminal
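To see what the two options do, here is a sketch with two small files. The second file and its contents are made up for the example; -o and -h are assumed to be available, as they are in GNU and BSD grep.

```shell
# Hypothetical input files in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '/* This code is for lookup */' 'DEFINE variable as a1 expr= extract (n123f1 using brach, code);' 'END.' > "$workdir/a.txt"
printf '%s\n' 'x = 1; DEFINE variable as b2 expr= 2; END.' > "$workdir/b.txt"

# -o prints only the matched part of each line; -h leaves out the file-name prefix
# that grep would otherwise add when searching more than one file.
result=$(grep -ho 'DEFINE.*;' "$workdir"/*.txt)

printf '%s\n' "$result"
rm -rf "$workdir"
```

In the second file the text before DEFINE and after the last ; is dropped, which is the effect of -o.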
EDIT
If you want capability to search in multiple lines, you can use pcregrep with -M option
pcregrep -M 'DEFINE.*?(\n|.)*?;' *.txt > outfile
Works fine on my system. Check man pcregrep for more details
Reference : SO Question
One can make a simple solution using sed with version :
sed -n -e '/^DEFINE/{:a p;/;$/!{n;ba}}' your-file
Option -n prevents sed from printing every line; then each time a line begins with DEFINE, print the line (command p) then enter a loop: until you find a line ending with ;, grab the next line and loop to the print command. When exiting the loop, you do nothing.
It looks a bit dirty; it seems that the version sed15 has a shorter (and more straightforward) way to achieve this in one line:
sed -n -e '/^DEFINE/,/;$/p' your-file
Indeed, only for this version of sed, both patterns are treated; for other versions of sed like mine under cygwin, the range patterns must be on separate lines to work properly.
One last thing to remember: it does not treat inclusive patterned ranges, i.e. it stops printing after the first encountered end-pattern even if multiple start patterns have been matched. Prefer something with awk if this is a feature you are looking for.
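The difference between the two forms shows up when a DEFINE statement starts and ends on the same line. The sketch below uses a shortened, hypothetical input and writes the loop with one command per line, an assumption made so that it does not depend on how a particular sed parses labels.

```shell
# Hypothetical input mixing a one-line DEFINE with a multi-line one.
input=$(printf '%s\n' \
  '/* lookup */' \
  'DEFINE a expr= extract (x);' \
  'END.' \
  'DEFINE b expr= if n > 55' \
  'then 1' \
  ';' \
  'END.')

# Loop form: print the DEFINE line, then keep fetching and printing
# lines until one ends with ";".
loop_result=$(printf '%s\n' "$input" | sed -n '/^DEFINE/{
:a
p
/;$/!{
n
ba
}
}')

# Range form: the end pattern is only tested on lines AFTER the start line,
# so the one-line DEFINE keeps the range open until the next line ending in ";".
range_result=$(printf '%s\n' "$input" | sed -n '/^DEFINE/,/;$/p')

printf '%s\n' "$loop_result"
```

With this input the loop form prints only the two DEFINE statements, while the range form also prints the first END. line, because the range that opened on the one-line DEFINE does not close until the lone ; line.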

Resources