sed remove characters between two strings on different lines [duplicate] - linux

This question already has answers here:
How to extract characters between the delimiters using sed?
(2 answers)
Closed 2 years ago.
I would like to remove all the text between the strings /* and */ in a file, where the strings may occur on different lines and surround a comment. For example I would like to remove the following seven lines which are contained between /* and */:
/* "CyIHTAlgorithm.pyx":81
* #cython.wraparound(False)
* #cython.cdivision(True)
* cdef inline object IHTReconstruction2D(fType_t[:,:] data, # <<<<<<<<<<<<<<
* fType_t[:,:] residualFID,
* fType_t[:,:] CS_spectrum,
*/
I have managed to do this using sed where the strings occur on the same line:
sed -i.bak 's/\(\/\*\).*\(\*\/\)/\1\2/' test.txt
but I'm not sure how to extend this to multiple lines in the same file.
I have also tried the following, based on the ideas in "Extract text between two strings on different lines":
sed -i.bak '/\/\*/{:a;N;/\*\//!ba;s/.*\/\*\|\*\/.*//g}' test.txt
This deletes the /* at the beginning and */ but not the intervening text.

Why not use sed ranges?
$ cat tmp/file13
first line
/* "CyIHTAlgorithm.pyx":81
* #cython.wraparound(False)
* #cython.cdivision(True)
* cdef inline object IHTReconstruction2D(fType_t[:,:] data, # <<<<<<<<<<<<<<
* fType_t[:,:] residualFID,
* fType_t[:,:] CS_spectrum,
*/
before last line
last line
$ sed '/\/\*/,/\*\//d' tmp/file13
first line
before last line
last line

You can use sed or cut, but both work line by line, so a single pattern will not span a comment that covers several lines.
Instead, find the line numbers where the block starts and finishes, then remove everything in between; you can wrap this in a function.
So:
1) get the line number of the line containing /*
2) get the line number of the line containing */
3) use a "while read line" loop and drop every line between those two numbers, using cut or sed.
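The numbered steps above can be sketched roughly as follows. This is only an illustration, assuming a single comment block per file; the sample file and its contents are made up for the demo. Instead of a "while read" loop it hands both line numbers to sed as one address range, which does the same job in a single pass.

```shell
#!/bin/sh
# Hypothetical sample file containing one /* ... */ block.
tmp=$(mktemp)
printf '%s\n' 'first line' '/* comment start' ' * middle' ' */' 'last line' > "$tmp"

# 1) line number of the first line containing "/*"
start=$(grep -n -F '/*' "$tmp" | head -n 1 | cut -d: -f1)
# 2) line number of the first line containing "*/"
end=$(grep -n -F '*/' "$tmp" | head -n 1 | cut -d: -f1)

# 3) delete that range of lines; every other line is printed unchanged
result=$(sed "${start},${end}d" "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

With several comment blocks in one file this would need a loop, which is where a sed or awk pattern range becomes the simpler tool.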

awk is better suited to this kind of task. It supports ranges out of the box with the /pattern1/,/pattern2/ syntax.
awk '/^[[:space:]]*\/\*/,/\*\// {next} {print}' file.txt
It works the following way: for every line from the one that opens a comment through the one that closes it, awk executes {next}, which skips the line; everything else is printed. Note that a POSIX character class has to be written inside a bracket expression, as [[:space:]]; a bare [:space:] is just a set of the characters :, s, p, a, c and e.

The following does more than strip comments (cpp is the C preprocessor, so it also expands macros and processes # directives), so test first whether it fits your needs.
cpp -P test.txt

I found the answer here: https://askubuntu.com/questions/916424/how-to-replace-text-between-two-patterns-on-different-lines
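sed -n '1h; 1!H; ${ g; s/<head>.*<\/head>/IF H = 2 THEN\n INSERT FILE '\''head.bes'\''\nEND/p }' myProgram.bes
Notes: This replaces everything from <head> through </head> (inclusive) in an HTML document with:
IF H = 2 THEN
 INSERT FILE 'head.bes'
END
The '\'' sequences are needed because the sed script itself is wrapped in single quotes; a bare ' would end the shell string and the quotes around head.bes would be lost.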
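As a rough illustration of how that one-liner works: `1h; 1!H` copies every line into the hold space, and on the last line `g` brings the whole file back into the pattern space, so a single `s` command can match across newlines. The sketch below assumes GNU sed (for `\n` in the replacement); the input and the replacement text LINE A / LINE B are made up.

```shell
#!/bin/sh
# Made-up five-line input with a <head> ... </head> block.
input=$(printf '%s\n' '<html>' '<head>' '<title>x</title>' '</head>' '<body/>')

# 1h    : on line 1, copy the line into the hold space
# 1!H   : on every other line, append the line to the hold space
# ${..} : on the last line, fetch the whole file with g and substitute once
result=$(printf '%s\n' "$input" |
  sed -n '1h; 1!H; ${ g; s/<head>.*<\/head>/LINE A\nLINE B/p; }')
printf '%s\n' "$result"
```

Because `.*` is greedy, a file with two such blocks would lose everything from the first opening tag to the last closing tag, so this suits files with exactly one block.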

Related

sed - replace paragraph that begins with A and ends with B with strings [duplicate]

This question already has an answer here:
how to replace all lines between two points and subtitute it with some text in sed
(1 answer)
Closed 1 year ago.
I have a bunch of text files in which many paragraphs begin with printf("<style type=\"text/css\">\n"); and end with printf("</style>\n");
For example,
A.txt
...
...
printf("<style type=\"text/css\">\n");
...
...
...
printf("</style>\n"); // It may start with several spaces!
...
...
I want this part replaced with some function call.
How to do it by sed command?
Would you try the following:
sed '
:a ;# define a label "a"
/printf("<style type=\\"text\/css\\">\\n");/ { ;# if the line matches the string, enter the {block}
N ;# read the next line
/printf("<\/style>\\n");/! ba ;# if the line does not include the ending pattern, go back to the label "a"
s/.*/replacement function call/ ;# replace the text
}
' A.txt
To replace a block of lines starting with A and ending with B by the text R, use sed '/A/,/B/cR' (this one-line form of the c command is a GNU sed extension). Just make sure that you correctly escape the special symbols /, \ and ; in your strings. For readability I used variables:
start='printf("<style type=\\"text\/css\\">\\n")\;'
end='printf("<\/style>\\n")\;'
replacement='somefunction()\;'
sed "/^ *$start/,/^ *$end/c$replacement" yourFile
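A small self-contained run of the range-plus-c idea, as a sketch only. The sample file is made up, the start and end patterns are shortened to keep the escaping readable, and it uses the `c\` form with the text on the following line, which should also work outside GNU sed.

```shell
#!/bin/sh
# Hypothetical sample file; %s keeps the backslashes in the data literal.
tmp=$(mktemp)
printf '%s\n' \
  'before' \
  '  printf("<style type=\"text/css\">\n");' \
  '  printf("body {}\n");' \
  '  printf("</style>\n");' \
  'after' > "$tmp"

# Replace the whole range (both boundary lines included) with one line.
result=$(sed '/^ *printf("<style /,/^ *printf("<\/style>/c\
somefunction();' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The replacement text is printed once, at the end of the range, rather than once per line.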

How to extract a string after matching characters from a variable in shell script [duplicate]

This question already has answers here:
Can grep show only words that match search pattern?
(15 answers)
Closed 2 years ago.
I have a file with following text as below
classA = Something
classB = AB1234567
classC = Something more
classD = Something Else
Objective:
Using a shell script, I want to read the text which says AB1234567 from above complete text.
So to start, I can read the second line of the above text using following logic in my shell script:
secondLine=`sed -n '2p' my_file`;
echo $secondLine;
secondLine outputs classB = AB1234567. How do I extract AB1234567 from classB = AB1234567 in my shell script?
Question:
Considering that AB is common to that part of the text in all the files I deal with, how can I make sed read all the numbers after AB?
Please note that classB = AB1234567 could end with a space or a newline. And I need to get this into a variable
Try:
sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName
2 is the line number.
{ opens a sed command group.
s/ substitutes the match described below:
^ anchors the match at the beginning of the line
\(...\) is known as a capture group, with \1 as its back-reference
[^ ]* matches zero or more characters that are not a space
\(AB[^ ]*\) captures AB followed by everything up to, but not including, the first space (back-reference is \1)
" *" (a space followed by *) matches zero or more spaces
$ anchors the match at the end of the line
/ separates the pattern from the replacement
\1 inserts the text matched by the capture group above
/ ends the substitution
q quits, to avoid reading the rest of the file unnecessarily
} closes the command group.
d deletes every line before line 2, so line 1 is never printed.
get into variable:
your_variableName=$(sed '2{ s/^classB = \(AB[^ ]*\) *$/\1/;q } ;d' your_fileName)
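For a quick check, here is a sketch that runs the same substitution on a made-up copy of the file, with a trailing space after the value. It uses a slightly more conservative spelling of the idea (`-n` plus the `p` flag instead of the trailing `d`, and a `;` before the closing brace); that spelling is my assumption about what is most portable, not part of the answer above.

```shell
#!/bin/sh
# Hypothetical sample file; note the trailing space on line 2.
tmp=$(mktemp)
printf '%s\n' 'classA = Something' 'classB = AB1234567 ' 'classC = Something more' > "$tmp"

# On line 2 only: keep the captured AB... token, print it, then quit.
value=$(sed -n '2{ s/^classB = \(AB[^ ]*\) *$/\1/p; q; }' "$tmp")
printf 'value is [%s]\n' "$value"
rm -f "$tmp"
```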
Could you please try the following? It should be easy in awk, assuming you want the second line and only the digits in its last field.
secondLine=$(awk 'FNR==2{sub(/[^0-9]*/,"",$NF);print $NF}' Input_file)
You may try this awk:
awk -F ' *= *' '$1 ~ /B$/ { print $2 }' file
AB1234567
I'm not 100% sure this is what you're looking for, but if you know there's only a single element in the file that starts with AB, this will get it into a variable:
$ cat sample.txt
classA = Something
classB = AB1234567
classC = Something more
classD = Something Else
$ x=$(perl -ne 'print if s/^.*\s+(AB\S+)\s*$/$1/' sample.txt)
$ echo "the variable is: $x"
the variable is: AB1234567
Explanation of the regex:
^ beginning of line
.* anything
\s+ one or more whitespace characters
(AB\S+) AB followed by one or more non-whitespace characters, captured as $1
\s*$ Zero or more spaces followed by the end of the line.

sed - Delete lines only if they contain multiple instances of a string

I have a text file that contains numerous lines that have partially duplicated strings. I would like to remove lines where a string match occurs twice, such that I am left only with lines with a single match (or no match at all).
An example output:
g1: sample1_out|g2039.t1.faa sample1_out|g334.t1.faa sample1_out|g5678.t1.faa sample2_out|g361.t1.faa sample3_out|g1380.t1.faa sample4_out|g597.t1.faa
g2: sample1_out|g2134.t1.faa sample2_out|g1940.t1.faa sample2_out|g45.t1.faa sample4_out|g1246.t1.faa sample3_out|g2594.t1.faa
g3: sample1_out|g2198.t1.faa sample5_out|g1035.t1.faa sample3_out|g1504.t1.faa sample5_out|g441.t1.faa
g4: sample1_out|g2357.t1.faa sample2_out|g686.t1.faa sample3_out|g1251.t1.faa sample4_out|g2021.t1.faa
In this case I would like to remove lines 1, 2, and 3 because sample1 is repeated multiple times on line 1, sample 2 is twice on line 2, and sample 5 is repeated twice on line 3. Line 4 would pass because it contains only one instance of each sample.
I am okay repeating this operation multiple times using different 'match' strings (e.g. sample1_out , sample2_out etc in the example above).
Here is one in GNU awk:
$ awk -F"[| ]" '{            # pipe or space is the field separator
    delete a                  # clear the hash left over from the previous record
    for(i=2;i<=NF;i+=2)       # iterate over every other field, i.e. the one to the right of each space
        if($i in a)           # if it has been seen already
            next              # skip this record
        else                  # otherwise
            a[$i]             # hash this entry
    print                     # output if you make it this far
}' file
Output:
g4: sample1_out|g2357.t1.faa sample2_out|g686.t1.faa sample3_out|g1251.t1.faa sample4_out|g2021.t1.faa
The following sed command will accomplish what you want.
sed -ne '/.* \(.*\)|.*\1.*/!p' file.txt
With grep:
grep -vE '(sample[0-9]).*\1' file
Inspired by Glenn's answer: the same pattern works with sed, and adding -i makes sed change the file in place.
sed -r '/(sample[0-9]).*\1/d' txt_file
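A sketch of the back-reference approach on made-up data. It assumes GNU grep, since back-references in -E patterns are an extension, and it uses a stricter pattern than the ones above: the whole `sampleN_out` token is captured and must reappear after a space and before a pipe, so that sample1 is not confused with sample10.

```shell
#!/bin/sh
# Hypothetical input: g1 and g3 repeat a sample, g4 and g5 do not.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
g1: sample1_out|g2039.t1.faa sample1_out|g334.t1.faa sample2_out|g361.t1.faa
g3: sample1_out|g2198.t1.faa sample5_out|g1035.t1.faa sample3_out|g1504.t1.faa sample5_out|g441.t1.faa
g4: sample1_out|g2357.t1.faa sample2_out|g686.t1.faa sample3_out|g1251.t1.faa
g5: sample1_out|g1.t1.faa sample10_out|g2.t1.faa
EOF

# -v drops every line on which some sample token occurs a second time.
kept=$(grep -vE '(sample[0-9]+_out)[|].* \1[|]' "$tmp")
printf '%s\n' "$kept"
rm -f "$tmp"
```

This handles all sample names in one pass, so there is no need to repeat the command per sample.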

Remove line break every nth line using sed

Example: Is there a way to use sed to remove/substitute a pattern in a file for every 3n + 1 and 3n + 2 line?
For example, turn
Line 1n/
Line 2n/
Line 3n/
Line 4n/
Line 5n/
Line 6n/
Line 7n/
...
To
Line 1 Line 2 Line 3n/
Line 4 Line 5 Line 6n/
...
I know this can probably be handled by awk. But what about sed?
Well, I'd just use awk for that [1] since it's a little more complex but, if you're really intent on using sed, the following command will combine groups of three lines into a single line (which appears to be what you're after based on the title and text, despite the strange use of /n for newline):
sed '$!N;$!N;s/\n/ /g'
See the following transcript for how to test this:
$ printf 'Line 1\nLine 2\nLine 3\nLine 4\nLine 5\n' | sed '$!N;$!N;s/\n/ /g'
Line 1 Line 2 Line 3
Line 4 Line 5
The sub-commands are as follows:
$!N will append the next line to the pattern space, but only if you're not on the last line (you do this twice to get three lines). Each line in the pattern space is separated by a newline character.
s/\n/ /g replaces all the newlines in the pattern space with a space character, effectively combining the three lines into one.
[1] With something like:
awk '{if(NR%3==1){s="";if(NR>1){print ""}};printf s"%s", $0;s=" "}'
This is complicated by the likelihood you don't want an extraneous space at the end of each line, necessitating the introduction of the s variable.
Since the sed variant is smaller (and less complex once you understand it), you're probably better off sticking with it. Well, at least up to the point where you want to combine groups of 17 lines, or do something else more complex than sed was meant to handle :-)
The example is for merging 3 consecutive lines although description is different. To generate the example output, you can use awk idiom
awk 'ORS=NR%3?FS:RS' <(seq 1 9)
1 2 3
4 5 6
7 8 9
In your case, the record separator needs to be defined up front so that it includes the literal n/ characters:
awk -v RS="n/\\n" 'ORS=NR%3?FS:RS'
OK. The following are general ways to deal with this using awk and sed.
awk:
awk 'NR % 3 { sub(/pattern/, "substitution") } { print }' file | paste -d' ' - - -
sed:
sed 's/pattern/substitution/; n; s/pattern/substitution/; n' file | paste -d' ' - - -
Both replace pattern with substitution on lines 3n+1 and 3n+2 and leave line 3n untouched.
paste -d' ' - - - then joins every three lines of its standard input into one line, separated by spaces.
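Applied to the input from the question, the sed-and-paste idea might look like the sketch below. The pattern `n/$` and the six sample lines are taken from the question; everything else is just one way to write it.

```shell
#!/bin/sh
# Strip the trailing "n/" on lines 3n+1 and 3n+2, then fold three lines into one.
result=$(printf '%s\n' 'Line 1n/' 'Line 2n/' 'Line 3n/' 'Line 4n/' 'Line 5n/' 'Line 6n/' |
  sed 's|n/$||; n; s|n/$||; n' |
  paste -d' ' - - -)
printf '%s\n' "$result"
```

Each `n` prints the current line and loads the next one, so the third line of every group reaches the end of the script untouched and is printed as is.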

How can I swap two lines using sed?

Does anyone know how to replace line a with line b and line b with line a in a text file using the sed editor?
I can see how to replace a line in the pattern space with a line that is in the hold space (i.e., /^Paco/x or /^Paco/g), but what if I want to take the line starting with Paco and replace it with the line starting with Vinh, and also take the line starting with Vinh and replace it with the line starting with Paco?
Let's assume for starters that there is one line with Paco and one line with Vinh, and that the line Paco occurs before the line Vinh. Then we can move to the general case.
#!/bin/sed -f
/^Paco/ {
:notdone
N
s/^\(Paco[^\n]*\)\(\n\([^\n]*\n\)*\)\(Vinh[^\n]*\)$/\4\2\1/
t
bnotdone
}
After matching /^Paco/ we read into the pattern buffer until s// succeeds (or EOF: the pattern buffer will be printed unchanged). Then we start over searching for /^Paco/.
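To try the script without saving it to a file, it can be passed inline, as in this sketch. The six input lines are made up, and it assumes GNU sed, which understands `\n` inside a bracket expression.

```shell
#!/bin/sh
# Swap the Paco line with the later Vinh line; lines in between keep their place.
result=$(printf '%s\n' 'a' 'Paco 1' 'b' 'c' 'Vinh 2' 'd' | sed '/^Paco/ {
:notdone
N
s/^\(Paco[^\n]*\)\(\n\([^\n]*\n\)*\)\(Vinh[^\n]*\)$/\4\2\1/
t
bnotdone
}')
printf '%s\n' "$result"
```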
cat input | tr '\n' 'ç' | sed 's/\(ç__firstline__\)\(ç__secondline__\)/\2\1/g' | tr 'ç' '\n' > output
Replace __firstline__ and __secondline__ with your desired regexps. Be sure to replace any instance of . in your regexps with [^ç]. If your text actually contains ç, use some other character that it does not contain. Note that GNU tr works on bytes, so in a UTF-8 locale a single-byte placeholder is a safer choice than ç.
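Try this awk script, saved here as test.sh:
s1="$1"
s2="$2"
awk -v s1="$s1" -v s2="$s2" '
{ a[++d]=$0 }                        # buffer every line
$0~s1{ h=$0;ind=d}                   # remember the first line and its position
$0~s2{
    a[ind]=$0                        # put the second line where the first one was
    for(i=1;i<d;i++ ){ print a[i]}   # print the buffer up to, but not including, this line
    print h                          # print the first line in place of this one
    delete a;d=0;
}
END{ for(i=1;i<=d;i++ ){ print a[i] } }' file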
output
$ cat file
1
2
3
4
5
$ bash test.sh 2 3
1
3
2
4
5
$ bash test.sh 1 4
4
2
3
1
5
Use sed only for simple substitutions (or not at all). For anything more complicated, use a programming language.
A simple example from the GNU sed texinfo doc:
Note that on implementations other than GNU `sed' this script might
easily overflow internal buffers.
#!/usr/bin/sed -nf
# reverse all lines of input, i.e. first line became last, ...
# from the second line, the buffer (which contains all previous lines)
# is *appended* to current line, so, the order will be reversed
1! G
# on the last line we're done -- print everything
$ p
# store everything on the buffer again
h
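The same script collapses to a one-liner, which makes it easy to try on a few lines. This is just a sketch with made-up input:

```shell
#!/bin/sh
# 1!G : from line 2 on, append the hold space (all earlier lines) to the current line
# $p  : on the last line, print the accumulated, reversed text
# h   : save the pattern space back into the hold space
result=$(printf '%s\n' one two three | sed -n '1!G;$p;h')
printf '%s\n' "$result"
```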
