Using sed to delete specific lines after LAST occurrence of pattern - linux

I have a file that looks like:
this: name
this: age
Remove these lines and space above.
Remove here too and space below
Keep everything below here.
I don't want to hardcode 2, as the number of lines containing "this" can change. How can I delete the 4 lines after the last occurrence of the string? I am trying sed -e '/this: /{n;N;N;N;N;d}', but it deletes after the first occurrence of the string.

Could you please try the following.
awk '
FNR==NR{
if($0~/this/){
line=FNR
}
next
}
FNR<=line || FNR>(line+4)
' Input_file Input_file
With the samples shown, the output is as follows.
this: name
this: age
Keep everything below here.
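For anyone who wants to try this without creating Input_file by hand, here is a minimal sketch of the same two-pass idea. The temporary file, the variable names last and n, and the blank lines placed around the two "Remove" lines are assumptions (the question's wording "space above" and "space below" suggests them).

```shell
# Hypothetical sample file; the blank lines are an assumption about the real input.
sample=$(mktemp)
printf '%s\n' 'this: name' 'this: age' '' \
    'Remove these lines and space above.' \
    'Remove here too and space below' '' \
    'Keep everything below here.' > "$sample"

# Pass 1 (FNR==NR) remembers the number of the last matching line.
# Pass 2 prints every line except the n lines that follow it.
result=$(awk -v n=4 '
FNR == NR { if ($0 ~ /^this:/) last = FNR; next }
FNR <= last || FNR > last + n
' "$sample" "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Note that if no line matches at all, last stays 0 and the first n lines are dropped, so a guard may be worth adding for real data.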

You can also use this minor change to make your original sed command work.
sed '/^this:/ { :k ; n ; // b k ; N ; N ; N ; d }' input_file
It uses a loop which prints the current line and reads the next one (n) while it keeps matching the regex (the empty regex // recalls the latest one evaluated, i.e. /^this:/, and the command b k goes back to the label k on a match). Then you can append the next 3 lines and delete the whole pattern space as you did.
Another possibility, more concise, using GNU sed could be this.
sed '/^this:/ b ; /^/,$ { //,+3 d }' input_file
This one prints any line beginning with this: (b without label goes directly to the next line cycle after the default print action).
On the first line not matching this:, two nested ranges are triggered. The outer range is "one-shot": it is triggered right away by /^/, which matches any line, and it stays triggered up to the last line ($). The inner range is a "toggle" range. It is also triggered right away because // recalls /^/ on this line (and only on this line, hence the one-shot outer range), then it stays triggered for 3 additional lines (the end address +3 is a GNU extension). After that, /^/ is no longer evaluated, so the inner range cannot trigger again: // now recalls /^this:/, which cannot match here because matching lines have already branched away early.

This might work for you (GNU sed):
sed -E ':a;/this/n;//ba;$!N;$!ba;s/^([^\n]*\n?){4}//;/./!d' file
If the pattern space (PS) contains this, print the PS and fetch the next line.
If the following line contains this repeat.
If the current line is not the last line, append the next line and repeat.
Otherwise, remove the first four lines of the PS and print the remainder.
Unless the PS is empty in which case delete the PS entirely.
N.B. This only reads the file once. Also the OP says
How can I delete 4 lines after the last occurrence of the string
However the example would seem to expect 5 lines to be deleted.
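If a single read matters (for example when the data comes from a pipe), roughly the same effect can be had in awk by buffering the lines in memory. This is only a sketch: the variable names and the sample text, including its blank lines, are assumptions, and it holds the whole input in memory.

```shell
result=$(printf '%s\n' 'this: name' 'this: age' '' \
    'Remove these lines and space above.' \
    'Remove here too and space below' '' \
    'Keep everything below here.' |
    awk -v n=4 '
        { line[NR] = $0 }            # buffer every line
        /^this:/ { last = NR }       # remember the last matching line number
        END {
            # print everything except the n lines after the last match
            for (i = 1; i <= NR; i++)
                if (i <= last || i > last + n) print line[i]
        }')

printf '%s\n' "$result"
```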

Related

Repeat each line multiple times and add ascending numbers

I would like to have each line in a file repeated a fixed number of times, with ascending numbers added, like this:
I have
wwx.domain.com/pageA/?page=1
wwx.domain.com/pageB/?page=1
wwx.domain.com/pageC/?page=1
I want
wwx.domain.com/pageA/?page=1
wwx.domain.com/pageA/?page=2
wwx.domain.com/pageA/?page=3
wwx.domain.com/pageB/?page=1
wwx.domain.com/pageB/?page=2
wwx.domain.com/pageB/?page=3
wwx.domain.com/pageC/?page=1
wwx.domain.com/pageC/?page=2
wwx.domain.com/pageC/?page=3
How can I do this?
awk '{ sub(/.$/,""); for(i=1; i<4; i++) print $0 i }' inputfile > outputfile
Explanation: Remove the last character from the input line, and in a loop print the (modified) input line followed by the loop index.
This might work for you (GNU sed):
sed -E 'h;:a;s/[^\n]*/&/3;t;x;s/(.*=)(.*)/echo "\1$((\2+1))"/e;x;G;ta' file
While the pattern space contains fewer than n (in this case 3) lines, append the current line, incrementing the last field by 1.
The solution uses the hold space to keep the last incremented line and shell arithmetic to increment the last field of the line.
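The awk answer above strips exactly one trailing character, so it relies on every line ending in a single digit. A possible variant, sketched here under the assumption that the number is whatever follows the last "=", increments that number instead. The variable names n, start, and prefix are made up for the example.

```shell
input='wwx.domain.com/pageA/?page=1
wwx.domain.com/pageB/?page=1'

result=$(printf '%s\n' "$input" | awk -v n=3 '
{
    start = $0;  sub(/.*=/, "", start)       # the number after the last "="
    prefix = $0; sub(/[^=]*$/, "", prefix)   # everything up to and including the last "="
    for (i = 0; i < n; i++)
        print prefix (start + i)
}')

printf '%s\n' "$result"
```

With this form a line ending in page=10 would yield 10, 11, 12 rather than a mangled suffix.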

How to remove newline Line Feed based on a condition in Unix [duplicate]

This exercise is from the AWK one-liners explained blog post by Peteris Krumins
Essentially this line
awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'
joins every line ending with backslash with the next line:
e.g. input
12345\
6789
523435\
00000
Output
123456789
52343500000
The blog post says:
Unfortunately this one liner fails to join more than 2 lines (this is left as an exercise to the reader to come up with a one-liner that joins arbitrary number of lines that end with backslash :)).
So if you run the AWK one-liner above on an input file in which 2 or more consecutive lines end with a backslash (input2), it gives an incorrect answer (output2).
e.g. input2
12345\
6789\
523435\
00000
Output 2 - INCORRECT
123456789\
52343500000
I think, according to the post, the output should instead be output3:
Output 3 - CORRECT
12345678952343500000
How can one solve this problem (input as input2 and getting output3)?
Try the following:
awk '/\\$/ { printf "%s", substr($0, 1, length($0)-1); next } 1' <<'EOF'
12345\
6789\
523435\
00000
EOF
which yields
12345678952343500000
This demonstrates that 3 consecutive (or more) line continuations work fine, unlike with the command in the question.
Explanation of the command:
/\\$/ matches a \ at the end ($) of a line, signaling line continuation.
substr($0, 1, length($0)-1) removes that trailing \ from the input line, $0.
By using printf "%s", the (modified) current line is printed without a trailing newline, which means that whatever print command comes next will directly append to it, effectively joining the current and the next line.
next finishes processing of the current line.
1 is a common awk idiom that is shorthand for { print }, i.e., for simply printing the input line (with a trailing \n).
As for why the original command doesn't work:
awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1'
On encountering a line-continuation character (\ at the end of the current line), getline t reads the next line from the file into t, and print then emits it as is after the current line.
next then finishes processing of both the current and - thanks to the getline call - the next line, so that the next script cycle processes the line after the next line (2 lines from the current one).
Therefore, since the line read via getline is blindly printed and not examined in any way, it is skipped with respect to line-continuation-character processing.
In general, as Ed Morton points out in a comment, use of getline is rarely the right solution and can lead to subtle bugs - see http://awk.info/?tip/getline.
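One edge case the printf approach leaves open is a backslash on the very last line, where the output would end without a newline. A possible variant, offered only as a sketch, collects the joined text in a variable (buf, a name chosen for this example) and flushes it in an END block. The extra 'tail\' input line is made up to exercise that case.

```shell
result=$(printf '%s\n' '12345\' '6789\' '523435\' '00000' 'tail\' | awk '
    { buf = buf $0 }                       # append the current line to the buffer
    /\\$/ { sub(/\\$/, "", buf); next }    # continuation: drop the trailing backslash, keep reading
    { print buf; buf = "" }                # complete logical line: print and reset
    END { if (buf != "") print buf }       # flush an unterminated continuation
')

printf '%s\n' "$result"
```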

sed - Delete lines only if they contain multiple instances of a string

I have a text file that contains numerous lines that have partially duplicated strings. I would like to remove lines where a string match occurs twice, such that I am left only with lines with a single match (or no match at all).
An example output:
g1: sample1_out|g2039.t1.faa sample1_out|g334.t1.faa sample1_out|g5678.t1.faa sample2_out|g361.t1.faa sample3_out|g1380.t1.faa sample4_out|g597.t1.faa
g2: sample1_out|g2134.t1.faa sample2_out|g1940.t1.faa sample2_out|g45.t1.faa sample4_out|g1246.t1.faa sample3_out|g2594.t1.faa
g3: sample1_out|g2198.t1.faa sample5_out|g1035.t1.faa sample3_out|g1504.t1.faa sample5_out|g441.t1.faa
g4: sample1_out|g2357.t1.faa sample2_out|g686.t1.faa sample3_out|g1251.t1.faa sample4_out|g2021.t1.faa
In this case I would like to remove lines 1, 2, and 3 because sample1 is repeated multiple times on line 1, sample 2 is twice on line 2, and sample 5 is repeated twice on line 3. Line 4 would pass because it contains only one instance of each sample.
I am okay repeating this operation multiple times using different 'match' strings (e.g. sample1_out , sample2_out etc in the example above).
Here is one in GNU awk:
$ awk -F"[| ]" '{ # pipe or space is the field separator
delete a # delete previous hash
for(i=2;i<=NF;i+=2) # iterate every other field, i.e. the sample name to the right of each space
if($i in a) # if it has been seen already
next # skip this record
else # well, else
a[$i] # hash this entry
print # output if you make it this far
}' file
Output:
g4: sample1_out|g2357.t1.faa sample2_out|g686.t1.faa sample3_out|g1251.t1.faa sample4_out|g2021.t1.faa
The following sed command will accomplish what you want.
sed -ne '/.* \(.*\)|.*\1.*/!p' file.txt
grep: grep -vE '(sample[0-9]).*\1' file
Inspired by Glenn's answer; add -i to have sed change the file in place.
sed -r '/(sample[0-9]).*\1/d' txt_file
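One thing to watch with the sample[0-9] regex is that sample1 is also a prefix of sample10, so a line holding sample1_out and sample10_out once each would be removed as well. A sketch that compares the whole name before the "|" avoids this. The input lines below are made up for illustration, and seen and part are arbitrary variable names.

```shell
input='g1: sample1_out|g1.t1.faa sample1_out|g2.t1.faa
g2: sample1_out|g3.t1.faa sample10_out|g4.t1.faa
g3: sample2_out|g5.t1.faa sample3_out|g6.t1.faa'

result=$(printf '%s\n' "$input" | awk '{
    split("", seen)                    # portable way to empty the array
    for (i = 2; i <= NF; i++) {
        split($i, part, "[|]")         # part[1] is the sample name before the pipe
        if (part[1] in seen) next      # repeated sample: skip this line
        seen[part[1]] = 1
    }
    print                              # no repeats found
}')

printf '%s\n' "$result"
```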

Output only the first pattern-line and its following line

I need to filter the output of a command.
I tried this.
bpeek | grep nPDE
My problem is that I need all matches of nPDE and the line after each match. So the output would be like:
iteration nPDE
1 1
iteration nPDE
2 4
The best case would be if it would show me the found line only once and then only the line after it.
I found solutions with awk, but as far as I know awk can only read files.
There is an option for that.
grep --help
...
-A, --after-context=NUM print NUM lines of trailing context
Therefore:
bpeek | grep -A 1 'nPDE'
With awk (for completeness since you have grep and sed solutions):
awk '/nPDE/{c=2} c&&c--'
grep -A works if your grep supports it (it's not in POSIX grep). If it doesn't, you can use sed:
bpeek | sed '/nPDE/!d;N'
which does the following:
/nPDE/!d # If the line doesn't match "nPDE", delete it (starts new cycle)
N # Else, append next line and print them both
Notice that this would fail to print the right output for this file
nPDE
nPDE
context line
If you have GNU sed, you can use an address range as follows:
sed '/nPDE/,+1!d'
Addresses of the format addr1,+N define the range between addr1 (in our case /nPDE/) and the following N lines. This solution is easier to adapt to a different number of context lines, but still fails with the example above.
A solution that manages cases like
blah
nPDE
context
blah
blah
nPDE
nPDE
context
nPDE
would look like
sed -n '/nPDE/{$p;:a;N;/\n[^\n]*nPDE[^\n]*$/!{p;b};ba}'
doing the following:
/nPDE/ { # If the line matches "nPDE"
$p # If we're on the last line, just print it
:a # Label to jump to
N # Append next line to pattern space
/\n[^\n]*nPDE[^\n]*$/! { # If appended line does not contain "nPDE"
p # Print pattern space
b # Branch to end (start new loop)
}
ba # Branch to label (appended line contained "nPDE")
}
All other lines are not printed because of the -n option.
As pointed out in Ed's comment, this is neither readable nor easily extended to a larger amount of context lines, but works correctly for one context line.
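For the "best case" mentioned in the question (show the matching line only once, then just the line after each match), something along these lines might do. It is a sketch: the printf input stands in for the real bpeek output, and seen and grab are names chosen for the example. awk reads standard input when no file is given, so it can sit at the end of a pipe just like grep.

```shell
result=$(printf '%s\n' 'iteration nPDE' '1 1' 'other output' 'iteration nPDE' '2 4' |
    awk '/nPDE/ { if (!seen++) print; grab = 1; next }   # print the header only the first time
         grab   { print; grab = 0 }')                    # print the single line after each match

printf '%s\n' "$result"
```

In real use the pipeline would start with bpeek instead of printf.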

delete a line after a pattern only if it is blank using sed or awk

I want to delete a blank line only if this one is after the line of my pattern using sed or awk
for example if I have
G
O TO P999-ERREUR
END-IF.
the pattern in this case is G
I want to have this output
G
O TO P999-ERREUR
END-IF.
This will do the trick:
$ awk -v n=-2 'NR==n+1 && !NF{next} /G/ {n=NR}1' file
G
O TO P999-ERREUR
END-IF.
Explanation:
-v n=-2 # Set n=-2 before the script is run to avoid not printing the first line
NR == n+1 # If the current line number is equal to the matching line + 1
&& !NF # And the line is empty
{next} # Skip the line (don't print it)
/G/ # The regular expression to match
{n = NR} # Save the current line number in the variable n
1 # Truthy value used as shorthand to print every (non-skipped) line
Using sed
sed '/GG/{N;s/\n$//}' file
If it sees GG, it appends the next line and, if that line is empty, removes the newline between them.
Note that this removes only one blank line after the match, and the line must be truly empty, i.e. contain no spaces or tabs.
This might work for you (GNU sed):
sed -r 'N;s/(G.*)\n\s*$/\1/;P;D' file
Keep a moving window of two lines throughout the length of the file and remove a newline (and any whitespace) if it follows the intended pattern.
Using ex (edit in-place):
ex +'/G/j' -cwq foo.txt
or print to the standard output (from file or stdin):
ex -s +'/GG/j|%p|q!' file_or_/dev/stdin
where:
/GG/j - joins the next line when the pattern is found
%p - prints the buffer
q! - quits
For conditional checking (if there is a blank line), try:
ex -s +'%s/^\(G\)\n/\1/' +'%p|q!' file_or_/dev/stdin
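The same condition can also be written in awk with a flag instead of line-number arithmetic. This is only a sketch: the pattern ^GG$ and the sample lines, including where the blank lines sit, are assumptions, and after_match is a name picked for the example.

```shell
result=$(printf '%s\n' 'GG' '' 'GO TO P999-ERREUR' '' 'END-IF.' |
    awk -v pat='^GG$' '
        after_match && !NF { after_match = 0; next }   # drop one blank line right after a match
        { after_match = ($0 ~ pat) }                   # remember whether this line matched
        1                                              # print every line that was not skipped
    ')

printf '%s\n' "$result"
```

Only the blank line directly after the match is removed; the later blank line before END-IF. is kept.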
