How to remove only the first occurrence of a line in a file using sed - linux

I have the following file
titi
tata
toto
tata
If I execute
sed -i "/tat/d" file.txt
It will remove all the lines containing tat, leaving the file with:
titi
toto
but I want to remove only the first line that occurs in the file containing tat:
titi
toto
tata
How can I do that?

You could make use of the two-address form (the 0,/regexp/ address is a GNU sed extension):
sed '0,/tat/{/tat/d;}' inputfile
This deletes only the first line that matches the pattern.
Quoting from info sed:
A line number of `0' can be used in an address specification like
`0,/REGEXP/' so that `sed' will try to match REGEXP in the first
input line too. In other words, `0,/REGEXP/' is similar to
`1,/REGEXP/', except that if ADDR2 matches the very first line of
input the `0,/REGEXP/' form will consider it to end the range,
whereas the `1,/REGEXP/' form will match the beginning of its
range and hence make the range span up to the _second_ occurrence
of the regular expression.
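The difference between the two address forms only shows when the very first line matches. The following is a small sketch, assuming GNU sed and a made-up four-line input, that runs both forms on the same data:

```shell
# Assumes GNU sed: the 0,/re/ address form is a GNU extension.
# The input is illustrative; its first line already matches the pattern.
input='tata
titi
tata
toto'

# 0,/tat/ may end the range on line 1, so only that line is deleted.
with_zero=$(printf '%s\n' "$input" | sed '0,/tat/{/tat/d;}')

# 1,/tat/ starts on line 1 and looks for the range end from line 2 on,
# so both "tata" lines fall inside the range and are deleted.
with_one=$(printf '%s\n' "$input" | sed '1,/tat/{/tat/d;}')

printf 'with 0:\n%s\n\nwith 1:\n%s\n' "$with_zero" "$with_one"
```

With the 0 form the second tata survives; with the 1 form it does not.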

If you can use awk, then this makes it:
$ awk '/tata/ && !f{f=1; next} 1' file
titi
toto
tata
To save your result in the current file, do
awk '...' file > tmp_file && mv tmp_file file
Explanation
The idea is to set a flag the first time tata is matched and skip that line. Once the flag is set, no further lines are skipped.
/tata/ matches lines that contain the string tata.
!f is true only while the flag f is still unset, so the block runs for the first match only.
{f=1; next} sets the flag f to 1 and then skips the line.
1, as a true value, triggers the default awk action: {print $0}.
Another approach, by Tom Fenech
awk '!/tata/ || f++' file
|| stands for OR, so this condition is true, and hence the line is printed, whenever either of these holds:
tata is not found in the line.
f++ is true. This is the tricky part: f defaults to 0, so the first f++ returns 0 (false) and the line is not printed. From then on, f++ returns a positive integer and is true. Because || short-circuits, f++ is only evaluated on lines that contain tata.
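As a minimal sketch of the same counter idiom, the pattern can be passed in as a variable so it is not hard-coded. The function name drop_first and the variable names re and seen are made up for this illustration:

```shell
# Hypothetical helper: copy stdin to stdout, dropping the first line
# that matches the regex given as $1.
drop_first() {
    # seen++ is evaluated only when the line matches (|| short-circuits).
    # It yields 0 (false) the first time and a positive number afterwards.
    awk -v re="$1" '$0 !~ re || seen++'
}

first_dropped=$(printf 'titi\ntata\ntoto\ntata\n' | drop_first tat)
printf '%s\n' "$first_dropped"
```

Note that a regex passed with -v goes through awk's string escape processing, so patterns containing backslashes may need them doubled.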

Here's the general way to do it:
$ cat file
1 titi
2 tata
3 toto
4 tata
5 foo
6 tata
7 bar
$
$ awk '/tat/{ if (++f == 1) next} 1' file
1 titi
3 toto
4 tata
5 foo
6 tata
7 bar
$
$ awk '/tat/{ if (++f == 2) next} 1' file
1 titi
2 tata
3 toto
5 foo
6 tata
7 bar
$
$ awk '/tat/{ if (++f ~ /^(1|2)$/) next} 1' file
1 titi
3 toto
5 foo
6 tata
7 bar
Note that with the above approach you can skip whatever occurrence(s) of an RE you like (1st, 2nd, 1st and 2nd, whatever) and you only specify the RE once (as opposed to having to duplicate it for some alternative solutions).
Clear, simple, obvious, easily maintainable, extensible, etc....
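A possible way to package this approach is a small shell function that takes the regex and the occurrence number as arguments. This is only a sketch: drop_nth, re, n and count are names invented here.

```shell
# Hypothetical helper: drop the n-th line matching a regex.
#   $1 = regex, $2 = which occurrence to drop (1, 2, ...)
drop_nth() {
    # count is incremented only on matching lines; the line whose
    # count equals n is skipped, everything else is printed.
    awk -v re="$1" -v n="$2" '$0 ~ re && ++count == n { next } 1'
}

numbered='1 titi
2 tata
3 toto
4 tata
5 foo
6 tata
7 bar'

second_dropped=$(printf '%s\n' "$numbered" | drop_nth tat 2)
printf '%s\n' "$second_dropped"
```

Here the call drops line "4 tata", the second match, and keeps the other six lines.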

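Here is one way of doing it with sed. It reads the whole file into the pattern space and removes the first line that starts with tat. Because the substitution is anchored on a preceding newline, it does not handle a match on the very first line: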
sed ':a;$!{N;ba};s/\ntat[^\n]*//' file
titi
toto
tata

This might work for you (GNU sed):
sed '/pattern/{x;//!d;x}' file
Print all lines other than those containing the pattern as normal. Otherwise if the line contains the pattern and hold space does not (the first occurrence), delete that line.

You may find the first matching line number with grep and pass it to sed for deletion.
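sed "$( (grep -nm1 tat file.txt || echo 1000000000:) | cut -f 1 -d:) d" file.txt
grep -n combined with cut finds the line number to be deleted. grep -m1 ensures at most one line number is found. echo handles the case when there is no match: it supplies a line number far beyond the end of the file, so that sed still receives a valid address and deletes nothing. sed "[line number] d" deletes the line. The space in "$( (" keeps the shell from reading "$((" as the start of an arithmetic expansion.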
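The same idea can be written without the placeholder line number by testing whether grep found anything. This is a sketch under the same assumption as the command above, namely that grep supports -m (GNU and BSD grep do; it is not in POSIX). The function name delete_first_match is made up.

```shell
# Hypothetical helper: print FILE without its first line matching PATTERN.
delete_first_match() {
    pattern=$1
    file=$2
    # grep -n prefixes each match with "N:"; -m1 stops after the first one.
    line=$(grep -n -m1 -e "$pattern" "$file" | cut -d: -f1)
    if [ -n "$line" ]; then
        sed "${line}d" "$file"
    else
        cat "$file"    # no match: pass the file through unchanged
    fi
}

tmp=$(mktemp)
printf 'titi\ntata\ntoto\ntata\n' > "$tmp"
result=$(delete_first_match tat "$tmp")
unchanged=$(delete_first_match zzz "$tmp")
rm -f "$tmp"
printf '%s\n' "$result"
```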

Related

I want to remove multiple line of text on linux

Just like this.
Before:
1
19:22
abcde

2
19:23

3
19:24
abbff

4
19:25
abbc
After:
1
19:22
abcde

3
19:24
abbff

4
19:25
abbc
I want to remove the sections that contain no letters, like section 2.
I think that I should use perl or sed, but I don't know how to do it.
I tried this, but it didn't work:
sed 's/[0-9]\n[0-9]\n%s\n//'
sed is for doing s/old/new/ on individual lines, that is all. For anything else you should be using awk:
$ awk -v RS= -v ORS='\n\n' '/[[:alpha:]]/' file
1
19:22
abcde
3
19:24
abbff
4
19:25
abbc
The above is simply this:
RS= tells awk the input records are separated by blank lines.
ORS='\n\n' tells awk the output records must also be separated by blank lines.
/[[:alpha:]]/ searches for and prints records that contain alphabetic characters.
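To see paragraph mode in action, here is a small sketch on three blank-line-separated records. It uses the bracket expression [A-Za-z] instead of [[:alpha:]], which is an assumption made here only so that it also runs on awk versions without POSIX character classes.

```shell
# Three records separated by blank lines; the middle one has no letters.
records='1
19:22
abcde

2
19:23

3
19:24
abbff'

# RS= switches awk to paragraph mode: each blank-line-separated block is
# one record. Records without a letter are not printed.
kept=$(printf '%s\n' "$records" | awk -v RS= -v ORS='\n\n' '/[A-Za-z]/')
printf '%s\n' "$kept"
```

Because ORS is '\n\n', awk's real output ends with a blank line; the command substitution strips it here.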
Simple enough in Perl. The secret is to put Perl in "paragraph mode" by setting the input record separator ($/) to an empty string. Then we only print records if they contain a letter.
#!/usr/bin/perl
use strict;
use warnings;
# Paragraph mode
local $/ = '';
# Read from STDIN a record (i.e. paragraph) at a time
while (<>) {
    # Only print records that include a letter
    print if /[a-z]/i;
}
This is written as a Unix filter, i.e. it reads from STDIN and writes to STDOUT. So if it's in an executable file called filter in the current directory, you can call it like this:
$ ./filter < your_input_file > your_output_file
Alternatively, this is a simple command-line script in Perl (-00 is the command-line option to put Perl into paragraph mode):
$ perl -00 -ne 'print if /[a-z]/i' < your_input_file > your_output_file
If there's exactly one blank line after each paragraph you can use a long awk oneliner (three patterns, so probably not a oneliner actually):
$ echo '1
19:22
abcde
2
19:23
3
19:24
abbff
4
19:25
abbc
' | awk '/[^[:space:]]/ { accum = accum $0 "\n" } /^[[:space:]]*$/ { if(on) print accum $0; on = 0; accum = "" } /[[:alpha:]]/ { on = 1 }'
1
19:22
abcde
3
19:24
abbff
4
19:25
abbc
The idea is to accumulate non-blank lines, setting flag once an alphabetical character found, and on a blank input line, flush the whole accumulated paragraph if that flag is set, reset accum to empty string and reset flag to zero.
(Note that if the last line of input is not necessarily empty you might need to add an END block that checks if currently there's a paragraph unflushed and flush it as needed.)
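One way the END block mentioned above could look is sketched below. The function name keep_alpha_paragraphs is invented, and [A-Za-z] and [ \t] stand in for the POSIX classes as a portability assumption.

```shell
# Hypothetical helper: keep only paragraphs that contain a letter, even
# when the input does not end with a blank line.
keep_alpha_paragraphs() {
    awk '
        /[^ \t]/   { accum = accum $0 "\n" }   # collect non-blank lines
        /[A-Za-z]/ { on = 1 }                  # paragraph has a letter
        /^[ \t]*$/ { if (on) print accum       # blank line: flush or drop
                     on = 0; accum = "" }
        END        { if (on) printf "%s", accum }   # flush unterminated tail
    '
}

filtered=$(printf '1\n19:22\nabcde\n\n2\n19:23\n\n3\n19:24\nabbff\n' |
    keep_alpha_paragraphs)
printf '%s\n' "$filtered"
```

The last paragraph of this input has no blank line after it, so it is printed by the END rule rather than by the blank-line rule.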
This might work for you (GNU sed):
sed ':a;$!{N;/^$/M!ba};/[[:alpha:]]/!d' file
Gather up lines delimited by an empty line or end-of-file and delete the latest collection if it does not contain an alpha character.
This presupposes that the file format is fixed as in the example. To be more accurate use:
sed -r ':a;$!{N;/^$/M!ba};/^[1-9][0-9]*\n[0-9]{2}:[0-9]{2}\n[[:alpha:]]+\n?$/!d' file
Similar to the solution of Ed Morton but with the following assumptions:
The text blocks consist of 2 or 3 lines.
If there is a third line, it contains characters from any alphabet.
In essence, under these conditions we only need to check for a third field:
awk 'BEGIN{RS="";ORS="\n\n";FS="\n"}(NF>=3)' file
or similar without BEGIN:
awk -v RS= -v ORS='\n\n' -F '\n' '(NF>=3)' file

grep string after first occurrence of numbers

How do I get a string after the first occurrence of a number?
For example, I have a file with multiple lines:
34 abcdefg
10 abcd 123
999 abc defg
I want to get the following output:
abcdefg
abcd 123
abc defg
Thank you.
You could use Awk for this: loop through the fields of each line up to NF (the last field) and, at the first field that contains a digit, print the field after it. The break statement exits the for loop at that first match. Note that this prints only one field, so for the line 10 abcd 123 it gives abcd rather than abcd 123.
awk '{ for(i=1;i<=NF;i++) if ($i ~ /[[:digit:]]+/) { print $(i+1); break } }' file
It is not clear what you exactly want, but you can try to express it in sed.
Remove everything until the first digit, the next digits and any spaces.
sed 's/[^0-9]*[0-9]\+ *//'
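The \+ operator in that expression is a GNU extension to basic regular expressions. A sketch of the same substitution in portable syntax, wrapped in a function whose name strip_leading_number is made up here, could look like this:

```shell
# Hypothetical helper: remove everything up to and including the first
# run of digits, plus the spaces that follow it.
strip_leading_number() {
    # [0-9][0-9]* is the portable spelling of "one or more digits".
    sed 's/^[^0-9]*[0-9][0-9]* *//'
}

stripped=$(printf '34 abcdefg\n10 abcd 123\n999 abc defg\n' | strip_leading_number)
printf '%s\n' "$stripped"
```

Lines without any digit are left unchanged, because the substitution needs at least one digit to match.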
Imagine the following input file:
001 ham
03spam
3 spam with 5 eggs
A quick solution with awk would be:
awk '{sub(/[^0-9]*[0-9]+/,"",$0); print $1}' <file>
This removes the first run of digits, together with any non-digit characters before it, by replacing it with the empty string (""). Because $0 is modified, the fields are re-split, and you can print either the first field or the whole remaining line. This command gives exactly the following output:
ham
spam
spam
If you are interested in the remainder of the line
awk '{sub(/[^0-9]*[0-9]+ */,"",$0); print $0}' <file>
This will have as an output :
ham
spam
spam with 5 eggs
Be aware that the extra " *" in the regular expression is needed to remove the spaces that follow the number. Without it, the leading space is kept:
awk '{sub(/[^0-9]*[0-9]+/,"",$0); print $0}' <file>
 ham
spam
 spam with 5 eggs
You can remove the first run of digits and spaces using sed:
sed -E 's/[0-9 ]+//' file
grep can do the job:
$ grep -o -P '(?<=[0-9] ).*' inputFIle
abcdefg
abcd 123
abc defg
For completeness, here is a solution with perl:
$ perl -lne 'print $1 if /[0-9]+\s*(.*)/' inputFIle
abcdefg
abcd 123
abc defg

How to remove odd lines except for first line using SED or AWK

I have the following file
# header1 header2
zzzz yyyy
1
kkkkk wwww
2
What I want to do is to remove odd lines except the header
yielding:
# header1 header2
zzzz yyyy
kkkkk wwww
I tried this but it removes the header too
awk 'NR%2==0'
What's the right way to do it?
This works with GNU sed:
sed '3~2d' ip.txt
This deletes every second line starting from the 3rd (lines 3, 5, 7, and so on). The first~step address form is a GNU extension.
Example:
$ seq 10 | sed '3~2d'
1
2
4
6
8
10
awk 'NR==1 || NR%2==0'
If the record number is 1 or is even, print it.
awk 'NR % 2 == 0 || NR == 1'
Reversing the comparisons might be marginally faster. The difference probably isn't measurable. (And the choice of spacing is essentially immaterial too.)
You just need
awk 'NR==1 || NR%2==0' file
This keeps the header line intact and applies the rule NR%2==0, which is true only for even-numbered lines (counting from the header); those lines are printed.
Another variant of the same above answer
awk 'NR==1 || !(NR%2)' file
For even lines, NR%2 is 0, and its negation is true, so the line is printed.
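As a quick check, the awk condition can be run on the sample from the question. The function name keep_header_and_even is made up for this sketch.

```shell
# Hypothetical helper: keep line 1 (the header) and every even-numbered line.
keep_header_and_even() {
    awk 'NR == 1 || NR % 2 == 0'
}

table=$(printf '# header1 header2\nzzzz yyyy\n1\nkkkkk wwww\n2\n' |
    keep_header_and_even)
printf '%s\n' "$table"
```

This prints the header followed by the zzzz yyyy and kkkkk wwww lines.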
sed '1!{N;P;d}'
1! On lines other than the first (the default behavior echoes the first line)
N append the next line to the current line
P print only the first of the two
d delete them both.
This might work for you (GNU sed):
sed '1b;n;d' file
But:
sed '3~2d' file
Is far neater.

How can I remove double line breaks with sed?

I tried:
sed -i 's/\n+/\n/' file
but it's not working.
I still want single line breaks.
Input:
abc
def
ghi
jkl
Desired output:
abc
def
ghi
jkl
This might work for you (GNU sed):
sed '/^$/{:a;N;s/\n$//;ta}' file
This replaces multiple blank lines by a single blank line.
However if you want to place a blank line after each non-blank line then:
sed '/^$/d;G' file
Which deletes all blank lines and only appends a single blank line to a non-blank line.
Sed isn't very good at tasks that examine multiple lines programmatically. Here is the closest I could get:
$ sed '/^$/{n;/^$/d}' file
abc
def
ghi
jkl
The logic of this: if you find a blank line, look at the next line. If that next line is also blank, delete that next line.
This only collapses blank lines in pairs, so it does not squeeze every run down to one: two consecutive blank lines become one, but three become two.
To do it in basic awk:
$ awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2' file
abc
def
ghi
jkl
This uses a variable called blank, which is zero when the number of fields (NF) is nonzero and increments when they are zero (a blank line). Awk's default action, printing, is performed when the number of consecutive blank lines is less than two.
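A short sketch of the counter in use, with runs of two and three blank lines in a made-up input (squeeze_blank is a name invented here):

```shell
# Hypothetical helper: squeeze every run of blank lines down to one.
squeeze_blank() {
    # blank counts consecutive empty lines; a line is printed only while
    # the count is below 2, i.e. text lines and the first blank of a run.
    awk 'NF > 0 { blank = 0 } NF == 0 { blank++ } blank < 2'
}

squeezed_awk=$(printf 'abc\n\n\ndef\n\n\n\nghi\njkl\n' | squeeze_blank)
printf '%s\n' "$squeezed_awk"
```

Since NF is 0 for whitespace-only lines as well, lines containing only spaces or tabs are counted as blank too.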
Using awk (gnu or BSD) you can do:
awk -v RS= -v ORS='\n\n' '1' file
abc
def
ghi
jkl
Also using perl:
perl -pe '$/=""; s/(\n)+/$1$1/' file
abc
def
ghi
jkl
Found here That's What I Sed (slower than this solution).
sed '/^$/N;/\n$/D' file
The sed script can be read as follows:
If the current line is empty and the next line is also empty, delete the current line.
And can be translated into the following pseudo-code (for the reader already familiar with sed, buffer refers to the pattern space):
1 | # sed '/^$/N;/\n$/D' file
2 | while not end of file :
3 | buffer = next line
4 | # /^$/N
5 | if buffer is empty : # /^$/
6 | buffer += "\n" + next line # N
7 | end if
8 | # /\n$/D
9 | if buffer ends with "\n" : # /\n$/
10 | delete first line in buffer and go to 5 # D
11 | end if
12 | print buffer
13 | end while
In the regular expression /^$/, the ^ and $ signs mean "beginning of the buffer" and "end of the buffer" respectively. They refer to the edges of the buffer, not to the content of the buffer.
The D command performs the following tasks: if the buffer contains a newline, delete the text of the buffer up to and including the first newline, and restart the program cycle from the first command without processing the rest of the commands, without printing the buffer, and without reading a new line of input.
Finally, keep in mind that sed removes the trailing newline before processing the line, and keep in mind that the print command adds back the trailing newline. So, in the above code, if the next line to be processed is Hello World!\n, then next line implicitly refers to Hello World!.
More details at https://www.gnu.org/software/sed/manual/sed.html.
You are now ready to apply the algorithm to the following file:
a\n
b\n
\n
\n
\n
c\n
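The result of working through that file by hand can be compared against sed itself. A minimal sketch, feeding the same six lines through the script:

```shell
# The six-line file from above: a, b, three empty lines, c.
squeezed_sed=$(printf 'a\nb\n\n\n\nc\n' | sed '/^$/N;/\n$/D')
printf '%s\n' "$squeezed_sed"
```

The three empty lines are reduced to one, so the output is a, b, an empty line, and c.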
Now let's see why this solution is faster.
The sed script /^$/{:a;N;s/\n$//;ta} can be read as follows:
If the current line matches /^$/, then do {:a;N;s/\n$//;ta}.
Since there is nothing between ^ and $ we can rephrase like this:
If the current line is empty, then do {:a;N;s/\n$//;ta}.
It means that sed executes the following commands for each empty line:
Step
Command
Description
1
:a
Declare a label named "a".
2
N
Append the next line preceded by a newline (\n) to the current line.
3
s/\n$//
Substitute (s) any trailing newline (/\n$/) with nothing (//).
4
ta
Return to label "a" (to step 1) if a substitution was performed (at step 3), otherwise print the result and move on to the next line.
Non empty lines are just printed as is. Knowing all this, we can describe the entire procedure with the following pseudo-code:
1 | # sed '/^$/{:a;N;s/\n$//;ta}' file
2 | while not end of file :
3 | buffer = next line
4 | # /^$/{:a;N;s/\n$//;ta}
5 | if buffer is empty : # /^$/
6 | :a # :a
7 | buffer += "\n" + next line # N
8 | if buffer ends with "\n" : # /\n$/
9 | remove last "\n" from buffer # s/\n$//
10 | go to :a (at 6) # ta
11 | end if
12 | end if
13 | print buffer
14 | end while
As you can see, the two sed scripts are very similar. Indeed, s/\n$//;ta is almost the same as /\n$/D. However, the second script loops back to step 6 rather than step 5, skipping the emptiness test, so it is potentially faster than the first script. Let's time both scripts fed with ~10 MB of empty lines:
$ yes '' | head -10000000 > file
$ /usr/bin/time -f%U sed '/^$/N;/\n$/D' file > /dev/null
3.61
$ /usr/bin/time -f%U sed '/^$/{:a;N;s/\n$//;ta}' file > /dev/null
2.37
Second script wins.
perl -00 -pe 1 filename
That splits the input file into "paragraphs" separated by 2 or more newlines, and then prints the paragraphs separated by a single blank line:
perl -00 -pe 1 <<END
abc
def
ghi
jkl
END
abc
def
ghi
jkl
This gives you what you want using solely sed :
sed '/^$/d' txt | sed -e $'s/$/\\\n/'
The first sed command removes all empty lines, denoted as "^$".
The second sed command inserts one newline character at the end of each line.
Why not just get rid of all your blank lines, then add a single blank line after each line? For an input file tmp as you specified,
sed '/^$/d' tmp|sed '0~1 a\ '
abc
def
ghi
jkl
If white space (spaces and tabs) counts as a "blank" line for you, then use sed '/^\s*$/d' tmp|sed '0~1 a\ ' instead.
Note that these solutions do leave a trailing blank line at the end, as I wasn't sure if this was desired. Easily removed.
I wouldn't use sed for this but cat with the -s flag.
As the manual states:
-s, --squeeze-blank suppress repeated empty output lines
So all that is needed to get the desired output is:
cat -s file

How to use Linux command(sed?) to delete specific lines in a file?

I have a file that contains a matrix. For example, I have:
1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b
I know it's easy to use sed command to delete specific lines with specific strings. But what if I only want to delete those lines where the second field's value is b (i.e., second line and fourth line)?
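You can use a regex in sed. In sed's default (basic) regex syntax + is a literal character, so use -E to enable extended regular expressions:
sed -i -E 's/^[0-9]+\s+b\s.*//' xxx_file
or
sed -i -E '/^[0-9]+\s+b\s/d' xxx_file
The first form only empties the matching lines and leaves blank lines behind; the second deletes them. The \s after b ensures the second field is exactly b (\s is a GNU extension). The "-i" argument modifies the file in place; remove "-i" to write the result to standard output, which you can redirect to another file.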
Awk works fine for this; just use the code below:
awk '{if ($2 != "b") print $0;}' file
If you want to learn more about awk, see its man page.
awk:
cat yourfile.txt | awk '{if($2!="b"){print;}}'
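The if/print form can also be written as a bare condition, since awk prints the line by default when a pattern is true. A small sketch on the matrix from the question:

```shell
# Print only the lines whose second whitespace-separated field is not "b".
matrix='1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b'

rows=$(printf '%s\n' "$matrix" | awk '$2 != "b"')
printf '%s\n' "$rows"
```

Because the comparison is on the field rather than on a regex, a line whose second field is, say, bb is kept.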
