How can I remove double line breaks with sed? - linux

I tried:
sed -i 's/\n+/\n/' file
but it's not working.
I still want single line breaks.
Input:
abc
def
ghi
jkl
Desired output:
abc
def
ghi
jkl

This might work for you (GNU sed):
sed '/^$/{:a;N;s/\n$//;ta}' file
This replaces multiple blank lines by a single blank line.
However if you want to place a blank line after each non-blank line then:
sed '/^$/d;G' file
Which deletes all blank lines and only appends a single blank line to a non-blank line.
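To see how the two commands behave, here is a small sketch run against a made-up sample. The `sample` helper and its contents are assumptions for illustration, not the asker's file, and the first command relies on GNU sed's one-line label syntax.

```shell
# Hypothetical sample: runs of one, two, and three blank lines.
sample() { printf 'abc\n\ndef\n\n\nghi\n\n\n\njkl\n'; }

# Collapse each run of blank lines into a single blank line (GNU sed).
squeezed=$(sample | sed '/^$/{:a;N;s/\n$//;ta}')

# Drop every blank line, then append one blank line after each remaining line.
respaced=$(sample | sed '/^$/d;G')

printf '%s\n' "$squeezed"
```

On this sample both variables hold abc, def, ghi, jkl separated by single blank lines. The second command also emits a blank line after jkl, which the command substitution strips here.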

Sed isn't very good at tasks that examine multiple lines programmatically. Here is the closest I could get:
$ sed '/^$/{n;/^$/d}' file
abc
def
ghi
jkl
The logic of this: if you find a blank line, look at the next line. If that next line is also blank, delete that next line.
Note that this does not collapse every run of blank lines down to one. It only removes the second line of each pair of blank lines, so a run of three or four blank lines is reduced to two.
To do it in basic awk:
$ awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2' file
abc
def
ghi
jkl
This uses a variable called blank, which is reset to zero whenever the number of fields (NF) is nonzero and is incremented whenever NF is zero (a blank or whitespace-only line). Awk's default action, printing, is performed when the number of consecutive blank lines is less than two.
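The difference between the two approaches shows up on runs of three or more blank lines. A minimal sketch, assuming a made-up sample (the `sample` helper is hypothetical):

```shell
# Hypothetical sample with runs of one, two, and three blank lines.
sample() { printf 'abc\n\ndef\n\n\nghi\n\n\n\njkl\n'; }

# sed removes only the second line of each blank pair,
# so the run of three blank lines still leaves two.
sed_out=$(sample | sed '/^$/{n;/^$/d}')

# awk counts consecutive blank lines and prints only the first of each run.
awk_out=$(sample | awk 'NF > 0 {blank=0} NF == 0 {blank++} blank < 2')

printf 'sed:\n%s\n\nawk:\n%s\n' "$sed_out" "$awk_out"
```

In the sed output two blank lines remain before jkl, while the awk output has exactly one blank line between every pair of words.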

Using awk (gnu or BSD) you can do:
awk -v RS= -v ORS='\n\n' '1' file
abc
def
ghi
jkl
Also using perl:
perl -pe '$/=""; s/(\n)+/$1$1/' file
abc
def
ghi
jkl
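As a rough illustration of what the awk paragraph mode does with multi-line paragraphs, here is a sketch on a made-up sample (the `sample` helper and its contents are assumptions):

```shell
# Hypothetical sample: the "def/xyz" paragraph spans two lines.
sample() { printf 'abc\n\n\ndef\nxyz\n\n\n\nghi\n'; }

# RS= switches awk to paragraph mode: any run of blank lines is one separator.
# ORS='\n\n' then ends every paragraph with exactly one blank line.
para_out=$(sample | awk -v RS= -v ORS='\n\n' '1')

printf '%s\n' "$para_out"
```

The lines def and xyz stay together, and each paragraph is followed by one blank line. The real output also ends with a blank line after ghi, which the command substitution strips.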

Found in "That's What I Sed" (slower than the /^$/{:a;N;s/\n$//;ta} solution, as timed further down).
sed '/^$/N;/\n$/D' file
The sed script can be read as follows:
If the next line is empty, delete the current line.
And can be translated into the following pseudo-code (for the reader already familiar with sed, buffer refers to the pattern space):
1  | # sed '/^$/N;/\n$/D' file
2  | while not end of file :
3  |     buffer = next line
4  |     # /^$/N
5  |     if buffer is empty :                          # /^$/
6  |         buffer += "\n" + next line                # N
7  |     end if
8  |     # /\n$/D
9  |     if buffer ends with "\n" :                    # /\n$/
10 |         delete first line in buffer and go to 5   # D
11 |     end if
12 |     print buffer
13 | end while
In the regular expression /^$/, the ^ and $ signs mean "beginning of the buffer" and "end of the buffer" respectively. They refer to the edges of the buffer, not to the content of the buffer.
The D command performs the following tasks: if the buffer contains newlines, delete the text of the buffer up to the first newline, and restart the program cycle (go back to line 5 of the pseudo-code) without processing the rest of the commands, without printing the buffer, and without reading a new line of input.
Finally, keep in mind that sed removes the trailing newline before processing the line, and that the print command adds the trailing newline back. So, in the above code, if the next line to be processed is Hello World!\n, then "next line" implicitly refers to Hello World! without the newline.
More details at https://www.gnu.org/software/sed/manual/sed.html.
You are now ready to apply the algorithm to the following file:
a\n
b\n
\n
\n
\n
c\n
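If you want to check your hand trace, a quick sketch like this runs the script on that six-line file (the variable name `walk_out` is my own choice):

```shell
# The file from the text: a, b, three empty lines, c.
walk_out=$(printf 'a\nb\n\n\n\nc\n' | sed '/^$/N;/\n$/D')

printf '%s\n' "$walk_out"
```

The three empty lines should come out as a single empty line between b and c.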
Now let's see why the loop-based solution is faster.
The sed script /^$/{:a;N;s/\n$//;ta} can be read as follows:
If the current line matches /^$/, then do {:a;N;s/\n$//;ta}.
Since there is nothing between ^ and $ we can rephrase like this:
If the current line is empty, then do {:a;N;s/\n$//;ta}.
It means that sed executes the following commands for each empty line:
Step  Command  Description
1     :a       Declare a label named "a".
2     N        Append the next line, preceded by a newline (\n), to the current line.
3     s/\n$//  Substitute (s) any trailing newline (/\n$/) with nothing (//).
4     ta       Return to label "a" (step 1) if a substitution was performed at step 3; otherwise print the result and move on to the next line.
Non-empty lines are printed as is. Knowing all this, we can describe the entire procedure with the following pseudo-code:
1  | # sed '/^$/{:a;N;s/\n$//;ta}' file
2  | while not end of file :
3  |     buffer = next line
4  |     # /^$/{:a;N;s/\n$//;ta}
5  |     if buffer is empty :                        # /^$/
6  |         :a                                      # :a
7  |         buffer += "\n" + next line              # N
8  |         if buffer ends with "\n" :              # /\n$/
9  |             remove last "\n" from buffer        # s/\n$//
10 |             go to :a (at 6)                     # ta
11 |         end if
12 |     end if
13 |     print buffer
14 | end while
As you can see, the two sed scripts are very similar. Indeed, s/\n$//;ta is almost the same as /\n$/D. However, the second script jumps back to step 6 and so skips the test at step 5, which makes it potentially faster than the first script. Let's time both scripts on ~10 MB of empty lines:
$ yes '' | head -10000000 > file
$ /usr/bin/time -f%U sed '/^$/N;/\n$/D' file > /dev/null
3.61
$ /usr/bin/time -f%U sed '/^$/{:a;N;s/\n$//;ta}' file > /dev/null
2.37
Second script wins.
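Before trusting a timing comparison, it is worth confirming that the two scripts agree on the same input. A small sketch, assuming a made-up sample (the `sample` helper is hypothetical, and inputs that end in blank lines are not covered here):

```shell
# Hypothetical sample with a run of three and a run of two empty lines.
sample() { printf 'a\nb\n\n\n\nc\n\n\nd\n'; }

out_d=$(sample | sed '/^$/N;/\n$/D')              # D-based script
out_loop=$(sample | sed '/^$/{:a;N;s/\n$//;ta}')  # loop-based script (GNU sed)

if [ "$out_d" = "$out_loop" ]; then
  echo "same output"
else
  echo "outputs differ"
fi
```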

perl -00 -pe 1 filename
That splits the input file into "paragraphs" separated by 2 or more newlines, and then prints the paragraphs separated by a single blank line:
perl -00 -pe 1 <<END
abc
def
ghi
jkl
END
abc
def
ghi
jkl

This gives you what you want using only sed:
sed '/^$/d' txt | sed -e $'s/$/\\\n/'
The first sed command removes all empty lines, denoted as "^$".
The second sed command inserts one newline character at the end of each line.

Why not just get rid of all your blank lines, then add a single blank line after each line? For an input file tmp as you specified,
sed '/^$/d' tmp|sed '0~1 a\ '
abc
def
ghi
jkl
If white space (spaces and tabs) counts as a "blank" line for you, then use sed '/^\s*$/d' tmp|sed '0~1 a\ ' instead.
Note that these solutions do leave a trailing blank line at the end, as I wasn't sure if this was desired. Easily removed.

I wouldn't use sed for this but cat with the -s flag.
As the manual states:
-s, --squeeze-blank suppress repeated empty output lines
So all that is needed to get the desired output is:
cat -s file

Related

sed replacing first occurrence of characters in each line of file only if they are first 2 characters

Is it possible using sed to replace the first occurrence of a character or substring in line of file only if it is the first 2 characters in the line?
For example we have this text file:
15 hello
15 h15llo
1 hello
1 h15loo
Using the following command: sed -i 's/15/0/' file.txt
Will give this output
0 hello
0 h15llo
1 hello
1 h0loo
What I am trying to avoid is it considering the characters past the first 2.
Is this possible?
Desired output:
0 hello
0 h15llo
1 hello
1 h15loo
You can use
sed -i 's/^15 /0 /' file.txt
sed -i 's/^15\([[:space:]]\)/0\1/' file.txt
sed -i 's/^15\(\s\)/0\1/' file.txt
Here, the ^ matches the start of string position, 15 matches the 15 substring and then a space matches a space.
The second and third solutions are equivalent: instead of a literal space, they capture a whitespace character into Group 1 and put it back into the result using the \1 placeholder. Note that \s is a GNU extension.
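A quick way to compare the anchored and unanchored substitutions without touching a file is to feed the sample through a pipe. This is only a sketch; the `sample` helper is a stand-in for file.txt:

```shell
# Stand-in for file.txt from the question.
sample() { printf '15 hello\n15 h15llo\n1 hello\n1 h15loo\n'; }

# Unanchored: replaces the first 15 anywhere in the line.
unanchored=$(sample | sed 's/15/0/')

# Anchored: replaces 15 only at the start of the line, followed by whitespace.
anchored=$(sample | sed 's/^15\([[:space:]]\)/0\1/')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

Only the last line differs: the unanchored form turns 1 h15loo into 1 h0loo, the anchored form leaves it alone.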

Swapping the first word with itself 3 times only if there are 4 words only using sed

Hi, I'm trying to solve a problem using only sed commands and without using a pipeline. I am, however, allowed to write the result of a sed command to a file or to read from a file.
EX:
sed s/dog/cat/ >| tmp
or
sed s/dog/cat/ < tmp
Anyway, let's say I have a file F1 with these contents:
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
The output should be:
if if if a equals b
dany dany dany uri four 123
Explanation: the program must only print lines that have exactly 4 words and when it prints them it must print the first word of the line 3 times.
I've tried doing commands like this:
sed '/[^ ]*.[^ ]*.[^ ]*/s/[^ ]\+/& & &/' F1
or
sed 's/[^ ]\+/& & &/' F1
but I can't figure out how to check with sed that a line has exactly 4 words.
Any help will be appreciated.
$ sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' file
if if if a equals b
dany dany dany uri four 123
The above requires a sed that supports EREs via the -E option (e.g. GNU and OSX seds).
If the fields are separated by single tabs (or single spaces):
sed 'h;s/[^[:blank:]]//g;/^[[:blank:]]\{3\}$/!d;x;s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/' infile
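For anyone who wants to try the exactly-4-words match on the sample from the question, here is a sketch. The `f1` helper stands in for the file F1, and the BRE variant is my own rewrite of the ERE command for seds without -E, so treat it as an assumption rather than part of the original answer:

```shell
# Stand-in for the file F1 from the question.
f1() { printf 'Hello hi 123\nif a equals b\nyou\none abc two three four\ndany uri four 123\n'; }

# ERE version: one word followed by exactly three more whitespace-separated words.
ere_out=$(f1 | sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p')

# Same pattern written as a POSIX BRE (no -E needed).
bre_out=$(f1 | sed -n 's/^\([^[:space:]]\{1,\}\)\([[:space:]]\{1,\}[^[:space:]]\{1,\}\)\{3\}$/\1 \1 &/p')

printf '%s\n' "$ere_out"
```

Both forms print only the two 4-word lines, each with its first word tripled.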

Sed - Conditional Matching of pattern

I want to do the following:
Find pattern 1, then find the first instance of pattern 2. After doing so, I want to print the next line. This is for a sed script. I'm pretty lost on how to do this, since sed doesn't have if statements.
This might work for you (GNU sed):
sed -n '/first/,${/second/{n;p;q}}' file
Set -n option to emulate grep i.e. only print what you want. Focus on the range from first to the end of the file ($). Then match second and get the next line (n), print (p) and quit (q).
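Here is a small sketch of that command on made-up input (the `sample` helper and its line contents are assumptions). The first "second" appears before "first", so it should be ignored:

```shell
# Hypothetical input: "second" occurs once before and twice after "first".
sample() { printf 'second\nskipped\nfirst\nfiller\nsecond\ntarget\nsecond\nlater\n'; }

# From the first line matching "first" to the end of input, find "second",
# move to the next line, print it, and quit (GNU sed one-liner syntax).
next_line=$(sample | sed -n '/first/,${/second/{n;p;q}}')

printf '%s\n' "$next_line"
```

Only target is printed: the q stops sed before the later "second" is reached.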
If the file j.txt contains the following:
10 20 30
40 50 60
10 90 80
sed -n '/10/p' j.txt | sed -n '/20/,+1p'
The first sed keeps only the lines matching pattern 1 (10); the second then looks for pattern 2 (20) in those lines and prints the matching line together with the line that follows it.
Output will be:
10 20 30
10 90 80

How to remove odd lines except for first line using SED or AWK

I have the following file
# header1 header2
zzzz yyyy
1
kkkkk wwww
2
What I want to do is to remove odd lines except the header
yielding:
# header1 header2
zzzz yyyy
kkkkk wwww
I tried this but it removes the header too
awk 'NR%2==0'
What's the right way to do it?
Works on GNU sed
sed '3~2d' ip.txt
This deletes every second line starting from the 3rd line (lines 3, 5, 7, and so on).
Example:
$ seq 10 | sed '3~2d'
1
2
4
6
8
10
awk 'NR==1 || NR%2==0'
If the record number is 1 or is even, print it.
awk 'NR % 2 == 0 || NR == 1'
Reversing the comparisons might be marginally faster. The difference probably isn't measurable. (And the choice of spacing is essentially immaterial too.)
You just need
awk 'NR==1 || NR%2==0' file
This keeps the header line intact and applies the rule NR%2==0, which is true only for even-numbered lines (counting the header as line 1), in which case the line is printed.
Another variant of the same above answer
awk 'NR==1 || !(NR%2)' file
For even lines, NR%2 is 0, and negating it gives a true condition, so the line is printed.
sed '1!{N;P;d}'
1! On lines other than the first (the default behavior echoes the first line)
N append the next line to the current line
P print only the first of the two
d delete them both.
This might work for you (GNU sed):
sed '1b;n;d' file
But:
sed '3~2d' file
Is far neater.
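To compare the awk and sed approaches on the file from the question, a sketch like the following can help. The `sample` helper is a stand-in for the input file, and the first~step address in the sed command needs GNU sed:

```shell
# Stand-in for the file from the question.
sample() { printf '# header1 header2\nzzzz yyyy\n1\nkkkkk wwww\n2\n'; }

# Keep line 1 and every even-numbered line.
awk_out=$(sample | awk 'NR==1 || NR%2==0')

# Delete lines 3, 5, 7, ... (GNU sed first~step address).
sed_out=$(sample | sed '3~2d')

printf '%s\n' "$awk_out"
```

Both variables end up with the header followed by the zzzz yyyy and kkkkk wwww lines.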

How to remove only the first occurrence of a line in a file using sed

I have the following file
titi
tata
toto
tata
If I execute
sed -i "/tat/d" file.txt
It will remove all the lines containing tat. The command returns:
titi
toto
but I want to remove only the first line that occurs in the file containing tat:
titi
toto
tata
How can I do that?
You could make use of two-address form:
sed '0,/tat/{/tat/d;}' inputfile
This would delete the first occurrence of the pattern.
Quoting from info sed:
A line number of `0' can be used in an address specification like
`0,/REGEXP/' so that `sed' will try to match REGEXP in the first
input line too. In other words, `0,/REGEXP/' is similar to
`1,/REGEXP/', except that if ADDR2 matches the very first line of
input the `0,/REGEXP/' form will consider it to end the range,
whereas the `1,/REGEXP/' form will match the beginning of its
range and hence make the range span up to the _second_ occurrence
of the regular expression.
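The difference described in that quote only shows up when the very first line matches. A minimal sketch, assuming GNU sed and a made-up sample whose first line is tata (the `sample` helper is hypothetical):

```shell
# Hypothetical sample where the first line already matches.
sample() { printf 'tata\ntiti\ntata\ntoto\n'; }

# 0,/tat/ : the range ends on line 1, so only that line is deleted.
from_zero=$(sample | sed '0,/tat/{/tat/d;}')

# 1,/tat/ : the end of the range is searched from line 2 on,
# so the range runs to the second tata and both matches are deleted.
from_one=$(sample | sed '1,/tat/{/tat/d;}')

printf '0,/tat/:\n%s\n\n1,/tat/:\n%s\n' "$from_zero" "$from_one"
```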
If you can use awk, then this makes it:
$ awk '/tata/ && !f{f=1; next} 1' file
titi
toto
tata
To save your result in the current file, do
awk '...' file > tmp_file && mv tmp_file file
Explanation
Let's set a flag the first time tata is matched and skip that line. From then on, no further lines are skipped.
/tata/ matches lines that contain the string tata.
{f=1; next} sets flag f as 1 and then skips the line.
!f{} if the flag f is set, skip this block.
1, as a True value, performs the default awk action: {print $0}.
Another approach, by Tom Fenech
awk '!/tata/ || f++' file
|| stands for OR, so this condition is true, and hence prints the line, whenever any of these happens:
tata is not found in the line.
f++ is true. This is the tricky part: f defaults to 0, so the first f++ returns 0 (false) and the line is not printed. From then on, f is a positive integer, so f++ is true.
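Both awk forms can be checked side by side on the file from the question. This is just a sketch; the `sample` helper stands in for the file:

```shell
# Stand-in for the file from the question.
sample() { printf 'titi\ntata\ntoto\ntata\n'; }

# Explicit flag: skip the first tata line, print everything else.
flag_out=$(sample | awk '/tata/ && !f{f=1; next} 1')

# Short form: f++ evaluates to 0 (false) only for the first tata line.
short_out=$(sample | awk '!/tata/ || f++')

printf '%s\n' "$flag_out"
```

Both drop only the first tata and keep the second.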
Here's the general way to do it:
$ cat file
1 titi
2 tata
3 toto
4 tata
5 foo
6 tata
7 bar
$
$ awk '/tat/{ if (++f == 1) next} 1' file
1 titi
3 toto
4 tata
5 foo
6 tata
7 bar
$
$ awk '/tat/{ if (++f == 2) next} 1' file
1 titi
2 tata
3 toto
5 foo
6 tata
7 bar
$
$ awk '/tat/{ if (++f ~ /^(1|2)$/) next} 1' file
1 titi
3 toto
5 foo
6 tata
7 bar
Note that with the above approach you can skip whatever occurrence(s) of an RE you like (1st, 2nd, 1st and 2nd, whatever) and you only specify the RE once (as opposed to having to duplicate it for some alternative solutions).
Clear, simple, obvious, easily maintainable, extensible, etc....
Here is one way of doing it with sed:
sed ':a;$!{N;ba};s/\ntat[^\n]*//' file
titi
toto
tata
This might work for you (GNU sed):
sed '/pattern/{x;//!d;x}' file
Print all lines other than those containing the pattern as normal. Otherwise if the line contains the pattern and hold space does not (the first occurrence), delete that line.
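A sketch of the hold-space approach with the pattern filled in as tat, on the file from the question (the `sample` helper is a stand-in; the closing brace right after x is GNU sed syntax):

```shell
# Stand-in for the file from the question.
sample() { printf 'titi\ntata\ntoto\ntata\n'; }

# On a match, swap in the hold space. It is empty for the first match,
# so the empty regex // (reusing /tat/) fails and the line is deleted.
# For later matches the hold space contains an earlier match, so the
# line is swapped back and printed.
hold_out=$(sample | sed '/tat/{x;//!d;x}')

printf '%s\n' "$hold_out"
```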
You may find the first matching line number with grep and pass it to sed for deletion.
sed "$( (grep -nm1 tat file.txt || echo 1000000000:) | cut -f 1 -d:) d" file.txt
grep -n combined with cut finds the line number to be deleted. grep -m1 ensures at most one line number is found. echo handles the case when there is no match so as not to return an empty result. sed "[line number] d" deletes the line.
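The same idea can be spelled out in separate steps, which avoids the large sentinel line number. This is a sketch of an alternative, not the command above: the temporary file and the variable names are my own, and grep -m1 assumes GNU or BSD grep.

```shell
# Temporary stand-in for file.txt.
tmp=$(mktemp)
printf 'titi\ntata\ntoto\ntata\n' > "$tmp"

# Line number of the first match, or empty if there is none.
n=$(grep -n -m1 'tat' "$tmp" | cut -d: -f1)

if [ -n "$n" ]; then
  grep_out=$(sed "${n}d" "$tmp")   # delete only that line
else
  grep_out=$(cat "$tmp")           # no match: leave the content unchanged
fi

printf '%s\n' "$grep_out"
rm -f "$tmp"
```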
