How can I join consecutive non-empty lines using sed/awk?

How can I join consecutive non-empty lines into a single line using sed or awk?
Here is an example of what I am trying to do.
Input:
aaa ff gg
bbb eee eee
ss gg dd

aaa ff gg
bbb eee eee
ss gg dd

aaa ff gg
bbb eee eee
ss gg dd
Converts to:
aaa ff gg bbb eee eee ss gg dd

aaa ff gg bbb eee eee ss gg dd

aaa ff gg bbb eee eee ss gg dd

Not sure if you REALLY want a blank line between each data line or not so here's both:
$ awk -v RS= '{$1=$1}1' file
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
$ awk -v RS= -v ORS='\n\n' '{$1=$1}1' file
aaa ff gg bbb eee eee ss gg dd

aaa ff gg bbb eee eee ss gg dd

aaa ff gg bbb eee eee ss gg dd
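Why this works: with RS set to the empty string, awk reads the input in paragraph mode, so each block of lines between blank lines is one record and the newlines inside it count as field separators. Assigning $1 to itself rebuilds the record with OFS (a space) between the fields. A minimal sketch, assuming a throwaway file created with mktemp; the sample data and the variable name joined are illustrative only:

```shell
# Illustrative sample: two groups of lines separated by one blank line.
tmp=$(mktemp)
printf 'aaa ff gg\nbbb eee eee\n\nss gg dd\naaa ff gg\n' > "$tmp"

# RS=    : paragraph mode, one record per blank-line-separated block.
# $1=$1  : rebuild the record so its fields are joined by OFS (a space).
# 1      : always-true pattern, prints the rebuilt record.
joined=$(awk -v RS= '{$1=$1}1' "$tmp")
printf '%s\n' "$joined"

rm -f "$tmp"
```

Each group comes out as one line; adding -v ORS='\n\n' would put a blank line after each of them.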

This might work for you (GNU sed):
sed ':a;N;/\n$/!s/\n/ /;ta' file
Append the next line to the pattern space. Unless the line just appended is empty, replace the newline with a space and repeat. Otherwise print the pattern space and start over with the next line.
If you want the empty lines deleted, then:
sed ':a;N;/\n$/!s/\n/ /;ta;P;d' file
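The loop may be easier to follow with the one-liner spread over several lines. This is a sketch of the second command, assuming GNU sed (whose N prints the pattern space when it reaches the end of input); the sample data and the variable name joined are illustrative only:

```shell
# Illustrative input: two groups of lines separated by one empty line.
joined=$(printf 'a\nb\n\nc\nd\n' | sed '
# Label to jump back to.
:a
# Append the next input line to the pattern space.
N
# Unless the appended line is empty, turn the newline into a space.
/\n$/!s/\n/ /
# If the substitution happened, go back to :a and append another line.
ta
# Otherwise print up to the newline, drop the rest, and start a new cycle.
P
d
')
printf '%s\n' "$joined"
```

Leaving out the final P and d gives the first variant, which keeps the empty line after each joined group.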

If perl is okay:
$ perl -00 -pe 's/\n(?!$)/ /g' ip.txt
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
aaa ff gg bbb eee eee ss gg dd
-00 reads the input in paragraph mode.
See http://perldoc.perl.org/perlrun.html#Command-Switches for more info, including the -p and -e options.
Use perl -i -00 -pe for in-place editing.
s/\n(?!$)/ /g replaces every newline with a space, except the newlines at the end of the paragraph (the ones that form the blank line).

@Schon: try:
awk '{ORS=/^$/?RS RS:FS} {$1=$1} 1;END{print RS}' Input_file
EDIT: Adding an explanation too.
awk '
{
  ##### Set the output record separator (ORS) for the current line:
  ##### if the line is empty (/^$/), use RS RS, i.e. two newlines (RS defaults to a newline);
  ##### otherwise use FS, the field separator, which defaults to a space.
  ORS = /^$/ ? RS RS : FS
}
{ $1 = $1 }       ##### Rebuild the line so that its fields are joined by OFS (squeezes repeated spaces).
1                 ##### A true condition with no action, so the current line is printed, followed by ORS.
END { print RS }  ##### Print RS (a newline) once all input has been read.
' Input_file      ##### The input file.

A more readable example, less Perl-like:
awk '{ if ($0 == "") { print line "\n"; line = "" } else line = (line == "" ? $0 : line " " $0) } END { if (line != "") print line }' file

Related

shell duplicate spaces in file

Is it possible to remove multiple spaces from a text file and save the changes in the same file using awk or grep?
Input example:
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
Simply assign $1 to itself. This makes awk rebuild the line with OFS (a single space by default) between the fields, which collapses every run of spaces and drops leading and trailing spaces.
awk '{$1=$1} 1' Input_file
EDIT: Since the OP asked how to keep the leading spaces only, try the following.
awk '
match($0,/^ +/){
  spaces=substr($0,RSTART,RLENGTH)
}
{
  $1=$1
  $1=spaces $1
  spaces=""
}
1
' Input_file
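The awk commands above print to standard output, while the question asks for the changes to be saved in the same file. A minimal sketch of one common way to do that, writing to a temporary file and then replacing the original; the file created with mktemp and the variable names are illustrative assumptions:

```shell
# Illustrative file with runs of spaces between the words.
f=$(mktemp)
printf 'aaa   bbb    ccc\nddd      yyyy\n' > "$f"

# awk has no portable in-place option, so write the result to a
# temporary file and replace the original only if awk succeeded.
tmp=$(mktemp)
awk '{$1=$1} 1' "$f" > "$tmp" && mv "$tmp" "$f"

squeezed=$(cat "$f")
printf '%s\n' "$squeezed"
rm -f "$f"
```

With GNU awk 4.1 or later, gawk -i inplace '{$1=$1} 1' Input_file should achieve the same without the explicit temporary file.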
Using sed:
sed -i -E 's#[[:space:]]+# #g' input_file
To also remove the space left at the start of a line:
sed -i -E 's#[[:space:]]+# #g; s#^ ##' input_file
Demo:
$ cat test.txt
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$ sed -i -E 's#[[:space:]]+# #g' test.txt
$ cat test.txt
aaa bbb ccc
ddd yyyy
Output I want:
aaa bbb ccc
ddd yyyy
$

get paragraph with awk, and start-of-line regexp

I use awk to get paragraphs from a textfile, like so:
awk -v RS='' -v ORS='\n\n' '/pattern/' ./textfile
Say I have the following textfile:
aaa bbb ccc
aaa bbb ccc
aaa bbb ccc

aaa ccc
bbb aaa ccc
bbb aaa ccc

ccc bbb aaa
ccc bbb aaa
ccc bbb aaa
Now I only want the paragraph in which one of the (original) lines starts with "bbb" (hence the second paragraph). However, the regexp anchor ^ no longer works, (I presume) because of RS='': awk now matches only at the beginning of the paragraph.
Is there another way?
^ means start-of-string. You want start-of-line which is (^|\n), e.g.:
$ awk -v RS='' -v ORS='\n\n' '/(^|\n)bbb/' file
aaa ccc
bbb aaa ccc
bbb aaa ccc
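The difference between the two anchors can be checked directly. A small sketch, assuming a shortened version of the sample written to a file created with mktemp; the variable names are illustrative only:

```shell
# Illustrative sample: three paragraphs separated by blank lines.
sample=$(mktemp)
printf 'aaa bbb ccc\naaa bbb ccc\n\naaa ccc\nbbb aaa ccc\n\nccc bbb aaa\n' > "$sample"

# ^bbb is anchored to the start of the whole record (the paragraph),
# and no paragraph starts with bbb, so nothing is selected.
start_of_record=$(awk -v RS= '/^bbb/' "$sample")

# (^|\n)bbb also matches bbb right after a newline inside the record,
# i.e. at the start of any line of the paragraph.
start_of_line=$(awk -v RS= '/(^|\n)bbb/' "$sample")
printf '%s\n' "$start_of_line"

rm -f "$sample"
```

Only the second paragraph is printed by the second command; the first command prints nothing.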

Multiplication of lines in bash

I want to make something like multiplication:
File1:
aa
bb
File2:
cc
dd
File3:
eee
fff
ggg
I want a result like:
aa cc eee
aa cc fff
aa cc ggg
bb dd eee
bb dd fff
bb dd ggg
The first line of File1 and File2 should be combined with every line of File3, and likewise the second line of File1 and File2 with every line of File3.
This would work:
$ join -j 9999 <(paste file1 file2) file3
aa cc eee
aa cc fff
aa cc ggg
bb dd eee
bb dd fff
bb dd ggg
It joins on a non-existent field (field 9999). Every line then has the same (empty) join key, so join pairs each line of the first input with each line of the second, which is the Cartesian product of the inputs. paste file1 file2 merges the first two files line by line, and process substitution passes the result to join as its first input.
A slight snag is that a leading space is introduced on each line; to get rid of it, you can pipe to sed:
join -j 9999 <(paste file1 file2) file3 | sed 's/^ //'
or specify an output format:
join -j 9999 -o 1.1,1.2,2.1 <(paste file1 file2) file3
You could use a nested loop.
# Read whole lines, so that a pasted pair such as "aa cc" stays together.
while IFS= read -r ab; do
  while IFS= read -r c; do
    echo "$ab $c"
  done < File3
done < <(paste -d ' ' File1 File2)
It doesn’t scale, obviously, but it may be enough for your use case.
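If neither the join trick nor a shell loop appeals, the same product can be built in a single awk pass. This is a different technique from the answers above, shown only as a sketch; the temporary directory, the array name third, and the variable product are illustrative assumptions:

```shell
# Recreate the three sample files in a scratch directory.
dir=$(mktemp -d)
printf 'aa\nbb\n' > "$dir/File1"
printf 'cc\ndd\n' > "$dir/File2"
printf 'eee\nfff\nggg\n' > "$dir/File3"

# While reading File3 (NR==FNR holds only for the first input), store its lines.
# For every pasted line that follows on standard input ("-"),
# print it once with each stored line.
product=$(paste -d ' ' "$dir/File1" "$dir/File2" |
  awk 'NR==FNR { third[++n] = $0; next }
       { for (i = 1; i <= n; i++) print $0, third[i] }' "$dir/File3" -)
printf '%s\n' "$product"

rm -rf "$dir"
```

File3 is held in memory, so this assumes it is small enough for that; it also assumes File3 is not empty, since NR==FNR would otherwise be true for the pasted lines as well.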

merge specific line using awk and sed

I want to merge specific lines.
Input:
AAA
BBB
CCC
DDD
EEE
AAA
BBB
DDD
CCC
EEE
Output should be:
AAA
BBB
CCC DDD
EEE
AAA
BBB
DDD
CCC EEE
I want to search for CCC and merge the next line into it.
I tried with awk but had no success.
Use an awk pattern: if the line matches /CCC/, print it followed by a space instead of a newline and skip to the next line. Otherwise (the 1), print the line as usual.
awk '/CCC/ { printf("%s ", $0); next } 1' file
Using sed:
sed '/CCC/ { N; s/\n/ / }' file
Using awk:
awk '{ ORS=(/CCC/ ? FS : RS) }1' file
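Both awk variants can be checked against the second group of the sample, where CCC is followed by EEE. A small sketch; the variable names are illustrative only:

```shell
# Second group from the question: CCC is followed by EEE.
sample='AAA
BBB
DDD
CCC
EEE'

# Variant 1: print a matching line with a trailing space and no newline.
with_printf=$(printf '%s\n' "$sample" | awk '/CCC/ { printf("%s ", $0); next } 1')

# Variant 2: switch the output record separator to a space for matching lines.
with_ors=$(printf '%s\n' "$sample" | awk '{ ORS=(/CCC/ ? FS : RS) }1')

printf '%s\n' "$with_printf"
```

Both produce the same four lines here. Note that /CCC/ matches anywhere in the line; if only a line consisting of exactly CCC should be merged, a test such as $0 == "CCC" may be the safer condition.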

Delete whole line NOT containing given string

Is there a way to delete a whole line if it contains a specific word using sed? For example:
I have the following:
aaa bbb ccc
qqq fff yyy
ooo rrr ttt
kkk ccc www
I want to delete lines that contain 'ccc' and leave other lines intact. In this example the output would be:
qqq fff yyy
ooo rrr ttt
All this using sed. Any hints?
sed -n '/ccc/!p'
or
sed '/ccc/d'
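The two commands are equivalent: the first suppresses automatic printing and prints only the lines that do not match, while the second deletes the lines that match and lets the rest print as usual. A quick sketch on the sample from the question; the variable names are illustrative only:

```shell
input='aaa bbb ccc
qqq fff yyy
ooo rrr ttt
kkk ccc www'

# -n turns off automatic printing; /ccc/!p prints only non-matching lines.
kept_by_print=$(printf '%s\n' "$input" | sed -n '/ccc/!p')

# /ccc/d deletes matching lines; every other line is printed by default.
kept_by_delete=$(printf '%s\n' "$input" | sed '/ccc/d')

printf '%s\n' "$kept_by_delete"
```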
