Bash Shell Script: Concatenating lines that do not end in ^M - Linux

This is again a question about end-of-line characters in Unix and Windows.
I have a SQL extract where some fields can contain text with line breaks.
When I copy this extract to a Linux machine and open it in vi with the :set list option, I see text like this:
1 some broken Text part 1 - Line1$
2 other broken text part 2 -line 2^M$
3 good line ^M$
I need to detect lines that do not end in a carriage return (CR, shown as ^M), because those are fields whose values contain line breaks.
In the extract above, I need to join line 1 and line 2 into a single line:
1 some broken Text part 1 - Line1 other broken text part 2 -line 2^M$
Line 3 should not change; it then becomes line 2 of the file.
I tried to remove \n using tr, but then the whole file became a single line in vi.
After removing \n, I tried to replace \r with \r\n, but that introduced unexpected behavior in the file.
Any help to figure out this issue will be appreciated.

You could just replace \n with a space and \r with \n:
$ printf 'some broken Text part 1 - Line1
other broken text part 2 -line 2\r
goodline\r\n' > file.txt
$ cat -vE file.txt
some broken Text part 1 - Line1$
other broken text part 2 -line 2^M$
goodline^M$
$ tr '\n\r' ' \n' < file.txt
some broken Text part 1 - Line1 other broken text part 2 -line 2
goodline

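The following did the trick (GNU sed, which understands \r and \n in the s command):
tr -d '\n' < file.txt > step1-file.txt
sed -i -e 's/\r/\r\n/g' step1-file.txt
The Perl line below, which I was trying to use earlier, somehow introduced unexpected behavior:
perl -pi -e 's/\r/\r\n/' step1-file.txt
The most likely cause is the missing g flag: after tr -d '\n' the whole file is a single line, so perl -p sees one record and s/\r/\r\n/ replaces only the first \r. With s/\r/\r\n/g it should behave like the sed command.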
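If you want the joined pieces separated by a space, as in the expected output in the question, and want to avoid the intermediate file, awk can buffer physical lines until it reaches one that ends in \r. A minimal sketch, assuming every complete record ends in \r\n; file.txt and joined.txt are placeholder names:

```shell
#!/bin/sh
# Sample extract: record 1 is split over two physical lines and only its
# second half ends in \r\n. Record 2 ("good line ") is intact.
printf 'some broken Text part 1 - Line1\nother broken text part 2 -line 2\r\ngood line \r\n' > file.txt

# Assumption: a line without a trailing \r is a continuation of the next one.
awk '
  { buf = pending ? buf " " $0 : $0; pending = 1 }   # accumulate, joining with a space
  /\r$/ { print buf; pending = 0 }                   # record complete: print it (the \r is kept)
  END { if (pending) print buf }                     # flush an unterminated last record
' file.txt > joined.txt

cat -vE joined.txt
```

With the sample above, cat -vE should show:
some broken Text part 1 - Line1 other broken text part 2 -line 2^M$
good line ^M$
The \r is kept, so the output still has Windows line endings; pipe it through tr -d '\r' if you need Unix ones.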

Related

Append then delete line to another line, only if it does not contain character

In my text file, lines come in groups of 6, and the groups are separated by two blank lines. I have prefixed each line with its line number.
365:--------------------------------------------------------------------------------
366:--------------------------------------------------------------------------------
367:--------------------------------------------------------------------------------
368:--------------------------------------------------------------------------x-----
369:--------------------4-----------------------------------------------------------
370:--0-----------------------------------------------------------------------------
371:
372:
373:--------------------------------------------------------------------|
374:--------------------------------------------------------------------|
375:------------0--------2--------3h----2h----0-----2-------------------|
376:---2-----------------------------------------------------2----------|
377:--------------------------------------------------------------------|
378:--------------------------------------------------------------------|
Currently only 80 characters are printed per line, so the rest of the data continues in the next group. For example, line 365 continues on line 373.
For only the lines that do not contain a vertical bar (i.e., lines 365-370), I am trying to 1) append the line that is 8 lines below, and then 2) delete the appended line after it has been printed.
So, ideally:
365:----------------------------------------------------------------------------------------------------------------------------------------------------|
366:----------------------------------------------------------------------------------------------------------------------------------------------------|
367:--------------------------------------------------------------------------------------------0--------2--------3h----2h----0-----2-------------------|
368:--------------------------------------------------------------------------x--------2-----------------------------------------------------2----------|
369:--------------------4-------------------------------------------------------------------------------------------------------------------------------|
370:--0-------------------------------------------------------------------------------------------------------------------------------------------------|
I can isolate the lines that do not contain a vertical bar using grep:
grep -vn \| song.txt
I know that sed or awk is likely my best bet, but I'm not sure how to proceed from here.
Just massage this approach to suit:
$ seq 16 | awk 'NR>8{print a[NR%8], $0} {a[NR%8]=$0}'
1 9
2 10
3 11
4 12
5 13
6 14
7 15
8 16
For example, assuming there are 2 blank lines at the end of your input, so that it consists of blocks of 8 lines:
$ awk 'NR>8{print a[NR%8] $0} {a[NR%8]=$0}' file
--------------------------------------------------------------------------------------------------------------------------------------------------|
--------------------------------------------------------------------------------------------------------------------------------------------------|
------------------------------------------------------------------------------------------0--------2--------3h----2h----0-----2-------------------|
-------------------------------------------------------------------------x-------2-----------------------------------------------------2----------|
-------------------4------------------------------------------------------------------------------------------------------------------------------|
-0------------------------------------------------------------------------------------------------------------------------------------------------|
or, if you don't have those blank lines after the last block, skip the blank lines and index the saved lines by a count of non-blank lines (indexing by NR would misalign the two blocks, because NR also counts the blank lines):
$ awk '!NF{next} ++cnt>6{print a[cnt%6] $0} {a[cnt%6]=$0}' file
--------------------------------------------------------------------------------------------------------------------------------------------------|
--------------------------------------------------------------------------------------------------------------------------------------------------|
------------------------------------------------------------------------------------------0--------2--------3h----2h----0-----2-------------------|
-------------------------------------------------------------------------x-------2-----------------------------------------------------2----------|
-------------------4------------------------------------------------------------------------------------------------------------------------------|
-0------------------------------------------------------------------------------------------------------------------------------------------------|
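If a row wraps over more than two groups, pairing each line with the one a fixed distance further down is not enough. One way, sketched here under the assumption that every group has exactly n non-blank lines (n=6 in the question; the hypothetical sample below uses n=2), is to append every non-blank line to an accumulator for its row and print the rows at the end:

```shell
#!/bin/sh
# Hypothetical sample: three groups of n=2 lines separated by two blank lines;
# each group continues the rows of the previous one.
printf 'ab\ncd\n\n\nef\ngh\n\n\nij|\nkl|\n' > song.txt

# Skip blank lines, append each non-blank line to row (count mod n),
# then print the n accumulated rows at the end.
awk -v n=2 '
  NF { i = cnt++ % n; row[i] = row[i] $0 }
  END { for (i = 0; i < n; i++) print row[i] }
' song.txt > joined.txt

cat joined.txt
```

This should print abefij| and cdghkl|. Because all rows are held in memory until the end, it suits files of tab-like size rather than very large inputs.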
A little ugly, but it works:
Split your input:
grep -Ev '^$|\|' song.txt > file1
grep -E '\|' song.txt > file2
Then put it back together:
paste -d '\0' file1 file2
I usually use vim for this type of work. For example, assuming you have a file named file_name.txt with the following content:
-------------------------8----
------------0--------2--------|
---2--------------------------|
------------------aaa---------|
---------------984asds--------|
---------t6776----------------|
the following command
vim -c ":6y" -c ":put" -c ":1" -c ":join!" -c ":6d" -c ":wq" file_name.txt
opens file_name.txt on the first line, copies the sixth line, puts the copy below the first line, goes back to the first line, joins it with the new second line without inserting a space (:join!), deletes the sixth line (the original of the copied line), and saves and quits. This produces the following result:
-------------------------8-------------------984asds--------|
------------0--------2--------|
---2--------------------------|
------------------aaa---------|
---------t6776----------------|
This might work for you (GNU utils):
sed '/^$/d' file |
split -nr/6 --filter 'cat'|
paste -sd'\0'|
sed 's/|/&\n/g;s/\n$//'
This removes any blank lines using sed, splits the file into 6 using a round-robin method and, instead of writing separate files, outputs the interleaved results to stdout. The lines are then pasted into one long line and split back into shorter lines, using | as the record separator.

Remove line break every nth line using sed

Is there a way to use sed to remove/substitute a pattern in a file on every 3n+1 and 3n+2 line?
For example, turn
Line 1n/
Line 2n/
Line 3n/
Line 4n/
Line 5n/
Line 6n/
Line 7n/
...
To
Line 1 Line 2 Line 3n/
Line 4 Line 5 Line 6n/
...
I know this can probably be handled by awk. But what about sed?
Well, I'd just use awk for that[1] since it's a little more complex, but if you're really intent on using sed, the following command will combine groups of three lines into a single line (which appears to be what you're after based on the title and text, despite the strange use of n/ for newline):
sed '$!N;$!N;s/\n/ /g'
See the following transcript for how to test this:
$ printf 'Line 1\nLine 2\nLine 3\nLine 4\nLine 5\n' | sed '$!N;$!N;s/\n/ /g'
Line 1 Line 2 Line 3
Line 4 Line 5
The sub-commands are as follows:
$!N will append the next line to the pattern space, but only if you're not on the last line (you do this twice to get three lines). Each line in the pattern space is separated by a newline character.
s/\n/ /g replaces all the newlines in the pattern space with a space character, effectively combining the three lines into one.
[1] With something like:
awk '{if(NR%3==1){s="";if(NR>1){print ""}};printf s"%s", $0;s=" "} END{if(NR)print ""}'
This is complicated by the likelihood that you don't want an extraneous space at the end of each line, which necessitates the s variable. The END block terminates the last output line with a newline.
Since the sed variant is smaller (and less complex once you understand it), you're probably better off sticking with it. Well, at least up to the point where you want to combine groups of 17 lines, or do something else more complex than sed was meant to handle :-)
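The awk variant can be parameterized so that the groups-of-17 case is only a matter of changing one number. A sketch, where n is a variable name chosen here for illustration:

```shell
#!/bin/sh
# Join every n input lines into one line, separated by single spaces.
seq 1 10 | awk -v n=4 '
  { printf "%s%s", ((NR - 1) % n ? " " : ""), $0 }   # space before all but the first line of a group
  NR % n == 0 { print "" }                           # end a complete group
  END { if (NR % n) print "" }                       # end a trailing partial group
' > grouped.txt

cat grouped.txt
```

With n=4 and ten input lines this should print 1 2 3 4, 5 6 7 8 and 9 10 on three lines, with no trailing space.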
The example shows merging 3 consecutive lines, although the description says something different. To generate the example output, you can use the awk idiom:
awk 'ORS=NR%3?FS:RS' <(seq 1 9)
1 2 3
4 5 6
7 8 9
In your case the record separator needs to be defined up front to include the literals (a multi-character RS like this works in GNU awk and mawk; POSIX leaves it unspecified):
awk -v RS="n/\\n" 'ORS=NR%3?FS:RS'
OK, here are general ways to handle it with awk and sed.
awk:
awk 'NR % 3 { sub(/pattern/, "substitution") } { print }' file | paste -d' ' - - -
sed:
sed 's/pattern/substitution/; n; s/pattern/substitution/; n' file | paste -d' ' - - -
Both replace pattern with substitution on lines 3n+1 and 3n+2 and leave every 3n-th line untouched.
paste -d' ' - - - is the standard idiom for joining every three lines of stdin: each - reads the next line from standard input.
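A quick end-to-end check of the sed-plus-paste idea on input in the question's format. This sketch assumes the pattern is the trailing literal n/ and relies on sed's automatic printing (no -n and no p flags), so every line is printed exactly once; lines.txt and merged.txt are placeholder names:

```shell
#!/bin/sh
# Sample input in the question's format: each line ends in the literal "n/".
printf 'Line %dn/\n' 1 2 3 4 5 6 > lines.txt

# Strip the trailing "n/" on lines 3n+1 and 3n+2 (n prints the current line
# and reads the next), leave line 3n alone, then let paste join every three lines.
sed 's|n/$||; n; s|n/$||; n' lines.txt | paste -d' ' - - - > merged.txt

cat merged.txt
```

This should print Line 1 Line 2 Line 3n/ and Line 4 Line 5 Line 6n/, matching the desired output in the question.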

Printing only 6- and 10-character words in a Linux file

This is my file:
$ cat filename
10023a,vija45,8877au,qwer65,guru12 0099888das,baburam123,ganeshan1,feild55512
I tried the sed command below to print only the 6-character words in that file:
sed -ne 's/[a-z][0-9]\{6}/&/p' filename
but it displays all the words and lines.
Could anyone please help me with this?
The expected output is:
vija45 baburam123
8877au ganeshan1
qwer65 feild55512
guru12
Use this:
tr "," "\n" <file | grep '^.\{6\}$\|^.\{10\}$'
First, tr replaces every comma with a newline, so each comma-separated field is on its own line.
Then grep prints only the lines that are exactly 6 or 10 characters long (\| in a basic regex is a GNU grep extension).
With your given example, the output would then be:
10023a
vija45
8877au
qwer65
baburam123
feild55512
If guru12 and 0099888das should also be matched as a 6-character and a 10-character word, include the space in the tr sets as well:
tr ', ' '\n\n' <file | grep '^.\{6\}$\|^.\{10\}$'
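A variant of the same pipeline that avoids the GNU \| alternation: grep -x requires the whole line to match, and -E enables extended-regex alternation. A sketch that recreates the sample file from the question as filename:

```shell
#!/bin/sh
printf '10023a,vija45,8877au,qwer65,guru12 0099888das,baburam123,ganeshan1,feild55512\n' > filename

# One field per line (commas and spaces both act as separators), then keep
# only lines that are exactly 6 or 10 characters long.
tr ', ' '\n\n' < filename | grep -xE '.{6}|.{10}' > words.txt

cat words.txt
```

This should print 10023a, vija45, 8877au, qwer65, guru12, 0099888das, baburam123 and feild55512, one per line; ganeshan1 (9 characters) is dropped.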
I suggest using grep for matching:
grep -o '\b\w\{6\}\b' file
sed '
# keep only 6- and 10-character words: wrap the line in separators first
s/.*/,&,/
# drop words of 1-5, 7-9 or 11+ characters; loop (ta) so adjacent words sharing a separator are all removed
:a
s/\([[:space:],]\)[^[:space:],]\{1,5\}\([[:space:],]\)/\1\2/;ta
s/\([[:space:],]\)[^[:space:],]\{7,9\}\([[:space:],]\)/\1\2/;ta
s/\([[:space:],]\)[^[:space:],]\{11,\}\([[:space:],]\)/\1\2/;ta
# clean up separators
s/[[:space:],]\{2,\}/,/g;s/^[[:space:],]*//;s/[[:space:],]*$//
# remove empty line
/^$/d
# 1 word per line (optional)
y/ ,/\n\n/
' YourFile
Details:
Prints every 6- or 10-character word found on each line (optionally one word per output line).
Each step is commented in the script.
Adapted for comma-separated input; spaces are treated as separators too.
An earlier version of this answer kept only the 6-character words.

vim/vi/sed: Act on a certain number of lines from the end of the file

Just as we can delete (or substitute, or yank, etc.) the 4th to 6th lines from the beginning of a file in vim:
:4,6d
I'd like to delete (or substitute, or yank, etc.) the 4th-from-last through the 6th-from-last lines of a file. That is, if the file has 15 lines, I'd do:
:10,12d
But I can't do this when I don't know how many lines are in the file, and I'm going to use it on a batch of many files. How do I do this in vim and in sed?
I did in fact look at this post, but did not find it useful.
In vim you can try the following, which is fairly intuitive ($ is the last line, so the 4th-from-last line is $-3 and the 6th-from-last is $-5):
:$-5,$-3d
I couldn't find an exact way to do it with sed, but if you can use something other than sed, here is a solution with head and tail (GNU head, for the negative count):
head -n -6 file.txt && tail -n 3 file.txt
In Vim, you can subtract the line numbers from $, which stands for the last line, e.g. this will work on the last 3 lines:
:$-2,$substitute/...
In sed this is not so easy, because sed processes its input as a stream of lines and cannot simply go back. You would have to keep the last few lines in the hold space and work on them at the end of the stream.
Here are some recipes from sed1line.txt:
# print the last 10 lines of a file (emulates "tail")
sed -e :a -e '$q;N;11,$D;ba'
# print the last 2 lines of a file (emulates "tail -2")
sed '$!N;$!D'
# delete the last 2 lines of a file
sed 'N;$!P;$!D;$d'
# delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2
To delete the 4th-from-last through 6th-from-last lines, reverse the file with tac, delete lines 4 to 6, and reverse it back:
tac filename | sed 4,6d | tac
You can use 2 passes with awk: the first pass counts the lines and the second prints or deletes whatever lines you like. With fromEnd = 1 for the last line, the first command deletes the 4th- to 6th-from-last lines and the second prints only those lines:
awk 'NR==FNR{numLines++;next} {fromEnd = numLines - FNR + 1} fromEnd > 6 || fromEnd < 4' file file
awk 'NR==FNR{numLines++;next} {fromEnd = numLines - FNR + 1} fromEnd >= 4 && fromEnd <= 6' file file
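The two-pass awk idea can be wrapped in a small shell function so the range becomes a parameter when you run it over a batch of files. delete_from_end is a hypothetical name, not an existing command; the sketch counts positions from the end with 1 for the last line:

```shell
#!/bin/sh
# Hypothetical helper. Usage: delete_from_end FROM TO FILE
# Prints FILE without the FROM-th through TO-th lines counted from the end
# (1 = last line). The file is read twice: once to count, once to filter.
delete_from_end() {
  awk -v from="$1" -v to="$2" '
    NR == FNR { total++; next }   # pass 1: count the lines
    { k = total - FNR + 1 }       # pass 2: position counted from the end
    k < from || k > to            # print only lines outside the range
  ' "$3" "$3"
}

seq 15 > file.txt
delete_from_end 4 6 file.txt > trimmed.txt   # drops lines 10, 11 and 12
cat trimmed.txt
```

To edit many files in place you would still need to write the output to a temporary file and move it back, since awk does not edit files in place portably.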
This might work for you (GNU sed):
sed -r ':a;${s/([^\n]*\n){3}//;q};N;7,$!ba;P;D' file
This works by making a moving window of 6 lines in the pattern space (PS) and then deleting the first three of them on encountering the last line.
:a is a loop label
${s/([^\n]*\n){3}//;q} delete the first three lines of the PS at end of file and quit.
N append a newline and then the next line to the PS.
7,$!ba if not on lines 7 to $ (end of file), that is on lines 1 to 6, loop back to the beginning, i.e. label :a.
P;D for the line range 7 to $ (end of file), print up to the first newline in the PS, then delete up to and including the first newline and begin a new cycle.
The 7,$!ba clause builds the window: lines 1 to 6 are simply accumulated in the PS. From line 7 to the end, each new line is appended to the PS, and the first line is printed and then deleted.
Alternatively:
sed -e ':a' -e '$s/\([^\n]*\n\)\{3\}//' -e '$q' -e 'N' -e '7,$!ba' -e 'P' -e 'D' file
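A quick way to check the moving-window one-liner is to run it on seq output, where the deleted lines are easy to spot (GNU sed assumed, as in the answer; window.txt is a placeholder name):

```shell
#!/bin/sh
# Delete the 4th- through 6th-from-last lines of a 15-line input with the
# GNU sed one-liner above; lines 10, 11 and 12 should disappear.
seq 15 | sed -r ':a;${s/([^\n]*\n){3}//;q};N;7,$!ba;P;D' > window.txt
cat window.txt
```

On 15 lines this should print 1 to 9 followed by 13 to 15.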

How can I swap two lines using sed?

Does anyone know how to replace line a with line b and line b with line a in a text file using the sed editor?
I can see how to replace a line in the pattern space with a line that is in the hold space (i.e., /^Paco/x or /^Paco/g), but what if I want to take the line starting with Paco and replace it with the line starting with Vinh, and also take the line starting with Vinh and replace it with the line starting with Paco?
Let's assume for starters that there is one line with Paco and one line with Vinh, and that the line Paco occurs before the line Vinh. Then we can move to the general case.
#!/bin/sed -f
/^Paco/ {
:notdone
N
s/^\(Paco[^\n]*\)\(\n\([^\n]*\n\)*\)\(Vinh[^\n]*\)$/\4\2\1/
t
bnotdone
}
After matching /^Paco/ we read into the pattern buffer until s// succeeds (or EOF: the pattern buffer will be printed unchanged). Then we start over searching for /^Paco/.
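To try the script, save it to a file and run it with sed -f. This sketch assumes GNU sed, which treats [^\n] inside a bracket expression as "not a newline"; swap.sed and the sample lines are hypothetical:

```shell
#!/bin/sh
# The script from the answer, saved as swap.sed.
cat > swap.sed <<'EOF'
/^Paco/ {
:notdone
N
s/^\(Paco[^\n]*\)\(\n\([^\n]*\n\)*\)\(Vinh[^\n]*\)$/\4\2\1/
t
bnotdone
}
EOF

# Hypothetical sample: one Paco line before one Vinh line.
printf 'a\nPaco 1\nb\nVinh 2\nc\n' | sed -f swap.sed > swapped.txt
cat swapped.txt
```

This should print a, Vinh 2, b, Paco 1 and c on separate lines.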
cat input | tr '\n' 'ç' | sed 's/\(ç__firstline__\)\(ç__secondline__\)/\2\1/g' | tr 'ç' '\n' > output
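Replace __firstline__ and __secondline__ with your desired regexps. Be sure to replace any instances of . in your regexps with [^ç]. If your text actually contains ç, use some other character that your text doesn't contain. Note that GNU tr works on bytes, so in a UTF-8 locale a multibyte placeholder such as ç may not round-trip; a single-byte character is safer.
Try this awk script, saved as test.sh: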
s1="$1"
s2="$2"
awk -vs1="$s1" -vs2="$s2" '
{ a[++d]=$0 }
$0~s1{ h=$0;ind=d}
$0~s2{
a[ind]=$0
for(i=1;i<d;i++ ){ print a[i]}
print h
delete a;d=0;
}
END{ for(i=1;i<=d;i++ ){ print a[i] } }' file
Output:
$ cat file
1
2
3
4
5
$ bash test.sh 2 3
1
3
2
4
5
$ bash test.sh 1 4
4
2
3
1
5
Use sed only for simple substitutions (or not at all); for anything more complicated, use a programming language.
A simple example from the GNU sed texinfo doc:
Note that on implementations other than GNU `sed' this script might
easily overflow internal buffers.
#!/usr/bin/sed -nf
# reverse all lines of input, i.e. first line became last, ...
# from the second line, the buffer (which contains all previous lines)
# is *appended* to current line, so, the order will be reversed
1! G
# on the last line we're done -- print everything
$ p
# store everything on the buffer again
h
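The same three commands can also be given inline, which makes it easy to check the behavior on a tiny input (reversed.txt is a placeholder name):

```shell
#!/bin/sh
# 1!G appends the hold space to every line but the first, $p prints the
# accumulated buffer on the last line, h saves the buffer back to hold space.
printf '1\n2\n3\n' | sed -n '1!G;$p;h' > reversed.txt
cat reversed.txt
```

This prints 3, 2 and 1, the same as tac.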
