Delete last line break using sed [duplicate] - linux

This question already has answers here:
How can I delete a newline if it is the last character in a file?
(23 answers)
Closed 4 years ago.
How can I delete the last \n from a file? The file ends with a blank line created by the line break after the last line of text. I'm using this command:
sed '/^\s*$/d'
But that last blank line is not removed.

Why is sed printing a newline?
When you read the sed POSIX standard, then it states:
Whenever the pattern space is written to standard output or a named file, sed shall immediately follow it with a <newline>.
A bit more details can be found in this answer.
Removing the last <newline>:
truncate: If you want to delete just one character from the end of a file, you can do:
truncate -s -1 <file>
This makes the file one byte shorter, i.e. it removes the last byte, whatever that byte is.
From man truncate:
-s, --size=SIZE set or adjust the file size by SIZE bytes
SIZE may also be prefixed by one of the following modifying characters:
'+' extend by, '-' reduce by, '<' at most, '>' at least,
'/' round down to multiple of, '%' round up to multiple of.
other answers can be found in How can I delete a newline if it is the last character in a file?
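Since truncate removes the last byte unconditionally, a guarded version can be safer. The following is a minimal sketch, assuming GNU truncate is available; strip_final_newline is a hypothetical helper name, not something taken from the answers. It shortens the file only when the last byte really is a newline (0a in hex).

```shell
# strip_final_newline FILE  (hypothetical helper)
# Remove the last byte of FILE only if that byte is a newline.
strip_final_newline() {
    file=$1
    # tail -c 1 emits the last byte; od -An -tx1 renders it as hex.
    last=$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$last" = "0a" ]; then
        truncate -s -1 "$file"   # GNU coreutils: shrink the file by one byte
    fi
}
```

Calling it a second time on the same file is harmless, because the last byte is then no longer a newline.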

1) DELETE LAST EMPTY LINE FROM A FILE:
First of all, the command you are currently using will delete ALL empty and blank lines!
NOT ONLY THE LAST ONE.
If you want to delete the last line if it is empty/blank then you can use the following command:
sed '${/^[[:blank:]]*$/d}' test
INPUT:
cat -vTE test
a$
$
b$
$
c$
^I ^I $
OUTPUT:
sed '${/^[[:blank:]]*$/d}' test
a
b
c
Explanations:
The first $ tells sed to apply the commands in braces only to the last line.
/^[[:blank:]]*$/ is the condition: if the last line is empty or consists only of blank characters (spaces and tabs), sed runs d, which deletes the pattern space, so that line is not printed.
You can redirect the output of the sed command to a new file, or make the change in place with the -i option (back up your file first!), or use -i.bak to have sed keep a backup copy of the original file.
IMPORTANT:
If your file comes from Windows and contains carriage returns (\r), this sed command will not work, because \r is not a blank character. Remove the carriage returns first with either dos2unix or tr -d '\r'.
For files containing carriage returns <CR> (\r or ^M):
BEFORE FIXING THE FILE:
cat:
cat -vTE test
a$
$
b$
$
c$
^I ^I ^M$
od:
od -c test
0000000 a \n \n b \n \n c \n \t \t \r \n
0000016
sed:
sed '${/^[[:blank:]]*$/d}' test
a
b
c
AFTER FIXING THE FILE:
dos2unix test
dos2unix: converting file test to Unix format ...
cat:
cat -vTE test
a$
$
b$
$
c$
^I ^I $
od:
od -c test
0000000 a \n \n b \n \n c \n \t \t \n
0000015
sed:
sed '${/^[[:blank:]]*$/d}' test
a
b
c
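If you would rather not modify a CRLF file just to run this check, the carriage returns can be stripped in the same pipeline. A minimal sketch, assuming a POSIX awk and sed; drop_last_blank_line is a hypothetical name:

```shell
# drop_last_blank_line [FILE...]  (hypothetical helper)
# Strip a CR at the end of each line, then delete the last line if it is blank.
drop_last_blank_line() {
    awk '{ sub(/\r$/, "") } 1' "$@" |
        sed '${/^[[:blank:]]*$/d;}'
}

# Example: the last line holds only tabs, spaces and a CR, so it is dropped.
printf 'a\n\nb\n\t \t \r\n' | drop_last_blank_line | od -c
```

The result goes to standard output, so the original file is left untouched. Only a CR at the very end of a line is removed, not every \r in the data.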
2) DELETE LAST EOL CHARACTER FROM A FILE:
For this particular purpose, I would recommend using perl:
perl -pe 'chomp if eof' test
a
b
c
You can add the -i option to make the change in place (take a backup of your file before running the command). Last but not least, you might have to remove carriage returns from your file first, as described above.

Your question isn't clear, but this might be what you're asking for. It prints every line except the last one, so a trailing empty line is dropped:
$ cat file
a
b
c
$ awk 'NR>1{print p} {p=$0}' file
a
b
c
$
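The awk command above drops the whole last line. If the goal is instead to keep every line and omit only the final newline byte, the same one-line-behind buffering can be extended with an END rule. A sketch, with strip_last_newline as a hypothetical name:

```shell
# strip_last_newline [FILE...]  (hypothetical helper)
# Print every line, but leave off the newline after the final one.
strip_last_newline() {
    awk '
        NR > 1 { print prev }      # emit the previous line with its newline
        { prev = $0 }              # remember the current line
        END { printf "%s", prev }  # emit the last line without a newline
    ' "$@"
}

printf 'a\nb\n' | strip_last_newline | od -c
```

The od -c output makes it easy to confirm that the stream now ends in b rather than \n.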

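You can also use a sed one-liner to remove the trailing blank line(s). The shorter variant sed -e :a -e '/^\n*$/N;/\n$/ba' relies on N discarding the pattern space at end of input, whereas GNU sed prints it, so the form with an explicit $d is the safer choice:
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}'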

Related

Replace string and remain the file format [duplicate]

The intent of this question is to provide an answer to the daily questions whose answer is "you have DOS line endings" so we can simply close them as duplicates of this one without repeating the same answers ad nauseam.
NOTE: This is NOT a duplicate of any existing question. The intent of this Q&A is not just to provide a "run this tool" answer but also to explain the issue such that we can just point anyone with a related question here and they will find a clear explanation of why they were pointed here as well as the tool to run to solve their problem. I spent hours reading all of the existing Q&A and they are all lacking in the explanation of the issue, alternative tools that can be used to solve it, and/or the pros/cons/caveats of the possible solutions. Also some of them have accepted answers that are just plain dangerous and should never be used.
Now back to the typical question that would result in a referral here:
I have a file containing 1 line:
what isgoingon
and when I print it using this awk script to reverse the order of the fields:
awk '{print $2, $1}' file
instead of seeing the output I expect:
isgoingon what
I get the field that should be at the end of the line appear at the start of the line, overwriting some text at the start of the line:
whatngon
or I get the output split onto 2 lines:
isgoingon
what
What could the problem be and how do I fix it?
The problem is that your input file uses DOS line endings of CRLF instead of UNIX line endings of just LF and you are running a UNIX tool on it so the CR remains part of the data being operated on by the UNIX tool. CR is commonly denoted by \r and can be seen as a control-M (^M) when you run cat -vE on the file while LF is \n and appears as $ with cat -vE.
So your input file wasn't really just:
what isgoingon
it was actually:
what isgoingon\r\n
as you can see with cat -vE:
$ cat -vE file
what isgoingon^M$
and od -c:
$ od -c file
0000000 w h a t i s g o i n g o n \r \n
0000020
so when you run a UNIX tool like awk (which treats \n as the line ending) on the file, the \n is consumed by the act of reading the line, but that leaves the 2 fields as:
<what> <isgoingon\r>
Note the \r at the end of the second field. \r means Carriage Return which is literally an instruction to return the cursor to the start of the line so when you do:
print $2, $1
awk will print isgoingon and then will return the cursor to the start of the line before printing what which is why the what appears to overwrite the start of isgoingon.
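The effect is easier to see when the output is piped through od -c instead of being sent to the terminal. A small sketch that reproduces the problem without needing a file; swap_fields is a hypothetical wrapper around the awk command from the question:

```shell
# swap_fields [FILE...]  (hypothetical wrapper around the awk script above)
swap_fields() {
    awk '{ print $2, $1 }' "$@"
}

# Feed one CRLF-terminated line and dump the resulting bytes.
printf 'what isgoingon\r\n' | swap_fields | od -c
# In the od output the \r sits directly after "isgoingon", before "what".
```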
To fix the problem, do any of these:
dos2unix file
sed 's/\r$//' file
awk '{sub(/\r$/,"")}1' file
perl -pe 's/\r$//' file
Apparently dos2unix is aka fromdos in some UNIX variants (e.g. Ubuntu).
Be careful if you decide to use tr -d '\r' as is often suggested as that will delete all \rs in your file, not just those at the end of each line.
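Before converting anything, it can be useful to check whether a file has DOS line endings at all. A minimal sketch, assuming an awk that understands \r in a regex; count_crlf_lines is a hypothetical name:

```shell
# count_crlf_lines [FILE...]  (hypothetical helper)
# Print how many input lines end in a carriage return.
count_crlf_lines() {
    awk '/\r$/ { n++ } END { print n + 0 }' "$@"
}

printf 'a\r\nb\nc\r\n' | count_crlf_lines
```

A result of 0 means no line ends in CR, so none of the conversions above is needed.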
Note that GNU awk will let you parse files that have DOS line endings by simply setting RS appropriately:
gawk -v RS='\r\n' '...' file
but other awks will not allow that as POSIX only requires awks to support a single character RS and most other awks will quietly truncate RS='\r\n' to RS='\r'. You may need to add -v BINMODE=3 for gawk to even see the \rs though as the underlying C primitives will strip them on some platforms, e.g. cygwin.
One thing to watch out for is that CSVs created by Windows tools like Excel will use CRLF as the line endings but can have LFs embedded inside a specific field of the CSV, e.g.:
"field1","field2.1
field2.2","field3"
is really:
"field1","field2.1\nfield2.2","field3"\r\n
so if you just convert \r\ns to \ns then you can no longer tell linefeeds within fields from linefeeds as line endings so if you want to do that I recommend converting all of the intra-field linefeeds to something else first, e.g. this would convert all intra-field LFs to tabs and convert all line ending CRLFs to LFs:
gawk -v RS='\r\n' '{gsub(/\n/,"\t")}1' file
Doing similar without GNU awk left as an exercise but with other awks it involves combining lines that do not end in CR as they're read.
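One possible shape for that exercise, as a sketch rather than a tested-everywhere solution: lines that do not end in CR are buffered and joined with a tab, and a line ending in CR completes the record. crlf_csv_to_lf is a hypothetical name, and the sketch assumes every real record ends in CRLF.

```shell
# crlf_csv_to_lf [FILE...]  (hypothetical helper)
# Turn intra-field LFs into tabs and record-ending CRLFs into LFs.
crlf_csv_to_lf() {
    awk '
    {
        if (sub(/\r$/, "")) {      # CR at end: the CSV record is complete
            print buf $0
            buf = ""
        } else {                   # bare LF: still inside a quoted field
            buf = buf $0 "\t"
        }
    }
    END {
        if (buf != "") {           # input ended without a final CRLF
            sub(/\t$/, "", buf)
            print buf
        }
    }' "$@"
}

printf '"field1","field2.1\nfield2.2","field3"\r\n' | crlf_csv_to_lf | od -c
```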
Also note that though CR is part of the [[:space:]] POSIX character class, it is not one of the whitespace characters included as separating fields when the default FS of " " is used, whose whitespace characters are only tab, blank, and newline. This can lead to confusing results if your input can have blanks before CRLF:
$ printf 'x y \n'
x y
$ printf 'x y \n' | awk '{print $NF}'
y
$
$ printf 'x y \r\n'
x y
$ printf 'x y \r\n' | awk '{print $NF}'
$
That's because leading and trailing field-separator whitespace is ignored on a line that has LF line endings, but \r becomes the final field on a line with CRLF line endings if the character before it was whitespace:
$ printf 'x y \r\n' | awk '{print $NF}' | cat -Ev
^M$
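Stripping the CR before looking at the fields avoids the surprise, because awk re-splits the record when $0 is modified. A minimal sketch; last_field is a hypothetical name:

```shell
# last_field [FILE...]  (hypothetical helper)
# Print the last field of each line, ignoring a CR at the end of the line.
last_field() {
    awk '{ sub(/\r$/, ""); print $NF }' "$@"
}

printf 'x y \r\n' | last_field
```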
You can use the \R shorthand character class in PCRE for files with unknown line endings. There are even more line ending to consider with Unicode or other platforms. The \R form is a recommended character class from the Unicode consortium to represent all forms of a generic newline.
So if you have an 'extra' \r you can find and remove it with a regex: s/\R$/\n/ normalizes whatever line ending terminates each line into \n. Alternatively, you can use s/\R/\n/g to capture any notion of 'line ending' and standardize it into a \n character.
Given:
$ printf "what\risgoingon\r\n" > file
$ od -c file
0000000 w h a t \r i s g o i n g o n \r \n
0000020
Perl and Ruby and most flavors of PCRE implement \R combined with the end of string assertion $ (end of line in multi-line mode):
$ perl -pe 's/\R$/\n/' file | od -c
0000000 w h a t \r i s g o i n g o n \n
0000017
$ ruby -pe '$_.sub!(/\R$/,"\n")' file | od -c
0000000 w h a t \r i s g o i n g o n \n
0000017
(Note the \r between the two words is correctly left alone)
If you do not have \R you can use the equivalent of (?>\r\n|\v) in PCRE.
With straight POSIX tools, your best bet is likely awk like so:
$ awk '{sub(/\r$/,"")} 1' file | od -c
0000000 w h a t \r i s g o i n g o n \n
0000017
Things that kinda work (but know your limitations):
tr deletes all \r even if used in another context (granted the use of \r is rare, and XML processing requires that \r be deleted, so tr is a great solution):
$ tr -d "\r" < file | od -c
0000000 w h a t i s g o i n g o n \n
0000016
GNU sed works, but not POSIX sed since \r and \x0D are not supported on POSIX.
GNU sed only:
$ sed 's/\x0D$//' file | od -c # also sed 's/\r$//'
0000000 w h a t \r i s g o i n g o n \n
0000017
The Unicode Regular Expression Guide is probably the best bet for a definitive treatment of what a "newline" is.
Run dos2unix. While you can manipulate the line endings with code you wrote yourself, there are utilities which exist in the Linux / Unix world which already do this for you.
If on a Fedora system dnf install dos2unix will put the dos2unix tool in place (should it not be installed).
There is a similar dos2unix deb package available for Debian based systems.
From a programming point of view, the conversion is simple. Search all the characters in a file for the sequence \r\n and replace it with \n.
This means there are dozens of ways to convert from DOS to Unix using nearly every tool imaginable. One simple way is to use the command tr where you simply replace \r with nothing!
tr -d '\r' < infile > outfile

SED - insert a blank line after every input line that consists of capital letters and spaces

I have a text file and I need a command using sed to insert a blank line after every line that consists of capital letters and spaces only.
This might work for you (GNU sed):
sed '/^[[:blank:][:upper:]][[:blank:][:upper:]]*$/G' file
This appends a newline followed by the contents of the hold space (empty by default) to every line that consists of one or more blank or uppercase characters.
Given:
$ cat file
LINE LINE LINE
Line Line Line
Line 1
LINE 2
END!
====
You can use s/// to add a \n to the line:
With POSIX sed, use a literal new line in the sed script:
$ sed 's/^\([[:upper:][:blank:]]*\)$/\1\
/' file
LINE LINE LINE
Line Line Line
Line 1
LINE 2
END!
====
With GNU sed, you can use the representation of \n:
$ sed 's/^\([[:upper:][:blank:]]*\)$/\1\n/' file
You can also use a\ to append in sed. I have tried to get sed append to work but could not do so reliably across POSIX, BSD and GNU sed, since POSIX and BSD do not support \n.
With GNU sed (note space after a\):
$ sed '/^[[:upper:][:blank:]]*$/a\ ' file
BSD:
$ sed '/^[[:upper:][:blank:]]*$/a\
\
' file
Those are not exactly equivalent since the GNU version has a space on the blank line.
The version of POSIX sed I have did not work with either of those...
Given the platform and version differences of sed, you might consider awk to do this since simple awk's are easier to make universal.
This works on every awk I have:
$ awk '1; /^[[:upper:][:blank:]]*$/{print ""}' file
With awk you can also make sure that blank lines are not doubled by requiring at least one non-blank field on the line, like so:
$ awk '1; /^[[:upper:][:blank:]]+$/ && NF>0 {print ""}' file
Sure. Just insert lines with a:
sed '/^[[:blank:]A-Z]*$/a\'
The a command appends the string that follows it after every matching line (end the command with a backslash). So the above command is meant to insert an empty line after every line that consists solely of capital letters and spaces. That's exactly what you want.

Hidden line in file?

I have a UTF-8/no BOM file (converted from ISO-8859-1) that has 31214 lines. I have already run dos2unix on the file. When I open it in notepad++, I see a blank line underneath. When I remove this blank line, the line count reduces by one. I save it under a different name and when I tail the file, the prompt displays on the same line. From bash, how do I delete the blank line in the 1st file to produce the result displayed below in the 2nd file?
The goal is to do this from bash w/o manually deleting the line in notepad++
1st file:
[user#server]$ cat file1.txt | wc -l
31214
[user#server]$ tail file1.txt
T 31212 Data 20170517
[user#server]$
2nd file (edited with notepad++)
[user#server]$ cat file2.txt | wc -l
31213
[user#server]$ tail file2.txt
T 31212 Data 20170517[user#server]$
That's the trailing newline of the last line. Some editors allow you to go to the nonexisting "empty" line at the end, some don't show it. Again, some programs may allow you to remove the final newline, but note that e.g. POSIX in effect requires it to be there, and some standard utilities act oddly if it isn't present.
E.g. wc -l counts the number of newlines in the input file (printf "foo\nbar" | wc -l shows 1) so removing the final newline does decrease the line count.
Also, Bash prints the prompt wherever it was that the cursor was left on the screen, so if you print something that doesn't have the trailing newline, the prompt will be placed where the final incomplete line ended, as you saw.
There's no need to remove that final newline, just leave it there.
It is sometimes suggested, as explained here, to use
sed -i '$ s/.$//' your.file
Note that this removes the last character of the last line, not the final newline: sed strips the newline before applying the script and adds it back on output, so .$ never matches it. (If you want to delete something else from the end of the file you can replace the regex .$ with something-else$.) -i means 'edit in place' (on FreeBSD/macOS you need to add an empty string as an argument: sed -i "" '$ s/.$//' your.file). To drop the final newline itself, a byte-level tool such as truncate -s -1 your.file is needed.
The file2.txt is missing a trailing newline.
Yes, a text file should end on a newline character.
Given that you do know that a trailing newline is missing, this command should be enough to correct the problem:
$ echo >> file2.txt
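Running echo >> unconditionally adds an empty line when the file already ends in a newline. A guarded variant, as a sketch only; ensure_final_newline is a hypothetical name:

```shell
# ensure_final_newline FILE  (hypothetical helper)
# Append a newline to FILE only if its last byte is not already one.
ensure_final_newline() {
    file=$1
    [ -s "$file" ] || return 0      # leave empty or missing files alone
    last=$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$last" != "0a" ]; then    # 0a is the newline byte in hex
        echo >> "$file"
    fi
}
```

It can be run repeatedly on the same file without piling up blank lines at the end.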

Replace first six commas for each line in a text file

I want to replace the first six , for each line in a text file using sed or something similar in linux.
There are more than six , on each line, but only the first six should be replaced by |.
Sed doesn't really support the notion of "the first n occurrences", only "the n-th occurrence"; GNU sed has one for "replace all matches from the n-th on", which is not what you want in this case. To get the first six commas replaced, you have to call the s command six times:
sed 's/,/|/;s/,/|/;s/,/|/;s/,/|/;s/,/|/;s/,/|/' infile
If, however, you know that there are no | in the file and you have GNU sed, you can do this:
sed 's/,/|/g;s/|/,/7g' infile
This replaces all commas with pipes, then turns the pipes from the 7th on back to commas.
If you do have pipes beforehand, you can turn them into something that you know isn't in the string first:
sed 's/|/~~/g;s/,/|/g;s/|/,/7g;s/~~/|/g' infile
This makes all | into ~~ first, then all , into |, then the | from the 7th on back into ,, and finally the ~~ back into |.
Testing on this input file:
,,,,,,X,,,,,,
,,,|,,,|,,,|,,,|
the first and third command result in
||||||X,,,,,,
||||||||,,,|,,,|
The second one would fail on the second line because there are already pipe characters.
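If GNU sed is not available, or the placeholder juggling feels fragile, awk's sub() replaces only the first match each time it is called, so calling it six times does the job regardless of existing pipes. A minimal sketch; first_six_commas is a hypothetical name:

```shell
# first_six_commas [FILE...]  (hypothetical helper)
# Replace the first six commas on each line with pipes.
first_six_commas() {
    awk '{ for (i = 1; i <= 6; i++) sub(/,/, "|") } 1' "$@"
}

printf ',,,,,,X,,,,,,\n,,,|,,,|,,,|,,,|\n' | first_six_commas
```

On the two test lines above this should give the same result as the first and third sed commands. Lines with fewer than six commas simply have all of their commas replaced.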
This might work for you (GNU sed):
sed 'y/,/\n/;s/\n/,/7g;y/\n/|/' file
Translate all ,'s to \n's, then replace from the seventh \n to the end of line by ,'s, then replace the remaining \n's by |'s.
Use the following pattern in sed: sed 's/old/new/<number>'
Here <number> selects which occurrence on the line is replaced: only the <number>-th match is changed, so this alone does not replace the first six.
You can replace <number> with g to apply the pattern to all occurrences.
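A quick way to see what the numeric flag does is to try it on a short line. A small sketch; replace_nth_comma is a hypothetical wrapper, and its argument is passed straight through as the flag of the s command:

```shell
# replace_nth_comma FLAG  (hypothetical wrapper; reads standard input)
# FLAG is appended to the s command, e.g. 2 or g.
replace_nth_comma() {
    sed "s/,/|/$1"
}

printf 'a,b,c,d\n' | replace_nth_comma 2   # changes the second comma only
printf 'a,b,c,d\n' | replace_nth_comma g   # changes every comma
```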
You can try this sed,
sed -r ':loop; s/^([^,]*),/\1|/g; /^([^|]*\|){6}/t; b loop' file
(OR)
sed ':loop; s/^\([^,]*\),/\1|/g; /^\([^|]*|\)\{6\}/t; b loop' file
Test:
$ cat file
a,b,c,d,e,f,g,h,i,j,k
$ sed -r ':loop; s/^([^,]*),/\1|/g; /^([^|]*\|){6}/t; b loop' file
a|b|c|d|e|f|g,h,i,j,k
Note: This works only if the line has no pipe (|) before the commas being replaced and contains at least six commas; with fewer commas the loop never terminates.

Replacing multiple line using sed command

I have a text file file.log contains following text
file.log
ab
cd
ef
I want to replace "ab\ncd" with "ab\n" and the final file.log should look like this:
ab
ef
This is the sed command I am using but it couldn't recognize the newline character to match the pattern:
sed -i 's/\(.*\)\r \(.*\)/\1\r/g' file.log
with 3 character space after '\r' but no change is made with this.
\(.*\) - This matches any character(.) followed by 0 or more (*) of the preceding character
\r - For newline
\1 - Substitution for the first matching pattern. In this case, it's 'ab'
Can you help me out what's wrong with the above command.
The issue is that sed is a stream editor, which reads the input file line by line.
So when it reads the line
ab
from the input file, it doesn't know whether that line is followed by the line
cd
By the time it reads the line cd, sed will have removed the line ab from the pattern space, so the pattern can never match the current pattern space.
Solution
A solution is to read the entire file, appending each line to the hold space, and then run the substitution on the collected text:
$ sed -n '1h; 1!H;${g;s/ab\ncd/ab\n/g;p}' input
ab
ef
What it does
1h Copies the first line into the hold space.
1!H Appends every line except the first (1!) to the hold space.
$ matches the last line, performs the commands in {..}
g copies the contents of hold space back to pattern space
s/ab\ncd/ab\n/g makes the substitution.
p Prints the entire pattern space.
Sed processes the input file line by line, so the command in the question cannot match across a newline. You need to include N, which appends the next line to the pattern space.
$ sed 'N;s~ab\ncd~ab\n~g' file
ab
ef
A couple of other options:
perl -i -0pe 's/^ab\n\Kcd$//mg' file.log
which will change any such pattern in the file
If there's just one, good ol' ed
ed file.log <<END_SCRIPT
/^ab$/+1 c
.
wq
END_SCRIPT
