Why is vim stripping the carriage return when I copy a line to another file? - vim

I sorted a file a.csv into b.csv.
I noticed that the sizes of the files differed, and after noticing that b.csv was exactly n bytes smaller (where n is the number of lines in a.csv), I immediately suspected that a.csv contained those pesky \r.
The .py script for sorting contained the line line.strip() which removed the carriage returns and then afile.write(line2 + '\n') which wrote newlines but not carriage returns.
Ok. Makes sense.
The strange bit is that when I vim'd a.csv, I didn't see the ^M like I usually do (maybe the reason lies in a configuration file), so I only found out about the \r from opening the file in a hex editor.
The more interesting bit, is that I would take a small subset of a.csv (3y) and paste it to a testfile (p).
Sorting the testfile resulted in a file of the exact same size as the original.
From xxding, I see that there is no \r in the new testfile.
When I yank a line that contains a carriage return and paste it into another file, the pasted line does not contain the carriage return. Why?
I tested this on Windows (Cygwin), and it does appear to copy the \r. But on the Linux machine I'm using, it doesn't.
How come?
Edit:
I tried reproducing the issue on another linux machine, but I couldn't. It appears to be a configuration thing - some file somewhere telling vim to do that.

Vim's model of a loaded file is a sequence of lines, each consisting of a sequence of characters. In this model, newlines aren't themselves characters. So when you're copying lines of text, you're not copying the CRs or LFs. Vim also stores a number of other pieces of information which are used to write the file back out again, principally:
fileformat can be unix, dos or mac. This determines what end-of-line character will be written at the end of each line.
endofline can be on or off. This determines if the last line of the file has an end-of-line character.
bomb can be on or off. This determines if a byte order mark is written at the start of the first line.
fileencoding specifies what character encoding will be used to store the file, such as utf-8.
Normally these are all auto-detected upon loading the file. In particular, fileformat will be auto-detected depending on the settings in fileformats option, which may be configured differently on different platforms. However, sometimes things can go wrong. The most common problem is that a file might have mixed line-endings, and that's when you'll start seeing ^M floating around. In this case, Vim has loaded the file as if it's in unix format - it treated the LFs as the line separators and the CRs as just normal characters. You can see which mode Vim has opened the file in by entering :set fileformat? or just set ff? for short.

Vim detects the newline style (Windows CR-LF vs. Unix LF) when opening the file (according to the 'fileformats' option), and uses the detected 'fileformat' value for all subsequent saves. So, the newline style is a property of the Vim buffer / opened file. When you yank line(s) from one buffer and paste it into another, the newline style isn't kept; instead, the newline style of the target buffer is used, as this makes much more sense.

Related

Why does `^M` appear in terminal output when looking at some files?

I'm trying to send file using curl to an endpoint and save the file to the machine.
Sending curl from Linux and saving it on the machine works well,
but doing the same curl from Windows is adding ^M character to every end of line.
I'm printing the file before saving it and can't see ^M. Only viewing the file on the remote machine after saving it shows me ^M.
A simple string replacement doesn't seem to work.
Why is ^M being added? How can I prevent this?
Quick Answer: That's a carriage return. They're a harmless but mildly irritating artifact of how Windows encodes text files. You can strip them out of your files with dos2unix. You can configure most text editors to use "Unix Line Endings" or "LF Line Endings" to prevent them from appearing in new files that you create from Windows PCs in the future.
Long Answer (with some historical trivia):
In a plain text file, when you create a new line (by pressing enter/return), a "line break" is embedded in the file. On Unix/Linux, this is a single character, '\n', the "line feed". On Windows, this is two sequential characters, '\r\n', the "carriage return" followed by the "line feed".
When physical teletype terminals, which behaved much like typewriters, were still in use, the "line feed" character meant "move the paper up to the next line" and the "carriage return" character meant "slide the carriage all the way over so the typing head is on the far left". From the very beginning, nearly all teletype terminals supported implicit carriage return; i.e., triggering a line feed would automatically trigger a carriage return. The developers working on what later evolved into Windows decided that it would be best to include explicit carriage returns, just in case (for some reason) the teletype does not perform one implicitly. The Unix developers, on the other hand, chose to work with the assumption of implicit carriage return.
The carriage return and line feed are ASCII Control Characters which means they do not have a visible representation as standalone printable characters, instead they affect the output cursor itself (in this case, the position of the output cursor).
The "^M" you see is a stand-in representation for the carriage return character, used by programs that don't fully "cook" their output (i.e., don't apply the effects of some ASCII Control Characters). (Other control characters have other representations starting with "^", and the "^" character is also used to represent the "ctrl" keyboard key in some Unix programs like nano.)
You can use dos2unix to convert the line endings from Windows-style to Unix-style.
$ curl https://example.com/file_with_crlf.txt | dos2unix > file.txt
On some distros, this tool is included by default, on others it can be installed via the package manager (e.g., on Ubuntu, sudo apt install dos2unix). There also exists a package, unix2dos, for the inverse.
Most "smart" text editors for coding (Sublime, Atom, VS Code, Notepad++, etc.) will happily read and write with either Windows-style or Unix-style line endings (this might require changing some configuration options). Often, the line-endings are auto-detected by scanning the contents of a file, and usually new files are created with the Operating System's native line endings (by default). Even the new version of Notepad supports Unix-style line endings. On the other hand, some Unix tools will produce strange results in the presence of Windows-style line breaks. If your codebase will be used by people on both Unix and Windows operating systems, the nice thing to do is to use Unix-style line endings everywhere.
Git on Windows also has an optional mode that checks out all files with Windows-style line breaks, but checks them back in with Unix-style line breaks.
Side Notes (interesting, but not directly related to your question):
What the carriage return actually does (on a modern virtual terminal, be it Windows or Unix) is move the output cursor to the beginning of the line. If you use the carriage return without a line feed, you can "overwrite" part of a string that has already been printed.
$ printf "dogdog" ; printf "\rcat\n"
catdog
Some Unix programs use this to asynchronously update part of the last line of output, to implement things like a live-updating progress indicator. For example, curl, which shows download progress on stdout if the file contents are piped elsewhere.
Also: If you had a tool that interpreted Windows-style line endings as literally as possible, and you fed it a string with Unix-style line endings such as "hello\nworld", you would get output like this:
hello
world
Fortunately, such implementations are extremely rare and, in general, the vast majority of Windows tools can render Unix-style line-endings identically to Windows-style line endings without any problem.

Vim don't show last empty line

I have written some c code, and the end of my output is '\n', when I check the output text file in vim, I cannot find the last empty line, however, when I open it with another text viewer, I can find the last empty line. How can I configure my vim to show the empty line?
The way Vim shows \n / 0x0a at the end of the file is that it opens the file without complaining about [noeol] when :editing the file (in a kind of "reverse logic" from what you expect). Vim's (and Unix) philosophy is that the trailing newline should be there. This can be confusing when one is used to other editors or predominantly works on MS Windows.
There's a lot of discussion and questions about this (e.g. here); as this is unlikely to change, get used to it.

Vim show and be able to delete 0x0a at end of file

This is driving me nuts. Similar SO questions don't contain the answer, though, so here it goes again in slightly different form:
Is there a way to:
Make vim show 0x0a at the end of file as a blank line?
Supposing #1 can't be done, how do I delete the eol? There is no line, so there is nothing to delete.
For example:
vim -b myfile (currently no eol)
Add blank line at the end of file, :w :q
vim -b myfile - the blank line is gone, but hexdump shows 0x0a is still there. This is inconsistent behaviour.
I don't know how to see the last blank line, but to remove just open your file, the run:
:set binary noendofline
This will remove the last (invisible) blank line from it.
Warning: because of binary some settings will be modified (for example textwidth)!
You can use set noeol in binary mode:
:help noeol
When writing a file and this option is off and the 'binary' option
is on, no <EOL> will be written for the last line in the file. This
option is automatically set when starting to edit a new file, unless
the file does not have an <EOL> for the last line in the file, in
which case it is reset.
see also: Vim show newline at the end of file
The way Vim shows 0x0a at the end of the file is that it opens the file without complaining about [noeol] when :editing the file (in a kind of "reverse logic" from what you expect). As you've probably read already, Vim's (and Unix) philosophy is that the trailing newline should be there.
Based on this philosophy, I wouldn't recommend intentionally creating files without a trailing newline. However, there are ways to make Vim respect and maintain such existing files. My PreserveNoEOL plugin provides a way to do this effortlessly.

Why would Vim add a new line at the end of a file?

I work with Wordpress a lot, and sometimes I changed Wordpress core files temporarily in order to understand what is going on, especially when debugging. Today I got a little surprise. When I was ready to commit my changes to my git repository, I noticed that git status was marking one of Wordpress files as not staged for commit. I remember I had reverted all the changes I did to that file before closing it, so I decided to use diff to see what had changed. I compared the file on my project with the file on the Wordpress copy that I keep in my downloads directory. It turns out the files differ at the very end. diff indicates that the there is a newline missing at the end of the original file:
1724c1724
< }
\ No newline at end of file
---
> }
I never even touched that line. The changes I made where somewhere in the middle of a large file. This leads me to think that vim added a newline character at the end of the file. Why would that happen?
All the answers I've seen here address the question "how could I prevent Vim from adding a newline character at the end of the file?", while the question was "Why would Vim add a new line at the end of a file?". My browser's search engine brought me here, and I didn't find the answer to that question.
It is related with how the POSIX standard defines a line (see Why should files end with a newline?). So, basically, a line is:
3.206 Line
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
And, therefore, they all need to end with a newline character. That's why Vim always adds a newline by default (because, according to POSIX, it should always be there).
It is not the only editor doing that. Gedit, the default text editor in GNOME, does the same exact thing.
Edit
Many other tools also expect that newline character. See for example:
How wc expects it.
GCC warns about it.
Also, you may be interested in: Vim show newline at the end of file.
Because vim is a text editor, it can sometimes "clean up" files for you. See http://vimhelp.appspot.com/vim_faq.txt.html#faq-5.4 for details on how to write without the ending newline, paraphrased below:
How do I write a file without the line feed (EOL) at the end of the file?
You can turn off the eol option and turn on the binary option to write a file without the EOL at the end of the file:
   :set binary
   :set noeol
   :w
Alternatively, you can use:
   :set noeol
   :w ++bin
Adding a newline is the default behavior for Vim. If you don't need it, then use this solution: VIM Disable Automatic Newline At End Of File
To disable, add this to your .vimrc
set fileformats+=dos
You can put the following line into your .vimrc
autocmd FileType php setlocal noeol binary
Which should do the trick, but actually your approach is somewhat wrong. First of all php won't mind that ending at all and secondly if you don't want to save your changes don't press u or worse manually try to recreate the state of the file, but just quit without saving q!. If you left the editor and saved for some reason, try git checkout <file>
3.206 Line
A sequence of zero or more non- characters plus a terminating character.
Interestingly, vim will allow you to open a new file, write the file, and the file will be zero bytes. If you open a new file and append a line using o then write the file it will be two characters long. If you open said file back up and delete the second line dd and write the file it will be one byte long. Open the file back up and delete the only line remaining and write the file it will be zero bytes. So vim will let you write a zero byte file only as long as it is completely empty. Seems to defy the posix definition above. I guess...

Can you force Vim to show a blank line at the end of a file?

When I open a text file in Notepad, it shows a blank line if there is a carriage return at the end of the last line containing text. However, in Vim it does not show this blank line. Another thing I've noticed is that the Vim editor adds a carriage return to the last line by default (even though it doesn't show it). I can tell, because if I open a file in Notepad that was created in Vim, it shows a blank line at the end of the file.
Anyway, I can live with these two differences, but I'm wondering if there is an option in Vim that allows you to toggle this behaviour.
Thanks
PS - GVim 7.2
[Update]
Would this make sense to be on Server Fault instead?
[Update 2]
I'll rephrase this... I need to know when there is a carriage return at the end of single line file (Notepad shows an extra line with no text, with Vim I cannot tell). This is due to a Progress program that reads a text file (expects a single line, but with a carriage return) and parses the text for some purpose. If there is no carriage return, Progress treats the line as if it is null.
[Workaround Solution]
One way I've found to ensure there is a carriage return (but make sure I don't add a second one) is to make sure I have the end of line write option turned on (:set eol) and then just do a write/save. This will put an end of line in the file if it's not already there. Otherwise, it doesn't add a new one.
:help endofline
explains how you could stop vim from adding an extra newline.
It seems that vim treats newline as a line terminator, while notepad treats it as a line separator: from http://en.wikipedia.org/wiki/Newline
There is also some confusion whether
newlines terminate or separate lines.
If a newline is considered a
separator, there will be no newline
after the last line of a file. The
general convention on most systems is
to add a newline even after the last
line, i.e., to treat newline as a line
terminator. Some programs have
problems processing the last line of a
file if it isn't newline terminated.
Conversely, programs that expect
newline to be used as a separator will
interpret a final newline as starting
a new (empty) line. This can result in
a different line count being reported
for the file, but is otherwise
generally harmless.
If I recall correctly, on unix-y systems a text file must be terminated with a newline.
One useful Vim option is
set list
It will help you see all end of lines characters (and possibly other generally invisible chars). So you will be able to view this last endofline directly in Vim and not only in Notepad.
When you open the file in VIM the status line should say [noeol] after the filename. So that's one indication. As Manni said, you can change this by setting both the endofline option off and the binary option on. You can set this as your default settings in a .vimrc file.

Resources