What is ^M and how is it generated? - vim

Recently, someone using Windows 10 asked me why his VS Code remote (SSH) connection was not working. After a bunch of checking, I found that his ~/.ssh/authorized_keys ended with "^M" (as shown in Vim), and removing that symbol resolved his problem.
Modifying or removing "^M" is easy. But this time I'd like to figure out what "^M" is and how it is generated. Knowing how it is generated would help people avoid creating it and the related issues.

^M is Vim's representation of ASCII 13 (M being the 13th letter of the alphabet), carriage return. ssh assumes that the file will use Unix line endings, so it treats the CR of the CR/LF pair in the DOS file as a regular character, rather than ignoring it as a line terminator. Removing the ^M essentially converts the file from a DOS text file to the POSIX text file that ssh expects.
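To make the point above concrete, here is a minimal Python sketch. It assumes a reader that splits the file on LF only; the function name and the key text are made up for illustration, and this is not how OpenSSH itself is implemented.

```python
def split_unix_lines(data: bytes) -> list[bytes]:
    """Split on LF only, the way a tool expecting Unix line endings would."""
    return data.split(b"\n")


# Hypothetical authorized_keys content saved with DOS (CR/LF) line endings.
dos_file = b"ssh-ed25519 AAAAexample user@host\r\n"

first_line = split_unix_lines(dos_file)[0]
print(repr(first_line))  # the CR is still attached to the end of the line

# Dropping the CR of each CR/LF pair gives the POSIX text file ssh expects.
unix_file = dos_file.replace(b"\r\n", b"\n")
print(repr(split_unix_lines(unix_file)[0]))
```

With the CR left in place, the last field of the line is no longer what was intended, which is consistent with the key not being accepted.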

Related

why Linux tools display the CR character as `^M`? [duplicate]

I'm new to Linux, so sorry if my question sounds dumb.
We know that Linux and Mac OS X use \n (0xa), which is the ASCII line feed (LF) character. MS Windows and Internet protocols such as HTTP use the sequence \r\n (0xd 0xa). If you create a file foo.txt in Windows and then view it in a Linux text editor, you’ll see an annoying ^M at the end of each line, which is how Linux tools display the CR character.
But why do Linux tools display the CR character as ^M? As I understand it, \r (carriage return) moves the cursor to the beginning of the current line, so the sensible approach would be that when you open the file, the cursor sits at the beginning of the line that contains \r, and ^M isn't displayed at all.
PS: some people have posted answers on how to remove ^M, but I want to know why ^M is displayed at all rather than the cursor moving to the beginning of the line, which is the definition of carriage return.
The ASCII control characters like TAB, CR, NL and others are intended to control the printing position of a teletypewriter-like display device.
A text editor isn't such a device. It is not appropriate for a text editor to treat a CR character literally as meaning "go to the first column"; it would make a confusing gibberish out of the editing experience.
A text editor works by parsing a text file's representation, to create an internal representation which is presented to the user. On Unix-like operating systems, a file is represented by zero or more lines, which are terminated by the ASCII NL character. Any CR characters that occur just look like part of the data, and not part of the line separation.
Not all editors behave the same way. For instance, the Vim editor will detect that a file uses CR-LF line endings, and load it properly using that representation. A flag is set for that buffer which indicates that it's a "DOS" file, so that when you save it, the same representation is reproduced.
That said, there is a feature in the actual Linux kernel for representing control characters like CR using the ^M notation. The TTY line discipline for any given TTY device can be configured to print characters in this notation, but only when echoing back the characters received.
Demo:
$ stty echoctl # turn on notational echo of control characters
$ cat # run some non-interactive program with rudimentary line input
^F^F^F^F^F^F
^C
$
Above, the Ctrl-F that I entered was echoed back as ^F. So, in fact there is a "Linux editor" which uses this notation: the rudimentary line editor of the "canonical input mode" line discipline.
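The ^X notation itself is easy to compute: the letter shown is the control character's ASCII code with bit 6 flipped. A small Python sketch (the function name is my own, purely for illustration):

```python
def caret_notation(byte: int) -> str:
    """Return the ^X form of an ASCII control character (0-31 or 127)."""
    if not (0 <= byte < 32 or byte == 127):
        raise ValueError("not an ASCII control character")
    # Flipping bit 6 maps 13 (CR) to 77 ('M'), 6 to 'F', and 127 (DEL) to '?'.
    return "^" + chr(byte ^ 0x40)


print(caret_notation(ord("\r")))  # ^M
print(caret_notation(ord("\n")))  # ^J
print(caret_notation(6))          # ^F
```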

Why does `^M` appear in terminal output when looking at some files?

I'm trying to send file using curl to an endpoint and save the file to the machine.
Sending curl from Linux and saving it on the machine works well,
but doing the same curl from Windows is adding ^M character to every end of line.
I'm printing the file before saving it and can't see ^M. Only viewing the file on the remote machine after saving it shows me ^M.
A simple string replacement doesn't seem to work.
Why is ^M being added? How can I prevent this?
Quick Answer: That's a carriage return. They're a harmless but mildly irritating artifact of how Windows encodes text files. You can strip them out of your files with dos2unix. You can configure most text editors to use "Unix Line Endings" or "LF Line Endings" to prevent them from appearing in new files that you create from Windows PCs in the future.
Long Answer (with some historical trivia):
In a plain text file, when you create a new line (by pressing enter/return), a "line break" is embedded in the file. On Unix/Linux, this is a single character, '\n', the "line feed". On Windows, this is two sequential characters, '\r\n', the "carriage return" followed by the "line feed".
When physical teletype terminals, which behaved much like typewriters, were still in use, the "line feed" character meant "move the paper up to the next line" and the "carriage return" character meant "slide the carriage all the way over so the typing head is on the far left". From the very beginning, nearly all teletype terminals supported implicit carriage return; i.e., triggering a line feed would automatically trigger a carriage return. The developers working on what later evolved into Windows decided that it would be best to include explicit carriage returns, just in case (for some reason) the teletype does not perform one implicitly. The Unix developers, on the other hand, chose to work with the assumption of implicit carriage return.
The carriage return and line feed are ASCII Control Characters, which means they do not have a visible representation as standalone printable characters; instead, they affect the output cursor itself (in this case, its position).
The "^M" you see is a stand-in representation for the carriage return character, used by programs that don't fully "cook" their output (i.e., don't apply the effects of some ASCII Control Characters). (Other control characters have other representations starting with "^", and the "^" character is also used to represent the "ctrl" keyboard key in some Unix programs like nano.)
You can use dos2unix to convert the line endings from Windows-style to Unix-style.
$ curl https://example.com/file_with_crlf.txt | dos2unix > file.txt
On some distros this tool is included by default; on others it can be installed via the package manager (e.g., on Ubuntu, sudo apt install dos2unix). There is also a companion tool, unix2dos, for the inverse.
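If you would rather do the conversion in your own code on the receiving side, the core of it is a byte replacement. This is only a rough Python stand-in for the common case, not a reimplementation of dos2unix, and the function names are my own:

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR/LF pair with a bare LF; lone CRs are left alone."""
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Inverse direction; normalise first so existing CR/LF pairs are not doubled."""
    return crlf_to_lf(data).replace(b"\n", b"\r\n")


windows_text = b"first line\r\nsecond line\r\n"
print(crlf_to_lf(windows_text))
```

Working on bytes rather than text avoids any newline translation that the language's text-mode file handling might apply on its own.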
Most "smart" text editors for coding (Sublime, Atom, VS Code, Notepad++, etc.) will happily read and write with either Windows-style or Unix-style line endings (this might require changing some configuration options). Often, the line-endings are auto-detected by scanning the contents of a file, and usually new files are created with the Operating System's native line endings (by default). Even the new version of Notepad supports Unix-style line endings. On the other hand, some Unix tools will produce strange results in the presence of Windows-style line breaks. If your codebase will be used by people on both Unix and Windows operating systems, the nice thing to do is to use Unix-style line endings everywhere.
Git on Windows also has an optional mode that checks out all files with Windows-style line breaks, but checks them back in with Unix-style line breaks.
Side Notes (interesting, but not directly related to your question):
What the carriage return actually does (on a modern virtual terminal, be it Windows or Unix) is move the output cursor to the beginning of the line. If you use the carriage return without a line feed, you can "overwrite" part of a string that has already been printed.
$ printf "dogdog" ; printf "\rcat\n"
catdog
Some Unix programs use this to update part of the last line of output in place, to implement things like a live-updating progress indicator. For example, curl shows its download progress meter this way (on stderr) when the file contents are piped or redirected elsewhere.
Also: If you had a tool that interpreted Windows-style line endings as literally as possible, and you fed it a string with Unix-style line endings such as "hello\nworld", you would get output like this:
hello
     world
Fortunately, such implementations are extremely rare and, in general, the vast majority of Windows tools can render Unix-style line-endings identically to Windows-style line endings without any problem.
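Both side notes can be reproduced with a toy terminal model. The sketch below is a deliberate simplification in Python (no scrolling, no escape sequences), and the implicit_cr switch is my own invention to contrast the two interpretations of a line feed:

```python
def render(text: str, implicit_cr: bool = True) -> list[str]:
    """Simulate a very simple terminal.

    CR moves the cursor to column 0. LF moves down one row and, when
    implicit_cr is true, also returns to column 0.
    """
    rows = [[]]
    row = col = 0
    for ch in text:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            row += 1
            if row == len(rows):
                rows.append([])
            if implicit_cr:
                col = 0
        else:
            line = rows[row]
            while len(line) <= col:  # pad with spaces up to the cursor
                line.append(" ")
            line[col] = ch           # overwrite whatever was there
            col += 1
    return ["".join(r) for r in rows]


print(render("dogdog\rcat\n")[0])  # catdog
for line in render("hello\nworld", implicit_cr=False):
    print(line)
```

With implicit_cr=False the second word starts in the column where the first one ended, which is the staircase effect described above.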

Why is vim stripping the carriage return when I copy a line to another file?

I sorted a file a.csv into b.csv.
I noticed that the sizes of the files differed, and after noticing that b.csv was exactly n bytes smaller (where n is the number of lines in a.csv), I immediately suspected that a.csv contained those pesky \r.
The .py script for sorting contained the line line.strip() which removed the carriage returns and then afile.write(line2 + '\n') which wrote newlines but not carriage returns.
Ok. Makes sense.
The strange bit is that when I vim'd a.csv, I didn't see the ^M like I usually do (maybe the reason lies in a configuration file), so I only found out about the \r from opening the file in a hex editor.
The more interesting bit, is that I would take a small subset of a.csv (3y) and paste it to a testfile (p).
Sorting the testfile resulted in a file of the exact same size as the original.
From running xxd on it, I see that there is no \r in the new testfile.
When I yank a line that contains a carriage return and paste it into another file, the pasted line does not contain the carriage return. Why?
I tested this on Windows (Cygwin), and it does appear to copy the \r. But on the Linux machine I'm using, it doesn't.
How come?
Edit:
I tried reproducing the issue on another linux machine, but I couldn't. It appears to be a configuration thing - some file somewhere telling vim to do that.
Vim's model of a loaded file is a sequence of lines, each consisting of a sequence of characters. In this model, newlines aren't themselves characters. So when you're copying lines of text, you're not copying the CRs or LFs. Vim also stores a number of other pieces of information which are used to write the file back out again, principally:
fileformat can be unix, dos or mac. This determines what end-of-line character will be written at the end of each line.
endofline can be on or off. This determines if the last line of the file has an end-of-line character.
bomb can be on or off. This determines if a byte order mark is written at the start of the first line.
fileencoding specifies what character encoding will be used to store the file, such as utf-8.
Normally these are all auto-detected upon loading the file. In particular, fileformat will be auto-detected depending on the setting of the fileformats option, which may be configured differently on different platforms. However, sometimes things can go wrong. The most common problem is that a file might have mixed line endings, and that's when you'll start seeing ^M floating around. In this case, Vim has loaded the file as if it's in unix format: it treated the LFs as the line separators and the CRs as just normal characters. You can see which mode Vim has opened the file in by entering :set fileformat? or just :set ff? for short.
Vim detects the newline style (Windows CR-LF vs. Unix LF) when opening the file (according to the 'fileformats' option), and uses the detected 'fileformat' value for all subsequent saves. So, the newline style is a property of the Vim buffer / opened file. When you yank line(s) from one buffer and paste it into another, the newline style isn't kept; instead, the newline style of the target buffer is used, as this makes much more sense.
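The model described in these answers can be sketched in a few lines of Python. This is a simplified illustration of the idea, not Vim's actual implementation; the load and write helpers and the sample data are hypothetical:

```python
EOL = {"unix": "\n", "dos": "\r\n", "mac": "\r"}


def load(data: str, fileformat: str) -> list[str]:
    """Split file contents into lines; the line terminators are not kept."""
    lines = data.split(EOL[fileformat])
    if lines and lines[-1] == "":
        lines.pop()  # the final terminator ends the last line, it does not start a new one
    return lines


def write(lines: list[str], fileformat: str) -> str:
    """Terminate every line with the target buffer's own end-of-line sequence."""
    return "".join(line + EOL[fileformat] for line in lines)


source = load("a,1\r\nb,2\r\nc,3\r\n", "dos")  # buffer detected as dos
target = []                                    # new, empty buffer in unix format
target.extend(source[:3])                      # roughly: yank three lines, paste them
print(repr(write(target, "unix")))             # 'a,1\nb,2\nc,3\n'

# Loading the same data as unix keeps each CR as an ordinary character,
# which is when an editor would show ^M at the end of the line.
print(load("a,1\r\nb,2\r\n", "unix"))
```

Because the yanked lines carry no terminators, the pasted text simply picks up whatever end-of-line sequence the target buffer writes.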

ksh "." operator is doing string replacement instead of concatenation

I was debugging a script in which I found the following weird behavior. The script is simply setting some variables by sourcing another file, then the values of these variables are used to run the main script command.
The first file has the following line:
export PROJECT=ABCD1234
The script then sources this file through the following line:
. file_path
Later in the script, the script is using the $PROJECT variable in the following statement:
cd $PROJECT.proj #expecting to do string concatenation
The problem here is that $PROJECT.proj doesn't result in "ABCD1234.proj"; it actually seems to do string replacement instead of string concatenation, so $PROJECT.proj equals .proj234!!
I suspected that there might be some special hidden characters in the first file that cause this behavior, so I rewrote the file using gvim instead of nedit & it worked.
Does anybody have any idea how this happened??
Anytime you are creating files on Windows and then moving or using them on a Unix/Linux like environment, be sure to convert your files so they work properly on unix/linux.
Use the dos2unix utility for this, i.e.
dos2unix file [file1 file2 file3 .... myFile*]
As many files as will fit on the cmd line.
(I'll be back to flesh this out after I eat ; -)
Disappearing characters, as when you expect
ABCD1234.proj
but get only some of them, like
.proj234
are often the result of the Windows line-ending characters conflicting with the Unix/Linux line-ending character. Windows uses ^M^J (\r\n), whereas Unix/Linux uses just ^J (\n).
OR
Ctrl  oct  hex  dec  abbrev
^J    012  0a   10   nl
^M    015  0d   13   cr
cr = Carriage Return
Think of old typewriters, where starting a new line is a two-step process.
The lever both moves the platen back to the left margin AND advances the paper so the next line can be typed on. CR returns the carriage to the left margin, while new-line advances the paper to the next line.
Unix assumes there is an implied CR with an NL, so having a CR confuses things and makes it easy for your system to overwrite data (or maybe it's just the display of data; I don't have time to test right now).
IHTH
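For what it's worth, the .proj234 symptom can be reproduced without a shell. The Python sketch below assumes the sourced file was saved with CR/LF endings and models only how a terminal displays a CR; the as_displayed helper is hypothetical:

```python
def as_displayed(text: str) -> str:
    """Show one line as a terminal would: CR jumps to column 0, later text overwrites."""
    cells = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0
            continue
        if col < len(cells):
            cells[col] = ch
        else:
            cells.append(ch)
        col += 1
    return "".join(cells)


line = "export PROJECT=ABCD1234\r\n"          # the sourced line, saved with CR/LF
project = line.rstrip("\n").split("=", 1)[1]  # the CR stays part of the value
path = project + ".proj"

print(repr(path))          # 'ABCD1234\r.proj'
print(as_displayed(path))  # .proj234
```

So under this assumption the concatenation does happen and the stored value is intact; only its display is overwritten. The cd still fails because the directory name now contains a CR.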

Why would Vim add a new line at the end of a file?

I work with Wordpress a lot, and sometimes I change Wordpress core files temporarily in order to understand what is going on, especially when debugging. Today I got a little surprise. When I was ready to commit my changes to my git repository, I noticed that git status was marking one of the Wordpress files as not staged for commit. I remembered that I had reverted all the changes I made to that file before closing it, so I decided to use diff to see what had changed. I compared the file in my project with the file in the Wordpress copy that I keep in my downloads directory. It turns out the files differ at the very end. diff indicates that there is a newline missing at the end of the original file:
1724c1724
< }
\ No newline at end of file
---
> }
I never even touched that line. The changes I made were somewhere in the middle of a large file. This leads me to think that vim added a newline character at the end of the file. Why would that happen?
All the answers I've seen here address the question "how could I prevent Vim from adding a newline character at the end of the file?", while the question was "Why would Vim add a new line at the end of a file?". My browser's search engine brought me here, and I didn't find the answer to that question.
It is related to how the POSIX standard defines a line (see Why should files end with a newline?). So, basically, a line is:
3.206 Line
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
And, therefore, they all need to end with a newline character. That's why Vim always adds a newline by default (because, according to POSIX, it should always be there).
It is not the only editor doing that. Gedit, the default text editor in GNOME, does the same exact thing.
Edit
Many other tools also expect that newline character. See for example:
How wc expects it.
GCC warns about it.
Also, you may be interested in: Vim show newline at the end of file.
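As a small illustration of the wc case: wc -l reports the number of newline characters, so a final line without a terminator is not counted. A Python sketch of that counting rule (the helper names are my own):

```python
def count_newlines(data: bytes) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    return data.count(b"\n")


def ends_with_complete_line(data: bytes) -> bool:
    """True if the data is empty or its last line has a terminating newline."""
    return data == b"" or data.endswith(b"\n")


with_final_newline = b"<?php\n}\n"
without_final_newline = b"<?php\n}"

print(count_newlines(with_final_newline))     # 2
print(count_newlines(without_final_newline))  # 1
print(ends_with_complete_line(without_final_newline))  # False
```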
Because vim is a text editor, it can sometimes "clean up" files for you. See http://vimhelp.appspot.com/vim_faq.txt.html#faq-5.4 for details on how to write without the ending newline, paraphrased below:
How do I write a file without the line feed (EOL) at the end of the file?
You can turn off the eol option and turn on the binary option to write a file without the EOL at the end of the file:
   :set binary
   :set noeol
   :w
Alternatively, you can use:
   :set noeol
   :w ++bin
Adding a newline is the default behavior for Vim. If you don't need it, then use this solution: VIM Disable Automatic Newline At End Of File
To disable, add this to your .vimrc
set fileformats+=dos
You can put the following line into your .vimrc
autocmd FileType php setlocal noeol binary
That should do the trick, but your approach is actually somewhat wrong. First of all, PHP won't mind that ending at all. Secondly, if you don't want to save your changes, don't press u or, worse, try to manually recreate the original state of the file; just quit without saving using :q!. If you left the editor and saved for some reason, try git checkout <file>
3.206 Line
A sequence of zero or more non- <newline> characters plus a terminating <newline> character.
Interestingly, vim will allow you to open a new file and write it, and the file will be zero bytes. If you open a new file, append a line using o, and then write the file, it will be two bytes long. If you open that file again, delete the second line with dd, and write the file, it will be one byte long. Open the file again, delete the only remaining line, and write the file, and it will be zero bytes. So vim will let you write a zero-byte file only as long as it is completely empty. Seems to defy the POSIX definition above. I guess...

Resources