Why save files with CR LF on Windows > 8? - linux

I develop and use Git on both Windows and Linux. When using IDEs or Git on Windows, I'm frequently prompted whether to save files with CR LF or not.
I am doing mainly C# and JavaScript ES6 development which involves code that contains multi-line strings.
What reasons are there to save files with CR LF on Windows? Are CR-LFs mostly of historical significance? I have not yet noticed a drawback to working with UNIX \n line endings on Windows.

Windows batch files can malfunction when saved with LF-only line endings, because the goto command works by jumping to the appropriate offset in the script, and that offset is not computed correctly unless the lines end with carriage-return/line-feed.

Some Windows programs don't properly handle '\n' without '\r', but any decent editor, or for that matter any decent program, should handle them identically. Still, CRLF is traditionally the sanctioned way to do line endings on Windows, and you might have compatibility issues if you don't use it.
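If you want to check which convention a file actually uses before handing it to something picky (such as cmd.exe running a batch file), a small classifier is enough. This is a minimal Python sketch, not an existing tool; the name line_ending_style is hypothetical, and it works on raw bytes so that no newline translation happens on read:

```python
def line_ending_style(data: bytes) -> str:
    """Classify the newlines in raw file bytes as CRLF, LF, CR, mixed, or none."""
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # LFs not preceded by a CR
    cr_only = data.count(b"\r") - crlf  # CRs not followed by an LF
    kinds = [name for name, n in (("CRLF", crlf), ("LF", lf_only), ("CR", cr_only)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"
```

For example, passing it the bytes of a batch file read in binary mode ("rb") would return "LF" if the file had been saved with Unix line endings.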

Related

Can PowerShell Core handle ps1 files with CRLF line endings in Linux environments?

I have the following unanswered questions, and I am looking for documentation that explains the PowerShell Core requirement for LF vs. CRLF line endings in Linux environments.
1- Can PWSH Core handle files with CRLF?
2- When I run a ps1 file with an LF line ending, can it call another ps1 file with mixed CRLF line endings?
3- Is the PWSH line ending requirement documented and consistent across all Linux distributions?
I am asking the above questions since PowerShell was born in a Windows environment, and I expect it can somehow tolerate LF vs. CRLF discrepancies.
A link to online documentation would be a great help. I did search and surprisingly, I could not find any.
Just preserving the content of the link mentioned in the comments (source):
Line endings
Windows editors, including ISE, tend to use a carriage return followed by linefeed (\r\n or `r`n) at the end of each line. Linux editors use linefeed only (\n or `n).
Line endings are less important if the only thing reading the file is PowerShell (on any platform). However, if a script is set to executable on Linux, a shebang must be included and the line-ending character used must be linefeed only.
For example, a script named test.ps1 that is created as follows must use \n to end lines:
#!/usr/bin/env powershell
Get-Process
The first line is the shebang and lets Linux know which parser to use when executing the shell script.
Once created, chmod may be used to make the script executable outside of PowerShell:
chmod +x test.ps1
According to this, PowerShell can handle both - CRLF and LF - even in the same file. The only exception is the shebang: it has to end with LF.
stackprotector's community wiki answer, inspired by a link provided by Ben Voigt, provides the gist of an answer to your questions.
Let me complement it by answering your questions one by one:
I am looking for documentation that explains the PowerShell core requirement for LF vs CRLF line ending in Linux environments.
As of this writing, there appears to be no such documentation:
The conceptual about_Special_Characters help topic discusses `r (CR) and `n (LF) separately, but not explicitly in terms of their potential function as newline characters / sequences (line breaks; the topic uses the term "new line" to refer to a LF character, specifically).
1- Can PWSH Core handle files with CRLF?
Yes - both PowerShell editions - the legacy, ships-with-Windows, Windows-only Windows PowerShell edition (whose latest and final version is v5.1), as well as the cross-platform, install-on-demand, PowerShell (Core) edition (v6+) - treat CRLF and LF newlines interchangeably - both with respect to reading source code and reading files[1] - even if the two newline formats are mixed in a single file.
While not documented as such, PowerShell has always worked this way, and, given its commitment to backward compatibility, this won't change (which in this case is definitely a blessing, given that PowerShell (Core) is now cross-platform and must be able to handle files with LF-only newlines on Unix-like platforms).
The following examples demonstrate this - they work the same on all supported platforms ("`n" creates a LF-only newline, "`r`n" a CRLF newline):
# Read a file with mixed newline formats.
PS> "one`ntwo`r`nthree" > temp.txt; (Get-Content temp.txt).Count; Remove-Item temp.txt
3
# Execute a script with mixed newline formats.
PS> "'one'`n'two'`r`n'three'" > temp.ps1; . ./temp.ps1; Remove-Item temp.ps1
one
two
three
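PowerShell's own parser isn't something you can reuse elsewhere, but Python's str.splitlines() follows the same lenient policy and makes a handy analogy: it treats \n, \r\n, and a lone \r (plus a few Unicode separators) as line boundaries. By contrast, splitting on LF alone leaves a stray CR behind, which is exactly what a strict LF-only reader sees:

```python
mixed = "one\ntwo\r\nthree"

# splitlines() accepts LF, CRLF and CR interchangeably, even in one string.
print(mixed.splitlines())  # ['one', 'two', 'three']

# Splitting on LF alone leaves the CR glued to the end of "two".
print(mixed.split("\n"))   # ['one', 'two\r', 'three']
```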
2- When I run a ps1 file with an LF line ending, can it call another ps1 file with mixed CRLF line endings?
Yes - this follows from the above.
However, special considerations apply to shebang-line-based PowerShell scripts on Unix-like platforms:
Such stand-alone shell scripts - which needn't and arguably shouldn't have a .ps1 extension - are first read by the system on Unix-like platforms, and therefore require the shebang line - by definition the first line - to be terminated with a LF-only ("`n") newline, given that only LF by itself is considered a newline on Unix-like platforms.[2] All remaining lines are then read only by PowerShell, and any mix of CRLF and LF is then accepted, as usual; e.g.:
# Run on any Unix-like platform - note that `n alone must end the first line.
PS> "#!/usr/bin/env pwsh`n'one'`r`n'two'" > temp; chmod a+x temp; ./temp; Remove-Item temp
one
two
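To see why only the shebang line is sensitive, here is a rough Python model of how Linux parses a #! line. It is a simplification of the kernel's behavior, not its actual code, and other Unix-like systems differ in details such as argument splitting. Only LF ends the line and only spaces and tabs are trimmed, so a CR stays glued to whatever comes last. The function name parse_shebang is hypothetical:

```python
from typing import Optional, Tuple

def parse_shebang(script: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Split a '#!' line into (interpreter, optional single argument)."""
    if not script.startswith(b"#!"):
        raise ValueError("not a shebang script")
    # Only LF ends the line, so a CR in front of it is kept.
    line = script[2:].split(b"\n", 1)[0]
    # Only spaces and tabs are trimmed; CR is not treated as whitespace here.
    line = line.strip(b" \t")
    for i, byte in enumerate(line):
        if byte in b" \t":
            # Everything after the interpreter is passed as ONE argument.
            return line[:i], line[i:].lstrip(b" \t")
    return line, None

print(parse_shebang(b"#!/usr/bin/env pwsh\n'one'"))    # (b'/usr/bin/env', b'pwsh')
print(parse_shebang(b"#!/usr/bin/env pwsh\r\n'one'"))  # (b'/usr/bin/env', b'pwsh\r')
```

With a CRLF first line, /usr/bin/env is asked to run a program literally named "pwsh\r", which doesn't exist, so the script never reaches PowerShell at all.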
In practice, not least due to the not insignificant startup cost of pwsh, the PowerShell (Core) CLI, but also due to several bugs as of PowerShell 7.2.6, stand-alone shebang-line-based PowerShell scripts - which are primarily useful for being called from outside PowerShell - are rare.
3- Is the PWSH line ending requirement documented and consistent across all Linux distributions?
No, it isn't documented.
Yes, as implied by the above, it is consistent, not just across Linux distributions, but across all supported platforms.
[1] Even CR-only newlines - as used in long-obsolete legacy Mac OS versions, which should therefore be avoided nowadays - are recognized in PowerShell source code and by Get-Content, but not by Measure-Object -Line, for instance.
[2] If the first line ends in CRLF, the CR (\r) is retained as part of the line and therefore as part of the target executable path or the last option passed to it, which breaks the invocation - see this answer for a real-life manifestation of this problem.
Being as canonical as I can be:
I worked on PowerShell v2-v3.
Newlines of either form were always meant to be interchangeable within PowerShell.
There was a good amount of Unix influence in the language. Being able to support both forms of newlines was near and dear to many a team member's heart. Supporting it from the get-go prevented the possibility of a script that couldn't run just because it was copied to a Mac or Linux machine and saved in the wrong editor.
The importance of this feature has been proved again and again, and has obviously become mission critical now that PowerShell Core is a thing (because the scenario listed above is far more common). I'd wager that as long as PowerShell is a language, this will be the behavior of PowerShell scripts.
As far as shebang files go, this isn't really an exception to this rule. With a shebang file, the system itself reads the first line to decide which interpreter to launch, and only then hands the file to that interpreter. A carriage return is outside of its range of expectations, not PowerShell's.
Hope this helps shed some light on things.

Why does `^M` appear in terminal output when looking at some files?

I'm trying to send a file to an endpoint using curl and save the file on the machine.
Sending it with curl from Linux and saving it on the machine works well,
but doing the same curl from Windows adds a ^M character to the end of every line.
I'm printing the file before saving it and can't see ^M. Only viewing the file on the remote machine after saving it shows me ^M.
A simple string replacement doesn't seem to work.
Why is ^M being added? How can I prevent this?
Quick Answer: That's a carriage return. They're a harmless but mildly irritating artifact of how Windows encodes text files. You can strip them out of your files with dos2unix. You can configure most text editors to use "Unix Line Endings" or "LF Line Endings" to prevent them from appearing in new files that you create from Windows PCs in the future.
Long Answer (with some historical trivia):
In a plain text file, when you create a new line (by pressing enter/return), a "line break" is embedded in the file. On Unix/Linux, this is a single character, '\n', the "line feed". On Windows, this is two sequential characters, '\r\n', the "carriage return" followed by the "line feed".
When physical teletype terminals, which behaved much like typewriters, were still in use, the "line feed" character meant "move the paper up to the next line" and the "carriage return" character meant "slide the carriage all the way over so the typing head is on the far left". From the very beginning, nearly all teletype terminals supported implicit carriage return; i.e., triggering a line feed would automatically trigger a carriage return. The developers working on what later evolved into Windows decided that it would be best to include explicit carriage returns, just in case (for some reason) the teletype does not perform one implicitly. The Unix developers, on the other hand, chose to work with the assumption of implicit carriage return.
The carriage return and line feed are ASCII control characters, which means they do not have a visible representation as standalone printable characters; instead, they affect the output cursor itself (in this case, its position).
The "^M" you see is a stand-in representation for the carriage return character, used by programs that don't fully "cook" their output (i.e., don't apply the effects of some ASCII Control Characters). (Other control characters have other representations starting with "^", and the "^" character is also used to represent the "ctrl" keyboard key in some Unix programs like nano.)
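The caret form follows a simple rule: flip bit 6 (0x40) of the control character's code, so CR (0x0D) becomes 'M' (0x4D) and LF (0x0A) becomes 'J' (0x4A). A minimal Python sketch of that mapping (the caret function is illustrative, not part of any library):

```python
def caret(ch: str) -> str:
    """Render an ASCII control character in caret notation (e.g. CR -> '^M')."""
    code = ord(ch)
    if code < 0x20 or code == 0x7F:
        # XOR with 0x40: 0x0D -> 0x4D ('M'), 0x0A -> 0x4A ('J'), 0x7F -> 0x3F ('?').
        return "^" + chr(code ^ 0x40)
    return ch

print("".join(caret(c) for c in "line one\r\n"))  # line one^M^J
```

Tools such as cat -v show the CR this way but still break the line at the LF, which is why only ^M appears at each line end.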
You can use dos2unix to convert the line endings from Windows-style to Unix-style.
$ curl https://example.com/file_with_crlf.txt | dos2unix > file.txt
On some distros, this tool is included by default, on others it can be installed via the package manager (e.g., on Ubuntu, sudo apt install dos2unix). There also exists a package, unix2dos, for the inverse.
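If dos2unix isn't available, its core conversion is a single byte replacement. A minimal Python sketch, assuming the whole file fits in memory; the real tool does more (for example, it skips binary files by default), and both function names are hypothetical:

```python
import sys

def dos2unix_bytes(data: bytes) -> bytes:
    """Turn every CRLF pair into a bare LF; lone CRs and LFs are left untouched."""
    return data.replace(b"\r\n", b"\n")

def filter_stdin_to_stdout() -> None:
    # Use the binary .buffer streams so Python does no newline translation itself.
    sys.stdout.buffer.write(dos2unix_bytes(sys.stdin.buffer.read()))
```

Saved as, say, crlf_filter.py with a call to filter_stdin_to_stdout() at the bottom, it could take the place of dos2unix in a pipeline like the curl example above.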
Most "smart" text editors for coding (Sublime, Atom, VS Code, Notepad++, etc.) will happily read and write with either Windows-style or Unix-style line endings (this might require changing some configuration options). Often, the line-endings are auto-detected by scanning the contents of a file, and usually new files are created with the Operating System's native line endings (by default). Even the new version of Notepad supports Unix-style line endings. On the other hand, some Unix tools will produce strange results in the presence of Windows-style line breaks. If your codebase will be used by people on both Unix and Windows operating systems, the nice thing to do is to use Unix-style line endings everywhere.
Git on Windows also has an optional mode that checks out all files with Windows-style line breaks, but checks them back in with Unix-style line breaks.
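In Git this mode corresponds to setting core.autocrlf to true. The two directions can be sketched in Python as below; this models only the conversion itself, since real Git also skips files it considers binary and can warn about irreversible conversions (core.safecrlf). The function names are hypothetical:

```python
def to_repository(worktree: bytes) -> bytes:
    """Commit direction: normalize CRLF to LF before the content is stored."""
    return worktree.replace(b"\r\n", b"\n")

def to_worktree(blob: bytes) -> bytes:
    """Checkout direction: LF to CRLF, without turning an existing CRLF into CR CR LF."""
    return blob.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

stored = to_repository(b"echo one\r\necho two\r\n")
print(stored)               # b'echo one\necho two\n'
print(to_worktree(stored))  # b'echo one\r\necho two\r\n'
```

Because the repository copy is always LF-only, a Linux checkout without this setting gets Unix line endings while a Windows checkout gets CRLF.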
Side Notes (interesting, but not directly related to your question):
What the carriage return actually does (on a modern virtual terminal, be it Windows or Unix) is move the output cursor to the beginning of the line. If you use the carriage return without a line feed, you can "overwrite" part of a string that has already been printed.
$ printf "dogdog" ; printf "\rcat\n"
catdog
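One way to picture this is a tiny terminal model in which CR only resets the column and LF starts a new row at column 0 (as a cooked Unix terminal effectively does, since by default it outputs LF as CR LF). The render function is a hypothetical illustration, not a real terminal emulator:

```python
from typing import List

def render(output: str) -> List[str]:
    """Return the screen rows after writing `output` to a minimal terminal model."""
    rows: List[List[str]] = [[]]
    col = 0
    for ch in output:
        if ch == "\r":
            col = 0                # carriage return: back to the start of the row
        elif ch == "\n":
            rows.append([])        # line feed (cooked): new row, column 0
            col = 0
        else:
            row = rows[-1]
            if col < len(row):
                row[col] = ch      # overwrite a character that is already there
            else:
                row.append(ch)
            col += 1
    return ["".join(r) for r in rows]

print(render("dogdog" + "\rcat\n"))  # ['catdog', '']
```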
Some Unix programs use this to repeatedly redraw part of the last line of output, implementing things like a live-updating progress indicator. For example, curl shows a progress meter on stderr when the downloaded contents are redirected to a file or piped elsewhere.
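A minimal Python sketch of such a progress line, using only the standard library: each update starts with \r so it overwrites the previous one, and it writes to stderr so that stdout stays free for real data. The function names are hypothetical:

```python
import sys
import time
from typing import List, TextIO

def progress_frames(total: int) -> List[str]:
    """One status string per step; the leading CR makes each overwrite the last."""
    return [f"\r{done * 100 // total:3d}% ({done}/{total})" for done in range(total + 1)]

def show_progress(total: int, out: TextIO = sys.stderr, delay: float = 0.1) -> None:
    for frame in progress_frames(total):
        out.write(frame)
        out.flush()        # no newline was written, so a line-buffered stream needs a push
        time.sleep(delay)  # stand-in for real work
    out.write("\n")        # finally move off the status line
```

Calling show_progress(20) draws a counter that climbs from 0% to 100% on a single terminal line.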
Also: If you had a tool that interpreted Windows-style line endings as literally as possible, and you fed it a string with Unix-style line endings such as "hello\nworld", you would get output like this:
hello
     world
Fortunately, such implementations are extremely rare and, in general, the vast majority of Windows tools can render Unix-style line-endings identically to Windows-style line endings without any problem.
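In such a literal tool, LF moves down a row without returning to column 0, so "world" starts where "hello" ended. That can be modelled by a variant of a simple terminal in which LF never resets the column. A hypothetical Python sketch:

```python
from typing import List

def render_literal(output: str) -> List[str]:
    """Rows produced when LF only moves down and CR only returns to column 0."""
    rows: List[List[str]] = [[]]
    col = 0
    for ch in output:
        if ch == "\r":
            col = 0
        elif ch == "\n":
            rows.append([])  # down one row; the column is deliberately kept
        else:
            row = rows[-1]
            if len(row) < col:
                row.extend(" " * (col - len(row)))  # pad with spaces up to the cursor
            if col < len(row):
                row[col] = ch
            else:
                row.append(ch)
            col += 1
    return ["".join(r) for r in rows]

print(render_literal("hello\nworld"))    # ['hello', '     world']
print(render_literal("hello\r\nworld"))  # ['hello', 'world']
```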

Perforce Line Endings per file

My Perforce-based project supports both Linux and Cygwin platforms with the same shell scripts (e.g. build_project.sh). But Perforce defaults line endings for text files to the local platform (Docs). This causes \r\n newlines in the .sh scripts, which fail on Cygwin.
Some of the ideas I've thought of so far:
Is there a way to make Cygwin accept \r\n files (without having to run dos2unix, since the files are fetched read-only)?
Is there a way to set specific files to be text, but with Unix line endings for everyone? (I am guessing, "no", but thought I'd check.)
Of course I can set the entire workspace's line endings to \n (unix). But this makes the Windows clients unhappy with their .bat files being \n instead of \r\n. Also, if the setting is per workspace (I can't recall), then setting up a workspace is slightly harder for new Windows users, since they must set that option.
Set the .sh files to be "binary", but then we lose the text diffs on those files. Is there a workaround for this? Is this the common (good) hack?
This is a fairly minor nit, but I suspect that some of you have a best known method (BKM) for this pattern.
Thanks.
EDIT: Craig's answer in this question seems to suggest that using Unix line endings will just leave files with \r\n's alone if they are originally submitted that way.
EDIT: To force bash (i.e. Cygwin) to accept files with \r\n endings, one can specify set -o igncr in the script. This is useful if one expects Cygwin users who might not be very Unix-literate (my case), or when the trigger in the solution below can't be imposed globally for some other reason.
I believe that when you install Cygwin you can configure it to use Windows line endings. Leaving that aside, though:
If you use the "unix" LineEnd for absolutely everyone, then all of your text files will have their own internally-consistent line endings (but will not be necessarily consistent with the client platforms). This works by virtue of the fact that the Windows files will end up having the \r as part of the content of the line, so when being synced out in "unix" format they'll have \r\n endings.
The thing to watch out for is mixing and matching LineEnd settings when doing this -- if somebody with a "win" or "local" LineEnd syncs that same file, now they have \r\r\n endings! So if you want to go with the per-file line ending plan, make sure EVERYONE uses "unix" as their LineEnd. This is pretty easy to do with a trigger, e.g.:
Triggers:
form-in client "sed -i %quote%s/LineEnd:.*/LineEnd: unix/%quote% %formfile%"
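The \r\r\n failure mode is easy to reproduce with a toy model of the two LineEnd settings. This is a Python sketch under the simplifying assumption that "unix" passes stored bytes through unchanged and "win" turns every stored LF into CRLF on sync; it is not Perforce code, and the function names are hypothetical:

```python
def sync_unix(stored: bytes) -> bytes:
    """LineEnd 'unix': stored bytes are written out unchanged."""
    return stored

def sync_win(stored: bytes) -> bytes:
    """LineEnd 'win': every stored LF is written out as CRLF."""
    return stored.replace(b"\n", b"\r\n")

# A .bat file submitted through a 'unix' client keeps its CR as ordinary content.
stored = b"@echo off\r\n"
print(sync_unix(stored))  # b'@echo off\r\n'
print(sync_win(stored))   # b'@echo off\r\r\n'
```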

Komodo IDE FTP (ASCII, binary) end-of-line characters

I have a problem when working on remote files (Perl scripts) with Komodo IDE. As far as I know, there is no way to change the FTP transfer mode from binary to ASCII, which results in a "^M" character at the end of every line. My setup is a Linux server and a Windows client. Is there any way to solve this issue without having to correct the saved file on Linux every time? This behaviour disqualifies Komodo IDE, which was my favourite IDE until now.
The "^M" you observe has nothing to do with your file being ASCII; it comes from the line-ending format (carriage return and line feed characters).
I have not verified this, but here's a link showing how to save files in Komodo using a different line ending method. Saving files in DOS mode is not needed anymore, since most editors recognize UNIX file format nowadays.
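Add the -w switch to your Perl shebang line (for example, #!/usr/bin/perl -w). The trailing carriage return then becomes part of the switch rather than of the interpreter path, so the system can still find perl.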

What is the point of using *both* Carriage Returns and Line Feeds?

I'd have thought one was enough. But what's the point of doing CRLF (0x0D0A), when you can simply use LF (0x0A)? Normally, whenever I'm using strings (C++), I do this:
myString = "Test\nThis should be a new line!\nAnother linefeed.";
NOTE: For non-C++ programmers reading this, "\n" is a linefeed (0x0A).
But should I really be doing this:
myString = "Test\r\nThis should be a new line!\r\nAnother carriage return/linefeed pair.";
NOTE: "\r" means carriage return (0x0D).
EDIT: Should this be on Programmers.SE?
Remember that these codes all came from old Teletype machines. These were effectively typewriters: it was necessary both to advance the paper by a line (line-feed), but also to return the print head (on the carriage) to the left side of the paper (carriage-return).
Windows, Unix, and old Mac systems each have a different way of writing new lines in text files (not binary ones). If you're programming under Windows, then in binary mode you will read (and you probably want to write) CRLF endings. Under Unix-like systems it would be just LF.
If you deal with your own data formats, it shouldn't really matter which way you choose. It all really depends only on what you want to do with the string and where you got it from.
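Python exposes the same text-versus-binary distinction. In this sketch (standard library only; the temporary file is just scaffolding), a text-mode write with newline="\r\n" turns each \n into CRLF on disk, a binary read shows those raw bytes, and a default text-mode read translates them back to \n. C and C++ text-mode streams on Windows perform a similar translation by default:

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)  # only the name is needed; the file is reopened below in specific modes
try:
    # Text mode with newline="\r\n": each "\n" written becomes CRLF on disk.
    with open(path, "w", newline="\r\n") as f:
        f.write("Test\nThis should be a new line!\n")
    # Binary mode: the raw bytes, CRs included.
    with open(path, "rb") as f:
        raw = f.read()
    # Text mode with the default newline=None: CRLF is read back as "\n".
    with open(path, "r") as f:
        text = f.read()
finally:
    os.remove(path)

print(raw)         # b'Test\r\nThis should be a new line!\r\n'
print(repr(text))  # 'Test\nThis should be a new line!\n'
```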
Some systems, like UNIX and OS X, just use linefeed; DOS used an additional carriage return in order to be compatible with teletype machines, and Windows inherited the convention.
You use both on Windows because that's the custom on Windows. It's that simple. But you only write both for files destined for Windows.

Resources