Can PowerShell Core handle ps1 files with CRLF line endings in Linux environments? - linux

I have the following unanswered questions, and I am looking for documentation that explains the PowerShell Core requirements for LF vs. CRLF line endings in Linux environments.
1- Can PWSH Core handle files with CRLF?
2- When I run a ps1 file with an LF line ending, can it call another ps1 file with mixed CRLF line endings?
3- Is the PWSH line ending requirement documented and consistent across all Linux distributions?
I am asking the above questions since PowerShell was born in the Windows environment, and I expect it can somehow tolerate LF vs. CRLF discrepancies.
A link to online documentation would be a great help. I did search and surprisingly, I could not find any.

Just preserving the content of the link mentioned in the comments (source):
Line endings
Windows editors, including ISE, tend to use a carriage return followed by linefeed (\r\n or `r`n) at the end of each line. Linux editors use linefeed only (\n or `n).
Line endings are less important if the only thing reading the file is PowerShell (on any platform). However, if a script is set to executable on Linux, a shebang must be included and the line-ending character used must be linefeed only.
For example, a script created as follows and named test.ps1 must use \n to end lines:
#!/usr/bin/env powershell
Get-Process
The first line is the shebang and lets Linux know which parser to use when executing the shell script.
Once created, chmod may be used to make the script executable outside of PowerShell:
chmod +x test.ps1
According to this, PowerShell can handle both - CRLF and LF - even in the same file. The only exception is the shebang: it has to end with LF.

stackprotector's community wiki answer, inspired by a link provided by Ben Voigt, provides the gist of an answer to your questions.
Let me complement it by answering your questions one by one:
I am looking for documentation that explains the PowerShell core requirement for LF vs CRLF line ending in Linux environments.
As of this writing, there appears to be no such documentation:
The conceptual about_Special_Characters help topic discusses `r (CR) and `n (LF) separately, but not explicitly in terms of their potential function as newline characters / sequences (line breaks; the topic uses the term "new line" to refer to a LF character, specifically).
1- Can PWSH Core handle files with CRLF?
Yes - both PowerShell editions - the legacy, ships-with-Windows, Windows-only Windows PowerShell edition (whose latest and final version is v5.1), as well as the cross-platform, install-on-demand, PowerShell (Core) edition (v6+) - treat CRLF and LF newlines interchangeably - both with respect to reading source code and reading files[1] - even if the two newline formats are mixed in a single file.
While not documented as such, PowerShell has always worked this way, and, given its commitment to backward compatibility, this won't change (which in this case is definitely a blessing, given that PowerShell (Core) is now cross-platform and must be able to handle files with LF-only newlines on Unix-like platforms).
The following examples demonstrate this - they work the same on all supported platforms ("`n" creates a LF-only newline, "`r`n" a CRLF newline):
# Read a file with mixed newline formats.
PS> "one`ntwo`r`nthree" > temp.txt; (Get-Content temp.txt).Count; Remove-Item temp.txt
3
# Execute a script with mixed newline formats.
PS> "'one'`n'two'`r`n'three'" > temp.ps1; . ./temp.ps1; Remove-Item temp.ps1
one
two
three
2- When I run a ps1 file with an LF line ending, can it call another ps1 file with mixed CRLF line endings?
Yes - this follows from the above.
However, special considerations apply to shebang-line-based PowerShell scripts on Unix-like platforms:
Such stand-alone shell scripts - which needn't and arguably shouldn't have a .ps1 extension - are first read by the system on Unix-like platforms, and therefore require the shebang line - by definition the first line - to be terminated with a LF-only ("`n") newline, given that only LF by itself is considered a newline on Unix-like platforms.[2] All remaining lines are then read only by PowerShell, and any mix of CRLF and LF is accepted, as usual; e.g.:
# Run on any Unix-like platform - note that `n alone must end the first line.
PS> "#!/usr/bin/env pwsh`n'one'`r`n'two'" > temp; chmod a+x temp; ./temp; Remove-Item temp
one
two
In practice, not least due to the not insignificant startup cost of pwsh, the PowerShell (Core) CLI, but also due to several bugs as of PowerShell 7.2.6, stand-alone shebang-line-based PowerShell scripts - which are primarily useful for being called from outside PowerShell - are rare.
3- Is the PWSH line ending requirement documented and consistent across all Linux distributions?
No, it isn't documented.
Yes, as implied by the above, it is consistent, not just across Linux distributions, but across all supported platforms.
[1] Even CR-only newlines - as used in the long-obsolete classic Mac OS, and which should therefore be avoided nowadays - are recognized in PowerShell source code and by Get-Content, but not by Measure-Object -Line, for instance.
[2] If the first line ends in CRLF, the CR (\r) is retained as part of the line and therefore as part of the target executable path or the last option passed to it, which breaks the invocation - see this answer for a real-life manifestation of this problem.
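This failure mode is easy to reproduce. The following sketch uses /bin/sh rather than pwsh, so it runs even without PowerShell installed; the file name is arbitrary:

```shell
# Create a script whose shebang line ends in CRLF; the kernel then looks
# for an interpreter literally named "/bin/sh\r", which does not exist.
printf '#!/bin/sh\r\necho hello\n' > crlf-demo.sh
chmod +x crlf-demo.sh
./crlf-demo.sh   # fails with something like ".../bin/sh^M: bad interpreter"
rm crlf-demo.sh
```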

Being as canonical as I can be:
I worked on PowerShell v2-v3.
Newlines of either form were always meant to be interchangeable within PowerShell.
There was a good amount of Unix influence in the language. Being able to support both forms of newlines was near and dear to many a team member's heart. Supporting it from the get-go prevented the possibility of a script failing to run just because it was copied to Mac or Linux and saved in the wrong editor.
The importance of this feature has been proved again and again, and has obviously become mission critical now that PowerShell Core is a thing (because the scenario listed above is far more common). I'd wager that as long as PowerShell is a language, this will be the behavior of PowerShell scripts.
As far as shebang files go, this isn't really an exception to the rule. With a shebang file, Unix is reading the file and then sending it to the interpreter. A carriage return is outside of its range of expectations, not PowerShell's.
Hope this helps shed some light on things.

Related

Why does `^M` appear in terminal output when looking at some files?

I'm trying to send file using curl to an endpoint and save the file to the machine.
Sending curl from Linux and saving it on the machine works well,
but doing the same curl from Windows is adding ^M character to every end of line.
I'm printing the file before saving it and can't see ^M. Only viewing the file on the remote machine after saving it shows me ^M.
A simple string replacement doesn't seem to work.
Why is ^M being added? How can I prevent this?
Quick Answer: That's a carriage return. They're a harmless but mildly irritating artifact of how Windows encodes text files. You can strip them out of your files with dos2unix. You can configure most text editors to use "Unix Line Endings" or "LF Line Endings" to prevent them from appearing in new files that you create from Windows PCs in the future.
Long Answer (with some historical trivia):
In a plain text file, when you create a new line (by pressing enter/return), a "line break" is embedded in the file. On Unix/Linux, this is a single character, '\n', the "line feed". On Windows, this is two sequential characters, '\r\n', the "carriage return" followed by the "line feed".
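You can see the difference in the raw bytes with od -c (a quick sketch):

```shell
# Inspect the actual bytes of each line-ending style:
printf 'hi\n'   | od -c   # Unix:    h  i  \n
printf 'hi\r\n' | od -c   # Windows: h  i  \r  \n
```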
When physical teletype terminals, which behaved much like typewriters, were still in use, the "line feed" character meant "move the paper up to the next line" and the "carriage return" character meant "slide the carriage all the way over so the typing head is on the far left". From the very beginning, nearly all teletype terminals supported implicit carriage return; i.e., triggering a line feed would automatically trigger a carriage return. The developers working on what later evolved into Windows decided that it would be best to include explicit carriage returns, just in case (for some reason) the teletype does not perform one implicitly. The Unix developers, on the other hand, chose to work with the assumption of implicit carriage return.
The carriage return and line feed are ASCII Control Characters which means they do not have a visible representation as standalone printable characters, instead they affect the output cursor itself (in this case, the position of the output cursor).
The "^M" you see is a stand-in representation for the carriage return character, used by programs that don't fully "cook" their output (i.e., don't apply the effects of some ASCII Control Characters). (Other control characters have other representations starting with "^", and the "^" character is also used to represent the "ctrl" keyboard key in some Unix programs like nano.)
You can use dos2unix to convert the line endings from Windows-style to Unix-style.
$ curl https://example.com/file_with_crlf.txt | dos2unix > file.txt
On some distros, this tool is included by default; on others it can be installed via the package manager (e.g., on Ubuntu, sudo apt install dos2unix). There is also a companion tool, unix2dos, for the inverse conversion.
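If dos2unix isn't available, tr can strip the carriage returns just as well. A minimal sketch with hypothetical file names:

```shell
# Make a file with CRLF endings, then delete every CR byte;
# CRLF endings become plain LF.
printf 'one\r\ntwo\r\n' > file_with_crlf.txt
tr -d '\r' < file_with_crlf.txt > file.txt
```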
Most "smart" text editors for coding (Sublime, Atom, VS Code, Notepad++, etc.) will happily read and write with either Windows-style or Unix-style line endings (this might require changing some configuration options). Often, the line-endings are auto-detected by scanning the contents of a file, and usually new files are created with the Operating System's native line endings (by default). Even the new version of Notepad supports Unix-style line endings. On the other hand, some Unix tools will produce strange results in the presence of Windows-style line breaks. If your codebase will be used by people on both Unix and Windows operating systems, the nice thing to do is to use Unix-style line endings everywhere.
Git on Windows also has an optional mode that checks out all files with Windows-style line breaks, but checks them back in with Unix-style line breaks.
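That mode is controlled by the core.autocrlf setting; a .gitattributes file can also pin the policy per path so it doesn't depend on each user's configuration. A sketch, with hypothetical patterns:

```
# .gitattributes
* text=auto         # normalize text files to LF in the repository
*.sh text eol=lf    # shell scripts always check out with LF
*.bat text eol=crlf # batch files always check out with CRLF
```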
Side Notes (interesting, but not directly related to your question):
What the carriage return actually does (on a modern virtual terminal, be it Windows or Unix) is move the output cursor to the beginning of the line. If you use the carriage return without a line feed, you can "overwrite" part of a string that has already been printed.
$ printf "dogdog" ; printf "\rcat\n"
catdog
Some Unix programs use this to asynchronously update part of the last line of output, to implement things like a live-updating progress indicator. For example, curl shows a download progress meter when the file contents are piped elsewhere.
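A minimal sketch of such a progress indicator:

```shell
# Each iteration returns the cursor to column 0 and overwrites the line.
for i in 1 2 3 4 5; do
  printf '\rprogress: %d/5' "$i"
  sleep 0.2   # simulate work
done
printf '\n'
```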
Also: If you had a tool that interpreted Windows-style line endings as literally as possible, and you fed it a string with Unix-style line endings such as "hello\nworld", you would get output like this:
hello
world
Fortunately, such implementations are extremely rare and, in general, the vast majority of Windows tools can render Unix-style line-endings identically to Windows-style line endings without any problem.

Why save files with CR LF on Windows > 8?

I develop and use Git on both Windows and Linux. When using IDEs or Git on Windows, I'm frequently prompted on whether to save files with CR LF or not.
I am doing mainly C# and JavaScript ES6 development which involves code that contains multi-line strings.
What reasons are there to save files with CR LF on Windows? Are CR-LFs mostly of historical significance? I have not yet noticed a drawback to working with UNIX \n line endings on Windows.
Windows batch files can malfunction when saved with Unix-style (LF-only) line endings, because the goto command works by jumping to the appropriate offset in the script - an offset that is not computed correctly unless the lines end with carriage-return/line-feed.
Some Windows programs don't properly handle '\n' without '\r' but any decent editor or for that matter any decent program should handle them identically. But CRLF is traditionally the sanctioned way to do line endings on Windows and you might have compatibility issues if you don't.

Perforce Line Endings per file

My Perforce-based project supports both Linux and Cygwin platforms with the same shell scripts (e.g. build_project.sh). But Perforce defaults line endings for text files to the local platform (Docs). This causes \r\n newlines in the .sh scripts, which fail on Cygwin.
Some of the ideas I've thought of so far:
Is there a way to make Cygwin accept \r\n files? (Without having to run dos2unix, the files fetch as read-only).
Is there a way to set specific files to be text, but with Unix line endings for everyone? (I am guessing, "no", but thought I'd check.)
Of course I can set the entire workspace's line endings to \n (unix). But this makes the Windows clients unhappy with their .bat files being \n instead of \r\n. Also if the setting is per workspace (I can't recall), then a workspace setup is slightly harder for the new Windows user as they must set that option.
Set the .sh files to be "binary", but then we lose the text diffs on those files. Is there a workaround for this? Is this the common (good) hack?
This is a fairly minor nit, but I suspect that some of you have a BKM for this pattern.
Thanks.
EDIT: Craig's answer in this question seems to suggest that using Unix line endings will just leave files with \r\n's alone if they are originally submitted that way.
EDIT: To force bash (i.e. Cygwin) to accept files with \r\n endings, one can specify set -o igncr in the script. This is nice if one expects Cygwin users that might not be very Unix literate (my case), or when we can't globally impose the trigger in the solution below for some other reason.
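Since igncr exists only in Cygwin's bash, a portable way to opt in near the top of a script is to guard the option (a sketch):

```shell
#!/bin/bash
# Ignore CRs in this script on Cygwin; silently skipped on other
# systems, where the igncr option does not exist.
set -o igncr 2>/dev/null || true
echo "script body runs either way"
```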
I believe that when you install Cygwin you can configure it to use Windows line endings. Leaving that aside, though:
If you use the "unix" LineEnd for absolutely everyone, then all of your text files will have their own internally-consistent line endings (but will not be necessarily consistent with the client platforms). This works by virtue of the fact that the Windows files will end up having the \r as part of the content of the line, so when being synced out in "unix" format they'll have \r\n endings.
The thing to watch out for is mixing and matching LineEnd settings when doing this -- if somebody with a "win" or "local" LineEnd syncs that same file, now they have \r\r\n endings! So if you want to go with the per-file line ending plan, make sure EVERYONE uses "unix" as their LineEnd. This is pretty easy to do with a trigger, e.g.:
Triggers:
form-in client "sed -i %quote%s/LineEnd:.*/LineEnd: unix/%quote% %formfile%"
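What the trigger's sed command does to the client spec can be checked locally (a sketch; "LineEnd: local" stands in for whatever the client form contains):

```shell
# Rewrite whatever LineEnd the client spec has to "unix":
printf 'LineEnd: local\n' | sed 's/LineEnd:.*/LineEnd: unix/'
# prints: LineEnd: unix
```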

How to enable linux support double backslashes "\\" as the path delimiter

Assume we have a file /root/file.ini.
In Ubuntu's shell, we can show the content with this command,
less /root\\file.ini
However, in debian's shell, the command below will report that the file does not exist.
Does anybody happen to know how to make linux support "\\" as a path delimiter? I need to solve it because we have a software, which tries to access a file using "\\". It works fine in ubuntu, but not in debian.
Thanks
Linux cannot support \ as a path delimiter (though perhaps it might be able to with substantial changes to the kernel). This is because \ is a valid file name character. In fact the only characters not allowed as part of a file name are / and \0 (the null character).
If this seems to be working under ubuntu, then I would check for the existence of a file called root\file.ini in /
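This is easy to verify: a backslash is an ordinary filename character on Linux. A sketch, in a scratch directory:

```shell
cd "$(mktemp -d)"
touch 'root\file.ini'   # creates ONE file whose name contains a backslash
ls                      # shows: root\file.ini
```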
I believe you will probably find it easier to make your program platform independent.
I found a forum post which states that / is a platform-independent path delimiter in ANSI C, and that file operations will automatically convert / to the actual path delimiter used on the host OS.
Have you tried "\\\\" (four backslashes)? The first and third for escaping, and the second and the last one to rule them all?

Does GCC support command files

MSVC compilers support command files which are used to pass command line options. This is primarily due to the restriction on the size of the command line parameters that can be passed to the CreateProcess call.
This is less of an issue on Linux systems but when executing cygwin ports of Unix applications, such as gcc, the same limits apply.
Therefore, does anyone know if gcc/g++ also support some type of command file?
Sure!
@file
Read command-line options from file. The options read are inserted
in place of the original @file option. If file does not exist, or
cannot be read, then the option will be treated literally, and not
removed.
Options in file are separated by whitespace. A whitespace
character may be included in an option by surrounding the entire
option in either single or double quotes. Any character (including
a backslash) may be included by prefixing the character to be
included with a backslash. The file may itself contain additional
@file options; any such options will be processed recursively.
You can also jury-rig this type of thing with xargs, if your platform has it.
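A sketch of the xargs fallback, where echo stands in for the real compiler invocation and the file names are hypothetical:

```shell
# Put the long option list in a file, one option per line...
printf -- '-O2\n-Wall\n' > opts.txt
# ...then have xargs append it to the command line.
xargs echo gcc -c hello.c < opts.txt
# prints: gcc -c hello.c -O2 -Wall
```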
