Technical difference between "line break" and "newline"? - string

Yesterday I added an answer to How to add a line break to text in UI5?. While trying to specify the question with more tags, I realized that there are two similar tags available on Stack Overflow: newline and line-breaks.
The differentiation between LF and CR is pretty clear. But what is the difference between the terms "newline" and "line break"? Aren't they synonym to each other?
The newline excerpt here on Stack Overflow says:
Newline refers to [...] a line break.
... while the disambiguation page of "Line break" from Wikipedia says:
Line break may refer to [...] newline.
And from that "Newline" page:
To denote a single line break, Unix programs use line feed [...] while most programs common to [...] Windows use carriage return+line feed.
Are there any technical standards or guidelines that make a clear distinction between those two terms in the software industry? Or can they be used interchangeably, having no technical difference?
My current assumption is that there is no difference: "line break" describes the result from either soft return (⇧ Shift+↵ Enter) or hard return (↵ Enter), whereas "newline" is a technical term for "line break" but has the same result.

I interpret line break as a semantic meaning. What you tell a typographer when you see a \n. C language "new line" means a line break. This is like A (0x41) in ASCII is the upper case of Latin letter a. So a name, a code, and a meaning.
It is like "SP" (\u0020) is a space character. There are many other space characters: C recognizes also TAB as space character. HTML recognize new line as space character (and not as a line break (but in <pre>).
CR and LF were defined by ASCII, and used together because typewritter usually worked in such manner, and to give some more time to move (they were connected serially, so with a timed inputs). You may find the scanned document of old ASCII about the meaning of control characters, but that changed a lot. \0 is now a end of string, before it was just a filling character (or to quit a sequence). Apple used CR, Unix LF as what typographers used as line break.
If you want a list of names and aliases, Unicode provides such names (but as usually, there were errors, so BELL is now used for a emoji, and so the old alias of BEL is not more valid, for control code). There are also several sources about standardization of control codes (mainly used for escape sequances, and so). But also this is not fully follower. Terminals tend to have quirks.

Related

Copy folders from list [duplicate]

I'd like to know the difference (with examples if possible) between
CR LF (Windows), LF (Unix) and CR (Macintosh) line break types.
CR and LF are control characters, respectively coded 0x0D (13 decimal) and 0x0A (10 decimal).
They are used to mark a line break in a text file. As you indicated, Windows uses two characters the CR LF sequence; Unix only uses LF and the old MacOS ( pre-OSX MacIntosh) used CR.
An apocryphal historical perspective:
As indicated by Peter, CR = Carriage Return and LF = Line Feed, two expressions have their roots in the old typewriters / TTY. LF moved the paper up (but kept the horizontal position identical) and CR brought back the "carriage" so that the next character typed would be at the leftmost position on the paper (but on the same line). CR+LF was doing both, i.e. preparing to type a new line. As time went by the physical semantics of the codes were not applicable, and as memory and floppy disk space were at a premium, some OS designers decided to only use one of the characters, they just didn't communicate very well with one another ;-)
Most modern text editors and text-oriented applications offer options/settings etc. that allow the automatic detection of the file's end-of-line convention and to display it accordingly.
This is a good summary I found:
The Carriage Return (CR) character (0x0D, \r) moves the cursor to the beginning of the line without advancing to the next line. This character is used as a new line character in Commodore and early Macintosh operating systems (Mac OS 9 and earlier).
The Line Feed (LF) character (0x0A, \n) moves the cursor down to the next line without returning to the beginning of the line. This character is used as a new line character in Unix-based systems (Linux, Mac OS X, etc.)
The End of Line (EOL) sequence (0x0D 0x0A, \r\n) is actually two ASCII characters, a combination of the CR and LF characters. It moves the cursor both down to the next line and to the beginning of that line. This character is used as a new line character in most other non-Unix operating systems including Microsoft Windows, Symbian and others.
Source
It's really just about which bytes are stored in a file. CR is a bytecode for carriage return (from the days of typewriters) and LF similarly, for line feed. It just refers to the bytes that are placed as end-of-line markers.
Way more information, as always, on wikipedia.
Summarized succinctly:
Carriage Return (Mac pre-OS X)
CR
\r
ASCII code 13
Line Feed (Linux, Mac OS X)
LF
\n
ASCII code 10
Carriage Return and Line Feed (Windows)
CRLF
\r\n
ASCII code 13 and then ASCII code 10
If you see ASCII code in a strange format, they are merely the number 13 and 10 in a different radix/base, usually base 8 (octal) or base 16 (hexadecimal).
ASCII chart
Jeff Atwood has a recent blog post about this: The Great Newline Schism
Here is the essence from Wikipedia:
The sequence CR+LF was in common use
on many early computer systems that
had adopted teletype machines,
typically an ASR33, as a console
device, because this sequence was
required to position those printers at
the start of a new line. On these
systems, text was often routinely
composed to be compatible with these
printers, since the concept of device
drivers hiding such hardware details
from the application was not yet well
developed; applications had to talk
directly to the teletype machine and
follow its conventions. The separation
of the two functions concealed the
fact that the print head could not
return from the far right to the
beginning of the next line in
one-character time. That is why the
sequence was always sent with the CR
first. In fact, it was often necessary
to send extra characters (extraneous
CRs or NULs, which are ignored) to
give the print head time to move to
the left margin. Even after teletypes
were replaced by computer terminals
with higher baud rates, many operating
systems still supported automatic
sending of these fill characters, for
compatibility with cheaper terminals
that required multiple character times
to scroll the display.
CR - ASCII code 13
LF - ASCII code 10.
Theoretically, CR returns the cursor to the first position (on the left). LF feeds one line, moving the cursor one line down. This is how in the old days you controlled printers and text-mode monitors.
These characters are usually used to mark end of lines in text files.
Different operating systems used different conventions. As you pointed out, Windows uses the CR/LF combination while pre-OS X Macs use just CR and so on.
CR and LF are a special set of characters that help us format our code.
CR(/r) stands for CARRIAGE RETURN. It puts the cursor at the beginning of a line, but it doesn't create a new line. This is how MAC OS works.
LF(/n) stands for LINE FEED. It creates a new line, but it doesn't put the cursor at the beginning of that line. The cursor stays back at the end of the last line. This is how Unix and Linux work.
CRLF (/r/f) creates a new line as well as puts the cursor at the beginning of the new line. This is how we see it in Windows OS.
Git uses LF by default. So when we use Git on Windows it throws a warning like "CRLF will be replaced by LF" and automatically converts all CRLF into LF, so that code becomes compatible.
NB: Don't worry...see this less as a warning and more as a notice thing.
Systems based on ASCII or a
compatible character set use either LF
(Line feed, 0x0A, 10
in decimal) or CR (Carriage return, 0x0D, 13 in decimal)
individually, or CR followed by
LF (CR+LF, 0x0D 0x0A);
These characters are based on printer commands: The line feed
indicated that one line of
paper should feed out of the printer, and a carriage return
indicated that the printer
carriage should return to the beginning of the current line.
Here is the details.
The sad state of "record separators" or "line terminators" is a legacy of the dark ages of computing.
Now, we take it for granted that anything we want to represent is in some way structured data and conforms to various abstractions that define lines, files, protocols, messages, markup, whatever.
But once upon a time this wasn't exactly true. Applications built-in control characters and device-specific processing. The brain-dead systems that required both CR and LF simply had no abstraction for record separators or line terminators. The CR was necessary in order to get the teletype or video display to return to column one and the LF (today, NL, same code) was necessary to get it to advance to the next line. I guess the idea of doing something other than dumping the raw data to the device was too complex.
Unix and Mac actually specified an abstraction for the line end, imagine that. Sadly, they specified different ones. (Unix, ahem, came first.) And naturally, they used a control code that was already "close" to S.O.P.
Since almost all of our operating software today is a descendent of Unix, Mac, or Microsoft operating software, we are stuck with the line ending confusion.
NL is derived from EBCDIC NL = 0x15 which would logically compare to CRLF 0x0D 0x0A ASCII... This becomes evident when physically moving data from mainframes to midrange. Colloquially (as only arcane folks use EBCDIC), NL has been equated with either CR or LF or CRLF.

Norminette empty line or not at end

Why people on VStudio with norminette highlights put an empty line at end.
And when they do norminette in iTerm it’s Works
When you use Vim and put an empty line at end the norminette say Error extra line.
This happen yesterday and today my Vstdio ask for a line at end and when I do norminette in iTerm it’s say error extra line. I go back to vim, delete the line and norminette say error again to remove extra line. I need to open back the file, add line and then delete and save to have norminette working. So now sometime moulinette say KO on server but OK my side. I need to add/remove last line every time with vim when I work on vstdio
So if someone know the issue i would like to have a better solution than double check on vim for the next 3 week and maybe 3 years 🤣 thx all!
That topic has already been discussed to death.
No matter what system you are on, lines in text files generally end with one or two invisible characters. On some systems, it's a "carriage return" CR followed by a "line feed" LF, on other systems it's only a CR, on other systems it's only a LF, and that's only for the least exotic ones. That character or pair of characters has a few colloquial names: "newline", "EOL", etc. I will use "EOL" from now on.
Now, EOL has two semantic interpretations. In some contexts, EOL is considered to be a "line terminator", meaning that no assumption is made about what comes after it, and in some other contexts, EOL is considered to be a "line separator", meaning that there is always a line after it.
And then there is the related problem of how a stream of text should end. With the first interpretation, the last line of a stream should end with EOL, if only to be able to establish a boundary between two streams. With the second interpretation above, the last line of a stream shouldn't end with EOL because it would mean… that the last line is not really the last line.
So, basically, this text:
foo
bar
baz
could be encoded as (first interpretation):
foo<EOL>
bar<EOL>
baz<EOL>
or as (second interpretation):
foo<EOL>
bar<EOL>
baz
Most Unix-like tools have adopted the first interpretation for decades and generally expect EOL on the last line but many more modern tools have adopted the second interpretation, which leads to much confusion among new generations who juggle between GUI editors and CLI tools.
Feed some text with a final EOL to a tool that interprets it as a "line separator" and you get an imaginary line displayed at the end of the file:
1 foo
2 bar
3 baz
4
Feed some text without a final EOL to a tool that interprets it as "line terminator" and you can get anything from a reasonable display (as follows) to loud complaints (git):
1 foo
2 bar
3 baz
Add to that confusing names like "newline" and you have a recipe for a never-ending stream of questions like this one.
So… this is what seems to be going on, here:
you are using a mix of tools that interpret EOL differently,
you are interpreting one of those interpretations incorrectly,
and you act upon that misinterpretation, with confusing results.
Let's address these one-by-one.
Well, we all do so ¯\_(ツ)_/¯.
What appears like an extra line at the bottom of your file in a modern GUI editor is unlikely to be an extra line. It is more likely to be an artefact of the second interpretation. Assuming you are on some kind of Unix-like system and the expected EOL is LF, you can do $ xxd filename to inspect its content in hexadecimal form:
a single a0 at the end means that your file ends with EOL,
a a0a0 at the end means that your file has an extra empty line,
no a0 at the end means that it was written by a an editor that firmly adheres to the second interpretation.
The second case will angry your linter:
Error: EMPTY_LINE_EOF (line: 7, col: 1): Empty line at end of file
but it doesn't seem to particularly care about the other two cases.
Because you see what appears to be an empty line at the end of a file in a GUI editor it doesn't mean that a) that line actually exists and b) that you must add one at the end of the file in Vim or any other editor.
The correct approach is to:
let Vim do what it does out the box, write files with an EOL at the end, because that's what your toolchain (including Norminette) expects,
set up your GUI editors to do the same (even if they still show that imaginary line),
get rid of any extraneous empty line you might have in one of your projects,
NEVER add an extraneous empty line at the end of your files, as it useless at best and forbidden at worst.

Vim not detecting implicit newline characters instead of visible newline characters I am trying to strip

Here's an example of some text from which I'm trying to strip those newline characters, which appear explicitly in my vim, and replace them with actual newline characters that I don't see.
But when I search for a newline character using /[\n]/, what I get isn't these visible newline characters, but instead the implicit ones. So I can't do a search and replace.
How should I address this? Here is the text:
The Reason that can be reasoned\n is not the eternal Reason.The name that can\n be namedis not the eternal Name. The Unnamable is of heaven and earth the beginning.\n The Namable becomes of the\n ten thousand things the mother.Therefore it is said:\n '\n\n He\n who desireless is found\n The spiritual of the world will sound.\n But he who by desire is bound\n Sees the mere shell of things around.' These two things are the same in sour ce but different in name.\n Their sameness\n is called a mystery.Indeed
it is the mystery\n
You need to search for \\n, not [\n].
doing:
%s/\\n/\r/g
Should solve your problem (I have no idea why, but vim needs \r instead of \n')

Carriage Return, Line Feed and New Line

What are the differences among Carriage Return, Line Feed and New line? Does it depend on OS? Why do we need to use all of them just for getting to next line?
Generally, a "new line" refers to any set of characters that is commonly interpreted as signaling a new line, which can include:
CR LF on DOS/Windows
CR on older Macs
LF on Unix variants, including modern Macs
CR is the Carriage Return ASCII character (Code 0x0D), usually represented as \r.
LF is the Line Feed character (Code 0x0A), usually represented as \n.
Original typewriter-based computers needed both of these characters, which do exactly what they say: CR returned the carriage to the left side of the paper, LF fed it through by one line. Windows kept this sequence unmodified, while Unix variants opted for more efficient character usage once they were only needed symbolically.
Make sure you look for a platform-agnostic new line symbol or function if you need to represent this sequence in code. If not, at least make sure that you account for the above three variants.
More on the history: The Great Newline Schism - Coding Horror

How to have a carriage return without bringing about a linebreak in VIM?

Is it possible to have a carriage return without bringing about a linebreak ?
For instance I want to write the following sentences in 2 lines and not 4 (and I do not want to type spaces of course) :
On a ship at sea: a tempestuous noise of thunder and lightning heard.
Enter a Master and a Boatswain
Master : Boatswain!
Boatswain : Here, master: what cheer?
Thanks in advance for your help
Thierry
In a text file, the expected line-end character or character sequence is platform dependent. On Windows, the sequence "carriage return (CR, \r) + line feed (LF, \n)" is used, while Unix systems use newline only (LF \n). Macintoshes traditionally used \r only, but these days on OS X I see them dealing with just about any version. Text editors on any system are often able to support all three versions, and to convert between them.
For VIM, see this article for tips how to convert/set line end character sequences.
However, I'm not exactly sure what advantage the change would have for you: Whichever sequence or character you use, it is just the marker for the end of the line (so there should be one of these, at the end of the first line and you'd have a 2 line text file in any event). However, if your application expects a certain character, you can either change the application -- many programming languages support some form of "universal" newline -- or change the data.
Just in case this is what you're looking for:
:set wrap
:set linebreak
The first tells vim to wrap long lines, and the second tells it to only break lines at word breaks, instead of in the middle of words when it reaches the window size.

Resources