Struggling to reproduce terminal from The Unix Programming Environment (1983) - linux

I have been reading The Unix Programming Environment & performing the included exercises. I understand that this work is somewhat dated, but I have found it to be an excellent resource.
In the first chapter, there are a few exercises in which the reader is presented with an interaction with the terminal & is asked to explain the interaction. Here is an example:
Exercise 1-1. Explain what happens with
$ date\@
In the text, it is explained that an @ is to be interpreted as the line kill character. The equivalent on my system is ^u, but I can emulate the terminal in the book with stty kill @.
Based on the reading & my intuition, I would expect the invocation of date\@ to return something to the effect of:
date@: command not found
The text supports this reasoning:
If you precede either # or @ by a backslash \, it loses its special meaning. So to enter a # or @, type \# or \@.
My problem is that I cannot even type the example into my terminal. As soon as I type @, the line is erased. The backslash does not appear to escape the line kill character.
Assuming I am correct about how the escape character should interact with terminal control characters, how can I set up my system (Ubuntu GNU/Linux) to emulate the behavior from the text?
Here is another similar exercise:
Exercise 1-2. Most shells (though not the 7th edition shell) interpret # as introducing a comment, and ignore all text from the # to the end of the line. Given this, explain the following transcript, assuming your erase character is also #:
$ date
Mon Sep 26 12:39:56 EDT 1983
$ #date
Mon Sep 26 12:40:21 EDT 1983
$ \#date
$ \\#date
#date: not found
$
With my erase character set to #, it is impossible to replicate this transcript. The backslash does not appear to escape the erase character.

The terminal gets and responds to your keystrokes before the shell does. So the shell has no chance to escape the @, since the terminal deletes the whole line first.
When you typed
stty kill @
you told the terminal to kill the line every time you press @.
Type
stty kill ^u
and your shell will start to behave the way you expect, and ^u will kill lines for you.
^v is the escape character for the terminal.
\ is the escape character for the shell.

This is an antique question, about an even more antique book, but I'd like to set the record straight here because the currently accepted answer did not answer your question.
Believe it or not, when I learned UNIX from this book in 1985 (!), this part of the book was already antiquated, and the stuff about "#", "@" and "\" already did not work. I remember being puzzled exactly like you about why it didn't work, and whether I was doing something wrong. But it wasn't wrong per se, just out of date. Let me explain how in a previous era (perhaps a decade before the book was published?) this stuff was correct:
Before the advent of CRT terminals, there were "teletype" terminals - basically typewriters which print the characters you type (and the remote responses) on paper. On such teletypes, there was no "backspace". You couldn't erase something already typed. So the convention was that you typed a "#", and it erased, logically, the previous character. You'd still see both of them on the paper, but had to imagine both were deleted from the computer's input. So if you see on paper
helk#lo world
The computer actually received "hello world", with the "#" deleting the "k" behind it.
UNIX also allowed you to type one character, "@", to delete the entire line you just typed, if you made a lot of mistakes. So
oops I wrote a lot of crap I need to erase@hello world
Was again interpreted as just "hello world".
Finally, since sometimes you wanted to type an actual "#" or "@" character and have it be taken literally, not as a character-erase or line-erase command, you also had an "escape character", which in very early days was "\". Note that this escape character was interpreted not by the shell, but rather by the Unix kernel's terminal driver, which communicated with the teletype.
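As a rough illustration of those conventions, here is a minimal shell sketch that post-processes a typed line the way the old terminal driver would have, assuming # as the erase character and @ as the line kill character. The function name cook is made up for this example, and the real driver did this work in the kernel as each character arrived, not in a shell function.

```shell
# cook LINE: apply teletype-era line editing to LINE (hypothetical helper).
#   '#' erases the previous character, '@' discards the line so far,
#   and a backslash directly before '#' or '@' makes that character literal.
# The body runs in a subshell so its variables do not leak into the caller.
cook() (
    rest=$1
    out=''
    while [ -n "$rest" ]; do
        c=${rest%"${rest#?}"}      # first remaining character
        rest=${rest#?}             # everything after it
        next=${rest%"${rest#?}"}   # one character of lookahead
        case $c in
            '\')
                case $next in
                    '#'|'@') out=$out$next; rest=${rest#?} ;;  # escaped: keep it
                    *)       out=$out$c ;;                     # ordinary backslash
                esac ;;
            '#') out=${out%?} ;;   # erase: drop the last character, if any
            '@') out='' ;;         # kill: drop everything typed so far
            *)   out=$out$c ;;
        esac
    done
    printf '%s\n' "$out"
)

cook 'helk#lo world'                      # prints: hello world
cook 'oops I need to erase@hello world'   # prints: hello world
cook 'date\@'                             # prints: date@
```

The last call hints at Exercise 1-1: under these rules the shell would have received date@ as the command name.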
When new CRT terminals appeared, these conventions were quickly phased out and became the ones we know today: The default erase character was no longer "#" but rather the backspace or delete key, and it really erases the character on the screen. The line-erase (somewhat confusingly known as "kill") became control-X or control-U. The escape character became control-V. You can also change these characters with the "stty" command, setting the "erase", "kill", or "lnext" attributes, but people rarely do. "stty -a" shows you all the current settings of these special characters (and many more).

Related

my bashrc contains strange characters (if Ä -f ü/.bash_aliases Å; then . ü/.bash_aliases fi)

On a GCP Compute Engine Linux instance, I accidentally ran cat filebeat instead of cat filebeat.yaml.
After that, my .bashrc shows the characters below, and if I type '~', bash prints 'ü'.
I need help fixing this.
if Ä -f ü/.bash_aliases Å; then
. ü/.bash_aliases
fi
This looks like your terminal was accidentally configured for legacy ISO-646-SE or a variant. Your file is probably fine; it's just that your terminal remaps the display characters according to a scheme from the 1980s.
A quick hex dump should verify that the characters in the file are actually correct. Here's an example of what you should see.
bash$ echo '[\]' | xxd
00000000: 5b5c 5d0a [\].
Even if the characters are displayed as ÄÖÅ, they are correct if you see the hex codes 5B, 5C, and 5D. (If you don't have xxd, try hexdump or od -t x1.)
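If you would rather have a yes/no answer than read hex, a small check along these lines might help. It is only a sketch: is_plain_ascii is a made-up helper name, and it assumes a grep that honours LC_ALL=C and POSIX character classes.

```shell
# is_plain_ascii: succeed if standard input contains only printable ASCII
# and ASCII whitespace (hypothetical helper, not a standard tool).
is_plain_ascii() {
    # In the C locale, [:print:] and [:space:] cover ASCII only, so any
    # other byte (for example the bytes of a real 'Ä') counts as a match.
    ! LC_ALL=C grep -q '[^[:print:][:space:]]'
}

# A line that really contains '[', ']' and '~' passes the check,
# no matter how the terminal chooses to draw those characters.
printf 'if [ -f ~/.bash_aliases ]; then\n' | is_plain_ascii && echo "plain ASCII"
```

Running is_plain_ascii < ~/.bashrc would apply the same check to the file from the question. If it succeeds, the odd glyphs come from the terminal and not from the file.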
Probably
bash$ tput reset
can set your terminal back to sane settings. Maybe stty sane might work too (but less likely, in my experience). Else, try logging out and back in.
Back when ASCII was the only game in town, but American (or really any) hardware was exported to places where the character repertoire was insufficient, the local vendor would replace the ROM chips in terminals to remap some slightly less common character codes to be displayed as the missing local glyphs. Over time, this became standardized; the ISO-646 standard was updated to document these local overrides. (The linked Wikipedia page has a number of tables with details.)
Eventually, 8-bit character sets became the norm, and then most locales switched to Latin-1 or some other suitable character set which no longer needed this hack. However, it was still rather prevalent even in the early 1990s. In the early 2000s, Unicode started taking over, and so now this seems like an absurd arrangement.
I'm guessing the file you happened to cat contained some control characters which instructed your terminal to switch to this legacy character set. It's not entirely uncommon (though usually when it happens to me, it switches to some "graphical" character set where some characters display box-drawing characters or mathematical symbols).

Technical difference between "line break" and "newline"?

Yesterday I added an answer to How to add a line break to text in UI5?. While trying to specify the question with more tags, I realized that there are two similar tags available on Stack Overflow: newline and line-breaks.
The difference between LF and CR is pretty clear. But what is the difference between the terms "newline" and "line break"? Aren't they synonyms?
The newline excerpt here on Stack Overflow says:
Newline refers to [...] a line break.
... while the disambiguation page of "Line break" from Wikipedia says:
Line break may refer to [...] newline.
And from that "Newline" page:
To denote a single line break, Unix programs use line feed [...] while most programs common to [...] Windows use carriage return+line feed.
Are there any technical standards or guidelines that make a clear distinction between those two terms in the software industry? Or can they be used interchangeably, having no technical difference?
My current assumption is that there is no difference: "line break" describes the result from either soft return (⇧ Shift+↵ Enter) or hard return (↵ Enter), whereas "newline" is a technical term for "line break" but has the same result.
I interpret "line break" as a semantic concept: it is what you would tell a typographer when you see a \n. In the C language, "new line" means a line break. This is like A (0x41) in ASCII being the uppercase Latin letter a: there is a name, a code, and a meaning.
Similarly, "SP" (\u0020) is a space character, but there are many other space characters: C also recognizes TAB as a space character, and HTML treats a newline as a space character, not as a line break (except inside <pre>).
CR and LF were defined by ASCII and used together because teletypewriters usually worked that way, and to give the carriage more time to move (the devices were connected serially, so input was timed). You can find scanned copies of the old ASCII documents describing the meaning of the control characters, but usage has changed a lot since then: \0 is now an end-of-string marker, whereas before it was just a fill character (or a way to end a sequence). Apple used CR and Unix used LF for what typographers call a line break.
If you want a list of names and aliases, Unicode provides them (but, as usual, there were errors: BELL is now the name of an emoji, so the old alias BELL for the BEL control code is no longer valid). There are also several sources on the standardization of control codes (mainly used for escape sequences and the like), but these are not fully followed either. Terminals tend to have quirks.

Why do Linux tools display the CR character as `^M`?

I'm new to Linux, so sorry if my question sounds dumb.
We know that Linux and Mac OS X use \n (0xa), which is the ASCII line feed (LF) character. MS Windows and Internet protocols such as HTTP use the sequence \r\n (0xd 0xa). If you create a file foo.txt in Windows and then view it in a Linux text editor, you'll see an annoying ^M at the end of each line, which is how Linux tools display the CR character.
But why do Linux tools display the CR character as ^M? As I understand it, \r (carriage return) moves the cursor to the beginning of the current line, so the sensible approach would be that, when you open the file, the cursor is at the beginning of the line that contains \r, and ^M is not displayed at all.
PS: Some people have posted answers on how to remove ^M, but I want to know why ^M is displayed at all instead of the cursor moving to the beginning of the line, which is the definition of carriage return.
The ASCII control characters like TAB, CR, NL and others are intended to control the printing position of a teletypewriter-like display device.
A text editor isn't such a device. It is not appropriate for a text editor to treat a CR character literally as meaning "go to the first column"; it would make a confusing gibberish out of the editing experience.
A text editor works by parsing a text file's representation, to create an internal representation which is presented to the user. On Unix-like operating systems, a file is represented by zero or more lines, which are terminated by the ASCII NL character. Any CR characters that occur just look like part of the data, and not part of the line separation.
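To see that a CR really is just another byte at the end of the line's data, you can count the lines that end in one. This is a small sketch, assuming a POSIX grep; count_crlf is a name invented for the example.

```shell
# count_crlf: print how many lines on standard input end with a CR,
# i.e. how many lines use CR-LF endings (hypothetical helper).
# The body runs in a subshell so the variable cr stays local.
count_crlf() (
    cr=$(printf '\r')        # a literal carriage return character
    grep -c "${cr}\$"        # lines whose last character before the NL is CR
)

printf 'one\r\ntwo\nthree\r\n' | count_crlf   # prints: 2
```

A file written on Windows would typically report every line, while a native Unix file would report 0.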
Not all editors behave the same way. For instance, the Vim editor will detect that a file uses CR-LF line endings, and load it properly using that representation. A flag is set for that buffer which indicates that it's a "DOS" file, so that when you save it, the same representation is reproduced.
That said, there is a feature in the actual Linux kernel for representing control characters like CR using the ^M notation. The TTY line discipline for any given TTY device can be configured to print characters in this notation, but only when echoing back the characters received.
Demo:
$ stty echoctl # turn on notational echo of control characters
$ cat # run some non-interactive program with rudimentary line input
^F^F^F^F^F^F
^C
$
Above, the Ctrl-F that I entered was echoed back as ^F. So, in fact there is a "Linux editor" which uses this notation: the rudimentary line editor of the "canonical input mode" line discipline.
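The notation itself follows a simple rule: a control character is shown as ^ followed by the character whose code differs from it only in bit 6 (that is, code XOR 64). CR is code 13, and 13 XOR 64 is 77, the code of M. A minimal sketch of that rule, where caret is a made-up function name:

```shell
# caret CODE: print the caret notation for the control character with the
# given decimal ASCII code (0-31 or 127). Hypothetical helper.
# The body runs in a subshell so the variable octal stays local.
caret() (
    # XOR with 64 flips bit 6: 0..31 become '@'..'_' and 127 becomes '?'.
    octal=$(printf '%03o' "$(( $1 ^ 64 ))")
    # The format string ends up as e.g. ^\115\n, which printf prints as ^M.
    printf "^\\${octal}\\n"
)

caret 13    # CR      prints: ^M
caret 6     # Ctrl-F  prints: ^F
caret 127   # DEL     prints: ^?
```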

Single quotes in history expansion (bash)

I have a theoretical question about the syntax of Bash.
I am running Bash 4.3.11(1) in Linux Ubuntu 14.04.
On the official GNU website, the Bash manual says in Subsection 9.3.1:
!string
Refer to the most recent command preceding the current position
in the history list starting with string.
In general it's understood that string is, syntactically speaking, a sequence of characters ending before the first blank or newline.
However, when describing quoting in subsection 3.1.2., we can read in paragraph 3.1.2.2. what follows:
Enclosing characters in single quotes (‘'’) preserves the literal
value of each character within the quotes.
In particular, blanks inside single quotes do not break the string into separate words.
So, an expression like !'some text' would have to search the Bash history list for the most recent command starting with 'some text'.
However, the string is split at the blank between some and text when I type it in my terminal, since the following error message is shown:
bash: !'some: event not found
Is this behaviour a bug in the implementation of the shell, or am I misunderstanding the expansion rules of Bash for this example?
I wouldn't call the observed behaviour a bug, because there is no specification for history expansion other than the observed behaviour of the bash shell itself. But it is certainly the case that the precise mechanics of parsing a history expansion expression is not well documented and has a lot of possibly surprising corner cases.
The bash manpage does state that history expansion "is performed immediately after a complete line is read, before the shell breaks it into words" (emphasis added), while the bash manual mentions that history expansion is provided by the History library. This is the root cause of most of the history expansion parsing oddities: history expansion works on raw unparsed input without any assistance from the bash tokenizer, and is mostly done with an external library which is not bash-specific. Since tokenizing bash input is non-trivial, it is not really surprising that the relatively simple parsing rules used during history expansion are only a rough approximation to a real bash parse.
For example, the bash manual does indicate that you can prevent a history expansion character (!) from being recognized as such by backslash-quoting it. But it is not explicitly documented that any \ which immediately precedes an ! will inhibit recognition of the history expansion, even if the backslash was itself quoted with a backslash. So the ! in \\!word does not cause the previous command starting with word to be substituted. (\\word is a common way to execute the command word instead of the alias word, so the example is not entirely contrived.)
A longer discussion of some of the corner cases of the recognition of the history expansion character can be found in this answer.
The issue raised by this question is slightly different, since it is about the next phase of the history expansion parse. Once it has been established that a particular character is a history expansion character, it is then necessary to parse the "event" which follows; as indicated by the bash manual, the event can take several forms, one of which is !string, representing the most recent command which starts with "string".
It is implied that this form will only be used if no other form applies, which means that string may not start with a digit or -, !, # or ?. It also may not start with whitespace or = (since those would inhibit history expansion) and in some circumstances ( or " (which may inhibit history expansion). And finally, it may not start with ^, $, % or *, which would be interpreted as a word designator (from the default event, which is the previous command).
The bash manual does not specify what terminates the string. It is semi-documented in the history library manual, which mentions that a history search string (or "event" as it is called in the bash manual) is terminated by whitespace, :, or any of the characters in the history configuration variable history_search_delimiter_chars. (For the record, bash currently (v4.3) sets that variable to ";&()|<>".)
As indicated earlier, quoting is taken into account when deciding whether or not to recognize a history expansion character; as it turns out, if the history expansion occurs inside a double-quoted string, then the closing double-quote is also considered a history search delimiter character. And that, as far as I know, is the entire list of characters which will delimit !string.
Nowhere in either the bash or the history documentation does it state that a history search delimiter character can be made non-special by quoting, and indeed this does not happen. An open quote, whether double or single, or even a backslash following the ! will be treated as just part of the string to be searched for, without any special processing.
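The delimiter rule described above can be approximated in a few lines. This is only a sketch of the rule as stated here (whitespace, : and the characters ";&()|<>" end the search string), not the history library's actual code, and event_string is an invented name. It ignores the special case of a closing double quote.

```shell
# event_string TEXT: given the text typed after "!", print the part that
# would be used as the history search string (hypothetical helper).
event_string() {
    # Cut at the first whitespace, ':' or one of ; & ( ) | < >.
    # Quotes and backslashes get no special treatment.
    printf '%s\n' "$1" | sed 's/[[:space:]:;&()|<>].*//'
}

event_string "'some text'"   # prints: 'some
event_string 'make;ls'       # prints: make
```

The first call reproduces the search string seen in the error message from the question, where the opening quote is kept and the blank ends the string.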
Parsing of the substring-match history expansion -- !?string? -- is completely different. That string can only be terminated by a ? or by a newline. (As the bash manual says, the trailing ? is optional if terminated by a newline.)
Once the history expansion character has been recognized and the history search string has been identified, it may then be necessary to split the retrieved history entry into words. Again, the bash manual is slightly cavalier about corner cases, when it says that "the line is broken into words in the same fashion that Bash does, so that several words surrounded by quotes are considered one word."
A pedant would observe that "in the same fashion that Bash does" is not quite the same as saying "exactly as Bash would do", and in fact the second part of the sentence is literally true: several words surrounded by quotes are considered one word even if the quotes are not really matching quotes. For example, the line:
command "$(echo " foo bar ")"
is considered by the history library to consist of the following five words:
0. command
1. "$(echo "
2. foo
3. bar
4. ")"
although the bash parse would be quite different. By contrast, bash and the history library agree on the parsing of
command "$(echo ' foo bar ')"
as two words.

Redraw screen in terminal

How do some programs edit what's being displayed on the terminal (to pick a random example, the program 'sl')? I'm thinking of the Linux terminal here; it may happen in other OSes too, I don't know. I've always thought once some text was displayed, it stayed there. How do you change it without redrawing the entire screen?
Depending on the terminal, you send control sequences. A common sequence is, for example, esc[row;colH to send the cursor to a specific position (e.g. on ANSI, Xterm, Linux, VT100). However, this will vary with the type of terminal the user has ... curses (in conjunction with the terminfo files) will wrap that information for you.
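For instance, here is a minimal sketch of sending the cursor-position sequence by hand. It assumes a terminal that understands the ANSI/VT100 sequences mentioned above, and move_to is just a name made up for the example. A portable program would look the sequence up via terminfo (for example with tput cup) or use curses instead of hard-coding it.

```shell
# move_to ROW COL: emit ESC [ ROW ; COL H, which moves the cursor on
# ANSI/VT100-style terminals (rows and columns count from 1).
move_to() {
    printf '\033[%d;%dH' "$1" "$2"
}

move_to 1 1
printf 'top-left corner'
move_to 5 10
printf 'row 5, column 10\n'
```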
Many applications make use of the curses library, or some language binding to it.
For rewriting on a single line, such as updating progress information, the special character "carriage return", often specified by the escape sequence "\r", can return the cursor to the start of the current line allowing subsequent output to overwrite what was previously written there.
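Try this shell script:
#!/bin/bash
# Print an ever-increasing counter on a single line.
i=1
while true
do
    # \r returns the cursor to the start of the line; -n omits the newline
    echo -e -n "\r $i"
    i=$((i+1))
done
The -n option suppresses the newline and the \r performs the carriage return, so you write into the same line again and again, with no scrolling whatsoever.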
If you terminate a line sent to the terminal with a carriage return ('\r') instead of a linefeed ('\n'), it will move the cursor to the beginning of the current line, allowing the program to print more text over top of what it printed before. I use this occasionally for progress messages for long tasks.
If you ever need to do more terminal editing than that, use ncurses or a variant thereof.
There are characters that can be sent to the terminal that move the cursor back. Then text can be overwritten.
There is a list here. Note the "move cursor something" lines.
NCurses is a cross-platform library that lets you draw user interfaces on smart terminals.
Corporal Touchy has answered how this is done at the lowest level. For easier development the curses library gives a higher level of control than simply sending characters to the terminal.
To build on @Corporal Touchy's answer, there are libraries available that will handle some of this functionality for you, such as curses/ncurses.
I agree with danio, ncurses is the way to go. Here's a good tutorial:
http://tldp.org/HOWTO/NCURSES-Programming-HOWTO/

Resources