Make bash differentiate between Ctrl-<letter> and Ctrl-Shift-<letter> - linux

I was wondering whether there is any way of making bash send different codes for key combinations that include the Shift key. For instance, Ctrl+N and Ctrl+Shift+N are interpreted the same (Ctrl+V shows me ^N for both). Or is there a terminal that can tell the difference? Or can bash be modified so that it does?

A terminal doesn't interact directly with your keyboard; it interacts with a stream of bytes that it receives, which are usually (but not necessarily) generated by your keyboard. For the printable ASCII values, there is an obvious correspondence between the value and a key (or combination) on your keyboard. ASCII 97 is a, ASCII 65 is Shift-a, and so on.
However, there are also the 32 non-printing control characters from ASCII 0 to ASCII 31, so called because they were intended to control a terminal. The Control key was added so that you could generate these codes in combination with the other keys. A simple scheme was used: pressing Control-x generates the control code obtained by subtracting 64 from the ASCII value of x. Since @ generates ASCII 64, Control-@ generates ASCII 0. The same mapping holds true for A through _ (consult your favorite ASCII reference to see the rest of the correspondences).
However, whether or not you need a shift key to generate ASCII 64 through ASCII 95 depends on your keyboard. On my US keyboard layout, only [, \ and ] can be typed without a shift key. (Remember, it's the uppercase-letter ASCII range we're using here, not the lowercase.) So to simplify, I suspect it was decided that Shift would be ignored in determining which keycode is sent with Control-x. (Note that if for some reason your keyboard had two of the characters between 64 and 95 generated by a key/Shift-key pair, your terminal would need to define an alternate mapping for the associated control character.)
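The mapping described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how any particular terminal implements it; the function name `ctrl_code` is made up for this example.

```python
def ctrl_code(key: str) -> int:
    """Return the ASCII control code conventionally sent for Control-<key>."""
    upper = key.upper()        # Shift is ignored: 'n' and 'N' are treated alike
    code = ord(upper)
    if not 64 <= code <= 95:   # only '@' (64) through '_' (95) map to a control code
        raise ValueError(f"no control code for {key!r}")
    return code - 64           # subtract 64 to land in the range 0..31


for key in ("@", "n", "N", "["):
    print(f"Control-{key} -> ASCII {ctrl_code(key)}")
```

Control-n and Control-N both come out as ASCII 14 (^N), which is why bash cannot tell them apart, and Control-[ comes out as ASCII 27, the same byte the Escape key sends.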
All this is simply(?) to explain why Control-Shift-x and Control-x are typically the same. Obviously, your modern operating system can distinguish all kinds of keyboard combinations. But out of the myriad possibilities, only 256 of them can send a unique single-byte value to a terminal; the rest must necessarily duplicate one or more of the others. To be useful, they need to be configured to send some multiple-byte sequence to the terminal, typically beginning with ASCII 27 (ESC). When terminals receive that byte, they pause for a moment to see if any other bytes are coming after. Keys like function keys, arrow keys, etc. have fairly standard sequences they send, which the terminal interprets in various ways. Other keys (like Control-Shift-n in your example) have no agreed-upon meaning, and so your terminal emulator must assign one. Most emulators should allow you to do this, but how they do so is, obviously, program-specific.

There are two great write-ups on keyboard shortcut customization in bash here:
Bash: call script with customized keyboard shortcuts?
In bash, how do I bind a function key to a command?

iTerm2 allows you to map key combinations like Control+Shift+, (which should represent C-<) to an escape sequence. Emacs translates certain escape sequences to the expected key sequence by default. Therefore, by remapping the desired key combination in iTerm to the appropriate escape sequence, you can get the behavior you want. See this response for specifics.

Related

Determining ANSI Terminal Escape Code Sizes

I am writing a program that basically gets terminal output, and needs to strip out ANSI/VT-100 escape codes. I can find specs on the code(s), but there are many, and they all have different sizes (numbers of characters).
Does anyone know of an algorithmic way of figuring out the size of a given code? (i.e. something like "if it starts with a letter, it will be 2 chars, but if it starts with a symbol it will be one char").

Why do ANSI color escapes end in 'm' rather than ']'?

ANSI terminal color escapes can be done with \033[...m in most programming languages. (You may need to do \e or \x1b in some languages)
What has always seemed odd to me is how they start with \033[, but they end in m. Is there some historical reason for this (perhaps ] was mapped to the slot that is now occupied by m in the ASCII table?) or is it an arbitrary character choice?
It's not completely arbitrary, but follows a scheme laid out by committees, and documented in ECMA-48 (the same as ISO 6429). Except for the initial Escape character, the succeeding characters are specified by ranges.
While the pair Escape [ is widely used (this is called the control sequence introducer, CSI), there are other control sequences (such as Escape ], the operating system command, OSC). These sequences may have parameters, and a final byte.
In the question, using CSI, the m is a final byte, which happens to tell the terminal what the sequence is supposed to do. The parameters, if given, are a list of numbers. On the other hand, with OSC, the command type is at the beginning, and the parameters are less constrained (they might be any string of printable characters).
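Because the bytes after the introducer are specified by ranges, a CSI sequence can be recognized without a table of known codes. The sketch below assumes the ECMA-48 byte ranges (parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, one final byte 0x40-0x7E) and handles only CSI, not OSC or other escape sequences; the names `CSI_RE`, `parse_csi` and `strip_csi` are made up for this example.

```python
import re

# ESC [ , then any parameter bytes (0x30-0x3F), any intermediate bytes
# (0x20-0x2F), and exactly one final byte (0x40-0x7E).
CSI_RE = re.compile(r"\x1b\[([0-?]*)([ -/]*)([@-~])")


def parse_csi(sequence: str):
    """Split one complete CSI sequence into (parameter list, final byte)."""
    match = CSI_RE.fullmatch(sequence)
    if match is None:
        raise ValueError("not a complete CSI sequence")
    params, _intermediate, final = match.groups()
    # Parameters are kept as strings, since private sequences may use
    # non-digit parameter bytes such as '?'.
    return (params.split(";") if params else []), final


def strip_csi(text: str) -> str:
    """Remove every CSI sequence from text, whatever its length."""
    return CSI_RE.sub("", text)


print(parse_csi("\x1b[1;31m"))
print(strip_csi("\x1b[31mred\x1b[0m plain"))
```

The final byte is what ends the match, so the length of a sequence never has to be known in advance: for \033[1;31m the parameters are 1 and 31 and the final byte is m.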

Which ASCII Characters are Obsolete?

My understanding is that the ASCII characters found in the range from 0x00 to 0x1f were included with Teletype machines in mind. In the modern era, many of them have become obsolete. I was curious as to which characters might still be found in a conventional string or file. From my experience programming in C, I thought those might be NUL, LF, TAB, and maybe EOT. I'm especially curious about BS and ESC, as I thought (similar to shift or control maybe) that those might be handled by the OS and never really printed or be included in a string. Any amount of insight would be appreciated!
Table for reference:
Out of the characters between hexadecimal 00 and 1F, the only ones you are likely to encounter frequently are NUL (0x00 = \0), TAB (0x09 = \t), CR (0x0D = \r), and LF (0x0A = \n). Of these, NUL is used in C-like languages as a string terminator, TAB is used as a tab character, and CR and LF are used at the end of a line. (Which one is used is a complicated situation; see the Wikipedia article Newline for details, including a history of how this came to be.)
The following additional characters are used when communicating with VT100-compatible terminal emulators, but are rarely found outside that context:
BEL (0x07 = \a), which causes a terminal to beep and/or flash.
BS (0x08 = \b), which is used to move the cursor left one position. (It is not sent when you press the backspace key; see below!)
SO and SI (0x0E and 0x0F), which are used to switch into certain special character sets.
ESC (0x1B = \e), which is sent when pressing the Escape key and various other function keys, and is additionally used to introduce escape sequences which control the terminal.
DEL (0x7F), which is sent when you press the backspace key.
The rest of the nonprintable ASCII characters are essentially unused.
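As a rough illustration of the list above, the following Python sketch reports which control characters occur in a string. The table holds only the characters named in this answer, and the helper name `describe_controls` is made up for this example.

```python
# Code points taken from the list above, with their standard ASCII names.
COMMON_CONTROLS = {
    0x00: "NUL", 0x07: "BEL", 0x08: "BS", 0x09: "TAB", 0x0A: "LF",
    0x0D: "CR", 0x0E: "SO", 0x0F: "SI", 0x1B: "ESC", 0x7F: "DEL",
}


def describe_controls(text: str):
    """Return (index, name) for each ASCII control character found in text."""
    found = []
    for index, char in enumerate(text):
        code = ord(char)
        if code < 0x20 or code == 0x7F:
            # Characters outside the table are reported by their hex value.
            found.append((index, COMMON_CONTROLS.get(code, f"0x{code:02X}")))
    return found


# Python has no \e escape, so ESC is written as \x1b here.
print(describe_controls("a\tb\x1b[0m\n"))
```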
"Backspace composition no longer works with typical modern digital displays or typesetting systems" Ref Backspace
Here's a related question: The backspace escape character in c unexpected behavior
Ref Unicode
Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
To allow backward compatibility, the 128 ASCII and 256 ISO-8859-1 (Latin 1) characters are assigned Unicode/UCS code points that are the same as their codes in the earlier standards. Therefore, ASCII can be considered a 7-bit encoding scheme for a very small subset of Unicode/UCS, and ASCII (when prefixed with 0 as the eighth bit) is valid UTF-8.
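The backward-compatibility claim is easy to check in Python. This is just a small demonstration using the standard codecs; the helper name `same_in_ascii_and_utf8` is made up for this example.

```python
def same_in_ascii_and_utf8(text: str) -> bool:
    """True if text encodes to identical bytes as ASCII and as UTF-8."""
    return text.encode("ascii") == text.encode("utf-8")


# Pure ASCII text is byte-for-byte identical in both encodings.
print(same_in_ascii_and_utf8("Ctrl-N\t^N\n"))

# A Latin-1 byte decodes to the code point with the same number...
e_acute = bytes([0xE9]).decode("latin-1")
print(hex(ord(e_acute)))

# ...but its UTF-8 encoding takes two bytes, so Latin-1 text is not valid UTF-8 as-is.
print(e_acute.encode("utf-8"))
```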
Here's another question about backspace in Unicode: what is the purpose of Unicode backspace u0008
Here's a good overview for C programming: how to program for Unicode and UTF-8
And finally, here's the GNU (FSF.org) implementation: GNU libunistring manual
"This library provides functions for manipulating Unicode strings and for manipulating C strings according to the Unicode standard."

Is there a table for all the key representations in vim map?

Today I'm trying to do some nnoremap in vim.
Some keys have special meanings in this map, such as C for Control.
I read these two official docs, and didn't find the representation map.
http://vim.wikia.com/wiki/Mapping_keys_in_Vim_-Tutorial(Part_1)
http://vimdoc.sourceforge.net/htmldoc/map.html#map-which-keys
I tried to find all these representations, but I couldn't. Maybe it's something like common sense?
But it's always this kind of hidden common sense that hinders lots of beginners.
So far I have found that only some of the special keys can be the initial keys.
For example:
I can do <C-J> but not <SPACE-J> as {lhs}.
And it seems only special keys can be used in sequences longer than 2.
For example, I can do <C-A-J> but not <C-K-J>.
What are all the representations of these special keys, and what hidden rules am I missing for using sequences longer than 2? Is it possible to use a special key + 2 normal keys?
ps: So far I only know:
`C` for `Control`
`A` for `Alt`
`S` for `Shift`
But it seems there are B, M, D, etc. What are they?
Yes, of course there is such a resource: see :help key-notation.
Vim has slightly different capabilities in this regard depending on the platform and environment, as well as a notoriously archaic key-handling mechanism, so you are relatively limited.
For portability, it is recommended to stick with universally usable mappings as much as possible. Mappings to avoid are (off the top of my head):
anything that involves the Alt key
anything that involves the Cmd key (works only in the GUI incarnation of MacVim)
anything that involves a modifier and an uppercase character

What is extended 7-bit (or 8-bit) code?

I just started reading the ECMA-48 standard (ISO/IEC 6429), and have a question.
It says:
This Ecma Standard defines control functions and their coded representations for use in a 7-bit code, an extended 7-bit code, an 8-bit code or an extended 8-bit code.
What does the "extended" 7/8-bit code mean here?
ECMA-35 talks about these. These terms are key:
code extension: The techniques for the encoding of characters that are not included in the character set of a given code.
escape sequence: A string of bit combinations that is used for control purposes in code extension procedures. The first of these bit combinations represents the control function ESCAPE.
Character ESCAPE: ESCAPE is a control character used for code extension purposes. It causes the meaning of a limited number of the bit combinations following it in a CC-data-element to be changed. These bit combinations, together with the preceding bit combination that represents the ESC character, constitute an escape sequence.
Thus, what we have here is a system where you can switch encoding systems in the middle of your text: You can start a text using Latin-1 encoding, provide an escape sequence that switches to Latin-2, and continue your text. ECMA-35 talks about this in appendix A. Chapter 13 has more information about the structure of escape sequences.
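A concrete, widely deployed instance of this kind of code extension is ISO-2022-JP, which Python ships as the `iso2022_jp` codec. It switches between ASCII and a Japanese character set rather than between Latin-1 and Latin-2, so take it as an analogous illustration, not the exact case described above. The helper name `encode_with_switches` is made up for this example.

```python
ESC = b"\x1b"


def encode_with_switches(text: str):
    """Encode text as ISO-2022-JP and count the escape sequences it contains."""
    encoded = text.encode("iso2022_jp")
    return encoded, encoded.count(ESC)


encoded, switches = encode_with_switches("a \u3042 b")  # U+3042 is HIRAGANA LETTER A
print(encoded)
print(switches, "escape sequence(s) in the byte stream")

# Decoding follows the same escape sequences to recover the original text.
print(encoded.decode("iso2022_jp") == "a \u3042 b")
```

The byte stream starts in ASCII, an escape sequence (ESC $ B) announces the switch to the Japanese set, and a second one switches back before the trailing ASCII text.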

Resources