Having emacs search for special ligatures - search

I just noticed a whole bunch of typos in a compiled LaTeX document typed in emacs. They stem from my not noticing that when I pasted in some text from elsewhere, I picked up a lot of ligatures like ﬁ instead of fi. I've done a search and replace to fix this particular instance, but it would be nice to be confident there weren't more of these. Is there anything more wholesale I could do in emacs to find all such characters?

If the entire document is expected to be in ASCII, then you could use a regexp search for anything outside that range:
C-M-s [^ C-j SPC -~]
That is, search for anything that is neither a newline (character code 10) nor in the range from space (32) to tilde (126). Any ligatures would be outside this range.
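The same check can also be run outside the editor. The following is only a sketch in Python, not part of the Emacs answer; the function name find_non_ascii and the sample text are made up for illustration. It flags exactly what the regexp above flags, which means tabs are reported too.

```python
def find_non_ascii(text):
    """Yield (line, column, char) for every character that is neither a
    newline nor in the range space (32) .. tilde (126)."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if not (32 <= ord(ch) <= 126):
                yield line_no, col_no, ch


# Hypothetical sample: a pasted "fi" ligature on line 1, an accent on line 3.
sample = "A \ufb01ne day\nplain ASCII line\ncaf\u00e9"
for line_no, col_no, ch in find_non_ascii(sample):
    print(f"line {line_no}, column {col_no}: {ch!r} (U+{ord(ch):04X})")
```

Line and column numbers are 1-based, so they can be used directly with M-g g in Emacs.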

I'm not real sure what you are asking, but you can easily Isearch for (or query-replace or replace-string) any Unicode chars that are ligatures, that is, that have LIGATURE as part of their Unicode character name. However, you must search for each of them separately (well, not really, but it is easiest to do that).
To search for a given ligature char, you use C-x 8 RET during Isearch, then type some part of the character name and complete that.
For this it really helps to use Icicles, or at least some other completion enhancement that lets you complete a substring or other regexp.
With Icicles you have also progressive completion, which means that you can provide multiple substrings (more generally, regexps) to match.
For example, to search for the ligature whose Unicode character name is LATIN SMALL LIGATURE FFI you can do the following:
C-s C-x 8 RET
That prompts you for the name of a Unicode char. Type ligature S-SPC to match all whose names contain ligature (matching is case insensitive). Then type latin S-SPC to narrow to just the latin ligatures. Then type small S-SPC to narrow these to only the lowercase ligatures. Then type ffi to get just the one you want.
C-s C-x 8 RET ligature S-SPC latin S-SPC small S-SPC ffi RET
The order in which you provide the multiple patterns is irrelevant. And of course you do not need to use multiple patterns. You could just as easily do it with a single regexp:
C-s C-x 8 RET latin.*small.*ligature.*ffi RET
If you use C-s C-x 8 RET ligature S-TAB (or S-SPC instead of S-TAB), you see all of the ligature characters (there are 517 of them). If you use C-s C-x 8 RET small.*ligature S-TAB then you see all lowercase ligatures (there are 22 of them, including Arabic, Armenian, Cyrillic, Hebrew, and Latin).
Oh, and with Icicles you see not only the character names in buffer *Completions* -- you see also the characters themselves (WYSIWYG) next to their names.
(For query-replace etc. the procedure is the same as for Isearch.)
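For a wholesale check, the "LIGATURE in the character name" test can be applied to a whole text at once. This is a rough sketch in Python using the standard unicodedata module, not an Emacs or Icicles feature; the function names ligatures_in and expand_ligatures are invented here. Replacing ligatures via NFKC normalization is an assumption about what you want: it expands characters such as ﬁ and ﬃ, but leaves characters like œ unchanged because they have no compatibility decomposition.

```python
import unicodedata


def is_ligature(ch):
    """True if the Unicode character name contains the word LIGATURE."""
    return "LIGATURE" in unicodedata.name(ch, "")


def ligatures_in(text):
    """Return the distinct ligature characters in text, sorted by code point."""
    return sorted({ch for ch in text if is_ligature(ch)})


def expand_ligatures(text):
    """Replace each ligature by its NFKC form; leave all other characters alone."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if is_ligature(ch) else ch
        for ch in text
    )


sample = "e\ufb03cient \ufb01x for the caf\u00e9"
for ch in ligatures_in(sample):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
print(expand_ligatures(sample))
```

Only ligature characters are normalized, so accented letters elsewhere in the text are not touched.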

Related

How can I find “<html>” in the Vim buffer?

I want to search for an HTML tag in my html file using Vim.
I tried \<html\> but that searches only for the word “html”.
I don’t know how to search for the less-than and greater-than characters.
Vim has 4 modes of regular expression interpretation:
very no magic,
no magic,
magic and
very magic.
The default is magic (check with :set magic?), which can be a bit surprising because some non alphanumeric characters have special regex meanings but not all. In particular ^$*. do but most other characters do not. For example to match alternatives you'd have to escape the pipe character this\|that and this|that would match the literal string "this|that".
In your case, < does not have a special meaning but \< does (beginning of a word). Searching for <html> will work, but when in doubt you can activate "very no magic" mode by prepending your search with \V (so /\V<html>) where every character matches the character itself. If and when you want to activate all regex features, you can activate "very magic" mode with lowercase \v (hence /\v<html> will search for the word "html").
In Normal mode, the / command searches forward (? — backward). Suppose we are at the top, and want to search forward. So, if we want to find a particular tag, like “div” for example, we should type the following:
/\V<div>
Here \V turns on the “very unmagic” mode, in which a symbol has no special meaning unless it is preceded by a backslash. (I use only the “very magic” and “very unmagic” modes and don’t use the “magic” and “unmagic” modes.)
If we want to find any html tag, i.e. something between angle brackets, we may type one of the following:
/\V<\[^<>]\+>
/\v\<[^<>]+\>
That will find and highlight all the tags including their attributes.
You may create a convenient keymap for the mode you prefer, for example:
nnoremap // /\V
Now pressing / twice brings you to the search line in “very unmagic” mode.
Type :help pattern for more information.
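To try the three kinds of pattern outside Vim, here is a small sketch in Python's re module. It is an analogy only: Python has no magic modes, < and > are never special there, re.escape stands in for \V, and \b stands in for Vim's \< and \>. The sample string is made up.

```python
import re

text = '<html><div class="x">html</div></html>'

# Literal search: re.escape makes every character match itself (the role of \V).
literal = re.compile(re.escape("<html>"))

# Any tag: "<", one or more characters that are not angle brackets, then ">".
any_tag = re.compile(r"<[^<>]+>")

# The word "html" on its own: \b marks a word boundary on either side.
word = re.compile(r"\bhtml\b")

print(literal.findall(text))
print(any_tag.findall(text))
print(len(word.findall(text)))
```

The word pattern matches inside the tags as well as the text between them, which is why the original \<html\> search in Vim did not single out the tag.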

Searching for an exact match with a singular digit

I'm trying to search for only a singular digit in vim by itself. For example, if there are two sets of digits 1 and 123 and I want to search for 1, I would only want the singular 1 digit to be found.
I have tried using regular expressions like \<1> and \%(a)#
You almost had the right solution. You want:
\<1\>
This is because each angled bracket needs to be escaped. Alternatively, you could use:
\v<1>
The \v flag tells vim to treat more characters as special without needing to be escaped (for example, (){}+<> all become special rather than literal text). Read :h /\v for more on this.
A great reference for learning regex in vim is vimregex.com. The \<\> characters are explained in 4.1 "Anchors".
If you want to match text like 1.23 this is possible too. Two different approaches:
Modify the iskeyword option so that it includes the period (.). This will also affect how w moves.
Use \v<1(\d|\.)@!, which basically means "a 1 at the beginning of a word that isn't followed by some other digit or a period."
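The difference between the plain word-boundary search and the lookahead version can be checked with a quick sketch in Python's re module. This is an analogy, not Vim syntax: \b plays the role of \< and \>, (?!...) plays the role of a negative lookahead, and Python's idea of a word character does not depend on iskeyword. The sample string is invented.

```python
import re

text = "1 123 1.23"

# Counterpart of \<1\>: a 1 with a word boundary on both sides.
whole_word = re.compile(r"\b1\b")

# Counterpart of the lookahead version: a 1 at the start of a word
# that is not followed by another digit or a period.
strict = re.compile(r"\b1(?![\d.])")

print([m.start() for m in whole_word.finditer(text)])
print([m.start() for m in strict.finditer(text)])
```

The first pattern also matches the 1 in 1.23, because the period is not a word character; the second pattern rejects it.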

Vim treats unicode characters as a separate text object from english alphabets

I use both vim and neovim, and just found that both editors treat unicode characters (such as Korean characters) as a separate text object from the English alphabet.
For instance, given the text my_str = 'ABC가나다', vim only removes ABC if I type dw at A.
Of course I would be able to use di' instead, but it gets annoying when using the surrounding.vim plugin.
Typing ysw[ only surrounds ABC because Vim recognizes 가나다 as a separate text object even though there is no space between the two.
Will there be any solution for this?
Thank you so much.
June

How to fine-tune a macro after recording it in Vim?

Specific question
Description
After recording the desired action into register o, I pasted the whole macro into my ~/.vimrc and assigned it as follows (the mappings do not display properly when pasted directly)
Expected behavior
I would like to use this macro to get myself a new "comment line" that leads a new section of script, formatted such that the name of the section is centered. After populating the "section title", I would like to enter insert mode in a new line.
In the following screen recording, I have tested both @o and @p on the word "time". The second attempt, with @p, worked as desired.
The problem (on Windows machine specifically)
As you see, the @o mapping gets me junk phrases which had been part of my definition for the macro. Does this have to do with the ^M operator? And how can I fix the @o mapping, which uses * to populate the line?
Both mappings worked just fine on a Linux system. (I don't know why, as I recorded and pasted the macro definition on a Windows machine.) This also does not appear to be a problem on Mac with MacVim.
Generalized question
Is there a way to properly substitute the ^M operator (for <CR>, or "Enter"-key)?
Is there a way to properly substitute the ^[ operator (for <ESC>, or the "Escape"-key)?
Is there a systematic list of mappings from these weird representations of keystrokes, as recorded by the "recording" function through q?
Solution
Substitute the ^M marks in the macro-definition with \r. And, substitute ^[ to be \x1b, for the ESC key. The mappings are fixed as follows:
let @o = ":center\ri\r\x1bkV:s/ /\*/g\rJx50A\*\x1b80d|o"
let @p = ":center\ri\r\x1bkV:s/ /\"/g\rJx50A\"\x1b80d|o"
Complete list of key-codes/mappings? Approach 1: through hex code.
Thanks to Zbynek Vyskovsky, the picture is clear. For whatever key one may think of, Vim takes its ASCII value at face value. (The trick is to use an escape sequence starting with \x, where \x is the prefix that introduces the hex value.) Thus, the correspondence list (still incomplete) goes as follows:
Enter --- \x0d --- \r
ESC --- \x1b --- \e
Solution native to Vim
By chance, :help expr-quote gives the following list of special characters. This should serve as the definitive answer to the original question in its general form.
string                  *string* *String* *expr-string* *E114*
------
"string"                string constant         *expr-quote*

Note that double quotes are used.

A string constant accepts these special characters:
\...    three-digit octal number (e.g., "\316")
\..     two-digit octal number (must be followed by non-digit)
\.      one-digit octal number (must be followed by non-digit)
\x..    byte specified with two hex numbers (e.g., "\x1f")
\x.     byte specified with one hex number (must be followed by non-hex char)
\X..    same as \x..
\X.     same as \x.
\u....  character specified with up to 4 hex numbers, stored according to the
        current value of 'encoding' (e.g., "\u02a4")
\U....  same as \u but allows up to 8 hex numbers.
\b      backspace <BS>
\e      escape <Esc>
\f      formfeed <FF>
\n      newline <NL>
\r      return <CR>
\t      tab <Tab>
\\      backslash
\"      double quote
\<xxx>  Special key named "xxx".  e.g. "\<C-W>" for CTRL-W.  This is for use
        in mappings, the 0x80 byte is escaped.
        To use the double quote character it must be escaped: "<M-\">".
        Don't use <Char-xxxx> to get a utf-8 character, use \uxxxx as
        mentioned above.

Note that "\xff" is stored as the byte 255, which may be invalid in some
encodings.  Use "\u00ff" to store character 255 according to the current value
of 'encoding'.

Note that "\000" and "\x00" force the end of the string.
Since you are assigning to a register with the Vim expression language, this can definitely be done in a platform-independent way. Strings in Vim expressions understand the standard escape sequences, so it's best to replace ^M with \r and Esc with \x1b:
let @o = ":center\riSomeInsertedString\x1b"
There is no list of special characters to be translated as far as I know, but you can simply take all control characters (ASCII below 32) and translate them to the corresponding escape sequence "\xHexValue", where HexValue is the value of the character. Even \r (or ^M) can be translated to \x0d, as its ASCII value is 13 (0x0d hex).
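The translation rule just described (control characters below 32 become escape sequences) is mechanical enough to script. The following is a hedged sketch in Python rather than Vim script; the helper names to_vim_string and let_register are made up, and the short forms \r, \e, \t and \n are taken from the :help expr-quote list quoted above. It assumes the raw macro text contains the real control characters, as they sit in the register.

```python
# Short escapes accepted inside Vim's double-quoted strings (:help expr-quote).
NAMED = {
    "\r": "\\r",    # Enter, shown as ^M
    "\x1b": "\\e",  # Escape, shown as ^[
    "\t": "\\t",
    "\n": "\\n",
    "\\": "\\\\",   # a literal backslash must be doubled
    '"': '\\"',     # a literal double quote must be escaped
}


def to_vim_string(raw):
    """Render raw macro text as a Vim double-quoted string constant."""
    out = []
    for ch in raw:
        if ch == "\0":
            # Per the help text, "\x00" forces the end of the string.
            raise ValueError("NUL cannot be written in a Vim string constant")
        if ch in NAMED:
            out.append(NAMED[ch])
        elif ord(ch) < 32:
            out.append(f"\\x{ord(ch):02x}")  # any other control character
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


def let_register(reg, raw):
    """Build a 'let @reg = ...' line suitable for pasting into ~/.vimrc."""
    return f"let @{reg} = {to_vim_string(raw)}"


raw_macro = ":center\riSomeInsertedString\x1b"
print(let_register("o", raw_macro))
```

Always emitting two hex digits for the \x form avoids the "must be followed by non-hex char" restriction on the one-digit form.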

find non LaTeX characters (eg. acute accents) with regex in vim

I was pasting BibTeX references into bibex. Some names contain characters that LaTeX skips, for example á. Is there a way in Vim or with a regex to search for all characters that are skipped by LaTeX? One way I can think of is to write a regex that searches for anything that isn't 0-9, a-z, A-Z or one of a few characters like / \ $
I am not familiar with which characters LaTeX ignores, but if the file you are editing is encoded in UTF-8, you might try searching for characters outside the ASCII repertoire (0–127; or 32–127).
As a search command in Vim:
/[^\d0-\d127]
/[^\d32-\d127]
You can also use hex or octal instead of decimal; see :help /[]. This requires that l and \ not be present in the value of cpoptions (they are not present in the default state).
This should work for any encoding that is “the same as ASCII (where it is defined)” (i.e. UTF-8 and most “latin” encodings). If you are dealing with an encoding that clashes with ASCII, then you will need to refine the range specification.
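Before searching one character at a time, it can help to see which non-ASCII characters a file contains at all. Here is a minimal sketch in Python, outside Vim; the function name non_ascii_inventory and the sample names are illustrative only. It uses the same 0–127 boundary as the first search command and assumes the text has already been decoded (for example from UTF-8).

```python
import unicodedata
from collections import Counter


def non_ascii_inventory(text):
    """Return (char, count, Unicode name) for each distinct character above 127,
    sorted by code point."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        (ch, n, unicodedata.name(ch, "UNKNOWN"))
        for ch, n in sorted(counts.items())
    ]


# Hypothetical author field from a pasted reference.
sample = "Jos\u00e9 and Andr\u00e9, Erd\u0151s"
for ch, n, name in non_ascii_inventory(sample):
    print(f"{ch}  x{n}  U+{ord(ch):04X}  {name}")
```

The character names indicate which accent is involved, which makes it easier to decide on a LaTeX replacement for each one.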
