find non LaTeX characters (eg. acute accents) with regex in vim


I was pasting BibTeX references into bibex. Some names contain characters that LaTeX skips, for example á. Is there a way in Vim, or with a regex, to search for all characters that LaTeX skips? One approach I can think of is a regex that matches anything other than 0-9, a-z, A-Z and a few characters such as / \ $.

I am not familiar with which characters LaTeX ignores, but if the file you are editing is encoded in UTF-8, you might try searching for characters outside the ASCII repertoire (code points 0–127, or 32–127 if you also want to catch control characters).
As a search command in Vim:
/[^\d0-\d127]
/[^\d32-\d127]
You can also use hex or octal instead of decimal; see :help /[]. This requires that l and \ not be present in the value of cpoptions (they are not present in the default state).
This should work for any encoding that is “the same as ASCII (where it is defined)” (i.e. UTF-8 and most “latin” encodings). If you are dealing with an encoding that clashes with ASCII, then you will need to refine the range specification.
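The same "anything outside ASCII" check can also be run outside Vim, for example before pasting references into a .bib file. The following is only a sketch in Python; the function name find_non_ascii and the sample entry are made up for illustration, and it assumes the text has already been decoded (e.g. read as UTF-8).

```python
import re

# Anything outside the 7-bit ASCII range (code points 0-127).
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def find_non_ascii(text):
    """Return (line, column, character) for each non-ASCII character, 1-based."""
    hits = []
    # Split on "\n" only, so that non-ASCII line separators are reported too.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in NON_ASCII.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits


if __name__ == "__main__":
    # Hypothetical BibTeX fragment, used only as sample input.
    sample = "author = {Jos\u00e9 Garc\u00eda},\ntitle = {Plain ASCII title}\n"
    for line_no, col, char in find_non_ascii(sample):
        print(f"line {line_no}, column {col}: {char!r} (U+{ord(char):04X})")
```

Changing the character class to [^\x20-\x7f] would additionally flag control characters, mirroring the second Vim pattern above.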

Related

How to limit text in UTF-8 to only script characters?

I want to restrict a UTF-8 string to only script characters in any language. By script characters I mean only those characters in the language's written script, i.e. no symbols or special characters. Same as scripts here: http://www.unicode.org/charts/index.html
Would I have to go off and identify these character ranges for each and every language in UTF-8? Or is there something, e.g. a regex or a library, that I can make use of?

Depending on the language you're implementing this in, you might be able to use Unicode character categories in regular expressions.
The following expression should match all letters and numbers, but exclude punctuation, whitespace, symbols, etc.
[\p{L}\p{N}]*
Here's a small demo on regex101.
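Not every regex engine supports \p{L}; Python's built-in re module, for instance, does not. As a rough sketch of the same idea using only the standard library, the general category of each character can be checked with unicodedata. The function name keep_letters_and_numbers is an illustrative choice, not something from the answer.

```python
import unicodedata


def keep_letters_and_numbers(text):
    """Keep characters whose Unicode general category starts with L or N."""
    return "".join(
        ch for ch in text if unicodedata.category(ch)[0] in ("L", "N")
    )


if __name__ == "__main__":
    print(keep_letters_and_numbers("Hello, \u4e16\u754c! 123"))
```

Note that this drops combining marks (category M) as well, so scripts that rely on them may need M added to the allowed categories.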

How to disable space when using pandoc with Chinese characters?

When using Vim, we usually limit the number of characters per line, like this:
set cc=80
set fo+=tMn
So if I convert a markdown file to a docx file, pandoc automatically places a space at the end of every line, which is nice for English docs.
But for Chinese characters, a space in the middle of a sentence looks weird. Is there any way to avoid this problem?

This is the solution I found, from the pandoc MANUAL.
Extension: east_asian_line_breaks
Causes newlines within a paragraph to be ignored, rather than being treated as spaces or as hard line breaks, when they occur between two East Asian wide characters. This is a better choice than ignore_line_breaks for texts that include a mix of East Asian wide characters and other characters.
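To make the rule quoted above concrete, here is a simplified model of it in Python. This is not pandoc's implementation, only a sketch under the assumption that "East Asian wide" means the Unicode East Asian Width classes W and F; the function names are invented for the example.

```python
import unicodedata


def is_wide(ch):
    """True for East Asian Wide (W) or Fullwidth (F) characters."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def join_wrapped_lines(paragraph):
    """Join hard-wrapped lines of one paragraph: drop the newline between
    two wide characters, otherwise turn it into a single space."""
    lines = [line for line in paragraph.split("\n") if line]
    if not lines:
        return ""
    result = lines[0]
    for line in lines[1:]:
        if is_wide(result[-1]) and is_wide(line[0]):
            result += line          # newline between two wide characters: ignored
        else:
            result += " " + line    # any other newline: treated as a space
    return result


if __name__ == "__main__":
    print(join_wrapped_lines("\u4f60\u597d\n\u4e16\u754c"))
    print(join_wrapped_lines("hello\nworld"))
```

In pandoc itself, extensions are enabled by appending them to the input format name, so something like -f markdown+east_asian_line_breaks should turn this behaviour on.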

How to insert Unicode character U+2611 in gvim

When I try to enter the Unicode character ☑ (U+2611) in Vim with ^Vu2611 (that is, pressing Ctrl+V and then typing u2611 in insert mode), Vim somehow breaks it into two characters: & (26) and ^Q (11).
There is no problem when I insert other characters such as □ (U+a1f5).
It seems that Vim stops parsing immediately after reading 26 (which represents the character '&').
So how can I insert this kind of Unicode character in Vim? (I have tried pasting it into Vim, and that doesn't work either.)
Please Help!!!

In order to process Unicode characters, Vim must use an 'encoding' that is able to represent those characters. With a value of latin1, the mentioned character cannot be encoded (this 8-bit encoding only includes ASCII and several Western European characters, see here).
So, you need to
:set encoding=utf-8
With that, any newly created file will use that encoding, and you should be able to insert Unicode characters and write them (also with another Unicode file encoding, like :w ++enc=ucs-2le; but if you tried to persist as :w ++enc=latin1, you'd get a CONVERSION ERROR).
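The encoding limitation is easy to reproduce outside Vim. The sketch below, in Python, tries to encode U+2611 with three codecs; utf-16-le is used here as a stand-in for Vim's ucs-2le, which is an assumption that only holds for characters in the Basic Multilingual Plane. The helper name try_encode is made up for the example.

```python
CHECKED_BOX = "\u2611"  # BALLOT BOX WITH CHECK


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for name in ("latin-1", "utf-8", "utf-16-le"):
        encoded = try_encode(CHECKED_BOX, name)
        print(name, "cannot encode" if encoded is None else encoded.hex())
```

The two UTF-16 bytes of U+2611 are 0x11 and 0x26, the same values as the ^Q and & reported in the question.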

Removing hex code ffa3 in Vim

I've got a file with a load of weird characters with in it that I need to get rid of.
Using ga on the character reveals it has the following encodings:
<ﾣ> 65443, Hex ffa3, Octal 177643
But I can't seem to find it using :%s/\%xffa3//g. What am I doing wrong?

Look at :help \%x:
\%x2a Matches the character specified with up to two hexadecimal characters.
So Vim is actually matching three characters: the character with hex code ff, followed by a literal a and a literal 3. Since you have a four-digit hex number, you need to use \%u:
:%s/\%uffa3//g
Alternatives
You can also insert the character directly into the command line via :help i_CTRL-V_digit (i.e. <C-v>uffa3), but if you already have instances of that character in your buffer (and near your cursor!), I'd just yank that char with yl and insert it in the command-line via <C-r>".
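If the cleanup is done outside Vim, the same character can be removed by its code point. This is a small Python sketch; the function name strip_char is illustrative, and the file is assumed to be decoded to text already.

```python
TARGET = "\uffa3"  # the character reported by ga: decimal 65443, hex ffa3


def strip_char(text, char=TARGET):
    """Remove every occurrence of char from text."""
    return text.replace(char, "")


if __name__ == "__main__":
    code = ord(TARGET)
    # The same three representations that ga shows.
    print(code, format(code, "x"), format(code, "o"))
    print(strip_char("before\uffa3after"))
```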

vim and unrecognized characters

I have a file with some accents, and VIM displays them as "~V" characters. The "od -bc" command tells me the characters are charcode 226. I want to substitute them using VIM. But I can't get it to match the characters. How can I achieve that?
Optional question: how can I have VIM tell me which charset is used to interpret the current file?

You can use the following formats, from vim's manual on patterns and regular expressions:
ordinary atom
      magic   nomagic   matches
      \%d     \%d       match specified decimal character (eg \%d123)
      \%x     \%x       match specified hex character (eg \%x2a)
      \%o     \%o       match specified octal character (eg \%o040)
      \%u     \%u       match specified multibyte character (eg \%u20ac)
      \%U     \%U       match specified large multibyte character (eg \%U12345678)
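Note that od -b prints byte values in octal, so the 226 it reports is octal (decimal 150, hex 96), which is also the byte that Vim displays as ~V. So you should be able to do something like this to replace that character with a space globally in the file:
:%s/\%o226/ /g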
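As for the optional question, if you do:
:set encoding?
You'll see output like:
encoding=latin1
That is the encoding Vim uses internally; :set fileencoding? shows the encoding used for the current file.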
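Because od -b reports byte values in octal, it can help to convert the number before choosing between the \%d, \%x and \%o atoms. The following Python sketch does that conversion; the function name vim_atoms is invented for the example, and the input is assumed to be an octal string as printed by od -b.

```python
def vim_atoms(octal_text):
    """Convert an octal byte value (as printed by `od -b`) into the
    equivalent Vim pattern atoms in decimal, hex and octal notation."""
    value = int(octal_text, 8)
    return {
        "decimal": f"\\%d{value}",
        "hex": f"\\%x{value:02x}",
        "octal": f"\\%o{value:o}",
    }


if __name__ == "__main__":
    for notation, atom in vim_atoms("226").items():
        print(notation, atom)
```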

One very simple way to deal with such "weird" characters is:
select the offending character(s) visually (v)
yank the selection (y)
replace it with: :%s/<ctrl-r>"/something-else/g
where <ctrl-r> means pressing Ctrl and the letter r together. Followed by ", it pastes the yanked text into the command line, effectively putting the offending characters inside the s/// operation.
