Vim treats Unicode characters as a separate text object from English letters

I use both Vim and Neovim, and I just found that both editors treat Unicode characters (such as Korean characters) as a separate text object from English letters.
For instance, given the text my_str = 'ABC가나다', Vim only removes ABC if I type dw with the cursor on A.
Of course I could use di' instead, but it gets annoying when using the surround.vim plugin.
Typing ysw[ only surrounds ABC because Vim recognizes 가나다 as a separate text object even though there is no space between the two.
Is there any solution for this?
Thank you so much.
June

Related

How to disable space when using pandoc with Chinese characters?

When using Vim, we usually set it up to limit the number of characters per line, like this:
set cc=80
set fo+=tMn
So when I convert a markdown file to a docx file, pandoc turns the newline at the end of every line into a space, which is fine for English docs.
But for Chinese text, a space in the middle of a sentence looks weird. Is there any way to avoid this problem?
This is the solution I found in the pandoc MANUAL.
Extension: east_asian_line_breaks
Causes newlines within a paragraph to be ignored, rather than being treated as spaces or as hard line breaks, when they occur between two East Asian wide characters. This is a better choice than ignore_line_breaks for texts that include a mix of East Asian wide characters and other characters.
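To make the rule in that MANUAL excerpt concrete, here is a rough Python sketch of the joining behaviour it describes. This is only an illustration, not pandoc's actual implementation; the function names is_wide and join_wrapped_lines are made up for this example, and "wide" is assumed to mean an East Asian Width of W or F.

```python
import unicodedata


def is_wide(ch: str) -> bool:
    """True if the character's East Asian Width is Wide (W) or Fullwidth (F)."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def join_wrapped_lines(paragraph: str) -> str:
    """Join hard-wrapped lines of one paragraph.

    A newline between two wide characters is dropped; any other
    newline becomes a single space (the usual markdown behaviour).
    """
    lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
    if not lines:
        return ""
    result = lines[0]
    for line in lines[1:]:
        if is_wide(result[-1]) and is_wide(line[0]):
            result += line        # no space inside a CJK sentence
        else:
            result += " " + line  # newline treated as a space
    return result


print(join_wrapped_lines("你好\n世界"))    # 你好世界
print(join_wrapped_lines("hello\nworld"))  # hello world
```

With pandoc itself, the extension would be enabled on the input format (for example markdown+east_asian_line_breaks) rather than by preprocessing the file like this.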

How to insert Unicode character U+2611 in gvim

When I try to enter the Unicode character ☑ (U+2611) in Vim using ^Vu2611 (that is, press Ctrl+V and then type u2611 in insert mode), Vim somehow breaks it into two characters: & (0x26) and ^Q (0x11).
There is no problem when I insert other characters like □ (U+a1f5).
It seems that Vim stops parsing immediately after reading 26 (which represents the character '&').
So how can I insert this kind of Unicode character in Vim? (I have tried pasting it into Vim, but that doesn't work either.)
Please help!
In order to process Unicode characters, Vim must use an 'encoding' that is able to represent those characters. With a value of latin1, the mentioned character cannot be encoded (this 8-bit encoding only includes ASCII and several Western European characters).
So, you need to
:set encoding=utf-8
With that, any newly created file will use that encoding, and you should be able to insert Unicode characters and write them (also with another Unicode file encoding, like :w ++enc=ucs-2le; but if you tried to persist as :w ++enc=latin1, you'd get a CONVERSION ERROR).
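The same limitation can be seen outside Vim. The Python sketch below (the helper name can_encode is just for illustration) checks whether U+2611 fits into latin-1 and into UTF-8. It says nothing about Vim internals; it only shows why an 8-bit encoding cannot hold this character.

```python
CHECKBOX = "\u2611"  # ☑ BALLOT BOX WITH CHECK


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(can_encode(CHECKBOX, "latin-1"))  # False: latin-1 stops at U+00FF
print(can_encode(CHECKBOX, "utf-8"))    # True
print(CHECKBOX.encode("utf-8"))         # b'\xe2\x98\x91'
```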

How can I find the character code of a special character in my text editor?

When pasting text from outside sources into a plain-text editor (e.g. TextMate or Sublime Text 2) a common problem is that special characters are often pasted in as well. Some of these characters render fine, but depending on the source, some might not display correctly (usually showing up as a question mark with a box around it).
So this is actually 2 questions:
Given a special character (e.g., ’ or ♥) can I determine the UTF-8 character code used to display that character from inside my text editor, and/or convert those characters to their character codes?
For those "extra-special" characters that come in as garbage, is there any way to figure out what encoding was used to display that character in the source text, and can those characters somehow be converted to UTF-8?
My favorite site for looking up characters is fileformat.info. They have a great Unicode character search that includes a lot of useful information about each character and its various encodings.
If you see the question mark with a box, that means you pasted something that can't be interpreted, often because it's not legal UTF-8 (not every byte sequence is legal UTF-8). One possibility is that it's UTF-16 with an endian mode that your editor isn't expecting. If you can get the full original source into a file, the file command is often the best tool for determining the encoding.
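If a scripting language is at hand, both questions can also be explored there. The following is a small Python sketch, with made-up helper names describe and is_valid_utf8: the first prints the code point, Unicode name and UTF-8 bytes of a character, the second tests whether a byte sequence is legal UTF-8. The cp1252 guess at the end is only an example of trying a likely legacy encoding, not a general detection method.

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, Unicode name and UTF-8 bytes of a single character."""
    name = unicodedata.name(ch, "<unnamed>")
    utf8 = ch.encode("utf-8").hex(" ")  # separator argument needs Python 3.8+
    return f"U+{ord(ch):04X} {name} (UTF-8: {utf8})"


def is_valid_utf8(data: bytes) -> bool:
    """True if the byte sequence decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(describe("♥"))  # U+2665 BLACK HEART SUIT (UTF-8: e2 99 a5)
print(describe("’"))  # U+2019 RIGHT SINGLE QUOTATION MARK (UTF-8: e2 80 99)

garbage = b"\x92"                # a lone byte pasted from a legacy source
print(is_valid_utf8(garbage))    # False
print(garbage.decode("cp1252"))  # ’ (if the source really was Windows-1252)
```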
At &what I built a tool to focus on searching for characters. It indexes all the Unicode and HTML entity tables, but also supplements with hacker dictionaries and a database of keywords I've collected, so you can search for words like heart, quot, weather, umlaut, hash, cloverleaf and get what you want. By focusing on search, it avoids having to hunt around the Unicode pages, which can be frustrating. Give it a try.

Enter Unicode characters with 8-digit hex code

How do I enter Unicode characters like 𝓭 without copying it to the clipboard and pasting it?
Things I know:
The command ga on the character 𝓭 gives me hex:0001d4ed.
I can copy it on the clipboard and paste it via "+p.
I know how to enter Unicode values that have a 4-digit hex code:
<C-v>u; for example, <C-v>u03b1 gives the α character.
You can use <C-v>U, that is, an uppercase u, to input an 8 digit hex codepoint character.
See :help i_CTRL-V_digit for more information.
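To work out which form applies to a given character, a tiny Python helper can print the keys to type after <C-v>. The function name ctrl_v_sequence is hypothetical; it just formats the code point as 4 hex digits after u, or 8 hex digits after U when the code point is above U+FFFF.

```python
def ctrl_v_sequence(ch: str) -> str:
    """Keys to type after <C-v> in Insert mode to enter the character ch."""
    code = ord(ch)
    if code <= 0xFFFF:
        return f"u{code:04x}"  # 4-digit form
    return f"U{code:08x}"      # 8-digit form for code points above U+FFFF


print(ctrl_v_sequence("α"))  # u03b1
print(ctrl_v_sequence("𝓭"))  # U0001d4ed
```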
There is a Vim feature designed to simplify entering characters that
cannot be typed directly. It is called Digraphs (see :help digraphs).
To define a custom digraph for entering ‘𝓭’, use an Ex command similar
to the one below.
:dig dd 120045
where 120045 is the decimal value of the code point of ‘𝓭’, as one can easily
confirm using the ga command.
Inserting a character using a digraph is simple:
Type Ctrl+K followed by the shortcut of that
digraph (dd for the above example).
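The decimal number for the :dig command can be derived from the hex value that ga reports. A short Python sketch follows; the helper digraph_command is made up for illustration and simply formats the Ex command for a two-character shortcut.

```python
def digraph_command(shortcut: str, ch: str) -> str:
    """Build a :dig command that maps a two-character shortcut to ch."""
    if len(shortcut) != 2:
        raise ValueError("a digraph shortcut is exactly two characters")
    return f":dig {shortcut} {ord(ch)}"


print(int("0001d4ed", 16))         # 120045, the hex value from ga in decimal
print(digraph_command("dd", "𝓭"))  # :dig dd 120045
```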
There exists a Unicode plugin for Vim. According to the plugin description, this plugin has three main features:
Character/digraph completion using either the Unicode name or the codepoint.
Identify the character/digraph under the cursor.
Search for digraphs by name; transform two normal characters into their corresponding digraph.

How to search for a character that displays as "<85>" in Vim

I have a file that was converted from EBCDIC to ASCII. Where there used to be new lines there are now characters that show up as <85> (a symbol representing a single character, not the four characters it appears to be) and the whole file is on one line. I want to search for them and replace them all with new lines again, but I don't know how.
I tried putting the cursor over one and using * to search for the next occurrence, hoping that it might show up in my / search history. That didn't work, it just searched for the word that followed the <85> character.
I searched Google, but didn't see anything obvious.
My goal is to build a search and replace string like:
:%s/<85>/\n/g
Which currently just gives me:
E486: Pattern not found: <85>
I found "Find & Replace non-printable characters in vim" searching Google. It seems like you should be able to do:
:%s/\%x85/\r/gc
Omit the c to do the replacement without prompting, but try with c first to make sure it does what you want.
In Vim, typing :h \%x gives more details. In addition to \%x (up to two hexadecimal digits), you can use \%d, \%o, \%u and \%U for decimal, octal, up to four and up to eight hexadecimal digits, respectively.
To search for special characters, for example Windows-1252 characters that show up as <80>, <90>, <9d> ...,
type:
/\%u80, /\%u90, /\%u9d ...
from the editor.
Similarly for octal, decimal, and hex, type: /\%oYourCode, /\%dYourCode, /\%xYourCode.
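Try this: :%s/<85>/^M/g
Note: enter ^M by pressing Ctrl-V and then Ctrl-M; the <85> in the pattern has to be the actual character, not the four typed characters.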
Or, if you don't mind using another tool (\205 is the octal escape for the byte 0x85; the literal text <85> would not match):
LC_ALL=C awk '{gsub(/\205/,"\n")}1' file
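As another option outside Vim, here is a minimal Python sketch that does the replacement at the byte level. It assumes the converted file uses a single-byte encoding such as latin-1, where 0x85 is the NEL (next line) control character; in a UTF-8 file the byte 0x85 can be part of a multi-byte character, so this would not be safe there. The function names and the file names in the usage note are placeholders.

```python
from pathlib import Path


def nel_to_newline(data: bytes) -> bytes:
    """Replace every 0x85 byte with a line feed (0x0A)."""
    return data.replace(b"\x85", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes, replace 0x85 with newlines, write to dst."""
    Path(dst).write_bytes(nel_to_newline(Path(src).read_bytes()))


print(nel_to_newline(b"line one\x85line two\x85"))  # b'line one\nline two\n'
```

Calling convert_file("converted.txt", "fixed.txt") would then write a copy with real newlines and leave the original untouched.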

Resources