When attempting to specify handling of whitespace characters in vim, how does one figure out character-encoding specifiers, and then implement them?

I'm trying to specify how to display whitespace characters in vim. I've seen a lot of suggestions, and am trying to use this:
:set listchars=tab:→\ ,space:·,nbsp:␣,trail:•,eol:¶,precedes:«,extends:»
But I'm getting E474: Invalid argument: listchars=tab:\~F~R\ ,space:·,nbsp:~P,trail:~#,eol:¶,precedes:«,extends:». I'm pretty sure the text encoding isn't right (Am I even saying that right?).
I'm guessing I have to cast the character type? Or something? How does one find out what encoding is needed, then how do they specify it correctly in .vimrc?
Thank you so much for your time.

How does one find out what encoding is needed, then how do they specify it correctly in .vimrc?
The command is called :help. The only command every Vim user must know.
From :h 'listchars':
The characters ':' and ',' should not be used. UTF-8 characters can be used when 'encoding' is "utf-8", otherwise only printable characters are allowed. All characters must be single width.
So either you're using an 'encoding' other than UTF-8, or an outdated Vim version.
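If 'encoding' is the problem, a minimal sketch along these lines near the top of the .vimrc should make the multibyte value work (the listchars value itself is copied from the question):
set encoding=utf-8
" if the .vimrc file itself is saved in another encoding, tell Vim how to read it:
scriptencoding utf-8
set listchars=tab:→\ ,space:·,nbsp:␣,trail:•,eol:¶,precedes:«,extends:»
set list        " 'listchars' only takes effect while 'list' is on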

Related

How to insert Unicode character U+2611 in gvim

When I try to enter the Unicode character ☑ (U+2611) in vim using ^Vu2611 (i.e. press Ctrl+V, then type u2611 in insert mode), Vim somehow breaks it into two characters: & (26) and ^Q (11).
There's no problem when I try to insert other kinds of characters, like □ (U+a1f5).
It seems like Vim stops parsing as soon as it has read 26 (which represents the character '&').
So, how can I insert this kind of Unicode character in Vim? (I have tried pasting it into Vim; that doesn't work either.)
Please Help!!!
In order to process Unicode characters, Vim must use an 'encoding' that is able to represent those characters. With a value of latin1, the mentioned character cannot be encoded (this 8-bit encoding only includes ASCII and several Western European characters).
So, you need to
:set encoding=utf-8
With that, any newly created file will use that encoding, and you should be able to insert Unicode characters and write them (also with another Unicode file encoding, like :w ++enc=ucs-2le; but if you tried to persist as :w ++enc=latin1, you'd get a CONVERSION ERROR).
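As a quick sketch of the whole workflow (the key sequence is typed in insert mode; the target encoding on the write is just one example):
" in .vimrc, before any multibyte text is handled:
set encoding=utf-8
" then, in insert mode, type CTRL-V u 2611 to insert ☑
" when saving, pick a Unicode-capable file encoding:
:w ++enc=utf-8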

Removing hex code ffa3 in Vim

I've got a file with a load of weird characters with in it that I need to get rid of.
Using ga on the character reveals it has the following encodings:
<ᆪ> 65443, Hex ffa3, Octal 177643
But I can't seem to find it using :%s/\%xffa3//g. What am I doing wrong?
Look at :help \%x:
\%x2a Matches the character specified with up to two hexadecimal characters.
So Vim is actually matching three characters: the character 0xff, followed by the literal a and 3. Since you have a four-digit hex number, you need to use \%u:
:%s/\%uffa3//g
Alternatives
You can also insert the character directly into the command line via :help i_CTRL-V_digit (i.e. <C-v>uffa3), but if you already have instances of that character in your buffer (and near your cursor!), I'd just yank that char with yl and insert it in the command-line via <C-r>".
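For reference, a rough sketch of both approaches, using the same angle-bracket key notation as above (<C-v> and <C-r> denote key presses, not literal text):
" match the character by its four-digit code point:
:%s/\%uffa3//g
" or type the character literally on the command line with CTRL-V u ffa3:
:%s/<C-v>uffa3//g
" or yank the character under the cursor with yl and paste it with CTRL-R ":
:%s/<C-r>"//g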

(VIM) Is vimgrep capable of searching unicode string

Is vimgrep capable of searching unicode strings?
For example:
a.txt contains the wide string "hello"; vimgrep hello *.txt finds nothing, and of course it's in the right path.
"Unicode" is a bit misleading in this case. What you have is not at all typical of text "encoded in accordance with any of the method provided by the Unicode standard". It's a bunch of normal characters with normal code points separated with NULL characters with code point 0000 or 00. Some Java programs do output that kind of garbage.
So, if your search pattern is hello, Vim and :vim are perfectly capable of searching for and finding hello (without NULLs) but they won't ever find hello (with NULLs).
Searching for h^#e^#l^#l^#o (^# is <C-v><C-#>), on the other hand, will find hello (with NULLs) but not hello (without NULLs).
Anyway, converting that file/buffer or making sure you don't end up with such a garbage are much better long-term solutions.
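If the file really is UTF-16 rather than arbitrary NUL-littered text, one way to convert it is to force the encoding on read and write it back out; utf-16le here is an assumption about the file, adjust as needed:
" re-read the current file, forcing the presumed encoding:
:e! ++enc=utf-16le
" then store it in a sane encoding and write it out:
:setlocal fileencoding=utf-8
:w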
If Vim can detect the encoding of the file, then yes, Vim can grep the file. :vimgrep works by first reading in the file as normal (even including autocmds) into a hidden buffer, and then searching the buffer.
It looks like your file is little-endian UTF-16, without a byte-order mark (BOM). Vim can detect this, but won't by default.
First, make sure your Vim is running with internal support for unicode. To do that, :set encoding=utf-8 at the top of your .vimrc. Next, Vim needs to be able to detect this file's encoding. The 'fileencodings' option controls this.
By default, when you set 'encoding' to utf-8, Vim's 'fileencodings' option contains "ucs-bom", which will detect UTF-16, but ONLY if a BOM is present. To also detect it when no BOM is present, you need to add your desired encoding to 'fileencodings'. It needs to come before any of the 8-bit encodings but after ucs-bom. Try putting this at the top of your .vimrc and restarting Vim:
set encoding=utf-8
set fileencodings=ucs-bom,utf-16le,utf-8,default,latin1
Now loading files with the desired encoding should work just fine for editing, and therefore also for vimgrep.
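A quick way to verify the detection and then run the grep, sketched with the file names from the question:
:e a.txt
:set fileencoding?        " should now report utf-16le (or whatever was detected)
:vimgrep /hello/ *.txt
:copen                    " browse the matches in the quickfix window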

How does vim know encoding of my files? When even I don't

I have been in charset hell for days, and vim somehow always shows the right charset for my files even when I'm not sure what they are (I'm dealing with files with identical content encoded in both charsets, mixed together).
I can see from inspecting the ü (u-umlaut) character in UTF-8 vs ISO-8859-1 which encoding I'm in, but I don't understand how vim figured it out - in those character sets only the 'special characters' really look any different.
If there is some other record of the encoding/charset information, I would love to know about it.
The explanation can be found under :help 'fileencodings':
This is a list of character encodings considered when starting to edit
an existing file. When a file is read, Vim tries to use the first
mentioned character encoding. If an error is detected, the next one
in the list is tried. When an encoding is found that works,
'fileencoding' is set to it. If all fail, 'fileencoding' is set to
an empty string, which means the value of 'encoding' is used.
So, there's no magic involved. When there's a Byte Order Mark in the file, that's easy. Otherwise, Vim tries the other listed encodings in order (which you can influence with that option; e.g. Japanese users will probably include something like sjis if they frequently edit such encoded files). In your case the distinction works because utf-8 is tried before latin1, and a Latin-1 ü is an invalid byte sequence in UTF-8, so Vim only falls back to latin1 when the file really isn't valid UTF-8.
If you want a more intelligent detection, there are plugins for that, e.g. AutoFenc - Tries to automatically detect and set file encoding.
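As a sketch, a 'fileencodings' value along these lines keeps the usual detection order and adds sjis as an example of a locally common encoding; the exact list is just an illustration, not a recommendation:
set encoding=utf-8
" BOM detection first, then strict UTF-8, then the extra candidate,
" and finally the 8-bit fallbacks that essentially never fail:
set fileencodings=ucs-bom,utf-8,sjis,default,latin1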

Using vim+LaTeX with Scandinavian characters

I want to create a lab write-up with LaTeX in Ubuntu; however, my text includes Scandinavian characters, and at present I have to type them in using \"a and \"o etc. Is it possible to get the LaTeX compiler to read these special characters when they are typed in as-is? Additionally, I would like vim to "read" Finnish: right now, when I open a .tex document containing Scandinavian characters, they are not displayed at all in vim. How can I correct this?
For LaTeX, use the inputenc package:
\usepackage[utf8]{inputenc}
Instead of utf8, you may use whatever else fits you, like latin1, as well.
Now the trick is to make your terminal run the same character encoding. It seems that it runs a character/input encoding that doesn't fit your input right now.
For this, refer to the "Locale" settings of your distribution. You can always check the locale settings in the terminal by issuing locale. These days, UTF-8 locales are preferred, as they work with every character imaginable. If your terminal's environment is set up correctly, vim should happily work with all your special characters without complaint.
To find out which encoding Vim is currently using, try:
:set enc
To set the encoding to UTF-8, try:
:set enc=utf8
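On the Vim side, a minimal sketch assuming a UTF-8 terminal locale; the ++enc values are just the two candidates mentioned above:
" in .vimrc:
set encoding=utf-8
" if an already open .tex file was mis-detected, force a re-read:
:e! ++enc=utf-8
" or, if the file is actually Latin-1:
:e! ++enc=latin1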
I can't help with vim, but for LaTeX I recommend you check out XeTeX, which is an extension of TeX that is designed to support Unicode input. XeTeX is now part of Texlive, so if you have TeX installed chances are you already have it.
I use the UCS unicode support: http://iamleeg.blogspot.com/2007/10/nice-looking-latex-unicode.html
