UTF-8 encoding in Vim

I have no problem with UTF-8 characters in Vim while editing, but when I run set fileencoding=utf-8 and then save and reload the file, Vim displays non-ASCII characters incorrectly.
Why is that happening, and how can I fix it?

What is the value of your 'fileencodings' setting? It determines which encodings Vim tries, in order, when it interprets the contents of a file being loaded into a buffer. The first encoding that does not produce a conversion error is assumed to be the file's encoding.
If utf-8 is to be tried, it should appear near the beginning of that setting, before any 8-bit encoding.
You probably have fileencodings set to something like latin1. That encoding is always valid (there is no byte sequence that it rejects), so Vim never tries anything listed after it.
If you don't want to change fileencodings, you can reload the file with: :e! ++enc=utf8
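As a rough sketch of the ordering described above (the exact list is an assumption; adjust it to the encodings you actually meet), a vimrc fragment could look like this:

```vim
" The order matters: the first encoding that converts without error wins.
"   ucs-bom : files that start with a byte order mark
"   utf-8   : rejected when the bytes are not valid UTF-8, so the next entry is tried
"   latin1  : accepts any byte sequence, so it has to come last
set fileencodings=ucs-bom,utf-8,latin1
```

After reopening the file, :set fileencoding? reports which entry Vim settled on.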

Related

gVim incorrectly guesses file encoding

I'm running gVim 8.2 with the default configuration on Windows 7 with Russian as the system language (so all the system text and menu items are in Russian). When I open a UTF-8 file with Russian text in it, it is displayed incorrectly, as cp1251, for some reason:
:set encoding?
encoding=cp1251
Manually running :set encoding=utf8 fixes it.
The other encoding-related options have the following values:
:set fileencoding?
fileencoding=
:set fileencodings?
fileencodings=ucs-bom
I find vim help confusing here, because it doesn't seem to explain how it guesses the encoding. For some reason other applications I tried (Notepad++, Sublime Text 4, even Windows Notepad) guess the file encoding correctly. As I mentioned in the beginning, I run gVim with default configuration, so there's no custom vimrc anywhere:
:echo $MYVIMRC
D:\Program Files (x86)\Vim\_vimrc
What would be the correct way to fix this problem?
Create a vimrc with set encoding=utf-8 in it. This should be the default in newer versions of Vim on Windows, as can be seen from :help 'encoding'.
'encoding' 'enc' string (default for MS-Windows: "utf-8",
otherwise: value from $LANG or "latin1")
The default value used to be latin1 on Windows but it was changed to utf-8 recently.
This should be enough to solve your issue.
Again from :help 'encoding':
Sets the character encoding used inside Vim. It applies to text in
the buffers, registers, Strings in expressions, text stored in the
viminfo file, etc. It sets the kind of characters which Vim can work
with.
Vim uses fileencodings (plural) to try and guess the encoding of your file. fileencoding (singular) is the encoding that Vim guessed (or that you've set) for your file. You probably don't need to change either of these.
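To see what Vim is working with, something like the following can be typed after opening the problem file. Per :help 'fileencodings', the default is just "ucs-bom" unless 'encoding' is a Unicode value, in which case it becomes "ucs-bom,utf-8,default,latin1". That matches the output in the question and would explain why setting 'encoding' alone is enough.

```vim
" Show the current values and, thanks to :verbose, where each was last set.
:verbose set encoding? fileencoding? fileencodings?
```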

"set listchars=tab:>-,eol:$,space:·" the "space:·" part doesn't work?

Strangely, everything worked before, but after I added one line to my vimrc with Windows Notepad, this error started to occur, and it is driving me crazy!
Here is ":set listchars" part of my vimrc:
set listchars=tab:>-,eol:$,space:·
It produces this error:
E474: invalid argument: listchars=tab:>-,eol:$,space:<a1><a4>
As you're using non-ASCII characters, you need to use :help :scriptencoding to specify the encoding of your ~/.vimrc.
You're also overriding the default 'encoding', which affects the representation of any text inside Vim, also any non-ASCII text already read in. Therefore, you first need to :set encoding, and only then (with the correct internal representation set) can you specify the script's encoding:
set encoding=utf-8
scriptencoding utf-8
set listchars=...
Alternatively, you could also work around the vimrc encoding issues by encoding the non-ASCII characters in a (double-quoted) string (:help expr-quote), and using :let &option instead of :set:
let &listchars = "tab:>-,eol:$,space:\u00B7"
It seems that Vim is loading the vimrc file with some encoding other than UTF-8. I don't know which encoding is used.
To force Vim to interpret vimrc as UTF-8, the following line is needed:
scriptencoding utf-8
All lines following this line are then interpreted as UTF-8. All lines before it are interpreted in whatever encoding Vim determines.
If you set encoding in your vimrc, the scriptencoding line must come after that line.
So in your case, you need:
set encoding=utf-8
scriptencoding utf-8
See :help :scriptencoding for details.
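Putting it together with the line from the question, a complete fragment might look like the sketch below. It assumes the vimrc file itself is saved as UTF-8. The <a1><a4> in the error message is not the UTF-8 byte sequence for · (that would be c2 b7), which suggests Notepad saved the file in a legacy code page, so the file may need to be re-saved as UTF-8 first.

```vim
" Assumption: this vimrc file is saved as UTF-8 on disk.
" 'encoding' must be set before :scriptencoding.
set encoding=utf-8
scriptencoding utf-8
set listchars=tab:>-,eol:$,space:·
" Turn on 'list' so the listchars are actually displayed.
set list
```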

Is it possible to automatically set UTF16 file encoding when opening a file of that type?

I edit all kinds of files with Vim (as I'm sure most Vim users do). One bugbear of mine is what Vim does when I come across a file with an odd encoding. Most editors (these days) make a good stab at detecting file encodings. Vim, however, generally doesn't, and you have to type, for example:
:e ++enc=utf-16le
to re-read the file as UTF-16 (otherwise you get a mass of # signs).
I've been searching around and have seen scripts like set_utf8.vim, which can detect one specific file encoding. Is there a more general solution, though? I'm a bit bored of having to work out the file encoding manually and consult the help every time I open an unusual file.
Adding the encoding name to 'fileencodings' should do the trick:
:set fencs=ucs-bom,utf-16le,utf-8,default,latin1
Alternatively, there are plugins like AutoFenc and fencview.
Add this code to your .vimrc:
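if has("multi_byte")
  " Keep the terminal's encoding before 'encoding' is changed.
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  set encoding=utf-8
  " Default encoding for newly created files.
  setglobal fileencoding=utf-8
  "setglobal bomb
  " Encodings tried, in order, when reading an existing file.
  set fileencodings=ucs-bom,utf-8,latin1
endif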
Does the file have a byte order mark (BOM)? Vim should detect it and handle the file appropriately. From the documentation, section 45.4:
When you start editing that 16-bit Unicode file, and it has a BOM, Vim
will detect this and convert the file to utf-8 when reading it. The
'fileencoding' option (without s at the end) is set to the detected
value. In this case it is "utf-16le". That means it's Unicode,
16-bit and little-endian. This file format is common on MS-Windows
(e.g., for registry files).
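A quick way to check whether that detection happened is to query both options after opening the file. This is only a sketch of what to look at. Going by the passage above, a UTF-16LE file with a BOM should report utf-16le for 'fileencoding', and 'bomb' should be on if a BOM was found.

```vim
" Ask Vim which encoding it detected and whether it saw a byte order mark.
:set fileencoding? bomb?
```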

gVim saves the file as UTF-8, but another editor shows it as ANSI

System is Win7
My vimrc:
set encoding=utf-8
set ffs=unix,dos,mac
set fencs=utf-8,ucs-bom,euc-jp,gb18030,gbk,gb2312,cp936
set fenc=utf-8
After I save the file with gVim, I open it in EditPlus:
PS: EditPlus's default character setting is ANSI.
Why?
The beauty of the UTF-8 encoding is that, as long as the text is pure 7-bit ASCII (i.e. only the normal English characters), it is indistinguishable from ANSI / ASCII. So it may be fine that other editors show the file as ANSI.
Once you insert a non-ASCII character (e.g. A UMLAUT = Ä; in Vim you can enter it as a digraph by pressing Ctrl + K, then A, then a colon), other editors should detect UTF-8 properly (or else they will probably show two garbage characters in its place).
If this were another Unicode encoding, you could help other editors by inserting a byte order mark (:setlocal bomb in Vim), but for UTF-8, there is no high/low-byte ordering issue, and a BOM is rarely used (and some applications don't cope well with it.)
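To check which case applies to a given file, a sketch like the following can be typed in Vim. The search pattern is a common idiom for "anything outside 7-bit ASCII" rather than something taken from the question.

```vim
" Jump to the next character outside the 7-bit ASCII range, if there is one.
/[^\x00-\x7F]
" Show the encoding and BOM setting Vim will use when writing this buffer.
:set fileencoding? bomb?
```

If the search finds nothing, the saved file is byte-for-byte plain ASCII, and an editor reporting ANSI is not wrong.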
Vim must be adding some special characters or changing something in there. Can you copy the file off somewhere else then save over it in gvim, and attach both the original and gvim-saved version at http://www.diffchecker.com/ or some site like that? That should tell you exactly what gvim has modified about it.

How can I make vim recognize the file's encoding?

I've noticed that, most of the time, when a file uses some encoding other than plain English ASCII, Vim doesn't recognize it and doesn't display the characters correctly.
This is most easily seen by opening ASCII graphics or similar files off the net that use the cp437 code page.
Is there a way to make Vim check the encoding when opening a file, and open it with the correct one?
Which encodings do you use as the most "portable" ones (the ones that the largest number of users will have the fewest problems with)?
Vim needs to detect the encoding, and that's going to be problematic, since files don't often state their encodings explicitly (an obvious exception is XML files with an encoding attribute in the header).
You can force Vim to reload a file with a different encoding thus:
:e ++enc=cp437
and you can set the default encoding in your .vimrc if you wish.
This page has more info and links, especially wrt. editing Unicode. UTF-8 is the most widely-used encoding, and the default you should probably go for.
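If you reload with ++enc often, a small user command in your .vimrc can shorten the typing. This is only a sketch, and ReloadAs is a made-up name, not something Vim provides:

```vim
" Hypothetical helper: ":ReloadAs cp437" re-reads the current file as cp437.
" Add a bang (":ReloadAs! cp437") to discard unsaved changes while reloading.
command! -bang -nargs=1 ReloadAs edit<bang> ++enc=<args>
```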
You can use a vim modeline to set the file's encoding. This is simply a comment, in the first five lines of the file, that starts with vi: set fileencoding=cp437.
You could also start with 'vim:', instead of 'vi: set', but the latter makes it compatible with more editors. You definitely need the space between either of these prefixes and 'fileencoding', or whatever option you want to set. The fileencoding option should solve your problem, though.
So, in Python or an .rc file, you can put this at the top of your file:
# vi: set fileencoding=cp437
In Java, C, C++, JavaScript, etc. put this:
// vi: set fileencoding=cp437
For more information, in vim, type :help modeline.
You can set the variable 'fileencodings' in your .vimrc.
This is a list of character encodings considered when starting to edit
an existing file. When a file is read, Vim tries to use the first
mentioned character encoding. If an error is detected, the next one
in the list is tried. When an encoding is found that works,
'fileencoding' is set to it. If all fail, 'fileencoding' is set to
an empty string, which means the value of 'encoding' is used.
See :help 'fileencodings'
If you often work with e.g. cp437 or cp1252, you can add it there:
set fileencodings=ucs-bom,utf-8,cp1252,default,latin9
You can encode your files as Unicode and put a byte order mark (BOM) in the file. This will make Vim treat it appropriately, but some compilers and programs may have trouble with it. Even basic shell commands like cat may misbehave in some use cases.
To do it, type this in vim:
:set fileencoding=utf-8
:set bomb
:w
For more information, type:
:help bomb
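Add this at the end of your .vimrc (you can replace "utf-8" with "latin1" if needed):
" If 'fileencoding' is empty, re-read the current file as UTF-8,
" discarding unsaved changes and suppressing any error messages.
if empty(&fileencoding)
  silent! edit! ++enc=utf-8
endif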

Resources