gVim incorrectly guesses file encoding

I'm running gVim 8.2 with the default configuration on Windows 7 with the Russian language (so all the system text and menu items are in Russian). When I open a UTF-8 file with Russian text in it, it's displayed incorrectly, as cp1251, for some reason:
:set encoding?
encoding=cp1251
Manually setting :set encoding=utf8 fixes it.
Other encoding-related options have the following values:
:set fileencoding?
fileencoding=
:set fileencodings?
fileencodings=ucs-bom
I find vim help confusing here, because it doesn't seem to explain how it guesses the encoding. For some reason other applications I tried (Notepad++, Sublime Text 4, even Windows Notepad) guess the file encoding correctly. As I mentioned in the beginning, I run gVim with default configuration, so there's no custom vimrc anywhere:
:echo $MYVIMRC
D:\Program Files (x86)\Vim\_vimrc
What would be the correct way to fix this problem?

Create a vimrc with set encoding=utf-8 in it. This should be the default in newer versions of Vim on Windows, as can be seen from :help 'encoding'.
'encoding' 'enc' string (default for MS-Windows: "utf-8",
otherwise: value from $LANG or "latin1")
The default value used to be latin1 on Windows but it was changed to utf-8 recently.
This should be enough to solve your issue.
Again from :help 'encoding':
Sets the character encoding used inside Vim. It applies to text in
the buffers, registers, Strings in expressions, text stored in the
viminfo file, etc. It sets the kind of characters which Vim can work
with.
Vim uses fileencodings (plural) to try and guess the encoding of your file. fileencoding (singular) is the encoding that Vim guessed (or that you've set) for your file. You probably don't need to change either of these.
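To see why the text looks wrong in the first place, here is a minimal Python sketch (not Vim code) of what happens when UTF-8 bytes are interpreted as cp1251. The sample word is an arbitrary choice for illustration and is not taken from the question.

```python
# Illustration only: mimics reading UTF-8 bytes as if they were cp1251.
# The sample word is an arbitrary choice, not taken from the question.
original = "привет"

utf8_bytes = original.encode("utf-8")   # what is stored on disk
misread = utf8_bytes.decode("cp1251")   # what a cp1251 reader displays

# Every Cyrillic letter takes two bytes in UTF-8, so the misread text
# has twice as many characters as the original.
print(len(original), len(misread))

# No data is lost: re-encoding as cp1251 and decoding as UTF-8 restores it.
restored = misread.encode("cp1251").decode("utf-8")
print(restored == original)
```

The bytes on disk are fine; only the interpretation is wrong, which is why switching to UTF-8 fixes the display without touching the file.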

Related

Why does Vim change the encoding of all buffers when using setlocal?

When I use setlocal encoding in Vim to change the buffer encoding, Vim changes the encoding of all buffers. It's really annoying because I'm editing files in different encodings and I'm corrupting them all the time. I tested it by creating two buffers and running setlocal encoding=latin1 in one and setlocal encoding=utf-8 in the other: the encoding is always the same in both buffers, and it's always the last encoding I set.
setlocal works well on 'shiftwidth' for example, the issue only exhibits on the encoding option.
I tried to reinstall a new copy of VIM, I tried the 7.2 and 7.1, I still have the issue. I've disabled all my plugins, my vimrc.
Am I losing my mind thinking that this is not the way it used to work? Do you have any idea what might cause that?
I'm using VIM on Windows 7 if that's important.
'encoding' is a global option that dictates how Vim treats text internally, so you are wrong on two counts.
setlocal doesn't work for global-only options like 'encoding'.
:help 'option' tells you the "scope" of 'option'.
'encoding' is not the option you are looking for; you want :set fileencoding=utf-8.
Reference:
:help 'encoding'
:help 'fileencoding'
:help :setlocal

Is it possible to automatically set UTF16 file encoding when opening a file of that type?

I edit all kinds of files with Vim (as I'm sure most Vim users do). One bugbear I have is what Vim does when I come across a file with an odd encoding. Most editors (these days) make a good stab at detecting file encodings. However, Vim generally doesn't, and you have to type, for example:
:e ++enc=utf-16le
to re-read the file in UTF-16 (otherwise you get a mass of # signs).
I've been searching around and have seen scripts like set_utf8.vim which can detect a specific file encoding. However, is there a more general solution? I'm a bit bored of having to manually work out what the file encoding is and consulting the help every time I open an unusual file.
Adding the encoding name to 'fileencodings' should do the trick:
:set fencs=ucs-bom,utf-16le,utf-8,default,latin1
Alternatively, there are plugins like AutoFenc and fencview.
Add this code to your .vimrc:
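if has("multi_byte")
  " Keep the terminal encoding as it was before 'encoding' is changed
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  " Use UTF-8 internally
  set encoding=utf-8
  " Default encoding for new files
  setglobal fileencoding=utf-8
  " Uncomment to write a BOM at the start of new files
  "setglobal bomb
  " Encodings tried, in order, when reading an existing file
  set fileencodings=ucs-bom,utf-8,latin1
endif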
Does the file have a byte order mark (BOM)? Vim should detect this and work appropriately. From the documentation, section 45.4:
When you start editing that 16-bit Unicode file, and it has a BOM, Vim
will detect this and convert the file to utf-8 when reading it. The
'fileencoding' option (without s at the end) is set to the detected
value. In this case it is "utf-16le". That means it's Unicode,
16-bit and little-endian. This file format is common on MS-Windows
(e.g., for registry files).
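As a rough illustration of what BOM detection involves, here is a minimal Python sketch. It is not how Vim implements 'ucs-bom'; the function name detect_bom and the sample data are hypothetical, and only the BOM constants come from Python's standard codecs module.

```python
import codecs

# Checked in this order because the UTF-32 LE BOM (FF FE 00 00) begins
# with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> str:
    """Return the encoding named by a leading BOM, or "" if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return ""


# Hypothetical sample: a UTF-16 LE file with a BOM, and a file without one.
sample = codecs.BOM_UTF16_LE + "REGEDIT".encode("utf-16-le")
print(detect_bom(sample))
print(detect_bom(b"plain ASCII"))
```

A file without a BOM gives no such hint, which is why the remaining entries in 'fileencodings' matter for those files.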

gVim saves the file as UTF-8, but another editor shows it as ANSI

System is Win7
My vimrc:
set encoding=utf-8
set ffs=unix,dos,mac
set fencs=utf-8,ucs-bom,euc-jp,gb18030,gbk,gb2312,cp936
set fenc=utf-8
After I save the file with gVim and open it in EditPlus, EditPlus shows the encoding as ANSI.
PS: EditPlus's default character setting is ANSI.
Why?
The beauty of the UTF-8 encoding is that, as long as the text is pure 7-bit ASCII (i.e. only the normal English characters), it is indistinguishable from ANSI / ASCII. So it may be fine that other editors show the file as ANSI.
When you insert a non-ASCII character (e.g. A UMLAUT = Ä; you can enter this in Vim as a digraph by pressing Ctrl + K, followed by A and then :), other editors should properly detect UTF-8 (or else they will probably show two pieces of garbage instead).
If this were another Unicode encoding, you could help other editors by inserting a byte order mark (:setlocal bomb in Vim), but for UTF-8 there is no high/low-byte ordering issue, so a BOM is rarely used (and some applications don't cope well with it).
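The point about pure ASCII can be checked at the byte level. This is a small Python sketch, assuming cp1252 as the "ANSI" code page (the actual ANSI code page depends on the Windows locale); the sample strings are made up for illustration.

```python
ascii_text = "plain English text"

# Pure ASCII encodes to the same bytes in UTF-8 and in cp1252
# (cp1252 is used here as an example ANSI code page).
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")
print(same_bytes)

umlaut = "\u00c4"  # A UMLAUT
utf8_bytes = umlaut.encode("utf-8")
ansi_bytes = umlaut.encode("cp1252")
print(len(utf8_bytes), len(ansi_bytes))  # byte counts differ

# An editor that assumes cp1252 shows the UTF-8 bytes as separate
# characters: the "two pieces of garbage" mentioned above.
garbage = utf8_bytes.decode("cp1252")
print(len(garbage))
```

So an editor has nothing to go on until the file contains at least one non-ASCII character.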
Vim must be adding some special characters or changing something in there. Can you copy the file off somewhere else then save over it in gvim, and attach both the original and gvim-saved version at http://www.diffchecker.com/ or some site like that? That should tell you exactly what gvim has modified about it.

Vim and ASCII extended characters?

I would like to know how I can set up Vim 7.0 to display and work with extended ASCII characters without problems.
Vim (which is what vi resolves to on most systems) readily supports extended character sets. You might need to tell Vim which encoding to use, though.
This is controlled by two options:
:set encoding
:set fileencoding
If you have loaded a file that displays incorrectly, you may use :set encoding=<new encoding> to force the appropriate encoding. This changes the interpretation of the characters on the fly. If you want to save the file in another encoding preserving the current interpretation of characters, use set fileencoding=<new encoding> to let Vim save the file in that encoding.
I recommend that you set utf-8 as the default encoding in your .vimrc.
Once the characters are "extended", it's not ASCII any more.
However, just use Vim. See :help unicode for more details.
The other solutions here didn't work for me. Vim told me that encoding and fileencoding were not supported options. That turned out to be because I was building from source myself, and I did not include the multi-byte feature. My two Macs are similar, but one of them enabled it by default while the other did not.
If you're building Vim from source like I was, include --enable-multibyte in your arguments to ./configure. In my case, Vim defaulted to UTF-8 and supported extended characters after that.
I would suggest trying the following:
set the terminal to utf-8 (how to do that depends on your terminal; in PuTTY it's in the Window/Translation menu)
set your locale to utf-8 (how to do that depends on your OS; on my Debian box it's set LC_ALL=en_GB.UTF-8 for the current session and sudo dpkg-reconfigure locales for permanent system-wide changes) -- you can check your current locale with locale.
That's how it works for me (using VIM 7.1.314 and no .vimrc).

How can I make vim recognize the file's encoding?

I noticed that most of the time, when a file uses some encoding other than 'the standard English', Vim doesn't recognize it and does not display characters correctly.
This is most easily seen by opening some ASCII graphics, or similar files off the net, which use the cp437 code page.
Is there a way to make Vim check for the encoding when opening a file, and open it with the correct one?
What encodings do you use as the most "portable" ones (the ones which the largest number of users will have the fewest problems with)?
Vim needs to detect the encoding, and that's going to be problematic, since files don't often explicitly state their encodings (an obvious exception is XML files with an encoding attribute in the header).
You can force Vim to reload a file with a different encoding thus:
:e ++enc=cp437
and you can set the default encoding in your .vimrc if you wish.
This page has more info and links, especially wrt. editing Unicode. UTF-8 is the most widely-used encoding, and the default you should probably go for.
You can use a vim modeline to set the file's encoding. This is simply a comment, in the first five lines of the file, that starts with vi: set fileencoding=cp437.
You could also start with 'vim:', instead of 'vi: set', but the latter makes it compatible with more editors. You definitely need the space between either of these prefixes and 'fileencoding', or whatever option you want to set. The fileencoding option should solve your problem, though.
So, in Python or an .rc file, you can put this at the top of your file:
# vi: set fileencoding=cp437
In Java, C, C++, JavaScript, etc. put this:
// vi: set fileencoding=cp437
For more information, in vim, type :help modeline.
You can set the option 'fileencodings' in your .vimrc.
This is a list of character encodings considered when starting to edit
an existing file. When a file is read, Vim tries to use the first
mentioned character encoding. If an error is detected, the next one
in the list is tried. When an encoding is found that works,
'fileencoding' is set to it. If all fail, 'fileencoding' is set to
an empty string, which means the value of 'encoding' is used.
See :help 'fileencodings'
If you often work with e.g. cp437 or cp1252, you can add it there:
set fileencodings=ucs-bom,utf-8,cp1252,default,latin9
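The "try each encoding in order until one works" rule quoted above can be sketched in Python. This is only an approximation of the idea, not Vim's implementation: guess_encoding is a hypothetical helper, the codec names are Python's rather than Vim's, and entries such as ucs-bom and default have no direct equivalent here.

```python
from typing import Sequence


def guess_encoding(data: bytes, candidates: Sequence[str]) -> str:
    """Return the first candidate that decodes data without error, else ""."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
        return name
    return ""


order = ["utf-8", "cp1252", "latin-1"]

print(guess_encoding("\u00c4".encode("utf-8"), order))  # valid UTF-8
print(guess_encoding(b"\xc4", order))                   # invalid as UTF-8
print(guess_encoding(b"\x81", order))                   # undefined in Python's cp1252

# With an 8-bit encoding first, nothing after it is ever reached,
# because latin-1 accepts any byte sequence.
print(guess_encoding("\u00c4".encode("utf-8"), ["latin-1", "utf-8"]))
```

The last call shows why the order matters: strict encodings such as UTF-8 have to come before permissive 8-bit ones, otherwise the permissive one always wins.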
You can encode your files using Unicode, and set a Byte Order Mark (BOM) in the file. This will make Vim treat it appropriately, but some compilers and programs may have trouble with it. Even basic shell commands like cat may misbehave for some use cases.
To do it, type this in vim:
:set fileencoding=utf-8
:set bomb
:w
For more information, type:
:help bomb
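To show what a UTF-8 BOM actually adds to a file, here is a short Python sketch. It uses Python's "utf-8-sig" codec as a stand-in for "UTF-8 with BOM"; the sample text is made up, and nothing here is Vim-specific.

```python
import codecs

text = "hello"
without_bom = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")  # Python's name for UTF-8 with a BOM

# The BOM is three extra bytes at the very start of the file.
print(with_bom[:3] == codecs.BOM_UTF8)
print(len(with_bom) - len(without_bom))

# A reader that is not BOM-aware sees an extra U+FEFF character first.
naive = with_bom.decode("utf-8")
print(naive.startswith("\ufeff"))

# A BOM-aware reader strips it and returns the original text.
print(with_bom.decode("utf-8-sig") == text)
```

Those three leading bytes are what trips up tools that expect the file to start with ordinary content.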
Add this at the end of your .vimrc (you can replace "utf-8" with "latin1" if needed):
if len(&fenc) == 0
  silent! exe "e! ++enc=utf-8"
endif

Resources