System is Win7
My vimrc:
set encoding=utf-8
set ffs=unix,dos,mac
set fencs=utf-8,ucs-bom,euc-jp,gb18030,gbk,gb2312,cp936
set fenc=utf-8
After I save the file with gvim and open it in EditPlus, EditPlus still shows the file as ANSI.
PS: EditPlus's default character setting is ANSI.
Why?
The beauty of the UTF-8 encoding is that, as long as the text is pure 7-bit ASCII (i.e. only the normal English characters), it is indistinguishable from ANSI / ASCII. So it may be fine that other editors show the file as ANSI.
When you insert a non-ASCII character (e.g. an A-umlaut, Ä; you can enter this in Vim as a digraph by pressing Ctrl + K, then A, then :), other editors should properly detect UTF-8 (or else they will probably show two bytes of garbage instead).
If this were another Unicode encoding, you could help other editors by inserting a byte order mark (:setlocal bomb in Vim), but for UTF-8, there is no high/low-byte ordering issue, and a BOM is rarely used (and some applications don't cope well with it.)
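To see what Vim has decided about the current buffer, you can query the relevant options and, if needed, force a re-read; a quick sketch (commands typed in Vim's command line):

```vim
" Show the detected file encoding and whether a BOM is present
:set fileencoding? bomb?

" Force Vim to re-read the file as UTF-8 if the guess was wrong
:e ++enc=utf-8

" For UTF-16 (not UTF-8), a BOM is usually wanted:
:setlocal bomb
:w
```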
Vim must be adding some special characters or changing something in there. Can you copy the file off somewhere else then save over it in gvim, and attach both the original and gvim-saved version at http://www.diffchecker.com/ or some site like that? That should tell you exactly what gvim has modified about it.
I'm running gVim 8.2 with the default configuration on Windows 7 with Russian language (so all the system text and menu items are in Russian). When I open a UTF-8 file with Russian text in it, it's displayed incorrectly because Vim reads it as cp1251 for some reason:
:set encoding?
encoding=cp1251
Manually setting :set encoding=utf8 fixes it.
Other encoding-related options have the following values:
:set fileencoding?
fileencoding=
:set fileencodings?
fileencodings=ucs-bom
I find vim help confusing here, because it doesn't seem to explain how it guesses the encoding. For some reason other applications I tried (Notepad++, Sublime Text 4, even Windows Notepad) guess the file encoding correctly. As I mentioned in the beginning, I run gVim with default configuration, so there's no custom vimrc anywhere:
:echo $MYVIMRC
D:\Program Files (x86)\Vim\_vimrc
What would be the correct way to fix this problem?
Create a vimrc with set encoding=utf-8 in it. This should be the default in newer versions of Vim on Windows, as can be seen from :help 'encoding'.
'encoding' 'enc' string (default for MS-Windows: "utf-8",
otherwise: value from $LANG or "latin1")
The default value used to be latin1 on Windows but it was changed to utf-8 recently.
This should be enough to solve your issue.
Again from :help 'encoding':
Sets the character encoding used inside Vim. It applies to text in
the buffers, registers, Strings in expressions, text stored in the
viminfo file, etc. It sets the kind of characters which Vim can work
with.
Vim uses fileencodings (plural) to try and guess the encoding of your file. fileencoding (singular) is the encoding that Vim guessed (or that you've set) for your file. You probably don't need to change either of these.
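Putting the pieces together, a minimal _vimrc for this situation might look like the following (the fileencodings value is just one sensible ordering, not the only one):

```vim
" Use UTF-8 internally, regardless of the system locale
set encoding=utf-8

" Detection order: BOM first, then strict UTF-8, then cp1251
" as a last resort (cp1251 accepts any byte sequence, so it
" must come last or it will always win)
set fileencodings=ucs-bom,utf-8,cp1251
```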
I edit all kinds of files with Vim (as I'm sure most Vim users do). One bugbear I have is what Vim does when I come across a file with an unusual encoding. Most editors (these days) make a good stab at detecting file encodings; Vim generally doesn't, and you have to type, for example:
:e ++enc=utf-16le
to re-read the file in UTF-16 (otherwise you get a mass of # signs).
I've been searching around and have seen scripts like set_utf8.vim which can detect a specific file encoding. However, is there a more general solution? I'm a bit bored of having to work out the file encoding manually and consult the help every time I open an unusual file.
Adding the encoding name to 'fileencodings' should do the trick:
:set fencs=ucs-bom,utf-16le,utf-8,default,latin1
Alternatively, there are plugins like AutoFenc and fencview.
Add this code to your .vimrc:
if has("multi_byte")
  " Keep the terminal encoding at whatever it already was
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  set encoding=utf-8
  " Default encoding for new files; existing files keep their detected one
  setglobal fileencoding=utf-8
  "setglobal bomb
  set fileencodings=ucs-bom,utf-8,latin1
endif
Do you have a byte order mark? Vim should detect this and act appropriately. From the docs, section 45.4:
When you start editing that 16-bit Unicode file, and it has a BOM, Vim
will detect this and convert the file to utf-8 when reading it. The
'fileencoding' option (without s at the end) is set to the detected
value. In this case it is "utf-16le". That means it's Unicode,
16-bit and little-endian. This file format is common on MS-Windows
(e.g., for registry files).
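You can confirm what the BOM detection did after opening such a file; a quick check, assuming a UTF-16 LE file that carries a BOM:

```vim
" The detected encoding of the file on disk
:set fileencoding?
" Whether Vim saw (and will write back) a byte order mark
:set bomb?
" 'encoding' stays at its usual value; conversion happens on read/write
:set encoding?
```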
Opening a file in Vim or Notepad++ lets you see each editor's guess at the file's encoding.
I have a file that I recently used vim under Linux to replace the line endings with \r and then later determined that I needed to know what type of encoding it was in.
I attempted to determine the encoding of the file using Vim (on Linux) and Notepad++ (on Windows).
Vim reports fileencoding=latin1 and Notepad++ reports ANSI.
Do line endings have anything to do with determining the encoding? Would the fact that I changed the line endings have an effect on the detected character encoding?
Absent a byte order mark (BOM) at the beginning of the text file, the editor has to apply some heuristics to determine the file encoding. In Vim, this is controlled by the 'fileencodings' setting.
It is unlikely that the file's line endings (CR vs. CR-LF) influence the heuristics. Rather, the encoding names are not standardized, so different editors and tools come up with their own names.
You can find out about Vim's naming scheme via :help encoding-values. Presumably, latin1 corresponds to what Notepad++ calls ANSI, and is frequently referred to as ISO 8859-1.
I have no problem with UTF-8 chars in vim while editing, but when I do set fileencoding=utf-8 and then save and reload the file, vim displays nonascii chars incorrectly.
Why is that happening, and how to fix this?
What is the value of your 'fileencodings' setting? This setting determines what encodings are tried by Vim to interpret the contents of a buffer that is being loaded. The first encoding that does not result in a conversion error is assumed to be the current file's encoding.
If utf8 is to be tested you should put it at the beginning of that setting.
Probably you have latin1 early in 'fileencodings', for example. That encoding is always valid: there is no byte sequence it can reject.
If you don't want to change fileencodings you can reload the file with: :e! ++enc=utf8
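If you do want to change 'fileencodings', the key point is ordering: strict encodings first, permissive ones last. A sketch for a vimrc:

```vim
" utf-8 is strict (an invalid byte sequence makes Vim move on
" to the next candidate); latin1 accepts everything, so it
" must be the final fallback
set fileencodings=ucs-bom,utf-8,latin1
```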
I noticed that most of the time, when a file uses some encoding other than plain English ASCII, Vim doesn't recognize it and doesn't display the characters correctly.
This is most easily seen by opening some ASCII graphics or similar files off the net, which use the cp437 code page.
Is there a way to make Vim check the encoding when opening a file, and open it with the correct one?
Which encodings do you use as the most "portable" ones (the ones that the largest number of users will have the fewest problems with)?
Vim needs to detect the encoding, and that's going to be problematic, since files don't often explicitly state their encodings (an obvious exception are XML files with an encoding attribute in the header).
You can force Vim to reload a file with a different encoding thus:
:e ++enc=cp437
and you can set the default encoding in your .vimrc if you wish.
This page has more info and links, especially wrt. editing Unicode. UTF-8 is the most widely-used encoding, and the default you should probably go for.
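For the cp437 case specifically, one approach (a sketch, not the only reasonable ordering) is to make cp437 the final fallback in your .vimrc:

```vim
" Try a BOM, then strict UTF-8; fall back to cp437 for
" old DOS-era files such as ASCII art (cp437, like latin1,
" accepts every byte, so it must come last)
set fileencodings=ucs-bom,utf-8,cp437
```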
You can use a Vim modeline to set the file's encoding. This is simply a comment, within the first (or last) five lines of the file, of the form vi: set fileencoding=cp437:.
You could also start with 'vim:' instead of 'vi:', but the latter makes it compatible with more editors. You definitely need the whitespace between the prefix and 'set', and the 'set' form needs a terminating colon after the options. The fileencoding option should solve your problem, though.
So, in Python or an .rc file, you can put this at the top of your file:
# vi: set fileencoding=cp437:
In Java, C, C++, JavaScript, etc. put this:
// vi: set fileencoding=cp437:
For more information, in vim, type :help modeline.
You can set the variable 'fileencodings' in your .vimrc.
This is a list of character encodings considered when starting to edit
an existing file. When a file is read, Vim tries to use the first
mentioned character encoding. If an error is detected, the next one
in the list is tried. When an encoding is found that works,
'fileencoding' is set to it. If all fail, 'fileencoding' is set to
an empty string, which means the value of 'encoding' is used.
See :help 'fileencodings'
If you often work with e.g. cp437 or cp1252, you can add it there:
set fileencodings=ucs-bom,utf-8,cp1252,default,latin9
You can encode your files using unicode, and set a Byte Order Mark (BOM) in the file. This will make vim treat it appropriately, but some compilers and programs may have trouble with it. Even basic shell commands like cat may misbehave for some use cases.
To do it, type this in vim:
:set fileencoding=utf-8
:set bomb
:w
For more information, type:
:help bomb
Add at the end of your .vimrc (you can replace "utf-8" with "latin1" if needed):
if len(&fenc) == 0
  silent! exe "e! ++enc=utf-8"
endif