I'm using Vim 7.3 on WinXP. I use XML files that are generated by an application at my work which writes them with UCS-2le encoding. After reading several articles on encoding at the vim wiki I found the following advice given, namely to set my file encoding in vimrc:
set fileencodings=ucs-bom,utf-8
The file in question has FF FE as its first two bytes (confirmed by viewing it in HxD), but Vim doesn't open it properly. I can open my UCS-2LE files properly with this in my vimrc:
set fileencodings=ucs-2le,utf-8
But now my UTF-8 files are a mess!
Any advice on how to proceed? I typically run gvim without :behave mswin (if that matters), and I use very few plugins. My actual vimrc settings regarding file encodings are:
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,ucs-2le,latin1
The entry for ucs-2le in the third spot seems to make no difference. As I understand it, the first setting (encoding) is the encoding Vim uses internally in its buffers, while the second (fileencodings) deals with the encoding of a file when Vim reads and writes it. So, it seems to me that since the file has a byte order mark, ucs-bom as the first entry in 'fileencodings' should catch it. As far as I can tell, Vim doesn't recognize that this file uses 16 bits (two bytes) per character.
Note: I can/do solve the problem in the meantime by manually setting the file encoding when I open my ucs-2le files:
edit ++enc=ucs-2le
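To avoid retyping that every time, here is a sketch of an autocmd fallback (my own assumption, untested against the generated files): if a freshly loaded XML buffer still contains NUL bytes, which is the usual symptom of undetected UCS-2LE, reload it with the right encoding. The &fileencoding guard stops the reload from recursing.

```vim
" Sketch (hypothetical): reload undetected UCS-2LE XML files.
autocmd BufReadPost *.xml
      \ if &fileencoding !=# 'ucs-2le' && search('\%x00', 'nw') |
      \   edit ++enc=ucs-2le |
      \ endif
```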
Cheers.
Solved it. I am not sure exactly what did it, but the fixes noted below now read and write my UCS-2 files perfectly, though for some unknown reason not immediately (did I just restart Vim?). I could try reversing the fixes to see which one was the critical change, but here's what I've done (see also my comments on Jul 27 above):
Put the AutoFenc.vim plugin in my plugins folder (it automatically detects a file's encoding).
Added iconv.dll and a newer libintl.dll to my vim73 folder (from vim.org).
Edited my vimrc as below.
vimrc now contains (the last bits just make it easier to see what's happening with file encodings by showing the file encoding in the status line):
"use utf-8 by default
set encoding=utf-8
set fileencodings=ucs-bom,utf-8,ucs-2le,latin1
"always show status line
set laststatus=2
"show encoding in status line http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line
if has("statusline")
set statusline=%<%f\ %h%m%r%=%{\"[\".(&fenc==\"\"?&enc:&fenc).((exists(\"+bomb\")\ &&\ &bomb)?\",B\":\"\").\"]\ \"}%k\ %-14.(%l,%c%V%)\ %P
endif
And all is well.
Related
echo "UTF-16le text"|vim -
:set encoding=utf-16le
:set fileencoding=utf-16le
:e! ++enc=utf-16le
has absolutely no effect on the mojibake that is displayed on the screen. Though the last one (:e! ++enc=utf-16le) results in an error E32: No file name.
If I edit ~/.vimrc to set fileencodings=utf-16le,[...] then it works, but I shouldn't have to edit my configuration file every time I use vim, is there a better way? Preferably a way in which a key code will just cycle between my :set fileencodings, that way I can choose quickly if needed.
The command-line equivalent of ~/.vimrc is passing commands via --cmd. You can also employ :set^= (see :help :set^=) to prepend a value to an option:
echo "UTF-16le text"|vim --cmd 'set fencs^=utf-16le' -
I shouldn't have to edit my configuration file every time I use vim
First, I would test whether permanently keeping utf-16le in 'fileencodings' has any negative consequences for any files you regularly edit; maybe you can safely keep it in by default.
Second, there are plugins like AutoFenc, which extends the built-in detection, and fencview, which lets you choose the encoding from a menu.
Alternative
The problem with UTF-16 encodings is well known, and the byte order mark is one solution to make it easy to detect those. With such a BOM, Vim will correctly detect the encoding out-of-the-box. If your input is missing the BOM, you can manually prepend it:
{ printf '\xFF\xFE'; echo "UTF-16le text"; } | vim -
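For the key-cycling idea from the question, a minimal sketch (my own hypothetical mapping, not part of the answer above; the names g:enc_list and CycleFenc are invented, and it assumes the buffer has a file name so that :edit! ++enc can re-read it):

```vim
" Hypothetical helper: cycle the current file through candidate
" encodings with <F8>, reloading the buffer each time.
let g:enc_list = ['utf-8', 'utf-16le', 'latin1']
let g:enc_idx = 0
function! CycleFenc()
  let g:enc_idx = (g:enc_idx + 1) % len(g:enc_list)
  execute 'edit! ++enc=' . g:enc_list[g:enc_idx]
  echo '++enc=' . g:enc_list[g:enc_idx] . ' (fenc=' . &fileencoding . ')'
endfunction
nnoremap <F8> :call CycleFenc()<CR>
```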
Is vimgrep capable of searching unicode strings?
For example:
a.txt contains the wide string "hello"; vimgrep hello *.txt finds nothing, and of course the path is right.
"Unicode" is a bit misleading in this case. What you have is not at all typical of text "encoded in accordance with any of the method provided by the Unicode standard". It's a bunch of normal characters with normal code points separated with NULL characters with code point 0000 or 00. Some Java programs do output that kind of garbage.
So, if your search pattern is hello, Vim and :vim[grep] are perfectly capable of searching for and finding hello (without NULLs), but they won't ever find hello (with NULLs).
Searching for h^@e^@l^@l^@o (where ^@ is entered with <C-v><C-@>), on the other hand, will find hello (with NULLs) but not hello (without NULLs).
Anyway, converting that file/buffer, or making sure you don't end up with such garbage in the first place, are much better long-term solutions.
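For the conversion, one way to do it outside Vim (a sketch assuming the file really is little-endian UTF-16 and that iconv is available; the output file name is arbitrary):

```shell
# Convert the NUL-padded "wide" text from UTF-16LE to plain UTF-8.
iconv -f UTF-16LE -t UTF-8 a.txt > a.utf8.txt
```

After the conversion, vimgrep hello *.txt finds the text with no special settings.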
If Vim can detect the encoding of the file, then yes, Vim can grep the file. :vimgrep works by first reading in the file as normal (even including autocmds) into a hidden buffer, and then searching the buffer.
It looks like your file is little-endian UTF-16, without a byte-order mark (BOM). Vim can detect this, but won't by default.
First, make sure your Vim is running with internal support for unicode. To do that, :set encoding=utf-8 at the top of your .vimrc. Next, Vim needs to be able to detect this file's encoding. The 'fileencodings' option controls this.
By default, when you set 'encoding' to utf-8, Vim's 'fileencodings' option contains "ucs-bom", which will detect UTF-16, but ONLY if a BOM is present. To also detect it when no BOM is present, you need to add your desired encoding to 'fileencodings' yourself. It needs to come before any of the 8-bit encodings but after ucs-bom. Try putting this at the top of your .vimrc, then restart Vim:
set encoding=utf-8
set fileencodings=ucs-bom,utf-16le,utf-8,default,latin1
Now loading files with the desired encoding should work just fine for editing, and therefore also for vimgrep.
The CSS3PIE .htc filetype in Perforce seems to cause issues when requesting the latest version. Changing the filetype from Text to Binary directly in Perforce fixed the problem.
Out of curiosity, does anyone have an idea why the PIE.htc format causes these problems? Could it be the encoding, the filetype, or its attributes? If the PIE.htc file type is set to text in Perforce (as happens by default), it won't work when you try to submit it.
I don't know what a '.htc' file is, but if you want all your '.htc' files to be treated as binary files by Perforce, rather than as text files, you can use the Perforce 'typemap' feature to do this: use 'p4 typemap'.
This blog: http://blogs.encodo.ch/news/view_article.php?id=79 has a nice writeup.
Generally, the reason you do this is that Perforce tries to normalize the line ending characters for text files, so that the files are a single line-end on Unix computers, but are carriage-return/line-end pairs on Windows computers.
And if your file contains 0x0A bytes, but it is NOT textual data and those bytes do NOT indicate line ends, then turning them into CRLF pairs will corrupt it.
So then binary is the better filetype for such files.
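If you want binary to be the default for every .htc file in the depot, a sketch of the typemap entry (the depot path pattern here is an assumption; run p4 typemap and adapt the line to your depot layout):

```
TypeMap:
        binary //depot/....htc
```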
This could be a problem with a Windows-saved file opened on Unix/Linux, and I am not quite sure how to solve it.
When I open a file that was previously saved by another developer on Windows, my Vim buffer sometimes shows
Trying char-by-char conversion...
in the middle of the file, and I am unable to edit the code/text/characters right below this message in the buffer.
Why does it do that and how do I prevent this from happening?
This message comes from the Vim function mac_string_convert() in src/os_mac_conv.c. It is accompanied by the following comment:
conversion failed for the whole string, but maybe it will work for each character
Seems like the file you're editing contains a byte sequence that cannot be converted to Vim's internal encoding. It's hard to offer help without more details, but often, these help:
Ensure that you have :set encoding=utf-8
Check :set fileencodings? and ensure that the file you're trying to open is covered, or explicitly specify an encoding with :edit ++enc=... file
The 8g8 command can find an illegal UTF-8 sequence, so that you can remove it, in case the file is corrupted. Binary mode :set binary / :edit ++bin may also help.
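As a quick check outside Vim (a sketch; assumes iconv is installed), you can ask iconv to validate the file, since it stops at the first byte sequence that is not legal UTF-8:

```shell
# \377 (0xFF) is never valid in UTF-8, so iconv stops with an error.
printf 'good\377bad\n' > sample.txt
iconv -f UTF-8 -t UTF-8 sample.txt > /dev/null || echo 'invalid UTF-8'
```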
I've run into problems a few times because vim's encoding was set to latin1 by default and I didn't notice and assumed it was using utf-8. Now that I have, I'd like to set up vim so that it will do the right thing in all obvious cases, and use utf-8 by default.
What I'd like to avoid:
Forcing a file saved in some other encoding that would have worked before my changes to open as utf-8, resulting in gibberish.
Forcing a terminal that doesn't support multibyte characters (like the Windows XP one) to try to display them anyway, resulting in gibberish.
Interfering with other programs' ability to read or edit the files (I have a (perhaps unjustified) aversion to using a BOM by default because I am unclear on how likely it is to mess other programs up.)
Other issues that I don't know enough about to guess at (but hopefully you do!)
What I've got so far:
if has("multi_byte")
if &termencoding == ""
let &termencoding = &encoding
endif
set encoding=utf-8 " better default than latin1
setglobal fileencoding=utf-8 " change default file encoding when writing new files
"setglobal bomb " use a BOM when writing new files
set fileencodings=ucs-bom,utf-8,latin1 " order to check for encodings when reading files
endif
This is taken and slightly modified from the Vim wiki. I moved bomb from the setglobal fileencoding line to its own statement because otherwise it doesn't actually work. I also commented that line out because of my uncertainty about BOMs.
What I'm looking for:
Possible pitfalls to avoid that I missed
Problems with the existing code
Links to anywhere this has been discussed / set out already
Ultimately, I'd like this to result in a no-thought-required copy/paste snippet that will set up vim for utf-8-by-default that will work across platforms.
EDIT: I've marked my own answer as accepted for now, as far as I can tell it works okay and accounts for all things it can reasonably account for. But it's not set in stone; if you have any new information please feel free to answer!
In response to sehe, I'll give a go at answering my own question! I removed the updates I made to the original question and have moved them to this answer. This is probably the better way to do it.
The answer:
if has("multi_byte")
if &termencoding == ""
let &termencoding = &encoding
endif
set encoding=utf-8 " better default than latin1
setglobal fileencoding=utf-8 " change default file encoding when writing new files
endif
I removed the bomb line because according to the BOM Wikipedia page it is not needed when using utf-8 and in fact defeats ASCII backwards compatibility. As long as ucs-bom is first in fileencodings, vim will be able to detect and handle existing files with BOMs, so it is not needed for that either.
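To see why it defeats ASCII compatibility: the UTF-8 BOM is three extra bytes (EF BB BF) prepended to the file, which a BOM-unaware ASCII tool reads as data. A quick shell demonstration (octal escapes used for portability):

```shell
# Write a file with a UTF-8 BOM (bytes EF BB BF, octal 357 273 277)
# and dump its first three bytes.
printf '\357\273\277hello\n' > with_bom.txt
head -c 3 with_bom.txt | od -An -tx1    # the BOM: ef bb bf
```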
I removed the fileencodings line because it is not needed in this case. From the Vim docs: When 'encoding' is set to a Unicode encoding, and 'fileencodings' was not set yet, the default for 'fileencodings' is changed.
I am using setglobal fileencoding (as opposed to set fileencoding) because:
When reading a file, 'fileencoding' is set automatically based on 'fileencodings', so it only matters for new files. And according to the docs again:
For a new file the global value of
'fileencoding' is used.
I think it would suffice to have a vanilla vimrc + fenc=utf-8
The rest should be pretty decent out-of-the-box
I'd use the BOM only on Windows platforms with Microsoft tooling (although even some of those fail to always write a BOM; it is, however, the default for Notepad's Unicode saving, .NET's XmlWriter, and other central pieces of the MS platform tools).