Problem printing central european characters in Vim

Problem printing central european characters in Vim - vim

Here's the problem in a nutshell.
I wrote a text file which I need to print out (in a hurry) that contains central european characters (šđčćž/ŠĐČĆŽ).
Vim's encoding settings are as folows;
set encoding=cp1250
set fileencoding=
Upon printing out comes garbage. What should be changed to fix that?
Really hate Vim's frekin' 1001 options in a time like this. Can't it do a simple thing and just print what's on screen?!

Check the option printencoding.
The help says it's empty by default, and when the encoding is multi-byte Vim tries to convert them to the printencoding. Plus, if it's empty "the conversion will be to latin1". This is what may be causing the trouble.
I'd like to ask: why not to use UTF-8?

Related

(VIM) Is vimgrep capable of searching unicode string

Is vimgrep capable of searching unicode strings?
For example:
a.txt contains wide string "hello", vimgrep hello *.txt found nothing, and of course it's in the right path.

"Unicode" is a bit misleading in this case. What you have is not at all typical of text "encoded in accordance with any of the method provided by the Unicode standard". It's a bunch of normal characters with normal code points separated with NULL characters with code point 0000 or 00. Some Java programs do output that kind of garbage.
So, if your search pattern is hello, Vim and :vim are perfectly capable of searching for and finding hello (without NULLs) but they won't ever find hello (with NULLs).
Searching for h^#e^#l^#l^#o (^# is <C-v><C-#>), on the other hand, will find hello (with NULLs) but not hello (without NULLs).
Anyway, converting that file/buffer or making sure you don't end up with such a garbage are much better long-term solutions.

If Vim can detect the encoding of the file, then yes, Vim can grep the file. :vimgrep works by first reading in the file as normal (even including autocmds) into a hidden buffer, and then searching the buffer.
It looks like your file is little-endian UTF-16, without a byte-order mark (BOM). Vim can detect this, but won't by default.
First, make sure your Vim is running with internal support for unicode. To do that, :set encoding=utf-8 at the top of your .vimrc. Next, Vim needs to be able to detect this file's encoding. The 'fileencodings' option controls this.
By default, when you set 'encoding' to utf-8, Vim's 'fileencodings' option contains "ucs-bom" which will detect UTF-16, but ONLY if a BOM is present. To also detect it when no BOM is present, you need to add your desired encoding to 'fileencodings'. It needs to come before any of the 8-bit encodings but after ucs-bom. Try doing this at the top of your .vimrc and restart Vim to use:
set encoding=utf-8
set fileencodings=ucs-bom,utf-16le,utf-8,default,latin1
Now loading files with the desired encoding should work just fine for editing, and therefore also for vimgrep.

How does vim know encoding of my files? When even I don't

I have been in charset-hell for days and vim somehow always shows the right charset for my file when even I'm not sure what they are (I'm dealing with files with identical content encoded in both charsets, mixed together)
I can see from inspecting the ü (u-umlaut) character in UTF-8 vs ISO-8859-1 which encoding I'm in, but I don't understand how vim figured it out - in those character-sets only the 'special characters' really look any different
If there is some other recording of the encoding/charset information I would love to know it

The explanation can be found under :help 'fileencodings':
This is a list of character encodings considered when starting to edit
an existing file. When a file is read, Vim tries to use the first
mentioned character encoding. If an error is detected, the next one
in the list is tried. When an encoding is found that works,
'fileencoding' is set to it. If all fail, 'fileencoding' is set to
an empty string, which means the value of 'encoding' is used.
So, there's no magic involved. When there's a Byte Order Mark in the file, that's easy. Else, Vim tries some other common encodings (which you can influence with that option; e.g. Japanese people will probably include something like sjis if they frequently edit such encoded files).
If you want a more intelligent detection, there are plugins for that, e.g. AutoFenc - Tries to automatically detect and set file encoding.

Vim UTF-8 encoding error on Windows

I have a text file with Polish characters. As long as I do not set :set encoding=utf-8 the characters are not displayed correctly. As soon as I set it to Unicode the characters are displayed but umlauts in error messages in Vim on the other hand are not displayed anymore.
Example:
E37: Kein Schreibvorgang seit der letzten <c4>nderung (erzwinge mit !)
Instead of the <c4> there should be the character Ä displayed. Can anybody explain me why this happens?

I'm experiencing similar issues (you can view some of the questions in my account info, or search for "central european characters" or "croatian characters").
Changing the encoding value changes the way Vim displays the characters - so, the way some of the characters are displayed is changed - that's why you're getting characters. You could probably solve your problem of Polish characters by choosing some other encoding value (one of the cpXXXX for example instead of utf8), but then you would lose the ability to display utf8 characters which can make Vim rather pretty. At least this works for my case (Croatian).
So, either use while writing polish texts one of the cpXXXX encoding values, or stick to utf8 completely. I recommend the first one. But do not change them.
Still working on that here.

Changing text appearence in vim

Suppose I have a file, whose entire contents is:
\u1234
and suppose 1234 is the code for \alpha
is there a way to, in vim, have the "\1234" show up as a single \alpha symbol (and be treated as an \alpha symbol) ?
Thanks!
[This problem arises since I want to to use unicode names in g++]

I really don't think this is possible, since vim is designed to present and edit the actual contents of a file (it is not a WYSIWIG editor). I wouldn't recommend it for the exact same reasons, even if you find a way to do it - it will lead to confusion in the future (once you forget the feature, or if it is triggered in a document you weren't expecting, or the script contains a bug, etc.)

There is a plugin for Vim to display certain characters in Haskell as the Unicode symbol: http://www.vim.org/scripts/script.php?script_id=2603.
It even decodes the symbols to their text-representation before writing (and vice versa after reading a Haskell file).
I guess you can do the same, just inspect the plugin's source.

One thing you could do is define custom highlighting so that you know that that character maps to something else. May not be exactly what you want but as soulmerge points out anything else may be dangerous/unclear.

Using vim+LaTeX with Scandinavian characters

I want to create a lab write-up with LaTeX in Ubuntu, however my text includes Scandinavian characters and at present I have to type them in using /"a and "/o etc. Is it possible to get the latex-compiler to read these special characters when they are typed in as is? Additionally, I would like vim to "read" Finnish: Now when I open a .tex-document containing Scandinavian characters, they are not displayed at all in vim. How can I correct this?

For latex, use the inputenc option:
\usepackage[utf8]{inputenc}
Instead of utf8, you may use whatever else fits you, like latin1, as well.
Now the trick is to make your terminal run the same character encoding. It seems that it runs a character/input encoding that doesn't fit your input right now.
For this, refer to the "Locale" settings of your distribution. You can always check the locale settings in the terminal by issueing locale. These days, UTF8 locales are preferred as they work with every character imaginable. If your terminal's environment is set up correctly, vim should happily work with all your special characters without mourning.

To find out in which encoding Vim thinks the document is, try:
:set enc
To set the encoding to UTF-8, try:
:set enc=utf8

I can't help with vim, but for LaTeX I recommend you check out XeTeX, which is an extension of TeX that is designed to support Unicode input. XeTeX is now part of Texlive, so if you have TeX installed chances are you already have it.

I use the UCS unicode support: http://iamleeg.blogspot.com/2007/10/nice-looking-latex-unicode.html

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Problem printing central european characters in Vim - vim

Check the option printencoding. The help says it's empty by default, and when the encoding is multi-byte Vim tries to convert them to the printencoding. Plus, if it's empty "the conversion will be to latin1". This is what may be causing the trouble. I'd like to ask: why not to use UTF-8?

Related

(VIM) Is vimgrep capable of searching unicode string

How does vim know encoding of my files? When even I don't

Vim UTF-8 encoding error on Windows

Changing text appearence in vim

Using vim+LaTeX with Scandinavian characters

Categories

Resources