I am using Hiew32 to find some Arabic sentences inside an EXE file.
Before launching Hiew32.exe I tried switching to the Lucida Console font and running:
chcp 65001
But the result is the same; is there a method to view UTF-8 characters in DOS mode?
Consider redirecting the output with Hiew32 > file.txt and opening the file in Notepad.
Or consider using powershell.exe and starting the command from there.
Powershell_ise.exe has even better Arabic support.
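As a sketch of the redirect approach suggested above, run from cmd.exe (the file names here are placeholders):
chcp 65001
hiew32.exe target.exe > dump.txt
notepad dump.txt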
What is the best way to view an EBCDIC formatted file in VIM?
On a mainframe
First of all, if Vim was compiled and is running on a system whose default encoding is EBCDIC (e.g. an IBM mainframe with an ANSI C compiler), then Vim will open EBCDIC files in the system's code page by default. Such an instance of Vim will have:
has("ebcdic")
evaluating to 1. And when invoked with --version Vim will print:
+ebcdic
Instances of Vim not compiled in an EBCDIC environment will never evaluate has("ebcdic") to true. This feature of Vim is needed because other features behave differently in a purely EBCDIC environment.
Not on a mainframe
Yet, most systems today do not use EBCDIC code pages. For the situation where a file encoded in an EBCDIC code page needs to be edited in Vim on a more popular system, Vim uses the iconv library. In essence, to be able to open a file encoded in an EBCDIC code page Vim needs to be compiled with iconv support. iconv support can be tested by evaluating:
has("iconv")
Or searching for the
+iconv
string in the output of vim --version.
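For example, a quick check from the shell (just an illustrative one-liner):
vim --version | grep -o '[+-]iconv'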
EBCDIC has several code pages, and Vim will only be capable of using the code pages supported by the iconv library it was compiled against. To check which code pages are available you can use the iconv utility that comes together with the iconv library:
iconv --list
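The list is long on a typical GNU system, so filtering it helps, e.g.:
iconv --list | grep -i ebcdic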
Now, let's assume that we have a file called myfile encoded with the EBCDIC-US code page (also called the EBCDIC-037 code page) and that the iconv installed on the system supports this code page.
Before opening the file in Vim we need to set Vim's encoding to utf-8; in ~/.vimrc we need:
set enc=utf-8
:h fenc advises that the encoding must be set to utf-8 if any file conversion (through iconv) is performed. Otherwise data loss is possible when writing the file back.
Now we open the file in Vim with vim myfile and see mangled characters. That is fine; we now need to perform the conversion using iconv with:
:e ++enc=EBCDIC-US
Vim will now display the file in utf-8 but will save the file in EBCDIC-US, both accomplished using iconv conversions on the fly.
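Putting the steps together, a minimal session could look like this (assuming myfile really is EBCDIC-037, which this particular iconv names EBCDIC-US):
" in ~/.vimrc
set enc=utf-8
" in Vim, after opening the file with vim myfile and seeing mangled text
:e ++enc=EBCDIC-US
:w
The :w at the end writes the buffer back in EBCDIC-US, converted on the fly.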
Closing notes
The mapping between IBM's naming of code pages:
EBCDIC-037
EBCDIC-273
EBCDIC-500
...
and iconv's names
EBCDIC-US
EBCDIC-AT-DE-A
EBCDIC-BE
...
is often non-trivial. Yet, if the display encoding (enc) is set to utf-8, there should be no issue in trying different code pages with
:e ++enc=EBCDIC-US
:e ++enc=EBCDIC-AT-DE-A
until the correct conversion is found.
Extra note: consider using vi.SE if you have more questions related to Vim.
Is vimgrep capable of searching unicode strings?
For example:
a.txt contains the wide string "hello"; vimgrep hello *.txt finds nothing, and the file is certainly in the right path.
"Unicode" is a bit misleading in this case. What you have is not at all typical of text "encoded in accordance with any of the method provided by the Unicode standard". It's a bunch of normal characters with normal code points separated with NULL characters with code point 0000 or 00. Some Java programs do output that kind of garbage.
So, if your search pattern is hello, Vim and :vim are perfectly capable of searching for and finding hello (without NULLs) but they won't ever find hello (with NULLs).
Searching for h^@e^@l^@l^@o (where ^@ is entered with <C-v><C-@>), on the other hand, will find hello (with NULLs) but not hello (without NULLs).
Anyway, converting that file/buffer, or making sure you don't end up with such garbage in the first place, are much better long-term solutions.
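If converting is the route you take, a one-off iconv call outside Vim will do it (assuming the file really is little-endian UTF-16 without a BOM, as diagnosed in the next answer):
iconv -f UTF-16LE -t UTF-8 a.txt > a.utf8.txt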
If Vim can detect the encoding of the file, then yes, Vim can grep the file. :vimgrep works by first reading in the file as normal (even including autocmds) into a hidden buffer, and then searching the buffer.
It looks like your file is little-endian UTF-16, without a byte-order mark (BOM). Vim can detect this, but won't by default.
First, make sure your Vim is running with internal support for unicode. To do that, :set encoding=utf-8 at the top of your .vimrc. Next, Vim needs to be able to detect this file's encoding. The 'fileencodings' option controls this.
By default, when you set 'encoding' to utf-8, Vim's 'fileencodings' option contains "ucs-bom", which will detect UTF-16, but ONLY if a BOM is present. To also detect it when no BOM is present, you need to add your desired encoding to 'fileencodings'. It needs to come before any of the 8-bit encodings but after ucs-bom. Try putting this at the top of your .vimrc and restart Vim:
set encoding=utf-8
set fileencodings=ucs-bom,utf-16le,utf-8,default,latin1
Now loading files with the desired encoding should work just fine for editing, and therefore also for vimgrep.
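With those settings in place, the original search should work as expected, e.g.:
:vimgrep /hello/ *.txt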
I am working with text files that contain a lot of unicode characters (≼, ⊓, ⊔, ...). Vim displays them fine, but when I print they are replaced by a generic character. Gedit prints them without problem, but it's a bit of a pain to launch another editor just to print.
Is there a way to get vim (on Linux/Gnome) to print properly? I tried using vim-gnome, in hope that it would use the same infrastructure as gedit, but it does not.
Vim is only able to use 8-bit encodings for printing. If there is an encoding that includes all of those characters, all you need is to use
set printencoding={encoding}
If there is not, then you can’t print it from Vim directly. You can use the :TOhtml command suggested by @DaoWen:
:TOhtml
:w /tmp/print.html
:!command-that-makes-browser-print-a-file(I-do-not-know-one) /tmp/print.html
:!rm /tmp/print.html
You can also use my formatvim plugin to print this to PDF through LaTeX (don’t forget to file bug reports: latex-xcolor output is untested):
:Format format latex-xcolor to /tmp/print.tex
:!pdflatex /tmp/print.tex && lp /tmp/print.pdf && rm /tmp/print.*
(you can use HTML output as well, but I do not know a command to print that either). Of course, you can map these to a single key.
Try using the :TOhtml command to convert your buffer to HTML. You should be able to print the resulting file from your browser.
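For instance, assuming a command-line HTML-to-PDF converter such as wkhtmltopdf is installed (any similar tool would do; this is only a sketch), the whole round trip could be:
:TOhtml
:w /tmp/print.html
:!wkhtmltopdf /tmp/print.html /tmp/print.pdf && lp /tmp/print.pdf
:!rm /tmp/print.*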
I have some problems when working on remote files (Perl scripts) with Komodo IDE. There is (as far as I know) no way to change the FTP transfer mode from binary to ASCII, which results in a "^M" character at the end of every line. My setup is a Linux server and a Windows client. Is there any way to solve this issue without the necessity of correcting the saved file on Linux every time? This behaviour disqualifies Komodo IDE, which was my favourite IDE until now.
The "^M" you observe has nothing to do with your file being ASCII, but line ending format (carriage return and line feed characters.)
I have not verified this, but Komodo can reportedly be configured to save files using a different line-ending method. Saving files in DOS mode is not needed anymore, since most editors recognize the UNIX file format nowadays.
Add the -w switch to your Perl shebang.
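That is, the first line of the script would look like:
#!/usr/bin/perl -w
(-w itself just enables Perl's warnings; the premise of this suggestion, which I have not verified, is that the trailing carriage return is then absorbed by the switch instead of the interpreter path.)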
I want to create a lab write-up with LaTeX in Ubuntu; however, my text includes Scandinavian characters, and at present I have to type them in using \"a and \"o etc. Is it possible to get the LaTeX compiler to read these special characters when they are typed in as is? Additionally, I would like Vim to "read" Finnish: right now, when I open a .tex document containing Scandinavian characters, they are not displayed at all in Vim. How can I correct this?
For latex, use the inputenc option:
\usepackage[utf8]{inputenc}
Instead of utf8, you may use whatever else fits you, like latin1.
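As a minimal sketch of a compilable document (assuming the .tex file itself is saved as UTF-8; the fontenc line is an addition I would usually pair with it):
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\begin{document}
Hyvää päivää! ä ö å
\end{document}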
Now the trick is to make your terminal run the same character encoding. It seems that it runs a character/input encoding that doesn't fit your input right now.
For this, refer to the "Locale" settings of your distribution. You can always check the locale settings in the terminal by issuing locale. These days, UTF-8 locales are preferred as they work with every character imaginable. If your terminal's environment is set up correctly, Vim should happily work with all your special characters without complaint.
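For example (fi_FI.UTF-8 is only an illustration; pick a locale that locale -a actually lists on your system):
locale                    # show current settings
locale -a | grep -i utf   # list available UTF-8 locales
export LANG=fi_FI.UTF-8   # switch the current shell session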
To find out in which encoding Vim thinks the document is, try:
:set enc
To set the encoding to UTF-8, try:
:set enc=utf8
I can't help with Vim, but for LaTeX I recommend you check out XeTeX, which is an extension of TeX designed to support Unicode input. XeTeX is now part of TeX Live, so if you have TeX installed, chances are you already have it.
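With XeTeX the inputenc trick is unnecessary; a minimal sketch (assuming the Latin Modern OpenType fonts are installed, as they are in a full TeX Live) would be:
\documentclass{article}
\usepackage{fontspec}
\setmainfont{Latin Modern Roman}
\begin{document}
ä ö å
\end{document}
Compile it with xelatex instead of pdflatex.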
I use the UCS unicode support: http://iamleeg.blogspot.com/2007/10/nice-looking-latex-unicode.html