As you can see, D fails to output german Umlaute. At least on Windows. On Linux or BSD the same program outputs the string as I've saved it.
I already tried wstring or dstring, but the output is the same.
What am I doing wrong?
D will output UTF-8 regardless of the operating system. How the output will be interpreted depends on how it is displayed. In this particular case, it looks like your IDE is interpreting the output as if it was encoded in the Windows-1252 encoding.
For the standard Windows console, you could change the output encoding by calling SetConsoleOutputCP(65001), but note that this may have some undesired side effects (you should restore the codepage before your progam exits, and batch files may not run while the console output codepage is set to 65001).
CyberShadows post guided me to an acceptable answer. :-)
In Eclipse it is possible to change the output-encoding without changing global settings of the OS.
Go to Run --> Run-Configurations...
There select the Common-Tab and change the encoding to UTF-8. Now german Umlaute are displayed correctly. At least in Eclipse. :-)
Another possibility is to use https://babun.github.io/ . It is a Cygwin-based Shell that ouputs UTF-8:
Related
This is such a basic question I am surprised I could not easily find an answer to it:
I use Notepad++ to write my scripts in. Someone sent me some code for a shell script (.sh) that I could modify to suit my needs. I simply changed a small bit of text using Notepad++ (on Windows) and used FileZilla (SFTP) to upload it to my server (Debian Linux).
There were a few problems with this that it took my server admin an hour to find, namely:
FileZilla, for whatever reason, defaults to ASCII rather than binary! (changed it to binary and removed the .sh association with ASCII)
The permissions were wrong, chmod took care of this
Problem is it STILL did not work. To fix it my server admin simply copied the text right on the server (using vim or nano) into a new shell script file and saved that. Before he kept saying the problem was Windows (which he loves to hate on) but it seems it is the encoding that text-editors are using that is corrupting the files.
He said my text-editor encoding needs to be said to "None". However, that is not an option - only ANSI, UTF and UTS variants are options!
How can I create a shell script on Windows with no encoding whatsoever so that it doesn't get corrupted?
I need to be able to simply transfer the file to the server, I can't mess around with modifying it once on the server which is wholly impractical.
To fix it EndOfLine and encoding on Notepad++ :
On the bottom right of Notepad++ you can right click on the left of the encoding "UTF-8" and click on Convert UNIX(LF) format. Be sure to change encoding to UTF-8 if it is not the case.
In Filezilla :
Transfert mode : auto
I want to view some ansi-art on the linux local-console. (my setup:raspberry pi3 / newest raspbian - no x11)
i've tried many different settings in raspi-config, dpkg-reconfigure console-setup, /etc files, environment vars but i had no luck yet. do i need a special pcf font to get it working?
a reliable way to enable it for remote terminals would also be great.
thanks in advance
It depends on what your data uses (see chart). Codes 0..31 are a problem unless you have a program that can map those codes to a printable value (as noted in Why does showconsolefont have different output in tmux?, the showconsolefont program does this mapping of 0..31).
Most of the usable fonts for the Linux console are "psf" fonts: having a header which tells which Unicode values each glyph corresponds to. Using that, along with a known character set (cp437), you could convert the data or "play" it using an application which knows how to do this:
You could convert it using iconv or recode, or
The line-drawing (128..255) could be done using luit in a UTF-8 console.
A simple query about expected behaviour when compiling Windows-1252 characters under UTF-8. When building using an ant task on java source code it seems that some weird character encoding occurs.
For certain fields characters that are normally encoded as \u2013 on the windows machine for example, turn into \226 on Linux. What is the explanation for the \226? Will it still be rendered correctly on a browser, for example?
Given a text file in ubuntu (or debian unix in general), how do I find out the file encoding of the file ? Can I run od or hexdump on it to fingerprint its encoding ? What should I be looking out for ?
There are many tools to do this. Try a web search for "detect encoding". Here are some of the tools I found:
The Internationalizations Classes for Unicode (ICU) are a great place to start. See especially their page on Character Set Detection.
Chardet is a Python module to guess the encoding
of a file. See chardet.feedparser.org
The *nix command-line tool file detects file types, but might also detect encodings if mentioned in the file (e.g. if there's a mime-type notation in
the file). See man file
Perl modules Encode::Detect and Encode::Guess .
Someone asked a similar question in StackOverflow. Search for the question, PHP: Detect encoding and make everything UTF-8. That's in the context of fetching files from the net and using PHP, but you could write a command-line PHP script.
Note well what the ICU page says about character set detection: "Character set detection is ..., at best, an imprecise operation using statistics and heuristics...." In my experience the problem domain makes a big difference in how easy or difficult the job is. Don't forget that it's possible that the octets in a file can be of ambiguous encoding, i.e. sensibly interpreted using multiple different encodings. They can also be of mixed encoding, i.e. different subsets of the octets make sense interpreted in different encodings. This is why there's not a single command-line tool I can recommend which always does the job.
If you have a single file and you just want to get it into a known encoding, my trick is to open the file with a text editor which can import using a bunch of different encodings, such as TextWrangler or OpenOffice.org. First, open the file and let the editor guess the encoding. Take a look at the result. If you aren't satisfied with it, guess an encoding, open the file with the editor specifying that encoding, and take a look at the result. Then save as a known encoding, e.g. UTF-16.
You can use enca. Enca is a small command line tool for encoding detection and convertion.
You can install it at debian / ubuntu by:
apt-get install enca
In order to use it, just call
enca FILENAME
Also see the manpage for more information.
If I enter
print("®".length)
in smjs, it prints 2. If I enter javascript:alert("®".length) in my firefox as well as opera, it prints 1. Rhino prints 1 too.
Is it possible to tell smjs that I want to treat such characters as single character? Os: linux(Ubuntu 9.04), locale: UTF-8.
The JS shell is (maybe unfortunately :-) not locale-aware. It looks like this problem has been fixed by Bug 648102 -- all input lines in the shell are decoded as UTF-8. Yay progress!
$ ./js
js> print("®".length)
1
And I get the same result in Firefox's fancy new JS scratchpad, which confirms your original javascript:-based result!