Character encoding in Linux for Java - linux

We had to recompile existing Java code today to change a directory path that was hardcoded in the original code.
The code was recompiled on Linux with the following command:
javac -verbose -classpath /data/flexapp/maxmind/api/GeoIPJava-1.2.8/source:. CityLookupTest2.java
We saw differences between the output of the newly compiled code and that of the old code, and the differing bytes seem to be the double-byte (international) characters. Compare the city field in the two lines below:
[flexapp#gaalplpclu02df pp]$ cat currtrackgeo.201301031445.run1|grep 1ZY9307E6894015571
01/03/2013 14:46:43.004,TrackHTML,1ZY9307E6894015571,,92.56.217.169,en_US,,ES,51,Vélez-málaga,null,36.7726,-4.100403,0,0
[flexapp#gaalplpclu02df pp]$ cat currtrackgeo.201301031445|grep 1ZY9307E6894015571
01/03/2013 14:46:43.004,TrackHTML,1ZY9307E6894015571,,92.56.217.169,en_US,,ES,51,V?lez-m?laga,null,36.7726,-4.100403,0,0
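Both lines can come from the very same String in memory; what differs is the charset used when that String is turned into bytes. A minimal sketch, assuming the city name reaches the output file as an ordinary Java String (`EncodingDemo` and `roundTrip` are hypothetical names): encoding with US-ASCII replaces every character the charset cannot represent with `?`, which reproduces the old line exactly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: the same String, encoded with two different charsets.
class EncodingDemo {

    // Encode text with the given charset, then decode the bytes back.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot represent; for US-ASCII that byte is '?'.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The city name with accented e and a, written as Unicode escapes
        // so that this source file is plain ASCII.
        String city = "V\u00e9lez-m\u00e1laga";

        // Unchanged by the round trip; whether the accents appear on screen
        // still depends on the charset System.out uses.
        System.out.println(roundTrip(city, StandardCharsets.UTF_8));

        // Prints V?lez-m?laga: both accented letters were replaced.
        System.out.println(roundTrip(city, StandardCharsets.US_ASCII));

        // The charset the JVM falls back to when none is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

If the default charset reported in the environment where the old job ran was plain ASCII (for example `ANSI_X3.4-1968` under a `C`/`POSIX` locale), that would be one explanation for the `?` characters; this is an assumption to check, not something the thread confirms. Note that `javac -encoding` only controls how the `.java` source file is read, so it would have no effect here if the source itself contains no accented literals.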
I tried the following encoding parameters, but they all produced the same output:
-encoding ISO-8859-1
-encoding ISO-8859-2
-encoding UTF-8
The original programmer has already left, and there is no documentation on how the old code was compiled. Do you think an encoding parameter was applied that replaced all double-byte characters with ? (é and á in this example)?
I hope you can help, since I do not want to confuse my user community with the new variation in this column.
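If the goal is output that no longer depends on the server's locale, one option is to name the charset explicitly wherever the program writes its file instead of relying on the JVM default. A minimal sketch, assuming the program writes through a stream it opens itself (the thread does not show the output code; `PinnedOutput` and `writerFor` are hypothetical names): pass US-ASCII to reproduce the old `?` behaviour deliberately, or UTF-8 to keep the accents.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: wrap any OutputStream in a PrintWriter whose charset
// is fixed in code, so the JVM's default charset no longer matters.
class PinnedOutput {

    static PrintWriter writerFor(OutputStream sink, Charset charset) {
        // autoFlush = true: each println pushes the encoded bytes to sink.
        // OutputStreamWriter replaces unmappable characters with the
        // charset's substitution sequence ('?' for US-ASCII) instead of failing.
        return new PrintWriter(new OutputStreamWriter(sink, charset), true);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PinnedOutput out.txt US-ASCII   (or UTF-8)
        String path = args.length > 0 ? args[0] : "out.txt";
        Charset charset = args.length > 1 ? Charset.forName(args[1]) : StandardCharsets.UTF_8;
        try (PrintWriter out = writerFor(new FileOutputStream(path), charset)) {
            out.println("V\u00e9lez-m\u00e1laga");
        }
    }
}
```

Fixing the charset in code means the column looks the same no matter which user, shell or cron environment runs the job.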

Related

Possible to force CMake/MSVC to use UTF-8 encoding for source files without a BOM? C4819

All our source code is valid UTF-8; however, some users on Windows cannot build it because their system is configured for a different encoding.
Without adding a BOM to source files, is it possible to tell MSVC to treat all source as UTF-8, irrespective of the user's system encoding?
See MSDN's page on this topic (which requires adding a BOM header).
You can try:
add_compile_options("$<$<C_COMPILER_ID:MSVC>:/utf-8>")
add_compile_options("$<$<CXX_COMPILER_ID:MSVC>:/utf-8>")
By default, Visual Studio detects a byte-order mark to determine if the source file is in an encoded Unicode format, for example, UTF-16 or UTF-8. If no byte-order mark is found, it assumes the source file is encoded using the current user code page, unless you have specified a code page by using /utf-8 or the /source-charset option.
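The detection rule quoted above can be sketched in a few lines. This is not MSVC's code, only an illustration in Java of the described behaviour (`BomSniffer` and `detect` are hypothetical names, and only the UTF-8 and UTF-16 marks are checked): trust a byte-order mark if there is one, otherwise fall back to a caller-supplied charset standing in for the user's code page, or for whatever `/utf-8` forces.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of BOM-first detection with a fallback charset.
class BomSniffer {

    static Charset detect(byte[] head, Charset fallback) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        return fallback; // no BOM: the encoding is only an assumption
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'x'};
        byte[] withoutBom = {'x'};
        Charset userCodePage = Charset.forName("windows-1252"); // assumed code page
        System.out.println(detect(withBom, userCodePage));    // UTF-8
        System.out.println(detect(withoutBom, userCodePage)); // windows-1252
    }
}
```

The second call shows the problem from the question: without a BOM, a valid UTF-8 file is silently read with whatever fallback the machine happens to have.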
References
Docs - Visual C++ - Documentation - IDE and Tools - Building - Build Reference: /utf-8 (Set Source and Executable character sets to UTF-8)
If you are writing cross-platform code, solving the problem with a command-line switch such as
add_compile_options("$<$<C_COMPILER_ID:MSVC>:/utf-8>")
add_compile_options("$<$<CXX_COMPILER_ID:MSVC>:/utf-8>")
or by adding /utf-8 or /source-charset to the CFLAGS means you may have to do something similar for other platforms as well.
If possible, it may therefore be better to avoid the problem rather than solve it, by using a \uxxxx escape instead of a literal Unicode character in strings. This way the source specifies which Unicode characters to use but does not actually contain them.
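Java supports the same kind of escape, and javac translates \uXXXX sequences before it tokenizes the source, so a file written this way is pure ASCII and compiles identically under any `-encoding` value. A small sketch (the class name is hypothetical) using the city name from the first question:

```java
// Hypothetical example: this source file contains only ASCII bytes, yet the
// constant holds the accented city name, because javac replaces each
// Unicode escape with the character it names before compiling.
class AsciiOnlySource {

    static final String CITY = "V\u00e9lez-m\u00e1laga";

    public static void main(String[] args) {
        // List the code points to show the escapes became real characters
        // (prints U+0056 U+00E9 U+006C ...).
        StringBuilder sb = new StringBuilder();
        for (char c : CITY.toCharArray()) {
            sb.append(String.format("U+%04X ", (int) c));
        }
        System.out.println(sb.toString().trim());
    }
}
```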

D Language fails to display German umlauts on Windows?

As you can see, D fails to output German umlauts, at least on Windows. On Linux or BSD the same program outputs the string exactly as I saved it.
I already tried wstring and dstring, but the output is the same.
What am I doing wrong?
D will output UTF-8 regardless of the operating system. How the output will be interpreted depends on how it is displayed. In this particular case, it looks like your IDE is interpreting the output as if it was encoded in the Windows-1252 encoding.
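The misreading described here is easy to reproduce. Java is used only to illustrate the byte mapping, and `Mojibake` is a hypothetical name: take the UTF-8 bytes a program emits and decode them as Windows-1252, as the IDE apparently did. Each umlaut turns into two characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: correct UTF-8 output, misread by a viewer that
// assumes Windows-1252.
class Mojibake {

    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // what the program emits
        return new String(utf8, Charset.forName("windows-1252"));  // what the viewer shows
    }

    public static void main(String[] args) {
        String umlauts = "\u00e4\u00f6\u00fc"; // a, o and u with umlaut
        // Each letter is two UTF-8 bytes (0xC3 plus one more), and each byte
        // becomes its own Windows-1252 character, starting with A-tilde.
        System.out.println(misread(umlauts).length()); // 6
    }
}
```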
For the standard Windows console, you could change the output encoding by calling SetConsoleOutputCP(65001), but note that this may have some undesired side effects (you should restore the code page before your program exits, and batch files may not run while the console output code page is set to 65001).
CyberShadow's post guided me to an acceptable answer. :-)
In Eclipse it is possible to change the output-encoding without changing global settings of the OS.
Go to Run --> Run-Configurations...
There, select the Common tab and change the encoding to UTF-8. Now German umlauts are displayed correctly, at least in Eclipse. :-)
Another possibility is to use https://babun.github.io/ , a Cygwin-based shell that outputs UTF-8.

Windows to UTF-8 Character Encoding Behaviour Query

A simple question about expected behaviour when compiling Windows-1252 characters as UTF-8. When building Java source code with an Ant task, some odd character encoding seems to occur.
For certain fields, characters that are encoded as \u2013 on the Windows machine, for example, turn into \226 on Linux. What is the explanation for the \226? Will it still render correctly in a browser?
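One likely reading of the number: \226 is an octal escape for the byte 0x96 (decimal 150), and 0x96 is how Windows-1252 encodes U+2013, the en dash. A small Java check of that arithmetic (the class and method names are hypothetical):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical check: how U+2013 (en dash) is stored in Windows-1252, and
// what that byte turns into if it is later read as UTF-8.
class EnDash {

    static byte windows1252Byte() {
        byte[] bytes = "\u2013".getBytes(Charset.forName("windows-1252"));
        return bytes[0]; // a single byte in this code page
    }

    public static void main(String[] args) {
        int b = windows1252Byte() & 0xFF;
        System.out.println(Integer.toHexString(b));   // 96
        System.out.println(Integer.toOctalString(b)); // 226
        String asUtf8 = new String(new byte[] {(byte) b}, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(asUtf8.charAt(0))); // fffd
    }
}
```

So whether a browser shows a dash depends on whether it is told the bytes are Windows-1252; decoded as UTF-8, a lone 0x96 is invalid and, as the last line shows for Java's decoder, typically becomes the replacement character U+FFFD.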

wxFormBuilder and Unicode labels

Is there a way to get Unicode characters into label code generated by wxFormBuilder?
For example, to get an Angstrom character the generated string should read u"\u212b".
I tried entering \u212b in the label property field but the resulting string reads u"u212b". So I tried escaping the backslash as \\u212b but that gave me u"\\u212b".
I'm using wxFormBuilder v3.5 beta and generating Python code, although the C++ code shows the same behaviour.
By default, wxFormBuilder includes the line # -*- coding: utf-8 -*- as the first line, at least in the generated Python code.
So I went into MS Word and inserted the Angstrom character Å, then copied it into a wxFormBuilder (version 3.5 RC1) static text control, and it worked when I ran the code.
Try my approach above instead of typing "u212b", or type the character directly in your code, like so: u"Hello... Å"
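One caveat on this approach, checked here in Java purely for illustration (the class name is hypothetical): the Å inserted from MS Word is probably U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE) rather than U+212B (ANGSTROM SIGN), which the question asked for. They are different code points but canonically equivalent, and Unicode NFC normalization maps U+212B to U+00C5, so they should render the same.

```java
import java.text.Normalizer;

// Hypothetical check: ANGSTROM SIGN versus LATIN CAPITAL LETTER A WITH RING ABOVE.
class Angstrom {

    static final String ANGSTROM_SIGN = "\u212b";
    static final String A_WITH_RING = "\u00c5";

    public static void main(String[] args) {
        // Different code points, so a plain comparison says they differ.
        System.out.println(ANGSTROM_SIGN.equals(A_WITH_RING)); // false
        // NFC replaces the Angstrom sign with its canonical equivalent.
        String nfc = Normalizer.normalize(ANGSTROM_SIGN, Normalizer.Form.NFC);
        System.out.println(nfc.equals(A_WITH_RING)); // true
    }
}
```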

Non-ASCII string literal is escaped after build

Please help me to solve a problem that appeared recently.
When the release project is built on the build machine (using MSBuild), the non-ASCII characters in all string literals in the code are escaped as \x00NN, where NN is two hex digits. The problem is that when such values are displayed in a form (WinForms), they appear with broken encoding (like a page shown with the wrong code page in a web browser).
In the source code it looks like
str = " Без ПДВ"
but Reflector shows
str = " \x00c1\x00e5\x00e7 \x00cf\x00c4\x00c2";
And this appears in the form as a string with broken encoding, like
â ò.÷. ÏÄÂ
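The escapes Reflector shows line up exactly with Windows-1251 bytes: Б, е, з, П, Д and В are 0xC1, 0xE5, 0xE7, 0xCF, 0xC4 and 0xC2 in that code page. That is consistent with the compiler reading a Windows-1251 source file without a BOM as Windows-1252, the usual code page for German settings; this is an inference from the output, not something the thread confirms. The Java sketch below (hypothetical names; Java is used only to illustrate the byte mapping) reproduces the Reflector string:

```java
import java.nio.charset.Charset;

// Hypothetical reproduction: Windows-1251 bytes decoded as Windows-1252.
class MisreadSource {

    static String misread(String source) {
        byte[] cp1251 = source.getBytes(Charset.forName("windows-1251"));
        return new String(cp1251, Charset.forName("windows-1252"));
    }

    // Escape non-ASCII characters the way Reflector displays them.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) sb.append(c);
            else sb.append(String.format("\\x%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal from the question, spelled with Unicode escapes.
        String literal = " \u0411\u0435\u0437 \u041f\u0414\u0412";
        // Prints the same escapes as Reflector: \x00c1\x00e5\x00e7 \x00cf\x00c4\x00c2
        System.out.println(escape(misread(literal)));
    }
}
```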
What causes MSBuild to convert non-ASCII string literals to escaped symbols? Dev builds on the developers' machines do not have this problem.
The regional settings for the user that runs MSBuild were checked and changed from German to Ukrainian, and the same was done for the language for non-Unicode programs. This did not help, even after a reboot.
MSBuild has worked on the same machine without this problem for a year, but the latest build breaks the string literals in the code.
The command line looks like this:
MSBuild {LocalPath}{Solution} /property:DefineConstants="{Defines}{DefinesExtra}" /t:{Target} /property:Configuration={Configuration} {Platform} /clp:NoItemAndPropertyList
The target is Build (or Rebuild; it does not matter), the configuration is Release, and the platform is x86.
PS: I know it is bad to store localized strings in code (but shit happens).
