ISO-8859-15 encoding with nodejs - node.js

We are experiencing a very annoying encoding problem which started with LoopBack but seems to be Node.js related.
Basically, we just finished developing an API with LoopBack on top of an existing SQL_ASCII-encoded PostgreSQL database. Since the API has to be in UTF-8, we try to convert the data sent through our API routes to ISO-8859-15 in order to insert it correctly into our database.
No matter which iconv, utf8, iso-8859, etc. modules we tried, we could not manage to pass ISO-8859-15 converted strings; we ended up with very strange results. For example:
var Iconv = require('iconv').Iconv;
var iconv = new Iconv('UTF-8','ISO-8859-1');
var label = iconv.convert("bébé").toString();
If we then insert "label" into our database, we end up with something like this: "b�b�"!
So we tried to see directly in the Terminal how a basic Node.js application (without LoopBack or any other framework) behaved, but the result was no better.
With the Terminal encoding set to "ISO Latin 1", the following code:
console.log('bébé');
was displayed this way in the Terminal:
bÃ©bÃ©
As if Node.js were completely unable to handle ISO-8859 strings.
Are we missing something here?
Are we doomed to use UTF-8 strings in order to make this work?
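One likely explanation, sketched here under stated assumptions: the conversion produces a Buffer of single-byte characters, and calling toString() with no argument decodes those bytes as UTF-8, so every lone 0xE9 byte becomes U+FFFD. The sketch uses Node's built-in 'latin1' (ISO-8859-1) Buffer encoding as a stand-in for the iconv module; ISO-8859-15 is not a built-in Buffer encoding, but "bébé" has the same bytes in both character sets. The variable names are illustrative only.

```javascript
// Sketch only: Node's built-in 'latin1' Buffer encoding (ISO-8859-1) stands in
// for the iconv module used in the question.
const text = 'b\u00e9b\u00e9'; // "bébé"

// Encode to single-byte Latin-1: each é becomes the single byte 0xE9.
const latin1Bytes = Buffer.from(text, 'latin1');
const latin1Hex = latin1Bytes.toString('hex');

// toString() with no argument decodes as UTF-8. A lone 0xE9 is not valid
// UTF-8, so each one is replaced by U+FFFD (the "�" seen in the database).
const decodedAsUtf8 = latin1Bytes.toString();

// Decoding with the encoding that produced the bytes round-trips correctly.
const decodedAsLatin1 = latin1Bytes.toString('latin1');

// The reverse mistake: UTF-8 output read by a terminal set to Latin-1.
const utf8SeenAsLatin1 = Buffer.from(text, 'utf8').toString('latin1');

console.log(latin1Hex);
console.log(JSON.stringify(decodedAsUtf8));
console.log(decodedAsLatin1);
console.log(utf8SeenAsLatin1);
```

JavaScript strings are always Unicode internally, so ISO-8859-15 data can only exist as bytes in a Buffer; turning the Buffer back into a string before it reaches the database discards the conversion.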

Related

SQL Error (1366): Incorrect string value: '\xE3\x82\xA8\xE3\x83\xBC...'

Hi, I am trying to upload data to a table in HeidiSQL, but it returned "SQL Error (1366): Incorrect string value: '\xE3\x82\xA8\xE3\x83\xBC...'".
The error is triggered by this string: "エーペックスレジェンズ", and the source data file contains a number of special characters. Is there a way to work around this, so that all kinds of characters can be uploaded?
My default setting is utf8 and I have also tried utf8mb4, but neither of them works.
That happens when you select the wrong file encoding in HeidiSQL's open-file dialog:
Never select "Auto-detect" - I wrote that auto-detection, and I can tell you it often detects the wrong encoding. Use the right encoding instead, which is mostly utf-8 nowadays.
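As a quick check of that advice, the bytes quoted in the error message can be decoded directly. The sketch below (Node.js, variable names are illustrative) shows that they are valid UTF-8 for the first two characters of the failing string, which is consistent with the source file being UTF-8.

```javascript
// The six bytes quoted in the error message, copied from the question.
const errorBytes = Buffer.from([0xe3, 0x82, 0xa8, 0xe3, 0x83, 0xbc]);

// Decode them as UTF-8.
const decoded = errorBytes.toString('utf8');

// U+30A8 U+30FC are the first two characters of the string in the question.
const expectedPrefix = '\u30a8\u30fc';

console.log(decoded, decoded === expectedPrefix);
```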

What base64 format to use for images in opendocument/odt <office:binary-data>, possibly base64ing with nodejs?

I'm attempting to save an image, base64-encoded with Node.js, in raw form in an ODT file for display in OpenOffice Writer.
The spec was not very clear, but I found an example. However, when I post the following base64-encoded image (which looks fine in HTML), I get a "Read-error" in OpenOffice and the image does not display.
The spec says that it uses RFC 2045, but that spec is not very specific (unless I'm missing something).
Here's what I have:
<text:p text:style-name="qr-wrapper">
<draw:frame draw:name="img1" svg:width="150.0pt" svg:height="150.0pt">
<draw:image xlink:href="Pictures/0.png" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad">
<office:binary-data>
iVBORw0KGgoAAAANSUhEUgAAAJYAAACWCAIAAACzY+a1AAAABGdBTUEAA1teXP8meAAABjpJREFUeAHtm92OHTcMg7tF3/+VtwlaBB+8Q0Eee46HAHOlkSlaJmMLm5+v7+/vv/LLWYG/nZtP778ViIX2vw9iYSy0V8D+ALmFsdBeAfsD5BbGQnsF7A+QWxgL7RWwP0BuYSy0V8D+ALmFsdBeAfsD5BbGQnsF7A/wz40TfH193aiqSzp/87xrX+5FTubrbv9bZW0H38HM9vCLMw9pR9hXY2Lhq+3pNBcLOyq9GnNnFvJAN97uP+W7Zgl7UJwK08n/abgIyFPALpdUz5fgn8ncwp+amGVioZlhP9uNhT81McuszkIet/Omr8wMVct9Oxj2rOIOZ6dWYVSfCl/kcwsLcTyWYqGHT0WXsbAQx2Np5yx84sScSeTnLGFMPPOsZUw8851a4g/GuYUHxd+zdSzco+NBllh4UPw9W79xFnI+qZlETEcJ8rCWefJ0MMQfjHMLD4q/Z+tYuEfHgyyx8KD4e7beOQvVXJntVPHMzifi2QP5FYZ4YlhLDOMOhvjFOLdwUcDz5bHwvAeLHcTCRQHPl6/OQs6JXachZ2eudPDkUfhOXp2RtQrzUD638CFhP0cbCz+n9UM7xcKHhP0c7ReHxOe2LXfqzBXVNmsVptz8/8VdPJ29FjG5hYsCni+Phec9WOwgFi4KeL585yzk/ODJ1ExS+E4tMbti9jPbM/Ednl09/+LJLdwo5hmqWHhG9427xsKNYp6hujMLO299B8MTKzzzxKuYM0lhyEn8Sl7tpfKdfVXtkM8tHATx+4yFfp4NHcfCQRC/zzuzcPaUnDGztZwZs7Ud/Gxv7Ie1zKt9iSemU0v8EOcWDoL4fcZCP8+GjmPhIIjf551ZyDed77jKU5UOhnjGs7XEk4c9M088MSrPWhV3ajsYxf8rn1tYiOOxFAs9fCq6jIWFOB5LO/8dKeeHOj0xnRlADDlVnvzEq5g8rN2VV5yqnxv53MIbor2rJBa+y48b3cTCG6K9q2R1FnbeemJ4epXnHJrFs5b8zJOTcQezi3OWh30OcW7hIIjfZyz082zoOBYOgvh9rv4ZKU/M9515FavZQx6FUZyztcSTU+07iycna2f5yTPEuYWDIH6fsdDPs6HjWDgI4ve5cxZ2Ts95oPCdOUEMOVfyqp9OXvXAWmKYVz0TU8S5hYU4Hkux0MOnostYWIjjsbTzz0hnT8wZwFo1M4hRsaplfnbfWbzqjXlysjdimnFuYVOo98Ji4Xu9aXYWC5tCvRd25+dCnoZvOvMqnn33yd+p3YXfxUMdZjlZW8S5hYU4Hkux0MOnostYWIjjsXTn50L1pjPfOT3xnHOdPPlZyzxjxclahenk1V7MPxTnFj4k7OdoY+HntH5op1j4kLCfo139uZCdcmYwz3nDPGNVS4ziYa3CkKcTK07mydPZd6WWew1xbuEgiN9nLPTzbOg4Fg6C+H2uzsLO+04MZ8auvFJd8Ss886xlfjZW51U8xCvMkM8tHATx+4yFfp4NHcfCQRC/z9VZ+MSJOYc4G5jnvsQw34k7nMR09iKePbC2g2FtEecWFuJ4LMVCD5+KLmNhIY7H0urfF+46JecEOTkzOhjWqpg8jLkXY8VDDHmIZ554hWG+GecWNoV6LywWvtebZmexsCnUe2F3ZiFPw7ee+U7cmQ0Ko/LcV/U2W6vwip89MJ7Fs7aIcwsLcTyWYqGHT0WXsbAQx2NpdRbylGpmELNrHuziYW+duHNG8ig8+yeGefIUcW5hIY7HUiz08KnoMhYW4ngs7ZyFu06sZoPKc98OhngVr/Cw9gn+gTO3cBDE7zMW+nk2dBwLB0H8Pt84C6mimivMq5+liCEn8bMY1pKTPMSofKeWmCLOLSzE8ViKhR4+FV3GwkIcj6Wds5AzYOX0T/BwJqneuC/xzLNWYZgnnjExip/4Is4tLMTxWIqFHj4VXcbCQhyP
pdVZyDf91InZQ2eurODVGRUn86p2MZ9buCjg+fJYeN6DxQ5i4aKA58vf+P8Lz6ti1UFuoZVdV83GwitVrHKx0Mquq2Zj4ZUqVrlYaGXXVbOx8EoVq1wstLLrqtlYeKWKVS4WWtl11WwsvFLFKhcLrey6ajYWXqlilYuFVnZdNRsLr1SxysVCK7uumo2FV6pY5f4FHaQ4Pt0WyyMAAAAASUVORK5CYII=
</office:binary-data>
</draw:image>
</draw:frame>
</text:p>
I could use a base64 conversion library, for instance this Node.js one. What format is expected?
Well, folks, the answer is as follows:
<text:p text:style-name="qr-wrapper">
<draw:frame draw:name="img1" svg:width="150.0pt" svg:height="150.0pt">
<draw:image xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad">
<office:binary-data>
iVBORw0KGgoAAAANSUhEUgAAAJYAAACWCAIAAACzY+a1AAAABGdBTUEAA1teXP8meAAABjpJREFUeAHtm92OHTcMg7tF3/+VtwlaBB+8Q0Eee46HAHOlkSlaJmMLm5+v7+/vv/LLWYG/nZtP778ViIX2vw9iYSy0V8D+ALmFsdBeAfsD5BbGQnsF7A+QWxgL7RWwP0BuYSy0V8D+ALmFsdBeAfsD5BbGQnsF7A/wz40TfH193aiqSzp/87xrX+5FTubrbv9bZW0H38HM9vCLMw9pR9hXY2Lhq+3pNBcLOyq9GnNnFvJAN97uP+W7Zgl7UJwK08n/abgIyFPALpdUz5fgn8ncwp+amGVioZlhP9uNhT81McuszkIet/Omr8wMVct9Oxj2rOIOZ6dWYVSfCl/kcwsLcTyWYqGHT0WXsbAQx2Np5yx84sScSeTnLGFMPPOsZUw8851a4g/GuYUHxd+zdSzco+NBllh4UPw9W79xFnI+qZlETEcJ8rCWefJ0MMQfjHMLD4q/Z+tYuEfHgyyx8KD4e7beOQvVXJntVPHMzifi2QP5FYZ4YlhLDOMOhvjFOLdwUcDz5bHwvAeLHcTCRQHPl6/OQs6JXachZ2eudPDkUfhOXp2RtQrzUD638CFhP0cbCz+n9UM7xcKHhP0c7ReHxOe2LXfqzBXVNmsVptz8/8VdPJ29FjG5hYsCni+Phec9WOwgFi4KeL585yzk/ODJ1ExS+E4tMbti9jPbM/Ednl09/+LJLdwo5hmqWHhG9427xsKNYp6hujMLO299B8MTKzzzxKuYM0lhyEn8Sl7tpfKdfVXtkM8tHATx+4yFfp4NHcfCQRC/zzuzcPaUnDGztZwZs7Ud/Gxv7Ie1zKt9iSemU0v8EOcWDoL4fcZCP8+GjmPhIIjf551ZyDed77jKU5UOhnjGs7XEk4c9M088MSrPWhV3ajsYxf8rn1tYiOOxFAs9fCq6jIWFOB5LO/8dKeeHOj0xnRlADDlVnvzEq5g8rN2VV5yqnxv53MIbor2rJBa+y48b3cTCG6K9q2R1FnbeemJ4epXnHJrFs5b8zJOTcQezi3OWh30OcW7hIIjfZyz082zoOBYOgvh9rv4ZKU/M9515FavZQx6FUZyztcSTU+07iycna2f5yTPEuYWDIH6fsdDPs6HjWDgI4ve5cxZ2Ts95oPCdOUEMOVfyqp9OXvXAWmKYVz0TU8S5hYU4Hkux0MOnostYWIjjsbTzz0hnT8wZwFo1M4hRsaplfnbfWbzqjXlysjdimnFuYVOo98Ji4Xu9aXYWC5tCvRd25+dCnoZvOvMqnn33yd+p3YXfxUMdZjlZW8S5hYU4Hkux0MOnostYWIjjsXTn50L1pjPfOT3xnHOdPPlZyzxjxclahenk1V7MPxTnFj4k7OdoY+HntH5op1j4kLCfo139uZCdcmYwz3nDPGNVS4ziYa3CkKcTK07mydPZd6WWew1xbuEgiN9nLPTzbOg4Fg6C+H2uzsLO+04MZ8auvFJd8Ss886xlfjZW51U8xCvMkM8tHATx+4yFfp4NHcfCQRC/z9VZ+MSJOYc4G5jnvsQw34k7nMR09iKePbC2g2FtEecWFuJ4LMVCD5+KLmNhIY7H0urfF+46JecEOTkzOhjWqpg8jLkXY8VDDHmIZ554hWG+GecWNoV6LywWvtebZmexsCnUe2F3ZiFPw7ee+U7cmQ0Ko/LcV/U2W6vwip89MJ7Fs7aIcwsLcTyWYqGHT0WXsbAQx2NpdRbylGpmELNrHuziYW+duHNG8ig8+yeGefIUcW5hIY7HUiz08KnoMhYW4ngs7ZyFu06sZoPKc98OhngVr/Cw9gn+gTO3cBDE7zMW+nk2dBwLB0H8Pt84C6mimivMq5+liCEn8bMY1pKTPMSofKeWmCLOLSzE8ViKhR4+FV3GwkIcj6Wds5AzYOX0T/BwJqneuC/xzLNWYZgnnjExip/4Is4tLMTxWIqFHj4VXcbCQhyP
pdVZyDf91InZQ2eurODVGRUn86p2MZ9buCjg+fJYeN6DxQ5i4aKA58vf+P8Lz6ti1UFuoZVdV83GwitVrHKx0Mquq2Zj4ZUqVrlYaGXXVbOx8EoVq1wstLLrqtlYeKWKVS4WWtl11WwsvFLFKhcLrey6ajYWXqlilYuFVnZdNRsLr1SxysVCK7uumo2FV6pY5f4FHaQ4Pt0WyyMAAAAASUVORK5CYII=
</office:binary-data>
</draw:image>
</draw:frame>
</text:p>
That is to say:
the encoded data cannot be prefixed with data:image/png;base64,
the <draw:image> element cannot have an xlink:href="Pictures/0.png" attribute, even though the spec says it will be ignored if <office:binary-data> is present. Shrug.
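A minimal Node.js sketch of the first point, assuming the image is available either as raw bytes or as a data URI copied from HTML. The helper names stripDataUriPrefix and toRawBase64 are hypothetical; only Buffer and String methods from Node core are used.

```javascript
// Hypothetical helper: removes a leading "data:<mime>;base64," if present,
// leaving only the base64 payload that goes inside <office:binary-data>.
function stripDataUriPrefix(value) {
  return value.replace(/^data:[^;,]+;base64,/, '');
}

// Hypothetical helper: encodes raw bytes as plain base64 with no prefix.
function toRawBase64(bytes) {
  return Buffer.from(bytes).toString('base64');
}

// The 8-byte PNG signature serves as a tiny stand-in for a real image file.
const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const raw = toRawBase64(pngSignature);
const fromDataUri = stripDataUriPrefix('data:image/png;base64,' + raw);

console.log(raw);
console.log(fromDataUri === raw);
```

The encoded signature begins with the same characters (iVBORw0KGgo) as the payload in the example above, which is a quick way to confirm that a string is bare base64 PNG data rather than a data URI.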

Charset of Lotus Domino Server

I have a Java agent (running on a Linux server) that manages document attachments, but something goes wrong with accented characters in their names (ò, è, ù, etc.).
I wrote this code to display the charset used:
OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
String enc = writer.getEncoding();
System.out.println("CHARSET: " + enc);
This displays:
CHARSET: ASCII
On a server where everything works fine, the same line prints:
CHARSET: UTF8
The servers have the same configuration (both work with "Internet sites", where "Use UTF-8 for output" is set to "Yes").
Any idea which parameter to set (Domino/Linux)?
UPDATE
I'll try to explain better...
I call an agent through an Ajax call.
As a parameter, I pass the string "ààà". When I try to decode it as UTF-8 inside the agent, the string is resolved as
"???"
instead of
"ààà"
This is what System.out.println() shows in the console.
On another Domino server, everything works. I don't understand whether it is a matter of server settings or OS settings.
Just a suggestion, but you could change the first line in your example to be:
// requires: import java.nio.charset.Charset;
OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream(),
        Charset.forName("UTF-8"));
That will force the OutputStreamWriter to UTF-8, and your sample code will show consistent output on both servers. Without knowing more details, I can't say for sure whether that's relevant to the real problem.
Although this might not directly answer your question, you might be interested in this article about encoding.

Using a nodejs server to load files with Unicode names

I want to load audio files with names like здраво.mp3, using a Node.js server. (That's "zdravo" or "hello" in Serbian, if you were wondering.)
However, the request that reaches the server is for %D0%B7%D0%B4%D1%80%D0%B0%D0%B2%D0%BE.mp3 instead, which results in the file not being found.
If I drag the file into a browser window from my desktop, the browser is happy to load it as file:///path/здраво.mp3, so the issue is not with the way the browser is treating the Unicode string.
The HTML page containing the link to the file has this meta tag in the head section...
<meta charset="utf-8" />
... and it is quite happy to display the text "Здраво" on the page, so the Unicode strings are properly formed within the browser.
I am guessing that the browser is converting the name to ISO-8859-1 before sending the request, and that the NodeJS server somehow needs to convert it back to Unicode before looking for it in the file system.
My question is: is there already a module that I can use to do this conversion, and are there examples of how to use it?
SOLUTION: Following the reply from Edwin Dalorzo, here is the one-line fix that I made to my handleRequest() function:
function handleRequest(request, response) {
request.url = decodeURIComponent(request.url) // the fix
var pathname = url.parse(request.url).pathname
It is not clear how you are receiving the encoded string, but you can certainly decode it by simply doing:
decodeURIComponent("%D0%B7%D0%B4%D1%80%D0%B0%D0%B2%D0%BE")
And this will give you back your string "здраво".
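The escapes in the request are the UTF-8 bytes of the file name, percent-encoded, which is why decodeURIComponent restores it. A slightly more defensive variant of the fix might look like the sketch below. The helper name decodePathname and the placeholder base URL are assumptions for illustration, not part of the original handleRequest().

```javascript
// Hypothetical helper: extracts and decodes the path of a raw request URL.
// Returns null when the path contains malformed percent-escapes.
function decodePathname(requestUrl) {
  // The base URL is a placeholder so the WHATWG URL parser accepts
  // a relative URL such as "/file.mp3?x=1".
  const { pathname } = new URL(requestUrl, 'http://localhost');
  try {
    // Percent-escapes are decoded as UTF-8 byte sequences.
    return decodeURIComponent(pathname);
  } catch (err) {
    // decodeURIComponent throws URIError on input such as "%E0%A4%A".
    return null;
  }
}

console.log(decodePathname('/%D0%B7%D0%B4%D1%80%D0%B0%D0%B2%D0%BE.mp3?x=1'));
console.log(decodePathname('/%E0%A4%A'));
```

Decoding only the pathname leaves the query string untouched, and catching the URIError keeps a malformed request from crashing the handler.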

Decode a string with unknown encode method received from web-browser

Inside a web application I am processing requests to a URL like
http://example.com/<website-base-url>
I am logging the raw GET parameter of the request in a UTF-8 database column and in the filesystem. For a few Chinese domains I get requests with a website-base-url parameter like
%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn
Decoding with urldecode returns
ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
This does not seem to be the domain name the user wants to request.
I have tried urlencoding, base64, utf8 and combinations of them without success.
Any suggestions on how to decode the given parameter to UTF-8?
URL percent-encoding simply encodes the raw bytes. It does not give you any hint regarding the actual encoding of the text. If you do not know what encoding these bytes represent, all you can do is guess.
php > $d = urldecode('%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn');
php > echo $d;
ã¥â¤â§ã¥â¤â´ã¨â´â´.cn
php > echo iconv('BIG5', 'UTF-8', $d);
php > echo iconv('Shift-JIS', 'UTF-8', $d);
テδ」テつ・テδ「テつ、テδ「テつァテδ」テつ・テδ「テつ、テδ「テつエテδ」テつィテδ「テつエテδ「テつエ.cn
php > echo iconv('GB18030', 'UTF-8', $d);
脙拢脗楼脙垄脗陇脙垄脗搂脙拢脗楼脙垄脗陇脙垄脗麓脙拢脗篓脙垄脗麓脙垄脗麓.cn
GB18030 would seem to be the best candidate, but even that decoded string looks a bit too repetitive to be really useful Chinese.
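Another guess worth testing is that the text was UTF-8 encoded more than once, with each pass misreading the previous bytes as Latin-1. The Node.js sketch below is an experiment under that assumption, not a confirmed diagnosis; undoLatin1Layer is a hypothetical helper name. Decoding the percent-escapes as UTF-8 and undoing one such layer reproduces the string shown in the question; a further pass no longer yields valid UTF-8.

```javascript
// Hypothetical helper: undoes one "UTF-8 bytes misread as Latin-1" layer.
// Returns null when the input cannot be the result of such a misreading.
function undoLatin1Layer(text) {
  for (const ch of text) {
    // Characters above U+00FF have no Latin-1 byte, so give up.
    if (ch.codePointAt(0) > 0xff) return null;
  }
  // Re-create the original bytes, then decode them as UTF-8.
  const decoded = Buffer.from(text, 'latin1').toString('utf8');
  // U+FFFD in the result means the bytes were not valid UTF-8.
  return decoded.includes('\ufffd') ? null : decoded;
}

// The parameter from the question, copied verbatim.
const param =
  '%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%A7' +
  '%C3%83%C2%A3%C3%82%C2%A5%C3%83%C2%A2%C3%82%C2%A4%C3%83%C2%A2%C3%82%C2%B4' +
  '%C3%83%C2%A3%C3%82%C2%A8%C3%83%C2%A2%C3%82%C2%B4%C3%83%C2%A2%C3%82%C2%B4.cn';

const once = decodeURIComponent(param); // percent-escapes decoded as UTF-8
const twice = undoLatin1Layer(once); // one misreading undone
const thrice = twice === null ? null : undoLatin1Layer(twice);

console.log(twice);
console.log(thrice);
```

Because the last pass fails, whatever else happened to the text is not a plain Latin-1 misreading, so recovering the original domain name would still involve guessing.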
