Node MySQL 5.7 latin1_swedish_ci using mysql2 - node.js

I'm getting special characters back from a latin1_swedish_ci database. It contains a huge amount of data, and migrating it is not an option :(. The new app's files are all UTF-8 encoded, and we are looking for a conversion solution from latin1 to UTF-8. I tried the charset option on mysql2, SET NAMES, and so on. I also tried other suggestions from the internet using iconv (version dependency problems) that I could not get to work, so I ended up writing some code that seems to work and fixes the problem.
However, it looks suspiciously simple... do you see anything wrong with the code?
let data = JSON.stringify(rows); // list of mysql row objects (mis-decoded as latin1) to a single string
data = Buffer.from(data, "latin1"); // re-encode the string back to its original bytes
data = data.toString("utf8"); // decode those bytes as UTF-8
rows = JSON.parse(data); // back to JSON objects
Example of a string before applying the code above:
Distributeurs: N° 5/6
Thanks!

OK (warning: my Node skills are low), but this code converts the word ångström first from UTF-8 to latin1, and then from latin1 back to UTF-8:
const buffer = require('buffer');

// encode the UTF-8 source string as latin1 bytes
const latin1Buffer = buffer.transcode(Buffer.from("ångström"), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
console.log("Buffer latin1 encoding: ", latin1Buffer);
console.log("String in latin1:", latin1String);
console.log("");

// transcode the latin1 bytes back to UTF-8
const utf8Buffer = buffer.transcode(Buffer.from(latin1String, "latin1"), "latin1", "utf8");
console.log("Buffer in UTF8:", utf8Buffer);
console.log("String in UTF8: ", utf8Buffer.toString());
Output:
Buffer latin1 encoding: <Buffer e5 6e 67 73 74 72 f6 6d>
String in latin1: ångström
Buffer in UTF8: <Buffer c3 a5 6e 67 73 74 72 c3 b6 6d>
String in UTF8: ångström
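
Applied to the original question, a minimal sketch (assuming rows holds the mis-decoded result objects from mysql2, as in the question; the helper name is mine):

// hypothetical helper: re-interpret one mis-decoded string as UTF-8
function fixLatin1ToUtf8(value) {
  if (typeof value !== 'string') return value;
  return Buffer.from(value, 'latin1').toString('utf8');
}

const fixedRows = rows.map(row => {
  const fixed = {};
  for (const [key, val] of Object.entries(row)) {
    fixed[key] = fixLatin1ToUtf8(val);
  }
  return fixed;
});

This avoids the JSON round trip: each string field is converted directly, and non-string fields pass through untouched.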

Related

The output of crypto.createCipheriv with Chinese characters is not correct

When there is no Chinese character, PHP and Node output the same result, but when there is a Chinese character the PHP output is correct and the Node output is not:
const crypto = require('crypto');

function encodeDesECB(textToEncode, keyString) {
  const key = Buffer.from(keyString.substring(0, 8), 'utf8');
  const cipher = crypto.createCipheriv('des-ecb', key, '');
  cipher.setAutoPadding(true);
  let c = cipher.update(textToEncode, 'utf8', 'base64');
  c += cipher.final('base64');
  return c;
}
console.log(encodeDesECB(`{"key":"test"}`, 'MIGfMA0G'))
console.log(encodeDesECB(`{"key":"测试"}`, 'MIGfMA0G'))
Node output:
6RQdIBxccCUFE+cXPODJzg==
6RQdIBxccCWXTmivfit9AOfoJRziuDf4
PHP output:
6RQdIBxccCUFE+cXPODJzg==
6RQdIBxccCXFCRVbubGaolfSr4q5iUgw
The problem is not the encryption, but a different JSON serialization of the plaintext.
In the PHP code, json_encode() converts the characters to Unicode escape sequences, i.e. the encoding returns {"key":"\u6d4b\u8bd5"}. In the NodeJS code, however, {"key":"测试"} is used.
This means that different plaintexts are encrypted in the end. Therefore, for the same ciphertext, a byte-level identical plaintext must be used.
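To see the difference at the byte level, a quick check (the escaped string is what PHP's json_encode() produces by default):

// the literal form used on the Node side
const literal = '{"key":"测试"}';
// the escaped form produced by PHP's json_encode() by default
const escaped = '{"key":"\\u6d4b\\u8bd5"}';
console.log(Buffer.byteLength(literal, 'utf8')); // 16
console.log(Buffer.byteLength(escaped, 'utf8')); // 22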
If Unicode escape sequences are to be used in the NodeJS code (as in the PHP code), a conversion is necessary. The jsesc package can be used for this:
const jsesc = require('jsesc');
...
console.log(encodeDesECB(jsesc(`{\"key\":\"测试\"}`, {'lowercaseHex': true}), 'MIGfMA0G')); // 6RQdIBxccCXFCRVbubGaolfSr4q5iUgw
now returns the result of the posted PHP code.
If, instead, the Unicode characters are to be used unescaped in the PHP code (as in the NodeJS code), a conversion is necessary in the other direction. For this, the flag JSON_UNESCAPED_UNICODE can be set in json_encode():
$data = json_encode($data, JSON_UNESCAPED_UNICODE); // 6RQdIBxccCWXTmivfit9AOfoJRziuDf4
now returns the result of the posted NodeJS code.

Saving hex data to file in NodeJS

Say I've got a string
"020000009B020000C0060000"
and I want to turn it into a string such that, if it were saved as a file and opened in a hex editor, it would display the same hex as I put in.
How do I do this?
I've never had to use buffers or anything, but in trying to do this I've seemingly had to, and I don't really know what I'm doing
(or whether I should be using buffers at all; I've been searching for hours, all I can find are answers that use buffers, and I've taken their examples, yet nothing is working).
I turned the hex string into a buffer
function hexToBuffer(hex) {
  let typedArray = new Uint8Array(hex.match(/[\da-f]{2}/gi).map(function (h) {
    return parseInt(h, 16)
  }))
  return typedArray
}
and this seemed to work, because logging the .buffer of what hexToBuffer("020000009B020000C0060000") returns gave me this:
ArrayBuffer {
  [Uint8Contents]: <02 00 00 00 9b 02 00 00 c0 06 00 00>,
  byteLength: 12
}
which has the same hex as I put in, so it seems to be working fine.
Then, to turn the array buffer into a string, I did this:
let dataView = new DataView(buffer);
let decoder = new TextDecoder('utf-8');
let string = decoder.decode(dataView)
just to test that it worked, I saved it to a file.
fs.writeFileSync(__dirname+'/test.txt', string)
Opening test.txt in a hex editor shows different data:
02000000EFBFBD020000EFBFBD060000
If I instead do
fs.writeFileSync(__dirname+'/test.txt', hexToBuffer("020000009B020000C0060000"))
then I get the correct data, but then if I read the file with fs and add to it, it's once again not the same values.
let test = fs.readFileSync(__dirname+'/test.txt', 'utf8')
let example2 = test+'example'
fs.writeFileSync(__dirname+'/test.txt', example2)
now test.txt begins with 02000000EFBFBD020000EFBFBD060000 instead of 020000009B020000C0060000. What do I do?
(The EF BF BD sequences you are seeing are the UTF-8 encoding of U+FFFD, the replacement character: bytes like 0x9B and 0xC0 are not valid UTF-8 on their own, so decoding them as UTF-8 replaces them.)
First of all, you can use the Buffer.from(string[, encoding]) method to create a Buffer from a string whilst also specifying the encoding - in your case it will be "hex":
const fs = require('fs')
const path = require('path')

const b1 = Buffer.from("020000009B020000C0060000", "hex")
Now save it to a file:
fs.writeFileSync(path.resolve('./test'), b1)
Then we can check that the file contains the same hex values as in the string by using xxd on the command line:
$ xxd test
0200 0000 9b02 0000 c006 0000
Looking good!
Now we can read the file back, this time telling readFileSync the encoding is "hex"; note that passing an encoding makes it return a string directly, rather than a Buffer:
const b2 = fs.readFileSync(path.resolve("./test"), "hex")
console.log(b2) // => "020000009b020000c0060000"
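
To address the second part of the question (appending without corrupting the data), avoid decoding as "utf8" anywhere: read the file with no encoding argument so you get a raw Buffer, and concatenate Buffers. A minimal sketch:

const fs = require('fs')
const path = require('path')

// no encoding argument => readFileSync returns a Buffer, not a string
const existing = fs.readFileSync(path.resolve('./test'))
// append new bytes and write everything back
const appended = Buffer.concat([existing, Buffer.from('example', 'utf8')])
fs.writeFileSync(path.resolve('./test'), appended)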

Flutter - export csv read as gibberish in Excel

I have the following code, which I use to export Unicode chars (Hebrew) to CSV, meant to be opened in Excel/Google Sheets:
String csv = const ListToCsvConverter().convert(rows);
List<int> bytes = List.from(utf8.encode(csv));
// bytes.insert(0, unicodeBomCharacterRune );
return bytes;
If I export it via Gmail and open it in Google Sheets it works fine, but when I open it with Excel it comes out as gibberish.
I'm well familiar with this issue, which I solved in other environments (PHP, JS) by adding a BOM char at the beginning of the file,
but when I do so in Flutter (i.e. uncomment the line bytes.insert(0, unicodeBomCharacterRune );) I get gibberish both in Sheets and in Excel.
Any idea how to overcome this issue?
Here is how I solved it:
String csv = const ListToCsvConverter().convert(rows);
List<int> bytes = List.from(utf8.encode(csv));
bytes.insert(0, 0xBF );
bytes.insert(0, 0xBB );
bytes.insert(0, 0xEF );
return bytes;
The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably recognize a file as being encoded in UTF-8.
NOTE: I insert the bytes in reverse order because each one is inserted at index 0. Inserting unicodeBomCharacterRune directly does not work, because that puts the single code point 0xFEFF into a list of bytes instead of its three-byte UTF-8 encoding.
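
For reference, the same fix in Node (the question mentions having solved this in JS before); a minimal sketch, assuming csv holds the CSV string:

// prepend the three UTF-8 BOM bytes so Excel detects the encoding
const csvBuffer = Buffer.concat([
  Buffer.from([0xEF, 0xBB, 0xBF]), // UTF-8 BOM
  Buffer.from(csv, 'utf8'),
]);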

Decode a serialized BrokeredMessage via Rest on non-Windows

I'm building a Xamarin client that reads messages from the Azure Service Bus.
My REST code can successfully pull a message off the Service Bus, but what I'm getting back appears to be binary (as in non-Text...I know it's all binary ;) )
This is the test code on Windows:
byte[] response = webClient.UploadData(fullAddress, "DELETE", new byte[0]);
MemoryStream ms = new MemoryStream(response);
BrokeredMessage bm = new BrokeredMessage(ms);
responseStr = bm.GetBody<string>();
My problem is on Xamarin/Mono, I don't have a BrokeredMessage.
So my question is: how do I deserialize a BrokeredMessage by hand?
Here's what the first few bytes of the response variable look like:
40 06 73 74 72 69 6e 67 08 33 68 74 74 70 3a 2f 2f 73 63 68
All the examples I've found say that I should be getting back XML... it 'almost' looks like XML, but the 06 and the 08 are throwing me off.
I'm sure that I'm missing something simple, but I can't find it.
Any guidance would be welcome.
I figured it out so I'm posting the answer just in case someone else runs into the same problem.
response = webClient.UploadData(fullAddress, "DELETE", new byte[0]);
responseStr = System.Text.Encoding.UTF8.GetString(response);

// the BrokeredMessage properties come back as response headers
Dictionary<string, object> result = new Dictionary<string, object>();
foreach (var headerKey in webClient.ResponseHeaders.AllKeys)
    result.Add(headerKey, webClient.ResponseHeaders[headerKey]);

// the body is .NET binary XML, not plain text; decode it with a binary XmlDictionaryReader
MemoryStream ms = new MemoryStream(response);
DataContractSerializer serializer = new DataContractSerializer(typeof(string));
XmlDictionaryReader reader = XmlDictionaryReader.CreateBinaryReader(ms, XmlDictionaryReaderQuotas.Max);
object deserializedBody = serializer.ReadObject(reader);
responseStr = (string)deserializedBody;
result.Add("body", responseStr);
The BrokeredMessage properties are stored in the ResponseHeaders.

In node.js, how do I validate UTF8 data in a buffer? [duplicate]

I just discovered that Node (tested: v0.8.23, current git: v0.11.3-pre) ignores any decoding errors in its Buffer handling, silently replacing any non-utf8 characters with '\ufffd' (the Unicode REPLACEMENT CHARACTER) instead of throwing an exception about the non-utf8 input. As a consequence, fs.readFile, process.stdin.setEncoding and friends mask a large class of bad input errors for you.
Example which doesn't fail but really ought to:
> notValidUTF8 = new Buffer([ 128 ], 'binary')
<Buffer 80>
> decodedAsUTF8 = notValidUTF8.toString('utf8') // no exception thrown here!
'�'
> decodedAsUTF8 === '\ufffd'
true
'\ufffd' is a perfectly valid character that can occur in legal utf8 (as the sequence ef bf bd), so it is non-trivial to monkey-patch in error handling based on this showing up in the result.
Digging a little deeper, it looks like this stems from node just deferring to v8's strings and that those in turn have the above behaviour, v8 not having any external world full of foreign-encoded data.
Are there node modules, or anything else, that let me catch utf-8 decode errors, preferably with context about where the error was discovered in the input string or buffer?
I hope you solved the problem over the years; I had a similar one and eventually solved it with this ugly trick:
function isValidUTF8(buf) {
  return Buffer.compare(Buffer.from(buf.toString(), 'utf8'), buf) === 0;
}
which converts the buffer back and forth and checks that it stays the same.
The 'utf8' encoding can be omitted, since it is the default.
Then we have:
> isValidUTF8(Buffer.from('this is valid, 指事字 eè we hope', 'utf8'))
true
> isValidUTF8(Buffer.from([128]))
false
> isValidUTF8(Buffer.from('\ufffd'))
true
where the '\ufffd' character is correctly considered as valid utf8.
UPDATE: now this works in JXcore, too
From node 8.3 on, you can use util.TextDecoder to solve this cleanly:
const util = require('util')
const td = new util.TextDecoder('utf8', {fatal:true})
td.decode(Buffer.from('foo')) // works!
td.decode(Buffer.from([ 128 ], 'binary')) // throws TypeError
This will also work in some browsers by using TextDecoder in the global namespace.
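Wrapped into a small validation helper (a sketch; the function name is mine):

const { TextDecoder } = require('util')

const decoder = new TextDecoder('utf8', { fatal: true })

// true when buf is well-formed UTF-8, false otherwise
function isValidUTF8(buf) {
  try {
    decoder.decode(buf)
    return true
  } catch (err) {
    return false
  }
}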
As Josh C. said above: "npmjs.org/package/encoding"
From the npm website: "encoding is a simple wrapper around node-iconv and iconv-lite to convert strings from one encoding to another."
Download:
$ npm install encoding
Example Usage
var encoding = require('encoding');
var result = encoding.convert(Buffer.from([ 128 ]), "utf8");
console.log(result); // <Buffer 80>
Visit the site: npm - encoding
