Say I've got a string
"020000009B020000C0060000"
and I want to turn it into a string such that, if it were saved as a file and opened in a hex editor, it would display the same hex I put in.
How do I do this?
I've never had to use buffers before, but in trying to do this I've seemingly had to, and I don't really know what I'm doing
(or whether I should be using buffers at all; I've been searching for hours and every answer I can find uses buffers, so I've just taken their examples, and yet nothing is working).
I turned the hex string into a buffer
function hexToBuffer(hex) {
    let typedArray = new Uint8Array(hex.match(/[\da-f]{2}/gi).map(function (h) {
        return parseInt(h, 16)
    }))
    return typedArray
}
and this seemed to work, because logging hexToBuffer("020000009B020000C0060000").buffer gave me this:
ArrayBuffer {
  [Uint8Contents]: <02 00 00 00 9b 02 00 00 c0 06 00 00>,
  byteLength: 12
}
which has the same hex as I put in, so it seems to be working fine.
Then, to turn the ArrayBuffer into a string, I did this:
let dataView = new DataView(buffer);
let decoder = new TextDecoder('utf-8');
let string = decoder.decode(dataView)
Just to test that it worked, I saved it to a file:
fs.writeFileSync(__dirname+'/test.txt', string)
Opening test.txt in a hex editor shows different data:
02000000EFBFBD020000EFBFBD060000
If I instead do
fs.writeFileSync(__dirname+'/test.txt', hexToBuffer("020000009B020000C0060000"))
then I get the correct data, but if I then read the file with fs and append to it, the values are once again not the same:
let test = fs.readFileSync(__dirname+'/test.txt', 'utf8')
let example2 = test+'example'
fs.writeFileSync(__dirname+'/test.txt', example2)
Now test.txt begins with 02000000EFBFBD020000EFBFBD060000 instead of 020000009B020000C0060000. What do I do?
The root cause is that 0x9B and 0xC0 are not valid UTF-8 bytes, so decoding them produces the replacement character U+FFFD, which gets written back to disk as the three bytes EF BF BD; any round trip through a UTF-8 string will corrupt them. First of all, you can use the Buffer.from(string[, encoding]) method to create a Buffer from a string whilst also specifying the encoding, in your case "hex":
const b1 = Buffer.from("020000009B020000C0060000", "hex")
Now save it to a file:
const fs = require('fs'), path = require('path')
fs.writeFileSync(path.resolve('./test'), b1)
Then we can check that the file contains the same hex values as in the string by using xxd on the command line:
$ xxd test
00000000: 0200 0000 9b02 0000 c006 0000            ............
Looking good!
Now we can read the file back as well. Note that if you pass an encoding to fs.readFileSync it decodes the file and returns a string rather than a Buffer, so read it with no encoding to get the raw bytes:
const b2 = fs.readFileSync(path.resolve("./test"))
Finally, turn it back into a hex string with the Buffer.toString([encoding]) method:
console.log(b2.toString("hex")) // => "020000009b020000c0060000"
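And to answer the last part of your question: when appending to the file, keep everything as buffers and never decode the raw bytes to a UTF-8 string. A minimal sketch of the idea, reusing the test file from above (the appended text is just the 'example' string from your question):
const fs = require('fs')
const path = require('path')

// read the raw bytes back; with no encoding argument we get a Buffer
const existing = fs.readFileSync(path.resolve('./test'))

// encode only the new text, then concatenate the two byte sequences
const combined = Buffer.concat([existing, Buffer.from('example', 'utf8')])
fs.writeFileSync(path.resolve('./test'), combined)
This way the 9B and C0 bytes are never pushed through a UTF-8 decode, so they survive untouched.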
Related
Under Python 3.10, I have a UDP socket that listens to a COM port.
I get data like this:
b'SENDPKT: "STN1" "" "SH/DX\r"\x98\x00'
The info (SH/DX) before the "\r" can change and has a varying length, and I need to extract it.
.strip('b\r') doesn't work.
Using .decode() and str(), I tried to convert this bytes data to a string for easier manipulation, but that doesn't work either.
I get an error: "invalid start byte at position 27 for 0x98".
Any guess how I can solve this?
Thanks,
For input like this you can try ignoring errors while decoding:
b = b'SENDPKT: "STN1" "" "SH/DX\r"\x98\x00'
s = b.decode(errors='ignore')
res = s[20:s.find('\r')] # 'SH/DX'
I am working with a system that syncs files between two vendors. The tooling is written in JavaScript and does a transformation on file names before sending them to the destination. I am trying to fix a bug in it that fails to properly compare file names between the origin and destination.
The script uses the file name to check whether it already exists on the destination.
For example:
The following file name contains a special character that is encoded differently between source and destination:
source: Chinchón.jpg // 'ó' stored decomposed (NFD): 'o' plus the combining acute U+0301
destination: Chinchón.jpg // 'ó' stored precomposed (NFC): the single code point U+00F3
The function that does the transformation is:
export const normalizeText = (text:string) => text
.normalize('NFC')
.replace(/\p{Diacritic}/gu, "")
.replace(/\u{2019}/gu, "'")
.replace(/\u{ff1a}/gu, ":")
.trim()
and the comparison happens like the following:
const array1 = ['Chinchón.jpg'];
console.log(array1.includes('Chinchón.jpg')); // false
Do I reverse the transformation before comparing? What's the best way to do that?
If I got your question right, normalize both sides before comparing:
// prepare dictionary
const rawDictionary = ['Chinchón.jpg']
const dictionary = rawDictionary.map(x => normalizeText(x))
...
const rawComparant = 'Chinchón.jpg'
const comparant = normalizeText(rawComparant)
console.log(dictionary.includes(comparant)) // true
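A self-contained sketch of the same idea; the escape sequences are used so the two encodings of 'ó' stay visible, and normalize here is a trimmed-down stand-in for your normalizeText:
const normalize = (text) => text.normalize('NFC').trim()

// 'Chincho\u0301n.jpg' is the decomposed (source) spelling,
// 'Chinch\u00F3n.jpg' the precomposed (destination) one
const dictionary = ['Chincho\u0301n.jpg'].map(normalize)
const candidate = normalize('Chinch\u00F3n.jpg')

console.log(dictionary.includes(candidate)) // true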
I'm getting special characters from a latin1_swedish_ci database. It contains a huge amount of data, and migration is not an option :(. The new app's files are all UTF-8 encoded, and we are looking for a conversion solution from latin1 to utf8. I tried the charset option on mysql2, plus SET NAMES, etc. I also tried other suggestions from the internet using iconv (version dependency) that I could not make work, so I ended up writing some code that seems to work and fixes the problem.
However, it seems almost too obvious... do you see something wrong in the code?
let data = JSON.stringify(rows); // list of mysql objects (swedish encoding) to string
data = Buffer.from(data, "latin1"); // string back to its raw bytes
data = data.toString("utf8"); // reinterpret those bytes as utf8
rows = JSON.parse(data); // back to json
Example string before applying the code above:
Distributeurs: N° 5/6
Thanks!
OK (warning: my node skills are low), but this code converts the word ångström from latin1 to UTF-8, after first converting it from UTF-8 to latin1 so there is something to convert back:
const buffer = require('buffer');
const latin1Buffer = buffer.transcode(Buffer.from("ångström"), "utf8", "latin1");
const latin1String = latin1Buffer.toString("latin1");
let rows = latin1String;
console.log("Buffer latin1 encoding: ", latin1Buffer);
console.log("String in latin1:", rows);
console.log("");
rows = latin1String;
let data2 = buffer.transcode(Buffer.from(rows, "latin1"), "latin1", "utf8" );
console.log("Buffer in UTF8:", data2);
console.log("String in UTF8: ", data2.toString());
output:
Buffer latin1 encoding: <Buffer e5 6e 67 73 74 72 f6 6d>
String in latin1: ångström
Buffer in UTF8: <Buffer c3 a5 6e 67 73 74 72 c3 b6 6d>
String in UTF8: ångström
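The same round trip can also be written without transcode: encode the misdecoded string back into its latin1 bytes, then read those bytes as UTF-8. A minimal sketch (fixEncoding is just an illustrative name, and the mojibake sample shows the input shape this assumes):
const fixEncoding = (s) => Buffer.from(s, 'latin1').toString('utf8')

console.log(fixEncoding('Ã¥ngstrÃ¶m')) // 'ångström'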
I have the following code, which I use to export Unicode chars (Hebrew) to CSV, meant to be opened in Excel/Google Sheets:
String csv = const ListToCsvConverter().convert(rows);
List<int> bytes = List.from(utf8.encode(csv));
// bytes.insert(0, unicodeBomCharacterRune );
return bytes;
If I export it via gmail and open it in Google Sheets it works fine, but when I open it with Excel it comes out as gibberish.
I'm well familiar with this issue, which I solved in other environments (PHP, JS) by adding a BOM char at the beginning of the file,
but when I do so in Flutter (i.e. uncomment the line bytes.insert(0, unicodeBomCharacterRune );) I get gibberish both in Sheets and in Excel.
Any idea how to overcome this issue?
Here is how I solved it:
String csv = const ListToCsvConverter().convert(rows);
List<int> bytes = List.from(utf8.encode(csv));
bytes.insert(0, 0xBF );
bytes.insert(0, 0xBB );
bytes.insert(0, 0xEF );
return bytes;
The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess a file as being encoded in UTF-8.
NOTE: I insert the bytes in reverse order since each insert(0, ...) puts its byte at the front of the list.
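For comparison, the same BOM-prefixing written in Node.js, since several of the snippets above are JavaScript (a sketch; the file name and CSV content are made up):
const fs = require('fs')

// hypothetical CSV content; the three BOM bytes are what matter here
const csv = 'שלום,עולם\n'
const bom = Buffer.from([0xEF, 0xBB, 0xBF])
fs.writeFileSync('out.csv', Buffer.concat([bom, Buffer.from(csv, 'utf8')]))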
I just discovered that Node (tested: v0.8.23, current git: v0.11.3-pre) ignores any decoding errors in its Buffer handling, silently replacing any non-utf8 characters with '\ufffd' (the Unicode REPLACEMENT CHARACTER) instead of throwing an exception about the non-utf8 input. As a consequence, fs.readFile, process.stdin.setEncoding and friends mask a large class of bad input errors for you.
Example which doesn't fail but really ought to:
> notValidUTF8 = new Buffer([ 128 ], 'binary')
<Buffer 80>
> decodedAsUTF8 = notValidUTF8.toString('utf8') // no exception thrown here!
'�'
> decodedAsUTF8 === '\ufffd'
true
'\ufffd' is a perfectly valid character that can occur in legal utf8 (as the sequence ef bf bd), so it is non-trivial to monkey-patch in error handling based on this showing up in the result.
Digging a little deeper, it looks like this stems from node just deferring to v8's strings and that those in turn have the above behaviour, v8 not having any external world full of foreign-encoded data.
Are there node modules or otherwise that let me catch utf-8 decode errors, preferably with context about where the error was discovered in the input string or buffer?
I hope you solved the problem in the years since; I had a similar one and eventually solved it with this ugly trick:
function isValidUTF8(buf) {
    return Buffer.compare(Buffer.from(buf.toString(), 'utf8'), buf) === 0
}
which converts the buffer back and forth and checks that it stays the same.
The 'utf8' encoding can be omitted.
Then we have:
> isValidUTF8(Buffer.from('this is valid, 指事字 eè we hope', 'utf8'))
true
> isValidUTF8(Buffer.from([128]))
false
> isValidUTF8(Buffer.from('\ufffd'))
true
where the '\ufffd' character is correctly considered as valid utf8.
UPDATE: now this works in JXcore, too
From node 8.3 on, you can use util.TextDecoder to solve this cleanly:
const util = require('util')
const td = new util.TextDecoder('utf8', {fatal:true})
td.decode(Buffer.from('foo')) // works!
td.decode(Buffer.from([ 128 ])) // throws TypeError
This will also work in some browsers by using TextDecoder in the global namespace.
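If you want this packaged as a reusable check, a minimal sketch (decodeUtf8Strict is just an illustrative name):
const util = require('util')

function decodeUtf8Strict(buf) {
    // fatal: true makes decode() throw a TypeError on malformed input
    // instead of silently substituting U+FFFD
    return new util.TextDecoder('utf8', { fatal: true }).decode(buf)
}

console.log(decodeUtf8Strict(Buffer.from('foo'))) // 'foo'
try {
    decodeUtf8Strict(Buffer.from([0x80]))
} catch (e) {
    console.log(e instanceof TypeError) // true
}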
As Josh C. said above: "npmjs.org/package/encoding"
From the npm website: "encoding is a simple wrapper around node-iconv and iconv-lite to convert strings from one encoding to another."
Download:
$ npm install encoding
Example usage:
var encoding = require('encoding');
var result = encoding.convert(Buffer.from([ 128 ]), "utf8");
console.log(result); // <Buffer 80>
Visit the site: npm - encoding