Unexpected result when calling toString on a buffer in Node

I'm in a situation where I need to revert data back to a buffer that has had toString called on it. For example:
const buffer // I need this, or equivalent
const bufferString = buffer.toString() // This is all I have
The Node documentation implies that .toString() defaults to 'utf8' encoding and that I can reverse it with Buffer.from(bufferString, 'utf8'), but this doesn't work: I get different data. (Maybe some data is lost when the buffer is converted to a string, although the documentation doesn't seem to mention this.)
Does anyone know why this is happening or how to fix it?
Here is the data I have to reproduce this:
const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr) // The buffer I want!
const bufferString = buffer.toString() // The string I have!, note .toString() and .toString('utf8') are equivalent
const differentBuffer = Buffer.from(bufferString, 'utf8')
You can get the initial intArr from a buffer by doing this:
JSON.parse(JSON.stringify(Buffer.from(buffer)))['data']
Edit: interestingly calling .toString() on differentBuffer gives the same initial string.

I think the important part of the documentation you linked is: "When decoding a Buffer into a string that does not exclusively contain valid UTF-8 data, the Unicode replacement character U+FFFD � will be used to represent those errors." When you convert your buffer into a UTF-8 string, not all of its byte sequences are valid UTF-8, as you can see from console.log(bufferString): almost all of it comes out as gibberish. Each invalid sequence is replaced with U+FFFD, so you irretrievably lose data when converting the buffer into a UTF-8 string, and you can't get that data back when converting the string into a buffer again.
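To make the data loss concrete, here is a minimal sketch using only the first three bytes of the array from the question (the variable names are made up for illustration). The byte 139 is not a valid UTF-8 sequence on its own, so it should decode to U+FFFD and re-encode as three different bytes:

```javascript
// First three bytes of the question's data; 139 (0x8B) alone is invalid UTF-8.
const original = Buffer.from([31, 139, 8]);

// Decoding replaces the invalid byte with U+FFFD.
const decoded = original.toString('utf8');

// Re-encoding writes U+FFFD as the bytes EF BF BD, not the original 8B.
const reencoded = Buffer.from(decoded, 'utf8');

console.log(decoded.includes('\uFFFD'));  // true
console.log([...reencoded]);              // [ 31, 239, 191, 189, 8 ]
console.log(original.equals(reencoded));  // false
```

This would also explain the observation in the question's edit: the replacement character itself is valid UTF-8, so decoding the second buffer gives the same string again.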
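In your example, if you use 'utf16le' instead of 'utf8', you don't lose information, so the buffer is the same after converting back. (UTF-16 consumes two bytes per code unit, so this relies on the buffer having an even number of bytes, as this 74-byte one does.) I.e.: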
const intArr = [31, 139, 8, 0, 0, 0, 0, 0, 0, 0, 170, 86, 42, 201, 207, 78, 205, 83, 178, 82, 178, 76, 78, 53, 179, 72, 74, 51, 215, 53, 54, 51, 51, 211, 53, 49, 78, 50, 210, 77, 74, 49, 182, 208, 53, 52, 178, 180, 72, 75, 76, 52, 75, 180, 76, 50, 81, 170, 5, 0, 0, 0, 255, 255, 3, 0, 29, 73, 93, 151, 48, 0, 0, 0]
const buffer = Buffer.from(intArr);
const bufferString = buffer.toString('utf16le');
const differentBuffer = Buffer.from(bufferString, 'utf16le');
console.log(buffer); // same as the below log
console.log(differentBuffer); // same as the above log

Use the 'latin1' or 'binary' encoding with Buffer.toString and Buffer.from. These two encodings are aliases of each other and map each byte to the Unicode code points U+0000 to U+00FF, so every byte value survives the round trip.
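A minimal sketch of that round trip, with a few bytes picked for illustration (including values above 127). Note that this only helps if you control the original toString call; a string that was already produced with 'utf8' cannot be repaired this way:

```javascript
// Sample bytes, including values that are not valid UTF-8 on their own.
const bytes = Buffer.from([31, 139, 8, 0, 170, 255]);

// 'latin1' maps each byte to exactly one character (U+0000 to U+00FF).
const asString = bytes.toString('latin1');

// Converting back with the same encoding restores the original bytes.
const restored = Buffer.from(asString, 'latin1');

console.log(asString.length);         // 6
console.log(bytes.equals(restored));  // true
```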

Related

What data format is this alongside ascii and decimal?

Consider the following:
use sha2::{Sha256, Digest};
fn main() {
    let mut hasher = Sha256::new();
    hasher.update(b"hello world");
    let result = hasher.finalize();
    // Hex-encode the 32 hash bytes into a 64-character string.
    let str_result = format!("{:x}", result);
    println!("The SHA256 hash as a string is: {:x}", result);
    println!("ASCII decimal maps: {:?}", str_result.bytes());
    println!("What data coding is this?: {:?}", result);
}
The SHA256 hash as a string is: b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
ASCII decimal maps: Bytes(Copied { it: Iter([98, 57, 52, 100, 50, 55, 98, 57, 57, 51, 52, 100, 51, 101, 48, 56, 97, 53, 50, 101, 53, 50, 100, 55, 100, 97, 55, 100, 97, 98, 102, 97, 99, 52, 56, 52, 101, 102, 101, 51, 55, 97, 53, 51, 56, 48, 101, 101, 57, 48, 56, 56, 102, 55, 97, 99, 101, 50, 101, 102, 99, 100, 101, 57]) })
What data coding is this?: [185, 77, 39, 185, 147, 77, 62, 8, 165, 46, 82, 215, 218, 125, 171, 250, 196, 132, 239, 227, 122, 83, 128, 238, 144, 136, 247, 172, 226, 239, 205, 233]
The first two make sense, we have the ASCII representations, followed by the ASCII > Decimal map. What is the third format? [185, 77, 39, 185, 147, 77, 62, 8, 165, 46, 82, 215, 218, 125, 171, 250, 196, 132, 239, 227, 122, 83, 128, 238, 144, 136, 247, 172, 226, 239, 205, 233]?

It's the bytes of the hash represented as an array of decimals instead of as a hexadecimal string.
b94d27... -> [185, 77, 39 ...]
0xb9 -> 185
0x4d -> 77
0x27 -> 39
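The same mapping can be checked mechanically. This is a small sketch in Node.js (not Rust, to match the main thread on this page), using only the first eight bytes of the hash shown above; the variable names are illustrative:

```javascript
// First 16 hex digits (8 bytes) of the hash from the question.
const hex = 'b94d27b9934d3e08';

// Each pair of hex digits is one byte; spreading the Buffer shows them in decimal.
const bytes = [...Buffer.from(hex, 'hex')];
console.log(bytes); // [ 185, 77, 39, 185, 147, 77, 62, 8 ]

// Going the other way gives back the hex string.
console.log(Buffer.from(bytes).toString('hex')); // b94d27b9934d3e08
```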

Saving an array of uint8array in a file using node js

I'm trying to save an array of Uint8Arrays in a file using Node.js. Does anyone know how I can do that using fs?
Something like this:
[
Uint8Array(162) [
133, 111, 74, 131, 187, 35, 69, 135, 1, 151, 1, 1,
198, 69, 229, 121, 168, 106, 59, 104, 91, 198, 115, 218,
79, 67, 102, 212, 204, 206, 145, 126, 152, 59, 147, 114,
133, 173, 119, 91, 220, 251, 174, 57, 16, 125, 67, 180,
114, 128, 5, 72, 143, 131, 122, 35, 124, 139, 62, 93,
187, 2, 2, 251, 255, 149, 134, 6, 9, 65, 100, 100,
32, 99, 97, 114, 100, 50, 0, 9, 1, 2, 2, 4,
19, 4, 21, 14, 52, 3, 66, 4, 86, 5, 87, 26,
112, 2, 3, 0,
... 62 more items
],
Uint8Array(170) [
133, 111, 74, 131, 67, 31, 18, 227, 1, 159, 1, 1,
187, 35, 69, 135, 0, 7, 102, 52, 237, 242, 169, 192,
199, 237, 238, 142, 226, 200, 98, 129, 144, 184, 181, 198,
33, 248, 228, 223, 158, 171, 250, 192, 16, 125, 67, 180,
114, 128, 5, 72, 143, 131, 122, 35, 124, 139, 62, 93,
187, 3, 5, 251, 255, 149, 134, 6, 8, 65, 100, 100,
32, 99, 97, 114, 100, 0, 10, 1, 2, 2, 4, 17,
4, 19, 4, 21, 14, 52, 3, 66, 4, 86, 5, 87,
29, 112, 2, 3,
... 70 more items
],
Uint8Array(120) [
133, 111, 74, 131, 71, 17, 199, 138, 1, 110, 1, 67,
31, 18, 227, 131, 113, 252, 222, 71, 172, 34, 205, 40,
96, 11, 236, 242, 153, 43, 182, 4, 136, 135, 36, 6,
79, 52, 223, 123, 188, 42, 7, 16, 125, 67, 180, 114,
128, 5, 72, 143, 131, 122, 35, 124, 139, 62, 93, 187,
4, 8, 251, 255, 149, 134, 6, 11, 68, 101, 108, 101,
116, 101, 32, 99, 97, 114, 100, 0, 10, 1, 2, 2,
2, 17, 2, 19, 2, 52, 1, 66, 2, 86, 2, 112,
2, 113, 2, 115,
... 20 more items
],
Uint8Array(126) [
133, 111, 74, 131, 51, 54, 68, 129, 1, 116, 1, 71,
17, 199, 138, 51, 167, 78, 185, 254, 251, 73, 245, 53,
153, 253, 146, 203, 139, 250, 206, 105, 223, 51, 227, 156,
55, 3, 146, 177, 243, 235, 140, 16, 125, 67, 180, 114,
128, 5, 72, 143, 131, 122, 35, 124, 139, 62, 93, 187,
5, 9, 251, 255, 149, 134, 6, 17, 77, 97, 114, 107,
32, 99, 97, 114, 100, 32, 97, 115, 32, 100, 111, 110,
101, 0, 9, 1, 2, 2, 2, 21, 6, 52, 1, 66,
2, 86, 2, 112,
... 26 more items
]
]

I am simply using the following to export to a docx file:
buffer = Uint8Array(86762) [
80, 75, 3, 4, 10, 0, 0, 0, 8, 0, 79, 72,
111, 85, 212, 85, 145, 41, 234, 1, 0, 0, 43, 11,
0, 0, 19, 0, 0, 0, 91, 67, 111, 110, 116, 101,
110, 116, 95, 84, 121, 112, 101, 115, 93, 46, 120, 109,
108, 197, 150, 205, 78, 227, 48, 20, 133, 247, 60, 133,
229, 45, 106, 92, 64, 26, 141, 80, 83, 22, 252, 44,
25, 164, 233, 60, 128, 27, 223, 164, 134, 248, 71, 182,
91, 232, 219, 207, 117, 66, 67, 169, 18, 198, 154, 54,
98, 83, 41, 182,
... 86662 more items
]
fs.writeFileSync('output.docx', buffer)
I suppose restoring it would just be fs.readFileSync(pathToFile).
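That covers a single Uint8Array. For an array of them, the boundaries between chunks have to be stored too. One possible approach, sketched below, is to prefix each chunk with its length; saveChunks, loadChunks and the 4-byte big-endian length prefix are assumptions made for this example, not an established file format:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write every chunk preceded by its length (4 bytes, big-endian).
function saveChunks(file, chunks) {
  const parts = [];
  for (const chunk of chunks) {
    const header = Buffer.alloc(4);
    header.writeUInt32BE(chunk.length, 0);
    parts.push(header, Buffer.from(chunk));
  }
  fs.writeFileSync(file, Buffer.concat(parts));
}

// Hypothetical helper: read the length prefixes back and split the file into chunks.
function loadChunks(file) {
  const data = fs.readFileSync(file);
  const chunks = [];
  let offset = 0;
  while (offset < data.length) {
    const length = data.readUInt32BE(offset);
    offset += 4;
    // Copy the slice so each chunk owns its own memory.
    chunks.push(new Uint8Array(data.subarray(offset, offset + length)));
    offset += length;
  }
  return chunks;
}

// Usage with two short sample chunks written to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'chunks-'));
const file = path.join(dir, 'chunks.bin');
const chunks = [new Uint8Array([133, 111, 74, 131]), new Uint8Array([187, 35, 69])];
saveChunks(file, chunks);
const restored = loadChunks(file);
console.log(restored.length); // 2
```

If the file only needs to be read back by the same program and size is not a concern, serialising plain arrays with JSON would be a simpler alternative.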

Converting a .mat file to cv image

I have a .mat file and want to convert it into a CV image format such that I can use it for a CNN model.
I am trying to obtain an RGB (or other colored) image, not a grayscale one.
I tried the following, but I get a grayscale image, whereas plotting the actual .mat data with matplotlib does not look grayscale. Also, the .mat file has a px_spacing array in addition to the image array; I am not sure how this is helpful.
def mat_to_image(mat_image):
    f = loadmat(mat_image, appendmat=True)
    image = np.array(f.get('I')).astype(np.float32)
    mean = image.mean()
    std = image.std()
    print(mean, std)
    hi = np.max(image)
    lo = np.min(image)
    image = (((image - lo) / (hi - lo)) * 255).astype(np.uint8)
    im = Image.fromarray(image, mode='RGB')
    return im
images = mat_to_image(dir/filename)
cv_img = cv2.cvtColor(np.array(images), cv2.COLOR_GRAY2RGB)
Plotting the .mat file directly gives a non-grayscale (colored) image:
imgplot= plt.imshow(loadmat(img,appendmat=True).get('I'))
plt.show()
Here is how the mat file looks after print(loadmat('filename'))
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Sep 9 11:32:54 2019',
'__version__': '1.0',
'__globals__': [],
'I': array([
[ 81, 75, 74, 75, -11, 14, 49, 37, 29, -24, -183, -349, -581, -740],
[ 51, 33, 67, 36, 1, 42, 30, 49, 47, 42, 14, -85, -465, -727],
[ 23, 31, 36, 20, 54, 70, 44, 56, 56, 79, 62, 19, -204, -595],
[ 7, 12, 36, 47, 59, 68, 74, 56, 59, 100, 74, 34, -3, -353],
[ 23, 19, 51, 87, 86, 79, 91, 76, 96, 95, 52, 51, 74, -141],
[ 18, 51, 54, 97, 93, 94, 98, 83, 119, 71, 36, 69, 50, -16],
[ -10, 5, 53, 92, 69, 87, 103, 114, 118, 77, 51, 68, 30, 0],
[ -24, 11, 74, 80, 49, 68, 106, 129, 107, 63, 57, 70, 39, -1],
[ -45, 43, 83, 69, 43, 64, 98, 108, 90, 35, 27, 55, 31, -13],
[ -9, 32, 83, 78, 66, 106, 89, 85, 58, 43, 31, 39, 28, 7],
[ 45, 35, 76, 45, 51, 84, 55, 66, 49, 41, 39, 28, 13, -7],
[ 85, 67, 61, 45, 69, 53, 23, 32, 31, -12, -34, -182, -376, -425],
[ 136, 93, 71, 54, 30, 39, 17, -21, -29, -43, -101, -514, -792, -816]
], dtype=int16),
'px_spacing': array([[0.78125]])}

conversion of string to tuple in python

I have a tuple (h) as follows:
(array([[145, 34, 26, 18, 90, 89],
[ 86, 141, 216, 167, 67, 214],
[ 18, 0, 212, 49, 232, 34],
...,
[147, 99, 73, 110, 108, 9],
[222, 133, 231, 48, 227, 154],
[184, 133, 169, 201, 162, 168]], dtype=uint8), array([[178, 58, 24, 90],
[ 3, 31, 129, 243],
[ 48, 92, 19, 108],
...,
[148, 21, 25, 209],
[189, 114, 46, 218],
[ 15, 43, 92, 61]], dtype=uint8), array([[ 17, 254, 216, ..., 126, 74, 129],
[231, 168, 214, ..., 131, 50, 107],
[ 77, 185, 229, ..., 86, 167, 61],
...,
[105, 240, 95, ..., 230, 158, 27],
[211, 46, 193, ..., 48, 57, 79],
[136, 126, 235, ..., 109, 33, 185]], dtype=uint8))
I converted it into a string s = str(h):
'(array([[ 1, 60, 249, 162, 51, 3],\n [ 57, 76, 193, 244, 17, 238],\n [ 22, 72, 101, 229, 185, 124],\n ...,\n [132, 243, 123, 192, 152, 107],\n [163, 187, 131, 47, 253, 155],\n [ 21, 3, 77, 208, 229, 15]], dtype=uint8), array([[119, 149, 215, 129],\n [146, 71, 121, 79],\n [114, 148, 121, 140],\n ...,\n [175, 121, 81, 71],\n [178, 92, 1, 99],\n [ 80, 122, 189, 209]], dtype=uint8), array([[ 26, 122, 248, ..., 104, 167, 29],\n [ 41, 213, 250, ..., 82, 71, 211],\n [ 20, 122, 4, ..., 152, 99, 121],\n ...,\n [133, 77, 84, ..., 238, 243, 240],\n [208, 183, 187, ..., 182, 51, 116],\n [ 19, 135, 48, ..., 210, 163, 58]], dtype=uint8))'
Now, I want to convert s back to a tuple. I tried using ast.literal_eval(s), but I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/ast.py", line 84, in literal_eval
return _convert(node_or_string)
File "/usr/lib/python3.5/ast.py", line 55, in _convert
return tuple(map(_convert, node.elts))
File "/usr/lib/python3.5/ast.py", line 83, in _convert
raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Call object at 0x76a6f770>
I could not find this exact solution anywhere. It would be great if someone could help me out.

You can't use str() on numpy arrays (wrapped in tuples or otherwise) and hope to recover the data.
First of all, the ast.literal_eval() function only supports literals and literal displays, not numpy array(...) syntax.
Next, str() on a tuple produces debugging-friendly output; tuples don't implement a __str__ string conversion hook, so their repr() representation is returned instead. Numpy arrays do support str() conversion, but their output is still but a friendly-looking string that omits a lot of detail from the actual values. In your example, those ... ellipsis dots indicate that there is more data in that part of the array, but the strings do not include those values. So you are losing data if you were to try to re-create your arrays from this.
If you need to store these tuples in a file or database column, or need to transmit them over a network connection, you need to serialise the data. Proper serialisation will preserve every detail of the arrays.
For tuples with numpy arrays, you can use pickle.dumps() to produce a bytes object that can be passed back to pickle.loads() to recreate the same value.
You can also convert individual numpy arrays to a numpy-specific binary format, and load that format again, with the numpy.save() and numpy.load() functions (which operate directly on files, but you can pass in io.BytesIO() objects).

Ruby equivalent of Node .toString('ascii')

I am struggling with converting a Node application to Ruby. I have a Buffer of integers that I need to encode as an ASCII string.
In Node this is done like this:
const a = Buffer([53, 127, 241, 120, 57, 136, 112, 210, 162, 200, 111, 132, 46, 146, 210, 62, 133, 88, 80, 97, 58, 139, 234, 252, 246, 19, 191, 84, 30, 126, 248, 76])
const b = a.toString('hex')
// b = "357ff178398870d2a2c86f842e92d23e855850613a8beafcf613bf541e7ef84c"
const c = a.toString('ascii')
// c = '5qx9\bpR"Ho\u0004.\u0012R>\u0005XPa:\u000bj|v\u0013?T\u001e~xL'
I want to get the same output in Ruby but I don't know how to convert a to c. I used b to validate that a is parsed the same in Ruby and Node and it looks like it's working.
a = [53, 127, 241, 120, 57, 136, 112, 210, 162, 200, 111, 132, 46, 146, 210, 62, 133, 88, 80, 97, 58, 139, 234, 252, 246, 19, 191, 84, 30, 126, 248, 76].pack('C*')
b = a.unpack('H*')
# ["357ff178398870d2a2c86f842e92d23e855850613a8beafcf613bf541e7ef84c"]
# c = ???
I have tried several things, including virtually all of the unpack options, and I also tried using the encode function, but I don't understand what the problem is here.

Okay well I am not that familiar with Node.js but you can get fairly close with some basic understandings:
Node states:
'ascii' - For 7-bit ASCII data only. This encoding is fast and will strip the high bit if set.
Update: After rereading the Node.js description, I think it just means it will drop 127 and only look at the first 7 bits, so this can be simplified to:
def node_js_ascii(bytes)
  bytes.map {|b| b % 128 }
       .reject(&127.method(:==))
       .pack('C*')
       .encode(Encoding::UTF_8)
end
node_js_ascii(a)
#=> "5qx9\bpR\"Ho\u0004.\u0012R>\u0005XPa:\vj|v\u0013?T\u001E~xL"
Now the only differences are that Node.js uses "\u000b" to represent a vertical tab where Ruby uses "\v", and that Ruby uses uppercase hex digits in Unicode escapes rather than lowercase ("\u001E" vs "\u001e"). You could handle this if you so chose.
Please note: this form of encoding is not reversible, because your byte array contains values that need more than 7 bits, and the stripped high bits cannot be recovered.
TL;DR (previous explanation and solution only works up to 8 bits)
Okay, so we know the maximum supported decimal value is 127 ("1111111".to_i(2)) and that Node will strip the high bit if set, meaning [I am assuming] that 241 (an 8-bit number) will become 113 once the high bit is stripped.
With that understanding we can use:
a = [53, 127, 241, 120, 57, 136, 112, 210, 162, 200, 111, 132, 46, 146, 210, 62, 133, 88, 80, 97, 58, 139, 234, 252, 246, 19, 191, 84, 30, 126, 248, 76].map do |b|
  b < 128 ? b : b - 128
end.pack('C*')
#=> "5\x7Fqx9\bpR\"Ho\x04.\x12R>\x05XPa:\vj|v\x13?T\x1E~xL"
Then we can encode that as UTF-8 like so:
a.encode(Encoding::UTF_8)
#=> "5\u007Fqx9\bpR\"Ho\u0004.\u0012R>\u0005XPa:\vj|v\u0013?T\u001E~xL"
But there is still an issue here.
It seems Node.js also drops Delete (127) when it converts to 'ascii' (127 has no eighth bit to strip, and clearing its top set bit would give 63 ("?"), which doesn't match the output), so we can fix that too:
a = [53, 127, 241, 120, 57, 136, 112, 210, 162, 200, 111, 132, 46, 146, 210, 62, 133, 88, 80, 97, 58, 139, 234, 252, 246, 19, 191, 84, 30, 126, 248, 76].map do |b|
  b < 127 ? b : b - 128
end.pack('C*')
#=> "5\xFFqx9\bpR\"Ho\x04.\x12R>\x05XPa:\vj|v\x13?T\x1E~xL"
a.encode(Encoding::UTF_8, undef: :replace, replace: '')
#=> "5qx9\bpR\"Ho\u0004.\u0012R>\u0005XPa:\vj|v\u0013?T\u001E~xL"
Since 127 - 128 = -1, which pack('C*') stores as the byte "\xFF" (an undefined character in UTF-8), we add undef: :replace to say what to do when a character is undefined, and replace: '' to replace it with nothing.
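The assumption that Node drops 127 can be checked on the Node side by looking at character codes instead of printed output. A minimal sketch, using the first four bytes of the question's array:

```javascript
// First four bytes of the array from the question.
const input = Buffer.from([53, 127, 241, 120]);

// Decode as 'ascii' and list the resulting character codes.
const codes = [...input.toString('ascii')].map((ch) => ch.charCodeAt(0));

console.log(codes); // [ 53, 127, 113, 120 ]
```

On the Node versions I am aware of, 127 is kept as U+007F and only the eighth bit is cleared (241 becomes 113). That would suggest the Delete character is simply invisible in the printed string in the question rather than removed, in which case the first Ruby variant (b < 128 ? b : b - 128) would already match. This is worth verifying on the Node version you are porting from.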
