Node.js asymmetrical buffer <-> string conversion

In nodejs I had naively expected the following to always output true:
let buff = Buffer.allocUnsafe(20); // Essentially random contents
let str = buff.toString('utf8');
let decode = Buffer.from(str, 'utf8');
console.log(0 === buff.compare(decode));
Given a Buffer buff, how can I detect ahead of time whether buff will be exactly equal to Buffer.from(buff.toString('utf8'), 'utf8')?

You should probably be fine just testing that the input buffer contains valid UTF-8 data:
try {
  new TextDecoder('utf-8', { fatal: true }).decode(buff);
  console.log(true);
} catch {
  console.log(false);
}
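If you need this check in more than one place, it can be wrapped in a small helper (a minimal sketch; isRoundTrippableUtf8 is just an illustrative name):
// Returns true when `buf` contains only well-formed UTF-8, i.e. when it
// survives a Buffer -> string -> Buffer round trip unchanged.
const isRoundTrippableUtf8 = (buf) => {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return true;
  } catch {
    return false;
  }
};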
But I wouldn't swear to Node being 100% consistent in its handling of invalid UTF-8 data when converting from string to buffer. If you want to be safe, you'll have to stick to buffer comparison. You could make the encode/decode round trip a little more efficient by using transcode, which does not require creating a temporary string.
import { transcode } from 'buffer';
let buff = Buffer.allocUnsafe(20);
let decode = transcode(buff, 'utf8', 'utf8');
console.log(0 === buff.compare(decode));
If you're interested in how TextDecoder determines whether a buffer represents a valid utf8 string, the rigorous definition of this procedure can be found here.

Related

How to properly implement Node.js communication over Unix Domain sockets?

I'm debugging my implementation of IPC between multithreaded Node.js instances.
As datagram sockets are not supported natively, I use the default stream protocol with simple application-level packaging.
When two threads communicate, the server side is always receiving, the client side is always sending.
// writing to the client transmitter
// const transmitter = net.createConnection(SOCKETFILE);
const outgoing_buffer = [];
let writeable = true;
const write = (transfer) => {
  if (transfer) outgoing_buffer.push(transfer);
  if (outgoing_buffer.length === 0) return;
  if (!writeable) return;
  const current = outgoing_buffer.shift();
  writeable = false;
  transmitter.write(current, "utf8", () => {
    writeable = true;
    write();
  });
};
// const server = net.createServer();
// server.listen(SOCKETFILE);
// server.on("connection", (receiver) => { ...
// receiver.on("data", (data) => { ...
// ... the read function is called with the data
let incoming_buffer = "";
const read = (data) => {
  incoming_buffer += data.toString();
  while (true) {
    const decoded = decode(incoming_buffer);
    if (!decoded) return;
    incoming_buffer = incoming_buffer.substring(decoded.length);
    // ... digest decoded string
  }
};
The stream is encoded into transfer packages and decoded back, with the data JSON-stringified and parsed on each end.
Now what happens is that from time to time, seemingly more often under higher CPU load, the incoming_buffer picks up some garbage characters, displayed as ��� when logged.
Even if this happens only once in 10,000 transfers, it is a problem. I need this to be reliable: even at maximum CPU load the stream should contain no unexpected characters and must not get corrupted.
What could potentially cause this?
What would be the proper way to implement this?
Okay, I found it. The Node documentation gives a hint.
readable.setEncoding(encoding)
Must be used instead of incoming_buffer += data.toString();
The readable.setEncoding() method sets the character encoding for data
read from the Readable stream.
By default, no encoding is assigned and stream data will be returned
as Buffer objects. Setting an encoding causes the stream data to be
returned as strings of the specified encoding rather than as Buffer
objects. For instance, calling readable.setEncoding('utf8') will cause
the output data to be interpreted as UTF-8 data, and passed as
strings. Calling readable.setEncoding('hex') will cause the data to be
encoded in hexadecimal string format.
The Readable stream will properly handle multi-byte characters
delivered through the stream that would otherwise become improperly
decoded if simply pulled from the stream as Buffer objects.
So it depended on the number of multibyte characters in the stress test rather than on the CPU load.
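Concretely, the fix is a one-line change on the receiving socket; the chunks then arrive as correctly decoded strings and the manual data.toString() is no longer needed (a minimal sketch, reusing the names from the snippets above):
server.on("connection", (receiver) => {
  // Let the stream decode UTF-8 itself, so multi-byte characters that are
  // split across chunk boundaries are reassembled correctly.
  receiver.setEncoding("utf8");
  receiver.on("data", (data) => {
    incoming_buffer += data; // `data` is already a string here
    // ... run the same decode loop as before
  });
});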

IORedis: how to publish ArrayBuffer

I'm trying to publish an ArrayBuffer to an IORedis stream.
I do so as follows:
const ab = new ArrayBuffer(1); // ArrayBuffer of length = 1 byte
const dv = new DataView(ab);
dv.setInt8(0, 7); // Write the number 7 in the buffer
const buffer = Buffer.from(ab); // Convert to Buffer since that's what `publish` expects
redisPublisher.publish('buffer-test', buffer);
It's a toy example; in practice I'll want to encode more complex things in the ArrayBuffer, not just a number. I then try to read it with:
redisSubscriber.on('message', async (channel, data) => {
  logger.info(`Redis message: channel: ${channel}, data: ${data}, ${typeof data}`);
  // ... do something with it
});
The problem is that data is empty, and its type is string. As per the documentation I tried redisSubscriber.on('messageBuffer', ...) instead, but it behaves exactly the same, so much so that I'm failing to understand the difference between the two.
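For reference, the 'messageBuffer' variant I tried looks roughly like this (a sketch, assuming the channel has already been subscribed to):
redisSubscriber.on('messageBuffer', (channel, data) => {
  // As I read the docs, `channel` and `data` should arrive here as Buffers
  // rather than decoded strings.
  logger.info(`messageBuffer: channel=${channel.toString()}, isBuffer=${Buffer.isBuffer(data)}`);
});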
Also confusing is that if I encode a Buffer, e.g.
const buffer = Buffer.from("I'm a string!", 'utf-8');
redisPublisher.publish('buffer-test', buffer);
Upon reception, data will again be a string, decoded from the Buffer, which is fine in this toy case but generally is not for me. I'd like to send a Buffer in, containing more complex data than just a string (an ArrayBuffer in my case), and get a Buffer out that I can parse based on my needs, rather than have it automatically read as a string.
Any help is welcome!

Reading data a block at a time, synchronously

What is the nodejs (typescript) equivalent of the following Python snippet? I've put an attempt at corresponding nodejs below the Python.
Note that I want to read a chunk at a time, synchronously (later, that is; in this example I'm just reading the first kilobyte).
Also, I do not want to read the entire file into virtual memory at once; some of my input files will (eventually) be too big for that.
The nodejs snippet always returns null. I want it to return a string or buffer or something along those lines. If the file is >= 1024 bytes long, I want a 1024-character return; otherwise I want the entire file.
I googled about this for an hour or two, but all I found were examples that synchronously read an entire file at once, or read pieces asynchronously.
Thanks!
Here's the Python:
def readPrefix(filename: str) -> str:
    with open(filename, 'rb') as infile:
        data = infile.read(1024)
    return data
Here's the nodejs attempt:
const readPrefix = (filename: string): string => {
  const readStream = fs.createReadStream(filename, { highWaterMark: 1024 });
  const data = readStream.read(1024);
  readStream.close();
  return data;
};
To read synchronously, you would use fs.openSync(), fs.readSync() and fs.closeSync().
Here's some regular Javascript code (hopefully you can translate it to TypeScript) that synchronously reads a certain number of bytes from a file and returns a buffer object containing those bytes (or throws an exception in case of error):
const fs = require('fs');

function readBytesSync(filePath, filePosition, numBytesToRead) {
  const buf = Buffer.alloc(numBytesToRead, 0);
  let fd;
  try {
    fd = fs.openSync(filePath, "r");
    fs.readSync(fd, buf, 0, numBytesToRead, filePosition);
  } finally {
    if (fd) {
      fs.closeSync(fd);
    }
  }
  return buf;
}
For your application, you can just pass 1024 as the number of bytes to read; if there are fewer than that in the file, it will just read up to the end of the file. The returned buffer object will contain the bytes read, which you can access as binary or convert to a string.
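For example, the Python readPrefix() from the question maps onto this helper roughly as follows (a sketch; the file name is hypothetical, and note that if the file is shorter than 1024 bytes the tail of the returned buffer stays zero-filled, because readBytesSync() always allocates the full length):
// Read at most the first 1024 bytes of the file.
function readPrefix(filename) {
  return readBytesSync(filename, 0, 1024);
}

const prefix = readPrefix('input.bin'); // hypothetical file name
console.log(prefix.length, prefix.toString('utf8', 0, 16));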
For the benefit of others reading this, I mentioned in earlier comments that synchronous I/O should never be used in a server environment (servers should always use asynchronous I/O except at startup time). Synchronous I/O can be used for stand-alone scripts that only do one thing (like build scripts, as an example) and don't need to be responsive to multiple incoming requests.
Do I need to loop on readSync() in case of EINTR or something?
Not that I'm aware of.

Convert a buffer to string and then convert string back to buffer in javascript

I am using zlib in Node.js to compress a string. Compressing the string gives me a Buffer. I want to send that Buffer as a PUT request, but the PUT request rejects the Buffer because it accepts only a string. I am not able to convert the Buffer to a string such that the receiving end can decompress that string and recover the original data. I am not sure how to convert the buffer to a string, convert that string back to a buffer, and then decompress the buffer to get the original string back.
let zlib = require('zlib');
// compressing 'str' and getting the result converted to string
let compressedString = zlib.deflateSync(JSON.stringify(str)).toString();
//decompressing the compressedString
let decompressedString = zlib.inflateSync(compressedString);
The last line is causing an issue saying the input is invalid.
I tried to convert the 'compressedString' to a buffer and then decompress it, but that does not help either.
//converting string to buffer
let bufferedString = Buffer.from(compressedString, 'utf8');
//decompressing the buffer
//decompressedBufferString = zlib.inflateSync(bufferedString);
This code also throws an exception saying the input is not valid.
I would suggest reading the documentation for zlib, but the usage is pretty clear.
var Buffer = require('buffer').Buffer;
var zlib = require('zlib');
// create the buffer first and pass the result to zlib
let input = Buffer.from(str);
// start the compression by passing the buffer to zlib
let compressedString = zlib.deflateSync(input);
// To inflate you do the same thing, but pass the compressed
// object to inflateSync() and chain toString()
let decompressedString = zlib.inflateSync(compressedString).toString();
There are a number of ways to handle streams but this is what you are trying to achieve with the code provided.
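Combined with the JSON handling from the question, the full round trip would then look roughly like this (a sketch; str stands for the original value being sent):
// Compress the JSON-encoded value into a Buffer ...
const compressed = zlib.deflateSync(JSON.stringify(str));
// ... and later decompress it and parse the JSON back.
const original = JSON.parse(zlib.inflateSync(compressed).toString());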
Try sending the buffer as a latin1 string, not a utf8 string. For instance, if your buffer is in the mybuf variable:
mybuf.toString('latin1');
And send that string to your API. Then in your frontend code you can do something like this, supposing your response is in the response variable:
const byteNumbers = new Uint8Array(response.length);
for (let i = 0; i < response.length; i++) {
  byteNumbers[i] = response[i].charCodeAt(0);
}
const blob: Blob = new Blob([byteNumbers], {type: 'application/gzip'});
In my experience the transferred size will be just a little higher this way compared to sending the raw buffer, but at least, unlike with utf8, you get your original binary data back. I still don't know how to do it with utf8 encoding; according to this SO answer, it doesn't seem possible.
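To illustrate the Node side: a latin1 round trip preserves every byte, because latin1 maps each byte value to exactly one character (a minimal sketch):
const asString = mybuf.toString('latin1');            // Buffer -> string, one char per byte
const backToBuffer = Buffer.from(asString, 'latin1'); // string -> Buffer, bytes restored
console.log(mybuf.equals(backToBuffer));              // true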

Node same buffer on write stream

I have the following code
const buffer = new Buffer(buffer_size);
const wstream = fs.createWriteStream('testStream.ogg');
do {
  read = obj1.partialDecrypt(buffer);
  if (read >= 0) {
    if (read < buffer_size) {
      wstream.write(buffer.slice(0, buffer_size));
    } else {
      wstream.write(buffer);
    }
  }
  total += read;
} while (read > 0);
wstream.end();
Here partialDecrypt fills the buffer with binary data and returns the number of bytes filled.
If I fill the buffer more than once, the data written to the stream does not match what I expect. Should I do something to reuse the same buffer on the stream?
Turns out reusing the buffer is not a good idea. As in this thread, creating a new buffer each pass was the way to go.
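Rewritten with a fresh buffer per pass, the loop from the question would look roughly like this (a sketch; obj1, partialDecrypt and buffer_size are the names from the question):
const wstream = fs.createWriteStream('testStream.ogg');
let read;
do {
  // Allocate a new buffer each pass, so the chunk handed to write()
  // is never overwritten by a later partialDecrypt() call.
  const chunk = Buffer.alloc(buffer_size);
  read = obj1.partialDecrypt(chunk);
  if (read > 0) {
    // Only write the bytes that were actually filled.
    wstream.write(chunk.slice(0, read));
  }
} while (read > 0);
wstream.end();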
