node - send large JSON over net socket - node.js

The problem is that large serialized JSON (over 16,000 characters) sent over a net socket gets split into chunks. Each chunk fires the data event on the receiving end, so simply running JSON.parse() on the incoming data may fail with SyntaxError: Unexpected end of input.
The workaround I've come up with so far is to append a null character ('\u0000') to the end of the serialized JSON and check for it on the receiving end. Here is an example:
var partialData = '';
client.on( 'data', function( data ) {
    data = data.toString();
    if ( data.charCodeAt( data.length - 1 ) !== 0 ) {
        partialData += data;
        // if data is incomplete then no need to proceed
        return;
    } else {
        // append all but the null character to the existing partial data
        partialData += data.substr( 0, data.length - 1 );
    }
    // pass parsed data to some function for processing
    workWithData( JSON.parse( partialData ));
    // reset partialData for next data transfer
    partialData = '';
});
One of the failures of this model is if the receiver is connected to multiple sockets, and each socket is sending large JSON files.
The reason I'm doing this is because I need to pass data between two processes running on the same box, and I prefer not to use a port. Hence using a net socket. So there would be two questions: First, is there a better way to quickly pass large JSON data between two Node.js processes? Second, if this is the best way then how can I better handle the case where the serialized JSON is being split into chunks when sent?

You can wrap JSON.parse() in try...catch on every data event and treat a successful parse as a complete message. The performance is not very good, though.
You can calculate the size of your JSON on the sending side and send that size before the JSON itself.
You can append a boundary string that is unlikely to appear in JSON. Your \u0000 does seem to be a legitimate choice, but the most popular one is a newline.
You can use an external library such as dnode, which should already do something like what I mentioned above. I'd recommend trying that. Really.
Regarding "One of the failures of this model is if the receiver is connected to multiple sockets, and each socket is sending large JSON files.": use a different buffer for every socket. No problem here.
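The "send the size before the JSON" option could look roughly like the sketch below. The helper names encodeFrame and createFrameDecoder and the 4-byte big-endian header are assumptions made for illustration, not something from the question; each socket would get its own decoder, which also covers the multiple-sockets case.

```javascript
// Sketch: length-prefixed JSON framing (helper names are hypothetical).

// Sender side: 4-byte big-endian byte length, followed by the UTF-8 JSON body.
function encodeFrame(value) {
    const body = Buffer.from(JSON.stringify(value), 'utf8');
    const header = Buffer.alloc(4);
    header.writeUInt32BE(body.length, 0);
    return Buffer.concat([header, body]);
}

// Receiver side: create one decoder per socket and feed it every 'data' chunk.
function createFrameDecoder(onMessage) {
    let pending = Buffer.alloc(0);
    return function feed(chunk) {
        pending = Buffer.concat([pending, chunk]);
        // Emit every complete frame currently held in the buffer.
        while (pending.length >= 4) {
            const size = pending.readUInt32BE(0);
            if (pending.length < 4 + size) break; // body not complete yet
            onMessage(JSON.parse(pending.toString('utf8', 4, 4 + size)));
            pending = pending.subarray(4 + size);
        }
    };
}

// Usage: socket.on('data', createFrameDecoder(workWithData));
```

Because the decoder works on bytes and only decodes complete bodies, a multi-byte character split across two chunks is not a problem here.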

It is possible to identify each socket individually and build buffers for each one. I add an id to each socket when I receive a connection and then when I receive data I add that data to a buffer.
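const net = require('net');
// Partial data per socket, keyed by socket id.
const buffers = {};
let nextId = 0;
net.createServer( function(socket) {
    // There are many ways to assign an id, this is just an example.
    // A counter avoids the collisions a random number could produce.
    socket.id = nextId++;
    buffers[socket.id] = '';
    socket.setEncoding('utf8');
    socket.on('data', function(data) {
        // 'this' refers to the socket calling this callback.
        buffers[this.id] += data;
    });
    socket.on('close', function() { delete buffers[this.id]; });
});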
Each time data arrives, you can check whether you have received the "key" delimiter, which tells you that the buffer holds a complete message and is ready to be used.
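A minimal sketch of that delimiter check, assuming the '\u0000' terminator from the question and chunks that have already been decoded to strings (for example via socket.setEncoding('utf8')). The names DELIMITER, buffers and onChunk are illustrative only.

```javascript
// Sketch: one text buffer per socket id, split on a delimiter.
const DELIMITER = '\u0000';
const buffers = new Map(); // socket id -> text received but not yet complete

// Append a decoded chunk for one socket and return every complete message.
function onChunk(socketId, text) {
    const parts = ((buffers.get(socketId) || '') + text).split(DELIMITER);
    // The last element is whatever follows the final delimiter: keep it.
    buffers.set(socketId, parts.pop());
    return parts.map((json) => JSON.parse(json));
}
```

Unlike checking only the last character of a chunk, splitting also handles a chunk that carries the end of one message and the start of the next.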

Related

How to properly implement Node.js communication over Unix Domain sockets?

I'm debugging my implementation of an IPC between multithreaded node.js instances.
As datagram sockets are not supported natively, I use the default stream protocol, with simple application-level packaging.
When two threads communicate, the server side is always receiving, the client side is always sending.
// writing to the client transmitter
// const transmitter = net.createConnection(SOCKETFILE);
const outgoing_buffer = [];
let writeable = true;
const write = (transfer) => {
    if (transfer) outgoing_buffer.push(transfer);
    if (outgoing_buffer.length === 0) return;
    if (!writeable) return;
    const current = outgoing_buffer.shift();
    writeable = false;
    transmitter.write(current, "utf8", () => {
        writeable = true;
        write();
    });
};
// const server = net.createServer();
// server.listen(SOCKETFILE);
// server.on("connection", (receiver) => { ...
// receiver.on("data", (data) => { ...
// ... the read function is called with the data
let incoming_buffer = "";
const read = (data) => {
    incoming_buffer += data.toString();
    while (true) {
        const decoded = decode(incoming_buffer);
        if (!decoded) return;
        incoming_buffer = incoming_buffer.substring(decoded.length);
        // ... digest decoded string
    }
};
My stream is encoded in transfer packages, and decoded back, with the data JSON stringified back and forth.
What happens now is that from time to time, seemingly more often at higher CPU loads, the incoming_buffer gets some random characters, displayed as ��� when logged.
Even if this happens only once in 10000 transfers, it is a problem. I need a reliable way to do this: even when the CPU load is at its maximum, the stream should contain no unexpected characters and should not get corrupted.
What could potentially cause this?
What would be the proper way to implement this?
Okay, I found it. The Node documentation gives a hint.
readable.setEncoding(encoding) must be used instead of incoming_buffer += data.toString(). The documentation says:
The readable.setEncoding() method sets the character encoding for data
read from the Readable stream.
By default, no encoding is assigned and stream data will be returned
as Buffer objects. Setting an encoding causes the stream data to be
returned as strings of the specified encoding rather than as Buffer
objects. For instance, calling readable.setEncoding('utf8') will cause
the output data to be interpreted as UTF-8 data, and passed as
strings. Calling readable.setEncoding('hex') will cause the data to be
encoded in hexadecimal string format.
The Readable stream will properly handle multi-byte characters
delivered through the stream that would otherwise become improperly
decoded if simply pulled from the stream as Buffer objects.
So the corruption depended on the number of multibyte characters in the stress test rather than on CPU load.
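The effect can be reproduced without any socket. The sketch below splits one multi-byte character across two chunks and compares per-chunk toString() with a StringDecoder, which is the mechanism setEncoding() relies on; the variable names are illustrative.

```javascript
const { StringDecoder } = require('string_decoder');

// '€' is three bytes in UTF-8 (e2 82 ac); split it across two chunks.
const bytes = Buffer.from('€', 'utf8');
const chunks = [bytes.subarray(0, 1), bytes.subarray(1)];

// Decoding each chunk on its own produces replacement characters.
const naive = chunks.map((c) => c.toString('utf8')).join('');

// A StringDecoder holds back incomplete sequences until the rest arrives.
const decoder = new StringDecoder('utf8');
const safe = chunks.map((c) => decoder.write(c)).join('') + decoder.end();

console.log(naive === '€'); // false
console.log(safe === '€');  // true
```

In the receiver from the question this corresponds to calling setEncoding("utf8") on the socket once and then appending the string chunks directly.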

Altering an Array from within Node JS net Socket

I am trying to update an array from within a server program, in order to record client data. However, whilst the positions array is updating okay, the board array (which obtains data from the positions array) never recognises these changes. Thus the output (socket.write) never changes. I feel I must be missing something obvious. This is a basic implementation of what I'm trying to do. Thank you in advance.
const net = require('net');
const position = [" ", " ", " "];
const board = [position[0], "-", position[1], "-", position[2]];
const server = net.createServer(socket => {
    socket.on('data', data => {
        socket.write('Enter a value between 1 & 3 (inclusive)');
        const value = data.toString('utf-8');
        position[value-1] = 'X';
        board.forEach(item => {
            socket.write(item);
        })
    })
    socket.on('end', () => {
        console.log("Session ended");
    })
})
server.listen(5000);
The two variables, position and board, are entirely independent: board does not hold references to position, so your modification to position is not carried over to board. This is because strings in JavaScript are primitive values, so the array literal for board copied the values that position held when board was created. Later assignments to position[i] do not touch those copies.
If you need to derive a value from another for sending, it's best to write a function that transforms your input to a desired output format, like so:
function toBoard(position) {
    return [position[0], "-", position[1], "-", position[2]];
}
and then
toBoard(position).forEach(item => {
    socket.write(item);
})
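A quick check of this approach outside the server, reusing the position array and toBoard function from above; the before and after variables exist only for the demonstration.

```javascript
const position = [' ', ' ', ' '];

// Build the board from the current positions every time it is needed.
function toBoard(position) {
    return [position[0], '-', position[1], '-', position[2]];
}

const before = toBoard(position).join('');
position[1] = 'X';
const after = toBoard(position).join('');

console.log(JSON.stringify(before)); // " - - "
console.log(JSON.stringify(after));  // " -X- "
```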
Note, however, that your socket-related code has a serious bug: it treats data events as messages that come in individually. This is called message or datagram semantics, where a peer sends 1 message and the other peer receives the same message in its entirety. This is different from stream semantics, where a sequence of bytes is sent, and the same bytes come out on the other side, in the same order, but not necessarily sliced the same way.
With TCP, if the client does:
socket.write('1');
socket.write('2');
socket.write('3');
The server may receive any of:
one data event with '123'
two data events with '1' and '23'
two data events with '12' and '3'
three separate data events with '1', '2' and '3'
You should look into protocols that preserve message boundaries, instead of using raw TCP. Here are some protocols you could use instead:
UDP
WebSocket
ZeroMQ
If you decide to implement this on raw TCP yourself (which is error-prone), you'll need some logic to receive the stream progressively from the network, buffer and split the chunks accordingly into logical messages that you can process. A simple example of a built-in tool which does this is readline - a module that converts a stream-oriented input (such as a TCP socket, or a process' standard input) into discrete line events.

Incorrect header check error from Zlib.gunzip() in Node.JS app using the HTTPS module

I have a Node.JS app and I am using the https module to make GET requests to a web server. The headers in the response coming back have the content-encoding set to gzip. I have directly inspected the data and it does appear to be compressed data and definitely not plain text.
I accumulate the chunks as they come in. I then try decompressing the accumulated data using zlib. So far everything I have tried results in an "Incorrect header check" error when I execute the decompression call. The code below shows the use of a Buffer object with the encoding set to binary. I previously tried passing the accumulated data directly to the decompression call, but that failed too.
Why doesn't this work?
// Make the request to the designated external server.
const httpsRequest = https.request(postOptions, function(extRequest) {
    console.log('(httpsRequest) In request handler.');
    // Process the response from the external server.
    let dataBody = "";
    // The data may come to us in pieces. The 'on' event handler will accumulate them for us.
    let iNumSlices = 0;
    extRequest.on('data', function(dataSlice) {
        iNumSlices++;
        console.log('(httpsRequest:on) Received slice # ' + iNumSlices +'.');
        dataBody += dataSlice;
    });
    // When we have received all the data from the external server, finish the request.
    extRequest.on('end', function() {
        // SUCCESS: Return the result to AWS.
        console.log('(httpsRequest:end) Success. Data body length: ' + dataBody.length +'.');
        console.log('(httpsRequest:end) Content: ');
        let buffer = Buffer.from(dataBody, "binary");
        // Check for GZip compressed data.
        if (extRequest.headers['content-encoding'] == 'gzip') {
            // Decompress the data.
            zlib.gunzip(buffer, (err, buffer) => {
                if (err) {
                    // Reject the promise with the error.
                    reject(err);
                    return;
                } else {
                    console.log(errPrefix + buffer.toString('utf8'));
                }
            });
        } else {
            console.log(errPrefix + dataBody);
            let parsedDataBodyObj = JSON.parse(dataBody);
            resolve(parsedDataBodyObj);
        }
    });
});
You may have it in your actual code, but the code snippet doesn't include a call to end(), which is mandatory.
It may be related to the way you accumulate the chunks with dataBody += dataSlice.
Since the data is compressed, this (probably) means that the type of a chunk is already a Buffer, and using += to concatenate it into a string seems to mess it up, even though you later call Buffer.from.
Try replacing it with making dataBody an empty array, then push chunks into it, then finally call Buffer.concat(dataBody).
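The difference between the two accumulation styles can be seen without a network request. In this sketch a locally gzipped string stands in for the response body and is cut into two slices; original, slices, mangled and intact are illustrative names.

```javascript
const zlib = require('zlib');

// Stand-in for a gzip-encoded response body arriving in two slices.
const original = JSON.stringify({ ok: true, items: [1, 2, 3] });
const compressed = zlib.gzipSync(original);
const slices = [compressed.subarray(0, 7), compressed.subarray(7)];

// String accumulation decodes each Buffer as UTF-8 and mangles binary bytes.
let dataBody = '';
for (const slice of slices) dataBody += slice;
const mangled = Buffer.from(dataBody, 'binary');

// Array accumulation keeps the bytes untouched.
const chunks = [];
for (const slice of slices) chunks.push(slice);
const intact = Buffer.concat(chunks);

console.log(mangled.equals(compressed)); // false
console.log(zlib.gunzipSync(intact).toString('utf8') === original); // true
```

The second byte of the gzip magic number (0x8b) is not valid UTF-8 on its own, so it is already replaced during the string concatenation, which is consistent with a header check failure.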
Another option is that https.request already decompresses the data under the hood, so that once you accumulate the chunks into a buffer (as detailed in the previous section), all you're left with is to call buffer.toString(). I experienced this myself in this other answer, and it seems to be related to the Node.js version.
I'll end this answer with a live demo of similar working code that may come in handy (it queries the StackExchange API, receives gzip-compressed chunks, and then decompresses them):
It includes code that works on Node.js 14.16.0 (the current StackBlitz version), which, as described above, already decompresses the data under the hood, but not on Node.js 15.13.0.
It also includes commented-out code that works on Node.js 15.13.0 but not on 14.16.0.

Buffering a Float32Array to a client

This should be obvious, but for some reason I am not getting any result. I have already spent way too much time just trying different ways to get this working without results.
TLDR: A shorter way to explain this question could be: I know how to stream a sound from a file. How to stream a buffer containing sound that was synthesized on the server instead?
This works:
client:
var stream = ss.createStream();
ss(socket).emit('get-file', stream, data.bufferSource);
var parts = [];
stream.on('data', function(chunk){
    parts.push(chunk);
});
stream.on('end', function () {
    var blob=new Blob(parts,{type:"audio"});
    if(cb){
        cb(blob);
    }
});
server (in the 'socket-connected' callback of socket.io)
var ss = require('socket.io-stream');
// ....
ss(socket).on('get-file', (stream:any, filename:any)=>{
    console.log("get-file",filename);
    fs.createReadStream(filename).pipe(stream);
});
Now, the problem:
I want to alter this audio buffer and send the modified audio instead of just the file. I converted the ReadStream into a Float32Array and did some processing sample by sample. Now I want to send that modified Float32Array to the client.
In my view, I just need to replace the fs.createReadStream(filename) with (new Readable()).push(modifiedSoundBuffer). However, I get a TypeError: Invalid non-string/buffer chunk. Interestingly, if I convert this modifiedSoundBuffer into a Uint8Array, it doesn't yell at me, and the client gets a large array, which looks good; only that all the array values are 0. I guess that it's flooring all the values?
ss(socket).on('get-buffer', (stream:any, filename:any)=>{
let readable=(new Readable()).push(modifiedFloat32Array);
readable.pipe(stream);
});
I am trying to use streams for two reasons: sound buffers are large, and to allow concurrent processing in the future
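What if you convert the Float32Array to a Buffer before sending, for example (new Readable()).push(Buffer.from(modifiedSoundBuffer.buffer))? Passing the underlying ArrayBuffer keeps the raw float bytes, whereas Buffer.from(modifiedSoundBuffer) would truncate each sample to a single byte, just like the Uint8Array conversion.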
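A small sketch of that conversion and its inverse, assuming both ends use the same byte order and that the receiving side has the bytes as a Uint8Array or Buffer. The helper names float32ToBuffer and bytesToFloat32 are hypothetical.

```javascript
// Sketch: hypothetical helpers for moving Float32Array samples as bytes.

// Server side: view the samples' underlying bytes without converting values.
function float32ToBuffer(samples) {
    return Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
}

// Client side: copy the bytes into a fresh, aligned ArrayBuffer and reinterpret.
function bytesToFloat32(bytes) {
    const copy = new Uint8Array(bytes); // copies into its own ArrayBuffer
    return new Float32Array(copy.buffer);
}

const samples = new Float32Array([0.5, -0.25, 1]);
console.log(Array.from(Buffer.from(samples)));                      // [ 0, 0, 1 ]
console.log(Array.from(bytesToFloat32(float32ToBuffer(samples)))); // [ 0.5, -0.25, 1 ]
```

The first log line shows why the Uint8Array attempt produced zeros: fractional samples are converted element by element to bytes. On the server, the Buffer from float32ToBuffer can be pushed into the Readable, followed by push(null) to end the stream.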

What's the node.js paradigm for socket stream conversation?

I'm trying to implement a socket protocol and it is unclear to me how to proceed. I have the socket as a Stream object, and I am able to write() data to it to send on the socket, and I know that the "readable" or "data" events can be used to receive data. But this does not work well when the protocol involves a conversation in which one host is supposed to send a piece of data, wait for a response, and then send data again after the response.
In a block paradigm it would look like this:
send some data
wait for specific data reply
massage data and send it back
send additional data
As far as I can tell, node's Stream object does not have a read function that will asynchronously return with the number of bytes requested. Otherwise, each wait could just put the remaining functionality in its own callback.
What is the node.js paradigm for this type of communication?
Technically there is Readable.read(), but it's not recommended (maybe you can't be sure of the size, or it blocks; I'm not sure). You can keep track of state and, on each data event, add to a Buffer that you keep processing incrementally. You can use readUInt32LE etc. on the Buffer to read specific pieces of binary data if you need to do that (or you can convert to a string if it's textual data). https://github.com/runvnc/metastream/blob/master/index.js
If you want to write it in your 'block paradigm', you could make some of the steps a promise or async function, like this:
let specialReplyRes = null;
const waitForSpecialReply = () => new Promise((res) => { specialReplyRes = res; });
stream.on('data', (buff) => {
    // Resolve the pending wait, if any, once the expected reply arrives.
    if (specialReplyRes && buff.toString().indexOf('special') >= 0) specialReplyRes(buff.toString());
});
// ...
async function proto() {
    stream.write(data);
    let reply = await waitForSpecialReply();
    const message = massage(reply);
    stream.write(message);
}
Where your waitForSpecialReply promise is stored and resolved after a certain message is received through your parsing.
