Nodejs Readable streams, parsing binary data, preserving order - node.js

Using latest nodejs...
Got a binary coming from mongodb (field within a document). Means I will be processing multiple binary payloads concurrently. Data is a media file (h264) made up of slices (nal units). Each slice is delimited.
Using a readable stream from fs if I act on "data" events is the order of the data chunks preserved? Can I be guaranteed to process the "data" in order? (See the origin in the path part of the "this" scope in each call)

The order that data is written to a stream is guaranteed to be the same order that it is read with. When writing to a stream, the data is either written or queued, order does not change. This is from the Node.js source:
function writeOrBuffer(stream, state, chunk, encoding, cb) {
chunk = decodeChunk(state, chunk, encoding);
if (util.isBuffer(chunk))
encoding = 'buffer';
var len = state.objectMode ? 1 : chunk.length;
state.length += len;
var ret = state.length < state.highWaterMark;
state.needDrain = !ret;
if (state.writing || state.corked)
state.buffer.push(new WriteReq(chunk, encoding, cb));
else
doWrite(stream, state, false, len, chunk, encoding, cb);
return ret;
}
This is also how data events are fired:
// if we want the data now, just emit it.
if (state.flowing && state.length === 0 && !state.sync) {
stream.emit('data', chunk);
stream.read(0);
}
The data event won't fire for a chunk unless there is no queued data, which means you will get the data in the order that it was passed in as.

Related

How to properly implement Node.js communication over Unix Domain sockets?

I'm debugging my implementation of an IPC between multithreaded node.js instances.
As datagram sockets are not supported natively, I use the default stream protocol, with simple application-level packaging.
When two threads communicate, the server side is always receiving, the client side is always sending.
// writing to the client trasmitter
// const transmitter = net.createConnection(SOCKETFILE);
const outgoing_buffer = [];
let writeable = true;
const write = (transfer) => {
if (transfer) outgoing_buffer.push(transfer);
if (outgoing_buffer.length === 0) return;
if (!writeable) return;
const current = outgoing_buffer.shift();
writeable = false;
transmitter.write(current, "utf8", () => {
writeable = true;
write();
});
};
// const server = net.createServer();
// server.listen(SOCKETFILE);
// server.on("connection", (reciever) => { ...
// reciever.on("data", (data) => { ...
// ... the read function is called with the data
let incoming_buffer = "";
const read = (data) => {
incoming_buffer += data.toString();
while (true) {
const decoded = decode(incoming_buffer);
if (!decoded) return;
incoming_buffer = incoming_buffer.substring(decoded.length);
// ... digest decoded string
}
};
My stream is encoded in transfer packages, and decoded back, with the data JSON stringified back and forth.
Now what happens is, that from time to time, as it seems more frequently at higher CPU loads, the incoming_buffer gets some random characters, displayed as ��� when logged.
Even if this is happening only once in 10000 transfers, it is a problem. I would need a reliable way, even if the CPU load is at max, the stream should have no unexpected characters, and should not get corrupted.
What could potentially cause this?
What would be the proper way to implement this?
Okay, I found it. The Node documentation gives a hint.
readable.setEncoding(encoding)
Must be used instead of incoming_buffer += data.toString();
The readable.setEncoding() method sets the character encoding for data
read from the Readable stream.
By default, no encoding is assigned and stream data will be returned
as Buffer objects. Setting an encoding causes the stream data to be
returned as strings of the specified encoding rather than as Buffer
objects. For instance, calling readable.setEncoding('utf8') will cause
the output data to be interpreted as UTF-8 data, and passed as
strings. Calling readable.setEncoding('hex') will cause the data to be
encoded in hexadecimal string format.
The Readable stream will properly handle multi-byte characters
delivered through the stream that would otherwise become improperly
decoded if simply pulled from the stream as Buffer objects.
So it was rather depending on the number of multibyte characters in the stress test, then on CPU load.

Spawned ffmpeg process in nodejs Transform stream with flow control doesn't process input stream

I have implemented a node.js Transform stream class that spawns ffmpeg and streams out the transformed stream at a controlled realtime rate. Below is my _transform() method.
this.rcvd += chunk.length
console.log('received bytes %d', this.rcvd)
const ready = this.ffmpeg.stdin.write(chunk, encoding, err => err
? cb(err)
: ready
? cb
: this.ffmpeg.stdin.once('drain', cb))
I want to write to ffmpeg's stdin stream till it returns false, at which point I wait for the drain event to write more data.
Concurrently, I have a timer that fires every 40 milliseconds that reads ffmpeg's stdout and pushes it to the Transform stream's output (the readable side). Code not complete, but the folllowing describes it well.
const now = Date.now()
const bytesToTransmit = (now - this.lastTx) * 32
const buf = this.ffmpeg.stdout.read()
if (buf == null) return
if (buf.length <= bytesToTransmit) {
this.push(buf)
this.lastTx += buf.length * 32
return
}
this.push(buf.slice(0, bytesToTransmit))
this.lastTx = now
// todo(handle pending buffer)
this.pending = buf.slice(bytesToTransmit)
The issue I am facing is that the first write (of 65k bytes) returns false, and after that I never receive the drain event. FFMpeg doesn't start processing the data until certain amount of data (256k bytes in my case) has been written, and once I start waiting for the drain event, I never recover control.
I've tried a few ffmpeg options like nobuffer but to no avail. What am I doing wrong? Any ideas would be super helpful.
Thanks!

Why while loop is needed for reading a non-flowing mode stream in Node.js?

In the node.js documentation, I came across the following code
const readable = getReadableStreamSomehow();
// 'readable' may be triggered multiple times as data is buffered in
readable.on('readable', () => {
let chunk;
console.log('Stream is readable (new data received in buffer)');
// Use a loop to make sure we read all currently available data
while (null !== (chunk = readable.read())) {
console.log(`Read ${chunk.length} bytes of data...`);
}
});
// 'end' will be triggered once when there is no more data available
readable.on('end', () => {
console.log('Reached end of stream.');
});
Here is the comment from the node.js documentation concerning the usage of the while loop, saying it's needed to make sure all data is read
// Use a loop to make sure we read all currently available data
while (null !== (chunk = readable.read())) {
I couldn't understand why it is needed and tried to replace while with just if statement, and the process terminated after the very first read. Why?
From the node.js documentation
The readable.read() method should only be called on Readable streams operating in paused mode. In flowing mode, readable.read() is called automatically until the internal buffer is fully drained.
Be careful that this method is only meant for stream that has been paused.
And even further, if you understand what a stream is, you'll understand that you need to process chunks of data.
Each call to readable.read() returns a chunk of data, or null. The chunks are not concatenated. A while loop is necessary to consume all data currently in the buffer.
So i hope you understand that if you are not looping through your readable stream and only executing 1 read, you won't get your full data.
Ref: https://nodejs.org/api/stream.html

Performing piped operations on individual chunks (node-wav)

I'm new to node and I'm working on an audio stream server. I'm trying to process / transform the chunks of a stream as they come out of each pipe.
So, file = fs.createReadStream(path) (filestream) is piped into file.pipe(wavy) (remove headers and output raw PCM) gets piped in to .pipe(waver) (add proper wav header to chunk) which is piped into .pipe(spark) (ouput chunk to client).
The idea is that each filestream chunk has headers removed if any (only applies to first chunk), then using the node-wav Writer that chunk is endowed with headers and then sent to the client. As I'm sure you guessed this doesn't work.
The pipe operations into node-wav are acting on the entire filestream, not the individual chunks. To confirm I've checked the output client side and it is effectively dropping the headers and re-adding them to the entire data stream.
From what I've read of the Node Stream docs it seems like what I'm trying to do should be possible, just not the way I'm doing it. I just can't pin down how to accomplish this.
Is it possible, and if so what am I missing?
Complete function:
processAudio = (path, spark) ->
wavy = new wav.Reader()
waver = new wav.Writer()
file = fs.createReadStream(path)
file.pipe(wavy).pipe(waver).pipe(spark)
I don't really know about wavs and headers but if you're "trying to process / transform the chunks of a stream as they come out of each pipe." you can use the Transform stream.
It permits you to sit between 2 streams and modify the bytes between them:
var util = require('util');
var Transform = require('stream').Transform;
util.inherits(Test, Transform);
function Test(options) {
Transform.call(this, options);
}
Test.prototype._transform = function(chunk, encoding, cb) {
// do something with chunk, then pass a modified chunk (or not)
// to the downstream
cb(null, chunk);
};
To observe the stream and potentially modify it, pipe like:
file.pipe(wavy).pipe(new Test()).pipe(waver).pipe(spark)

NodeJS: How to write a file parser using readStream?

I have a file in a binary format:
The format is as follows:
[4 - header bytes] [8 bytes - int64 - how many bytes to read following] [variable num of bytes (size of the int64) - read the actual information]
And then it repeats, so I must first read the first 12 bytes to determine how many more bytes I need to read.
I have tried:
var readStream = fs.createReadStream('/path/to/file.bin');
readStream.on('data', function(chunk) { ... })
The problem I have is that chunk always comes back in chunks of 65536 bytes at a time whereas I need to be more specific on the number of bytes that I am reading.
I have always tried readStream.on('readable', function() { readStream.read(4) })
But it is also not very flexible, because it seems to turn asynchronous code into synchronous code because, I have to put the 'reading' in a while loop
Or maybe readStream is not appropriate in this case and I should use this instead? fs.read(fd, buffer, offset, length, position, callback)
Here's what I'd recommend as an abstract handler of a readStream to process abstract data like you're describing:
var pending = new Buffer(9999999);
var cursor = 0;
stream.on('data', function(d) {
d.copy(pending, cursor);
cursor += d.length;
var test = attemptToParse(pending.slice(0, cursor));
while (test !== false) {
// test is a valid blob of data
processTheThing(test);
var rawSize = test.raw.length; // How many bytes of data did the blob actually take up?
pending.copy(pending.copy, 0, rawSize, cursor); // Copy the data after the valid blob to the beginning of the pending buffer
cursor -= rawSize;
test = attemptToParse(pending.slice(0, cursor)); // Is there more than one valid blob of data in this chunk? Keep processing if so
}
});
For your use-case, ensure the initialized size of the pending Buffer is large enough to hold the largest possible valid blob of data you'll be parsing (you mention an int64; that max size plus the header size) plus one extra 65536 bytes in case the blob boundary happens just on the edge of a stream chunk.
My method requires a attemptToParse() method that takes a buffer and tries to parse the data out of it. It should return false if the length of the buffer is too short (data hasn't come in enough yet). If it is a valid object, it should return some parsed object that has a way to show the raw bytes it took up (.raw property in my example). Then you do any processing you need to do with the blob (processTheThing()), trim out that valid blob of data, shift the pending Buffer to just be the remainder and keep going. That way, you don't have a constantly growing pending buffer, or some array of "finished" blobs. Maybe process on the receiving end of processTheThing() is keeping an array of the blobs in memory, maybe it's writing them to a database, but in this example, that's abstracted away so this code just deals with how to handle the stream data.
Add the chunk to a Buffer, and then parse the data from there. Being aware not to go beyond the end of the buffer (if your data is large). I'm using my tablet right now so can't add any example source code. Maybe somebody else can?
Ok, mini source, very skeletal.
var chunks = [];
var bytesRead= 0;
stream.on('data', function(chunk) {
chunks.push(chunk);
bytesRead += chunk.length;
// look at bytesRead...
var buffer = Buffer.concat(chunks);
chunks = [buffer]; // trick for next event
// --> or, if memory is an issue, remove completed data from the beginning of chunks
// work with the buffer here...
}

Resources