How do I test if a stream has ended? - node.js

I have a stream that may or may not have already ended. If it has ended, I don't want to sit there forever waiting for the end event. How do I check?

As of node 12.9.0, readable streams have a readableEnded property you can check – see readable.readableEnded in the Stream documentation.
(Writable streams have a corresponding property)

If you absolutely have to, you can check stream._readableState.ended, at least in node 0.10 and 0.12. This isn't public API though so be careful. Also apparently all streams aren't guarantee to have this property too:
Worth reading on this:
https://github.com/hapijs/hapi/issues/2368
https://github.com/nodejs/node/issues/445

You probably shouldn't check, and instead attach an end event listener. When the stream ends you will get an end event.
If a [readable]stream is implemented correctly, you will get an end event when there is nothing left to read. For open-ended streams that don't end, you could implement an activity timer to see how long it's been since the stream was last readable, and decide after a certain amount of time to remove your listeners.
(function() {
var timeout = null;
var WAIT_TIME = 10 * 1000; // 10 Sec
function stopListening( ) {
stream.off('data', listener);
}
function listener( data ) {
clearTimeout(timeout);
timeout = setTimeout(stopListening, WAIT_TIME);
// DO STUFF
}
timeout = setTimeout(stopListening, WAIT_TIME);
stream.on('data', listener);
})();

Related

How can I implement a nodeJS worker that streams data from mongo to elasticsearch?

I'm building a CDC-based application that uses Mongo Change Streams to listen for change events and index the changes in elasticsearch in near real-time.
So far, I've implemented a worker that calls a function to capture events, transform them and index them in elasticsearch with no issues when implementing the stream for 1 mongo collection:
function syncChangeEvents() {
const stream = ModelA.watch()
while (!stream.isClosed()) {
if (await stream.hasNext()) {
const event = stream.next()
// transform event
// index to elasticsearch
}
}
}
I've implemented it using an infinite loop (probably a bad approach) but I'm not sure what alternatives there are when I have to keep the change stream alive forever.
The problem comes when I have to implement a change stream for another model. Since the first function has a while loop that is blocking, the worker can't call the second function to start the second change stream.
I'm wondering what the best way would be to spin up a worker that can trigger x no. of change streams without impacting the performance of each change stream. Would worker threads be the right way to go?
There are three primary ways to work with Change Streams in Node.js.
You can monitor the Change Stream using EventEmitter's on() function.
// See https://mongodb.github.io/node-mongodb-native/3.3/api/Collection.html#watch for the watch() docs
const changeStream = collection.watch(pipeline);
// ChangeStream inherits from the Node Built-in Class EventEmitter (https://nodejs.org/dist/latest-v12.x/docs/api/events.html#events_class_eventemitter).
// We can use EventEmitter's on() to add a listener function that will be called whenever a change occurs in the change stream.
// See https://nodejs.org/dist/latest-v12.x/docs/api/events.html#events_emitter_on_eventname_listener for the on() docs.
changeStream.on('change', (next) => {
console.log(next);
});
// Wait the given amount of time and then close the change stream
await closeChangeStream(timeInMs, changeStream);
You can monitor the Change Stream using hasNext().
// See https://mongodb.github.io/node-mongodb-native/3.3/api/Collection.html#watch for the watch() docs
const changeStream = collection.watch(pipeline);
// Set a timer that will close the change stream after the given amount of time
// Function execution will continue because we are not using "await" here
closeChangeStream(timeInMs, changeStream);
// We can use ChangeStream's hasNext() function to wait for a new change in the change stream.
// If the change stream is closed, hasNext() will return false so the while loop will exit.
// See https://mongodb.github.io/node-mongodb-native/3.3/api/ChangeStream.html for the ChangeStream docs.
while (await changeStream.hasNext()) {
console.log(await changeStream.next());
}
You can monitor the Change Stream using the Stream API
// See https://mongodb.github.io/node-mongodb-native/3.3/api/Collection.html#watch for the watch() docs
const changeStream = collection.watch(pipeline);
// See https://mongodb.github.io/node-mongodb-native/3.3/api/ChangeStream.html#pipe for the pipe() docs
changeStream.pipe(
new stream.Writable({
objectMode: true,
write: function (doc, _, cb) {
console.log(doc);
cb();
}
})
);
// Wait the given amount of time and then close the change stream
await closeChangeStream(timeInMs, changeStream);
If your MongoDB database is hosted on Atlas (https://cloud.mongodb.com), the simplest thing to do is create a Trigger. Atlas handles programming the Change Stream code for you, so you only have to write the code that will transform the event and index them in Elasticsearch.
More information on working with Change Streams and Triggers is available in my blog post. A complete code example for all of the snippets above is available on GitHub.

socket.io how to send multiple messages sequentially?

I'm using socket.io like this
Client:
socket.on('response', function(i){
console.log(i);
});
socket.emit('request', whateverdata);
Server:
socket.on('request', function(whateverdata){
for (i=0; i<10000; i++){
console.log(i);
socket.emit('response', i);
}
console.log("done!");
});
I need output like this when putting the two terminals side by side:
Server Client
0 0
1 1
. (etc) .
. .
9998 9998
9999 9999
done!
But instead I am getting this:
Server Client
0
1
. (etc)
.
9998
9999
done!
0
1
.
. (etc)
9998
9999
Why?
Shouldn't Socket.IO / Node emit the message immediately, not wait for the loop to complete before emitting any of them?
Notes:
The for loop is very long and computationally slow.
This question is referring to the socket.io library, not websockets in general.
Due to latency, waiting for confirmation from the client before sending each response is not possible
The order that the messages are received is not important, only that they are received as quickly as possible
The server emits them all in a loop and it takes a small bit of time for them to get to the client and get processed by the client in another process. This should not be surprising.
It is also possible that the single-threaded nature of Javascript in node.js prevents the emits from actually getting sent until your Javascript loop finishes. That would take detailed examination of socket.io code to know for sure if that is an issue. As I said before if you want to 1,1 then 2,2 then 3,3 instead of 1,2,3 sent, then 1,2,3 received you have to write code to force that.
If you want the client to receive the first before the server sends the 2nd, then you have to make the client send a response to the first and have the server not send the 2nd until it receives the response from the first. This is all async networking. You don't control the order of events in different processes unless you write specific code to force a particular sequence.
Also, how do you have client and server in the same console anyway? Unless you are writing out precise timestamps, you wouldn't be able to tell exactly what event came before the other in two separate processes.
One thing you could try is to send 10, then do a setTimeout(fn, 1) to send the next 10 and so on. That would give JS a chance to breathe and perhaps process some other events that are waiting for you to finish to allow the packets to get sent.
There's another networking issue too. By default TCP tries to batch up your sends (at the lowest TCP level). Each time you send, it sets a short timer and doesn't actually send until that timer fires. If more data arrives before the timer fires, it just adds that data to the "pending" packet and sets the timer again. This is referred to as the Nagle's algorithm. You can disable this "feature" on a per-socket basis with socket.setNoDelay(). You have to call that on the actual TCP socket.
I am seeing some discussion that Nagle's algorithm may already be turned off for socket.io (by default). Not sure yet.
In stepping through the process of socket.io's .emit(), there are some cases where the socket is marked as not yet writable. In those cases, the packets are added to a buffer and will be processed "later" on some future tick of the event loop. I cannot see exactly what puts the socket temporarily in this state, but I've definitely seen it happen in the debugger. When it's that way, a tight loop of .emit() will just buffer and won't send until you let other events in the event loop process. This is why doing setTimeout(fn, 0) every so often to keep sending will then let the prior packets process. There's some other event that needs to get processed before socket.io makes the socket writable again.
The issue occurs in the flush() method in engine.io (the transport layer for socket.io). Here's the code for .flush():
Socket.prototype.flush = function () {
if ('closed' !== this.readyState &&
this.transport.writable &&
this.writeBuffer.length) {
debug('flushing buffer to transport');
this.emit('flush', this.writeBuffer);
this.server.emit('flush', this, this.writeBuffer);
var wbuf = this.writeBuffer;
this.writeBuffer = [];
if (!this.transport.supportsFraming) {
this.sentCallbackFn.push(this.packetsFn);
} else {
this.sentCallbackFn.push.apply(this.sentCallbackFn, this.packetsFn);
}
this.packetsFn = [];
this.transport.send(wbuf);
this.emit('drain');
this.server.emit('drain', this);
}
};
What happens sometimes is that this.transport.writable is false. And, when that happens, it does not send the data yet. It will be sent on some future tick of the event loop.
From what I can tell, it looks like the issue may be here in the WebSocket code:
WebSocket.prototype.send = function (packets) {
var self = this;
for (var i = 0; i < packets.length; i++) {
var packet = packets[i];
parser.encodePacket(packet, self.supportsBinary, send);
}
function send (data) {
debug('writing "%s"', data);
// always creates a new object since ws modifies it
var opts = {};
if (packet.options) {
opts.compress = packet.options.compress;
}
if (self.perMessageDeflate) {
var len = 'string' === typeof data ? Buffer.byteLength(data) : data.length;
if (len < self.perMessageDeflate.threshold) {
opts.compress = false;
}
}
self.writable = false;
self.socket.send(data, opts, onEnd);
}
function onEnd (err) {
if (err) return self.onError('write error', err.stack);
self.writable = true;
self.emit('drain');
}
};
Where you can see that the .writable property is set to false when some data is sent until it gets confirmation that the data has been written. So, when rapidly sending data in a loop, it may not be letting the event come through that signals that the data has been successfully sent. When you do a setTimeout() to let some things in the event loop get processed that confirmation event comes through and the .writable property gets set to true again so data can again be sent immediately.
To be honest, socket.io is built of so many abstract layers across dozens of modules that it's very difficult code to debug or analyze on GitHub so it's hard to be sure of the exact explanation. I did definitely see the .writable flag as false in the debugger which did cause a delay so this seems like a plausible explanation to me. I hope this helps.

NodeJS streams and premature end

Assuming a Readable Stream in NodeJS and a Data (on('data', ...)) event handler tied to it that is relatively slow, is it possible for the End event to fire before the last Data handler(s) has finished, and if so, will it prematurely terminate that handler? Or, will all Data events get dispatched and run?
In my case, I am working with large files and want to commit to a DB every data chunk. I am worried that I may lose the last record or two (or more) if End is fired before the last DB calls in the handler actually complete.
Event 'end' fire after last 'data' event. But it may happend before the last Data handler has finished. It is possible that before one 'data' handler has finished, next is started. It depends of what you have in your code, but it is possible that later call of event 'data' finish before earlier. It may cause errors and problems in your code.
Example how to cause problems (to your own tests):
var fs = require('fs');
var rr = fs.createReadStream('somebigfile.jpg');
var i=0;
rr.on('data', function(chunk) {
i++;
var s = i;
console.log('readable:' + s);
setTimeout(function(){
console.log('timeout:'+s);
}, 50-i*10);
});
rr.on('end', function() {
console.log('end');
});
It will print in your console when start each 'data' event handler. And after some miliseconds when it finish. Finish may be in different order.
Solution:
Readable Streams have two modes 'flowing mode' and a 'paused mode'. When you add 'data' event handler, you auto set Readable Streams to flowing mode.
From documentation :
When in flowing mode, data is read from the underlying system and
provided to your program as fast as possible
In this mode events will not wait for your slow actions to finish. For your need is 'paused mode'.
From documentation:
In paused mode, you must explicitly call stream.read() to get chunks
of data out. Streams start out in paused mode.
In other words: you demand chunk of data, you get it, you work with it, and when you ready you ask for new chunk of data. In this mode you controll when you want to get your data.
How to change to 'paused mode':
It is default mode for this stream. But when you register 'data' event handler it switch to 'flowing mode'. Therefore not use readstream.on('data',...)
Instead use readstream.on('readable', function(){...}) when it fire, then it means that stream is ready to give chunk of data. To get chunk of data use var chunk = readstream.read();
Example from docs:
var fs = require('fs');
var rr = fs.createReadStream('foo.txt');
rr.on('readable', function() {
console.log('readable:', rr.read());
});
rr.on('end', function() {
console.log('end');
});
Please read documentation for more details, because there are more posibilities when stream is auto switched to 'flowing mode'.
Work with slow handlers and flowing mode:
If you want/need work in 'flowing mode', there is also solution. You can pause and resume stream. When you get chunk form readstream('data'), pause stream and when you finish work then resume it.
Example from documentation:
var readable = getReadableStreamSomehow();
readable.on('data', function(chunk) {
console.log('got %d bytes of data', chunk.length);
readable.pause();
console.log('there will be no more data for 1 second');
setTimeout(function() {
console.log('now data will start flowing again');
readable.resume();
}, 1000);
});

How to forcibly keep a Node.js process from terminating?

TL;DR
What is the best way to forcibly keep a Node.js process running, i.e., keep its event loop from running empty and hence keeping the process from terminating? The best solution I could come up with was this:
const SOME_HUGE_INTERVAL = 1 << 30;
setInterval(() => {}, SOME_HUGE_INTERVAL);
Which will keep an interval running without causing too much disturbance if you keep the interval period long enough.
Is there a better way to do it?
Long version of the question
I have a Node.js script using Edge.js to register a callback function so that it can be called from inside a DLL in .NET. This function will be called 1 time per second, sending a simple sequence number that should be printed to the console.
The Edge.js part is fine, everything is working. My only problem is that my Node.js process executes its script and after that it runs out of events to process. With its event loop empty, it just terminates, ignoring the fact that it should've kept running to be able to receive callbacks from the DLL.
My Node.js script:
var
edge = require('edge');
var foo = edge.func({
assemblyFile: 'cs.dll',
typeName: 'cs.MyClass',
methodName: 'Foo'
});
// The callback function that will be called from C# code:
function callback(sequence) {
console.info('Sequence:', sequence);
}
// Register for a callback:
foo({ callback: callback }, true);
// My hack to keep the process alive:
setInterval(function() {}, 60000);
My C# code (the DLL):
public class MyClass
{
Func<object, Task<object>> Callback;
void Bar()
{
int sequence = 1;
while (true)
{
Callback(sequence++);
Thread.Sleep(1000);
}
}
public async Task<object> Foo(dynamic input)
{
// Receives the callback function that will be used:
Callback = (Func<object, Task<object>>)input.callback;
// Starts a new thread that will call back periodically:
(new Thread(Bar)).Start();
return new object { };
}
}
The only solution I could come up with was to register a timer with a long interval to call an empty function just to keep the scheduler busy and avoid getting the event loop empty so that the process keeps running forever.
Is there any way to do this better than I did? I.e., keep the process running without having to use this kind of "hack"?
The simplest, least intrusive solution
I honestly think my approach is the least intrusive one:
setInterval(() => {}, 1 << 30);
This will set a harmless interval that will fire approximately once every 12 days, effectively doing nothing, but keeping the process running.
Originally, my solution used Number.POSITIVE_INFINITY as the period, so the timer would actually never fire, but this behavior was recently changed by the API and now it doesn't accept anything greater than 2147483647 (i.e., 2 ** 31 - 1). See docs here and here.
Comments on other solutions
For reference, here are the other two answers given so far:
Joe's (deleted since then, but perfectly valid):
require('net').createServer().listen();
Will create a "bogus listener", as he called it. A minor downside is that we'd allocate a port just for that.
Jacob's:
process.stdin.resume();
Or the equivalent:
process.stdin.on("data", () => {});
Puts stdin into "old" mode, a deprecated feature that is still present in Node.js for compatibility with scripts written prior to Node.js v0.10 (reference).
I'd advise against it. Not only it's deprecated, it also unnecessarily messes with stdin.
Use "old" Streams mode to listen for a standard input that will never come:
// Start reading from stdin so we don't exit.
process.stdin.resume();
Here is IFFE based on the accepted answer:
(function keepProcessRunning() {
setTimeout(keepProcessRunning, 1 << 30);
})();
and here is conditional exit:
let flag = true;
(function keepProcessRunning() {
setTimeout(() => flag && keepProcessRunning(), 1000);
})();
You could use a setTimeout(function() {""},1000000000000000000); command to keep your script alive without overload.
spin up a nice repl, node would do the same if it didn't receive an exit code anyway:
import("repl").then(repl=>
repl.start({prompt:"\x1b[31m"+process.versions.node+": \x1b[0m"}));
I'll throw another hack into the mix. Here's how to do it with Promise:
new Promise(_ => null);
Throw that at the bottom of your .js file and it should run forever.

Pausing a non-flowing stream (delaying readable event)

On one end I have a http request (long polling). On the other end is a "game server" dispatching events. These ends are tied together with a duplex non-flowing stream in object mode and that part works fine.
The long poll end listens on readable and drains the stream by calling stream.read repeatedly. Then closes the client's connection.
The game server uses stream.write to push events to the clients.
Some events in the game actually spand several events and here's the problem:
When the game server adds several events at once (calling stream.write repeatedly) the first write triggers readable and the long poll is filled with the event and closed. That's very inconvinient.
The essential of the problem is that I can't silent readable and then trigger it when I'm done writing.
So my question is; can I somehow "pause" the stream and resume afterwards?
Is there a known another solution to this problem?
My best bet is to write an array of events, but I think that's somehow an antipattern.
Here's some code to illustrate my problem:
var stream = require('stream');
var connection = stream.PassThrough({ objectMode: true });
var exhaust = function() {
console.log('exhausting');
var chunk;
while ((chunk = connection.read()) !== null)
console.log(chunk);
console.log('exhausting end');
}
connection.on('readable', function(){
console.log('Ready to read');
exhaust();
});
for (var i = 0;i < 10;i++)
connection.write({ test: true });
I ended up writing an array of events to write, but I'm looking forward to this upcoming feature, which could be the missin solution:
http://strongloop.com/strongblog/performance-node-js-v-0-12-whats-new/
http://nodejs.org/docs/v0.11.10/api/stream.html#stream_writable_cork

Resources