Node.js stream.on('end'... does not make file readable - node.js

I'm trying to catch the completion of writing the canvas stream like this:
var out = fs.createWriteStream(out_fs);
var stream = canvas.createPNGStream({
    bufsize: 2048
});
stream.on('end', function () {
    // can we use out_fs now? why not?
});
stream.pipe(out);
But when I try to load out_fs in a sub-function, I get:
Error: Image given has not completed loading
at this line:
fs.readFile(out_fs, function (err, data) {
    if (err) throw err;
    var img = new Canvas.Image; // Create a new Image
    img.src = data;
    ctx2.drawImage(img, 0, 50, img.width, img.height); // <--
http://nodejs.org/api/stream.html#stream_event_end
But I don't see any other way to continue with the control flow after the stream is written. If I let the entire parent function return, the file then seems readable. I've tried wrapping my child functions in setImmediate(), but that only seems to work intermittently.
What is the definitive way to catch the final usable end result of writing the stream?
The node-canvas documentation claims that the end event signals the final writing of the file: https://www.npmjs.com/package/canvas#canvaspngstream
But this generates the error above if you immediately try to use it.
`finish` does not seem to be implemented at all.

Since you have piped stream to out, out will be close()'d automatically on stream's end event (this is part of what gets set up automatically when you .pipe() a stream). So, to know when the file has finished being written, listen for the close event of the out stream.
You saw intermittent results because the stream's end event is the same event that the out writable stream uses to finalize the file.
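A minimal sketch of that approach, reusing out_fs, canvas, Canvas, and ctx2 from the question:
var out = fs.createWriteStream(out_fs);
var stream = canvas.createPNGStream({
    bufsize: 2048
});

// 'close' fires on the destination once pipe() has finished writing
// and the underlying file descriptor has been closed
out.on('close', function () {
    fs.readFile(out_fs, function (err, data) {
        if (err) throw err;
        var img = new Canvas.Image(); // the file is fully written at this point
        img.src = data;
        ctx2.drawImage(img, 0, 50, img.width, img.height);
    });
});

stream.pipe(out);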

I would put this in a comment (but can't):
You need to close your WriteStream called 'out': use the event aarosil suggests and call out.close().

Related

How can I implement a nodeJS worker that streams data from mongo to elasticsearch?

I'm building a CDC-based application that uses Mongo Change Streams to listen for change events and index the changes in Elasticsearch in near real-time.
So far, I've implemented a worker that calls a function to capture events, transform them and index them in Elasticsearch, with no issues when implementing the stream for one Mongo collection:
async function syncChangeEvents() {
    const stream = ModelA.watch()
    while (!stream.isClosed()) {
        if (await stream.hasNext()) {
            const event = await stream.next()
            // transform event
            // index to elasticsearch
        }
    }
}
I've implemented it using an infinite loop (probably a bad approach) but I'm not sure what alternatives there are when I have to keep the change stream alive forever.
The problem comes when I have to implement a change stream for another model. Since the first function has a while loop that is blocking, the worker can't call the second function to start the second change stream.
I'm wondering what the best way would be to spin up a worker that can trigger x no. of change streams without impacting the performance of each change stream. Would worker threads be the right way to go?
There are three primary ways to work with Change Streams in Node.js.
You can monitor the Change Stream using EventEmitter's on() function.
// See https://mongodb.github.io/node-mongodb-native/3.3/api/Collection.html#watch for the watch() docs
const changeStream = collection.watch(pipeline);
// ChangeStream inherits from the Node Built-in Class EventEmitter (https://nodejs.org/dist/latest-v12.x/docs/api/events.html#events_class_eventemitter).
// We can use EventEmitter's on() to add a listener function that will be called whenever a change occurs in the change stream.
// See https://nodejs.org/dist/latest-v12.x/docs/api/events.html#events_emitter_on_eventname_listener for the on() docs.
changeStream.on('change', (next) => {
    console.log(next);
});
// Wait the given amount of time and then close the change stream
await closeChangeStream(timeInMs, changeStream);
You can monitor the Change Stream using hasNext().
// See https://mongodb.github.io/node-mongodb-native/3.3/api/Collection.html#watch for the watch() docs
const changeStream = collection.watch(pipeline);
// Set a timer that will close the change stream after the given amount of time
// Function execution will continue because we are not using "await" here
closeChangeStream(timeInMs, changeStream);
// We can use ChangeStream's hasNext() function to wait for a new change in the change stream.
// If the change stream is closed, hasNext() will return false so the while loop will exit.
// See https://mongodb.github.io/node-mongodb-native/3.3/api/ChangeStream.html for the ChangeStream docs.
while (await changeStream.hasNext()) {
    console.log(await changeStream.next());
}
You can monitor the Change Stream using the Stream API
// See https://mongodb.github.io/node-mongodb-native/3.3/api/Collection.html#watch for the watch() docs
const changeStream = collection.watch(pipeline);
// See https://mongodb.github.io/node-mongodb-native/3.3/api/ChangeStream.html#pipe for the pipe() docs
changeStream.pipe(
    new stream.Writable({
        objectMode: true,
        write: function (doc, _, cb) {
            console.log(doc);
            cb();
        }
    })
);
// Wait the given amount of time and then close the change stream
await closeChangeStream(timeInMs, changeStream);
If your MongoDB database is hosted on Atlas (https://cloud.mongodb.com), the simplest thing to do is create a Trigger. Atlas handles programming the Change Stream code for you, so you only have to write the code that will transform the event and index them in Elasticsearch.
More information on working with Change Streams and Triggers is available in my blog post. A complete code example for all of the snippets above is available on GitHub.
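For the multi-collection part of the question: the event-listener approach is non-blocking, so a single worker can register any number of change streams side by side without worker threads. A rough sketch, assuming ModelA/ModelB watchable models as in the question and a hypothetical indexToElasticsearch() helper:
// hypothetical helper that transforms a change event and indexes it in Elasticsearch
async function indexToElasticsearch(event) { /* ... */ }

function watchModel(Model) {
    const changeStream = Model.watch();
    // the listener is invoked as events arrive; nothing here blocks the event loop,
    // so several change streams can run concurrently in the same worker
    changeStream.on('change', (event) => {
        indexToElasticsearch(event).catch(console.error);
    });
    return changeStream;
}

watchModel(ModelA);
watchModel(ModelB);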

What are the roles of _read and read in Node JS streams?

I'm really just looking for clarification on how these work. IMO the documentation on streams is somewhat lacking, and there actually aren't a lot of resources out there that comprehensively explain how they're meant to work and be extended.
My question can be broken down into two parts
One, What is the role of the _read function within the stream module? When I run this code it endlessly prints out "hello world" until null is pushed onto the stream buffer. This seems to indicate that _read is called in some kind of loop that waits for a null in the buffer, but I can't find documentation anywhere that states this in explicit terms.
var Readable = require('stream').Readable
var rs = Readable()
rs._read = function () {
    rs.push("hello world")
    rs.push(null)
};
rs.on("data", function (data) {
    console.log("some data", data)
})
Two, what does read actually do? My understanding is that read consumes data from the read stream buffer, and fires the data event. Is that all that's going on here?
read() is something that a consumer of the read stream calls if they want to read some bytes from the stream (when the stream is not flowing).
_read() is an internal method that is part of the implementation of the read stream. The internals of the stream call this method (it is NOT to be called from the outside) when the stream is flowing and wants to get more data from the source. When called, the _read() method pushes data with .push(data), or, if it has no more data, calls .push(null).
You can see an explanation and example here in this article.
_read(size) {
    if (this.data.length) {
        const chunk = this.data.slice(0, size);
        this.data = this.data.slice(size, this.data.length);
        this.push(chunk);
    } else {
        this.push(null); // 'end', no more data
    }
}
If you were implementing a read stream to some custom source of data, then you would implement the _read() method to fetch up to the size amount of data from your source and .push() that data into the stream.
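To see both sides together, here is a minimal sketch (my own illustration, not from the article) of a custom Readable whose _read() supplies the data, consumed on the other end with read() via the 'readable' event:
const { Readable } = require('stream');

class WordStream extends Readable {
    constructor(words, options) {
        super(options);
        this.words = words;
    }

    // called by the stream internals whenever more data is wanted
    _read() {
        if (this.words.length) {
            this.push(this.words.shift());
        } else {
            this.push(null); // no more data: this ends the stream
        }
    }
}

const ws = new WordStream(['hello', 'world']);

// the consumer explicitly calls read() while the stream is paused
ws.on('readable', () => {
    let chunk;
    while ((chunk = ws.read()) !== null) {
        console.log('read:', chunk.toString());
    }
});
ws.on('end', () => console.log('done'));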

Converting WriteStream to TransformStream

I have a (somewhat weird) writable stream that I need to convert to a transform stream.
The writable stream, normally, sits at the end of a pipe chain and emits custom events once it has collected enough data for its output. I want it to go in the middle so I can pipe it to another writeStream, i.e:
readStream.pipe(writeStreamToConvert).pipe(finalWriteStream);
What I've done is the following, and it works:
const through2 = require('through2')

var writeStreamToConvert = new WriteStreamToConvert();

return through2.obj(function (chunk, enc, callback) {
    writeStreamToConvert.write(chunk)
    // 'object' is the event emitted from the write stream
    writeStreamToConvert.on('object', (name, obj) => {
        this.push(JSON.stringify(obj, null, 4) + '\n')
    });
    callback()
})
This works fine, does not seem to leak memory and is fairly quick. However, node gives me a warning:
Warning: Possible EventEmitter memory leak detected. 11 object listeners added. Use emitter.setMaxListeners() to increase limit
So I am a little bit curious whether this is the correct way of converting write streams.
The event handler would be best placed in a Transform stream constructor. Since through2 does not support such initialization, you would need to use node's stream API directly.
Currently, a new event handler (which is never removed -- that is how .on() works) is being added for every object written to the through2 stream. That is why the warning occurs.
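A rough sketch of what that could look like, reusing the WriteStreamToConvert class and its 'object' event from the question:
const { Transform } = require('stream');

class ConvertStream extends Transform {
    constructor(options) {
        super({ objectMode: true, ...options });
        this.writeStreamToConvert = new WriteStreamToConvert();
        // register the 'object' listener once, here, instead of once per chunk
        this.writeStreamToConvert.on('object', (name, obj) => {
            this.push(JSON.stringify(obj, null, 4) + '\n');
        });
    }

    _transform(chunk, enc, callback) {
        this.writeStreamToConvert.write(chunk);
        callback();
    }
}

// usage: readStream.pipe(new ConvertStream()).pipe(finalWriteStream);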

NodeJS streams and premature end

Assuming a Readable Stream in NodeJS and a Data (on('data', ...)) event handler tied to it that is relatively slow, is it possible for the End event to fire before the last Data handler(s) has finished, and if so, will it prematurely terminate that handler? Or, will all Data events get dispatched and run?
In my case, I am working with large files and want to commit to a DB every data chunk. I am worried that I may lose the last record or two (or more) if End is fired before the last DB calls in the handler actually complete.
The 'end' event fires after the last 'data' event, but it may happen before the last 'data' handler has finished. It is also possible that, before one 'data' handler's work has finished, the next one starts. Depending on what your code does, a later 'data' call may therefore finish before an earlier one, which can cause errors and problems in your code.
An example of how to cause problems (for your own tests):
var fs = require('fs');
var rr = fs.createReadStream('somebigfile.jpg');
var i = 0;
rr.on('data', function (chunk) {
    i++;
    var s = i;
    console.log('readable:' + s);
    setTimeout(function () {
        console.log('timeout:' + s);
    }, 50 - i * 10);
});
rr.on('end', function () {
    console.log('end');
});
It will print to your console when each 'data' event handler starts, and again some milliseconds later when it finishes. The finishes may come in a different order.
Solution:
Readable streams have two modes: 'flowing mode' and 'paused mode'. When you add a 'data' event handler, you automatically switch the readable stream to flowing mode.
From the documentation:
When in flowing mode, data is read from the underlying system and
provided to your program as fast as possible
In this mode, events will not wait for your slow actions to finish. What you need is 'paused mode'.
From the documentation:
In paused mode, you must explicitly call stream.read() to get chunks
of data out. Streams start out in paused mode.
In other words: you request a chunk of data, you get it, you work with it, and when you are ready you ask for a new chunk. In this mode you control when you get your data.
How to change to 'paused mode':
It is the default mode for a stream, but registering a 'data' event handler switches it to 'flowing mode'. Therefore, do not use readstream.on('data', ...).
Instead, use readstream.on('readable', function(){...}); when it fires, the stream is ready to give you a chunk of data. To get that chunk, use var chunk = readstream.read();
Example from docs:
var fs = require('fs');
var rr = fs.createReadStream('foo.txt');
rr.on('readable', function () {
    console.log('readable:', rr.read());
});
rr.on('end', function () {
    console.log('end');
});
Please read the documentation for more details, because there are more cases in which a stream is automatically switched to 'flowing mode'.
Working with slow handlers in flowing mode:
If you want or need to work in 'flowing mode', there is also a solution: you can pause and resume the stream. When you get a chunk from the 'data' event, pause the stream, and when you have finished your work, resume it.
Example from documentation:
var readable = getReadableStreamSomehow();
readable.on('data', function (chunk) {
    console.log('got %d bytes of data', chunk.length);
    readable.pause();
    console.log('there will be no more data for 1 second');
    setTimeout(function () {
        console.log('now data will start flowing again');
        readable.resume();
    }, 1000);
});
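Applied to the original goal of committing each chunk to a database, that pause/resume pattern could look roughly like this, where saveChunkToDb is a hypothetical helper standing in for your actual DB call:
var fs = require('fs');

// hypothetical helper: persists one chunk and calls cb(err) when the DB write has completed
function saveChunkToDb(chunk, cb) { /* ... */ }

var readable = fs.createReadStream('somebigfile.jpg');

readable.on('data', function (chunk) {
    readable.pause(); // no new 'data' events while the DB write is in flight
    saveChunkToDb(chunk, function (err) {
        if (err) throw err;
        readable.resume(); // ask for the next chunk only after the commit finished
    });
});

readable.on('end', function () {
    // because the stream is paused during each DB write, the last
    // commit has already completed by the time 'end' fires
    console.log('done');
});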

Basic streams issue: Difficulty sending a string to stdout

I'm just starting learning about streams in node. I have a string in memory and I want to put it in a stream that applies a transformation and pipe it through to process.stdout. Here is my attempt to do it:
var through = require('through');

var stream = through(function write(data) {
    this.push(data.toUpperCase());
});

stream.push('asdf');
stream.pipe(process.stdout);
stream.end();
It does not work. When I run the script via node on the CLI, nothing is sent to stdout and no errors are thrown. A few questions I have:
If you have a value in memory that you want to put into a stream, what is the best way to do it?
What is the difference between push and queue?
Does it matter if I call end() before or after calling pipe()?
Is end() equivalent to push(null)?
Thanks!
Just use the vanilla stream API
var Transform = require("stream").Transform;

// create a new Transform stream
var stream = new Transform({
    decodeStrings: false,
    encoding: "ascii"
});

// implement the _transform method
stream._transform = function _transform(str, enc, done) {
    this.push(str.toUpperCase() + "\n");
    done();
};

// connect to stdout
stream.pipe(process.stdout);

// write some stuff to the stream
stream.write("hello!");
stream.write("world!");

// output
// HELLO!
// WORLD!
Or you can build your own stream constructor. This is really the way the stream API is intended to be used:
var Transform = require("stream").Transform;

function MyStream() {
    // call Transform constructor with `this` context
    // {decodeStrings: false} keeps data as `string` type instead of `Buffer`
    // {encoding: "ascii"} sets the encoding for our strings
    Transform.call(this, {decodeStrings: false, encoding: "ascii"});

    // our function to do "work"
    function _transform(str, encoding, done) {
        this.push(str.toUpperCase() + "\n");
        done();
    }

    // export our function
    this._transform = _transform;
}

// extend the Transform.prototype to your constructor
MyStream.prototype = Object.create(Transform.prototype, {
    constructor: {
        value: MyStream
    }
});
Now use it like this
// instantiate
var a = new MyStream();

// pipe to a destination
a.pipe(process.stdout);

// write data
a.write("hello!");
a.write("world!");
Output
HELLO!
WORLD!
Some other notes about .push vs .write.
.write(str) adds data to the writable buffer. It is meant to be called externally. If you think of a stream like a duplex file handle, it's just like fwrite, only buffered.
.push(str) adds data to the readable buffer. It is only intended to be called from within our stream.
.push(str) can be called many times. Watch what happens if we change our function to
function _transform(str, encoding, done) {
    this.push(str.toUpperCase());
    this.push(str.toUpperCase());
    this.push(str.toUpperCase() + "\n");
    done();
}
Output
HELLO!HELLO!HELLO!
WORLD!WORLD!WORLD!
First, you want to use write(), not push(). write() puts data into the stream; push() pushes data out of the stream. You only use push() when implementing your own Readable, Duplex, or Transform streams.
Second, you'll only want to write() data to the stream after you've set up the pipe() (or added some event listeners). If you write to a stream with nothing wired to the other end, the data you've written will be lost. As @naomik pointed out, this isn't true in general, since a Writable stream will buffer write()s. In your example you do need to write() after pipe(), though; otherwise the process will end before anything is written to STDOUT. This is possibly due to how the through module is implemented, but I don't know that for sure.
So, with that in mind, you can make a couple simple changes to your example to get it to work:
var through = require('through');

var stream = through(function write(data) {
    this.push(data.toUpperCase());
});

stream.pipe(process.stdout);
stream.write('asdf');
stream.end();
Now, for your questions:
The easiest way to get data from memory into a writable stream is to simply write() it, just like we're doing with stream.write('asdf') in your example.
As far as I know, the stream doesn't have a queue() function; did you mean write()? Like I said above, write() is used to put data into a stream, and push() is used to push data out of the stream. Only call push() in your own stream implementations.
Only call end() after all your data has been written to your stream. end() basically says: "Ok, I'm done now. Please finish what you're doing and close the stream."
push(null) is pretty much equivalent to end(). That being said, don't call push(null) unless you're doing it inside your own stream implementation (as stated above). It's almost always more appropriate to call end().
Based on the examples for stream (http://nodejs.org/api/stream.html#stream_readable_pipe_destination_options)
and through (https://www.npmjs.org/package/through)
it doesn't look like you are using your stream correctly... What happens if you use write(...) instead of push(...)?
