How to handle multiple streams in Node.js?

I am writing an image manipulation service and I have to transform an image into multiple sizes:
const writable1 = storage(name1).writableStream();
const writable2 = storage(name2).writableStream();
const writable3 = storage(name3).writableStream();
//piping the file stream to their respective storage stream
file.stream.pipe(imageTransformer).pipe(writable1);
file.stream.pipe(imageTransformer).pipe(writable2);
file.stream.pipe(imageTransformer).pipe(writable3);
I want to know when all the streams have finished writing to their destinations.
Right now I have only checked for one stream, like:
writable3.on('finish', callback);
//error handling
writable3.on('error', callback);
I have seen libraries like https://github.com/mafintosh/pump and https://github.com/maxogden/mississippi but these libraries only show writing to a single destination with multiple transforms.
How would I be able to check if all the streams are finished writing or one of them has errored out? How can I handle them in an array?

You can use a combination of converting each stream to a promise and Promise.all.
In the example, I have used the stream-to-promise library for the stream-to-promise conversion.
For each stream, a promise is created. The promise is resolved when the stream completes and rejected when the stream fails.
const streamToPromise = require('stream-to-promise')
const promise1 = streamToPromise(readable1.pipe(writable1));
const promise2 = streamToPromise(readable2.pipe(writable2));
Promise.all([promise1, promise2])
  .then(() => console.log('all the streams are finished'));
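If you would rather not pull in a library, a minimal sketch of the same idea using Node's built-in stream.finished (available since Node.js 10), applied to the three writables from the question and handled as an array, could look like this:
const { finished } = require('stream');
const { promisify } = require('util');
const finishedAsync = promisify(finished);

// writable1, writable2 and writable3 are the storage streams from the question
const writables = [writable1, writable2, writable3];

Promise.all(writables.map(w => finishedAsync(w)))
  .then(() => console.log('all the streams are finished'))
  .catch(err => console.error('one of the streams errored out', err));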

Related

Stream large JSON from REST API using NodeJS/ExpressJS

I have to return a large JSON, resulting from a query to MongoDB, from a REST API server built using ExpressJS. This JSON has to be converted into CSV so the client can directly save the resulting CSV file. I know that the best solution is to use Node.js streams and pipe. Could anyone suggest a working example? Thanks.
Typically, parsing JSON in Node is fairly simple. In the past I would do something like the following.
const fs = require('fs');
const rawdata = fs.readFileSync('file.json');
const data = JSON.parse(rawdata);
Or even simpler, with a require statement like this:
const data = require('./file.json');
Both of these work great with small or even moderately sized files, but if you need to parse a really large JSON file, one with millions of lines, reading the entire file into memory is no longer a great option.
Because of this I needed a way to "stream" the JSON and process it as it arrived. There is a nice module named 'stream-json' that does exactly what I wanted.
With stream-json, we can use a Node.js file stream to process our large data file in chunks.
const StreamArray = require('stream-json/streamers/StreamArray');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

// Internal Node readable stream option, pipe to stream-json to convert it for us
fs.createReadStream('file.json').pipe(jsonStream.input);

// You'll get JSON objects here
// key is the array index
jsonStream.on('data', ({key, value}) => {
  console.log(key, value);
});

jsonStream.on('end', () => {
  console.log('All Done');
});
Now our data can be processed without running out of memory. However, in the use case I was working on, I had an asynchronous process inside the stream. Because of this, I was still consuming huge amounts of memory, as this just queued up a very large number of unresolved promises to keep in memory until they completed.
To solve this I also had to use a custom Writable stream, like this.
const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const fs = require('fs');

const fileStream = fs.createReadStream('file.json');
const jsonStream = StreamArray.withParser();

const processingStream = new Writable({
  write({key, value}, encoding, callback) {
    // Some async operations
    setTimeout(() => {
      console.log(key, value);
      // Runs one at a time, need to use a callback for that part to work
      callback();
    }, 1000);
  },
  // Don't skip this, as we need to operate with objects, not buffers
  objectMode: true
});

// Pipe the streams as follows
fileStream.pipe(jsonStream.input);
jsonStream.pipe(processingStream);

// So we're waiting for the 'finish' event when everything is done
processingStream.on('finish', () => console.log('All done'));
The Writable stream also allows each asynchronous process to complete and the promises to resolve before continuing on to the next, thus avoiding the memory backup.
This Stack Overflow question is where I got the examples for this post:
Parse large JSON file in Nodejs and handle each object independently
Another thing I learned in this process: if you want to start Node with more than the default amount of RAM, you can use the following command.
node --max-old-space-size=4096 file.js
By default the memory limit in Node.js is 512 MB; to get past it you need to increase the limit using the --max-old-space-size flag. This can be used to avoid hitting the memory limits within Node. The command above gives Node 4 GB of RAM to use.
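To tie this back to the original question (streaming a MongoDB query out as CSV through an Express response), here is a minimal sketch, not from the post, assuming an Express app, the native MongoDB driver (whose cursors expose .stream()), a hypothetical /export route and users collection; the CSV conversion is a naive hand-rolled Transform:
const { Transform } = require('stream');

// Hypothetical route; `app` is an Express app and `db` a connected native-driver database
app.get('/export', (req, res) => {
  res.setHeader('content-type', 'text/csv');

  const cursor = db.collection('users').find({});

  // Naive CSV conversion: one line per document, fields joined by commas
  const toCsv = new Transform({
    objectMode: true,
    transform(doc, encoding, callback) {
      callback(null, Object.values(doc).join(',') + '\n');
    }
  });

  cursor.stream()   // Node readable stream of documents
    .pipe(toCsv)    // objects -> CSV lines
    .pipe(res);     // stream straight to the client, nothing held in memory
});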

Node: pipeline not blocking on paused passthrough

One of the basic behaviours of Node streams is to block when writing to a paused stream, and any non-piped stream is blocked.
In this example, the created PassThrough is not piped to anything in its creation event-loop tick. One would expect any pipeline run on this PassThrough to block until it is piped / a 'data' listener is attached, but this is not the case.
The pipeline's callback fires, but nothing is consumed.
const {promises: pFs} = require('fs');
const fs = require('fs');
const {PassThrough} = require('stream');
const {pipeline: pipelineCb} = require('stream');
const util = require('util');
const pipeline = util.promisify(pipelineCb);
const path = require('path');
const assert = require('assert');
/**
 * Create a PassThrough that is piped to the final output once some async setup is done
 * @param {string} outputPath
 * @return {PassThrough}
 */
function myCreateWritableStream (outputPath) {
  // The stream is created in paused mode -> should block until piped
  const stream = new PassThrough();
  (async () => {
    // Do some stuff (create directory / check space / connect...)
    await new Promise(resolve => setTimeout(resolve, 500));
    console.log('piping passThrough to finale output');
    // Consume the stream
    await pipeline(stream, fs.createWriteStream(outputPath));
    console.log('passThrough stream content written');
  })().catch(e => {
    console.error(e);
    stream.emit('error', e);
  });
  return stream;
}
/**
 * Main test function
 * @return {Promise<void>}
 */
async function main () {
  // Prepare the test directory with a 'tmp1' file only
  const smallFilePath = path.join(__dirname, 'tmp1');
  const smallFileOut = path.join(__dirname, 'tmp2');
  await Promise.all([
    pFs.writeFile(smallFilePath, 'a small content'),
    pFs.unlink(smallFileOut).catch(e => assert(e.code === 'ENOENT'))
  ]);
  // Duplicate the tmp1 file to tmp2
  await pipeline([
    fs.createReadStream(smallFilePath),
    myCreateWritableStream(smallFileOut)
  ]);
  console.log('pipeline ended');
  // Check content
  const finalContent = await pFs.readdir(__dirname);
  console.log('directory content');
  console.log(finalContent.filter(file => file.startsWith('tmp')));
}

main().catch(e => {
  process.exitCode = 1;
  console.error(e);
});
This code outputs the following lines:
pipeline ended
directory content
[ 'tmp1' ]
piping passThrough to finale output
passThrough stream content written
If the pipeline really waited for the stream to end, then the output would be this one:
piping passThrough to finale output
passThrough stream content written
pipeline ended
directory content
[ 'tmp1', 'tmp2' ]
How can you explain this behaviour?
I don't think the API gives the guarantees you are looking for here.
stream.pipeline calls its callback after all data has finished writing. Since the data has been written to a new Transform stream (your PassThrough), and that stream has nowhere to put the data yet, it simply gets stored in the stream's internal buffer. That is good enough for the pipeline.
If you were to read a large enough file, filling the Transform stream's buffer, backpressure would automatically trigger a pause() on the readable that is reading the file. Once the Transform stream drains, it would automatically unpause() the readable so data flow resumes.
I think your example makes two incorrect assumptions:
(1) That you can pause a transform stream. According to the stream docs, pausing any stream that is piped to a destination is ineffective, because it will immediately unpause itself as soon as a piped destination asks for more data. Also, a paused transform stream still reads data! A paused stream just doesn't write data.
(2) That a pause further down a pipeline somehow propagates up to the front of a pipeline and causes data to stop flowing. This is only true if caused by backpressure, meaning, you would need to trigger node's detection of a full internal buffer.
When working with pipes, it's best to assume you have manual control over the two farthest ends, but not necessarily of any of the pieces in the middle. (You can manually pipe() and unpipe() to connect and disconnect intermediate streams, but you can't pause them.)
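Building on that reasoning, if 'pipeline ended' is supposed to mean 'the file is on disk', one workaround (a sketch, not from the original thread) is to return the completion promise of the second half of the chain alongside the PassThrough, and wait for both ends:
const fs = require('fs');
const { PassThrough, pipeline: pipelineCb } = require('stream');
const { promisify } = require('util');
const pipeline = promisify(pipelineCb);

// Reworked version of myCreateWritableStream from the question
function createOutputStream (outputPath) {
  const stream = new PassThrough();
  // Keep the promise that resolves once the data is actually written out
  const done = (async () => {
    await new Promise(resolve => setTimeout(resolve, 500)); // async setup work
    await pipeline(stream, fs.createWriteStream(outputPath));
  })();
  return { stream, done };
}

async function copy (inputPath, outputPath) {
  const { stream, done } = createOutputStream(outputPath);
  // Wait for both halves of the chain before declaring the copy finished
  await Promise.all([
    pipeline(fs.createReadStream(inputPath), stream),
    done
  ]);
  console.log('pipeline ended and content written');
}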

What's the correct way to chain pipes within pipes in Node?

I have a function that may return one or more streams piped together, back to a main function that is chaining streams:
function streamBuilder() {
  const csvStream = require('fast-csv').createWriteStream();
  const fsStream = fs.createWriteStream('file.csv');
  return csvStream.pipe(fsStream);
}

const dbStream = db.collection('huge').find();
const streams = streamBuilder();
dbStream.pipe(streams);
Unfortunately, it doesn't work.
The result is that the CSV stream csvStream.transform() is apparently never called, only the file write stream fsStream.transform(), which errors with an invalid chunk argument.
If I do this instead, it works:
function streamBuilder() {
  const csvStream = require('fast-csv').createWriteStream();
  const fsStream = fs.createWriteStream('file.csv');
  return csvStream.on('data', chunk => fsStream.write(chunk));
}
But it doesn't feel right... there must be a way to chain stream pipes when they are nested (i.e. streamA.pipe(streamB.pipe(streamC))). Is there a way I can just chain them without .on('data', ...) and without passing input streams as arguments into the streamBuilder() function?
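A likely explanation, offered as a sketch rather than a definitive answer: .pipe() returns its destination, so the first streamBuilder() hands back fsStream, and anything piped into it bypasses csvStream entirely. Returning the head of the chain and wiring the tail inside the builder keeps the nesting without .on('data', ...):
const fs = require('fs');

function streamBuilder() {
  const csvStream = require('fast-csv').createWriteStream(); // as in the question
  const fsStream = fs.createWriteStream('file.csv');
  csvStream.pipe(fsStream); // chain the tail internally
  return csvStream;         // expose the writable head, not the tail
}

// dbStream is the cursor stream from the question
dbStream.pipe(streamBuilder());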

Cannot pipe after data has been emitted from the response nodejs

I've been experiencing a problem with the request library in Node.js. When I try to pipe the response to both a file and a stream, I get the error: you cannot pipe after data has been emitted from the response. This is because I do some calculations before really piping the data.
Example:
var request = require('request');
var fs = require('fs');
var through2 = require('through2');

var options = {
  url: 'url-to-fetch-a-file'
};

var req = request(options);

req.on('response', function (res) {
  // Some computations to remove files potentially.
  // These computations take quite some time.
  // createPath creates the path recursively
  createPath(path, function () {
    var file = fs.createWriteStream(path + fname);
    var stream = through2.obj(function (chunk, enc, callback) {
      this.push(chunk);
      callback();
    });
    req.pipe(file);
    req.pipe(stream);
  });
});
If I just pipe to the stream without any calculations, it's just fine. How can I pipe to both a file and a stream using the request module in Node.js?
I found this: Node.js Piping the same readable stream into multiple (writable) targets, but it is not the same thing. There, piping happens twice, in a different tick. This example pipes like the answer in that question and still receives an error.
Instead of piping directly to the file, you can add a listener to the stream you defined. So you can replace req.pipe(file) with:
stream.on('data', function (data) {
  file.write(data);
});
stream.on('end', function () {
  file.end();
});
or
stream.pipe(file)
This will pause the stream until it's read, something that doesn't happen with the request module.
More info: https://github.com/request/request/issues/887
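Another option, sketched here under the same assumptions as the question (createPath, path and fname exist) rather than taken from the linked issue, is to pipe the request into a PassThrough immediately, so no data is emitted unconsumed while the asynchronous path creation runs, and then pipe that buffer on once the file stream is ready:
var request = require('request');
var fs = require('fs');
var PassThrough = require('stream').PassThrough;

var buffer = new PassThrough();
var req = request({ url: 'url-to-fetch-a-file' });

// Start consuming right away, before any asynchronous work
req.pipe(buffer);

req.on('response', function () {
  createPath(path, function () {
    // The PassThrough has been buffering (and applying backpressure) in the meantime
    buffer.pipe(fs.createWriteStream(path + fname));
  });
});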

In Meteor, how do I get a Node read stream from a collection's find cursor?

In Meteor, on the server side, I want to use the .find() function on a Collection and then get a Node ReadStream interface from the cursor that is returned. I've tried using .stream() on the cursor as described in the MongoDB docs (seen here), however I get the error "Object [object Object] has no method 'stream'", so it looks like Meteor collections don't have this option. Is there a way to get a stream from a Meteor Collection's cursor?
I am trying to export some data to CSV, and I want to pipe the data directly from the collection's stream into a CSV parser and then into the response going back to the user. I am able to get the response stream from the Router package we are using, and it's all working except for getting a stream from the collection. Fetching the array from the find to push it into the stream manually would defeat the purpose of a stream, since it would put everything in memory. I guess my other option is to use a forEach on the collection and push the rows into the stream one by one, but this seems dirty when I could pipe the stream directly through the parser with a transform on it.
Here's some sample code of what I am trying to do:
response.writeHead(200, {'content-type': 'text/csv'});

// Set up a future
var fut = new Future();

var users = Users.find({}).stream();

CSV().from(users)
  .to(response)
  .on('end', function (count) {
    log.verbose('finished csv export');
    response.end();
    fut.ret();
  });

return fut.wait();
Have you tried creating a custom function and piping to it?
Though this would only work if Users.find() supported .pipe() (again, only if Users.find inherits from a Node.js streamable object).
Kind of like this:
var stream = require('stream');
var util = require('util');

function StreamReader() {
  stream.Writable.call(this);
  this.data = '';
  this.on('finish', function () {
    // this.data contains the raw data as a string, so do whatever you need
    // to make it usable, e.g. split on ',' or something
    console.log(this.data);
    db.close();
  });
}

util.inherits(StreamReader, stream.Writable);

StreamReader.prototype._write = function (chunk, encoding, callback) {
  this.data = this.data + chunk.toString('utf8');
  callback();
};

Users.find({}).pipe(new StreamReader());
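Alternatively, if the Meteor release in use exposes rawCollection() (an assumption here), the underlying native driver cursor does have .stream(), and that can be piped straight through a CSV transform into the response:
// rawCollection() returns the native MongoDB driver collection,
// whose find() cursor supports .stream()
var rawCursor = Users.rawCollection().find({});

response.writeHead(200, {'content-type': 'text/csv'});

rawCursor.stream()      // Node readable stream of user documents
  .pipe(csvTransform)   // csvTransform: a hypothetical objectMode transform you provide
  .pipe(response);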
