Clone a transform stream in NodeJS

I'm trying to reuse a couple of transform streams (gulp-like, e.g. concat() or uglify()) across several readable streams. I only have access to the created instances, not the original subclasses.
It does not work out of the box; I get Error: stream.push() after EOF when I pipe at least two distinct readable streams into my transforms. Somehow the events appear to leak from one stream to the other.
I've tried to set up a cloneTransform function to cleanly "fork" into two distinct transforms, but I can't get them to stop sharing events:
function cloneTransform(transform) {
  var ts = new Transform({objectMode: true});
  ts._transform = transform._transform.bind(ts);
  if (typeof transform._flush !== 'undefined') {
    ts._flush = transform._flush.bind(ts);
  }
  return ts;
}
Any alternative idea, existing plugin or solution to address this?
Update: context
I'm working on a rewrite of the gulp-usemin package; it is hosted here: gulp-usemin-stream, with example use here.
Basically you parse an index.html looking for comment blocks surrounding style/script declarations, and you want to apply several configurable transformations to these files (see grunt-usemin).
So the problem I'm trying to solve is to reuse an array of transforms, [concat(), uglify()] that are passed as options to a gulp meta transform.

You're doing unnecessary work in your code. It should be as simple as:
function clone(transform) {
  var ts = new Transform({ objectMode: true })
  ts._transform = transform._transform
  ts._flush = transform._flush
  return ts
}
However, I'm not sure how you're trying to use it, so I'm not sure what error you're getting. There's also the issue of initialization: the original transform stream might initialize itself with .queue = [] or something in its constructor, and you won't be doing that initialization in your clone.
The solution completely depends on the problem you're trying to solve, and I feel like you're approaching the problem incorrectly in the first place.
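To illustrate that caveat, here is a hypothetical sketch: a transform whose _transform relies on state created outside of clone(), so the copy (which never gets that state) breaks:
// Hypothetical illustration of the initialization problem described above.
var Transform = require('stream').Transform;

var original = new Transform({objectMode: true});
original.queue = [];                        // state the "real" stream sets up
original._transform = function (chunk, enc, done) {
  this.queue.push(chunk);                   // fine on the original...
  done(null, chunk);
};

var copy = clone(original);                 // ...but copy.queue is undefined,
                                            // so copy._transform would throw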

It looks like you're trying to use a Transform stream directly instead of subclassing it. What you should be doing is subclassing Transform and overriding the _transform and, optionally, _flush methods. That way, you can just create a new instance of your transform stream for each readable stream that you need to use it with.
Example:
var util = require('util');
var Transform = require('stream').Transform;
function MyTransform(options) {
  Transform.call(this, options);
  // ... any setup you want to do ...
}
util.inherits(MyTransform, Transform);

MyTransform.prototype._transform = function(chunk, encoding, done) {
  // ... your _transform implementation ...
};

// Optional
MyTransform.prototype._flush = function(done) {
  // ... optional flush implementation ...
};
Once you have that set up, you can simply create new instances of MyTransform for each stream you want to use it with:
var readStream1 = ...
var readStream2 = ...
var transform1 = new MyTransform();
var transform2 = new MyTransform();
readStream1.pipe(transform1).pipe(...);
readStream2.pipe(transform2).pipe(...);
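Back in the gulp context of the question, the same "fresh instance per pipeline" idea can also be approximated without subclassing by passing factory functions instead of transform instances. This is only a sketch, not part of the original answer, and 'all.js' is just a placeholder name:
// Sketch: store factories rather than instances, so each pipeline gets a
// freshly created transform and no state or listeners are shared.
var concat = require('gulp-concat');
var uglify = require('gulp-uglify');

var transformFactories = [
  function () { return concat('all.js'); },
  function () { return uglify(); }
];

function applyTransforms(readable) {
  return transformFactories.reduce(function (stream, factory) {
    return stream.pipe(factory());
  }, readable);
}

// applyTransforms(readStream1).pipe(dest1);
// applyTransforms(readStream2).pipe(dest2);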

Related

Two confusions about the transform stream part of the Node.js documentation around pre-ES6 style constructors

Here is the documentation I am confused by.
When using pre-ES6 style constructors
const { Transform } = require('stream');
const util = require('util');
function MyTransform(options) {
  if (!(this instanceof MyTransform))
    return new MyTransform(options);
  Transform.call(this, options);
}
util.inherits(MyTransform, Transform);
Why do we need to check this instanceof MyTransform? As far as I know, as long as we invoke new MyTransform(), evaluation of this instanceof MyTransform will always return true. Maybe creating a Transform instance by calling MyTransform() without new can be found in many code bases? That is the only reason I could guess.
What is the purpose of util.inherits(MyTransform, Transform); ? Just to ensure that new MyTransform() instanceof Transform returns true?
Thank you for your time in advance!
MyTransform is just a function like any other function. There's nothing special about it, so calling it as a function (without new) will work perfectly fine. So what should happen in that case? According to this code: fall through to a constructor call and return the resulting instance object.
As per the documentation for that function, it enforces prototype inheritance, because again, you wrote MyTransform as just a plain function: while you can use new with any function, you didn't write any of the code necessary for proper prototype inheritance, so using new would give you a completely useless object. That means either you add the code necessary to set up prototype inheritance yourself, or you ask a utility function to do that for you.
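To make both points concrete, here is a small sketch (nothing beyond what the documentation snippet already defines): with the guard, calling the constructor without new still hands back a proper instance, and util.inherits is what makes that instance pass instanceof Transform.
const { Transform } = require('stream');
const util = require('util');

function MyTransform(options) {
  if (!(this instanceof MyTransform))
    return new MyTransform(options); // forgot `new`? fall back to a real construction
  Transform.call(this, options);
}
util.inherits(MyTransform, Transform);

const a = new MyTransform();
const b = MyTransform();               // called without `new`, still returns an instance

console.log(a instanceof MyTransform); // true
console.log(b instanceof MyTransform); // true
console.log(b instanceof Transform);   // true, because util.inherits wired up the prototype chain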

node.js conditional stream pipe

I'm trying to understand how best to conditionally pipe a stream to a Transform or similar without the use of a 3rd party lib.
Ideally I'd like to do something like this
const transformA, transformB = // my transforms
const booleanA, booleanB = // my various conditions
const stream = new Readable()
stream.push('...') // my content
return stream
.pipe(booleanA && transformA)
.pipe(booleanB && transformB)
I've attempted this using libs like detour-stream, ternary-stream and others, but I've encountered various strange side effects where transforms are invoked when they shouldn't be, errors are thrown, etc., and it's left me wondering how people accomplish this kind of thing without the added complexity of those libs.
In the interim I've solved this by just taking an imperative approach, reassigning the stream on each condition.
let stream = // get my stream
if (condition) stream = stream.pipe(someTransform)
if (otherCondition) stream = stream.pipe(otherTransform)
return stream
This is fine, but I'm still curious whether the immutable, functional approach is possible.
Thanks for looking & reading
Unfortunately, there's no built-in support for conditional pipes. An alternative to your approach without any third-party modules is stream.pipeline:
const stream = require('stream');
const util = require('util');
const pipeline = util.promisify(stream.pipeline);

const read = new stream.Readable();

const pipes = [
  booleanA && transformA,
  booleanB && transformB
].filter(Boolean); // remove empty pipes

await pipeline(
  read,
  ...pipes
);
Since stream.pipeline accepts any number of transform streams, you can use the spread operator on the already filtered array of pipes.
If you want to return the last piped stream instead, drop the promisified version and use the callback form of pipeline:
return pipeline(
read,
...pipes,
() => {}
)
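Putting it together, here is a self-contained sketch; the transforms, conditions and the writable sink are hypothetical stand-ins for your own streams:
const stream = require('stream');
const util = require('util');
const pipeline = util.promisify(stream.pipeline);

// Hypothetical stand-ins for transformA / transformB and the conditions.
const upperCase = new stream.Transform({
  transform(chunk, enc, done) { done(null, chunk.toString().toUpperCase()); }
});
const exclaim = new stream.Transform({
  transform(chunk, enc, done) { done(null, chunk.toString() + '!'); }
});
const booleanA = true;
const booleanB = false;

const read = stream.Readable.from(['hello ', 'world']);
const sink = new stream.Writable({
  write(chunk, enc, done) { process.stdout.write(chunk); done(); }
});

const pipes = [
  booleanA && upperCase,
  booleanB && exclaim
].filter(Boolean); // only transforms whose condition is truthy remain

pipeline(read, ...pipes, sink)
  .then(() => console.log('\ndone'))
  .catch(console.error);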

Node - Abstracting Pipe Steps into Function

I'm familiar with Node streams, but I'm struggling with best practices for abstracting code that I reuse a lot into a single pipe step.
Here's a stripped down version of what I'm writing today:
inputStream
  .pipe(csv.parse({columns: true}))
  .pipe(csv.transform(function(row) { return transform(row); }))
  .pipe(csv.stringify({header: true}))
  .pipe(outputStream);
The actual work happens in transform(). The only things that really change are inputStream, transform(), and outputStream. Like I said, this is a stripped-down version of what I actually use. I have a lot of error handling and logging on each pipe step, which is ultimately why I'm trying to abstract the code.
What I'm looking to write is a single pipe step, like so:
inputStream
.pipe(csvFunction(transform(row)))
.pipe(outputStream);
What I'm struggling to understand is how to turn those pipe steps into a single function that accepts a stream and returns a stream. I've looked at libraries like through2, but I'm not sure how that gets me where I'm trying to go.
You can use the PassThrough class like this:
var PassThrough = require('stream').PassThrough;

var csvStream = new PassThrough();
csvStream.on('pipe', function (source) {
  // undo piping of source
  source.unpipe(this);
  // build own pipe-line and store internally
  this.combinedStream =
    source.pipe(csv.parse({columns: true}))
      .pipe(csv.transform(function (row) {
        return transform(row);
      }))
      .pipe(csv.stringify({header: true}));
});

csvStream.pipe = function (dest, options) {
  // pipe internal combined stream to dest
  return this.combinedStream.pipe(dest, options);
};

inputStream
  .pipe(csvStream)
  .pipe(outputStream);
Here's what I ended up going with. I used the through2 library and the streaming API of the csv library to create the pipe function I was looking for.
var csv = require('csv');
var through = require('through2');

module.exports = function (transformFunc) {
  var parser = csv.parse({columns: true, relax_column_count: true}),
      transformer = csv.transform(function (row) {
        return transformFunc(row);
      }),
      stringifier = csv.stringify({header: true});

  return through(function (chunk, enc, cb) {
    var stream = this;

    parser.on('data', function (data) {
      transformer.write(data);
    });
    transformer.on('data', function (data) {
      stringifier.write(data);
    });
    stringifier.on('data', function (data) {
      stream.push(data);
    });

    parser.write(chunk);

    parser.removeAllListeners('data');
    transformer.removeAllListeners('data');
    stringifier.removeAllListeners('data');
    cb();
  });
};
It's worth noting the part where I remove the event listeners towards the end; this was due to running into memory errors because I had created too many event listeners. I initially tried solving this problem by listening to events with once(), but that prevented subsequent chunks from being read and passed on to the next pipe step.
Let me know if anyone has feedback or additional ideas.

How can I chain streams internally within a custom through2 stream

I'm writing my own through stream in Node which takes in a text stream and outputs an object per line of text. This is what the end result should look like:
fs.createReadStream('foobar')
.pipe(myCustomPlugin());
The implementation would use through2 and event-stream to make things easy:
var es = require('event-stream');
var through = require('through2');

module.exports = function myCustomPlugin() {
  var parse = through.obj(function(chunk, enc, callback) {
    this.push({description: chunk});
    callback();
  });
  return es.split().pipe(parse);
};
However, if I were to pull this apart essentially what I did was:
fs.createReadStream('foobar')
.pipe(
es.split()
.pipe(parse)
);
Which is incorrect. Is there a better way? Can I inherit es.split() instead of use it inside the implementation? Is there an easy way to implement splits on lines without event-stream or similar? Would a different pattern work better?
NOTE: I'm intentionally doing the chaining inside the function as the myCustomPlugin() is the API interface I'm attempting to expose.
Based on the link in the previously accepted answer that put me on the right googling track, here's a shorter version if you don't mind another module: stream-combiner (read the code to convince yourself of what's going on!)
var combiner = require('stream-combiner')
  , through = require('through2')
  , split = require('split2')

function MyCustomPlugin() {
  var parse = through(...)
  return combiner( split(), parse )
}
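A hedged usage sketch, assuming parse is filled in as the object-mode transform from the question (pushing {description: line} for each line of input):
// Hypothetical usage of the combined stream defined above.
var fs = require('fs');

fs.createReadStream('foobar')
  .pipe(MyCustomPlugin())
  .on('data', function (obj) {
    console.log(obj); // e.g. { description: '<one line of foobar>' }
  });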
I'm working on something similar.
See this solution: Creating a Node.js stream from two piped streams
var outstream = through2().on('pipe', function(source) {
  source.unpipe(this);
  this.transformStream = source.pipe(stream1).pipe(stream2);
});

outstream.pipe = function(destination, options) {
  return this.transformStream.pipe(destination, options);
};

return outstream;
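Wrapped into the plugin shape from the question, that pattern looks roughly like this; it is only a sketch, with es.split() and the question's parse transform standing in for stream1 and stream2:
var es = require('event-stream');
var through = require('through2');

module.exports = function myCustomPlugin() {
  var parse = through.obj(function (chunk, enc, callback) {
    this.push({description: chunk});
    callback();
  });

  // Swallow the incoming pipe and reroute it through the internal chain...
  var outstream = through().on('pipe', function (source) {
    source.unpipe(this);
    this.transformStream = source.pipe(es.split()).pipe(parse);
  });

  // ...and delegate downstream pipes to the tail of that chain.
  outstream.pipe = function (destination, options) {
    return this.transformStream.pipe(destination, options);
  };

  return outstream;
};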

In Meteor, how do I get a Node read stream from a collection's find cursor?

In Meteor, on the server side, I want to use the .find() function on a Collection and then get a Node ReadStream interface from the cursor that is returned. I've tried using .stream() on the cursor as described in the MongoDB docs seen here. However, I get the error "Object [object Object] has no method 'stream'", so it looks like Meteor collections don't have this option. Is there a way to get a stream from a Meteor Collection's cursor?
I am trying to export some data to CSV and I want to pipe the data directly from the collection's stream into a CSV parser and then into the response going back to the user. I am able to get the response stream from the Router package we are using, and it's all working except for getting a stream from the collection. Fetching the array from the find to push it into the stream manually would defeat the purpose of a stream, since it would put everything in memory. I guess my other option is to use a forEach on the collection and push the rows into the stream one by one, but this seems dirty when I could pipe the stream directly through the parser with a transform on it.
Here's some sample code of what I am trying to do:
response.writeHead(200, {'content-type': 'text/csv'});

// Set up a future
var fut = new Future();

var users = Users.find({}).stream();

CSV().from(users)
  .to(response)
  .on('end', function(count) {
    log.verbose('finished csv export');
    response.end();
    fut.ret();
  });

return fut.wait();
Have you tried creating a custom function and piping to it?
Though this would only work if whatever Users.find() returns supports .pipe() (again, only if it inherits from a Node.js streamable object). Kind of like this:
var stream = require('stream')
var util = require('util')

var streamreader = function () {
  stream.Writable.call(this)
  this.data = ''
  this.on('finish', function () {
    // this.data contains the raw data as a string, so do whatever you need
    // to make it usable, e.g. split on ',' or whatever it is you need
    console.log(this.data)
    db.close()
  })
}
util.inherits(streamreader, stream.Writable)

streamreader.prototype._write = function (chunk, encoding, callback) {
  this.data = this.data + chunk.toString('utf8')
  callback()
}

Users.find({}).pipe(new streamreader())
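Alternatively, if the cursor does not expose .pipe() at all, the forEach fallback mentioned in the question can be wrapped in a proper Readable so nothing is buffered as a whole array. This is only a rough sketch (cursor method names assumed, backpressure ignored):
var Readable = require('stream').Readable;

// Rough sketch: wrap a Meteor/Mongo cursor in an object-mode Readable by
// pushing one document at a time instead of fetching everything at once.
function cursorToStream(cursor) {
  var rs = new Readable({objectMode: true});
  var started = false;

  rs._read = function () {
    if (started) return;        // walk the cursor only once
    started = true;
    cursor.forEach(function (doc) {
      rs.push(doc);             // note: ignores backpressure for brevity
    });
    rs.push(null);              // signal end of data
  };

  return rs;
}

// e.g. cursorToStream(Users.find({})).pipe(csvTransform).pipe(response);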
