I'm trying to understand how best to conditionally pipe a stream to a Transform or similar without the use of a 3rd party lib.
Ideally I'd like to do something like this:
const transformA, transformB = // my transforms
const booleanA, booleanB = // my various conditions
const stream = new Readable()
stream.push('...') // my content
return stream
.pipe(booleanA && transformA)
.pipe(booleanB && transformB)
I've attempted this using libs like detour-stream, ternary-stream and others, but I've encountered various strange side effects (transforms being invoked when they shouldn't be, errors being thrown, etc.), and it's left me wondering how people accomplish this sort of thing without the added complexity of those libs.
In the interim I've solved this by taking an imperative approach, reassigning the stream on each condition:
let stream = // get my stream
if (condition) stream = stream.pipe(someTransform)
if (otherCondition) stream = stream.pipe(otherTransform)
return stream
This is fine, but I'm still curious whether the immutable, functional approach can be done.
Thanks for looking & reading
Unfortunately, there's no built-in support for conditional pipes. An alternative to your approach without any third-party modules is stream.pipeline:
const util = require('util');
const stream = require('stream');
const pipeline = util.promisify(stream.pipeline);

const read = new stream.Readable({ read() {} }); // no-op _read; content is pushed manually
read.push('...'); // your content
read.push(null);  // signal the end of the data

const pipes = [
  booleanA && transformA,
  booleanB && transformB
].filter(Boolean); // remove falsy entries (inactive transforms)

await pipeline(
  read,
  ...pipes
);
Since stream.pipeline accepts any number of streams, you can spread the already-filtered array into it.
If you want the last piped stream back as a return value, drop the promisified version; the callback form of pipeline returns the last stream passed to it:
return pipeline(
  read,
  ...pipes,
  () => {} // pipeline's callback; real error handling belongs here
)
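If you still want the fluent, functional style from your first snippet without any helper module, a plain reduce over the same filtered array gets close. This is just a sketch reusing the booleanA/transformA names from the question; note that, unlike pipeline, it won't forward errors or clean up the other streams if one of them fails, so you'd still want error handlers:

const activeTransforms = [
  booleanA && transformA,
  booleanB && transformB
].filter(Boolean);

// .pipe() returns its destination, so the reduce threads the stream
// through each active transform and returns the tail of the chain.
// `stream` is your readable (the one you were pushing content into).
return activeTransforms.reduce((piped, transform) => piped.pipe(transform), stream);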
Related
I am trying to figure out how to create a stream pipe which reads entries from a CSV file on demand. To do so, I thought of the following approach using pipes (pseudocode):
const stream_pipe = input_file_stream.pipe(csv_parser)
// Then getting entries through:
let entry = stream_pipe.read()
Unfortunately, after lots of testing I found that the moment I set up the pipe, it is automatically consumed until the end of the CSV file. I tried to pause it on creation by appending .pause() at the end, but that seems to have no effect.
Here's my current code. I am using the csv-parse library (part of the bigger csv package):
const fs = require("fs")
const { parse: csvParser } = require("csv-parse") // csv-parse v5-style import

// Read file stream
const file_stream = fs.createReadStream("filename.csv")
const parser = csvParser({
  columns: ['a', 'b'],
  on_record: (record) => {
    // A simple filter as I am interested only in numeric entries
    let a = parseInt(record.a)
    let b = parseInt(record.b)
    return (isNaN(a) || isNaN(b)) ? undefined : record
  }
})
const reader = file_stream.pipe(parser) // Adding .pause() seems to have no effect
console.log(reader.read()) // Prints `null`
// I found out I can use this strategy to read a few entries immediately, but I cannot break out of it and then resume as the stream will automatically be consumed
//for await (const record of reader) {
// console.log(record)
//}
I have been banging my head against this for a while and I could not find an easy solution in either the csv package docs or the official Node documentation.
Thanks in advance to anyone able to put me on the right track :)
One thing you can do is create a readline interface over the input stream and process the file line by line, like this:
const fs = require('fs');
const readline = require('readline');

// create a readline interface which will read the CSV file line by line
const rl = readline.createInterface({
  input: fs.createReadStream('filename.csv')
});

// async/await keeps the processing sequential: one line at a time
for await (const line of rl) {
  const res = await processRecord(line);
  // ... do something with res ...
}

async function processRecord(line) {
  return new Promise((res, rej) => {
    if (line) {
      // do the processing
      res(line);
    } else {
      rej('Unable to process record');
    }
  });
}
The processRecord function then handles one line at a time, and you can use promises to keep the processing sequential.
Note: the above is pseudocode, just to give you an idea of how things work; I have been doing the same in my project to read a CSV file line by line and it works fine.
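If you want truly on-demand reads from the parser itself rather than a line loop, another option is to grab the stream's async iterator manually: nothing is consumed until you call next(), and because you never break out of a for await loop, the stream isn't destroyed between reads. A rough sketch (the csv-parse import shape depends on your version):

const fs = require('fs');
const { parse } = require('csv-parse'); // v5-style import; adjust for your version

const parser = fs.createReadStream('filename.csv').pipe(parse({
  columns: ['a', 'b'],
  on_record: (record) => {
    const a = parseInt(record.a), b = parseInt(record.b);
    return (isNaN(a) || isNaN(b)) ? undefined : record;
  }
}));

// Grab the async iterator once and call next() whenever you want an entry.
// Unlike `for await ... break`, this never calls iterator.return(), so the
// underlying stream is not destroyed between reads.
const entries = parser[Symbol.asyncIterator]();

(async () => {
  const first = await entries.next();  // { value: <record>, done: false }
  console.log(first.value);
  // ... do other work, then come back for more ...
  const second = await entries.next(); // resumes right where it left off
  console.log(second.value);
})();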
I have a function that may return one or more piped streams back to a main function that is chaining streams together:
function streamBuilder(){
  const csvStream = require('fast-csv').createWriteStream();
  const fsStream = fs.createWriteStream('file.csv');
  return csvStream.pipe(fsStream);
}
const dbStream = db.collection('huge').find();
const streams = streamBuilder();
dbStream.pipe(streams);
Unfortunately, it doesn't work.
The result is that the CSV stream csvStream.transform() is apparently never called, only the file write stream fsStream.transform(), which errors with an invalid chunk argument.
If I do this instead, it works:
function streamBuilder(){
  const csvStream = require('fast-csv').createWriteStream();
  const fsStream = fs.createWriteStream('file.csv');
  return csvStream.on('data', chunk => fsStream.write(chunk));
}
But it doesn't feel right... there must be a way to chain stream pipes when they are nested (i.e. streamA.pipe(streamB.pipe(streamC))). Is there a way I can just chain them without .on('data'...) and without passing the input streams as arguments into the streamBuilder() function?
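For what it's worth, here is a rough sketch of why the first version misbehaves and one way around it, using only the pieces from the snippets above: .pipe() returns its destination, so returning csvStream.pipe(fsStream) hands the caller fsStream, and the database documents get written straight to the file, bypassing the CSV formatter (hence the invalid chunk error). Returning the head of the chain instead keeps the internal piping intact:

function streamBuilder(){
  const csvStream = require('fast-csv').createWriteStream();
  const fsStream = require('fs').createWriteStream('file.csv');
  csvStream.pipe(fsStream); // wire formatter -> file internally
  return csvStream;         // hand back the *writable* head of the chain
}

// Now the documents flow through the CSV formatter before hitting the file:
db.collection('huge').find().pipe(streamBuilder());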
Googling for "parse nodejs binary stream" I see lots of examples when the emitted data fits neatly into the expected return size, but zero examples of how to handle the case when the next chunk contains the remaining bytes from the first structure and the new header.
What's the "right" way to parse binary streams when I expect something like:
record-length: 4bytes
data1: 8bytes
data2: 8bytes
4-byte-records[(record-length - 16) * 4];
The data will come in as variously sized chunks. But is there a way to call data.readUInt32(0) and have it wait for the chunk to fill? I'd hate to have to write a pipe stage that emits bytes and a receiving state machine; that seems so very wrong.
This has got to be a solved problem; it is such a basic concept.
Help?
Thanks,
PT
Hmm... that's something that can be solved using the async version of stream.read() and a transform stream.
Now, you could write your own version (and it would probably be fun), but the framework I wrote, scramjet, already has that async read, and I gather you'd want to make this easy.
Here's the easiest I can think of, using AsyncGenerator:
const {BufferStream, DataStream} = require('scramjet'); // or es6 import;

const input = BufferStream.from(getStreamFromSomewhere());
const output = DataStream.from(async function* () {
  while (true) {
    const recordLength = (await input.whenRead(4)).readUInt32BE(0); // read next record length (big-endian assumed)
    if (!recordLength) return; // stream ends here;

    const data1 = await input.whenRead(8);
    const data2 = await input.whenRead(8);

    const restOfData = [];
    for (let i = 0; i < recordLength; i += 4)
      restOfData.push((await input.whenRead(4)).readUInt32BE(0));

    yield {data1, data2, restOfData};
  }
})
  .catch(e => output.end()); // this handles the case where any of the reads past
                             // recordLength returns null - perhaps could be done better.
This is super easy in Node v10 or with Babel, but if you like I can add the non-AsyncGenerator version here.
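For completeness, here is a rough dependency-free sketch of the same buffering idea using only the built-in stream.Transform. It follows the snippet above in treating record-length as the byte count of the trailing 4-byte records, and assumes big-endian integers; partial records are simply held in a buffer until the next chunk arrives:

const { Transform } = require('stream');

class RecordParser extends Transform {
  constructor() {
    super({ readableObjectMode: true }); // emit parsed objects, not bytes
    this.buffer = Buffer.alloc(0);
  }

  _transform(chunk, encoding, callback) {
    // keep whatever didn't fit last time and append the new chunk
    this.buffer = Buffer.concat([this.buffer, chunk]);

    // a full record is 4 (length) + 8 (data1) + 8 (data2) + recordLength bytes
    while (this.buffer.length >= 4) {
      const recordLength = this.buffer.readUInt32BE(0);
      const total = 20 + recordLength;
      if (this.buffer.length < total) break; // wait for the rest of the record

      this.push({
        data1: this.buffer.subarray(4, 12),
        data2: this.buffer.subarray(12, 20),
        records: this.buffer.subarray(20, total)
      });
      this.buffer = this.buffer.subarray(total);
    }
    callback();
  }
}

// usage: source.pipe(new RecordParser()).on('data', record => { /* ... */ });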
I'm writing my own through stream in Node which takes in a text stream and outputs an object per line of text. This is what the end result should look like:
fs.createReadStream('foobar')
.pipe(myCustomPlugin());
The implementation would use through2 and event-stream to make things easy:
var es = require('event-stream');
var through = require('through2');

module.exports = function myCustomPlugin() {
  var parse = through.obj(function(chunk, enc, callback) {
    this.push({description: chunk});
    callback();
  });
  return es.split().pipe(parse);
};
However, if I were to pull this apart essentially what I did was:
fs.createReadStream('foobar')
  .pipe(
    es.split()
      .pipe(parse)
  );
Which is incorrect. Is there a better way? Can I inherit es.split() instead of using it inside the implementation? Is there an easy way to implement splitting on lines without event-stream or similar? Would a different pattern work better?
NOTE: I'm intentionally doing the chaining inside the function as the myCustomPlugin() is the API interface I'm attempting to expose.
Based on the link in the previously accepted answer that put me on the right googling track, here's a shorter version if you don't mind another module: stream-combiner (read the code to convince yourself of what's going on!)
var combiner = require('stream-combiner')
  , through = require('through2')
  , split = require('split2')

function MyCustomPlugin() {
  var parse = through(...)
  return combiner( split(), parse )
}
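If you'd rather not pull in another module and you're on a reasonably recent Node (stream.compose landed around v16.9 and is still marked experimental), the built-in compose does the same combining job. A sketch along the same lines, using split2 for the line splitting:

const { compose } = require('stream');
const through = require('through2');
const split = require('split2');

function myCustomPlugin() {
  const parse = through.obj(function(chunk, enc, callback) {
    this.push({description: chunk});
    callback();
  });
  // compose() returns a single duplex: writes go into split(),
  // reads come out of parse, which is exactly the shape .pipe() expects.
  return compose(split(), parse);
}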
I'm working on something similar.
See this solution: Creating a Node.js stream from two piped streams
var outstream = through2().on('pipe', function(source) {
  // intercept the incoming pipe: detach the source and route it
  // through the internal chain instead
  source.unpipe(this);
  this.transformStream = source.pipe(stream1).pipe(stream2);
});

// delegate outgoing pipes to the tail of the internal chain
outstream.pipe = function(destination, options) {
  return this.transformStream.pipe(destination, options);
};

return outstream;
I'm trying to reuse a couple of transform streams (gulp-like, i.e. concat() or uglify()) across several readable streams. I only have access to the created instances, not the original subclasses.
It does not work out of the box; I get Error: stream.push() after EOF when I pipe at least two distinct readable streams into my transforms. Somehow the events appear to leak from one stream to the other.
I've tried to set up a cloneTransform function to somehow cleanly "fork" into two distinct transforms, however I can't get them not to share events:
var Transform = require('stream').Transform;

function cloneTransform(transform) {
  var ts = new Transform({objectMode: true});
  ts._transform = transform._transform.bind(ts);

  if (typeof transform._flush !== 'undefined') {
    ts._flush = transform._flush.bind(ts);
  }

  return ts;
}
Any alternative idea, existing plugin or solution to address this?
Update: context
I'm working on a rewrite of the gulp-usemin package, it is hosted here: gulp-usemin-stream, example use here.
Basically you parse an index.html looking for comment blocks surrounding style/script declarations, and you want to apply several configurable transformations to these files (see grunt-usemin).
So the problem I'm trying to solve is to reuse an array of transforms, [concat(), uglify()], that are passed as options to a gulp meta transform.
You're doing unnecessary work in your code. It should be as simple as:
function clone(transform) {
  var ts = new Transform({ objectMode: true})
  ts._transform = transform._transform
  ts._flush = transform._flush
  return ts
}
However, I'm not sure how you're trying to use it, so I'm not sure what error you're getting. There's also the issue of initialization: the original transform stream might initialize itself with .queue = [] or something similar, and you won't be initializing that in your clone.
The solution completely depends on the problem you're trying to solve, and I feel like you're approaching the problem incorrectly in the first place.
It looks like you're trying to use a Transform stream directly instead of subclassing it. What you should be doing is subclassing Transform and overriding the _transform and, optionally, _flush methods. That way, you can just create a new instance of your transform stream for each readable stream that you need to use it with.
Example:
var util = require('util');
var Transform = require('stream').Transform;

function MyTransform(options) {
  Transform.call(this, options);
  // ... any setup you want to do ...
}
util.inherits(MyTransform, Transform);

MyTransform.prototype._transform = function(chunk, encoding, done) {
  // ... your _transform implementation ...
};

// Optional
MyTransform.prototype._flush = function(done) {
  // ... optional flush implementation ...
};
Once you have that set up, you can simply create new instances of MyTransform for each stream you want to use it with:
var readStream1 = ...
var readStream2 = ...
var transform1 = new MyTransform();
var transform2 = new MyTransform();
readStream1.pipe(transform1).pipe(...);
readStream2.pipe(transform2).pipe(...);
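If subclassing feels heavy for the gulp-usemin-stream case from the question, another option (a sketch; the transforms option name and the applyTransforms helper are hypothetical) is to pass factory functions instead of live instances, so every readable stream gets its own freshly created pipeline and nothing can leak between them:

// Instead of passing live instances like { transforms: [concat(), uglify()] },
// accept factories and instantiate them per source stream:
const options = { transforms: [() => concat(), () => uglify()] };

function applyTransforms(sourceStream, factories) {
  // every call creates brand-new transform instances, so no internal
  // state or events are shared between different readable streams
  return factories.reduce((s, makeTransform) => s.pipe(makeTransform()), sourceStream);
}

// usage sketch:
// applyTransforms(readStream1, options.transforms);
// applyTransforms(readStream2, options.transforms);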