How can I chain streams internally within a custom through2 stream - node.js

I'm writing my own through stream in Node which takes in a text stream and outputs an object per line of text. This is what the end result should look like:
fs.createReadStream('foobar')
  .pipe(myCustomPlugin());
The implementation would use through2 and event-stream to make things easy:
var es = require('event-stream');
var through = require('through2');

module.exports = function myCustomPlugin() {
  var parse = through.obj(function(chunk, enc, callback) {
    this.push({description: chunk});
    callback();
  });
  return es.split().pipe(parse);
};
However, if I were to pull this apart essentially what I did was:
fs.createReadStream('foobar')
  .pipe(
    es.split()
      .pipe(parse)
  );
Which is incorrect. Is there a better way? Can I inherit es.split() instead of using it inside the implementation? Is there an easy way to implement splitting on lines without event-stream or similar? Would a different pattern work better?
NOTE: I'm intentionally doing the chaining inside the function, as myCustomPlugin() is the API interface I'm attempting to expose.
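(For the question's aside about splitting lines without event-stream: below is a minimal, untested sketch of a line splitter built directly on through2. It assumes newline-delimited UTF-8 input and is not a full replacement for es.split().)
var through = require('through2');

function lineSplitter() {
  var buffered = '';
  return through.obj(function(chunk, enc, callback) {
    var lines = (buffered + chunk.toString()).split('\n');
    buffered = lines.pop(); // keep the trailing partial line for the next chunk
    var self = this;
    lines.forEach(function(line) { self.push(line); });
    callback();
  }, function(callback) {
    if (buffered) this.push(buffered); // flush whatever is left at the end
    callback();
  });
}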

Based on the link in the previously accepted answer that put me on the right googling track, here's a shorter version if you don't mind another module: stream-combiner (read the code to convince yourself of what's going on!)
var combiner = require('stream-combiner')
  , through = require('through2')
  , split = require('split2');

function myCustomPlugin() {
  var parse = through(...);
  return combiner(split(), parse);
}
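For reference, a minimal usage sketch of the combined plugin, assuming parse is the object-mode transform from the question and 'foobar' is just the question's placeholder path:
var fs = require('fs');

fs.createReadStream('foobar')
  .pipe(myCustomPlugin())         // combiner wires split() into parse internally
  .on('data', function(obj) {
    console.log(obj.description); // one object per input line
  });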

I'm working on something similar.
See this solution: Creating a Node.js stream from two piped streams
var outstream = through2().on('pipe', function(source) {
  source.unpipe(this);
  this.transformStream = source.pipe(stream1).pipe(stream2);
});

outstream.pipe = function(destination, options) {
  return this.transformStream.pipe(destination, options);
};

return outstream;
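A hedged usage sketch of that pattern, assuming the snippet above is wrapped in a factory like the question's myCustomPlugin(), that stream1/stream2 are the transforms being hidden, and that destination is just a placeholder for wherever the output should go:
// Piping into outstream fires its 'pipe' handler, which unpipes the source and
// reroutes it through stream1 and stream2; the overridden pipe() then hands the
// destination the end of that internal chain.
fs.createReadStream('foobar')
  .pipe(outstream)
  .pipe(destination);
Note that with this pattern only the overridden pipe() is rerouted; 'data' listeners attached directly to outstream won't see the transformed output.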

Related

How to read a huge JSON file and know when all the data has been received?

I am having a problem with the asynchronous nature of Node.js.
For example, I have the following code, which reads a huge JSON file:
var fs = require('fs');
var JSONStream = require('JSONStream');

var json_spot_parser = function(path) {
  this.count = 0;
  var self = this;
  let jsonStream = JSONStream.parse('*');
  let fileStream = fs.createReadStream(path);
  jsonStream.on('data', (item) => {
    // console.log(item) // which correctly logged each json object in the file
    self.count++; // 134,000
  });
  jsonStream.on('end', function () {
    // I know it ends here
  });
  fileStream.pipe(jsonStream);
};

json_spot_parser.prototype.print_count = function() {
  console.log(this.count);
};

module.exports = json_spot_parser;
In another module I use it as:
var m_path = path.join(__dirname, '../..', this.pathes.spots);
this.spot_parser = new json_spot_parser(m_path);
this.spot_parser.print_count();
I want to read all the JSON objects and process them, but the asynchronous behavior is my problem. I am not familiar with that kind of programming; I am used to programming sequentially in languages such as C and C++.
Since I don't know when the program finishes reading the JSON objects, I don't know when/where to process them.
After
this.spot_parser = new json_spot_parser(m_path);
I expect to deal with the JSON objects, but as I said, I can't.
I would like someone to explain how to write a Node.js program in such a case; I want to know the standard practice. So far I have read some posts, but I believe most of them are short-term fixes.
So, my question is: how does a Node.js programmer handle this kind of problem?
Please tell me the standard way; I want to get good at Node.js.
Thanks!
You can use callbacks as @paqash suggested, but returning a promise would be a better solution.
First, return a new Promise from json_spot_parser:
var json_spot_parser = function(path) {
  return new Promise(function(resolve, reject) {
    var count = 0;
    let jsonStream = JSONStream.parse('*');
    let fileStream = fs.createReadStream(path);
    fileStream.on('error', reject);
    jsonStream.on('error', reject);
    jsonStream.on('data', (item) => {
      // console.log(item) // correctly logs each json object in the file
      count++; // 134,000
    });
    jsonStream.on('end', function () {
      resolve(count);
    });
    fileStream.pipe(jsonStream);
  });
};

module.exports = json_spot_parser;
In another module:
var m_path = path.join(__dirname, '../..', this.pathes.spots);

this.spot_parser = json_spot_parser(m_path);
this.spot_parser.then(function(count) {
  console.log(count);
});
As you mentioned, Node.js has an async mechanism and you should learn how to think that way; it's required if you want to be good at Node.js. If I may suggest, start with this article:
Understanding Async Programming in Node.js
P.S.: Try to use camelCase variable names and follow the Airbnb JS style guide.
You should process them in the callbacks. Your code above looks pretty good; what exactly are you trying to do that you are unable to?
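To make the callback suggestion concrete, here is a minimal sketch; the onDone parameter and the items array are hypothetical additions, not part of the original code:
var fs = require('fs');
var JSONStream = require('JSONStream');

// Hypothetical variant: accept a callback and invoke it once parsing has finished.
var json_spot_parser = function(path, onDone) {
  var items = [];
  var jsonStream = JSONStream.parse('*');
  jsonStream.on('data', (item) => items.push(item));
  jsonStream.on('error', (err) => onDone(err));
  jsonStream.on('end', () => onDone(null, items));
  fs.createReadStream(path).pipe(jsonStream);
};

// Usage: all processing happens inside the callback, after 'end' has fired.
json_spot_parser(m_path, function(err, items) {
  if (err) throw err;
  console.log(items.length); // e.g. 134,000
  // process the parsed objects here
});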

Node - Abstracting Pipe Steps into Function

I'm familiar with Node streams, but I'm struggling with best practices for abstracting code that I reuse a lot into a single pipe step.
Here's a stripped down version of what I'm writing today:
inputStream
  .pipe(csv.parse({columns: true}))
  .pipe(csv.transform(function(row) { return transform(row); }))
  .pipe(csv.stringify({header: true}))
  .pipe(outputStream);
The actual work happens in transform(). The only things that really change are inputStream, transform(), and outputStream. Like I said, this is a stripped-down version of what I actually use. I have a lot of error handling and logging on each pipe step, which is ultimately why I'm trying to abstract the code.
What I'm looking to write is a single pipe step, like so:
inputStream
  .pipe(csvFunction(transform(row)))
  .pipe(outputStream);
What I'm struggling to understand is how to turn those pipe steps into a single function that accepts a stream and returns a stream. I've looked at libraries like through2, but I'm not sure how that gets me to where I'm trying to go.
You can use the PassThrough class like this:
var PassThrough = require('stream').PassThrough;

var csvStream = new PassThrough();
csvStream.on('pipe', function (source) {
  // undo piping of source
  source.unpipe(this);
  // build own pipe-line and store internally
  this.combinedStream =
    source.pipe(csv.parse({columns: true}))
          .pipe(csv.transform(function (row) {
            return transform(row);
          }))
          .pipe(csv.stringify({header: true}));
});

csvStream.pipe = function (dest, options) {
  // pipe internal combined stream to dest
  return this.combinedStream.pipe(dest, options);
};

inputStream
  .pipe(csvStream)
  .pipe(outputStream);
Here's what I ended up going with. I used the through2 library and the streaming API of the csv library to create the pipe function I was looking for.
var csv = require('csv');
var through = require('through2');

module.exports = function(transformFunc) {
  var parser = csv.parse({columns: true, relax_column_count: true}),
      transformer = csv.transform(function(row) {
        return transformFunc(row);
      }),
      stringifier = csv.stringify({header: true});

  return through(function(chunk, enc, cb) {
    var stream = this;

    parser.on('data', function(data) {
      transformer.write(data);
    });
    transformer.on('data', function(data) {
      stringifier.write(data);
    });
    stringifier.on('data', function(data) {
      stream.push(data);
    });

    parser.write(chunk);

    parser.removeAllListeners('data');
    transformer.removeAllListeners('data');
    stringifier.removeAllListeners('data');

    cb();
  });
};
It's worth noting the part where I remove the event listeners towards the end; this was due to running into memory errors because I had created too many event listeners. I initially tried solving this by listening to events with once, but that prevented subsequent chunks from being read and passed on to the next pipe step.
Let me know if anyone has feedback or additional ideas.
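One possible variation, offered only as a sketch (the backpressure and flush handling here are simplified and not tested against the csv library's exact buffering behavior): wire the three csv streams together once, outside the per-chunk callback, so no listeners are ever added or removed per chunk.
var csv = require('csv');
var through = require('through2');

module.exports = function csvFunction(transformFunc) {
  var parser = csv.parse({columns: true, relax_column_count: true});
  var transformer = csv.transform(function(row) { return transformFunc(row); });
  var stringifier = csv.stringify({header: true});

  // Wire the internal pipeline together once.
  parser.pipe(transformer).pipe(stringifier);

  var wrapper = through(function(chunk, enc, cb) {
    // Feed each incoming chunk to the internal pipeline; write's callback
    // gives a rough form of backpressure.
    parser.write(chunk, cb);
  }, function(cb) {
    // On flush, end the internal pipeline and wait for it to drain.
    stringifier.on('end', cb);
    parser.end();
  });

  // Forward the stringified output to whatever is piped after the wrapper.
  stringifier.on('data', function(data) {
    wrapper.push(data);
  });

  return wrapper;
};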

How to reset nodejs stream?

How to reset a Node.js stream?
How to read a stream again in Node.js?
Thanks in advance!
var fs = require('fs');
var lineReader = require('line-reader');

// proxy.txt = only 3 lines
var readStream = fs.createReadStream('proxy.txt');

lineReader.open(readStream, function (err, reader) {
  for (var i = 0; i < 6; i++) {
    reader.nextLine(function(err, line) {
      if (err) {
        readStream.reset(); // ???
      } else {
        console.log(line);
      }
    });
  }
});
There are two ways of solving your problem. As someone commented before, you could simply wrap all of that in a function and, instead of resetting, simply read the file again.
Of course this won't work well with HTTP requests, for example, so the other way, provided you take the much bigger memory usage into account, is to simply accumulate your data.
What you'd need is to implement some sort of "rewindable stream": essentially a Transform stream that keeps a list of all the buffers passing through it and writes them to a piped stream when a rewind method is called.
Take a look at the Node API for streams; the methods should look somewhat like this:
const { Transform, PassThrough } = require('stream');

class Rewindable extends Transform {
  constructor() {
    super();
    this.accumulator = [];
  }
  _transform(buf, enc, cb) {
    this.accumulator.push(buf);
    this.push(buf); // pass the chunk through as well as remembering it
    cb();
  }
  rewind() {
    var stream = new PassThrough();
    this.accumulator.forEach((chunk) => stream.write(chunk));
    stream.end(); // close the replayed stream once everything is written
    return stream;
  }
}
And you would use it like this:
var readStream = fs.createReadStream('proxy.txt');
var rewindableStream = readStream.pipe(new Rewindable());

(...).on('whenever-you-want-to-reset', () => {
  var rewound = rewindableStream.rewind();
  // ... and do whatever you like with your stream
});
Actually I think I'll add this to my scramjet. :)
Edit
I released the logic above in the rereadable-stream npm package. The advantage over the stream depicted here is that you can control the buffer length and discard data that has already been read, while keeping a window of count items and tailing a number of chunks backwards.

How to mock streams in NodeJS

I'm attempting to unit test one of my Node.js modules which deals heavily in streams. I'm trying to mock a stream (that I will write to), as within my module I have .on('data') and .on('end') listeners that I would like to trigger. Essentially I want to be able to do something like this:
var mockedStream = new require('stream').Readable();

mockedStream.on('data', function withData(data) {
  console.dir(data);
});
mockedStream.on('end', function() {
  console.dir('goodbye');
});

mockedStream.push('hello world');
mockedStream.close();
This executes, but the 'on' event never gets fired after I do the push (and .close() is invalid).
All the guidance I can find on streams uses the 'fs' or 'net' library as a basis for creating a new stream (https://github.com/substack/stream-handbook), or mocks it out with sinon, but the mocking gets very lengthy very quickly.
Is there a nice way to provide a dummy stream like this?
There's a simpler way: stream.PassThrough
I've just found Node's easy-to-miss stream.PassThrough class, which I believe is what you're looking for.
From Node docs:
The stream.PassThrough class is a trivial implementation of a Transform stream that simply passes the input bytes across to the output. Its purpose is primarily for examples and testing...
The code from the question, modified:
const { PassThrough } = require('stream');

const mockedStream = new PassThrough(); // <----
mockedStream.on('data', (d) => {
  console.dir(d);
});
mockedStream.on('end', function() {
  console.dir('goodbye');
});

mockedStream.emit('data', 'hello world');
mockedStream.end(); // <-- end. not close.
mockedStream.destroy();
mockedStream.push() works too, but the data arrives as a Buffer, so you'll likely want to do console.dir(d.toString());
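For example (a tiny sketch with a fresh PassThrough, since the one above has already been ended; the pushStream name is just illustrative):
const pushStream = new PassThrough();
pushStream.on('data', (d) => {
  console.dir(d.toString()); // push() delivers Buffers, so convert before inspecting
});
pushStream.push('hello world');
pushStream.end();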
Instead of using push, I should have been using .emit(<event>, <data>);
My mock code now works and looks like:
var mockedStream = new require('stream').Readable();
mockedStream._read = function(size) { /* do nothing */ };
myModule.functionIWantToTest(mockedStream); // has .on() listeners in it
mockedStream.emit('data', 'Hello data!');
mockedStream.emit('end');
The accepted answer is only partially correct. If all you need is for events to fire, using .emit('data', datum) is okay, but if you need to pipe this mock stream anywhere else it won't work.
Mocking a Readable stream is surprisingly easy, requiring only the built-in Readable class.
const { Readable } = require('stream');

let eventCount = 0;
const mockEventStream = new Readable({
  objectMode: true,
  read: function (size) {
    if (eventCount < 10) {
      eventCount = eventCount + 1;
      return this.push({message: `event${eventCount}`});
    } else {
      return this.push(null);
    }
  }
});
Now you can pipe this stream wherever and 'data' and 'end' will fire.
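For instance, a small usage sketch piping the mock above into an object-mode Writable (the logging sink is just illustrative):
const { Writable } = require('stream');

const sink = new Writable({
  objectMode: true,
  write(event, enc, cb) {
    console.log(event.message); // event1 ... event10
    cb();
  }
});

mockEventStream.pipe(sink).on('finish', () => console.log('done'));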
Another example from the node docs:
https://nodejs.org/api/stream.html#stream_an_example_counting_stream
Building on #flacnut 's answer, I did this (in NodeJS 12+) using Readable.from() to construct a stream preloaded with data (a list of filenames):
const mockStream = require('stream').Readable.from([
  'file1.txt',
  'file2.txt',
  'file3.txt',
]);
In my case, I wanted to mock the stream of filenames returned by fast-glob.stream:
const glob = require('fast-glob')
// inject the mock stream into glob module
glob.stream = jest.fn().mockReturnValue(mockStream)
In the function being tested:
const stream = glob.stream(globFilespec);
for await (const filename of stream) {
  // filename = file1.txt, then file2.txt, then file3.txt
}
Works like a charm!
Here's a simple implementation which uses jest.fn() where the goal is to validate what has been written to the stream created by fs.createWriteStream(). The nice thing about jest.fn() is that although the calls to fs.createWriteStream() and stream.write() are inline in this test function, these functions don't need to be called directly by the test.
const fs = require('fs');

const mockStream = {};

test('mock fs.createWriteStream with mock implementation', async () => {
  const createMockWriteStream = (filename, args) => {
    return mockStream;
  };
  mockStream.write = jest.fn();
  fs.createWriteStream = jest.fn(createMockWriteStream);

  const stream = fs.createWriteStream('foo.csv', {'flags': 'a'});
  await stream.write('foobar');

  expect(fs.createWriteStream).toHaveBeenCalledWith('foo.csv', {'flags': 'a'});
  expect(mockStream.write).toHaveBeenCalledWith('foobar');
});

Clone a transform stream in NodeJS

I'm trying to reuse a couple of transform streams (gulp-like, e.g. concat() or uglify()) across several readable streams. I only have access to the created instances, not the original subclasses.
It does not work out of the box; I get Error: stream.push() after EOF when I pipe at least two distinct readable streams into my transforms. Somehow the events appear to leak from one stream to the other.
I've tried to set up a cloneTransform function to somehow cleanly "fork" into two distinct transforms, but I can't get it to not share events:
function cloneTransform(transform) {
  var ts = new Transform({objectMode: true});
  ts._transform = transform._transform.bind(ts);
  if (typeof transform._flush !== 'undefined') {
    ts._flush = transform._flush.bind(ts);
  }
  return ts;
}
Any alternative idea, existing plugin or solution to address this?
Update: context
I'm working on a rewrite of the gulp-usemin package, it is hosted here: gulp-usemin-stream, example use here.
Basically you parse an index.html looking for comment blocks surrounding style/script declarations, and you want to apply several configurable transformations to these files (see grunt-usemin).
So the problem I'm trying to solve is to reuse an array of transforms, [concat(), uglify()] that are passed as options to a gulp meta transform.
You're doing unnecessary work in your code. It should be as simple as:
function clone(transform) {
  var ts = new Transform({objectMode: true});
  ts._transform = transform._transform;
  ts._flush = transform._flush;
  return ts;
}
However, I'm not sure how you're trying to use it, so I'm not sure what error you're getting. There's also the issue of initialization: the original transform stream might initialize itself with .queue = [] or something similar, and you won't be initializing that in your clone.
The solution completely depends on the problem you're trying to solve, and I feel like you're approaching the problem incorrectly in the first place.
It looks like you're trying to use a Transform stream directly instead of subclassing it. What you should be doing is subclassing Transform and overriding the _transform and, optionally, _flush methods. That way, you can just create a new instance of your transform stream for each readable stream that you need to use it with.
Example:
var util = require('util');
var Transform = require('stream').Transform;

function MyTransform(options) {
  Transform.call(this, options);
  // ... any setup you want to do ...
}
util.inherits(MyTransform, Transform);

MyTransform.prototype._transform = function(chunk, encoding, done) {
  // ... your _transform implementation ...
};

// Optional
MyTransform.prototype._flush = function(done) {
  // ... optional flush implementation ...
};
Once you have that setup, you can simply create new instances of MyTransform for each stream you want to use it with:
var readStream1 = ...
var readStream2 = ...
var transform1 = new MyTransform();
var transform2 = new MyTransform();
readStream1.pipe(transform1).pipe(...);
readStream2.pipe(transform2).pipe(...);
