How to reset nodejs stream? - node.js

How to reset nodejs stream?
How to read stream again in nodejs?
Thanks in advance!
var fs = require('fs');
var lineReader = require('line-reader');

// proxy.txt = only 3 lines
var readStream = fs.createReadStream('proxy.txt');

lineReader.open(readStream, function (err, reader) {
  for (var i = 0; i < 6; i++) {
    reader.nextLine(function (err, line) {
      if (err) {
        readStream.reset(); // ???
      } else {
        console.log(line);
      }
    });
  }
});

There are two ways of solving your problem. As someone commented before, you could simply wrap all of that in a function and, instead of resetting the stream, simply read the file again.
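For instance, a minimal sketch of that re-read approach, using Node's built-in readline module rather than line-reader, could look like this:

// Re-open the file each time you need to read it again,
// instead of trying to reset the old stream.
const fs = require('fs');
const readline = require('readline');

function readLines(onLine, onDone) {
  const rl = readline.createInterface({
    input: fs.createReadStream('proxy.txt'),
    crlfDelay: Infinity
  });
  rl.on('line', onLine);
  rl.on('close', onDone);
}

// First pass, then "reset" by simply reading the file again.
readLines(line => console.log(line), () => {
  readLines(line => console.log('again:', line), () => {});
});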
Of course this won't work well with HTTP requests, for example, so the other way, provided you accept a much bigger memory footprint, is to simply accumulate your data.
What you'd need is to implement some sort of "rewindable stream": essentially a Transform stream that keeps a list of all the buffers that pass through and writes them to a newly piped stream when a rewind method is called.
Take a look at the Node API for streams; the methods should look somewhat like this:
const { Transform, PassThrough } = require('stream');

class Rewindable extends Transform {
  constructor() {
    super();
    this.accumulator = [];
  }
  _transform(buf, enc, cb) {
    // Keep a copy of every chunk that passes through.
    this.accumulator.push(buf);
    cb();
  }
  rewind() {
    // Return a fresh stream that replays everything seen so far.
    const stream = new PassThrough();
    this.accumulator.forEach((chunk) => stream.write(chunk));
    stream.end(); // signal that the replay is complete
    return stream;
  }
}
And you would use it like this:
var readStream = fs.createReadStream('proxy.txt');
var rewindableStream = readStream.pipe(new Rewindable());

(...).on("whenever-you-want-to-reset", () => {
  var rewound = rewindableStream.rewind();
  // ...and do whatever you like with your stream.
});
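For instance, a rough end-to-end sketch, assuming the Rewindable class above and the three-line proxy.txt from the question:

var fs = require('fs');

var rewindable = fs.createReadStream('proxy.txt').pipe(new Rewindable());

// Wait for the whole source to pass through, then replay it.
rewindable.on('finish', function () {
  rewindable.rewind().pipe(process.stdout);
});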
Actually I think I'll add this to my scramjet. :)
Edit
I released the logic above in the rereadable-stream npm package. The advantage over the stream shown here is that you can control the buffer length and get rid of data that has already been read.
At the same time you can keep a window of count items and tail a number of chunks backwards.

Related

NodeJS - read and write file causes corruption

I'm kinda new to NodeJS and I'm working on a simple file encoder.
I planned to change the very first 20kb of a file and just copy the rest of it.
So I used the following code, but it changed some bytes in the rest of the file.
Here is my code:
var fs = require('fs');
var config = require('./config');

fs.open(config.encodeOutput, 'w', function(err, fw) {
  if(err) {
    console.log(err);
  } else {
    fs.readFile(config.source, function(err, data) {
      var start = 0;
      var buff = readChunk(data, start);
      while(buff.length) {
        if(start < config.encodeSize) {
          var buffer = makeSomeChanges(buff);
          writeChunk(fw, buffer);
        } else {
          writeChunk(fw, buff);
        }
        start += config.ENCODE_BUFFER_SIZE;
        buff = readChunk(data, start);
      }
    });
  }
});

function readChunk(buffer, start) {
  return buffer.slice(start, start + config.ENCODE_BUFFER_SIZE);
}

function writeChunk(fd, chunk) {
  fs.writeFile(fd, chunk, {encoding: 'binary', flag: 'a'});
}
I opened the encoded file and compared it with the original file.
I even commented out these parts:
//if(start < config.encodeSize) {
//  var buffer = makeSomeChanges(buff);
//  writeChunk(fw, buffer);
//} else {
  writeChunk(fw, buff);
//}
So my program just copies the file, but it still changes some bytes.
What is wrong?
So I checked the pattern and realized some bytes were not in the right place, and I guessed that it was because I'm using an async write function.
I changed fs.writeFile() to fs.writeFileSync() and everything is working fine now.
Since you were using asynchronous IO, you should have been queueing the write operations: multiple writes happening at the same time are likely to end up corrupting your file. This also explains why your issue is solved by switching to synchronous IO: that way, a further write cannot start before the previous one has completed.
However, using synchronous APIs when asynchronous ones are available is a poor choice; your program will be blocked while it writes to the file. You should stick with the async calls and serialize the writes through a queue.
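A minimal sketch of such a queue, chaining promises so each append starts only after the previous one has finished (out.bin is just a placeholder name):

const fs = require('fs').promises;

let queue = Promise.resolve();

function queueWrite(path, chunk) {
  // Chain the new write onto the previous one so writes never overlap.
  queue = queue.then(() => fs.appendFile(path, chunk));
  return queue;
}

// Chunks land in the file strictly in the order the calls were made.
queueWrite('out.bin', Buffer.from('first '));
queueWrite('out.bin', Buffer.from('second'));
queue.then(() => console.log('all writes finished'));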

How to mock streams in NodeJS

I'm attempting to unit test one of my node-js modules which deals heavily in streams. I'm trying to mock a stream (that I will write to), as within my module I have .on('data') and .on('end') listeners that I would like to trigger. Essentially I want to be able to do something like this:
var mockedStream = new require('stream').Readable();

mockedStream.on('data', function withData(data) {
  console.dir(data);
});
mockedStream.on('end', function() {
  console.dir('goodbye');
});

mockedStream.push('hello world');
mockedStream.close();
This executes, but the 'on' event never gets fired after I do the push (and .close() is invalid).
All the guidance I can find on streams uses the 'fs' or 'net' library as a basis for creating a new stream (https://github.com/substack/stream-handbook), or mocks it out with sinon, but the mocking gets very lengthy very quickly.
Is there a nice way to provide a dummy stream like this?
There's a simpler way: stream.PassThrough
I've just found Node's very easy to miss stream.PassThrough class, which I believe is what you're looking for.
From Node docs:
The stream.PassThrough class is a trivial implementation of a Transform stream that simply passes the input bytes across to the output. Its purpose is primarily for examples and testing...
The code from the question, modified:
const { PassThrough } = require('stream');
const mockedStream = new PassThrough(); // <----
mockedStream.on('data', (d) => {
console.dir(d);
});
mockedStream.on('end', function() {
console.dir('goodbye');
});
mockedStream.emit('data', 'hello world');
mockedStream.end(); // <-- end. not close.
mockedStream.destroy();
mockedStream.push() works too, but then the listener receives a Buffer, so you might want to do console.dir(d.toString());
Instead of using push(), I should have been using .emit(<event>, <data>).
My mock code now works and looks like this:
var mockedStream = new require('stream').Readable();
mockedStream._read = function(size) { /* do nothing */ };
myModule.functionIWantToTest(mockedStream); // has .on() listeners in it
mockedStream.emit('data', 'Hello data!');
mockedStream.emit('end');
The accepted answer is only partially correct. If all you need is the events to fire, using .emit('data', datum) is fine, but if you need to pipe this mock stream anywhere else it won't work.
Mocking a Readable stream is surprisingly easy, requiring only the built-in Readable class.
const { Readable } = require('stream');

let eventCount = 0;
const mockEventStream = new Readable({
  objectMode: true,
  read: function (size) {
    if (eventCount < 10) {
      eventCount = eventCount + 1;
      return this.push({message: `event${eventCount}`});
    } else {
      return this.push(null);
    }
  }
});
Now you can pipe this stream wherever and 'data' and 'end' will fire.
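For instance, you could drain it into a small object-mode Writable (a throwaway sink, just for illustration):

const { Writable } = require('stream');

// A trivial object-mode sink that logs every object it receives.
const sink = new Writable({
  objectMode: true,
  write(obj, enc, cb) {
    console.log('data:', obj.message);
    cb();
  }
});

mockEventStream.pipe(sink).on('finish', () => console.log('end'));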
Another example from the node docs:
https://nodejs.org/api/stream.html#stream_an_example_counting_stream
Building on #flacnut's answer, I did this (in NodeJS 12+) using Readable.from() to construct a stream preloaded with data (a list of filenames):
const mockStream = require('stream').Readable.from([
  'file1.txt',
  'file2.txt',
  'file3.txt',
])
In my case, I wanted to mock the stream of filenames returned by fast-glob.stream:
const glob = require('fast-glob')
// inject the mock stream into glob module
glob.stream = jest.fn().mockReturnValue(mockStream)
In the function being tested:
const stream = glob.stream(globFilespec)
for await (const filename of stream) {
// filename = file1.txt, then file2.txt, then file3.txt
}
Works like a charm!
Here's a simple implementation which uses jest.fn() where the goal is to validate what has been written to the stream created by fs.createWriteStream(). The nice thing about jest.fn() is that although the calls to fs.createWriteStream() and stream.write() are inline in this test function, these functions don't need to be called directly by the test.
const fs = require('fs');

const mockStream = {}

test('mock fs.createWriteStream with mock implementation', async () => {
  const createMockWriteStream = (filename, args) => {
    return mockStream;
  }
  mockStream.write = jest.fn();
  fs.createWriteStream = jest.fn(createMockWriteStream);

  const stream = fs.createWriteStream('foo.csv', {'flags': 'a'});
  await stream.write('foobar');

  expect(fs.createWriteStream).toHaveBeenCalledWith('foo.csv', {'flags': 'a'});
  expect(mockStream.write).toHaveBeenCalledWith('foobar');
})

How can I chain streams internally within a custom through2 stream

I'm writing my own through stream in Node which takes in a text stream and outputs an object per line of text. This is what the end result should look like:
fs.createReadStream('foobar')
.pipe(myCustomPlugin());
The implementation would use through2 and event-stream to make things easy:
var es = require('event-stream');
var through = require('through2');

module.exports = function myCustomPlugin() {
  var parse = through.obj(function(chunk, enc, callback) {
    this.push({description: chunk});
    callback();
  });
  return es.split().pipe(parse);
};
However, if I were to pull this apart essentially what I did was:
fs.createReadStream('foobar')
  .pipe(
    es.split()
      .pipe(parse)
  );
Which is incorrect. Is there a better way? Can I inherit es.split() instead of use it inside the implementation? Is there an easy way to implement splits on lines without event-stream or similar? Would a different pattern work better?
NOTE: I'm intentionally doing the chaining inside the function as the myCustomPlugin() is the API interface I'm attempting to expose.
Based on the link in the previously accepted answer that put me on the right googling track, here's a shorter version if you don't mind another module: stream-combiner (read the code to convince yourself of what's going on!)
var combiner = require('stream-combiner')
  , through = require('through2')
  , split = require('split2')

function MyCustomPlugin() {
  var parse = through(...)
  return combiner( split(), parse )
}
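Putting that together with the parse transform from the question, the whole plugin might look roughly like this (an untested sketch):

var combiner = require('stream-combiner');
var through = require('through2');
var split = require('split2');

module.exports = function myCustomPlugin() {
  // Same parse step as in the question: one object per line of text.
  var parse = through.obj(function (chunk, enc, callback) {
    this.push({ description: chunk });
    callback();
  });
  return combiner(split(), parse);
};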
I'm working on something similar.
See this solution: Creating a Node.js stream from two piped streams
var through2 = require('through2');

// stream1 and stream2 are the internal pipeline stages (e.g. es.split() and parse).
module.exports = function myCustomPlugin() {
  var outstream = through2().on('pipe', function(source) {
    source.unpipe(this);
    this.transformStream = source.pipe(stream1).pipe(stream2);
  });

  outstream.pipe = function(destination, options) {
    return this.transformStream.pipe(destination, options);
  };

  return outstream;
};

Is it possible to register multiple listeners to a child process's stdout data event? [duplicate]

I need to run two commands in series that need to read data from the same stream.
After piping a stream into another, the buffer is emptied, so I can't read data from that stream again. That's why this doesn't work:
var spawn = require('child_process').spawn;
var fs = require('fs');
var request = require('request');

var inputStream = request('http://placehold.it/640x360');
var identify = spawn('identify', ['-']);

inputStream.pipe(identify.stdin);

var chunks = [];
identify.stdout.on('data', function(chunk) {
  chunks.push(chunk);
});
identify.stdout.on('end', function() {
  var size = getSize(Buffer.concat(chunks)); // width
  var convert = spawn('convert', ['-', '-scale', size * 0.5, 'png:-']);
  inputStream.pipe(convert.stdin);
  convert.stdout.pipe(fs.createWriteStream('half.png'));
});

function getSize(buffer) {
  return parseInt(buffer.toString().split(' ')[2].split('x')[0]);
}
Request complains about this:
Error: You cannot pipe after data has been emitted from the response.
and changing the inputStream to fs.createWriteStream yields the same issue of course.
I don't want to write into a file but reuse in some way the stream that request produces (or any other for that matter).
Is there a way to reuse a readable stream once it finishes piping?
What would be the best way to accomplish something like the above example?
You have to duplicate the stream by piping it into two streams. You can create a simple copy with a PassThrough stream, which simply passes the input through to the output.
const spawn = require('child_process').spawn;
const PassThrough = require('stream').PassThrough;

const a = spawn('echo', ['hi user']);
const b = new PassThrough();
const c = new PassThrough();

a.stdout.pipe(b);
a.stdout.pipe(c);

let count = 0;
b.on('data', function (chunk) {
  count += chunk.length;
});
b.on('end', function () {
  console.log(count);
  c.pipe(process.stdout);
});
Output:
8
hi user
The first answer only works if streams take roughly the same amount of time to process data. If one takes significantly longer, the faster one will request new data, consequently overwriting the data still being used by the slower one (I had this problem after trying to solve it using a duplicate stream).
The following pattern worked very well for me. It uses a library based on Stream2 streams, Streamz, and Promises to synchronize async streams via a callback. Using the familiar example from the first answer:
const spawn = require('child_process').spawn;
const pass = require('stream').PassThrough;
const streamz = require('streamz').PassThrough;
const Promise = require('bluebird');

const a = spawn('echo', ['hi user']);
const b = new pass();
const c = new pass();

a.stdout.pipe(streamz(combineStreamOperations));

function combineStreamOperations(data, next) {
  Promise.join(b, c, function(b, c) { // perform n operations on the same data
    next(); // request more
  });
}

let count = 0;
b.on('data', function(chunk) { count += chunk.length; });
b.on('end', function() { console.log(count); c.pipe(process.stdout); });
You can use this small npm package I created:
readable-stream-clone
With this you can reuse readable streams as many times as you need.
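If I recall the package correctly, usage looks something like the sketch below; the export name and constructor-style API are my assumption, so double-check the package README:

const fs = require('fs');
const ReadableStreamClone = require('readable-stream-clone'); // assumed export shape

const source = fs.createReadStream('input.txt');

// Each clone gets its own copy of everything the source emits.
const copy1 = new ReadableStreamClone(source);
const copy2 = new ReadableStreamClone(source);

copy1.pipe(fs.createWriteStream('out1.txt'));
copy2.pipe(fs.createWriteStream('out2.txt'));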
For the general problem, the following code works fine:
var PassThrough = require('stream').PassThrough;

var a = new PassThrough();
var b1 = new PassThrough();
var b2 = new PassThrough();

a.pipe(b1);
a.pipe(b2);

b1.on('data', function(data) {
  console.log('b1:', data.toString());
});
b2.on('data', function(data) {
  console.log('b2:', data.toString());
});

a.write('text');
I have a different solution for writing to two streams simultaneously. Naturally, the time to write will be the sum of the two individual times, but I use it to respond to a download request where I also want to keep a copy of the downloaded file on my server (actually I use an S3 backup, so I cache the most-used files locally to avoid multiple transfers).
/**
 * A utility class made to write to a file while answering a file download request
 */
class TwoOutputStreams {
  constructor(streamOne, streamTwo) {
    this.streamOne = streamOne
    this.streamTwo = streamTwo
  }

  setHeader(header, value) {
    if (this.streamOne.setHeader)
      this.streamOne.setHeader(header, value)
    if (this.streamTwo.setHeader)
      this.streamTwo.setHeader(header, value)
  }

  write(chunk) {
    this.streamOne.write(chunk)
    this.streamTwo.write(chunk)
  }

  end() {
    this.streamOne.end()
    this.streamTwo.end()
  }
}
You can then use this as a regular output stream:
const twoStreamsOut = new TwoOutputStreams(fileOut, responseStream)
and pass it to your method as if it were a response or a fileOutputStream.
If you have async operations on the PassThrough streams, the answers posted here won't work.
A solution that works for async operations includes buffering the stream content and then creating streams from the buffered result.
To buffer the result you can use concat-stream
const Promise = require('bluebird');
const concat = require('concat-stream');

const getBuffer = function(stream) {
  return new Promise(function(resolve, reject) {
    var gotBuffer = function(buffer) {
      resolve(buffer);
    }
    var concatStream = concat(gotBuffer);
    stream.on('error', reject);
    stream.pipe(concatStream);
  });
}
To create streams from the buffer you can use:
const { Readable } = require('stream');

const getBufferStream = function(buffer) {
  const stream = new Readable();
  stream.push(buffer);
  stream.push(null);
  return Promise.resolve(stream);
}
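Putting the two helpers together, the overall pattern is to buffer the source once and then mint a fresh stream for each async consumer (the file names here are placeholders):

const fs = require('fs');

getBuffer(fs.createReadStream('input.txt'))
  .then(buffer => Promise.all([
    // Each consumer gets its own independent stream built from the same buffer.
    getBufferStream(buffer).then(s => s.pipe(process.stdout)),
    getBufferStream(buffer).then(s => s.pipe(fs.createWriteStream('copy.txt')))
  ]));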
What about piping into two or more streams not at the same time?
For example:
var PassThrough = require('stream').PassThrough;

var mybinaryStream = stream.start(); // never-ending audio stream
var file1 = fs.createWriteStream('file1.wav', {encoding: 'binary'});
var file2 = fs.createWriteStream('file2.wav', {encoding: 'binary'});
var mypass = new PassThrough();

mybinaryStream.pipe(mypass);
mypass.pipe(file1);

setTimeout(function() {
  mypass.pipe(file2);
}, 2000);
The above code does not produce any errors, but file2 ends up empty.

node, How to handle/catch pipe

I've seen and read a few tutorials that state you can pipe one stream to another almost like lego blocks, but I can't find anything on how to catch a pipe command when a stream is piped to your object.
What I mean is how do I create an object with functions so I can do:
uploadWrapper = function (client, file, callback) {
  upload = function (client, file, callback) {
    var file = file
    // this.data = 'undefined'
    stream.Writable.call(this);
    this.end = function () {
      if (typeof this.data !== 'undefined') file.data = this.data
      callback(file.data, 200)
    }
    // var path = urlB.host('upload').object('files', file.id).action('content').url
    // // client.upload(path, file, callback)
  }
  util.inherits(upload, stream.Writable)
  upload.prototype._write = function (chunk, encoding, callback) {
    this.data = this.data + chunk.toString('utf8')
    callback()
  }
  return new upload(client, file, callback)
}
exports.upload = uploadWrapper
How do I handle when data is piped to my object?
I've looked but I can't really find anything about this (maybe I haven't looked in the right places?).
Can any one point me in the right direction?
If it helps, all I want to be able to do is catch a data stream and build a binary-encoded string from it, whether it comes from a file stream or a request stream from a server (i.e. the data from a file in a multipart request).
EDIT: I've updated the code to log the data
EDIT: I've fixed it, I can now receive piped data, I had to put the code in a wrapper that returned the function that implemented stream.
EDIT: different problem now, this.data in _read isn't storing in a way that this.data in the upload function can read.
EDIT: OK, now I can deal with the callback and catch the data, I need to work out how to tell if data is being piped to it or if it's being used as a normal function.
If you want to create your own stream that can be piped to and/or from, look at the node docs for implementing streams.
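For the use case described in the question (collecting piped data into a single binary-encoded string), a minimal sketch of such a Writable could look like this; the Collector name and callback shape are just illustrative:

const { Writable } = require('stream');

class Collector extends Writable {
  constructor(callback) {
    super();
    this.data = '';
    this.callback = callback; // called with the full string once the source ends
  }
  _write(chunk, encoding, done) {
    this.data += chunk.toString('binary');
    done();
  }
  _final(done) {
    this.callback(this.data);
    done();
  }
}

// Usage: anything piped in ends up in one string.
// fs.createReadStream('somefile').pipe(new Collector(data => console.log(data.length)));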
