I've seen and read a few tutorials that say you can pipe one stream into another almost like Lego blocks, but I can't find anything on how to handle the data when a stream is piped into your own object.
What I mean is, how do I write an object like the one below so that a stream can be piped into it:
var stream = require('stream');
var util = require('util');

var uploadWrapper = function (client, file, callback) {
  function Upload(client, file, callback) {
    stream.Writable.call(this);
    this.data = ''; // start empty; otherwise the first append yields "undefined..."
    // 'finish' fires once end() has been called and every chunk has been through _write;
    // overriding this.end instead bypasses Writable's own end handling
    this.on('finish', function () {
      file.data = this.data;
      callback(file.data, 200);
    });
    // var path = urlB.host('upload').object('files',file.id).action('content').url
    // // client.upload(path,file,callback)
  }
  util.inherits(Upload, stream.Writable);
  Upload.prototype._write = function (chunk, encoding, done) {
    this.data += chunk.toString('utf8');
    done();
  };
  return new Upload(client, file, callback);
};

exports.upload = uploadWrapper;
How do I handle data that is piped to my object?
I've looked, but I can't really find anything about this (maybe I haven't looked in the right places?).
Can anyone point me in the right direction?
If it helps to know, all I want to be able to do is catch a data stream and build a string containing the data with binary encoding, whether it comes from a file stream or from a request stream on a server (i.e. the data of a file in a multipart request).
EDIT: I've updated the code to log the data
EDIT: I've fixed it; I can now receive piped data. I had to put the code in a wrapper function that returns an instance of the object that implements the stream.
EDIT: Different problem now: this.data in _write isn't being stored in a way that this.data in the upload function can read.
EDIT: OK, now I can handle the callback and catch the data. I need to work out how to tell whether data is being piped to it or whether it's being called as a normal function.
If you want to create your own stream that can be piped to and/or from, look at the Node docs on implementing streams.
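A minimal sketch of such a stream, assuming a recent Node.js with ES class syntax (`Collector` and `onDone` are illustrative names, not from the question). It collects every chunk as a Buffer, joins them when the writer finishes, and records the source from the `'pipe'` event that a Writable emits when a readable calls `.pipe()` on it, which is one way to tell piped use from direct `write()`/`end()` calls.

```javascript
const { Writable } = require('stream');

// Collects everything written or piped into it and hands the result to onDone.
class Collector extends Writable {
  constructor(onDone) {
    super();
    this.chunks = [];
    this.pipedFrom = null; // stays null if you call write()/end() yourself
    // 'pipe' is emitted on the destination when a readable calls .pipe(this)
    this.on('pipe', (source) => {
      this.pipedFrom = source;
    });
    // 'finish' fires after end() and after every chunk has been through _write
    this.on('finish', () => onDone(Buffer.concat(this.chunks)));
  }

  _write(chunk, encoding, callback) {
    // strings are converted to Buffers before reaching _write (decodeStrings defaults to true)
    this.chunks.push(chunk);
    callback();
  }
}
```

Joining Buffers and decoding once at the end avoids splitting multi-byte characters across chunks. For a binary string, `buf.toString('latin1')` would be the conversion (`'binary'` is an alias for `'latin1'` in Node).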
Related
I want to pipe data from my readable stream to a writable stream but validate in between.
In my case:
Readable Stream: http response as a stream (Axios.post response as a stream to be more specific)
Writable Stream: AWS S3
The Axios.post response comes in XML format, so the readable stream reads chunks that represent XML. I convert each chunk to a string and check whether <specificTag> (opening) and </specificTag> (closing) are present. These two checks may succeed in different, arbitrary chunks.
If both opening/closing tags are OK then I have to transfer the chunk to Writable stream.
My code looks like this:
let openTagFound: boolean = false;
let closingTagFound: boolean = false;
readableStream.pipe(this.validateStreamData()).pipe(writableStream);
I have also defined a _transform method for validateStreamData(), like this:
private validateStreamData(): Transform {
let data = '', transformStream = new Transform();
let openTagFound: boolean = false;
let closingTagFound: boolean = false;
try {
transformStream._transform = function (chunk, _encoding, done) {
// Keep chunk in memory
data += chunk.toString();
// Search the accumulated text, so a tag split across two chunks is still found
if (!openTagFound && data.includes('<specificTag>')) {
openTagFound = true;
}
if (!closingTagFound && data.includes('</specificTag>')) {
closingTagFound = true;
}
// we are not writing anything out at this
// time, only at end during _flush
// so we don't need to call push
done();
};
transformStream._flush = function (done) {
if(openTagFound && closingTagFound) {
this.push(data);
}
done();
};
return transformStream;
} catch (ex) {
this.logger.error(ex);
transformStream.end();
throw Error(ex);
}
}
Now, you can see that I am using a variable data at:
// Keep chunk in memory
data += chunk.toString();
I want to get rid of this: I don't want to hold the data in memory explicitly. The final goal is to take the data from Axios.post and transfer it to AWS S3 only if my validation succeeds. If it doesn't, nothing should be written to S3.
Any help is much appreciated.
Thanks in advance!
What I finally did was let the pipe run to the end while keeping flags that record whether the data is valid; then, in the on('end') callback, if the flags say the data is invalid, I explicitly destroy the destination object.
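The same flag idea can be sketched without keeping the whole response in memory: pass each chunk straight through, keep only the last few characters between chunks so a tag split across two chunks is still found, and fail in `flush` if a tag was never seen. This is a sketch, not the answer's code; `tagValidator` is a hypothetical name. Because the data has already reached the destination when the error fires, whether a partial upload is discarded depends on the destination, which is why the answer above destroys it explicitly.

```javascript
const { Transform } = require('stream');

// Passes every chunk through unchanged while checking that <tagName> and
// </tagName> both appear somewhere (presence only, not order or nesting).
function tagValidator(tagName) {
  const open = `<${tagName}>`;
  const close = `</${tagName}>`;
  const keep = close.length - 1; // enough carry-over to catch a split tag
  let tail = '';
  let openFound = false;
  let closeFound = false;
  return new Transform({
    transform(chunk, encoding, done) {
      // latin1 maps one byte to one char, so ASCII tags match even if a
      // multi-byte character is cut at the chunk boundary
      const text = tail + chunk.toString('latin1');
      if (!openFound && text.includes(open)) openFound = true;
      if (openFound && !closeFound && text.includes(close)) closeFound = true;
      tail = text.slice(-keep);
      done(null, chunk);
    },
    flush(done) {
      // an error here makes pipeline() report failure, so the caller can abort the destination
      done(openFound && closeFound ? null : new Error(`missing ${open} or ${close}`));
    },
  });
}
```

Used as `pipeline(responseStream, tagValidator('specificTag'), destination, callback)`, the callback receives the error when validation fails.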
I'm familiar with Node streams, but I'm struggling with best practices for abstracting code that I reuse a lot into a single pipe step.
Here's a stripped down version of what I'm writing today:
inputStream
.pipe(csv.parse({columns:true}))
.pipe(csv.transform(function(row) {return transform(row); }))
.pipe(csv.stringify({header: true}))
.pipe(outputStream);
The actual work happens in transform(). The only things that really change are inputStream, transform(), and outputStream. As I said, this is a stripped-down version of what I actually use. I have a lot of error handling and logging on each pipe step, which is ultimately why I'm trying to abstract the code.
What I'm looking to write is a single pipe step, like so:
inputStream
.pipe(csvFunction(transform))
.pipe(outputStream);
What I'm struggling to understand is how to turn those pipe steps into a single function that accepts a stream and returns a stream. I've looked at libraries like through2, but I'm not sure how that gets me to where I'm trying to go.
You can use the PassThrough class like this:
var csv = require('csv');
var PassThrough = require('stream').PassThrough;
var csvStream = new PassThrough();
csvStream.on('pipe', function (source) {
// undo piping of source
source.unpipe(this);
// build own pipe-line and store internally
this.combinedStream =
source.pipe(csv.parse({columns: true}))
.pipe(csv.transform(function (row) {
return transform(row);
}))
.pipe(csv.stringify({header: true}));
});
csvStream.pipe = function (dest, options) {
// pipe internal combined stream to dest
return this.combinedStream.pipe(dest, options);
};
inputStream
.pipe(csvStream)
.pipe(outputStream);
Here's what I ended up going with. I used the through2 library and the streaming API of the csv library to create the pipe function I was looking for.
var csv = require('csv'),
through = require('through2');
module.exports = function(transformFunc) {
var parser = csv.parse({columns:true, relax_column_count:true}),
transformer = csv.transform(function(row) {
return transformFunc(row);
}),
stringifier = csv.stringify({header: true});
return through(function(chunk,enc,cb){
var stream = this;
parser.on('data', function(data){
transformer.write(data);
});
transformer.on('data', function(data){
stringifier.write(data);
});
stringifier.on('data', function(data){
stream.push(data);
});
parser.write(chunk);
parser.removeAllListeners('data');
transformer.removeAllListeners('data');
stringifier.removeAllListeners('data');
cb();
})
}
Note the part near the end where I remove the event listeners: I added it because I was running into memory errors from having created too many event listeners. I first tried to solve this by listening to the events with once, but that prevented subsequent chunks from being read and passed on to the next pipe step.
Let me know if anyone has feedback or additional ideas.
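On Node.js 16.9 and later, `stream.compose()` (documented as experimental) combines several streams into one Duplex, which is the single pipe step asked about. With the csv package this would presumably be `compose(csv.parse({columns: true}), csv.transform(fn), csv.stringify({header: true}))`, since those calls return Transform streams when given no callback. So that the sketch runs on its own, it uses hypothetical stand-ins (`parseLines`, `mapRows`, `stringifyRows`) that ignore CSV quoting rules.

```javascript
const { compose, Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Stand-in for csv.parse: text in, one array of fields per line out (no quoting rules).
function parseLines() {
  const decoder = new StringDecoder('utf8'); // never splits a multi-byte character
  let tail = ''; // partial line carried over to the next chunk
  return new Transform({
    readableObjectMode: true,
    transform(chunk, encoding, done) {
      const lines = (tail + decoder.write(chunk)).split('\n');
      tail = lines.pop();
      for (const line of lines) if (line) this.push(line.split(','));
      done();
    },
    flush(done) {
      const last = tail + decoder.end();
      if (last) this.push(last.split(','));
      done();
    },
  });
}

// Stand-in for csv.transform: applies the caller's function to every row.
function mapRows(fn) {
  return new Transform({
    objectMode: true,
    transform(row, encoding, done) {
      done(null, fn(row));
    },
  });
}

// Stand-in for csv.stringify: one array of fields in, one CSV line out.
function stringifyRows() {
  return new Transform({
    writableObjectMode: true,
    transform(row, encoding, done) {
      done(null, row.join(',') + '\n');
    },
  });
}

// The single reusable step: the writable side takes CSV text, the readable side emits CSV text.
function csvFunction(fn) {
  return compose(parseLines(), mapRows(fn), stringifyRows());
}
```

Usage would then be `inputStream.pipe(csvFunction(transform)).pipe(outputStream)`. Since compose returns one stream, errors from the inner stages should surface on it, so error handling and logging can be attached in one place.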
How to reset nodejs stream?
How to read stream again in nodejs?
Thanks in advance!
var fs = require('fs');
var lineReader = require('line-reader');
// proxy.txt = only 3 lines
var readStream = fs.createReadStream('proxy.txt');
lineReader.open(readStream, function (err, reader) {
for(var i=0; i<6; i++) {
reader.nextLine(function(err, line) {
if(err) {
readStream.reset(); // ???
} else {
console.log(line);
}
});
}
});
There are two ways to solve your problem. As someone commented before, you could simply wrap all that in a function and, instead of resetting, simply read the file again.
Of course, this won't work well with HTTP requests, for example, so the other way, provided you accept much higher memory usage, is to simply accumulate your data.
What you'd need is to implement some sort of "rewindable stream": essentially a Transform stream that keeps a list of all the buffers and writes them to a new stream when a rewind method is called.
Take a look at the Node stream API docs; the methods should look somewhat like this:
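The first option, reading the file again, can be sketched with Node's built-in readline module instead of line-reader (a swap, not the library the question uses). `readLinesCycling` is a hypothetical helper that opens a fresh stream and starts over from the top each time it runs out of lines:

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: returns `count` lines, starting over from the top of
// the file every time the end is reached (instead of "resetting" one stream).
async function readLinesCycling(path, count) {
  const lines = [];
  while (lines.length < count) {
    const before = lines.length;
    const input = fs.createReadStream(path); // a fresh stream for every pass
    const rl = readline.createInterface({ input, crlfDelay: Infinity });
    for await (const line of rl) {
      lines.push(line);
      if (lines.length === count) break; // leaving the loop closes the interface
    }
    input.destroy(); // release the file handle even if we stopped early
    if (lines.length === before) throw new Error(`${path} has no lines`);
  }
  return lines;
}
```

With a three-line proxy.txt, `readLinesCycling('proxy.txt', 6)` resolves to the three lines twice.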
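const { Transform, PassThrough } = require('stream');

class Rewindable extends Transform {
  constructor() {
    super();
    this.accumulator = [];
  }
  _transform(buf, enc, cb) {
    // keep every chunk; nothing is pushed downstream
    this.accumulator.push(buf);
    cb();
  }
  rewind() {
    // replay everything accumulated so far into a fresh stream, then end it
    const stream = new PassThrough();
    this.accumulator.forEach((chunk) => stream.write(chunk));
    stream.end();
    return stream;
  }
}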
And you would use this like this:
var readStream = fs.createReadStream('proxy.txt');
var rewindableStream = readStream.pipe(new Rewindable());
(...).on("whenever-you-want-to-reset", () => {
var rewound = rewindableStream.rewind();
/// and do whatever you like with your stream.
});
Actually I think I'll add this to my scramjet. :)
Edit
I released the logic described here as the rereadable-stream npm package. Its advantage over the stream shown here is that you can control the buffer length and get rid of data that has already been read.
At the same time, you can keep a window of count items and tail a number of chunks backwards.
Basically I have a file, say, 100mb.qs and I need to pass its entire contents through the following function:
// `in` is a reserved word and `process` shadows Node's global, so both are renamed
function processQs(input){
var out = JSON.stringify(require('querystring').parse(input));
return out;
}
And then replace the file's contents with the result.
I imagine that I'll have to stream it, so...
require('fs').createReadStream('1mb.qs').pipe( /* ??? */ )
What do I do?
You should take a look at clarinet for parsing JSON as a stream.
var createReadStream = require('fs').createReadStream
, createWriteStream = require('fs').createWriteStream
, parseJson = require('clarinet').createStream()
;
parseJson.on('error', function(err){
if (err) throw err
})
// the stream emits events without the "on" prefix used by the parser's callback properties
parseJson.on('value', function(v){
// do stuff with value
})
parseJson.on('openobject', function (key) {
// I bet you got the idea how this works :)
})
createReadStream('100mb.qs')
.pipe(parseJson)
.pipe(createWriteStream('newerFile.qs'))
There are many more events to listen to, so you should definitely take a look.
Also, it will send data downstream whenever a JSON node is ready to be written. It couldn't get better than this.
Hope this helps.
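clarinet parses JSON, but the input in the question is a query string. A sketch of streaming that conversion directly: split the text on `&`, carry an incomplete pair over to the next chunk, decode each pair with `querystring.parse`, and write JSON as it goes. The result goes to a temporary file that is renamed over the original only after it was fully written. `qsToJson` and `convertInPlace` are hypothetical names, and one behavior differs from the question's function: repeated keys become repeated JSON keys, because merging them into arrays as `querystring.parse` does would require holding the whole input.

```javascript
const fs = require('fs');
const { Transform, pipeline } = require('stream');
const { StringDecoder } = require('string_decoder');
const querystring = require('querystring');

// Streams "a=1&b=2..." into '{"a":"1","b":"2",...}' one key=value pair at a time.
// Assumption: repeated keys are written as repeated JSON keys, not merged into arrays.
function qsToJson() {
  const decoder = new StringDecoder('utf8'); // never splits a multi-byte character
  let tail = ''; // incomplete pair carried over from the previous chunk
  let first = true;
  function emitPair(stream, pair) {
    if (!pair) return;
    const parsed = querystring.parse(pair); // decodes exactly one pair
    for (const [key, value] of Object.entries(parsed)) {
      stream.push((first ? '{' : ',') + JSON.stringify(key) + ':' + JSON.stringify(value));
      first = false;
    }
  }
  return new Transform({
    transform(chunk, encoding, done) {
      const pairs = (tail + decoder.write(chunk)).split('&');
      tail = pairs.pop(); // the last piece may continue in the next chunk
      for (const pair of pairs) emitPair(this, pair);
      done();
    },
    flush(done) {
      emitPair(this, tail + decoder.end());
      this.push(first ? '{}' : '}');
      done();
    },
  });
}

// Writes to a temporary file, then renames it over the original once complete.
function convertInPlace(path, callback) {
  const tmp = path + '.tmp';
  pipeline(fs.createReadStream(path), qsToJson(), fs.createWriteStream(tmp), (err) => {
    if (err) return callback(err);
    fs.rename(tmp, path, callback);
  });
}
```

For input without repeated keys, the file ends up with the same text as `JSON.stringify(querystring.parse(input))`, but only one pair is held in memory at a time.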
In Meteor, on the server side, I want to use the .find() function on a Collection and then get a Node ReadStream interface from the cursor that is returned. I've tried using .stream() on the cursor as described in the MongoDB docs. However, I get the error "Object [object Object] has no method 'stream'", so it looks like Meteor collections don't have this option. Is there a way to get a stream from a Meteor Collection's cursor?
I am trying to export some data to CSV, and I want to pipe the data directly from the collection's stream into a CSV parser and then into the response going back to the user. I can get the response stream from the Router package we are using, and everything works except getting a stream from the collection. Fetching the array from find() and pushing it into the stream manually would defeat the purpose of a stream, since it would put everything in memory. I guess my other option is to use forEach on the collection and push the rows into the stream one by one, but this seems dirty when I could pipe the stream directly through the parser with a transform on it.
Here's some sample code of what I am trying to do:
response.writeHead(200,{'content-type':'text/csv'});
// Set up a future
var fut = new Future();
var users = Users.find({}).stream();
CSV().from(users)
.to(response)
.on('end', function(count){
log.verbose('finished csv export');
response.end();
fut.ret();
});
return fut.wait();
Have you tried creating a custom function and piping to it?
This would only work if the cursor returned by Users.find() supported .pipe() (again, only if it inherited from a Node.js stream object).
Something like:
var stream = require('stream');
var util = require('util');

function StreamReader() {
  stream.Writable.call(this);
  this.data = '';
  // 'finish' fires after the source has ended and every chunk has been written
  this.on('finish', function () {
    console.log(this.data); // this.data contains raw data in a string, so do what you need to make it usable, i.e. split on ',' or whatever it is you need
    db.close(); // `db` is your own database handle; it is not defined in this snippet
  });
}
util.inherits(StreamReader, stream.Writable);

StreamReader.prototype._write = function (chunk, encoding, callback) {
  this.data += chunk.toString('utf8');
  callback();
};

Users.find({}).pipe(new StreamReader());
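If your Meteor version exposes `Collection.rawCollection()` (the underlying MongoDB driver collection), the driver cursor's `.stream()` should give an object-mode Readable of documents; treat that as an assumption to check against your Meteor and driver versions. What remains is an object-mode step that turns documents into CSV lines. A sketch, where `docsToCsv` and the `fields` list are hypothetical:

```javascript
const { Transform } = require('stream');

// Object-mode in, CSV text out. `fields` is a hypothetical list of the
// document properties to export, in column order.
function docsToCsv(fields) {
  const escape = (value) => {
    const s = value == null ? '' : String(value);
    // quote fields containing a comma, quote or newline; double inner quotes
    return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  let headerSent = false;
  const header = (stream) => {
    if (!headerSent) {
      stream.push(fields.join(',') + '\n');
      headerSent = true;
    }
  };
  return new Transform({
    writableObjectMode: true,
    transform(doc, encoding, done) {
      header(this);
      done(null, fields.map((f) => escape(doc[f])).join(',') + '\n');
    },
    flush(done) {
      header(this); // still emit a header for an empty result
      done();
    },
  });
}

// Assumed Meteor usage (not tested here):
// require('stream').pipeline(Users.rawCollection().find({}).stream(), docsToCsv(['name', 'email']), response, callback);
```

Rows then flow from the cursor to the response one document at a time, without fetching the whole array first.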