How to use Transform stream to validate data in Node.js? - node.js

I want to pipe data from my readable stream to a writable stream but validate in between.
In my case:
Readable Stream: http response as a stream (Axios.post response as a stream to be more specific)
Writable Stream: AWS S3
The Axios.post response comes in XML format, so the readable stream reads chunks that represent XML. I convert each chunk to a string and check whether the opening <specificTag> and the closing </specificTag> are present. The two tags may turn up in different, arbitrary chunks.
If both opening/closing tags are OK then I have to transfer the chunk to Writable stream.
I am coding like:
let openTagFound: boolean = false;
let closingTagFound: boolean = false;
readableStream.pipe(this.validateStreamData()).pipe(writableStream);
I have also defined the _transform method for validateStreamData() like:
private validateStreamData(): Transform {
  let data = '', transformStream = new Transform();
  let openTagFound: boolean = false;
  let closingTagFound: boolean = false;
  try {
    transformStream._transform = function (chunk, _encoding, done) {
      // Keep chunk in memory
      data += chunk.toString();
      if (!openTagFound) {
        // Check whether openTag e.g <specificTag> is found, if yes
        openTagFound = true;
      }
      if (!closingTagFound) {
        // parse the chunk using parser
        // Check whether closingTag e.g </specificTag> is found, if yes
        closingTagFound = true;
      }
      // we are not writing anything out at this
      // time, only at end during _flush
      // so we don't need to call push
      done();
    };
    transformStream._flush = function (done) {
      if (openTagFound && closingTagFound) {
        this.push(data);
      }
      done();
    };
    return transformStream;
  } catch (ex) {
    this.logger.error(ex);
    transformStream.end();
    throw Error(ex);
  }
}
Now, you can see that I am using a variable data at:
// Keep chunk in memory
data += chunk.toString();
I want to get rid of this. I do not want to utilize memory explicitly. The final goal is to get data from Axios.post and transfer it to AWS S3, only if my validation succeeds. If not, then it should not write to S3.
Any help is much appreciated.
Thanks in Advance!!!

What I finally did was let the pipe run to completion while keeping flags that record whether the data is valid. Then, in the on('end') callback, if the flags say the data is invalid, I explicitly destroy the destination object.
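A minimal sketch of that flag-based approach, assuming a recent Node.js version. The transform below forwards every chunk as it arrives and keeps only a short tail of the previous chunk, so a tag split across two chunks is still detected. `createTagValidator` is a hypothetical helper name, and the in-memory source and sink merely stand in for the Axios response and the S3 upload. When validation fails, `flush` reports an error and `pipeline()` destroys the destination; whether destroying the destination really aborts an S3 upload depends on how that stream was created, so treat that part as an assumption to verify.

```javascript
const { Transform, Readable, Writable, pipeline } = require('stream');

// Hypothetical helper: passes chunks through unchanged while checking that
// both the opening and the closing tag appear somewhere in the stream.
function createTagValidator(openTag, closeTag) {
  let openFound = false;
  let closeFound = false;
  let tail = ''; // last few characters of the previous chunk
  const keep = Math.max(openTag.length, closeTag.length) - 1;

  return new Transform({
    transform(chunk, _encoding, done) {
      // latin1 maps one byte to one character, so ASCII tags are matched
      // byte-wise even if a multi-byte character is split between chunks.
      const text = tail + chunk.toString('latin1');
      if (!openFound && text.includes(openTag)) openFound = true;
      if (!closeFound && text.includes(closeTag)) closeFound = true;
      tail = keep > 0 ? text.slice(-keep) : '';
      done(null, chunk); // forward the original bytes; nothing accumulates
    },
    flush(done) {
      if (openFound && closeFound) return done();
      done(new Error('validation failed: required tags not found'));
    },
  });
}

// Demo with in-memory streams standing in for the HTTP response and S3.
const sink = new Writable({
  write(chunk, _encoding, callback) {
    callback(); // a real sink would upload the chunk here
  },
});
pipeline(
  Readable.from(['<specific', 'Tag>payload</specificTag>']),
  createTagValidator('<specificTag>', '</specificTag>'),
  sink,
  (err) => console.log(err ? `rejected: ${err.message}` : 'validated')
);
```

Memory use stays bounded by the chunk size plus the short tail, at the cost of only learning the verdict after the data has already been forwarded.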

Related

How do I stream a chunked file using Node.js Readable?

I have a 400Mb file split into chunks that are ~1Mb each.
Each chunk is a MongoDB document:
{
name: 'stuff.zip',
index: 15,
buffer: Binary('......'),
totalChunks: 400
}
I am fetching each chunk from my database and then streaming it to the client.
Every time I get a chunk from the DB, I push it to the readableStream, which is piped to the client.
Here is the code:
import { Readable } from 'stream'
const name = 'stuff.zip'
const contentType = 'application/zip'
// the handler must be async because it uses await
app.get('/api/download-stuff', async (req, res) => {
  res.set('Content-Type', contentType)
  res.set('Content-Disposition', `attachment; filename=${name}`)
  res.attachment(name)
  // get `totalChunks` from random chunk
  let { totalChunks } = await ChunkModel.findOne({ name }).select('totalChunks')
  let index = 0
  const readableStream = new Readable({
    async read() {
      if (index < totalChunks) {
        let { buffer } = await ChunkModel.findOne({ name, index }).select('buffer')
        let canContinue = readableStream.push(buffer)
        console.log(`pushed chunk ${index}/${totalChunks}`)
        index++
        // sometimes it logs false
        // which means I should be waiting before pushing more
        // but I don't know how
        console.log('canContinue = ', canContinue)
      } else {
        readableStream.push(null)
        readableStream.destroy()
        console.log(`all ${totalChunks} chunks streamed to the client`)
      }
    }
  })
  readableStream.pipe(res)
})
The code works.
But I'm wondering whether I risk running out of memory on my server, especially when there are many requests for the same file or many chunks.
Question: My code is not waiting for readableStream to finish reading the chunk that was just pushed to it before pushing the next one. I thought it was, which is why I'm using read(){..} in this probably wrong way. So how should I wait for each chunk to be pushed, read, streamed to the client and cleared from my server's memory before I push the next one in?
I have created this sandbox in case it helps anyone
In general, when the readable interface is implemented correctly (i.e., the backpressure signal is respected), the readable interface will prevent the code from overflowing the memory regardless of source size.
When implemented according to the API spec, the readable itself does not keep references to data that has finished passing through the stream. The memory requirement of a readable's buffer is adjusted by specifying a highWaterMark.
In this case, the snippet does not conform to the readable interface. It violates the following two concepts:
No data shall be pushed to the readable's buffer unless read() has been called. Currently, this implementation proceeds to push data from DB immediately. Consequently, the readable buffer will start to fill before the sink has begun to consume data.
The readable's push() method returns a boolean flag. When the flag is false, the implementation must wait for read() to be called again before pushing additional data. If the flag is ignored, the buffer will grow beyond the highWaterMark.
Note that ignoring these core criteria of Readables circumvents the backpressure logic.
An alternative implementation, if this is a Mongoose query:
app.get('/api/download-stuff', async (req, res) => {
  // ... truncated handler
  // A helper variable to relay data from the stream to the response body
  const passThrough = new stream.PassThrough({objectMode: false});
  // Pipe data using pipeline() to simplify handling stream errors
  stream.pipeline(
    // Create a cursor that fetches all relevant documents using a single query
    ChunkModel.find().limit(chunksLength).select("buffer").sort({index: 1}).lean().cursor(),
    // Cherry pick the `buffer` property
    new stream.Transform({
      objectMode: true,
      transform: ({ buffer }, encoding, next) => {
        next(null, buffer);
      }
    }),
    // Write the retrieved documents to the helper variable
    passThrough,
    error => {
      if (error) {
        // Log and handle error. At this point the HTTP headers are probably already sent,
        // and it is therefore too late to return HTTP500
      }
    }
  );
  // Express has no streaming `res.body` property (that is the Koa convention),
  // so with Express pipe the helper stream into the response instead
  passThrough.pipe(res);
});
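If a database cursor is not an option, the two rules above can also be followed by hand. The following is only a sketch under assumptions: `createChunkStream` and `fetchChunk` are hypothetical names, and `fetchChunk(index)` stands in for a lookup such as the `ChunkModel.findOne(...)` call in the question. It relies on the documented behavior that Node.js does not call `read()` again until `push()` has been called, so at most one fetch is in flight and nothing is pushed unless the stream asked for it.

```javascript
const { Readable } = require('stream');

// Hypothetical factory: `fetchChunk(index)` must return a promise that
// resolves with the Buffer for that chunk.
function createChunkStream(fetchChunk, totalChunks) {
  let index = 0;
  return new Readable({
    // read() is called only when the consumer wants more data, and it is
    // not called again until push() has been called.
    read() {
      if (index >= totalChunks) {
        this.push(null); // signal end of stream; no destroy() is needed
        return;
      }
      fetchChunk(index++)
        .then((buffer) => this.push(buffer))
        .catch((err) => this.destroy(err));
    },
  });
}

// Demo: three in-memory chunks stand in for the MongoDB documents.
const demoChunks = [Buffer.from('a'), Buffer.from('b'), Buffer.from('c')];
createChunkStream(async (i) => demoChunks[i], demoChunks.length)
  .pipe(process.stdout);
```

Because each `read()` call pushes exactly one chunk, the return value of `push()` needs no separate handling here: when the buffer is full, Node.js simply stops calling `read()` until the consumer catches up.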

JSON doesn't save properly using Node.js

I am connected to a websocket, and each time I get a message I save its content to a JSON file. If I get two or more messages in the same second, the file is not saved properly. How can I prevent that? Each time I get a message I am using:
fs.readFile(bought_path,'utf-8',(err,data) =>{ ...
//do something
to read the JSON file, and
fs.writeFile(bought_path, JSON.stringify(kupljeni_itemi) , 'utf-8');
to save the edited JSON file.
One way to guard against this is a simple locking mechanism:
let isLocked = false; // declare it in an upper scope.
if (!isLocked) { // check if it is not locked by other socket call.
  isLocked = true; // set the lock before writing the content
  fs.writeFile(file, json, (err) => {
    isLocked = false; // unlock when you get the response
  })
}
// note: a write attempted while the lock is held is skipped, not retried
You could also use the synchronous read/write functions, readFileSync and writeFileSync.
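Another option, sketched here under assumptions, is to queue the updates so that each read-modify-write cycle starts only after the previous one has finished. Unlike the flag-based lock, no message is dropped, and unlike the synchronous calls, the event loop is not blocked. `updateJsonFile` is a hypothetical helper name, and treating a missing file as an empty list is an assumption about the data.

```javascript
const fs = require('fs').promises;

let queue = Promise.resolve(); // tail of the chain of pending updates

// Hypothetical helper: runs read-modify-write cycles strictly one at a time.
// `mutate` receives the parsed JSON and returns the value to save.
function updateJsonFile(filePath, mutate) {
  const run = async () => {
    let data;
    try {
      data = JSON.parse(await fs.readFile(filePath, 'utf-8'));
    } catch (err) {
      if (err.code !== 'ENOENT') throw err;
      data = []; // assumption: a missing file means an empty list
    }
    const updated = mutate(data);
    await fs.writeFile(filePath, JSON.stringify(updated), 'utf-8');
    return updated;
  };
  const result = queue.then(run);
  queue = result.catch(() => {}); // a failed update must not block later ones
  return result;
}
```

In the websocket handler this could be called as `updateJsonFile(bought_path, (items) => [...items, newItem])`, where `newItem` is whatever the message contains.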

How to correctly calculate the number of bytes of a node.js stream that have been processed?

I have a stream that I'm sending over the wire, and it takes a while to send fully, so I want to display its progress on the fly. I know you can listen for the 'data' event on streams, but in newer versions of Node that also puts the stream into "flowing mode". I want to make sure I'm doing this correctly.
Currently I have the following stuff:
deploymentPackageStream.pause() // to prevent it from entering "flowing mode"
var bytesSent = 0
deploymentPackageStream.on('data', function(data) {
  bytesSent += data.length
  process.stdout.write('\r ')
  process.stdout.write('\r' + (bytesSent/1000) + 'kb sent')
})
deploymentPackageStream.resume()
// copy over the deployment package
execute(conn, 'cat > deploymentPackage.sh', deploymentPackageStream).wait()
This gives me the right bytesSent output, but the resulting package seems to be missing some data off the front. If I put the 'resume' line after executing the copy line (the last line), it doesn't copy anything. If I don't resume, it also doesn't copy anything. What's going on and how do I do this properly without disrupting the stream and without entering flowing mode (I want back pressure)?
I should mention that I'm still using Node v0.10.x.
Alright, I made something that essentially is a passthrough, but calls a callback with data as it comes in:
var Readable = require('stream').Readable
var util = require('util')
// creates a stream that can view all the data in a stream and passes the data through
// parameters:
// stream - the stream to peek at
// callback - called when there's data sent from the passed stream
var StreamPeeker = exports.StreamPeeker = function(stream, callback) {
  Readable.call(this)
  this.stream = stream
  stream.on('readable', function() {
    var data = stream.read()
    if (data !== null) {
      // pause the source when our own buffer is full
      if (!this.push(data)) stream.pause()
      callback(data)
    }
  }.bind(this))
  stream.on('end', function() {
    this.push(null)
  }.bind(this))
}
util.inherits(StreamPeeker, Readable)
// called when the consumer wants more data, so let the source flow again
StreamPeeker.prototype._read = function() {
  this.stream.resume()
}
If I understand streams properly, this should appropriately handle backpressure.
Using this, I can just count up data.length in the callback like this:
var peeker = new StreamPeeker(stream, function(data) {
// use data.length
})
peeker.pipe(destination)
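On current Node.js versions the same idea can be sketched with a Transform stream, which handles backpressure on both sides by itself. This is only an illustration: `createByteCounter` is a hypothetical name, and the constructor-option style used here assumes a Node.js release much newer than v0.10.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: a pass-through stream that reports a running byte
// total to `onProgress` each time a chunk goes by.
function createByteCounter(onProgress) {
  let bytes = 0;
  return new Transform({
    transform(chunk, _encoding, done) {
      bytes += chunk.length; // chunk is a Buffer, so length is in bytes
      onProgress(bytes);
      done(null, chunk); // forward the chunk unchanged
    },
  });
}
```

It would sit in the middle of the pipe, for example `source.pipe(createByteCounter(function (n) { /* show n */ })).pipe(destination)`, so the source never has to be paused or resumed manually.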

node, How to handle/catch pipe

I've seen and read a few tutorials that state you can pipe one stream to another almost like lego blocks, but I can't find anything on how to catch a pipe command when a stream is piped to your object.
What I mean is how do I create an object with functions so I can do:
uploadWrapper = function (client, file, callback) {
  upload = function (client, file, callback) {
    var file = file
    // this.data = 'undefined'
    stream.Writable.call(this);
    this.end = function () {
      if (typeof this.data !== 'undefined') file.data = this.data
      callback(file.data, 200)
    }
    // var path = urlB.host('upload').object('files',file.id).action('content').url
    // // client.upload(path,file,callback)
  }
  util.inherits(upload, stream.Writable)
  upload.prototype._write = function (chunk, encoding, callback) {
    this.data = this.data + chunk.toString('utf8')
    callback()
  }
  return new upload(client, file, callback)
}
exports.upload = uploadWrapper
How do I handle when data is piped to my object?
I've looked but I can't really find anything about this (maybe I haven't looked in the right places?).
Can anyone point me in the right direction?
If it helps to know, all I want to be able to do is catch a data stream and build a string containing the data with binary encoding, whether it comes from a file stream or a request stream from a server (i.e. the data from a file of a multipart request).
EDIT: I've updated the code to log the data
EDIT: I've fixed it, I can now receive piped data, I had to put the code in a wrapper that returned the function that implemented stream.
EDIT: different problem now, this.data in _read isn't storing in a way that this.data in the upload function can read.
EDIT: OK, now I can deal with the callback and catch the data, I need to work out how to tell if data is being piped to it or if it's being used as a normal function.
If you want to create your own stream that can be piped to and/or from, look at the node docs for implementing streams.
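As a rough sketch of what those docs describe, a Writable subclass only needs a `_write` method to receive piped data. The class name `StringCollector` and its `text()` method are hypothetical. The sketch also listens for the standard `'pipe'` event, which a writable emits when some readable calls `.pipe()` on it, as one way to tell that data is being piped in. It leaves `end()` alone; completion is signalled by the `'finish'` event instead.

```javascript
const { Writable, Readable } = require('stream');

// Hypothetical class: collects everything written or piped to it.
class StringCollector extends Writable {
  constructor(options) {
    super(options);
    this.chunks = [];
    this.pipedFrom = null;
    // 'pipe' fires when a readable calls .pipe(thisStream)
    this.on('pipe', (source) => {
      this.pipedFrom = source;
    });
  }

  _write(chunk, _encoding, callback) {
    this.chunks.push(chunk); // chunk is a Buffer by default
    callback();
  }

  // Decode once at the end so characters split across chunks survive.
  text(encoding = 'utf8') {
    return Buffer.concat(this.chunks).toString(encoding);
  }
}

// Demo: an in-memory readable stands in for a file or request stream.
const collector = new StringCollector();
collector.on('finish', () => console.log(collector.text()));
Readable.from(['hello ', 'world']).pipe(collector);
```

Calling `collector.text('binary')` would give the binary-encoded string mentioned in the question.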

Node js- writing data to the writable stream

In my Node application I'm writing data to a file using the write method of the stream returned by createWriteStream. Now I need to find out whether a write to that stream has completed. How can I do that?
var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var result = stream.write(data);
writeToStream();
function writeToStream() {
  var result = stream.write(data + '\n');
  if (!result) {
    // pass the function itself; writeToStream() would call it immediately
    stream.once('drain', writeToStream);
  }
}
I need to call another method every time a write completes. How can I do this?
From the node.js WritableStream.write(...) documentation you can give the "write" method a callback that is called when the written data is flushed:
var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var data = "Hello, World!\n";
stream.write(data, function() {
  // Now the data has been written.
});
Note that you probably don't need to actually wait for each call to "write" to complete before queueing the next call. Even if the "write" method returns false you can still call subsequent writes and node will buffer the pending write requests into memory.
I am using maerics's answer along with error handling. The 'a' flag opens the file for appending; the file is created if it does not exist. There are other flags you can use.
// Create a writable stream & Write the data to stream with encoding to be utf8
var writerStream = fs.createWriteStream('MockData/output.txt', {flags: 'a'})
  .on('finish', function() {
    console.log("Write Finish.");
  })
  .on('error', function(err) {
    console.log(err.stack);
  });
writerStream.write(outPutData, function() {
  // Now the data has been written.
  console.log("Write completed.");
});
// Mark the end of file
writerStream.end();
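When many writes are queued and memory matters, the return value of write() can be honored instead of ignored. The following is a sketch under assumptions, not part of either answer: `writeLines` is a hypothetical helper, and it relies on `events.once`, which is available in reasonably recent Node.js versions.

```javascript
const fs = require('fs');
const { once } = require('events');

// Hypothetical helper: writes each line, and whenever write() returns false
// waits for 'drain' before writing more.
async function writeLines(stream, lines) {
  for (const line of lines) {
    if (!stream.write(line + '\n')) {
      await once(stream, 'drain'); // internal buffer is full; wait for it to empty
    }
  }
  stream.end();
  await once(stream, 'finish'); // every write has been flushed
}
```

A call such as `writeLines(fs.createWriteStream('myFile.txt', {flags: 'a'}), ['one', 'two'])` returns a promise that resolves once the stream has finished, which is a natural place to invoke the "other method" from the question.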
