Piping large files takes a lot of memory - node.js

I have a program which reads a large file from a stream and writes it to a file.
Here's the code that reads the file from the stream:
var stream = new Stream.PassThrough();
Request(options, function (error, res, body) {
    if (error) {
        Logger.error('ProxyManager', 'getStream', error);
        return callback(error);
    }
}).pipe(stream); // first pipe call
Here's the code that writes the file:
var outputFile = fs.createWriteStream(filePath);
stream.pipe(outputFile); // second pipe call
My problem is that the process uses a lot of memory when transferring a large file, which makes it look like pipe is keeping the whole file in memory, either in the first pipe call or in the second. Can anyone help with this?
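For what it's worth, the request library buffers the entire response body in memory whenever a body callback is passed, so one variation worth testing is to drop the callback (handling errors via events) and pipe the response straight into the file without the intermediate PassThrough. A minimal sketch, reusing the Request, options, filePath, Logger and callback names from the snippets above:
// Sketch only: stream the response straight to disk, no body callback and no PassThrough.
var fs = require('fs');
var Request = require('request');
function download(options, filePath, callback) {
    var outputFile = fs.createWriteStream(filePath);
    Request(options)
        .on('error', function (error) {
            Logger.error('ProxyManager', 'getStream', error); // same logging as above
            callback(error);
        })
        .pipe(outputFile)
        .on('finish', function () {
            callback(null);
        });
}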

Related

Read remote file into Duplex NodeJS stream, then write the stream data into form-data upload

I am reading from a remote file on an SFTP server. The method to read the file takes a writeable stream, and writes the file data onto that stream before returning.
Once complete, I am passing this stream to form.append used by the form-data library, in order to upload that file data to an API.
The stream is declared like this:
const stream = new duplex({
    write(chunk, encoding, callback) {
        console.log('wrote chunk')
        callback();
    },
    read() {
        console.log(`Read method called`)
    }
})
When the write method is called during the SFTP access, 'wrote chunk' is logged multiple times. When the file upload using form-data runs, 'Read method called' is logged once. After that, the HTTP request never completes.
I suspect my implementation of the read method on the Duplex stream is wrong, and the full file data is never read properly from the stream.
Is there a way I can validate what my problem is here, or am I fundamentally misunderstanding how streams operate?
I have tried adding the following handlers to the Duplex stream before it is passed to the form-data function, but none of them are ever called.
stream.on('data', () => {
    console.log(`Read bytes of data.`);
});
stream.on('end', () => {
    console.log('There will be no more data.');
});
stream.on('error', () => {
    console.log('error');
});
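For comparison, here is a minimal sketch (not taken from the question) of a Duplex whose write side actually hands chunks to the readable side. With the original no-op read() and no push() calls, the readable half never produces any data, which would explain why the 'data'/'end' handlers never fire; a stock PassThrough stream does the same forwarding out of the box.
const { Duplex } = require('stream');
// Sketch only: forward written chunks to the readable side so form-data can consume them.
const stream = new Duplex({
    write(chunk, encoding, callback) {
        this.push(chunk);   // make the chunk available to readers
        callback();
    },
    final(callback) {
        this.push(null);    // signal end-of-data to the readable side
        callback();
    },
    read() {
        // No-op: data is pushed from write()/final() above.
    }
});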

Node.js Streams: Is there a way to convert or wrap a fs write stream to a Transform stream?

With a node http server I'm trying to pipe the request read stream to the response write stream with some intermediary transforms, one of which is a file system write.
The pipeline looks like this, with non-pertinent code removed for simplicity:
function handler (req, res) {
    req.pipe(jsonParse())
        .pipe(addTimeStamp())
        .pipe(jsonStringify())
        .pipe(saveToFs('saved.json'))
        .pipe(res);
}
The custom Transform streams are pretty straightforward, but I have no elegant way of writing saveToFs. It looks like this:
function saveToFs (filename) {
    const write$ = fs.createWriteStream(filename);
    write$.on('open', () => console.log('opened'));
    write$.on('close', () => console.log('closed'));
    const T = new Transform();
    T._transform = function (chunk, encoding, cb) {
        write$.write(chunk);
        cb(null, chunk);
    }
    return T;
}
The idea is simply to pipe the data to the write stream and then through to the response stream, but fs.createWriteStream(<file.name>) is only a writable stream, which makes this approach awkward.
Right now this code has two problems that I can see: the write stream never fires a close event (memory leak?), and I would like the data to pass through the file system write before returning data to the response stream instead of essentially multicasting to two sinks.
Any suggestions, or pointers to fundamental things I've missed, would be greatly appreciated.
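For reference, here is a sketch of one way to keep the Transform wrapper while addressing both issues (this is not the approach taken in the answer that follows): respect the file stream's backpressure before passing each chunk on, and end the file stream in flush so its 'finish' and 'close' events fire.
const fs = require('fs');
const { Transform } = require('stream');
// Sketch only: a saveToFs variant that waits for the file stream's backpressure
// and closes the file when the transform is flushed.
function saveToFs(filename) {
    const write$ = fs.createWriteStream(filename);
    return new Transform({
        transform(chunk, encoding, cb) {
            if (write$.write(chunk)) {
                cb(null, chunk);
            } else {
                write$.once('drain', () => cb(null, chunk));
            }
        },
        flush(cb) {
            write$.end(cb); // cb runs once the file has been fully written
        }
    });
}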
What you can do is save the stream returned by the last .pipe() before saveToFs, and then pipe that stream both to a file and to res.
function handler(req, res) {
    const transformed = req.pipe(jsonParse())
        .pipe(addTimeStamp())
        .pipe(jsonStringify());
    transformed.pipe(fs.createWriteStream('saved.json'));
    transformed.pipe(res);
}
To sum it up, you can pipe the same readable stream (transformed) to multiple writable streams.
As for "I would like the data to pass through the file system write before returning data to the response stream instead of essentially multicasting to two sinks": use the { end: false } option when piping to res.
transformed.pipe(res, { end: false });
And then call res.end() when the file is written or whenever you want.
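Putting the two pieces together, a sketch of the full handler under that approach (assuming the same jsonParse/addTimeStamp/jsonStringify transforms) could look like this:
function handler(req, res) {
    const transformed = req.pipe(jsonParse())
        .pipe(addTimeStamp())
        .pipe(jsonStringify());
    const fileStream = fs.createWriteStream('saved.json');
    transformed.pipe(fileStream);
    transformed.pipe(res, { end: false });
    // End the response only once the file has been fully written.
    fileStream.on('finish', () => res.end());
    fileStream.on('error', (err) => res.destroy(err));
}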

node, How to handle/catch pipe

I've seen and read a few tutorials that state you can pipe one stream to another almost like lego blocks, but I can't find anything on how to catch a pipe command when a stream is piped to your object.
What I mean is how do I create an object with functions so I can do:
uploadWrapper = function (client, file, callback) {
    upload = function (client, file, callback) {
        var file = file
        // this.data = 'undefined'
        stream.Writable.call(this);
        this.end = function () {
            if (typeof this.data !== 'undefined') file.data = this.data
            callback(file.data, 200)
        }
        // var path = urlB.host('upload').object('files',file.id).action('content').url
        // // client.upload(path,file,callback)
    }
    util.inherits(upload, stream.Writable)
    upload.prototype._write = function (chunk, encoding, callback) {
        this.data = this.data + chunk.toString('utf8')
        callback()
    }
    return new upload(client, file, callback)
}
exports.upload = uploadWrapper
How do I handle when data is piped to my object?
I've looked but I can't really find anything about this (maybe I haven't looked in the right places?).
Can anyone point me in the right direction?
If it helps to know, all I want to be able to do is catch a data stream and build a string containing the data with binary encoding, whether it comes from a file stream or a request stream from a server (i.e. the data from a file in a multipart request).
EDIT: I've updated the code to log the data
EDIT: I've fixed it, I can now receive piped data, I had to put the code in a wrapper that returned the function that implemented stream.
EDIT: different problem now, this.data in _read isn't storing in a way that this.data in the upload function can read.
EDIT: OK, now I can deal with the callback and catch the data, I need to work out how to tell if data is being piped to it or if it's being used as a normal function.
If you want to create your own stream that can be piped to and/or from, look at the node docs for implementing streams.
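As a hedged sketch of what those docs describe, a Writable subclass that collects everything piped into it and hands the result to a callback once the source ends might look like this (createUpload is just an illustrative name, not part of any library):
const { Writable } = require('stream');
// Sketch only: collect piped chunks and deliver them to a callback when the source ends.
function createUpload(callback) {
    const chunks = [];
    return new Writable({
        write(chunk, encoding, done) {
            chunks.push(chunk);   // chunk is a Buffer, so binary data is preserved
            done();
        },
        final(done) {
            callback(Buffer.concat(chunks).toString('binary'), 200);
            done();
        }
    });
}
// Usage: someReadable.pipe(createUpload(function (data, status) { /* ... */ }));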

Node.js - writing data to the writable stream

In my node application I'm writing data to a file using the write method on a stream created with createWriteStream. Now I need to find out whether the write for a particular stream has completed or not. How can I find that?
var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var result = stream.write(data);
writeToStream();

function writeToStream() {
    var result = stream.write(data + '\n');
    if (!result) {
        // Pass the function reference here; writeToStream() would invoke it immediately.
        stream.once('drain', writeToStream);
    }
}
I need to call another method every time a write completes. How can I do this?
From the node.js WritableStream.write(...) documentation, you can give the "write" method a callback that is called when the written data is flushed:
var stream = fs.createWriteStream('myFile.txt', {flags: 'a'});
var data = "Hello, World!\n";
stream.write(data, function() {
    // Now the data has been written.
});
Note that you probably don't need to actually wait for each call to "write" to complete before queueing the next call. Even if the "write" method returns false you can still call subsequent writes and node will buffer the pending write requests into memory.
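If you do want to respect backpressure instead of letting node buffer everything, a minimal sketch of the usual pattern looks like this (writeAll is just an illustrative helper name):
// Sketch only: write items one at a time, pausing on 'drain' when write() returns false.
function writeAll(stream, items, done) {
    function next() {
        while (items.length > 0) {
            var ok = stream.write(items.shift() + '\n');
            if (!ok) {
                stream.once('drain', next); // pass the function reference, do not invoke it
                return;
            }
        }
        done();
    }
    next();
}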
I am using maerics's answer along with error handling. The flag 'a' is used to open the file for appending; the file is created if it does not exist. There are other flags you can use.
// Create a writable stream in append mode and write the data to the stream
var writerStream = fs.createWriteStream('MockData/output.txt', {flags: 'a'})
    .on('finish', function() {
        console.log("Write Finish.");
    })
    .on('error', function(err) {
        console.log(err.stack);
    });

writerStream.write(outPutData, function() {
    // Now the data has been written.
    console.log("Write completed.");
});

// Mark the end of file
writerStream.end();

Asynchronous file appends

In trying to learn node.js/socket.io I have been messing around with creating a simple file uploader that takes data chunks from a client browser and reassembles them on the server side.
The socket.io event for receiving a chunk looks as follows:
socket.on('sendChunk', function (data) {
    fs.appendFile(path + fileName, data.data, function (err) {
        if (err)
            throw err;
        console.log(data.sequence + ' - The data was appended to file ' + fileName);
    });
});
The issue is that data chunks aren't necessarily appended in the order they were received due to the async calls. Typical console output looks something like this:
1 - The data was appended to file testfile.txt
3 - The data was appended to file testfile.txt
4 - The data was appended to file testfile.txt
2 - The data was appended to file testfile.txt
My question is: what is the proper way to implement this functionality in a non-blocking way while still enforcing sequence? I've looked at libraries like async, but I really want to be able to process each chunk as it comes in rather than building a series and running it once all the file chunks are in. I am still trying to wrap my mind around all this event-driven flow, so any pointers are great.
Generally you would use a queue for the data waiting to be written, then whenever the previous append finishes, you try to write the next piece. Something like this:
var parts = [];
var inProgress = false;

function appendPart(data) {
    parts.push(data);
    writeNextPart();
}

function writeNextPart() {
    if (inProgress || parts.length === 0) return;
    var data = parts.shift();
    inProgress = true;
    fs.appendFile(path + fileName, data.data, function (err) {
        inProgress = false;
        if (err) throw err;
        console.log(data.sequence + ' - The data was appended to file ' + fileName);
        writeNextPart();
    });
}

socket.on('sendChunk', function (data) {
    appendPart(data);
});
You will need to expand this to keep a separate parts queue and inProgress flag per fileName; my example assumes those are constant for simplicity.
Since you need the appends to be in order, you could use fs.appendFileSync instead of fs.appendFile. This is the quickest way to handle it, but it hurts performance.
If you want to handle it asynchronously yourself, use streams, which deal with this problem via EventEmitter. It turns out that the response (as well as the request) objects are streams. Create a writable stream with fs.createWriteStream and write all the pieces to it to append to the file.
fs.createWriteStream(path, [options])
Returns a new WriteStream object (see Writable Stream).
options is an object with the following defaults:
{ flags: 'w',
  encoding: null,
  mode: 0666 }
In your case you would use flags: 'a'.
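A minimal sketch of that suggestion, reusing path, fileName and the socket from the question (the 'uploadComplete' event name is hypothetical, standing in for whatever signals the end of the upload in your protocol):
var fs = require('fs');
// Sketch only: one append-mode stream per file; the stream serializes writes in arrival order.
var fileStream = fs.createWriteStream(path + fileName, { flags: 'a' });

socket.on('sendChunk', function (data) {
    fileStream.write(data.data);
    console.log(data.sequence + ' - queued for append to ' + fileName);
});

socket.on('uploadComplete', function () { // hypothetical event marking the end of the upload
    fileStream.end();
});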
