Asynchronous file appends - node.js

In trying to learn node.js/socket.io I have been messing around with creating a simple file uploader that takes data chunks from a client browser and reassembles them on the server side.
The socket.io event for receiving a chunk looks as follows:
socket.on('sendChunk', function (data) {
    fs.appendFile(path + fileName, data.data, function (err) {
        if (err) throw err;
        console.log(data.sequence + ' - The data was appended to file ' + fileName);
    });
});
The issue is that data chunks aren't necessarily appended in the order they were received due to the async calls. Typical console output looks something like this:
1 - The data was appended to file testfile.txt
3 - The data was appended to file testfile.txt
4 - The data was appended to file testfile.txt
2 - The data was appended to file testfile.txt
My question is: what is the proper way to implement this functionality in a non-blocking way while still enforcing sequence? I've looked at libraries like async, but I really want to be able to process each chunk as it comes in rather than building a series and running it once all file chunks are in. I am still trying to wrap my mind around all this event-driven flow, so any pointers are great.

Generally you would use a queue for the data waiting to be written, then whenever the previous append finishes, you try to write the next piece. Something like this:
var parts = [];
var inProgress = false;

function appendPart(data) {
    parts.push(data);
    writeNextPart();
}

function writeNextPart() {
    if (inProgress || parts.length === 0) return;
    var data = parts.shift();
    inProgress = true;
    fs.appendFile(path + fileName, data.data, function (err) {
        inProgress = false;
        if (err) throw err;
        console.log(data.sequence + ' - The data was appended to file ' + fileName);
        writeNextPart();
    });
}

socket.on('sendChunk', function (data) {
    appendPart(data);
});
You will need to expand this to keep a separate queue of parts and an inProgress flag per fileName. My example assumes those are constant for simplicity; a sketch of that extension follows.
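As an illustration only, here is a minimal sketch of that per-file extension. It assumes each incoming chunk carries a fileName property (a hypothetical field; the original handler uses a fixed path + fileName):

// Sketch: one queue and one in-progress flag per file name.
// Assumes data.fileName exists on each chunk (hypothetical field).
var queues = {}; // fileName -> { parts: [], inProgress: false }

function appendPart(fileName, data) {
    var q = queues[fileName] || (queues[fileName] = { parts: [], inProgress: false });
    q.parts.push(data);
    writeNextPart(fileName);
}

function writeNextPart(fileName) {
    var q = queues[fileName];
    if (!q || q.inProgress || q.parts.length === 0) return;
    q.inProgress = true;
    var data = q.parts.shift();
    fs.appendFile(path + fileName, data.data, function (err) {
        q.inProgress = false;
        if (err) throw err;
        writeNextPart(fileName);
    });
}

socket.on('sendChunk', function (data) {
    appendPart(data.fileName, data);
});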

Since you need the appends to happen in order, you could use fs.appendFileSync instead of fs.appendFile. This is the quickest way to get it working, but it blocks the event loop and hurts performance.
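For example, the synchronous variant is a one-line change (a sketch reusing the path and fileName variables from the question):

socket.on('sendChunk', function (data) {
    // blocks until the chunk is on disk, so appends happen strictly in arrival order
    fs.appendFileSync(path + fileName, data.data);
    console.log(data.sequence + ' - The data was appended to file ' + fileName);
});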
If you want to handle it asynchronously yourself, use streams, which deal with this problem on top of EventEmitter. It turns out that the response (as well as the request) objects are streams. Create a writable stream with fs.createWriteStream and write all the pieces to it to append to the file.
fs.createWriteStream(path, [options])
Returns a new WriteStream object (see Writable Stream).
options is an object with the following defaults:
{ flags: 'w',
  encoding: null,
  mode: 0666 }
In your case you would use flags: 'a'
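A minimal sketch of that approach, reusing path and fileName from the question; writes issued on a single writable stream are buffered and flushed in order, so sequence is preserved without blocking:

// open the file once in append mode; successive write() calls are flushed in order
var ws = fs.createWriteStream(path + fileName, { flags: 'a' });

socket.on('sendChunk', function (data) {
    ws.write(data.data);
    console.log(data.sequence + ' - queued for append to ' + fileName);
});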

Related

How can I write buffer data to a file from a readable stream in Node.js?

How can I write buffer data to a file from a readable stream in Node.js? I know there are already npm packages for this; I am asking this question for learning purposes only. I am also wondering why there is no method in 'fs' where the user can pass a readable stream and create a file directly.
I tried to write a stream's readableBuffer to a file using fs.write by passing the buffer directly, but a small portion of the file is corrupt after writing: I can see the image, but a small part of it looks black. My guess is the buffer was not written completely.
I pass FormData from an AJAX XMLHttpRequest to a server-side controller (a Node.js router in this case), and I use the npm package 'parse-formdata' to parse the request. Below is the code:
parseFormdata(req, function (err, data) {
    if (err) {
        logger.error(err);
        throw err;
    }
    console.log('fields:', data.fields); // I have data here but how to write this data to a file?
    /** perhaps a bad way to write the data to a file, looking for a better way **/
    var chunk = data.parts[0].stream.readableBuffer.head.chunk;
    fs.writeFile(data.parts[0].filename, chunk, function (err) {
        if (err) {
            console.log(err);
        } else {
            console.log("The file was saved!");
        }
    });
});
Could somebody tell me a better approach to write the data (that I got from parsing the FormData)?

According to the parse-formdata documentation, you may use the provided sample:
var pump = require('pump')
var concat = require('concat-stream')

pump(stream, concat(function (buf) {
    assert.equal(String(buf), String(file), 'file received')
    // then write to your file
    res.end()
}))
But you can do it more concisely:
const ws = fs.createWriteStream('out.txt')
data.parts[0].stream.pipe(ws)
Finally, note that the library has not been updated since 2017, so there may be unpatched vulnerabilities.
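If you go the pipe route, it is also worth attaching 'error' and 'finish' handlers so failures are not silently swallowed. A minimal sketch, reusing data.parts[0] from the question's callback:

// stream the uploaded part straight to disk without buffering it all in memory
var part = data.parts[0];
var ws = fs.createWriteStream(part.filename);

part.stream.pipe(ws);
part.stream.on('error', function (err) {
    console.log(err);
});
ws.on('error', function (err) {
    console.log(err);
});
ws.on('finish', function () {
    console.log("The file was saved!");
});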

Add a mongo request into a file and archive this file

I'm having some trouble while trying to use streams with a MongoDB request. I want to:
Get the results from a collection
Put these results into a file
Put this file into an archive
I'm using the archiver package for the file compression. The file contains CSV-formatted values, so I have to format each row as CSV.
My function takes a res (output) parameter, which means I can send the result to a client directly. For the moment, I can put the results into a file without streams. I think I'll run into memory trouble with a large amount of data, which is why I want to use streams.
Here is my code (with no streams):
function getCSV(res, query) {
    <dbRequest>.toArray(function (err, docs) {
        var csv = '';
        if (docs !== null) {
            for (var i = 0; i < docs.length; i++) {
                var line = '';
                for (var index in docs[i]) {
                    if (docs[i].hasOwnProperty(index) && (index !== '_id')) {
                        if (line !== '') line += ',';
                        line += docs[i][index];
                    }
                }
                console.log("line", line);
                csv += line += '\r\n';
            }
        }
    }.bind(this));
    fileManager.addToFile(csv);
    archiver.initialize();
    archiver.addToArchive(fileManager.getName());
    fileManager.deleteFile();
    archiver.sendToClient(res);
};
Once the CSV is complete, I add it to a file with a FileManager object, which handles file creation and manipulation. The addToArchive method adds the file to the current archive, and the sendToClient method sends the archive through the output (the res parameter of the function).
I'm using Express.js so I call this method with a server request.
Sometimes the file contains data, sometimes it is empty. Could you explain why?
I'd like to understand how streams work and how I could implement them in my code.
Regards
I'm not quite sure why you're having issues with the data only sometimes showing up, but here is a way to send it with a stream. A couple of points of info before the code:
.stream({transform: someFunction})
takes a stream of documents from the database and runs whatever data manipulation you want on each document as it passes through the stream. I put this function into a closure to make it easier to keep track of the column headers, as well as to allow you to pick and choose which keys from the document to use as columns. This lets you reuse it on different collections.
Here is the function that runs on each document as it passes through:
// this is a closure containing knowledge of the keys you want to use,
// as well as whether or not to add the headers before the current line
function createTransformFunction(keys) {
    var hasHeaders = false;
    // this is the function that is run on each document
    // as it passes through the stream
    return function (document) {
        var values = [];
        var line;
        keys.forEach(function (key) {
            // explicitly compare with undefined;
            // a falsy check like !document[key] would also replace the number 0
            if (document[key] !== undefined) {
                values.push(document[key]);
            } else {
                values.push("");
            }
        });
        // add the column headers only on the first document
        if (!hasHeaders) {
            line = keys.join(",") + "\r\n";
            line += values.join(",");
            hasHeaders = true;
        } else {
            // add the line breaks at the beginning of each line
            // to avoid having an extra line at the end
            line = "\r\n";
            line += values.join(",");
        }
        // return the transformed line to the stream and move on to the next document
        return line;
    };
}
You pass that function into the transform option for the database stream. Now assuming you have a collection of people with the keys _id, firstName, lastName:
function (req, res) {
    // create a transform function with the keys you want to keep
    var transformPerson = createTransformFunction(["firstName", "lastName"]);
    // create the mongo read stream that uses your transform function
    var readStream = personCollection.find({}).stream({
        transform: transformPerson
    });
    // write stream to file
    var localWriteStream = fs.createWriteStream("./localFile.csv");
    readStream.pipe(localWriteStream);
    // write stream to download
    res.setHeader("content-type", "text/csv");
    res.setHeader("content-disposition", "attachment; filename=downloadFile.csv");
    readStream.pipe(res);
}
If you hit this endpoint, you'll trigger a download in the browser and write a local file. I didn't use archiver because I think it would add a level of complexity and take away from the concept of what's actually happening. The streams are all there; you'd just need to fiddle with it a bit to work archiver in.
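If you do want the download archived, here is a rough sketch of how archiver could slot into the same handler (assuming the zip format and the readStream from above; the direct readStream.pipe(res) would then be replaced by the archive):

// wrap the transformed CSV stream in a zip archive and send the archive to the client
var archiver = require("archiver");

var archive = archiver("zip");
res.setHeader("content-type", "application/zip");
res.setHeader("content-disposition", "attachment; filename=downloadFile.zip");
archive.pipe(res);
// add the mongo read stream as a single entry named people.csv (illustrative name)
archive.append(readStream, { name: "people.csv" });
archive.finalize();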

How to correctly calculate the number of bytes of a node.js stream that have been processed?

I have a stream I'm sending over the wire that takes a bit of time to fully send, so I want to display how far along it is on the fly. I know you can listen for the 'data' event on streams, but in newer versions of node, doing so also puts the stream into "flowing mode". I want to make sure I'm doing this correctly.
Currently I have the following stuff:
deploymentPackageStream.pause() // to prevent it from entering "flowing mode"
var bytesSent = 0
deploymentPackageStream.on('data', function (data) {
    bytesSent += data.length
    process.stdout.write('\r ')
    process.stdout.write('\r' + (bytesSent / 1000) + 'kb sent')
})
deploymentPackageStream.resume()
// copy over the deployment package
execute(conn, 'cat > deploymentPackage.sh', deploymentPackageStream).wait()
This gives me the right bytesSent output, but the resulting package seems to be missing some data off the front. If I put the resume() line after the copy line (the last line), it doesn't copy anything. If I don't resume at all, it also doesn't copy anything. What's going on, and how do I do this properly without disrupting the stream and without entering flowing mode (I want backpressure)?
I should mention I'm still using node v0.10.x.
Alright, I made something that essentially is a passthrough, but calls a callback with data as it comes in:
var util = require('util')
var Readable = require('stream').Readable

// creates a stream that can view all the data in a stream and passes the data through
// parameters:
//   stream - the stream to peek at
//   callback - called when there's data sent from the passed stream
var StreamPeeker = exports.StreamPeeker = function (stream, callback) {
    Readable.call(this)
    this.stream = stream
    stream.on('readable', function () {
        var data = stream.read()
        if (data !== null) {
            if (!this.push(data)) stream.pause()
            callback(data)
        }
    }.bind(this))
    stream.on('end', function () {
        this.push(null)
    }.bind(this))
}
util.inherits(StreamPeeker, Readable)

StreamPeeker.prototype._read = function () {
    this.stream.resume()
}
If I understand streams properly, this should appropriately handle backpressure.
Using this, I can just count up data.length in the callback like this:
var peeker = new StreamPeeker(stream, function (data) {
    // use data.length
})
peeker.pipe(destination)
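For what it's worth, on streams2 (node v0.10+) the same idea can also be expressed as a Transform stream, which handles the backpressure bookkeeping for you. A rough sketch with illustrative names:

var stream = require('stream')
var util = require('util')

// a pass-through Transform that reports the running byte count as data flows through
function ByteCounter(onBytes) {
    stream.Transform.call(this)
    this.total = 0
    this.onBytes = onBytes
}
util.inherits(ByteCounter, stream.Transform)

ByteCounter.prototype._transform = function (chunk, encoding, done) {
    this.total += chunk.length
    this.onBytes(this.total)
    done(null, chunk) // pass the chunk along unchanged
}

// usage: source.pipe(new ByteCounter(function (total) { /* report progress */ })).pipe(destination)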

Node doesn't write to the file.

I'm trying to modify the plugin to write to a file rather than to the console.
For that I have created a file stream in node like this:
var fs = require('fs');
var fileName = "mochareport.html";
var writeStream;

writeStream = fs.createWriteStream(fileName, function (err) {
    if (err) {
        log.warn('Cannot write HTML Report\n\t' + err.message);
    } else {
        log.debug('HTML report written to "%s".', fileName);
    }
});
On the run, it creates the file. But on line 132:
if(failures.length) that.writeFailures(failures);
it calls a method called writeFailures, so I have added a line like this in the writeFailures method:
writeStream.write("Hello")
But the text Hello doesn't get written to the file.
What mistake am I making here?
fs.createWriteStream() doesn't accept a callback. It returns a writable stream that you call .write() on and listen for events on (mostly finish if you're interested in that).
If you want to just write once (e.g. open, write, and then close) and not continue writing (e.g. open, write, write, ..., close), then you could instead use something like fs.writeFile() or fs.appendFile() (depending on your desired behavior), both of which accept a callback that gets called when the file is closed.
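For example, a corrected version of the snippet from the question might look like the following sketch (the 'error' and 'finish' events replace the callback that createWriteStream never accepted; log is the question's logger):

var fs = require('fs');
var fileName = "mochareport.html";

var writeStream = fs.createWriteStream(fileName);
writeStream.on('error', function (err) {
    log.warn('Cannot write HTML Report\n\t' + err.message);
});
writeStream.on('finish', function () {
    log.debug('HTML report written to "%s".', fileName);
});

// inside writeFailures:
writeStream.write("Hello");

// once the plugin is done writing:
writeStream.end();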

How to append string to file?

I have several node.js processes. I want to append just one string to one file on each request (like a log file in nginx, apache, etc.). What is the best way to do it?
Simple:
fs.open(file, "a", 0744, function (err, fd) {
if (err) throw err;
fs.write(fd, data, null, 'utf8', function (err, written) {
if (err) throw err;
});
});
Or is there something better?
This will work; however, it may not be the best solution since it constantly opens and closes the file. For quicker writes I would benchmark it against fs.createWriteStream, especially because the stream gives you a reference you can keep in scope and reuse across routes.
var fs = require("fs");
//set dummy data as random number
var data = Math.floor(Math.random()*11);
//Set our log file as a writestream variable with the 'a' flag
var logFile = fs.createWriteStream('log.txt', {
flags: "a",
encoding: "encoding",
mode: 0744
})
//call the write option where you need to append new data
logFile.write(new Date().toSting + ': ' data);
logFile.write(new Date().toSting + ': ' data);
Another solution is fs.appendFile. As Sdedelbrock notes, though, fs.createWriteStream is much, much faster than this method since it doesn't need to constantly open and close the file. I coded a small benchmark on my machine, and the stream approach was about 3 times faster, so it is definitely worth it.
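For reference, the fs.appendFile variant looks like the following sketch (reusing the data variable from above); it reopens and closes the file on every call, which is exactly what the benchmark penalizes:

// each call opens the file, appends the line, and closes it again
fs.appendFile('log.txt', new Date().toString() + ': ' + data + '\n', function (err) {
    if (err) throw err;
});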
