Is safe read and then write a file asynchronously on node.js? - node.js

I have a method that reads and write a log file, this method is called on every request by all users, then write the log the request path in a file. The questions are two:
Is safe read and then write a file in async mode considering concurrency questions?
If yes for the first question, the code bellow will work considering concurrency questions?
If yes for the first question how I have to do?
Please, disregard exceptions and performance questions, this is a didactic code.
var logFile = '/tmp/logs/log.js';
app.get("/", function(req){
var log = {path: req.path, date: new Date().getTime()};
log(log);
});
function log(data){
fs.exists(logFile, function(exists){
if(exists){
fs.readFile(logFile, function (err, data) {
if (err){
throw err
}
var logData = JSON.parse(data.toString());
logData.push(data);
writeLog(logData);
});
}else{
writeLog([data]);
}
});
}
function writeLog(base){
fs.writeFile(logFile, JSON.stringify(base, null, '\t'), function(err) {
if(err)
throw err;
});
}

I strongly suggest that you don't just "log asynchronously" because you want the log to be ordered based on the order things happened in your app, and there is no guarantee this will happen that way if you don't synchronize it somehow.
You can, for instance, use a promise chain to synchronize it:
var _queue = Promise.resolve();
function log(message){
_queue = _queue.then(function(){ // chain to the queue
return new Promise(function(resolve){
fs.appendFile("/tmp/logs/log.txt", new Date() + message + "\n", function(err, data){
if(err) console.log(err); // don't die on log exceptions
else resolve(); // signal to the queue it can progress
});
});
});
}
You can now call log and it will queue messages and write them some time asynchronously for you. It will never take more than a single file descriptor or exhaust the server either.
Consider using a logging solution instead of rolling your own logger btw.

In you're example you're already using the Asynchronous versions of those functions. If you're concerned about the order of your operations then you should use the synchronous versions of those functions.
readFileSync
writeFileSync
Also to note, JSON.parse() is a synchronous operation.You can make this "asynchronous" using the async module and doing a async.asyncify(JSON.parse(data.toString()));.
As noted by #BenjaminGruenbaum, async.asyncify(); doesn't actually make the operation of JSON.parse(); truly asynchronous but it does provide a more "async" style for the control flow of the operations.

Related

is my code blocking node thread

I have some confusions over nodejs and would like some help. I have a table called camps, contacts and camp_contact. I have to show the contact list based on the camp the user belongs to. I used async to loop through the camps, which I save in the user session, and then grab the data from mysql.
var array_myData = [];
async.each(req.session.user.camps, function(camps, callback) {
database.getConnection(function(err, connection) {
// Use the connection
connection.query('SELECT contacts.*, contact_camp.* '+
' FROM contacts JOIN contact_camp '+
' ON contact_camp.contact_id = contacts.id '+
' WHERE contact_camp.camp_id = ?',
[camps.camp_id], function(err,data){
if(err) {
//this will call the err function
callback('error');
}
else {
array_myData = array_myData.concat(data);
callback();
}
connection.release();
});
});
//final function call.
}, function(err){
// if any error happened, this function fires.
if( err ) {
// All processing will now stop.
} else {
res.render('contacts',
{
page
});
}
});
The code works fine. Now the thing I'm wondering about is, does using array.concat block the thread? if so, how can I change that? I read around and according to what I understood, I/O operation that are not asynchronous blocks the thread like reading from file or database. Does having array like this var array = ['a','b', 'c'] and looping through it would block the thread?
Lastly, is there a way to know if a code that I have written has blocked the thread or not? Because I get worried every time I write a function of my own.
I also get confused when a create a function with a callback like:
function test(param, fn) { do something; fn(); }
I'm not sure if this kind of function without a timer would block the thread or not.
Ok, if it helps you somehow:
I read around and according to what I understood, I/O operation that are not asynchronous blocks the thread like reading from file or database.
In NodeJS the operations on file descriptors are asynchronous => non-blocking
The file descriptors would be:
database operations,
open/close files
network operations
ex.:
fs.readFile('<pathToTheFile>', (err, data) => {
if (err) throw err;
console.log(data);
});
// if your file is very big, until it is read, maybe other requests will be finished
Does having array like this var array = ['a','b', 'c'] and looping through it would block the thread?
Yes, it will block the thread, but operations like this are not resource intensive. NodeJs is an event-driven language (single-threaded), but also, in a multi threaded or multi process language, the kernel thread would still be blocked by such operations => the same thing ... this is maybe off topic.
// if you do something like this
while(true){
function test(param, fn) { do something; fn(); }
}
// you will see that you just blocked the thread
I'm not sure if this kind of function without a timer would block the thread or not
normally you would put that function in a error handler, and it shouldn't block the thread if you're not doing an infinite while.
try{}catch(err){
// do something with the error
}

Node async vs sync

I am writing a node server that reads/deletes/adds/etc a file from the filesystem. Is there any performance advantage to reading asynchronously? I can't do anything while waiting for the file to be read. Example:
deleteStructure: function(req, res) {
var structure = req.param('structure');
fs.unlink(structure, function(err) {
if (err) return res.serverError(err);
return res.ok();
});
}
I am also making requests to another server using http.get. Is there any performance advantage to fetching asynchronously? I can't do anything while waiting for the file to be fetched. Example:
getStructure: function(req, res) {
var structure = urls[req.param('structure')];
http.get(structure).then(
function (response) {
return res.send(response);
},
function (err) {
res.serverError(err)
}
);
}
If there is no performance advantage to reading files asynchronously, I can just use the synchronous methods. However, I am not aware of synchronous methods for http calls, do any built in methods exist?
FYI I am using Sails.js.
Thanks!
I can't do anything while waiting for the file to be read.
I can't do anything while waiting for the file to be fetched.
Wrong; you can handle an unrelated HTTP request.
Whenever your code is in the middle of a synchronous operation, your server will not respond to other requests at all.
This asynchronous scalability is the biggest attraction for Node.js.

nodeJS too many child processes?

I am using node to recursively traverse a file system and make a system call for each file, by using child.exec. It works well when tested on a small structure, with a couple of folders and files, but when run on the whole home directory, it crashes after a while
child_process.js:945
throw errnoException(process._errno, 'spawn');
^
Error: spawn Unknown system errno 23
at errnoException (child_process.js:998:11)
at ChildProcess.spawn (child_process.js:945:11)
at exports.spawn (child_process.js:733:9)
at Object.exports.execFile (child_process.js:617:15)
at exports.exec (child_process.js:588:18)
Does this happen because it uses up all resources? How can I avoid this?
EDIT: Code
improvement and best practices suggestions always welcome :)
function processDir(dir, callback) {
fs.readdir(dir, function (err, files) {
if (err) {...}
if (files) {
async.each(files, function (file, cb) {
var filePath = dir + "/" + file;
var stats = fs.statSync(filePath);
if (stats) {
if (stats.isFile()) {
processFile(dir, file, function (err) {
if (err) {...}
cb();
});
} else if (stats.isDirectory()) {
processDir(filePath, function (err) {
if (err) {...}
cb();
});
}
}
}, function (err) {
if (err) {...}
callback();
}
);
}
});
}
the issue can be because of having many open files simultaneously
consider using async module to solve the issue
https://github.com/caolan/async#eachLimit
async.eachLimit(
files,
20,
function(file, callback){
//process file here and call callback
},
function(err){
//done
}
);
in current example you will process 20 files at a time
Well, I don't know the reason for the failure, but if this is what you expect (using up all of the resources) or as others say (too many files open), you could try to use multitasking for it. JXcore (fork of Node.JS) offers such thing - it allows to run a task in a separate instance, but this is done still inside one single process.
While Node.JS app as a process has its limitations - JXcore with its sub-instances multiplies those limits: single process even with one extra instance (or task, or well: we can call it sub-thread) doubles the limits!
So, let's say, that you will run each of your spawn() in a separate task. Or, since tasks are not running in a main thread any more - you can even use sync method that jxcore offers : cmdSync().
Probably the the best illustration would be given by this few lines of the code:
jxcore.tasks.setThreadCount(4);
var task = function(file) {
var your_cmd = "do something with " + file;
return jxcore.utils.cmdSync(your_cmd);
};
jxcore.tasks.addTask(task, "file1.txt", function(ret) {
console.log("the exit code:", ret.exitCode);
console.log("output:", ret.out);
});
Let me repeat: the task will not block the main thread, since it is running in a separate instance!
Multitasking API is documented here: Multitasking.
As has been established in comments, you are likely running out of file handles because you are running too many concurrent operations on your files. So, a solution is to limit the number of concurrent operations that run at once so too many files aren't in use at the same time.
Here's a somewhat different implementation that uses Bluebird promises to control both the async aspects of the operation and the concurrency aspects of the operation.
To make the management of the concurrency aspect easier, this collects the entire list of files into an array first and then processes the array of filenames rather than processing as you go. This makes it easier to use a built-in concurrency capability in Bluebird's .map() (which works on a single array) so we don't have to write that code ourselves:
var Promise = require("bluebird");
var fs = Promise.promisifyAll(require("fs"));
var path = require("path");
// recurse a directory, call a callback on each file (that returns a promise)
// run a max of numConcurrent callbacks at once
// returns a promise for when all work is done
function processDir(dir, numConcurrent, fileCallback) {
var allFiles = [];
function listDir(dir, list) {
var dirs = [];
return fs.readdirAsync(dir).map(function(file) {
var filePath = path.join(dir , file);
return fs.statAsync(filePath).then(function(stats) {
if (stats.isFile()) {
allFiles.push(filePath);
} else if (stats.isDirectory()) {
return listDir(filePath);
}
}).catch(function() {
// ignore errors on .stat - file could just be gone now
return;
});
});
}
return listDir(dir, allFiles).then(function() {
return Promise.map(allFiles, function(filename) {
return fileCallback(filename);
}, {concurrency: numConcurrent});
});
}
// example usage:
// pass the initial directory,
// the number of concurrent operations allowed at once
// and a callback function (that returns a promise) to process each file
processDir(process.cwd(), 5, function(file) {
// put your own code here to process each file
// this is code to cause each callback to take a random amount of time
// for testing purposes
var rand = Math.floor(Math.random() * 500) + 500;
return Promise.delay(rand).then(function() {
console.log(file);
});
}).catch(function(e) {
// error here
}).finally(function() {
console.log("done");
});
FYI, I think you'll find that proper error propagation and proper error handling from many async operations is much, much easier with promises than the plain callback method.

node.js file system problems

I keep banging my head against the wall because of tons of different errors. This is what the code i try to use :
fs.readFile("balance.txt", function (err, data) //At the beginning of the script (checked, it works)
{
if (err) throw err;
balance=JSON.parse(data);;
});
fs.readFile("pick.txt", function (err, data)
{
if (err) throw err;
pick=JSON.parse(data);;
});
/*....
.... balance and pick are modified
....*/
if (shutdown)
{
fs.writeFile("balance2.txt", JSON.stringify(balance));
fs.writeFile("pick2.txt", JSON.stringify(pick));
process.exit(0);
}
At the end of the script, the files have not been modified the slightest. I then found out on this site that the files were being opened 2 times simultaneously, or something like that, so i tried this :
var balance, pick;
var stream = fs.createReadStream("balance.txt");
stream.on("readable", function()
{
balance = JSON.parse(stream.read());
});
var stream2 = fs.createReadStream("pick.txt");
stream2.on("readable", function()
{
pick = JSON.parse(stream2.read());
});
/****
****/
fs.unlink("pick.txt");
fs.unlink("balance.txt");
var stream = fs.createWriteStream("balance.txt", {flags: 'w'});
var stream2 = fs.createWriteStream("pick.txt", {flags: 'w'});
stream.write(JSON.stringify(balance));
stream2.write(JSON.stringify(pick));
process.exit(0);
But, this time, both files are empty... I know i should catch errors, but i just don't see where the problem is. I don't mind storing the 2 objects in the same file, if that can helps. Besides that, I never did any javascript in my life before yesterday, so, please give me a simple explanation if you know what failed here.
What I think you want to do is use readFileSync and not use readFile to read your files since you need them to be read before doing anything else in your program (http://nodejs.org/api/fs.html#fs_fs_readfilesync_filename_options).
This will make sure you have read both the files before you execute any of the rest of your code.
Make your like code do this:
try
{
balance = JSON.parse(fs.readFileSync("balance.txt"));
pick = JSON.parse(fs.readFileSync("pick.txt"));
}
catch(err)
{ throw err; }
I think you will get the functionality you are looking for by doing this.
Note, you will not be able to check for an error in the same way you can with readFile. Instead you will need to wrap each call in a try catch or use existsSync before each operation to make sure you aren't trying to read a file that doesn't exist.
How to capture no file for fs.readFileSync()?
Furthermore, you have the same problem on the writes. You are kicking off async writes and then immediately calling process.exit(0). A better way to do this would be to either write them sequentially asynchronously and then exit or to write them sequentially synchronously then exit.
Async option:
if (shutdown)
{
fs.writeFile("balance2.txt", JSON.stringify(balance), function(err){
fs.writeFile("pick2.txt", JSON.stringify(pick), function(err){
process.exit(0);
});
});
}
Sync option:
if (shutdown)
{
fs.writeFileSync("balance2.txt", JSON.stringify(balance));
fs.writeFileSync("pick2.txt", JSON.stringify(pick));
process.exit(0);
}

How to wait for all async calls to finish

I'm using Mongoose with Node.js and have the following code that will call the callback after all the save() calls has finished. However, I feel that this is a very dirty way of doing it and would like to see the proper way to get this done.
function setup(callback) {
// Clear the DB and load fixtures
Account.remove({}, addFixtureData);
function addFixtureData() {
// Load the fixtures
fs.readFile('./fixtures/account.json', 'utf8', function(err, data) {
if (err) { throw err; }
var jsonData = JSON.parse(data);
var count = 0;
jsonData.forEach(function(json) {
count++;
var account = new Account(json);
account.save(function(err) {
if (err) { throw err; }
if (--count == 0 && callback) callback();
});
});
});
}
}
You can clean up the code a bit by using a library like async or Step.
Also, I've written a small module that handles loading fixtures for you, so you just do:
var fixtures = require('./mongoose-fixtures');
fixtures.load('./fixtures/account.json', function(err) {
//Fixtures loaded, you're ready to go
};
Github:
https://github.com/powmedia/mongoose-fixtures
It will also load a directory of fixture files, or objects.
I did a talk about common asyncronous patterns (serial and parallel) and ways to solve them:
https://github.com/masylum/i-love-async
I hope its useful.
I've recently created simpler abstraction called wait.for to call async functions in sync mode (based on Fibers). It's at an early stage but works. It is at:
https://github.com/luciotato/waitfor
Using wait.for, you can call any standard nodejs async function, as if it were a sync function, without blocking node's event loop. You can code sequentially when you need it.
using wait.for your code will be:
//in a fiber
function setup(callback) {
// Clear the DB and load fixtures
wait.for(Account.remove,{});
// Load the fixtures
var data = wait.for(fs.readFile,'./fixtures/account.json', 'utf8');
var jsonData = JSON.parse(data);
jsonData.forEach(function(json) {
var account = new Account(json);
wait.forMethod(account,'save');
}
callback();
}
That's actually the proper way of doing it, more or less. What you're doing there is a parallel loop. You can abstract it into it's own "async parallel foreach" function if you want (and many do), but that's really the only way of doing a parallel loop.
Depending on what you intended, one thing that could be done differently is the error handling. Because you're throwing, if there's a single error, that callback will never get executed (count won't be decremented). So it might be better to do:
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
And handle the error in the callback. It's better node-convention-wise.
I would also change another thing to save you the trouble of incrementing count on every iteration:
var jsonData = JSON.parse(data)
, count = jsonData.length;
jsonData.forEach(function(json) {
var account = new Account(json);
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
});
If you are already using underscore.js anywhere in your project, you can leverage the after method. You need to know how many async calls will be out there in advance, but aside from that it's a pretty elegant solution.

Resources