node.js create and/or append to file

Using Node.js I'm writing a function that updates a file containing a JSON list by appending a new element to the list. The updated list is then written back to the file. If the file doesn't exist, I create it.
Below, __list_append(..) does the list append and the file update.
My question is whether (and if so, how) I can restructure this code so that it doesn't have two calls to __list_append. I'm a bit new to Node.js and don't yet have a good feel for asynchronous tactics.
function list_append(filename, doc) {
    fs.exists(filename, function(exists) {
        if (exists) {
            fs.readFile(filename, function(err, data) {
                if (err)
                    throw err;
                __list_append(filename, JSON.parse(data), doc);
            });
        } else
            __list_append(filename, [], doc);
    });
}
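For context, __list_append itself isn't shown in the question; a minimal sketch of what it might look like, assuming it simply appends the element and rewrites the JSON file, is:

// Hypothetical sketch of __list_append (not part of the question):
// push the new element and rewrite the whole file as JSON.
function __list_append(filename, list, doc) {
    list.push(doc);
    fs.writeFile(filename, JSON.stringify(list), function(err) {
        if (err) throw err;
    });
}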

It's easy to get a bit pedantic with "best practices," but when I'm writing code and I get a gut feeling that something's not right or that something could be changed, I go over some well known best practices and attempt to see if the code I'm writing adheres to them. SOLID, while being principles of object oriented programming, can be useful to think about in other contexts. In this case, it seems to me that the function is violating the Single Responsibility Principle:
One of the most foundational principles of good design is:
Gather together those things that change for the same reason, and separate those things that change for different reasons.
This principle is often known as the Single Responsibility Principle or SRP. In short, it says that a subsystem, module, class, or even a function, should not have more than one reason to change.
(This could perhaps be exchanged for Separation of Concerns or other similar principles for this example, but the concept is the same.)
In this case, the function has two responsibilities: (1) getting the current (or default) list associated with a filename, and (2) appending data to said list. A first pass at separating these concerns might look something like this:
function get_current_list(filename, callback) {
    fs.exists(filename, function(exists) {
        if (exists) {
            fs.readFile(filename, function(err, data) {
                if (err)
                    return callback(err);
                callback(null, JSON.parse(data));
            });
        } else
            callback(null, []);
    });
}

function list_append(filename, doc) {
    get_current_list(filename, function(err, list) {
        if (err) throw err;
        __list_append(filename, list, doc);
    });
}
Now, get_current_list is only responsible for getting the current list in a file (or an empty array if there is no file), and __list_append is (assumed to be) only responsible for appending to it; list_append is now a simple integration point between these two functions. The functions are a bit more reusable and can also be tested more easily (as an aside, a test-first or TDD approach to programming can help you notice these kinds of things up front). Furthermore, repeating callback in get_current_list is quite a bit more generic than repeating __list_append; if you need to change __list_append to something else, it is now called in only one place.
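As a small illustration of the testability point (just a sketch; the temp-file paths are made up), get_current_list could be exercised with Node's built-in assert module:

var assert = require('assert');
var fs = require('fs');

// Write a small fixture file, then check that get_current_list parses it.
var tmpFile = '/tmp/get_current_list_test.json'; // example path
fs.writeFileSync(tmpFile, JSON.stringify([1, 2, 3]));
get_current_list(tmpFile, function(err, list) {
    assert.ifError(err);
    assert.deepEqual(list, [1, 2, 3]);
});

// A missing file should yield an empty list rather than an error.
get_current_list('/tmp/does_not_exist.json', function(err, list) {
    assert.ifError(err);
    assert.deepEqual(list, []);
});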

This case always feels unsatisfying to me, because yes, you do have to repeat the call to __list_append on both branches: one branch is asynchronous and the other is synchronous, so the call can't simply be hoisted out.

I like and up voted Brandon's answer, but this also works:
function list_append(filename, doc) {
    fs.exists(filename, function(exists) {
        var data = [];
        if (exists) {
            data = JSON.parse(fs.readFileSync(filename, "utf8"));
        }
        __list_append(filename, data, doc);
    });
}
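As an aside, fs.exists is deprecated in current Node releases, and checking for existence before reading can race with other writers. A sketch of an alternative (not from the original answers) is to attempt the read and treat a missing file (ENOENT) as the empty-list case:

function list_append(filename, doc) {
    fs.readFile(filename, 'utf8', function(err, data) {
        if (err && err.code !== 'ENOENT') throw err;  // a real error
        var list = err ? [] : JSON.parse(data);       // missing file -> empty list
        __list_append(filename, list, doc);
    });
}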

Related

Is there a way to overcome the callback if(err) boilerplate in Node.js?

As your project grows, you start to have this much-appreciated, defensive code snippet pretty much everywhere:
func(err, result){
    if(err){
        console.log('An error occurred!, #myModule :' + err);
        return callback(err);
    }
    //then the rest..
}
A quick google search reveals some libs that attempt to overcome this common concern, e.g. https://www.npmjs.com/package/callback-wrappers.
But what is the best approach to minimize the boilerplate coding without compromising the early error handling mechanism we have?
There are a couple of ways to help alleviate this issue; both use external modules.
Firstly, and my preferred method, is to use async, and in particular, async.series, async.parallel or async.waterfall. Each of these methods will skip straight to the last function if an error occurs in any of your async calls, thus preventing the splattering of if(err) conditions throughout your callbacks.
For example:
async.waterfall([
    function(cb) {
        someAsyncOperation(cb);
    },
    function(result, cb) {
        doSomethingAsyncWithResult(result, cb);
    }
], function(err, result) {
    if (err) {
        // Handle error - could have come from any of the above function blocks
    } else {
        // Do something with overall result
    }
});
The other option is to use a promise library, such as q. This has a function Q.denodeify to help you wrap callback-style code into promise-style. With promises, you use .then(), .catch() and .done():
var qSomeAsyncOperation = Q.denodeify(someAsyncOperation);
var qDoSomethingAsyncWithResult = Q.denodeify(doSomethingAsyncWithResult);

Q()
    .then(qSomeAsyncOperation)
    .then(qDoSomethingAsyncWithResult)
    .done(function(result) {
        // Do something with overall result
    }, function(err) {
        // Handle error - could have come from any of the above function blocks
    });
I prefer using async because it is easier to understand what is going on, and it is closer to the true callback-style that node.js has adopted.

Is it possible to write asynchronous Node.js code "cleaner"?

While coding in Node.js, I have encountered many situations where it is hard to implement elaborate logic mixed with database queries (I/O).
Consider an example written in Python. We need to iterate over an array of values; for each value we query the database, and then, based on the results, we compute the average.
def foo():
    a = [1, 2, 3, 4, 5]
    result = 0
    for i in a:
        record = find_from_db(i)  # I/O operation
        if not record:
            raise Error('No record exist for %d' % i)
        result += record.value
    return result / len(a)
The same task in Node.js
function foo(callback) {
    var a = [1, 2, 3, 4, 5];
    var result = 0;
    var itemProcessed = 0;
    var error;

    function final() {
        if (itemProcessed == a.length) {
            if (error) {
                callback(error);
            } else {
                callback(null, result / a.length);
            }
        }
    }

    a.forEach(function(i) {
        // I/O operation
        findFromDb(i, function(err, record) {
            itemProcessed++;
            if (err) {
                error = err;
            } else if (!record) {
                error = 'No record exist for ' + i;
            } else {
                result += record.value;
            }
            final();
        });
    });
}
You can see that such code is much harder to write and read, and it is more prone to errors.
My questions:
Is there a way to make the above Node.js code cleaner?
Imagine more sophisticated logic. For example, when we obtain a record from the db, we might need to do another db query based on some conditions. In Node.js that becomes a nightmare. What are common patterns for dealing with such tasks?
Based on your experience, does the performance gain justify the productivity loss when you code with Node.js?
Is there another asynchronous I/O framework/language that is easier to work with?
To answer your questions:
There are libraries such as async which provide a variety of solutions for common scenarios when working with asynchronous tasks. For "callback hell" concerns, there are many ways to avoid that as well, including (but not limited to) naming your functions and pulling them out, modularizing your code, and using promises.
More or less what you currently have is a fairly common pattern: having counter and function index variables with an array of functions to call. Again, async can help here because it reduces this kind of boilerplate that you will probably find yourself repeating often. async currently doesn't have methods that really allow for skipping individual tasks, but you could easily do this yourself if you are writing the boilerplate (just increment the function index variable by 2 for example).
From my own experience, if you design your JavaScript code with asynchrony in mind and use tools like async, you will find it easier to develop with node. Writing asynchronous code in node is typically going to be more complicated than writing synchronous code (although less so with generators, fibers, etc. as compared to callbacks/promises).
I personally think that deciding on a language based on that single aspect is not worthwhile. You have to consider much more than just the design of the language, for example the size of the community, availability of third-party libraries, performance, technical support options, ease of debugging, etc.
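To make the async suggestion concrete, here is one possible sketch of the averaging example from the question using async.map (assuming, as in the other answers here, that findFromDb takes the value plus a callback):

var async = require('async');

function foo(callback) {
    var a = [1, 2, 3, 4, 5];
    // Map each value to its record's value, then average in the final callback.
    async.map(a, function(i, cb) {
        findFromDb(i, function(err, record) {
            if (err) return cb(err);
            if (!record) return cb(new Error('No record exist for ' + i));
            cb(null, record.value);
        });
    }, function(err, values) {
        if (err) return callback(err);
        var sum = values.reduce(function(acc, v) { return acc + v; }, 0);
        callback(null, sum / a.length);
    });
}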
Just write your code more compactly:
// parallel version
function foo(cb) {
    var items = [1, 2, 3, 4, 5];
    var pending = items.length;
    var result = 0;

    items.forEach(function(item) {
        findFromDb(item, function(err, record) {
            if (err) return cb(err);
            if (!record) return cb(new Error('No record for: ' + item));
            result += record.value / items.length;
            if (--pending === 0) cb(null, result);
        });
    });
}
That clocks in at 13 source lines of code compared to the 9 sloc for python that you posted. However, unlike the python that you posted, this code runs all the jobs in parallel.
To do the same thing in series, a trick I usually do is a next() function defined inline that invokes itself and pops a job off of an array:
// sequential version
function foo(cb) {
    var items = [1, 2, 3, 4, 5];
    var len = items.length;
    var result = 0;

    (function next() {
        if (items.length === 0) return cb(null, result);
        var item = items.shift();
        findFromDb(item, function(err, record) {
            if (err) return cb(err);
            if (!record) return cb(new Error('No record for: ' + item));
            result += record.value / len;
            next();
        });
    })();
}
This time, 15 lines. The nice thing is that you can easily control whether the actions should happen in parallel or sequentially or somewhere in between. That is not so easy in a language like python where everything is synchronous and you've got to do lots of work-arounds like threads or evented libraries to get things back up to asynchronous. Try implementing a parallel version of what you have in python! It would most certainly be longer than the node version.
As for the promise/async route: it's not actually all that hard or bad to use ordinary functions for these relatively simple kinds of tasks. In the future (or in node 0.11+ with --harmony) you can use generators and a library like co, but that feature isn't widely deployed yet.
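For completeness, a rough sketch of what the generator approach with co might look like (co's API has changed across versions, and the promise wrapper around findFromDb is my own assumption, so treat this as illustrative only):

var co = require('co');

// Hypothetical promise wrapper around the callback-style findFromDb(i, callback).
function findFromDbPromise(i) {
    return new Promise(function(resolve, reject) {
        findFromDb(i, function(err, record) {
            if (err) return reject(err);
            resolve(record);
        });
    });
}

co(function* () {
    var a = [1, 2, 3, 4, 5];
    var result = 0;
    for (var k = 0; k < a.length; k++) {
        var record = yield findFromDbPromise(a[k]);
        if (!record) throw new Error('No record exist for ' + a[k]);
        result += record.value;
    }
    return result / a.length;
}).then(function(avg) {
    console.log('average:', avg);
}, function(err) {
    console.error(err);
});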
Everyone here seems to be suggesting async, which is a great library. But to give another suggestion, you should take a look at Promises, which are a new built-in being introduced to the language (and which currently have several very good polyfills). They allow you to write asynchronous code in a way that looks much more structured. For example, take a look at this code:
var items = [1, 2, 3, 4];

var processItem = function(item, callback) {
    // do something async ...
};

var values = [];
items.forEach(function(item) {
    processItem(item, function(err, value) {
        if (err) {
            // something went wrong
        }
        values.push(value);
        // all of the items have been processed, move on
        if (values.length === items.length) {
            doSomethingWithValues(values, function(err) {
                if (err) {
                    // something went wrong
                }
                // and we're done
            });
        }
    });
});

function doSomethingWithValues(values, callback) {
    // do something async ...
}
Using promises, it would be written something like this:
var items = [1, 2, 3, 4];

var processItem = function(item) {
    return new Promise(function(resolve, reject) {
        // do something async ...
    });
};

var doSomethingWithValues = function(values) {
    return new Promise(function(resolve, reject) {
        // do something async ...
    });
};

// Promise.all returns a new promise that will resolve when all of the promises passed to it have resolved
Promise.all(items.map(processItem))
    .then(doSomethingWithValues)
    .then(function() {
        // and we're done
    })
    .catch(function(err) {
        // something went wrong
    });
The second version is much cleaner and simpler, and that barely even scratches the surface of promises' real power. And, like I said, Promises are in ES6 as a new language built-in, so (eventually) you won't even need to load in a library; it will just be available.
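To make the wrapping step concrete, here is one possible way the "// do something async ..." body of processItem could bridge a callback-style API into a promise (findFromDb is just borrowed from the question above as an example of such an API):

var processItem = function(item) {
    return new Promise(function(resolve, reject) {
        // Assume a typical (err, value) callback API does the real work.
        findFromDb(item, function(err, value) {
            if (err) return reject(err);
            resolve(value);
        });
    });
};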
Don't use anonymous (unnamed) functions; they make the code ugly and they make debugging much harder. Always name your functions and define them outside the enclosing scope rather than inline.
That is a real issue with Node.js (it is called callback hell or the pyramid of doom). You can solve it by using promises or async.js, which has many functions for handling different situations (waterfall, parallel, series, auto, ...).
The performance gain is absolutely a good thing, and the productivity loss is not that large once you start to master it; the Node.js community is also great.
Check out async.js and q.
The more I work with async the more I love it and I like node more. Let me give you a simple example of what I have for a server initialization.
async.parallel({
    "job1": loadFromCollection1,
    "job2": loadFromCollection2
}, function(initError, results) {
    if (initError) {
        console.log("[INIT] Server initialization error occurred: " + JSON.stringify(initError, null, 3));
        return callback(initError);
    }
    // Do more stuff with the results
});
In fact, this very same approach can be followed and one can pass different arguments to the different functions that correspond to the various jobs; see for example Passing arguments to async.parallel in node.js.
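For instance (a sketch only; the collection-name arguments are invented, assuming each loader takes a name plus a callback), each job can be wrapped in a closure or pre-bound with Function.prototype.bind:

async.parallel({
    "job1": function(cb) { loadFromCollection1("users", cb); },  // wrap in a closure
    "job2": loadFromCollection2.bind(null, "sessions")           // or pre-bind the argument
}, function(initError, results) {
    if (initError) return callback(initError);
    // results.job1 and results.job2 hold each job's result
});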
To be perfectly honest with you, I prefer the node-way which is also non-blocking. I think node forces someone to have a better design and sometimes you spend time creating more definitions and grouping functions and objects in arrays so that you can write better code. The reason I think is that in the end you want to exploit some variant of async and mix and merge stuff accordingly. In my opinion, spending some extra time and thinking about the code a bit more is well worth it when you also take into account that node is asynchronous.
Other than that, I think it is a habit. The more one writes code for node, the more one improves and writes better asynchronous code. What is good about node is that it really forces you to write more robust code, since you start respecting all the error codes from all the functions much more. For example, how often do people check whether, say, malloc or new has succeeded, and how often is there no handler for a NULL pointer after the call has been made? Writing asynchronous code forces one to respect the events and the error codes that the events carry. I guess one obvious reason is that one respects the code that one writes, and in the end we have to write code that returns errors so that the caller knows what happened.
I really think that you need to give it more time and start working with async more. That's all.
"If you try to code bussiness db login using pure node.js, you go straight to callback hell"
I've recently created a simple abstraction named WaitFor to call async functions in sync mode (based on Fibers): https://github.com/luciotato/waitfor
check the database example:
Database example (pseudocode)
pure node.js (mild callback hell):
var db = require("some-db-abstraction");

function handleWithdrawal(req, res) {
    try {
        var amount = req.param("amount");
        db.select("* from sessions where session_id=?", req.param("session_id"), function(err, sessiondata) {
            if (err) throw err;
            db.select("* from accounts where user_id=?", sessiondata.user_ID, function(err, accountdata) {
                if (err) throw err;
                if (accountdata.balance < amount) throw new Error('insufficient funds');
                db.execute("withdrawal(?,?)", accountdata.ID, req.param("amount"), function(err, data) {
                    if (err) throw err;
                    res.write("withdrawal OK, amount: " + req.param("amount"));
                    db.select("balance from accounts where account_id=?", accountdata.ID, function(err, balance) {
                        if (err) throw err;
                        res.end("your current balance is " + balance.amount);
                    });
                });
            });
        });
    }
    catch (err) {
        res.end("Withdrawal error: " + err.message);
    }
}
Note: although the above code looks like it will catch the exceptions, it will not.
Catching exceptions with callback hell adds a lot of pain, and I'm not sure you will even still have the 'res' parameter available
to respond to the user. If somebody would like to fix this example... be my guest.
using wait.for:
var db = require("some-db-abstraction"), wait = require('wait.for');

function handleWithdrawal(req, res) {
    try {
        var amount = req.param("amount");
        sessiondata = wait.forMethod(db, "select", "* from sessions where session_id=?", req.param("session_id"));
        accountdata = wait.forMethod(db, "select", "* from accounts where user_id=?", sessiondata.user_ID);
        if (accountdata.balance < amount) throw new Error('insufficient funds');
        wait.forMethod(db, "execute", "withdrawal(?,?)", accountdata.ID, req.param("amount"));
        res.write("withdrawal OK, amount: " + req.param("amount"));
        balance = wait.forMethod(db, "select", "balance from accounts where account_id=?", accountdata.ID);
        res.end("your current balance is " + balance.amount);
    }
    catch (err) {
        res.end("Withdrawal error: " + err.message);
    }
}
Note: Exceptions will be caught as expected.
The db methods (db.select, db.execute) will be called with this = db.
Your Code
In order to use wait.for, you'll have to STANDARDIZE YOUR CALLBACKS to function(err,data)
If you STANDARDIZE YOUR CALLBACKS, your code might look like:
var wait = require('wait.for');

// run in a Fiber
function process() {
    var a = [1, 2, 3, 4, 5];
    var result = 0;
    a.forEach(function(i) {
        // I/O operation
        var record = wait.for(findFromDb, i); // call & wait for async function findFromDb(i, callback)
        if (!record) throw new Error('No record exist for ' + i);
        result += record.value;
    });
    return result / a.length;
}

function inAFiber() {
    console.log('result is: ', process());
}

// run the loop in a Fiber (keep node spinning)
wait.launchFiber(inAFiber);
see? closer to python and no callback hell

Refactoring mongoose queries

I've been using mongoose a considerable amount and I can't seem to get around "callback hell" and polluting my queries with error handling.
For example here is a route I have:
var homePage = function(req, res) {
    var companyUrl = buildingId = req.params.company
    db.pmModel
        .findOne({ companyUrl: companyUrl })
        .exec(function (err, doc) {
            if (err)
                return HandleError(req, res, err)
            if (!doc)
                return NoResult(req, res, {msg: 'Aint there'})
            console.log(doc)
            db.rentalModel
                .find({ propertyManager: doc.id })
                .populate('building')
                .exec(function (err, rentals) {
                    if (err)
                        return HandleError(req, res, err)
                    if (!doc)
                        return NoResult(req, res, {msg: 'Aint there'})
                    console.log(doc)
                    var data = doc.toJSON()
                    data.rentals = rentals
                    res.render('homePage', data)
                })
        })
}
my question: is there a more succinct way of writing this?
So perhaps what you have above is just a small example, but it doesn't appear to me that there's too much "callback hell" going on in your code. However, you can certainly refactor it. Just know that in doing so you can make it more difficult to understand or follow from a maintenance perspective.
One thing you can do is simply refactor your database layer. If you always find yourself querying one collection and then turning right around and querying another, you could consider merging those collections, or at least the documents that you're looking for. In a relational database you might separate out these tables and do merges, however in a document-based database, it sometimes makes more sense to combine the data within each document. This allows for easier queries and simpler logic in your code.
Another solution is to refactor your calls into separate functions, and control the flow in a different way. A popular library to help with this is async which provides many helper functions to assist in the asynchronous world of JavaScript. There are many to choose from, but one suggestion would be to use the waterfall function for your situation (since each call must be made before the next). It would then look something like this:
async.waterfall([
    function(callback) {
        findCompany(companyUrl, callback);
    },
    function(id, callback) {
        findPropertyManager(id, callback);
    }
], function(err, rentals) {
    res.render(rentals);
});
You would still need to handle the errors in each function, but you could even refactor that out into a helper function. Furthermore, you could choose to code up something yourself to help with the control flow rather than using async.
But again, the code you showed above is understandable and readable, and only contains a couple inline callbacks. In this way, there's a lot less going on and may make debugging it later (if things go wrong) easier.
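As one sketch of the "refactor the error handling into a helper function" idea mentioned above (the handle wrapper is a made-up name; it reuses HandleError and NoResult from the question):

// Routes errors and missing results to HandleError / NoResult so that
// only the success path remains inline.
function handle(req, res, onSuccess) {
    return function(err, result) {
        if (err) return HandleError(req, res, err);
        if (!result) return NoResult(req, res, { msg: 'Aint there' });
        onSuccess(result);
    };
}

var homePage = function(req, res) {
    var companyUrl = req.params.company;
    db.pmModel
        .findOne({ companyUrl: companyUrl })
        .exec(handle(req, res, function(doc) {
            db.rentalModel
                .find({ propertyManager: doc.id })
                .populate('building')
                .exec(handle(req, res, function(rentals) {
                    var data = doc.toJSON();
                    data.rentals = rentals;
                    res.render('homePage', data);
                }));
        }));
};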

'Conditions hell' in Node.js

I have two layers in my application (Express). The first is a module with functions that handle database queries, fs, and so on. The second handles requests (also known as the controller/route). I'm just tired of all these conditions.
Sample code:
exports.updateImage = function(image, userId, callback) {
    fs.readFile(image.path, function (err, imageBinary) {
        if (err) callback(err);
        else {
            pg.connect(conString, function(err, client, done) {
                done();
                if (err) callback(err);
                else {
                    client.query('UPDATE images SET data=$1, filesize=$2, filename=$3 WHERE user_id=$4', [imageBinary, image.size, image.originalFilename, userId], function(err) {
                        if (err) callback(err);
                        else callback(null);
                    });
                }
            });
        }
    });
};
As you can see, I pass all my errors back to my controller via the callback, where they are handled as internal server errors. I handle possible database and file system errors, and there is too much repetition in my code. I suppose it is bad design, and it's hard to maintain in production. Please help me.
When you say "tired of all these conditions" I assume you're talking about all the nested callbacks and the "march off the right side of the screen" that results from that kind of directly nested callbacks? If I'm assuming incorrectly please clarify your question and I'll delete everything I'm about to write as not related. :-)
One cheap way to avoid the else structure, instead of doing
if(err) callback(err);
else { ... stuff ... }
is to do this:
if(err) return callback(err);
Note the return: it causes execution of your function to end. Nobody cares about the return value from a callback, so it just gets ignored. That potentially gets rid of a layer of braces and elses.
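Applied to the updateImage function from the question, the early-return style flattens things out (a sketch only; it also moves done() so the client is released after the query finishes, which is the usual pg pattern):

exports.updateImage = function(image, userId, callback) {
    fs.readFile(image.path, function(err, imageBinary) {
        if (err) return callback(err);
        pg.connect(conString, function(err, client, done) {
            if (err) return callback(err);
            client.query(
                'UPDATE images SET data=$1, filesize=$2, filename=$3 WHERE user_id=$4',
                [imageBinary, image.size, image.originalFilename, userId],
                function(err) {
                    done(); // release the client back to the pool once the query finishes
                    if (err) return callback(err);
                    callback(null);
                });
        });
    });
};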
To handle this better in general, you'll want to look at some sort of async helpers. There are three general categories of these things:
Helper libraries that manage the sequencing of multiple callbacks,
Promises, which let you represent async operations as objects, or
Language support to hide the details.
Examples of the three different approaches include step, flow, or async as helper libraries; Q or when.js for promises; and streamline for language support.
For more details, I did a presentation on exactly this topic about a year ago; the slides are here and there's a recording of the presentation as well.

Node.js error callback

I have a codebase which contains code similar to the code below many times:
function(doc, callback) {
    doSomething(function(err) {
        if (err) return callback(err);
        callback(null, doc);
    });
}
I'm wondering if there are any downsides to just combining the explicit error check into:
function(doc, callback) {
    doSomething(function(err) {
        callback(err, doc);
    });
}
I understand that callback handlers are expected to check the err on callback, but in this case it's just bubbling up.
I suppose I'm wondering if based on the way callbacks are generally used, if this is an issue?
There is no difference; the code is doing the same thing. The first one is just easier to edit later if you want to add some post-processing.
Technically, the second example provides a "doc" on the error path and the first doesn't, but if somebody relies on that, they're doing it very wrong.
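A tiny runnable sketch (the function names are invented) that makes that one observable difference visible:

// Hypothetical stand-ins to show the difference on the error path.
function doSomething(cb) { cb(new Error('boom')); }

function strictVersion(doc, callback) {
    doSomething(function(err) {
        if (err) return callback(err);   // error path: doc is NOT passed
        callback(null, doc);
    });
}

function combinedVersion(doc, callback) {
    doSomething(function(err) {
        callback(err, doc);              // error path: doc IS still passed
    });
}

strictVersion({ id: 1 }, function(err, doc) {
    console.log('strict:', err.message, doc);     // strict: boom undefined
});
combinedVersion({ id: 1 }, function(err, doc) {
    console.log('combined:', err.message, doc);   // combined: boom { id: 1 }
});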
