I am trying to work with node.js and node-java and trying to get my head wrapped around some concepts, and in particular how to write async method calls.
I think that, for a function in Java, myclass.x():
[In Java]:
Z = myclass.x(abc);
And:
[In node.js/node-java]:
myclass.x(abc, function(err,data) {
//TODO
Z = data;});
In other words, the myclass.x function gets evaluated using the parameter abc, and if no error, then the result goes into "data" which is then assigned to Z.
Is that correct?
Here's the thing (or one of the things) that I am confused about.
What happens if the function myclass.x() doesn't take any parameters?
In other words, it is normally (in Java) just called like:
Z = myclass.x();
If that is the case, how should the node.js code look?
myclass.x(, function(err,data) {
//TODO
Z = data;});
doesn't seem right, but:
myclass.x( function(err,data) {
//TODO
Z = data;});
also doesn't seem correct.
So what is the correct way to code the node.js code in this case?
Thanks in advance!!
Jim
EDIT 1: Per the comments, the specific code I'm working with is the last couple of commented-out lines from this other question:
node.js and node-java: What is equivalent node.js code for this java code?
These are the lines (commented out in that other question):
var MyFactoryImplClass = java.import("oracle.security.jps.openaz.pep.PepRequestFactoryImpl.PepRequestFactoryImpl");
var result = myFactoryImplClass.newPepRequest(newSubject, requestACTIONString ,requestRESOURCEString , envBuilt)
I tried to make the last line use an async call:
MyFactoryImplClass.getPepRequestFactory( function(err,data) {
//TODO
pepReqF1=data;})
javaLangSystem.out.printlnSync("Finished doing MyFactoryImplClass.getPepRequestFactory() and stored it in pepReqF1 =[" + pepReqF1 + "]");
But the output was showing the value of that pepReqF1 as "undefined".
If calling the method with one parameter and a callback is:
myclass.x(abc, function(err, data) {
// ...
});
Then calling a method with only a callback would be:
myclass.x(function(err, data) {
// ...
});
The function(err, data) { } part is just a normal argument, just like abc. In fact, you can pass a named function instead:
function namedFun(err, data) {
// ...
}
myclass.x(abc, namedFun);
Or even:
var namedFun = function (err, data) {
// ...
}
myclass.x(abc, namedFun);
Functions in JavaScript are first-class objects like strings or arrays. You can pass a named function as a parameter to some other function:
function fun1(f) {
return f(10);
}
function fun2(x) {
return x*x;
}
fun1(fun2);
just like you can pass a named array:
function fun3(a) {
return a[0]
}
var array = [1, 2, 3];
fun3(array);
And you can pass an anonymous function as a parameter:
function fun1(f) {
return f(10);
}
fun1(function (x) {
return x*x;
});
just like you can pass an anonymous array:
function fun3(a) {
return a[0]
}
fun3([1, 2, 3]);
There is also a nice shortcut so that instead of:
fun1(function (x) {
return x*x;
});
You can write:
fun1(x => x*x);
Making my comment into an answer...
If the issue you're experiencing is that Z does not have the value you want when you are examining it, then that is probably because of a timing issue. Asynchronous callbacks happen at some unknown time in the future while the rest of your code continues to run. Because of that, the only place you can reliably use the result passed to the asynchronous callback is inside the callback itself or in some function you would call from that function and pass it the value.
So, if your .x() method calls its callback asynchronously, then:
var Z;
myclass.x( function(err,data) {
// use the err and data arguments here inside the callback
Z = data;
});
console.log(Z); // outputs undefined
// You can't rely on Z here, even though it is declared in a
// higher scope, because the callback has not yet been called
// when this code executes
You can see this a little more clearly by looking at the sequencing:
console.log('A');
someAsyncFunction(function() {
console.log('B');
})
console.log('C');
This will produce a log of:
A
C
B
Showing you that the async callback happens some time in the future, after the rest of your sequential code has executed.
Java, on the other hand, primarily uses blocking I/O (the function doesn't return until the I/O operation is complete), so you don't usually have the asynchronous behavior that is standard practice in node.js. Note: I believe there are some asynchronous capabilities in Java, but that isn't the typical way things are done, whereas in node.js it is.
This creates a bit of an architectural mismatch if you're trying to port code that uses I/O from one environment to the other, because the structure has to be redone in order to work properly in a node.js environment.
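To make the "use the result inside the callback" advice concrete, here is a minimal sketch. The function x below is a hypothetical stand-in for an asynchronous method such as myclass.x() (it is not the node-java API), and useResult and log are names made up for the illustration:

```javascript
// Hypothetical stand-in for an asynchronous method such as myclass.x().
function x(callback) {
  setImmediate(function () {
    callback(null, 'result');
  });
}

var log = [];

// Everything that depends on the result lives in (or is called from) the callback.
function useResult(data) {
  log.push('got ' + data);
}

x(function (err, data) {
  if (err) {
    log.push('failed: ' + err.message);
    return;
  }
  useResult(data);
});

// This line runs before the callback fires, so the result is not available yet.
log.push('after call');
```

Once the callback has fired, log holds 'after call' first and 'got result' second, which is the same A, C, B ordering described above.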
I have an orientdb database. I want to use nodejs with RESTful calls to create a large number of records. I need to get the #rid of each for some later processing.
My pseudocode is:
for each record
write.to.db(record)
when the async of write.to.db() finishes
process based on #rid
carryon()
I have landed in serious callback hell from this. The version that was closest used a tail recursion in the .then function to write the next record to the db. However, I couldn't carry on with the rest of the processing.
A final constraint is that I am behind a corporate proxy and cannot use any other packages without going through the network administrator, so using the native nodejs packages is essential.
Any suggestions?
With a completion callback, the general design pattern for this type of problem makes use of a local function for doing each write:
var records = ....; // array of records to write
var index = 0;
function writeNext(r) {
write.to.db(r, function(err) {
if (err) {
// error handling
} else {
++index;
if (index < records.length) {
writeNext(records[index]);
}
}
});
}
writeNext(records[0]);
The key here is that you can't use synchronous iterators like .forEach() because they won't iterate one at a time and wait for completion. Instead, you do your own iteration.
If your write function returns a promise, you can use the .reduce() pattern that is common for iterating an array.
var records = ...; // some array of records to write
records.reduce(function(p, r) {
return p.then(function() {
return write.to.db(r);
});
}, Promise.resolve()).then(function() {
// all done here
}, function(err) {
// error here
});
This solution chains promises together, waiting for each one to resolve before executing the next save.
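Since the question also needs the #rid of every record, the same chain can carry an array of results along with it. This is only a sketch: writeToDb is a hypothetical stand-in for the real write call, and the '#1:n' values it resolves with are made up.

```javascript
// Hypothetical stand-in for the real database write; resolves with a fake #rid.
function writeToDb(record) {
  return new Promise(function (resolve) {
    setImmediate(function () {
      resolve('#1:' + record.id);
    });
  });
}

// Writes the records one at a time and resolves with the rids in order.
function writeAll(records) {
  return records.reduce(function (chain, record) {
    return chain.then(function (rids) {
      return writeToDb(record).then(function (rid) {
        rids.push(rid);
        return rids; // hand the accumulated array to the next link
      });
    });
  }, Promise.resolve([]));
}

var allRids = writeAll([{ id: 0 }, { id: 1 }, { id: 2 }]);

allRids.then(function (rids) {
  // "carry on" with the rest of the processing here
  console.log(rids);
});
```

The rest of the processing goes in the final .then(), which only runs after the last write has finished.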
It's kinda hard to tell which function would be best for your scenario w/o more detail, but I almost always use asyncjs for this kind of thing.
From what you say, one way to do it would be with async.map:
var recordsToCreate = [...];
function functionThatCallsTheApi(record, cb){
// do the api call, then call cb(null, rid)
}
async.map(recordsToCreate, functionThatCallsTheApi, function(err, results){
// here, err will be set if anything failed in any function
// results will be an array of the rids
});
You can also check out the other functions that enable throttling, which is probably a good idea.
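If installing async isn't possible (the question mentions a corporate proxy), throttling can be sketched in plain JavaScript. The mapLimit below is a hand-rolled illustration, not the async library's implementation, and fakeApiCall is a hypothetical stand-in for the REST call:

```javascript
// Runs worker(item, cb) for every item, with at most `limit` running at once.
// Calls done(err) on the first error, or done(null, results) in input order.
function mapLimit(items, limit, worker, done) {
  var results = new Array(items.length);
  var next = 0;        // index of the next item to start
  var active = 0;      // number of workers currently running
  var finished = false;

  if (items.length === 0) return done(null, results);

  function launch() {
    while (!finished && active < limit && next < items.length) {
      start(next++);
    }
  }

  function start(index) {
    active++;
    worker(items[index], function (err, value) {
      if (finished) return;
      active--;
      if (err) {
        finished = true;
        return done(err);
      }
      results[index] = value;
      if (next === items.length && active === 0) {
        finished = true;
        return done(null, results);
      }
      launch();
    });
  }

  launch();
}

// Hypothetical API call that also records how many calls overlap.
var running = 0;
var maxActive = 0;
function fakeApiCall(record, cb) {
  running++;
  maxActive = Math.max(maxActive, running);
  setTimeout(function () {
    running--;
    cb(null, 'rid-' + record);
  }, 10);
}

var output;
mapLimit([1, 2, 3, 4, 5], 2, fakeApiCall, function (err, rids) {
  output = rids;
});
```

With a limit of 2, no more than two calls are ever in flight, and the results still come back in the order of the input array.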
I am writing a simple server in NodeJS without using any frameworks except database connection. I have this code to populate database:
module.exports = function(callback) {
var model = require('./model');
var seedData = [
// Some seed objects
];
var count = 0;
for(var i = 0; i < seedData.length; i++) {
model.Complaint.createComplaint(seedData[i], function(err, id) {
count++;
if (count === seedData.length) {
callback();
}
});
}
};
Here I check in each callback whether the other callbacks have been executed. If count equals the length of the seedData array, I call the main callback. Is this a good way to manage a loop which calls async methods?
As ShanShan already mentioned in the comment, Promises are really powerful when you need to work with a lot of asynchronous functions. If you're just looking to simplify your current script a bit, Async.js may be useful for you. It allows you to rewrite your loop as:
async.each(seedData, model.Complaint.createComplaint, callback);
And provides a lot of other methods which give you more control over the flow (such as running the functions either in series or parallel).
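For the Promise route without any extra library, one possible sketch uses Node's built-in util.promisify together with Promise.all. The createComplaint function here is a hypothetical stand-in for model.Complaint.createComplaint, assumed to take (data, callback) and report (err, id):

```javascript
var util = require('util');

// Hypothetical stand-in for model.Complaint.createComplaint(data, callback).
function createComplaint(data, callback) {
  setImmediate(function () {
    callback(null, 'id-' + data.title);
  });
}

var createComplaintAsync = util.promisify(createComplaint);

// Starts all inserts, then calls back once with every id (or the first error).
function seed(seedData, callback) {
  var inserts = seedData.map(function (item) {
    return createComplaintAsync(item);
  });
  Promise.all(inserts).then(function (ids) {
    callback(null, ids);
  }, callback);
}

var seededIds;
seed([{ title: 'a' }, { title: 'b' }], function (err, ids) {
  seededIds = ids;
});
```

Unlike the manual counter, this version also reports a failure: if any insert fails, the callback receives that error instead of being called as if everything had succeeded.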
While coding in Node.js, I have encountered many situations where it is hard to implement elaborate logic mixed with database queries (I/O).
Consider an example written in Python. We need to iterate over an array of values and query the database for each one; then, based on the results, we compute the average.
def foo():
    a = [1, 2, 3, 4, 5]
    result = 0
    for i in a:
        record = find_from_db(i)  # I/O operation
        if not record:
            raise Error('No record exist for %d' % i)
        result += record.value
    return result / len(a)
The same task in Node.js
function foo(callback) {
var a = [1, 2, 3, 4, 5];
var result = 0;
var itemProcessed = 0;
var error;
function final() {
if (itemProcessed == a.length) {
if (error) {
callback(error);
} else {
callback(null, result / a.length);
}
}
}
a.forEach(function(i) {
// I/O operation
findFromDb(i, function(err, record) {
itemProcessed++;
if (err) {
error = err;
} else if (!record) {
error = 'No record exist for ' + i;
} else {
result += record.value;
}
final();
});
});
}
You can see that such code is much harder to write and read, and it is more prone to errors.
My questions:
Is there a way to make above Node.js code cleaner?
Imagine more sophisticated logic. For example, when we obtain a record from the db, we might need to do another db query based on some conditions. In Node.js that becomes a nightmare. What are common patterns for dealing with such tasks?
Based on your experience, does the performance gain justify the productivity loss when you code with Node.js?
Is there other asynchronous I/O framework/language that is easier to work with?
To answer your questions:
There are libraries such as async which provide a variety of solutions for common scenarios when working with asynchronous tasks. For "callback hell" concerns, there are many ways to avoid that as well, including (but not limited to) naming your functions and pulling them out, modularizing your code, and using promises.
More or less what you currently have is a fairly common pattern: having counter and function index variables with an array of functions to call. Again, async can help here because it reduces this kind of boilerplate that you will probably find yourself repeating often. async currently doesn't have methods that really allow for skipping individual tasks, but you could easily do this yourself if you are writing the boilerplate (just increment the function index variable by 2 for example).
From my own experience, if you properly design your JavaScript code with asynchrony in mind and make good use of tools like async, you will find it easier to develop with node. Writing asynchronous code in node is typically going to be more complicated than writing synchronous code (although less so with generators, fibers, etc. as compared to callbacks/promises).
I personally think that deciding on a language based upon that single aspect is not worthwhile. You have to consider much much more than just the design of the language, for example the size of the community, availability of third party libraries, performance, technical support options, ease of code debugging, etc.
Just write your code more compactly:
// parallel version
function foo (cb) {
var items = [ 1, 2, 3, 4, 5 ];
var pending = items.length;
var result = 0;
items.forEach(function (item) {
findFromDb(item, function (err, record) {
if (err) return cb(err);
if (!record) return cb(new Error('No record for: ' + item))
result += record.value / items.length;
if (-- pending === 0) cb(null, result);
});
});
}
That clocks in at 13 source lines of code compared to the 9 sloc for python that you posted. However, unlike the python that you posted, this code runs all the jobs in parallel.
To do the same thing in series, a trick I usually do is a next() function defined inline that invokes itself and pops a job off of an array:
// sequential version
function foo (cb) {
var items = [ 1, 2, 3, 4, 5 ];
var len = items.length;
var result = 0;
(function next () {
if (items.length === 0) return cb(null, result);
var item = items.shift();
findFromDb(item, function (err, record) {
if (err) return cb(err);
if (!record) return cb(new Error('No record for: ' + item))
result += record.value / len;
next();
});
})();
}
This time, 15 lines. The nice thing is that you can easily control whether the actions should happen in parallel or sequentially or somewhere in between. That is not so easy in a language like python where everything is synchronous and you've got to do lots of work-arounds like threads or evented libraries to get things back up to asynchronous. Try implementing a parallel version of what you have in python! It would most certainly be longer than the node version.
As for the promise/async route: it's not actually all that hard or bad to use ordinary functions for these relatively simple kinds of tasks. In the future (or in node 0.11+ with --harmony) you can use generators and a library like co, but that feature isn't widely deployed yet.
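For comparison, this is roughly what the sequential version looks like with async/await, which later became a standard part of the language. It is a sketch under assumptions: findFromDb is a hypothetical stand-in that follows the (item, callback) shape used above, and util.promisify is Node's built-in helper for wrapping such functions.

```javascript
var util = require('util');

// Hypothetical stand-in for findFromDb(item, callback).
function findFromDb(item, callback) {
  setImmediate(function () {
    callback(null, { value: item * 10 });
  });
}

var findFromDbAsync = util.promisify(findFromDb);

// Sequential version: each lookup starts only after the previous one finished.
async function foo() {
  var items = [1, 2, 3, 4, 5];
  var result = 0;
  for (var i = 0; i < items.length; i++) {
    var record = await findFromDbAsync(items[i]);
    if (!record) throw new Error('No record for: ' + items[i]);
    result += record.value;
  }
  return result / items.length;
}

var average = foo();
average.then(function (value) {
  console.log(value);
});
```

The control flow reads much like the Python original, and a failed lookup rejects the promise returned by foo() instead of needing a manual error variable.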
Everyone here seems to be suggesting async, which is a great library. But to give another suggestion, you should take a look at Promises, which is a new built-in being introduced to the language (and currently has several very good polyfills). It allows you to write asynchronous code in a way that looks much more structured. For example, take a look at this code:
var items = [ 1, 2, 3, 4 ];
var processItem = function(item, callback) {
// do something async ...
};
var values = [ ];
items.forEach(function(item) {
processItem(item, function(err, value) {
if (err) {
// something went wrong
}
values.push(value);
// all of the items have been processed, move on
if (values.length === items.length) {
doSomethingWithValues(values, function(err) {
if (err) {
// something went wrong
}
// and we're done
});
}
});
});
function doSomethingWithValues(values, callback) {
// do something async ...
}
Using promises, it would be written something like this:
var items = [ 1, 2, 3, 4 ];
var processItem = function(item) {
return new Promise(function(resolve, reject) {
// do something async ...
});
};
var doSomethingWithValues = function(values) {
return new Promise(function(resolve, reject) {
// do something async ...
});
};
// Promise.all returns a new promise that will resolve when all of the promises passed to it have resolved
Promise.all(items.map(processItem))
.then(doSomethingWithValues)
.then(function() {
// and we're done
})
.catch(function(err) {
// something went wrong
});
The second version is much cleaner and simpler, and that barely even scratches the surface of promises' real power. And, like I said, Promises are in ES6 as a new language built-in, so (eventually) you won't even need to load in a library; it will just be available.
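To show what the "do something async" placeholder might contain, here is a sketch that wraps a callback-style function in a promise by hand. doubleLater is a made-up example operation, not part of any library:

```javascript
// Hypothetical callback-style operation: doubles a number asynchronously.
function doubleLater(n, callback) {
  setImmediate(function () {
    if (typeof n !== 'number') {
      return callback(new TypeError('not a number: ' + n));
    }
    callback(null, n * 2);
  });
}

// Promise wrapper: resolve on success, reject on error.
function processItem(item) {
  return new Promise(function (resolve, reject) {
    doubleLater(item, function (err, value) {
      if (err) {
        reject(err);
      } else {
        resolve(value);
      }
    });
  });
}

// Resolves with every result once all items are done.
var allValues = Promise.all([1, 2, 3, 4].map(processItem));

// A single failure rejects the combined promise; catch turns it into a message.
var failure = Promise.all([1, 'x'].map(processItem)).catch(function (err) {
  return err.message;
});
```

The point of the wrapper is that an error passed to the callback becomes a rejection, so one .catch() at the end of the chain covers every step.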
Don't use anonymous (unnamed) functions: they make the code ugly and debugging much harder. Always name your functions and define them outside the calling function's scope rather than inline.
That is a real issue with Node.js (it is called callback hell or the pyramid of doom). You can solve it by using promises or async.js, which has many functions for handling different situations (waterfall, parallel, series, auto, ...).
The performance gain is absolutely a good thing, and the productivity loss is not that great once you start to master it. The Node.js community is also great.
Check async.js, q.
The more I work with async the more I love it and I like node more. Let me give you a simple example of what I have for a server initialization.
async.parallel ({
"job1": loadFromCollection1,
"job2": loadFromCollection2,
},
function (initError, results) {
if (initError) {
console.log ("[INIT] Server initialization error occurred: " + JSON.stringify(initError, null, 3));
return callback (initError);
}
// Do more stuff with the results
});
In fact, this very same approach can be followed and one can pass different arguments to the different functions that correspond to the various jobs; see for example Passing arguments to async.parallel in node.js.
To be perfectly honest with you, I prefer the node-way which is also non-blocking. I think node forces someone to have a better design and sometimes you spend time creating more definitions and grouping functions and objects in arrays so that you can write better code. The reason I think is that in the end you want to exploit some variant of async and mix and merge stuff accordingly. In my opinion, spending some extra time and thinking about the code a bit more is well worth it when you also take into account that node is asynchronous.
Other than that, I think it is a habit. The more one writes code for node, the more one improves and writes better asynchronous code. What is good on node is that it really forces someone to write more robust code since one starts respecting all the error codes from all the functions much more. For example, how often do people check, say if malloc or new have succeeded and one does not have an error handler for a NULL pointer after the command has been issued? Writing asynchronous code though forces one to respect the events and the error codes that the events have. I guess one obvious reason is that one respects the code that one writes and in the end we have to write code that returns errors so that caller knows what happened.
I really think that you need to give it more time and start working with async more. That's all.
"If you try to code business db logic using pure node.js, you go straight to callback hell"
I've recently created a simple abstraction named WaitFor to call async functions in sync mode (based on Fibers): https://github.com/luciotato/waitfor
check the database example:
Database example (pseudocode)
pure node.js (mild callback hell):
var db = require("some-db-abstraction");
function handleWithdrawal(req,res){
try {
var amount=req.param("amount");
db.select("* from sessions where session_id=?",req.param("session_id"),function(err,sessiondata) {
if (err) throw err;
db.select("* from accounts where user_id=?",sessiondata.user_ID,function(err,accountdata) {
if (err) throw err;
if (accountdata.balance < amount) throw new Error('insufficient funds');
db.execute("withdrawal(?,?)",accountdata.ID,req.param("amount"), function(err,data) {
if (err) throw err;
res.write("withdrawal OK, amount: "+ req.param("amount"));
db.select("balance from accounts where account_id=?", accountdata.ID,function(err,balance) {
if (err) throw err;
res.end("your current balance is " + balance.amount);
});
});
});
});
}
catch(err) {
res.end("Withdrawal error: " + err.message);
}
}
Note: Although the above code looks like it will catch the exceptions, it will not.
Catching exceptions in callback hell adds a lot of pain, and I'm not sure you will still have the 'res' parameter
to respond to the user. If somebody would like to fix this example... be my guest.
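This is not a fix for the whole example, just a minimal sketch of why the try/catch misses the error and what the usual alternative looks like: hand the error to a callback instead of throwing. failingQuery, handle and events are hypothetical names for the illustration.

```javascript
// Hypothetical async operation that reports failure through its callback.
function failingQuery(callback) {
  setImmediate(function () {
    callback(new Error('connection lost'));
  });
}

var events = [];

function handle(done) {
  try {
    failingQuery(function (err, data) {
      // This runs later, after the try block has already exited, so a
      // throw here would never reach the catch below. Forward it instead.
      if (err) return done(err);
      done(null, data);
    });
    events.push('try block finished');
  } catch (e) {
    // Only synchronous failures while starting the query would land here.
    events.push('caught synchronously');
  }
}

handle(function (err) {
  events.push(err ? 'error: ' + err.message : 'ok');
});
```

The try block finishes before the query reports anything, so the only reliable way to see the failure is the err argument passed along to done.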
using wait.for:
var db = require("some-db-abstraction"), wait=require('wait.for');
function handleWithdrawal(req,res){
try {
var amount=req.param("amount");
var sessiondata = wait.forMethod(db,"select","* from sessions where session_id=?",req.param("session_id"));
var accountdata = wait.forMethod(db,"select","* from accounts where user_id=?",sessiondata.user_ID);
if (accountdata.balance < amount) throw new Error('insufficient funds');
wait.forMethod(db,"execute","withdrawal(?,?)",accountdata.ID,req.param("amount"));
res.write("withdrawal OK, amount: "+ req.param("amount"));
var balance = wait.forMethod(db,"select","balance from accounts where account_id=?", accountdata.ID);
res.end("your current balance is " + balance.amount);
}
catch(err) {
res.end("Withdrawal error: " + err.message);
}
}
Note: Exceptions will be caught as expected.
db methods (db.select, db.execute) will be called with this=db
Your Code
In order to use wait.for, you'll have to STANDARDIZE YOUR CALLBACKS to function(err,data)
If you STANDARDIZE YOUR CALLBACKS, your code might look like:
var wait = require('wait.for');
//run in a Fiber
function process() {
var a = [1, 2, 3, 4, 5];
var result = 0;
a.forEach(function(i) {
// I/O operation
var record = wait.for(findFromDb,i); //call & wait for async function findFromDb(i,callback)
if (!record) throw new Error('No record exist for ' + i);
result += record.value;
});
return result/a.length;
}
function inAFiber(){
console.log('result is: ',process());
}
// run the loop in a Fiber (keep node spinning)
wait.launchFiber(inAFiber);
see? closer to python and no callback hell
I'm new to Node.js and am curious what the prescribed methodology is for running a loop on a process (repeatedly) where at the end of the execution some next step is to take place, but ONLY after all the iterations' callbacks have fired.
Specifically I'm making SQL calls and I need to close the sql connection after making a bunch of inserts and updates, but since they're all asynchronous, I have no way of knowing when all of them have in fact completed, so that I can call end() on the session.
Obviously this is a problem that extends far beyond this particular example, so, I'm not looking for the specific solution regarding sql, but more the general practice, which so far, I'm kind of stumped by.
What I'm doing now is setting a global counter to the length of the loop object and decrementing it in each callback to see when it reaches zero, but that feels REALLY kludgy, and I'm hoping there's a more elegant (and JavaScript-centric) way to achieve this monitoring.
TIA
There are a bunch of flow-control libraries available that apply patterns to help with this kind of thing. My favorite is async. If you wanted to run a bunch of SQL queries one after another in order, for instance, you might use series:
async.series([
function(cb) { sql.exec("SOME SQL", cb) },
function(cb) { sql.exec("SOME MORE SQL", cb) },
function(cb) { sql.exec("SOME OTHER SQL", cb) }
], function(err, results) {
// Here, one of two things is true:
// (1) one of the async functions passed in an error to its callback
// so async immediately calls this callback with a non-null "err" value
// (2) all of the async code is done, and "results" is
// an array of each of the results passed to the callbacks
});
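If you'd rather not pull in a library, the counter from the question can at least be hidden inside a small helper instead of living in a global. This is a sketch only: whenAllDone is a made-up name, and the session object is a hypothetical stand-in for a real SQL connection.

```javascript
// Returns a callback to hand to each async call; `done` fires exactly once,
// either on the first error or after `count` successful completions.
function whenAllDone(count, done) {
  var remaining = count;
  var finished = false;
  if (remaining === 0) {
    finished = true;
    done(null);
  }
  return function (err) {
    if (finished) return;
    if (err || --remaining === 0) {
      finished = true;
      done(err || null);
    }
  };
}

// Hypothetical stand-in for a SQL session.
var closed = false;
var session = {
  exec: function (statement, cb) {
    setTimeout(function () { cb(null); }, 5);
  },
  end: function () {
    closed = true;
  }
};

var statements = ['INSERT 1', 'INSERT 2', 'UPDATE 3'];
var finishedOne = whenAllDone(statements.length, function (err) {
  session.end(); // safe to close: every statement has called back (or one failed)
});

statements.forEach(function (statement) {
  session.exec(statement, finishedOne);
});
```

It is still the same counting idea, but the state is private to one batch, so several batches can run at once without sharing a global.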
I wrote my own queue library to do this (I'll publish it one of these days). Basically, you push queries onto a queue (just an array), execute each one as it's removed, and fire a callback when the array is empty.
It doesn't take much to do it.
*edit. I've added this example code. It isn't what I've used before and I haven't tried it in practice, but it should give you a starting point. There's a lot more you can do with the pattern.
One thing to note: queueing effectively makes your actions synchronous; they happen one after another. I wrote my mysql queue script so I could execute queries on multiple tables asynchronously but on any one table in sync, so that inserts and selects happened in the order they were requested.
var queue = function() {
this.queue = [];
/**
* Allows you to pass a callback to run, which is executed at the end
* This example uses a pattern where errors are returned from the
* functions added to the queue and then are passed to the callback
* for handling.
*/
this.run = function(callback){
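var errors = [];
while (this.queue.length > 0) {
// shift() removes the first entry, so the loop is guaranteed to end
// (delete would leave queue.length unchanged)
var fn = this.queue.shift();
errors[errors.length] = fn();
}
// the callback is optional
if (typeof callback === 'function') callback(errors);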
}
this.addToQueue = function(callback){
this.queue[this.queue.length] = callback;
}
}
use:
var q = new queue();
q.addToQueue(function(){
setTimeout(function(){console.log('1');}, 100);
});
q.addToQueue(function(){
setTimeout(function(){console.log('2');}, 50);
});
q.run(function(errors){
// called once every queued function has been started
});