I am running the following two functions one after the other. When runScript() finishes the callback function should be triggered. However these functions are running asynchronously, and data from the first function (which is required for the second function) returns as undefined.
I am new to callback functions, and am possibly approaching this incorrectly. I've omitted insertAssets() and insertPairs() for clarity, but their code is asynchronous and involves DB calls.
function runScript(cb) {
for(var name in exchange_objects) {
insertAssets(exchange_objects[name].assets, name); // runs some async code
}
cb();
}
runScript(function() {
for(var name in exchange_objects) {
insertPairs(exchange_objects[name].pairs, name); // runs some async code
};
});
Related
I am working with zombie.js to scrape one site, I must to use the callback style to connect to each url. The point is that I have got an urls array and I need to process each urls using an async function. This is my first approach:
Array urls = {http..., http...};
function process_url(index)
{
if(index == urls.length)
return;
async_function(url,
function() {
...
//parse the url
...
// Process the next url
process_url(index++);
}
);
}
process_url(0)
Without use someone third party nodejs library to use the asyn funtion as sync function or to wait for the function (wait.for, synchornized, mocha), this is the way that I though to resolve this problem, I don't know what would happen if the array is too big. Is the function released from the memory when the next function is called? or all the functions are in memory until the end?
Any ideas?
Your scheme will work. I call it "manually sequencing async operations".
A general purpose version of what you're doing would look like this:
function processItem(data, callback) {
// do your async function here
// for example, let's suppose it was an http request using the request module
request(data, callback);
}
function processArray(array, fn) {
var index = 0;
function next() {
if (index < array.length) {
fn(array[index++], function(err, result) {
// process error here
if (err) return;
// process result here
next();
});
}
}
next();
}
processArray(arr, processItem);
As to your specific questions:
I don't know what would happen if the array is too big. Is the
function released from the memory when the next function is called? or
all the functions are in memory until the end?
Memory in Javascript is released when it is not longer referenced by any running code and when the garbage collector gets time to run. Since you are running a series of asynchronous operations here, it is likely that the garbage collector gets a chance to run regularly while waiting for the http response from the async operation so memory could get cleaned up then. Functions are just another type of object in Javascript and they get garbage collected just like anything else. When they are no longer reference by running code, they are eligible for garbage collection.
In your specific code, because you are re-calling process_url() only in an async callback, there is no stack build-up (as in normal recursion). The prior instance of process_url() has already completed BEFORE the async callback is called and BEFORE you call the next iteration of process_url().
In general, management and coordination of multiple async operations is much, much easier using promises which are built into the current versions of node.js and are part of the ES6 ECMAScript standard. No external libraries are required to use promises in current versions of node.js.
For a list of a number of different techniques for sequencing your asynchronous operations on your array, both using promises and not using promises, see:
How to synchronize a sequence of promises?.
The first step in using promises is the "promisify" your async function so that it returns a promise instead of takes a callback.
function async_function_promise(url) {
return new Promise(function(resolve, reject) {
async_function(url, function(err, result) {
if (err) {
reject(err);
} else {
resolve(result);
}
});
});
}
Now, you have a version of your function that returns promises.
If you want your async operations to proceed one at a time so the next one doesn't start until the previous one has completed, then a usual design pattern for that is to use .reduce() like this:
function process_urls(array) {
return array.reduce(function(p, url) {
return p.then(function(priorResult) {
return async_function_promise(url);
});
}, Promise.resolve());
}
Then, you can call it like this:
var myArray = ["url1", "url2", ...];
process_urls(myArray).then(function(finalResult) {
// all of them are done here
}, function(err) {
// error here
});
There are also Promise libraries that have some helpful features that make this type of coding simpler. I, myself, use the Bluebird promise library. Here's how your code would look using Bluebird:
var Promise = require('bluebird');
var async_function_promise = Promise.promisify(async_function);
function process_urls(array) {
return Promise.map(array, async_function_promise, {concurrency: 1});
}
process_urls(myArray).then(function(allResults) {
// all of them are done here and allResults is an array of the results
}, function(err) {
// error here
});
Note, you can change the concurrency value to whatever you want here. For example, you would probably get faster end-to-end performance if you increased it to something between 2 and 5 (depends upon the server implementation on how this is best optimized).
I was reading this article on howtonode,
but don't get why it's fake async ?
So fake async is described as:
function asyncFake(data, callback) {
if(data === 'foo') callback(true);
else callback(false);
}
asyncFake('bar', function(result) {
// this callback is actually called synchronously!
});
Correct code: always async
function asyncReal(data, callback) {
process.nextTick(function() {
callback(data === 'foo');
});
}
My question is what's wrong with the first part of code ?
Why nextTick() can promise the 'right' effect ?...
Please explain to me. Thanks.
Nothing's wrong with the first case. However, it's just that you have to be proactive (as the developer) to know exactly what's happening in your code.
In the first case, you are not doing any I/O, and the callback is not actually being put into the Event Loop. It just the same as doing this:
if(data === 'foo')
return true;
else
return false;
However, in second case, you are putting the callback into the event loop until the next iteration.
In terms of how things work, nothing's wrong. However, you need to be aware of what implications are.
For instance:
function maybeSync(a, cb) {
if(a === 'a') {
cb('maybeSync called');
} else {
// put the called into event-loop for the next iteration
process.nextTick(function() {
cb('maybeSync called');
});
}
}
function definitelySync() {
console.log('definitelySync called');
}
someAsync('a', function(out) {
console.log(out);
});
definitelySync();
In the top case, which one gets called first?
If the input is "a" then the output is:
maybeSync called
definitelySync called
If the input is something else (e.g. not "a") then the output would be:
definitelySync called
maybeSync called
You need to make them consistent, as it would be easy to distinguish if the callback can be called sync/async based on the condition. Again comes back to being responsible, being consistent and being aware of what's happening in your code. =)
In the async fake function - you can see that callback is called as a normal function invocation i.e in the same call stack.
someone calls async fake fn
async fake fn calls the callback passed to it
If that callback takes time to complete - the async fake fn is kept waiting i.e it remains in call stack.
By using process tick - we are just submitting the callback function to be scheduled for execution.
This process tick call completes immediately and the asyncreal will return immediately. Thus the callback will be executed in asynchronous way.
Folks,
I have the following function, and am wondering whats the correct way to call the callback() only when the database operation completes on all items:
function mapSomething (callback) {
_.each(someArray, function (item) {
dao.dosomething(item.foo, function (err, account) {
item.email = account.email;
});
});
callback();
},
What I need is to iterate over someArray and do a database call for each element. After replacing all items in the array, I need to only then call the callback. Ofcourse the callback is in the incorrect place right now
Thanks!
The way you currently have it, callback is executed before any of the (async) tasks finish.
The async module has an each() that allows for a final callback:
var async = require('async');
// ...
function mapSomething (callback) {
async.each(someArray, function(item, cb) {
dao.dosomething(item.foo, function(err, account) {
if (err)
return cb(err);
item.email = account.email;
cb();
});
}, callback);
}
This will not wait for all your database calls to be done before calling callback(). It will launch all the database calls at once in parallel (I'm assuming that's what dao.dosomething() is). And, then immediately call callback() before any of the database calls have finished.
You have several choices to solve the problem.
You can use promises (by promisifying the database call) and then use Promise.all() to wait for all the database calls to be done.
You can use the async library to manage the coordination for you.
You can keep track of when each one is done yourself and when the last one is done, call your callback.
I would recommend options 1. or 2. Personally, I prefer to use promises and since you're interfacing with a database, this is probably not the only place you're making database calls, so I'd promisify the interface (bluebird will do that for you in one function call) and then use promises.
Here's what a promise solution could look like:
var Promise = require('bluebird');
// make promise version of your DB function
// ideally, you'd promisify the whole DB API with .promisifyAll()
var dosomething = Promise.promisify(dao.dosomething, dao);
function mapSomething (callback, errCallback) {
Promise.all(_.map(someArray, function(item) {
return dosomething(item.foo).then(function (account) {
item.email = account.email;
});
}).then(callback, errCallback);
}
This assumes you want to run all the DB calls in parallel and then call the callback when they are all done.
FYI, here's a link to how Bluebird promisify's existing APIs. I use this mechanism for all async file I/O in node and it saves a ton of coding time and makes error handling much more sane. Async callbacks are a nightmare for proper error handling, especially if exceptions can be thrown from async callbacks.
P.S. You may actually want your mapSomething() function to just return a promise itself so the caller is then responsible for specifying their own .then() handler and it allows the caller to use the returned promise for their own synchronization with other things (e.g. it's just more flexible that way).
function mapSomething() {
return Promise.all(_.map(someArray, function(item) {
return dosomething(item.foo).then(function (account) {
item.email = account.email;
});
})
}
mapSomething.then(mapSucessHandler, mapErrorHandler);
I haven't tried Bluebird's .map() myself, but once you've promisified the database call, I think it would simplify it a bit more like this:
function mapSomething() {
return Promise.map(someArray, function(item) {
return dosomething(item.foo).then(function (account) {
item.email = account.email;
});
})
}
mapSomething.then(mapSucessHandler, mapErrorHandler);
I am having difficulty using Fibers/Meteor.bindEnvironment(). I tried to have code updating and inserting to a collection if the collection starts empty. This is all supposed to be running server-side on startup.
function insertRecords() {
console.log("inserting...");
var client = Knox.createClient({
key: apikey,
secret: secret,
bucket: 'profile-testing'
});
console.log("created client");
client.list({ prefix: 'projects' }, function(err, data) {
if (err) {
console.log("Error in insertRecords");
}
for (var i = 0; i < data.Contents.length; i++) {
console.log(data.Contents[i].Key);
if (data.Contents[i].Key.split('/').pop() == "") {
Projects.insert({ name: data.Contents[i].Key, contents: [] });
} else if (data.Contents[i].Key.split('.').pop() == "jpg") {
Projects.update( { name: data.Contents[i].Key.substr(0,
data.Contents[i].Key.lastIndexOf('.')) },
{ $push: {contents: data.Contents[i].Key}} );
} else {
console.log(data.Contents[i].Key.split('.').pop());
}
}
});
}
if (Meteor.isServer) {
Meteor.startup(function () {
if (Projects.find().count() === 0) {
boundInsert = Meteor.bindEnvironment(insertRecords, function(err) {
if (err) {
console.log("error binding?");
console.log(err);
}
});
boundInsert();
}
});
}
My first time writing this, I got errors that I needed to wrap my callbacks in a Fiber() block, then on discussion on IRC someone recommending trying Meteor.bindEnvironment() instead, since that should be putting it in a Fiber. That didn't work (the only output I saw was inserting..., meaning that bindEnvironment() didn't throw an error, but it also doesn't run any of the code inside of the block). Then I got to this. My error now is: Error: Meteor code must always run within a Fiber. Try wrapping callbacks that you pass to non-Meteor libraries with Meteor.bindEnvironment.
I am new to Node and don't completely understand the concept of Fibers. My understanding is that they're analogous to threads in C/C++/every language with threading, but I don't understand what the implications extending to my server-side code are/why my code is throwing an error when trying to insert to a collection. Can anyone explain this to me?
Thank you.
You're using bindEnvironment slightly incorrectly. Because where its being used is already in a fiber and the callback that comes off the Knox client isn't in a fiber anymore.
There are two use cases of bindEnvironment (that i can think of, there could be more!):
You have a global variable that has to be altered but you don't want it to affect other user's sessions
You are managing a callback using a third party api/npm module (which looks to be the case)
Meteor.bindEnvironment creates a new Fiber and copies the current Fiber's variables and environment to the new Fiber. The point you need this is when you use your nom module's method callback.
Luckily there is an alternative that takes care of the callback waiting for you and binds the callback in a fiber called Meteor.wrapAsync.
So you could do this:
Your startup function already has a fiber and no callback so you don't need bindEnvironment here.
Meteor.startup(function () {
if (Projects.find().count() === 0) {
insertRecords();
}
});
And your insert records function (using wrapAsync) so you don't need a callback
function insertRecords() {
console.log("inserting...");
var client = Knox.createClient({
key: apikey,
secret: secret,
bucket: 'profile-testing'
});
client.listSync = Meteor.wrapAsync(client.list.bind(client));
console.log("created client");
try {
var data = client.listSync({ prefix: 'projects' });
}
catch(e) {
console.log(e);
}
if(!data) return;
for (var i = 1; i < data.Contents.length; i++) {
console.log(data.Contents[i].Key);
if (data.Contents[i].Key.split('/').pop() == "") {
Projects.insert({ name: data.Contents[i].Key, contents: [] });
} else if (data.Contents[i].Key.split('.').pop() == "jpg") {
Projects.update( { name: data.Contents[i].Key.substr(0,
data.Contents[i].Key.lastIndexOf('.')) },
{ $push: {contents: data.Contents[i].Key}} );
} else {
console.log(data.Contents[i].Key.split('.').pop());
}
}
});
A couple of things to keep in mind. Fibers aren't like threads. There is only a single thread in NodeJS.
Fibers are more like events that can run at the same time but without blocking each other if there is a waiting type scenario (e.g downloading a file from the internet).
So you can have synchronous code and not block the other user's events. They take turns to run but still run in a single thread. So this is how Meteor has synchronous code on the server side, that can wait for stuff, yet other user's won't be blocked by this and can do stuff because their code runs in a different fiber.
Chris Mather has a couple of good articles on this on http://eventedmind.com
What does Meteor.wrapAsync do?
Meteor.wrapAsync takes in the method you give it as the first parameter and runs it in the current fiber.
It also attaches a callback to it (it assumes the method takes a last param that has a callback where the first param is an error and the second the result such as function(err,result).
The callback is bound with Meteor.bindEnvironment and blocks the current Fiber until the callback is fired. As soon as the callback fires it returns the result or throws the err.
So it's very handy for converting asynchronous code into synchronous code since you can use the result of the method on the next line instead of using a callback and nesting deeper functions. It also takes care of the bindEnvironment for you so you don't have to worry about losing your fiber's scope.
Update Meteor._wrapAsync is now Meteor.wrapAsync and documented.
I found this code where it says that I can run some db queries asynchronously
var queries = [];
for (var i=0;i <1; i++) {
queries.push((function(j){
return function(callback) {
collection.find(
{value:"1"},
function(err_positive, result_positive) {
result_positive.count(function(err, count){
console.log("Total matches: " + count);
positives[j] = count;
callback();
});
}
);
}
})(i));
}
async.parallel(queries, function(){
// do the work with the results
}
I didn't get the part what is callback function how is it defined ? second in the queries.push, it is passing the function(j) what is j in this and what is this (i) for
queries.push((function(j){})(i));
I am totally confused how this code is working?
The loop is preparing an array of nearly-identical functions as tasks for async.parallel().
After the loop, given it only iterates once currently, queries would be similar to:
var queries = [
function (callback) {
collection.find(
// etc.
);
}
];
And, for each additional iteration, a new function (callback) { ... } would be added.
what is callback function how is it defined ?
callback is just a named argument for each of the functions. Its value will be defined by async.parallel() as another function which, when called, allows async to know when each of the tasks has completed.
second in the queries.push, it is passing the function(j) what is j in this and what is this (i) for
queries.push((function(j){})(i));
The (function(j){})(i) is an Immediately-Invoked Function Expression (IIFE) with j as a named argument, it's called immediately with i a passed argument, and returns a new function (callback) {} to be pushed onto queries:
queries.push(
(function (j) {
return function (callback) {
// etc.
};
})(i)
);
The purpose of the IIFE is to create a closure -- a lexical scope with local variables that is "stuck" to any functions also defined within the scope. This allows each function (callback) {} to have its own j with a single value of i (since i will continue to change value as the for loop continues).
Callback is one of the coolest features. Callback is just another normal function. You can actually pass a function itself as a parameter in another function.
Say function foo() does something. You may want to execute something else right after foo() is done executing. So, in order to achieve this, you define a function bar() and pass this function as a parameter to function foo()!
function foo(callback){
// do something
callback();
}
function bar(){
// do something else
}
foo(bar());
//here we can see that a function itself is being passed as a param to function foo.
For more understanding here's the right link.
And
queries.push((function(j){})(i));
in javascript, this is another way of calling a function.
function(){
// do something.
}();
You don't actually need to define a function name and it can be called directly without a name. You can even pass params to it.
function(a){
alert(a)'
}(10);