Here is the situation:
I am new to node.js, I have a 40MB file containing multilevel json file like:
[{},{},{}] This is an array of objects (~7000 objects). Each object has properties and a one of those properties is also an array of objects
I wrote a function to read the content of the file and iterate it. I succeeded to get what I wanted in terms of content but not usability. I thought that I wrote an async function that would allow node to serve other web requests while iterating the array but that is not the case. I would be very thankful if anyone can point me to what I've done wrong and how to rewrite it so I can have a non-blocking iteration. Here's the function that handles the situation:
function getContents(callback) {
fs.readFile(file, 'utf8', function (err, data) {
if (err) {
console.log('Error: ' + err);
return;
}
js = JSON.parse(data);
callback();
return;
});
}
getContents(iterateGlobalArr);
var count = 0;
function iterateGlobalArr() {
if (count < js.length) {
innerArr = js.nestedProp;
//iterate nutrients
innerArr.forEach(function(e, index) {
//some simple if condition here
});
var schema = {
//.....get props from forEach iteration
}
Model.create(schema, function(err, post) {
if(err) {
console.log('\ncreation error\n', err);
return;
}
if (!post) {
console.log('\nfailed to create post for schema:\n' + schema);
return;
}
});
count++;
process.nextTick(iterateGlobalArr);
}
else {
console.log("\nIteration finished");
next();
}
Just so it is clear how I've tested the above situation. I open two tabs one loading this iteration which takes some time and second with another node route which does not load until the iteration is over. So essentially I've written a blocking code but not sure how to re-factor it! I suspect that just because everything is happening in the callback I am unable to release the event loop to handle another request...
Your code is almost correct. What you are doing is inadvertently adding ALL the items to the very next tick... which still blocks.
The important piece of code is here:
Model.create(schema, function(err, post) {
if(err) {
console.log('\ncreation error\n', err);
return;
}
if (!post) {
console.log('\nfailed to create post for schema:\n' + schema);
return;
}
});
// add EVERYTHING to the very same next tick!
count++;
process.nextTick(iterateGlobalArr);
Let's say you are in tick A of the event loop when getContents() runs and count is 0. You enter iterateGlobalArr and you call Model.create. Because Model.create is async, it is returning immediately, causing process.nextTick() to add processing of item 1 to the next tick, let's say B. Then it calls iterateGlobalArr, which does the same thing, adding item 2 to the next tick, which is still B. Then item 3, and so on.
What you need to do is move the count increment and process.nextTick() into the callback of Model.create(). This will make sure the current item is processed before nextTick is invoked... which means next item is actually added to the next tick AFTER the model item has been created... which will give your app time to handle other things in between. The fixed version of iterateGlobalArr is here:
function iterateGlobalArr() {
if (count < js.length) {
innerArr = js.nestedProp;
//iterate nutrients
innerArr.forEach(function(e, index) {
//some simple if condition here
});
var schema = {
//.....get props from forEach iteration
}
Model.create(schema, function(err, post) {
// schedule our next item to be processed immediately.
count++;
process.nextTick(iterateGlobalArr);
// then move on to handling this result.
if(err) {
console.log('\ncreation error\n', err);
return;
}
if (!post) {
console.log('\nfailed to create post for schema:\n' + schema);
return;
}
});
}
else {
console.log("\nIteration finished");
next();
}
}
Note also that I would strongly suggest that you pass in your js and counter with each call to iterageGlobalArr, as it will make your iterateGlobalArr alot easier to debug, among other things, but that's another story.
Cheers!
Node is single-threaded so async will only help you if you are relying on another system/subsystem to do the work (a shell script, external database, web service etc). If you have to do the work in Node you are going to block while you do it.
It is possible to create one node process per core. This solution would result in only blocking one of the node processes and leave the rest to service your requests, but this feature is still listed as experimental http://nodejs.org/api/cluster.html.
A single instance of Node runs in a single thread. To take advantage
of multi-core systems the user will sometimes want to launch a cluster
of Node processes to handle the load.
The cluster module allows you to easily create child processes that
all share server ports.
Related
This is some of my code that I have in my index.js. Its waiting for the person to visit url.com/proxy and then it loads up my proxy page, which is really just a form which sends back an email and a code. From my MongoDB database, I grab the users order using the code, which contains some information I need (like product and the message they're trying to get). For some reason, it seems like its responding before it gets this information and then holds onto it for the next time the form is submitted.
The newline in my res.send(product + '\n' + message) isnt working either, but thats not a big deal right now.
But.. for example, the first time I fill out the form ill get a blank response. The second time, I'll get the response to whatever I filled in for the first form, and then the third time ill get the second response. I'm fairly new to Web Development, and feel like I'm doing something obviously wrong but can't seem to figure it out. Any help would be appreciated, thank you.
app.get('/proxy', function(req,res){
res.sendFile(__dirname+ "/views/proxy.html");
});
var message = "";
var product = "";
app.post('/getMessage', function(req,res)
{
returnMsg(req.body.user.code, req.body.user.email);
//res.setHeader('Content-Type', 'text/plain');
res.send(product + "\n" + message);
});
function returnMsg(code, email){
MongoClient.connect(url, function(err, db){
var cursor = db.collection('Orders').find( { "order_id" : Number(code) })
cursor.each(function(err, doc){
assert.equal(err, null);
if (doc!= null)
{
message = doc["message"];
product = doc["product"];
}
else {
console.log("wtf");
// error code here
}
});
console.log(email + " + " + message);
var document = {
"Email" : email,
"Message" : message
}
db.collection("Users").insertOne(document);
db.close();
});
}
You need to do lots of reading about your asynchronous programming works in node.js. There are significant design problems with this code:
You are using module level variables instead of request-level variables.
You are not correctly handling asynchronous responses.
All of this makes a server that simply does not work correctly. You've found one of the problems already. Your async response finishes AFTER you send your response so you end up sending the previously saved response not the current one. In addition, if multiple users are using your server, their responses will tromp on each other.
The core design principle here is first that you need to learn how to program with asynchronous operations. Any function that uses an asynchronous respons and wants to return that value back to the caller needs to accept a callback and deliver the async value via the callback or return a promise and return the value via a resolved promise. The caller then needs to use that callback or promise to fetch the async value when it is available and only send the response then.
In addition, all data associated with a request needs to stay "inside" the request handle or the request object - not in any module level or global variables. That keeps the request from one user from interfering with the requests from another user.
To understand how to return a value from a function with an asynchronous operation in it, see How do I return the response from an asynchronous call?.
What ends up happening in your code is this sequence of events:
Incoming request for /getMessage
You call returnMsg()
returnMsg initiates a connection to the database and then returns
Your request handler calls res.send() with whatever was previously in the message and product variables.
Then, sometime later, the database connect finishes and you call db.collection().find() and then iterate the cursor.
6/ Some time later, the cursor iteration has the first result which you put into your message and product variables (where those values sit until the next request comes in).
In working out how your code should actually work, there are some things about your logic that are unclear. You are assigning message and product inside of cursor.each(). Since cursor.each() is a loop that can run many iterations, which value of message and product do you actually want to use in the res.send()?
Assuming you want the last message and product value from your cursor.each() loop, you could do this:
app.post('/getMessage', function(req, res) {
returnMsg(req.body.user.code, req.body.user.email, function(err, message, product) {
if (err) {
// send some meaningful error response
res.status(500).end();
} else {
res.send(product + "\n" + message);
}
});
});
function returnMsg(code, email, callback) {
let callbackCalled = false;
MongoClient.connect(url, function(err, db) {
if (err) {
return callback(err);
}
var cursor = db.collection('Orders').find({
"order_id": Number(code)
});
var message = "";
var product = "";
cursor.each(function(err, doc) {
if (err) {
if (!callbackCalled) {
callback(err);
callbackCalled = true;
}
} else {
if (doc != null) {
message = doc["message"];
product = doc["product"];
} else {
console.log("wtf");
// error code here
}
}
});
if (message) {
console.log(email + " + " + message);
var document = {
"Email": email,
"Message": message
}
db.collection("Users").insertOne(document);
}
db.close();
if (!callbackCalled) {
callback(null, message, product);
}
});
}
Personally, I would use promises and use the promise interface in your database rather than callbacks.
This code is still just conceptual because it has other issues you need to deal with such as:
Proper error handling is still largely unfinished.
You aren't actually waiting for things like the insert.One() to finish before proceeding.
I have some confusions over nodejs and would like some help. I have a table called camps, contacts and camp_contact. I have to show the contact list based on the camp the user belongs to. I used async to loop through the camps, which I save in the user session, and then grab the data from mysql.
var array_myData = [];
async.each(req.session.user.camps, function(camps, callback) {
database.getConnection(function(err, connection) {
// Use the connection
connection.query('SELECT contacts.*, contact_camp.* '+
' FROM contacts JOIN contact_camp '+
' ON contact_camp.contact_id = contacts.id '+
' WHERE contact_camp.camp_id = ?',
[camps.camp_id], function(err,data){
if(err) {
//this will call the err function
callback('error');
}
else {
array_myData = array_myData.concat(data);
callback();
}
connection.release();
});
});
//final function call.
}, function(err){
// if any error happened, this function fires.
if( err ) {
// All processing will now stop.
} else {
res.render('contacts',
{
page
});
}
});
The code works fine. Now the thing I'm wondering about is, does using array.concat block the thread? if so, how can I change that? I read around and according to what I understood, I/O operation that are not asynchronous blocks the thread like reading from file or database. Does having array like this var array = ['a','b', 'c'] and looping through it would block the thread?
Lastly, is there a way to know if a code that I have written has blocked the thread or not? Because I get worried every time I write a function of my own.
I also get confused when a create a function with a callback like:
function test(param, fn) { do something; fn(); }
I'm not sure if this kind of function without a timer would block the thread or not.
Ok, if it helps you somehow:
I read around and according to what I understood, I/O operation that are not asynchronous blocks the thread like reading from file or database.
In NodeJS the operations on file descriptors are asynchronous => non-blocking
The file descriptors would be:
database operations,
open/close files
network operations
ex.:
fs.readFile('<pathToTheFile>', (err, data) => {
if (err) throw err;
console.log(data);
});
// if your file is very big, until it is read, maybe other requests will be finished
Does having array like this var array = ['a','b', 'c'] and looping through it would block the thread?
Yes, it will block the thread, but operations like this are not resource intensive. NodeJs is an event-driven language (single-threaded), but also, in a multi threaded or multi process language, the kernel thread would still be blocked by such operations => the same thing ... this is maybe off topic.
// if you do something like this
while(true){
function test(param, fn) { do something; fn(); }
}
// you will see that you just blocked the thread
I'm not sure if this kind of function without a timer would block the thread or not
normally you would put that function in a error handler, and it shouldn't block the thread if you're not doing an infinite while.
try{}catch(err){
// do something with the error
}
I have been working on nodeJS + MongoDB, using the Express and Mongoose frameworks for a few months, and I wanted to ask you guys what is really happening in a situation such as the following:
Model1.find({}, function (err, elems) {
if (err) {
console.log('ERROR');
} else {
elems.forEach(function (el) {
Model2.find({[QUERY RELATED WITH FIELDS IN 'el']}, function (err, elems2) {
if (err) {
console.log('ERROR');
} else {
//DO STAFF.
}
});
});
}
});
My best guess is that there's a main thread looping over elems, and then different threads attending each query over Model2, but I'm not really sure.
Is that correct? And also, is this a good solution? And if not, how would you code in a situation such as this, where you need the information in each of the elements you get from Model1 to get elements from Model2, and perform the actual functionality you are looking for?
I know I could elaborate a more complex query where I could get all the elements each of the 'el' in elems would yield, but I¡d rather not do that, because in that case i would be worried about the memory expense.
Also, I've been thinking about changing the data model, but I've gone over it and I'm confident it is well thought, and I don't think that's the best solution for my aplication.
Thanks!
NodeJS is a single threaded environment and it works asynchronously for blocking function calls such as network requests in your case. So there is only one thread and your query results will be called asynchronously so that nothing will be blocked due to intensive network operation.
In your scenario if the first query returns quite a lot of records such as 100000 thousands you may exhaust your mongo server in your loop as you will query your server as many as the result of first query instantly. This will happen because node won't stop for receiving the results of each query as it works asynchronously.
So usually manually throttling the requests to network operations is a good practice. This is not trivial when working on asynchronous environment. One way to do is to use recursive function call. Basically you split your tasks into groups and do each group in batch, once you are done with one batch you start with your next group.
Here is a simple example on how to do it, I have used promises instead of callback functions, Q is a promise library that is very useful for handling promises:
var rows = [...]; // array of many
function handleRecursively(startIndex, batchSize){
var promises = [];
for(i = 0; i < batchSize && i + batchSize < rows.length; i++){
var theRow = rows[startIndex + i];
promises.push(doAsynchronousJobWithTheRow(theRow));
}
//you wait until you handle all tasks in this iteration
Q.all(promises).then(function(){
startIndex += batchSize;
if(startIndex < rows.length){ // if there is still task to do continue with next batch
handleRecursively(startIndex, batchSize); }
})
}
handleRecursively(0, 1000);
Here is the best solution :
Model1.find({}, function (err, elems) {
if (err) {
console.log('ERROR');
} else {
loopAllElements(0,elems);
}
});
function loopAllElements(startIndex,elems){
if (startIndex==elems.length) {
return "success";
}else{
Model2.find({[QUERY RELATED WITH FIELDS IN elems[startIndex] ]}, function (err, elems2) {
if (err) {
console.log('ERROR');
return "error";
} else {
//DO STAFF.
loopAllElements(startIndex+1, elems);
}
});
}
}
I am having difficulty using Fibers/Meteor.bindEnvironment(). I tried to have code updating and inserting to a collection if the collection starts empty. This is all supposed to be running server-side on startup.
function insertRecords() {
console.log("inserting...");
var client = Knox.createClient({
key: apikey,
secret: secret,
bucket: 'profile-testing'
});
console.log("created client");
client.list({ prefix: 'projects' }, function(err, data) {
if (err) {
console.log("Error in insertRecords");
}
for (var i = 0; i < data.Contents.length; i++) {
console.log(data.Contents[i].Key);
if (data.Contents[i].Key.split('/').pop() == "") {
Projects.insert({ name: data.Contents[i].Key, contents: [] });
} else if (data.Contents[i].Key.split('.').pop() == "jpg") {
Projects.update( { name: data.Contents[i].Key.substr(0,
data.Contents[i].Key.lastIndexOf('.')) },
{ $push: {contents: data.Contents[i].Key}} );
} else {
console.log(data.Contents[i].Key.split('.').pop());
}
}
});
}
if (Meteor.isServer) {
Meteor.startup(function () {
if (Projects.find().count() === 0) {
boundInsert = Meteor.bindEnvironment(insertRecords, function(err) {
if (err) {
console.log("error binding?");
console.log(err);
}
});
boundInsert();
}
});
}
My first time writing this, I got errors that I needed to wrap my callbacks in a Fiber() block, then on discussion on IRC someone recommending trying Meteor.bindEnvironment() instead, since that should be putting it in a Fiber. That didn't work (the only output I saw was inserting..., meaning that bindEnvironment() didn't throw an error, but it also doesn't run any of the code inside of the block). Then I got to this. My error now is: Error: Meteor code must always run within a Fiber. Try wrapping callbacks that you pass to non-Meteor libraries with Meteor.bindEnvironment.
I am new to Node and don't completely understand the concept of Fibers. My understanding is that they're analogous to threads in C/C++/every language with threading, but I don't understand what the implications extending to my server-side code are/why my code is throwing an error when trying to insert to a collection. Can anyone explain this to me?
Thank you.
You're using bindEnvironment slightly incorrectly. Because where its being used is already in a fiber and the callback that comes off the Knox client isn't in a fiber anymore.
There are two use cases of bindEnvironment (that i can think of, there could be more!):
You have a global variable that has to be altered but you don't want it to affect other user's sessions
You are managing a callback using a third party api/npm module (which looks to be the case)
Meteor.bindEnvironment creates a new Fiber and copies the current Fiber's variables and environment to the new Fiber. The point you need this is when you use your nom module's method callback.
Luckily there is an alternative that takes care of the callback waiting for you and binds the callback in a fiber called Meteor.wrapAsync.
So you could do this:
Your startup function already has a fiber and no callback so you don't need bindEnvironment here.
Meteor.startup(function () {
if (Projects.find().count() === 0) {
insertRecords();
}
});
And your insert records function (using wrapAsync) so you don't need a callback
function insertRecords() {
console.log("inserting...");
var client = Knox.createClient({
key: apikey,
secret: secret,
bucket: 'profile-testing'
});
client.listSync = Meteor.wrapAsync(client.list.bind(client));
console.log("created client");
try {
var data = client.listSync({ prefix: 'projects' });
}
catch(e) {
console.log(e);
}
if(!data) return;
for (var i = 1; i < data.Contents.length; i++) {
console.log(data.Contents[i].Key);
if (data.Contents[i].Key.split('/').pop() == "") {
Projects.insert({ name: data.Contents[i].Key, contents: [] });
} else if (data.Contents[i].Key.split('.').pop() == "jpg") {
Projects.update( { name: data.Contents[i].Key.substr(0,
data.Contents[i].Key.lastIndexOf('.')) },
{ $push: {contents: data.Contents[i].Key}} );
} else {
console.log(data.Contents[i].Key.split('.').pop());
}
}
});
A couple of things to keep in mind. Fibers aren't like threads. There is only a single thread in NodeJS.
Fibers are more like events that can run at the same time but without blocking each other if there is a waiting type scenario (e.g downloading a file from the internet).
So you can have synchronous code and not block the other user's events. They take turns to run but still run in a single thread. So this is how Meteor has synchronous code on the server side, that can wait for stuff, yet other user's won't be blocked by this and can do stuff because their code runs in a different fiber.
Chris Mather has a couple of good articles on this on http://eventedmind.com
What does Meteor.wrapAsync do?
Meteor.wrapAsync takes in the method you give it as the first parameter and runs it in the current fiber.
It also attaches a callback to it (it assumes the method takes a last param that has a callback where the first param is an error and the second the result such as function(err,result).
The callback is bound with Meteor.bindEnvironment and blocks the current Fiber until the callback is fired. As soon as the callback fires it returns the result or throws the err.
So it's very handy for converting asynchronous code into synchronous code since you can use the result of the method on the next line instead of using a callback and nesting deeper functions. It also takes care of the bindEnvironment for you so you don't have to worry about losing your fiber's scope.
Update Meteor._wrapAsync is now Meteor.wrapAsync and documented.
I'm using Mongoose with Node.js and have the following code that will call the callback after all the save() calls has finished. However, I feel that this is a very dirty way of doing it and would like to see the proper way to get this done.
function setup(callback) {
// Clear the DB and load fixtures
Account.remove({}, addFixtureData);
function addFixtureData() {
// Load the fixtures
fs.readFile('./fixtures/account.json', 'utf8', function(err, data) {
if (err) { throw err; }
var jsonData = JSON.parse(data);
var count = 0;
jsonData.forEach(function(json) {
count++;
var account = new Account(json);
account.save(function(err) {
if (err) { throw err; }
if (--count == 0 && callback) callback();
});
});
});
}
}
You can clean up the code a bit by using a library like async or Step.
Also, I've written a small module that handles loading fixtures for you, so you just do:
var fixtures = require('./mongoose-fixtures');
fixtures.load('./fixtures/account.json', function(err) {
//Fixtures loaded, you're ready to go
};
Github:
https://github.com/powmedia/mongoose-fixtures
It will also load a directory of fixture files, or objects.
I did a talk about common asyncronous patterns (serial and parallel) and ways to solve them:
https://github.com/masylum/i-love-async
I hope its useful.
I've recently created simpler abstraction called wait.for to call async functions in sync mode (based on Fibers). It's at an early stage but works. It is at:
https://github.com/luciotato/waitfor
Using wait.for, you can call any standard nodejs async function, as if it were a sync function, without blocking node's event loop. You can code sequentially when you need it.
using wait.for your code will be:
//in a fiber
function setup(callback) {
// Clear the DB and load fixtures
wait.for(Account.remove,{});
// Load the fixtures
var data = wait.for(fs.readFile,'./fixtures/account.json', 'utf8');
var jsonData = JSON.parse(data);
jsonData.forEach(function(json) {
var account = new Account(json);
wait.forMethod(account,'save');
}
callback();
}
That's actually the proper way of doing it, more or less. What you're doing there is a parallel loop. You can abstract it into it's own "async parallel foreach" function if you want (and many do), but that's really the only way of doing a parallel loop.
Depending on what you intended, one thing that could be done differently is the error handling. Because you're throwing, if there's a single error, that callback will never get executed (count won't be decremented). So it might be better to do:
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
And handle the error in the callback. It's better node-convention-wise.
I would also change another thing to save you the trouble of incrementing count on every iteration:
var jsonData = JSON.parse(data)
, count = jsonData.length;
jsonData.forEach(function(json) {
var account = new Account(json);
account.save(function(err) {
if (err) return callback(err);
if (!--count) callback();
});
});
If you are already using underscore.js anywhere in your project, you can leverage the after method. You need to know how many async calls will be out there in advance, but aside from that it's a pretty elegant solution.