NodeJS callbacks: tracking db calls so I can terminate the process - node.js

I got a bill from Heroku this month, much to my surprise. It was only a few dollars, luckily, but I didn't think my usage had been that high. I checked the bill and it said I'd used about 1000 hours last month. I was briefly confused, since my app just runs for a few seconds every hour to send some emails, but then I realized that the process just wasn't terminating.
After commenting out swaths of my code, I've determined that the process doesn't exit because my mongoose database connection is still open. But I've got several nested callbacks to the database and then to mailgun to send these emails, and sometimes the mailgun callback has its own mailgun callback. How do I keep track of these and ensure that the database is closed at the end?

I asked my JS ninja friend, and he said to use semaphores. This sounded daunting but was actually incredibly easy.
npm install semaphore --save
Package page here. Then, for each of my database calls, I did this:
sem.take(function () {
  Object.find({key: value}, function () {
    sem.leave(); // (I don't need the database anymore)
    // tons of other code
  });
});
Then I made sure that all of that code runs before this:
sem.take(function () {
  sem.leave();
  db.close();
});
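For my own future reference, here is a fuller sketch of how the pieces fit together. The Subscriber model and sendEmail helper below are placeholders rather than my actual code, and I'm assuming the semaphore is created with capacity 1 so that take() callbacks run one at a time, in order:
var mongoose = require('mongoose');
var sem = require('semaphore')(1); // capacity 1: queued take() callbacks run one at a time, in order

mongoose.connect('mongodb://localhost/myapp');
var db = mongoose.connection;

// Placeholder model and mail helper -- substitute your own
var Subscriber = mongoose.model('Subscriber', { email: String, active: Boolean });
function sendEmail(address, done) { /* mailgun call would go here */ done(); }

// Every database call grabs a slot before it starts...
sem.take(function () {
  Subscriber.find({ active: true }, function (err, subscribers) {
    sem.leave(); // ...and releases it as soon as the query has returned
    (subscribers || []).forEach(function (s) {
      sendEmail(s.email, function () {});
    });
  });
});

// Slots are handed out in order, so this take() only runs after every
// earlier take() has called leave(); it is now safe to close the connection.
sem.take(function () {
  sem.leave();
  db.close(); // with the connection closed, the process can finally exit
});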
I think I probably could use a deeper understanding of what's going on, but this is working for now.

Related

setTimeouts not always working on node.js discord bot

I created a discord bot using Node.js and the discord.js-commando framework.
One of the features is to create writing sprints, which essentially is a timer, so you could say: I want to write for 20 minutes, starting in 5 minutes. The bot would then wait 5 minutes and start the sprint, then after 20 minutes it notifies the users doing it that it's ended and waits for wordcounts to come in, then posts the results.
This was working fine when the bot was only on one server, but it's been added to several more recently (78 according to the !stats command, though I don't know how many are actively using it), and since then, it's been very erratic.
Sometimes the sprint never starts, sometimes it never ends, sometimes it ends and then after you post your wordcounts, it never posts the final results.
This is my first ever dabbling with Node.js, so I don't know if I'm doing something wrong. I am doing all of the timers with the setTimeout function.
Here is the command file: link to GitHub
As an example, this is the timeout that is set after a user submits their wordcount, if everyone has now submitted their wordcount, so we can display the results:
msg.say('The word counts are in. Results coming up shortly...');
this.finished = 1;
// Clear original timeout
this.clear();
// Set new one
this.messageTimeout = setTimeout(function() {
  obj.finish(msg);
}, 10000);
Where clear is:
clear() {
  clearTimeout(this.messageTimeout);
}
Is there something inherently wrong with doing it this way? I know very little about Node.js... Should I perhaps look at doing a cron every minute instead to process sprints? Or could this be a server issue? I am running it on a free EC2 AWS server, but the reports all look okay, no resources are being used at abnormally high levels.
Thanks.

Is NodeJS suitable for websites? (i.e. port blocking)

I have gone through many painful months with this issue and I am now ready to let this go to the bin of "what-a-great-lie-for-websites-nodejs-is"!!!
All NodeJS tutorials discuss how to create a website. When done, it works - for one person at a time, though. All requests sent to the port will be blocked by the first-come-first-served situation. Why? Because most requests sent to the nodejs server have to get parsed, data requested from the database, data calculated and parsed, the response prepared and sent back to the ajax call. (This is a mere simple website example.)
Same applies for authentication - a request is made, data is parsed, authentication takes place, session is created and sent back to the requester.
No matter how you sugar-coat this - all requests are done this way. Yes, you can employ async functionality which will shorten the time spent on some portions, yes you can try promises, yes you can employ clustering, yes you can employ forking/spawning, etc... The result is always the same: the port gets blocked.
I tried responding with a reference so that we can use sockets to pass the data back and match it with the reference - that also blocked.
The point of the matter is this: when you ask for help, everyone wants all sorts of code examples, but nobody gets to the task of helping with an actual answer that works. The whole wide world!!!!! Which leads me to the answer that nodeJS is not suitable for websites.
I read many requests regarding this and all have been met with: "you must code properly"! Really? Is there no skilled and experienced NodeJS person who can lend an answer on this one????!!!!!
Most of the NodeJS coders come from the PHP side - All websites using PHP never have to utilise any workaround whatsoever in order to display a web page and it never blocks 2 people at the same time. Thanks to the web server.
So how come the NodeJS community cannot come to some sort of an answer on this one other than "code carefully"???!!!
They want examples - each example is met with: "well, that is blocking code, do it another way", or "code carefully"!!! Come on, people!!!
Think of the following scenario: one user, one page with 4 lists of records. Theoretically all lists should be filled with records independently. But because of how data is requested, prepared and returned, each list in reality ends up waiting for the previous one to finish. That is on one session alone.
What about 2 people, 2 sessions at the same time?
So my question is this: is NodeJS suitable for a website, and if it is, can anyone show and prove this with a short piece of code??? If you can't prove it, then the answer is: "NodeJS is not suitable for websites!"
Here is an example based on the simplest tutorial and it is still blocking:
var express = require('express'),
    fs = require("fs");

var app = express();

app.get('/heavyload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Heavy Load');
  res.end();
  fs.readFile(file, 'utf8', function (err, fileContent) {
  });
});

app.get('/lightload', function (req, res) {
  var file = '/media/sudoworx/www/sudo-sails/test.zip';
  res.send('Light Load');
  res.end();
});

app.listen(1337, function () {
  console.log('Listening!');
});
Now, if you go to "/heavyload" it will immediately respond, because that is the first thing sent to the browser, and then nodejs proceeds to read a heavy (large) file. If you now hit the second route "/lightload" at the same time, you will see that it waits for the file read from the first call to finish before it proceeds with the browser output. This is the simplest example of how nodejs fails at handling what would otherwise be simple in php and similar scripting languages.
As mentioned before, I have tried as many as 20 different ways to do this in my career as a nodejs programmer. I totally love nodejs, but I cannot get past this obstacle... This is not a complaint - it is a call for help because I am at the end of the road with nodejs and I don't know what to do.
I thank you kindly.
So here is what I found out. I will answer it with an example of blocking code:
for (var k = 0; k < 15000; k++){
  console.log('Something Output');
}
res.status(200).json({test:'Heavy Load'});
This will block because it has to do the for loop for a long time, and only after it finishes will it send the output.
Now if you do the same code like this it won't block:
function doStuff(){
  for (var k = 0; k < 15000; k++){
    console.log('Something Output');
  }
}

doStuff();
res.status(200).json({test:'Heavy Load'});
Why? Because the functions are run asynchronously...! So how will I then send the resulting response to the requesting client? Currently I am doing it as follows:
Run the doStuff function
Send a unique call reference which is then received by the ajax call on the client side.
Put the callback function of the client side into a waiting object.
Listen on a socket
When the doStuff function is completed, it should issue a socket message with the resulting response together with the unique reference
When the socket on the client side gets the message with the unique reference and the resulting response, it will then match it with the waiting callback function and run it.
Done! A bit of a workaround (as mentioned before), but it's working! It does require a socket to be listening. And that is my solution to this port-blocking situation in NodeJS.
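Roughly, the server half of that workaround could look something like the sketch below. socket.io is just one way to do the socket part, doStuff here is a made-up stand-in for the real long-running work, and the client-side code that matches the reference to the waiting callback isn't shown:
var express = require('express');
var app = express();
var http = require('http').Server(app);
var io = require('socket.io')(http);

// Stand-in for the real long-running work (DB queries, file reads, ...)
function doStuff(callback) {
  setTimeout(function () {
    callback({ test: 'Heavy Load' });
  }, 5000);
}

app.get('/heavyload', function (req, res) {
  var ref = Date.now() + '-' + Math.random(); // unique call reference
  res.status(200).json({ ref: ref });         // respond immediately with the reference only
  doStuff(function (result) {
    // work is done: push the result out over the socket; the client
    // matches `ref` against the callback it parked in its waiting object
    io.emit('result', { ref: ref, data: result });
  });
});

http.listen(1337, function () {
  console.log('Listening!');
});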
Is there some other way? I am hoping someone answers with another way, because I am still feeling like this is some workaround. Eish! ;)
is NodeJS suitable for a website
Definitely yes.
can anyone show and prove this with a short code
No.
All websites using PHP never have to utilise any workaround whatsoever
in order to display a web page and it never blocks 2 people at the
same time.
Node.js doesn't require any workarounds either; it simply works without blocking.
To more broadly respond to your question/complaint:
A single node.js machine can easily serve as a web-server for a website and handle multiple sessions (millions actually) without any need for workarounds.
If you're coming from a PHP web server, then instead of trying to migrate existing code to a new Node website, first play with a simple online Node.js + express website example. If that works well, you can start adding code that requires long-running work, like reading from DBs or reading/writing files, and verify that visitors aren't being blocked (they shouldn't be).
See this tutorial on how to get started.
UPDATE FOR EXAMPLE CODE
To fix the supplied example code, you should convert your fs.readFile call to fs.createReadStream. readFile is not recommended for large files; I don't think readFile literally blocked anything, but the need to allocate and move large amounts of bytes can choke the server. createReadStream works in chunks instead, which is much easier on the CPU and RAM:
var rstream = fs.createReadStream(file);
var length = 0;

rstream.on('data', function (chunk) {
  length += chunk.length;
  // Do something with the chunk
});

rstream.on('end', function () { // done
  console.log('file read! length = ' + length);
});
After switching your code to createReadStream, I'm able to serve continuous calls to heavyload/lightload in ~3ms each.
THE BETTER ANSWER I SHOULD HAVE WRITTEN
Node.js has a single-process architecture, but it also has multi-process capabilities: using the cluster module, you can write master/workers code that distributes the load across multiple workers in multiple processes.
You can also use pm2 to do the clustering for you; it has a built-in load balancer to distribute the work, and it also allows scaling up/down without downtime - see this.
In your case, while one process is reading/writing a large file, other processes can accept incoming requests and handle them.
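For reference, a bare-bones sketch of what that master/workers split looks like with the cluster module (adapted from the standard pattern, not taken from the question's code):
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Master process: fork one worker per CPU and replace any worker that dies
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' died, forking a replacement');
    cluster.fork();
  });
} else {
  // Worker processes all listen on the same port; incoming connections
  // are distributed across them, so one slow request can't stall the rest
  http.createServer(function (req, res) {
    res.writeHead(200);
    res.end('handled by pid ' + process.pid + '\n');
  }).listen(1337);
}
pm2 gives you the same effect without writing the master/worker code yourself, e.g. "pm2 start app.js -i max".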

Getting data out of a MongoDB call [duplicate]

This question already has answers here:
Why is my variable unaltered after I modify it inside of a function? - Asynchronous code reference
(7 answers)
Closed 5 years ago.
I am unable to retrieve data from my calls to MongoDB.
The calls work, as I can display results to the console, but when I try to write/copy those results to an external array so they are usable to my program outside the call, I get nothing.
EVERY example that I have seen does all its work within the connection loop. I cannot find any examples where results are copied to an array (either global or passed in), the connection ends, and the program continues processing the external array.
Most of the sample code out there is either too simplistic (i.e. just console.log within the connection loop) or way too complex, with examples of how to make express api routes. I don't need this as I am doing old-fashioned serial batch processing.
I understand that Mongo is built to be asynchronous but I should still be able to work with it.
var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');
var array = [];

MongoClient.connect('mongodb://localhost:27017/Lessons', function (err, db) {
  assert.equal(err, null);
  console.log("Connected to the 'Lessons' database");
  var collection = db.collection('students');
  collection.find().limit(10).toArray(function (err, docs) {
    // console.log(docs);
    array = docs.slice(0); // Cloning array
    console.log(array);
    db.close();
  });
});
console.log('database is closed');
console.log(array);
It looks like I'm trying to log the data before the loop has finished. But how to synchronize the timing?
If somebody could explain this to me I'd be really grateful as I've been staring at this stuff for days and am really feeling stupid.
From the code you have shared, do you want the array to display in the console.log at the end? This will not work with your current setup as the 2 console.log's at the end will run before the query to your database is complete.
You should grab your results with a callback function. If you're not sure what those are, you will need to learn, as mongo / node use them everywhere. Basically, javascript is designed to run really fast and won't wait for something to finish before going to the next line of code.
This tutorial helped me a lot when I was first learning: https://zellwk.com/blog/crud-express-mongodb/
Could you let us know what environment you are running this mongo request in? It would give more context, because right now I'm not sure how you are using mongo.
thanks for the quick response.
Environment is Windows7 with an instance of mongod running in the background so I'm connecting to localhost. I'm using a db that I created but you can use any collection to run the sample code.
I did say I thought it was a timing thing. Your observation "the 2 console.log's at the end will run before the query to your database is complete" really clarified the problem for me.
I replaced the code at the end, after the connect() with the following code:
function waitasecond(){
  console.log(array);
}
setTimeout(waitasecond, 2000);
And the array is fully populated. This suggests that what I was trying to do, at least the way I wanted to do it, is not possible. I think I have two choices.
Sequential processing (as I originally conceived) - I would have to put in some kind of time delay to let the db call finish before continuing.
Create a function with all the code for the processing that needs to be done and call it from inside the database callback when the database returns the data.
The first option is a bit smelly. I wouldn't want to see that in production, so I guess I'll take the second option.
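Something along these lines is what I'm picturing for the second option - the same code as above, just with the follow-on processing moved into a function that the database callback calls (sketched, not yet tested):
var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');

// Everything that used to sit after connect() now lives here,
// and only runs once the data has actually arrived.
function processStudents(array) {
  console.log('database is closed');
  console.log(array);
  // ...rest of the batch processing...
}

MongoClient.connect('mongodb://localhost:27017/Lessons', function (err, db) {
  assert.equal(err, null);
  console.log("Connected to the 'Lessons' database");
  var collection = db.collection('students');
  collection.find().limit(10).toArray(function (err, docs) {
    var array = docs.slice(0); // cloning the array
    db.close();
    processStudents(array);    // hand the results on instead of relying on a global
  });
});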
Thanks for the recommended link. I did a quick look, and the problem for me is that it describes a very common pattern that relies on express listening for router calls to respond. The processing I'm doing has nothing to do with router calls.
Ah, for the good old days of synchronous io.

Reasonable handling scheme for Node.JS async errors in unusual places

In Java, I am used to try..catch, with finally to clean up unused resources.
In Node.JS, I don't have that ability.
Odd errors can occur: for example, the database could shut down at any moment, any single table or file could be missing, etc.
With nested calls to db.query(..., function(err, results){..., it becomes tedious to call if(err) {send500(res); return} every time, especially if I have to clean up resources - for example, db.end() would definitely be appropriate.
How can one write async code so that the equivalent of both catch and finally blocks is covered?
I am already aware of the ability to restart the process, but I would like to use that as a last-resort only.
A full answer to this is pretty in depth, but it's a combination of:
consistently handling the error positional argument in callback functions. Doubling down here should be your first course of action.
You will see #izs refer to this as "boilerplate" because you need a lot of this whether you are doing callbacks or promises or flow control libraries. There is no great way to totally avoid this in node due to the async nature. However, you can minimize it by using things like helper functions, connect middleware, etc. For example, I have a helper callback function I use whenever I make a DB query and intend to send the results back as JSON for an API response. That function knows how to handle errors, not found, and how to send the response, so that reduces my boilerplate substantially.
use process.on('uncaughtException') as per #izs's blog post
use try/catch for the occasional synchronous API that throws exceptions. Rare but some libraries do this.
consider using domains. Domains will get you closer to the java paradigm but so far I don't see that much talk about them which leads me to expect they are not widely adopted yet in the node community.
consider using cluster. While not directly related, it generally goes hand in hand with this type of production robustness.
some libraries have top-level error events. For example, if you are using mongoose to talk to mongodb and the connection suddenly dies, the connection object will emit an error event.
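As a minimal sketch of that last point, with mongoose it is just a matter of subscribing to the connection's error event (what you do in the handler is up to you):
var mongoose = require('mongoose');
mongoose.connect('mongodb://localhost/mydb');

// Fires if the connection fails or an established connection later dies
mongoose.connection.on('error', function (err) {
  console.error('mongodb connection error', err);
  // decide here whether to retry, alert, or shut down gracefully
});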
Here's an example. The use case is a REST/JSON API backed by a database.
//shared error handling for all your REST GET requests
function normalREST(res, error, result) {
  if (error) {
    log.error("DB query failed", error);
    res.status(500).send(error);
    return;
  }
  if (!result) {
    res.status(404).send();
    return;
  }
  res.send(result); //handles arrays or objects OK
}

//Here's a route handler for /users/:id
function getUser(req, res) {
  db.User.findById(req.params.id, normalREST.bind(null, res));
}
And I think my takeaway is that overall, in JavaScript itself, error handling is basically woefully inadequate. In the browser, you refresh the page and get on with your life. In node, it's worse because you're trying to write a robust and long-lived server process. There is a completely epic issue comment on github that goes into great detail about how things are just fundamentally broken. I wouldn't get your hopes up of ever having JavaScript code you can point at and say "Look, Ma, state-of-the-art error handling". That said, in practice, if you follow the points I listed above, you can empirically write programs that are robust enough for production.
See also The 4 Keys to 100% Uptime with node.js.

Is there any way to launch async operation in node js upon exit?

During operation, my Node.js module requests some resources on a remote service that should really be released when it exits. We know that there is the very nice:
process.on('exit', function() {
// ...
});
But then it is said that it won't wait for any async operations to complete. So the question is whether there's any workaround (there should be one, since it's quite a widespread use case). Maybe one could start a separate process or something?..
The only workaround I've seen is adding a wait loop and not finishing/returning from the .on('exit', ...) handler until a property has been updated globally.
Totally a bodge-job design-wise, very bad practice, but I've seen it work for short calls (I think there is some timeout but I never bothered to look into the details).
I think you could/should do the clean-up before on('exit') fires, by listening for the ctrl-c signal as in this post.
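Along those lines, here is a minimal sketch of doing async cleanup on ctrl-c before exiting (releaseRemoteResources is a made-up stand-in for whatever the module actually needs to tear down):
// Stand-in for the real async cleanup against the remote service
function releaseRemoteResources(done) {
  setTimeout(function () { done(null); }, 500);
}

process.on('SIGINT', function () {
  console.log('caught SIGINT, cleaning up...');
  releaseRemoteResources(function (err) {
    if (err) console.error('cleanup failed:', err);
    // Unlike the 'exit' event, we can wait for async work here,
    // because we decide when the process actually exits.
    process.exit(err ? 1 : 0);
  });
});
The same handler can also be registered for 'SIGTERM'; it won't help with hard crashes or kill -9, though.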

Resources