Express 4 / Node JS - Gracefully managing uncaughtException - node.js

I try my very best to ensure that there are no errors in my code, but occasionally there is an uncaught exception that comes along and kills my app.
I could do with it not killing the app, but instead output it to a file somewhere, and try to resume the app where it left off - or restart quietly and show a nice message to all users on the application that something has gone wrong and to give it a sec while it sorts itself out.
In the event of the app not running, it'd be good if it could redirect users to somewhere that says "The app isn't running, get in touch to let me know" or something like that.
I could use process.on('uncaughtException') ... - but is this the right thing to do?
Thank you very much for taking the time to read this, and I appreciate your help and thoughts on this matter.

You can't actually resume after a crash, at least not without code written specifically for that purpose, such as code that saves and restores the app's state.
Otherwise, use the cluster module to restart the app.
// ... your code ...
var cluster = require('cluster');
process.on('uncaughtException', function (err) {
  // .. do with `err` as you please (e.g. log err.stack)
  cluster.fork(); // start another instance of the app (cluster.fork() only exists in the master process)
});
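The question also asks for uncaught errors to be written to a file somewhere. A minimal sketch of that part, assuming something else (the cluster master in this thread, PM2, forever) restarts the process: the handler logs synchronously and then exits instead of trying to resume. The function names and the log path are hypothetical.

```javascript
const fs = require('fs');

// Append the error and its stack trace to a log file. Synchronous I/O is
// deliberate: the process is about to exit, so an async write might never finish.
function writeCrashLog(err, logPath) {
  const details = err && err.stack ? err.stack : String(err);
  fs.appendFileSync(logPath, `[${new Date().toISOString()}] ${details}\n`);
}

// Log the error, then exit with a non-zero code so that whatever supervises the
// process (a cluster master, PM2, forever, ...) starts a fresh instance instead
// of letting this one continue in an unknown state.
function installCrashHandler(logPath) {
  process.on('uncaughtException', (err) => {
    try {
      writeCrashLog(err, logPath);
    } finally {
      process.exit(1);
    }
  });
}

// Usage (hypothetical path): installCrashHandler('/var/log/my-app/crash.log');
```

Exiting after logging follows the Node documentation's advice that the process is not in a reliable state after an uncaught exception.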
When it forks, how does it affect the users - do they experience any latency while it's switching?
Clusters are usually used to keep more than a single copy of your Node app running at all times, so that while one of the workers respawns, the others are still active and prevent any latency.
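if (cluster.isMaster) {
  // start one worker per CPU core
  require('os').cpus().forEach(function () { cluster.fork(); });
  // replace any worker that exits
  cluster.on('exit', function () { cluster.fork(); });
}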
Is there anything that I should look out for, e.g. say there was an error connecting to the database and I hadn't put in a handler to deal with that, so the app kept on crashing - would it just keep trying to fork and hog all the system resources?
I hadn't actually thought about that before now. It's a good concern.
Usually the errors are user-triggered, so they aren't expected to cause that kind of crash loop.
Database connection failures and other unrecoverable errors should probably be handled before the code starts creating the forks:
mongoose.connection.on('open', function () {
  // create forks here
});
mongoose.connection.on('error', function () {
  // don't start the app if database isn't working..
});
Alternatively, such errors could be identified so that no new forks are created for them. But you'll probably have to know in advance which errors those could be, so that you can handle them.
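Another safeguard against the crash loop raised above (a worker that keeps dying and keeps being re-forked) is to cap how often the master restarts workers. A minimal sketch; `createRestartLimiter` and its numbers are hypothetical, and the master wiring is shown only in comments.

```javascript
// Hypothetical restart limiter: allow at most `maxRestarts` forks within any
// `windowMs` period, and refuse further restarts once that budget is used up.
function createRestartLimiter(maxRestarts, windowMs) {
  const recent = []; // timestamps (ms) of restarts inside the current window
  return function allowRestart(now = Date.now()) {
    // Forget restarts that have fallen out of the window.
    while (recent.length > 0 && now - recent[0] >= windowMs) {
      recent.shift();
    }
    if (recent.length >= maxRestarts) {
      return false; // crash loop: stop forking instead of hogging the machine
    }
    recent.push(now);
    return true;
  };
}

// Possible use in the master process (not executed here):
//
// const cluster = require('cluster');
// const allowRestart = createRestartLimiter(5, 60 * 1000);
// cluster.on('exit', function () {
//   if (allowRestart()) cluster.fork();
//   else console.error('Too many crashes in the last minute; not restarting.');
// });
```

Once the limit is hit, the master stays up but stops forking, so the failure is visible in the logs instead of consuming the CPU with endless restarts.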

Related

Redis publish memory leak?

I know there are already many questions like this, but I can't find one that fits my implementation.
I'm using Redis in a Node.js environment, and it looks like redis.publish is leaking memory. I suspect some kind of "backpressure" issue, like the one described here:
Node redis publisher consuming too much memory
But as I understand it, Node needs to release that kind of pressure in a synchronous context; otherwise the Node event loop won't run, and neither will the GC.
My program looks like this:
const websocketApi = new WebsocketApi()
const currentState = {}
websocketApi.connect()
websocketApi.on('open', () => {
  channels.forEach((channel) => websocketApi.subscribeChannel(channel))
})
websocketApi.on('message', (message) => {
  const ob = JSON.parse(message)
  if (currentState[ob.id]) {
    currentState[ob.id] = update(currentState[ob.id], ob.data)
  } else {
    currentState[ob.id] = ob.data
  }
  const payload = {
    channel: ob.id,
    info: currentState[ob.id],
    timestamp: Date.now(),
    type: 'newData'
  }
  // when I remove this part, the memory is stable
  redisClient.publish(payload.channel, JSON.stringify(payload))
})
// to reconnect in case of error
websocketApi.on('close', () => websocketApi.connect())
It seems that the messages arrive too close together, so there isn't time to release the strings held by redis.publish.
Do you have any idea what is wrong with this code?
EDIT: More specifically, here is what I observe when I take memory dumps of my application:
The memory is saturated with strings that are my stringified JSON payloads, and with "chunks" of messages that are sent via Redis itself. Their references are held inside the redis client, mainly in variables called chunk.
Some string payloads are still released, but I create them much faster than they are freed.
When I don't publish the messages via Redis, the "currentState" variable grows up to a point and then stops growing. It obviously has a big RAM impact, but that's expected. The rest is fine and the application is stable at around 400 MB, but it explodes with the Redis publisher (PM2 restarts it because it reaches the max RAM capacity).
My feeling here is that I ask Redis to publish far more than it can handle, and Redis doesn't have time to finish publishing the messages. It still holds all the context, so it doesn't release anything. I may need some kind of "queue" to let Redis release some context and finish publishing the messages. Is that really a possibility, or am I going crazy?
Basically, every loop in my program is "independent". Is it possible to have as many Redis clients as I have loops? Is that a better idea? (IMHO, Node is single-threaded, so it won't help, but it may help V8 track memory references and release memory.)
The redis client buffers commands whenever it is not connected: because it has not connected yet, because its connection dropped, or because it failed to connect.
Make sure that you can connect to the Redis server, and that your program is actually connected to it. I would suggest adding a redisClient.on('connect') listener; if that event is never emitted, the client never connected.
If you are connected, the client shouldn't be buffering. To make the problem surface sooner, disable the offline queue by passing the option enable_offline_queue: false to createClient; attempts to send commands while not connected will then fail instead of being buffered.
You should also attach an error listener to the redisClient: redisClient.on('error', console.error.bind(console)). This might yield a message explaining why the client is buffering.
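If the client is connected and the publish rate simply exceeds what can be sent, the "queue" idea from the question can be sketched as a coalescing publisher: it keeps only the newest payload per channel and flushes on a timer. This is only appropriate if subscribers need the latest state rather than every intermediate update. `createCoalescingPublisher` and the interval are hypothetical; `publish` is whatever actually sends the message, for example a wrapper around redisClient.publish.

```javascript
// Hypothetical coalescing publisher. `publish(channel, message)` is supplied by
// the caller, e.g. (ch, msg) => redisClient.publish(ch, msg).
function createCoalescingPublisher(publish, intervalMs) {
  const pending = new Map(); // channel -> newest serialized payload

  function flush() {
    for (const [channel, message] of pending) {
      publish(channel, message);
    }
    pending.clear();
  }

  const timer = setInterval(flush, intervalMs);
  timer.unref(); // don't keep the process alive just for this timer

  return {
    // Overwrites any payload still waiting for the same channel, so a burst of
    // updates to one id keeps one string in memory, not one per message.
    enqueue(channel, payload) {
      pending.set(channel, JSON.stringify(payload));
    },
    flush,
    stop() {
      clearInterval(timer);
      flush(); // send whatever is still pending
    },
  };
}
```

In the question's message handler, `redisClient.publish(payload.channel, JSON.stringify(payload))` would become `publisher.enqueue(payload.channel, payload)`, which bounds memory by the number of channels rather than by the message rate.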

What is the "right" way to deal with EPIPE and other socket errors in Node.js?

The only way I have found to "catch" EPIPE errors thrown asynchronously by a socket timing out or closing prematurely is to directly attach an event handler to the socket object itself, as demonstrated in the documentation here:
https://nodejs.org/api/errors.html
const net = require('net');
const connection = net.connect('localhost');
// Adding an 'error' event handler to a stream:
connection.on('error', (err) => {
  // If the connection is reset by the server, or if it can't
  // connect at all, or on any sort of error encountered by
  // the connection, the error will be sent here.
  console.error(err);
});
This works, but is in many cases unhelpful -- if you're accessing a database or another service that has a node driver, the request and socket objects are likely inaccessible from your app code.
The most obvious solution is "don't do things that generate these errors" but since any non-trivial application is dependent on other services, no amount of input-checking in advance can guarantee that the service on the other end won't hang up unexpectedly, throwing an EPIPE in your code and in all likelihood crashing Node.
So, the options for handling this situation seem to be:
Let the error crash your app and use nodemon or supervisor to automatically restart. This isn't clean, but it seems like the only way to really guarantee you'll get back up and running safely.
Write custom connection clients for dependent services. This lets you attach error handlers where known problems could occur. But it violates DRY and means that you're now on the hook for maintaining your own custom client code when otherwise reasonable open source solutions already exist. Basically, it adds a huge maintenance burden for a slightly cleaner solution to a fairly rare problem.
Am I missing something, or are those the best options available?
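The reason an unhandled socket error crashes Node is a property of EventEmitter: when an emitter emits 'error' and has no 'error' listener, emit() throws the error. Many (not all) driver clients are themselves EventEmitters that re-emit socket errors on the client object they return (the redis client in the previous question is one example), so that object is often where a listener can be attached without touching the raw socket. A small demonstration using only the events module; the EPIPE error is simulated.

```javascript
const { EventEmitter } = require('events');

// Emit an 'error' event and report what happened instead of crashing.
function emitError(emitter, err) {
  try {
    emitter.emit('error', err);
    return 'handled by a listener';
  } catch (thrown) {
    return `thrown: ${thrown.message}`;
  }
}

// Simulated EPIPE, shaped like the errors sockets emit.
const epipe = Object.assign(new Error('write EPIPE'), { code: 'EPIPE' });

// No 'error' listener: emit() throws, which in real code becomes an uncaught exception.
const bare = new EventEmitter();
console.log(emitError(bare, epipe)); // thrown: write EPIPE

// With a listener (for a driver, attach it to the client object it exposes).
const guarded = new EventEmitter();
guarded.on('error', (err) => console.error('logged, not fatal:', err.code));
console.log(emitError(guarded, epipe)); // handled by a listener
```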

Async profiling nodejs server to review the code?

We have encountered a performance problem on our Node.js server, which serves 100k IPs every day.
Now we want to review the code and find the bottleneck.
@jfriend00 from what we can see now, the problem seems to be DB access and file access. But we don't know which logic triggers this access.
We are still looking for good ways to do async profiling of a Node.js server.
Here's what we tried:
Nodetime
This works for us to some extent. It can give the execution time of code down to individual lines. However, we can't locate the problem because the server works asynchronously, so no stack or call information can be determined.
Async-profiling
This works with async code and is said to be the first tool of its kind.
The problem is that we have to integrate its JS code with our server-side code:
var AsyncProfile = require('async-profile')
AsyncProfile.profile(function () {
  ///// OUR SERVER-SIDE CODE RESIDES HERE
  setTimeout(function () {
    // doAsyncStuff
  });
});
We can only record the profile of a single server execution for one request. Can we use this code with something like forever? I have no idea how.
dtrace
This is too general for us to locate the problem in Node.js code.
Do you have any idea on profiling nodejs server code? Any hints or suggestions are appreciated. Thanks.
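One way to find out which logic triggers the slow DB and file access is to tag each request with a context and attribute every instrumented operation's duration to it, even across callbacks and awaits. A rough sketch, assuming Node.js 12.17+ (for AsyncLocalStorage); `instrument`, `requestContext`, `stats`, the route key and `handleRequest` are hypothetical names, and the DB and file functions you wrap are your own.

```javascript
const { AsyncLocalStorage } = require('async_hooks');
const { performance } = require('perf_hooks');

// Per-request context; stats keyed by "route -> operation".
const requestContext = new AsyncLocalStorage();
const stats = new Map(); // key -> { count, totalMs }

// Wrap an async operation (a DB query, an fs call, ...) so each call's duration
// is attributed to the request that triggered it.
function instrument(opName, fn) {
  return async function (...args) {
    const ctx = requestContext.getStore();
    const key = `${ctx ? ctx.route : '(no request)'} -> ${opName}`;
    const start = performance.now();
    try {
      return await fn.apply(this, args);
    } finally {
      const entry = stats.get(key) || { count: 0, totalMs: 0 };
      entry.count += 1;
      entry.totalMs += performance.now() - start;
      stats.set(key, entry);
    }
  };
}

// In the HTTP server, run each request's handler inside a context, e.g.:
//   http.createServer((req, res) => {
//     // handleRequest is your own (hypothetical here) request handler
//     requestContext.run({ route: req.url }, () => handleRequest(req, res));
//   });
// and periodically print `stats`, sorted by totalMs, to find the hot spots.
```

Unlike wrapping the whole server in one profiling call, this keeps collecting across all requests, so it can run under forever or PM2 like any other code.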

Shutting down a Node.js http server in a unit test

Supposed I have some unit tests that test a web server. For reasons I don't want to discuss here (outside scope ;-)), every test needs a newly started server.
As long as I don't send a request to the server, everything is fine. But once I do, a call to the http server's close function does not work as expected, because every request made results in a kept-alive connection, so the server waits 120 seconds before actually closing.
Of course this is not acceptable for running the tests.
At the moment, the only solutions I can see are either:
setting the keep-alive timeout to 0, so a call to close will actually close the server,
or starting each server on a different port, although this becomes hard to manage when you have lots of tests.
Any other ideas on how to deal with this situation?
PS: A while ago I asked How do I shutdown a Node.js http(s) server immediately? and found a viable workaround, but it seems this workaround does not work reliably in every case, as I am getting strange results from time to time.
var http = require('http');
function createOneRequestServer() {
  var server = http.createServer(function (req, res) {
    res.write('write stuff');
    res.end();
    server.close(); // stop accepting new connections after the first request
  }).listen(8080);
}
You could also consider using child_process.fork to start the server in a separate process and kill that process after you have finished testing it.
var fork = require('child_process').fork;
var child = fork('serverModuleYouWishToTest.js');
function callback(signalCode) {
  child.kill(signalCode);
}
runYourTest(callback); // runYourTest is your own test function
This method is desirable because it does not require you to write special versions of your servers that only handle one request, and it keeps your test code and your production code 100% independent.
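A sketch that combines both ideas from the question without their drawbacks: listen on port 0 so the OS hands out a free port for each test, and track open sockets so stopping the server destroys idle keep-alive connections instead of waiting for them to time out. `startServer` and `stop` are hypothetical helpers; destroying sockets also aborts any in-flight request, which is usually what a test teardown wants. On Node.js 18.2+, `server.closeAllConnections()` can replace the manual socket tracking.

```javascript
const http = require('http');

// Start a fresh server on port 0: the OS picks a free port, so every test can
// get its own server without managing a list of port numbers.
function startServer(handler) {
  return new Promise((resolve, reject) => {
    const server = http.createServer(handler);
    const sockets = new Set(); // open connections, including idle keep-alive ones
    server.on('connection', (socket) => {
      sockets.add(socket);
      socket.on('close', () => sockets.delete(socket));
    });
    // Stop accepting connections, then destroy the open ones so close()
    // does not wait for keep-alive sockets to time out.
    function stop() {
      return new Promise((res, rej) => {
        server.close((err) => (err ? rej(err) : res()));
        for (const socket of sockets) socket.destroy();
      });
    }
    server.once('error', reject);
    server.listen(0, '127.0.0.1', () => {
      resolve({ server, port: server.address().port, stop });
    });
  });
}
```

In a test, `const { port, stop } = await startServer(app)` before each test and `await stop()` after it gives every test a brand-new server.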

How do I prevent node.js from crashing? try-catch doesn't work

In my experience, a PHP server would report an exception in the log or on the server side, but Node.js simply crashes. Surrounding my code with a try-catch doesn't work either, since everything is done asynchronously. I would like to know what everyone else does in their production servers.
PM2
First of all, I would highly recommend installing PM2 for Node.js. PM2 is really great at handling crashes, monitoring Node apps, and load balancing. PM2 immediately restarts the Node app whenever it crashes or stops for any reason, and can start it again even when the server restarts. So if, even after all our care, the app crashes someday, PM2 can restart it immediately. For more info, see Installing and Running PM2.
The other answers are dangerous, as you can read in Node's own documentation at http://nodejs.org/docs/latest/api/process.html#process_event_uncaughtexception
If you are using one of the other answers, read the Node docs:
Note that uncaughtException is a very crude mechanism for exception handling and may be removed in the future
Now, back to our solution for preventing the app itself from crashing.
After going through the docs, I finally settled on what the Node documentation itself suggests:
Don't use uncaughtException, use domains with cluster instead. If you do use uncaughtException, restart your application after every unhandled exception!
Domain with Cluster
What we actually do is send an error response to the request that triggered the error, while letting the others finish in their normal time, and stop listening for new requests in that worker.
In this way, domain usage goes hand-in-hand with the cluster module, since the master process can fork a new worker when a worker encounters an error. See the code below to understand what I mean.
By using Domain, and the resilience of separating our program into multiple worker processes using Cluster, we can react more appropriately, and handle errors with much greater safety.
var cluster = require('cluster');
var PORT = +process.env.PORT || 1337;
if (cluster.isMaster) {
  cluster.fork();
  cluster.fork();
  cluster.on('disconnect', function (worker) {
    console.error('disconnect!');
    cluster.fork();
  });
} else {
  var domain = require('domain');
  var server = require('http').createServer(function (req, res) {
    var d = domain.create();
    d.on('error', function (er) {
      //something unexpected occurred
      console.error('error', er.stack);
      try {
        //make sure we close down within 30 seconds
        var killtimer = setTimeout(function () {
          process.exit(1);
        }, 30000);
        // But don't keep the process open just for that!
        killtimer.unref();
        //stop taking new requests.
        server.close();
        //Let the master know we're dead. This will trigger a
        //'disconnect' in the cluster master, and then it will fork
        //a new worker.
        cluster.worker.disconnect();
        //send an error to the request that triggered the problem
        res.statusCode = 500;
        res.setHeader('content-type', 'text/plain');
        res.end('Oops, there was a problem!\n');
      } catch (er2) {
        //oh well, not much we can do at this point.
        console.error('Error sending 500!', er2.stack);
      }
    });
    //Because req and res were created before this domain existed,
    //we need to explicitly add them.
    d.add(req);
    d.add(res);
    //Now run the handler function in the domain.
    d.run(function () {
      //You'd put your fancy application logic here.
      handleRequest(req, res);
    });
  });
  server.listen(PORT);
}
Note, though, that Domain is pending deprecation and will be removed once its replacement arrives, as stated in Node's documentation:
This module is pending deprecation. Once a replacement API has been finalized, this module will be fully deprecated. Users who absolutely must have the functionality that domains provide may rely on it for the time being but should expect to have to migrate to a different solution in the future.
But until the replacement is introduced, Domain with Cluster is the only good solution the Node documentation suggests.
For an in-depth understanding of Domain and Cluster, read:
https://nodejs.org/api/domain.html#domain_domain (Stability: 0 - Deprecated)
https://nodejs.org/api/cluster.html
Thanks to @Stanley Luo for sharing this wonderful in-depth explanation of Cluster and Domains:
Cluster & Domains
I put this code right under my require statements and global declarations:
process.on('uncaughtException', function (err) {
console.error(err);
console.log("Node NOT Exiting...");
});
This works for me. The only thing I don't like about it is that I don't get as much information as I would if I just let the process crash.
As mentioned here, error.stack provides a more complete error message, including the line number that caused the error:
process.on('uncaughtException', function (error) {
  console.log(error.stack);
});
Try supervisor
npm install supervisor
supervisor app.js
Or you can install forever instead.
All this will do is recover your server when it crashes by restarting it.
forever can be used within the code to gracefully recover any processes that crash.
The forever docs have solid information on exit/error handling programmatically.
Using try-catch may catch some errors, but in more complex situations it won't do the job, for example with errors thrown inside async callbacks. Remember that in Node, any async function call can contain an operation that crashes the app.
Using uncaughtException is a workaround, but it is considered inefficient and is likely to be removed in future versions of Node, so don't count on it.
Ideal solution is to use domain: http://nodejs.org/api/domain.html
To make sure your app stays up and running even if your server crashes, use the following steps:
use Node's cluster module to fork multiple processes (typically one per core), so if one process dies, another one is booted automatically. Check out: http://nodejs.org/api/cluster.html
use domain to catch errors from async operations instead of using try-catch or uncaughtException. I'm not saying that try-catch or uncaughtException is bad, though!
use forever/supervisor to monitor your services
add a daemon to run your Node app: http://upstart.ubuntu.com
hope this helps!
Give the pm2 module a try; it is far more consistent and has great documentation. It is a production process manager for Node.js apps with a built-in load balancer. Please avoid uncaughtException for this problem.
https://github.com/Unitech/pm2
This works great with restify:
server.on('uncaughtException', function (req, res, route, err) {
  log.info('******* Begin Error *******\n%s\n*******\n%s\n******* End Error *******', route, err.stack);
  if (!res.headersSent) {
    return res.send(500, {ok: false});
  }
  res.write('\n');
  res.end();
});
By default, Node.js handles such exceptions by printing the stack trace to stderr and exiting with code 1, overriding any previously set process.exitCode.
process.on('uncaughtException', (err, origin) => {
  console.log(err);
});
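Registering an 'uncaughtException' listener like this replaces that default behavior: the process no longer exits. If the goal is only to log the error and still let Node exit (so a supervisor can restart it), Node.js 12.17+ and 13.7+ also provide an 'uncaughtExceptionMonitor' event, which is called before the default handling without changing it. A small demonstration that runs a throwing child process; the child script is only illustrative.

```javascript
const { spawnSync } = require('child_process');

// Child script: registers a monitor, then throws without catching.
const childScript = `
  process.on('uncaughtExceptionMonitor', (err, origin) => {
    // writeSync so the line is flushed before the process dies
    require('fs').writeSync(1, 'monitor saw: ' + origin + ' ' + err.message + '\\n');
  });
  throw new Error('boom');
`;

const result = spawnSync(process.execPath, ['-e', childScript], { encoding: 'utf8' });

console.log(result.stdout.trim()); // monitor saw: uncaughtException boom
console.log(result.status);        // 1: the monitor did not prevent the crash
```

The child still prints the stack trace to stderr and exits with code 1, so the monitor adds logging on top of the default behavior rather than keeping a broken process alive.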
UncaughtException is "a very crude mechanism" (so true) and domains are deprecated now. However, we still need some mechanism to catch errors around (logical) domains. The library:
https://github.com/vacuumlabs/yacol
can help you do this. With a little of extra writing you can have nice domain semantics all around your code!
