NodeJS - Print stack trace when stuck/frozen - node.js

Is it possible to print the stack trace of a Node.js app when it becomes very slow or freezes, to get information about performance spikes?
This would be incredibly helpful in cases where the steps to reproduce the issue are unknown.
In Java this saved hundreds of hours and was straightforward:
spawn a new "watchdog" thread
send a heartbeat every 50ms from the main thread to the watchdog
if the "watchdog" doesn't receive a heartbeat for more than 200ms, log the main thread's stack trace
Is something like this possible with Node.js?
FYI: the Node.js diagnostic report doesn't contain any JavaScript stack trace when it is triggered by a kill signal.

You are looking for a way to check whether the event loop is blocked or slow. There is an npm package, blocked-at (https://www.npmjs.com/package/blocked-at), that detects slow synchronous execution and reports where it started.
Usage:
const blocked = require('blocked-at');
blocked((time, stack) => {
console.log(`Blocked for ${time}ms, operation started here:`, stack)
});
You can also implement a check from scratch like this:
var interval = 500; // how often to sample, in ms
var blockDelta = 500; // report delays longer than this, in ms
var timer = setInterval(function() {
  var last = process.hrtime();
  setImmediate(function() {
    var delta = process.hrtime(last); // [seconds, nanoseconds]
    var deltaMs = delta[0] * 1e3 + delta[1] / 1e6;
    if (deltaMs > blockDelta) {
      console.log("node.eventloop_blocked", deltaMs);
    }
  });
}, interval);
The idea is: if a scheduled callback runs noticeably later than expected, the event loop was blocked by some operation in the meantime.
This snippet samples every 500 ms and reports when the event loop was blocked for more than 500 ms. It isn't perfect; I suggest using blocked-at for more robust control.
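The timer-based idea can also be sketched as a drift check: measure how much later than its period an interval actually fires. This is only a minimal sketch under stated assumptions. The names hrtimeToMs, lagMs and startLagMonitor are hypothetical helpers, not part of Node.js or blocked-at, and unlike blocked-at this reports only that a stall happened, not where it started.

```javascript
// Hypothetical helper: convert a process.hrtime() tuple [seconds, nanoseconds] to ms.
function hrtimeToMs(tuple) {
  return tuple[0] * 1e3 + tuple[1] / 1e6;
}

// Hypothetical helper: how much later than expected did a timer fire?
function lagMs(expectedMs, elapsedMs) {
  return Math.max(0, elapsedMs - expectedMs);
}

// Hypothetical helper: call onLag(lag) whenever the interval fires more than
// thresholdMs later than its period, which suggests the event loop was blocked.
function startLagMonitor(intervalMs, thresholdMs, onLag) {
  let last = process.hrtime();
  const timer = setInterval(() => {
    const elapsed = hrtimeToMs(process.hrtime(last));
    last = process.hrtime();
    const lag = lagMs(intervalMs, elapsed);
    if (lag > thresholdMs) {
      onLag(lag);
    }
  }, intervalMs);
  timer.unref(); // the monitor alone should not keep the process alive
  return timer;
}
```

It could be started with, for example, startLagMonitor(500, 200, lag => console.log('event loop lag (ms):', lag)). The thresholds are placeholders to tune for the application.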

Related

Amazon SQS with aws-sdk receiveMessage Stall

I'm using the aws-sdk node module with the (as far as I can tell) approved way to poll for messages.
Which basically sums up to:
sqs.receiveMessage({
QueueUrl: queueUrl,
MaxNumberOfMessages: 10,
WaitTimeSeconds: 20
}, function(err, data) {
if (err) {
logger.fatal('Error on Message Recieve');
logger.fatal(err);
} else {
// all good
if (undefined === data.Messages) {
logger.info('No Messages Object');
} else if (data.Messages.length > 0) {
logger.info('Messages Count: ' + data.Messages.length);
var delete_batch = [];
for (var x = 0; x < data.Messages.length; x++) {
// process
receiveMessage(data.Messages[x]);
// flag to delete (plain objects, not arrays with string keys)
delete_batch.push({
Id: data.Messages[x].MessageId,
ReceiptHandle: data.Messages[x].ReceiptHandle
});
}
if (delete_batch.length > 0) {
logger.info('Calling Delete');
sqs.deleteMessageBatch({
Entries: delete_batch,
QueueUrl: queueUrl
}, function(err, data) {
if (err) {
logger.fatal('Failed to delete messages');
logger.fatal(err);
} else {
logger.debug('Deleted recieved ok');
}
});
}
} else {
logger.info('No Messages Count');
}
}
});
receiveMessage is my "do stuff with collected messages if I have enough collected messages" function
Occasionally, my script stalls because I don't get a response from Amazon at all. For example, when there are no messages in the queue to consume, instead of hitting the WaitTimeSeconds limit and returning a "no messages" object, the callback is never called.
(I'm chalking this up to Amazon Weirdness)
What I'm asking is: what's the best way to detect and deal with this, given that I have some code in place to stop concurrent calls to receiveMessage?
The suggested answer here: Nodejs sqs queue processor also has code that prevents concurrent message request queries (granted, it's only fetching one message at a time)
I do have the whole thing wrapped in
var running = false;
runMonitorJob = setInterval(function() {
if (running) {
} else {
running = true;
// call SQS.receive
}
}, 500);
(With a running = false after the delete loop (not in its callback))
My solution would be
watchdogTimeout = setTimeout(function() {
running = false;
}, 30000);
But surely this would leave a pile of floating sqs.receive's lurking about and thus much memory over time?
(This job runs all the time, and I left it running on Friday, it stalled Saturday morning and hung till I manually restarted the job this morning)
Edit: I have seen cases where it hangs for ~5 minutes and then suddenly gets messages BUT with a wait time of 20 seconds it should throw a "no messages" after 20 seconds. So a WatchDog of ~10 minutes might be more practical (depending on the rest of ones business logic)
Edit: Yes Long Polling is already configured Queue Side.
Edit: This is under (latest) v2.3.9 of aws-sdk and NodeJS v4.4.4
I've been chasing this (or a similar) issue for a few days now and here's what I've noticed:
The receiveMessage call does eventually return although only after 120 seconds
Concurrent calls to receiveMessage are serialised by the AWS SDK library, so making multiple calls in parallel has no effect.
The receiveMessage callback does not error - in fact after the 120 seconds have passed, it may contain messages.
What can be done about this? This sort of thing can happen for a number of reasons and some/many of these things can't necessarily be fixed. The answer is to run multiple services each calling receiveMessage and processing the messages as they come - SQS supports this. At any time, one of these services may hit this 120 second lag but the other services should be able to continue on as normal.
My particular problem is that I have some critical singleton services that can't afford 120 seconds of down time. For this I will look into either 1) use HTTP instead of SQS to push messages into my service or 2) spawn slave processes around each of the singletons to fetch the messages from SQS and push them into the service.
I also ran into this issue, though with sendMessage rather than receiveMessage. I also saw hangs of exactly 120 seconds, and saw the same with a few other services, such as Firehose.
That led me to this entry in the AWS SDK documentation:
SQS Constructor
httpOptions:
timeout [Integer] — Sets the socket to timeout after timeout milliseconds of inactivity on the socket. Defaults to two minutes (120000).
To implement a fix, I overrode the timeout for the SQS client that performs sendMessage so that it times out after 10 seconds, and created another client with a 25-second timeout for receiving (where I long poll for 20 seconds):
var sendClient = new AWS.SQS({httpOptions:{timeout:10*1000}});
var receiveClient = new AWS.SQS({httpOptions:{timeout:25*1000}});
I've had this out in production for a week now and I've noticed that all of my SQS stalling issues have been eliminated.
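The watchdog idea from the question can be combined with this as a second line of defence. The following is a minimal sketch, not part of aws-sdk: callWithTimeout is a hypothetical helper that wraps any Node-style callback API and guarantees the callback runs exactly once, either with the real result or with a timeout error. It does not cancel the underlying request; closing the socket is still the job of the httpOptions timeout shown above.

```javascript
// Hypothetical helper: run fn(cb) and make sure `callback` is invoked exactly once,
// either with fn's result or with a timeout error after timeoutMs.
function callWithTimeout(fn, timeoutMs, callback) {
  let done = false;

  const timer = setTimeout(() => {
    if (done) return;
    done = true;
    callback(new Error('Timed out after ' + timeoutMs + ' ms'));
  }, timeoutMs);

  fn((err, data) => {
    if (done) return; // a late reply after the timeout is ignored
    done = true;
    clearTimeout(timer);
    callback(err, data);
  });
}
```

With the code from the question, this could be used as callWithTimeout(cb => sqs.receiveMessage(params, cb), 30000, handler), where params and handler stand for the existing request parameters and callback. Resetting the running flag inside handler then covers both the normal and the stalled case.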

How to forcibly keep a Node.js process from terminating?

TL;DR
What is the best way to forcibly keep a Node.js process running, i.e., keep its event loop from running empty and hence keeping the process from terminating? The best solution I could come up with was this:
const SOME_HUGE_INTERVAL = 1 << 30;
setInterval(() => {}, SOME_HUGE_INTERVAL);
This keeps an interval running without causing much disturbance, as long as the interval period is long enough.
Is there a better way to do it?
Long version of the question
I have a Node.js script using Edge.js to register a callback function so that it can be called from inside a .NET DLL. This function will be called once per second, sending a simple sequence number that should be printed to the console.
The Edge.js part is fine, everything is working. My only problem is that my Node.js process executes its script and after that it runs out of events to process. With its event loop empty, it just terminates, ignoring the fact that it should've kept running to be able to receive callbacks from the DLL.
My Node.js script:
var edge = require('edge');
var foo = edge.func({
assemblyFile: 'cs.dll',
typeName: 'cs.MyClass',
methodName: 'Foo'
});
// The callback function that will be called from C# code:
function callback(sequence) {
console.info('Sequence:', sequence);
}
// Register for a callback:
foo({ callback: callback }, true);
// My hack to keep the process alive:
setInterval(function() {}, 60000);
My C# code (the DLL):
public class MyClass
{
Func<object, Task<object>> Callback;
void Bar()
{
int sequence = 1;
while (true)
{
Callback(sequence++);
Thread.Sleep(1000);
}
}
public async Task<object> Foo(dynamic input)
{
// Receives the callback function that will be used:
Callback = (Func<object, Task<object>>)input.callback;
// Starts a new thread that will call back periodically:
(new Thread(Bar)).Start();
return new object { };
}
}
The only solution I could come up with was to register a timer with a long interval to call an empty function just to keep the scheduler busy and avoid getting the event loop empty so that the process keeps running forever.
Is there any way to do this better than I did? I.e., keep the process running without having to use this kind of "hack"?
The simplest, least intrusive solution
I honestly think my approach is the least intrusive one:
setInterval(() => {}, 1 << 30);
This will set a harmless interval that will fire approximately once every 12 days, effectively doing nothing, but keeping the process running.
Originally, my solution used Number.POSITIVE_INFINITY as the period, so the timer would actually never fire, but this behavior was recently changed by the API and now it doesn't accept anything greater than 2147483647 (i.e., 2 ** 31 - 1). See docs here and here.
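The numbers above can be checked with a short sketch. The names KEEP_ALIVE_PERIOD_MS and keepAlive are hypothetical and chosen for illustration; returning a release function is an added convenience, assuming the process should be able to exit once the callbacks are no longer needed.

```javascript
// Largest delay Node.js timers accept: 2 ** 31 - 1 ms (about 24.8 days).
const MAX_TIMER_MS = 2 ** 31 - 1;

// The period used above: 1 << 30 = 1073741824 ms, safely below the limit.
const KEEP_ALIVE_PERIOD_MS = 1 << 30;

// 1073741824 ms / 86400000 ms per day is roughly 12.43 days.
const periodInDays = KEEP_ALIVE_PERIOD_MS / (24 * 60 * 60 * 1000);

// Hypothetical helper: keep the event loop alive until release() is called.
function keepAlive() {
  const timer = setInterval(() => {}, KEEP_ALIVE_PERIOD_MS);
  return function release() {
    clearInterval(timer);
  };
}
```

Calling const release = keepAlive(); at startup and release(); when the process may exit avoids having to call process.exit() explicitly.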
Comments on other solutions
For reference, here are the other two answers given so far:
Joe's (deleted since then, but perfectly valid):
require('net').createServer().listen();
Will create a "bogus listener", as he called it. A minor downside is that we'd allocate a port just for that.
Jacob's:
process.stdin.resume();
Or the equivalent:
process.stdin.on("data", () => {});
This puts stdin into "old" mode, a deprecated feature that is still present in Node.js for compatibility with scripts written prior to Node.js v0.10 (reference).
I'd advise against it. Not only is it deprecated, it also unnecessarily messes with stdin.
Use "old" Streams mode to listen for a standard input that will never come:
// Start reading from stdin so we don't exit.
process.stdin.resume();
Here is an IIFE based on the accepted answer:
(function keepProcessRunning() {
setTimeout(keepProcessRunning, 1 << 30);
})();
and here is conditional exit:
let flag = true;
(function keepProcessRunning() {
setTimeout(() => flag && keepProcessRunning(), 1000);
})();
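You could use a setTimeout(function() {}, 2147483647); command to keep your script alive without overload. Note that Node.js clamps any delay larger than 2147483647 ms (2 ** 31 - 1) to 1 ms, so a bigger value such as 1000000000000000000 would fire almost immediately instead of keeping the process alive.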
spin up a nice repl, node would do the same if it didn't receive an exit code anyway:
import("repl").then(repl=>
repl.start({prompt:"\x1b[31m"+process.versions.node+": \x1b[0m"}));
I'll throw another hack into the mix. Here's how to do it with Promise:
new Promise(_ => null);
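Throw that at the bottom of your .js file. Note, however, that a pending Promise does not by itself keep the event loop alive, so this only works while some other handle (a timer, socket, or server) is still active.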

Node.js Synchronous Library Code Blocking Async Execution

Suppose you've got a 3rd-party library that's got a synchronous API. Naturally, attempting to use it in an async fashion yields undesirable results in the sense that you get blocked when trying to do multiple things in "parallel".
Are there any common patterns that allow us to use such libraries in an async fashion?
Consider the following example (using the async library from NPM for brevity):
var async = require('async');
function ts() {
return new Date().getTime();
}
var startTs = ts();
process.on('exit', function() {
console.log('Total Time: ~' + (ts() - startTs) + ' ms');
});
// This is a dummy function that simulates some 3rd-party synchronous code.
function vendorSyncCode() {
var future = ts() + 50; // ~50 ms in the future.
while(ts() <= future) {} // Spin to simulate blocking work.
}
// My code that handles the workload and uses `vendorSyncCode`.
function myTaskRunner(task, callback) {
// Do async stuff with `task`...
vendorSyncCode(task);
// Do more async stuff...
callback();
}
// Dummy workload.
var work = (function() {
var result = [];
for(var i = 0; i < 100; ++i) result.push(i);
return result;
})();
// Problem:
// -------
// The following two calls will take roughly the same amount of time to complete.
// In this case, ~6 seconds each.
async.each(work, myTaskRunner, function(err) {});
async.eachLimit(work, 10, myTaskRunner, function(err) {});
// Desired:
// --------
// The latter call with 10 "workers" should complete roughly an order of magnitude
// faster than the former.
Are fork/join or spawning worker processes manually my only options?
Yes, it is your only option.
If you need to use 50ms of CPU time to do something, and need to do it 10 times, then you'll need 500ms of CPU time to do it. If you want it to be done in less than 500ms of wall clock time, you need to use more CPUs. That means multiple node instances (or a C++ addon that pushes the work out onto the thread pool). How to get multiple instances depends on your app structure: a child that you feed the work to using child_process.send() is one way, running multiple servers with cluster is another. Breaking up your server is another way. Say it's an image store application that is mostly fast to process requests, unless someone asks to convert an image into another format, which is CPU intensive. You could push the image processing portion into a different app and access it through a REST API, leaving the main app server responsive.
If you aren't concerned that it takes 50ms of cpu to do the request, but instead you are concerned that you can't interleave handling of other requests with the processing of the cpu intensive request, then you could break the work up into small chunks, and schedule the next chunk with setInterval(). That's usually a horrid hack, though. Better to restructure the app.
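For illustration, the chunking approach might look like the sketch below. It is an assumption-laden example, not the answerer's code: processInChunks is a hypothetical helper, and it uses setImmediate rather than setInterval() to yield to the event loop between chunks. The total CPU time stays the same; only the interleaving with other callbacks improves.

```javascript
// Hypothetical helper: run handleItem over items in chunks of chunkSize,
// yielding to the event loop between chunks so other callbacks can run.
// `schedule` defaults to setImmediate and is injectable for testing.
function processInChunks(items, chunkSize, handleItem, done, schedule = setImmediate) {
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index]); // the synchronous, blocking part
    }
    if (index < items.length) {
      schedule(runChunk); // let pending I/O and timers run first
    } else {
      done();
    }
  }

  runChunk();
}
```

With the example from the question, this would be called as processInChunks(work, 10, vendorSyncCode, function() { console.log('done'); }). Each chunk still blocks for its own duration, so the chunk size bounds the worst-case delay seen by other requests.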

How to call task CreateNew()/ ContinueWith() periodically?

I'm new to TPL and .NET 4, and I'm stuck implementing the following multithreaded design.
What I want to do is check the serial ports and send out data, then update the list box on the UI (WPF), then wait for 1 second and do it all over again.
What I have done is:
//from WPF UI thread
var uiThreadTaskScheduler = TaskScheduler.FromCurrentSynchronizationContext();
var bgTask = Task<MonitorStatus>.Factory.StartNew(() =>
{
MonitorStatus status = new MonitorStatus();
//some time consuming job on serial ports
return status;
});
bgTask.ContinueWith(task =>
{
MonitorStatus status = task.Result;
//update the list box
}, uiThreadTaskScheduler);
What I love most is StartNew() and ContinueWith(): they start a time-consuming job on another thread and come back to the UI thread with task.Result so I can update the UI. No explicit synchronization object needed!
But how can I keep this running again and again at a 1-second interval?
I want to re-run the whole thing at the end of the code in ContinueWith(), so that it never stops. But how?
Another solution I have in mind is to use System.Threading.Timer, but its callback runs on a thread other than the UI thread, which is not as convenient as ContinueWith().

How to stop (or terminate) MPI_Recv after some particular time when there is a deadlock in MPI?

I am trying to detect deadlocks in MPI.
Is there any method by which we can return from a function like MPI_Recv after a particular time?
MPI_Recv is a blocking function and will just sit there until it receives the data it is waiting for, so if you want it to time out and report an error when things lock up, I don't think it's the one for you.
You could look into using MPI_Irecv, which is the non-blocking version. You could then emulate the blocking behaviour of MPI_Recv using MPI_Wait or MPI_Test.
If you use a combination of MPI_Irecv and MPI_Test, you could write a snippet that waits to receive for a specified length of time, then raises an error if nothing has arrived. Rough example:
MPI_Irecv(..., &request); //start a receive request, non-blocking
time_t start_time = time(NULL); //get start time (needs <time.h>)
int gotData = 0; //set to non-zero by MPI_Test once the receive completes
MPI_Test(&request, &gotData, ...); //test, have we got it yet
//loop until we have received, or taken too long
while (!gotData && difftime(time(NULL), start_time) < TIMEOUT_TIME) {
//wait a bit.
MPI_Test(&request, &gotData, ...); //test again
}
//By now we either have received the data, or taken too long, so...
if (!gotData) {
//we must have timed out
MPI_Cancel(&request);
MPI_Request_free(&request);
//throw an error
}

Resources