How to debug mystery ENOTFOUND? - node.js

I am getting random restarts on a PM2 managed nodejs cluster. The only symptom I get on the error log is of the following pattern - an ENOTFOUND on dns.js.
Error: getaddrinfo ENOTFOUND walkinto.inhttp walkinto.inhttp:80
at errnoException (dns.js:28:10)
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:76:26)
Clearly the problem is a malformed server name: walkinto.inhttp is incorrect and it should be walkinto.in. The challenge is that this host name is not hard-coded anywhere. This fairly large code base performs name resolution in many places, and the host names are generated dynamically.
I have spent considerable time trying to pinpoint the root cause but so far have had no luck. I need help printing more log information from dns.js; a call stack would probably help me move forward.
Q1 : How to enable more detailed logging on nodejs core modules?
Q2 : What could cause a nodejs restart to happen for an ENOTFOUND? How to avoid a restart - This path is not desirable.
Q3: Is there a smarter way to troubleshoot this problem?

Since there's no way for us to help you solve the issue without some code to go on, I'll answer your questions:
How to enable more detailed logging on nodejs core modules?
Run node with the inspect option and attach to the debugger with Chrome DevTools or another application. See these links:
https://nodejs.org/api/debugger.html
https://nodejs.org/en/docs/guides/debugging-getting-started/
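If attaching a debugger to a production cluster is not practical, another option is to wrap dns.lookup so that a failed lookup is logged together with the stack of the code that requested it. The following is only a minimal sketch: traceLookup and installLookupTracing are hypothetical helper names, it assumes a reasonably recent Node.js, and the captured stack only points at your own code when the lookup is triggered synchronously by the caller.

```javascript
const dns = require('dns');

// Wrap a dns.lookup-style function so that a failed lookup is logged together
// with the stack of the code that requested it.
function traceLookup(lookup, log = console.error) {
  return function tracedLookup(hostname, options, callback) {
    // dns.lookup(hostname, callback) is a valid call shape too.
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    // Capture the stack now: once the callback fires, the caller's frames are gone.
    const origin = new Error(`lookup requested for "${hostname}"`);
    return lookup.call(this, hostname, options, (err, ...rest) => {
      if (err) {
        log(`${err.code || err.message}\n${origin.stack}`);
      }
      callback(err, ...rest);
    });
  };
}

// Hypothetical opt-in: call once at startup, before any request is made.
function installLookupTracing() {
  dns.lookup = traceLookup(dns.lookup);
}
```

Replacing dns.lookup this way affects the whole process, so it is probably best kept as a temporary diagnostic rather than left in permanently.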
What could cause a nodejs restart to happen for an ENOTFOUND? How to avoid a restart - This path is not desirable.
The Node runtime isn't restarting. The error you're seeing is generated from something similar to throw new Error(`getaddrinfo ${err}`), and any uncaught error from throw will crash the runtime.
The restart is happening because you run the app via PM2, and it can be disabled by passing the --no-autorestart option to PM2. To keep the application from crashing, handle the error where it is raised and try to recover from it: wrap synchronous code in a try/catch block, and for asynchronous APIs such as http.request handle the error through the callback or the 'error' event, because a try/catch around the call will not catch it.
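As a rough illustration of recovering from a failed lookup instead of crashing, here is a sketch of an HTTP GET helper that reports failures through a callback. safeGet is a hypothetical name, and the options you pass are whatever your code already generates; nothing here is taken from the asker's code base.

```javascript
const http = require('http');

// Issue a GET request and report failures through the callback instead of
// letting an unhandled 'error' event terminate the process.
function safeGet(options, callback) {
  let settled = false;
  const finish = (err, status) => {
    if (settled) return; // make sure the callback runs only once
    settled = true;
    callback(err, status);
  };
  const req = http.get(options, (res) => {
    res.resume(); // discard the body; this sketch only reports the status code
    finish(null, res.statusCode);
  });
  // Failures such as ENOTFOUND are delivered here, asynchronously,
  // so a try/catch around http.get() would not see them.
  req.on('error', (err) => finish(err));
  return req;
}
```

With something like this in place, the caller can log err.code together with the offending host name and carry on, rather than relying on PM2 to restart the worker.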
Are there any other smarter way to trouble shoot this problem?
This is most likely not an issue with the dns stdlib module. If I understand correctly, you are performing name resolutions on dynamically generated data, and that is most likely your issue. Somewhere in the code you have one or more functions that are either not validating the generated data or are generating invalid data due to a bug. We can't help you solve that unfortunately, since you haven't provided any code to go on. Would be great if you could try to pinpoint what code might cause this and update the question with it.

I was getting this error in my request that was something like this:
var optionsSearch = {
  host: 'https://mysite.sharepoint.com',
  path: '_api/search/query?querytext="sharepoint"',
  method: 'GET'
};
All I did was remove the https:// prefix, leaving only mysite.sharepoint.com as the host, and it was fixed.
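If the request targets start out as full URLs, one way to avoid this class of mistake is to let the built-in URL class split them into the pieces the request options expect. This is only a sketch: toRequestOptions is a hypothetical helper name, and it assumes a Node.js version where URL is available as a global.

```javascript
// Split a full URL into the pieces that http.request / https.request expect.
function toRequestOptions(fullUrl, method = 'GET') {
  // Throws a TypeError if fullUrl is not a valid absolute URL.
  const url = new URL(fullUrl);
  return {
    protocol: url.protocol,          // e.g. 'https:'; tells the caller which module to use
    hostname: url.hostname,          // bare host name, no scheme and no path
    port: url.port || undefined,     // empty string means the protocol's default port
    path: url.pathname + url.search, // always starts with '/'
    method,
  };
}
```

Note that this only separates scheme, host and path. A string such as http://walkinto.inhttp is still a syntactically valid URL, so it would not be rejected here.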

Related

Test execution randomly aborted by an issue originated in the request-pipeline of Hammerhead (Testcafe e2e tests)

thank you for looking into this!
We are running a quite comprehensive test suite (some hundreds of tests) with the goal of making sure that our tracking implementation works as expected. We execute these tests via CI four times a day. For a few weeks now we have had random test aborts, which are unfortunately extremely hard to track and reproduce.
What is the Current behavior?
Errors: Unhandled promise rejection:
Error [ERR_HTTP2_INVALID_SESSION]: The session has been destroyed
at new NodeError (node:internal/errors:371:5)
at ClientHttp2Session.request (node:internal/http2/core:1702:13)
at DestinationRequest._sendRealThroughHttp2 (/home/ec2-user/actions-runner/_work/ds_cerberus/ds_cerberus/node_modules/testcafe-hammerhead/lib/request-pipeline/destination-request/index.js:51:32)
at DestinationRequest._send (/home/ec2-user/actions-runner/_work/ds_cerberus/ds_cerberus/node_modules/testcafe-hammerhead/lib/request-pipeline/destination-request/index.js:110:18)
at runMicrotasks (:null:null)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
To us it looks like a race condition inside the hammerhead library, which TestCafe uses as a proxy. But we have no idea how to fix this, or at least how to make sure that it does not abort the whole suite of tests.
This is the stack trace when the tests are aborted. Unfortunately the abort stops the execution of all tests rather than affecting a single test, which renders the whole suite pretty useless for us.
Steps to Reproduce
There seem to be more aborts when the tests are executed as part of the suite than when they are run individually, but even then it is quite hard to identify a pattern.
TestCafe version
"testcafe": "^1.17.1",
Node.js version
node-version: '16.x'
Command-line arguments
testcafe --config-file .testcaferc-dev.json tests
The issue may be related to HTTP/2 requests. So, you can disable it using the following option: disableHttp2. Check if the issue is reproduced after that.
You can also try to increase timeouts, e.g. ajaxRequestTimeout and testExecutionTimeout.
If this does not help, please create a simple project where the issue is reproducible and share it here. We will research it on our side.

nodejs server gives bad request(400) errors. Why ?

I am using nodejs (0.12) and express (3.1.0).
My server runs perfectly for some time, but then it starts returning 400 (Bad Request) and keeps returning 400 for all subsequent requests.
message: "Error: Bad Request
at SendStream.error (/var/www/storehippo/node_modules/express/node_modules/send/lib/send.js:145:16)
at SendStream.pipe (/var/www/storehippo/node_modules/express/node_modules/send/lib/send.js:298:31)
at Object.static (/var/www/storehippo/node_modules/express/node_modules/connect/lib/middleware/static.js:83:8)
at Object.handle (eval at eval at wrapHandle (/var/www/storehippo/node_modules/newrelic/lib/instrumentation/connect.js:1:0))
at /var/www/storehippo/node_modules/express/node_modules/connect/lib/proto.js:199:15
at /var/www/storehippo/node_modules/newrelic/lib/transaction/tracer/index.js:157:28
at Object.<anonymous> (/var/www/storehippo/dist/dist_17-09-2016_10:20:03/app/index.js:252:5)
at Object.handle (eval at eval at wrapHandle (/var/www/storehippo/node_modules/newrelic/lib/instrumentation/connect.js:1:0))
at /var/www/storehippo/node_modules/express/node_modules/connect/lib/proto.js:199:15
at /var/www/storehippo/node_modules/newrelic/lib/transaction/tracer/index.js:157:28"
To fix it, I have to restart my server, and I am not able to find the root cause.
How can I find the root cause and resolve it?
First, you are using Node 0.12. Currently the LTS version (recommended for all uses) is 4.5.0 and the Current version is 6.6.0 (6.x will become LTS next month). You may want to upgrade Node because you are using a very outdated version. The maintenance period of 0.12 will end in a few months, after which it will no longer get any updates; see: https://github.com/nodejs/LTS#lts_schedule
The Express module you use is also very outdated. The latest 3.x is I think 3.21.2 and the current version of Express is 4.14.0.
Now, if you want to find the problem, you should probably start by looking at line 252 of /var/www/storehippo/dist/dist_17-09-2016_10:20:03/app/index.js, because that seems to be the only line of your own code in that stack trace. The other lines all seem to come from external modules, but it's also possible that the problem lies with one of those modules.
There may be a lot of reasons why your server behaves fine and then starts to misbehave - you may have some memory leak, some resources that are not freed and get exhausted after some time, you may change some state in your application that causes other requests to fail, etc.
Unfortunately you didn't include any info that would make it possible to help you with finding the problem.

Random 'ECONNABORTED' error when using sendFile in Express/Node

I have set a node server with Express middleware. I get the ECONNABORTED error randomly on some files when loading an HTML file which triggers about 10 other loads (js, css, etc.). The exact error is:
{ [Error: Request aborted] code: 'ECONNABORTED' }
Generated by this simplified code (after I tried to debug the issue):
res.sendFile(res.locals.physicalUrl, function (err) {
  if (err)
    console.log(err);
  ...
});
Many posts talk about this error resulting from not specifying the full path name. That is not the situation here. I do specify the full path and indeed the error is randomly generated. There are times when the page and all its subsequent links load perfectly and there are times when they do not. I tried to flush the cache and did not find any pattern to connect it with this.
This specific error appears to be a generic term for a socket connection getting aborted and is discussed in the context of other applications like FTP.
Having realized that the node worker threads can be increased, I tried to do so using:
process.env.UV_THREADPOOL_SIZE = 20;
However, my understanding is that even absent this, at most the file transfer may have to wait for a worker thread to be free and not get aborted. I am not talking about big files here, all files are less than 1 MB.
I have a gut feeling that this has nothing to do with node directly.
Please point to any other possibilities (node or otherwise) to handle this error. Also, any other indirect solutions? Retrying a few times could be one but that would be clumsy. EDIT: No, I cannot retry. Headers are already sent with the error!
A SIDE NOTE:
Many examples on the use of sendFile skip using the callback thereby giving the impression that it is a synchronous call. It is not. Do use the callback at all times, check for success and only then move on to the "next" middleware or take appropriate steps if the send fails for whatever reason. Not doing so can make it difficult to debug the consequences in an asynchronous environment.
See https://stackoverflow.com/a/36949631/2798152
Could it be possible that in some cases you terminate the connection by calling res.end before the asynchronous call to res.sendFile ends?
If that's not the case - can you pastebin more of your application code?
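To make the side note about always using the sendFile callback concrete, here is a sketch of a handler that inspects the outcome. It assumes Express 4's res.sendFile(path, callback) and res.headersSent; sendFileHandler and the log parameter are hypothetical names. As far as I know, ECONNABORTED from sendFile usually means the connection was closed before the file was fully sent (for example, the browser cancelled the load), so the sketch only records it.

```javascript
// Build an Express-style handler that sends one file and inspects the outcome.
function sendFileHandler(filePath, log = console.warn) {
  return function handler(req, res, next) {
    res.sendFile(filePath, (err) => {
      if (!err) return; // file fully sent, the response is complete
      if (err.code === 'ECONNABORTED' || res.headersSent) {
        // The response can no longer be changed; just record what happened.
        log(`send of ${filePath} did not complete: ${err.message}`);
        return;
      }
      next(err); // nothing sent yet, let the error middleware respond
    });
  };
}
```

It would be mounted like any other handler, for example app.get('/page', sendFileHandler(absolutePath)), where absolutePath is a placeholder for your own resolved path.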
Uninstalling and Re-installing MongoDB solved this for me.
I was facing the same problem. It started happening after I had to force-restart my laptop because it had become unresponsive. After the restart, trying to connect to the mongo server from nodejs always threw an ECONNABORTED error.

Meteor: “Failed to receive keepalive! Exiting.”

I'm working on a project which uses the npm request package to make requests to an API server. When the response arrives, the callback processes it. During this response processing I get the error: Failed to receive keepalive! Exiting. The following code should help you understand.
var request = require('request');
request({url: 'http://api-link-from-where-data-is-to-be-fetched'},
  function (err, res, body) {
    // The code for processing the response
  }
);
Can anybody who knows how to resolve this issue please help me?
This might help answer this for you:
https://github.com/meteor/meteor/issues/1302
The last post on that page says:
Note that this is just a behavior of the develop-mode meteor run (and any hosting environment that chooses to turn on the keepalive option, which probably isn't most of them), not a production issue. And in any case, if your Node process is churning CPU for seconds, it's not going to be able to respond to any network traffic.
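If the response processing really is CPU-heavy, one possible mitigation is to split the work into slices and yield to the event loop between them, so that network traffic (including the keepalive) gets a chance to be handled. This is a sketch under that assumption; processInChunks and its parameters are hypothetical names, and chunkSize is assumed to be a positive integer.

```javascript
// Process `items` in slices, yielding to the event loop between slices so
// that timers and network I/O can run in between.
function processInChunks(items, chunkSize, handleItem, done, schedule = setImmediate) {
  let index = 0;
  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index], index);
    }
    if (index < items.length) {
      schedule(runChunk); // let pending I/O callbacks run before continuing
    } else {
      done();
    }
  }
  runChunk();
}
```

Inside the request callback this could be called as processInChunks(rows, 100, handleRow, onFinished), where rows, handleRow and onFinished stand in for your own data and functions.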
This post might help you: Meteor error message: "Failed to receive keepalive! Exiting."
Removing autopublish with meteor remove autopublish and then writing my own publish and subscribe functions fixed the problem.

Node Exception Handling

What is the best way in node to handle unhandled exceptions that are coming out of core node code? I have a background process that crawls web content and runs for long periods of time without issue, but every so often an unexpected exception occurs and I can't seem to handle it gracefully. The usual culprit appears to be some networking issue (lost connectivity) where the http calls I'm making fail. All of the functions that I have created follow the pattern FUNCTION_NAME(error, returned_data), but when the error occurs I don't see any of my own functions in the call stack that is printed out; instead it shows some of the core node modules. I'm not really worried about these infrequent errors and their root cause; the purpose of this post is just to find a graceful way of handling these exceptions.
I've tried putting a try/catch at the top level of my code where everything runs under but it doesn't seem to capture these exceptions. Is it good practice in node to use try/catch within all the lower level functions that use any core code? Or is there some way to globally capture all unhandled exceptions?
Thanks
Chris
UPDATED TO ADD STACK
node.js:201
throw e; // process.nextTick error, or 'error' event on first tick
^
Error: connect Unknown system errno 10060
at errnoException (net.js:642:11)
at Object.afterConnect [as oncomplete] (net.js:633:18)
is there some way to globally capture all unhandled exceptions?
You can catch all exceptions using process.on('uncaughtException'). Listening to this event will avoid the default action of printing the stack and exiting. However be conscious that ignoring exceptions may lead to problems in your app execution.
Link: http://nodejs.org/docs/latest/api/process.html#process_event_uncaughtexception
Pay attention to the documentation advice:
Note that uncaughtException is a very crude mechanism for exception handling. Using try / catch in your program will give you more control over your program's flow. Especially for server programs that are designed to stay running forever, uncaughtException can be a useful safety mechanism.
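For reference, a last-resort handler along these lines could look like the sketch below. installCrashHandler, log and exit are hypothetical names (the latter two are parameters only so the behaviour can be swapped out), and the choice to exit after logging is an assumption on my part rather than something the documentation quote above prescribes.

```javascript
// Register a last-resort handler: log the error, then exit so that a
// process supervisor can start a clean process.
function installCrashHandler(log = console.error, exit = process.exit) {
  const handler = (err) => {
    log(`uncaught exception: ${err && err.stack ? err.stack : err}`);
    exit(1);
  };
  process.on('uncaughtException', handler);
  return handler; // returned so callers can remove it with process.removeListener
}
```

Exiting after logging keeps the process from continuing in an unknown state; whether that trade-off suits a long-running crawler is a judgment call.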
To catch network errors and avoid the default behavior (printing stack and exit) you have to listen to "error" events.
For example:
var net = require('net');
var client = net.connect(80, 'invalid.host', function () {
  console.log("Worked");
});
client.on('error', console.log);
I wrote about this recently at http://snmaynard.com/2012/12/21/node-error-handling/. Domains, a new feature in node 0.8, allow you to combine all the forms of error handling into one easier-to-manage form. You can read about them in my post and in the docs.
You can use domains to handle callback error arguments, error events from event emitters, and exceptions all in one place. The problem in this specific case is that when you don't handle an 'error' event from an event emitter, node by default prints the stack trace and exits the app.
I've put together a quick error handling file which logs and emails me whenever an unhandled exception is thrown. It then (optionally) tries to restart the server.
Check it out!
