I have an application that writes to a datastore by sending HTTP requests out to it (HBase + Stargate). Occasionally one of those requests fails with an ETIMEDOUT exception, which kills the process.
I have .on('error') handlers on every socket connection that is present, or at least seemingly present, including requests and responses. I even took the extreme step of changing the source code as described in the third post of this thread, which is supposed to "ignore" those errors:
http://comments.gmane.org/gmane.comp.lang.javascript.nodejs/25283
I even have a
process.on('uncaughtException', function(){})
All of this is still to no avail, and my processes keep dying, potentially losing everything that has built up in the ZMQ stream queue.
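For reference, the wiring looks roughly like this (a minimal sketch, not my actual code; the built-in http module and the Stargate host, port, and path here are placeholders):

var http = require('http');

// Placeholder request to Stargate; host, port, and path are made up for illustration.
var req = http.request({ host: 'stargate.example', port: 8080, path: '/mytable/row1', method: 'PUT' }, function (res) {
  res.on('error', function (err) { console.error('response error:', err); });
  res.on('data', function () { /* drain the response */ });
});

// Errors such as ETIMEDOUT and ECONNRESET are emitted here.
req.on('error', function (err) { console.error('request error:', err); });

// Belt-and-braces: handle errors on the underlying socket as well.
req.on('socket', function (socket) {
  socket.on('error', function (err) { console.error('socket error:', err); });
});

// Last-resort catch-all, as mentioned above.
process.on('uncaughtException', function (err) { console.error('uncaught exception:', err); });

req.end();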
The weirdest part is that one server out of the four-server cluster behaves just fine.
I had a similar issue with our datastore, which relied on HTTP requests.
How are you sending the "HTTP request out"? Is it with a library? Have you tried putting a timeout limit on the HTTP request to avoid the ETIMEDOUT exception? Although this does not address the main issue, it will give you the ability to catch the timeout and throw your own controlled exception.
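For example, if you happen to be using the request package, it accepts a timeout option; with the built-in http module, req.setTimeout(ms, callback) gets you much the same thing. A rough sketch (the URL and the handleStoreFailure helper are placeholders for your own code):

var request = require('request');

request({
  url: 'http://stargate.example:8080/mytable/row1', // placeholder URL
  method: 'PUT',
  timeout: 5000 // give up after 5 seconds instead of waiting for the OS-level timeout
}, function (err, res, body) {
  if (err) {
    // err.code will be 'ETIMEDOUT' or 'ESOCKETTIMEDOUT' here instead of crashing the process
    return handleStoreFailure(err); // handleStoreFailure is a made-up name for your own recovery path
  }
  // ...use res.statusCode and body as usual
});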
I have looked at the request trace for several requests that resulted in the same outcome.
What will happen is I'll get HttpModule="iisnode", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus=500, HttpReason="Internal Server Error", HttpSubstatus=1013, ErrorCode="The pipe has been ended. (0x6d)".
This is a production API. Fewer than 1% of requests get this result but it's not the requests themselves - I can reissue the same request and it'll work.
I log telemetry for every API request - basics on the way in, things like http status and execution time as the response is on its way out.
None of the requests that get this error show up in the telemetry, which makes me think something is happening somewhere between IIS and iisnode.
If anyone has resolved this or has solid thoughts on how to pin down what the root issue is I'd appreciate it.
Well, for me, what's described here covered the bulk of the issue: github.com/Azure/iisnode/issues/57. Setting keepAliveTimeout to 0 on the Express server reduced the 500s significantly.
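For anyone else hitting this, the change amounts to something like the following (a sketch; app and port are whatever your Express setup already uses):

const server = app.listen(port);
server.keepAliveTimeout = 0; // disable HTTP keep-alive on the Node side, per the iisnode issue above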
Once the majority of the "noise" was eliminated, it was much easier to correlate the remaining 500s with things I could see in my logs. For example, I'm using a third-party node package to resize images, and a couple of the "images" that were loaded into the system weren't actually images. Instead of gracefully throwing an exception, the package seems to exit the running node process. True story. So on Azure it would get restarted, but while that was happening, requests would get a 500 Internal Server Error.
I've encountered a very weird issue.
I have two VMs, running CentOS Linux.
The server side has a REST API (using a non-POCO socket), and one of the endpoints responds to a POST.
On the client side, I use the POCO library to call the REST API.
If the returned message is long, it gets truncated at 176 KB, 240 KB, or 288 KB.
The same code, in the same environment, run on the server VM: good.
On the client VM, using Python to make the REST call: good.
It ONLY fails when I run the same, known-good code on the client VM.
When the message gets truncated, the HTTP status code still comes back as 200.
On the server side, I log the response message I send every time, and everything looks normal.
I have tried a whole bunch of things, such as:
setting the socket timeout and the receiveResponse timeout to an hour
waiting 2 seconds after sending the request, before calling receive
making the receive buffer big enough
trying a whole bunch of approaches to make sure the receive stream is fully drained (no more data left)
It just does not work.
Has anyone had a similar issue? I've started pulling my hair out... please talk to me, anything... before I go bald.
I have a script that takes a large dataset and calls a remote API with a POST method, using request-promise. If I make a request individually, it works just fine. However, if I loop through a sample set of 200 records using forEach and async/await, only about 6-15 of the requests come back with a status of 200, and the others return a 500 error.
I've worked with the owner of the API, and their logs only show the requests that returned 200. So I don't think Node is actually sending out the ones that come back as 500.
Has anyone run into this, and/or know how I can get around this?
To my knowledge, there's no code in node.js that automatically makes a 500 http response for you. Those 500 responses are apparently coming from the target server's network. You could look at a network trace on your server machine to see for sure.
If they are not in the target server logs, then it's probably coming from some defense mechanism deployed in front of their server to stop misuse or overuse of their server (such as rate limiting from one source) and/or to protect its ability to respond to a meaningful number of requests (proxy, firewall, load balancer, etc...). It could even be part of a configuration in the hosting facility.
You will likely need to find out how many simultaneous requests the target server will accept without error and then modify your code to never send more than that number of requests at once. They could also be measuring requests per second, so it might not only be an in-flight count but also the rate at which requests are sent.
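As a rough sketch of what that could look like with request-promise (the batch size, delay, and URL here are guesses you would tune against whatever limit the target server actually enforces):

const rp = require('request-promise');

const BATCH_SIZE = 5;   // max requests in flight at once; tune against the server's limit
const DELAY_MS = 250;   // optional pause between batches if the limit is requests/sec

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function postAll(records) {
  const failures = [];
  for (let i = 0; i < records.length; i += BATCH_SIZE) {
    const batch = records.slice(i, i + BATCH_SIZE);
    // at most BATCH_SIZE requests in flight at any one time
    await Promise.all(batch.map(record =>
      rp({ method: 'POST', uri: 'https://api.example.com/endpoint', body: record, json: true })
        .catch(err => failures.push({ record, err })) // keep going; collect failures to inspect later
    ));
    await sleep(DELAY_MS);
  }
  return failures;
}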
Currently, I am working on a REST API using the Node hapijs framework. The API is deployed on Heroku.
There is a GET endpoint in the API that makes a GET request to retrieve data from a third party and processes the data before sending a reply. This particular endpoint times out from time to time. When the endpoint times out, Heroku returns an H12 error. Once it has timed out, subsequent requests to that endpoint result in H12 errors, and I have to restart the application on Heroku to get that endpoint working again. No other endpoints in the API are affected in any way by this error and continue to work just fine even after the error has occurred.
From my debugging and from looking through the logs, it seems that there are times when a response is never returned from the third-party API, causing the error.
I've tried the following in an attempt to solve the problem:
I am using the request library to make requests, so I've tried setting a timeout of 5000 ms in the options passed to request. It has worked at times: the timeout is triggered and the endpoint sends back the timeout error from the request library. This is the behavior I would like, since subsequent requests to the endpoint then work. However, there are times when the request timeout is not triggered but Heroku still returns an H12 error (always after 30 seconds, the Heroku default). After that, subsequent requests to that endpoint also return the H12 error (again after 30 seconds). It seems that some sort of process gets "stuck" on Heroku and is not terminated until I restart the app.
I've tried adding a timeout to the hapi.js route config object. I get the same results as above.
I've continued doing research and suspect that the issue has to do with the description given here and here. It seems like setting a timeout at the app server level that can send a SIGKILL to the Heroku worker might do the trick. It seems fairly straightforward in Ruby, but I cannot find much information on how to do this in Node.
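Something like the following is the sort of thing I have in mind, though it is only a sketch built on assumptions: server.listener is hapi's handle on the underlying Node http server, the 25-second value is arbitrary, and exiting the process so Heroku restarts the dyno is a blunt instrument.

// Watchdog sketch: if a connection sits idle past 25 seconds, destroy it and
// recycle the process so Heroku spins up a fresh dyno, instead of leaving the
// endpoint "stuck" until a manual restart.
server.listener.setTimeout(25000, function (socket) {
  console.error('Request exceeded 25s; destroying socket and recycling process');
  socket.destroy();
  process.exit(1);
});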
Any insight is greatly appreciated. I am aware that a timeout might occur when making a request to a third party. That is not the issue. The issue is that the endpoint seems to get "stuck" on Heroku after a timeout and it becomes unresponsive.
Thanks for the help!
I had a similar issue and after giving up for a day, I came back to it and found my error. I wasn't sending a response to the client when an error occurred on the server side. Make sure you are returning a response no matter what the result of your server side algorithm is. If there is an error, return that. If the request was successful, return that response. I hope that helps.
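As a generic illustration of that point (not your actual code; the handler shape assumes hapi's older reply interface, and fetchFromThirdParty/processData are made-up names), reply in every branch, including the failure path:

function handler(request, reply) {
  fetchFromThirdParty(request.params.id)      // placeholder for the upstream call
    .then(function (data) {
      reply(processData(data));               // success: send the processed result
    })
    .catch(function (err) {
      reply({ error: 'upstream request failed' }).code(502); // failure: still send a response
    });
}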
If that doesn't help, check Heroku's guide on handling request timeouts; the "Debugging request timeouts" section in particular could be of help:
We have a C# Web API server and a Node Express server. We make hundreds of requests from the C# server to a route on the Node server. The route on the Node server does intensive work and often doesn't return for 6-8 seconds.
Making hundreds of these requests simultaneously seems to cause the Node server to fail. Errors in the Node server output include either socket hang up or ECONNRESET. The error from the C# side says
No connection could be made because the target machine actively refused it.
This error occurs after processing an unpredictable number of the requests, which leads me to think it is simply overloading the server. Using a Thread.Sleep(500) on the C# side allows us to get through more requests, and fiddling with the wait there leads to more or less success, but thread sleeping is rarely if ever the right answer, and I think this case is no exception.
Are we simply putting too much stress on the Node server? Can this only be solved with load balancing or some form of clustering? If there is another alternative, what might it look like?
One path I'm starting to explore is the node-toobusy module. If I return a 503 though, what should be the process in the following code? Should I Thread.Sleep and then re-submit the request?
It sounds like your node.js server is getting overloaded.
The route on the Node server does intensive work and often doesn't return for 6-8 seconds.
This is a bad smell: if your node process is doing intense computation, it will block the event loop until that computation completes and won't be able to handle any other requests. You should probably move that computation into worker processes, which can run on other CPU cores if available. cluster is the Node built-in module that lets you do that, so I'll point you there.
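A minimal sketch of the cluster approach (./app standing in for however you currently start the Express server):

var cluster = require('cluster');
var os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; incoming connections get distributed across them,
  // so a long-running computation in one worker doesn't stall the others.
  os.cpus().forEach(function () {
    cluster.fork();
  });
  cluster.on('exit', function (worker) {
    console.error('worker ' + worker.process.pid + ' died, forking a replacement');
    cluster.fork();
  });
} else {
  require('./app'); // placeholder for wherever the Express server is actually started
}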
One path I'm starting to explore is the node-toobusy module. If I return a 503 though, what should be the process in the following code? Should I Thread.Sleep and then re-submit the request?
That depends on your application and your expected load. You may want to retry once or twice if it's likely that things will cool down enough during that time, but for your API you probably just want to return a 503 from the C# side too: better to let the client know the server's too busy and let it make its own decision than to keep retrying on its behalf.
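For reference, shedding load with node-toobusy on the Express side usually looks roughly like the following (a sketch; the lag threshold and the message are placeholders to tune); your C# caller would then treat the 503 like any other try-again-later signal:

var toobusy = require('toobusy'); // or the maintained fork, toobusy-js
var express = require('express');
var app = express();

// Reject requests early when the event loop lag climbs past the threshold,
// instead of letting them pile up behind the 6-8 second computations.
toobusy.maxLag(70); // milliseconds of lag; just a starting point to tune

app.use(function (req, res, next) {
  if (toobusy()) {
    res.status(503).send('Server is too busy, try again shortly.');
  } else {
    next();
  }
});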