ForEach Bulk Post Requests are failing

ForEach Bulk Post Requests are failing - node.js

I have this script where I'm taking a large dataset and calling a remote api, using request-promise, using a post method. If I do this individually, the request works just fine. However, if I loop through a sample set of 200-records using forEach and async/await, only about 6-15 of the requests come back with a status of 200, the others are returning with a 500 error.
I've worked with the owner of the API, and their logs only show the 200-requests. So I don't think node is actually sending out the ones that come back as 500.
Has anyone run into this, and/or know how I can get around this?

To my knowledge, there's no code in node.js that automatically makes a 500 http response for you. Those 500 responses are apparently coming from the target server's network. You could look at a network trace on your server machine to see for sure.
If they are not in the target server logs, then it's probably coming from some defense mechanism deployed in front of their server to stop misuse or overuse of their server (such as rate limiting from one source) and/or to protect its ability to respond to a meaningful number of requests (proxy, firewall, load balancer, etc...). It could even be part of a configuration in the hosting facility.
You will likely need to find out how many simultaneous requests the target server will accept without error and then modify your code to never send more than that number of requests at once. They could also be measuring requests/sec to it might not only be an in-flight count, but could be the rate at which requests are sent.

Related

REST API In Node Deployed as Azure App Service 500 Internal Server Errors

I have looked at the request trace for several requests that resulted in the same outcome.
What will happen is I'll get a HttpModule="iisnode", Notification="EXECUTE_REQUEST_HANDLER", HttpStatus=500, HttpReason="Internal Server Error", HttpSubstatus=1013, ErrorCode="The pipe has been ended. (0x6d)"
This is a production API. Fewer than 1% of requests get this result but it's not the requests themselves - I can reissue the same request and it'll work.
I log telemetry for every API request - basics on the way in, things like http status and execution time as the response is on its way out.
None of the requests that get this error are in telemetry which makes me think something is happening somewhere between IIS and iisnode.
If anyone has resolved this or has solid thoughts on how to pin down what the root issue is I'd appreciate it.

Well for me, what's described here covered the bulk of the issue: github.com/Azure/iisnode/issues/57 Setting keepAliveTimeout to 0 on the express server reduced the 500s significantly.
Once the majority of the "noise" was eliminated it was much easier to correlate the remaining 500s that would occur to things I could see in my logs. For example, I'm using a 3rd party node package to resize images and a couple of the "images" that were loaded into the system weren't images. Instead of gracefully throwing an exception, the package seems to exit out of the running node process. True story. So on Azure, it would get restarted, but while that was happening requests would get a 500 internal server error.

Multiple Api requests in nodejs gives Timedout error

My Api to Update Product Details sends 4000 requests at once and waits for a minute and then shows timedout error . please suggest me a good way to handle this issue.

Whatever the target is that you're sending these 4000 requests to at once apparently cannot handle that many requests all in flight at the same time and still respond to each request in a timely fashion. This is not surprising.
The usual solution to something like this would be to limit how many simultaneous requests you send to the target to something smaller like 10 or 20 (usually determined with testing). You send 10 requests, then each time a request finishes, you send the next one in line. This way, you never over tax the target server.
You can hand-code code to distribute requests as described above or you can use a pre-built function to do that like Bluebirds Promise.map() or a function like mapConcurrent() and you can see a discussion of this general issue here.

Faster HTTP scraping per POST request?

I'm writing an API that returns an array of redirects for any given page:
router.post('/trace', function(req,res){
if(!req.body.link)
return res.status(405).send(""); //error: no link provided!
console.log("\tapi/trace()", req.body.link);
var redirects = [];
function exit(goodbye){
if(goodbye)
console.log(goodbye);
res.status(200).send(JSON.stringify(redirects)); //end
}
function getRedirect(link){
request({ url: link, followRedirect: false }, function (err, response, body) {
if(err)
exit(err);
else if(response.headers.location){
redirects.push(response.headers.location);
getRedirect(response.headers.location);
}
else
exit(); //all done!
});
}
getRedirect(req.body.link);
});
and here is the corresponding browser request:
$.post('/api/trace', { link: l }, cb);
a page will make about 1000 post request very quickly and then waits a very long time to get each request back.
The problem is the response to the nth request is very slow. individual request takes about half a second, but as best I cant tell the express server is processing each link sequentially. I want the server to make all the requests and respond as it receives a response.
Am I correct in assuming express POST router is running processes sequentially? How do I get it to blast all requests and pass the responses as it gets them?

My question is why is it so slow / is POST an async process on a "out of the box" express server?
You may be surprised to find out that this is probably first a browser issue, not a node.js issue.
A browser will have a max number of simultaneous requests it will allow your Javascript ajax to make to same host which will vary slightly from one browser to the next, but is around 6. So, if you're making 1000 requests, then only around 6 are being sent at at time. The rest go in a queue in the browser waiting for prior requests to finish. So, your node server likely isn't getting 1000 simultaneous requests. You should be able to confirm this by logging incoming requests in your node.js app. You will probably see a long delay before it receives the 1000th request (because it's queued by the browser).
Here's a run-down of how many simultanous requests to a given host each of the browser supported (as of a couple years ago): Max parallel http connections in a browser?.
My first recommendation would be to package up an array of requests to make from the client to the server (perhaps 50 at a time) and then send that in one request. That will give your node.js server plenty to chew on and won't run afoul of the browser's connection limit to the same host.
As for the node.js server, it depends a lot on what you're doing. If most of what you're doing in the node.js server is just networking and not a lot of processing that requires CPU cycles, then node.js is very efficient at handling lots and lots of simultaneous requests. If you start engaging a bunch of CPU (processing or preparing results), then you make benefit from either adding worker processes or using node.js clustering. In your case, you may want to use worker processes. You can examine your CPU load when your node.js server is processing a bunch of work and see if the one CPU that node.js is using is anywhere near 100% or not. If it isn't, then you don't need more node.js processes. If it is, then you do need to spread the work over more node.js processes to go faster.
In your specific case, it looks like you're really only doing networking to collect 302 redirect responses. Your single node.js process should be able to handle a lot of those requests very efficiently so probably the issue is just that your client is being throttled by the browser.
If you want to send a lot of requests to the server (so it can get to work on as many as feasible), but want to get results back immediately as they become available, that's a little more work.
One scheme that could work is to open a webSocket or socket.io connection. You can then send a giant array of URLs that you want the server to check for you in one message over the socket.io connection. Then, as the server gets a result, it can send back each individual result (tagged with the URL that it corresponds to). That way, you can somewhat get the best of both worlds with the server crunching on a long list of URLs, but able to send back individual responses as soon as it gets them.
Note, you will probably find that there is an upper limit to how many outbound http requests you may want to run at the same time from your node.js server too. While modern versions of node.js don't throttle you like the browser does, you probably also don't want your node.js server attempting to run 10,000 simultaneous requests because you may exhaust some sort of network resource pool. So, once you get past the client bottleneck, you will want to test your server at different levels of simultaneous requests open to see where it performs best. This is both to optimize its performance, but also to protect your server against attempting to overextend its use of networking or memory resources and get into error conditions.

How can I ensure Socket Timeout errors from my app, rather than 403 errors from Nginx?

I wrote a search API using NodeJS / HapiJS. I set this to run on port 40000 and then I reverse-proxied it to port 80 via Nginx. Nginx does nothing on this machine except act as a reverse proxy to the many apps we have running on various ports.
My company uses the API to look up client data. It is highly configurable, but we don't know which configuration would give us the best results. So I wrote a massive regression test in Clojure. The app fires 10s of millions of requests at the API, testing the different parameters for the configuration.
Occasionally the API gets overwhelmed and so the Clojure code gets a SocketTimeoutException. That's fine. I wrote some code that catches all the SocketTimeoutException and waits a while and then retries the call.
All of the above is great and we love it.
However, once in awhile, instead of getting a SocketTimeoutException because of a slow response from the API, we instead get a 403 error from Nginx. This is not cool. I'm not even sure why it happens, though I assume Nginx sometimes gets tired of waiting for a response from the API, and then perhaps decides the app is offline? Though in that case I would expect a 500 error, not a 403. But possibly Nginx treats some reverse proxies as permission issues?
Anyway, slow responses and SocketTimeoutException are fine with us, but 403 errors from Nginx are not okay. How do I ensure that Nginx remains quiet, so my Clojure app can talk to my NodeJS app without interruption?

Why is node.js only processing six requests at a time?

We have a node.js server which implements a REST API as a proxy to a central server which has a slightly different, and unfortunately asymmetric REST API.
Our client, which runs in various browsers, asks the node server to get the tasks from the central server. The node server gets a list of all the task ids from the central one and returns them to the client. The client then makes two REST API calls per id through the proxy.
As far as I can tell, this stuff is all done asynchronously. In the console log, it looks like this when I start the client:
Requested GET URL under /api/v1/tasks/*: /api/v1/tasks/
This takes a couple seconds to get the list from the central server. As soon as it gets the response, the server barfs this out very quickly:
Requested GET URL under /api/v1/tasks/id/:id :/api/v1/tasks/id/438
Requested GET URL under /api/v1/workflow/id/:id :/api/v1/workflow/id/438
Requested GET URL under /api/v1/tasks/id/:id :/api/v1/tasks/id/439
Requested GET URL under /api/v1/workflow/id/:id :/api/v1/workflow/id/439
Requested GET URL under /api/v1/tasks/id/:id :/api/v1/tasks/id/441
Requested GET URL under /api/v1/workflow/id/:id :/api/v1/workflow/id/441
Then, each time a pair of these requests gets a result from the central server, another two lines is barfed out very quickly.
So it seems our node.js server is only willing to have six requests out at a time.

There are no TCP connection limits imposed by Node itself. (The whole point is that it's highly concurrent and can handle thousands of simultaneous connections.) Your OS may limit TCP connections.
It's more likely that you're either hitting some kind of limitation of your backend server, or you're hitting the builtin HTTP library's connection limit, but it's hard to say without more details about that server or your Node implementation.
Node's built-in HTTP library (and obviously any libraries built on top of it, which are most) maintains a connection pool (via the Agent class) so that it can utilize HTTP keep-alives. This helps increase performance when you're running many requests to the same server: rather than opening a TCP connection, making a HTTP request, getting a response, closing the TCP connection, and repeating; new requests can be issued on reused TCP connections.
In node 0.10 and earlier, the HTTP Agent will only open 5 simultaneous connections to a single host by default. You can change this easily: (assuming you've required the HTTP module as http)
http.globalAgent.maxSockets = 20; // or whatever
node 0.12 sets the default maxSockets to Infinity.
You may want to keep some kind of connection limit in place. You don't want to completely overwhelm your backend server with hundreds of HTTP requests under a second – performance will most likely be worse than if you just let the Agent's connection pool do its thing, throttling requests so as to not overload your server. Your best bet will be to run some experiments to see what the optimal number of concurrent requests is in your situation.
However, if you really don't want connection pooling, you can simply bypass the pool entirely – sent agent to false in the request options:
http.get({host:'localhost', port:80, path:'/', agent:false}, callback);
In this case, there will be absolutely no limit on concurrent HTTP requests.

It's the limit on number of concurrent connections in the browser:
How many concurrent AJAX (XmlHttpRequest) requests are allowed in popular browsers?
I have upvoted the other answers, as they helped me diagnose the problem. The clue was that node's socket limit was 5, and I was getting 6 at a time. 6 is the limit in Chrome, which is what I was using to test the server.

How are you getting data from the central server? "Node does not limit connections" is not entirely accurate when making HTTP requests with the http module. Client requests made in this way use the http.globalAgent instance of http.Agent, and each http.Agent has a setting called maxSockets which determines how many sockets the agent can have open to any given host; this defaults to 5.
So, if you're using http.request or http.get (or a library that relies on those methods) to get data from your central server, you might try changing the value of http.globalAgent.maxSockets (or modify that setting on whatever instance of http.Agent you're using).
See:
http.Agent documentation
agent.maxSockets documentation
http.globalAgent documentation
Options you can pass to http.request, including an agent parameter to specify your own agent

Node js can handle thousands of incoming requests - yes!
But when it comes down to ougoing requests every request has to deal with a dns lookup and dns lookup's, disk reads etc are handled by the libuv which is programmed in C++. The default value of threads for each node process is 4x threads.
If all 4x threads are busy with https requests ( dns lookup's ) other requests will be queued. That is why no matter how brilliant your code might be : you sometimes get 6 or sometimes less concurrent outgoing requests per second completed.
Learn about dns cache to reduce the amount of dns look up's and increase libuv size. If you use PM2 to manage your node processes they do have a well documentation on their side on environment variables and how to inject them. What you are looking for is the environment variable UV_THREADPOOL_SIZE = 4
You can set the value anywhere between 1 or max limit of 1024. But keep in mind libuv limit of 1024 is across all event loops.

I have seen the same problem in my server. It was only processing 4 requests.
As explained already from 0.12 maxsockets defaults to infinity. That easily overwhelms the sever. Limiting the requests to say 10 by
http.globalAgent.maxSockets = 20;
solved my problem.

Are you sure it just returns the results to the client? Node processes everything in one thread. So if you do some fancy response parsing or anything else which doesn't yield, then it would block all your requests.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string