Multiple Api requests in nodejs gives Timedout error - node.js

My Api to Update Product Details sends 4000 requests at once and waits for a minute and then shows timedout error . please suggest me a good way to handle this issue.

Whatever the target is that you're sending these 4000 requests to at once apparently cannot handle that many requests all in flight at the same time and still respond to each request in a timely fashion. This is not surprising.
The usual solution to something like this would be to limit how many simultaneous requests you send to the target to something smaller like 10 or 20 (usually determined with testing). You send 10 requests, then each time a request finishes, you send the next one in line. This way, you never over tax the target server.
You can hand-code code to distribute requests as described above or you can use a pre-built function to do that like Bluebirds Promise.map() or a function like mapConcurrent() and you can see a discussion of this general issue here.

Related

Sending a response after jobs have finished processing in Express

So, I have Express server that accepts a request. The request is web scraping that takes 3-4 minute to finish. I'm using Bull to queue the jobs and processing it as and when it is ready. The challenge is to send this results from processed jobs back as response. Is there any way I can achieve this? I'm running the app on heroku, but heroku has a request timeout of 30sec.
You don’t have to wait until the back end finished do the request identified who is requesting . Authenticate the user. Do a res.status(202).send({message:”text});
Even though the response was sended to the client you can keep processing and stuff
NOTE: Do not put a return keyword before res.status...
The HyperText Transfer Protocol (HTTP) 202 Accepted response status code indicates that the request has been accepted for processing, but the processing has not been completed; in fact, processing may not have started yet. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
202 is non-committal, meaning that there is no way for the HTTP to later send an asynchronous response indicating the outcome of processing the request. It is intended for cases where another process or server handles the request, or for batch processing.
You always need to send response immediately due to timeout. Since your process takes about 3-4 minutes, it is better to send a response immediately mentioning that the request was successfully received and will be processed.
Now, when the task is completed, you can use socket.io or web sockets to notify the client from the server side. You can also pass a response.
The client side also can check continuously if the job was completed on the server side, this is called polling and is required with older browsers which don't support web sockets. socket.io falls back to polling when browsers don't support web sockets.
Visit socket.io for more information and documentation.
Best approach to this problem is socket.io library. It can send data to client send whenever you want. It triggers a function on client side which receives the data. Socket.io supports different languages and it is really ease to use.
website link
Documentation Link
create a jobs table in a database or persistant storage like redis
save each job in the table upon request with a unique id
update status to running on starting the job
sent HTTP 202 - Accepted
At the client implement a polling script, At the server implement a job status route/api. The api accept a job id and queries the job table and respond with the status
When the job is finished update the job table with status completed, when the jon is errored updated the job table with status failed and maybe a description column to store the cause for error
This solution makes your system horizontaly scalable and distributed. It also prevents the consequences of unexpected connection drops. Polling interval depends on average job completion duration. I would recommend an average interval of 5 second
This can be even improved to store job completion progress in the jobs table so that the client can even display a progress bar
->Request time out occurs when your connection is idle, different servers implement in a different way so timeout time differs
1)The solution for this timeout problem would be to make your connections open(constant), that is the connection between client and servers should remain constant.
So for such scenarios use WebSockets, which ensures that after the initial request and response handshake between client and server the connection stays open.
there are many libraries to implement realtime connection.Eg Pubnub,socket.io. This is the same technology used for live streaming.
Node js can handle many concurrent connections and its lightweight too, won't use many resources too.

One API call vs multiple

I have a process in the back-end which will take take on average 30 to 90 seconds to complete.
Is it better to have a font-end react app make ONE API call and wait for back-end to complete and process and return the data. Or is it better to have the front-end make multiple calls, lets say every 2 seconds to check if the process and complete and get back the result?
Both are valid approaches. You could also report status changes with websocket so there's no need for polling.
If you do want to go the polling route, the general recommendation is to:
Return 202 accepted from your long-running process endpoint.
Also return a Link header with a url to where the status of the process can be read.
The client can then follow that client and ping it every x seconds.
I think it's not good to make a single API call and wait for 30-90 seconds to get a response. Instead send a response immediately mentioning that the request is successful and would be processed.
Now you can use web sockets or library like socket.io so that the server can communicate directly to the client once the requested processing is complete.
The multiple API calls to check if server is done or server has any new message is called polling and is not much efficient but it is still required in old browsers which don't support web sockets. Socket.io support s polling automatically in old browsers.
But, yes if you want you can do multiple calls to check if server is done processing, but I would prefer server to communicate back to the client , it is better.

ForEach Bulk Post Requests are failing

I have this script where I'm taking a large dataset and calling a remote api, using request-promise, using a post method. If I do this individually, the request works just fine. However, if I loop through a sample set of 200-records using forEach and async/await, only about 6-15 of the requests come back with a status of 200, the others are returning with a 500 error.
I've worked with the owner of the API, and their logs only show the 200-requests. So I don't think node is actually sending out the ones that come back as 500.
Has anyone run into this, and/or know how I can get around this?
To my knowledge, there's no code in node.js that automatically makes a 500 http response for you. Those 500 responses are apparently coming from the target server's network. You could look at a network trace on your server machine to see for sure.
If they are not in the target server logs, then it's probably coming from some defense mechanism deployed in front of their server to stop misuse or overuse of their server (such as rate limiting from one source) and/or to protect its ability to respond to a meaningful number of requests (proxy, firewall, load balancer, etc...). It could even be part of a configuration in the hosting facility.
You will likely need to find out how many simultaneous requests the target server will accept without error and then modify your code to never send more than that number of requests at once. They could also be measuring requests/sec to it might not only be an in-flight count, but could be the rate at which requests are sent.

Faster HTTP scraping per POST request?

I'm writing an API that returns an array of redirects for any given page:
router.post('/trace', function(req,res){
if(!req.body.link)
return res.status(405).send(""); //error: no link provided!
console.log("\tapi/trace()", req.body.link);
var redirects = [];
function exit(goodbye){
if(goodbye)
console.log(goodbye);
res.status(200).send(JSON.stringify(redirects)); //end
}
function getRedirect(link){
request({ url: link, followRedirect: false }, function (err, response, body) {
if(err)
exit(err);
else if(response.headers.location){
redirects.push(response.headers.location);
getRedirect(response.headers.location);
}
else
exit(); //all done!
});
}
getRedirect(req.body.link);
});
and here is the corresponding browser request:
$.post('/api/trace', { link: l }, cb);
a page will make about 1000 post request very quickly and then waits a very long time to get each request back.
The problem is the response to the nth request is very slow. individual request takes about half a second, but as best I cant tell the express server is processing each link sequentially. I want the server to make all the requests and respond as it receives a response.
Am I correct in assuming express POST router is running processes sequentially? How do I get it to blast all requests and pass the responses as it gets them?
My question is why is it so slow / is POST an async process on a "out of the box" express server?
You may be surprised to find out that this is probably first a browser issue, not a node.js issue.
A browser will have a max number of simultaneous requests it will allow your Javascript ajax to make to same host which will vary slightly from one browser to the next, but is around 6. So, if you're making 1000 requests, then only around 6 are being sent at at time. The rest go in a queue in the browser waiting for prior requests to finish. So, your node server likely isn't getting 1000 simultaneous requests. You should be able to confirm this by logging incoming requests in your node.js app. You will probably see a long delay before it receives the 1000th request (because it's queued by the browser).
Here's a run-down of how many simultanous requests to a given host each of the browser supported (as of a couple years ago): Max parallel http connections in a browser?.
My first recommendation would be to package up an array of requests to make from the client to the server (perhaps 50 at a time) and then send that in one request. That will give your node.js server plenty to chew on and won't run afoul of the browser's connection limit to the same host.
As for the node.js server, it depends a lot on what you're doing. If most of what you're doing in the node.js server is just networking and not a lot of processing that requires CPU cycles, then node.js is very efficient at handling lots and lots of simultaneous requests. If you start engaging a bunch of CPU (processing or preparing results), then you make benefit from either adding worker processes or using node.js clustering. In your case, you may want to use worker processes. You can examine your CPU load when your node.js server is processing a bunch of work and see if the one CPU that node.js is using is anywhere near 100% or not. If it isn't, then you don't need more node.js processes. If it is, then you do need to spread the work over more node.js processes to go faster.
In your specific case, it looks like you're really only doing networking to collect 302 redirect responses. Your single node.js process should be able to handle a lot of those requests very efficiently so probably the issue is just that your client is being throttled by the browser.
If you want to send a lot of requests to the server (so it can get to work on as many as feasible), but want to get results back immediately as they become available, that's a little more work.
One scheme that could work is to open a webSocket or socket.io connection. You can then send a giant array of URLs that you want the server to check for you in one message over the socket.io connection. Then, as the server gets a result, it can send back each individual result (tagged with the URL that it corresponds to). That way, you can somewhat get the best of both worlds with the server crunching on a long list of URLs, but able to send back individual responses as soon as it gets them.
Note, you will probably find that there is an upper limit to how many outbound http requests you may want to run at the same time from your node.js server too. While modern versions of node.js don't throttle you like the browser does, you probably also don't want your node.js server attempting to run 10,000 simultaneous requests because you may exhaust some sort of network resource pool. So, once you get past the client bottleneck, you will want to test your server at different levels of simultaneous requests open to see where it performs best. This is both to optimize its performance, but also to protect your server against attempting to overextend its use of networking or memory resources and get into error conditions.

Node.js options to push updates to some microcontrollers that have an HTTP 1.1 stack

The title pretty well says it.
I need the microcontrollers to stay connected to the server to receive updates within a couple of seconds and I'm not quite sure how to do this.
The client in this case is very limited to say the least and it seems like all the solutions I've found for polling or something like socket.io require dropping some significant JavaScript to the client.
If I'm left having to reimplement one of those libraries in C on the micro I could definitely use some pointers on the leanest way to handle it.
I can't just pound the server with constant requests because this is going to increase to a fair number of connected micros.
Just use ordinary long polling: each controller initially makes an HTTP request and waits for a response, which happens when there's an update. Once the controller receives the response, it makes another request. Lather, rinse, repeat. This won't hammer the server because each controller makes only one request per update, and node's architecture is such that you can have lots of requests pending, since you aren't creating a new thread or process for each active connection.

Resources