One API call vs multiple - node.js

I have a process in the back-end which will take on average 30 to 90 seconds to complete.
Is it better to have a front-end React app make ONE API call and wait for the back-end to complete the process and return the data? Or is it better to have the front-end make multiple calls, let's say every 2 seconds, to check whether the process is complete and get back the result?

Both are valid approaches. You could also report status changes over a WebSocket so there's no need for polling.
If you do want to go the polling route, the general recommendation is to:
1. Return 202 Accepted from your long-running process endpoint.
2. Also return a Link header with a URL where the status of the process can be read.
3. The client can then follow that link and ping it every x seconds.
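As a rough sketch of the polling route above, using only Node's built-in http module: the /jobs paths, the in-memory Map, and the helper names are assumptions made for illustration, not part of any framework. The rel="status" link relation is one reasonable choice; any relation the client agrees on would do.

```javascript
const http = require('http');
const crypto = require('crypto');

// In-memory job store; a hypothetical stand-in for a real database or queue.
const jobs = new Map();

// Register a job and describe the 202 response the client should receive.
function startJob() {
  const id = crypto.randomUUID();
  jobs.set(id, { status: 'running', result: null });
  return {
    status: 202,
    headers: { Link: `</jobs/${id}>; rel="status"` }, // where to poll
    body: { id },
  };
}

// Called by the long-running process when it is done.
function finishJob(id, result) {
  jobs.set(id, { status: 'completed', result });
}

// Describe the response for a status check.
function readJob(id) {
  const job = jobs.get(id);
  return job
    ? { status: 200, headers: {}, body: job }
    : { status: 404, headers: {}, body: { error: 'unknown job' } };
}

// Wire the helpers into an HTTP server (call .listen(port) to start it).
function createServer() {
  return http.createServer((req, res) => {
    let out;
    if (req.method === 'POST' && req.url === '/jobs') {
      out = startJob();
    } else if (req.method === 'GET' && req.url.startsWith('/jobs/')) {
      out = readJob(req.url.slice('/jobs/'.length));
    } else {
      out = { status: 404, headers: {}, body: { error: 'not found' } };
    }
    res.writeHead(out.status, { 'Content-Type': 'application/json', ...out.headers });
    res.end(JSON.stringify(out.body));
  });
}
```

The client would POST to /jobs once, read the Link header from the 202 response, and then GET that URL every few seconds until the status is no longer running.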

I don't think it's a good idea to make a single API call and wait 30-90 seconds for a response. Instead, send a response immediately saying that the request was received and will be processed.
You can then use WebSockets, or a library like socket.io, so that the server can notify the client directly once the requested processing is complete.
Making multiple API calls to check whether the server is done or has a new message is called polling. It is less efficient, but it is still needed in old browsers which don't support WebSockets. Socket.io falls back to polling automatically in those browsers.
So yes, you can make multiple calls to check whether the server is done processing, but I would prefer to have the server communicate back to the client.

Related

How to use socket.io properly with express app

I wonder how to use socket.io properly with my Express app.
I have a REST API written in Express/Node.js and I want to use socket.io to add real-time features to my app. Suppose I want to do something I can already do just by sending a request to my REST API. What should I do with socket.io? Should I send the request to the REST API and then send the socket.io client the result of the process, or handle the whole process within a socket.io emitter and then send the result to the socket.io client?
Thanks in advance.
The question is not that clear, but from what I gather, you want to know what you would use socket.io for that you can't already do with your current API.
The short answer is: nothing, really. WebSockets are just the natural progression of APIs and of the need for a more "real-time" interface between systems.
The older method (still used, and still relevant for the right use case) is polling, where the client keeps checking back with the server for updated items and grabs them if there are any. This works, but it can be expensive: each check means establishing a connection, performing a lookup, and then closing the connection.
WebSockets keep that connection open, allowing the client and server to communicate in real time. For example, let's say you make an update to your backend data and want users to get that update. With polling you rely on each client to ping the server, check whether there is an update, and grab it if so. This can cause lag between updates, with some users seeing updated data while others do not.
Now take the same scenario with WebSockets. You make an update to the backend data and hit submit, which emits to your socket server. The socket server takes the call, performs the task (grabs the updated data) and emits it to the users, so each connected user gets that update right away.
Socket servers are typically used for things like real-time chat or polls, where packets are small, but they are also used for web games and the like. The size of your payloads determines how best to send data back and forth: the larger the payload, the more resources and bandwidth it takes on the socket server, so that is something to consider.
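To make the "update the backend, then emit to every connected user" flow concrete, here is a minimal sketch. It assumes io is a socket.io Server instance; the event name data:updated and the loadLatest helper are made up for this example.

```javascript
// `io` is assumed to be a socket.io Server instance.
// `loadLatest` is a hypothetical function that returns the updated data.
function publishUpdate(io, loadLatest) {
  const payload = loadLatest();      // grab the updated data once on the server
  io.emit('data:updated', payload);  // push it to every connected client
  return payload;
}

// Client side, for reference (browser, with the socket.io client loaded):
//   const socket = io();
//   socket.on('data:updated', (payload) => render(payload)); // render is hypothetical
```

A REST handler that changes the data can call publishUpdate after the write succeeds, so the REST API stays the single place where the work happens and socket.io only carries the notification.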

Sending a response after jobs have finished processing in Express

So, I have an Express server that accepts a request. The request triggers web scraping that takes 3-4 minutes to finish. I'm using Bull to queue the jobs and process them as and when they are ready. The challenge is to send the results of the processed jobs back as the response. Is there any way I can achieve this? I'm running the app on Heroku, but Heroku has a request timeout of 30 seconds.
You don't have to wait until the back end has finished. Identify who is making the request, authenticate the user, and then do res.status(202).send({ message: "text" });
Even after the response has been sent to the client, you can keep processing.
NOTE: Do not put a return keyword before res.status(...), or the code after it will not run.
The HyperText Transfer Protocol (HTTP) 202 Accepted response status code indicates that the request has been accepted for processing, but the processing has not been completed; in fact, processing may not have started yet. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
202 is non-committal, meaning that there is no way for the HTTP to later send an asynchronous response indicating the outcome of processing the request. It is intended for cases where another process or server handles the request, or for batch processing.
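A small sketch of the "respond with 202, then keep working" idea. The handler uses only the Express response methods status and send; runJob is a hypothetical function standing in for whatever hands the work to the queue.

```javascript
// Builds an Express-style handler. `runJob` is a hypothetical function that
// starts the long-running work (for example, by adding it to a queue).
function makeScrapeHandler(runJob) {
  return function scrapeHandler(req, res) {
    // Answer right away, well inside the platform's request timeout.
    res.status(202).send({ message: 'Accepted' });
    // This line still executes: sending the response does not end the function.
    // A `return` in front of res.status(...) would skip it.
    setImmediate(() => runJob(req.body));
  };
}

// Usage sketch (assumes an Express app with a JSON body parser installed):
//   app.post('/scrape', makeScrapeHandler(startScrape)); // startScrape is hypothetical
```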
You need to send a response immediately because of the timeout. Since your process takes about 3-4 minutes, it is better to respond right away saying that the request was successfully received and will be processed.
Then, when the task is completed, you can use socket.io or WebSockets to notify the client from the server side. You can also pass the result along with that notification.
The client can also check repeatedly whether the job has completed on the server side. This is called polling and is required with older browsers which don't support WebSockets. socket.io falls back to polling when browsers don't support WebSockets.
Visit socket.io for more information and documentation.
The best approach to this problem is the socket.io library. It can send data to the client whenever you want, triggering a function on the client side which receives the data. Socket.io supports different languages and is really easy to use.
1. Create a jobs table in a database or persistent storage like Redis.
2. Save each job in the table upon request, with a unique id.
3. Update the status to running when the job starts.
4. Send HTTP 202 Accepted.
5. At the client, implement a polling script. At the server, implement a job status route/API that accepts a job id, queries the jobs table, and responds with the status.
6. When the job finishes, update the jobs table with status completed. When the job errors, update the jobs table with status failed, and maybe use a description column to store the cause of the error.
This solution makes your system horizontally scalable and distributed. It also prevents the consequences of unexpected connection drops. The polling interval depends on the average job completion time. I would recommend an interval of about 5 seconds.
This can be improved further by storing job completion progress in the jobs table, so that the client can display a progress bar.
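The jobs table described above could be sketched like this. An in-memory Map stands in for the database or Redis, and the class name, method names, and field names are all assumptions chosen for the example.

```javascript
const crypto = require('crypto');

// In-memory stand-in for a jobs table; swap the Map for a database or Redis.
class JobStore {
  constructor() {
    this.jobs = new Map();
  }

  // Save a new job with a unique id.
  create(payload) {
    const id = crypto.randomUUID();
    this.jobs.set(id, { id, payload, status: 'queued', progress: 0, error: null, result: null });
    return id;
  }

  // Apply a partial update to an existing job.
  update(id, fields) {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`unknown job ${id}`);
    Object.assign(job, fields);
    return job;
  }

  start(id) { return this.update(id, { status: 'running' }); }
  progress(id, percent) { return this.update(id, { progress: percent }); }
  complete(id, result) { return this.update(id, { status: 'completed', progress: 100, result }); }
  fail(id, reason) { return this.update(id, { status: 'failed', error: String(reason) }); }

  // What a job status route would send back to the polling client.
  status(id) {
    const job = this.jobs.get(id);
    if (!job) return null;
    const { status, progress, error, result } = job;
    return { status, progress, error, result };
  }
}
```

The worker calls start, progress, and then complete or fail; the status route only ever reads, which is what lets the web process and the worker run separately.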
A request timeout occurs when your connection is idle. Different servers implement this differently, so the timeout duration varies.
One solution to this timeout problem is to keep the connection between client and server open.
For such scenarios use WebSockets, which ensure that after the initial request and response handshake between client and server the connection stays open.
There are many libraries for implementing real-time connections, e.g. PubNub and socket.io. This is the same technology used for live streaming.
Node.js can handle many concurrent connections and is lightweight, so it won't use many resources.

Faster HTTP scraping per POST request?

I'm writing an API that returns an array of redirects for any given page:
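var express = require('express');
var request = require('request');
var router = express.Router();

router.post('/trace', function (req, res) {
    if (!req.body.link)
        return res.status(400).send(""); // error: no link provided! (400 Bad Request)
    console.log("\tapi/trace()", req.body.link);
    var redirects = []; // redirect targets, in the order they were followed

    // Send the collected redirects back and end the request.
    function exit(goodbye) {
        if (goodbye)
            console.log(goodbye);
        res.status(200).send(JSON.stringify(redirects)); // end
    }

    // Request the link without following redirects; recurse on each Location header.
    function getRedirect(link) {
        request({ url: link, followRedirect: false }, function (err, response, body) {
            if (err)
                exit(err);
            else if (response.headers.location) {
                redirects.push(response.headers.location);
                getRedirect(response.headers.location);
            }
            else
                exit(); // all done!
        });
    }

    getRedirect(req.body.link);
});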
And here is the corresponding browser request:
$.post('/api/trace', { link: l }, cb);
A page will make about 1000 POST requests very quickly and then wait a very long time to get each response back.
The problem is that the response to the nth request is very slow. An individual request takes about half a second, but as best I can tell the Express server is processing each link sequentially. I want the server to make all the requests and respond as it receives each response.
Am I correct in assuming the Express POST router is running requests sequentially? How do I get it to blast all the requests and pass the responses back as it gets them?
My question is: why is it so slow, and is POST an async process on an "out of the box" Express server?
You may be surprised to find out that this is probably first a browser issue, not a node.js issue.
A browser has a maximum number of simultaneous requests it will allow your JavaScript Ajax code to make to the same host. The number varies slightly from one browser to the next, but is around 6. So, if you're making 1000 requests, only around 6 are being sent at a time. The rest go in a queue in the browser waiting for prior requests to finish. So, your node server likely isn't getting 1000 simultaneous requests. You should be able to confirm this by logging incoming requests in your node.js app. You will probably see a long delay before it receives the 1000th request (because it's queued by the browser).
Here's a run-down of how many simultaneous requests to a given host each browser supported (as of a couple of years ago): Max parallel http connections in a browser?.
My first recommendation would be to package up an array of requests to make from the client to the server (perhaps 50 at a time) and then send that in one request. That will give your node.js server plenty to chew on and won't run afoul of the browser's connection limit to the same host.
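A minimal sketch of the batching idea: split the list of links into groups of 50 and send each group in a single request. The chunk helper is plain JavaScript; the /api/trace-batch endpoint in the usage comment is hypothetical and would have to be added on the server.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Browser usage sketch ('/api/trace-batch' is a hypothetical endpoint that
// accepts { links: [...] } and returns one result per link):
//   for (const batch of chunk(allLinks, 50)) {
//     $.post('/api/trace-batch', { links: batch }, cb);
//   }
```

With 1000 links and a batch size of 50, the browser sends 20 requests instead of 1000, which keeps the queue behind the per-host connection limit short.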
As for the node.js server, it depends a lot on what you're doing. If most of what you're doing in the node.js server is just networking and not a lot of processing that requires CPU cycles, then node.js is very efficient at handling lots and lots of simultaneous requests. If you start engaging a bunch of CPU (processing or preparing results), then you may benefit from either adding worker processes or using node.js clustering. In your case, you may want to use worker processes. You can examine your CPU load when your node.js server is processing a bunch of work and see if the one CPU that node.js is using is anywhere near 100% or not. If it isn't, then you don't need more node.js processes. If it is, then you do need to spread the work over more node.js processes to go faster.
In your specific case, it looks like you're really only doing networking to collect 302 redirect responses. Your single node.js process should be able to handle a lot of those requests very efficiently so probably the issue is just that your client is being throttled by the browser.
If you want to send a lot of requests to the server (so it can get to work on as many as feasible), but want to get results back immediately as they become available, that's a little more work.
One scheme that could work is to open a webSocket or socket.io connection. You can then send a giant array of URLs that you want the server to check for you in one message over the socket.io connection. Then, as the server gets a result, it can send back each individual result (tagged with the URL that it corresponds to). That way, you can somewhat get the best of both worlds with the server crunching on a long list of URLs, but able to send back individual responses as soon as it gets them.
Note, you will probably find that there is an upper limit to how many outbound http requests you may want to run at the same time from your node.js server too. While modern versions of node.js don't throttle you like the browser does, you probably also don't want your node.js server attempting to run 10,000 simultaneous requests because you may exhaust some sort of network resource pool. So, once you get past the client bottleneck, you will want to test your server at different levels of simultaneous requests open to see where it performs best. This is both to optimize its performance, but also to protect your server against attempting to overextend its use of networking or memory resources and get into error conditions.
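One way to cap the number of outbound requests in flight is a small worker-pool helper like the one below. This is a sketch rather than a library API: mapWithLimit and its parameters are names invented here, and worker stands for whatever function performs one outbound request and returns a promise.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results come back in the same order as `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has picked up yet

  // Each runner keeps taking the next unclaimed item until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage sketch (traceRedirects is a hypothetical function returning a promise):
//   const all = await mapWithLimit(urls, 20, (url) => traceRedirects(url));
```

Trying a few different values for limit under load is a simple way to find the level of concurrency where the server performs best. Note that if one worker call rejects, the returned promise rejects as well, so a worker that should tolerate individual failures needs its own try/catch.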

Node takes very long time to response to the JSON request

I've implemented a chat application using node.js. The program opens a connection with the client and responds with the new message when the EventEmitter emits a "recv" event.
The problem is that it takes a very long time to respond to other requests when the server is holding about 3 or 4 more streams. The Chrome developer tools show the status of the request as pending. It takes 5-30 seconds or more to reach the server (localhost). I use console.log to log when a new request is received by node.js.
I have no idea why there's a long pause. Is there any limit in the Chrome browser, node.js, or anything else I should know about? Does node delay when it holds too many requests at the same time, and how should I measure this? Thank you.
Chrome supports six simultaneous connections per domain, so if those are already in use, it will have to wait for one to close. If you want to know what's going on, use a packet capture program to check the actual network traffic.
Browsers are limited to a certain number of parallel connections, and the limit applies across the same browser context. For example, if you have more than, say, 6 tabs open, the extra connections will be queued and you will see them as pending.
You can avoid this limitation by, for example, using a unique poll subdomain for each client connection. This is how Facebook works around the limitation. The problem is Firefox, where this workaround doesn't work: your connections will be queued when they reach the limit even if you use unique subdomains.
Another solution might be to use HTML5 local storage, where you can take advantage of StorageEvent, which also propagates changes to other tabs within the same browser. This is how the StackOverflow chat is done. The advantage of this approach is that you need only one polling connection with the server. The disadvantage is the lack of HTML5 local storage support in older browsers, and a different implementation in FF versions < 4.

Notifying a browser about events on server

I have a Java-based web application (Struts 1.2). I have a requirement to display a status on the front end (JSP). The status might change, and my server gets notified of the change by another server. I want this status change to be pushed to the browser.
I don't want to refresh at intervals. Rather, I have to implement something like what is done in Gmail chat, i.e. the browser gets notified of events that happen on the server.
Any ideas on how to go about this?
I was thinking along the lines of opening a request to the server for the status, and at the server end holding the request and not responding until there is a status change. Any pointers or examples on this?
The best possible solution would be to make use of the XMPP protocol. It's standardized, and a lot of open source solutions will get you started within minutes. You can use a combination of Smack, StropheJS and Openfire to get your Java-based app working as desired.
There's a method called Long Polling (Comet). It basically sends a request to the server. The request thread created on the server simply waits for new data for the user, with a time limit of maybe 1 minute or more. When new data is available it is returned.
The main problem to tackle is on the server side: you don't want to have one thread for every user just waiting for new data. Of course, you could use asynchronous methods, depending on your back end.
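To illustrate the asynchronous variant of long polling without a thread per user, here is a small sketch. It is written in JavaScript to match the other code on this page, not in Java; StatusChannel and its methods are invented names. Each held request is represented by a callback that is parked until the status changes or a timeout passes.

```javascript
// Parks waiting requests as callbacks instead of blocking a thread for each.
class StatusChannel {
  constructor() {
    this.waiters = new Set();
  }

  // Hold a request until the status changes or `timeoutMs` elapses.
  // `onStatus` receives the new status, or null on timeout (client should re-poll).
  wait(onStatus, timeoutMs = 60000) {
    const waiter = { onStatus, timer: null };
    waiter.timer = setTimeout(() => {
      this.waiters.delete(waiter);
      onStatus(null);
    }, timeoutMs);
    this.waiters.add(waiter);
  }

  // Called when the other server reports a status change: answer everyone waiting.
  publish(status) {
    const pending = [...this.waiters];
    this.waiters.clear(); // clear first so callbacks may safely call wait() again
    for (const waiter of pending) {
      clearTimeout(waiter.timer);
      waiter.onStatus(status);
    }
  }
}
```

In a request handler, wait would be called with a callback that writes the response, and the browser would open a new request as soon as the previous one is answered.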
Ref: http://en.wikipedia.org/wiki/Push_technology
An alternative would be to use WebSockets. The problem is that they are not supported by all browsers today.
