Sending a response after jobs have finished processing in Express - node.js

So, I have an Express server that accepts a request. The request triggers web scraping that takes 3-4 minutes to finish. I'm using Bull to queue the jobs and process them as and when they are ready. The challenge is to send the results from the processed jobs back as the response. Is there any way I can achieve this? I'm running the app on Heroku, but Heroku has a request timeout of 30 seconds.

You don't have to wait until the back end has finished the work. Identify who is making the request and authenticate the user, then do a res.status(202).send({message: "text"});
Even though the response has already been sent to the client, the handler can keep processing.
NOTE: Do not put a return keyword before res.status(...) if the processing code comes after it in the same handler, because return would exit the function before that code runs.
The HyperText Transfer Protocol (HTTP) 202 Accepted response status code indicates that the request has been accepted for processing, but the processing has not been completed; in fact, processing may not have started yet. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
202 is non-committal, meaning that there is no way for the HTTP to later send an asynchronous response indicating the outcome of processing the request. It is intended for cases where another process or server handles the request, or for batch processing.
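A minimal sketch of the respond-first pattern described above, written as an Express-style handler. The names acceptScrape, scrape and results are hypothetical, the 50 ms timer stands in for the real 3-4 minute scrape, and the in-memory Map is only for illustration (it would not survive a process restart).

```javascript
// Hypothetical stand-in for the long-running scrape (a short timer here).
function scrape(url) {
  return new Promise((resolve) => {
    setTimeout(() => resolve({ url, title: 'example' }), 50);
  });
}

// Hypothetical in-memory result store, keyed by URL.
const results = new Map();

// Express-style handler; it could be mounted with app.post('/scrape', acceptScrape).
function acceptScrape(req, res) {
  const { url } = req.body;

  // Respond right away. There is deliberately no `return` in front of this
  // call, so the statements below still run after the response is sent.
  res.status(202).send({ message: 'Accepted for processing' });

  // The work continues after the client already has its 202.
  scrape(url)
    .then((data) => results.set(url, { status: 'completed', data }))
    .catch((err) => results.set(url, { status: 'failed', error: err.message }));
}
```

Reading req.body this way assumes a JSON body parser such as express.json() is installed on the app.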

Because of the timeout, you need to send a response right away. Since your process takes about 3-4 minutes, it is better to respond immediately, saying that the request was received successfully and will be processed.
Then, when the task is completed, you can use socket.io or WebSockets to notify the client from the server side. You can also send the result along with that notification.
The client can also check repeatedly whether the job has completed on the server; this is called polling, and it is required for older browsers that don't support WebSockets. socket.io falls back to polling when the browser doesn't support WebSockets.
Visit socket.io for more information and documentation.
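One way the notification step could look with socket.io is sketched below. Both functions take the socket.io server instance as a parameter (for example new Server(httpServer) from the socket.io package). The event names watch-job and job-done and the room naming scheme are assumptions made for illustration, not part of the library.

```javascript
// `io` is expected to be a socket.io Server instance.

// Let each connected client subscribe to the job it is waiting for.
function registerJobRooms(io) {
  io.on('connection', (socket) => {
    // 'watch-job' is a made-up event name; the client would emit it with its job id.
    socket.on('watch-job', (jobId) => {
      socket.join(`job:${jobId}`);
    });
  });
}

// Call this from the code that finishes the job.
function notifyJobDone(io, jobId, result) {
  // Only sockets that joined this job's room receive the event.
  io.to(`job:${jobId}`).emit('job-done', { jobId, result });
}
```

On the client, the matching calls would be socket.emit('watch-job', jobId) after receiving the job id, and socket.on('job-done', handler) to receive the result.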

The best approach to this problem is the socket.io library. It lets the server send data to the client whenever you want, triggering a function on the client side that receives the data. socket.io supports different languages and is really easy to use.

Create a jobs table in a database or in persistent storage such as Redis.
Save each job in the table upon request, with a unique id.
Update the status to running when the job starts.
Send HTTP 202 Accepted.
On the client, implement a polling script. On the server, implement a job status route/API that accepts a job id, queries the jobs table and responds with the status.
When the job finishes, update the jobs table with status completed. When the job errors, update it with status failed, and perhaps use a description column to store the cause of the error.
This solution makes your system horizontally scalable and distributed. It also guards against the consequences of unexpected connection drops. The polling interval depends on the average job completion time; I would recommend an interval of about 5 seconds.
This can be improved further by storing job completion progress in the jobs table, so that the client can display a progress bar.
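The server side of these steps might be sketched as follows. The Map is a hypothetical in-memory stand-in for the jobs table (a real deployment would use a database or Redis, as the answer says), and the function names and the shape of the status response are assumptions.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical in-memory "jobs table"; a real setup would use a database or Redis.
const jobs = new Map();

// Save a new job with a unique id and return that id to the caller.
function createJob(payload) {
  const id = randomUUID();
  jobs.set(id, { id, payload, status: 'queued', progress: 0, description: null });
  return id;
}

function updateJob(id, fields) {
  const job = jobs.get(id);
  if (!job) throw new Error(`unknown job ${id}`);
  Object.assign(job, fields);
}

// What a status route such as GET /jobs/:id could return.
function getJobStatus(id) {
  const job = jobs.get(id);
  if (!job) return { httpStatus: 404, body: { error: 'job not found' } };
  const { status, progress, description } = job;
  return { httpStatus: 200, body: { id, status, progress, description } };
}

// Run a job and record each state transition in the table.
async function runJob(id, work) {
  updateJob(id, { status: 'running' });
  try {
    // `work` receives a callback it can use to report progress (0-100).
    await work((progress) => updateJob(id, { progress }));
    updateJob(id, { status: 'completed', progress: 100 });
  } catch (err) {
    updateJob(id, { status: 'failed', description: err.message });
  }
}
```

A request handler would call createJob, start runJob without awaiting it, and answer with 202 plus the job id; the polling client then reads getJobStatus through the status route.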

A request timeout occurs when the connection stays idle; different servers implement it differently, so the timeout period varies.
One solution to the timeout problem is to keep the connection between client and server open.
For such scenarios use WebSockets, which ensure that after the initial request/response handshake between client and server the connection stays open.
There are many libraries for implementing real-time connections, e.g. PubNub and socket.io. This is the same technology used for live streaming.
Node.js can handle many concurrent connections, and it is lightweight, so it won't use many resources.

Related

One API call vs multiple

I have a process in the back-end which will take on average 30 to 90 seconds to complete.
Is it better to have a front-end React app make ONE API call and wait for the back-end to complete the process and return the data? Or is it better to have the front-end make multiple calls, let's say every 2 seconds, to check whether the process is complete and get back the result?
Both are valid approaches. You could also report status changes with websocket so there's no need for polling.
If you do want to go the polling route, the general recommendation is to:
Return 202 accepted from your long-running process endpoint.
Also return a Link header with a url to where the status of the process can be read.
The client can then follow that link and ping it every x seconds.
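The three steps above can be sketched as a pair of helpers for the server and a polling loop for the client. The /jobs/:id URL shape, the rel value "status", the { status: ... } response body and all function names are assumptions for this sketch; the 2-second default interval comes from the question.

```javascript
// Build the headers for the 202 response. The rel value "status" and the
// /jobs/:id URL shape are assumptions for this sketch.
function acceptedHeaders(jobId) {
  return { Link: `</jobs/${jobId}>; rel="status"` };
}

// Pull the target URL out of a single-entry Link header.
function parseLinkTarget(linkHeader) {
  const match = /^<([^>]+)>/.exec(linkHeader || '');
  return match ? match[1] : null;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll the status URL until the job reports a final state.
// `fetchImpl` defaults to the global fetch (browsers, Node 18+).
async function pollUntilDone(
  statusUrl,
  { intervalMs = 2000, maxAttempts = 60, fetchImpl = fetch } = {},
) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await fetchImpl(statusUrl);
    const body = await response.json();
    if (body.status === 'completed' || body.status === 'failed') return body;
    await sleep(intervalMs);
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}
```

The maxAttempts cap is a design choice so that a job that never finishes does not leave the client polling forever.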
I don't think it's good to make a single API call and wait 30-90 seconds for a response. Instead, send a response immediately, saying that the request was successful and will be processed.
You can then use WebSockets, or a library like socket.io, so that the server can communicate directly with the client once the requested processing is complete.
Making multiple API calls to check whether the server is done or has a new message is called polling. It is not very efficient, but it is still required in old browsers that don't support WebSockets. socket.io falls back to polling automatically in old browsers.
But yes, if you want, you can make multiple calls to check whether the server is done processing; I would still prefer that the server communicate back to the client.

Long Polling in Gatling

Warning: Please bear with me, as I am fairly new to Gatling. So, apologies in advance. :P :)
I was going through the LoadRunner asynchronous calls function web_reg_async_attributes, and I found that there are four different Asynchronous Conversation Patterns, which are:
Poll - The client polls the server periodically for information.
Long Poll - The client polls the server and waits for a response. When the response arrives, another poll request is initiated.
Push - The client sends a request. The server response is to send updates when there are changes to the requested information.
Cross-user - One user performs an activity that is reflected in another user's client. For example, user1 sends an email and user2 receives notification.
Now, I have a requirement where I need to test Long-Polling using Gatling.
As far as I know, there are two ways in Gatling:
Poll
SSE
Please feel free to let me know in case I am wrong.
Using Gatling's polling function, I am getting a Gateway Timeout error. My theory is:
Gatling sends the request --> doesn't get a response --> Comes back with Gateway Timeout error.
Is there a way I can emulate Long Polling in Gatling? Please help me out in resolving this challenge.
Poll works in a similar fashion to Long Poll.

How are requests and responses processed in ServiceStack?

I am using ServiceStack to build RESTful services, but I don't have in-depth knowledge of it. It works by sending a request and getting a response back. I have a scenario, and my question depends on it.
Scenario: I send a request from a browser, or any client that can send requests to the server. Assume the server takes 3 seconds to process a single request and send the response back to the browser. After one second, I send another request to the server from the same browser (client). I then get the response to the second request, the one I sent later.
Question 1: What happens behind the scenes to the first request, for which I did not get a response?
Question 2: How can I stop the processing of the orphaned request?
Edit: I am using IIS to host the services.
ServiceStack executes requests concurrently on multithreaded web servers, whether you're hosting on ASP.NET/IIS or self-hosted so 2 concurrent requests are running concurrently on different threads. There are different scenarios possible if you're executing async tasks in your Services in which it frees up the thread to execute different tasks, but the implementation details are largely irrelevant here.
HTTP web requests are each executed to completion; even when the client connection is lost, your Services are never notified and no Exceptions are raised.
But for long-running Services you can enable ServiceStack's high-level Cancellable Requests feature, which gives clients a way to cancel long-running requests.

node js on heroku - request timeout issue

I am using Sails js (node js framework) and running it on Heroku and locally.
The API function reads from an external file and performs long computations, which might take hours, on the queries it reads.
My concern is that after a few minutes it returns with a timeout.
I have 2 questions:
How do I control the HTTP request/response timeout (what do I really need to control here)?
Is an HTTP request considered best practice for this purpose, or should I use Socket.IO? (Well, I have no experience with Socket.IO, and I'm not sure whether I'm talking bullshit.)
You should use the worker pattern to accomplish any work that would take more than a second or so:
"Web servers should focus on serving users as quickly as possible. Any non-trivial work that could slow down your user’s experience should be done asynchronously outside of the web process."
"The Flow
Web and worker processes connect to the same message queue.
A process adds a job to the queue and gets a url.
A worker process receives and starts the job from the queue.
The client can poll the provided url for updates.
On completion, the worker stores results in a database."
https://devcenter.heroku.com/articles/asynchronous-web-worker-model-using-rabbitmq-in-node
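The flow quoted above can be sketched with Bull, the queue from the first question, in place of the RabbitMQ setup in the linked article. The functions take a Bull Queue instance as a parameter (for example new Queue('scrape', process.env.REDIS_URL) from the bull package, created in both the web and the worker process). The function names and the /jobs/:id URL shape are assumptions.

```javascript
// Web process: add the job to the queue and hand back a URL the client can poll.
async function enqueue(queue, payload) {
  const job = await queue.add(payload); // Bull's Queue#add resolves to a Job
  return { id: job.id, statusUrl: `/jobs/${job.id}` };
}

// Worker process: pull jobs off the same queue and run the slow work.
function startWorker(queue, work) {
  // The value the processor resolves with is stored by Bull as job.returnvalue.
  queue.process(async (job) => work(job.data));
}

// Web process: what the status URL could respond with.
async function jobStatus(queue, id) {
  const job = await queue.getJob(id);
  if (!job) return { status: 'not_found' };
  const state = await job.getState(); // e.g. 'waiting', 'active', 'completed', 'failed'
  return { status: state, result: state === 'completed' ? job.returnvalue : null };
}
```

Because the result lives in the queue's Redis store rather than in an open HTTP connection, the client can poll the status URL with short requests that each finish well within the platform's request timeout.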

Node takes very long time to response to the JSON request

I've implemented a chat application using node.js. The program opens a connection with the client and responds with the new message when the EventEmitter emits the "recv" event.
The problem is that it takes a very long time to respond to other requests when the server is holding 3 or 4 or more streams. The Chrome developer tools show the status of the request as pending. It takes 5-30 seconds or more for the request to reach the server (localhost). I use console.log to log when a new request is received by node.js.
I have no idea why there's a long pause. Is there a limit in the Chrome browser, node.js or anything else that I should know about? Does node introduce a delay when it holds too many requests at the same time, and how should I measure this? Thank you.
Chrome supports six simultaneous connections per domain, so if those are already in use, it will have to wait for one to close. If you want to know what's going on, use a packet capture program to check the actual network traffic.
Browsers are limited to a certain number of parallel connections, and the limit applies across the same browser context. For example, when you have more than 6 tabs open, the connections will be queued and you will see them pending.
You can avoid this limitation by, for example, using a unique poll subdomain for each client connection. This is how Facebook works around the limitation. The problem is Firefox, where the workaround doesn't work: your connections will still be queued when they reach the limit, even when you use unique subdomains.
Another solution might be to use HTML5 local storage, where you can take advantage of the StorageEvent, which also propagates changes to other tabs within the same browser. This is how the Stack Overflow chat is done. The advantage of this approach is that you need only one polling connection to the server; the disadvantage is the lack of HTML5 local storage support in older browsers, or a different implementation in FF versions < 4.
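A minimal sketch of the local storage idea, assuming one tab holds the polling connection and relays each message to the others. The browser's window object is passed in as a parameter, and the key name and function names are hypothetical.

```javascript
// KEY is a hypothetical localStorage key shared by all tabs of the chat.
const KEY = 'chat:latest';

// Called by the one tab that holds the polling connection.
function publishToTabs(win, message) {
  // The timestamp makes every write a new value; the storage event is only
  // dispatched when the stored value actually changes.
  win.localStorage.setItem(KEY, JSON.stringify({ message, at: Date.now() }));
}

// Called by every tab. Browsers deliver the storage event to the other tabs
// of the same origin, not to the tab that performed the write.
function subscribeInTab(win, onMessage) {
  win.addEventListener('storage', (event) => {
    if (event.key !== KEY || event.newValue === null) return;
    onMessage(JSON.parse(event.newValue).message);
  });
}
```

In a page, both functions would be called with window; the polling tab also needs to handle its own messages directly, since it does not receive the event it triggers.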

Resources