Background jobs that run on every request on Heroku and node.js

I have an app that needs to run a very long process (takes 30-60 seconds for each request). After the processing, the result is then returned to the request as a response. This works fine locally, but it crashes my Heroku instance.
What I'd like to happen instead is:
User comes on site, request sent to backend
Backend returns immediately, and kicks off another process/task/job that does the processing
When the processing ends, the response is returned to the correct user.
I am not sure what I need for this. After an hour of research, it seems I can use Redis as a queue and have a worker poll it every x minutes. But what I can't figure out is which request to send the response to once processing ends.
Is there a sample Express/node.js for this? Any pointers are helpful.

Like you found in your research, setting up a worker queue using Redis is a good approach for long running processes. A nice library for this is kue (https://github.com/learnboost/kue).
When it comes to responding to a request with the results of the job, leaving an outstanding request hanging while it waits for a response is not a good way to go about it (and may not work: Heroku terminates a request if no response has started within 30 seconds).
What you could do instead is start the background job when the request is made and respond right away with a job ID. The client can then poll the server for the status of the job and, once the job is complete, fetch the result.
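A minimal sketch of that flow, using only Node's built-in modules. The in-memory Map stands in for the Redis-backed queue, and `startJob` / `getJob` are hypothetical helper names, not part of kue or any other library:

```javascript
const crypto = require('crypto');

// In-memory job registry. Assumption: a real deployment would keep this in
// Redis (or another shared store) so web and worker processes both see it.
const jobs = new Map();

// Register a job, start the work without awaiting it, and return the job ID
// that the HTTP handler can send back to the client right away.
function startJob(work) {
  const id = crypto.randomBytes(16).toString('hex');
  const job = { id, status: 'pending', result: null, error: null };
  jobs.set(id, job);

  Promise.resolve()
    .then(work)
    .then((result) => {
      job.status = 'done';
      job.result = result;
    })
    .catch((err) => {
      job.status = 'failed';
      job.error = String(err && err.message ? err.message : err);
    });

  return id;
}

// What a status endpoint would return when the client polls with its job ID.
function getJob(id) {
  return jobs.get(id) || null;
}
```

Because each client polls with the ID it was handed, the server never has to remember which connection started which job.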

Kue (from @mattetre's answer) is no longer maintained. Kue's GitHub page suggests Bull as a good alternative. It is a fast and reliable Redis-based queue for Node.js.

Related

How to poll another server from Node.js?

I'm currently developing a Shopify app with Node/Express and a Postgres database. When a user registers an account and connects their Shopify store, I'll need to download all of their store's orders. They could have 100,000s of orders, so I'd like to use a Shopify GraphQL Bulk Operation. While Shopify is handling this, my Node server will need to poll the Shopify server to check on the progress, and when the operation is complete, Shopify will send me a link where I can download all of the data. Once the data is processed and stored in my database, I'll send the user an email to say that their account is now set up.
How should I handle polling the Shopify server? The process could take anywhere from a few mins to hours. Using setInterval() would be a bad idea right? Because if the server restarts for whatever reason, It will lose the interval? So, should I use some sort of background task? And would I need to store anything in my database? I've researched cron jobs, child processes, worker threads, the bull package -- and it's left me a little confused.
(I also know that I could use a webhook, but Shopify offers no guarantees that my app will receive the webhook.)
Upon installation, launch a background job labeled "GetCustomerOrders". As you know, background jobs are mature, and nicely handle problems. For example, they can retry themselves if something goes wrong.
The Background job itself just sets up the Bulk Download and then settles into Poll. Polling is no big deal and just happens. As you said, could be minutes, could take hours. Nevertheless, a poll gets status on a bulk download, and that can even be hot-rodded. For example, you poll with an ID. So you poll till that ID completes. Regardless of restarts.
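A rough sketch of such a poll loop. `fetchStatus` is a hypothetical function you would write to query the bulk operation by ID, and the status strings are assumptions to check against the API you are calling. The idea is that the job stores the operation ID (for example in your database) before polling, so a restarted job can pick up with the same ID:

```javascript
// Poll until the operation reaches a terminal state.
// `fetchStatus(operationId)` is a hypothetical caller-supplied function that
// asks the remote API for the operation and resolves to { status, url }.
async function pollUntilDone(operationId, fetchStatus, { intervalMs = 30000, maxAttempts = Infinity } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const op = await fetchStatus(operationId);

    if (op.status === 'COMPLETED') return op;
    if (op.status === 'FAILED' || op.status === 'CANCELED') {
      throw new Error(`Operation ${operationId} ended with status ${op.status}`);
    }

    // Still running: wait before asking again (skip the wait after the last try).
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Operation ${operationId} still running after ${maxAttempts} polls`);
}
```

Throwing on failure lets the job queue's own retry handling take over instead of the loop deciding what to do.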
At the end of that rather simple setup, you get an URL to download and parse JSON. Spawn another job even for that. Endless fun. Why sweat it? Background jobs are the way to go.
The webhook idea is OK, but as the documentation says, webhooks are not 100% reliable. CRON is bush-league in that it misses out on the mature development of jobs in queues and is more like a simple trigger. Relying on CRON to start something is fine, but it gives you zero management over what it starts.
I am guessing NodeJS has a decent background job system by this time. When you look at Sidekiq for Ruby you realize what awesome is. Surely you can find a copycat in Node that comes close anyway.

IIS application HTTP method stops running

I have a web application on an IIS server.
It has a POST method that takes a long time to run (around 30-40 minutes).
After some time the application stops running (without any exception).
I set the idle timeout to 0, but that did not help.
What can I do to solve it?
Instead of doing all the work initiated by the request before responding at all:
Receive the request
Put the information in the request in a queue (which you could manage with a database table, ZeroMQ, or whatever else you like)
Respond with a "Request received" message.
That way you respond within seconds, which is acceptable for HTTP.
Then have a separate process monitor the queue and process the data on it (doing the 30-40 minute long job). When the job is complete, notify the user.
You could do this through the browser with a Notification or through a WebSocket or use a completely different mechanism (such as by sending an email to the user who made the request).
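The worker side of that design might look roughly like this, sketched in JavaScript for illustration. The plain array stands in for the database table or message queue, and `processJob` and `notify` are hypothetical functions you would supply (the long-running work and the email/WebSocket notification respectively):

```javascript
// Drain the queue in a process separate from the web request handler.
// Assumption: `queue` is a plain array standing in for a database table or
// message broker; `processJob` and `notify` are caller-supplied functions.
async function drainQueue(queue, processJob, notify) {
  let handled = 0;
  while (queue.length > 0) {
    const job = queue.shift(); // take the oldest job first
    try {
      const result = await processJob(job);
      await notify(job.userId, { jobId: job.id, ok: true, result });
    } catch (err) {
      // Report the failure instead of letting one bad job stop the worker.
      await notify(job.userId, { jobId: job.id, ok: false, error: err.message });
    }
    handled++;
  }
  return handled;
}
```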

AJAX request errors out with no response

I have an application with webix on UI and node js on server side.
From the UI, if I trigger a long-running AJAX request, e.g. to process 1000 records, the request errors out after approximately 1.5 minutes (not consistently).
The error object contains no information about the reason for request failure but since processing smaller set of records seems to work fine I am thinking of blaming it on timeout.
From the developer console I see that request seems to be Stalled and response is empty.
Currently I can't drop a request and keep polling it every few seconds to see if the processing has finished. I have to wait for the request to finish, but I am not sure how to do that, as the webix forum doesn't seem to have any information on this except for setting a timeout.
If setting timeout is the way to go then what would happen tomorrow if the request size goes to 2000 records - I don't want to keep on increasing the timeout
Also, if I am left with no choice how would I implement the polling. If I drop a request on to server there can be other clients as well who are triggering a similar request. How would I distinguish between requests originated from different clients?
I would really appreciate some help on this.

Distributed queue wrapped with https request/response

Looking for some advice on how to do the following:
1. Receive request from website for certain long running process (~10-30 seconds)
2. Website backend schedules a job and puts it onto a distributed queue (could be SQS/Kue/resque)
3. A worker takes the job off the queue and processes it. Stores result somewhere.
4. Website backend subscribes to job complete event and gets the result of processed job.
5. Website backend closes request to website with result of the task.
Steps 1, 2 and 3 are fine. I am just finding it tricky to pass the result of a queued task back to the backend so that it can close the request.
Polling from the website isn't an option: the request has to stay open for however long the task takes to be processed. I'm using Node.js.
Steps 2-4 all happen on the server side. There is nothing stopping you from polling the expected result location (on the server side) for the result and then returning the result when it finally appears.
Client sends requests
Server starts job and begins polling for the result
The result comes back so the poll loop on the server side ends
Server sends result back to client
The client-server connection is finally severed
You could make this more efficient if the job can call a URL when it finishes. In this case your service would have two endpoints: one for the client to start the process, and another that your job queue can call.
Client sends requests
Server starts job... saves the response callback in a global object so that it is not closed (I'm assuming something like express here)
openJobs.push({ id: 12345, res: res });
jobQueue.execute({ id: 12345, data: {...}});
When the job finishes and saves the result, call the service url with the id
You can check that the job has actually finished and remove the job from the openJobs list
Finish the original response
openJob.res.send(data);
This will send the data and close the original client-server connection.
The overall result is that you have no polling at all... which is cool.
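Pulled together, the bookkeeping for the held responses could look something like this. It is only a sketch: `holdResponse` and `completeJob` are made-up names, and `res` is assumed to be an Express-style response object with a `send()` method, as in the snippet above. A Map is used instead of an array so the lookup by ID is direct:

```javascript
// Pending responses keyed by job ID. Assumption: `res` is an Express-style
// response object with a send() method.
const openJobs = new Map();

// Called by the client-facing endpoint: remember the response so the
// connection stays open while the job runs.
function holdResponse(jobId, res) {
  openJobs.set(jobId, res);
}

// Called by the second endpoint when the job queue reports completion.
// Returns false when no request is waiting (unknown ID, or already answered).
function completeJob(jobId, data) {
  const res = openJobs.get(jobId);
  if (!res) return false;
  openJobs.delete(jobId); // remove first so a duplicate callback is a no-op
  res.send(data);
  return true;
}
```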
Of course... In either of these scenarios you are screwed if your server shuts down in the middle of a batch... This is why I would recommend something like socket.io in this scenario. You would queue the results of jobs somewhere and socket.io would poll/wait for callbacks on the list and push to the client when there are new items. This is better because if the server crashes no biggie - the client will re-connect once the server comes back up.
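To illustrate the "queue the results and push when the client is there" idea, here is a small sketch. It only relies on the socket having an `emit(event, data)` method, which socket.io sockets provide. The event name `job:done` and the function names are invented for the example, and the Maps would need to live in Redis or a database to actually survive a crash:

```javascript
// Results waiting for delivery, keyed by user ID. Assumption: in production
// this would live in Redis or a database so it survives a server restart.
const pendingResults = new Map();
// Currently connected sockets, keyed by user ID.
const sockets = new Map();

// Called when a job finishes: deliver now if the user is connected,
// otherwise keep the result until they come back.
function publishResult(userId, result) {
  const socket = sockets.get(userId);
  if (socket) {
    socket.emit('job:done', result);
  } else {
    const list = pendingResults.get(userId) || [];
    list.push(result);
    pendingResults.set(userId, list);
  }
}

// Called from the connection handler: register the socket and flush anything
// that completed while the user was disconnected.
function onConnect(userId, socket) {
  sockets.set(userId, socket);
  const list = pendingResults.get(userId) || [];
  pendingResults.delete(userId);
  for (const result of list) socket.emit('job:done', result);
}

// Called from the disconnect handler so later results are buffered again.
function onDisconnect(userId) {
  sockets.delete(userId);
}
```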

How to handle requests that have heavy load?

This is a Brain-Question for advice on which scenario is a smarter approach to tackle situations of heavy lifting on the server end but with a responsive UI for the User.
The setup;
My System consists of two services (written in node): one Frontend Service that listens for requests from the user and a Background Worker that does heavy lifting and won't be finished within 1-2 seconds (e.g. video conversion, image resizing, gzipping, spidering etc.). The User is connected to the Frontend Service via WebSockets (and normal POST Requests).
Scenario 1;
When a User e.g. uploads a video, the Frontend Service only does some simple checks, creates a job in the name of the User for the Background Worker to process and directly responds with status 200. Later on the Worker sees it has work, does the work and finishes the job. It then finds the socket the user is connected to (if any) and sends a "hey, job finished" with the data related to the video conversion job (url, length, bitrate, etc.).
Pros I see: Quick User feedback of successful upload (e.g. ProgressBar can be hidden)
Cons I see: User will get a fake "success" response with no data to handle/display and needs to wait till the job finishes anyway.
Scenario 2;
Like Scenario 1 but that the Frontend Service doesn't respond with a status 200 but rather subscribes to the created job "onComplete" event and lets the Request dangle till the callback is fired and the data can be sent down the pipe to the user.
Pros I see: "onSuccess", all data is at the User
Cons I see: Depending on the job's weight and active job count, the Users request could Timeout
While writing this question things are getting clearer to me by the minute (Scenario 1, but with smart success and update events sent). Regardless, I'd like to hear about other Scenarios you use or further Pros/Cons towards my Scenarios!?
Thanks for helping me out!
Some unnecessary info; For websockets I'm using socket.io, for job creating kue and for pub/sub redis
I just wrote something like this and I use both approaches for different things. Scenario 1 makes most sense IMO because it matches the reality best, which can then be conveyed most accurately to the user. By first responding with a 200 "Yes I got the request and created the 'job' like you requested", you can accurately update the UI to reflect that the request is being dealt with. You can then use the push channel to notify the user of updates such as progress percentage, error, and success as needed, but without the UI 'hanging' (obviously you wouldn't hang the UI in scenario 2, but it's an awkward situation when things are happening and the UI just has to 'guess' that the job is being processed).
Scenario 1 -- but instead of responding with 200 OK, you should respond with 202 Accepted. From Wikipedia:
https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
202 Accepted The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
This leaves the door open for the possibility of worker errors. You are just saying that you accepted the request and are trying to do something with it.
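For example, a handler along these lines would return 202 together with the job ID. It is written against `writeHead` and `end` from Node's built-in http response. `enqueue` is a hypothetical function that stores the job and returns its ID, and the `/jobs/<id>` status URL in the Location header is just a common convention, not something the 202 definition requires:

```javascript
// Handler for the job-creating endpoint.
// Assumptions: `enqueue` is a hypothetical function that stores the job and
// returns its ID; the /jobs/<id> status URL is illustrative.
function acceptJob(payload, res, enqueue) {
  const jobId = enqueue(payload);

  // 202: the request was accepted, but the work has not been done yet.
  res.writeHead(202, {
    'Content-Type': 'application/json',
    Location: `/jobs/${jobId}`,
  });
  res.end(JSON.stringify({ jobId, status: 'accepted' }));
}
```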
