How to let the frontend know when a background job is done? - node.js

On Heroku, long requests can cause H12 timeout errors:
The request must then be processed...by your application...within 30 seconds to
avoid the timeout.
src
Heroku suggests moving long tasks to background jobs.
Sending an email...Accessing a remote API...
Web scraping / crawling...you should move this heavy lifting into a background job which can run asynchronously from your web request.
src
Heroku's docs say requests shouldn't take longer than 500ms to return a response.
It’s important for a web application to serve end-user requests as
fast as possible. A good rule of thumb is to avoid web requests which
run longer than 500ms. If you find that your app has requests that
take one, two, or more seconds to complete, then you should consider
using a background job instead.
src
So if I have a background job, how do I tell the frontend when the background job is done and what the job returned?
Heroku's example code just returns the background job id, but that alone won't give the frontend the information it needs:
app.post('/job', async (req, res) => {
  let job = await workQueue.add();
  res.json({ id: job.id });
});
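(For context, workQueue in that snippet is a Bull queue backed by Redis. A minimal setup might look like the sketch below; the queue name 'work' and the REDIS_URL variable are assumptions, not part of Heroku's example.)
const Queue = require('bull');
// A Bull queue connected to Redis; the name and connection string are illustrative.
const workQueue = new Queue('work', process.env.REDIS_URL || 'redis://127.0.0.1:6379');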
For example, this method won't tell the frontend when an image has finished uploading. Nor will the frontend know when a call to an external API, such as an exchange rate API, returns a result, or what that result (the exchange rate) is.
Someone suggested using job.finished(), but doesn't this get you back where you started? Now your requests are waiting for the queue to finish in order to respond, so they take just as long as when there was no queue, and this could lead to timeout errors again.
const result = await job.finished();
res.send(result);
This example uses Bull, Redis, and Node.js.

Someone suggested WebSockets. I haven't found an example of this yet.
The idea of using a queue for long tasks is that you post the task and
then return immediately. I guess you are updating the database as the last
step in your job, and only use the completed event for notifying the
clients. What you need to do in this case is to implement either a
websocket or similar realtime communication and push the notification
to relevant clients. This can become complicated so you can save some
time with a solution like https://pusher.com/ or similar...
https://github.com/OptimalBits/bull/issues/1901
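A minimal sketch of that WebSocket approach, assuming socket.io on the server and the workQueue from the earlier snippet (Bull's global:completed event is real; everything else here is illustrative, not code from the issue):
const http = require('http');
const express = require('express');
const { Server } = require('socket.io'); // assumes socket.io is installed

const app = express();
const server = http.createServer(app);
const io = new Server(server);

// Bull fires 'global:completed' when any worker finishes a job;
// the job's return value arrives as a JSON string.
workQueue.on('global:completed', (jobId, result) => {
  io.emit('job-completed', { id: jobId, result: result && JSON.parse(result) });
});

app.post('/job', async (req, res) => {
  const job = await workQueue.add();
  res.json({ id: job.id }); // respond immediately; the socket pushes the result later
});

server.listen(process.env.PORT || 3000);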
I later found another solution in Heroku's full example, which I had originally missed: the frontend polls a status endpoint.
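web server
The status route that the frontend polls isn't reproduced in this excerpt, so here is a minimal sketch of what it can look like with Bull (getJob, getState, failedReason, and returnvalue are real Bull job APIs; the exact response shape is an assumption, not necessarily Heroku's code):
// Status endpoint the frontend polls for each job id
app.get('/job/:id', async (req, res) => {
  const job = await workQueue.getJob(req.params.id);
  if (job === null) {
    res.status(404).end();
  } else {
    const state = await job.getState(); // e.g. 'completed', 'failed', 'active'
    res.json({ id: job.id, state, reason: job.failedReason, result: job.returnvalue });
  }
});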
frontend
// Fetch updates for each job
async function updateJobs() {
  for (let id of Object.keys(jobs)) {
    let res = await fetch(`/job/${id}`);
    let result = await res.json();
    if (!!jobs[id]) {
      jobs[id] = result;
    }
    render();
  }
}
// Attach click handlers and kick off background processes
window.onload = function () {
  document.querySelector("#add-job").addEventListener("click", addJob);
  document.querySelector("#clear").addEventListener("click", clear);
  setInterval(updateJobs, 200);
};

Related

Is it fine to not await for a log.write() promise inside a cloud run container?

I'm using @google-cloud/logging to log some stuff out of my express app over on Cloud Run.
Something like this:
routeHandler.ts
import { RequestHandler } from "express";
import { Logging } from "@google-cloud/logging";

const logging = new Logging({ projectId: process.env.PROJECT_ID });
const logName = LOG_NAME;
const log = logging.log(logName);

const resource = {
  type: "cloud_run_revision",
  labels: { ... }
};

// Note: the handler must be async for the awaits below to be valid.
export const routeHandler: RequestHandler = async (req, res, next) => {
  try {
    // EXAMPLE: LOG A WARNING
    const metadata = { resource, severity: "WARNING" };
    const entry = log.entry(metadata, "SOME WARNING MSG");
    await log.write(entry);
    return res.sendStatus(200);
  }
  catch (err) {
    // EXAMPLE: LOG AN ERROR
    const metadata = { resource, severity: "ERROR" };
    const entry = log.entry(metadata, "SOME ERROR MSG");
    await log.write(entry);
    return res.sendStatus(500);
  }
};
You can see that log.write(entry) is asynchronous, so in theory it would be recommended to await it. But here is what the documentation from @google-cloud/logging says:
Doc link
And I have no problem with that. In my real case, even if log.write() fails, it is inside a try-catch and any errors will be handled just fine.
My problem is that it kind of conflicts with the Cloud Run documentation:
Doc link
Note: If I don't wait for the log.write() call, I'll end the request cycle by responding to the request
And Cloud Run does behave like that. A couple of weeks back, I tried to respond immediately to the request and fire off a long background job. The process kind of halted for a while, and I think it restarted once it got another request. Completely unpredictable. And when I ran the test I'm mentioning here, I even had MIN_INSTANCE=1 set on my Cloud Run service container. Even that didn't allow my background job to run smoothly. Therefore, I don't think it's fine to leave the process doing background stuff after I've finished handling a request (the "fire and forget" approach).
So, what should I do here?
Posting this answer as a Community Wiki based on @Karl-JorhanSjögren's correct assumption in the comments.
For Log calls on apps running in Cloud Run you are indeed encouraged to take a Fire and Forget approach, since you don't really need to force synchronicity on that.
As mentioned in the comments replying to your concern about the CPU being disabled after the request is fulfilled: the CPU is throttled first, so that the instance can be brought back up quickly, and only completely disabled after a longer period of inactivity. So firing off small logging calls that in most cases finish within milliseconds shouldn't be a problem.
What is mentioned in the documentation is targeted at processes that run for longer periods of time.
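A minimal fire-and-forget version of the handler above, under the same setup (the explicit .catch is an addition, there to keep a failed write from becoming an unhandled rejection):
export const routeHandler: RequestHandler = (req, res) => {
  const metadata = { resource, severity: "WARNING" };
  const entry = log.entry(metadata, "SOME WARNING MSG");
  // Fire and forget: don't await the write, but report failures explicitly.
  log.write(entry).catch((err) => console.error("log.write failed:", err));
  return res.sendStatus(200);
};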

Nodejs prevent new request before send response to last request

How can I prevent new requests from being handled before the response to the last request has been sent? In other words, how do I process only one request at a time?
app.get('/get', function (req, res) {
  // Stop entering new requests
  someAsyncFunction(function (result) {
    res.send(result);
    // New requests can enter now
  });
});
Even though I agree with jfriend00 that this might not be the optimal way to do this, if you see that it's the way to go, I would just use some kind of state management to check whether it's allowed to access that /get request, and return a different response if it's not.
You can use your database to do this. I strongly recommend Redis because it's in-memory and really quick, so it's super convenient. You can use MongoDB or MySQL if you prefer, but Redis would be the best. This is how it would look, abstractly:
Let's say you have an entry in your database called isLoading, and it's set to false by default.
app.get('/get', function (req, res) {
  // Get isLoading from your state management of choice and check its value
  if (isLoading === true) {
    // If the app is loading, notify the client that it should wait.
    // You can check for the status code in your client and react accordingly.
    return res.status(226).json({ message: "I'm currently being used, hold on" });
  }
  // Code below executes only if isLoading is not true.
  // Set your isLoading DB variable to true, and proceed to do what you have to do.
  isLoading = true;
  someAsyncFunction(function (result) {
    // Only after this is done is isLoading set back to false,
    // so someAsyncFunction can be run again.
    isLoading = false;
    return res.send(result);
  });
});
Hope this helps
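If you do move the isLoading flag into Redis as the answer above suggests, an atomic SET ... NX is safer than a separate read and write. A sketch assuming the ioredis client (the key name and 30-second expiry are arbitrary choices):
const Redis = require('ioredis'); // assumed client; node-redis works similarly
const redis = new Redis();

app.get('/get', async function (req, res) {
  // SET isLoading 1 EX 30 NX: acquires the lock atomically and
  // auto-expires it after 30s in case the handler crashes.
  const acquired = await redis.set('isLoading', '1', 'EX', 30, 'NX');
  if (acquired === null) {
    return res.status(226).json({ message: "I'm currently being used, hold on" });
  }
  someAsyncFunction(async function (result) {
    await redis.del('isLoading'); // release the lock
    res.send(result);
  });
});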
Uhhhh, servers are designed to handle multiple requests from multiple users, so while one request is being processed with asynchronous operations, other requests can be processed. Without that, they don't scale beyond a few users. That is the design of any server framework for node.js, including Express.
So, whatever problem you're actually trying to solve, that is NOT how you should solve it.
If you have some sort of concurrency issue that is pushing you to ask for this, then please share the ACTUAL concurrency problem you need to solve because it's much better to solve it a different way than to handicap your server into one request at a time.

NodeJS child_process or nextTick or setTimeout for long waiting task?

I have seen some questions about sending the response immediately and running CPU-intensive tasks afterwards.
My case is that my node application depends on third-party service responses, so the process flow is:
1. Node receives the request and authenticates with the third-party service
2. Send the response to the user after authentication
3. Do some tasks that need responses from the third-party service
4. Save the results to the database
In my case there are no CPU-intensive tasks and no need to give the results of the additional tasks to the user, but node needs to wait for the responses from the third-party service. I have to do multiple requests/responses to/from the third-party service after the authentication to complete the task.
How can I achieve this?
I have seen some workarounds with child_process, nextTick and setImmediate/setTimeout.
Ultimately I want to send the response to the user immediately and then do the tasks related to that user.
Thanks in advance.
elsewhere in your code
function do_some_tasks() { /* ... */ }
// route function
(req, res) => {
  // call some async task
  do_some_tasks()
  // if the above is doing some asynchronous task, the next line should run
  // immediately without waiting; the question is whether that's actually the case
  res.send()
}
// if your do_some_tasks() is a synchronous function, then you can do this:
// the function call will be put on the queue and executed asynchronously
setImmediate(() => {
  do_some_tasks()
})
// this will be called in the current iteration
res.send(something)
Just writing a very general code block here:
var do_some_tasks = (req, tp_response) => {
  third_party_tasks(args, (err, result) => {
    // save to DB
  });
};
var your_request_handler = (req, res) => {
  third_party_auth(args, (tp_response) => {
    res.send();
    // just do your tasks here
    do_some_tasks(req, tp_response);
  });
};

How to trigger background-processes, how to receive intermediate results?

I have a NodeJS / background-process issue that I don't know how to solve in an 'elegant', straightforward, right way.
The user submits some (~10 or more) URLs via a textarea and then they should be processed asynchronously. [A screenshot has to be taken with puppeteer, some information gathered, the screenshot processed with sharp, and the result persisted in MongoDB: the screenshot via GridFS and the URL in its own collection with a reference to the screenshot.]
While this async process runs in the background, the page should be updated whenever a URL has been processed.
There are so many ways to do that, but which one is the most correct/straightforward/resource-saving way?
Browserify and do it in the browser? No, too much stuff on the client side. AJAX/Axios posts that wait for the URLs to be processed and reflect the results on the page? Trigger the process before the response gets sent back to the client, or let the client start the processing?
So, I made a workflow engine of sorts that supports long-running jobs, following this tutorial: https://farazdagi.com/2014/rest-and-long-running-jobs/
In short: when a request comes in you immediately return a status code and a job id, and when the job completes you log the result somewhere and serve it from there.
For this I used an EventEmitter together with a promise. It's only my solution, maybe not elegant, maybe outright wrong. Made a little POC for you.
const events = require('events');
const emitter = new events.EventEmitter();

const actualWork = function () {
  return new Promise((res, rej) => {
    setTimeout(res, 1000); // stand-in for the real long-running work
  });
};

emitter.on('workCompleted', function () {
  // log somewhere
});

app.get('/someroute', (req, res) => {
  res.json({ msg: 'request initiated', id: 'some_id' });
  actualWork().then(() => {
    emitter.emit('workCompleted', { id: 'some_id' });
  });
});

app.get('/someroute/:id/status', (req, res) => {
  // get the log
});
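The answer leaves the logging and lookup parts as stubs; one way to fill them in is an in-memory map keyed by job id (illustrative only, not from the answer; in production the results would live in Redis or a database, and the id would be generated per request):
const results = {}; // job id -> status/result (assumption: in-memory store)

emitter.on('workCompleted', function (payload) {
  results[payload.id] = { status: 'completed' };
});

app.get('/someroute/:id/status', (req, res) => {
  const entry = results[req.params.id];
  // report 'pending' until the workCompleted event has fired for this id
  res.json(entry || { status: 'pending' });
});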

React app with Server-side rendering crashes with load

I'm using react-boilerplate (with react-router, sagas, express.js) for my React app, and on top of it I've added SSR logic so that once the server receives an HTTP request it renders the react components to a string based on the URL and sends the HTML string back to the client.
While react rendering is happening on the server side, it also makes fetch requests through sagas to some APIs (up to 5 endpoints, based on the URL) to get data for the components before it actually renders them to a string.
Everything works great if I make only a few requests to the Node server at the same time, but once I simulate a load of 100+ concurrent requests and it starts processing them, at some point it crashes with no indication of any exception.
What I noticed while trying to debug the app is that once 100+ incoming requests begin to be processed, the Node server fires its API requests at the same time but receives no actual responses until it stops stacking up those requests.
The code that's used for rendering on the server side:
async function renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames }) {
  // 1st render phase - triggers the sagas
  renderAppToString(store, renderProps);
  // send signal to sagas that we're done
  store.dispatch(END);
  // wait for all tasks to finish
  await sagasDone();
  // capture the state after the first render
  const state = store.getState().toJS();
  // prepare style sheet to collect generated css
  const styleSheet = new ServerStyleSheet();
  // 2nd render phase - the sagas triggered in the first phase are resolved by now
  const appMarkup = renderAppToString(store, renderProps, styleSheet);
  // capture the generated css
  const css = styleSheet.getStyleElement();
  const doc = renderToStaticMarkup(
    <HtmlDocument
      appMarkup={appMarkup}
      lang={state.language.locale}
      state={state}
      head={Helmet.rewind()}
      assets={assets}
      css={css}
      webpackDllNames={webpackDllNames}
    />
  );
  return `<!DOCTYPE html>\n${doc}`;
}
// The code that's executed by express.js for each request
function renderAppToStringAtLocation(url, { webpackDllNames = [], assets, lang }, callback) {
  const memHistory = createMemoryHistory(url);
  const store = createStore({}, memHistory);
  syncHistoryWithStore(memHistory, store);
  const routes = createRoutes(store);
  const sagasDone = monitorSagas(store);
  store.dispatch(changeLocale(lang));
  match({ routes, location: url }, (error, redirectLocation, renderProps) => {
    if (error) {
      callback({ error });
    } else if (renderProps) {
      renderHtmlDocument({ store, renderProps, sagasDone, assets, webpackDllNames })
        .then((html) => {
          callback({ html });
        })
        .catch((e) => callback({ error: e }));
    } else {
      callback({ error: new Error('Unknown error') });
    }
  });
}
So my assumption is that something goes wrong once the server receives too many HTTP requests, which in turn generate even more requests to the API endpoints in order to render the react components.
I've noticed that it blocks event loop for 300ms after renderAppToString() for every client request, so once there are 100 concurrent requests it blocks it for about 10 seconds. I'm not sure if that's a normal or bad thing though.
Is it worth trying to limit simultaneous requests to Node server?
I couldn't find much information on the topic of SSR + Node crashes. So I'd appreciate any suggestions as to where to look at to identify the problem or for possible solutions if anyone has experienced similar issue in the past.
In the above image, I am doing ReactDOM.hydrate(...). I can also load my initial and required state and send it down in hydrate.
I have written the middleware file, and I am using this file to decide, based on the URL, which file to send in response.
Above is my middleware file: I create the HTML string of whichever file was requested based on the URL, then add this HTML string and return it using express's res.render.
The above image is where I compare the requested URL path against a dictionary of path-file associations. Once a match is found, I use ReactDOMServer's renderToString to convert the component into HTML. This HTML can then be sent with a handlebars file using res.render, as discussed above.
This way I have managed to do SSR on most of my web apps built using the MERN.io stack.
Hope my answer helped you. Please write a comment for discussion.
1. Run express in a cluster
A single instance of Node.js runs in a single thread. To take
advantage of multi-core systems, the user will sometimes want to
launch a cluster of Node.js processes to handle the load.
As Node is single-threaded, the problem may also be in a file lower down the stack where you are initialising express.
There are a number of best practices for running a node app that are not generally mentioned in react threads.
A simple way to improve performance on a server with multiple cores is to use the built-in node cluster module:
https://nodejs.org/api/cluster.html
This will start multiple instances of your app, one per core, giving you a significant performance improvement for concurrent requests (if you have a multicore server); a minimal sketch follows below.
For more information on Express performance, see:
https://expressjs.com/en/advanced/best-practice-performance.html
You may also want to throttle your incoming connections, because response times drop rapidly once the thread starts context switching; this can be done by putting something like NGINX or HAProxy in front of your application.
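A minimal sketch of that cluster setup (the cluster API is Node's own; the ./app require path is an assumption about where your express bootstrap lives):
const cluster = require('cluster');
const os = require('os');

if (cluster.isMaster) {
  // fork one worker per CPU core
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  // restart workers that die
  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, restarting`);
    cluster.fork();
  });
} else {
  // each worker runs its own copy of the express app
  require('./app'); // assumption: your express server entry point
}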
2. Wait for the store to be hydrated before calling render to string
You don't want to render your layout until your store has finished updating; as other comments note, rendering blocks the thread.
Below is an example taken from the saga repo which shows how to run the sagas without needing to render the template until they have all resolved:
store.runSaga(rootSaga).done.then(() => {
  console.log('sagas complete')
  res.status(200).send(
    layout(
      renderToString(rootComp),
      JSON.stringify(store.getState())
    )
  )
}).catch((e) => {
  console.log(e.message)
  res.status(500).send(e.message)
})
https://github.com/redux-saga/redux-saga/blob/master/examples/real-world/server.js
3. Make sure node environment is set correctly
Also ensure you are setting NODE_ENV=production when bundling and running your code, as both express and react optimise for this.
The calls to renderToString() are synchronous, so they block the thread while they run. So it's no surprise that with 100+ concurrent requests you have an extremely backed-up queue hanging for ~10 seconds.
Edit: It was pointed out that React v16 natively supports streaming, but you need to use the renderToNodeStream() method for streaming the HTML to the client. It renders the exact same markup as renderToString(), but streams it instead, so you don't have to wait for the full HTML to be rendered before you start sending data to the client.
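A minimal sketch of that streaming variant (renderToNodeStream is the real React 16 API; the surrounding express handler and the App component are illustrative):
const { renderToNodeStream } = require('react-dom/server');

app.get('*', (req, res) => {
  res.write('<!DOCTYPE html><html><body><div id="root">');
  const stream = renderToNodeStream(<App />);
  // keep the response open so we can append the closing tags ourselves
  stream.pipe(res, { end: false });
  stream.on('end', () => {
    res.write('</div></body></html>');
    res.end();
  });
});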
