Node Express. Continue function execution after res.render - node.js

I need to render the page first and then continue executing a function that takes a lot of time. But res.render does not send the response until that function has finished, no matter where I call it.
I want the page to be rendered without waiting for the data processing to finish.
Here is my code:
let promise = new Promise((resolve, reject) => {
    let wb = new Workbook();
    let ws = sheet_from_array_of_arrays(public_data); // May take up to 5 seconds
    wb.SheetNames.push(ws_name);
    wb.Sheets[ws_name] = ws;
    XLSX.writeFile(wb, '/tmp/' + name); // May take up to 10 seconds
    resolve('End of promise!!!');
});

res.render('ihelp/lists/person_selection_view_data', {
    data: public_data,
    name
});

promise.then(answer => { console.log(answer) });
How can I do this?

The code that you want to run looks to be synchronous (I don't see any promises or callbacks related to it). If it takes multiple seconds to run, that will mean that it will block your entire app for that amount of time.
This means that asynchronous functions, like res.render(), will not complete until the processing is done. Even if you change the order:
res.render(...);
long_running_code();
Not only will this not make res.render() send back a response before long_running_code is started, it will also stop your app from responding to new incoming requests (and/or block any current requests) until it's done.
If you have CPU-intensive code that will block the event loop, take a look at worker_threads, which can be used to offload CPU-intensive work to separate threads and keep your main JS thread free to handle the HTTP part of your app.
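For this question's case, a minimal sketch of the worker_threads approach could look like the following. The file name xlsx-worker.js and the helpers Workbook and sheet_from_array_of_arrays are taken as assumptions from the question's code; treat this as an illustration, not a drop-in implementation:
// xlsx-worker.js - runs on a separate thread so the heavy XLSX work
// does not block the main event loop (helpers assumed as in the question)
const { parentPort, workerData } = require('worker_threads');
const XLSX = require('xlsx');

const { public_data, name, ws_name } = workerData;
const wb = new Workbook();                          // assumed helper from the question
const ws = sheet_from_array_of_arrays(public_data); // assumed helper from the question
wb.SheetNames.push(ws_name);
wb.Sheets[ws_name] = ws;
XLSX.writeFile(wb, '/tmp/' + name);
parentPort.postMessage('End of worker!');

// In the route handler: render immediately, then let the worker do the slow part
const { Worker } = require('worker_threads');

res.render('ihelp/lists/person_selection_view_data', { data: public_data, name });

const worker = new Worker('./xlsx-worker.js', {
    workerData: { public_data, name, ws_name }
});
worker.on('message', answer => console.log(answer));
worker.on('error', err => console.error(err));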

res.render takes a third argument: a callback. If supplied, Express will not send the rendered HTML string automatically, so you can send it yourself and run any extended processing afterwards.
res.render('ihelp/lists/person_selection_view_data', {data: public_data, name}, function (err, html) {
    res.send(html);

    let wb = new Workbook();
    let ws = sheet_from_array_of_arrays(public_data); // May take up to 5 seconds
    wb.SheetNames.push(ws_name);
    wb.Sheets[ws_name] = ws;
    XLSX.writeFile(wb, '/tmp/' + name); // May take up to 10 seconds
});

The executor function (inside the Promise constructor) is executed synchronously, and if this takes too long, you can move it into another Promise:
res.render('ihelp/lists/person_selection_view_data', {
    data: public_data,
    name
});

Promise.resolve().then(function() {
    new Promise((resolve, reject) => {
        // your executor code
    }).then(answer => { console.log(answer) });
});

Related

Looping over JSON objects synchronously NodeJS

I'm requesting a JSON API, then doing a forEach loop over a JSON object, and I want a way to know once it's complete.
This is my current forEach loop:
Object.entries(jsonbody.data.users).forEach(([key, value]) => {
    // code to run for each user
});
At the moment it'll just loop through each user until it's completed them all, but I have no definitive maximum number of objects there could be, so I can't just use the usual loop-counting technique.
I'm not sure if I've got this completely wrong, but I've always had trouble working out how to perform synchronous events with Node.js, so hopefully this will teach me.
It depends on whether the "code to run for each user" is synchronous or asynchronous.
There isn't enough detail in your question to give a definitive answer.
If the code in the loop is synchronous then the loop will complete before any code after it will run.
That is the definition of synchronous code ;)
let count = 0;

Object.entries(jsonbody.data.users).forEach(([key, value]) => {
    // some sync code to run for each user
    count++;
});

// Because the code in your loop is synchronous then we'll only get to this
// part of the code once the code above is complete.
console.log(`The tasks have finished`);
console.log(`count of users=`, count);
If the per-user code is asynchronous then you will need to wait until all tasks have completed before continuing.
It is common to use Promises for this:
// Note the use of `.map()` here instead of `.forEach()` that is used in previous example
const promises = Object.entries(jsonbody.data.users).map(([key, value]) => {
    return new Promise((resolve, reject) => {
        // some async code to run for each user
        // Resolving to `key` here for example only - you might return data from
        // some `fetch()` request or other async task
        resolve(key);
    });
});

// Because the code above is **asynchronous** then we'll reach this point in code
// before the code above has "finished" (i.e. there are still outstanding tasks).
// We have to wait until those tasks have completed using either `await` or `promise.then()`
const results = await Promise.all(promises);
console.log(`The tasks have finished. results=`, results);

// OR use `promise.then(...)`
Promise.all(promises).then((results) => {
    console.log(`The tasks have finished. results=`, results);
});
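If the per-user tasks need to run one at a time rather than concurrently, a plain for...of loop with await (inside an async function) also tells you when everything has finished. A minimal sketch, where processUser is a hypothetical async function standing in for your per-user work:
for (const [key, value] of Object.entries(jsonbody.data.users)) {
    await processUser(key, value); // each user is processed before the next one starts
}
console.log('All users processed');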

Possible to make an event handler wait until async / Promise-based code is done?

I am using the excellent Papa Parse library in nodejs mode, to stream a large (500 MB) CSV file of over 1 million rows, into a slow persistence API, that can only take one request at a time. The persistence API is based on Promises, but from Papa Parse, I receive each parsed CSV row in a synchronous event like so: parseStream.on("data", row => { ... }
The challenge I am facing is that Papa Parse dumps its CSV rows from the stream so fast that my slow persistence API can't keep up. Because Papa is synchronous and my API is Promise-based, I can't just call await doDirtyWork(row) in the on event handler, because sync and async code doesn't mix.
Or can they mix and I just don't know how?
My question is, can I make Papa's event handler wait for my API call to finish? Kind of doing the persistence API request directly in the on("data") event, making the on() function linger around somehow until the dirty API work is done?
The solution I have so far is not much better than using Papa's non-streaming mode, in terms of memory footprint. I actually need to queue up the torrent of on("data") events, in the form of generator function iterations. I could have also queued up promise factories in an array and worked them off in a loop. Either way, I end up saving almost the entire CSV file as a huge collection of future Promises (promise factories) in memory, until my slow API calls have worked all the way through.
async importCSV(filePath) {
    let parsedNum = 0, processedNum = 0;

    async function* gen() {
        let pf = yield;
        do {
            pf = yield await pf();
        } while (typeof pf === "function");
    }

    var g = gen();
    g.next();

    await new Promise((resolve, reject) => {
        try {
            const dataStream = fs.createReadStream(filePath);
            const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
            dataStream.pipe(parseStream);

            parseStream.on("data", row => {
                // Received a CSV row from Papa.parse()
                try {
                    console.log("PA#", parsedNum, ": parsed", row.filter((e, i) => i <= 2 ? e : undefined));
                    parsedNum++;

                    // Simulate some really slow async/await dirty work here, for example
                    // send requests to a one-at-a-time persistence API
                    g.next(() => { // don't execute now, call in sequence via the generator above
                        return new Promise((res, rej) => {
                            console.log("DW#", processedNum, ": dirty work START",
                                row.filter((e, i) => i <= 2 ? e : undefined));
                            setTimeout(() => {
                                console.log("DW#", processedNum, ": dirty work STOP ",
                                    row.filter((e, i) => i <= 2 ? e : undefined));
                                processedNum++;
                                res();
                            }, 1000);
                        });
                    });
                } catch (err) {
                    console.log(err.stack);
                    reject(err);
                }
            });

            parseStream.on("finish", () => {
                console.log(`Parsed ${parsedNum} rows`);
                resolve();
            });
        } catch (err) {
            console.log(err.stack);
            reject(err);
        }
    });

    while (!(await g.next()).done);
}
So why the rush Papa? Why not allow me to work down the file a bit slower -- the data in the original CSV file isn't gonna run away, we have hours to finish the streaming, why hammer me with on("data") events that I can't seem to slow down?
So what I really need is for Papa to become more of a grandpa, and minimize or eliminate any queuing or buffering of CSV rows. Ideally I would be able to completely sync Papa's parsing events with the speed (or lack thereof) of my API. So if it weren't for the dogma that async code can't make sync code "sleep", I would ideally send each CSV row to the API inside the Papa event, and only then return control to Papa.
Suggestions? Some kind of "loose coupling" of the event handler with the slowness of my async API is fine too. I don't mind if a few hundred rows get queued up. But when tens of thousands pile up, I will run out of heap fast.
Why hammer me with on("data") events that I can't seem to slow down?
You can, you just were not asking papa to stop. You can do this by calling stream.pause(), then later stream.resume() to make use of Node stream's builtin back-pressure.
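A rough sketch of that manual back-pressure approach, assuming doDirtyWork(row) returns a Promise as described in the question:
parseStream.on("data", row => {
    parseStream.pause();                       // stop further 'data' events while this row is persisted
    doDirtyWork(row)
        .then(() => parseStream.resume())      // ask for the next row only once we're done
        .catch(err => parseStream.destroy(err));
});
parseStream.on("finish", () => console.log("All rows parsed"));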
However, there's a much nicer API to use than dealing with this on your own in callback-based code: use the stream as an async iterator! When you await in the body of a for await loop, the generator has to pause as well. So you can write
async importCSV(filePath) {
    let parsedNum = 0;

    const dataStream = fs.createReadStream(filePath);
    const parseStream = Papa.parse(Papa.NODE_STREAM_INPUT, {delimiter: ",", header: false});
    dataStream.pipe(parseStream);

    for await (const row of parseStream) {
        // Received a CSV row from Papa.parse()
        const data = row.filter((e, i) => i <= 2 ? e : undefined);
        console.log("PA#", parsedNum, ": parsed", data);
        parsedNum++;
        await dirtyWork(data);
    }

    console.log(`Parsed ${parsedNum} rows`);
}

importCSV('sample.csv').catch(console.error);

let processedNum = 0;
function dirtyWork(data) {
    // Simulate some really slow async/await dirty work here,
    // for example send requests to a one-at-a-time persistence API
    return new Promise((res, rej) => {
        console.log("DW#", processedNum, ": dirty work START", data);
        setTimeout(() => {
            console.log("DW#", processedNum, ": dirty work STOP ", data);
            processedNum++;
            res();
        }, 1000);
    });
}
Async code in JavaScript can sometimes be a little hard to grok. It's important to remember how Node handles concurrency.
The Node process is single-threaded, but it uses a concept called the event loop. The consequence of this is that async code and callbacks are essentially equivalent representations of the same thing.
Of course, you need an async function to use await, but your callback from Papa Parse can be an async function:
parse.on("data", async row => {
await sync(row)
})
Once the await operation completes, the arrow function ends, and all references to row will be eliminated, so the garbage collector can successfully collect row, releasing that memory.
The effect this has is concurrently executing sync every time a row is parsed, so if you can only sync one record at a time, then I would recommend wrapping the sync function in a debouncer.
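One simple way to serialize those calls (as opposed to a classic debouncer, which would drop rows) is to chain every sync() onto the previous one. Note this is only a sketch: the pending work still queues up in memory if the API is much slower than the parser.
let chain = Promise.resolve();
parse.on("data", row => {
    chain = chain.then(() => sync(row)); // each row waits for the previous one to finish
});
parse.on("finish", () => chain.then(() => console.log("all rows persisted")));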

nodejs concurrency synchronous execution

I have two WebSockets receiving data asynchronously; every time I get a message from either socket I execute some code in CompareData.
The problem is that CompareData should be executed synchronously, or (better) only if it is not already running.
This is my code:
function CompareData(data) {
    console.log('data ', data);
    AsyncFunction();
}

ws1 = new WebSocket(WS1_URL);
ws2 = new WebSocket(WS2_URL);

ws1.on('message', (data) => {
    CompareData(data);
});
ws2.on('message', (data) => {
    CompareData(data);
});
Can you help me, please? I'm very new to NodeJs
Node.js is single threaded. So you don't really get true concurrency issues occurring in Node programs as you might in other languages. In your example, there can only be at most one WebSocket callback for CompareData occurring at any given time.
You should not make synchronous calls in Node.js, but you can make those calls sequential. The example below might be helpful.
var messages = [];
var inProgress = false;

function CompareData(data) {
    return new Promise((resolve, reject) => {
        // do some stuff and resolve
        setTimeout(() => {
            resolve(data);
        }, 1000);
    });
}

const start = async () => {
    if (!inProgress) {
        if (messages.length !== 0) {
            inProgress = true;
            try {
                const data = await CompareData(messages.shift());
                console.log(data);
            } catch (error) {
                console.log(error);
            }
            inProgress = false;
            await start();
        } else {
            console.log('Process Done');
        }
    }
}

const handler = (data) => {
    messages.push(data);
    start();
}

handler(1);
handler(2);
handler(3);
handler(4);

// ws1 = new WebSocket(WS1_URL);
// ws2 = new WebSocket(WS2_URL);
// ws1.on('message', handler);
// ws2.on('message', handler);
You could use a mutex to ensure that two async CompareData operations are never executed at the same time, for example node-mutex or mutexify.
My suggestions are:
First of all, you need to know when CompareData is finished. Reorganize your code to use promises or callbacks. If you're using third-party async functions, I'm almost sure they provide some feedback on completion - this is a must-have in the async world.
Add an inProgress = false flag somewhere to serve as a simple lock. As someone posted, JS is single-threaded and you're guaranteed that your code won't get interrupted in the middle of an operation. Thanks to that, you can use really simple locks instead of the complicated OS-based mutexes known from multithreaded languages.
In ws.on(...), check whether inProgress is set. If not, lock it and run CompareData.
In the CompareData completion callback, or on promise resolution, set inProgress back to false so you're no longer ignoring incoming data.
If you can simply discard the data, there is no need to complicate this scenario with extra queues, mutexes, etc.
If you need to serve it all, then queue incoming data and serve the next piece after the completion callback fires (see the sketch below).
This is basically what Rahul suggests, but he uses features (async/await) that may not be available without transpiling, so don't use it if you're not transpiling your code.
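A sketch of that queue-and-lock idea using plain promises only (no async/await), assuming CompareData returns a promise as in the answer above:
var messages = [];
var inProgress = false;

function processNext() {
    if (inProgress || messages.length === 0) return;
    inProgress = true;
    CompareData(messages.shift()).then(function (result) {
        console.log(result);
        inProgress = false;
        processNext(); // serve the next queued message, if any
    }, function (err) {
        console.log(err);
        inProgress = false;
        processNext();
    });
}

ws1.on('message', function (data) { messages.push(data); processNext(); });
ws2.on('message', function (data) { messages.push(data); processNext(); });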

setTimeout or child_process.spawn?

I have a REST service in Node.js with one specific request running a bunch of DB commands and other file processing that could take 10-15 seconds to run. Since I didn't want to hold up my browser request thread, I wrote a separate .js script to do the needful, called the script using child_process.spawn() in my Node.js code and immediately returned OK back to the client. This works fine, but then so does calling the same script (as a local function) by just using a simple setTimeout.
router.post("/longRequest", function(req, res) {
console.log("Started long request with id: " + req.body.id);
var longRunningFunction = function() {
// Usually runs a bunch of things that take time.
// Simulating a 10 sec delay for sample code.
setTimeout(function() {
console.log("Done processing for 10 seconds")
}, 10000);
}
// Below line used to be
// child_process.spawn('longRunningFunction.js'
setTimeout(longRunningFunction, 0);
res.json({status: "OK"})
})
So, this works for my purpose. But what's the downside? I probably can't monitor the offloaded work as easily as with child_process.spawn, which would give me a process id. But does this cause problems in the long run? Will it hold up Node.js processing if the 10-second processing grows much longer in the future?
The actual longRunningFunction is something that reads an Excel file, parses it and does a bulk load using tedious to a MS SQL Server.
var XLSX = require('xlsx');
var FileAPI = require('file-api'), File = FileAPI.File, FileList = FileAPI.FileList, FileReader = FileAPI.FileReader;
var Connection = require('tedious').Connection;
var Request = require('tedious').Request;
var TYPES = require('tedious').TYPES;

var importFile = function() {
    var file = new File(fileName);
    if (file) {
        var reader = new FileReader();
        reader.onload = function (evt) {
            var data = evt.target.result;
            var workbook = XLSX.read(data, {type: 'binary'});
            var ws = workbook.Sheets[workbook.SheetNames[0]];
            var headerNames = XLSX.utils.sheet_to_json(ws, { header: 1 })[0];
            var data = XLSX.utils.sheet_to_json(ws);

            var bulkLoad = connection.newBulkLoad(tableName, function (error, rowCount) {
                if (error) {
                    console.log("bulk upload error: " + error);
                } else {
                    console.log('inserted %d rows', rowCount);
                }
                connection.close();
            });

            // setup your columns - always indicate whether the column is nullable
            Object.keys(columnsAndDataTypes).forEach(function(columnName) {
                bulkLoad.addColumn(columnName, columnsAndDataTypes[columnName].dataType, { length: columnsAndDataTypes[columnName].len, nullable: true });
            });

            data.forEach(function(row) {
                var addRow = {};
                Object.keys(columnsAndDataTypes).forEach(function(columnName) {
                    addRow[columnName] = row[columnName];
                });
                bulkLoad.addRow(addRow);
            });

            // execute
            connection.execBulkLoad(bulkLoad);
        };

        reader.readAsBinaryString(file);
    } else {
        console.log("No file!!");
    }
};
So, this works for my purpose. But what's the downside ?
If you actually have a long running task capable of blocking the event loop, then putting it on a setTimeout() is not stopping it from blocking the event loop at all. That's the downside. It's just moving the event loop blocking from right now until the next tick of the event loop. The event loop will be blocked the same amount of time either way.
If you just did res.json({status: "OK"}) before running your code, you'd get the exact same result.
If your long running code (which you describe as file and database operations) is actually blocking the event loop and it is properly written using async I/O operations, then the only way to stop blocking the event loop is to move that CPU-consuming work out of the node.js thread.
That is typically done by clustering, moving the work to worker processes or moving the work to some other server. You have to have this work done by another process or another server in order to get it out of the way of the event loop. A setTimeout() by itself won't accomplish that.
child_process.spawn() will accomplish that. So, if you have an actual event loop blocking problem to solve and the I/O is already as async optimized as possible, then moving it to a worker process is a typical node.js solution. You can communicate with that child process in a number of ways, but one possibility would be via stdin and stdout.
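For completeness, here is a minimal sketch of the child-process variant; 'longRunningTask.js' is a hypothetical script that would hold the Excel/bulk-load code, and the parent simply logs whatever the child prints to stdout:
const { spawn } = require('child_process');

router.post("/longRequest", function (req, res) {
    // start a separate Node process for the heavy work and respond immediately
    const child = spawn(process.execPath, ['longRunningTask.js', req.body.id]);

    child.stdout.on('data', chunk => console.log('worker says:', chunk.toString()));
    child.on('exit', code => console.log('worker finished with code', code));

    res.json({status: "OK"}); // the child keeps working after the response is sent
});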

How to sleep the thread in node.js without affecting other threads?

As per Understanding the node.js event loop, node.js supports a single thread model. That means if I make multiple requests to a node.js server, it won't spawn a new thread for each request but will execute each request one by one. It means if I do the following for the first request in my node.js code, and meanwhile a new request comes in on node, the second request has to wait until the first request completes, including 5 second sleep time. Right?
var sleep = require('sleep');
sleep.sleep(5)//sleep for 5 seconds
Is there a way that node.js can spawn a new thread for each request so that the second request does not have to wait for the first request to complete, or can I call sleep on specific thread only?
If you are referring to the npm module sleep, it notes in the readme that sleep will block execution. So you are right - it isn't what you want. Instead you want to use setTimeout which is non-blocking. Here is an example:
setTimeout(function() {
console.log('hello world!');
}, 5000);
For anyone looking to do this using es7 async/await, this example should help:
const snooze = ms => new Promise(resolve => setTimeout(resolve, ms));
const example = async () => {
console.log('About to snooze without halting the event loop...');
await snooze(1000);
console.log('done!');
};
example();
In case you have a loop with an async request in each iteration and you want a certain delay between requests, you can use this code:
var startTimeout = function(timeout, i) {
    setTimeout(function() {
        myAsyncFunc(i).then(function(data) {
            console.log(data);
        });
    }, timeout);
}

var myFunc = function() {
    var timeout = 0;
    var i = 0;
    while (i < 10) {
        // By calling a function, the i-value is going to be 1.. 10 and not always 10
        startTimeout(timeout, i);
        // Increase timeout by 1 sec after each call
        timeout += 1000;
        i++;
    }
}
This example waits 1 second after each request before sending the next one.
Please consider the deasync module; personally I don't like the Promise way of making all functions async, nor the async/await keywords everywhere. And I think the official Node.js should consider exposing the event loop API, which would solve callback hell simply. Node.js is a framework, not a language.
var node = require("deasync");
node.loop = node.runLoopOnce;
var done = 0;
// async call here
db.query("select * from ticket", (error, results, fields)=>{
done = 1;
});
while (!done)
node.loop();
// Now, here you go
When working with async functions or observables provided by 3rd-party libraries, for example Cloud Firestore, I've found the waitFor method shown below (TypeScript, but you get the idea...) to be helpful when you need to wait on some process to complete, but you don't want to embed callbacks within callbacks within callbacks, nor risk an infinite loop.
This method is somewhat similar to a while (!condition) sleep loop, but it yields asynchronously and tests the completion condition at regular intervals until it returns true or the timeout is exceeded.
export const sleep = (ms: number) => {
    return new Promise(resolve => setTimeout(resolve, ms))
}

/**
 * Wait until the condition tested in a function returns true, or until
 * a timeout is exceeded.
 * @param interval The frequency with which the boolean function contained in condition is called.
 * @param timeout The maximum time to allow for booleanFunction to return true.
 * @param booleanFunction A completion function to evaluate after each interval. waitFor will return true as soon as the completion function returns true.
 */
export const waitFor = async function (interval: number, timeout: number,
    booleanFunction: Function): Promise<boolean> {
    let elapsed = 1;
    if (booleanFunction()) return true;
    while (elapsed < timeout) {
        elapsed += interval;
        await sleep(interval);
        if (booleanFunction()) {
            return true;
        }
    }
    return false;
}
Say you have a long-running process on your backend that you want to complete before some other task is undertaken. For example, if you have a function that totals a list of accounts, but you want to refresh the accounts from the backend before you calculate, you can do something like this:
async recalcAccountTotals(): Promise<number> {
    this.accountService.refresh(); // start the async process
    let updateResult = true;
    if (this.accounts.dirty) {
        updateResult = await waitFor(100, 2000, () => !(this.accounts.dirty));
    }
    if (!updateResult) {
        console.error("Account refresh timed out, recalc aborted");
        return NaN;
    }
    return ... // calculate the account total.
}
It is quite an old question, and though the accepted answer is still entirely correct, the timers/promises API added in v15 provides a simpler way.
import { setTimeout } from 'timers/promises';
// non blocking wait for 5 secs
await setTimeout(5 * 1000);
