How to execute / abort long-running tasks in Node.js?

We have a Node.js server with a MongoDB database. One feature generates a report JSON file from the DB, which can take a while (60 seconds and up, since it has to process hundreds of thousands of entries).
We want to run this as a background task. We need to be able to start a report build process, monitor it, and abort it if the user decides to change the params and rebuild it.
What is the simplest approach with Node? We don't really want to get into the realm of separate worker servers processing jobs, message queues, etc.; we need to keep this on the same box with a fairly simple implementation.
1) Start the build as an async method, and return to the user, with socket.io reporting progress?
2) Spin off a child process for the build script?
3) Use something like https://www.npmjs.com/package/webworker-threads?
With the few approaches I've looked at, I get stuck on the same two areas:
1) How to monitor progress?
2) How to abort an existing build process if the user re-submits data?
Any pointers would be greatly appreciated...

The best approach would be to separate this task from your main application. That said, it's still fairly easy to run it in the background.
To run it in the background and monitor it without a message queue etc., the easiest option is a child_process.
You can launch a spawned job from an endpoint (or URL) called by the user.
Next, set up a socket to return live monitoring of the child process.
Add another endpoint to stop the job, using a unique id returned by the first endpoint (or not, depending on your concurrency needs).
Some coding ideas:
var spawn = require('child_process').spawn
var job = null //keep a reference to the job in memory so we can kill it later

app.get('/save', function(req, res) {
  if(job && job.pid)
    return res.status(500).send('Job is already running')

  job = spawn('node', ['/path/to/save/job.js'], {
    detached: false, //if not detached, the child is killed when your main process dies
    stdio: [process.stdin, process.stdout, process.stderr] //these can be file streams for logs or whatever
  })

  job.on('close', function(code) {
    job = null
    //send socket information about the job ending
  })

  return res.status(201).end() //created
})

app.get('/stop', function(req, res) {
  if(!job || !job.pid)
    return res.status(404).end()

  job.kill('SIGTERM')
  //or process.kill(job.pid, 'SIGTERM')
  job = null
  return res.status(200).end()
})

app.get('/isAlive', function(req, res) {
  try {
    process.kill(job.pid, 0) //signal 0 sends nothing, it only throws if the process doesn't exist
    return res.status(200).end()
  } catch(e) {
    return res.status(500).send(String(e))
  }
})
To monitor the child process you could use pidusage; we use it in PM2, for example. Add a route to monitor a job and call it every second. Don't forget to release memory when the job ends.
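A monitoring route could look roughly like this (a sketch only; the /stats path is just an example, and pidusage's callback API reports CPU as a percentage and memory in bytes):

var pidusage = require('pidusage')

app.get('/stats', function(req, res) {
  if(!job || !job.pid)
    return res.status(404).end()

  pidusage(job.pid, function(err, stats) {
    if(err)
      return res.status(500).send(err.message)
    //stats also contains elapsed time, ppid etc.
    return res.status(200).json({ cpu: stats.cpu, memory: stats.memory })
  })
})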
You might also want to check out this library, which will help you manage multiprocessing across microservices.

Related

Node.js worker thread pools

I have a question for those with more experience with worker threads.
I have been doing some testing with worker threads and have a question on how they work, or maybe how they should work.
I am using a worker thread pool called piscina; it has been set up and appears to be working. 'Appears' is the key word.
Here is my scenario. I have a 'worker.js' file that has a longer-running script (this is for testing and is purposefully a long loop).
When running it, it does what it should: the main event loop is still free to process other tasks. However, what I have noticed is that there only appears to be one worker.
What I mean by that is that subsequent requests to that route seem to get queued up, so the requests don't run in parallel; instead, each waits until the first worker has finished, then goes on to the next.
What I would like to have happen is that each request fires off a new 'worker' (up to a point, and then they get placed in the queue). Right now we have the app running in containers, so if the CPU or memory gets too high, it should push traffic to the other container, etc.
With that being said, we would still like to have the workers spawn with each request so as not to bottleneck the inbound requests to that 'page'.
Again, maybe I'm not understanding this properly, but help from anyone with more experience with workers would be greatly appreciated.
*** edit with code ***
Route:
*Route contains imports of piscina library
router.get('/:error?', auth("3", "edit"), function(req, res) {
  console.log('running')
  let piscina = new Piscina({
    filename: path.resolve(__dirname, 'worker.js'),
    minThreads: 5
  });
  piscina.run({ accountID: req.session.AccountID, cID: req.session.cID, cCode: req.session.cCode })
    .then(data => {
      console.log(data)
      res.send(data) //respond once the worker finishes
    });
});
Worker file
module.exports = async ({ accountID, cID, cCode }) => {
  const n = 10000000000;
  //busy loop to simulate a long-running, CPU-bound task
  for (let i = 1; i <= n; i++) {
  }
  return 'finished';
}
After running the loop, it simply returns the 'finished' string. As noted, that works; however, if I hit the page in multiple tabs they do not all finish at the same time. Instead, let's say tab 1 takes 7 seconds to finish, then tabs 2 and 3 take 14, then 21, etc.
Note: The variables we are passing to the worker have no use right now; it's just a loop that we run for testing to verify the flow.

How to complete a process in Node.js after executing all the operations

I am very new to Node.js and am trying to develop an application which acts as a scheduler: it fetches data from one ELK instance and sends the processed data to another ELK instance. I am able to achieve the expected behaviour, but after completing all the processing, the scheduler job does not exit and instead keeps waiting for the next scheduled run.
Note: This scheduler runs every 3 minutes.
job.js
const self = module.exports = {
  async schedule() {
    if (process.env.SCHEDULER == "MinuteFrequency") {
      var timenow = moment().seconds(0).milliseconds(0).valueOf();
      var endtime = timenow - 60000;
      var starttime = endtime - 60000 * 3;
      //sendData is an async method
      reports.sendData(starttime, endtime, "SCHEDULER");
    }
  }
}
I tried various solutions such as Promise.allSettled(...), Promise.resolve(true), etc., but was not able to fix this.
As per my requirement, I want the scheduler to complete its processing and exit so that I can save some resources, as I am planning to deploy the application using Kubernetes cron jobs.
When all your work is done, you can call process.exit() to cause your application to exit.
In this particular code, you may need to know when reports.sendData() is actually done before exiting. We would have to see that code to know how to tell when it is done. Just because it's an async function doesn't mean it's written properly to return a promise that resolves when it's done. If you want further help, show us the code for sendData() and any code that it calls too.
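For illustration, assuming reports.sendData() returns a promise that resolves when the data has actually been sent (we can't see that code, so this is a sketch rather than a guaranteed fix), the scheduler could await it and then exit:

const self = module.exports = {
  async schedule() {
    if (process.env.SCHEDULER == "MinuteFrequency") {
      var timenow = moment().seconds(0).milliseconds(0).valueOf();
      var endtime = timenow - 60000;
      var starttime = endtime - 60000 * 3;
      //wait for sendData to actually finish before exiting
      await reports.sendData(starttime, endtime, "SCHEDULER");
      //all work is done, so the Kubernetes cronjob pod can terminate
      process.exit(0);
    }
  }
}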

How to have more than a 30 sec response timeout in Heroku

Heroku is terminating the request if the response takes more than 30 seconds to return, so is there any way I can wait for as long as it takes for the response to come back?
The user uploads a file, I need to do something with the file on my server, and after the updates are done I give a download link to the user. But mostly it takes more than 30 seconds for the server to process the file, so the user needs to wait for the response.
From the official Heroku Help Center: https://devcenter.heroku.com/articles/request-timeout
The timeout value is not configurable. If your server requires longer than 30 seconds to complete a given request, we recommend moving that work to a background task or worker to periodically ping your server to see if the processing request has been finished. This pattern frees your web processes up to do more work, and decreases overall application response times.
The short answer is: no, you can't change this configuration. I suggest you investigate why your application needs more than 30 seconds to process that request. If it takes longer than 10 seconds you really should consider the steps suggested in the Heroku Help Center 👆
Your Problem
You mention you need this for file processing. I understand that file processing could easily take longer than 30 seconds. Normally what I would do is just create some sort of task reference and keep it in a database along with a status ("processing", "finished", "failed"), also store the original file, and then just end the user's request. This shouldn't take long. Then process the task ... with another endpoint or websocket connection the user can check whether the task has been fulfilled.
Use a Task Queue
The following is just a basic interpretation of a solution - it's not meant for copy & pasting as it depends on so many things.
Routes (Endpoints)
Basically you need to have 3 routes in your backend. One for uploading the file, one for downloading the processed file and one for checking the status of the task.
1. Upload
app.post('/files', /* some middleware e.g. multer */ async (req, res) => {
  // This is your upload controller.
  // I assume at this point the file has been uploaded and
  // req.file contains a reference to the uploaded file.

  // Create a new processing task and add it to the queue.
  const task = await createNewTask(req.file);
  queue.push(task);

  // Now a task has been created, but the user
  // doesn't need to wait for it to finish,
  // so let's end the request here.
  return res.status(200).json(task);
});
2. Check Status
app.get('/task/:id', async (req, res) => {
  // From uploading a file in the first step, you'll
  // get back a task id. Use the task id to check on
  // the status.
  const task = await getTask(req.params.id);

  if (!task) {
    return res.status(404).end();
  } else {
    return res.status(200).json(task);
  }
});
The task can include information like status, progress percentage, original filename, new filename, or even a download link to the processed file once it's finished. Status could be something like pending, processing, finished or failed; a sketch of such a record follows below.
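For example, a task record might be shaped like this (the field names are purely illustrative, not from any particular library):

// Hypothetical task record; adjust the fields to your needs.
const task = {
  id: 'f3a1c2e5-7b9c-4d01-a6e8-2b5f0c9d1e47', // random uuid, reused as the filename
  status: 'processing', // pending | processing | finished | failed
  progress: 42, // percent complete
  originalFilename: 'upload.pdf',
  processedFilename: null, // set once processing succeeds
  downloadUrl: null // e.g. '/file/' + processedFilename when finished
};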
3. Download
app.get('/file/:filename', (req, res) => {
  return res.status(200)
    .sendFile('./path/to/file/' + req.params.filename);
});
Notes
It might be a good idea to rename the incoming files with a random id such as a uuid, so it's easier to work with them in the automation process. The random id could also double as the task id.
It's up to you how big you want to go with this. For the task queue there are many different libraries to help you out. It could be an in-memory queue or one that's backed by a database; a very small in-memory version is sketched below.
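Purely as an illustration of the in-memory variant (queue, push, drain and processFile here are hypothetical helpers, not from a specific library):

// Minimal in-memory task queue that processes one file at a time.
const queue = [];
let busy = false;

function push(task) {
  queue.push(task);
  drain();
}

async function drain() {
  if (busy || queue.length === 0) return;
  busy = true;
  const task = queue.shift();
  try {
    task.status = 'processing';
    await processFile(task); // your actual file processing goes here
    task.status = 'finished';
  } catch (err) {
    task.status = 'failed';
  }
  busy = false;
  drain(); // keep going until the queue is empty
}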

Simulating Failure to Access Mongodb

I am creating an Express.js library that uses the official Node.js driver for its MongoDB operations.
I'm currently in the process of writing unit tests and I want to simulate failures to access the database in order to ensure:
The library acknowledges failure cases (handles the error)
Makes the right error callbacks and fires the proper events.
I want the tests to run cross-platform, preferably without having to shut down or start the database with special parameters.
Looking at the reference for commands, the sleep command seems to do almost exactly what I want, but the waiting time in seconds is pretty long, plus it is flagged as for internal use only and you need to start the database with a special parameter for it to work. The forceerror command looks like another good one, but again, it's listed as for internal use only and the description is vague, to say the least.
I am wondering if there is any recommended (preferably not overly hackish) way of doing this.
This requires superuser privileges for the node process executing this script, since it sends signals to the MongoDB process without having spawned it, but it is the best I have found so far to simulate unresponsiveness:
var MongoDB = require('mongodb');

MongoDB.MongoClient.connect("mongodb://localhost:27017/SomeDB", {'server': {'socketOptions': {'connectTimeoutMS': 50, 'socketTimeoutMS': 50}}}, function(Err, DB) {
    if(Err)
    {
        console.log(Err);
    }
    else
    {
        DB.command({'serverStatus': 1}, function(Err, Result) {
            if(Err)
            {
                console.log(Err);
            }
            else
            {
                process.kill(Result.pid, 'SIGSTOP');
                //Put testing logic here to test unresponsiveness
                process.kill(Result.pid, 'SIGCONT');
                DB.close();
            }
        });
    }
});
Edit:
If your testing logic crashes on Linux, you can resume the MongoDB process manually on the shell by executing:
kill -CONT PID
Where PID is the process id of the MongoDB process.

Log4js takes hours to write logfile

I am using log4js in my code to log the results and errors. The program runs for about 2.5 hours before the final console output is made, and afterwards needs several hours to finish writing the logfile. The log has been writing for 6 hours now (since the algorithm itself finished) and the file size is 100 MB.
The log will be about 1.5 million lines (when done).
Is it normal for the log to be written as slow as this? Are there "standard" mistakes to make when using log4js that I could check?
In case you want to know: the program is running on an Intel i5 with 8 GB of RAM and an SSD drive, so the hardware shouldn't be the problem, I guess.
I am not sure what other information I can give you, just ask ahead if you need to know something.
Dropbox sounds like a good candidate. Any antivirus software could also interfere.
Firstly I would confirm what your system is capable of by creating a mini log4js benchmark for the various configurations available on your PC, then compare that to your application performance.
var Benchmark = require('benchmark');
var log4js = require('log4js');

log4js.clearAppenders();
log4js.loadAppender('file');

log4js.addAppender(log4js.appenders.file('NUL'), 'nulnulnul');
var lognul = log4js.getLogger('nulnulnul');

log4js.addAppender(log4js.appenders.file('c:/your_dropbox/test.log'), 'normallog');
var lognorm = log4js.getLogger('normallog');

log4js.addAppender(log4js.appenders.file('c:/tmp/test.log'), 'nodropbox');
var lognodr = log4js.getLogger('nodropbox');

log4js.addAppender(log4js.appenders.file('c:/virus-exception/test.log'), 'nodropvir');
var lognodv = log4js.getLogger('nodropvir');

var suite = new Benchmark.Suite;

// add tests
suite.add('Log#Nul', function() {
  lognul.info("Some lengthy nulnulnul info messages");
})
.add('Log#normal', function() {
  lognorm.info("Some lengthy normallog info messages");
})
.add('Log#NoDropbox', function() {
  lognodr.info("Some lengthy nodropbox info messages");
})
.add('Log#NoVirusOrDropbox', function() {
  lognodv.info("Some lengthy nodropvir info messages");
})
// add listeners
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').pluck('name'));
})
// run synchronously
.run({ 'async': false });
If Dropbox or antivirus software doesn't turn out to be the problem, there are two Windows Sysinternals tools that will help you see what is going on in your system while your process is running.
Process Explorer - overall task manager/performance viewer.
It gives you an overall view of your system so you can see which processes are doing what. You can also drill down into specific processes (right-click / Properties).
Process Monitor - event profiler for processes.
Process Monitor is like a log file of all the system calls any process makes.
You can filter down to specific processes or calls, so in your case you would be able to monitor Dropbox and your Node.js process and see whether their access to the file in question is interleaved while Dropbox does its work.
