Prevent the same job from running twice with node-cron - node.js

I'm working with node and typescript, using node-cron 2.0 to schedule a background operation to run every hour.
cron.schedule("0 0 * * * *", () => {
purgeResponsesSurveys();
});
I'm concerned about what happens if the method doesn't finish within 1 hour since I don't want two instances of my method to run at the same time.
What are the best practices to prevent the scheduler from invoking the function purgeResponsesSurveys if it is already running from the previous hourly invocation?

You can use a Semaphore to prevent parallel calls.
You will need to know when purgeResponsesSurveys is done. So if it's asynchronous, it must return a Promise or accept a callback that is called when purgeResponsesSurveys is done.
I used the semaphore npm package.
Here is a small example/simulation.
const semaphore = require('semaphore');
const sem = semaphore(1);
simulateCron(function() {
console.log('cron was triggered')
// wrap task with mutex
sem.take(function() {
longTask(function(){
sem.leave();
})
})
})
function longTask(cb) {
console.log("Start longTask")
setTimeout(function(){
cb()
console.log("Done longTask")
}, 3000)
}
function simulateCron(cb) {
setInterval(cb, 500)
}
// output
cron was triggered
Start longTask
cron was triggered
cron was triggered
cron was triggered
cron was triggered
cron was triggered
Done longTask
Start longTask
cron was triggered
...
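Note that the semaphore queues every trigger, as the output above shows: the next longTask starts as soon as the previous one is done. If skipping an overlapping run is preferable to queuing it, a plain in-process flag can do the job. The following is a minimal sketch rather than part of the original answer; `skipIfRunning` is a hypothetical helper name, and it assumes the task returns a Promise that settles when the work is done.

```javascript
// skipIfRunning is a hypothetical helper, not part of node-cron.
// It assumes `task` returns a Promise that settles when the work is done.
function skipIfRunning(task) {
  let running = false;
  return async function guarded() {
    if (running) {
      return false; // previous invocation still in progress: skip this one
    }
    running = true;
    try {
      await task();
      return true;
    } finally {
      running = false; // released on success and on failure
    }
  };
}

// Intended wiring with the schedule from the question (not executed here):
//   const guardedPurge = skipIfRunning(purgeResponsesSurveys);
//   cron.schedule("0 0 * * * *", () => { guardedPurge().catch(console.error); });
```

The guard only protects a single Node process. If several processes or servers run the same schedule, a shared lock (for example in a database) would be needed instead.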

Related

Run a Cron Job every 30mins after onCreate Firestore event

I want to have a cron job/scheduler that runs every 30 minutes after an onCreate event occurs in Firestore. The cron job should trigger a cloud function that picks up the documents created in the last 30 minutes, validates them against a JSON schema, and saves them in another collection. How do I achieve this by writing such a scheduler programmatically?
What would be a fail-safe mechanism, and a way of queuing/tracking the documents created before the cron job runs, to push them to another collection?
Building a queue with Firestore is simple and fits your use case well. The idea is to write tasks with a due date to a queue collection and process them once they are due.
Here's an example.
Whenever your initial onCreate event for your collection occurs, write a document with the following data to a tasks collection:
duedate: new Date() + 30 minutes
type: 'yourjob'
status: 'scheduled'
data: '...' // <-- put whatever data here you need to know when processing the task
Have a worker pick up available work regularly - e.g. every minute depending on your needs
// Assumed setup (not shown in the original answer)
import * as functions from 'firebase-functions';
import * as admin from 'firebase-admin';

admin.initializeApp();
const db = admin.firestore();

// Define what happens on what task type
type Workers = { [type: string]: (data: any) => Promise<any> };
const workers: Workers = {
  yourjob: (data) => db.collection('xyz').add({ foo: data }),
};

// The following needs to be scheduled
export const checkQueue = functions.https.onRequest(async (req, res) => {
  // Consistent timestamp
  const now = admin.firestore.Timestamp.now();
  // Check which tasks are due
  const query = db.collection('tasks').where('duedate', '<=', now).where('status', '==', 'scheduled');
  const tasks = await query.get();
  // Process tasks and mark them in the queue as done
  const jobs: Promise<any>[] = [];
  tasks.forEach(snapshot => {
    const { type, data } = snapshot.data();
    console.info('Executing job for task ' + JSON.stringify(type) + ' with data ' + JSON.stringify(data));
    const job = workers[type](data)
      // Update task doc with status or error
      .then(() => snapshot.ref.update({ status: 'complete' }))
      .catch((err) => {
        console.error('Error when executing worker', err);
        return snapshot.ref.update({ status: 'error' });
      });
    jobs.push(job);
  });
  return Promise.all(jobs).then(() => {
    res.send('ok');
  }).catch((onError) => {
    console.error('Error', onError);
    res.status(500).send('error');
  });
});
You have different options to trigger the checking of the queue if there is a task that is due:
Using an HTTP-triggered function as in the example above. This requires you to call the function over HTTP regularly so that it executes and checks whether a task is due. Depending on your needs, you could do this from your own server or use a service like cron-job.org to perform the calls. Note that the HTTP function is publicly reachable, so others could potentially call it as well. However, if you make your check code idempotent, that shouldn't be an issue.
Use the Firebase "internal" cron option that uses Cloud Scheduler internally. Using that you can directly trigger the queue checking:
export const scheduledFunctionCrontab =
  functions.pubsub.schedule('* * * * *').onRun((context) => {
    console.log('This will be run every minute!');
    // Include code from checkQueue here from above
    return null;
  });
Using such a queue also makes your system more robust: if something goes wrong in between, you will not lose tasks that would otherwise exist only in memory. As long as they are not marked as processed, a worker will pick them up and reprocess them. This of course depends on your implementation.
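The `duedate: new Date() + 30 minutes` line in the task description above is pseudocode. A minimal sketch of building such a task document and checking whether it is due could look like the following. It uses plain Date objects rather than Firestore types, and `buildTask` and `isDue` are hypothetical helper names chosen for illustration.

```javascript
const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// Build the document that would be written to the tasks collection.
function buildTask(type, data, now = new Date()) {
  return {
    duedate: new Date(now.getTime() + THIRTY_MINUTES_MS),
    type,
    status: 'scheduled',
    data,
  };
}

// A task is picked up only while it is still 'scheduled' and its due date has
// passed, so a task already marked 'complete' or 'error' is not run again.
function isDue(task, now = new Date()) {
  return task.status === 'scheduled' && task.duedate.getTime() <= now.getTime();
}
```

Checking the status as part of the due test is one simple way to keep repeated calls of the queue check harmless, which is what the idempotency remark above asks for.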
You can trigger a cloud function on the Firestore create event and have it schedule a Cloud Task to run 30 minutes later. Cloud Tasks provides queuing and retry mechanisms.
An easy way is that you could add a created field with a timestamp, and then have a scheduled function run at a predefined period (say, once a minute) and execute certain code for all records where created >= NOW - 31 mins AND created <= NOW - 30 mins (pseudocode). If your time precision requirements are not extremely high, that should work for most cases.
If this doesn't suit your needs, you can add a Cloud Task (Google Cloud product). The details are specified in this good article.
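The `created >= NOW - 31 mins AND created <= NOW - 30 mins` condition from the answer above can be sketched with plain timestamps as follows. This is an in-memory illustration with hypothetical names (`dueWindow`, `recordsToProcess`); a real implementation would express the same bounds as a Firestore query. It uses a window that excludes the lower bound so that a record sitting exactly on a boundary is not picked up by two consecutive runs, which is a choice made here and not something the answer specifies.

```javascript
const MINUTE_MS = 60 * 1000;

// Window of creation times that a run at `nowMs` is responsible for.
function dueWindow(nowMs) {
  return { from: nowMs - 31 * MINUTE_MS, to: nowMs - 30 * MINUTE_MS };
}

// Select records whose `created` timestamp (ms since epoch) falls in (from, to].
function recordsToProcess(records, nowMs) {
  const { from, to } = dueWindow(nowMs);
  return records.filter((r) => r.created > from && r.created <= to);
}
```

This only covers every record if the scheduled function really fires once per minute. A missed or delayed run leaves a gap, which is one reason the queue or Cloud Tasks approaches are more robust.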

How can I handle pm2 cron jobs that run longer than the cron interval?

I have a cron job running on pm2 that sends notifications on a 5 second interval. Although it should never happen, I'm concerned that the script will take longer than 5 seconds to run. Basically, if the previous run takes 6 seconds, I don't want to start the next run until the first one finishes. Is there a way to handle this solely in pm2? Everything I've found says to use shell scripting to handle it, but it's not nearly as easy to replicate and move to new servers when needed.
As of now, I have the cron job just running in a never ending while loop (unless there's an error) that waits up to 5 seconds at the end. If it errors, it exits and reports the error, then restarts because it's running via pm2. I'm not too excited about this implementation though. Are there other options?
edit for clarification of my current logic -
function runScript() {
while (!err) {
// do stuff
wait(5 seconds - however long 'do stuff' took) // if it took 1 second to 'do stuff', then it waits 4 seconds
}
}
runScript()
This feels like a hacky way to get around the cron limits of pm2. It's possible that I'm just being paranoid... I just wanna make sure I'm not using antipatterns.
What do you mean you have the cron job running in a while loop? PM2 is starting a node process which contains a never-ending while loop that waits 5 seconds? Your implementation of a cron seems off to me, maybe you could provide more details.
Instead of a cron, I would use something like the setTimeout method. Run your script using PM2, and put a function like this in the script:
function sendMsg() {
// do the work
setTimeout(sendMsg, 5000); // call sendMsg after waiting 5 seconds
}
sendMsg();
By doing it this way, your sendMsg function can take all the time it needs to run, and the next call will start 5 seconds after that. PM2 will restart your application if it crashes.
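If the work is asynchronous, the next timeout has to be scheduled only after the work has completed, otherwise runs can still overlap. A minimal sketch of that variant follows; `runRepeatedly` is a hypothetical helper name, and it assumes `work` either runs synchronously or returns a Promise.

```javascript
// Runs `work`, waits for it to finish, then waits `delayMs` before the next run.
// Returns a function that stops the loop.
function runRepeatedly(work, delayMs) {
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      await work();
    } catch (err) {
      // Design choice in this sketch: log and keep looping.
      console.error('work failed', err);
    }
    if (!stopped) {
      timer = setTimeout(tick, delayMs);
    }
  }

  tick();
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

// Intended usage (not executed here): runRepeatedly(sendNotifications, 5000);
// where sendNotifications is whatever function does the actual work.
```

Logging the error and continuing is only one option. Rethrowing instead would let the process exit so that PM2 restarts it, as described above.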
If you're looking to do it at specific 5 second intervals, but only when the method is not running, simply add a tracking variable to the equation, something like:
let doingWork = false;
function sendMsg() {
if (!doingWork) {
doingWork = true;
// do the work; if it is asynchronous, reset the flag
// in its completion callback instead of right here
doingWork = false;
}
}
setInterval(sendMsg, 5000); // call sendMsg every 5 seconds
You could replace setInterval with PM2 cron call on the script, but the variable idea remains the same.
To have exactly 5000 ms between the end of one action and the start of the next:
var myAsyncLongAction = function(cb){
// your long action here
return cb();
};
var fn = function(){
setTimeout(function(){
// your long action here
myAsyncLongAction(function(){
console.log(new Date().getTime());
setImmediate(fn);
});
}, 5000)
};
fn();
To have exactly 5000 ms between the start of your actions :
var myAsyncLongAction = function(cb){
// your long action here
setTimeout(function(){
return cb();
}, 1000);
};
var fn = function(basedelay, delay){
if(delay === undefined)
delay = basedelay;
setTimeout(function(){
// your long action here
var start = new Date().getTime();
myAsyncLongAction(function(){
var end = new Date().getTime();
var gap = end - start;
console.log("Action took "+(gap)+" ms, send next action in : "+(basedelay - gap)+" ms");
setImmediate(fn, basedelay, (gap < basedelay ? basedelay - gap : 1));
});
}, delay);
};
fn(5000);
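The delay before the next start is the part of the interval that the action did not use up, or zero if the action overran. A compact sketch of that calculation and a loop built on it is shown below; `nextDelay` and `runAtFixedStarts` are hypothetical names, and the action is assumed to run synchronously or return a Promise.

```javascript
// Remaining wait so that starts are `baseDelayMs` apart; never negative.
function nextDelay(baseDelayMs, elapsedMs) {
  return Math.max(0, baseDelayMs - elapsedMs);
}

// Starts `action` every `baseDelayMs`, but never before the previous run ended.
// Returns a function that stops the loop.
function runAtFixedStarts(action, baseDelayMs) {
  let stopped = false;
  let timer = null;

  function schedule(delayMs) {
    timer = setTimeout(async () => {
      const start = Date.now();
      try {
        await action();
      } catch (err) {
        console.error('action failed', err);
      }
      if (!stopped) {
        schedule(nextDelay(baseDelayMs, Date.now() - start));
      }
    }, delayMs);
  }

  schedule(baseDelayMs);
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}
```

For example, an action that takes 1000 ms with a 5000 ms interval waits another 4000 ms, while one that takes 6000 ms starts its next run right away.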

Looking for a Node.js scheduler that won't start if the job is still running

I'm looking for a scheduler/cron for Node.js.
But I need an important feature: if the job has not finished when it is time for it to start again, I want it not to start, or to delay the schedule.
For example, I need to run a job every 5 minutes. The job started at 8:00 but finished only at 8:06, so I want the 8:05 job either to wait until 8:06, or not to start at all and wait for the next cycle at 8:10.
Is there a package that does that? If not, what is the best way to implement this?
You can use the cron package. It allows you to start and stop the cron job manually, which means you can stop the schedule when your job begins and start it again when the job is finished.
const CronJob = require('cron').CronJob;
let job;
// The function you are running
const someFunction = () => {
job.stop();
doSomething(() => {
// When you are done
job.start();
})
};
// Create new cronjob
job = new CronJob({
cronTime: '00 00 1 * * *',
onTick: someFunction,
start: false,
timeZone: 'America/Los_Angeles'
});
// Auto start your cronjob
job.start();
You can implement it yourself:
// The job has to have a method to inform about completion
function myJob(input, callback) {
setTimeout(callback, 10 * 60 * 1000); // It will complete in 10 minutes
}
// Scheduler
let jobIsRunning = false;
function scheduler() {
// Do nothing if job is still running
if (jobIsRunning) {
return;
}
// Mark the job as running
jobIsRunning = true;
myJob('some input', () => {
// Mark the job as completed
jobIsRunning = false;
});
}
setInterval(scheduler, 5 * 60 * 1000); // Run scheduler every 5 minutes
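The flag-based scheduler above drops a tick that arrives while the job is running. The other behaviour mentioned in the question, letting the 8:05 run wait until the 8:00 run finishes, can be sketched by remembering that a tick was missed and running the job once more as soon as it is free. This is an illustrative sketch; `runOrDefer` is a hypothetical helper name, and the job is assumed to return a Promise.

```javascript
// Wraps `job` so that a trigger arriving during a run is deferred, not dropped.
// Any number of triggers during one run collapse into a single follow-up run.
function runOrDefer(job) {
  let running = false;
  let pending = false;

  return async function trigger() {
    if (running) {
      pending = true; // remember the missed tick
      return;
    }
    running = true;
    try {
      do {
        pending = false;
        await job();
      } while (pending); // a tick arrived meanwhile: run once more
    } finally {
      running = false;
    }
  };
}

// Intended usage (not executed here):
//   const trigger = runOrDefer(myJobReturningAPromise);
//   setInterval(() => { trigger().catch(console.error); }, 5 * 60 * 1000);
```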

I want to create a cron job with Node.js, but how can I check whether the previous job has finished?

I want to create a cron job with Node.js. How can I check whether the previous run has finished? I want only one job to be running at any time.
There is a great library for cron jobs in Node.JS: cron
The "nature" of cron jobs is that they are executed periodically. If you want a job to start only if the previous job has finished, you can use a flag: set it to true at the start of the job and to false at the end, so that a new job can check whether the flag is true before starting.
BUT, maybe you should use a job queue - once a job has finished, the next job will start executing. That depends on your scenario, so please provide more information.
A popular library that has queues: collections.
You could combine cron and async's queues, for example:
var CronJob = require('cron').CronJob;
var async = require('async');
var NUMBER_CONCURRENT_JOBS = 1;
var q = async.queue(function(task, callback) {
task(callback);
}, NUMBER_CONCURRENT_JOBS);
var job = function(callback) {
setTimeout(function() {
console.log('JOB EXECUTED');
callback();
}, 5000);
}
new CronJob('* * * * * *', function() {
console.log('JOB REQUIRED');
q.push(job);
}, null, true, 'America/Los_Angeles');
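For a single worker, the one-at-a-time behaviour of a queue can also be approximated without a dependency by chaining Promises. The sketch below is not how the async library implements its queue; `createSerialQueue` is a hypothetical name, and jobs are assumed to run synchronously or return a Promise.

```javascript
// Returns an enqueue function. Jobs run strictly one after another,
// in the order they were enqueued.
function createSerialQueue() {
  let tail = Promise.resolve();

  return function enqueue(job) {
    const result = tail.then(() => job());
    // Keep the chain alive even if this job fails, so later jobs still run.
    tail = result.catch(() => {});
    return result; // lets the caller observe this job's outcome
  };
}

// Intended usage with the cron job above (not executed here):
//   const enqueue = createSerialQueue();
//   new CronJob('* * * * * *', () => { enqueue(job).catch(console.error); }, null, true);
// where job returns a Promise instead of taking a callback.
```

As with the async queue, triggers that arrive faster than jobs complete pile up, so a queue suits cases where every trigger must eventually run.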

How to schedule a job once every Thursday using Kue?

Using Kue, how do I schedule a job to be executed once every Thursday? The Kue readme mentions that I can delay a Job, but what about repeatedly executing the Job at a specific time?
I can do what I want with a cron job, but I like Kue's features.
What I want is to process a Job once anytime on Thursday, but only once.
I had a similar question and I basically came up with the following. If anyone else has a different solution I would love to see some other ideas.
var kue = require('kue');
var jobQueue = kue.createQueue();
// Define job processor
jobQueue.process('thursday-jobs', function (job, done) {
// Placeholder: compute the time until next Thursday (the author used moment.js for this).
// getMsUntilNextThursday is a hypothetical helper, not part of Kue.
var millisecondsTillThurs = getMsUntilNextThursday();
// do this job again next Thursday
jobQueue.create('thursday-jobs').delay(millisecondsTillThurs).save();
// For Example purpose this job waits then calls done
setTimeout(function () {
done();
}, 10000);
});
// Use some initialization code to check if the job exists yet, and create it otherwise
kue.Job.rangeByType('thursday-jobs','delayed', 0, 10, '', function (err, jobs) {
if (err) {return handleErr(err);}
if (!jobs.length) {
jobQueue.create('thursday-jobs').save();
}
// Start checking for delayed jobs. This defaults to checking every 5 seconds
jobQueue.promote();
});
Kue has minimal documentation, but the source is well commented and easy to read.
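The answer leaves the calculation of the time until next Thursday open and mentions moment.js. One way to compute it with the built-in Date object is sketched below. The function name `getMsUntilNextThursday` is hypothetical, and the sketch assumes that "next Thursday" means the next Thursday at 00:00 local time, strictly after today.

```javascript
// Milliseconds from `now` until the next Thursday at 00:00 local time.
// If today is already Thursday, the result points to Thursday next week.
function getMsUntilNextThursday(now = new Date()) {
  const THURSDAY = 4; // Date#getDay(): 0 = Sunday ... 6 = Saturday
  const daysAhead = ((THURSDAY - now.getDay() + 7) % 7) || 7;
  const next = new Date(now);
  next.setDate(now.getDate() + daysAhead);
  next.setHours(0, 0, 0, 0);
  return next.getTime() - now.getTime();
}
```

The result can be passed to the `delay(...)` call used in the job processor above.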
Take a look at kue-scheduler. I'm pretty sure that you should be able to do something like this:
var kue = require('kue-scheduler');
var Queue = kue.createQueue();
//create a job instance
var job = Queue
.createJob('every', data)
.attempts(3)
.backoff(backoff)
.priority('normal');
//schedule it to run every Thursday at 00:00:00
var thursday = '0 0 0 * * 4';
Queue.every(thursday, job);
//somewhere process your scheduled jobs
Queue.process('every', function(job, done) {
...
done();
});
kue-scheduler docs: https://github.com/lykmapipo/kue-scheduler
link in their docs to cron stuff: https://github.com/kelektiv/node-cron
