I am currently implementing a system that automatically replies to tweets containing arbitrary hashtags. The system consists of a process that periodically crawls Twitter and a process that periodically replies to the collected tweets. Following my company's convention, these periodic jobs are implemented with working tables in an RDBMS that have a status column holding values like "waiting", "processing" or "succeeded". For redundancy, I run multiple identical copies of each process and coordinate them with low-level locks.
My question is: given that I'm implementing periodic jobs with working tables in an RDBMS, how are jobs like these generally implemented?
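For context, the "claim a row" pattern I'm describing looks roughly like the sketch below. It assumes a hypothetical tweets working table with id and status columns, the mysql2 client, and MySQL 8+ for SKIP LOCKED; all of those names are my own placeholders.
const mysql = require("mysql2/promise");

// Claim one waiting row so that other worker processes skip it.
async function claimNextTweet(pool) {
  const conn = await pool.getConnection();
  try {
    await conn.beginTransaction();
    const [rows] = await conn.query(
      "SELECT id FROM tweets WHERE status = 'waiting' " +
      "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED"
    );
    if (rows.length === 0) {
      await conn.rollback();
      return null; // nothing waiting
    }
    await conn.query(
      "UPDATE tweets SET status = 'processing' WHERE id = ?",
      [rows[0].id]
    );
    await conn.commit();
    return rows[0].id;
  } catch (err) {
    await conn.rollback();
    throw err;
  } finally {
    conn.release();
  }
}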
There's a node package cron which allows you to execute code at some specified interval, just like crontab. Here's a link to the package: https://www.npmjs.org/package/cron
For example:
var CronJob = require("cron").CronJob;
// Run this cron job every Sunday (0) at 7:00:00 AM
new CronJob("00 00 7 * * 0", function() {
  // insert code to run here...
}, null, true);
You might be able to use that module to run some job periodically, which crawls twitter or replies to tweets.
I have a MySQL table tasks. In tasks, we can create a normal task or a recurring task that automatically creates a new task in the tasks table and sends an email notification to the user that a task has been created. After a lot of research, I found that you can do this in a few ways:
MySQL events
Kue, bull, agenda(node.js scheduling libraries)
Using a cron job to monitor every day for tasks
The recurring tasks can repeat weekly, daily, monthly, or yearly.
We must also provide an option to remove the recurring task at any time. What would be a nice and clean solution?
As you've identified, there are a number of ways of going about this. Here's how I would do it, though I'm making a number of assumptions, such as how many tasks you're likely to have and how flexible the system needs to be going forward.
Assuming you're unlikely to change the frequency options (daily, weekly, monthly, yearly), each task would get two extra fields: last_run_date and next_run_date. Every time a task runs I would update these fields and create an entry in a log table such as task_run_log, which also stores the date/time the task ran.
I would then have a cron job which fires an HTTP request to a Node.js service. This service would look through the table of tasks, find which ones need to be executed that day, and dispatch a message for each task into some sort of queue (AWS SQS, GCP Pub/Sub, Apache Kafka, etc.). Each message in the queue represents a single task to be carried out; workers subscribe to this queue and process the tasks themselves. Once a worker has processed a job, it makes the log entry and updates the last_run_date and next_run_date fields. If a task fails, the worker moves that message into an error queue and logs a failed task in the task log.
This system would be robust, as any failed jobs would exist as failed entries in your database and would appear in an error queue (which you can either drain to remove the failed jobs, or replay into the normal queue once the worker is fixed). It would also scale to many tasks per day, since you can scale up your workers. You also won't be flooding cron: your cron job just sends a single HTTP request each day to your service, which kicks off the processing.
You can also set up alerts based on whether the cron job runs or not, to make sure the process gets kicked off properly.
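As a concrete illustration of the dispatch step above, here is a minimal sketch. It assumes a hypothetical tasks table with a next_run_date column queried via mysql2, and an AWS SQS queue addressed through the @aws-sdk/client-sqs package; the table, field and queue names are placeholders, not part of the original answer.
const mysql = require("mysql2/promise");
const { SQSClient, SendMessageCommand } = require("@aws-sdk/client-sqs");

const sqs = new SQSClient({ region: "us-east-1" });
const QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/task-queue"; // placeholder

async function dispatchDueTasks(pool) {
  // Find every task that is due today.
  const [tasks] = await pool.query(
    "SELECT id FROM tasks WHERE next_run_date <= CURDATE()"
  );
  // One message per due task; workers consume these and do the real work.
  for (const task of tasks) {
    await sqs.send(new SendMessageCommand({
      QueueUrl: QUEUE_URL,
      MessageBody: JSON.stringify({ taskId: task.id }),
    }));
  }
}
The cron-triggered HTTP endpoint would simply call dispatchDueTasks(pool); the workers on the other side of the queue then update last_run_date and next_run_date and write to task_run_log as described above.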
I had to do something very similar; you can use the npm module node-schedule.
node-schedule has many features. You first create your rule, which determines when the job runs, and then schedule the job, which is where you define what the job does and activate it. I have an example below from my code which sets a job to run at midnight every day.
var schedule = require("node-schedule");
// Run every day of the week at 00:00 (midnight)
var rule = new schedule.RecurrenceRule();
rule.dayOfWeek = [0, new schedule.Range(1, 6)];
rule.hour = 0;
rule.minute = 0;
var j = schedule.scheduleJob(rule, function(){
  sqlUpdate(server); // the work this particular job performs
});
This may not exactly fit all of your requirements on its own, but there are other features and setups you can use.
For example, you can cancel any job with the cancel function:
j.cancel()
You can also set start times and end times, as shown on the npm page:
let startTime = new Date(Date.now() + 5000);
let endTime = new Date(startTime.getTime() + 5000);
var j = schedule.scheduleJob({ start: startTime, end: endTime, rule: '*/1 * * * * *' }, function(){
  console.log('Time for tea!');
});
There are also other options for scheduling dates and times, since node-schedule also accepts cron-format strings. That means you can set dynamic times:
var j = schedule.scheduleJob('42 * * * *', function(){
  console.log('This runs at minute 42 of every hour.');
});
As such, this would allow Node.js to handle everything you need. You would likely need to set up a system to keep track of the scheduled jobs (the var j above), but that lets you cancel and reschedule them as you wish.
It additionally allows you to reschedule a job, retrieve the next scheduled invocation, and use multiple date formats.
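For instance (a small sketch reusing the job handle j from above; the new cron string is just an example):
console.log(j.nextInvocation()); // when the job will fire next
j.reschedule('0 0 * * 1');       // example: move it to Mondays at midnight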
If you need the jobs to persist after the process is stopped and started, or after a reset, you will need to save the details of each job; a MySQL database would make sense here. Upon startup, the code can make a quick pull and recreate all of the scheduled jobs from the data in the database, and when you cancel a job you just delete it from the database. It should be noted that the process needs to be running for this to work: a job will not fire if the process is turned off.
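A rough sketch of that startup step, assuming a hypothetical scheduled_jobs table holding an id and a cron expression, a mysql2 connection pool passed in as pool, and a runJob() function that does the actual work (all of these are placeholders):
const schedule = require("node-schedule");

const activeJobs = new Map(); // id -> job handle, so jobs can be cancelled later

async function restoreJobs(pool) {
  // Hypothetical table: scheduled_jobs(id, cron_expression)
  const [rows] = await pool.query("SELECT id, cron_expression FROM scheduled_jobs");
  for (const row of rows) {
    const job = schedule.scheduleJob(row.cron_expression, function () {
      runJob(row.id); // placeholder for the real work
    });
    activeJobs.set(row.id, job);
  }
}

async function cancelJob(pool, id) {
  const job = activeJobs.get(id);
  if (job) job.cancel();
  activeJobs.delete(id);
  await pool.query("DELETE FROM scheduled_jobs WHERE id = ?", [id]);
}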
Is it more efficient to run a Node.js application through cron at set times, or to run a Node application 24/7 that handles its own scheduling internally?
Option A:
CRON Job for daily process - calls daily NodeJS App
CRON Job for weekly process - calls weekly NodeJS App
Advantage is the application only runs when it needs to.
Disadvantage is overhead and organization. You are required to have separate projects/scripts for each different action.
Option B:
Separate NodeJS application which runs 24/7.
Calls its daily operations daily, weekly operations weekly, etc.
Advantage: One project containing all CRON rules - easy to add more tasks
Disadvantage: Project must be running 24/7, more overhead when not needed.
If you are looking at daily and weekly frequency only, I would go for a CRON job that calls your NodeJS app. If that task runs in 5 minutes, you will only have CPU and RAM utilisation during those minutes, instead of having a full NodeJS app staying in memory and using CPU resources all day (although minimal).
You don't need two projects; you can have a single directory containing a daily.js job file and a weekly.js file. Or you could even have one single file (index.js) and call it with an argument to trigger the daily or weekly job.
In your index.js you would need to read the process.argv array to read the argument that was passed in.
The index.js code would look as follows:
if (process.argv[2] === "D") {
  // code for daily task
} else if (process.argv[2] === "W") {
  // code for weekly task
} else {
  throw new Error("Invalid argument");
}
And your crontab file would look like this (to run the daily job at 3am every day and the weekly job at 6am on Monday for example):
0 3 * * * node /path/index.js D
0 6 * * 1 node /path/index.js W
I am planning a Node.js application where the server has to handle time-based jobs. For example:
Job 1: Every 2 hours
Job 2: Every 5 minutes
Job 3: Every 10 days
...
These jobs are saved in a database and there will be a lot of them; it could be millions.
So now my question is: what is the right way to handle this in Node.js? There are a lot of cron packages for cron-like processes, but I thought that might be "too much" for this.
I was thinking about... timeouts! Right, timeouts.
This is the scheme I was thinking of:
// for each job loaded from the database, schedule a timeout
var job = getJob();
var executeInMs = getMSUntilExecute(job);
setTimeout(function(){ handle(job); }, executeInMs);
With this scheme it really could be that there are a million timeouts running at the same time in my application.
The question now is: is this a server-killer, or is this the right way to handle it?
I'm currently building a service using beanstalkd and node.js.
When a job fails, I would like to retry it n times before giving up on the job.
If the job succeeds, I want to run the same job 10 times.
So, what is the best practice: store the error and success counts in MongoDB along with the jobId, or delete the job and put a new job with an error and success count in the body?
I don't know if I'm being clear, so tell me. Thanks a lot.
There is a stats-job <id>\r\n command that should also be available via the API library; it returns, among other things, how many times the specific job has been reserved, released, buried, and so on.
This allows for a number of retries of failed jobs by checking previous reservation/releases.
To run the same job multiple times, I would personally create either one additional job with a success count that is then incremented (into another new job), or all nine new jobs at once, with optional delays before they start.
You have a couple of ways to do this:
you can release the job, and obtain from stats the number of reserves
you can put a new job with a retry count, and keep track of history in the data payload
You should do the latter, and then you don't need MongoDB as a second dependency.
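A minimal sketch of that payload-tracking idea, independent of any particular beanstalkd client; the field names and limits below are placeholders. The returned body would be put as a new job, and the current job deleted:
// Build the body for the next job based on the outcome of the current one.
// Returns null when no further job should be enqueued.
const MAX_RETRIES = 3; // placeholder: give up after n failed attempts
const MAX_RUNS = 10;   // placeholder: repeat a successful job 10 times

function nextJobBody(body, succeeded) {
  const errorCount = body.errorCount || 0;
  const successCount = body.successCount || 0;
  if (succeeded) {
    if (successCount + 1 >= MAX_RUNS) return null; // done repeating
    return { ...body, successCount: successCount + 1, errorCount: 0 };
  }
  if (errorCount + 1 >= MAX_RETRIES) return null;  // give up on this job
  return { ...body, errorCount: errorCount + 1 };
}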
In my webapp, users can create recurring invoices that need to be generated and sent out at certain dates each month. For example, an invoice might need to be sent out on the 5th of every month.
I am using Kue to process all my background jobs so I want to do it in this case as well.
My current solution is to use setInterval() to create a processRecurringInvoices job every hour. This job will then find all recurring invoices from database and create a separate generateInvoice job for each recurring invoice.
The generateInvoice job will then actually generate the invoice, and if needed, will also in turn create a sendInvoiceToEmail job that will email the invoice.
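For reference, the hourly trigger described above looks roughly like this (Kue API; the interval and empty payload are just examples):
const kue = require('kue');
const queue = kue.createQueue();

// Every hour, enqueue a processRecurringInvoices job.
setInterval(function () {
  queue.create('processRecurringInvoices', {}).save();
}, 60 * 60 * 1000);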
At the moment this solution looks good to me because it has a nice separation of concerns. However, I have the following questions:
Should I wait for all the 'child' jobs to complete before I call done() on the main processRecurringInvoices job?
Where should I handle errors? Should I pass them back to the processRecurringInvoices job, or should I handle them separately for each job?
How can I make sure that if processing takes extra long (more than an hour), and either processRecurringInvoices or any of the child jobs are still running, the processRecurringInvoices job is not created again? Kind of like a unique job, or mutual exclusion?
Instead of "processRecurringInvoices" it might be easier to think of it as a job that initiates other, separate invoice-processing jobs. Thinking of it this way, once the invoice processing jobs have been enqueued, you can safely call done() on the job that kicks them all off.
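A sketch of what that looks like with Kue, assuming queue comes from kue.createQueue() as in the question; the lookup and generation helpers are placeholders, not part of the original answer:
// processRecurringInvoices: enqueue one generateInvoice job per due invoice,
// then finish; the children do the real work independently.
queue.process('processRecurringInvoices', function (job, done) {
  findDueRecurringInvoices()                      // placeholder DB query
    .then(function (invoices) {
      invoices.forEach(function (invoice) {
        queue.create('generateInvoice', { recurringInvoiceId: invoice.id })
          .attempts(3)
          .save();
      });
      done();                                     // safe once children are enqueued
    })
    .catch(done);
});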
Thinking of the problem in the way described in question 1, errors should be handled within each of the individual invoice-processing jobs. If an error occurred while finding potential invoice jobs, that would probably be handled in the processRecurringInvoices job.
You can use kue.Job.rangeByType() to search for currently active jobs. If a job is active, you can skip kicking it off again.
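For example, a small sketch of that check (again assuming queue from kue.createQueue(); only the first active job needs to be inspected):
// Only enqueue a new processRecurringInvoices job if none is currently active.
function enqueueIfNotRunning() {
  kue.Job.rangeByType('processRecurringInvoices', 'active', 0, 1, 'asc', function (err, jobs) {
    if (err) return console.error(err);
    if (jobs.length > 0) return; // one is still running: skip this round
    queue.create('processRecurringInvoices', {}).save();
  });
}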