Creating recurring events in Node.js to update or insert a MySQL table

I have a MySQL table tasks. In tasks, we can create a normal task or a recurring task that automatically creates a new task in the tasks table and sends an email notification to the user that a task has been created. After a lot of research, I found out that you can do it in three ways:
MySQL events
Kue, Bull, or Agenda (Node.js scheduling libraries)
Using a cron job to check every day for tasks that are due
The recurring tasks would repeat daily, weekly, monthly, or yearly.
We must also provide an option to remove the recurring event at any time. What would be a nice and clean solution?

As you've identified, there are a number of ways of going about this. Here's how I would do it, though I'm making a number of assumptions, such as how many tasks you're likely to have and how flexible the system needs to be going forward.
If you're unlikely to change the task schedule options (daily, weekly, monthly, yearly), give each task two fields: last_run_date and next_run_date. Every time a task is run, update these fields and create an entry in a log table such as task_run_log, which also stores the date/time the task was run at.
I would then have a cron job which fires an HTTP request to a Node.js service. This web service would look through the table of tasks, find which ones need to be executed that day, and dispatch a message for each task into some sort of queue (AWS SQS, GCP Pub/Sub, Apache Kafka, etc.). Each message in the queue represents a single task to be carried out; workers subscribe to this queue and process the tasks themselves. Once a worker has processed a job, it makes the log entry and updates the last_run_date and next_run_date fields. If a task fails, the worker moves that message into an error queue and logs a failed task in the task log.
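A minimal sketch of that dispatch step, assuming the mysql2 library and a hypothetical publishToQueue() helper standing in for whichever queue client you pick:
// Hypothetical dispatcher: select due tasks and publish one queue message per task.
// Assumes a tasks table with a next_run_date column and mysql2's promise API.
const mysql = require('mysql2/promise');

async function dispatchDueTasks(publishToQueue) {
  const db = await mysql.createConnection({ host: 'localhost', user: 'app', database: 'app' });

  // Find every task whose next run is due today or earlier
  const [rows] = await db.execute(
    'SELECT id, type FROM tasks WHERE next_run_date <= CURDATE()'
  );

  // One message per task; workers update last_run_date / next_run_date when done
  for (const task of rows) {
    await publishToQueue({ taskId: task.id, type: task.type });
  }
  await db.end();
}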
This system would be robust, as any failed jobs would exist as failed jobs in your database and would appear in an error queue (which you can either drain to discard the failed jobs, or replay into the normal queue once the worker is fixed). It would also scale to many tasks per day, since you can scale up your workers. You also won't be flooding cron: your cron job just sends a single HTTP request each day to your HTTP service, which kicks off the processing.
You can also set up alerts based on whether the cron job runs or not, to make sure the process gets kicked off properly.

I had to do something very similar; you can use the npm module node-schedule.
node-schedule has many features. You first create a recurrence rule, which determines when the job runs, and then schedule the job, which is where you define what the job does and activate it. Here's an example from my code which sets a job to run at midnight every day:
const schedule = require('node-schedule');

// Every day of the week (Sunday plus Monday through Saturday) at midnight
const rule = new schedule.RecurrenceRule();
rule.dayOfWeek = [0, new schedule.Range(1, 6)];
rule.hour = 0;
rule.minute = 0;

const j = schedule.scheduleJob(rule, function () {
  sqlUpdate(server);
});
This may not fit all of your requirements on its own, but there are other features and setups you can use.
For example, you can cancel any job with the cancel function:
j.cancel()
You can also set start times and end times, as shown on the npm page:
const startTime = new Date(Date.now() + 5000);
const endTime = new Date(startTime.getTime() + 5000);
const j = schedule.scheduleJob({ start: startTime, end: endTime, rule: '*/1 * * * * *' }, function () {
  console.log('Time for tea!');
});
There are also other options for specifying the date and time, as node-schedule also accepts the cron format, meaning you can set dynamic times:
const j = schedule.scheduleJob('42 * * * *', function () {
  console.log('Runs at minute 42 of every hour');
});
As such, node-schedule would let Node.js handle everything you need. You would likely need a system to keep track of the scheduled job objects (the j variables above), but that lets you cancel and reschedule them as you wish.
It additionally allows you to reschedule a job, retrieve the next scheduled invocation, and use multiple date formats.
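For instance (both calls are part of node-schedule's Job API; the cron spec here is just an illustration):
j.reschedule('0 9 * * *');       // move the job to 9:00 AM daily
console.log(j.nextInvocation()); // see when the job will next fire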
If you need the jobs to persist after the process is turned off and on again or reset, you will need to save the details of each job; a MySQL database would make sense here. Upon startup, the code can pull all of the saved tasks from the database and reschedule them, and when you cancel a job you just delete its row. It should be noted that the process needs to be running for this to work: a job will not fire while the process is off.
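As a minimal sketch of that startup step, assuming a hypothetical scheduled_jobs table holding a cron spec per task and a runTask() function that does the actual work:
// Hypothetical restore-on-startup: reload saved schedules and re-register them.
// Assumes a scheduled_jobs table with (id, cron_spec) and mysql2's promise API.
const schedule = require('node-schedule');

const activeJobs = new Map(); // keep job objects so they can be cancelled later

async function restoreJobs(db) {
  const [rows] = await db.execute('SELECT id, cron_spec FROM scheduled_jobs');
  for (const row of rows) {
    // runTask(id) is whatever work the task performs (hypothetical)
    const job = schedule.scheduleJob(row.cron_spec, () => runTask(row.id));
    activeJobs.set(row.id, job);
  }
}

async function cancelJob(db, id) {
  activeJobs.get(id)?.cancel(); // stop the in-memory job
  activeJobs.delete(id);
  await db.execute('DELETE FROM scheduled_jobs WHERE id = ?', [id]);
}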

Related

How to set agenda job concurrency properly

Here's an example job:
const Agenda = require('agenda')
const agenda = new Agenda({db: {address: process.env.MONGO_URL}})
agenda.define('example-job', (job) => {
  console.log('took a job -', job.attrs._id)
})
So now, let's say I queue up 11 agenda jobs like this:
const times = require('lodash/times')
times(11, () => agenda.now('example-job'))
Now if I look in the DB I can see that there are 11 jobs queued and ready to go (like I would expect).
So now I start one worker process:
agenda.on('ready', () => {
  require('./jobs/example_job')
  agenda.start()
})
When that process starts, I see 5 jobs get pulled off the queue, which makes sense because the defaultConcurrency is 5 (see https://github.com/agenda/agenda#defaultconcurrencynumber).
So far so good, but if I start another worker process (same as the one above), I would expect 5 more jobs to be pulled off the queue, so there would be a total of 10 running (5 per process), with one left on the queue.
However, when the second worker starts, it doesn't pull down any more jobs, it just idles.
I would expect that defaultConcurrency is the number of jobs that can run at any given moment per process, but it looks like it is a setting that applies to the number of jobs at any moment in aggregate, across all agenda processes.
What am I missing here? What is the correct way to specify how many jobs can run per process, without putting a limit on the number of jobs that can run across all the processes?
The problem is that the defaultLockLimit needs to be set.
By default, the lock limit is 0, meaning no limit, so one worker will lock up all the available jobs and no other worker can claim them.
Setting defaultLockLimit to the same value as defaultConcurrency ensures that a worker will only lock the jobs that it is actively processing.
See: https://github.com/agenda/agenda/issues/412#issuecomment-374430070
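A small sketch of that configuration, using Agenda's chainable defaultConcurrency and defaultLockLimit setters (the connection string matches the question):
const Agenda = require('agenda')

// Each worker locks no more jobs than it can actively run,
// leaving the remainder available for other worker processes.
const agenda = new Agenda({db: {address: process.env.MONGO_URL}})
  .defaultConcurrency(5)
  .defaultLockLimit(5)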

Workflow System with Azure Table Storage

I have a system where we need to run a simple workflow.
Example:
On Jan 1st 08:15 trigger task A for object Z
When triggered, run some code (implementation details not important)
Schedule task B for object Z to run at Jan 3rd 10:25 (and so on)
The workflow itself is simple, but I need to run 500,000+ instances, and that's the tricky part.
I know Windows Workflow Foundation and for that very same reason I have chosen not to use that.
My initial design would be to use Azure Table Storage and I would really appreciate some feedback on the design.
The system will consist of two tables
Table "Jobs"
PartitionKey: ObjectId
Rowkey: ProcessOn (UTC Ticks in reverse so that newest are on top)
Attributes: State (Pending, Processed, Error, Skipped), etc...
Table "Timetable"
PartitionKey: YYYYMMDD
Rowkey: YYYYMMDDHHMM_<GUID>
Attributes: Job_PartitionKey, Job_RowKey
The idea is that the Jobs table will hold the complete history of jobs per object, and the Timetable will list all jobs to run in the future.
Some assumptions:
A job will never span more than one Object
There will only ever be one pending job per Object
The "job" is very lightweight e.g. posting a message to a queue
The system must be able to perform these tasks:
Execute pending jobs
Query for all records in "Timetable" with PartitionKey <= today (YYYYMMDD) and RowKey <= now (YYYYMMDDHHMM)
For each record (in parallel)
Lookup job in Jobs table via PartitionKey and RowKey
If "not exists" or State != Pending then skip
Execute "logic". If fails => log and maybe do some retry logic
Submit "Next run date in Timetable"
Submit "Update State = Processed" and "New Job Record (next run)" as a single transaction
When all are finished => Delete all processed Timetable records
Concern: only two of the three record modifications are in a transaction. Could this be overcome in any way?
Stop workflow
Stop/pause workflow for Object Z
Query top 1 jobs in Jobs table by PartitionKey
If any AND State == Pending then update to "Cancelled"
(No need to bother cleaning Timetable it will clean itself up "when time comes")
Start workflow
Create Pending record in Jobs table
Create record in Timetable
In terms of executing all of this, I would use an Azure Function or Scheduler job to execute the pending jobs every 5 minutes or so.
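A minimal sketch of the "find due Timetable records" step, assuming the @azure/data-tables client and the table design above:
// Hypothetical query step. PartitionKey = YYYYMMDD and RowKey starts with
// YYYYMMDDHHMM, so lexical string comparison matches chronological order.
const { TableClient, odata } = require('@azure/data-tables');

async function findDueTimetableRecords(connectionString, today, now) {
  // today e.g. '20240103', now e.g. '202401031025' (derive from the UTC clock)
  const table = TableClient.fromConnectionString(connectionString, 'Timetable');
  const due = [];
  const entities = table.listEntities({
    queryOptions: { filter: odata`PartitionKey le ${today} and RowKey le ${now}` },
  });
  for await (const entity of entities) {
    due.push(entity); // each record carries Job_PartitionKey / Job_RowKey for the lookup
  }
  return due;
}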
Any comments or suggestions would be highly appreciated.
Thanks!
How about using Service Bus instead? The BrokeredMessage class has a property called ScheduledEnqueueTimeUtc. You can just schedule when you want your jobs to run via the ScheduledEnqueueTimeUtc property, and then forget about them. You can then have a triggered WebJob that monitors the Service Bus queue and will be triggered very near the time the job message becomes available. I'm a big fan of relying on existing services to minimize the coding needed.
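For example, a sketch from Node.js, assuming the @azure/service-bus package, whose scheduleMessages() plays the role of BrokeredMessage.ScheduledEnqueueTimeUtc (the queue name and payload shape are assumptions):
// Hypothetical sketch: enqueue a job message that only becomes visible
// to consumers at its scheduled time.
const { ServiceBusClient } = require('@azure/service-bus');

async function scheduleJobMessage(connectionString) {
  const sbClient = new ServiceBusClient(connectionString);
  const sender = sbClient.createSender('jobs');

  const runAt = new Date('2024-01-03T10:25:00Z'); // when task B should fire
  await sender.scheduleMessages({ body: { objectId: 'Z', task: 'B' } }, runAt);

  await sender.close();
  await sbClient.close();
}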

Best practice beanstalkd (queue) and node.js

I'm currently building a service using beanstalkd and Node.js.
When a job fails, I would like to retry it n times before giving up on the job.
If the job succeeds, I want to run the same job 10 times.
So what is the best practice: store the error and success counts in MongoDB along with the jobId, or delete the job and put a new job with the error and success counts in the body?
I don't know if I'm being clear, so tell me. Thanks a lot!
There is a stats-job <id> command, which should also be available via your API library, that returns, among other things, how many times the specific job has been reserved, released, buried, and so on.
This allows for a number of retries of failed jobs by checking previous reserves/releases.
To run the same job multiple times, I would personally create either one additional job with a success count that is then incremented (into another new job), or all nine new jobs at once, with optional delays before they start.
You have a couple of ways to do this:
you can release the job and obtain the number of reserves from its stats
you can put a new job with a retry count, and keep track of the history in the data payload
You should do the latter; that way you don't need MongoDB as a second dependency.
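A minimal sketch of the retry-count-in-payload approach, assuming the fivebeans client and a job object of the shape { id, body } taken from your reserve handler:
// Hypothetical retry flow: on failure, re-put the job with an incremented
// retry count in the payload instead of tracking state in a database.
const MAX_RETRIES = 5;

function handleFailure(client, job) {
  const payload = JSON.parse(job.body); // e.g. { task: '...', retries: 0 }
  if (payload.retries >= MAX_RETRIES) {
    return client.destroy(job.id, () => {}); // give up after n attempts
  }
  payload.retries += 1;
  // priority 0, 10s delay before the retry becomes available, 60s ttr
  client.put(0, 10, 60, JSON.stringify(payload), (err) => {
    if (!err) client.destroy(job.id, () => {}); // remove the failed original
  });
}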

How do you implement a periodically executing job?

I am currently implementing a system that automatically replies to tweets containing arbitrary hashtags. This system consists of a process that periodically crawls Twitter and a process that periodically replies to these tweets. Following my company's tradition, these periodic jobs are implemented with working tables in an RDBMS that have a status column with values like "waiting", "processing", or "succeed". To ensure redundancy, I run multiple identical processes and coordinate them with low-level locks.
My question is: I'm implementing periodic jobs with working tables in an RDBMS, but how are such jobs generally implemented?
There's a node package cron which allows you to execute code at some specified interval, just like crontab. Here's a link to the package: https://www.npmjs.org/package/cron
For example:
const { CronJob } = require("cron");

// Run this cron job every Sunday (0) at 7:00:00 AM
new CronJob("00 00 7 * * 0", function () {
  // insert code to run here...
}, null, true);
You could use that module to run a periodic job that crawls Twitter or replies to tweets.

How to process scheduled, recurring jobs with Kue?

In my webapp, users can create recurring invoices that need to be generated and sent out at certain dates each month. For example, an invoice might need to be sent out on the 5th of every month.
I am using Kue to process all my background jobs so I want to do it in this case as well.
My current solution is to use setInterval() to create a processRecurringInvoices job every hour. This job will then find all recurring invoices from database and create a separate generateInvoice job for each recurring invoice.
The generateInvoice job will then actually generate the invoice, and if needed, will also in turn create a sendInvoiceToEmail job that will email the invoice.
At the moment this solution looks good to me, because it has a nice separation of concerns, however, I have the following questions:
I am not sure if I should wait for all the 'child' jobs to complete before I call done() on the main processRecurringInvoices job?
Where should I handle errors? Should I pass them back to the processRecurringInvoices job, or should I handle them separately for each job?
How can I make sure that if processing takes extra long (more than an hour) and either processRecurringInvoices or any of the child jobs are still running, the processRecurringInvoices job is not created again? Kind of like a unique job, or mutual exclusion?
Instead of "processRecurringInvoices" it might be easier to think of it as a job that initiates other, separate invoice-processing jobs. Thinking of it this way, once the invoice processing jobs have been enqueued, you can safely call done() on the job that kicks them all off.
Thinking of the problem in the way described in question 1, errors should be handled within each of the individual invoice processing jobs. If an error occurred finding potential invoice jobs, then that would probably be handled in the processRecurringInvoices job.
You can use kue.Job.rangeByType() to search for currently active jobs. If a job is active, you can skip kicking it off again.
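A minimal sketch of that check, assuming Kue's rangeByType(type, state, from, to, order, callback) signature:
const kue = require('kue');
const queue = kue.createQueue();

// Only enqueue processRecurringInvoices if no instance is currently active
function enqueueIfNotRunning() {
  kue.Job.rangeByType('processRecurringInvoices', 'active', 0, 1, 'asc', (err, jobs) => {
    if (err) return console.error(err);
    if (jobs.length > 0) return; // a previous run is still in progress; skip
    queue.create('processRecurringInvoices', {}).save();
  });
}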
