In my webapp, users can create recurring invoices that need to be generated and sent out at certain dates each month. For example, an invoice might need to be sent out on the 5th of every month.
I am using Kue to process all my background jobs, so I want to use it in this case as well.
My current solution is to use setInterval() to create a processRecurringInvoices job every hour. This job then finds all recurring invoices in the database and creates a separate generateInvoice job for each one.
The generateInvoice job will then actually generate the invoice, and if needed, will also in turn create a sendInvoiceToEmail job that will email the invoice.
At the moment this solution looks good to me because it has a nice separation of concerns. However, I have the following questions:
Should I wait for all the 'child' jobs to complete before I call done() on the main processRecurringInvoices job?
Where should I handle errors? Should I pass them back to the processRecurringInvoices job, or should I handle them separately for each job?
How can I make sure that if processing takes extra long (more than an hour) and either processRecurringInvoices or any of the child jobs is still running, the processRecurringInvoices job is not created again? Kind of like a unique job, or mutual exclusion?
Instead of "processRecurringInvoices" it might be easier to think of it as a job that initiates other, separate invoice-processing jobs. Thinking of it this way, once the invoice processing jobs have been enqueued, you can safely call done() on the job that kicks them all off.
Thinking of the problem in the way described in question 1, errors should be handled within each of the individual invoice-processing jobs. If an error occurs while finding potential invoice jobs, that would probably be handled in the processRecurringInvoices job itself.
You can use kue.Job.rangeByType() to search for currently active jobs. If a job of that type is active, you can skip kicking it off again.
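A sketch of that guard, using rangeByType's signature (type, state, from, to, order, callback); the hourly setInterval() is the one from the question:

const kue = require('kue');
const queue = kue.createQueue();

function enqueueIfNotActive() {
  kue.Job.rangeByType('processRecurringInvoices', 'active', 0, 1, 'asc',
    function(err, jobs) {
      if (err) return console.error(err);
      if (jobs.length > 0) return; // a previous run is still active: skip
      queue.create('processRecurringInvoices', {}).save();
    });
}

setInterval(enqueueIfNotActive, 60 * 60 * 1000); // every hour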
I am using Node.js, MongoDB, and the node-cron npm module to schedule jobs. For 10K jobs it takes little time and memory, but when I schedule 100K jobs it takes more than 10 minutes, uses nearly 1.5 GB of RAM, and sometimes runs out of memory. Is there a better way to achieve this, such as using ActiveMQ or RabbitMQ?
One strategy is that you only schedule the next job to run. When it runs, you query the database and find the next job and schedule it.
If you add a new job, you check whether it should run sooner than the currently scheduled next job and, if so, you schedule it and deschedule the previous next job (it will get rescheduled later, after this new job runs).
If you remove a job, you check if it is the current next job. If it is, you deschedule it and find the next job in the database and schedule it.
If your database is indexed for efficient querying by job run time, this can be very efficient, uses hardly any memory, and scales to a very large number of jobs.
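A rough sketch of the idea, assuming a MongoDB collection named jobs with an indexed runAt date field (both names are placeholders):

let timer = null;

async function scheduleNext(db) {
  // find the single soonest pending job; the index on runAt makes this cheap
  const next = await db.collection('jobs')
    .find({ runAt: { $gte: new Date() } })
    .sort({ runAt: 1 })
    .limit(1)
    .next();
  if (!next) return; // nothing pending

  clearTimeout(timer); // deschedule whatever was previously scheduled
  timer = setTimeout(async function() {
    await runJob(next);     // hypothetical: execute the job's work
    await scheduleNext(db); // then look up and schedule the following one
  }, next.runAt - Date.now());
}

Calling scheduleNext() again whenever a job is added or removed covers the add and remove cases described above, since the query always picks the current soonest job.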
Sometimes the crons run and sometimes they get missed. I have attached all the settings and results. Can anyone check and advise?
It's completely normal behaviour. Some jobs are skipped because their run time falls outside the scheduled window for the specified cron job. In your case the reindex process is scheduled every minute. If there is a lot to index (many changes to products, categories, etc.), one minute is not enough to complete. There is also only one process per cron group, in your case index. 'Use Separate Process' in the cron configuration means that the index process will run as a separate process relative to the other cron groups.
I have a MySQL table tasks. In tasks, we can create a normal task or a recurring task that automatically creates a new task in the MySQL tasks table and sends an email notification to the user that a task has been created. After a lot of research, I found that you can do it with the following methods:
MySQL events
Kue, bull, agenda(node.js scheduling libraries)
Using a cron job to monitor every day for tasks
The recurring tasks can repeat daily, weekly, monthly, or yearly.
We must also provide an option to remove the recurring event at any time. What would be a nice, clean solution?
As you've identified, there are a number of ways of going about this. Here's how I would do it, but I'm making a number of assumptions, such as how many tasks you're likely to have and how flexible the system needs to be going forward.
Assuming you're unlikely to change the task time options (daily, weekly, monthly, yearly), each task would have the fields last_run_date and next_run_date. Every time a task runs I would update these fields and create an entry in a log table such as task_run_log, which also stores the date/time the task was run.
I would then have a cron job which fires an HTTP message to a Node.js service. This web service would look through the table of tasks, find which ones need to be executed for that day, and dispatch a message for each task into some sort of queue (AWS SQS, GCP Pub/Sub, Apache Kafka, etc.). Each message in the queue would represent a single task that needs to be carried out; workers can subscribe to this queue and process the tasks themselves. Once a worker has processed a job, it would make the log entry and update the last_run_date and next_run_date fields. If a task fails, the worker would move that message into an error queue and log a failed task in the task log.
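A condensed sketch of the dispatch side, assuming an Express endpoint hit by cron, a hypothetical findDueTasks() query against the tasks table, and a publish() wrapper around whichever queue you pick (all three are illustrative, not from the answer):

const express = require('express');
const app = express();

app.post('/dispatch', async function(req, res) {
  // tasks whose next_run_date falls on or before today
  const due = await findDueTasks(new Date());
  for (const task of due) {
    await publish('task-queue', { taskId: task.id }); // one message per task
  }
  res.json({ dispatched: due.length });
});

app.listen(3000);

The workers on the other end of the queue then do the per-task work, write the task_run_log entry, and update last_run_date and next_run_date as described above.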
This system would be robust, as any failed jobs would be recorded in your database and would appear in an error queue (which you can either drain to remove the failed jobs, or replay into the normal queue once the worker is fixed). It would also scale to many tasks per day, since you can scale up your workers. You also won't be flooding cron: your cron job just sends a single HTTP request each day to your HTTP service, which kicks off the processing.
You can also setup alerts based on whether the cron job runs or not to make sure the process gets kicked off properly.
I had to do something very similar; you can use the npm module node-schedule.
node-schedule has many features. You first create a rule, which determines when the job runs, and then schedule the job, which defines what the job does and activates it. Below is an example from my code which sets a job to run at midnight every day.
var schedule = require('node-schedule');
// Run every day of the week (Sunday plus Monday through Saturday) at midnight.
var rule = new schedule.RecurrenceRule();
rule.dayOfWeek = [0, new schedule.Range(1, 6)];
rule.hour = 0;
rule.minute = 0;
var j = schedule.scheduleJob(rule, function(){
    sqlUpdate(server); // run the SQL update against the given server
});
This may not fit all of your requirements on its own, but there are other features and setups you can use.
For example, you can cancel any job with the cancel function:
j.cancel()
You can also set start and end times, as shown on the npm page:
let startTime = new Date(Date.now() + 5000);
let endTime = new Date(startTime.getTime() + 5000);
var j = schedule.scheduleJob({ start: startTime, end: endTime, rule: '*/1 * * * * *' }, function(){
console.log('Time for tea!');
});
There are also other options for scheduling the date and time, since node-schedule also accepts the cron format, meaning you can set whatever times you need:
var j = schedule.scheduleJob('42 * * * *', function(){
    // runs at minute 42 of every hour
    console.log('The answer to life, the universe, and everything!');
});
As such, node.js can handle everything you need. You would likely need to set up a system to keep track of the scheduled jobs (the var j handles), but that lets you cancel and reschedule them as you wish.
It additionally allows you to reschedule jobs, retrieve the next scheduled event, and use multiple date formats.
If you need the jobs to persist after the process is turned off and on or reset, you will need to save the details of each job; a MySQL database would make sense here. On startup, the code can make a quick pull and restart all of the created tasks based on the data from the database, and when you cancel a job you just delete it from the database. It should be noted that the process needs to be running for this to work: a job will not run if the process is turned off.
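A sketch of that startup pull, assuming a jobs table with id and cron_expression columns, the mysql2 client, and a runTask() job body (all hypothetical names):

const schedule = require('node-schedule');
const mysql = require('mysql2/promise');

const handles = new Map(); // job id -> scheduled job, so we can cancel later

async function restoreJobs(conn) {
  const [rows] = await conn.execute('SELECT id, cron_expression FROM jobs');
  for (const row of rows) {
    handles.set(row.id, schedule.scheduleJob(row.cron_expression, function() {
      runTask(row.id); // hypothetical task body
    }));
  }
}

async function cancelJob(conn, id) {
  if (handles.has(id)) handles.get(id).cancel(); // stop the in-memory schedule
  handles.delete(id);
  await conn.execute('DELETE FROM jobs WHERE id = ?', [id]); // persist removal
}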
I need to ensure the same job added to queue isn't duplicated within a certain period of time.
Is it worth including partial timestamps (i.e. D/M/Y-HH:M) in my unique jobId strings, so it processes only if not in the same Minute?
It would still duplicate if one job was added at 12:01 and the other at 12:09 – or does Bull have a much better way of doing this?
Bull is designed to support idempotence by ignoring jobs that are added with an existing job id. Be careful not to enable options such as removeOnComplete, since the job will be removed after completion and will not be considered the next time you add a job with the same id.
In your case, where you want to make sure that no new jobs are added during a given timespan, just make sure that all the job ids generated during that timespan are the same, for example, as you wrote in your comment, by removing the last 4 digits of your UNIX timestamp.
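A sketch of that bucketing, with the queue name and payload made up for illustration; dropping the last 4 digits of a seconds-resolution UNIX timestamp yields one id per 10,000-second (roughly 2.8 hour) window:

const Queue = require('bull');
const invoiceQueue = new Queue('invoices');

function bucketId(prefix) {
  // identical for every call within the same 10,000-second window
  const bucket = Math.floor(Date.now() / 1000 / 10000);
  return prefix + ':' + bucket;
}

// Bull ignores an add() whose jobId already exists, so only the first
// call in each window actually enqueues a job.
invoiceQueue.add({ userId: 42 }, { jobId: bucketId('invoice:42') });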
I feel you should use Bull's API to check whether the job is already running, and only add the job to the queue if it is not (a patch on the producer).
You can also check whether a similar job is already running when you are running the job (inside the process function) and do an early return instead of executing the job (a patch on the consumer).
You can use the Queue getJobs function to do so:
getJobs(types: string[], start?: number, end?: number, asc?: boolean): Promise<Job[]>
"Returns a promise that will return an array of job instances of the given types. Optional parameters for range and ordering are provided."
From the documentation:
https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queuegetjobs
The Job items should provide enough data for you to find the one you are looking for.
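For example, a producer-side guard might look like this (the invoiceId field is made up; match on whatever identifies your jobs):

async function addIfNotQueued(queue, data) {
  // look through jobs that are queued, running, or scheduled
  const jobs = await queue.getJobs(['waiting', 'active', 'delayed']);
  const duplicate = jobs.some(function(job) {
    return job.data.invoiceId === data.invoiceId;
  });
  if (!duplicate) {
    await queue.add(data);
  }
}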
I currently run a service using beanstalkd and node.js.
I would like failed jobs to be retried n times before the job is given up on.
If the job succeeds, I want to run the same job 10 times.
So what is the best practice: store the error and success counts in MongoDB alongside the jobId, or delete the job and put a new one with the error and success counts in the body?
I don't know if I'm being clear, so tell me. Thanks a lot.
There is a stats-job <id>\r\n command, which should also be available via the API library, that returns, among other things, how many times the specific job has been reserved, released, buried, and so on.
This allows for a number of retries of failed jobs by checking previous reservations/releases.
To run the same job multiple times, I would personally either create one additional job with a success count that is then incremented (into another new job), or create all nine new jobs at once, with optional delays before they start.
You have a couple of ways to do this:
you can release the job, and obtain the number of reserves from stats
you can put a new job with a retry count, and keep track of history in the data payload
You should do the latter, and you don't need MongoDB as a second dependency.
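A rough sketch of that approach, assuming the fivebeans client and a runJob() function that does the actual work (the payload shape and the numbers are illustrative):

const fivebeans = require('fivebeans');
const client = new fivebeans.client('127.0.0.1', 11300);

const MAX_RETRIES = 3;   // give up after this many failures
const REPEAT_COUNT = 10; // re-run a successful job this many times

function requeue(payload) {
  // put(priority, delay, ttr, payload, callback)
  client.put(0, 0, 60, JSON.stringify(payload), function() {});
}

function handle(jobid, payload) {
  runJob(payload)
    .then(function() {
      client.destroy(jobid, function() {
        if (payload.successCount + 1 < REPEAT_COUNT) {
          // same job again, success count incremented, retries reset
          requeue({ ...payload, successCount: payload.successCount + 1, retryCount: 0 });
        }
      });
    })
    .catch(function() {
      client.destroy(jobid, function() {
        if (payload.retryCount + 1 < MAX_RETRIES) {
          // retry the same payload with an incremented retry count
          requeue({ ...payload, retryCount: payload.retryCount + 1 });
        }
      });
    });
}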