Snowflake tasks combining CRON scheduling (set time) with NON-CRON scheduling (every minute) - cron

Every hour, new data will be loaded in a table. As soon as that is ready I want to call a stored procedure.
Is it possible with Snowflake Tasks and Streams to schedule that starting every hour on the hour, check every minute if the table has new data and if it does, call a stored procedure and stop checking every minute?
At first I thought about having one task with an hourly schedule. The next task would be scheduled every minute and have the hourly task as its predecessor. Unfortunately, it is not possible for a task to have both a predecessor and a schedule. What would be a way to create this kind of scheduling?
For example
CREATE OR REPLACE TASK tsk_X_X_1
WAREHOUSE = X
SCHEDULE = 'USING CRON 0 * * * * UTC'
AS INSERT INTO XX.Test SELECT 'hourly', sysdate()
;
CREATE OR REPLACE TASK tsk_X_X_2
WAREHOUSE = X
SCHEDULE = '5 minutes'
AFTER TASK tsk_X_X_1
As INSERT INTO XX.Testing SELECT '5min', sysdate()
;
Error: Task TSK_X_X_2 cannot have both a schedule and a predecessor.

Related

Django-Q scheduling tasks every X seconds

I am using Django-Q to schedule a simple periodic task that has to be repeated at intervals shorter than one minute.
Croniter, used under the hood to parse cron expressions for the scheduler, specifies that cron "seconds" support is available:
https://pypi.org/project/croniter/#about-second-repeats
So I created a cron-type schedule that looks like this:
Schedule.objects.update_or_create(
    name='mondrian_scheduler',
    defaults={
        'func': 'mondrianapi.tasks.run_scheduler',
        'schedule_type': Schedule.CRON,
        'cron': '* * * * * */20',
    },
)
Django-q correctly parses and schedules the job, but the real frequency doesn't seem to go below 30 seconds (31, actually), whatever the 6th argument says:
2021-05-12 10:17:08.528307+00:00---run_bot ID 1
2021-05-12 10:17:39.166822+00:00---run_bot ID 1
2021-05-12 10:18:09.899772+00:00---run_bot ID 1
2021-05-12 10:18:40.648140+00:00---run_bot ID 1
2021-05-12 10:19:11.176563+00:00---run_bot ID 1
2021-05-12 10:19:41.857376+00:00---run_bot ID 1
The guard (or sentinel) process is responsible for querying for any scheduled tasks which are due, and it only does this twice per minute:
Scheduler
Twice a minute the scheduler checks for any scheduled tasks that should be starting.
Creates a task from the schedule
Subtracts 1 from django_q.Schedule.repeats
Sets the next run time if there are repeats left or if it has a negative value.
https://django-q.readthedocs.io/en/latest/architecture.html?highlight=scheduler#scheduler
The guard process is also responsible for checking that all of the other processes are still running, so it is not exactly thirty seconds.
Unfortunately the scheduler interval is not configurable. If you're comfortable modifying django_q, the relevant code is in django_q/cluster.py, in Sentinel.guard().
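The effect of a fixed polling interval can be illustrated with a small simulation. This is only a simplified model, not Django-Q code: the function name simulate_runs and the exact 30-second poll are assumptions made for illustration (the real guard loop drifts slightly, as noted above). It shows that a task due every 20 seconds still starts only once per poll.

```python
def simulate_runs(cron_interval_s, poll_interval_s, duration_s):
    """Return the times (in seconds) at which a polling scheduler starts a task.

    Simplified model: the task becomes due every `cron_interval_s` seconds,
    but the scheduler only looks for due tasks every `poll_interval_s` seconds.
    """
    runs = []
    next_due = 0
    now = 0
    while now <= duration_s:
        if now >= next_due:
            runs.append(now)
            # The next due time is the first cron tick strictly after `now`.
            next_due = (now // cron_interval_s + 1) * cron_interval_s
        now += poll_interval_s
    return runs


runs = simulate_runs(cron_interval_s=20, poll_interval_s=30, duration_s=180)
gaps = [later - earlier for earlier, later in zip(runs, runs[1:])]
print(runs)  # [0, 30, 60, 90, 120, 150, 180]
print(gaps)  # [30, 30, 30, 30, 30, 30]
```

In this model the effective interval is the larger of the cron interval and the poll interval, which matches the roughly 30-second spacing in the log above.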

creating recurring events in nodejs to update or insert MySQL table

I have a MySQL table called tasks. In it, we can create a normal task or a recurring task; a recurring task automatically creates a new task in the tasks table and sends an email notification to the user that a task has been created. After a lot of research, I found that there are several ways to do it:
MySQL events
Kue, bull, agenda(node.js scheduling libraries)
Using a cron job to monitor every day for tasks
The recurring tasks would repeat daily, weekly, monthly, or yearly.
We must also provide an option to remove the recurring event at any time. What would be a nice and clean solution?
As you've identified, there are a number of ways of going about this. Here's how I would do it, though I'm making a number of assumptions, such as how many tasks you're likely to have and how flexible the system needs to be going forward.
If you're unlikely to change the task time options (daily, weekly, monthly, yearly), each task would have the fields last_run_date and next_run_date. Every time a task is run, I would update these fields and create an entry in a log table such as task_run_log, which also stores the date and time the task was run.
I would then have a cron job which fires an HTTP message to a Node.js service. This web service would look through the table of tasks, find which ones need to be executed that day, and dispatch a message for each task into some sort of queue (AWS SQS, GCP Pub/Sub, Apache Kafka, etc.). Each message in the queue would represent a single task that needs to be carried out; workers can subscribe to this queue and process the tasks themselves. Once a worker has processed a job, it would make the log entry and update the last_run_date and next_run_date fields. If a task fails, the worker would move that message into an error queue and log a failed task in the task log.
This system would be robust, as any failed jobs would exist as failed jobs in your database and would appear in an error queue (which you can either drain to remove the failed jobs, or replay into the normal queue when the worker is fixed). It would also scale to many tasks per day, as you can scale up your workers. You also won't be flooding cron: your cron job will just send a single HTTP request each day to your HTTP service, which kicks off the processing.
You can also set up alerts based on whether the cron job runs or not, to make sure the process gets kicked off properly.
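The next_run_date update described in this answer could be computed along the following lines. This is a minimal sketch in Python rather than Node.js; the function names and the choice to clamp month-end dates (so that January 31 plus one month gives the last day of February) are assumptions, not something the answer specifies.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Add whole months to a date, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def next_run_date(last_run, frequency):
    """Return the next run date for one of the four recurrence options."""
    if frequency == "daily":
        return last_run + timedelta(days=1)
    if frequency == "weekly":
        return last_run + timedelta(weeks=1)
    if frequency == "monthly":
        return add_months(last_run, 1)
    if frequency == "yearly":
        return add_months(last_run, 12)
    raise ValueError(f"unknown frequency: {frequency!r}")


print(next_run_date(date(2024, 1, 31), "monthly"))  # 2024-02-29
print(next_run_date(date(2024, 2, 29), "yearly"))   # 2025-02-28
```

A worker would store the returned value in next_run_date after each successful run, and the daily job would select tasks whose next_run_date is on or before today.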
I had to do something very similar, you can use the npm module node-schedule
node-schedule has many features. You first create a rule, which determines when the job runs, and then schedule the job, which is where you define what the job does and activate it. Below is an example from my code which sets a job to run at midnight every day.
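var schedule = require('node-schedule');
// Run at 00:00 on every day of the week (0 = Sunday, 1-6 = Monday to Saturday).
var rule = new schedule.RecurrenceRule();
rule.dayOfWeek = [0, new schedule.Range(1, 6)];
rule.hour = 0;
rule.minute = 0;
var j = schedule.scheduleJob(rule, function(){
sqlUpdate(server);
});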
This may not exactly fit all of your requirements alone but there are other features and setups you can do.
For example you can cancel any job with the cancel function
j.cancel()
You can also set start and end times, as shown on the npm page:
let startTime = new Date(Date.now() + 5000);
let endTime = new Date(startTime.getTime() + 5000);
var j = schedule.scheduleJob({ start: startTime, end: endTime, rule: '*/1 * * * * *' }, function(){
console.log('Time for tea!');
});
There are also other options for scheduling the date and time, as the module also accepts the cron format, meaning you can build the schedule string dynamically:
var j = schedule.scheduleJob('42 * * * *', function(){
console.log();
});
As such, this would allow Node.js to handle everything you need. You would likely need to set up a system to keep track of the scheduled jobs (var j), but it would allow you to cancel and schedule them as you wish.
It also lets you reschedule a job and retrieve the next scheduled invocation, and it accepts multiple date formats.
If you need the jobs to persist after the process is turned off and on or reset, you will need to save the details of each job. A MySQL database would make sense here: upon startup, the code could make a quick pull and restart all of the created tasks based on the data from the database, and when you cancel a job you just delete it from the database. It should be noted that the process needs to be running for this to work; a job will not run if the process is turned off.

Cron Schedule for a job that runs every minute from 8 am to 7:30 pm

I can't figure out how to create a job that ends at a specific hour and minute
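If you break your cron job into two, it would look like:
* 8-18 * * * command
0-30 19 * * * command
The first line runs every minute from 8:00 to 18:59, and the second line runs every minute from 19:00 to 19:30. (An hour range of 8-19 in the first line would also match 19:31 to 19:59.)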
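The coverage of a pair of crontab lines can be checked with a small matcher. The sketch below is a simplified model that handles only the minute and hour fields with `*`, ranges, lists and steps; the helper names are made up for illustration. It uses `* 8-18` for the first line, since an hour range of 8-19 would also match 19:31 through 19:59.

```python
def parse_field(field, low, high):
    """Expand one crontab field ('*', 'a', 'a-b', lists, '/step') into a set of ints."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return values


def matches(line, hour, minute):
    """Check only the minute and hour fields of a crontab line."""
    minute_field, hour_field = line.split()[:2]
    return (minute in parse_field(minute_field, 0, 59)
            and hour in parse_field(hour_field, 0, 23))


LINES = ["* 8-18 * * *", "0-30 19 * * *"]


def fires(hour, minute):
    """True if either crontab line fires at the given time of day."""
    return any(matches(line, hour, minute) for line in LINES)


covered = [(h, m) for h in range(24) for m in range(60) if fires(h, m)]
print(covered[0], covered[-1], len(covered))  # (8, 0) (19, 30) 691
```

The count of 691 is 11 full hours (660 minutes) plus the 31 minutes from 19:00 to 19:30 inclusive.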
Cron triggers are not well suited to this type of schedule. If you do not insist on using a Cron trigger, I recommend that you check the Daily Time Interval trigger, which is designed for use cases such as yours.

Quartz scheduler with multiple cron schedules in a single trigger

I am trying to create a Windows service that executes twice a day,
and I was able to create it successfully using two triggers added to a single job.
var job = JobBuilder.Create<Job>().StoreDurably().WithIdentity("Report_Name", "Report_Group").Build();
scheduler.AddJob(job, true);
var trigger_1 = TriggerBuilder.Create()
.WithIdentity("Report_Name_1", "Report_Group_A")
.StartNow()
.WithCronSchedule(string.Format("0 {0} {1} ? * *", Utility.Schedule_StartTime_1.Minute, Utility.Schedule_StartTime_1.Hour)) //0 Min hour
.ForJob(job)
.Build();
var trigger_2 = TriggerBuilder.Create()
.WithIdentity("Report_Name_2", "Report_Group_B")
.StartNow()
.WithCronSchedule(string.Format("0 {0} {1} ? * *", Utility.Schedule_StartTime_2.Minute, Utility.Schedule_StartTime_2.Hour)) //0 Min hour
.ForJob(job)
.Build();
scheduler.ScheduleJob(trigger_1);
scheduler.ScheduleJob(trigger_2);
scheduler.Start();
Can I use a single trigger to add multiple cron schedules?
No, the trigger can have only one schedule.
One of the main reasons for this is to avoid situations where it is not clear to the scheduler how to resolve competing conditions.
Imagine you have a job with two intersecting schedules: let's say you want to run the job every 15 minutes and every hour, and it takes up to 10 minutes to execute. In this case, you would need to specify how to handle scenarios where:
a job is executing, but the scheduler fires a new execution
a job should be fired by both schedules at the same time
To allow handling such cases, the trigger has attributes like Priority and Misfire Instructions.

Cron job time syntax

I have a cron job that processes an action for several records in my database. I want it to process each record with a 5 minute delay, then repeat the process every 12 hours. What is the syntax that I need to use to make this happen? For example, if I have 5 rows in my database that the cron job will process. I want it to process the first row, then process the next row 5 minutes later, then process the next row 5 minutes later, etc. until all rows have been processed. Then repeat the whole process every 12 hours. I tried using */5 */12 * * * but it did not work.
It won't work the way you have configured.
if I have 5 rows in my database that the cron job will process. I want it to process the first row, then process the next row 5 minutes later, then process the next row 5 minutes later, etc
Write a shell script to achieve the above goal; cron won't do it for you. Hint: use the sleep command in your script to wait 5 minutes before processing the next record.
Then repeat the whole process every 12 hours
Use 0 */12 * * * in cron to run your shell script every 12 hours. (* */12 * * * would instead start it every minute during hours 0 and 12.)
So, in short, cron will trigger a run of your script every 12 hours, and your script has the logic to wait 5 minutes between processing any two consecutive DB records.
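The same split of responsibilities can be sketched in Python instead of a shell script. This is a minimal illustration under stated assumptions: process_rows and handle_row are hypothetical names, the rows would really come from your database query, and the sleep function is injectable only so the example can run without actually waiting. The script itself would be started by a crontab entry such as 0 */12 * * *.

```python
import time


def process_rows(rows, handle_row, delay_seconds=300, sleep=time.sleep):
    """Process rows one at a time, pausing `delay_seconds` between consecutive rows."""
    for index, row in enumerate(rows):
        if index > 0:
            # Wait before every row except the first.
            sleep(delay_seconds)
        handle_row(row)


# Demonstration with a recording stand-in for sleep, so nothing actually waits.
waits = []
processed = []
process_rows(["row1", "row2", "row3"], processed.append, sleep=waits.append)
print(processed)  # ['row1', 'row2', 'row3']
print(waits)      # [300, 300]
```

With 5 rows, one run takes about 20 minutes of waiting plus processing time, which comfortably finishes before cron starts the next run 12 hours later.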
