I need some help understanding the behaviour of the monthly cron expression [43 10 3,8,12 */3 *] with start_date set to datetime(year=2019, month=11, day=18, hour=1, minute=30, second=0, tzinfo=pendulum.timezone("UTC")) and end_date set to None. Backfill is set to true.
Current Date is: 2020-10-19
As per my understanding, it should not have triggered the last two runs, 10-03 and 10-08. Can someone please help me understand this behavior? Also, if it is triggering runs for the execution dates 10-03 and 10-08, then why not for 10-12?
Could you elaborate on "it should not have triggered the last two runs"?
The cron expression 43 10 3,8,12 */3 * matches:
“At 10:43 on day-of-month 3, 8, and 12 in every 3rd month.”
A good tool to validate cron expression is crontab.guru.
The execution date 10-12 hasn't triggered yet, because of how Airflow handles execution_date - see airflow scheduler:
The scheduler won’t trigger your tasks until the period it covers has ended, e.g., a job with schedule_interval set as @daily runs after the day has ended. This technique makes sure that whatever data is required for that period is fully available before the dag is executed. In the UI, it appears as if Airflow is running your tasks a day late.
Let’s Repeat That, the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
This means the run with execution date 2020-10-12 10:43:00 will only be triggered once its period has ended, i.e. shortly after 2021-01-03 10:43:00.
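To make the timing concrete, here is a minimal Python sketch that enumerates the schedule points of 43 10 3,8,12 */3 * and pairs each execution date with the moment its run becomes due (the next schedule point). It is an illustration of the end-of-period rule quoted above, not Airflow's actual scheduler code; the helper names fire_times and triggered_runs are made up for this example, and all times are treated as naive UTC.

```python
from datetime import datetime

MONTHS = (1, 4, 7, 10)  # "*/3" in the month field
DAYS = (3, 8, 12)       # day-of-month field


def fire_times(first_year, last_year):
    """All datetimes matching 43 10 3,8,12 */3 * in the given years, in order."""
    return [
        datetime(year, month, day, 10, 43)
        for year in range(first_year, last_year + 1)
        for month in MONTHS
        for day in DAYS
    ]


def triggered_runs(start_date, now):
    """(execution_date, due_time) pairs for runs whose period has ended by `now`."""
    points = [t for t in fire_times(start_date.year, now.year + 1) if t >= start_date]
    # A run for schedule point p only becomes due at the following schedule point.
    return [(p, nxt) for p, nxt in zip(points, points[1:]) if nxt <= now]


runs = triggered_runs(datetime(2019, 11, 18, 1, 30), datetime(2020, 10, 19))
for execution_date, due_time in runs[-2:]:
    print(execution_date, "-> due at", due_time)
```

On 2020-10-19 the last two pairs have the execution dates 10-03 and 10-08; the pair for 10-12 is absent because its due time is 2021-01-03 10:43.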
Related
I need to schedule my Spark v3.0.2 job to run on specific dates (i.e. March 31 and Dec 31) of every year.
I am using Airflow for scheduling.
How should I handle this use case?
If you want to run your job only on March 31st and December 31st, you can set a cron expression in the schedule_interval argument of your DAG definition.
The cron expression is 0 0 31 3,12 *, which translates to: run at midnight on the 31st day of the month in month 3 (March) and month 12 (December). Thus your DAG definition should be:
from airflow import DAG

your_dag = DAG(
    dag_id='your_dag_id',
    ...
    schedule_interval='0 0 31 3,12 *',
    ...
)
For more complicated cases, such as running on April 15th and August 23rd, which cannot be defined with a single cron expression, I guess you should do as Iñigo suggested.
You have some options here:
Option 1:
Create a DAG whose first step is a PythonOperator that checks the date and fails if it is not Dec 31 or Mar 31.
Make the next step depend on this first step succeeding.
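A minimal sketch of the date check for Option 1, assuming the callable is handed to the PythonOperator as its python_callable and receives the run's date. The names ALLOWED_DAYS and check_run_date are hypothetical, and raising ValueError is just one way to make the task fail.

```python
from datetime import date

# (month, day) pairs on which the job is allowed to run (illustrative constant)
ALLOWED_DAYS = {(3, 31), (12, 31)}


def check_run_date(run_date):
    """Raise if run_date is not an allowed day, so that downstream tasks do not run."""
    if (run_date.month, run_date.day) not in ALLOWED_DAYS:
        raise ValueError(f"{run_date.isoformat()} is not an allowed run date")
    return run_date.isoformat()


print(check_run_date(date(2021, 12, 31)))
```

Because the task raises on every other day, the Spark step behind it only runs on the two allowed dates, at the cost of failed DAG runs on all other days.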
Option 2:
Create one DAG per date, each of which runs yearly. This looks awful but it can be done easily with a single Python file like this:
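from datetime import datetime
from airflow import DAG
# Airflow 2 provider path; older versions ship this operator under airflow.contrib
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

# Create a DAG for an exact date
def createYearlyDagForDate(start_date):
    with DAG(dag_id=f"createdagfordate_{start_date.strftime('month_%m_day_%d')}",
             start_date=start_date,
             # "@yearly" is shorthand for "0 0 1 1 *" (Jan 1), so build the cron from the date
             schedule_interval=f"0 0 {start_date.day} {start_date.month} *") as dag:
        sparkjob = SparkSubmitOperator(...)
    return dag

for x in [datetime(2021, 12, 31), datetime(2021, 3, 31)]:
    dag = createYearlyDagForDate(x)
    globals()[dag.dag_id] = dag  # expose each DAG at module level so Airflow can find it
The trick here is having a unique dag_id for each DAG. If you reuse a dag_id across the DAGs, you overwrite the DAG and end up with just one declared.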
I am having issues scheduling my DAG runs properly, especially in terms of schedule_interval & start_date.
Airflow's default behavior doesn't suit my use case: if I set an arbitrarily early start_date, Airflow always triggers the initial run outside the "intuitive" first schedule (e.g. if I set my schedule_interval to run at 12AM every Wednesday & Friday, and I create this DAG on a Monday with a start_date many weeks back, a run is triggered at the exact instant the DAG is created on that Monday, which I do not want).
I understand from the references that a scheduled run happens after one schedule_interval cycle. While I have read How to work correctly airflow schedule_interval & https://airflow.apache.org/docs/1.10.6/scheduler.html#backfill-and-catchup, they generally show examples with regular intervals (e.g. hourly, daily, monthly, etc.).
However, in my scenario, I have schedule_interval values of 0 16 * * 1,2,0 & 0 16 * * 1,3,5. As the intervals between runs are irregular, how do I set my start_date such that:
The first run happens exactly on the next datetime when it is "intuitive" to run? e.g.
if I create these 2 DAGs on a Saturday, I would want my DAGs to make their first run at 1600h on Sunday & Monday respectively
if I create these 2 DAGs on Monday 1700h, I would want my DAGs to make their first run at 1600h on Tuesday & Wednesday respectively
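One possible way to reason about this, assuming the end-of-interval behaviour described in the scheduler docs linked above: if start_date is the most recent schedule point before the DAG is created, the first run has that point as its execution date and becomes due at the next schedule point. The sketch below only computes that previous point for "HH:00 on given weekdays" schedules; previous_fire_time is a hypothetical helper, not an Airflow API, and cron weekday numbering (0 = Sunday) is assumed.

```python
from datetime import datetime, timedelta


def previous_fire_time(now, cron_weekdays, hour):
    """Latest datetime strictly before `now` at hour:00 on one of the cron weekdays."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate >= now:
        candidate -= timedelta(days=1)
    # Python's weekday() has Monday = 0; cron has Sunday = 0, hence the shift.
    while (candidate.weekday() + 1) % 7 not in cron_weekdays:
        candidate -= timedelta(days=1)
    return candidate


saturday_morning = datetime(2021, 3, 6, 9, 0)  # 2021-03-06 is a Saturday
print(previous_fire_time(saturday_morning, {1, 2, 0}, 16))  # candidate start_date for 0 16 * * 1,2,0
print(previous_fire_time(saturday_morning, {1, 3, 5}, 16))  # candidate start_date for 0 16 * * 1,3,5
```

With these values the previous points are Tuesday 16:00 and Friday 16:00, so under the stated assumption the first runs would become due on Sunday 16:00 and Monday 16:00 respectively.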
In the AWS Glue service there is an option to trigger a job with a custom cron expression. Previously I used this cron expression (0/2 * * ? *) to trigger the job every 2 hours.
Now I need to change the cron expression to trigger every 90 minutes, i.e. every one and a half hours. I tried many cron expressions, but none of them triggered every 90 minutes. Even when I specified 90 minutes, the job was triggered every hour.
Can anyone help me out by providing the correct cron expression to trigger the job every 90 minutes?
You can use the following pattern which was based on Bill Weiss' answer on Server Fault. It was modified to comply with the unique syntax AWS uses (reference here):
0 0-21/3 * * ? *
30 1-22/3 * * ? *
You'll have to define two separate Glue Triggers to accomplish this, each with the same job settings.
If curious, the syntax reads:
Run at minute 0 of every third hour from hour 0 through 21 (00:00, 03:00, ..., 21:00)
Run at minute 30 of every third hour from hour 1 through 22 (01:30, 04:30, ..., 22:30)
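As a quick sanity check, the two patterns can be expanded by hand. This Python sketch only does the arithmetic on minutes since midnight; it does not parse AWS cron syntax, and the variable names are illustrative.

```python
# Fire times as minutes since midnight
first = [hour * 60 for hour in range(0, 22, 3)]         # 0 0-21/3 * * ? *
second = [hour * 60 + 30 for hour in range(1, 23, 3)]   # 30 1-22/3 * * ? *

fires = sorted(first + second)
gaps = {later - earlier for earlier, later in zip(fires, fires[1:])}
# Gap from the last firing of one day to the first firing of the next day
wrap_gap = fires[0] + 24 * 60 - fires[-1]

print([f"{m // 60:02d}:{m % 60:02d}" for m in fires])
print(gaps, wrap_gap)
```

Every gap, including the one across midnight, comes out as 90 minutes, which works because 24 hours is an exact multiple of 90 minutes.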
I wonder if it is possible to write a cron expression that satisfies several conditions:
The job should run at a given interval in minutes. For example, with an interval of 42 minutes the fire times would be 10:00, 10:42, 11:24, 12:06, etc.
If the current minute does not end with 0 (e.g. 10:28, 10:29), then the first fire time should be 10:30. In other words, the first fire time should fall on a "round" minute.
I hope these conditions are clear. Is it possible to describe them with a Quartz cron expression?
You can use job trigger like described below in Quartz.net 3.0:
var jobTrigger = TriggerBuilder.Create()
    .StartNow()
    .WithSimpleSchedule(s => s
        .WithIntervalInMinutes(42)
        .RepeatForever())
    .Build();
You can also start the app at the first round time, so that the first firing happens exactly at that time.
I usually use http://www.cronmaker.com/ to generate my cron expressions. If you try the "every 42 minutes" option, you'll get the following expression: "0 0/42 * 1/1 * ? *". As for the "round" minutes requirement, you can try this when building your trigger:
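// Round the start time up to the next minute ending in 0.
// (The second DateTimeOffset constructor argument is a UTC offset, not a delay.)
var now = DateTimeOffset.Now;
var start = now.AddMinutes((10 - now.Minute % 10) % 10).AddSeconds(-now.Second);
ITrigger trigger = TriggerBuilder.Create()
    .WithIdentity(JobTrigger, JobGroup)
    .WithCronSchedule(CroneExpression)
    .StartAt(start)
    .Build();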
It is not possible; for an explanation and a similar issue, see: Quartz.net - Repeat on day n, of every m months?
It is also not possible with cron expressions alone. To do this, you would need to apply more complex logic, using an operator that cron evaluators do not provide. Why do you need this? Would you like to combine those 2 requirements into a single complex pattern?
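Outside of cron, the two requirements reduce to simple date arithmetic: round the start up to the next minute ending in 0, then add the interval repeatedly. The following Python sketch shows only that arithmetic, not Quartz code; interval_fire_times is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def interval_fire_times(now, interval_minutes, count):
    """First `count` fire times: start at the next minute ending in 0, then fixed steps."""
    start = now.replace(second=0, microsecond=0)
    if start < now:
        # Already past the start of this minute, so move to the next whole minute.
        start += timedelta(minutes=1)
    # Round up to the next minute ending in 0 (no change if already round).
    start += timedelta(minutes=(-start.minute) % 10)
    return [start + timedelta(minutes=interval_minutes * i) for i in range(count)]


for fire_time in interval_fire_times(datetime(2021, 1, 1, 10, 28), 42, 4):
    print(fire_time.strftime("%H:%M"))
```

Starting at 10:28 this yields 10:30, 11:12, 11:54, 12:36, while starting at exactly 10:00 it reproduces the 10:00, 10:42, 11:24, 12:06 sequence from the question.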
I am using Quartz scheduling and I've tried to create a trigger that fires every 25 minutes, every day from 9 AM until 5 PM. It should look like this:
9:00, 9:25, 9:50, 10:15, 10:40, 11:05, etc
The final Quartz expression looks like this:
0 0/25 9-17 * * ? *
But the execution looks like this:
9:00, 9:25, 9:50, 10:00, 10:25, 10:50, 11:00, etc
Is there any way to achieve this schedule:
9:00, 9:25, 9:50, 10:15, 10:40, 11:05, etc
or should I switch away from Quartz?
Thank you!
Actually, this question is similar to the SO question "Cron expression to be executed every 45 minutes".
A cron expression will not allow you to do that, as it defines the exact dates and times at which a trigger must fire. A setup like yours actually means "fire every 25 minutes, starting at minute 0 of every hour".
You can achieve what you want by using a SimpleTrigger with the .WithIntervalInMinutes(25) configuration.
SimpleTrigger should meet your scheduling needs if you need to have a job execute exactly once at a specific moment in time, or at a specific moment in time followed by repeats at a specific interval.
P.S. Your cron expression would work for a 20-minute interval (0 0/20 9-17 * * ? *), as 60 is a multiple of 20, just in case changing the interval is an option for you.
P.S.2 To be honest, you can use cron expressions if you set up several triggers for the different offsets, but that is not worth the effort. Anyway, have a look at this SO answer.
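The effect of a 0/N minute field can be illustrated with a few lines of Python. This is a sketch of the arithmetic only, with gaps_for_step as a made-up helper: it lists the minutes at which 0/N fires within an hour and the gaps between consecutive firings, including the gap into the next hour.

```python
def gaps_for_step(step):
    """Minutes matched by a '0/step' minute field and the gaps between firings."""
    minutes = list(range(0, 60, step))
    gaps = [later - earlier for earlier, later in zip(minutes, minutes[1:])]
    gaps.append(60 - minutes[-1])  # gap from the last firing to minute 0 of the next hour
    return minutes, gaps


for step in (20, 25):
    print(step, gaps_for_step(step))
```

For a step of 25 the gaps are 25, 25 and 10 minutes, which matches the 9:50 to 10:00 jump observed in the question; for a step of 20 every gap is 20 minutes because 20 divides 60 evenly.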