How to configure Airflow dag to run at specific interval? - cron

How do I configure an Airflow DAG to run at a specified interval?
I'm trying to schedule my jobs to run every 29 days. If the start date is 2021/11/13, the next run should be on 2021/12/12 and the one after that on 2022/1/10.
Is there any way to do this in Airflow? Any pointer would be appreciated.

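It can be configured in the schedule_interval argument when creating the DAG() object. schedule_interval accepts either a cron expression or a datetime.timedelta.
Cron is a poor fit here: "0 0 */29 * *" does not mean "every 29 days". The */29 step applies to the day-of-month field (1-31), so it only matches the 1st and 30th of each month. For a fixed 29-day cadence anchored at start_date, pass a timedelta instead:
from datetime import timedelta
dag = DAG('DAG_NAME', default_args=default_args, schedule_interval=timedelta(days=29))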
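Before relying on a cron step such as */29, it helps to expand what it actually matches. The sketch below uses only the Python standard library and does not touch Airflow; the helper names `step_field_values` and `timedelta_run_times` are hypothetical, and it assumes Airflow's usual rule that a run starts once its interval has ended.

```python
from datetime import datetime, timedelta


def step_field_values(step: int, low: int, high: int) -> list[int]:
    """Values matched by a cron '*/step' field whose range is low..high."""
    return list(range(low, high + 1, step))


def timedelta_run_times(start: datetime, interval: timedelta, count: int) -> list[datetime]:
    """Times at which the first `count` runs start for a fixed-length interval.

    Assumes Airflow's convention: the run covering [start, start + interval)
    starts when that interval ends, i.e. at start + interval.
    """
    return [start + interval * (i + 1) for i in range(count)]


# The day-of-month field spans 1-31, so '*/29' only hits two days per month.
print(step_field_values(29, 1, 31))  # [1, 30]

# A 29-day interval anchored at the start date gives the dates from the question.
for run in timedelta_run_times(datetime(2021, 11, 13), timedelta(days=29), 2):
    print(run.date())
# 2021-12-12
# 2022-01-10
```

The first result shows why a cron step cannot express "every N days" across month boundaries, while a fixed interval simply keeps adding 29 days.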

Related

Airflow terminates current run, and starts new run, every day at midnight, despite my CRON schedule

I have an Airflow job that I wish to run every 130 minutes. I have set the cron schedule like this: "*/130 * * * *".
This schedule works normally until the clock hits midnight. Each day at midnight, if a job is underway, Airflow terminates it and starts a new one. I do NOT want this behavior. Thanks in advance for your advice!
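A cron minute field only spans 0-59, so under the usual step semantics a step of 130 never gets past its first value, and cron has no way to carry a 130-minute rhythm from one hour or day into the next. Airflow's schedule_interval also accepts a datetime.timedelta (for example timedelta(minutes=130)), which keeps a fixed cadence from start_date. The plain-Python sketch below contrasts the two; the helper names are hypothetical, how a particular cron parser treats an out-of-range step may differ, and this illustrates the schedules rather than diagnosing the midnight restarts.

```python
from datetime import datetime, timedelta


def cron_step_minutes(step: int) -> list[int]:
    """Minutes matched by a '*/step' minute field, assuming standard step semantics over 0-59."""
    return list(range(0, 60, step))


def fixed_interval_starts(start: datetime, minutes: int, count: int) -> list[datetime]:
    """Successive interval boundaries for a fixed-length interval of `minutes`."""
    step = timedelta(minutes=minutes)
    return [start + step * i for i in range(count)]


# Within 0-59, a step of 130 only ever reaches minute 0.
print(cron_step_minutes(130))  # [0]

# A fixed 130-minute cadence carries straight across midnight.
for t in fixed_interval_starts(datetime(2022, 1, 1, 21, 40), 130, 3):
    print(t)
# 2022-01-01 21:40:00
# 2022-01-01 23:50:00
# 2022-01-02 02:00:00
```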

Run airflow on schedule that can't be specified with cron

I have a number of jobs that are currently being scheduled with multiple cron schedules.
For example, I have a job that runs on:
35 9,13,16 * * mon-fri & 40 16 * * mon-fri
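I would like a single DAG that runs at 9:35, 13:35, 16:35 and 16:40, Monday to Friday.
Is it possible to do this with Airflow/cron?
A single cron expression can't express this, because 16:40 uses a different minute from the other three times. You can do it with custom timetables (Airflow 2.2+): https://airflow.apache.org/docs/apache-airflow/stable/howto/timetable.html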
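As a rough illustration of the logic such a timetable has to encode, the sketch below computes, in plain Python, the next run time after a given moment for the schedule in the question. It does not use Airflow's Timetable classes; `RUN_TIMES`, `WEEKDAYS` and `next_run_after` are hypothetical names, and time zones are ignored (naive datetimes).

```python
from datetime import datetime, time, timedelta

# Schedule from the question: four times of day, Monday to Friday.
RUN_TIMES = sorted([time(9, 35), time(13, 35), time(16, 35), time(16, 40)])
WEEKDAYS = {0, 1, 2, 3, 4}  # date.weekday(): Monday is 0, Friday is 4


def next_run_after(after: datetime) -> datetime:
    """Return the first scheduled run time strictly later than `after`."""
    # Today plus the next seven days always contains a weekday with a later slot.
    for offset in range(8):
        day = after.date() + timedelta(days=offset)
        if day.weekday() not in WEEKDAYS:
            continue
        for slot in RUN_TIMES:
            candidate = datetime.combine(day, slot)
            if candidate > after:
                return candidate
    raise ValueError("schedule has no run times")


# 2022-01-07 is a Friday, so after the 16:40 slot the next run is Monday 09:35.
print(next_run_after(datetime(2022, 1, 7, 16, 36)))  # 2022-01-07 16:40:00
print(next_run_after(datetime(2022, 1, 7, 16, 40)))  # 2022-01-10 09:35:00
```

In a custom timetable, a lookup like this would be the core of deciding when the next DAG run should start.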

Issue while running Monthly Cron Expression with Airflow

I need some help understanding the behaviour of the monthly cron expression [43 10 3,8,12 */3 *] with start_date set to datetime(year=2019, month=11, day=18, hour=1, minute=30, second=0, tzinfo=pendulum.timezone("UTC")) and end_date set to None. Backfill is set to true.
The current date is 2020-10-19.
As I understand it, it should not have triggered the last two runs, 10-03 and 10-08. Can someone please help me understand this behaviour? Also, if it is triggering runs for the execution_dates 10-03 and 10-08, why not for 10-12?
Could you elaborate on "it should not have triggered the last two runs"?
The cron expression 43 10 3,8,12 */3 * matches:
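“At 10:43 on day-of-month 3, 8, and 12 in every 3rd month.”
Counting from January, every 3rd month means January, April, July and October, so October 3, 8 and 12 all match, and the next match after October 12 is January 3.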
A good tool to validate cron expression is crontab.guru.
The run with execution date 10-12 hasn't been triggered yet because of how Airflow handles execution_date; see the Airflow scheduler documentation:
The scheduler won’t trigger your tasks until the period it covers has ended, e.g., a job with schedule_interval set as @daily runs after the day has ended. This technique makes sure that whatever data is required for that period is fully available before the DAG is executed. In the UI, it appears as if Airflow is running your tasks a day late.
Let’s repeat that: the scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
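This means the run with execution date 2020-10-12 10:43:00 will be triggered only when its period ends, at (or shortly after) 2021-01-03 10:43:00. Likewise, the run for 10-03 was triggered on 10-08 and the run for 10-08 was triggered on 10-12, which is why both appear before 2020-10-19.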
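The end-of-interval rule can be checked by expanding the cron expression by hand. The plain-Python sketch below lists the matching times and pairs each execution_date with the match that ends its interval. It assumes the cron reading above and Airflow's end-of-interval rule, ignores time zones (the question uses UTC), and does not reproduce the scheduler itself; `cron_ticks` and `started_runs` are hypothetical names.

```python
from datetime import datetime

# '43 10 3,8,12 */3 *' expanded by hand: minute 43, hour 10, days 3/8/12,
# and '*/3' over months 1-12, i.e. January, April, July, October.
MINUTE, HOUR = 43, 10
DAYS = [3, 8, 12]
MONTHS = list(range(1, 13, 3))


def cron_ticks(first_year: int, last_year: int) -> list[datetime]:
    """All times matched by the expression in the given years, in chronological order."""
    return [
        datetime(year, month, day, HOUR, MINUTE)
        for year in range(first_year, last_year + 1)
        for month in MONTHS
        for day in DAYS
    ]


def started_runs(start: datetime, now: datetime) -> list[tuple[datetime, datetime]]:
    """(execution_date, trigger time) pairs for runs that have been triggered by `now`.

    Assumes the run whose execution_date is one tick is triggered at the
    following tick, when its interval ends.
    """
    ticks = [t for t in cron_ticks(start.year, now.year + 1) if t >= start]
    return [(ex, trig) for ex, trig in zip(ticks, ticks[1:]) if trig <= now]


runs = started_runs(datetime(2019, 11, 18, 1, 30), datetime(2020, 10, 19))
for execution_date, triggered_at in runs[-2:]:
    print(execution_date, "->", triggered_at)
# 2020-10-03 10:43:00 -> 2020-10-08 10:43:00
# 2020-10-08 10:43:00 -> 2020-10-12 10:43:00
```

The pair for 2020-10-12 is missing because its trigger time, 2021-01-03 10:43, is still in the future on 2020-10-19.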

Airflow: Is there a way to trigger a job at the 35th second of every minute using a cron expression?

I want to trigger a shell script at the 35th second of every minute. However, Airflow's cron support works at the minute level, not the second level.
For example, the cron expression
35 * * * *
triggers a job at minute 35 of every hour in Airflow, not at second 35.
I'm not familiar with Airflow, but cron only supports 1-minute resolution.
You could use cron to invoke a job every minute and let the job sleep for 35 seconds:
* * * * * sleep 35 ; do_something
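The same idea can be wrapped in a small Python helper that a once-a-minute job calls. This is a sketch, not an Airflow feature: `seconds_until_target` and `run_at_target_second` are hypothetical names, and it assumes the job is already triggered every minute (for example by `* * * * *`).

```python
import subprocess
import time
from datetime import datetime

TARGET_SECOND = 35


def seconds_until_target(now: datetime, target_second: int = TARGET_SECOND) -> float:
    """Seconds from `now` until the clock next reads :target_second (0 if it already does)."""
    current = now.second + now.microsecond / 1_000_000
    return (target_second - current) % 60


def run_at_target_second(command: list[str]) -> int:
    """Sleep until the clock next reads the target second, run `command`, return its exit code."""
    time.sleep(seconds_until_target(datetime.now()))
    return subprocess.run(command).returncode


print(seconds_until_target(datetime(2022, 1, 1, 12, 0, 0)))  # 35.0
print(seconds_until_target(datetime(2022, 1, 1, 12, 0, 2)))  # 33.0
```

Computing the remaining delay, rather than always sleeping a fixed 35 seconds, absorbs any lag between the minute tick and the task actually starting. If the task starts after second 35, it waits for second 35 of the following minute instead.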

Combine two cron-scheduling intervals in a single DAG

Rewrite of the question:
Using Airflow, I would like to schedule a process to run every two hours from 2 am to 10 am and a single time at 22:30. The schedule_interval parameter accepts a cron expression, but it is not possible to define a single cron expression that achieves this schedule. Currently, I have:
dag = DAG(process_name, schedule_interval='30 2,4,6,8,10,12,14,16,18,20,22,23 * * *', default_args=default_args)
But this runs the process at 30 minutes past the hour, every 2 hours from 2:30 to 22:30, plus once at 23:30.
Is there a way I can combine two cron schedules in Airflow?
0 2-10/2 * * *
30 22 * * *
Original question:
I have 2,4,6,10,12,14,16,18,20,22 00 * *
I need to have 23, 30 in my schedule, but I don't want hours 2-22 to run at minute 30.
So I realized it is not possible!
You can't use two cron expressions for the same DAG (this might change in the future if the PR "Add support for multiple cron expressions in schedule_interval" is accepted).
Starting with Airflow 2.2.0:
You can get custom time-based triggering by writing a custom Timetable that adapts the DAG scheduling to match what you expect.
To do so, you define the scheduling logic by implementing the next_dagrun_info and infer_manual_data_interval methods; Airflow uses this logic to schedule your DAG.
An example can be found here.
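The scheduling logic a custom timetable wraps can be prototyped without Airflow. The sketch below is only an illustration, not Airflow's Timetable API: it expands the two cron expressions from the question and merges the times of day they match. `expand_field` and `daily_times` are hypothetical names, and only a subset of cron syntax is handled (`*`, single values, `a-b` ranges, an optional `/step`, and comma lists).

```python
def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field over low..high.

    Handles only '*', single values, 'a-b' ranges, an optional '/step' on
    '*' or a range, and comma-separated lists of these forms.
    """
    values = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start, end = (int(x) for x in base.split("-"))
        else:
            start = end = int(base)
        values.update(range(start, end + 1, step))
    return values


def daily_times(*expressions: str) -> list[tuple[int, int]]:
    """Sorted (hour, minute) pairs matched by any of the given cron expressions.

    Only the minute and hour fields are expanded; the other three fields are
    assumed to be '*', as in the question.
    """
    times = set()
    for expr in expressions:
        minute, hour, *_ = expr.split()
        for h in expand_field(hour, 0, 23):
            for m in expand_field(minute, 0, 59):
                times.add((h, m))
    return sorted(times)


print(daily_times("0 2-10/2 * * *", "30 22 * * *"))
# [(2, 0), (4, 0), (6, 0), (8, 0), (10, 0), (22, 30)]
```

A timetable's next_dagrun_info could then pick the first of these times that falls after the previous run.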
