Properly Defining DAG Schedule Interval - cron

Problem: We're having difficulty getting a DAG to fire on a defined interval. It's also preventing manual DAG executions. We've added catchup=False to the DAG definition as well.
Context: We're planning to have a DAG execute on a 4-hour interval from Monday to Friday. We've defined this behavior using the following cron expression:
0 0 0/4 ? * MON,TUE,WED,THU,FRI *
We're unsure at this time whether the interval has been defined properly or if there are extra parameters we're ignoring.
Any help would be appreciated.

I think what you are looking for is 0 0/4 * * 1-5, which will run at every 4th hour from 0 through 23 on every day-of-week from Monday through Friday. (The 7-field expression you have is Quartz syntax; Airflow expects a standard 5-field cron expression, so the leading seconds field, the ?, and the trailing year field won't parse.)
Your DAG object can be:
from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="my_dag",
    start_date=datetime(2022, 2, 22),
    catchup=False,
    schedule_interval="0 0/4 * * 1-5",
) as dag:
    ...  # add your operators here
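To sanity-check that the 5-field expression matches the intended schedule, here is a small stdlib-only sketch (not Airflow code; the helper name and start time are illustrative) that enumerates the times "0 0/4 * * 1-5" would fire:

```python
from datetime import datetime, timedelta

def next_fires(start, n):
    """Enumerate the next n times matching '0 0/4 * * 1-5':
    minute 0, hour divisible by 4, Monday through Friday."""
    t = start.replace(minute=0, second=0, microsecond=0)
    fires = []
    while len(fires) < n:
        t += timedelta(hours=1)
        if t.hour % 4 == 0 and t.weekday() < 5:  # Monday=0 .. Friday=4
            fires.append(t)
    return fires

# From Friday 2022-02-25 18:30, the schedule fires at 20:00 that evening,
# then skips the weekend and resumes Monday 00:00.
fires = next_fires(datetime(2022, 2, 25, 18, 30), 2)
```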

Related

Airflow Task Instance in Success state but no Start and End date (Not executed)

I have an Airflow DAG which is scheduled to run once a month. The DAG ran wonderfully till 01-08-2021. The next schedule was on 01-09-2021. The DAG and the task were in SUCCESS state, but the DAG actually did not run. I couldn't see the start and end dates for the task instance. There are no logs either. Any help is appreciated. Thanks!
Airflow version: 1.10.6
Task instances: all are in SUCCESS state
DAG details:
schedule_interval: 0 6 1 * *
max_active_runs: 0 / 16
concurrency: 16
default_args: {'provide_context': True, 'depends_on_past': False, 'start_date': <Pendulum [2021-05-01T00:00:00+00:00]>, 'retries': 1, 'catchup_by_default': False, 'retry_delay': datetime.timedelta(0, 300)}
tasks count: 1
task ids: ['GENERATE_INVOICE']
filepath: invoicing.py
owner: airflow

How do I define a timezone aware Dag to run at 5 past midnight every day?

I'm running Apache Airflow 1.10.0 and I want to take advantage of the new timezone aware Dag feature. I must admit that the Airflow scheduler is a bit confusing and I'm not quite sure how to accomplish what I'm trying to do. I am trying to define a Dag that will run at 5 past midnight (Eastern time) every day.
So far I've tried defining the Dag with a timezone-aware start_date using Pendulum. My schedule interval is timedelta(days=1). For some reason this has resulted in runs at seemingly odd times (12:00, etc.).
My current Dag definition:
...
dag_tz = pendulum.timezone('US/Eastern')

default_args = {
    'owner': 'airflow',
    'email': '<email_address>',
    'email_on_failure': True,
    'email_on_retry': True,
    'retries': 3,
    'depends_on_past': False,
    'retry_delay': timedelta(minutes=5),
    'provide_context': True,
    'start_date': datetime(2019, 5, 1, tzinfo=dag_tz),
}

dag = DAG('my_dag_id', default_args=default_args,
          catchup=False, schedule_interval=timedelta(days=1))
...
What I'd like is for the Dag to run at the same time each day. I've seen that I can use a cron expression for schedule_interval, but that's confusing as well because I'm not sure whether I need to include my UTC offset in the cron expression or whether the fact that the Dag is timezone aware will take care of this.
For example, should my schedule_interval be 05 04 * * * or 05 00 * * * or something else entirely?
After some experimentation I have concluded that in order to get the DAG to run at 5 past midnight every day I need to use a schedule interval of 05 00 * * * along with the timezone-aware start date.
You can also write it without the 0-prefix, like 5 0 * * *.
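As a sanity check on the local-vs-UTC reasoning, here is a small stdlib sketch (using zoneinfo rather than Pendulum; the dates are illustrative) showing that 00:05 Eastern corresponds to 04:05 UTC during daylight saving time, which is the conversion Airflow performs internally when start_date is timezone-aware:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+; Pendulum timezones behave the same way

# 00:05 local (Eastern) on 2019-05-01, which falls in EDT (UTC-4)
local = datetime(2019, 5, 1, 0, 5, tzinfo=ZoneInfo("US/Eastern"))

# Because the start_date is timezone-aware, the cron expression is written
# in local time ("5 0 * * *"), not offset-shifted to "5 4 * * *";
# the local-to-UTC conversion happens behind the scenes.
utc = local.astimezone(ZoneInfo("UTC"))
```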

Airflow scheduler doesn't pick up the DAG on a specific schedule_interval config (0 8 * * 5)

I want to add schedule(Every Friday at 8:00 AM) to the DAG. Below is my config for the DAG:
DAG CONFIG:
args = {
    'owner': 'airflow',
    'start_date': airflow.utils.dates.days_ago(20),
    'depends_on_past': False,
    'email': [failure_email],
    'email_on_failure': True,
    'email_on_retry': True,
}

dag = DAG(dag_id='dag_airflow_delete_logs_less_than_40_days',
          default_args=args,
          schedule_interval='0 8 * * 5',
          max_active_runs=1)
After adding the schedule to the DAG, Airflow didn't pick up the DAG at Friday 8:00 AM UTC. When I removed the '5' from the crontab schedule and configured it as '0 8 * * *', it worked fine every day.
I also tried different ways to write the schedule interval in crontab format, still with no luck:
(0 8 * * 5)
(0 8 * * FRI)
I don't understand why it's not working when I specify the day in the interval. Please let me know your thoughts. Thanks in advance!
Note: I used the websites below to check the crontab configs.
1) https://crontab.guru/
2) http://corntab.com
Attached Screenshot for the DAG Runs: http://tinypic.com/view.php?pic=5x4x3d&s=9#.W-RSjLpFyFQ

Airflow - Error when scheduling airflow script to run on specific days of the week

I am trying to have an airflow script to be scheduled to run every Tuesday at 9:10 AM UTC. Given below is how I have defined it.
dag = DAG(
    dag_id=DAG_NAME,
    default_args=args,
    schedule_interval="10 9 * * 2",
    catchup=False,
)
I however find that when the time comes, the script does not get triggered automatically. However, if I do not have the value defined in the day-of-week column (the last column), the scheduler works fine. Any idea where I am going wrong?
Thanks
Update:
args = {
    'owner': 'admin',
    'start_date': airflow.utils.dates.days_ago(9),
}

dag = DAG(
    dag_id=DAG_NAME,
    default_args=args,
    schedule_interval="10 9 * * 2",
    catchup=False,
)
This one stumps people more than anything else in Airflow, but as commenters and the Airflow documentation state:
The scheduler runs your job one schedule_interval AFTER the start date, at the END of the period.
In this case you can either move your DAG's start_date back by one schedule_interval or wait for the next schedule_interval to complete.
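To make that concrete, here is a stdlib-only sketch (the helper and dates are illustrative, not Airflow internals) of the data-interval model for "10 9 * * 2": the run covering a given interval is only triggered when that interval ends, one full schedule_interval after it starts:

```python
from datetime import datetime, timedelta

def next_tuesday_0910(after):
    """Next time matching cron '10 9 * * 2' (Tuesday 09:10) strictly after `after`."""
    candidate = after.replace(hour=9, minute=10, second=0, microsecond=0)
    while candidate <= after or candidate.weekday() != 1:  # Monday=0, Tuesday=1
        candidate += timedelta(days=1)
    return candidate

start_date = datetime(2022, 3, 1)                 # a Tuesday, before 09:10
interval_start = next_tuesday_0910(start_date)    # Tue 2022-03-01 09:10
interval_end = next_tuesday_0910(interval_start)  # Tue 2022-03-08 09:10
# The first DAG run is only triggered at interval_end, i.e. one full
# week AFTER the first matching time past start_date.
```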

Cron tab scheduling for different timings on different days

I need to schedule a job Monday to Thursday at 7pm, and on Friday it needs to run at 11pm. I am using Airflow and need the crontab notation, something like
0 19 * * Mon-Thu
Any suggestion are welcome.
Thank you
Regards,
CJ
You can create your dag as:
dag = DAG("Your_dag", default_args=default_args, schedule_interval="0 19 * * 1-4")
You could do something like this:
schedules = {
    'M-Th': '0 19 * * 1-4',
    'F': '0 23 * * 5',
}

for name, schedule in schedules.items():
    globals()[name] = DAG('<base_dag_name>.' + name, default_args=default_args, schedule_interval=schedule)
This will create two DAGs from a single file. DAGs need to be in the global scope to be recognized by Airflow.
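The globals() registration pattern can be demonstrated without Airflow installed. In this sketch a stand-in class replaces airflow.DAG and the dag_ids are made up, but the mechanics are the same:

```python
# Stand-in for airflow.DAG, just to demonstrate the registration pattern;
# in a real DAG file you would construct airflow.DAG objects instead.
class StubDAG:
    def __init__(self, dag_id, schedule_interval):
        self.dag_id = dag_id
        self.schedule_interval = schedule_interval

schedules = {
    'M_Th': '0 19 * * 1-4',  # Mon-Thu 19:00
    'F': '0 23 * * 5',       # Friday 23:00
}

# Bind each DAG to a module-level name so DAG discovery, which inspects
# module globals, can find both objects generated from one file.
for name, schedule in schedules.items():
    globals()['dag_' + name] = StubDAG('my_base_dag.' + name, schedule)
```

Note the keys here use an underscore (M_Th) rather than a hyphen, so each generated name is a valid Python identifier.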
