I have a requirement: in my application, a TWS job is triggered on a different date every month. For example, the monthly job runs on the 10th in January, on the 15th in February, on the 20th in March, and so on. Is there any way to implement this in Airflow? I am not sure we can do this with cron, since the day of the month (dd) is different each month. Does Airflow support a custom calendar?
Not sure if we can do this using Crons, since day(dd) is different in each month
One way to do that is to use a BranchPythonOperator that calls a Python function containing your business logic:
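    # Import paths are for Airflow 2.x; Airflow 1.10 uses airflow.operators.python_operator and airflow.operators.dummy_operator
    from airflow import DAG
    from airflow.operators.dummy import DummyOperator
    from airflow.operators.python import BranchPythonOperator, PythonOperator

    dag = DAG(....)  # trigger daily

    def define_datetime(**context):
        # here your logic to find the route to follow, depending on the execution time
        # you should find this date in context['execution_date']
        return "execute" if date_is_expected(context['execution_date']) else "skip"

    with dag:
        branch = BranchPythonOperator(
            ...
        )
        # "pass" is a reserved word in Python, so the no-op branch is named "skip"
        skip = DummyOperator(task_id='skip',...)
        execute = PythonOperator(task_id='execute',...)  # or any operator that will execute the job
        branch >> [skip, execute]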
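The answer leaves date_is_expected() undefined. A minimal sketch of what it could look like, assuming a hand-maintained month-to-day table. The name RUN_DAY_BY_MONTH is hypothetical, and only the January, February and March values come from the question:

```python
from datetime import date, datetime

# Hypothetical calendar: month number -> day of month on which the job should run.
# Only Jan/Feb/Mar come from the question; extend it with your real dates.
RUN_DAY_BY_MONTH = {1: 10, 2: 15, 3: 20}


def date_is_expected(execution_date):
    """Return True if the job is scheduled to run on this calendar day."""
    # datetime is a subclass of date, so normalise datetimes to plain dates first.
    if isinstance(execution_date, datetime):
        execution_date = execution_date.date()
    # Months missing from the table yield None, which never equals a day number.
    return RUN_DAY_BY_MONTH.get(execution_date.month) == execution_date.day
```

Inside the branch callable you would call it as date_is_expected(context['execution_date']). Keeping the table in plain Python means the DAG itself can stay on a simple daily schedule.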
Related
Is there a way to set up or write a custom schedule_interval in an Airflow DAG?
What I'm looking for is a way to schedule a DAG to run daily except on holidays (such as Christmas, Labor Day, Independence Day, etc.).
This cannot be achieved with standard cron expressions. Any help or guidance is much appreciated.
Use the BranchPythonOperator, or create a new operator that inherits from BaseBranchOperator, and implement the skipping logic there. I believe you'll need a DummyOperator as the "skip" branch, and your regular DAG flow as the other arm. For your cron expression, use whatever the normal schedule should be and implement the custom skips in the task that handles branching.
There is no native support for this type of scheduling, but you can solve it by adding a ShortCircuitOperator at the beginning of your workflow.
This operator executes a Python callable. If the callable returns a truthy value, the workflow continues; otherwise all downstream tasks are marked as skipped.
A possible solution:
    import holidays
    from airflow import DAG
    # Airflow 2.x import path; Airflow 1.10 uses airflow.operators.python_operator
    from airflow.operators.python import ShortCircuitOperator

    def decide(**kwargs):
        # Select country
        us_holidays = holidays.US()
        if str(kwargs['execution_date']) in us_holidays:
            return False  # Skip workflow if it's a holiday.
        return True

    dag = DAG(
        dag_id='mydag',
        schedule_interval='@daily',
        default_args=default_args,  # assumed to be defined elsewhere
    )

    start_op = ShortCircuitOperator(
        task_id='start_task',
        python_callable=decide,
        provide_context=True,  # Remove this line if you are using Airflow>=2.0.0
        dag=dag
    )

    # Replace this with your actual Operator in your workflow.
    next_op = Operator(
        task_id='next_task',
        dag=dag
    )

    start_op >> next_op
This solution is based on the answer to "Detecting a US Holiday". I didn't test it, but it should work. In any case, you can replace the logic in decide with any method that detects whether a date is a holiday.
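As one example of swapping in different holiday logic, here is a sketch that avoids the third-party holidays package and checks against a hand-maintained set instead. The names HOLIDAYS and is_working_day, and the dates listed, are illustrative assumptions rather than anything from the answer:

```python
from datetime import date, datetime

# Hypothetical, hand-maintained list of non-working days; replace with your own.
HOLIDAYS = {
    date(2024, 7, 4),    # Independence Day
    date(2024, 9, 2),    # Labor Day (first Monday of September 2024)
    date(2024, 12, 25),  # Christmas Day
}


def is_working_day(execution_date, holidays=HOLIDAYS):
    """Return False on a listed holiday, True on any other day."""
    # Normalise datetimes to plain dates so the set lookup ignores the time part.
    if isinstance(execution_date, datetime):
        execution_date = execution_date.date()
    return execution_date not in holidays
```

The short-circuit callable could then simply return is_working_day(kwargs['execution_date']), so a holiday yields False and the downstream tasks are skipped.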
Assume I have a timestamp in Spark, such as one returned by current_timestamp(). When I use functions like hour(), minute(), and so on, how can I specify a time zone?
I believe that https://issues.apache.org/jira/browse/SPARK-18350 introduced support for this, but I can't get it to work. My situation is similar to the last comment on that page:
    session.read.schema(mySchema)
      .json(path)
      .withColumn("year", year($"_time"))
      .withColumn("month", month($"_time"))
      .withColumn("day", dayofmonth($"_time"))
      .withColumn("hour", hour($"_time", $"_tz"))
Having a look at the definition of the hour function, it uses an Hour
expression which can be constructed with an optional timeZoneId. I
have been trying to create an Hour expression but this is
Spark-internal construct - and the API forbids to use it directly. I
guess providing a function hour(t: Column, tz: Column) along with the
existing hour(t: Column) would not be a satisfying design.
I am stuck trying to pass a specific time zone to the built-in time functions.
I'm running this script:
    $query = "SELECT * FROM XXX WHERE email='$Email'";
    if($count==1) // fails
    if($count==0) // succeeds
If it succeeds:
    mysql_query("INSERT INTO XXX (email) VALUES ('$Email')");
It then proceeds to the next script.
So it checks whether you have already run this script on that account. If you have, your email is stored and you can't run the script again with that email.
However, after the script has been processed, I want the row created for the email to be deleted after 6 hours, so that the user may run the script again.
I've been told that I need to use cron jobs for this, but I'm not sure how. Any help is highly appreciated!
Many regards, and thanks in advance.
0 0,6,12,18 * * * /path/to/mycommand
This means the cron job runs at minute 0 of hours 0, 6, 12, and 18, i.e. at 00:00, 06:00, 12:00, and 18:00 every day. That would be the cron entry needed to do what you want.
Depending on which Linux distribution you are running, you will need to look up how to actually create the cron job.
I would think "at now + 6 hours" is a better choice here.
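Whichever scheduler triggers it, the cleanup command itself has to remove only the rows that are at least six hours old, which means each row needs a creation timestamp. A minimal sketch of that logic, written in Python with sqlite3 purely for illustration (the question uses PHP and MySQL). The table name used_emails and the column created_at are hypothetical:

```python
import sqlite3
from datetime import datetime, timedelta


def purge_expired(conn, now, max_age=timedelta(hours=6)):
    """Delete rows whose created_at is older than max_age; return rows removed."""
    # Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which sorts chronologically.
    cutoff = (now - max_age).isoformat(sep=" ", timespec="seconds")
    cur = conn.execute("DELETE FROM used_emails WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Demo with an in-memory database standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE used_emails (email TEXT PRIMARY KEY, created_at TEXT)")
now = datetime(2024, 1, 1, 12, 0, 0)
conn.executemany(
    "INSERT INTO used_emails VALUES (?, ?)",
    [
        ("old@example.com", "2024-01-01 05:00:00"),  # 7 hours old
        ("new@example.com", "2024-01-01 09:00:00"),  # 3 hours old
    ],
)
removed = purge_expired(conn, now)
remaining = [row[0] for row in conn.execute("SELECT email FROM used_emails")]
```

Run from a frequent cron entry, a purge like this frees each email roughly six hours after its own insert instead of at fixed clock times.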
I have a script, runthisapp.sh, and some dates stored in a holiday table in my database. Every day the database should be checked: if the current date is present in the holiday table, runthisapp.sh should run at 10 o'clock; otherwise it should run at 8 o'clock.
I have tried but can't find a solution. Can you help me with this, please?
You can have two crontab entries, one at 8am, one at 10am, passing different options to your script, e.g. the former takes --holiday=0, and the latter --holiday=1, and your script should just return doing nothing if in the wrong "holidayness".
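The guard inside the script can be sketched as follows. This is an illustration in Python, not the asker's shell script: should_run is a made-up name, and the holiday set stands in for the result of querying the holiday table:

```python
from datetime import date


def should_run(holiday_flag, today, holidays):
    """Run only when the crontab entry's flag matches today's holiday status.

    holiday_flag is 0 for the 8am entry and 1 for the 10am entry.
    """
    is_holiday = today in holidays
    return is_holiday == bool(holiday_flag)


# Stand-in for the dates read from the holiday table.
holidays = {date(2024, 12, 25)}
runs_at_8_on_holiday = should_run(0, date(2024, 12, 25), holidays)
runs_at_10_on_holiday = should_run(1, date(2024, 12, 25), holidays)
```

Both crontab entries fire every day, but on any given day only one of them passes the check, so the job runs exactly once.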
Is there any way to make a Quartz.Net trigger fire on a weekly basis? I have finished everything else. It's urgent. Please guide me on how I can do this.
Create a cron trigger with an expression such as "0 30 22 ? * MON". Quartz cron fields are ordered seconds, minutes, hours, day-of-month, month, day-of-week, so this runs at 22:30:00 every Monday. The optional year field is omitted. The day-of-month is irrelevant (hence '?') and the month is set to every month ('*').
The configuration in your jobs config file would be:
<trigger>
  <cron>
    <name>MondayTrigger</name>
    <group>MyGroup</group>
    <description>A description</description>
    <job-name>Job1</job-name>
    <job-group>JobGroup1</job-group>
    <cron-expression>0 30 22 ? * MON</cron-expression>
  </cron>
</trigger>
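Quartz cron expressions begin with a seconds field, unlike Unix crontab lines, so the field order is easy to mix up. A small illustrative helper (not part of Quartz.Net; the function name is made up here) that labels each field of an expression for Mondays at 22:30:00:

```python
# Field order used by Quartz cron expressions; the trailing year is optional.
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours",
    "day_of_month", "month", "day_of_week", "year",
)


def label_quartz_fields(expression):
    """Map each part of a Quartz cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 fields plus an optional year")
    # zip stops at the shorter sequence, so a 6-field expression has no 'year' key.
    return dict(zip(QUARTZ_FIELDS, parts))


fields = label_quartz_fields("0 30 22 ? * MON")
```

Reading the labelled result back (seconds 0, minutes 30, hours 22) is a quick sanity check before putting an expression into the jobs config file.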