Is there a way to set up/write a custom schedule_interval in an Airflow DAG?
What I'm looking for is a way to set up a schedule where the DAG runs on a daily basis except on holidays (like Christmas, Labor Day, Independence Day, etc.).
This is not possible to achieve with standard cron expressions. Any help/guidance is much appreciated.
Use the BranchPythonOperator, or create a new operator that inherits from BaseBranchOperator, and implement the skipping logic there. I believe you'll need a DummyOperator as the "skip" branch and your regular DAG flow as the other branch. For your cron expression, use whatever the normal schedule should be, and implement the custom skips in the task that handles the branching; a minimal sketch follows below.
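A minimal sketch of this idea, assuming the third-party holidays package for the holiday check (the DAG and task names are illustrative, not from the question):

from datetime import datetime

import holidays
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator

us_holidays = holidays.US()

def choose_branch(**context):
    # Return the task_id of the branch this run should follow.
    if context['execution_date'] in us_holidays:
        return 'skip'          # holiday: go to the do-nothing branch
    return 'run_workflow'      # normal day: go to the real flow

with DAG(dag_id='holiday_aware_dag',
         schedule_interval='@daily',
         start_date=datetime(2021, 1, 1)) as dag:
    branch = BranchPythonOperator(
        task_id='branch',
        python_callable=choose_branch,
        provide_context=True,  # not needed on Airflow>=2.0.0
    )
    skip = DummyOperator(task_id='skip')
    run_workflow = DummyOperator(task_id='run_workflow')  # stand-in for your real flow
    branch >> [skip, run_workflow]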
There is no native support for this type of scheduling, but you can solve it by adding a ShortCircuitOperator at the beginning of your workflow.
This operator executes a Python callable. If the condition is met, the workflow continues; if it isn't met, all downstream tasks are marked as skipped.
A possible solution could be:

from airflow import DAG
from airflow.operators.python_operator import ShortCircuitOperator
import holidays

def decide(**kwargs):
    # Select country
    us_holidays = holidays.US()
    if str(kwargs['execution_date']) in us_holidays:
        return False  # Skip workflow if it's a holiday.
    return True

dag = DAG(
    dag_id='mydag',
    schedule_interval='@daily',
    default_args=default_args,
)

start_op = ShortCircuitOperator(
    task_id='start_task',
    python_callable=decide,
    provide_context=True,  # Remove this line if you are using Airflow>=2.0.0
    dag=dag,
)

# Replace this with the actual operator in your workflow.
next_op = Operator(
    task_id='next_task',
    dag=dag,
)

start_op >> next_op
This solution is based on the answer provided in Detecting a US Holiday. I didn't test it, but it should work. In any case, you can replace the logic in decide with any method that detects whether a date is a holiday or not.
I have a requirement: in my application, a monthly TWS job is triggered on a different date each month. For example, the monthly job runs on the 10th in January, on the 15th in February, on the 20th in March, and so on. Is there any way to implement this in Airflow? I'm not sure we can do this with crons, since the day (dd) is different in each month. Does Airflow support a custom calendar?
I'm not sure we can do this with crons, since the day (dd) is different in each month
One way to do that is to use a BranchPythonOperator that calls a Python function with your business logic:
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import BranchPythonOperator, PythonOperator

dag = DAG(...)  # trigger daily

def define_route(**context):
    # Your logic to find the route to follow, depending on the execution time;
    # the date is available in context['execution_date'].
    return "execute" if date_is_expected(context['execution_date']) else "pass"

with dag:
    branch = BranchPythonOperator(
        task_id='branch',
        python_callable=define_route,
        provide_context=True,  # not needed on Airflow>=2.0.0
    )
    # 'pass' is a reserved word in Python, so the variable needs another name.
    pass_op = DummyOperator(task_id='pass')
    execute = PythonOperator(task_id='execute', ...)  # or any operator that will execute the job
    branch >> [pass_op, execute]
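For the month-dependent dates in the question, date_is_expected could be a simple lookup table. A minimal sketch, using only the example dates given in the question (extend the mapping for the remaining months):

def date_is_expected(execution_date):
    # Day of month the job should run on, per month:
    # Jan -> 10th, Feb -> 15th, Mar -> 20th (from the question).
    expected_day = {1: 10, 2: 15, 3: 20}
    return expected_day.get(execution_date.month) == execution_date.day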
I want to run a Cucumber feature file based on the test case ID that the scenario name contains.
I know we can use the 'features' attribute of @CucumberOptions and specify line numbers to execute, e.g. "src/test/resources/Folder/myfile.feature:7:12".
This will run the scenarios at lines 7 and 12, but I want to run them based on the TC ID.
Below is the feature file:
@Run
Feature: Login Functionality

  Scenario: First Test Case(TC.No:1)
    Given I perform action 1

  Scenario: Second Test Case(TC.No:2)
    Given I perform action 2

  Scenario: Third Test Case(TC.No:3)
    Given I perform action 3

  Scenario: Fourth Test Case(TC.No:4)
    Given I perform action 4

  Scenario: Fifth Test Case(TC.No:5)
    Given I perform action 5
All the scenarios are in a single feature file.
For the feature file above, I want some way to execute scenarios based on TC ID, e.g. only TC1, TC2 and TC5 (the TC IDs picked up from the scenario names).
There is a properties file that contains the TC IDs to be executed. My code should read the file and then execute only those TC IDs.
This would help me reduce the number of automation TCs to be run.
Is it possible?
You can use the name property of @CucumberOptions, or the '-n' option if you are using the CLI. It also supports regular expressions.
To run TC.No:1 and TC.No:4, use something like this:
@CucumberOptions(name = { "TC.No:1|TC.No:4" })
or
@CucumberOptions(name = { "TC.No:1", "TC.No:4" })
You can get more details at this link.
As you are reading the IDs from a file, the second option is the best. Use the main() method of the cucumber.api.cli.Main class to execute the features; you can create the options dynamically. Refer to this post.
CLI reference docs.
I'm not familiar with cucumber-jvm.
But here is the general logic, which should work (based on my Ruby Cucumber knowledge):
In a hook, you can write logic in the before method to get the scenario name (scenario.name) and then extract the TC.No. Compare the TC.No against your list and skip the scenario if it's not part of it.
Here is a link with information on how to skip a scenario (use this class in the before method):
https://junit.org/junit4/javadoc/4.12/org/junit/AssumptionViolatedException.html
However, the best practice is to use tags; it would have been easy if you had a @TCId-xx tag. Still, you can write a simple program that scans all the feature files and tags each scenario based on the TC.No in its name, as sketched below.
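As an illustration, here is a minimal Python sketch of such a tagging script. The TC.No pattern and the feature-file location are assumptions based on the question, and a real script would also skip scenarios that are already tagged.

import re
from pathlib import Path

# Matches scenario lines such as "Scenario: First Test Case(TC.No:1)".
SCENARIO_RE = re.compile(r'^(\s*)Scenario:.*TC\.No:(\d+)')

def tag_feature_file(path):
    # Insert a @TCId-<n> tag line above every scenario that carries a TC.No.
    tagged = []
    for line in path.read_text().splitlines():
        match = SCENARIO_RE.match(line)
        if match:
            indent, tc_id = match.groups()
            tagged.append(f'{indent}@TCId-{tc_id}')
        tagged.append(line)
    path.write_text('\n'.join(tagged) + '\n')

for feature in Path('src/test/resources').rglob('*.feature'):
    tag_feature_file(feature)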
Assume I have a timestamp like one obtained from the current_timestamp() UDF inside Spark. When using a function like hour() or minute(), how can I specify a time zone?
I believe that https://issues.apache.org/jira/browse/SPARK-18350 introduced support for it, but I can't get it to work. Similar to the last comment on the page:
session.read.schema(mySchema)
.json(path)
.withColumn("year", year($"_time"))
.withColumn("month", month($"_time"))
.withColumn("day", dayofmonth($"_time"))
.withColumn("hour", hour($"_time", $"_tz"))
Having a look at the definition of the hour function, it uses an Hour expression, which can be constructed with an optional timeZoneId. I have been trying to create an Hour expression, but this is a Spark-internal construct and the API forbids using it directly. I guess providing a function hour(t: Column, tz: Column) along with the existing hour(t: Column) would not be a satisfying design.
I am stuck trying to pass a specific time zone to the default built-in time UDFs.
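For illustration, one common workaround (not from the question; shown here in PySpark, and assuming the stored timestamps are in UTC) is to shift the instant with from_utc_timestamp, which since Spark 2.4 also accepts a per-row time zone column, and only then apply hour():

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_utc_timestamp, hour

spark = SparkSession.builder.getOrCreate()

# Hypothetical rows: a UTC timestamp plus a per-row time zone ID.
df = spark.createDataFrame(
    [("2019-01-01 12:00:00", "America/New_York"),
     ("2019-01-01 12:00:00", "Asia/Tokyo")],
    ["_time", "_tz"],
).withColumn("_time", col("_time").cast("timestamp"))

# Shift the UTC instant into each row's zone, then take the local hour:
# 7 for New York (UTC-5 in January), 21 for Tokyo (UTC+9).
local = df.withColumn("local_hour", hour(from_utc_timestamp(col("_time"), col("_tz"))))
local.show()

Alternatively, the session-local time zone that SPARK-18350 introduced can be set via spark.sql.session.timeZone, but it applies to the whole session and cannot vary per row.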
I'm using Azure Data Factory (V2) to schedule a copy pipeline activity. The requirement is that every day the job should run and select everything from a table from the last 5 days. I have scheduled the copy and tried the following syntax in the source dataset:
select * from [dbo].[aTable] where [aDate] >= '#{formatDateTime(adddays(pipeline().parameters.windowStart, 'yyyy-MM-dd HH:mm' ),-5)}'
But this doesn't work; I'm getting an error stating that adddays expects an int for its second parameter but is receiving a string.
Can anybody advise on the proper way to nest this?
Thanks
I can't test this right now, so I'll risk a possible answer just from looking at your query. The format string and the -5 are swapped in your expression: adddays(timestamp, days) must be the inner call and formatDateTime(timestamp, format) the outer one. I think it should be like this:
select * from [dbo].[aTable] where [aDate] >= '#{formatDateTime(adddays(pipeline().parameters.windowStart, -5), 'yyyy-MM-dd HH:mm')}'
Hope this helps!
I am using the JIRA built-in script listener "Create a sub-task" to create subtasks for Dev and QA for every story and bug in JIRA.
I would like the subtasks to always be assigned to the user "Virtual QA". It seems that I have to do this through the Additional issue actions field.
I am trying to use:
issue.summary = ('QA: ' + issue.summary)
issue.assignee = 'Virtual QA'
This works if I use only the first line to set the subtask summary, but when I add the second line the script does not run. Can you please help me solve it?
I was not able to figure it out from the official documentation at: https://jamieechlin.atlassian.net/wiki/display/GRV/Built-In+Scripts
You need to pass a user object instead of a string; see the same question here:
https://answers.atlassian.com/questions/66562/set-assignee-to-some-specific-user-in-post-function-script
After a lot of investigation and a hundred test tickets, this works :)
import com.atlassian.jira.component.ComponentAccessor

// Prefix the summary, then look the user up by username and set them as assignee.
issue.summary = ('QA: ' + issue.summary)
issue.setAssignee(ComponentAccessor.getUserUtil().getUser('qa'))