Azure Pipeline Trigger Parameters - azure

I have two YAML pipelines, A and B, where A triggers B. Both have the same parameter P. Pipeline B has a resource trigger set and runs after pipeline A finishes; this works. However, pipeline B does not seem to run with the same value of P that pipeline A used. B always uses the default (first) value.
I have tried to find a way to pass parameters from A to B, without success. I found an older (2020) similar question, which states that it is not possible.
Is this something that cannot be done (using resource triggers) or am I missing something?

As per the answer from MSFT in the SO thread, you can't pass different parameter values to pipeline triggers.
You can follow the workaround provided in the same answer, which uses DevOps counters:
variables:
  internalVersion: 1
  semanticVersion: $[counter(variables['internalVersion'], 1)]
For more information on DevOps counters, check this document.
You can also raise a feature request.

Related

Can we set task wise parameters using Databricks Jobs API "run-now"

I have a job with multiple tasks, like Task1 -> Task2. I am trying to trigger the job using the "run now" API. Task details are below:
Task1 - It executes a notebook with some input parameters
Task2 - It executes a notebook with some input parameters
So, how can I provide parameters for Task1 and Task2 through the "run now" API?
I have a parameter "lib" which needs to have the values 'pandas' and 'spark', one per task.
I know that we can give unique parameter names like Task1_lib, Task2_lib and read that way.
Current way:
json = {"job_id": 3234234, "notebook_params": {"Task1_lib": "pandas", "Task2_lib": "spark"}}
Is there a way to send task wise parameters?
It's not supported right now: parameters are defined at the job level. You can ask your Databricks representative (if you have one) to communicate this request to the product team that works on Databricks Workflows.
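Until task-level parameters are supported, the prefix approach from the question can be wrapped in a small helper. This is a minimal sketch: build_run_now_payload is a hypothetical name, not part of the Databricks API, and it only builds the JSON body for "run now" without calling the service.

```python
def build_run_now_payload(job_id, task_params):
    """Flatten {task: {name: value}} into job-level notebook_params.

    Hypothetical helper: keys become "<task>_<name>", e.g. "Task1_lib".
    """
    flat = {}
    for task, params in task_params.items():
        for name, value in params.items():
            # notebook_params values are passed as strings
            flat[f"{task}_{name}"] = str(value)
    return {"job_id": job_id, "notebook_params": flat}


payload = build_run_now_payload(
    3234234,
    {"Task1": {"lib": "pandas"}, "Task2": {"lib": "spark"}},
)
```

Each notebook would then read only its own key, for example with dbutils.widgets.get("Task1_lib") in the Task1 notebook.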

Restrict variable grouping being used in Azure DevOps Step

I have two variable groups with overlapping keys but different values. I want to use one group under one task [ JSON replace ] and the other group in another [JSON replace ].
I have gone through the documentation, and it says that variables can only be set at the root/stage/job levels. Is there a way I can work around it?
I want to use one group under one task [ JSON replace ] and the other group in another [JSON replace ].
According to the document "Specify jobs in your pipeline":
You can organize your pipeline into jobs. Every pipeline has at least
one job. A job is a series of steps that run sequentially as a unit.
In other words, a job is the smallest unit of work that can be
scheduled to run.
The variable group is attached up front, when the pipeline is compiled, so it cannot be re-set at the task level.
To work around this, you can overwrite a specific variable with a logging command:
Write-Host "##vso[task.setvariable variable=testvar;]testvalue"
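The agent reads logging commands from standard output, so the same command can be emitted from any script step. A minimal sketch in Python, assuming a hypothetical helper named set_pipeline_variable; the idea is to run one such step before each JSON replace task so that each task sees the value it needs.

```python
def set_pipeline_variable(name, value):
    """Build the Azure Pipelines logging command that sets a variable
    for the tasks that follow in the same job."""
    return f"##vso[task.setvariable variable={name};]{value}"


# Print the command so the agent picks it up from stdout.
command = set_pipeline_variable("testvar", "testvalue")
print(command)
```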

How to access output folder from a PythonScriptStep?

I'm new to azure-ml, and have been tasked to make some integration tests for a couple of pipeline steps. I have prepared some input test data and some expected output data, which I store on a 'test_datastore'. The following example code is a simplified version of what I want to do:
from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config('blabla/config.json')
ds = Datastore.get(ws, datastore_name='test_datastore')
main_ref = DataReference(
    datastore=ds,
    data_reference_name='main_ref'
)
data_ref = DataReference(
    datastore=ds,
    data_reference_name='main_ref',
    path_on_datastore='/data'
)
data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=[
        '--main_path', main_ref,
        '--data_ref_folder', data_ref
    ],
    inputs=[main_ref, data_ref],
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
I would like:
my data_prep_step to run,
have it store some data at the path of my data_ref, and
I would then like to access this stored data afterwards outside of the pipeline
But, I can't find a useful function in the documentation. Any guidance would be much appreciated.
two big ideas here -- let's start with the main one.
main ask
With an Azure ML Pipeline, how can I access the output data of a PythonScriptStep outside of the context of the pipeline?
short answer
Consider using OutputFileDatasetConfig (docs example), instead of DataReference.
To your example above, I would just change your last two definitions.
data_ref = OutputFileDatasetConfig(
    name='data_ref',
    destination=(ds, '/data')
).as_upload()
data_prep_step = PythonScriptStep(
    name='data_prep',
    script_name='pipeline_steps/data_prep.py',
    source_directory='/.',
    arguments=[
        '--main_path', main_ref,
        '--data_ref_folder', data_ref
    ],
    inputs=[main_ref],  # data_ref is produced by this step, so it is listed only as an output
    outputs=[data_ref],
    runconfig=arbitrary_run_config,
    allow_reuse=False
)
some notes:
be sure to check out how DataPaths work. Can be tricky at first glance.
set overwrite=False in the .as_upload() method if you don't want future runs to overwrite the first run's data.
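Once the step has uploaded its files to (ds, '/data'), they can be read back outside the pipeline, which is the last item on the list in the question. A minimal sketch, assuming the azureml-core SDK is installed and that the datastore name and path match the example above; download_step_output and the target folder are hypothetical names.

```python
def download_step_output(ws, datastore_name="test_datastore",
                         path_on_datastore="/data",
                         target_path="./step_output"):
    """Download the files a finished step uploaded to the datastore.

    The SDK import is local so the helper can be defined
    even where azureml-core is not loaded yet.
    """
    from azureml.core import Dataset, Datastore

    ds = Datastore.get(ws, datastore_name=datastore_name)
    # Wrap the datastore folder in a FileDataset and pull it to local disk.
    dataset = Dataset.File.from_files(path=(ds, path_on_datastore))
    return dataset.download(target_path=target_path, overwrite=True)
```

In an integration test, the downloaded files could then be compared against the expected output data stored alongside the test inputs.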
more context
PipelineData used to be the de facto object to pass data ephemerally between pipeline steps. The idea was to make it easy to:
stitch steps together
get the data after the pipeline runs if need be (datastore/azureml/{run_id}/data_ref)
The downside was that you have no control over where the PipelineData is saved. If you wanted the data for more than just a baton that gets passed between steps, you could have a DataTransferStep land the PipelineData wherever you please after the PythonScriptStep finishes.
This downside is what motivated OutputFileDatasetConfig.
auxiliary ask
how might I programmatically test the functionality of my Azure ML pipeline?
there are not enough people talking about data pipeline testing, IMHO.
There are three areas of data pipeline testing:
unit testing (the code in the step works?)
integration testing (the code works when submitted to the Azure ML service)
data expectation testing (the data coming out of the step meets my expectations)
For #1, I think it should be done outside of the pipeline, perhaps as part of a package of helper functions.
For #2, why not just see if the whole pipeline completes? I think you get more information that way. That's how we run our CI.
#3 is the juiciest, and we do this in our pipelines with the Great Expectations (GE) Python library. The GE community calls these "expectation tests". To me you have two options for including expectation tests in your Azure ML pipeline:
within the PythonScriptStep itself, i.e.
run whatever code you have
test the outputs with GE before writing them out; or,
for each functional PythonScriptStep, hang a downstream PythonScriptStep off of it in which you run your expectations against the output data.
Our team does #1, but either strategy should work. What's great about this approach is that you can run your expectation tests by just running your pipeline (which also makes integration testing easy).
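A minimal sketch of option 1, checking the outputs inside the step before writing them out. It uses plain Python checks as a stand-in and is not the Great Expectations API; the column names, the rules, and both function names are hypothetical.

```python
def check_expectations(rows):
    """Return a list of human-readable failures; empty means the data passed."""
    failures = []
    if not rows:
        failures.append("expected at least one row")
    for i, row in enumerate(rows):
        if row.get("id") is None:
            failures.append(f"row {i}: id must not be null")
        if not 0 <= row.get("score", -1) <= 1:
            failures.append(f"row {i}: score must be between 0 and 1")
    return failures


def write_if_valid(rows, write):
    """Run the checks first and only then hand the rows to `write`."""
    failures = check_expectations(rows)
    if failures:
        # Failing the step here fails the pipeline run, which CI can detect.
        raise ValueError("; ".join(failures))
    write(rows)


good = [{"id": 1, "score": 0.5}]
bad = [{"id": None, "score": 2.0}]
```

Because a failed check raises inside the step, a pipeline run that completes doubles as evidence that the data met the expectations.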

Azure DevOps Releases skip tasks

I'm currently working on implementing CI/CD pipelines for my company in Azure DevOps 2020 (on premise). There is one requirement I just don't seem to be able to solve conveniently: skipping certain tasks depending on user input in a release pipeline.
What I want:
User creates new release manually and decides if a task group should be executed.
Agent Tasks:
1. Powershell
2. Task Group (conditional)
3. Task Group
4. Powershell
What I tried:
Splitting the tasks into multiple jobs with the task group depending on a manual intervention task.
does not work, if the manual intervention is rejected the whole execution stops with failed.
Splitting the tasks into multiple stages doing almost the same as above with the same outcome.
Splitting the tasks into multiple stages and triggering every stage manually.
not very usable because you have to execute what you want in the correct order and after the previous stages succeeded.
Variable set at release creation (true/false).
Will use that if nothing better comes up but kinda prone to typos and not very usable for the colleagues who will use this. Unfortunately Azure DevOps seems to not support dropdown or checkbox variables for releases. (but works with parameters in builds)
Two Stages one with tasks 1,2,3,4 and one with tasks 1,3,4.
not very desirable for me because of duplication.
Any help would be highly appreciated!
It depends on what the criteria are for the pipelines to run. One recommendation would be two pipelines calling the same template. Each pipeline can have a true/false value embedded in it to pass as a parameter to the template.
The template will have all the tasks defined in it; however, the conditional one will have a condition like:
condition: and(succeeded(), eq('${{ parameters.runExtraStep}}', true))
This condition would be set at the task level.
Any specific triggers can be defined in the corresponding pipeline.
Here is the documentation on Azure YAML Templates to get you started.
Unfortunately, it's impossible to add a custom condition to a Task Group, but this feature is on the roadmap. Check the following user voice item, where you can vote for it:
https://developercommunity.visualstudio.com/idea/365689/task-group-custom-conditions-at-group-and-task-lev.html
The workaround is to clone the release definition (right click a release definition > Clone), then remove some tasks or task groups and save it. After that, you can create a release from the release definition that matches each scenario.
Finally I decided to stick with Releases and split my tasks into 3 agent jobs. Job 1 with the first powershell, job 2 with the conditional taskgroup that executes only if a variable is true and job 3 with the remaining tasks.
As both cece-dong and dreadedfrost stated, I could've achieved a selectable runtime parameter for the condition with YAML pipelines. Unfortunately, one of the task groups needs a specific artifact from a YAML pipeline. Most of the time it would be the "latest", which can easily be achieved with a download artifacts task, but sometimes a previous artifact gets chosen. I have found no easy way to achieve this as conveniently as in releases, where by default you have a dropdown with a list of artifacts.
I found this blog post for anyone interested in how you can handle different build artifacts in YAML pipelines.
Thanks for helping me out!

How to re-try an ADF pipeline execution until conditions are met

An ADF pipeline needs to be executed on a daily basis, let's say at 03:00 h.
But prior to execution we also need to check whether the data sources are available.
Data is provided by an external agent. It periodically loads the corresponding data into each source table and lets us know when this process is completed using a flag table: if data source 1 is ready, it sets the flag to 1.
I can't find a way to implement this logic with ADF.
We would need something that, for instance, is triggered at 03:00 h and checks the flags; if the flags are not up, it does not launch the pipeline. After, let's say, 10 minutes it checks the flags again, and it keeps doing so at most X times OR until the flags are up.
If the flags are up, launch the pipeline execution and stop trying to launch the pipeline any further.
How would you do it?
The logic per se is not complicated in any way, but I wouldn't know where to implement it. Should I develop an Azure Function that launches the pipeline, or is there a way to achieve it with an out-of-the-box ADF activity?
There is an Until iteration activity where you can check your condition.
Example:
Your Azure Function (AF) checks the flag and returns 0 or 1.
Build an ADF pipeline with an Until activity that checks the output of the AF. The Until activity can contain your processing step. For example, set a variable flag to 0 before the Until activity. Inside the Until, check whether it is 1: if it is, run your processing step; if it is not, run a Wait activity for 10 minutes or so.
So ADF lets you iterate until a condition is satisfied.
Hope that this will help you :)
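The Until-plus-Wait loop described above is ordinary polling logic. A minimal sketch of that control flow in Python, not ADF itself; check_flags, run_pipeline, and the attempt limit are hypothetical stand-ins for the Azure Function call, the processing step, and the "at most X times" from the question.

```python
import time


def run_when_ready(check_flags, run_pipeline, max_attempts=6,
                   wait_seconds=600, sleep=time.sleep):
    """Poll check_flags up to max_attempts times, waiting between tries.

    Runs the pipeline once as soon as the flags are up and returns True;
    returns False if the flags never came up.
    """
    for attempt in range(1, max_attempts + 1):
        if check_flags():            # the Until condition
            run_pipeline()           # the processing step
            return True
        if attempt < max_attempts:
            sleep(wait_seconds)      # the Wait activity (10 minutes)
    return False
```

The sleep argument is injectable only so the loop can be exercised without actually waiting.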

Resources