Is there a way to visualise the changes to an ADF pipeline when reviewing a PR? - azure

I am currently reviewing between 5-15 pull requests a week on a project being developed using Azure Data Factory (ADF) and Databricks.
Most pull requests contain changes to our ADF pipelines, which get stored in source control as nested JSON.
What I've found is that, as a reviewer, being able to visually see the changes being made to an ADF pipeline in the pull request makes a huge difference in the speed and accuracy at which I can perform my review. Obviously, I can check out the branch and view the pipelines for that branch directly in ADF, but that does not give me a differential view.
My question is this: Is there a way to parse two ADF pipeline json objects (source and destination branch versions of the same file) and generate a visual representation of each object? Ideally highlighting the differences, but just showing them would be a good first stab.
Bonus points if we can fit this into an Azure DevOps release pipeline and generate it automatically as part of the CI/CD pipeline.

If you are already using Azure DevOps, then you already have exactly what you are after in every pull request. For any pull request, you can click on the Files tab and it will show a side-by-side comparison of every file, color-coded for additions, updates, and removals. It is very helpful for review.
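If a raw text diff of the JSON is not enough and you want to go down the route from the question of parsing the two pipeline JSON objects yourself, here is a minimal PowerShell sketch. The file paths are placeholders for the source- and target-branch copies of the same pipeline file, and it assumes the standard ADF Git layout with activities under properties.activities; it lists added, removed, and changed activities rather than drawing a graph.

    # Minimal sketch: compare two versions of the same ADF pipeline JSON and report
    # which activities were added, removed, or changed. Paths are placeholders.
    $source = Get-Content 'main\CopyPipeline.json'    -Raw | ConvertFrom-Json
    $target = Get-Content 'feature\CopyPipeline.json' -Raw | ConvertFrom-Json

    # Index the source activities by name so each one can be compared by its serialized form.
    $srcActivities = @{}
    $source.properties.activities | ForEach-Object {
        $srcActivities[$_.name] = ($_ | ConvertTo-Json -Depth 50)
    }

    foreach ($activity in $target.properties.activities) {
        $json = $activity | ConvertTo-Json -Depth 50
        if (-not $srcActivities.ContainsKey($activity.name)) {
            Write-Host "Added activity:   $($activity.name)"
        }
        elseif ($srcActivities[$activity.name] -ne $json) {
            Write-Host "Changed activity: $($activity.name)"
        }
        $srcActivities.Remove($activity.name)
    }
    $srcActivities.Keys | ForEach-Object { Write-Host "Removed activity: $_" }

It is activity-level only, not a visual graph, but it could run in a pull request build and write its output to the build summary if you want to automate it.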

Related

Can Alteryx workflow for ML pipeline be built with flat files first and later swapped with API connector?

I'm trying to build an ML pipeline using Alteryx. I'll be pulling data via API and building an automated workflow. But first, until I get a license and make sure everything works, I'd like to use flat files to build the pipeline. The data in the flat files and the data from the API (batch) would essentially be the same. Can I develop the full pipeline first and then swap out the ingestion portion for the API connector later?
I have searched online but haven't found an answer to this.

Azure Pipeline: check whether any builds are running in a different Azure pipeline from a pipeline task

So to give you a bit of context, we have a service which has been split into two different services, one for the read-side and one for the write-side operations. The read side is called ProductStore and the write side is called ProductCatalog. The issue we're facing is on the write side: the load tests create 100 products in the write-side resource web app, and these are then transferred to the read side for the load test to read x number of times. If a build is launched in ProductCatalog because something new was merged to master, this will cause issues in the ProductStore pipeline if it gets run concurrently.
The question I want to ask is: is there a way, in the ProductStore YAML file, to directly query via a specified Azure task or via an Azure PowerShell script whether a build is currently running in the ProductCatalog pipeline?
The second part of this would be to loop/wait until that pipeline has successfully finished before resuming the ProductStore pipeline.
Hope this is clear; I'm not sure how to best ask this question as I'm very new to DevOps pipeline flows, but it would massively help if there was a good way of checking this sort of thing.
As a workaround, you can set up a pipeline completion trigger in the ProductStore pipeline.
To trigger a pipeline upon the completion of another, specify the triggering pipeline as a pipeline resource.
Alternatively, configure build completion triggers in the UI: choose Triggers from the settings menu and navigate to the YAML pane.
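If you prefer the scripted check described in the question over a completion trigger, a minimal Azure PowerShell sketch against the Builds REST API could look like this. The organization, project, and definition id are placeholders, and it assumes a PAT with Build (read) scope in $env:AZDO_PAT.

    # Minimal sketch: wait until no build of the ProductCatalog definition is queued or running.
    $org          = 'https://dev.azure.com/MyOrg'   # placeholder organization URL
    $project      = 'MyProject'                     # placeholder project
    $definitionId = 42                              # placeholder ProductCatalog definition id

    $headers = @{
        Authorization = 'Basic ' + [Convert]::ToBase64String(
            [Text.Encoding]::ASCII.GetBytes(":$($env:AZDO_PAT)"))
    }

    do {
        $url    = "$org/$project/_apis/build/builds?definitions=$definitionId&api-version=5.1"
        $builds = (Invoke-RestMethod -Uri $url -Headers $headers -Method Get).value
        # 'notStarted' catches queued builds, 'inProgress' catches running ones.
        $active = @($builds | Where-Object { $_.status -in @('inProgress', 'notStarted') })
        if ($active.Count -gt 0) {
            Write-Host "ProductCatalog has $($active.Count) active build(s); waiting..."
            Start-Sleep -Seconds 60
        }
    } while ($active.Count -gt 0)
    Write-Host 'No ProductCatalog builds running; continuing with ProductStore.'

Run from a PowerShell or Azure PowerShell task at the start of the ProductStore pipeline, this gives you the loop/wait behaviour asked about in the second part of the question.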

Migrating existing (entire) Azure DevOps pipeline to YAML based pipelines (in bulk)

I would like to move the existing Azure DevOps pipelines to YAML-based ones for the obvious advantages. The problem is there are many of these, and each one has many jobs.
When I click around in Azure DevOps, the "View YAML" link only appears for one job at a time. So that's going to be a lot of manual work to view the YAML for every pipeline and job and move that to code.
But for each pipeline there seems to be a way to "export" the entire pipeline in json. I was wondering if there is a similar way to at least dump the entire pipeline as YAML if not an entire folder.
If there is an API which exports the same then even better.
Until now, what is supported is what you see: use View YAML to copy/paste the definition of each agent job. Another workaround to get the entire definition of one pipeline is to use the API to get the JSON from the build definition, convert it to YAML, tweak the syntax, and then, if needed, update the tasks which are referenced.
First, use the Get Build Definition API to get the entire definition of one pipeline (see the sketch after these steps).
Next, invoke a JSON-to-YAML converter and copy/paste the JSON of the definition into it.
Copy the YAML into a YAML editor in Azure DevOps. Then the most important step is to tweak the syntax.
Replace the refName key values with task names and versions. For this, you can go to our tasks source code, which is open on GitHub; the built-in tasks can be found there (note: see the task.json file of the corresponding task).
Note: this method has the additional disadvantage that you need to be very familiar with YAML syntax in order to successfully tweak the content converted from JSON.
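A minimal sketch of the first step (organization, project, and definition id are placeholders; it assumes a PAT with Build read scope in $env:AZDO_PAT):

    # Minimal sketch: download a classic build definition as JSON, ready to paste
    # into a JSON-to-YAML converter. URL parts are placeholders.
    $headers = @{
        Authorization = 'Basic ' + [Convert]::ToBase64String(
            [Text.Encoding]::ASCII.GetBytes(":$($env:AZDO_PAT)"))
    }
    $url = 'https://dev.azure.com/MyOrg/MyProject/_apis/build/definitions/42?api-version=5.1'

    Invoke-RestMethod -Uri $url -Headers $headers -Method Get |
        ConvertTo-Json -Depth 100 |
        Set-Content 'build-definition-42.json'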
This has since been done, and there is a blog post about exporting pipelines as YAML on devblogs.
It's worth mentioning that the new export knows how to handle every feature listed here:
Single and multiple jobs
Checkout options
Execution plan parallelism
Timeout and name metadata
Demands
Schedules and other triggers
Pool selection, including jobs which differ from the default
All tasks and all inputs, including optimizing for default inputs
Job and step conditions
Task group unrolling
In fact, there are only two areas which we know aren’t covered.
Variables
Variables defined in YAML “shadow” (hide) variables set in the UI. Therefore, we didn’t want to export them into the YAML file in case you need an ability to alter them at runtime. If you have UI variables in your Classic pipeline, we mention them by name in the comments to remind you that you need to configure them in your new YAML pipeline definition.
Timezone translation
cron schedules in YAML are in UTC, while Classic schedules are in the organization’s timezone. Timezone handling has a lot of sharp edges, so we opted not to try to be clever. We export the schedule without doing any translation, so your scheduled builds might be off by a certain number of hours unless you manually modify them. Here again, we make a note in the comments to remind you.
But there won't be support for release pipelines:
No plans to do so. Classic RM pipelines are different enough in their execution that I can’t make the same strong guarantees about correctness as I can with classic Build. Also, a number of concepts were re-thought between RM and unified YAML pipelines. In some cases, there isn’t a direct translation for an RM feature. A human is required to think about what the pipeline is designed to accomplish and re-implement it using new constructs.
I tried yamlizr https://github.com/f2calv/yamlizr
It works pretty well for exporting Release Pipelines, except it doesn't export out Pre/Post deployment conditions. We use these for Approval gates. So hopefully in a future release it will be supported.
But per Microsoft, it sounds like they won't support export to YAML for release pipelines:
https://devblogs.microsoft.com/devops/replacing-view-yaml/#comment-2043

How to update ADF Pipeline level parameters during CICD

Being a novice at ADF CI/CD, I am currently exploring how we can update the pipeline-scoped parameters when we deploy the pipeline from one environment to another.
Here is the detailed scenario:
I have a simple ADF pipeline with a copy activity moving files from one blob container to another.
Example: below there is a copy activity, and the pipeline has two parameters named:
1- SourceBlobContainer
2- SinkBlobContainer
with their default values.
Here is how the dataset is configured to consume these Pipeline scoped parameters.
Since this is the development environment, the default values are OK. But the Test environment will have the containers present with altogether different names (like "TestSourceBlob" & "TestSinkBlob").
Having said that, when CI/CD happens it should handle this by updating the default values of these parameters as part of the CI/CD process.
When reading the documents, nowhere did I find how to handle such a use case.
Here are some links which I referred to:
http://datanrg.blogspot.com/2019/02/continuous-integration-and-delivery.html
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
Thoughts on how to handle this will be much appreciated. :-)
There is another approach, as opposed to the ARM templates located in the 'ADF_Publish' branch.
Many companies leverage that workaround and it works great.
I have spent several days and built a brand new PowerShell module to publish the whole Azure Data Factory code from your master branch or directly from your local machine. The module resolves all the pain points that existed so far in any other solution, including:
replacing any property in a JSON file (ADF object),
deploying objects in an appropriate order,
deploying only a part of the objects,
deleting objects that no longer exist in the source,
stopping/starting triggers, etc.
The module is publicly available in PS Gallery: azure.datafactory.tools
Source code and full documentation are in GitHub here.
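A minimal usage sketch (resource names are placeholders, and the parameter names are taken from the module's documentation, so verify them against the GitHub README for your version):

    # Minimal sketch: publish an ADF code folder with azure.datafactory.tools.
    # All names below are placeholders; check the module docs for the exact parameters.
    # Requires an authenticated Az context (Connect-AzAccount) beforehand.
    Install-Module -Name azure.datafactory.tools -Scope CurrentUser

    Publish-AdfV2FromJson -RootFolder        'C:\repo\adf' `
                          -ResourceGroupName 'rg-adf-test' `
                          -DataFactoryName   'adf-contoso-test' `
                          -Location          'northeurope' `
                          -Stage             'test'   # per-environment config, e.g. overriding parameter defaults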
Let me know if you have any questions or concerns.
There is a "new" way to do ci/cd for ADF that should handle this exact use case. What I typically do is add global parameters and then reference those everywhere (in your case from the pipeline parameters). Then in your build you can override the global parameters with the values that you want. Here are some links to references that I used to get this working.
The "new" ci/cd method following something like what is outlined here Azure Data Factory CI-CD made simple: Building and deploying ARM templates with Azure DevOps YAML Pipelines. If you have followed this, something like this should work in your yaml:
overrideParameters: '-dataFactory_properties_globalParameters_environment_value "new value here"'
Here is an article that goes into more detail on the overrideParameters: ADF Release - Set global params during deployment
Here is a reference on global parameters and how to get them exposed to your ci/cd pipeline: Global parameters in Azure Data Factory
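If the deployment step runs Az PowerShell rather than the YAML task's overrideParameters shown above, the same global parameter can be overridden as a dynamic template parameter. This is an alternative sketch, not part of the linked walkthrough; the file names and the parameter name are assumptions and must match what your adf_publish/ARM export actually generates for your factory.

    # Alternative sketch: override the exported global parameter when deploying the
    # ADF ARM template with Az PowerShell. File names and the parameter name are
    # assumptions based on the standard ADF ARM export; adjust to your factory.
    New-AzResourceGroupDeployment `
        -ResourceGroupName     'rg-adf-test' `
        -TemplateFile          'ARMTemplateForFactory.json' `
        -TemplateParameterFile 'ARMTemplateParametersForFactory.json' `
        -dataFactory_properties_globalParameters_environment_value 'new value here'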

Azure Pipelines (DevOps): Custom Consumable Statistic/Metric

I have a build up on Azure Pipelines, and one of the steps provides a code metric that I would like to have be consumable after the build is done. Ideally, this would be in the form of a badge like this (where we have text on the left and the metric in the form of a number on the right). I'd like to put such a badge on the README of the repository to make this metric visible on a per-build basis.
Azure DevOps does have a REST API that one can use to access built-in aspects of a given build. But as far as I can tell there's no way to expose a custom statistic or value that is generated or provided during a build.
(The equivalent in TeamCity would be outputting ##teamcity[buildStatisticValue key='My Custom Metric' value='123'] via Console.WriteLine() from a simple C# program, that TeamCity can then consume and use/make available.)
Anyone have experience with this?
One option is to use a combination of adding a build tag using a logging command:
##vso[build.addbuildtag]"My Custom Metric.123"
Then use the Tags - Get Build Tags API.
GET https://dev.azure.com/{organization}/{project}/_apis/build/builds/{buildId}/tags?api-version=5.0
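A minimal sketch tying the two together (the tag text, organization, project, and build id are placeholders; reading the tags back assumes a PAT with Build read scope in $env:AZDO_PAT):

    # Inside any PowerShell step of the build: emit the metric as a build tag.
    $metric = 123
    Write-Host "##vso[build.addbuildtag]MyCustomMetric.$metric"

    # Later, from wherever renders the badge/README value: read the tags back.
    $headers = @{
        Authorization = 'Basic ' + [Convert]::ToBase64String(
            [Text.Encoding]::ASCII.GetBytes(":$($env:AZDO_PAT)"))
    }
    $buildId = 1234   # placeholder
    $tags = Invoke-RestMethod -Headers $headers -Method Get `
        -Uri "https://dev.azure.com/MyOrg/MyProject/_apis/build/builds/$buildId/tags?api-version=5.0"

    # $tags.value is a plain string array, e.g. @('MyCustomMetric.123'); pull the number back out.
    ($tags.value | Where-Object { $_ -like 'MyCustomMetric.*' }) -replace '^MyCustomMetric\.', ''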
