How to update ADF Pipeline level parameters during CICD - azure

Being novice to ADF CICD i am currently exploring how we can update the pipeline scoped parameters when we deploy the pipeline from one enviornment to another.
Here is the detailed scenario -
I have a simple ADF pipeline with a copy activity moving files from one blob container to another
Example - Below there is copy activity and pipeline has two parameters named :
1- SourceBlobContainer
2- SinkBlobContainer
with their default values.
Here is how the dataset is configured to consume these Pipeline scoped parameters.
Since this is development environment its OK with the default values. But the Test environment will have the containers present with altogether different name (like "TestSourceBlob" & "TestSinkBlob").
Having said that, when CICD will happen it should handle this via CICD process by updating the default values of these parameters.
When read the documents, no where i found to handle such use-case.
Here are some links which i referred -
http://datanrg.blogspot.com/2019/02/continuous-integration-and-delivery.html
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
Thoughts on how to handle this will be much appreciated. :-)

There is another approach in opposite to ARM templates located in 'ADF_Publish' branch.
Many companies leverage that workaround and it works great.
I have spent several days and built a brand new PowerShell module to publish the whole Azure Data Factory code from your master branch or directly from your local machine. The module resolves all pains existed so far in any other solution, including:
replacing any property in JSON file (ADF object),
deploying objects in an appropriate order,
deployment part of objects,
deleting objects not existing in the source any longer,
stop/start triggers, etc.
The module is publicly available in PS Gallery: azure.datafactory.tools
Source code and full documentation are in GitHub here.
Let me know if you have any question or concerns.

There is a "new" way to do ci/cd for ADF that should handle this exact use case. What I typically do is add global parameters and then reference those everywhere (in your case from the pipeline parameters). Then in your build you can override the global parameters with the values that you want. Here are some links to references that I used to get this working.
The "new" ci/cd method following something like what is outlined here Azure Data Factory CI-CD made simple: Building and deploying ARM templates with Azure DevOps YAML Pipelines. If you have followed this, something like this should work in your yaml:
overrideParameters: '-dataFactory_properties_globalParameters_environment_value "new value here"'
Here is an article that goes into more detail on the overrideParameters: ADF Release - Set global params during deployment
Here is a reference on global parameters and how to get them exposed to your ci/cd pipeline: Global parameters in Azure Data Factory

Related

How to run the same step in an Azure pipelines based on the defined environments?

I am setting up an Azure pipeline and need to get a specific step to run multiple times based on the defined environments. I also need to replace a placeholder in a file with variable values and I need to have different value for each environment. What is the most straight forward way to achieve that?
If you are using classic build pipeline or release pipeline, you can use task groups.
A task group allows you to encapsulate a sequence of tasks, already defined in a build or a release pipeline, into a single reusable task that can be added to a build or release pipeline, just like any other task. You can choose to extract the parameters from the encapsulated tasks as configuration variables, and abstract the rest of the task information. For more info about task group, please refer to Task groups for builds and releases (classic)
If you are using YAML pipeline, you can use templates. Templates let you define reusable content, logic, and parameters. Templates function in two ways. You can insert reusable content with a template or you can use a template to control what is allowed in a pipeline. For more info about template, please refer to Template types & usage.
If you want to pass different values to the template according to different environments, you can use Runtime parameters.
You can configure your task to run conditionally (when environmentname contains TEST, a certian variable is set to true etc...). Here you can see how to configure this -> https://www.bizstream.com/blog/custom-task-conditions-in-azure-pipelines/
I assume you are running a release-Pipeline? If so you can create a variable with the same name multiple times but set different stages where the variable will apply to. See here for example -> https://crmchap.co.uk/working-with-variables-in-an-azure-devops-release-pipeline/ There is a variable 'MySQLDBPassword' defined with "Scope"="Release". You can add a second variable with the same name but set the scope to the Stage/Environment-Name you want this to work with

Data Factory DevOps SSIS-IntegrationRuntime

We're planning to use CI/CD pipelines for Data Factory.
In one of our pipelines we use SSIS packages that needs to be called. To call SSIS packages you need to specify an Azure-SSIS IR that must be used.
The Azure-SSIS IR has a different naming on every environment.
Now, it is not possible to set this value dynamic (the option "Add dynamic content [Alt+P]" is not available on this field)
Is there a simple solution to change the Azure-SSIS IR during the deployment?
Thanks in advance
Your linked services aren't named by environment are they? (they most definitley should not be)
The default out of the box cloud runtime is also not named by environment.
Your runtimes should not be named by environment either.
IMHO your naming convention is incorrect. You should challenge it - there's no reason to include an environment designator in any runtime names.
Yes, your parent data factory should definitely have a different name per environment. That's where the distinction is made. Your runtimes should not.
In direct answer to your question, the way I have dealt with this in the past is added a powershell script task to the build part of DevOps that transforms the deployment asset and basically find/replaces the name the delivers the result as a build artifact

Azure functions - deploy by project in single repo?

We are building a set of serverless functions in Azure, but having difficulty deciding how to structure our source (Azure GIT) and DevOps to support them.
I am thinking of a single GIT repo, with all function apps housed independently within projects. We may have a lot of these function apps, we see great value in small code segments to do utility type of work, and I don't want dozens and dozens of independent repos just because of DevOps deployments. Is there a way to have a unique build and release process for each project, not the repo entirely? We aren't clear how this can be done and searches have come up empty on this. I thought it was possible to have unique build YAMLs per project across many projects in a single repo - but unclear how to implement the DevOps build and release pipleines to support this approach - ie; only a single function gets updated and we need to deploy - any guidance if this is possible and how to approach it would be great.
I haven't done this myself, but I'm in a similar situation where I'd like to have multiple functions (and other stuff) in a single Git repo for simplicity, but only build/deploy them as needed when they change. It looks like you can have multiple pipelines on a single repo with a different YAML file for each pipeline. The steps are documented in this link, and summarized below
In Azure DevOps, create a new Pipeline.
For the "Where is your code?" page, at the bottom choose the Use the classic editor option.
Select your source repo and branch.
On the "Select a template" screen, choose the YAML option at the top. Hit Apply.
There is a YAML file path field where you can specify the path and name of your YAML file for the pipeline.
You may want to set the pipeline to run manually if you don't want a build each time there's a commit to the repo.
EDIT There may be an easier way to do this now. If you go through the New Pipeline wizard, select your source location, on the Configure tab, at the bottom you can choose the Existing Azure Pipelines YAML file option. This lets you select a custom YAML file directly.

Azure Data Factory V2 multiple environments like in SSIS

I'm coming from a long SSIS background, we're looking to use Azure data factory v2 but I'm struggling to find any (clear) way of working with multiple environments. In SSIS we would have project parameters tied to the Visual Studio project configuration (e.g. development/test/production etc...) and say there were 2 parameters for SourceServerName and DestinationServerName, these would point to different servers if we were in development or test.
From my initial playing around I can't see any way to do this in data factory. I've searched google of course, but any information I've found seems to be around CI/CD then talks about Git 'branches' and is difficult to follow.
I'm basically looking for a very simple explanation and example of how this would be achieved in Azure data factory v2 (if it is even possible).
It works differently. You create an instance of data factory per environment and your environments are effectively embedded in each instance.
So here's one simple approach:
Create three data factories: dev, test, prod
Create your linked services in the dev environment pointing at dev sources and targets
Create the same named linked services in test, but of course these point at your tst systems
Now when you "migrate" your pipelines from dev to test, they use the same logical name (just like a connection manager)
So you don't designate an environment at execution time or map variables or anything... everything in test just runs against test because that's the way the linked servers have been defined.
That's the first step.
The next step is to connect only the dev ADF instance to Git. If you're a newcomer to Git it can be daunting but it's just a version control system. You save your code to it and it remembers every change you made.
Once your pipeline code is in git, the theory is that you migrate code out of git into higher environments in an automated fashion.
If you go through the links provided in the other answer, you'll see how you set it up.
I do have an issue with this approach though - you have to look up all of your environment values in keystore, which to me is silly because why do we need to designate the test servers hostname everytime we deploy to test?
One last thing is that if you a pipeline that doesn't use a linked service (say a REST pipeline), I haven't found a way to make that environment aware. I ended up building logic around the current data factories name to dynamically change endpoints.
This is a bit of a bran dump but feel free to ask questions.
Although it's not recommended - yes, you can do it.
Take a look at Linked Service - in this case, I have a connection to Azure SQL Database:
You have possibilities to use dynamic content for either the server name and database name.
Just add a parameter to your pipeline, pass it to the Linked Service and use in the required field.
Let me know whether I explained it clearly enough?
Yes, it's possible although not so simple as it was in VS for SSIS.
1) First of all: there is no desktop application for developing ADF, only the browser.
Therefore developers should make the changes in their DEV environment and from many reasons, the best way to do it is a way of working with GIT repository connected.
2) Then, you need "only":
a) publish the changes (it creates/updates adf_publish branch in git)
b) With Azure DevOps deploy the code from adf_publish replacing required parameters for target environment.
I know that at the beginning it sounds horrible, but the sooner you set up an environment like this the more time you save while developing pipelines.
How to do these things step by step?
I describe all the steps in the following posts:
- Setting up Code Repository for Azure Data Factory v2
- Deployment of Azure Data Factory with Azure DevOps
I hope this helps.

Migrating existing (entire) Azure DevOps pipeline to YAML based pipelines (in bulk)

I would like to move the existing Azure DevOps pipelines to YAML based for obvious advantages. The problem is there are many of these and each one has many jobs.
When I click around in Azure DevOps, the "View YAML" link only appears for one job at a time. So that's gonna be a lot of manual work to view YAMLs for each pipeline x jobs and move that to code.
But for each pipeline there seems to be a way to "export" the entire pipeline in json. I was wondering if there is a similar way to at least dump the entire pipeline as YAML if not an entire folder.
If there is an API which exports the same then even better.
Until now, what we supported is what you see, use View YAML to copy/paste the definition of agent job. There has another workaround to get the entire definition of one pipeline is use the API to get the JSON from a build definition, convert it to YAML, tweak the syntax, then if needed, update the tasks which are referenced.
First, use Get Build Definition api to get the entire definition
of one pipeline.
Invoke JSON to YAML converter. Copy/paste the JSON of definition
into this converter.
Copy the YAML to a YAML editor of Azure Devops. Then the most important step is tweak the syntax.
Replace the refName key values with task names and version. For this, you can go our tasks source code which opened in github, built in tasks can be found there(note: please see the task.json file of corresponding task)
Noted: Use this method has another disadvantage that you need very familiar with YAML syntax so that you can tweak the content which convert from JSON successfully.
This is done and there is blog post about exporting pipelineas YAML on devblogs
It's it worth to mention that the new system knows how to handle every feature listed here:
Single and multiple jobs
Checkout options
Execution plan parallelism
Timeout and name metadata
Demands
Schedules and other triggers
Pool selection, including jobs which differ from the default
All tasks and all inputs, including optimizing for default inputs
Job and step conditions
Task group unrolling
In fact, there are only two areas which we know aren’t covered.
Variables
Variables defined in YAML “shadow” (hide) variables set in the UI. Therefore, we didn’t want to export them into the YAML file in case you need an ability to alter them at runtime. If you have UI variables in your Classic pipeline, we mention them by name in the comments to remind you that you need to configure them in your new YAML pipeline definition.
Timezone translation
cron schedules in YAML are in UTC, while Classic schedules are in the organization’s timezone. Timezone handling has a lot of sharp edges, so we opted not to try to be clever. We export the schedule without doing any translation, so your scheduled builds might be off by a certain number of hours unless you manually modify them. Here again, we make a note in the comments to remind you.
But there won't be support for release pipelines:
No plans to do so. Classic RM pipelines are different enough in their execution that I can’t make the same strong guarantees about correctness as I can with classic Build. Also, a number of concepts were re-thought between RM and unified YAML pipelines. In some cases, there isn’t a direct translation for an RM feature. A human is required to think about what the pipeline is designed to accomplish and re-implement it using new constructs.
I tried yamlizr https://github.com/f2calv/yamlizr
It works pretty well for exporting Release Pipelines, except it doesn't export out Pre/Post deployment conditions. We use these for Approval gates. So hopefully in a future release it will be supported.
But per Microsoft they won't support the export to YAML for Release Pipelines it sounds like.
https://devblogs.microsoft.com/devops/replacing-view-yaml/#comment-2043

Resources