I'm coming from a long SSIS background. We're looking to use Azure Data Factory v2, but I'm struggling to find any clear way of working with multiple environments. In SSIS we would have project parameters tied to the Visual Studio project configuration (e.g. development/test/production), and if there were two parameters, SourceServerName and DestinationServerName, these would point to different servers depending on whether we were in development or test.
From my initial playing around I can't see any way to do this in Data Factory. I've searched Google, of course, but everything I've found is about CI/CD, then talks about Git 'branches', and is difficult to follow.
I'm basically looking for a very simple explanation and example of how this would be achieved in Azure Data Factory v2 (if it is even possible).
It works differently. You create an instance of data factory per environment and your environments are effectively embedded in each instance.
So here's one simple approach:
Create three data factories: dev, test, prod
Create your linked services in the dev environment pointing at dev sources and targets
Create the same-named linked services in test, but of course these point at your test systems
Now when you "migrate" your pipelines from dev to test, they use the same logical name (just like a connection manager)
So you don't designate an environment at execution time or map variables or anything... everything in test just runs against test because that's the way the linked services have been defined.
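To make the "same logical name, different target" idea concrete, here is a minimal Python sketch that models it with plain dictionaries. The factory names (adf-dev, adf-test), linked service names, and hostnames are hypothetical; real linked services are JSON definitions stored in each factory, but the lookup works the same way: the pipeline names a linked service, and the factory it runs in decides where that name points.

```python
from __future__ import annotations

# Hypothetical: one data factory per environment, each defining linked services
# with the SAME logical names but pointing at that environment's servers.
LINKED_SERVICES = {
    "adf-dev": {"SourceSql": "dev-sql.example.net", "TargetSql": "dev-dw.example.net"},
    "adf-test": {"SourceSql": "test-sql.example.net", "TargetSql": "test-dw.example.net"},
}

# The pipeline is identical in every factory; it only knows logical names.
PIPELINE = {"name": "CopyCustomers", "source": "SourceSql", "sink": "TargetSql"}


def resolve(factory: str, pipeline: dict) -> tuple[str, str]:
    """Return the (source, sink) hosts the pipeline would hit when run in `factory`."""
    services = LINKED_SERVICES[factory]
    return services[pipeline["source"]], services[pipeline["sink"]]


if __name__ == "__main__":
    for factory in LINKED_SERVICES:
        print(factory, resolve(factory, PIPELINE))
```

Nothing in the pipeline changes between environments; only the factory it is deployed to does.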
That's the first step.
The next step is to connect only the dev ADF instance to Git. If you're new to Git it can be daunting, but it's just a version control system: you save your code to it and it remembers every change you made.
Once your pipeline code is in Git, the theory is that you migrate code out of Git into higher environments in an automated fashion.
If you go through the links provided in the other answer, you'll see how you set it up.
I do have an issue with this approach, though: you have to look up all of your environment values in a keystore, which to me is silly. Why do we need to specify the test server's hostname every time we deploy to test?
One last thing: if you have a pipeline that doesn't use a linked service (say, a REST pipeline), I haven't found a way to make it environment-aware. I ended up building logic around the current data factory's name to dynamically change endpoints.
This is a bit of a brain dump, but feel free to ask questions.
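Inside ADF, the name of the running factory is available through the system variable @pipeline().DataFactory, so a pipeline expression can branch on it. The Python below is only a sketch of that mapping logic, assuming hypothetical factory names and endpoint URLs. It fails loudly on an unknown factory rather than quietly defaulting to some environment.

```python
# Hypothetical factory names and REST endpoints, one per environment.
ENDPOINTS = {
    "adf-dev": "https://api-dev.example.com",
    "adf-test": "https://api-test.example.com",
    "adf-prod": "https://api.example.com",
}


def endpoint_for(factory_name: str) -> str:
    """Pick the REST endpoint for the factory the pipeline is running in."""
    key = factory_name.lower()  # compare factory names case-insensitively
    if key not in ENDPOINTS:
        # Refuse to guess: calling the wrong environment is worse than failing.
        raise ValueError(f"No endpoint configured for data factory {factory_name!r}")
    return ENDPOINTS[key]
```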
It's not recommended, but yes, you can do it.
Take a look at Linked Service - in this case, I have a connection to Azure SQL Database:
You can use dynamic content for both the server name and the database name.
Just add a parameter to your pipeline, pass it to the Linked Service and use in the required field.
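For reference, a parameterized Azure SQL Database linked service looks roughly like the definition below, generated here from Python so it can be checked. Parameters are declared under properties.parameters and referenced with @{linkedService().<Name>} string interpolation. The linked service name and the parameter names ServerName and DatabaseName are hypothetical, and authentication settings are deliberately left out of the connection string.

```python
import json


def parameterized_sql_linked_service(name: str) -> dict:
    """Build an Azure SQL Database linked service whose server and database
    names are supplied at run time through linked service parameters."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "parameters": {
                "ServerName": {"type": "String"},
                "DatabaseName": {"type": "String"},
            },
            "typeProperties": {
                # @{linkedService().X} interpolates a linked service parameter.
                # Authentication is intentionally omitted from this sketch.
                "connectionString": (
                    "Server=tcp:@{linkedService().ServerName}.database.windows.net,1433;"
                    "Database=@{linkedService().DatabaseName};"
                )
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(parameterized_sql_linked_service("AzureSqlParameterized"), indent=2))
```

The dataset (and through it the pipeline) then supplies values for ServerName and DatabaseName when the linked service is used.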
Let me know whether I explained it clearly enough.
Yes, it's possible, although not as simple as it was in Visual Studio for SSIS.
1) First of all: there is no desktop application for developing ADF, only the browser.
Therefore developers should make their changes in the DEV environment, and for many reasons the best way to do that is to work with a connected Git repository.
2) Then, you need "only":
a) publish the changes (this creates/updates the adf_publish branch in Git)
b) use Azure DevOps to deploy the code from adf_publish, replacing the required parameters for the target environment.
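Step b) boils down to taking the parameters file that publishing writes next to the ARM template (ARMTemplateParametersForFactory.json in adf_publish) and swapping in the target environment's values. In a real release this is normally handled by the deployment task's parameter overrides; the Python sketch below only shows the transformation itself, with hypothetical parameter names and values, and it refuses to set a parameter the template doesn't declare.

```python
import copy

# Hypothetical stand-in for ARMTemplateParametersForFactory.json from adf_publish.
DEV_PARAMS = {
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {"value": "adf-dev"},
        "SourceSql_connectionString": {
            "value": "Server=tcp:dev-sql.example.net,1433;Database=Sales;"
        },
    },
}


def override_arm_parameters(params_doc: dict, overrides: dict) -> dict:
    """Return a copy of an ARM parameters document with the given values replaced.

    Unknown names raise KeyError so a typo doesn't silently deploy dev values.
    """
    result = copy.deepcopy(params_doc)  # never mutate the published dev file
    params = result["parameters"]
    for name, value in overrides.items():
        if name not in params:
            raise KeyError(f"Unknown ARM template parameter: {name}")
        params[name]["value"] = value
    return result
```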
I know that at the beginning it sounds horrible, but the sooner you set up an environment like this, the more time you save while developing pipelines.
How to do these things step by step?
I describe all the steps in the following posts:
- Setting up Code Repository for Azure Data Factory v2
- Deployment of Azure Data Factory with Azure DevOps
I hope this helps.
I am learning Azure Data Factory and would really like to do its development in Visual Studio. I have VS 2019 installed on my machine and I don't see an option to develop ADF in it.
Is there any version of VS that ADF can be developed in, or are we stuck with developing it in the web UI for the time being?
I know the BI development tools needed an additional plug-in for Visual Studio to work. Does ADF need something similar?
If not, how can we back up the work done in the ADF web UI? Is there an option to link it to Azure Repos or Git?
Starting with ADF V2, development is really intended to be done completely in the web interface. I had the same question as you at the time, but now the web tools are quite good and I don't give it a second thought. While I'm sure there are other options for developing and deploying the ARM templates, do yourself a favor and use the web UI.
By default, Data Factory only saves code changes on "Publish". An optional configuration allows source control via Git integration. You can use either Azure DevOps or GitHub. I highly recommend this approach, even if you only ever work in the main branch (fine for lone developers, a bad idea for collaboration). In this case, Publish takes the current state of the main branch and surfaces your artifacts to the ADF service. That means you will still need to Publish for your changes to be live.
NOTE: Git integration is also supported in Azure Synapse, where it has tremendous value for collaboration across a wide variety of artifact types.
We're planning to use CI/CD pipelines for Data Factory.
In one of our pipelines we use SSIS packages that need to be called. To call an SSIS package you need to specify which Azure-SSIS IR to use.
The Azure-SSIS IR has a different name in every environment.
However, it is not possible to set this value dynamically (the "Add dynamic content [Alt+P]" option is not available on this field).
Is there a simple solution to change the Azure-SSIS IR during the deployment?
Thanks in advance
Your linked services aren't named by environment, are they? (They most definitely should not be.)
The default out of the box cloud runtime is also not named by environment.
Your runtimes should not be named by environment either.
IMHO your naming convention is incorrect. You should challenge it - there's no reason to include an environment designator in any runtime names.
Yes, your parent data factory should definitely have a different name per environment. That's where the distinction is made. Your runtimes should not.
In direct answer to your question, the way I have dealt with this in the past is to add a PowerShell script task to the build stage in DevOps that transforms the deployment asset, basically doing a find/replace on the name, and then delivers the result as a build artifact.
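The same find/replace can also be done structurally. The Python sketch below (a stand-in for the PowerShell step, not the author's script) walks an ADF JSON document and renames every IntegrationRuntimeReference that points at the old runtime, such as the connectVia block of an Execute SSIS Package activity. The runtime names are hypothetical. An exported ARM template can also mention the runtime inside strings such as dependsOn entries, which this targeted version leaves alone; a plain text replace would catch those too.

```python
from typing import Any

# Hypothetical Execute SSIS Package activity (typeProperties omitted for brevity).
SSIS_ACTIVITY = {
    "name": "RunPackage",
    "type": "ExecuteSSISPackage",
    "connectVia": {"referenceName": "ssis-ir-dev", "type": "IntegrationRuntimeReference"},
}


def rename_integration_runtime(node: Any, old: str, new: str) -> Any:
    """Return a copy of an ADF JSON structure with IR references to `old` renamed to `new`."""
    if isinstance(node, dict):
        copied = {key: rename_integration_runtime(value, old, new) for key, value in node.items()}
        if copied.get("type") == "IntegrationRuntimeReference" and copied.get("referenceName") == old:
            copied["referenceName"] = new
        return copied
    if isinstance(node, list):
        return [rename_integration_runtime(item, old, new) for item in node]
    return node  # strings, numbers, booleans and None are returned unchanged
```

In a build step you would json.load the template file, pass it through this function, and json.dump the result as the build artifact.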
We are building a set of serverless functions in Azure, but having difficulty deciding how to structure our source (Azure GIT) and DevOps to support them.
I am thinking of a single Git repo, with each function app housed independently in its own project. We may have a lot of these function apps, since we see great value in small code segments that do utility work, and I don't want dozens and dozens of independent repos just because of DevOps deployments. Is there a way to have a unique build and release process for each project, rather than for the repo as a whole? We aren't clear how this can be done, and searches have come up empty. I thought it was possible to have a unique build YAML per project across many projects in a single repo, but it's unclear how to implement the DevOps build and release pipelines to support this approach, i.e. when only a single function gets updated and we need to deploy it. Any guidance on whether this is possible and how to approach it would be great.
I haven't done this myself, but I'm in a similar situation where I'd like to have multiple functions (and other stuff) in a single Git repo for simplicity, but only build/deploy them as needed when they change. It looks like you can have multiple pipelines on a single repo with a different YAML file for each pipeline. The steps are documented in this link, and summarized below
In Azure DevOps, create a new Pipeline.
For the "Where is your code?" page, at the bottom choose the Use the classic editor option.
Select your source repo and branch.
On the "Select a template" screen, choose the YAML option at the top. Hit Apply.
There is a YAML file path field where you can specify the path and name of your YAML file for the pipeline.
You may want to set the pipeline to run manually if you don't want a build each time there's a commit to the repo.
EDIT: There may be an easier way to do this now. If you go through the New Pipeline wizard and select your source location, then on the Configure tab you can choose the Existing Azure Pipelines YAML file option at the bottom. This lets you select a custom YAML file directly.
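What keeps per-project pipelines from all firing on every commit is path filtering: in each pipeline's YAML, a trigger with a paths/include filter limits CI runs to commits that touch that project's folder. As a sketch of the decision those filters make, the Python below maps a list of changed files to the function apps whose folders they fall under. The repo layout and project names are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical layout: one folder (and one pipeline YAML) per function app.
PROJECTS = {
    "OrdersFunction": "src/OrdersFunction",
    "InvoicesFunction": "src/InvoicesFunction",
}


def projects_to_build(changed_files: list[str]) -> set[str]:
    """Return the function apps whose folder contains at least one changed file."""
    hits: set[str] = set()
    for changed in changed_files:
        path = PurePosixPath(changed)
        for project, folder in PROJECTS.items():
            # Compare whole path segments, so src/OrdersFunctionOld does not match.
            if PurePosixPath(folder) in path.parents:
                hits.add(project)
    return hits
```

A commit touching only README.md triggers nothing, and a commit touching one function's folder triggers only that function's pipeline.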
Being new to ADF CI/CD, I am currently exploring how we can update pipeline-scoped parameters when we deploy a pipeline from one environment to another.
Here is the detailed scenario:
I have a simple ADF pipeline with a copy activity moving files from one blob container to another.
For example, below there is a copy activity, and the pipeline has two parameters:
1- SourceBlobContainer
2- SinkBlobContainer
with their default values.
Here is how the dataset is configured to consume these pipeline-scoped parameters.
Since this is the development environment, the default values are fine. But the test environment will have containers with altogether different names (like "TestSourceBlob" and "TestSinkBlob").
That said, the CI/CD process should handle this by updating the default values of these parameters during deployment.
When reading the documentation, I found nothing on how to handle such a use case.
Here are some links I referred to:
http://datanrg.blogspot.com/2019/02/continuous-integration-and-delivery.html
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
Thoughts on how to handle this will be much appreciated. :-)
There is another approach, as an alternative to the ARM templates located in the 'ADF_Publish' branch.
Many companies leverage that workaround and it works great.
I have spent several days building a brand new PowerShell module to publish the whole Azure Data Factory code from your master branch or directly from your local machine. The module resolves the pain points found so far in other solutions, including:
replacing any property in a JSON file (ADF object),
deploying objects in an appropriate order,
deploying only a subset of objects,
deleting objects that no longer exist in the source,
stopping/starting triggers, etc.
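Deploying "in an appropriate order" means respecting references: linked services before the datasets that use them, datasets before pipelines, pipelines before triggers. The sketch below shows that ordering as a topological sort with Python's standard graphlib module (3.9+). It illustrates the idea only and is not how azure.datafactory.tools is implemented; the object names are hypothetical.

```python
from __future__ import annotations

from graphlib import TopologicalSorter

# Hypothetical factory objects, each mapped to the objects it references.
DEPENDENCIES = {
    "trigger/DailyTrigger": {"pipeline/CopyCustomers"},
    "pipeline/CopyCustomers": {"dataset/CustomersSource", "dataset/CustomersSink"},
    "dataset/CustomersSource": {"linkedService/SourceSql"},
    "dataset/CustomersSink": {"linkedService/TargetSql"},
}


def deployment_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return all objects so that each one comes after everything it references.

    TopologicalSorter raises graphlib.CycleError if the references form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for name in deployment_order(DEPENDENCIES):
        print(name)
```

Objects that only appear as references (the linked services here) are still included in the result, and they come first.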
The module is publicly available in PS Gallery: azure.datafactory.tools
Source code and full documentation are in GitHub here.
Let me know if you have any question or concerns.
There is a "new" way to do CI/CD for ADF that should handle this exact use case. What I typically do is add global parameters and then reference those everywhere (in your case, from the pipeline parameters). Then in your build you can override the global parameters with the values that you want. Here are some links to references that I used to get this working.
The "new" CI/CD method follows something like what is outlined in Azure Data Factory CI-CD made simple: Building and deploying ARM templates with Azure DevOps YAML Pipelines. If you have followed that, something like this should work in your YAML:
overrideParameters: '-dataFactory_properties_globalParameters_environment_value "new value here"'
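If you have several global parameters to override, it can help to generate the overrideParameters string rather than hand-edit it. The Python sketch below follows the naming pattern from the snippet above (dataFactory_properties_globalParameters_<name>_value); that pattern is the only assumption here, so check it against the parameter names in your own generated ARM template. Values containing double quotes are rejected instead of guessing at an escaping rule.

```python
from __future__ import annotations


def global_param_overrides(values: dict[str, str]) -> str:
    """Build an overrideParameters string that sets ADF global parameter values."""
    parts = []
    for name, value in values.items():
        if '"' in value:
            # No escaping is attempted; fail instead of emitting a broken string.
            raise ValueError(f"Value for {name!r} contains a double quote")
        parts.append(f'-dataFactory_properties_globalParameters_{name}_value "{value}"')
    return " ".join(parts)
```

For example, global_param_overrides({"environment": "new value here"}) reproduces the override line shown above.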
Here is an article that goes into more detail on the overrideParameters: ADF Release - Set global params during deployment
Here is a reference on global parameters and how to expose them to your CI/CD pipeline: Global parameters in Azure Data Factory
Let's say I have an Azure App Service web app at foo.azurewebsites.net. The code for the web app (a simple Node.js server and React frontend) is hosted on VSTS, and a custom deployment script is configured to build and deploy the web app every time code is pushed to the repository's master branch. In other words, the standard web app configuration.
Now, all of my API code (just a Node.js server) is in another repository on VSTS. I'd like to be able to do the following:
Have all requests to foo.azurewebsites.net/api be handled by the API server (an implication of this, which I would nonetheless like to state explicitly, is that the server can ask the browser to set cookies that the web app can then read, and vice versa).
Set up similar continuous deployment for the API server, such that it gets redeployed whenever there are code changes in the API repo.
Be able to maintain the web app and API repositories completely separately.
This seems like a fairly standard scenario...is there an accepted solution? I came across this, but it seems like a pretty hacky way to do it, not to mention the fact that I have no idea what the correct URL is for the web hook for VSTS and can't seem to find any information on it. Also, that example doesn't cover how to deal with point (1) above.
EDIT: Additional clarification
Note that the accepted answer on this question is not what I'm looking for. It describes how to pull from a second repository at deployment time, but not how to have that second repository trigger deployments, or how to handle the fact that the second repository is its own server. Additionally, it introduces a dependency between the two repositories, since the deploy.cmd is presumably under source control in the first repository.
EDIT: Virtual Directories
Thanks to @CtrlDot for pointing out that Virtual Directories are the way to solve (1). Still seeking guidance on (2) and (3).
I think the concept you are referring to is called Virtual Directories.
I'm not sure which VSTS task you are using to deploy, but based on the article provided, you should be able to configure it to target only the virtual directory you want to deploy to.
EDIT
Sorry for not being clearer. The AzureRmWebAppDeployment task has a parameter for the virtual application name. You would simply set it in your deployment pipeline for the API project (/api) and leave it blank for the main project.
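Conceptually, virtual applications split one site by URL path: a request is served by the application whose virtual path is the longest prefix of the request path, so /api/... goes to the API and everything else falls through to the root app. The Python sketch below models only that matching rule, not how App Service actually implements it, using a hypothetical two-app mapping.

```python
from __future__ import annotations

# Hypothetical mapping of virtual paths to the apps deployed into them.
APPS = {"/": "web", "/api": "api"}


def route(request_path: str, virtual_apps: dict[str, str]) -> str:
    """Pick the app mapped to the longest virtual path that prefixes request_path."""
    best_path: str | None = None
    best_app: str | None = None
    for vpath, app in virtual_apps.items():
        prefix = vpath.rstrip("/")  # "/" becomes "", which matches every path
        # Match whole segments only: "/api" serves "/api" and "/api/...", not "/apiary".
        if request_path == prefix or request_path.startswith(prefix + "/"):
            if best_path is None or len(prefix) > len(best_path):
                best_path, best_app = prefix, app
    if best_app is None:
        raise LookupError(f"No virtual application serves {request_path!r}")
    return best_app
```

With this rule, route("/api/users", APPS) returns "api", while "/apiary" still goes to the root app because matching works on whole path segments.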