How to use the same pipeline in different environments with a varying number of customers in Azure Data Factory?

I have a Copy Data pipeline in Azure Data Factory. I need to deploy the same Data Factory instance to multiple environments such as DEV, QA, and PROD using a release pipeline.
The pipeline transfers data from a customer Storage Account (Blob container) to a centralized Data Lake, so it is a many-to-one flow (many customers > one Data Lake).
Now, suppose I am in the DEV environment and have one demo customer there. I have defined an ADF pipeline for Copy Data. But in the PROD environment the number of customers will grow, and I don't want to create multiple copies of the same pipeline in the production Data Factory.
I am looking for a solution that lets me keep a single copy pipeline in the Data Factory and deploy/promote the same Data Factory from one environment to the next, even when the number of customers varies between environments.
I am also doing CI/CD in Azure Data Factory using Git integration with Azure Repos.

You will have to create additional linked services and datasets that do not exist in the non-production environments, so that every new "customer" storage account is mapped to the pipeline.
With CI/CD routines you can deliver this incrementally, i.e. parameterize your release pipeline with variable groups and update the Data Factory instance with new pipelines, datasets and linked services as customers are added.
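Alternatively, the copy pipeline itself can stay generic if the storage account is supplied at runtime. Below is a minimal sketch of a parameterized Blob Storage linked service; the names (CustomerBlobStorage, accountName) are illustrative and the authentication settings are omitted, so treat it as a starting point rather than a drop-in definition.

```json
{
  "name": "CustomerBlobStorage",
  "properties": {
    "type": "AzureBlobStorage",
    "parameters": {
      "accountName": { "type": "string" }
    },
    "typeProperties": {
      "serviceEndpoint": "https://@{linkedService().accountName}.blob.core.windows.net"
    }
  }
}
```

A dataset built on this linked service forwards the accountName parameter, and a single pipeline can then loop over the list of customer accounts (for example from a Lookup activity or a pipeline parameter) with a ForEach that runs the Copy activity once per customer, so only the customer list changes between environments.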

Related

Azure DevOps library variable values to be passed to an Azure Data Factory pipeline

I am trying to pass the following variables from the Azure DevOps library as variables to the Data Factory pipeline. For some reason, the variables are not populating in the Data Factory pipeline.
[screenshot]
When I check the pipeline I don't see the variables populated.
So, is my understanding correct that if we provide variables (e.g. moris#aix.com, lowes#aix.com, and so on), they will be populated in the Data Factory pipeline as variables too? If yes, then I am unable to figure out why they are not getting updated with the variables in the Data Factory.
[screenshot]
From the images it seems to me that you are mixing up two types of pipelines:
Azure DevOps pipelines
Azure Data Factory pipelines
The first type (Azure DevOps) can be used to deploy an Azure Data Factory resource, and you can include Azure DevOps variable groups. You could then propagate these variables as global parameters in the Data Factory and subsequently use them in Data Factory pipeline activities.
To answer your questions: Yes, it is possible to populate Data Factory pipelines from Azure DevOps variable groups, but it does not happen automatically.
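One common way to do that propagation, sketched below under the assumption that the factory's global parameters are included in the exported ARM template: the release pipeline overrides the generated ARM template parameter with a value from the variable group. The parameter name shown is only illustrative (the real name is generated by ADF when the template is exported), and $(customerEmail) stands for a hypothetical variable-group variable.

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": { "value": "my-data-factory" },
    "dataFactory_properties_globalParameters_customerEmail_value": {
      "value": "$(customerEmail)"
    }
  }
}
```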
Edit: For this use case (email addresses) I would rather use a parameterized Lookup activity in Data Factory, which reads the email addresses from a .csv file in a storage account.
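A rough sketch of what that Lookup activity could look like, assuming a DelimitedText dataset named EmailListCsv already points at the .csv file (both names are made up for illustration):

```json
{
  "name": "LookupEmailAddresses",
  "type": "Lookup",
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "dataset": { "referenceName": "EmailListCsv", "type": "DatasetReference" },
    "firstRowOnly": false
  }
}
```

Downstream activities can then read the rows via @activity('LookupEmailAddresses').output.value, for example inside a ForEach.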

Can I change the ARM template of an existing Data Factory?

I have a GitHub-integrated Dev Data Factory and a Master Data Factory in Azure. Now I want to use the Master Data Factory's ARM template on the Dev Data Factory. Both of them have a set of common datasets, linked services, integration runtimes and triggers; there are also a few extra pipelines and datasets in the Master.
So if I export the Master DF's ARM template and use it on the Dev DF, will there be any overwrites/new creation of datasets/pipelines/integration runtimes? Is it possible to do it this way?
I would suggest you use Git integration.
Then create a pull request on the Master DF, and approve and merge it.
This way you will get all linked services and datasets of the Master DF into the Dev DF.
For more information, follow this official documentation.

Clone pipelines from one ADF to another

I need to clone existing pipelines (pipeline count: 10-20) from one subscription to another subscription (another ADF). Is there any way to do this using Azure DevOps?
Option 1:
Using Git configuration, you can publish the data factory to a Git branch. Connect your new data factory to the same repository and build from that branch. Resources such as pipelines, datasets, and triggers will carry through. You can then delete the pipelines that are not used.
Option 2:
You can manually copy the code (JSON) of each pipeline, dataset and linked service and use the same code in the new data factory. (Use the same names when creating the pipelines/datasets/linked services.)
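For reference, the JSON you copy for a pipeline has roughly this shape (a trimmed sketch of a single-Copy-activity pipeline; the dataset names SourceDataset and SinkDataset and the binary source/sink types are placeholders for whatever your pipeline actually uses):

```json
{
  "name": "CopyCustomerFiles",
  "properties": {
    "activities": [
      {
        "name": "CopyFiles",
        "type": "Copy",
        "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "SinkDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "BinarySource" },
          "sink": { "type": "BinarySink" }
        }
      }
    ]
  }
}
```

Because the JSON only references datasets and linked services by name, those referenced objects must exist (with the same names) in the target factory before the pipeline will validate.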

How to sequence pipeline execution trigger in Azure Data Factory

I am working on a migration project where I have a few SQL Server Integration Services projects that will be moved to Azure Data Factory. While I go through this, we have a few jobs scheduled via SQL Server Agent which have multiple steps. If I were to replicate the same using Azure Data Factory triggers, is there a way to group multiple pipelines together and sequence their execution, like the multiple job steps we have in SQL Server Agent?
For instance:
Load all of the lookup tables
Load all of the staging tables
Load all of the dimension tables
Load Fact table
Please guide me in the right direction.
You can use the Execute Pipeline activity to build a master pipeline that runs your other pipelines in order, e.g. the sketch below.
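Here is a hedged sketch of such a master pipeline, assuming child pipelines named LoadLookups, LoadStaging, LoadDimensions and LoadFacts already exist (the names are illustrative); each dependsOn entry enforces the same step ordering as a multi-step SQL Server Agent job:

```json
{
  "name": "MasterLoad",
  "properties": {
    "activities": [
      {
        "name": "RunLoadLookups",
        "type": "ExecutePipeline",
        "typeProperties": {
          "pipeline": { "referenceName": "LoadLookups", "type": "PipelineReference" },
          "waitOnCompletion": true
        }
      },
      {
        "name": "RunLoadStaging",
        "type": "ExecutePipeline",
        "dependsOn": [
          { "activity": "RunLoadLookups", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "pipeline": { "referenceName": "LoadStaging", "type": "PipelineReference" },
          "waitOnCompletion": true
        }
      },
      {
        "name": "RunLoadDimensions",
        "type": "ExecutePipeline",
        "dependsOn": [
          { "activity": "RunLoadStaging", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "pipeline": { "referenceName": "LoadDimensions", "type": "PipelineReference" },
          "waitOnCompletion": true
        }
      },
      {
        "name": "RunLoadFacts",
        "type": "ExecutePipeline",
        "dependsOn": [
          { "activity": "RunLoadDimensions", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
          "pipeline": { "referenceName": "LoadFacts", "type": "PipelineReference" },
          "waitOnCompletion": true
        }
      }
    ]
  }
}
```

A single trigger on the master pipeline then replaces the SQL Server Agent job schedule.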

Copy Data Factory selected pipelines from Azure one subscription to other

I have a data pipeline in Azure Data Factory which copies files from an AWS S3 bucket to Azure Data Lake Gen2. To build this pipeline I created various resources: Azure Data Lake Gen2 storage, a file system in ADLS with specific permissions, the Data Factory, a source dataset which connects to the S3 bucket, and a target dataset which connects to an ADLS Gen2 folder.
All of these were created in a Dev subscription in Azure, but now I want to deploy these resources to a Prod subscription with the least manual effort. I tried the ARM template approach, but it does not allow me to selectively choose the pipelines to migrate: it copies everything in the data factory, which I don't want, since I may have other pipelines that are still in development and should not be migrated to Prod. I tried the PowerShell approach too, which also has some limitations.
I would need expert advice on the best way to migrate the code from one subscription to the other.
