Clone pipelines from one ADF to another - Azure

I need to clone existing pipelines (10-20 of them) from one subscription to a data factory in another subscription. Is there any way to do this using Azure DevOps?

Option 1:
Using Git configuration, you can publish the data factory to a Git branch. Connect your new data factory to the same repository and build from that branch; resources such as pipelines, datasets, and triggers will carry over. You can then delete any pipelines you do not need.
Option 2:
You can manually copy the JSON code of each pipeline, dataset, and linked service and use the same code in the new data factory. (Use the same names when creating the pipelines/datasets/linked services.)
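If you prefer to script Option 2 rather than copy the JSON by hand, below is a minimal sketch using the azure-mgmt-datafactory Python SDK. The subscription IDs, resource groups, and factory names are placeholders, and any linked services and datasets the pipelines reference would need to be copied (or created) in the target factory first.

```python
# Sketch: copy pipeline definitions from one data factory to another across
# subscriptions. All names are placeholders; linked services and datasets
# referenced by the pipelines must already exist in the target factory.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
source = DataFactoryManagementClient(credential, "<source-subscription-id>")
target = DataFactoryManagementClient(credential, "<target-subscription-id>")

SRC_RG, SRC_ADF = "rg-source", "adf-source"
TGT_RG, TGT_ADF = "rg-target", "adf-target"

for pipeline in source.pipelines.list_by_factory(SRC_RG, SRC_ADF):
    # Re-create each pipeline in the target factory under the same name.
    target.pipelines.create_or_update(TGT_RG, TGT_ADF, pipeline.name, pipeline)
    print(f"Copied pipeline: {pipeline.name}")
```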

Related

Running Code in Azure Repo from Azure Data Factory

Main problem:
I need to orchestrate the run of Python scripts using an Azure Data Factory pipeline.
What I have tried:
Databricks: The problem with this solution is that it is costly, a little slow (clusters need to be spun up), and it requires me to write my code in a notebook.
Batch activity from ADF: It is also costly and a little slow. I don't have to write my code in a notebook, but I do have to manually upload it to a storage account, which is not great when debugging or updating.
My question:
Is there a way to run code from an Azure Repos (or GitHub) repository directly from Data Factory? Like the Batch activity, but reading the code from the repo itself instead of from a storage account?
Thanks for your help
Based on the document "Pipelines and activities in Azure Data Factory", Azure Repos and GitHub repositories are not supported source or sink data stores for ADF pipelines, so it is not possible to run code directly from a Git repository within an ADF pipeline.
However, ADF has a source control option that lets you configure a Git repository with either Azure Repos or GitHub. You can then set up CI/CD pipelines in Azure DevOps that integrate with ADF, and those CI/CD pipelines can run code directly from the Git repository.
For more details, see the document "CI/CD in ADF".
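As a rough illustration of that workaround (not something ADF supports natively): the Azure DevOps pipeline checks out the repository and runs the script from there, and the script can then hand off to an ADF pipeline via the azure-mgmt-datafactory SDK. The resource group, factory, and pipeline names below are placeholders.

```python
# Sketch: a script that lives in the repo, run by an Azure DevOps pipeline
# step after checkout, which then (optionally) triggers a downstream ADF
# pipeline. Names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

def main():
    # ... the actual Python logic stored in the repo goes here ...

    # Optionally kick off an ADF pipeline once the script has finished.
    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    run = adf.pipelines.create_run("rg-demo", "adf-demo", "pl_downstream")
    print(f"Started ADF pipeline run: {run.run_id}")

if __name__ == "__main__":
    main()
```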

Can I change the ARM template of an existing data factory?

I have a GitHub-integrated Dev data factory and a Master data factory in Azure. Now I want to use the Master data factory's ARM template on the Dev data factory. Both of them have a set of common datasets, linked services, integration runtimes, and triggers; the Master also has a few extra pipelines and datasets.
So if I export the Master DF's ARM template and deploy it to the Dev DF, will any datasets/pipelines/integration runtimes be overwritten or newly created? Is it possible to do it this way?
I would suggest you use Git integration.
Then create a pull request from the Master DF's branch, and approve and merge it.
This way you will get all the linked services and datasets of the Master DF into the Dev DF.
For more information, follow the official documentation.

How to use the same pipeline in different environments with varying number of customers inside Azure Data Factory?

I have a Copy Data pipeline in Azure Data Factory. I need to deploy the same Data Factory instance to multiple environments, such as DEV, QA, and PROD, using a release pipeline.
The pipeline transfers data from a customer's storage account (blob container) to a centralized data lake, so it is a many-to-one flow (many customers > one data lake).
Now, suppose I am in the DEV environment and have one demo customer there, with an ADF Copy Data pipeline defined for it. In the PROD environment the number of customers will grow, and I don't want to create multiple copies of the same pipeline in the production Data Factory.
I am looking for a solution that lets me keep a single copy pipeline in the Data Factory and deploy/promote the same Data Factory from one environment to the next, even when the number of customers varies between environments.
I am also doing CI/CD in Azure Data Factory using Git integration with Azure Repos.
You will have to create the additional linked services and datasets that do not exist in the non-production environment, so that every new "customer" storage account is mapped to the pipeline instance.
With CI/CD routines, you can deliver this incrementally, i.e. parameterize your release pipeline with variable groups and update the data factory instance with newer pipelines and new datasets/linked services.
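Another common pattern, sketched below, is to keep a single parameterized Copy Data pipeline and trigger one run per customer, passing the customer-specific values in as pipeline parameters. The parameter name customerContainer, the customer list, and the resource names are all illustrative; in practice the list would come from a variable group, config file, or control table per environment.

```python
# Sketch: one parameterized pipeline, one run per customer. Assumes the
# pipeline exposes a "customerContainer" parameter (illustrative name) that
# the copy activity's source dataset uses.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

customers = ["customer-a", "customer-b", "customer-c"]  # grows per environment

for customer in customers:
    run = adf.pipelines.create_run(
        "rg-env", "adf-env", "pl_copy_to_datalake",
        parameters={"customerContainer": customer},
    )
    print(f"Triggered copy for {customer}: run id {run.run_id}")
```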

What service is used for triggering an Azure Pipeline that is assigned a Machine Learning task?

I have an SVM model trained on a dataset stored as a CSV uploaded as a blob in blob storage. How can I update the CSV, and how can that change be used to trigger the pipeline that retrains the ML model?
If you mean triggering the build/release pipeline in Azure DevOps, then you need to set up CI/CD triggers for the build/release pipeline, so that the pipeline is triggered when a new commit/changeset is pushed to the repository.
In your scenario it seems you stored the CSV file in blob storage rather than in a normal repository, so you cannot trigger the pipeline directly.
However, as a workaround, you can create a new build pipeline (e.g. Pipeline A) that runs commands/scripts in a command-line task to update the CSV file, and then have Pipeline A trigger another pipeline (Pipeline B). That way, Pipeline B is triggered whenever Pipeline A has updated the CSV file successfully.
I'm not familiar with Machine Learning, but the following articles may help:
Machine Learning DevOps (MLOps) with Azure ML
Enabling CI/CD for Machine Learning projects with Azure Pipelines
If you don't want the CSV upload to happen in a pipeline, you can write an Azure Function or an Azure Logic App instead. Those can be triggered on blob creations or changes. Inside, you could make a REST call to either start your pipeline (see api-for-automating-azure-devops-pipelines) or retrain your model.
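A minimal sketch of that idea, assuming an Azure Function with a blob trigger and a personal access token stored in an app setting (AZDO_PAT); the organization, project, pipeline ID, and API version are placeholders to adjust for your setup.

```python
# Sketch: blob-triggered Azure Function that calls the Azure DevOps REST API
# to queue a pipeline run whenever the CSV blob is created or updated.
# Organization/project/pipeline id and the AZDO_PAT app setting are placeholders.
import logging
import os

import requests
import azure.functions as func

ORG, PROJECT, PIPELINE_ID = "my-org", "my-project", 42  # placeholders

def main(myblob: func.InputStream):
    logging.info("Blob changed: %s (%d bytes)", myblob.name, myblob.length)

    url = (f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/pipelines/"
           f"{PIPELINE_ID}/runs?api-version=6.0-preview.1")
    # Basic auth with an empty username and a PAT as the password.
    resp = requests.post(url, json={}, auth=("", os.environ["AZDO_PAT"]))
    resp.raise_for_status()
    logging.info("Queued retraining pipeline run: %s", resp.json().get("id"))
```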

Steps to Clone an Azure Resource

I have a very simple requirement: set up an Azure dev resource in the same subscription as the prod environment. This dev resource would be a one-to-one clone of a single Azure Data Factory resource.
Is there a way to simply clone the Azure Data Factory resource within the same subscription and rename it to a "...-dev" version?
(At present, I do not see a simple clone function...)
There is no clone option, but you can export the automation script and re-run it to replicate the resource. If you want to automate many similar deployments in Azure, look into ARM templates.
Keep in mind data factory has two key aspects:
The actual Data Factory service instance
Your Data Factory pipelines (i.e. the data workflows you write in Data Factory)
Both can be automated and deployed using ARM templates. The second can also be linked to source control, where you can then clone and reuse the definition files.
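As a rough illustration of the ARM-template route (a sketch, not an exact recipe): export the factory's ARM template, then redeploy it in incremental mode under a "-dev" name. The template path, the "factoryName" parameter, and the resource names below are assumptions that depend on how your template was exported.

```python
# Sketch: redeploy an exported ADF ARM template as a "-dev" copy in the same
# subscription. template.json and the "factoryName" parameter are assumptions;
# match them to your actual exported template.
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

with open("template.json") as f:
    template = json.load(f)

deployment = Deployment(properties=DeploymentProperties(
    mode="Incremental",  # incremental mode adds/updates resources, never deletes
    template=template,
    parameters={"factoryName": {"value": "myfactory-dev"}},
))

poller = client.deployments.begin_create_or_update(
    "my-resource-group", "adf-dev-clone", deployment)
result = poller.result()
print("Provisioning state:", result.properties.provisioning_state)
```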
