How can I do CI/CD of a Databricks notebook in Azure DevOps?

I want to do CI/CD of my Databricks notebook. Steps I followed:
I have integrated my Databricks workspace with Azure Repos.
Created a build artifact using a YAML script which holds my notebook.
Deployed the build artifact into the Databricks workspace via YAML.
Now I want to:
Execute and schedule the Databricks notebook from the Azure DevOps pipeline itself.
How can I set up multiple environments like Dev, Stage, and Prod using YAML?
My notebook itself calls other notebooks. Can I do this?
How can I solve this?

It's doable, and with Databricks Repos you really don't need to create a build artifact & deploy it - it's better to use the Repos API or the databricks repos CLI to update another checkout that will be used for tests.
For testing of notebooks I always recommend the Nutter library from Microsoft, which simplifies testing of notebooks by letting you trigger their execution from the command line.
You can include other notebooks using the %run directive (e.g. %run ./shared/setup) - it's important to use relative paths instead of absolute paths. You can organize dev/staging/prod either as folders inside Repos, or as fully separated environments - it's up to you.
I have a demo of notebook testing & Repos integration with CI/CD - it contains all the necessary instructions on how to set up dev/staging/prod plus an Azure DevOps pipeline that will test the notebook & trigger a release pipeline.
The only thing that I want to mention explicitly: for Azure DevOps you will need to use an Azure DevOps personal access token, because identity passthrough doesn't work with the APIs yet.
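As a rough illustration, a test stage in such a pipeline could look like the sketch below. This is a minimal sketch, assuming a pre-existing /Repos/Staging/project checkout, a running cluster, and secret pipeline variables for the host, token, and cluster id - all of these names are placeholders, not taken from the demo:

```yaml
# Update the staging checkout to the branch under test, then run Nutter tests.
- script: |
    pip install databricks-cli nutter
    # Point the staging checkout at the branch being built (assumed path)
    databricks repos update --path /Repos/Staging/project --branch $(Build.SourceBranchName)
    # Execute all test notebooks in the checkout on an existing cluster
    nutter run /Repos/Staging/project/tests/ --cluster_id $(CLUSTER_ID) --recursive
  env:
    DATABRICKS_HOST: $(DATABRICKS_HOST)    # workspace URL
    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)  # Azure DevOps personal access token (secret variable)
  displayName: 'Run Nutter tests against staging checkout'
```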

Related

Automate deploying Synapse artifacts to a DevOps repo

I'm trying to deploy some Synapse artifacts to a Synapse workspace with DevOps repo integration via a Python runbook. When using the azure-synapse-artifacts library of the Python Azure SDK, the artifacts are published directly to the live mode of the Synapse workspace. Is there any way to deploy artifacts to a DevOps repo branch for Synapse? I didn't find any DevOps repo APIs or libraries, just ones for the direct integration of Git.
We can use CI/CD in this case, as this process will help to move entities from one environment to another; for this we need to configure our Synapse workspace with Git as its source.
Below are a few straightforward steps we can follow (a hedged sketch of the commit step follows the list):
Set up the Azure Synapse workspace and configure a pipeline in Azure DevOps.
Under staging, while creating the DevOps project, select Add Artifacts and select GIT.
Configure the workflow file and add the workflow.
You can refer to the MS Docs for a detailed explanation of each step in achieving this task.
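If the goal is specifically to land artifact JSON on a DevOps repo branch (rather than publishing to live mode), one workaround is to commit the generated JSON with plain git from a pipeline step, since Git-mode Synapse workspaces store each artifact as a JSON file in the repo. Everything below - the branch name, folder layout, and file name - is an assumption:

```yaml
# Commit a generated Synapse artifact JSON file into the Git-integrated repo.
- script: |
    git checkout main                                   # assumed collaboration branch
    # Git-mode Synapse keeps artifacts as JSON in per-type folders (pipeline/, notebook/, ...)
    cp $(Build.ArtifactStagingDirectory)/my_pipeline.json pipeline/my_pipeline.json
    git add pipeline/my_pipeline.json
    git -c user.name="ci-bot" -c user.email="ci@example.com" commit -m "Add Synapse pipeline artifact"
    # push auth (e.g. via System.AccessToken) must be configured separately
    git push origin main
  displayName: 'Commit Synapse artifact JSON to repo branch'
```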

Configure an Azure DevOps repository in Databricks through an ARM template or PowerShell

I am looking for a sample ARM template that can set up my Azure DevOps repository in Azure Databricks. This will help me deploy my master branch directly to the ADB workspace.
I tried to do it manually in the portal and it works, but the Repos path for the notebooks shows my email_id, which is not good in production.
I want to configure it through PowerShell or an ARM template while creating the Databricks workspace. I am facing the same problem with Azure Data Factory as well.
Please help me resolve it.
It's not possible as of today - there is no API for creating a checkout. It will be possible only when Databricks Repos starts to provide a corresponding API for creating checkouts of repositories, not only the "Update checkout" API that is available right now.
If you're concerned about the checkout being created in your own folder, you can just create a folder inside Repos, call it something like "Production", and then do the checkout inside that folder (see my demo of Repos with Azure DevOps for screenshots).
To deploy notebooks from your master branch to another workspace, I would recommend triggering a deployment pipeline from the master branch onto the target Databricks workspace.
That way, there is no need to set up Repos in the target environment.
You use Repos in your development workspace (with your email in the path).
You commit to the branch you work on and eventually merge / PR to master.
Once on the master branch, a DevOps pipeline is triggered and deploys the notebooks to your target workspace at the path you want.
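A minimal sketch of such a deployment pipeline, assuming the notebooks live under notebooks/ in the repo and should land under /Production in the target workspace (both paths and the variable names are placeholders):

```yaml
# Deploy notebooks to the target workspace whenever master is updated.
trigger:
  branches:
    include:
      - master

steps:
  - script: |
      pip install databricks-cli
      # Recursively import the repo's notebooks into the target workspace
      databricks workspace import_dir notebooks /Production --overwrite
    env:
      DATABRICKS_HOST: $(DATABRICKS_HOST)    # target workspace URL
      DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)  # PAT stored as a secret variable
    displayName: 'Deploy notebooks to target workspace'
```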

Azure Databricks CI/CD pipeline to delete notebooks in production

I have a CI/CD pipeline in place to deploy notebooks from dev to production in an Azure Databricks workspace.
However, it is not deleting the notebooks from production when those notebooks have been removed from development and are no longer in the Azure Git repository.
I want to delete all notebooks which have been removed from source, as a part of build/release process.
Is there a way to achieve this?
The easiest way: when there are new commits in the Azure DevOps Git repository, you can redeploy the notebooks with the Clean Workspace Folder option checked in the release task.
Otherwise, you could add a PowerShell script task to compare the files in the two folders. The following case may give you a start: Comparing folders and content with PowerShell
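As a rough sketch of the second approach, the step below lists what is deployed in the workspace, compares it to the checked-out source, and removes anything no longer in the repo. The folder paths, the .py extension, and the variable names are all assumptions:

```yaml
# Delete production notebooks that no longer exist in the source repo.
- pwsh: |
    pip install databricks-cli
    $deployed = databricks workspace ls /Production        # one notebook name per line
    # Source files are assumed to be .py exports; workspace names carry no extension
    $inSource = Get-ChildItem notebooks -Name | ForEach-Object { $_ -replace '\.py$', '' }
    foreach ($nb in $deployed) {
      if ($inSource -notcontains $nb) {
        databricks workspace rm "/Production/$nb"          # remove the orphaned notebook
      }
    }
  env:
    DATABRICKS_HOST: $(DATABRICKS_HOST)
    DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
  displayName: 'Delete notebooks removed from source'
```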

How do I version control Azure ML workspaces with custom environments and pipelines?

I'm trying to figure out how viable Azure ML is in production; I would like to accomplish the following:
Specify custom environments for my pipelines using a pip file and use them in a pipeline
Declaratively specify my workspace, environments and pipelines in an Azure DevOps repo
Reproducibly deploy my Azure ML workspace to my subscription using an Azure DevOps pipeline
I found an explanation of how to specify environments using notebooks, but this seems ill-suited for my second and third requirements.
Currently, we have a Python script, pipeline.py, that uses the azureml-sdk to create, register, and run all of our ML artifacts (envs, pipelines, models). We call this script in our Azure DevOps CI pipeline with a Python Script task, after building the right pip env from the requirements file in our repo. A sketch of that CI step follows the doc links below.
However, it is worth noting there is YAML support for ML artifact definition, though I don't know if the existing support will cover all of your bases (that is the plan, though).
Here are some great docs from MSFT to get you started:
GitHub Template repo of an end-to-end example of ML pipeline + deployment
How to define/create an environment (using Pip or Conda) and use it in a remote compute context
Azure Pipelines guidance on CI/CD for ML Service
Defining ML pipelines in YAML
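For reference, the CI step described above might look roughly like this in Azure DevOps YAML; pipeline.py is the script from our repo, everything else is a placeholder:

```yaml
# Build the pip environment from the repo, then run the registration script.
steps:
  - script: pip install -r requirements.txt
    displayName: 'Install dependencies from requirements file'
  - task: PythonScript@0
    inputs:
      scriptSource: 'filePath'
      scriptPath: 'pipeline.py'   # creates, registers, and runs the ML artifacts
    displayName: 'Run pipeline.py with azureml-sdk'
```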

Is it possible to execute an automation Maven project from an Azure DevOps pipeline?

I have built an automation framework using Java, Selenium WebDriver, Maven, and TestNG. Currently, I am using Jenkins for the pipeline and CI.
Now the new requirement assigned to me is to use Azure DevOps as the CI tool and execute all tests from there instead of Jenkins.
After some research, I have found the following:
Upload the code to GitHub or another Azure-supported repo, and create a pipeline.
Write your Java code using Visual Studio Code, and then it will be far easier to execute from Azure DevOps.
Is there any better way to do this?
You need to follow the steps below. The main effort is tool integration, if those tools are not already present in the Azure DevOps portal:
1. I am not sure which code repository you are using; however, if it is not one supported by Azure DevOps, you need to integrate it with the Azure DevOps portal.
2. Create an agent pool in Azure DevOps with the same configuration as your Jenkins agent.
3. Create a build pipeline in Azure DevOps. It will ask for your repository name; give the same one.
4. While creating the pipeline, it will ask whether to create an Azure Pipelines YAML file. Say "Yes" and it will create a sample YAML file in the code repository.
5. Open the YAML file, set your agent pool name where indicated, and under the "steps" section list everything needed to run your test cases - the same things you would put in a Jenkins pipeline under stages -> steps (e.g. shell ''' '''). A minimal sketch follows this list.
6. Save the YAML and run it. You are done.
NOTE: The main thing is the configuration of the agent pool; make sure it has all the required software tools (except the Jenkins agent JAR :) ).
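A minimal azure-pipelines.yml sketch for such a project; the pool name and the testng.xml path are assumptions, not requirements:

```yaml
# Run the TestNG suite with Maven on a self-hosted agent pool.
trigger:
  - main

pool: 'SeleniumAgents'    # your agent pool, configured like the Jenkins agent

steps:
  - task: Maven@3
    inputs:
      mavenPomFile: 'pom.xml'
      goals: 'test'
      options: '-DsuiteXmlFile=testng.xml'            # select the TestNG suite
      publishJUnitResults: true
      testResultsFiles: '**/surefire-reports/TEST-*.xml'
    displayName: 'Run TestNG tests with Maven'
```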
