How to create Azure Databricks cluster using Service Principal

I have an Azure Databricks workspace and I added a service principal to that workspace using the Databricks CLI. I have been trying to create a cluster using the service principal and am not able to figure it out. Can anyone help me?
I am able to create a cluster using my own account, but I want to create it using the Service Principal and want it to be the owner of the cluster, not me.
Also, is there a way I can transfer the ownership of my cluster to the Service Principal?

First, answering the second question - no, you can't change the owner of the cluster.
To create a cluster that will have the Service Principal as its owner, you need to execute the creation operation under its identity. To do this, perform the following steps (a rough sketch of the full flow is shown after the list):
Prepare a JSON file with the cluster definition as described in the documentation.
Set the DATABRICKS_HOST environment variable to the address of your workspace:
export DATABRICKS_HOST=https://adb-....azuredatabricks.net
Generate an AAD token for the Service Principal as described in the documentation and assign its value to the DATABRICKS_TOKEN or DATABRICKS_AAD_TOKEN environment variable (see docs).
Create the Databricks cluster using databricks-cli, providing the name of the JSON file with the cluster specification (docs):
databricks clusters create --json-file create-cluster.json
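For illustration, here is a minimal sketch of the steps above (preparing the JSON file, setting the host, and generating the AAD token), assuming the service principal's credentials are available in environment variables; the cluster definition values are placeholders, and 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the global Azure Databricks application ID used when requesting the token:

# Hypothetical minimal cluster definition - adjust names, versions and node types.
cat > create-cluster.json <<'EOF'
{
  "cluster_name": "sp-owned-cluster",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": { "min_workers": 1, "max_workers": 3 }
}
EOF

export DATABRICKS_HOST=https://adb-....azuredatabricks.net

# Log in as the service principal and request an AAD token for the Azure Databricks resource.
az login --service-principal -u "$ARM_CLIENT_ID" -p "$ARM_CLIENT_SECRET" -t "$ARM_TENANT_ID"
export DATABRICKS_TOKEN=$(az account get-access-token \
  --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d \
  --query accessToken -o tsv)

# Then run the databricks clusters create command from the last step above.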
P.S. Another approach (really recommended) is to use the Databricks Terraform provider to script your Databricks infrastructure - it's used by a significant number of Databricks customers, and is much easier to use than the command-line tools.
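A rough sketch of that approach, assuming the provider is already configured to authenticate as the service principal (cluster name, Spark version and node type below are placeholders):

# Minimal sketch of a cluster managed by the Databricks Terraform provider.
# Because the provider authenticates as the service principal, the cluster is created under that identity.
resource "databricks_cluster" "shared" {
  cluster_name            = "sp-owned-cluster"
  spark_version           = "10.4.x-scala2.12"
  node_type_id            = "Standard_DS3_v2"
  autotermination_minutes = 30

  autoscale {
    min_workers = 1
    max_workers = 3
  }
}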

Related

How to rename Databricks job cluster name during runtime

I have created an ADF pipeline with a Notebook activity. This notebook activity automatically creates Databricks job clusters with autogenerated job cluster names.
1. Rename Job Cluster during runtime from ADF
I'm trying to rename this job cluster with the process name (or another name) during runtime from ADF / the ADF linked service.
Instead of job-59, I want it to be replaced with <process_name>_
2. Rename ClusterName Tag
I want to replace the default generated ClusterName tag with the required process name.
Settings for the job can be updated using the Reset or Update endpoints.
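As an illustrative sketch of the Update endpoint (the job ID 59 and the new name are placeholders, and DATABRICKS_HOST/DATABRICKS_TOKEN are assumed to be set), this is roughly how a job's settings, such as its name, can be patched:

# Sketch only: rename an existing job via the Jobs 2.1 Update endpoint.
curl -X POST "$DATABRICKS_HOST/api/2.1/jobs/update" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"job_id": 59, "new_settings": {"name": "my_process_name"}}'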
Cluster tags allow you to easily monitor the cost of cloud resources used by various groups in your organization. You can specify tags as key-value pairs when you create a cluster, and Azure Databricks applies these tags to cloud resources like VMs and disk volumes, as well as DBU usage reports.
For detailed information about how pool and cluster tag types work together, see Monitor usage using cluster, pool, and workspace tags.
For convenience, Azure Databricks applies four default tags to each cluster: Vendor, Creator, ClusterName, and ClusterId.
These tags propagate to detailed cost analysis reports that you can access in the Azure portal.
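For instance, custom tags are just a key-value map in the cluster specification, applied alongside the default tags (a hypothetical fragment; the tag keys and values are placeholders):

{
  "cluster_name": "etl-cluster",
  "custom_tags": {
    "process": "my_process_name",
    "cost_center": "data-engineering"
  }
}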
Check out an example of how billing works.

GCP Service account key management and usage in Terraform

I am creating a CI/CD pipeline for Terraform so that my GCP resource creation is automated. But Terraform needs a service account to do the job. I created the service account and the key was downloaded to my machine, but what is the correct way to store it so that, when the Cloud Build pipeline runs, Terraform can pick it up and execute the scripts?
provider "google" {
credentials = file(var.cred_file)
project = var.project_name
region = var.region
}
Is it okay to store this file in a Cloud Storage bucket? Or are there better alternatives?
On GCP you have the bucket option to keep sensitive information, and you can use access control lists (ACLs) to define who has access to your buckets and objects. GCP offers several storage options, and the best one depends on your needs; just ensure that the option provides the security tools to keep your files safe. I think that once you have granted permissions to your Cloud Build service account, you can pass the path to the service account key in code.
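For example, a minimal sketch of a build step under those assumptions (bucket name, key file and variable name are placeholders, and the bucket is assumed to be locked down with ACLs/IAM):

# Fetch the service account key from a restricted bucket, then run Terraform
# pointing the provider's cred_file variable at it.
gsutil cp gs://my-secure-bucket/terraform-sa-key.json ./credentials.json
terraform init
terraform apply -var="cred_file=./credentials.json"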

connect AAD to existing AKS that has

Working with Azure, we started with AKS last year. On creation of the AKS clusters we use, we checked what needed to be done up front to enable RBAC at a later moment, and we then thought that setting 'rbac' to 'enabled' was the only thing we needed.
Now we're trying to implement RBAC integration of AKS with AAD, but I read some seemingly conflicting prerequisites. Some say that in order to integrate AAD and AKS, you need RBAC enabled at cluster creation. I believe we have set that correctly.
But then in the Azure docs, it is mentioned that you need to create a cluster and add some AAD-integration keys for the client and server applications.
My question is actually two-fold:
When people say you need RBAC enabled in your AKS cluster during creation, do they actually mean you should select the 'rbac:enabled' box AND make sure you create the AAD-related applications up front and also configure these during cluster creation?
Is there a way to set up the AKS-AAD RBAC connection on a cluster that has rbac:enabled but is missing the aadProfile configuration?
I believe we indeed need to re-create all our clusters, but I want to know for sure by asking here, as it's not 100% clear to me from what I've read online (also here at Stack Exchange) and it's going to be an awful lot of work.
For all of your requirements, you only need to make sure RBAC is enabled for your AKS cluster, and it can only be enabled at creation time. Then you can update the credentials of the existing AKS AAD profile like this:
CLI update command:
az aks update-credentials -g yourResourceGroup -n yourAKSCluster --reset-aad --aad-server-app-id appId --aad-server-app-secret appSecret --aad-client-app-id clientId --aad-tenant-id tenantId
1. Yes, that is correct.
2. No, there is no way of doing that; you need to recreate the cluster.

Azure Databricks move Log Analytics

Databricks VMs are pointing to the default Log Analytics workspace, but I want to point them to another one.
If I try to move the VMs to another workspace, it tells me that it's locked:
Error: cannot perform delete operation because following scope(s) are locked
Unfortunately, you are not allowed to move Log Analytics for the managed resource group created by Azure Databricks using the Azure portal.
Reason: By default, you cannot perform any write operation on the managed resource group created by Azure Databricks.
If you try to modify anything in the managed resource group, you will see this error message:
{"details":[{"code":"ScopeLocked","message":"The scope '/subscriptions/xxxxxxxxxxxxxxxx/resourceGroups/databricks-rg-chepra-d7ensl75cgiki' cannot perform write operation because following scope(s) are locked: '/subscriptions/xxxxxxxxxxxxxxxxxxxx/resourceGroups/databricks-rg-chepra-d7ensl75cgiki'. Please remove the lock and try again."}]}
Possible way: You can specify tags as key-value pairs when creating/modifying clusters, and Azure Databricks will apply these tags to cloud resources.
Possible way: Configure your Azure Databricks cluster to use the monitoring library.
This article shows how to send application logs and metrics from Azure Databricks to a Log Analytics workspace. It uses the Azure Databricks Monitoring Library.
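As a rough illustration of that setup (the init script path, secret scope and environment variable names below are assumptions based on the monitoring library's README, so check the library documentation), the cluster specification typically wires in an init script plus Spark environment variables pointing at the target Log Analytics workspace:

"spark_env_vars": {
  "LOG_ANALYTICS_WORKSPACE_ID": "{{secrets/monitoring/log-analytics-workspace-id}}",
  "LOG_ANALYTICS_WORKSPACE_KEY": "{{secrets/monitoring/log-analytics-workspace-key}}"
},
"init_scripts": [
  { "dbfs": { "destination": "dbfs:/databricks/spark-monitoring/spark-monitoring.sh" } }
]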
Hope this helps.

Generate Azure Databricks Token using PowerShell script

I need to generate an Azure Databricks token using a PowerShell script.
I am done with the creation of Azure Databricks using an ARM template; now I am looking to generate a Databricks token using a PowerShell script.
Kindly let me know how to create a Databricks token using a PowerShell script.
The only way to generate a new token is via the API, which requires you to have a token in the first place.
Or use the Web UI manually.
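For reference, the raw API call in PowerShell could look roughly like this (a sketch; $existingToken must already hold a valid token, and the workspace URL, lifetime and comment are placeholders):

# Sketch: create a new PAT via the Token API, authenticating with an existing token.
$workspaceUrl = "https://adb-....azuredatabricks.net"   # your workspace URL
$headers = @{ Authorization = "Bearer $existingToken" }
$body    = @{ lifetime_seconds = 3600; comment = "generated-from-script" } | ConvertTo-Json
Invoke-RestMethod -Method Post -Uri "$workspaceUrl/api/2.0/token/create" `
    -Headers $headers -Body $body -ContentType "application/json"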
There are no official PowerShell commands for Databricks; there are some unofficial ones, but they still require you to generate a token manually first.
https://github.com/DataThirstLtd/azure.databricks.cicd.tools
Disclaimer: I'm the author of these.
UPDATE: these PowerShell commands can now authenticate using a service principal instead of a bearer token (or can generate a bearer token for you).
So right now there is no way to use the API directly after deploying an Azure Databricks Workspace. I assume that you want to use it as part of a CI/CD pipeline - right? The reason is that you first need to manually create an API token, which you can then use for all subsequent API requests.
But I will investigate and keep you updated here!
Another option is to create it via Terraform.
https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/token
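A minimal sketch of that resource (the comment and lifetime are placeholders, and the provider must already be configured to authenticate against the workspace):

# Sketch: a personal access token managed by the Databricks Terraform provider.
resource "databricks_token" "pat" {
  comment          = "Terraform-managed token"
  lifetime_seconds = 8640000
}

# The generated value is exposed as a sensitive attribute.
output "databricks_token_value" {
  value     = databricks_token.pat.token_value
  sensitive = true
}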
Mind you, it creates the token as whomever you az login'd as. So if you az login as yourself (when it spawns a browser asking who to log in as), that's who the token will be created as, assuming that user has permissions in the Databricks workspace and contributor permissions (or a custom read role; the Reader role doesn't grant the right permissions) on the resource group that houses the workspace.
You can always use az login -u username@email.com -p to log in as someone else, assuming that user doesn't have MFA, and then run the terraform init/plan/apply. Mind you, if you have backend storage, that user also has to have permissions on that backend storage so it can create/update any tfstate files stored there.
