AzureML Authentication for registering datasets using azureml.core.dataset - azure-machine-learning-service

I am trying to register a dataset programmatically using Azure CLI authentication.
What I tried
from azureml.core import Workspace, Datastore, Dataset
from azureml.core.authentication import AzureCliAuthentication

authentication = AzureCliAuthentication()
workspace = Workspace.from_config("config.json", auth=authentication)
store = Datastore.get(workspace, datastore_name)  # datastore_name and filePath are defined elsewhere
path = [(store, filePath)]
dataset = Dataset.Tabular.from_delimited_files(path=path)
I am logged in using azure-cli
az login
I am currently the owner of datastore_name, which is a Gen2 Data Lake instance in the same subscription/region as the Azure ML workspace.
Question
I get an interactive login prompt each time execution reaches the Dataset.Tabular.from_delimited_files line. How can I make it use the Azure CLI credentials?
I plan to use the same Python script as part of a CI/CD pipeline (Azure CLI task) to register the datasets across multiple workspaces.

You can also use cli_auth and pass it explicitly when constructing the Workspace:
from azureml.core import Workspace
from azureml.core.authentication import AzureCliAuthentication

cli_auth = AzureCliAuthentication()
ws = Workspace(
    subscription_id="your-sub-id",
    resource_group="your-resource-group-id",
    workspace_name="your-workspace-name",
    auth=cli_auth
)
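With a CLI-authenticated workspace, the dataset from the question can then be created and registered without an interactive prompt. A minimal sketch, assuming placeholder datastore, file path, and dataset names:
from azureml.core import Datastore, Dataset

store = Datastore.get(ws, "your-datastore-name")   # placeholder datastore name
path = [(store, "folder/data.csv")]                # placeholder file path
dataset = Dataset.Tabular.from_delimited_files(path=path)
dataset.register(workspace=ws, name="your-dataset-name", create_new_version=True)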

You can use the CLI command directly if you wish to use a script to register datasets in Azure ML using DevOps.
az ml dataset register [--file]
                       [--output-metadata-file]
                       [--path]
                       [--resource-group]
                       [--show-template]
                       [--skip-validation]
                       [--subscription-id]
                       [--workspace-name]
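For example, a sketch of an invocation from an Azure DevOps Azure CLI task, assuming a placeholder dataset specification file and placeholder workspace/resource group names:
az ml dataset register --file dataset-spec.json --workspace-name your-workspace-name --resource-group your-resource-group --skip-validation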

Related

azureml tabular dataset over azure gen2 datalake

What I have tried
Set up an AzureML DataStore using identity-based authentication.
Set up an AzureML Dataset for a single file under a specific file system.
from azureml.core import Workspace, Dataset

workspace = Workspace.from_config("config.json", auth=auth)
dataset = Dataset.get_by_name(workspace, 'engage_event_type')
frame = dataset.to_pandas_dataframe()
I am able to explore the dataset from the Azure portal and it displays the correct data.
However, when running the above, where auth is a service principal with the same rights as the Azure ML workspace instance, I get a stream of calls like the ones below, but no errors, exceptions, or completion.
The underlying data is < 10 KB.
Resolving access token for scope "https://datalake.azure.net//.default" using identity of type "SP".
Resolving access token for scope "https://datalake.azure.net//.default" using identity of type "SP".
I have tried running the script on local compute.
I have tried running the script on a compute instance.
Both gave the same issue.
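For reference, the auth mentioned above refers to service principal credentials; a minimal sketch of how such an auth object might be constructed with ServicePrincipalAuthentication (the tenant, client ID, and secret values are placeholders):
from azureml.core.authentication import ServicePrincipalAuthentication

auth = ServicePrincipalAuthentication(
    tenant_id="your-tenant-id",                       # placeholder
    service_principal_id="your-app-client-id",        # placeholder
    service_principal_password="your-client-secret"   # placeholder
)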

Azure passing secrets without depending on Azure Keyvaults

I am building an Azure ML Pipeline for batch scoring. In one step I need to access a key stored in the workspace's Azure Keyvault.
However, I want to strictly separate the authoring environment (responsible for creating the datasets, building the environment, building and running the pipeline) and the production environment (responsible for transforming data, running the prediction etc.).
Therefore, code in the production environment should be somewhat Azure agnostic. I want to be able to submit my inference script to Google Cloud Compute Instances, if needed.
Thus my question is:
What is the best practice for passing secrets to remote runs without having the remote script retrieve them from the keyvault itself?
Is there a way to have redacted environment variables or command line arguments?
Thanks!
Example of what I would like to happen:
# import all azure dependencies
secret = keyvault.get_secret("my_secret")
pipeline_step = PythonScriptStep(
    script_name="step_script.py",
    arguments=["--input_data", input_data, "--output_data", output_data],
    compute_target=compute,
    params={"secret": secret}  # This will create an env var on the remote?
)
pipeline = Pipeline(workspace, steps=[pipeline_step])
PipelineEndpoint.publish(...)
And within step_script.py:
# No imports from azureml!
secret = os.getenv("AML_PARAMETER_secret")
do_something(secret)
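For reference, one way environment variables can be set on a remote step today is through the step's run configuration. This is a minimal sketch assuming the v1 SDK (Environment and RunConfiguration); note that values set this way are not redacted, which is exactly the gap the question is about:
from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.steps import PythonScriptStep

run_config = RunConfiguration()
run_config.environment = Environment("batch-scoring-env")  # placeholder environment name
# Stored in plain text in the run definition, i.e. not redacted
run_config.environment.environment_variables["MY_SECRET"] = secret

pipeline_step = PythonScriptStep(
    script_name="step_script.py",
    arguments=["--input_data", input_data, "--output_data", output_data],
    compute_target=compute,
    runconfig=run_config
)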

How to create Azure databricks cluster using Service Principal

I have an Azure Databricks workspace, and I added a service principal to that workspace using the Databricks CLI. I have been trying to create a cluster using the service principal and am not able to figure it out. Can anyone help me?
I am able to create a cluster using my account, but I want to create it using the service principal and want it to be the owner of the cluster, not me.
Also, is there a way I can transfer the ownership of my cluster to the service principal?
First, answering the second question: no, you can't change the owner of the cluster.
To create a cluster that will have the service principal as its owner, you need to execute the creation operation under its identity. To do this, perform the following steps:
Prepare a JSON file with the cluster definition as described in the documentation (a minimal sketch is shown after these steps).
Set the DATABRICKS_HOST environment variable to the address of your workspace:
export DATABRICKS_HOST=https://adb-....azuredatabricks.net
Generate an AAD token for the service principal as described in the documentation and assign its value to the DATABRICKS_TOKEN or DATABRICKS_AAD_TOKEN environment variable (see docs).
Create the Databricks cluster using the databricks-cli, providing the name of the JSON file with the cluster specification (docs):
databricks clusters create --json-file create-cluster.json
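For reference, a minimal create-cluster.json could look roughly like the following; the cluster name, Spark version, node type, and sizes are placeholders to adjust for your workspace:
{
  "cluster_name": "sp-owned-cluster",
  "spark_version": "10.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 1,
  "autotermination_minutes": 30
}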
P.S. Another approach (really recommended) is to use the Databricks Terraform provider to script your Databricks infrastructure; it's used by a significant number of Databricks customers and is much easier to use compared with command-line tools.

Authenticate With Workspace

I have a Pipeline registered in my AML workspace. Now I would like to trigger a pipeline run from an Azure Notebook in the same Workspace.
In order to get a reference to the workspace object in the notebook I need to authenticate, e.g.
ws = Workspace.from_config()
However, InteractiveLoginAuthentication is blocked by my company's domain, and MsiAuthentication throws an error as well. ServicePrincipalAuthentication works, but how do I keep the secret safe? What is the preferred way of dealing with secrets in Azure Machine Learning service notebooks?

Generate Azure Databricks Token using Powershell script

I need to generate an Azure Databricks token using a PowerShell script.
I am done with the creation of Azure Databricks using an ARM template; now I am looking to generate a Databricks token using a PowerShell script.
Kindly let me know how to create a Databricks token using a PowerShell script.
The only way to generate a new token is via the API, which requires you to have a token in the first place.
Or use the web UI manually.
There are no official PowerShell commands for Databricks; there are some unofficial ones, but they still require you to generate a token manually first.
https://github.com/DataThirstLtd/azure.databricks.cicd.tools
Disclaimer: I'm the author of these.
UPDATE: these PowerShell commands can now authenticate using a service principal instead of a bearer token (or can generate a bearer token for you).
So right now there is no way to use the API directly after deploying an Azure Databricks workspace. I assume that you want to use it as part of a CI/CD pipeline, right? The reason is that you first need to manually create an API token, which you can then use for all subsequent API requests.
But I will investigate and keep you updated here!
Another option is to create it via Terraform:
https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/token
Mind you, it creates the token as whomever you ran az login as. So if you az login as yourself (when it spawns a browser asking who to log in as), that's who the token will be created as, assuming that user has permissions in the Databricks workspace and Contributor permissions (or a custom role; the Reader role doesn't grant the right permissions) on the resource group that houses the workspace.
You can always use az login -u username@email.com -p to log in as someone else (assuming that user doesn't have MFA), then run terraform init/plan/apply. Mind you, if you have backend storage, that user also has to have permissions to that backend storage as well so it can create/update any tfstate files stored there.
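As a rough sketch of the Terraform option linked above (the provider wiring, resource names, and token lifetime are assumptions to adapt):
provider "databricks" {
  azure_workspace_resource_id = azurerm_databricks_workspace.this.id
}

resource "databricks_token" "pat" {
  comment          = "token created by Terraform"
  lifetime_seconds = 86400
}

output "databricks_token_value" {
  value     = databricks_token.pat.token_value
  sensitive = true
}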
