I am trying to connect Azure Databricks to Azure Data Lake Storage Gen2, and I am not able to get the client ID, secret scope, and key to match up.
I have data in an Azure Data Lake Gen2 account. I am trying to follow these instructions:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake-gen2.html#requirements-azure-data-lake
I have created a service principal with the role "Storage Blob Data Contributor" and obtained its client ID and secret.
I have created secret scopes in both Azure Key Vault and Databricks, with keys and values.
When I try the code below, authentication fails to recognize the secret scope and key. It is not clear to me from the documentation whether it is necessary to use the Azure Key Vault secret scope or the Databricks one.
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<CLIENT-ID>",
  // dbutils.secrets.get takes the scope name and the key (name) of the secret within that scope
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "<SCOPE-NAME>", key = "<KEY-VALUE>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/XXXXXXXXXX/oauth2/token")
If anybody could help on this, please advise/confirm:
What should CLIENT-ID be? I understood it to come from the storage account.
Where should SCOPE-NAME and KEY-VALUE be created: in Azure Key Vault or in Databricks?
The XXXXXXXXXX in https://login.microsoftonline.com/XXXXXXXXXX/oauth2/token should be your tenant ID (get this from the Azure Active Directory tab in the portal > Properties > Directory ID).
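You can also fetch it with the Azure CLI:

az account show --query tenantId --output tsv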
The Client ID is the Application ID / Service Principal ID (sadly these names are used interchangeably in the Azure world - but they are all the same thing).
If you have not created a service principal yet follow these instructions: https://learn.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app#register-your-application-with-an-azure-ad-tenant - make sure you grant the service principal access to your lake once it is created.
You should create a secret scope and a secret for the service principal's key, as this is something you want to keep out of free text. You cannot create this in the Databricks UI (yet). Use one of these (a CLI example follows the list):
CLI - https://docs.databricks.com/user-guide/secrets/secrets.html#create-a-secret
PowerShell - https://github.com/DataThirstLtd/azure.databricks.cicd.tools/wiki/Set-DatabricksSecret
REST API - https://docs.databricks.com/api/latest/secrets.html#put-secret
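For example, with the Databricks CLI the commands look roughly like this (scope and key names are placeholders; the put command prompts for the secret value):

databricks secrets create-scope --scope <SCOPE-NAME>
databricks secrets put --scope <SCOPE-NAME> --key <KEY-NAME>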
Right now I do not think you can create these secrets in Azure Key Vault - though I expect to see that in the future. Technically you could manually integrate with Key Vault using its APIs, but that would give you another headache: you would then need a secret credential just to connect to Key Vault.
I was facing the same issue. The only extra thing I did was to assign the application's default permission on the Data Lake Gen2 blob container in Azure Storage Explorer. This requires the object ID of the application, which is not the one shown in the UI; it can be obtained with the command "az ad sp show --id " in the Azure CLI.
After assigning the permission on the blob container, create a new file and then try to access it.
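For example (the application ID is yours to fill in; newer Azure CLI versions report the object ID as id, older ones as objectId):

az ad sp show --id <APPLICATION-ID> --query id --output tsv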
Related
I am trying to create a Key Vault-backed scope in Databricks. I am able to successfully create the scope, but when I try to add a key to it I see the following error:
Error: b'{"error_code":"BAD_REQUEST","message":"Cannot write secrets to Azure KeyVault-backed scope abc"}'
These are the steps I followed; all commands were run in Windows cmd:
Create a key vault in Azure
Generate an AAD token for Databricks - az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
Add the AAD token to the environment variables on Windows
Add the AAD token to the Databricks config file on Windows - databricks configure --aad-token
Create the scope - databricks secrets create-scope --scope abc --scope-backend-type AZURE_KEYVAULT --resource-id <keyvault-id> --dns-name <keyvault-dns> --initial-manage-principal users
Add a key to the scope - databricks secrets put --scope abc --key abc-key <- this is where I see the error
According to the documentation this is not possible:
To reference secrets stored in an Azure Key Vault, you can create a secret scope backed by Azure Key Vault. You can then leverage all of the secrets in the corresponding Key Vault instance from that secret scope. Because the Azure Key Vault-backed secret scope is a read-only interface to the Key Vault, the PutSecret and DeleteSecret Secrets API 2.0 operations are not allowed. To manage secrets in Azure Key Vault, you must use the Azure SetSecret REST API or Azure portal UI.
Using the Azure CLI, you could use the az keyvault secret set command.
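A minimal sketch, assuming your vault is the one backing scope abc:

az keyvault secret set --vault-name <keyvault-name> --name abc-key --value <secret-value>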
I have an external web application which has the option to access a storage account using a service principal.
I want the external application to access the Azure storage account/blob and load the data directly into the data lake account.
So here is what I am trying to do:
Set up a service principal (using Azure AD app registration)
Create a Storage account and store the access key in Azure Key Vault.
Add the service principal secret to the same key vault.
Create a policy within Key Vault for the service principal to have access to read keys and secrets within Key Vault.
Also create a policy within Key Vault for the service principal to have the contributor role for access to the storage account.
Also grant the service principal access to the storage account container.
But I cannot connect, and I am unable to authorize the connection.
I am confused about which steps I am missing.
As you want to access the storage account using a service principal, you do not need to store the storage account access key in the key vault.
The steps to follow to access a storage account with a service principal:
Create a service principal (Azure AD App Registration)
Create a storage account
Assign Storage Blob Data Contributor role to the service principal
Now you would be able to access the Azure Storage Blob data using your service principal
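One quick way to sanity-check step 4 from the Azure CLI (all names in angle brackets are placeholders):

az login --service-principal --username <CLIENT-ID> --password <CLIENT-SECRET> --tenant <TENANT-ID>
az storage blob list --account-name <STORAGE-ACCOUNT> --container-name <CONTAINER> --auth-mode login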
Note: You do not need to store the service principal's client secret in the key vault, because you would need the client secret again just to access the key vault in the first place.
Thanks @RamaraoAdapa-MT
This works.
Finally, I set it up like you said:
SAS -> service principal -> permission to storage account -> storage account.
In this case, there is no need for Key Vault.
Thank you guys,
Anupam Chand, RamaraoAdapa-MT
I have configured the Diagnostics Extension on my Azure cloud project so that I can collect the IIS logs and publish them to a storage account on Azure.
However, I do not want to store the secret key of the storage account in the .cscfg file, so I unchecked "Don't remove storage key secret from project configuration (.cscfg) file".
I want to store the key of the storage account in Azure Key Vault and have Azure pull the key from the vault while configuring the diagnostics extension during publishing.
The code is published via an Azure DevOps YAML pipeline.
Is there any way to instruct the Azure pipeline to read the storage account key from Azure Key Vault and use it to configure the diagnostics extension while publishing the code?
You need to use "Variable groups" feature of Azure Devops to link secrets from key vault into your pipeline, and forward them to your task.
Add the secret to the key vault
Create a service connection in Azure DevOps with permissions to access the key vault
Create a variable group and link secrets from the key vault
Link the variable group created in the previous step into your .yaml pipeline
Any secret from the variable group is then accessible within the pipeline as $(VariableName).
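In the .yaml pipeline, the link from step 4 looks roughly like this (the group name, variable name, and deploy command are assumptions, not real names):

variables:
- group: <variable-group-name>   # the variable group linked to Key Vault

steps:
- script: my-deploy-tool --storage-key $(StorageAccountKey)   # StorageAccountKey is an assumed secret name; my-deploy-tool is hypothetical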
I currently have the storage account creation resource in my deployment template, but I need the Storage Access Key from the user. Is it possible to create the Storage account and get its Access Key?
This deployment is being done on the Azure Portal. I am currently using a custom template for deployment.
Update
I have a section in the CreateUiDefinition file asking the user to create a new Storage Account, as below:
(screenshot of the portal preview)
Now this Storage Account will be created once I move over to the "Review + Create" tab after validation and hit "Create".
But I need the Access Key of this Storage Account that the user is creating, so that I can store its value into a Key Vault Secret for later use.
Is this possible?
If you just want to make sure that your storage account has been created before you get its Access Key with:
[listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')), providers('Microsoft.Storage', 'storageAccounts').apiVersions[0]).keys[0].value]
for your subsequent resources, you can just define the creation order in your ARM template with the dependsOn element. See details here.
Update:
If you want to create a storage account first so that you can get its access key and save it to a key vault, I think you can do that. First of all, you should define that your key vault dependsOn the Azure Storage account, so that the storage account is created first. Based on this doc, we can read the access key from the newly created storage account and save it into your Azure Key Vault. Of course, in Microsoft.KeyVault/vaults/secrets you also need to set:
"dependsOn": [
"[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]"
]
to make sure that your key vault has been created.
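Putting it together, a minimal sketch of that secret resource (the parameter names, secret name, and API versions are assumptions):

{
  "type": "Microsoft.KeyVault/vaults/secrets",
  "apiVersion": "2019-09-01",
  "name": "[concat(parameters('keyVaultName'), '/storageAccountKey')]",
  "dependsOn": [
    "[resourceId('Microsoft.KeyVault/vaults', parameters('keyVaultName'))]",
    "[resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName'))]"
  ],
  "properties": {
    "value": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')), '2019-06-01').keys[0].value]"
  }
}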
You can create secret scopes in Databricks backed by Azure Key Vault instead of using the Databricks CLI. However, when you try to create such a scope, an obscure error message (with a spelling mistake!) may be shown. It appears that not many people encounter this error:
"Internal error happened while granting read/list permission to Databricks ervice principal to KeyVault: XYZ"
Setting the Manage Principal to All Users does NOT help in this case.
I figured that this was a service principal issue in Azure AD. The particular user I was logged on to Databricks with was not an AD contributor and only had the Contributor role on the Databricks and Key Vault services. I could not find any default object ID in AD for Databricks, so I assumed it was creating a service principal on the fly and connecting Databricks with Key Vault (I might be wrong here - it might already exist in AD when you enable the Databricks resource provider).
Logging in as an admin with the rights to create service principals solved the problem. After that, you can see in the Key Vault the Databricks service principal used for the key retrieval.
As mentioned by @rcabr in his comment above, there is already a service principal named 'AzureDatabricks' under Enterprise Applications; you need to get its object ID and add it to the access policy of the key vault. With this, Databricks will be able to access the Key Vault.
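A quick sketch with the Azure CLI (2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known AzureDatabricks application ID, the same one used for the AAD token above; the vault name is a placeholder):

az ad sp show --id 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query id --output tsv
az keyvault set-policy --name <keyvault-name> --object-id <object-id-from-previous-command> --secret-permissions get list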