I am trying to read a Key Vault secret from a Synapse notebook using:
s = TokenLibrary.getSecret(kv, secret_name)
It works when I run it in debug mode, but fails when it is scheduled. I granted the Synapse workspace managed identity a Get and List secret access policy. What is different when the notebook runs on a schedule?
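For reference, the access-policy grant described above can be expressed with the Azure CLI roughly as follows (a sketch; <vault-name> and the workspace managed identity's object ID are placeholders, not values from the question):
az keyvault set-policy --name <vault-name> --object-id <synapse-workspace-msi-object-id> --secret-permissions get list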
I am trying to create a Key Vault-backed scope in Databricks. I am able to successfully create the scope, but when I try to add a key to the scope I see the following error:
Error: b'{"error_code":"BAD_REQUEST","message":"Cannot write secrets to Azure KeyVault-backed scope abc"}'
These are the steps I have followed; all commands were run in Windows cmd:
Create a key vault in Azure
Generate an AAD token for Databricks - az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
Add the AAD token to environment variables on Windows
Add the AAD token to the Databricks cfg file on Windows - databricks configure --aad-token
Create the scope - databricks secrets create-scope --scope abc --scope-backend-type AZURE_KEYVAULT --resource-id <keyvault-id> --dns-name <keyvault-dns> --initial-manage-principal users
Add a key to the scope - databricks secrets put --scope abc --key abc-key << this is where I see the error
According to the documentation this is not possible:
To reference secrets stored in an Azure Key Vault, you can create a secret scope backed by Azure Key Vault. You can then leverage all of the secrets in the corresponding Key Vault instance from that secret scope. Because the Azure Key Vault-backed secret scope is a read-only interface to the Key Vault, the PutSecret and DeleteSecret Secrets API 2.0 operations are not allowed. To manage secrets in Azure Key Vault, you must use the Azure SetSecret REST API or Azure portal UI.
Using the Azure CLI, you could use the az keyvault secret set command, for example:
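A minimal sketch, assuming <keyvault-name> is the vault backing the scope and reusing the abc-key name from the question (placeholder values):
az keyvault secret set --vault-name <keyvault-name> --name abc-key --value <secret-value>
Once the secret exists in the Key Vault, it is readable through the Key Vault-backed scope, e.g. dbutils.secrets.get(scope = "abc", key = "abc-key") from a notebook.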
Can we use managed identities with Databricks? What I'm actually trying to achieve is: I have a cluster in Databricks, and I want it to be able to access secrets or keys stored in an Azure Key Vault.
We generally do this with a VM by enabling its managed identity and granting that identity access to the key vault via an access policy or role-based access control (RBAC).
Can we leverage the concept of managed identities in a similar way with Databricks as well? Or is there any other way I can use to access the secrets in Key Vault from Databricks clusters?
P.S. The secret accessed in Key Vault will be used in the init script of the Databricks cluster, to perform decrypt operations.
Managed identity in Azure Databricks isn't supported yet. But right now you can pass the value of a secret as an environment variable, and it will be available in your init script - just specify it in the cluster configuration:
MY_PASSWORD={{secrets/scope/key}}
and then use it in the init script:
if [ -n "$MY_PASSWORD" ]; then
  # the secret is available; use it here, e.g. pass it to the decrypt step
  echo "MY_PASSWORD is set"
else
  exit 1
fi
I'm attempting to build out my DevOps pipeline to deploy a Data Factory, Databricks notebooks and an Azure Data Warehouse.
I have my resource subscriptions set up for both Dev and Prod; deploying to Prod is trickier than it seems.
My key vault has GET/LIST permissions for both secrets and keys for the target Data Factory.
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
I have used the above guide to set up my target data factory in Prod, and it is stood up correctly with all the connection strings and key vault permissions set.
But I am stuck on this portion:
Grant permissions to the Azure Pipelines agent
The Azure Key Vault task may fail with an Access Denied error if the proper permissions aren't present. Download the logs for the release, and locate the .ps1 file with the command to give permissions to the Azure Pipelines agent. You can run the command directly, or you can copy the principal ID from the file and add the access policy manually in the Azure portal. Get and List are the minimum permissions required.
When I deploy my release I get the following error on the Key Vault task:
The specified Azure service connection needs to have Get, List secret management permissions on the selected key vault. To set these permissions, download the ProvisionKeyVaultPermissions.ps1 script from build/release logs and execute it, or set them from the Azure portal
I've added this PowerShell script, ProvisionKeyVaultPermissions.ps1, to my repo and added it to my task, but it just runs forever. I'm unsure if I'm missing something here.
Hope this is clear; please ask for any additional info.
I wonder if it's the DevOps service connection that's missing the permissions.
You can check the access policies for the vault from the console. You should see your service connection as an APPLICATION; it needs the GET and LIST privileges, as the document you're following says. My understanding is that these are privileges for the account that's deploying your code, rather than the account that will run your code.
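If the service connection is missing those permissions, a minimal Azure CLI sketch for granting them (assuming <vault-name> and the service connection's application/client ID are placeholders), which is roughly the grant the ProvisionKeyVaultPermissions.ps1 script mentioned above is meant to apply:
az keyvault set-policy --name <vault-name> --spn <service-connection-client-id> --secret-permissions get list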
You can create scopes in Databricks backed by Azure Key Vault instead of using the Databricks CLI. However, when you try to create a scope, an obscure error message (with a spelling mistake!) is shown. It appears that not many people encounter this error:
"Internal error happened while granting read/list permission to Databricks ervice principal to KeyVault: XYZ"
Setting the Manage Principal to All Users does NOT help in this case.
I figured that this was a service principal issue in Azure AD. The particular user I was logged on to Databricks with was not an AD contributor and only had the Contributor role on the Databricks and Key Vault services. I could not find any default object ID in AD for Databricks, so I assumed it was creating a service principal on the fly and connecting Databricks with Key Vault (I might be wrong here - it might already exist in AD when you enable the Databricks resource provider).
Logging in as an admin with the rights to create service principals solved the problem. After that you can see in the Key Vault the Databricks service principal used for the key retrieval.
As mentioned by #rcabr in his comment above, there is already a service principal named 'AzureDatabricks' under Enterprise Applications; you need to get its object ID and add it to the access policy of the key vault. With this, Databricks will be able to access the Key Vault.
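A rough Azure CLI sketch of that grant, reusing the Databricks application ID that appears in the earlier question (2ff814a6-3304-4ab8-85cb-cd0e6f879c1d) and treating <vault-name> and the returned object ID as placeholders:
az ad sp show --id 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d
az keyvault set-policy --name <vault-name> --object-id <azuredatabricks-sp-object-id> --secret-permissions get list
The first command looks up the 'AzureDatabricks' service principal; copy its object ID from the output into the second command.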
I am trying to connect Azure Databricks with Data Lake Storage Gen2, and I am not able to match the client, secret scope and key.
I have data in an Azure Data Lake Gen2 account. I am trying to follow these instructions:
https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake-gen2.html#requirements-azure-data-lake
I have created a 'service principal' with the role "Storage Blob Data Contributor", obtained
I have created secret scopes in both Azure Key Vault and Databricks, with keys and values.
When I try the code below, the authentication fails to recognize the secret scope and key. It is not clear to me from the documentation whether it is necessary to use the Azure Key Vault or the Databricks secret scope.
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<CLIENT-ID>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope = "<SCOPE-NAME>", key = "<KEY-VALUE>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/XXXXXXXXXX/oauth2/token")
If anybody could help on this, please advise / confirm:
What should CLIENT-ID be? I understand this to be from the storage account.
Where should SCOPE-NAME and KEY-VALUE be created, in Azure Key Vault or Databricks?
The XXXX in https://login.microsoftonline.com/XXXXXXXXXX/oauth2/token should be your TenantID (get this from the Azure Active Directory tab in the Portal > Properties > DirectoryID).
The Client ID is the ApplicationID/Service Principal ID (sadly these names are used interchangeably in the Azure world - but they are all the same thing).
If you have not created a service principal yet follow these instructions: https://learn.microsoft.com/en-us/azure/storage/common/storage-auth-aad-app#register-your-application-with-an-azure-ad-tenant - make sure you grant the service principal access to your lake once it is created.
You should create a scope and a secret for the service principal key, as this is something you want to keep out of free text. You cannot create this in the Databricks UI (yet). Use one of these:
CLI - https://docs.databricks.com/user-guide/secrets/secrets.html#create-a-secret
PowerShell - https://github.com/DataThirstLtd/azure.databricks.cicd.tools/wiki/Set-DatabricksSecret
REST API - https://docs.databricks.com/api/latest/secrets.html#put-secret
Right now I do not think you can create these secrets in Azure Key Vault, though I expect to see that in the future. Technically you could manually integrate with Key Vault using its APIs, but that would give you another headache: you would need a secret credential just to connect to Key Vault.
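For the CLI route, a minimal sketch with illustrative names (adls-creds and sp-client-secret are placeholders I've chosen, not values from the docs):
databricks secrets create-scope --scope adls-creds
databricks secrets put --scope adls-creds --key sp-client-secret --string-value <service-principal-client-secret>
You would then reference it from the notebook as dbutils.secrets.get(scope = "adls-creds", key = "sp-client-secret") in the configs map above.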
I was facing the same issue; the only extra thing I did was to assign the default permission of the application to the Data Lake Gen2 blob container in Azure Storage Explorer. It required the object ID of the application, which is not the one available in the UI; it can be obtained by using the command "az ad sp show --id " in the Azure CLI.
After assigning the permission on the blob container, create a new file and then try to access it.
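A minimal sketch of that lookup, where <application-client-id> is a placeholder for the app registration's client ID (on newer Azure CLI versions the property is named id rather than objectId):
az ad sp show --id <application-client-id> --query objectId -o tsv
The value it prints is the object ID to use when assigning permissions on the container.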