I am trying to set up a connection between Databricks and Azure Data Lake Storage Gen2 using the Unity Catalog External Locations feature.
Assumptions:
ADLS is behind a private endpoint.
The Databricks workspace is in a private VNet; I've added the private and public subnets of the workspace to the ADLS account under "Firewalls and virtual networks" (service endpoint).
I've granted ACLs to the service principal at the container level of the storage account.
After creating a service principal with the Storage Blob Data Contributor role (I've also tried the Storage Blob Data Owner, Storage Account Contributor and Contributor roles) and creating a storage credential with an External Location associated with it, I got an error:
Error in SQL statement: UnityCatalogServiceException: [RequestId=6f9a0a07-513c-45a5-b2aa-a67dd7d7e662 ErrorClass=INVALID_STATE] Failed to access cloud storage: AbfsRestOperationException
On the other hand:
After creating a mount using the same service principal, I am able to connect to the storage and read/write data.
Do you have any ideas?
When I try to connect to ADLS using a managed identity with the "Access Connector", the problem is gone, but that feature is currently in public preview:
https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/azure-managed-identities
I have the same issue. I did notice that when the storage account network firewall is disabled on the data lake, it works using the service principal as the storage credential.
I tried adding the Databricks public IP addresses found here, but that failed as well.
I'm not sure how to discover from which IP address Unity Catalog connects to the storage account.
I have raised a support ticket with Microsoft and Databricks and will update once I hear more.
I fixed the issue by creating two Databricks connectors, one for accessing the metastore storage account and the other for accessing the data lake storage account.
Related
I have a storage account in a subscription with a VNet that the storage account is set up to use. This works well for the Kubernetes cluster in that subscription, which is attached to that VNet. NFS works fine against the storage account in question.
But we have a secondary subscription for failover in a paired region (East US and West US), and I'd like that k8s cluster to be able to mount the NFS share as well.
I've tried creating a peering and adding the secondary subscription's VNet (which doesn't overlap) to the storage account, but the k8s cluster in the secondary subscription times out connecting to the share.
I didn't set any routing options when creating the peering, but I would have assumed that this would just work.
Does anyone have any instructions on how to get this working so that the secondary cluster can access the NFS share?
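One thing that can be ruled out quickly is an address space overlap, since VNet peering requires the two address spaces not to overlap. A minimal sketch of that check, assuming you copy the CIDR ranges of both VNets by hand (the function name and the ranges below are made up for illustration):

```python
import ipaddress
from typing import List, Tuple


def overlapping_pairs(primary: List[str], secondary: List[str]) -> List[Tuple[str, str]]:
    """Return every (primary, secondary) CIDR pair whose ranges overlap."""
    pairs = []
    for a in primary:
        for b in secondary:
            if ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b)):
                pairs.append((a, b))
    return pairs


# Hypothetical address spaces, not taken from the question.
print(overlapping_pairs(["10.0.0.0/16"], ["10.1.0.0/16"]))
print(overlapping_pairs(["10.0.0.0/16"], ["10.0.128.0/17"]))
```

An empty result only confirms that peering is possible from an addressing point of view; it says nothing about the storage firewall or NFS connectivity itself.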
The storage sync service and/or storage account can be moved to a different resource group, subscription, or Azure AD tenant. After the storage sync service or storage account is moved, you need to give the Microsoft.StorageSync application access to the storage account.
Click Access control (IAM) on the left-hand table of contents.
Click the Role assignments tab to list the users and applications (service principals) that have access to your storage account.
Verify Microsoft.StorageSync or Hybrid File Sync Service (old application name) appears in the list with the Reader and Data Access role.
This GitHub document on Azure file share can give you better insights.
Hello Azure Data Factory experts,
I am trying to connect from Azure Data Factory to Data Lake Gen2 by creating a linked service in Azure Data Factory, but I get the error below.
Can anyone help?
BR,
Mohammed
ADLS Gen2 operation failed for: Storage operation '' on container
'XXXXXXXXXX' get failed with 'Operation returned an invalid status
code 'Forbidden''. Possible root causes: (1). It's possible because
the service principal or managed identity don't have enough permission
to access the data. (2). It's possible because some IP address ranges
of Azure Data Factory are not allowed by your Azure Storage firewall
settings. Azure Data Factory IP ranges please refer
https://learn.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses..
Account: 'azurestoragforalltype'. ErrorCode:
'AuthorizationPermissionMismatch'. Message: 'This request is not
authorized to perform this operation using this permission.'.
RequestId: '83628e11-d01f-0020-33c3-cc8430000000'. TimeStamp: 'Fri, 29
Oct 2021 12:49:39 GMT'.. Operation returned an invalid status code
'Forbidden' Activity ID: ef66fc97-6bbb-4d4e-99fb-95c0ba3a4c31.
Hi, you should assign yourself at least Contributor access to the lake and then try connecting to the lake using the managed identity from ADF; this should solve your problem.
Below are the different authentication types which Azure Data Lake Storage Gen2 connector supports:
Account key authentication
Service principal authentication
System-assigned managed identity authentication
User-assigned managed identity authentication
As per your error message, you are probably using the service principal or managed identity authentication method in the Azure Data Lake Storage Gen2 connector.
You must grant proper permissions to the service principal/managed identity. Grant at least Execute permission for ALL upstream folders and the file system, along with Read permission for the files to copy. Alternatively, in Access control (IAM), grant at least the Storage Blob Data Reader role.
You can check this document to see examples of how permissions work in Azure Data Lake Storage Gen2.
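To make the ACL rule above concrete, here is a small sketch that lists which permission is needed at each level to read a single file: Execute on the file system root and on every upstream folder, and Read on the file itself. The function name and the example path are made up for illustration; it only computes the list and does not call Azure.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def required_acl_entries(file_path: str) -> List[Tuple[str, str]]:
    """Return (path, permission) pairs needed to read one file.

    "/" stands for the file system (container) root.
    """
    path = PurePosixPath("/") / file_path.lstrip("/")
    # parents runs from the closest folder up to the root, so reverse it.
    folders = list(reversed(path.parents))
    entries = [(str(folder), "--x") for folder in folders]
    entries.append((str(path), "r--"))
    return entries


# Hypothetical file path inside the container.
for target, permission in required_acl_entries("raw/sales/2021/data.csv"):
    print(permission, target)
```

If any one of the folders in the chain is missing Execute for the identity, a read of the file can be rejected even though the file itself has Read.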
I tried to export an Azure SQL database to Azure Blob Storage via the Azure portal and got an error:
Error encountered during the service operation. ;
Exception Microsoft.SqlServer.Management.Dac.Services.ServiceException:Unexpected exception encountered while retrieving metadata for blob https://<blobstoragename>.blob.core.windows.net/databases/<databaseName>_12.10.2020-11:13:24.bacpac;.; Inner exception Microsoft.WindowsAzure.Storage.StorageException:The remote server returned an error: (403) Forbidden.;
Inner exception System.Net.WebException:The remote server returned an error: (403) Forbidden.
In the blob storage account's firewall settings, access from all networks is denied. It's only possible to connect from selected networks, and I activated the option "Allow trusted Microsoft services to access this storage account". The SQL Server and the storage account each have a private endpoint connection to the same network.
I set up a VM in the same network, which was able to access the blob storage.
Is it possible to export a SQL database to Azure Storage when public network access is denied? If yes, which setting am I missing?
According to my research, exporting a SQL database to an Azure Storage account behind a firewall is currently not supported. For more details, please refer to here. You can also upvote the feedback item to encourage Microsoft to improve this feature.
Is it possible to export a sql database to the azure storage when the public network access is denied?
Yes, it's possible. But access will be limited according to the IP address.
If we only set the Storage firewall settings "Allow access from Selected networks" and "Allow trusted Microsoft services to access this storage account", we will get the 403 error when accessing the storage from Azure SQL Database.
What you missed is that when we set "Allow access from Selected networks", the Storage firewall behaves much like the Azure SQL Database firewall settings. There is a client IP shown in the firewall settings, and we must add that client IP to the firewall before Azure SQL Database can access the storage.
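As a rough illustration of why the 403 appears, the sketch below checks whether a client IP is covered by a list of allowed IPs or CIDR ranges. It assumes the firewall rule is a plain address/range match and ignores VNet rules and trusted-service exceptions; the function name and the addresses (documentation ranges) are made up.

```python
import ipaddress
from typing import List


def is_allowed(client_ip: str, allowed_ranges: List[str]) -> bool:
    """Return True if client_ip falls inside any allowed IP or CIDR range."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(rule, strict=False)
        for rule in allowed_ranges
    )


# Hypothetical allow-list: one CIDR range and one single address.
rules = ["203.0.113.0/24", "198.51.100.7"]
print(is_allowed("203.0.113.25", rules))  # covered by the /24 range
print(is_allowed("192.0.2.10", rules))    # not in the list
```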
We have an Azure Analysis Services setup that pulls data from ADLS Gen2 (JSON files). When we try to process or model the tables from within SSMS, it throws the following error:
Failed to save modifications to the server. Error returned: 'The
credentials provided for the AzureBlobs source are invalid.
When I open up the storage to all networks, there are no issues. However, I am worried about the security implications of opening up the storage account like that.
My question is: any pointers as to why SSMS would throw such an error?
I tried creating an SP as admin on the AAS server and added the same SP to the storage account as Contributor, but no luck.
Adding Contributor will never help solve this problem.
As you can see in practice, this has nothing to do with RBAC roles.
The key to the problem lies in the network: you need to configure the storage firewall.
You have two ways:
1. Add the outbound IP of the service you are using to the allowed list of the storage firewall.
2. Or integrate your service with an Azure VNet, and then add this virtual network to the allow list of the storage firewall.
These are all the IP addresses of Azure services that we can get (you need to add the IP addresses of the corresponding service to the allowed list in the storage firewall):
https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20200824.json
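Once the file is downloaded, the ranges for one service can be pulled out with a few lines of Python. This is only a sketch: it assumes the file has a top-level "values" list whose entries carry a "name" and a "properties.addressPrefixes" list, and the tag name and prefixes in the sample are made up (documentation ranges), so check them against the real file.

```python
from typing import List


def prefixes_for_tag(service_tags: dict, tag_name: str) -> List[str]:
    """Return the address prefixes listed for one service tag name."""
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            return list(entry.get("properties", {}).get("addressPrefixes", []))
    return []


# Stand-in for json.load() on the downloaded file; contents are hypothetical.
sample = {
    "values": [
        {
            "name": "ExampleService.WestEurope",
            "properties": {"addressPrefixes": ["203.0.113.0/24", "198.51.100.0/25"]},
        },
        {
            "name": "OtherService",
            "properties": {"addressPrefixes": ["192.0.2.0/24"]},
        },
    ]
}

for prefix in prefixes_for_tag(sample, "ExampleService.WestEurope"):
    print(prefix)
```

Each prefix returned would then be added to the storage firewall's allowed list, as described in option 1.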
As a newbie of Azure, I plan to build a cloud computing service with a free trial account.
I first created a Storage account. The Deployment model was Resource Manager, as recommended, so I chose Blob storage as the Account kind.
Then I created an HDInsight cluster. But in the Data source configuration, the aforementioned Blob storage account cannot be selected, and a warning is shown: "Could not reach the storage!". However, if I create the Storage account with Classic as the Deployment model, the created Storage account can be selected as the Data source.
Does anyone have any idea why this is so?
Thanks in advance! I have been stuck on this for a long time.
If you have selected 'Resource Manager' as the Deployment model, then the storage account should be of type 'general purpose azure blob storage account'; you might have created a Blob-only storage account.