After completing tutorial 1, I am working through tutorial 2 from the Microsoft Azure team to run the following query (shown in step 3). But the query execution gives the error shown below:
Question: What may be the cause of the error, and how can we resolve it?
Query:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
FORMAT='PARQUET'
) AS [result]
Error:
Warning: No datasets were found that match the expression 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet'. Schema cannot be determined since no files were found matching the name pattern(s) 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet'. Please use WITH clause in the OPENROWSET function to define the schema.
NOTE: The path of the file in the container is correct; in fact, I generated the query above simply by right-clicking the file inside the container and using the generated script.
Remarks:
Azure Data Lake Storage Gen2 account name: contosolake
Container name: users
Firewall settings used on the Azure Data Lake account:
The Azure Data Lake Storage Gen2 account is allowing public access (ref).
The container has the required access level (ref).
UPDATE:
The owner of the subscription is someone else, and I did not get the option to check the "Assign myself the Storage Blob Data Contributor role on the Data Lake Storage Gen2 account" box described in item 3 of the Basics tab > Workspace details section of tutorial 1. I also do not have permission to add roles, although I am the owner of the Synapse workspace. So I am using the workaround described in Configure anonymous public read access for containers and blobs from the Azure team.
--Workaround
If you are unable to grant the Storage Blob Data Contributor role, use ACLs to grant permissions.
All users that need access to some data in this container also need to have the EXECUTE permission on all parent folders up to the root (the container). Learn more about how to set ACLs in Azure Data Lake Storage Gen2.
Note:
Execute permission at the container level needs to be set within Azure Data Lake Storage Gen2. Permissions on the folders can be set within Azure Synapse.
Go to the container holding NYCTripSmall.parquet.
--Update
As per your update in the comments, it seems you would have to do the following.
Contact the Owner of the storage account, and ask them to perform the following tasks:
Assign the workspace MSI the Storage Blob Data Contributor role on the storage account
Assign you the Storage Blob Data Contributor role on the storage account
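If the RBAC change takes a while, one possible interim workaround (not part of the tutorial, just a sketch assuming the storage owner can at least issue you a SAS token for the users container; the object names and secret below are placeholders) is to query through a database scoped credential on the serverless pool:
-- Run in a user database on the Built-in (serverless) pool, not in master.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Store the SAS token (without the leading '?') as a credential.
CREATE DATABASE SCOPED CREDENTIAL SasCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<SAS token>';

-- Point a data source at the users container using that credential.
CREATE EXTERNAL DATA SOURCE ContosoUsers
WITH (
    LOCATION = 'https://contosolake.dfs.core.windows.net/users',
    CREDENTIAL = SasCredential
);

-- Same query as before, but with the path relative to the data source.
SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'NYCTripSmall.parquet',
        DATA_SOURCE = 'ContosoUsers',
        FORMAT = 'PARQUET'
    ) AS [result];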
--
I was able to get the query results by following the tutorial doc you mentioned, for the same dataset.
Since you confirm that the file is present and in the right path, refresh the linked ADLS source and publish the query before running, in case it is a transient issue.
Two things I suspect are:
Try setting Microsoft network routing in the Network routing settings of the ADLS account.
Check if the Built-in pool is online and you have at least Contributor roles on both the Synapse workspace and the storage account (if the credentials currently used to run the query did not create the resources); a quick sanity check for the pool is sketched below.
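One quick way to confirm that the Built-in (serverless) pool itself is online, independent of any storage permissions, is to run a query that touches no storage at all; a minimal sketch:
-- Succeeds only if the Built-in pool is up and your connection to it works;
-- any remaining failure on the parquet query is then a storage/permission issue.
SELECT @@VERSION AS serverless_version;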
Related
I'm new to Databricks Unity Catalog and I'm trying to follow the quickstart notebook on https://docs.databricks.com/_static/notebooks/unity-catalog-example-notebook.html.
It seems to me I did everything I had to do:
I created a Databricks access connector in Azure (which becomes a managed identity)
I created an ADLS Gen2 storage account (data lake with hierarchical namespace) plus a container
On my data lake container I assigned the Storage Blob Data Contributor role to the managed identity above
I created a new Databricks Premium Workspace
I created a new metastore in Unity Catalog that "binds" the access connector to the DataLake
Bound the metastore to the premium databricks workspace
I gave my Databricks user Admin permission on the above Databricks workspace
I created a new cluster in the same premium workspace, choosing runtime 11.1 and "single user" access mode
I ran the notebook, which correctly created a new catalog, assigned proper rights to it, created a schema, and confirmed that I am the owner of that schema
The only (but most important) SQL command of the same notebook that fails is the one that tries to create a managed Delta table and insert two records:
CREATE TABLE IF NOT EXISTS quickstart_catalog_mauromi.quickstart_schema_mauromi.quickstart_table
(columnA Int, columnB String) PARTITIONED BY (columnA);
When I run it, it starts working and in fact it starts creating the folder structure for this Delta table in my storage account; however, it then fails with the following error:
java.util.concurrent.ExecutionException: Failed to acquire a SAS token for list on /data/a3b9da69-d82a-4e0d-9015-51646a2a93fb/tables/eab1e2cc-1c0d-4ee4-9a57-18f17edcfabb/_delta_log due to java.util.concurrent.ExecutionException: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: request not authorized
Please consider that I didn't have any folders created under the "unity-catalog" container before running the table creation command. So it seems that it can successfully create the folder structure, but after it creates the "table" folder, it can't acquire the SAS token.
So I can't understand it: I am an admin in this workspace, the Databricks managed identity is assigned the Storage Blob Data Contributor role on the storage container, and Databricks actually starts creating the other folders. What else should I configure?
I found it: it is not enough to assign the Storage Blob Data Contributor role to the Azure Databricks access connector at the container level only. You also need to assign the same role to the same connector at the STORAGE ACCOUNT level.
I couldn't find this information in the documentation, and I frankly can't understand why this is needed, since the Delta table path was created.
However, this way, it works.
I solved this issue by doing the following:
Grant the "Access Connector for Azure Databricks" the permission "Storage Blob Data Reader" at the Storage Account level.
Grant the "Access Connector for Azure Databricks" the permission "Storage Blob Data Contributor" at the container level used by the workspace.
That keeps the permissions a bit more restrictive, without having to go as far as the 'Owner' level.
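Once both role assignments are in place, the failing statement from the question should succeed. A quick sanity check is sketched below; the CREATE TABLE is the one from the question, and the inserted values are just made-up test rows:
CREATE TABLE IF NOT EXISTS quickstart_catalog_mauromi.quickstart_schema_mauromi.quickstart_table
    (columnA Int, columnB String) PARTITIONED BY (columnA);

-- Write a couple of test rows and read them back; this exercises both the
-- _delta_log writes and the reads that previously failed with PERMISSION_DENIED.
INSERT INTO quickstart_catalog_mauromi.quickstart_schema_mauromi.quickstart_table
VALUES (1, 'first test row'), (2, 'second test row');

SELECT * FROM quickstart_catalog_mauromi.quickstart_schema_mauromi.quickstart_table;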
I am new to Azure. We have Azure Data Lake Storage set up. I am trying to set up the linked service from Data Factory to Azure Data Lake Storage Gen2. It keeps failing when I test the linked service to the data lake storage. As far as I can see, I have granted the "Storage blob contributor" role to the user on the Azure Data Lake Storage account. I still keep getting a permission denied error when I test the linked service:
ADLS Gen2 operation failed for: Storage operation '' on container 'testconnection' get failed with 'Operation returned an invalid status code 'Forbidden''. Possible root causes: (1). It's possible because the service principal or managed identity don't have enough permission to access the data. (2). It's possible because some IP address ranges of Azure Data Factory are not allowed by your Azure Storage firewall settings. Azure Data Factory IP ranges please refer https://learn.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses.. Account: 'dlsisrdatapoc001'. ErrorCode: 'AuthorizationFailure'. Message: 'This request is not authorized to perform this operation.'.
What I could observe is that when I open the network to all (public) on the data lake storage, it works; when I set the firewall with CIDR ranges, it fails. I couldn't narrow down the cause of the problem. I do have "Allow Azure services on the trusted services list to access this account" checked.
Completely lost
As mentioned in the error description, the error usually occurs if you don't have sufficient permissions to perform the action or if you haven't added the required IPs in the firewall settings of your storage account.
To resolve the error, please check whether you added the Storage Blob Data Contributor role to your managed identity, along with the user, like below:
Go to Azure Portal -> Storage Accounts -> Your Storage Account -> Access Control (IAM) -> Add role assignment
Make sure to select the managed identity, based on the authentication method you selected while creating the linked service.
As mentioned in this MS doc, make sure to add all the required IPs based on your resource location and service tag.
Download the JSON file to find the IP ranges for the service tag in your resource location and add them in the firewall settings like below:
Make sure to select the Resource type as Microsoft.DataFactory/factories while choosing the CIDR ranges.
For more detail, please refer to the links below:
Error when I am trying to connect between Azure Data factory and Azure Data lake Gen2 by Anushree Garg
Storage Accoung V2 access with firewall, VNET to data factory V2 by Cindy Pau
Hey everyone, I am trying to connect Power BI to my Data Lake Gen2 on Azure. I am set as Storage Blob Data Contributor as well as Storage Blob Data Reader at the storage account level. I am not sure if I am doing something wrong, but I also followed the MS docs, yet nothing works.
https://learn.microsoft.com/en-us/power-query/connectors/datalakestorage
The URL is formatted according to this format: https://<accountname>.dfs.core.windows.net/<container>/<filename>
To resolve the 'Access to the resource is forbidden' error, try the following ways:
As suggested by Etienne Oosthuysen, check the date-time settings of your system.
As per documentation:
Only this format is supported: https://<accountname>.dfs.core.windows.net/<container>
It doesn't support a filename or subfolder, like https://<accountname>.dfs.core.windows.net/<container>/<filename> or https://<accountname>.dfs.core.windows.net/<container>/<subfolder>
You can refer to Get Data from Azure Data Lake Gen 2: Access to the resource is forbidden
After disabling both 'Enable soft delete for blobs' and 'Enable soft delete for containers' in the Azure storage account, the connection issue was resolved.
To change the settings:
Go to Azure storage account -> Data protection -> uncheck 'Enable soft delete for blobs' and 'Enable soft delete for containers'.
I uploaded my CSV file into my Azure Data Lake Storage Gen2 account using the Azure Synapse portal. Then I tried 'Select TOP 100 rows' and got an error after running the auto-generated SQL.
Auto-generated SQL:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://accountname.dfs.core.windows.net/filesystemname/test_file/contract.csv',
FORMAT = 'CSV',
PARSER_VERSION='2.0'
) AS [result]
Error:
File 'https://accountname.dfs.core.windows.net/filesystemname/test_file/contract.csv'
cannot be opened because it does not exist or it is used by another process.
This error in Synapse Studio has a link underneath it (which leads to a self-help document) that explains the error itself.
Do you have the rights needed on the storage account?
You must have Storage Blob Data Contributor or Storage Blob Data Reader in order for this query to work.
Summary from the docs:
You need to have a Storage Blob Data Owner/Contributor/Reader role to use your identity to access the data. Even if you are an Owner of a Storage Account, you still need to add yourself to one of the Storage Blob Data roles.
Check out the full documentation for Control Storage account access for serverless SQL pool
If your storage account is protected with firewall rules, then take a look at this Stack Overflow answer.
Reference: the full docs article.
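If you are unsure which identity the serverless pool uses for the storage call (and therefore which principal needs the Storage Blob Data role), a minimal check from the same connection, assuming you signed in with Azure AD, is:
-- Returns the login the query runs under; with an Azure AD login this is the
-- identity that needs a Storage Blob Data role on the storage account.
SELECT SUSER_SNAME() AS connected_as;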
I just took your code, updated the path to what I have, and it worked just fine:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://XXX.dfs.core.windows.net/himanshu/NYCTaxi/PassengerCountStats.csv',
FORMAT = 'CSV',
PARSER_VERSION='2.0'
) AS [result]
Please check that the path to which you have uploaded the file and the one used in the script are the same.
You can do this to check that:
Navigate to the workspace -> Data -> ADLS Gen2 -> go to the file -> right-click, open Properties, copy the URI from there, and paste it into the script.
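Once the path matches, a small optional variation, assuming your CSV actually has a header row (the path below is the one from your question), lets parser version 2.0 pick up the column names from that header instead of defaulting to C1, C2, and so on:
SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://accountname.dfs.core.windows.net/filesystemname/test_file/contract.csv',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE  -- assumes the first line of contract.csv contains column names
    ) AS [result]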
I'm connecting ADF to blob storage v2 using a managed identity following this doc: Doc1
When it comes to testing the connection with my first dataset, I am successful when I test the connection to the linked service. When I try by file path and enter "testfolder" (which exists in the blob storage), it fails, returning the generic forbidden error displayed at the end of this post.
However, when I opt to "browse" the folders in the dataset portal, the folder "testfolder" does show up. But when I select it, it will not show me anything within that folder.
The Data Factory managed identity is given the role of Contributor, granting full access to manage all resources. Is there some other hidden issue or a possible way to narrow down the issue? My instinct is that this is something within the blob container, since I can view the containers but not their contents.
Error message:
It seems that you haven't assigned the right role on the Azure Blob Storage account.
Please follow this:
1. Click Access control (IAM) in the Azure Blob Storage account, navigate to Role assignments, and add a role assignment.
2. Choose a role according to your need (for example, Storage Blob Data Contributor) and select your data factory.
3. A few minutes later, you can retry choosing the file path.
Hope this can help you.