Synapse LINK Load streaming DataFrame from Azure Cosmos DB container - azure

I am trying to use feed changes on synapse, I am using synapse link to connect to cosmos,
dfStream = spark.readStream\
.format("cosmos.oltp")\
.option("spark.synapse.linkedService", "<enter linked service name>")\
.option("spark.cosmos.container", "<enter container name>")\
.option("spark.cosmos.changeFeed.readEnabled", "true")\
.option("spark.cosmos.changeFeed.startFromTheBeginning", "true")\
.option("spark.cosmos.changeFeed.checkpointLocation", "/localReadCheckpointFolder")\
.option("spark.cosmos.changeFeed.queryName", "streamQuery")\
.load()
But I'm getting the error below:
Error : org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, DELETE, https://adlsgarage7.dfs.core.windows.net/adlsgarage7/localReadCheckpointFolder/streamQuery?

You need the permission to access as a contributor the container of the Data Lake Account that has been connected to the workspace at the time of creation. You need Blob Storage Contributor ARM access to the account adlsgarage7 or at least the container adlsgarage7.
You should also make sure to write the name of the linked service you connect to and the container.

Related

DataBricks UnityCatalog create table fails with "Failed to acquire a SAS token UnauthorizedAccessException: PERMISSION_DENIED: request not authorized"

I'm new to DataBricks Unity Catalog and I'm trying to follow the quickstart notebook on https://docs.databricks.com/_static/notebooks/unity-catalog-example-notebook.html.
It seems to me I did whatever I had to do:
I created a Databricks access connector in Azure (which becomes a managed identity)
I created a storage Account ADLS Gen2 (DAtalake with hierarchical namespace) plus container
On my datalake container I assigned Storage Blob Data Contributor role to the managed identity above
I created a new Databricks Premium Workspace
I created a new metastore in Unity Catalog that "binds" the access connector to the DataLake
Bound the metastore to the premium databricks workspace
I gave my Databricks user Admin permission on the above Databricks workspace
I created a new cluster in the same premium workspaces, choosing framework 11.1 and "single user" access mode
I ran the workspace, which correctly created a new catalog, assinged proper rights to it, created a schema, confirmed that I am the owner for that schema
The only (but most important) SQL command of the same notebook that fails is the one that tries to create a managed Delta table and insert two records:
CREATE TABLE IF NOT EXISTS quickstart_catalog_mauromi.quickstart_schema_mauromi.quickstart_table
(columnA Int, columnB String) PARTITIONED BY (columnA);
When I run it, it starts working and in fact it starts creating the folder structure for this delta table in my storage account
, however then it fails with the following error:
java.util.concurrent.ExecutionException: Failed to acquire a SAS token for list on /data/a3b9da69-d82a-4e0d-9015-51646a2a93fb/tables/eab1e2cc-1c0d-4ee4-9a57-18f17edcfabb/_delta_log due to java.util.concurrent.ExecutionException: com.databricks.sql.managedcatalog.acl.UnauthorizedAccessException: PERMISSION_DENIED: request not authorized
Please consider that I didn't have any folder created under "unity-catalog" container before running the table creation command. So it seems that is can successfully create the folder structure, but after it creates the "table" folder, it can't acquare "the SAS token".
So I can't understand since I am an admin in this workspace and since Databricks managed identity is assigned the contributor role on the storage container, and since Databricks actually starts creating the other folders. What else should I configure?
I found it: you need to only to assign, at container level, the Storage Blob Data Contributor role to the Azure Databricks Connector. In fact, you need to assign the same role and the same connector at STORAGE ACCOUNT level.
I couldn't find this information in the documentation and I frankly can't understand why this is needed since the delta table path was created.
However, this way, it works.
I solved this issue by doing the following:
Grant the "Access Connector for Azure Databricks" the permission "Storage Blob Data Reader" at the Storage Account level.
Grant the "Access Connector for Azure Databricks" the permission "Storage Blob Data Contributor" at the container level used by the workspace.
That keeps the permissions a bit more restrictive without having to go down the 'Owner' level.

Could not create lake database from synapse notebooks

New to azure synapse, trying to create database (Managed table) from synapse notebook. I also added Storage blob data contributor for synapse workspace and specific user. I have attached the error details.
%%SQL
CREATE DATABASE sample
Error: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://XXXXXXXXXX.dfs.core.windows.net/XXXXXXXXXXXXX/?upn=false&action=getAccessControl&timeout=90)
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:193)
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:153)
org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:151)
org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:99)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:99)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:218)
org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:82)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
The error indicates, that your account doesn't have enough permissions to the workspace. Can you please make sure to check whether the blob storage account role is assigned to Storage Blob Data Contributor or not.
you can also go through here for permissions.

Azure Data Storage Access from Databrikcs

I can not access Azure Data Lake Storage from Databrikcs.
I have no premium Azure Databricks service. I am trying to access ADLS Gen 2 Directly as per latest documentation: https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/azure-datalake-gen2-sp-access#access-adls-gen2-directly
I have granted the service principle "Contributor permissions" on this account
This is the Error message from notebook:
Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://geolocationinc.dfs.core.windows.net/instruments?upn=false&resource=filesystem&maxResults=500&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. ...;
this is my spark config setup:
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")```
The correct role is "Storage Blob Data Contributor" not "Contributor".

AuthenticationException when creating Azure ML Dataset from Azure Data Lake Gen2 Datastore

I have an Azure Data Lake Gen2 with public endpoint and a standard Azure ML instance.
I have created both components with my user and I am listed as Contributor.
I want to use data from this data lake in Azure ML.
I have added the data lake as a Datastore using Service Principal authentication.
I then try to create a Tabular Dataset using the Azure ML GUI I get the following error:
Access denied
You do not have permission to the specified path or file.
{
"message": "ScriptExecutionException was caused by StreamAccessException.\n StreamAccessException was caused by AuthenticationException.\n 'AdlsGen2-ListFiles (req=1, existingItems=0)' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID '1f9e329b-2c2c-49d6-a627-91828def284e', request ID '5ad0e715-a01f-0040-24cb-b887da000000'. Error message: [REDACTED]\n"
}
I have tried having our Azure Portal Admin, with Admin access to both Azure ML and Data Lake try the same and she gets the same error.
I tried creating the Dataset using Python sdk and get a similar error:
ExecutionError:
Error Code: ScriptExecution.StreamAccess.Authentication
Failed Step: 667ddfcb-c7b1-47cf-b24a-6e090dab8947
Error Message: ScriptExecutionException was caused by StreamAccessException.
StreamAccessException was caused by AuthenticationException.
'AdlsGen2-ListFiles (req=1, existingItems=0)' for 'https://mydatalake.dfs.core.windows.net/mycontainer?directory=mydirectory/csv&recursive=true&resource=filesystem' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID 'a231f3e9-b32b-4173-b631-b9ed043fdfff', request ID 'c6a6f5fe-e01f-0008-3c86-b9b547000000'. Error message: {"error":{"code":"AuthorizationPermissionMismatch","message":"This request is not authorized to perform this operation using this permission.\nRequestId:c6a6f5fe-e01f-0008-3c86-b9b547000000\nTime:2020-11-13T06:34:01.4743177Z"}}
| session_id=75ed3c11-36de-48bf-8f7b-a0cd7dac4d58
I have created Datastore and Datasets of both a normal blob storage and a managed sql database with no issues and I have only contributor access to those so I cannot understand why I should not be Authorized to add data lake. The fact that our admin gets the same error leads me to believe there are some other issue.
I hope you can help me identify what it is or give me some clue of what more to test.
Edit:
I see I might have duplicated this post: How to connect AMLS to ADLS Gen 2?
I will test that solution and close this post if it works
This was actually a duplicate of How to connect AMLS to ADLS Gen 2?.
The solution is to give the service principal that Azure ML uses to access the data lake the Storage Blob Data Reader access. And note you have to wait at least some minutes for this to have effect.

How to read a blob in Azure databricks with SAS

I'm new to Databricks. I write sample code to read Storage Blob in Azure Databricks.
blob_account_name = "sars"
blob_container_name = "mpi"
blob_sas_token =r"**"
ini_path = "58154388-b043-4080-a0ef-aa5fdefe22c8"
inputini = 'wasbs://%s#%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, ini_path)
spark.conf.set("fs.azure.sas.%s.%s.blob.core.windows.net"% (blob_container_name, blob_account_name), blob_sas_token)
print(inputini)
ini=sc.textFile(inputini).collect()
It throw error:
Container mpi in account sars.blob.core.windows.net not found
I guess it doesn't attach the SAS token in WASBS link, so that it doesn't permission to read the data.
How to attach the SAS in wasbs link.
This is excepted behaviour, you cannot access the read private storage from Databricks. In order to access private data from storage where firewall is enabled or when created in a vnet, you will have to Deploy Azure Databricks in your Azure Virtual Network then whitelist the Vnet address range in the firewall of the storage account. You could refer to configure Azure Storage firewalls and virtual networks.
WITH PRIVATE ACCESS:
When you have provided access level to "Private (no anonymous access)".
Output: Error message
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container carona in account cheprasas.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
WITH CONTAINER ACCESS:
When you have provided access level to "Container (Anonymous read access for containers and blobs)".
Output: You will able to see the output without any issue.
Reference: Quickstart: Run a Spark job on Azure Databricks using the Azure portal.

Resources