Azure Data Lake Storage Access from Databricks - azure

I cannot access Azure Data Lake Storage from Databricks.
I do not have a premium Azure Databricks service. I am trying to access ADLS Gen2 directly as per the latest documentation: https://learn.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/azure-datalake-gen2-sp-access#access-adls-gen2-directly
I have granted the service principal the "Contributor" role on this account.
This is the error message from the notebook:
Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://geolocationinc.dfs.core.windows.net/instruments?upn=false&resource=filesystem&maxResults=500&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. ...;
This is my Spark config setup:
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account-name>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account-name>.dfs.core.windows.net", dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"))
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account-name>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")```

The correct role is "Storage Blob Data Contributor", not "Contributor".
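For reference, here is a minimal sketch of the complete service-principal setup (it also includes the `fs.azure.account.auth.type` setting, which the snippet above omits; all account, scope, and key names are placeholders):

```python
# Minimal sketch of direct ADLS Gen2 access with a service principal (placeholder names).
# Assumes the service principal holds "Storage Blob Data Contributor" on the storage account.
storage_account = "<storage-account-name>"
service_credential = dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", service_credential)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<directory-id>/oauth2/token")

# Once the role assignment has propagated, listing the filesystem should succeed:
# dbutils.fs.ls(f"abfss://<container-name>@{storage_account}.dfs.core.windows.net/")
```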

Related

Trying to set up a linked service through a registered app to Azure Data Lake Storage and keep getting a 24200 error

I am new to Azure. We have Azure Data Lake Storage set up, and I am trying to create the linked service from Data Factory to Azure Data Lake Storage Gen2. It keeps failing when I test the linked service against the Data Lake Storage. As far as I can see, I have granted the "Storage Blob Contributor" role to the user on the Azure Data Lake Storage, yet I still get a permission denied error when I test the linked service.
ADLS Gen2 operation failed for: Storage operation '' on container 'testconnection' get failed with 'Operation returned an invalid status code 'Forbidden''. Possible root causes: (1). It's possible because the service principal or managed identity don't have enough permission to access the data. (2). It's possible because some IP address ranges of Azure Data Factory are not allowed by your Azure Storage firewall settings. Azure Data Factory IP ranges please refer https://learn.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses.. Account: 'dlsisrdatapoc001'. ErrorCode: 'AuthorizationFailure'. Message: 'This request is not authorized to perform this operation.'.
What I could observe is that when I open the network to all (public) on the Data Lake Storage, it works; when I restrict the firewall with CIDR ranges, it fails. I couldn't narrow down the cause of the problem. I do have "Allow Azure services on the trusted services list to access this account" checked.
Completely lost
As mentioned in the error description, this error usually occurs if you don't have sufficient permissions to perform the action or if you haven't added the required IPs to the firewall settings of your storage account.
To resolve the error, check whether you have added the Storage Blob Data Contributor role for your managed identity along with the user, as follows:
Go to Azure Portal -> Storage Accounts -> Your Storage Account -> Access Control (IAM) -> Add role assignment
Make sure to select the managed identity, based on the authentication method you selected while creating the linked service.
As mentioned in this MS doc, make sure to add all the required IPs based on your resource location and service tag.
Download the JSON file to find the IP ranges for the service tag in your resource location and add them to the firewall settings (a small parsing sketch is shown below).
Make sure to select the Resource type as Microsoft.DataFactory/factories while choosing CIDR.
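As a rough illustration, the sketch below pulls the Data Factory CIDR ranges for one region out of a locally downloaded "Azure IP Ranges and Service Tags" JSON file. The file name and region are placeholders, and the layout ("values" / "properties.addressPrefixes") is assumed from the published file format:

```python
import json

# Hypothetical local copy of the "Azure IP Ranges and Service Tags - Public Cloud" download.
with open("ServiceTags_Public.json") as f:
    service_tags = json.load(f)

# Collect the address prefixes for the DataFactory service tag in your region (placeholder region).
region_tag = "DataFactory.WestEurope"
prefixes = [
    prefix
    for value in service_tags.get("values", [])
    if value.get("name") == region_tag
    for prefix in value.get("properties", {}).get("addressPrefixes", [])
]

# Add each of these CIDR ranges to the storage account firewall,
# selecting Resource type Microsoft.DataFactory/factories where applicable.
print(prefixes)
```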
For more details, please refer to the links below:
Error when I am trying to connect between Azure Data factory and Azure Data lake Gen2 by Anushree Garg
Storage Accoung V2 access with firewall, VNET to data factory V2 by Cindy Pau

Could not create lake database from Synapse notebooks

New to Azure Synapse, trying to create a database (managed table) from a Synapse notebook. I have also added Storage Blob Data Contributor for the Synapse workspace and the specific user. I have attached the error details.
%%SQL
CREATE DATABASE sample
Error: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://XXXXXXXXXX.dfs.core.windows.net/XXXXXXXXXXXXX/?upn=false&action=getAccessControl&timeout=90)
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:193)
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:153)
org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:151)
org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:99)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:99)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:218)
org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:82)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
The error indicates that your account doesn't have sufficient permissions on the storage account attached to the workspace. Please check whether the Storage Blob Data Contributor role is assigned on that blob storage account.
You can also go through here for the required permissions.
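Once the role is in place, one way to sanity-check from the notebook is to list the workspace's primary storage path before re-running the CREATE DATABASE. This is only a sketch with placeholder names, assuming mssparkutils is available in your Synapse Spark pool:

```python
# Sketch: verify the Synapse Spark pool can reach the workspace's primary ADLS Gen2 account
# (placeholder account/container names) before retrying CREATE DATABASE.
from notebookutils import mssparkutils

workspace_storage = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/"
files = mssparkutils.fs.ls(workspace_storage)   # raises a 403 error if the role is still missing
for f in files:
    print(f.path)

spark.sql("CREATE DATABASE IF NOT EXISTS sample")
```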

Write data to Azure Data Lake Storage Gen 2 using Azure Synapse Analytics notebook

I am connecting to a RESTful API from an Azure Synapse Analytics notebook and writing the JSON output to Azure Data Lake Storage Gen2.
PySpark code:
import requests
response = requests.get('https://api.web.com/v1/data.json')
data = response.json()
from pyspark.sql import *
df = spark.read.json(sc.parallelize([data]))
from pyspark.sql.types import *
account_name = "name of account"
container_name = "name of container"
relative_path = "name of file path" #abfss://<container_name>#<storage_account_name>.dfs.core.windows.net/<path>
adls_path = 'abfss://%s#%s.dfs.core.windows.net/%s' % (container_name, account_name, relative_path)
spark.conf.set('fs.%s#%s.dfs.core.windows.net/%s' % (container_name, account_name), "account_key") #not sure I'm doing the configuration right
df.write.mode("overwrite").json(adls_path)
Error:
Py4JJavaError : An error occurred while calling o536.json.
: Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://storageaccount.dfs.core.windows.net/container/?upn=false&action=getAccessControl&timeout=90
Note: Storage Blob Data Contributor is used to grant read/write/delete permissions on Blob storage resources.
If you do not assign Storage Blob Data Contributor to the users who access the storage account, they will not be able to access data in ADLS Gen2 because they lack permissions on the storage account.
If they try to access data in ADLS Gen2 without the "Storage Blob Data Contributor" role on the storage account, they will receive the error message: Operation failed: "This request is not authorized to perform this operation.", 403.
Once the storage account is created, select Access control (IAM) from the left navigation, then assign the following roles or ensure they are already assigned.
Assign yourself the Storage Blob Data Owner role on the storage account.
After granting the Storage Blob Data Contributor role on the storage account, wait 5-10 minutes and retry the operation.
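For completeness, here is a minimal sketch of the write once the permissions are in place. It assumes account-key authentication and placeholder names (note the '@' between container and account in the abfss path):

```python
# Sketch: write the DataFrame to ADLS Gen2 using an account access key (placeholder names).
account_name = "<storage-account-name>"
container_name = "<container-name>"
relative_path = "output/data.json"

# One account-key setting per storage account.
spark.conf.set(
    "fs.azure.account.key.%s.dfs.core.windows.net" % account_name,
    "<account-access-key>")

adls_path = "abfss://%s@%s.dfs.core.windows.net/%s" % (container_name, account_name, relative_path)
df.write.mode("overwrite").json(adls_path)
```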

AuthenticationException when creating Azure ML Dataset from Azure Data Lake Gen2 Datastore

I have an Azure Data Lake Gen2 with public endpoint and a standard Azure ML instance.
I have created both components with my user and I am listed as Contributor.
I want to use data from this data lake in Azure ML.
I have added the data lake as a Datastore using Service Principal authentication.
When I then try to create a Tabular Dataset using the Azure ML GUI, I get the following error:
Access denied
You do not have permission to the specified path or file.
{
"message": "ScriptExecutionException was caused by StreamAccessException.\n StreamAccessException was caused by AuthenticationException.\n 'AdlsGen2-ListFiles (req=1, existingItems=0)' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID '1f9e329b-2c2c-49d6-a627-91828def284e', request ID '5ad0e715-a01f-0040-24cb-b887da000000'. Error message: [REDACTED]\n"
}
I have had our Azure Portal admin, who has admin access to both Azure ML and the Data Lake, try the same, and she gets the same error.
I tried creating the Dataset using the Python SDK and get a similar error:
ExecutionError:
Error Code: ScriptExecution.StreamAccess.Authentication
Failed Step: 667ddfcb-c7b1-47cf-b24a-6e090dab8947
Error Message: ScriptExecutionException was caused by StreamAccessException.
StreamAccessException was caused by AuthenticationException.
'AdlsGen2-ListFiles (req=1, existingItems=0)' for 'https://mydatalake.dfs.core.windows.net/mycontainer?directory=mydirectory/csv&recursive=true&resource=filesystem' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID 'a231f3e9-b32b-4173-b631-b9ed043fdfff', request ID 'c6a6f5fe-e01f-0008-3c86-b9b547000000'. Error message: {"error":{"code":"AuthorizationPermissionMismatch","message":"This request is not authorized to perform this operation using this permission.\nRequestId:c6a6f5fe-e01f-0008-3c86-b9b547000000\nTime:2020-11-13T06:34:01.4743177Z"}}
| session_id=75ed3c11-36de-48bf-8f7b-a0cd7dac4d58
I have created Datastores and Datasets from both a normal blob storage and a managed SQL database with no issues, and I have only Contributor access to those, so I cannot understand why I should not be authorized to add the data lake. The fact that our admin gets the same error leads me to believe there is some other issue.
I hope you can help me identify what it is or give me some clue of what more to test.
Edit:
I see I might have duplicated this post: How to connect AMLS to ADLS Gen 2?
I will test that solution and close this post if it works
This was actually a duplicate of How to connect AMLS to ADLS Gen 2?.
The solution is to give the service principal that Azure ML uses to access the data lake the Storage Blob Data Reader role. Note that you have to wait at least a few minutes for this to take effect.
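A rough sketch of the SDK v1 flow, assuming the service principal now holds Storage Blob Data Reader on the account (all names, IDs, and secrets below are placeholders):

```python
# Sketch (azureml-core SDK v1): register the ADLS Gen2 datastore with the service principal
# and create a tabular dataset from it. All names/IDs are placeholders.
from azureml.core import Workspace, Datastore, Dataset

ws = Workspace.from_config()

datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name="mydatalake_ds",
    filesystem="mycontainer",            # the ADLS Gen2 container
    account_name="mydatalake",
    tenant_id="<directory-id>",
    client_id="<application-id>",
    client_secret="<client-secret>")

# The service principal needs Storage Blob Data Reader (or higher) for this listing to succeed.
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "mydirectory/csv/*.csv"))
```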

Synapse Link: Load streaming DataFrame from Azure Cosmos DB container

I am trying to use the change feed in Synapse. I am using Synapse Link to connect to Cosmos DB:
dfStream = spark.readStream\
.format("cosmos.oltp")\
.option("spark.synapse.linkedService", "<enter linked service name>")\
.option("spark.cosmos.container", "<enter container name>")\
.option("spark.cosmos.changeFeed.readEnabled", "true")\
.option("spark.cosmos.changeFeed.startFromTheBeginning", "true")\
.option("spark.cosmos.changeFeed.checkpointLocation", "/localReadCheckpointFolder")\
.option("spark.cosmos.changeFeed.queryName", "streamQuery")\
.load()
But I'm getting the error below:
Error : org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, DELETE, https://adlsgarage7.dfs.core.windows.net/adlsgarage7/localReadCheckpointFolder/streamQuery?
You need contributor-level access to the container of the Data Lake account that was connected to the workspace at creation time. Specifically, you need Storage Blob Data Contributor access (an ARM role assignment) on the account adlsgarage7, or at least on the container adlsgarage7.
You should also make sure to fill in the name of the linked service you connect to and the name of the container.
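As a sketch of completing the stream once that container-level access is granted, the change feed can be written out with a checkpoint on the workspace storage. The sink format, output path, and checkpoint folder below are placeholders, not part of the original answer, and the relative paths are assumed to resolve to the workspace's primary ADLS Gen2 account:

```python
# Sketch: consume the Cosmos DB change feed and persist it, assuming the workspace identity
# now has write access to the adlsgarage7 container (all paths/names are placeholders).
query = (dfStream.writeStream
    .format("parquet")                                # any supported streaming sink would do here
    .option("path", "/changefeed-output")             # assumed to resolve to the workspace's primary ADLS container
    .option("checkpointLocation", "/localWriteCheckpointFolder")
    .start())

query.awaitTermination()
```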
