I'm trying to create a delta table from an Azure Synapse notebook, and I am getting an error. I have already added my current IP address to the storage account firewall. I was able to write the data as a delta file, but when I try to create a delta table it throws an error. I checked the Microsoft documentation for this issue and it only says to add the IP address to the storage account. Is there anything I am missing, or is this a bug? Thanks in advance.
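Roughly the kind of code I am running (the container, account, and table names below are placeholders, not my real ones). Writing the dataframe as delta files succeeds, but saving it as a table throws the error below:
# Writing plain delta files to the storage account works
df.write.format("delta").mode("overwrite").save("abfss://<container>@<account>.dfs.core.windows.net/delta/sample")
# Creating a delta (managed) table fails with the 403 error below
df.write.format("delta").mode("overwrite").saveAsTable("sample_table")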
Error: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://xxxxxxxxxxxxxx.dfs.core.windows.net/xxxxxxxfilesystem/?upn=false&action=getAccessControl&timeout=90)
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:193)
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:44)
org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:59)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:98)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:98)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:266)
The error indicates that your account doesn't have enough permissions on the workspace's primary storage account.
Assign the Storage Blob Data Contributor RBAC role to the user that is running the notebook.
You can go through the links here for more details on the required permissions.
Related
I'm trying to create an Azure Synapse Link for Azure SQL Database, using the steps from here:
https://learn.microsoft.com/en-us/azure/synapse-analytics/synapse-link/connect-synapse-link-sql-database
After I create the link connection and try to start it, I receive the following error:
The connection to the sink database is failed. Detailed error message is: Login failed for user ''.
[Screenshots: ConnectionToAzureDB, LinkConnection]
Also, I have configured the Azure SQL database to use Azure AD (AAD) authentication. The connection to the Azure database seems to be working.
My user (the one used to create the Synapse workspace) is a Subscription Owner and is also an owner of the storage account.
I added the SQL Managed Identity as Storage Blob Data Contributor
Did anyone else get this error and manage to fix it?
There are certain limitations when connecting a SQL Database to Synapse Link, as per the documentation:
When setting up your workspace, users must select "Disable Managed Virtual Network" and "Allow connections from any IP addresses."
Azure Synapse Link for SQL cannot enable a link connection if the database owner does not have a mapped login; this will cause an error. The ALTER AUTHORIZATION command can be used to work around this problem by changing the database owner to a user with a valid login.
With fewer than 100 DTUs, the Free, Basic, or Standard tiers do not allow Azure Synapse Link for SQL.
Keeping these limitations in mind, I tried to connect the SQL Database to Synapse Link and was able to connect without error.
I was trying to create a Synapse Link service with an on-premises SQL Server and got the following error:
Failed to enable Synapse Link on the source due to 'Failed to enable the source database: Some internal error happened due to 'Calling internal service failed: Failed to execute non query on change publisher with status code 400 and error Fail to non-query change publisher with error: 'sqlErrorCode - 22301; exceptionCode - TransferServiceUnknowError; error - A database operation failed with the following error: 'Could not update the metadata. The failure occurred when executing the command '(null)'. The error/state returned was 15517/1: 'Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission.'. Use the action and error to determine the cause of the failure and resubmit the request.'; detailedError - A database operation failed with the following error: 'Could not update the metadata. The failure occurred when executing the command '(null)'. The error/state returned was 15517/1: 'Cannot execute as the database principal because the principal "dbo" does not exist, this type of principal cannot be impersonated, or you do not have permission.'. Use the action and error to determine the cause of the failure and resubmit the request.'
I resolved it by changing the owner of the corresponding database to 'sa', and it works.
USE [YourCorrespondingDatabase]
EXEC sp_changedbowner 'sa'
I have a Spark notebook which I am running via a pipeline. The notebook runs fine manually, but in the pipeline it gives an error about the file location. In the code I load the file into a data frame; the file location in the code is abfss://storage_name/folder_name/*, but in the pipeline it is taking abfss://storage_name/filename.parquet\n.
This is the error
{
"errorCode": "6002",
"message": "org.apache.spark.sql.AnalysisException: Path does not exist: abfss://storage_name/filename.parquet\n at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:806)\n\n at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:803)\n\n at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)\n\n at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)\n\n at scala.util.Success.$anonfun$map$1(Try.scala:255)\n\n at scala.util.Success.map(Try.scala:213)\n\n at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)\n\n at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)\n\n at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)\n\n at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)\n\n at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)\n\n at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)\n\n at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)\n\n at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)\n\n at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)\n",
"failureType": "UserError",
"target": "notebook_name",
"details": []
}
The above error mainly happens because of a permission issue: the Synapse workspace lacks the required permissions to access the storage account, so you need to grant it the Storage Blob Data Contributor role.
To add the Storage Blob Data Contributor role for your workspace, refer to this Microsoft documentation.
Also, make sure you are following the proper ADLS Gen2 syntax:
abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<path>
Sample code
df = spark.read.load('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/samplefile.parquet', format='parquet')
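If, as in your manual run, you want to read a whole folder with a wildcard, a sketch along these lines should work once permissions are in place (the container, account, and folder names are placeholders; mssparkutils is only used to confirm the folder is visible before reading):
from notebookutils import mssparkutils
folder = 'abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/<folder_name>'
# Confirm the folder is reachable from the workspace
print(mssparkutils.fs.ls(folder))
# Read every parquet file in the folder
df = spark.read.load(folder + '/*', format='parquet')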
For more detailed information, refer to this link.
I added my Synapse workspace to the required access, and then it worked.
New to Azure Synapse; I'm trying to create a database (managed table) from a Synapse notebook. I have also added the Storage Blob Data Contributor role for the Synapse workspace and for the specific user. I have attached the error details.
%%SQL
CREATE DATABASE sample
Error: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException Operation failed: "This request is not authorized to perform this operation.", 403, HEAD, https://XXXXXXXXXX.dfs.core.windows.net/XXXXXXXXXXXXX/?upn=false&action=getAccessControl&timeout=90)
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:193)
org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:124)
org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:153)
org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:151)
org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:60)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:99)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:99)
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:218)
org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:82)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
The error indicates that your account doesn't have enough permissions on the workspace's storage account. Please check whether the Storage Blob Data Contributor role is assigned on the blob storage account.
You can also go through the documentation here for the required permissions.
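Once the role is assigned (it can take a few minutes to propagate), a minimal sketch to check that the notebook can actually reach the workspace's primary storage before retrying CREATE DATABASE; the container and account names are placeholders:
from notebookutils import mssparkutils
# A 403 here means the Storage Blob Data Contributor role is still missing or has not propagated yet
mssparkutils.fs.ls('abfss://<container_name>@<storage_account_name>.dfs.core.windows.net/')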
I am trying to access data files stored in ADLS location via Azure Databricks using storage account access keys.
To access the data files, I am using a Python notebook in Azure Databricks, and the command below works fine:
spark.conf.set(
"fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
"<access-key>"
)
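For example, with that key set, reading a single file directly over abfss works (the container, account, and file names here are placeholders):
# Direct read over abfss succeeds once the account key is configured
df = spark.read.parquet("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<folder>/<file>.parquet")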
However, when I try to list the directory using the command below, it throws an error.
dbutils.fs.ls("abfss://<container-name>#<storage-account-name>.dfs.core.windows.net")
ERROR:
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.ls.
: Operation failed: "This request is not authorized to perform this operation using this permission.", 403, GET, https://<storage-account-name>.dfs.core.windows.net/<container-name>?upn=false&resource=filesystem&maxResults=500&timeout=90&recursive=false, AuthorizationPermissionMismatch, "This request is not authorized to perform this operation using this permission. RequestId:<request-id> Time:2021-08-03T08:53:28.0215432Z"
I am not sure what permission it would require or how I can proceed with it.
Also, I am using ADLS Gen2 and Azure Databricks(Trial - premium).
Thanks in advance!
The complete config key is called "spark.hadoop.fs.azure.account.key.adlsiqdigital.dfs.core.windows.net"
However, it would be beneficial for a production environment to use a service account and a mount point. This way, actions on the storage can be traced back to this application more easily than with just the generic access key, and the mount point avoids specifying the connection string everywhere in your code.
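For reference, a minimal sketch of such a mount using an Azure AD service principal (OAuth) instead of the account key; the application ID, secret scope, tenant ID, container, and account names below are placeholders, and the service principal needs the Storage Blob Data Contributor role on the storage account:
# OAuth configuration for a service principal (all values are placeholders)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}
# Mount the container once; afterwards read and write through the mount point
dbutils.fs.mount(
    source="abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs=configs
)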
Try this out.
spark.conf.set("fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net","<your-storage-account-access-key>")
dbutils.fs.mount(source = "abfss://<container-name>@<your-storage-account-name>.dfs.core.windows.net/", mount_point = "/mnt/test")
You can mount the ADLS storage account using the access key via Databricks and then read/write data. Please try the code below:
dbutils.fs.mount(
source = "wasbs://<container-name>#<storage-account-name>.blob.core.windows.net",
mount_point = "/mnt/<mount-name>",
extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net":dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")})
dbutils.fs.ls("/mnt/<mount-name>")
I have an Azure Data Lake Gen2 with public endpoint and a standard Azure ML instance.
I have created both components with my user and I am listed as Contributor.
I want to use data from this data lake in Azure ML.
I have added the data lake as a Datastore using Service Principal authentication.
When I then try to create a Tabular Dataset using the Azure ML GUI, I get the following error:
Access denied
You do not have permission to the specified path or file.
{
"message": "ScriptExecutionException was caused by StreamAccessException.\n StreamAccessException was caused by AuthenticationException.\n 'AdlsGen2-ListFiles (req=1, existingItems=0)' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID '1f9e329b-2c2c-49d6-a627-91828def284e', request ID '5ad0e715-a01f-0040-24cb-b887da000000'. Error message: [REDACTED]\n"
}
I have tried having our Azure Portal Admin, who has Admin access to both Azure ML and the Data Lake, try the same, and she gets the same error.
I tried creating the Dataset using the Python SDK and got a similar error:
ExecutionError:
Error Code: ScriptExecution.StreamAccess.Authentication
Failed Step: 667ddfcb-c7b1-47cf-b24a-6e090dab8947
Error Message: ScriptExecutionException was caused by StreamAccessException.
StreamAccessException was caused by AuthenticationException.
'AdlsGen2-ListFiles (req=1, existingItems=0)' for 'https://mydatalake.dfs.core.windows.net/mycontainer?directory=mydirectory/csv&recursive=true&resource=filesystem' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID 'a231f3e9-b32b-4173-b631-b9ed043fdfff', request ID 'c6a6f5fe-e01f-0008-3c86-b9b547000000'. Error message: {"error":{"code":"AuthorizationPermissionMismatch","message":"This request is not authorized to perform this operation using this permission.\nRequestId:c6a6f5fe-e01f-0008-3c86-b9b547000000\nTime:2020-11-13T06:34:01.4743177Z"}}
| session_id=75ed3c11-36de-48bf-8f7b-a0cd7dac4d58
I have created Datastores and Datasets for both a normal blob storage and a managed SQL database with no issues, and I only have Contributor access to those, so I cannot understand why I should not be authorized to add the data lake. The fact that our admin gets the same error leads me to believe there is some other issue.
I hope you can help me identify what it is or give me some clue of what more to test.
Edit:
I see I might have duplicated this post: How to connect AMLS to ADLS Gen 2?
I will test that solution and close this post if it works
This was actually a duplicate of How to connect AMLS to ADLS Gen 2?.
The solution is to give the service principal that Azure ML uses to access the data lake the Storage Blob Data Reader role. Note that you may have to wait a few minutes for this to take effect.
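For reference, a minimal sketch of registering the ADLS Gen2 datastore with that service principal and creating a tabular dataset via the azureml-core SDK; the datastore name, tenant/client IDs, secret, and file pattern below are placeholders (the account, container, and directory names are taken from the error message above):
from azureml.core import Workspace, Dataset, Datastore
ws = Workspace.from_config()
# Register the ADLS Gen2 container as a datastore using the service principal's credentials
datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name="mydatalake_ds",
    filesystem="mycontainer",       # ADLS Gen2 container (filesystem) name
    account_name="mydatalake",
    tenant_id="<tenant-id>",
    client_id="<service-principal-application-id>",
    client_secret="<service-principal-secret>",
)
# Create a tabular dataset from the CSV files in the directory and peek at a few rows
dataset = Dataset.Tabular.from_delimited_files(path=(datastore, "mydirectory/csv/*.csv"))
df = dataset.take(5).to_pandas_dataframe()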