Trying to set up a linked service through a registered app to Azure Data Lake Storage and keep getting a 24200 error - azure

I am new to Azure. We have Azure Data Lake Storage set up, and I am trying to create a linked service from Data Factory to Azure Data Lake Storage Gen2. The connection test on the linked service keeps failing. As far as I can see, I have granted the "Storage blob contributor" role to the user on the storage account, but I still get a permission denied error when I test the linked service:
ADLS Gen2 operation failed for: Storage operation '' on container 'testconnection' get failed with 'Operation returned an invalid status code 'Forbidden''. Possible root causes: (1). It's possible because the service principal or managed identity don't have enough permission to access the data. (2). It's possible because some IP address ranges of Azure Data Factory are not allowed by your Azure Storage firewall settings. Azure Data Factory IP ranges please refer https://learn.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses.. Account: 'dlsisrdatapoc001'. ErrorCode: 'AuthorizationFailure'. Message: 'This request is not authorized to perform this operation.'.
What I observed is that it works when I open the storage account's network to all (public) access, and fails when I restrict the firewall to CIDR ranges. I couldn't narrow down the cause of the problem. I do have "Allow Azure services on the trusted services list to access this account" checked.
Completely lost

As the error description says, this error usually occurs when the identity lacks sufficient permissions to perform the action, or when the required IPs have not been added to the firewall settings of your storage account.
To resolve the error, check that the Storage Blob Data Contributor role is assigned to your managed identity (or service principal) as well as to the user:
Go to Azure Portal -> Storage Accounts -> Your Storage Account -> Access Control (IAM) -> Add role assignment
Make sure to select the managed identity or service principal that matches the authentication method you chose when creating the linked service.
As mentioned in this MsDoc, make sure to add all the required IPs for your resource location and service tag.
Download the service tag JSON file to find the IP ranges for your resource location, and add them to the firewall settings of the storage account.
Make sure to select Microsoft.DataFactory/factories as the Resource type when configuring the firewall.
For more detail, refer to the links below:
Error when I am trying to connect between Azure Data factory and Azure Data lake Gen2 by Anushree Garg
Storage Account V2 access with firewall, VNET to data factory V2 by Cindy Pau
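To check whether a given address is covered by the ranges you added to the firewall, something like the following sketch may help. It assumes the downloaded service tag file has a top-level "values" list whose entries carry a "name" and "properties.addressPrefixes"; the tag name and the CIDR ranges in the sample are hypothetical documentation addresses, not real Azure ranges.

```python
import ipaddress


def prefixes_for_tag(service_tags, tag_name):
    """Return the CIDR prefixes listed for one service tag (empty list if absent)."""
    for entry in service_tags.get("values", []):
        if entry.get("name") == tag_name:
            return entry.get("properties", {}).get("addressPrefixes", [])
    return []


def ip_in_ranges(ip, cidr_ranges):
    """Return the first CIDR range that contains ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    for cidr in cidr_ranges:
        network = ipaddress.ip_network(cidr, strict=False)
        # Skip ranges of the other IP version (IPv4 vs IPv6).
        if addr.version == network.version and addr in network:
            return cidr
    return None


# Hypothetical stand-in for the downloaded JSON (assumed structure, fake ranges).
sample_tags = {
    "values": [
        {
            "name": "DataFactory.WestEurope",
            "properties": {
                "addressPrefixes": ["203.0.113.0/24", "198.51.100.16/28"]
            },
        }
    ]
}

allowed = prefixes_for_tag(sample_tags, "DataFactory.WestEurope")
print(ip_in_ranges("203.0.113.45", allowed))   # 203.0.113.0/24
print(ip_in_ranges("198.51.100.32", allowed))  # None
```

With the real file you would load it with json.load and pass the tag for your own region.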

Related

Azure blob storage - SAS - Data Factory

The blob test connection succeeds, but when I try to browse for the storage path it shows this error.
Full error:
Failed to load
Blob operation failed for: Blob Storage on container '' and path '/' get failed with 'The remote server returned an error: (403) Forbidden.'. Possible root causes: (1). Grant service principal or managed identity appropriate permissions to do copy. For source, at least the “Storage Blob Data Reader” role. For sink, at least the “Storage Blob Data Contributor” role. For more information, see https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#service-principal-authentication. (2). It's possible because some IP address ranges of Azure Data Factory are not allowed by your Azure Storage firewall settings. Azure Data Factory IP ranges please refer https://docs.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses. If you allow trusted Microsoft services to access this storage account option in firewall, you must use https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory#managed-identity. For more information on Azure Storage firewalls settings, see https://docs.microsoft.com/en-us/azure/storage/common/storage-network-security?tabs=azure-portal.. The remote server returned an error: (403) Forbidden.StorageExtendedMessage=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
Context: I'm trying to copy data from a SQL database to Snowflake using Azure Data Factory. Since the pipeline wouldn't publish otherwise, I enabled staged copy and connected blob storage.
I already checked the networking settings, and the storage account is open to all networks. I'm not sure what I'm missing, because I found a YouTube video that has this working but doesn't show any issue similar to this one: https://www.youtube.com/watch?v=5rLbBpu1f6E.
I also tried leaving the storage path empty, but the triggered Copy Data pipeline run does not succeed either.
Full error from trigger:
Operation on target Copy Contacts failed: Failure happened on 'Sink' side. ErrorCode=FileForbidden,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to upload a blob, detailed message: dbo.vw_Contacts.txt,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (403) Forbidden.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
I created a Blob storage account and generated a SAS token for it. I then created a Blob storage linked service using the SAS URI, and it was created successfully.
When I tried to retrieve the path, I got the same error.
I changed the networking settings of the storage account to "Enabled from all networks".
I then tried to retrieve the path again in Data Factory. It worked, and I was able to retrieve the path.
Alternatively, allowing the required IP addresses in the storage firewall resolves this issue.
From the error message:
'The remote server returned an error: (403) Forbidden.'
It's likely that the authentication method you're using doesn't have enough permissions on the blob storage to list the paths. I would recommend using the Managed Identity of the Data Factory for this data transfer:
1. Take the name of the Data Factory.
2. Assign the Storage Blob Data Contributor role, scoped to the container or the storage account, to the ADF Managed Identity (the name from step 1).
3. On your blob linked service inside Data Factory, choose the managed identity authentication method.
Also, if you stage your data transfer on the blob storage, make sure the identity can write to the blob storage and also has bulk permissions on SQL Server.
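If you stay on SAS authentication instead, it may be worth checking what the token itself allows before looking elsewhere. The sketch below parses the standard SAS query fields "sp" (permissions) and "se" (expiry) and reports whether the token carries list and write permission and whether it has expired. The token strings are hypothetical, and the function name is only for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def inspect_sas(sas_token, now=None):
    """Summarise the permissions and expiry encoded in a SAS token string."""
    now = now or datetime.now(timezone.utc)
    fields = parse_qs(sas_token.lstrip("?"))
    permissions = fields.get("sp", [""])[0]
    expiry_raw = fields.get("se", [None])[0]

    expired = None  # stays None when the token has no "se" field
    if expiry_raw:
        expiry = datetime.fromisoformat(expiry_raw.replace("Z", "+00:00"))
        if expiry.tzinfo is None:
            # Date-only expiry values are treated as UTC.
            expiry = expiry.replace(tzinfo=timezone.utc)
        expired = expiry <= now

    return {
        "can_list": "l" in permissions,   # needed to browse paths
        "can_write": "w" in permissions,  # needed for a staging sink
        "expired": expired,
    }


# Hypothetical tokens; the signatures are placeholders.
check_time = datetime(2025, 1, 1, tzinfo=timezone.utc)
good = inspect_sas("?sv=2022-11-02&ss=b&srt=sco&sp=rwl&se=2030-01-01T00:00:00Z&sig=placeholder", check_time)
bad = inspect_sas("sv=2022-11-02&sp=r&se=2020-01-01T00:00:00Z&sig=placeholder", check_time)
print(good)
print(bad)
```

A token that lacks "l" cannot list paths, and one that lacks "w" cannot be used as a staging sink, even when the linked service test connection succeeds.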

Event trigger setup on blob storage fails in Azure Data Factory

I am setting up an event trigger on a blob storage v2 account in a Data Factory pipeline. When I publish the pipeline I keep getting the error below. I only set up the storage recently, but I can't see anything out of place. Do I need to set up an event subscription in blob storage and create the event from the storage account itself, since there are options to set up automation there?
The attempt to configure storage notifications for the provided storage account hmtest1 failed. Please ensure that your storage account meets the requirements described at https://aka.ms/storageevents. The error is Failed to retrieve credentials for request=RequestUri=https://management.azure.com/subscriptions
{"code":"InvalidAuthenticationToken","message":"The received access token is not valid: at least one of the claims 'puid' or 'altsecid' or 'oid' should be present. If you are accessing as application please make sure service principal is properly created in the tenant."}}
AFAIK, in ADF this error occurs when Data Factory is not registered under the subscription's Resource providers.
To resolve this, register Data Factory in the Resource providers.
Go to Subscriptions -> your subscription -> Resource providers and check whether Data Factory is registered.
If it shows as NotRegistered, select it and click Register.
After it has registered successfully, create a new data factory workspace and check the Storage event trigger.
If it still gives the same error, register EventGrid as well and check again.

Azure Synapse serverless SQL pool - query execution fails

After completing tutorial 1, I am working through tutorial 2 from the Microsoft Azure team and running the following query (shown in step 3). The query execution gives the error shown below.
Question: What may be the cause of the error, and how can we resolve it?
Query:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
FORMAT='PARQUET'
) AS [result]
Error:
Warning: No datasets were found that match the expression 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet'. Schema cannot be determined since no files were found matching the name pattern(s) 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet'. Please use WITH clause in the OPENROWSET function to define the schema.
NOTE: The path of the file in the container is correct. I generated the query above by right-clicking the file inside the container and generating the script.
Remarks:
Azure Data Lake Storage Gen2 account name: contosolake
Container name: users
Firewall settings used on the Azure Data lake account:
Azure Data Lake Storage Gen2 account is allowing public access (ref):
Container has required access level (ref)
UPDATE:
The owner of the subscription is someone else, and I did not get the option to check the "Assign myself the Storage Blob Data Contributor role on the Data Lake Storage Gen2 account" box described in item 3 of the Basics tab > Workspace details section of tutorial 1. I also do not have permission to add roles, although I'm the owner of the Synapse workspace. So I am using the workaround described in "Configure anonymous public read access for containers and blobs" from the Azure team.
--Workaround
If you are unable to grant Storage Blob Data Contributor, use ACLs to grant permissions.
All users that need access to some data in this container also need to have the EXECUTE permission on all parent folders up to the root (the container). Learn more about how to set ACLs in Azure Data Lake Storage Gen2.
Note:
Execute permission on the container level needs to be set within Azure Data Lake Gen2. Permissions on the folder can be set within Azure Synapse.
Go to the container holding NYCTripSmall.parquet.
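To see which folders the EXECUTE rule above applies to for a given file, here is a minimal sketch. The function name and the nested example path are hypothetical; it only computes the list of parent folders and does not call any Azure API.

```python
from pathlib import PurePosixPath


def folders_needing_execute(container, file_path):
    """List the container root and every parent folder of file_path.

    Each returned folder needs EXECUTE for the user; the file itself
    additionally needs READ.
    """
    parent_parts = PurePosixPath(file_path.strip("/")).parts[:-1]
    folders = [f"{container}/"]  # the container root always counts
    current = ""
    for part in parent_parts:
        current = f"{current}/{part}" if current else part
        folders.append(f"{container}/{current}/")
    return folders


# The file from the question sits directly in the container root.
print(folders_needing_execute("users", "NYCTripSmall.parquet"))
# ['users/']

# Hypothetical nested layout, to show the chain of parent folders.
print(folders_needing_execute("users", "raw/2023/trips.parquet"))
# ['users/', 'users/raw/', 'users/raw/2023/']
```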
--Update
As per your update in the comments, it seems you will have to do the following.
Contact the Owner of the storage account, and ask them to perform the following tasks:
Assign the workspace MSI to the Storage Blob Data Contributor role on the storage account
Assign you to the Storage Blob Data Contributor role on the storage account
--
I was able to get the query results for the same dataset by following the tutorial doc you mentioned.
Since you confirm that the file is present and in the right path, refresh the linked ADLS source and publish the query before running it, in case this is a transient issue.
Two other things I would check:
Try setting Microsoft network routing in the Network Routing settings of the ADLS account.
Check that the built-in pool is online and that you have at least Contributor roles on both the Synapse workspace and the Storage account (in case the credentials used to run the query did not create the resources).

Azure Data Factory to Azure Blob Storage Permissions

I'm connecting ADF to blob storage v2 using a managed identity following this doc: Doc1
When I test the connection to the linked service from my first dataset, it succeeds. When I test by file path and enter "testfolder" (which exists in the blob container), it fails and returns the generic forbidden error shown at the end of this post.
However, when I opt to "browse" the folders in the dataset portal, the folder "testfolder" does show up. But when I select it, it does not show me anything within that folder.
The Data Factory managed identity has been given the Contributor role, granting full access to manage all resources. Is there some other hidden issue, or a way to narrow down the problem? My instinct is that this is something within the blob container, since I can view the containers but not their contents.
Error message:
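It seems that you haven't assigned a blob data role on the storage account. Contributor is a management role and does not by itself grant access to blob data when authenticating with a managed identity.
Please follow these steps:
1. Click Access Control (IAM) on the storage account, navigate to Role assignments, and add a role assignment.
2. Choose a role according to your needs (for example Storage Blob Data Reader or Storage Blob Data Contributor) and select your data factory.
3. A few minutes later, retry choosing the file path.
Hope this helps.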
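As a rough illustration of the check involved, the sketch below filters a list of assigned role names down to the built-in blob data roles. It assumes you have already copied the role names from the Role assignments tab; the function name and sample lists are hypothetical.

```python
# Built-in roles that grant access to blob data (data plane).
BLOB_DATA_ROLES = {
    "Storage Blob Data Reader": "read",
    "Storage Blob Data Contributor": "read/write/delete",
    "Storage Blob Data Owner": "full data access",
}


def data_plane_roles(assigned_roles):
    """Return only the assigned roles that grant blob data access."""
    return [role for role in assigned_roles if role in BLOB_DATA_ROLES]


# Situation from the question: only the management role is assigned.
print(data_plane_roles(["Contributor"]))
# []

# After adding a data role for the data factory's identity.
print(data_plane_roles(["Contributor", "Storage Blob Data Reader"]))
# ['Storage Blob Data Reader']
```

An empty result for the data factory's identity would be consistent with the forbidden error when browsing folder contents.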

How to read a blob in Azure Databricks with SAS

I'm new to Databricks. I wrote some sample code to read a Storage Blob in Azure Databricks.
blob_account_name = "sars"
blob_container_name = "mpi"
blob_sas_token = r"**"
ini_path = "58154388-b043-4080-a0ef-aa5fdefe22c8"
# WASBS URIs take the form wasbs://<container>@<account>.blob.core.windows.net/<path>
inputini = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, ini_path)
# Register the SAS token for this container before reading
spark.conf.set("fs.azure.sas.%s.%s.blob.core.windows.net" % (blob_container_name, blob_account_name), blob_sas_token)
print(inputini)
ini = sc.textFile(inputini).collect()
It throws this error:
Container mpi in account sars.blob.core.windows.net not found
I guess the SAS token isn't being attached to the WASBS link, so it doesn't have permission to read the data.
How do I attach the SAS token to the wasbs link?
This is expected behaviour: you cannot read private storage from Databricks. To access private data in a storage account where the firewall is enabled, or which was created in a VNet, you have to deploy Azure Databricks in your Azure Virtual Network and then allow the VNet address range in the firewall of the storage account. You can refer to "Configure Azure Storage firewalls and virtual networks".
WITH PRIVATE ACCESS:
When the access level is set to "Private (no anonymous access)".
Output: Error message
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container carona in account cheprasas.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
WITH CONTAINER ACCESS:
When the access level is set to "Container (Anonymous read access for containers and blobs)".
Output: You will be able to see the output without any issue.
Reference: Quickstart: Run a Spark job on Azure Databricks using the Azure portal.
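For the SAS route the question asks about, the two strings that have to line up are the wasbs URI and the configuration key that holds the token. The helpers below are a minimal sketch with hypothetical names. They only build the strings, using the container and account names from the question, and do not touch Spark.

```python
def wasbs_uri(container, account, path=""):
    """Build a wasbs URI: the container comes before '@', the account after."""
    return "wasbs://%s@%s.blob.core.windows.net/%s" % (
        container, account, path.lstrip("/"))


def sas_conf_key(container, account):
    """Build the configuration key under which the SAS token is stored."""
    return "fs.azure.sas.%s.%s.blob.core.windows.net" % (container, account)


uri = wasbs_uri("mpi", "sars", "58154388-b043-4080-a0ef-aa5fdefe22c8")
key = sas_conf_key("mpi", "sars")
print(uri)
print(key)

# In a notebook, these would be used as in the question, for example:
#   spark.conf.set(key, blob_sas_token)
#   sc.textFile(uri)
```

The container and account in the key must match those in the URI exactly.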
