Azure Data Factory connecting to Blob Storage via Access Key

I'm trying to build a very basic data flow in Azure Data Factory that pulls a JSON file from blob storage, performs a transformation on some columns, and stores the result in a SQL database. I originally authenticated to the storage account using Managed Identity, but I get the error below when attempting to test the connection to the source:
com.microsoft.dataflow.broker.MissingRequiredPropertyException:
account is a required property for [myStorageAccountName].
com.microsoft.dataflow.broker.PropertyNotFoundException: Could not
extract value from [myStorageAccountName] - RunId: xxx
I also see the following message in the Factory Validation Output:
[MyDataSetName] AzureBlobStorage does not support SAS,
MSI, or Service principal authentication in data flow.
Based on this I assumed that all I would need to do is switch my Blob Storage linked service to the Account Key authentication method. After I switched to Account Key authentication, though, and selected my subscription and storage account, testing the connection gives the following error:
Connection failed Fail to connect to
https://[myBlob].blob.core.windows.net/: Error Message: The
remote server returned an error: (403) Forbidden. (ErrorCode: 403,
Detail: This request is not authorized to perform this operation.,
RequestId: xxxx), make sure the
credential provided is valid. The remote server returned an error:
(403) Forbidden.StorageExtendedMessage=, The remote server returned an
error: (403) Forbidden. Activity ID:
xxx.
I've tried selecting the account from Azure directly and also entering the key manually, and I get the same error either way. One thing to note is that the storage account only allows access from specified networks. I tried connecting to a different, public storage account and can access it fine. The ADF account has the Storage Account Contributor role, and I've added my current public IP address as well as the IP ranges of Azure Data Factory that I found here: https://learn.microsoft.com/en-us/azure/data-factory/azure-integration-runtime-ip-addresses
Also note, I have about 5 copy data tasks working perfectly fine with Managed Identity currently, but I need to start doing more complex operations.
This seems like a similar issue to Unable to create a linked service in Azure Data Factory, but the Storage Account Contributor and Owner roles I have assigned should supersede the Reader role suggested in the reply. I'm also not sure whether that poster is using a public or private storage account.
Thank you in advance.

At the very bottom of the article listed above about whitelisting the IP ranges of the integration runtime, Microsoft says the following:
When connecting to Azure Storage account, IP network rules have no
effect on requests originating from the Azure integration runtime in
the same region as the storage account. For more details, please refer
this article.
I spoke to Microsoft support about this, and the issue is that whitelisting public IP addresses does not work for resources within the same region: since the resources are on the same network, they connect to each other using private IPs rather than public ones.
There are four options to resolve the original issue:
Allow access from all networks under Firewalls and Virtual Networks in the storage account (obviously this is a concern if you are storing sensitive data). I tested this and it works.
Create a new Azure hosted integration runtime that runs in a different region. I tested this as well: my ADF data flow runs in the East US region, and when I created a runtime in East US 2 it worked immediately. The issue for me here is that I would have to have this reviewed by security before pushing to prod, because we'd be sending data across the public network; even though it's encrypted, it's still not as secure as having two resources talking to each other on the same network.
Use a separate activity such as an HDInsight Spark activity or an SSIS package. I'm sure this would work, but the issue with SSIS is cost, as we would have to spin up an SSIS DB and then pay for the compute. You also need to execute extra activities in the pipeline to start and stop the SSIS integration runtime before and after execution. Also, I don't feel like learning Spark just for this.
Finally, the solution that worked and that I used: I created a new connection that replaced the Blob Storage connection with a Data Lake Storage Gen2 connection for the dataset (a minimal sketch of such a linked service follows below). It worked like a charm. Unlike the Blob Storage connector, Managed Identity is supported for Azure Data Lake Storage Gen2, as per this article. In general, the more specific the connection type, the more likely its features will cover the specific need.
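For reference, here is a minimal sketch of what that replacement linked service can look like, assuming the AzureBlobFS (ADLS Gen2) connector and a placeholder account name; when no explicit credential is supplied, the connector can use the data factory's managed identity:

{
    "name": "AdlsGen2LinkedService",
    "properties": {
        "type": "AzureBlobFS",
        "typeProperties": {
            "url": "https://<myStorageAccountName>.dfs.core.windows.net"
        }
    }
}

Note that the factory's managed identity still needs a data-plane role such as Storage Blob Data Contributor (or equivalent ACLs) on the account for the data flow to read and write.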

This is what you are facing now:
From the description we know this is a storage connection error. I also assigned the Contributor role to the data factory, but still got the problem.
The problem comes from the network and firewall settings of your storage account. Please check them.
Make sure you have added the client ID and enabled the 'Trusted Microsoft services' exception.
Have a look at this doc:
https://learn.microsoft.com/en-us/azure/storage/common/storage-network-security#trusted-microsoft-services
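For illustration, the 'Trusted Microsoft services' exception maps to the bypass setting in the storage account's network rules; a sketch of the relevant fragment of a Microsoft.Storage ARM definition (the IP rule value is only an example):

"networkAcls": {
    "bypass": "AzureServices",
    "defaultAction": "Deny",
    "virtualNetworkRules": [],
    "ipRules": [
        { "value": "203.0.113.10", "action": "Allow" }
    ]
}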
Then, go to your ADF and update the linked service connection to use those settings.
After that, it should be OK.

Related

How to change where azure storage emulator stores its files

I am attempting to debug an application that is comprised of several microservices. Part of the cross-service messaging is carried out by one service storing information in Azure Blob Storage for another to read. For local testing we use the Azure Storage Emulator.
Recently my AD logon had to be recreated by our IT team. My username has changed to <myname.COMPANYNAME>, and since then the Azure Storage Emulator has failed me.
Attempting to view all local blob storage results in the error "Unable to retrieve child resources.", though I can confirm manually that each container still exists. Hunting online suggests the problem is due to the period in my AD logon name (changing this is non-trivial, as it needs to be done by another department).
Unable to retrieve child resources.
Details:
{
"name": "RestError",
"message": "The specifed resource name contains invalid
characters.\nRequestId:b305591f-acf0-4e2e-8cc6-e3305fa18fab\nTime:2021-09-
My current thinking is to try to configure the emulator to not store its files under my user account, but I have yet to find anywhere that this can be configured; the config file mentioned in this question doesn't appear to have what I need.
A successful answer would be guidance on how to relocate the emulator's storage without IT having to create a new logon, or a workaround that will allow Storage Explorer and the services to retrieve my various blob stores.
Please check this thread, Azure Storage Emulator store data on specific path - Stack Overflow, to see if it helps with the Azure Storage Emulator.
NOTE:
The Azure Storage Emulator is now deprecated. Microsoft recommends
that you use the Azurite emulator for local development with
Azure Storage (refer to the Azurite documentation).
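If you do move to Azurite, its data location is configurable, so the files no longer have to live under your user profile. As a sketch, the Azurite VS Code extension exposes a location setting in settings.json (the path here is just an example):

{
    "azurite.location": "C:\\azurite-data",
    "azurite.silent": false
}

The standalone azurite command exposes an equivalent location option.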
In most cases a change in logon name does not affect blob storage, but it can in a few cases because of name-linked permissions or the SID.
After changing the username, check that any permissions or roles assigned previously have been granted to the new account, and make sure the DN and SID were not modified; otherwise review any configuration done previously that depends on the DN. The Storage Emulator itself supports only a single fixed account and a well-known authentication key.
1. Try restarting the emulator and check whether it works with a new port or any new configuration.
See this Thread
The 'invalid characters' error in most cases relates to the container name (it must be all lowercase, with no special characters).
Check this first, and refer to the threads below for possible resolutions if it is a container-name issue.
SO ref1
SO Ref 2
Storage Explorer has several options for how and where it sources the information needed to connect to your proxy. To change which option is being used, go to Settings (gear icon on the left vertical toolbar) > Application > Proxy. See Network Connections in Azure Storage Explorer | Microsoft Docs.

Azure Analysis services connection to storage account not working

We have an Azure Analysis Services setup that pulls data from ADLS Gen2 (JSON files). When we try to process or model the tables from within SSMS, it throws the following error:
Failed to save modifications to the server. Error returned: 'The
credentials provided for the AzureBlobs source are invalid.
When I open up the storage account to all networks there are no issues. However, I am worried about the security implications of opening up the storage account like that.
My question is: any pointers as to why SSMS would throw such an error?
I tried creating a service principal as admin on the AAS server and added the same SP to the storage blob as Contributor, but no luck.
Adding Contributor will never solve this problem.
As you can see in practice, this has nothing to do with RBAC roles.
The key to the problem lies in the network: you need to configure the storage firewall.
You have two options:
1. Add the outbound IPs of the service you are using to the storage account's allowed list.
2. Or integrate your service with an Azure VNet, and then add that virtual network to the allowed list of the storage firewall.
This is the full list of Azure service IP addresses we can get (you need to add the IP ranges of the corresponding service to the storage firewall's allowed list):
https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20200824.json
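The downloaded file is a JSON document of service tags; the idea is to find the entry for the service (and region) you are using and add its addressPrefixes to the storage firewall. A trimmed sketch of its shape, with placeholder values:

{
    "cloud": "Public",
    "values": [
        {
            "name": "<ServiceName>.<Region>",
            "properties": {
                "region": "<region>",
                "addressPrefixes": [
                    "x.x.x.x/24"
                ]
            }
        }
    ]
}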

Azure Data Factory Event Trigger - Storage Account Key in Json?

We have a storage account that is locked down. My pipeline has connections that reference a key vault to get the access token for the storage account.
When I create an event trigger in ADF, ADF lets me find and connect to the storage account (without asking for a key or prompting me to select the linked service connection). It tells me what files it will include based on my 'begins with' and 'ends with' values (it found 2 files). It saves successfully.
When I publish it, I get this error between publishing to adf_publish and generating the ARM templates.
The attempt to configure storage notifications for the provided storage account ****failed. Please ensure that your storage account meets the requirements described at https://aka.ms/storageevents. The error is Failed to retrieve credentials for request=RequestUri=https://management.azure.com/subscriptions/********/resourceGroups/<resource group name>/providers/Microsoft.Storage/storageAccounts/<storage account name here to gen 2 data lake>/listAccountSas, Method=POST, response=StatusCode=400, StatusDescription=Bad Request, IsSuccessStatusCode=False, Content=System.Net.HttpWebResponse, responseContent={"error":{"code":"InvalidValuesForRequestParameters","message":"Values for request parameters are invalid: keyToSign."}}
I believe this is because the ADF trigger creation process (and therefore its JSON) does not allow you to point to a Key Vault to get the access token for the storage account you are connecting to. Is this the issue? Is there a fix for it?
Appreciate any help, thanks - April
I think the storage account is attached to a VNet and running behind the firewall. I faced a similar issue because of this. You can remove the firewall temporarily, configure the trigger, and then bring the firewall back.
It's not strictly necessary to disable the firewall. You can also use the exception settings under Firewalls and virtual networks on your storage account.
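For context, the event trigger definition itself only references the storage account by resource ID and carries no credential or linked-service reference, which is consistent with the behaviour described above; a minimal sketch with placeholder names and paths:

{
    "name": "OnBlobCreatedTrigger",
    "properties": {
        "type": "BlobEventsTrigger",
        "typeProperties": {
            "blobPathBeginsWith": "/input/blobs/",
            "blobPathEndsWith": ".json",
            "events": [ "Microsoft.Storage.BlobCreated" ],
            "scope": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroup>/providers/Microsoft.Storage/storageAccounts/<storageAccount>"
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "<MyPipeline>",
                    "type": "PipelineReference"
                }
            }
        ]
    }
}

The listAccountSas call in the error is made against the management plane when ADF configures the storage event notification, not through the linked service.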

Make sure the ACL and firewall rule is correctly configured in the Azure Data Lake Store account

I'm copying CSV files from Azure Blob to Azure Data Lake with Azure Data Factory, using the Copy Data tool.
I'm following this link: https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-copy-data-tool
In the Copy Data tool my source configuration and test connection succeeded. However, the destination connection (that is, the Data Lake) is causing a problem.
I'm getting the error: Make sure the ACL and firewall rule is correctly configured in the Azure Data Lake Store account.
I followed this link for the firewall settings: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-secure-data (Set IP address range for data access)
I enabled the firewall and set "Allow access to Azure services" to ON.
Still, I'm getting the same error. Could anyone please suggest how to fix this?
Get your Managed Identity application ID from the Azure Data Factory properties.
Go to Azure Data Lake Storage and navigate to Data Explorer -> Access -> Add, then provide the ID in the 'Select user or group' field.
It will identify your Azure Data Factory instance/resource; then grant ACLs (R/W/X) as per your requirements.
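Once the ACLs are in place, the Data Lake Store (Gen1) linked service using the factory's managed identity needs no explicit credential; a minimal sketch with a placeholder account name:

{
    "name": "AdlsGen1LinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1"
        }
    }
}

Keep in mind that with POSIX-style ACLs the identity typically needs Execute on every parent folder down to the data, plus Read/Write on the files themselves.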
Besides the firewall setting, please also make sure that your account has the necessary permissions on the target ADLS account. Please refer to this doc for more details: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-store#linked-service-properties
Your account and the ADF application need permission to work on ADLS. Also, check whether you granted permission on the child folders as well.

U-SQL job does not access Azure SQL Database

I was trying to retrieve data from Azure SQL Database using Azure Data Lake Analytics by following this guide. I ran a U-SQL job on Azure Data Lake Analytics and got the following error:
Failed to connect to data source: 'SampleSource', with error(s):
'Cannot open server '' requested by the login. Client with
IP address '25.66.9.211' is not allowed to access the server. To
enable access, use the Windows Azure Management Portal or run
sp_set_firewall_rule on the master database to create a firewall rule
for this IP address or address range. It may take up to five minutes
for this change to take effect.'
After running my job a couple of times, I observed that the IP range that needs to be added on the server is pretty wide. It seems we need to add 25.66.xxx.xxx. I have two questions:
How can we narrow this range?
Why doesn't the typical setting that allows all Azure services access work?
At this point you will have to add the full range (these are all internal IPs). The reason the typical setting is not working is that the machines needing access are not managed in the same way. We are working on having them added. What data center is the database in?
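In the meantime, a hedged sketch of covering the whole observed block with a server-level firewall rule, expressed as an ARM resource fragment (the rule name and the 25.66.0.0-25.66.255.255 range are only examples derived from the error above):

{
    "type": "Microsoft.Sql/servers/firewallRules",
    "apiVersion": "2014-04-01",
    "name": "<serverName>/AllowAdlaRange",
    "properties": {
        "startIpAddress": "25.66.0.0",
        "endIpAddress": "25.66.255.255"
    }
}

This is equivalent to running sp_set_firewall_rule on the master database, as the error message suggests.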
