Azure Data Factory with managed private endpoints fails to connect - azure

I am configuring an Azure Data Factory that reads data from a storage account and updates an Azure SQL server.
I have created the managed private endpoints (manually) for both the storage account and the Azure SQL server.
Managed private endpoints:
Enabled interactive authoring:
and disabled public network access on both the storage account and the Azure SQL server.
But it fails to connect to both the storage account and the Azure SQL server.
Azure SQL Server connection:
Storage account connection:
Failed to Connect - Storage Account#: 9972
Failed to Connect - SQL Server#: 22339
Update #1: As suggested in the comments, I have associated the linked services with the integration runtime.
It now seems to connect.
But the pipeline works only when I allow
Otherwise, it fails with
The service principal has permissions on the storage account.
Permissions:
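For reference, the managed private endpoints described above (created manually in the portal) can also be declared in Terraform. This is only a minimal sketch, assuming the data factory was created with a managed virtual network enabled; the target resource names are placeholders, not the asker's actual resources:

// Hypothetical sketch: managed private endpoints from the ADF managed VNet.
// Assumes azurerm_data_factory.datafactory has managed_virtual_network_enabled = true.
resource "azurerm_data_factory_managed_private_endpoint" "storage_blob" {
  name               = "mpe-storage-blob"
  data_factory_id    = azurerm_data_factory.datafactory.id
  target_resource_id = azurerm_storage_account.example.id
  subresource_name   = "blob"
}

resource "azurerm_data_factory_managed_private_endpoint" "sql_server" {
  name               = "mpe-sql-server"
  data_factory_id    = azurerm_data_factory.datafactory.id
  target_resource_id = azurerm_mssql_server.example.id
  subresource_name   = "sqlServer"
}

Both endpoints still have to be approved on the target resources, and the linked services must run on the managed-VNet integration runtime (see Update #1 above).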

I was able to fix this with the following Terraform code:

// Create Private Endpoint for Data Factory Portal
module "pedatafactoryportal" {
  source                           = "./modules/privateendpoint/"
  resource_group_name              = azurerm_resource_group.resource_group.name
  location                         = azurerm_resource_group.resource_group.location
  name                             = var.privateendpointdatafactory_portal_name
  subnet_id                        = azurerm_subnet.endpoint_subnet.id
  private_link_enabled_resource_id = azurerm_data_factory.datafactory.id
  private_dns_zone_name            = azurerm_private_dns_zone.datafactoryportalzone.name
  subresource_names                = ["portal"]
  resource_name                    = "portal"

  depends_on = [
    azurerm_data_factory.datafactory,
    azurerm_private_dns_zone.datafactoryportalzone
  ]
}
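The ./modules/privateendpoint/ module itself is not shown in the post. A minimal sketch of what such a module might contain, inferred purely from the inputs passed above (these internals are an assumption, not the author's actual module):

// Hypothetical contents of ./modules/privateendpoint/
variable "name" {}
variable "location" {}
variable "resource_group_name" {}
variable "subnet_id" {}
variable "private_link_enabled_resource_id" {}
variable "private_dns_zone_name" {}
variable "subresource_names" { type = list(string) }
variable "resource_name" {}

// Look up the DNS zone that was passed in by name.
data "azurerm_private_dns_zone" "this" {
  name                = var.private_dns_zone_name
  resource_group_name = var.resource_group_name
}

resource "azurerm_private_endpoint" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id

  private_service_connection {
    name                           = "${var.resource_name}-connection"
    private_connection_resource_id = var.private_link_enabled_resource_id
    is_manual_connection           = false
    subresource_names              = var.subresource_names
  }

  private_dns_zone_group {
    name                 = "${var.resource_name}-dns"
    private_dns_zone_ids = [data.azurerm_private_dns_zone.this.id]
  }
}

The "portal" subresource corresponds to the ADF Studio endpoint, which is separate from the "dataFactory" subresource.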

Related

403 Forbidden when accessing Storage Account through firewall from Azure Synapse's dedicated SQL pool

I'm getting a 403 Forbidden when trying to access a firewalled storage account from a dedicated SQL pool in Azure Synapse.
It works when I disable the Storage Account firewall.
Relevant configuration:
VNet: 10.0.0.0/16 with a subnet of 10.0.2.0/24
Storage account
Hierarchical Namespace: enabled
Resource instances added: Microsoft.Synapse/workspaces
"Allow" Azure services on the trusted services list to access this storage account: enabled
Public IP address of the (AWS-hosted) client initiating the COPY INTO command added to the firewall allowlist
Virtual network: Linked to above Vnet
Storage Blob Data Contributor role added for the Synapse Workspace app
No specific ACL on the container/file system
Synapse Workspace
Managed Virtual Network: enabled
Managed Private Endpoint: added for Blob and Data Lake access to the storage account, approved
Linked Service connection test to Blob and DFS: successful
Dedicated SQL pool
Master key created
Database scoped credential added
External data source added with CREATE EXTERNAL DATA SOURCE [DataSource] WITH (TYPE = HADOOP, LOCATION = 'abfss://${var.datalake_container_name}@${var.datalake_hostname}', CREDENTIAL = [ScopedCredential]);
Error in the StorageBlobLogs:
OperationName=GetBlob
StatusCode=403
StatusText=AuthorizationFailure
CallerIpAddress=10.0.0.11:34573
AuthenticationType=AccountKey
Error in the client app:
copy into "myschema"."mytable" from 'https://mystorageaccount.blob.core.windows.net/mycontainer/abcde/' with (credential = (identity = 'Storage Account Key', secret = 'xxx'), file_type = 'csv', fieldterminator = ',', rowterminator = '0x0a', firstrow = 2, encoding = 'utf8');
Not able to validate external location because The remote server returned an error: (403) Forbidden.
Any pointers would be appreciated.
The problem was that the COPY INTO command does not support authentication with a storage account access key when the storage account is behind a firewall.
This works:
copy into "myschema"."mytable"
from 'https://mystorageaccount.blob.core.windows.net/mycontainer'
with (credential = (identity = 'Managed Identity'), file_type = 'csv', fieldterminator = ',', rowterminator = '0x0a', firstrow = 2, encoding = 'utf8');
This is confirmed on this Microsoft docs page:
When accessing storage that is protected with the firewall, you can use User Identity or Managed Identity.
However, that docs page mentions only serverless SQL pools, not dedicated SQL pools.
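For reference, the two storage-side settings from the question's configuration that the Managed Identity path relies on (the Microsoft.Synapse resource instance rule and the Storage Blob Data Contributor role) could be expressed in Terraform roughly as below. This is a sketch with placeholder resource names, not the asker's actual code:

// Hypothetical sketch: storage firewall resource instance rule + RBAC for the Synapse workspace identity.
resource "azurerm_storage_account_network_rules" "datalake" {
  storage_account_id = azurerm_storage_account.datalake.id
  default_action     = "Deny"
  bypass             = ["AzureServices"]

  // Resource instance rule: allow this specific Synapse workspace through the firewall.
  private_link_access {
    endpoint_resource_id = azurerm_synapse_workspace.example.id
  }
}

resource "azurerm_role_assignment" "synapse_blob_contributor" {
  scope                = azurerm_storage_account.datalake.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_synapse_workspace.example.identity[0].principal_id
}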

Create firewalls for Azure Synapse Workspace when Public Network Access is Disabled?

I am trying to provision a Synapse workspace with Terraform (with public network access disabled). The Synapse workspace is created and its status is Succeeded. Most of the resources are created/provisioned successfully.
However, I would also like to add/create firewall rules on the Synapse workspace. I noticed in the portal that when public network access is disabled, you can't create or modify firewall rules. That is also the error I get when I try to add firewall rules to the Synapse workspace with Terraform.
Failure sending request: StatusCode=400 -- Original Error: Code='PublicNetworkAccessDenied' Message='Unable to create or modify firewall rules when public network interface for the Synapse Workspace is disabled. To manage firewall rules, please enable the public network access.'
So if I need to add firewall rules to the workspace, do I need to enable public network access?
If so, I will update my Terraform code to set public network access to true.
Actually, when I tried to enable public network access via the portal and save it, the deployment failed, the workspace state went from Succeeded to Failed, and it said to check the deployment for more information. But I couldn't find information about that deployment anywhere. With the workspace in the Failed state, I can't create firewall rules, as Terraform complains that the workspace is in a failed state. So I recreated the workspace via Terraform with public network access disabled, and the workspace is now in the Succeeded state.
Would it work if we update public network access to true via Terraform (since it failed via the portal)? Would the workspace state be Succeeded after updating it to true?
What if it fails to update public network access to true? The Synapse workspace would go to the Failed state and then I can't do anything with the provisioning in Terraform, as Terraform would complain that the workspace isn't in the Succeeded state.
If the workspace does go to the Failed state, how can I get it back to Succeeded?
Please suggest the best way forward.
In short, I want to add/create firewall rules on my Synapse workspace via Terraform. I am unable to, as public network access is disabled for the given workspace.
Thank you for your help.
Update: I can remove the code that adds or modifies firewall rules, since it requires the workspace to have public access enabled and I want my Synapse workspace to always have public network access disabled. So adding firewall rules doesn't make sense.
But I also get the error below when I try to provision Synapse with public network access disabled. I have a few role assignments for Synapse and they fail with the error below. My understanding is that the agent that runs this pipeline doesn't have access to this Synapse workspace, because the agent is public and the created Synapse workspace is private? My pipelines run on self-hosted agents.
│ Error: listing synapse role definitions accesscontrol.RoleDefinitionsClient#ListRoleDefinitions: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="PublicNetworkAccessDenied" Message="The public network interface on this Workspace is not accessible. To connect to this Workspace, use the Private Endpoint from inside your virtual network or enable public network access for this workspace."
│
│ with module.synapse.azurerm_synapse_role_assignment.this["synapse-administrator"],
│ on .terraform/modules/synapse/rbac.tf, in resource "azurerm_synapse_role_assignment" "this":
│ resource "azurerm_synapse_role_assignment" "this" {
│
I tried to reproduce the same in my environment.
Code:
resource "azurerm_synapse_workspace" "example" {
name = "kaexamplesynapse"
resource_group_name = data.azurerm_resource_group.example.name
location = data.azurerm_resource_group.example.location
storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.example.id
sql_administrator_login = "sqladminuser"
sql_administrator_login_password = "H#Sh1CoR3!"
public_network_access_enabled = false
identity {
type = "SystemAssigned"
}
}
With the below code:
resource "azurerm_storage_account" "example" {
name = "adlsexamplestorageacc"
resource_group_name = data.azurerm_resource_group.example.name
location = data.azurerm_resource_group.example.location
account_tier = "Standard"
account_replication_type = "LRS"
account_kind = "StorageV2"
is_hns_enabled = "true"
}
resource "azurerm_storage_data_lake_gen2_filesystem" "example" {
name = "default"
storage_account_id = azurerm_storage_account.example.id
}
resource "azurerm_synapse_workspace" "example" {
name = "kaexamplesynapse"
resource_group_name = data.azurerm_resource_group.example.name
location = data.azurerm_resource_group.example.location
storage_data_lake_gen2_filesystem_id = azurerm_storage_data_lake_gen2_filesystem.example.id
sql_administrator_login = "xxx"
sql_administrator_login_password = "xxx!"
identity {
type = "SystemAssigned"
}
}
resource "azurerm_synapse_firewall_rule" "example" {
name = "AllowAll"
synapse_workspace_id = azurerm_synapse_workspace.example.id
start_ip_address = "0.0.0.0"
end_ip_address = "255.255.255.255"
}
I could create the firewall rule on the Azure Synapse workspace successfully.
Note:
Selecting Disable for public network access won't allow any firewall rules to be configured.
From https://learn.microsoft.com/en-us/answers/questions/664868/azure-synapse-disable-public-network-access.html: if Synapse public network access is disabled, you may have to use a self-hosted agent, as Microsoft-hosted agents will fail when your pipelines configure Synapse itself, for example Synapse roles or managed private endpoints.
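If public network access has to stay disabled, whatever runs Terraform (for example the self-hosted agent) must reach the workspace's development endpoint over a private endpoint from inside the VNet. A minimal sketch, with placeholder resource names:

// Hypothetical sketch: private endpoint for the Synapse "Dev" (development) endpoint,
// so role assignments and Studio calls work without public access.
resource "azurerm_private_endpoint" "synapse_dev" {
  name                = "pe-synapse-dev"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  subnet_id           = azurerm_subnet.agents.id // subnet the self-hosted agent can reach

  private_service_connection {
    name                           = "synapse-dev-connection"
    private_connection_resource_id = azurerm_synapse_workspace.example.id
    is_manual_connection           = false
    subresource_names              = ["Dev"] // "Sql" and "SqlOnDemand" cover the SQL endpoints
  }

  private_dns_zone_group {
    name                 = "synapse-dev-dns"
    private_dns_zone_ids = [azurerm_private_dns_zone.synapse_dev.id] // privatelink.dev.azuresynapse.net
  }
}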

Connecting Blob Storage to a Synapse Workspace with Public Network Workspace Access Disabled

I'm trying to connect blob storage from my resource group's storage account to the Data tab in my Synapse workspace, but I get the following error: "The public network interface on this Workspace is not accessible. To connect to this Workspace, use the Private Endpoint from inside your virtual network or enable public network access for this workspace."
Public network access to my workspace must be disabled for company reasons. I made private endpoint connections on my Synapse resource to Dev, Sql, and Sql-On-Demand, but I'm not sure where to go from there.
Thanks!
Go to Azure Synapse -> Manage -> Managed private endpoints -> +New and add private endpoints.
Accessing blob storage: If you have already created a linked service, follow the image below. If not, please create one by following this MS doc on creating a linked service.
Fastest way to access Azure blob storage
For more information, follow this reference by Dennes Torres.
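The same managed private endpoint can also be declared in Terraform. A minimal sketch, assuming the workspace was created with a managed virtual network; resource names are placeholders:

// Hypothetical sketch: managed private endpoint from the Synapse managed VNet to the storage account.
resource "azurerm_synapse_managed_private_endpoint" "blob" {
  name                 = "mpe-storage-blob"
  synapse_workspace_id = azurerm_synapse_workspace.example.id
  target_resource_id   = azurerm_storage_account.example.id
  subresource_name     = "blob" // use "dfs" for Data Lake Storage Gen2 access
}

The endpoint still has to be approved on the storage account's private endpoint connections blade before the linked service will connect.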

Azure function connection to Azure Blob storage behind Vnet issue

We are currently migrating to a new Azure subscription and are having issues executing Azure Functions that worked as expected in our old Azure subscription. The main difference between our old subscription and our new subscription is that we have set up a virtual network with subnets and deployed our resources behind those subnets.
We have also had to migrate from an Azure App Service in the old subscription to an App Service Environment in the new subscription.
Our Azure environment consists of:
App Service Environment
App Service Plan I1
The App Service Environment and storage containers are on the same virtual network but different subnets. The function uses a managed identity which has the Owner role on the storage account.
The code listed below worked just fine in our old environment, which did not contain the virtual network, but fails in our new environment.
Any guidance would be greatly appreciated.
The Azure function which connects to Azure Storage works when run locally from Visual Studio 2019, but fails when run from the Azure portal.
Code Snippet below:
This section works just fine:
string storageConnectionString = XXXXConn.ConnectionETLFileContainer(); // Get Storage connection string
var myDirectory = "XXXX/Uploads"; // XXXX-etl-file-ingest/ABSS/Uploads/
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(storageConnectionString);
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient(); // Create a CloudBlobClient object for credentialed access to Azure Blob.
CloudBlobContainer blobContainer = blobClient.GetContainerReference("XXXX-etl-blobfile-ingest"); // Get a reference to the Blob Container we created previously.
CloudBlobDirectory blobDirectory = blobContainer.GetDirectoryReference(myDirectory); // Get a reference to the Blob Directory.
var blobs = blobDirectory.ListBlobs(useFlatBlobListing: true); // Set useFlatBlobListing to true.
This statement fails: Failure occurs when trying to iterate through the Blob files and get specific file info.
foreach (var myblob in blobs)
In the Azure portal, open the storage account blade and go to the Networking blade; there you will see the list of networks that are allowed to access your storage account. Once you have the allowed network list, check whether the function app is on one of those networks; if not, you need to add the network on which your function app is hosted to the list.
Update 2:
The simplest explanation/cause that I found is that when an App Service or Function App has the setting WEBSITE_VNET_ROUTE_ALL set to 1, all traffic to public endpoints is blocked. So if your storage account has no private endpoint configured, requests to it will fail.
Docs: "To block traffic to public addresses, you must have the application setting WEBSITE_VNET_ROUTE_ALL set to 1."
https://learn.microsoft.com/en-us/azure/app-service/web-sites-integrate-with-vnet#network-security-groups
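For illustration only, this is roughly how that setting appears in Terraform on a function app with regional VNet integration. A sketch using the azurerm_linux_function_app resource with placeholder names and assumed referenced resources, not the original app's configuration:

// Hypothetical sketch: function app that routes all outbound traffic through its VNet.
resource "azurerm_linux_function_app" "example" {
  name                       = "example-func"
  resource_group_name        = azurerm_resource_group.example.name
  location                   = azurerm_resource_group.example.location
  service_plan_id            = azurerm_service_plan.example.id
  storage_account_name       = azurerm_storage_account.funcstorage.name
  storage_account_access_key = azurerm_storage_account.funcstorage.primary_access_key

  // Regional VNet integration: outbound calls originate from this subnet.
  virtual_network_subnet_id = azurerm_subnet.function_subnet.id

  site_config {
    // Equivalent to the WEBSITE_VNET_ROUTE_ALL = 1 app setting: traffic to public
    // endpoints goes through the VNet and fails unless a service endpoint or
    // private endpoint to the target exists.
    vnet_route_all_enabled = true
  }
}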
Update 1:
My answer below was only a workaround for my problem. Turns out I did not link the Private DNS Zone (this is created for you when you create a new Private Endpoint) to my VNET.
To do this, go to your Private DNS Zone in the Azure Portal and click on Virtual network links in the left menu bar. There add a new link to the VNET your Function is integrated in.
This may not have been relevant for the OP, but hopefully it will help others.
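In Terraform, that virtual network link looks roughly like this (a sketch with placeholder names; the zone used for blob private endpoints is privatelink.blob.core.windows.net):

// Hypothetical sketch: link the private DNS zone to the VNet the function is integrated with.
resource "azurerm_private_dns_zone_virtual_network_link" "blob" {
  name                  = "link-to-function-vnet"
  resource_group_name   = azurerm_resource_group.example.name
  private_dns_zone_name = azurerm_private_dns_zone.blob.name // privatelink.blob.core.windows.net
  virtual_network_id    = azurerm_virtual_network.example.id
  registration_enabled  = false
}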
Original answer:
In my case this was solved by enabling the Microsoft.Storage Service Endpoint on the App Service's subnet (dedicated subnet).
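Expressed in Terraform, that workaround is roughly the following (a sketch with placeholder names): enable the Microsoft.Storage service endpoint on the integration subnet and allow that subnet in the storage account firewall.

// Hypothetical sketch: Microsoft.Storage service endpoint plus a matching storage firewall rule.
resource "azurerm_subnet" "app_service_subnet" {
  name                 = "app-service-subnet"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.0.3.0/24"]

  // Lets traffic from this subnet reach Azure Storage over the service endpoint.
  service_endpoints = ["Microsoft.Storage"]
}

resource "azurerm_storage_account_network_rules" "example" {
  storage_account_id         = azurerm_storage_account.example.id
  default_action             = "Deny"
  bypass                     = ["AzureServices"]
  virtual_network_subnet_ids = [azurerm_subnet.app_service_subnet.id]
}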

How to read a blob in Azure databricks with SAS

I'm new to Databricks. I wrote some sample code to read a storage blob in Azure Databricks.
blob_account_name = "sars"
blob_container_name = "mpi"
blob_sas_token = r"**"
ini_path = "58154388-b043-4080-a0ef-aa5fdefe22c8"
inputini = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, ini_path)
spark.conf.set("fs.azure.sas.%s.%s.blob.core.windows.net" % (blob_container_name, blob_account_name), blob_sas_token)
print(inputini)
ini = sc.textFile(inputini).collect()
It throws this error:
Container mpi in account sars.blob.core.windows.net not found
I guess the SAS token isn't being attached to the wasbs URL, so it doesn't have permission to read the data.
How do I attach the SAS token to the wasbs URL?
This is expected behaviour; you cannot read private storage from Databricks like this. In order to access private data in a storage account where the firewall is enabled, or which was created in a VNet, you will have to deploy Azure Databricks in your own Azure virtual network and then whitelist the VNet address range in the firewall of the storage account. You can refer to Configure Azure Storage firewalls and virtual networks.
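For reference, deploying the workspace into your own VNet (VNet injection) looks roughly like this in Terraform. A sketch with placeholder names, assuming the two subnets are delegated to Microsoft.Databricks/workspaces and have NSG associations:

// Hypothetical sketch: VNet-injected Databricks workspace.
resource "azurerm_databricks_workspace" "example" {
  name                = "example-dbw"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  sku                 = "premium"

  // Cluster nodes get IPs from these subnets, so the storage firewall can
  // allow the VNet instead of requiring public access.
  custom_parameters {
    virtual_network_id                                   = azurerm_virtual_network.example.id
    public_subnet_name                                   = azurerm_subnet.databricks_public.name
    private_subnet_name                                  = azurerm_subnet.databricks_private.name
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.private.id
  }
}

The storage account firewall then needs a virtual network rule for those subnets, which must have the Microsoft.Storage service endpoint enabled.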
WITH PRIVATE ACCESS:
When you have set the access level to "Private (no anonymous access)".
Output: error message
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: shaded.databricks.org.apache.hadoop.fs.azure.AzureException: Container carona in account cheprasas.blob.core.windows.net not found, and we can't create it using anoynomous credentials, and no credentials found for them in the configuration.
WITH CONTAINER ACCESS:
When you have set the access level to "Container (Anonymous read access for containers and blobs)".
Output: You will be able to see the output without any issue.
Reference: Quickstart: Run a Spark job on Azure Databricks using the Azure portal.
