How to connect to Azure Databricks using a Service Principal?

I am trying to launch a cluster in Azure Databricks using the portal, but I am getting an error saying "Subnet provided does not have security group associated to it."
I also want to connect to it using a service principal.
Please help!

When deploying Azure Databricks in your own virtual network, make sure to associate network security group (NSG) rules that allow communication with the Azure Databricks control plane.
The virtual network must include two subnets dedicated to Azure Databricks:
A private subnet with a configured network security group that allows cluster-internal communication.
A public subnet with a configured network security group that allows communication with the Azure Databricks control plane.
The following table displays the current network security group rules used by Azure Databricks. Azure Databricks may change these rules at any time to keep the service running smoothly; the topic and table are updated whenever such a modification occurs.
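As a rough sketch of what that setup can look like with the Azure CLI (resource names below are placeholders, the databricks CLI extension is assumed to be installed, and flags may vary between CLI versions):

# Create an NSG and attach it to the Databricks public subnet; repeat the
# subnet update for the private subnet as well.
az network nsg create -g my-rg -n databricks-nsg
az network vnet subnet update -g my-rg --vnet-name my-vnet -n public-subnet \
    --network-security-group databricks-nsg \
    --delegations Microsoft.Databricks/workspaces

# Deploy the workspace with VNet injection, pointing at the two subnets.
az databricks workspace create -g my-rg -n my-workspace -l westeurope --sku premium \
    --vnet my-vnet --public-subnet public-subnet --private-subnet private-subnet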
Reference: Deploy Azure Databricks in your Virtual Network (VNET Injection)
Hope this helps.

Related

Can I add an Azure Synapse Workspace to my own Virtual Network?

I'm building a data architecture with Azure Synapse and an ADLS Gen2 data lake/storage account. If I want to create a linked service that moves data from the storage account to the serverless/dedicated SQL pool, I have several options.
In my storage account, I can "enable access from all networks" so that the integration runtime in my Synapse workspace is free to bypass the storage account firewall. For obvious reasons this is not ideal.
I can create my Synapse workspace with the "Managed virtual network" option enabled and create a managed private endpoint between the two services.
However, I'm wondering whether it's also possible (like it is in other Azure services) to add my Synapse workspace to an already existing virtual network and use private endpoints to connect the services instead of managed private endpoints, i.e. connecting the services through "non-managed" private endpoints? When I browse the documentation it seems impossible:
When you create a Synapse workspace, you can choose to enable a Managed virtual network to be associated with it. If you do not enable Managed virtual network for your workspace when you create it, your workspace is in a shared virtual network along with other Synapse workspaces that do not have a Managed virtual network associated with it. If you enabled Managed virtual network when you created the workspace, then your workspace is associated with a dedicated virtual network managed by Azure Synapse. These virtual networks are not created in your customer subscription. Therefore, you will not be able to grant traffic from these virtual networks access to your secured storage account using network rules described above.
Any idea as to why Synapse was created as such? Is there any way to bypass this limitation?

Whitelist Azure Automation account in Azure Storage to read a zipped module from blob and upload it as an Azure Automation module

I have uploaded the zipped PowerShell module to Azure Blob Storage, and the storage account networking is set to "Allow access from selected networks". I am running the command below to import the module into Azure Automation:
New-AzAutomationModule -ResourceGroupName $automationrg -AutomationAccountName $automationaccount -Name ($Mod.Name).Replace('.zip','') -ContentLink $Blob.ICloudBlob.Uri.AbsoluteUri
After running this command, I get the error below.
[error]{"Message":"Module is not accessible. Exception: This request is not authorized to perform this operation."}
##[error]PowerShell exited with code '1'.
I checked and found that it is a firewall issue, and I cannot select access from all networks. How can I whitelist Azure Automation in the storage account networking settings?
Whitelisting all of the Automation service's IP addresses is not an option, because it is impossible to keep up with updates to hundreds of IP addresses.
In this case, you can use Azure Private Link to connect networks to Azure Automation, but it has one main limitation:
Private Link support with Azure Automation is available only in Azure Commercial and Azure US Government clouds.
One option which might work for you is to use a Hybrid Runbook Worker group in Azure Automation. The workers can be your own physical systems that can reach Azure, or your Azure VMs. You can then grant the IP addresses of the machines in your Hybrid Runbook Worker group access to the storage account.
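For example, the public IP of each machine in the Hybrid Runbook Worker group could then be added to the storage account firewall with the Azure CLI (the resource group, account name, and IP below are placeholders), roughly like this:

# Allow one hybrid worker's public IP through the storage firewall; repeat per worker.
az storage account network-rule add \
    --resource-group my-rg \
    --account-name mystorageaccount \
    --ip-address 203.0.113.10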
References:
Use Azure Private Link to securely connect networks to Azure Automation
Azure Automation network configuration details
Access storage account with Automation Account / Runbook - MSFT Q&A

Azure Synapse VNet Integration with disabled "Managed Virtual Network"

I have an Azure Synapse workspace that was created without the "Enable Managed Virtual Network" option. I have to integrate Synapse with an Azure Virtual Network.
I'm following the documentation at this link.
What additional actions should I perform due to the missing configuration "Enable Managed Virtual Network"?
I would like to avoid dropping and recreating the Azure Synapse workspace.
In your case, if you want your notebook to access linked storage resources under a certain storage account, add managed private endpoints from your Azure Synapse Analytics Studio. The storage account name should be the one your notebook needs to access.
To add this managed private endpoint, you would have to enable a managed virtual network for your workspace; otherwise the service will be inaccessible, as in the screenshot below from my repro.
It is only this step (Create private endpoints for workspace linked storage) that may require it; otherwise you are good to follow the rest of the doc.
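If you do enable the managed virtual network, the managed private endpoint can also be created from the CLI instead of Synapse Studio; a rough sketch (workspace, endpoint, and storage account names are illustrative, and the command requires a reasonably recent Azure CLI) might be:

# endpoint.json points at the storage account the notebook needs, e.g.:
# { "privateLinkResourceId": "/subscriptions/<sub>/resourceGroups/my-rg/providers/Microsoft.Storage/storageAccounts/mystorage",
#   "groupId": "dfs" }
az synapse managed-private-endpoints create \
    --workspace-name my-synapse-ws \
    --pe-name storage-pe \
    --file endpoint.json

# The pending connection then has to be approved on the storage account
# (Networking > Private endpoint connections) before the notebook can use it.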

Azure Kubernetes Service - when is it required to set the AKS Service Principal on other Azure resources in order to have a connection?

By default, when creating an AKS cluster, a service principal is created for that cluster.
That service principal can then be granted access on the level of some other Azure resource (a VM?) so that the two can establish a network connection and communicate (apart from, of course, the general network settings).
I am really not sure, and cannot understand, when this is required and when it is not. If, for example, I have a database running on a VM, do I need to grant the AKS service principal access to that VM for the cluster to be able to communicate with it over the network, or not?
Can someone provide me some guidance on this, rather than general documentation? When is this required to be set on the level of those other Azure resources, and when is it not?
I cannot find a proper explanation for this.
Thank you
Regarding your question about the DB: you do not need to give the service principal any access to that VM. The database runs outside of Kubernetes, so Kubernetes does not need to access that VM in any way. The database could even be in a different data center or hosted on another cloud provider entirely; applications running inside Kubernetes will still be able to communicate with it as long as the traffic is allowed by firewalls etc.
I know you did not ask for generic documentation, but the documentation on Kubernetes Service Principals puts it well:
To interact with Azure APIs, an AKS cluster requires either an Azure Active Directory (AD) service principal or a managed identity. A service principal or managed identity is needed to dynamically create and manage other Azure resources such as an Azure load balancer or container registry (ACR).
In other words, the Service principal is the identity that the Kubernetes cluster authenticates with when it interacts with other Azure resources such as:
Azure Container Registry: The images that the containers are created from must come from somewhere. If you are storing your custom images in a private registry, the cluster must be authorized to pull images from the registry. If the private registry is an Azure Container Registry, the service principal must be authorized for those operations.
Networking: Kubernetes must be able to dynamically configure route tables and to register external IPs for services in a load balancer. Again, the service principal is used as the identity, so it must be authorized.
Storage: To access disk resources and mount them into pods.
Azure Container Instances: In case you are using the virtual kubelet to dynamically add compute resources to your cluster, Kubernetes must be allowed to manage containers on ACI.
To delegate access to other Azure resources, you can use the Azure CLI to assign a role to an assignee on a certain scope:
az role assignment create --assignee <appId> --scope <resourceScope> --role Contributor
Here is a detailed list of all cluster identity permissions in use
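For instance, to let the cluster pull images from a private Azure Container Registry, the service principal can be granted the AcrPull role on the registry's scope (the registry name and appId below are placeholders):

# Look up the registry's resource ID and grant the cluster's service principal pull rights.
ACR_ID=$(az acr show --name myregistry --query id --output tsv)
az role assignment create --assignee <appId> --scope $ACR_ID --role AcrPull

On recent CLI versions, az aks update --attach-acr myregistry can do the same wiring in one step.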

Azure Databricks: Accessing Blob Storage Behind Firewall

I am reading files on an Azure Blob Storage account (gen 2) from an Azure Databricks Notebook. Both services are in the same region (West Europe). Everything works fine, except when I add a firewall in front of the storage account. I have opted to allow "trusted Microsoft services":
However, running the notebook now ends up with an access denied error:
com.microsoft.azure.storage.StorageException: This request is not authorized to perform this operation.
I tried to access the storage directly from Spark and by mounting it with dbutils, but got the same result.
I would have assumed that Azure Databricks counts as a trusted Microsoft service? Furthermore I couldn't find solid information on IP ranges for Databricks regions that could be added to the firewall rules.
Unfortunately, Azure Databricks does not count as a trusted Microsoft service; you can see the supported trusted Microsoft services in the storage account firewall documentation.
From a networking perspective, here are two suggestions:
Find the Azure datacenter IP ranges (the original URL is now deprecated) and scope them to the region where your Azure Databricks workspace is located. Whitelist that IP list in the storage account firewall.
Deploy Azure Databricks in your own Azure Virtual Network (Preview), then whitelist the VNet address range in the firewall of the storage account (a sketch of the relevant commands follows below). You can refer to Configure Azure Storage firewalls and virtual networks. You also get NSGs to restrict inbound and outbound traffic for this Azure VNet. Note: you need to deploy Azure Databricks to your own VNet.
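For the second suggestion, a rough sketch of the storage firewall wiring with the Azure CLI (VNet, subnet, and account names are placeholders) could look like this:

# Enable the Microsoft.Storage service endpoint on the Databricks public subnet...
az network vnet subnet update -g my-rg --vnet-name databricks-vnet -n public-subnet \
    --service-endpoints Microsoft.Storage

# ...then allow that subnet through the storage account firewall (repeat for the private subnet).
az storage account network-rule add -g my-rg --account-name mystorageaccount \
    --vnet-name databricks-vnet --subnet public-subnet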
Hope this helps.
The described scenario only works if you deploy Azure Databricks in your own Azure Virtual Network (VNet). With that in place you are able to use service endpoints, so you can add your Databricks VNet to the Blob Storage firewall. With the default deployment this is not supported and not possible.
See the following documentation for more details and a description of how to get the VNet injection feature enabled.
Enabling the mentioned exception does not work, as Azure Databricks is not in the list of trusted services for Blob Storage. See the following documentation for which services can still access the storage account with the exception enabled.
