Access Control from Databricks to Azure Storage Accounts and Containers

Our Databricks workspace needs to access different data sets, and we need to ensure that access can be granted at a role or individual level. The data sets are planned to be available as files in Data Lake Gen2 that will be read into dataframes etc. The files in storage accounts can be organized however suits the access rights (either 1 storage account per dataset, which might soon hit the 256-storage-account limit, or 1 dataset per container and thus several datasets in a storage account).
Our architectural guidelines require that access goes through a service principal. However, I think this would give each user in the Databricks workspace the same access rights to the different storage accounts (datasets).
Is there another feasible solution that accesses storage accounts from Databricks via a service principal but at the same time gives fine-grained control over the access rights of individual users, or at least at a role level? Can this be achieved at the container level or only at the storage account level?
I tried using a service principal to access storage accounts from within a Databricks workspace, but that grants every user in the workspace the same access to the storage accounts.

Usually, when a user works with data, it happens in two steps:
Checking permissions for accessing a specific piece of data
Actually accessing the data in the storage account if it is allowed
This scheme is fully supported on Databricks as follows:
If your organization has already adopted Unity Catalog (UC), then it's easy: you add storage accounts/containers as external locations, create tables for the data in these locations, and then grant permissions on specific tables to users or (better) groups. Actual data access is then performed through the storage credential attached to the external location, so individual users never need direct access to the storage account (see the sketch after this answer).
If you haven't adopted UC yet, you can enforce access via Table Access Control (TACL). In this case you will need to attach a service principal to a TACL-enabled cluster, but the actual enforcement happens in the TACL service, and data will be read or written only if the user/group has permission to do so.
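A minimal sketch of the Unity Catalog path, run from a Databricks notebook. The external location, storage credential, table, group, and storage account names (sales_ext, adls_credential, main.sales.orders, data-analysts, mystorageacct) are hypothetical placeholders, not something from the question:

```python
# Unity Catalog sketch (Databricks notebook). All names below are placeholders.

# Register a container as an external location, backed by a storage credential
# (the credential wraps the service principal/managed identity, so end users
# never touch the storage account directly).
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS sales_ext
    URL 'abfss://sales@mystorageacct.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL adls_credential)
""")

# Expose the files in that location as a table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders
    USING DELTA
    LOCATION 'abfss://sales@mystorageacct.dfs.core.windows.net/orders'
""")

# Grant access per group rather than per user.
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data-analysts`")
```

Permission checks then happen in Unity Catalog per table, while the data itself is always read through the external location's credential.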

Related

Recursively provisioning ACL access over Azure Storage container: time consuming

I have an ADLS Gen2 account deployed in Azure. We provide data to different teams to transform. For security reasons, we only grant ACL permissions. Now, as the data grows in size, whenever a new team is introduced we run into issues granting access at the container level.
Currently we are using PowerShell. It takes around 5+ hours if the data in the container is 20 GB+.
Is there any way to reduce the time? Can we use another language, or is there an alternative solution?
It sounds like you have a single storage container and are granting access on a per-item basis.
This is not sustainable as the number of items and the number of teams grows.
You need to group the data in a way that lets you grant access to a team for a whole set of data.
Possible options:
Create several storage accounts and grant access to teams at the storage account level
Create containers within the storage account, place data in the containers, and grant access at the container level (see the sketch below)
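With the container-per-team layout, the team's ACLs only need to be set once at the container root (plus a default ACL so newly created items inherit them), instead of per item. A minimal sketch with the azure-storage-file-datalake SDK; the account URL, container name, and AAD group object ID are placeholders:

```python
# Placeholders: storage account, container name, and AAD group object ID.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://mystorageacct.dfs.core.windows.net"
TEAM_GROUP_ID = "00000000-0000-0000-0000-000000000000"  # AAD group object ID

service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
root = service.get_file_system_client("team1-data").get_directory_client("/")

# Access ACL for the root plus a default ACL entry, so everything created
# under the container afterwards inherits the permission automatically.
root.set_access_control(
    acl=(
        "user::rwx,group::r-x,other::---,"
        f"group:{TEAM_GROUP_ID}:r-x,"
        f"default:group:{TEAM_GROUP_ID}:r-x"
    )
)

# For data that already exists in the container, push the entries down
# recursively in one SDK call instead of item-by-item from a script.
root.update_access_control_recursive(
    acl=f"group:{TEAM_GROUP_ID}:r-x,default:group:{TEAM_GROUP_ID}:r-x"
)
```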

Is there any method by which I can restrict other users from viewing my container in Azure Data Lake Gen2?

Problem statement: There are two different teams working on two different projects for the same client. Both teams have access to the Azure resource group in which the Azure Data Lake Storage account has been created. Now the client wants us to use the same Data Lake Storage account for both projects, but they also want the team working on a specific container to have no access to the containers the other team uses, and vice versa.
Example:
Azure Data Lake Storage account: both teams have access to this
-> container1: only team 1 should have access to this
-> container2: only team 2 should have access to this
Can anyone please suggest how we can achieve this?
Thanks in advance!
You can manage access to containers, directories, and blobs by using the access control list (ACL) feature in Azure Data Lake Storage Gen2.
You can associate a security principal with an access level for files and directories. Each association is captured as an entry in an access control list (ACL). Each file and directory in your storage account has an access control list. When a security principal attempts an operation on a file or directory, an ACL check determines whether that security principal (user, group, service principal, or managed identity) has the correct permission level to perform the operation.
To manage the ACL on the container, follow the below steps:
Go to the container in the storage account.
Navigate to any container, directory, or blob. Right-click the object, and then select Manage ACL.
The Access permissions tab of the Manage ACL page appears. Use the controls in this tab to manage access to the object.
To add a security principal to the ACL, select the Add principal button.
Find the security principal by using the search box, and then click the Select button.
You should create a security group in Azure AD for each of your teams, and then maintain permissions on the group rather than on individual users (a scripted equivalent of the portal steps is sketched below).
Refer: Access control lists (ACLs) in Azure Data Lake Storage Gen2
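The "Add principal" step can also be scripted instead of done in the portal. A minimal sketch with the azure-storage-file-datalake SDK; the account, container, directory path, and group object ID are placeholders:

```python
# Placeholders: storage account, container, directory path, AAD group object ID.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    "https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory = service.get_file_system_client("container1").get_directory_client("raw")

# Read the current ACL, append an entry for the team's security group,
# and write it back (equivalent to "Manage ACL" -> "Add principal" in the portal).
team1_group_id = "11111111-1111-1111-1111-111111111111"
current_acl = directory.get_access_control()["acl"]
new_acl = current_acl + f",group:{team1_group_id}:r-x"
directory.set_access_control(acl=new_acl)
```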

Unable to add Azure guest user to access storage accounts

Context: I have limited experience with Azure, but I am looking to add a few guest users external to our organization to allow them to use Azure Blob Storage to upload datasets they can work with (e.g. add, edit, delete), while otherwise limiting all of their permissions.
My approach is to create a storage account for each of them, then adjust the permissions for that account.
What I have done:
Create new storage account
Add external user as "Guest user"
For the storage account, adjusted the permissions such that for that specific user I added a role assignment of "Storage Blob Data Contributor"
Problem: When the user logs in to the Azure portal they are unable to find this resource or seemingly get access to it. I'm wondering if there are other permissions I need to enable to make this work?
Storage Blob Data Contributor is a data-plane role. To see the storage account in the portal, your guest users will need at least the Reader role on the actual storage account (control plane). If you want just one role that covers both planes, you can give your users the Reader and Data Access built-in role.
More Context
Azure operations can be divided into two categories: control plane and data plane. A simplistic way of thinking about this from an on-prem storage perspective is that the control plane gives access to the physical infrastructure (e.g. you have access to the server room where the disks are and can swap out drives as needed), whereas the data plane means you have permissions on the file share to view files.
When I talk to customers, I try to equate access to the portal with access to your on-prem datacenter: you only give it out to the people that need physical access.
You can also look at Azure Storage Explorer, but you still need the proper data/control-plane permissions.
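As a sketch of how the two role assignments could be made programmatically with the azure-mgmt-authorization SDK; the subscription, resource group, storage account, and guest user's object ID are placeholders, and the model fields should be checked against the SDK version you actually use:

```python
# Placeholders throughout; the role GUIDs are the documented built-in role IDs.
import uuid
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

SUBSCRIPTION_ID = "<subscription-id>"
SCOPE = (
    f"/subscriptions/{SUBSCRIPTION_ID}"
    "/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
)
GUEST_OBJECT_ID = "<guest-user-object-id>"  # object ID of the guest user in Azure AD

ROLE_GUIDS = {
    "Reader": "acdd72a7-3385-48ef-bd42-f606fba81ae7",                         # control plane
    "Storage Blob Data Contributor": "ba92f5b4-2d11-453d-a403-e96b0029c9fe",  # data plane
}

client = AuthorizationManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for role_name, role_guid in ROLE_GUIDS.items():
    client.role_assignments.create(
        scope=SCOPE,
        role_assignment_name=str(uuid.uuid4()),  # each assignment needs a fresh GUID
        parameters=RoleAssignmentCreateParameters(
            role_definition_id=(
                f"/subscriptions/{SUBSCRIPTION_ID}"
                f"/providers/Microsoft.Authorization/roleDefinitions/{role_guid}"
            ),
            principal_id=GUEST_OBJECT_ID,
        ),
    )
```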

Azure Storage Account - Container level access and ACL

Azure Storage Account
In one of our use cases, we would like to use Azure Storage for sharing with customers so that they can upload their data to us.
In that context, we are planning to create a storage account per customer. For a customer to access the account, we are planning to share the storage account keys.
We are facing the following issues:
How do we create keys specific to an Azure storage account container, so that a customer can only access a specific container?
Is it possible to have individual keys and access at the container level?
For certain containers, we want to give read-write access.
For others, we want to give only read access.
If I have the storage account keys, does that mean I have access to everything under that storage account?
Is there a better solution to this? Essentially we need an FTP-like site for customers to upload data.
Sounds like you want to use a shared access signature (SAS):
https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview
A shared access signature (SAS) provides secure delegated access to resources in your storage account without compromising the security of your data. With a SAS, you have granular control over how a client can access your data. You can control what resources the client may access, what permissions they have on those resources, and how long the SAS is valid, among other parameters.
You can't have an access key at the container level; access keys are for the whole storage account.
To give access at the container level (or at an even finer grain) you need a shared access signature (SAS). Documentation here.
You can have as many SAS tokens as you need, and you can define each of them with the desired permissions (read, read-write, etc.).
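A minimal sketch of issuing per-customer, container-scoped SAS tokens with the azure-storage-blob SDK; the account name, key, and container names are placeholders:

```python
# Placeholders: storage account name, account key, and container names.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

ACCOUNT_NAME = "mystorageacct"
ACCOUNT_KEY = "<storage-account-key>"  # kept server-side; never shared with customers

def customer_sas(container: str, read_write: bool, valid_days: int = 7) -> str:
    """Return a SAS token limited to one container, read-only or read-write."""
    perms = ContainerSasPermissions(
        read=True, list=True, write=read_write, create=read_write
    )
    return generate_container_sas(
        account_name=ACCOUNT_NAME,
        container_name=container,
        account_key=ACCOUNT_KEY,
        permission=perms,
        expiry=datetime.now(timezone.utc) + timedelta(days=valid_days),
    )

# Customer uploads to: https://mystorageacct.blob.core.windows.net/customer1?<sas>
upload_token = customer_sas("customer1", read_write=True)
readonly_token = customer_sas("customer2", read_write=False)
```

Each customer only ever receives a token scoped to their own container with their own permissions and expiry, while the account key itself stays private.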

How to browse Azure Data lake gen 2 using GUI tool

First some background:
I want to give different groups of data scientists access to data in Azure Data Lake Gen2. However, we don't want to give them access to the entire data lake, because for security reasons they are not supposed to see all the data. They must be able to see only a limited set of files/folders. We are doing that by adding the data scientists' AAD groups to the ACLs of the data lake folders. You can refer to the following link to get more insight into what I am talking about:
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
Now the problem:
Since the data scientists are granted access to a very specific/limited area, they are able to access/browse those folders/files using Azure Databricks (Python commands/code, etc.). However, they are not able to browse using Azure Storage Explorer.
So is there some way they can browse the data lake using Azure Storage Explorer or some other GUI tool?
Or is it possible to create a custom role for such a scenario and grant that role to the data scientists' AAD groups so that they have access only to their specific area (i.e. a custom role that would have only "execute" access on the ADLS Gen2 file systems)?
As far as I know, there is no way to use an RBAC role to control access to individual folders within a file system (container). When we assign a role to an AAD group, we need to define a scope, and the smallest scope in Azure Data Lake Gen2 is the file system (container). If you just want to control access at that level, you do not need to create a custom role; you can directly use the built-in role Storage Blob Data Reader. If a user has that role, they can read all files in the file system. For more details, please refer to the documentation.
It is not possible to access data via Storage Explorer with only ACL permissions assigned. Unfortunately, you need to use ACLs in combination with an RBAC role assigned at the storage account level (e.g. Reader) to be able to see the storage account itself from Storage Explorer. Then you can introduce granular permissions using ACLs on specific containers/folders/files; however, with Reader users will still be able to see the names of all the containers in the storage account (but cannot see the containers' content until access is granted via an ACL or a data RBAC assignment at the container level).
As you noticed, the only option to access a specific folder/file using only ACL permissions is via code, e.g. PowerShell or Python (a sketch follows below).
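For completeness, a minimal sketch of that code path: a user with only ACL permissions reading their permitted folder with their own AAD identity via the azure-storage-file-datalake SDK. The account, container, and folder names are placeholders:

```python
# Placeholders: storage account, container, and folder/file paths.
from azure.identity import InteractiveBrowserCredential
from azure.storage.filedatalake import DataLakeServiceClient

credential = InteractiveBrowserCredential()  # signs the user in with their own AAD account
service = DataLakeServiceClient(
    "https://mydatalake.dfs.core.windows.net", credential=credential
)

fs = service.get_file_system_client("analytics")

# Listing the permitted folder works if the user's AAD group has
# execute (x) on the parent directories and read (r) on this one.
for path in fs.get_paths(path="projects/team-a"):
    print(path.name)

# Download a single file from the permitted area.
data = fs.get_file_client("projects/team-a/sample.csv").download_file().readall()
```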
