Restrict Users permission to access data in ADLS - azure

Is it possible to allow only specific users in Databricks to access specific data in Azure Data Lake Storage?
I want to allow only User 1 and User 2 to access the data1.csv file, and only User 3 and User 4 to access the data2.csv file.

This is a Premium feature in Azure Databricks that lets you authenticate to Azure Data Lake Store with the Azure Active Directory identity that is logged in to Azure Databricks. With this feature, customers can control which user can access which data through Azure Databricks.
The feature must be enabled on the cluster. Once it is configured, users can log in and run read/write commands against Azure Data Lake Store without using a service principal. A user can only read or write data according to the roles and ACLs that user has been granted on the Azure Data Lake Store.
Refer - https://learn.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough#--enable-azure-data-lake-storage-credential-passthrough-for-a-standard-cluster
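As a rough illustration of what a read looks like once passthrough is enabled, here is a minimal Python sketch. It assumes an ADLS Gen2 account addressed through an `abfss://` URI; the container and account names are hypothetical, and `abfss_uri` and `read_csv` are helper names made up for this example.

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an ABFS URI for a file or directory in ADLS Gen2."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_csv(spark, uri: str):
    """Read a CSV file through the given SparkSession.

    No storage key or service principal is configured here. On a
    passthrough-enabled cluster the read is expected to succeed only if the
    signed-in user's identity has been granted access to the file.
    """
    return spark.read.option("header", "true").csv(uri)


# Hypothetical container and account names, for illustration only.
uri = abfss_uri("mycontainer", "mystorageaccount", "data1.csv")

# In a Databricks notebook `spark` is predefined, so the call would be:
# df = read_csv(spark, uri)
```

With this setup, User 1 and User 2 would need read access to data1.csv on the storage side, and a read attempted by User 3 should be rejected by the storage service rather than by notebook code.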

Related

Access Control from Databricks to Azure Storage Accounts and Containers

Our Databricks workspace needs to access different data sets but we need to ensure that access control can be granted on a role or individual level. The data sets are planned to be available as files on Data Lake Gen2 that will be read into dataframes etc. These files in storage accounts can be organized as seen fit for access rights (either 1 storage account per dataset - which might hit the 256 limit soon - or 1 dataset per container and thus several datasets in a storage account).
Our architectural guidelines require the access to be via service principal. However, I think this would give each user in the Databricks workspace the same access rights to different storage accounts (datasets).
Is there another feasible solution with accessing storage accounts from Databricks via service principal but at the same time have fine-grained control about access rights of individual users or at least on a role-level? Can this be achieved on a container level or only on a storage account level?
I tried to use service principal to access storage accounts from within a Databricks workspace which then grants every user the access to the storage accounts.
Usually, when a user works with data, it happens in two steps:
Checking permissions for accessing a specific piece of data
Actually accessing the data in the storage account if it's allowed
This scheme is fully supported on Databricks as follows:
If your organization has already adopted Unity Catalog (UC), it's easy: you add storage accounts/containers as external locations, create tables for the data in these locations, and then grant permissions on specific tables to users or (better) roles. Actual data access will be done
If you haven't adopted UC yet, you can enforce access via Table Access Control (TACL). In this case you need to attach a service principal to a TACL-enabled cluster, but the actual enforcement is done by the TACL service, and data is read or written only if the user/role has permission to do so.
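For the Unity Catalog route, the grants could look roughly like the statements generated below. This is only a sketch: the catalog, schema, table and group names are hypothetical, and `grants_for_table` is a helper invented here to show the privileges a group typically needs to query one table (`USE CATALOG`, `USE SCHEMA` and `SELECT`).

```python
from typing import List


def quote(identifier: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + identifier.replace("`", "``") + "`"


def grants_for_table(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Return the SQL statements that let `principal` read a single table."""
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    tbl = f"{sch}.{quote(table)}"
    who = quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {who}",
        f"GRANT SELECT ON TABLE {tbl} TO {who}",
    ]


# Hypothetical names, for illustration only.
for statement in grants_for_table("main", "sales", "data1", "team1"):
    print(statement)
    # In a Databricks notebook each statement could be run with spark.sql(statement).
```

Granting to a group rather than to individual users keeps the statements stable when team membership changes.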

Is there any method by which I can restrict other user not to view my container in Azure data lake gen 2

Problem statement: Two different teams are working on two different projects for the same client. Both teams have access to the Azure resource group in which the Azure Data Lake Storage account was created. The client wants us to use the same data lake storage for both projects, but a team working on specific containers should not have access to the containers the other team uses, and vice versa.
Example--
Azure data lake storage -both team have access to this
->container1--only team 1 should have access to this
->container2--only team 2 should have access to this
Can anyone please suggest how we can achieve this?
Thanks In advance!!
You can manage access to containers, directories and blobs by using the access control lists (ACLs) feature in Azure Data Lake Storage Gen2.
You can associate a security principal with an access level for files and directories. Each association is captured as an entry in an access control list (ACL). Each file and directory in your storage account has an access control list. When a security principal attempts an operation on a file or directory, an ACL check determines whether that security principal (user, group, service principal, or managed identity) has the correct permission level to perform the operation.
To manage the ACL on the container, follow the below steps:
Go to the container in the storage account.
Navigate to any container, directory, or blob. Right-click the object, and then select Manage ACL.
The Access permissions tab of the Manage ACL page appears. Use the controls in this tab to manage access to the object.
To add a security principal to the ACL, select the Add principal button.
Find the security principal by using the search box, and then click the Select button.
You should create a security group in Azure AD for each of your teams, and then maintain permissions on the group rather than on individual users.
Refer: Access control lists (ACLs) in Azure Data Lake Storage Gen2
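The same ACL change can also be scripted. Below is a hedged Python sketch using the `azure-storage-file-datalake` and `azure-identity` packages. The account URL, container, directory and group object ID are placeholders, and `acl_entry` and `grant_group_on_directory` are helper names made up for this example. The caller's identity is assumed to have rights to change ACLs on the account.

```python
def acl_entry(kind: str, object_id: str, perms: str, default: bool = False) -> str:
    """Format one ACL entry, e.g. 'group:<object-id>:r-x'.

    Set default=True to build a default-scope entry, which new children of a
    directory inherit.
    """
    if kind not in {"user", "group", "mask", "other"}:
        raise ValueError(f"unknown ACL entry type: {kind}")
    if len(perms) != 3 or any(c not in (flag, "-") for c, flag in zip(perms, "rwx")):
        raise ValueError(f"permissions must look like 'r-x', got: {perms}")
    entry = f"{kind}:{object_id}:{perms}"
    return f"default:{entry}" if default else entry


def grant_group_on_directory(account_url: str, container: str, directory: str,
                             group_object_id: str, perms: str = "r-x"):
    """Merge an access entry for an Azure AD group into a directory and its children."""
    # Imported here so that acl_entry() works without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url, credential=DefaultAzureCredential())
    directory_client = service.get_file_system_client(container).get_directory_client(directory)
    return directory_client.update_access_control_recursive(
        acl=acl_entry("group", group_object_id, perms)
    )


# Placeholder values, for illustration only:
# grant_group_on_directory("https://<account>.dfs.core.windows.net",
#                          "container1", "projects", "<team1-group-object-id>")
```

Running this once per team, with each team's security group against only that team's container, gives the separation described in the question.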

How to browse Azure Data lake gen 2 using GUI tool

First some background:
I want to give different groups of data scientists access to Azure Data Lake Gen2. However, we don't want to give them access to the entire data lake, because for security reasons they are not supposed to see all the data. They must be able to see only a limited set of files/folders. We do that by adding the data scientists' AAD groups to the ACLs of the data lake folders. You can refer to the following link for more background on what I am talking about:
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
Now the problem:
Since the data scientists are granted access to a very specific/limited area, they are able to access/browse those folders/files using Azure databricks (python commands/code etc.). However, they are not able to browse using Azure Storage Explorer.
So is there some way for them to browse the data lake using Azure Storage Explorer or some other GUI tool?
Or is it possible to create a custom role for such a scenario and grant that role to the data scientists' AAD groups, so that they have access only to the specific area (i.e. a custom role that would only have "execute" access on the ADLS Gen2 file systems)?
As far as I know, there is no way to use an RBAC role to control access to individual folders in a file system (container), because when you assign a role to an AAD group you need to define a scope, and the smallest scope in Azure Data Lake Gen2 is the file system (container). If you only want to control access at that level, you do not need to create a custom role; you can use the built-in role Storage Blob Data Reader directly. A user who has that role can read all files in the file system. For more details, please refer to the document
It is not possible to access data via Storage Explorer with only ACL permissions assigned. Unfortunately, you need to combine ACLs with an RBAC role assigned at the Storage Account level (e.g. Reader) so that users can see the Storage Account itself in Storage Explorer. You can then introduce granular permissions using ACLs on specific containers/folders/files. With Reader, however, they will still be able to see the names of all the containers in the Storage Account (but cannot see the containers' content unless it is granted via an ACL or a data RBAC assignment at the container level).
As you noticed, the only option to access a specific folder/file using only ACL permissions is via code, e.g. PowerShell or Python.
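When access is granted through ACLs alone, it helps to remember that every parent directory needs Execute permission as well as the target itself. The Python sketch below lists the minimum ACL permissions along a path, following the pattern in the ADLS Gen2 access control documentation (Execute on each ancestor, Read on a file, Read plus Execute to list a directory). `required_permissions` and the example path are made up for this illustration.

```python
from typing import List, Tuple


def required_permissions(path: str, operation: str = "read") -> List[Tuple[str, str]]:
    """Return (path, permissions) pairs needed to read a file or list a directory.

    `path` is relative to the container root; "/" stands for the root itself.
    """
    target_perms = {"read": "r--", "list": "r-x"}
    if operation not in target_perms:
        raise ValueError(f"unsupported operation: {operation}")

    segments = [s for s in path.split("/") if s]
    if not segments:
        if operation != "list":
            raise ValueError("the container root can only be listed")
        return [("/", "r-x")]

    # Every ancestor, starting at the root, needs Execute to be traversed.
    result = [("/", "--x")]
    current = ""
    for segment in segments[:-1]:
        current += "/" + segment
        result.append((current, "--x"))

    # The target itself needs Read (file) or Read + Execute (directory listing).
    result.append((current + "/" + segments[-1], target_perms[operation]))
    return result


# Hypothetical path, for illustration only.
for item, perms in required_permissions("raw/team1/data1.csv"):
    print(perms, item)
```

This is why a group that holds an ACL entry only on a deep folder still cannot reach it: the entries on the container root and on each intermediate folder have to be present too.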

Manage Authorization To folders in Azure Data Lake from Excel

I am developing an Azure data lake and I want to connect Excel to the data lake.
How do you authorize users to see the data from Excel?
I have used two test users and given them different access to the resource group, the services etc, and they just don't get access. Only I, myself have access.
Is it possible to restrict the access so that excel can only see one specific folder in the data lake?
The normal way to do this is using an app registration, but I can not see how to connect an app to excel.
Users must be authenticated via ADFS and granted global permissions. You can specify O365 credentials and grant AAD access.
https://blogs.msdn.microsoft.com/freddyk/2018/06/29/aad-authentication/
https://learn.microsoft.com/en-us/azure/analysis-services/analysis-services-manage-users
You can apply access control in Azure Data Lake so that users can only see certain folders. https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#common-scenarios-related-to-permissions

Faster way to grant access privileges to ADLS on HDInsight cluster provisioning?

I have an Azure Data Lake Store (ADLS) containing ~100k files that I need to access from an HDInsight cluster for analysis. When I provision the cluster via Azure Portal, I use this ADLS for the cluster's storage and assign rwx privileges for all files on the ADLS using a service principal + the "Data Lake Store Access" feature. This feature appears to grant access to each file one at a time, at a rate of about 2k per minute: it takes over an hour just to grant the permissions!
Is there a faster way to grant a new cluster rwx privileges on its associated ADLS?
Yes, there is a better way to set this up. On a one-time basis, you need to add permissions for an Azure Active Directory group to all your files and folders. Once that is done, whenever you create a new HDInsight cluster, the service principal simply needs to be made a member of the group.
So to summarize:
1. Create a new Azure Active Directory group.
2. Propagate permissions in your ADLS account to this group on the appropriate files and folders.
3. Create your HDInsight cluster. Choose the right service principal when creating it.
4. Add the service principal to the group created in step 1.
Hope this helps and do let me know if you have questions.

Resources