Recursively provisioning ACL access over an Azure Storage container is time-consuming

I have an ADLS Gen2 account deployed in Azure. We distribute data to different teams for transformation. For security reasons, we grant only ACL permissions. Now that the data has grown large, we run into problems granting access at the container level whenever a new team is introduced.
Currently we are using PowerShell. It takes 5+ hours if the container holds 20 GB+ of data.
Is there any way to reduce the time? Is there another language we could use, or an alternative solution?

It sounds like you have a single storage container and are granting access on a per-item basis.
This is not sustainable as the number of items and the number of teams grow.
You need to group the data so that you can grant a team access to a whole set of data at once.
Possible options:
Create several storage accounts and grant teams access at the storage account level.
Create containers within the storage account, place the data in containers, and grant access at the container level.
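On the "any other language" part of the question: the same recursive ACL update can be issued from Python. The following is only a sketch, assuming the `azure-storage-file-datalake` and `azure-identity` packages are installed; the account URL, container, directory and object ID are placeholders, and `acl_entry` and `grant_recursive` are names made up for this example. It does not show that Python is faster than PowerShell. The point is that with one directory (or container) per team, the recursive call only has to walk that team's subtree.

```python
def acl_entry(kind: str, object_id: str, perms: str, default: bool = False) -> str:
    """Build one ACL entry in the short form ADLS Gen2 uses, e.g. 'group:<id>:r-x'."""
    if kind not in ("user", "group"):
        raise ValueError("kind must be 'user' or 'group'")
    # perms must be three characters, each either its letter (r, w, x) or '-'.
    if len(perms) != 3 or any(p not in (c, "-") for p, c in zip(perms, "rwx")):
        raise ValueError("perms must look like 'r-x'")
    entry = f"{kind}:{object_id}:{perms}"
    return f"default:{entry}" if default else entry


def grant_recursive(account_url: str, container: str, directory: str, acl: str):
    """Apply `acl` to `directory` and everything below it (needs network access and credentials)."""
    # Imported lazily so acl_entry() stays usable without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url, credential=DefaultAzureCredential())
    directory_client = service.get_file_system_client(container).get_directory_client(directory)
    result = directory_client.update_access_control_recursive(acl=acl)
    return result.counters  # successful directories/files and failure count


# Placeholder object ID of the new team's security group.
team_acl = acl_entry("group", "00000000-0000-0000-0000-000000000000", "r-x")
print(team_acl)
# grant_recursive("https://<account>.dfs.core.windows.net", "<container>", "team-a", team_acl)
```

Granting to a security group rather than to individual users also means that later membership changes do not require touching the ACLs again.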

Related

Access Control from Databricks to Azure Storage Accounts and Containers

Our Databricks workspace needs to access different data sets but we need to ensure that access control can be granted on a role or individual level. The data sets are planned to be available as files on Data Lake Gen2 that will be read into dataframes etc. These files in storage accounts can be organized as seen fit for access rights (either 1 storage account per dataset - which might hit the 256 limit soon - or 1 dataset per container and thus several datasets in a storage account).
Our architectural guidelines require the access to be via service principal. However, I think this would give each user in the Databricks workspace the same access rights to different storage accounts (datasets).
Is there another feasible solution that accesses storage accounts from Databricks via a service principal but still gives fine-grained control over the access rights of individual users, or at least at the role level? Can this be achieved at the container level or only at the storage account level?
I tried using a service principal to access storage accounts from within a Databricks workspace, but that grants every user access to the storage accounts.
Usually, when a user works with data, it happens in two steps:
Checking permissions for accessing a specific piece of data
Actually accessing the data in the storage account if it's allowed
This scheme is fully supported on Databricks in the following ways:
If your organization has already adopted Unity Catalog (UC), it's easy: you add storage accounts/containers as external locations, create tables for the data in these locations, and then grant permissions on specific tables to users or (better) roles. Actual data access will be done
If you haven't adopted UC yet, you can enforce access via Table Access Control (TACL). In this case you need to attach a service principal to a TACL-enabled cluster, but the actual enforcement is done by the TACL service, and data is read or written only if the user or role has permission to do so.
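To make the Unity Catalog option a bit more concrete, here is a minimal sketch of the grants a group would need to read one table. The catalog, schema, table and group names are invented for illustration, and `read_grants` is a hypothetical helper; in a Databricks notebook each statement would be passed to `spark.sql(...)`.

```python
def read_grants(catalog: str, schema: str, table: str, principal: str) -> list:
    """Return the GRANT statements that give `principal` read access to one UC table."""
    # Backticks quote the principal, which may contain spaces, dashes or '@'.
    who = f"`{principal}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}",
    ]


# Hypothetical names; replace with your own catalog objects and group.
for statement in read_grants("lake", "sales", "orders", "data-scientists"):
    print(statement)
    # spark.sql(statement)  # run inside a Databricks notebook or job
```

Users then query the table by name and never handle the service principal's credentials themselves.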

Unable to add Azure guest user to access storage accounts

Context: I have limited experience with Azure, but I am looking to add a few guest users from outside our organization so that they can use Azure Blob Storage to upload and manage datasets (e.g., add, edit, delete), while otherwise limiting all of their permissions.
My approach is to create a storage account for each of them, then adjust the permissions for that account.
What I have done:
Create new storage account
Add external user as "Guest user"
For the storage account, adjusted the permissions so that this specific user has the "Storage Blob Data Contributor" role assignment
Problem: When the user logs into their Azure portal they are unable to find this resource or seemingly get access to it. I'm wondering if there are other permissions I need to enable to make this work?
Storage Blob Data Contributor is a data plane role. To see the storage account, your guest users need at least the Reader role on the storage account itself (control plane). If you want a single role that covers both planes, you can give your users the built-in Reader and Data Access role.
More Context
Azure operations can be divided into two categories: control plane and data plane. A simplistic way of thinking about this from an on-prem storage perspective is that the control plane gives access to the physical hardware (e.g. you have access to the server room where the disks are and can swap out drives as needed), whereas the data plane means you have permissions on the file share to view files.
When I talk to customers, I try to equate access to the portal with access to your on-prem datacenter. You only give it out to the people who need physical access.
You can also look at Azure Storage Explorer, but you still need the proper data/control plane permissions.
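The control plane / data plane split described above can be pictured with a toy model. This is not how Azure evaluates role assignments; it only encodes the three roles mentioned in this answer, and the function names are made up for the example.

```python
# Which plane(s) each built-in role covers, as described in the answer above.
# "Reader and Data Access" reaches the data through the storage account keys
# rather than through Azure AD data roles; this toy model ignores that detail.
ROLE_PLANES = {
    "Reader": {"control"},
    "Storage Blob Data Contributor": {"data"},
    "Reader and Data Access": {"control", "data"},
}


def planes_for(roles):
    """Union of the planes covered by a user's role assignments."""
    covered = set()
    for role in roles:
        covered |= ROLE_PLANES.get(role, set())
    return covered


def can_find_and_use_in_portal(roles):
    """Finding the account needs control-plane access; working with blobs needs data-plane access."""
    return {"control", "data"} <= planes_for(roles)


print(can_find_and_use_in_portal(["Storage Blob Data Contributor"]))
print(can_find_and_use_in_portal(["Reader", "Storage Blob Data Contributor"]))
```

The first call corresponds to the situation in the question: the guest has data access but nothing on the control plane, so the account never shows up in the portal.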

How to browse Azure Data lake gen 2 using GUI tool

First some background:
I want to give different groups of data scientists access to Azure Data Lake Gen2. However, we don't want to give them access to the entire data lake, because for security reasons they are not supposed to see all the data. They must be able to see only a limited set of files/folders. We are doing that by adding the data scientists' AAD groups to the ACLs of the data lake folders. You can refer to the following link for more background on what I am talking about:
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
Now the problem:
Since the data scientists are granted access to a very specific, limited area, they are able to access and browse those folders/files using Azure Databricks (Python commands/code etc.). However, they are not able to browse them using Azure Storage Explorer.
So is there some way for them to browse the data lake using Azure Storage Explorer or some other GUI tool?
Or is it possible to create a custom role for such a scenario and grant that role to the data scientists' AAD groups, so that they have access only to the specific area (i.e. a custom role that would only have "execute" access on the ADLS Gen2 file systems)?
As far as I know, there is no way to use an RBAC role to control access to individual folders in the file system (container), because when we assign a role to an AAD group we need to define a scope, and the smallest scope in Azure Data Lake Gen2 is the file system (container). If you just want to control access at that level, you do not need to create a custom role; you can directly use the built-in role Storage Blob Data Reader. A user who has that role can read all files in the file system. For more details, please refer to the documentation.
It is not possible to access data via Storage Explorer with only ACL permissions assigned. Unfortunately, you need to use ACLs in combination with an RBAC role assigned at the storage account level (e.g. Reader) to be able to see the storage account itself in Storage Explorer. You can then introduce granular permissions using ACLs on specific containers/folders/files. With Reader, however, users will still be able to see the names of all containers in the storage account (but cannot see a container's contents until access is granted via an ACL or a data RBAC assignment at the container level).
As you noticed, the only way to access a specific folder/file using only ACL permissions is via code, e.g. PowerShell or Python.
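When granting folder-level ACLs like this, keep in mind that ADLS Gen2 checks permissions along the whole path: a reader needs execute (`--x`) on the container root and on every parent folder, plus read on the file itself (or `r-x` on a folder to list it). The helper below is a small offline sketch that lists those minimum entries for a path; the function name and the example path are made up.

```python
def required_acl(path: str, leaf_perms: str = "r--") -> dict:
    """Map each level of `path` to the minimum permissions a reader needs there.

    Use leaf_perms="r--" to read a file and "r-x" to list a folder.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path must name a file or folder below the container root")
    needed = {"/": "--x"}          # container root: traverse only
    current = ""
    for part in parts[:-1]:        # every parent folder: traverse only
        current += "/" + part
        needed[current] = "--x"
    needed[current + "/" + parts[-1]] = leaf_perms
    return needed


for level, perms in required_acl("raw/team-a/data.csv").items():
    print(f"{perms}  {level}")
```

If any of the parent entries is missing, the request is denied even though the ACL on the file itself looks correct.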

Is it possible to aggregate Azure resources from different subscriptions?

Our team has Windows Azure MSDN - Visual Studio Premium subscriptions for all our devs. I have been taking advantage of the $100 per month allowance and am building more infrastructure in the cloud.
However, I would like other members of our team to access certain of the assets. I am quite new to the Azure infrastructure, so this might be a dumb question. But can they access my blobs? and can I control exactly who can access my blobs?
They can obviously RDP into my VMs, so that's not an issue. I assume they can reach my VMs via the IP address from inside Azure as well. However, I am more interested in the blobs, mostly because I am starting to upload a lot of utility data (large sample datasets, common software we all install, etc.) and I would like to avoid all of us having to upload it again for each subscription.
As of today (11/8/2013), you cannot "pool" MSDN resources, meaning you cannot have 4 subscriptions add up to $400/month and use it for à la carte cloud services.
You can have one or several admins for multiple subscriptions; this lets you view the different subscriptions in the portal and manage them in a single place.
You can also have different deployment profiles, so one Visual Studio instance can deploy to different Azure accounts.
Specific to your question: you have blob access keys, and if you share the name of the storage account and a key, then yes, they can access the data located there.
Yes, it is possible to control access to your blobs by using SAS (Shared Access Signatures).
SAS grants granular access to containers, blobs, tables, and queues.
This should be a good resource to start with:
Manage Access to Windows Azure Storage Resources
Create and Use a Shared Access Signature
However, I would like other members of our team to access certain of
the assets. I am quite new to the Azure infrastructure, so this might
be a dumb question. But can they access my blobs? and can I control
exactly who can access my blobs?
To answer this question specifically: yes, your team members can access the data stored in any blob storage account in any of your subscriptions. There are two ways to give them access to blob storage:
By giving them the account name and account key: with these, they get full access to the storage account and essentially become owners of that storage account.
By using a Shared Access Signature: if you want to give them restricted access to blob storage, you need to use SAS as described by Dan Dinu. SAS basically gives you a URL, and users in possession of that URL can explore storage (by writing some code). However, it is not possible to identify which user accessed which storage; for that you would need to write something on your own.
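To illustrate why a SAS can restrict access but cannot tell users apart, here is a deliberately simplified signed-URL sketch using only the Python standard library. It is not Azure's actual SAS format or string-to-sign (in practice you would use the storage SDK's SAS helpers); the parameter names `sp`, `se` and `sig` merely mirror the ones a real SAS URL carries, and the account key and URL are dummies.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(base_url: str, permissions: str, expires_at: int, key: bytes) -> str:
    payload = f"{base_url}\n{permissions}\n{expires_at}".encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_url(base_url: str, key: bytes, permissions: str, expires_at: int) -> str:
    """Issuer side: append permissions, expiry and an HMAC over both to the URL."""
    sig = _signature(base_url, permissions, expires_at, key)
    return base_url + "?" + urlencode({"sp": permissions, "se": expires_at, "sig": sig})


def verify_url(url: str, key: bytes, now=None) -> bool:
    """Service side: recompute the HMAC and check the expiry. No user identity is involved."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        permissions = query["sp"][0]
        expires_at = int(query["se"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    expected = _signature(base_url, permissions, expires_at, key)
    now = time.time() if now is None else now
    return hmac.compare_digest(sig.encode(), expected.encode()) and now < expires_at


account_key = b"not-a-real-key"  # dummy secret, known only to issuer and service
url = sign_url("https://example.invalid/container/data.csv", account_key, "r", expires_at=2000)
print(verify_url(url, account_key, now=1000))
```

Anyone who holds the URL passes the check until it expires, which is why auditing per user has to be built separately.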

Creating blob storage service programmatically in Azure

Currently I'm playing around with Azure and thinking about a multi-tenant web app where users can create an instance of the app, and where more users can register to upload and share files within this instance. I've created a blob storage service and several containers. However, I'm not sure how customers will feel about sharing their blob service with other users, with files separated only by containers. I would prefer that each user gets their own blob service. However, the web app should still be served by a single shared web worker role.
This sounds easy for every instance you create by hand, however I want the blob service to be created automatically as the user registers and creates his instance of the web app. Unfortunately I haven't found yet any information about how I could accomplish this. I've found only the blob storage api to query the service, not for creating it.
Can anybody lead me in the right direction? Is this even possible?
You can create a storage account programmatically (see "Create Storage Account": http://msdn.microsoft.com/en-us/library/hh264518.aspx), but I wouldn't recommend creating a different account for each user. The limit on how many storage accounts can be created per subscription is fairly low. (I believe the default is five and you can call to get your quota increased to twenty.)
In general, the recommendation is to go ahead and use the same storage account for all your customers. I believe your concern is about data security, but adding multiple storage accounts doesn't really change the security dynamic. (The trust boundary is still between you and the end user, since only your code will directly access storage.)
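If you stay with one storage account and separate customers by container, the container can be created automatically at registration time. The part that usually needs care is the name, since container names must be 3 to 63 characters of lowercase letters, digits and single hyphens, starting and ending with a letter or digit. Below is a minimal sketch of deriving such a name from a tenant identifier; the `tenant-` prefix and the hashing scheme are just one possible choice, not something the storage service prescribes.

```python
import hashlib
import re

_NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_container_name(name: str) -> bool:
    """Check the container naming rules: length 3-63, lowercase alphanumerics, single hyphens."""
    return 3 <= len(name) <= 63 and _NAME_RE.fullmatch(name) is not None


def container_for_tenant(tenant_id: str) -> str:
    """Derive a stable, valid container name from an arbitrary tenant identifier."""
    # Hashing avoids problems with spaces, uppercase or other characters in tenant IDs.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:32]
    name = f"tenant-{digest}"
    if not is_valid_container_name(name):
        raise ValueError(f"derived name is not a valid container name: {name}")
    return name


print(container_for_tenant("Contoso Ltd."))
```

Your web role would call something like this when a customer registers and then create the container through the blob storage API, keeping all storage access inside your own code.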

Resources