Azure Data Lake Service Principal write w/ Data Factory - azure

I have created a Service Principal and set up the necessary linked services in ADF to use its credentials and secret key. Here is a rundown of how this is done:
https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-authenticate-using-active-directory
When I execute my pipeline and the files are written to the ADL, I can see the top-level folder (I am logged in as the creator of the ADL service and am also a Contributor on the Resource Group), but I am unable to drill down any further.
The error I receive basically boils down to an ACL error.
Interestingly, I also note that the Execution Location is listed as East US 2 when using the service principal.
When I manually authenticate the ADL connection in Data Factory (with my own credentials), everything works absolutely fine, and the Execution Location is now listed, correctly, as North Europe.
Has anyone ever seen this?

Helpful Reading: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control
The problem that you are running into is likely an ACL issue, as you mentioned. With just Contributor access, you only have permissions on the management plane, not the data plane, of the account.
Here is the mental model for thinking about ACLs:
If you need to be able to read a file, you need r-x access on that file and --x permission on every parent folder all the way up to the root.
If you create a new folder and add a default ACL entry for yourself, it will apply to all new files and folders created below it.
To address your issue, please ask a super user (someone from the Owners group) to give you this access.
Alternatively, if you are an Owner yourself, you will have rwx access to any file or folder independent of its ACLs.
This should solve your problem.
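For example, a super user could grant your user account the entries described above with the Azure CLI. This is only a rough sketch: the account name, folder name, and object ID are placeholders, and az dls applies to Data Lake Storage Gen1.
# --x on the root so your account can traverse into the hierarchy
az dls fs access set-entry --account <adls-account> --path / --acl-spec user:<your-object-id>:--x
# r-x on the folder Data Factory writes to, plus a default entry so new files inherit it
az dls fs access set-entry --account <adls-account> --path /<output-folder> --acl-spec user:<your-object-id>:r-x
az dls fs access set-entry --account <adls-account> --path /<output-folder> --acl-spec default:user:<your-object-id>:r-x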

Related

Provide Azure VM access to multiple Storage Accounts

I have two storage accounts, stinboundclient1 and stinboundclient2; the prefix "stinbound" is common to both. Inside each storage account there is a container per environment (dev, test, prod). I have a dev virtual machine (DevVM) that needs access to only the "dev" container of both storage accounts. What is the best way to provide read/contributor access to the VM using Azure Policy, a custom role, or any other approach?
Please do not suggest assigning RBAC permissions to the VM manually, because granting access container by container is a tedious task and eventually there will be 30-40 client storage accounts.
Storage Account & Containers:
stinboundclient1/dev
stinboundclient1/test
stinboundclient1/prod
stinboundclient2/dev
stinboundclient2/test
stinboundclient2/prod
DevVM needs access to stinbound*/dev
Similarly, the Test and Prod VMs need access to their respective containers:
TestVM needs access to stinbound*/test
ProdVM needs access to stinbound*/prod
It seems to me that what you are looking for is actually what Microsoft calls Attribute-based Access Control (ABAC).
That way, you can grant access at a scope and add a condition for the access to be effective only when, for example, the container has a particular name or a tag is present.
This feature is still in Preview though.
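As a rough sketch of such an assignment (the scope, role, and object ID are placeholders, and the condition uses the ABAC expression format for blob data actions), DevVM's managed identity could be granted Storage Blob Data Reader at the resource group, restricted to containers named dev:
az role assignment create \
  --role "Storage Blob Data Reader" \
  --assignee-object-id <devvm-managed-identity-object-id> \
  --assignee-principal-type ServicePrincipal \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<rg-with-client-storage-accounts>" \
  --condition "((!(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})) OR (@Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEquals 'dev'))" \
  --condition-version "2.0"
Because the assignment is scoped to the resource group rather than to individual containers, new client storage accounts created there are covered without any extra per-container work.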

Azure Data Factory to Azure Blob Storage Permissions

I'm connecting ADF to blob storage v2 using a managed identity following this doc: Doc1
When it comes to testing the connection with my first dataset, I am successful when I test the connection to the linked service. When I try via the file path and enter "testfolder" (which exists in the blob container), it fails, returning the generic forbidden error displayed at the end of this post.
However, when I opt to "browse" the folders in the dataset portal, the folder "testfolder" does show up. But when I select it, it will not show me anything within that folder.
The Data Factory managed identity is given the Contributor role, granting full access to manage all resources. Is there some other hidden issue or a possible way to narrow this down? My instinct is that this is something within the blob container, since I can view the containers but not their contents.
Error message:
It seems that you have not assigned a data-plane role on the Azure Blob Storage account.
Please follow these steps:
1. Click Access control (IAM) on the storage account, navigate to Role assignments, and add a role assignment.
2. Choose a role according to your need (for example, Storage Blob Data Contributor) and select your Data Factory as the assignee.
3. A few minutes later, retry choosing the file path.
Hope this can help you.
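If you prefer the CLI, here is a minimal sketch of step 2; the object ID, subscription, resource group, and account name are placeholders, and it assumes the Data Factory's system-assigned managed identity.
# grant the Data Factory managed identity data-plane access on the storage account
az role assignment create \
  --assignee-object-id <data-factory-managed-identity-object-id> \
  --assignee-principal-type ServicePrincipal \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage-account>"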

How to set up ACL by not using RBAC in ADLS gen2?

Please let me know how you set up ACLs without using RBAC. I tried the below steps:
Created a user in Active Directory
In Storage (Gen2) -> IAM -> gave the Reader role to the user
In Storage Explorer -> right-click on the root folder -> Manage Access -> gave Read, Write, and Execute permissions.
Still this is not working. I guess that since I have given the Reader role in IAM, the ACL is not getting applied.
However, if I do not set read access in IAM, the user is unable to see the storage account when logging in to the Azure portal. Please let me know how I should apply the ACLs.
I have 5 folders. I want to give rwx access to 3 folders for the DE team and r-x access for the DS team.
Accessing ADLS Gen2 via the Azure portal using only ACLs is not possible: by default the portal uses the account key to access ADLS Gen2, so users need permission to list the account key, and an ACL cannot grant that. For more details, please refer to here. If you want to use only ACLs, I suggest you use azcopy.
For example, my ADLS Gen2 account looks like this:
File system: test
Folder: result_csv
I want to list all files in the folder result_csv.
Configure the ACL (for more details about ACLs, please refer to here). To list /result_csv, the following entries are needed:
Operation           /     result_csv/
list /result_csv    --x   r-x
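As a sketch of setting those entries with the Azure CLI: the object ID is a placeholder, I am assuming -p / addresses the file-system root, and note that az storage fs access set replaces the full ACL spec for a path, so the owning user/group/other entries are included.
# --x on the file system root so the user can traverse it
az storage fs access set --acl "user::rwx,group::r-x,other::---,user:<user-object-id>:--x" -p / -f test --account-name <account-name> --auth-mode login
# r-x on result_csv so the user can list its contents
az storage fs access set --acl "user::rwx,group::r-x,other::---,user:<user-object-id>:r-x" -p result_csv -f test --account-name <account-name> --auth-mode login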
Test
azcopy login --tenant-id <your tenant>
azcopy list "your url"

Folder level access control in ADLS Gen2 for upcoming users

I have a Gen2 storage account and created a container.
Folder structure looks something like this:
StorageAccount
-> Container1
   -> normal-data
      -> Files 1....n
   -> sensitive-data
      -> Files 1....m
I want to give read only access to the user only for normal-data and NOT sensitive-data
This can be achieved by setting ACLs at the folder level and granting access to the service principal.
But the limitation of this approach is that a user can only access files loaded into the directory after the ACL is set up, and cannot access files that were already present inside the directory.
Because of this limitation, new users cannot be given full read access (unless new users share the same service principal, which is not ideal in my use case).
Please suggest a read-only access method in ADLS Gen2, where
If files are already present under a folder and a new user is onboarded, he should be able to read all the files under the folder
New user should get access to only normal-data folder and NOT to sensitive-data
PS: There is a script for assigning ACLs recursively, but since I will get close to a million records each day under the normal-data folder, it is not feasible for me to run the recursive ACL script.
You could create an Azure AD security group and give that group read only access to the read-only folder.
Then you can add new users to the security group.
See: https://learn.microsoft.com/en-us/azure/active-directory/fundamentals/active-directory-groups-create-azure-portal
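A rough sketch with the Azure CLI (the group name, object IDs, and account name are placeholders): create the group, add members, and apply the ACL entries once; the default entry covers files that arrive later, so onboarding a new user is just a group-membership change.
# create the security group and add a user to it
az ad group create --display-name normal-data-readers --mail-nickname normal-data-readers
az ad group member add --group normal-data-readers --member-id <new-user-object-id>
# one-time: add r-x for the group on existing content plus a default entry for future files
az storage fs access update-recursive --acl "group:<group-object-id>:r-x,default:group:<group-object-id>:r-x" -p normal-data -f Container1 --account-name <account-name> --auth-mode login
The group also needs --x on Container1's root so its members can traverse down to normal-data.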

Grant access to Azure Data Lake Gen2 Access via ACLs only (no RBAC)

My goal is to restrict access to an Azure Data Lake Gen2 storage account at the directory level (which should be possible according to Microsoft's promises).
I have two directories, data and sensitive, in a Data Lake Gen2 container. For a specific user, I want to grant read access to the directory data and prevent any access to the directory sensitive.
Following the documentation, I removed all RBAC assignments for that user (on the storage account as well as the Data Lake container) so that there is no inherited read access on the directories. Then I added a Read ACL entry on the data directory for that user.
My expectation:
The user can directly download files from the data directory.
The user cannot access files in the sensitive directory.
Reality:
When I try to download files from the data directory I get a 403 ServiceCode=AuthorizationPermissionMismatch
az storage blob directory download -c containername -s data --account-name XXX --auth-mode login -d "./download" --recursive
RESPONSE Status: 403 This request is not authorized to perform this operation using this permission.
I expected this to work. Otherwise, I can only grant access by assigning the Storage Blob Data Reader role, but that applies to every directory and file within the container and cannot be overridden by ACL entries. Did I do something wrong here?
According to my research, if you want to grant a security principal read access to a file, you need to give that security principal Execute (--x) permission on the container and on each folder in the hierarchy that leads to the file, in addition to Read permission on the file itself. For more details, please refer to the documentation.
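A sketch of the entries that would be needed, using the container name from the command above (the user object ID is a placeholder, I am assuming -p / addresses the container root, and az storage fs access set replaces the whole ACL spec for a path, so the owner/group/other entries are kept):
# --x on the container root so the user can traverse it
az storage fs access set --acl "user::rwx,group::r-x,other::---,user:<user-object-id>:--x" -p / -f containername --account-name XXX --auth-mode login
# r-x on the data directory so its contents can be listed
az storage fs access set --acl "user::rwx,group::r-x,other::---,user:<user-object-id>:r-x" -p data -f containername --account-name XXX --auth-mode login
Files that already exist under data also need a read entry for the user; a default ACL on data only covers files created after it is set.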
I found that I could not get ACLs to work without any RBAC role at all. I ended up creating a custom "Storage Blob Container Reader" RBAC role in my resource group with only the permission Microsoft.Storage/storageAccounts/blobServices/containers/read, so the role allows listing containers without granting the ability to list or read the actual blobs.
