Our team has an Azure Data Lake Gen2. Another team would like to input data to the Data Lake but they should not be able to view our contents in the Data Lake. How can I achieve it?
I think partial access is not possible and that I would need to create another Azure Data Lake for the external team to put data in. Am I correct?
You can restrict access inside the data lake: create a new folder where the external team has write permissions and give them no permissions on the other folders. Note that in Data Lake Gen2, folder-level permissions are set with ACLs; RBAC roles apply at the container level or above.
Another option is to create a separate container in the storage account and configure access control (IAM) on it.
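As a rough sketch of the folder approach: folder-level permissions in ADLS Gen2 are expressed as POSIX-style ACL entries, and per the access-control documentation, creating files in a folder needs write and execute on that folder plus execute on every parent up to the container root. The helper names (drop_folder_acl_plan, add_acl_entry), the folder path, and the object ID are hypothetical; directory_client is assumed to be a DataLakeDirectoryClient from the azure-storage-file-datalake package.

```python
def drop_folder_acl_plan(folder_path, object_id):
    """Map each directory to the ACL entry the external principal needs.

    Parents (including the container root "/") get execute only, so the
    principal can traverse them without listing them; the target folder
    gets write + execute so files can be created but not listed.
    """
    parts = [p for p in folder_path.strip("/").split("/") if p]
    plan = {"/": f"user:{object_id}:--x"}
    for depth in range(1, len(parts)):
        plan["/".join(parts[:depth])] = f"user:{object_id}:--x"
    if parts:
        plan["/".join(parts)] = f"user:{object_id}:-wx"
    return plan


def add_acl_entry(directory_client, entry):
    """Append one entry to a directory's existing ACL.

    Assumes directory_client behaves like DataLakeDirectoryClient:
    get_access_control() returns a dict with an "acl" string, and
    set_access_control(acl=...) replaces the whole ACL.
    """
    current = directory_client.get_access_control()["acl"]
    scope_and_id = entry.rsplit(":", 1)[0]  # e.g. "user:<object-id>"
    # Drop any older entry for the same principal, keep everything else.
    kept = [e for e in current.split(",") if not e.startswith(scope_and_id + ":")]
    directory_client.set_access_control(acl=",".join(kept + [entry]))
```

Effective permissions for a named user are also capped by the directory's mask entry, so check the mask if writes are still denied after applying the plan.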
First some background:
I want to give different groups of data scientists access to Azure Data Lake Gen2. However, we don't want to give them access to the entire data lake because, for security reasons, they are not supposed to see all the data. They must be able to see only a limited set of files/folders. We are doing that by adding the data scientists' AAD groups to the ACLs of the data lake folders. You can refer to the following link for more background on what I am talking about:
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
Now the problem:
Since the data scientists are granted access to a very specific/limited area, they are able to access/browse those folders/files using Azure Databricks (Python commands/code etc.). However, they are not able to browse them using Azure Storage Explorer.
So is there some way for them to browse the data lake using Azure Storage Explorer or some other GUI tool?
Or is it possible to create a custom role for this scenario and grant it to the data scientists' AAD groups so that they have access only to the specific area (i.e. a custom role that would only have "execute" access on the ADLS Gen2 file systems)?
As far as I know, there is no way to use an RBAC role to control access to individual folders in a file system (container), because when you assign a role to an AAD group you must define a scope, and the smallest scope in Azure Data Lake Gen2 is the file system (container). If you just want to control access at that level, you do not need a custom role; you can use the built-in role Storage Blob Data Reader directly. A user with that role can read all files in the file system. For more details, please refer to the documentation.
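To illustrate the scope point: a role assignment targets an Azure resource ID, and for ADLS Gen2 the narrowest such ID is the container. Below is a minimal sketch of how that scope string is composed; the function name and all values passed in are placeholders.

```python
def container_scope(subscription_id, resource_group, account, container):
    """Build the resource ID used as the scope of a container-level role assignment."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
        f"/blobServices/default/containers/{container}"
    )


# Placeholder values only.
print(container_scope("<subscription-id>", "<resource-group>", "<account>", "<container>"))
```

A string like this is what would be passed as the scope when assigning Storage Blob Data Reader (for example with az role assignment create --scope). There is no deeper segment for a folder, which is why folders are handled with ACLs instead.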
It is not possible to access data via Storage Explorer with only ACL permissions assigned. Unfortunately, you need to combine ACLs with an RBAC role assigned at the Storage Account level (e.g. Reader) so that users can see the Storage Account itself in Storage Explorer. You can then introduce granular permissions using ACLs on specific containers/folders/files. With Reader, however, they will still be able to see the names of all the containers in the Storage Account (but not the containers' content unless it is granted via ACL or a data RBAC assignment at the container level).
As you noticed, the only way to access a specific folder/file using only ACL permissions is via code, e.g. PowerShell or Python.
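For the code route, a minimal Python sketch might look like the following. It assumes the azure-storage-file-datalake and azure-identity packages; the account URL, container name, and folder are placeholders, and list_folder is a hypothetical helper name.

```python
def list_folder(file_system_client, folder):
    """Return (name, is_directory) pairs for the direct children of a folder.

    Relies only on ACL permissions: read + execute on the folder and
    execute on each of its parents.
    """
    paths = file_system_client.get_paths(path=folder, recursive=False)
    return [(p.name, p.is_directory) for p in paths]


def main():
    # Imported here so the helper above can be exercised without the SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",  # placeholder
        credential=DefaultAzureCredential(),
    )
    fs = service.get_file_system_client("<container>")  # placeholder
    for name, is_dir in list_folder(fs, "<folder>"):  # placeholder
        print("dir " if is_dir else "file", name)
```

Calling main() runs the listing against a real account as the signed-in identity, so it only returns what that identity's ACLs allow.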
My company has two Azure environments. The first one was a temporary environment and is being re-purposed or decommissioned; I'm not sure which. All I know is that I need to get files from a Data Lake in one environment to a Data Lake in the other. I've looked at adlcopy and azcopy, and neither seems like it will do what I need. Has anyone encountered this before, and if so, what did you use to solve it?
You could consider Azure Data Factory; it can help you transfer files or data from one Azure Data Lake to another.
You can reference Copy data to or from Azure Data Lake Storage Gen2 using Azure Data Factory.
This article outlines how to use Copy Activity in Azure Data Factory to copy data to and from Data Lake Storage Gen2. It builds on the Copy Activity overview article that presents a general overview of Copy Activity.
For example, you can learn from this tutorial: Quickstart: Use the Copy Data tool to copy data.
In this quickstart, you use the Azure portal to create a data factory. Then, you use the Copy Data tool to create a pipeline that copies data from a folder in Azure Blob storage to another folder.
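For orientation, the pipeline that the Copy Data tool produces for a lake-to-lake file copy is roughly a single Copy activity between two datasets. The sketch below builds such a definition as a Python dict. The pipeline, activity, and dataset names are hypothetical, and the Binary*/AzureBlobFS* type names are as I recall them from the Data Lake Storage Gen2 connector article, so verify them there before relying on this.

```python
import json


def copy_pipeline(source_dataset, sink_dataset, name="CopyLakeToLake"):
    """Build a pipeline definition with one Copy activity (file-for-file copy)."""
    return {
        "name": name,
        "properties": {
            "activities": [
                {
                    "name": "CopyFiles",
                    "type": "Copy",
                    "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
                    "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
                    "typeProperties": {
                        # Assumed type names for an ADLS Gen2 binary copy.
                        "source": {
                            "type": "BinarySource",
                            "storeSettings": {"type": "AzureBlobFSReadSettings", "recursive": True},
                        },
                        "sink": {
                            "type": "BinarySink",
                            "storeSettings": {"type": "AzureBlobFSWriteSettings"},
                        },
                    },
                }
            ]
        },
    }


# Hypothetical dataset names; each dataset would point at one lake's folder.
print(json.dumps(copy_pipeline("SourceLakeFolder", "TargetLakeFolder"), indent=2))
```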
Hope this helps.
I'm fairly new to Azure, and just trying out Azure Data Lake Analytics.
I created a new Azure Data Lake Analytics account for testing purposes and would like to delete it now, however I used an existing Azure Data Lake Storage (ADLS) account as the default storage account during setup. I now know I probably should have added the existing ADLS as associated data store.
I assume I can safely delete the Azure Data Lake Analytics account now without affecting the underlying default storage account, but I want to check before I do this, as it would be a massive problem if the existing ADLS got deleted.
Any pointers would be much appreciated. thanks
The two are separate. Deleting the Azure Data Lake Analytics service will not affect the Azure Data Lake Store.
As a disclaimer, test test test. Set up another instance of both in the same way and then confirm the delete behaviour, just to be 110% sure.
Azure Data Lake Team here. I can positively confirm that deleting the Azure Data Lake Analytics account will NOT delete the default or any linked Azure Data Lake Store account associated with it.
I am moving data from one Azure Data Lake Store to another Azure Data Lake Store that belongs to a different subscription (tenant) using Data Factory.
I am getting an error, such as invalid credentials, when uploading the LinkedService for the sink Data Lake.
Is what I am doing actually possible? If it is, kindly point me to some reference.
You can do this. For each data lake you need a separate service principal, each in the same subscription as the data lake. You can then create the two separate connections to the data lakes using each service principal.
You need both the service principal application ID and a key for it.
Note that the service principal not only needs access rights to the folder that you want to copy from or to, but also to the root folder of the data lake. I do not know if it also requires access rights to all directories between the root and the source/destination directory.
Note also that regardless of whether you are reading or writing data to this data lake, the service principal does need execute rights as well.
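A sketch of what the two connections might look like as Data Factory linked-service definitions, one per lake, each with its own service principal and tenant. The property names follow the service principal authentication section of the Azure Data Lake Store connector article as I recall it; newer Data Factory versions may expect the key wrapped as a secure string. Every value and the adls_linked_service helper are placeholders.

```python
import json


def adls_linked_service(name, store_uri, sp_app_id, sp_key, tenant):
    """Build an Azure Data Lake Store linked service using a service principal."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": store_uri,
                "servicePrincipalId": sp_app_id,  # application ID
                "servicePrincipalKey": sp_key,    # key for that application
                "tenant": tenant,                 # tenant that owns this principal
            },
        },
    }


# Placeholder values: one principal per lake, each in that lake's own tenant.
source = adls_linked_service(
    "SourceLake",
    "https://<source-account>.azuredatalakestore.net/webhdfs/v1",
    "<source-app-id>", "<source-key>", "<source-tenant>",
)
sink = adls_linked_service(
    "SinkLake",
    "https://<sink-account>.azuredatalakestore.net/webhdfs/v1",
    "<sink-app-id>", "<sink-key>", "<sink-tenant>",
)
print(json.dumps([source, sink], indent=2))
```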
This isn't a complete answer, but just deals with the service principal side of the configuration to creating a connection. That is what gave me the most trouble.
Leave a comment if you feel more information would be useful
If you have not already read it, please refer to https://learn.microsoft.com/en-us/azure/data-factory/data-factory-azure-datalake-connector and search for "Service principal authentication (recommended)". If you have, I presume the error occurs because you have not given the entity the appropriate permissions on the ADLS folders. Without the exact error you are seeing, I cannot say whether it is the source or the sink. In short, can you first provide more details about your context and then the error you are seeing?
Thanks,
Sachin Sheth
Program Manager,
Azure Data Lake.
I created a database with some tables through a U-SQL script run through the Azure Data Lake Tools for Visual Studio (see screenshot below). Is that database stored in the Data Lake Store?
The file structure as shown in the Azure portal
In addition to Amit's answer:
The data that is stored in the store is stored in the \catalog folder of your default ADLS account. It will be charged at the same rate as the remaining data.
The cost of the data that is stored in the internal metadata service is internalized into the ADLA COGS calculations.
Some of the artifacts related to databases are stored in the Azure Data Lake Store, but not all of them are stored in the associated ADLS account. More specifically, some of the metadata associated with the databases is stored in an ADL service-managed internal location that is not directly accessible to you. What you will see in the ADLS account is the data associated with the tables and databases in an internal format. Hope this information is useful.
Thanks,
Amit