I want to automate ACL creation for the topics in my Kafka cluster.
I need a solution where, if we want to add a new ACL, we only have to update a file; the change is then picked up automatically and the ACL is created for the respective user with the respective access.
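One common pattern is to keep the desired ACLs in a version-controlled file and run a small script (from CI, a cron job, or a file watcher) whenever that file changes. Below is a minimal sketch in Python; the file name, its principal/topic/operation columns, and the broker address are placeholders of my own, and it assumes the kafka-acls.sh tool from the Kafka distribution is on the PATH (add --command-config if your cluster requires client authentication).

```python
#!/usr/bin/env python3
"""Apply Kafka topic ACLs declared in a CSV file by calling kafka-acls.sh."""
import csv
import subprocess

BOOTSTRAP_SERVER = "broker1:9092"   # placeholder
ACL_FILE = "topic_acls.csv"         # hypothetical file with columns: principal,topic,operation


def apply_acl(principal: str, topic: str, operation: str) -> None:
    """Add one allow ACL; re-adding an ACL that already exists is harmless."""
    subprocess.run(
        [
            "kafka-acls.sh",
            "--bootstrap-server", BOOTSTRAP_SERVER,
            "--add",
            "--allow-principal", f"User:{principal}",
            "--operation", operation,   # e.g. Read, Write, Describe
            "--topic", topic,
        ],
        check=True,
    )


if __name__ == "__main__":
    with open(ACL_FILE, newline="") as f:
        for row in csv.DictReader(f):
            apply_acl(row["principal"], row["topic"], row["operation"])
```

The same idea works with the admin client of your Kafka library (a createAcls-style call) instead of the CLI; the important part is that the file is the single source of truth and the script is idempotent, so it can be rerun on every change.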
Is there any way to view ACL changes in the logs for Azure Data Lake Gen2? The diagnostic settings for the resource are already on, but I am not able to see any entries for changes made by me or anyone else.
Example: I remove/add some ACLs on a particular path in an Azure Gen2 storage account. Is there any way I can keep track of such activities?
You can check the ACL-related logs in the Activity Log of the Azure Data Lake Storage account.
Refer to the official documentation for more details: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-security-overview#activity-and-diagnostic-logs
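If you would rather pull those entries programmatically than browse them in the portal, a rough sketch with the azure-mgmt-monitor package could look like the following; the subscription ID, resource ID, and time window are placeholders I made up.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

# OData filter: activity-log events for one storage account within a time window.
flt = (
    "eventTimestamp ge '2024-01-01T00:00:00Z' and "
    "resourceUri eq '/subscriptions/<sub>/resourceGroups/<rg>"
    "/providers/Microsoft.Storage/storageAccounts/<account>'"
)

for event in client.activity_logs.list(filter=flt):
    print(event.event_timestamp, event.caller, event.operation_name.value)
```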
I am new to Azure and am trying to understand the following. It would be helpful if anyone can share their knowledge on this.
Can a table created in Cluster A be accessed in Cluster B if Cluster A is down?
What is the connection between the cluster and the data in the tables?
You need a running process (a cluster) to access the metastore and read data, because the data is stored in the customer's location and is not directly accessible from the control plane that runs the UI.
When you write data into a table, that data will be available from another cluster under the following conditions:
both clusters use the same metastore
the user has the correct permissions (which can be enforced via Table ACLs)
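For example, if both clusters share the same metastore and the workspace has table access control enabled, a grant issued from one cluster applies to the table no matter which cluster reads it later. A minimal sketch (the table and group names are placeholders):

```python
# Run once from a cluster that can manage the table (table access control enabled).
spark.sql("GRANT SELECT ON TABLE analytics.sales TO `data-scientists`")

# From any other cluster attached to the same metastore, members of the group can now read it.
spark.table("analytics.sales").limit(5).show()
```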
First some background:
I want to give different groups of data scientists access to Azure Data Lake Gen2. However, we don't want to give them access to the entire data lake, because for security reasons they are not supposed to see all the data; they must only be able to see a limited set of files/folders. We are doing that by adding the data scientists' AAD groups to the ACLs of the data lake folders. You can refer to the following link for more background on what I am talking about:
https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
Now the problem:
Since the data scientists are granted access to a very specific/limited area, they are able to access/browse those folders/files using Azure Databricks (Python commands/code, etc.). However, they are not able to browse them using Azure Storage Explorer.
So is there some way for them to browse the data lake using Azure Storage Explorer or some other GUI tool?
Or is it possible to create a custom role for such a scenario and grant it to the data scientists' AAD groups so that they only have access to the specific area (i.e. a custom role that would only have "execute" access on the ADLS Gen2 file systems)?
As far as I know, there is no way to use an RBAC role to control access to specific folders within a file system (container). When we assign a role to an AAD group, we have to define a scope, and the smallest scope in Azure Data Lake Gen2 is the file system (container). If you just want to control access at that level, you do not need to create a custom role; you can directly use the built-in role Storage Blob Data Reader. If a user has that role, they can read all files in the file system. For more details, please refer to the documentation.
It is not possible to access data via Storage Explorer with only ACL permissions assigned. Unfortunately, you need to use ACLs in combination with an RBAC role assigned at the Storage Account level (e.g. Reader) to be able to see the Storage Account itself in Storage Explorer. You can then introduce granular permissions using ACLs on specific containers/folders/files; however, with Reader they will still be able to see the names of all the containers in the Storage Account (but cannot see the containers' content until it is granted via ACLs or a data RBAC assignment at the container level).
As you noticed, the only option to access a specific folder/file using only ACL permissions is via code, e.g. PowerShell or Python.
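For completeness, this is roughly what the Python route can look like with the azure-storage-file-datalake package. The account, container, and folder names below are placeholders; the signed-in user only needs the ACLs already granted (execute on the parent folders plus read on the target).

```python
from azure.identity import InteractiveBrowserCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Sign in with the data scientist's own AAD account.
credential = InteractiveBrowserCredential()
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=credential,
)
fs = service.get_file_system_client("<container>")

# List only the folder the AAD group has ACL access to.
for item in fs.get_paths(path="projects/team-a"):
    print(item.name)

# Read a single file from that folder.
data = fs.get_file_client("projects/team-a/sample.csv").download_file().readall()
```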
I have an Azure Data Lake Store (ADLS) containing ~100k files that I need to access from an HDInsight cluster for analysis. When I provision the cluster via Azure Portal, I use this ADLS for the cluster's storage and assign rwx privileges for all files on the ADLS using a service principal + the "Data Lake Store Access" feature. This feature appears to grant access to each file one at a time, at a rate of about 2k per minute: it takes over an hour just to grant the permissions!
Is there a faster way to grant a new cluster rwx privileges on its associated ADLS?
Yes there is a better way to get this all set up. You need to, on a one-time basis, add permissions for an Azure Active Directory group to all your files and folders. Once that is set up, then whenever you create a new HDInsight cluster, the service principal simply needs to be made a member of the group.
So to summarize:
1. Create a new Azure Active Directory group.
2. Propagate permissions in your ADLS account to this group on the appropriate files and folders (a sketch follows this list).
3. Create your HDInsight cluster, choosing the right service principal when creating it.
4. Add the service principal to the group created in step 1.
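A rough sketch of step 2, using the azure-datalake-store Python package (my own choice of tool; the answer above doesn't prescribe one). The tenant/client IDs, the account name, and the group object ID are placeholders. Directories get a default entry in addition to the access entry so that anything created under them later inherits the group's access:

```python
from azure.datalake.store import core, lib

# Authenticate with an identity that is allowed to change ACLs on the account.
token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<sp-client-id>",
                 client_secret="<sp-secret>")
adls = core.AzureDLFileSystem(token, store_name="<adls-account>")

GROUP_ID = "<aad-group-object-id>"
DIR_SPEC = f"group:{GROUP_ID}:rwx,default:group:{GROUP_ID}:rwx"
FILE_SPEC = f"group:{GROUP_ID}:rwx"   # files cannot carry default entries


def grant(path: str) -> None:
    """Walk the tree and add the group's ACL entries to every folder and file."""
    adls.modify_acl_entries(path, DIR_SPEC)
    for child in adls.ls(path, detail=True):
        if child["type"] == "DIRECTORY":
            grant(child["name"])
        else:
            adls.modify_acl_entries(child["name"], FILE_SPEC)


grant("/")   # or just the subtree the clusters actually need
```

Because the default entries are inherited by anything created afterwards, this is a one-time job; new clusters then only need their service principal added to the group (step 4).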
Hope this helps and do let me know if you have questions.
I was trying to create a new HDInsight cluster and wanted to connect it to an already created Azure Data Lake Store (ADLS) account. I selected HDI 3.5 as the cluster type for HDInsight. I was able to select my Data Lake as my storage, but when I created the service principal account and tried to provide ADLS access to that account, I didn't see my ADLS root folder. Do I need to provide any additional permissions on my ADLS for it to appear in the Manage ADLS Access blade? Any help would be appreciated.
You need to create a root folder in the ADLS account before creating the HDI cluster so that the root path is defined while creating the cluster.
For example, create a root folder /clusters; the root path for your cluster would then be /clusters/yourclustername. Also make sure you grant access to the root folder in the "Manage Access" blade while creating the HDI cluster.
Refer to this for more info:
https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-hdinsight-hadoop-use-portal
The ADLS file system has a separate UNIX-like access control mechanism called an ACL (access control list). You (your login email ID) should have ACL access at the root folder in order to access it from the HDInsight cluster.
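Putting the two answers together, a small sketch with the azure-datalake-store Python package (the account name, the credentials, and <your-aad-object-id>, i.e. the object ID of the login account you pick in the Manage Access blade, are placeholders):

```python
from azure.datalake.store import core, lib

# Authenticate with an identity that owns the account / is allowed to change ACLs.
token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<sp-client-id>",
                 client_secret="<sp-secret>")
adls = core.AzureDLFileSystem(token, store_name="<adls-account>")

OBJECT_ID = "<your-aad-object-id>"

adls.mkdir("/clusters")   # root folder; the cluster's root path becomes /clusters/<clustername>

# Execute on "/" lets the identity traverse into /clusters; full access on /clusters itself,
# plus a default entry so folders the cluster creates underneath inherit it.
adls.modify_acl_entries("/", f"user:{OBJECT_ID}:--x")
adls.modify_acl_entries("/clusters", f"user:{OBJECT_ID}:rwx,default:user:{OBJECT_ID}:rwx")
```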