Is it possible to assign permissions automatically to groups for newly created components in Databricks? - databricks

I have assigned permissions to Databricks groups for existing Azure Databricks components, i.e. clusters & jobs. Is there any way to automatically assign permissions to newly created clusters & jobs? The online documentation seems to cover assigning permissions to existing components only. Imagine we add a new job and anyone can access and run it! I understand we have cluster policies to restrict the number of workers or the runtime to use, but I mean the group permissions, i.e. CAN_MANAGE or CAN_MANAGE_RUN. I would expect these permissions to be set up automatically once new components are created.
Sorry if it is a stupid question. Do we have any way to do it?

I tried to reproduce this from my end.
Code:
resource "azurerm_databricks_workspace" "example" {
name = "databricks-test"
resource_group_name = data.azurerm_resource_group.example.name
location = data.azurerm_resource_group.example.location
sku = "standard"
tags = {
Environment = "Production"
}
}
resource "databricks_group" "my_group" {
display_name = "SomekavyaGroup"
allow_cluster_create = true
allow_instance_pool_create = true
}
resource "databricks_user_role" "my_user_account_admin" {
user_id = databricks_group.my_group.id
role = "account_admin"
}
Through Terraform automation, we can define the groups and assign roles to them individually, or loop through the defined groups so that groups and roles are created and assigned in one run. But this is only possible when we define the logic for the incoming users.
But according to the documentation, the role can be a pre-defined role such as account admin, or an instance profile ARN.
In other cases, check the servicePrincipal resource type (Microsoft Graph v1.0 | Microsoft Learn) to assign roles via Microsoft Graph.
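If the goal is the CAN_MANAGE / CAN_MANAGE_RUN permissions from the question rather than account-level roles, one option is to attach a databricks_permissions resource to every job that Terraform itself creates, chaining for_each so no job exists without an ACL. A minimal sketch, assuming a hypothetical local map named jobs:

locals {
  # hypothetical set of jobs managed by Terraform
  jobs = {
    etl    = "ETL job"
    report = "Reporting job"
  }
}

resource "databricks_job" "this" {
  for_each = local.jobs
  name     = each.value
  # (task definitions elided for brevity)
}

# Grant the group CAN_MANAGE_RUN on every job created above
resource "databricks_permissions" "job_acl" {
  for_each = databricks_job.this
  job_id   = each.value.id

  access_control {
    group_name       = databricks_group.my_group.display_name
    permission_level = "CAN_MANAGE_RUN"
  }
}

This only covers jobs defined in Terraform; jobs created through the UI would still need job creation itself to be restricted some other way.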

Related

How to create Azure Databricks Notebook via Terraform?

So I am completely new to Terraform, and I found that by using this in my Terraform main.tf I can create the Azure Databricks infrastructure:
resource "azurerm_databricks_workspace" "bdcc" {
depends_on = [
azurerm_resource_group.bdcc
]
name = "dbw-${var.ENV}-${var.LOCATION}"
resource_group_name = azurerm_resource_group.bdcc.name
location = azurerm_resource_group.bdcc.location
sku = "standard"
tags = {
region = var.BDCC_REGION
env = var.ENV
}
}
And I also found here that by using this I can even create a particular notebook in this Azure Databricks infrastructure:
resource "databricks_notebook" "notebook" {
content_base64 = base64encode(<<-EOT
# created from ${abspath(path.module)}
display(spark.range(10))
EOT
)
path = "/Shared/Demo"
language = "PYTHON"
}
But since I am new to this, I am not sure in what order I should put those pieces of code together.
It would be nice if someone could point me to a full example of how to create a notebook via Terraform on Azure Databricks.
Thank you beforehand!
In general you can put these objects in any order - it's Terraform's job to detect dependencies between the objects and create/update them in the correct order. For example, you don't need depends_on in the azurerm_databricks_workspace resource, because Terraform will work out that the resource group is needed before the workspace can be created, so the workspace creation will follow the creation of the resource group. Terraform also tries to make changes in parallel where possible.
But because of this, it becomes slightly more complex when you have the workspace resource together with workspace objects, like notebooks, clusters, etc. As there is no explicit dependency, Terraform will try to create the notebook in parallel with the creation of the workspace, and it will fail because the workspace doesn't exist yet - usually you will get a message about an authentication error.
The solution is to have an explicit dependency between notebook & workspace, plus you need to configure the Databricks provider's authentication to point to the newly created workspace (there are differences between user & service principal authentication - you can find more information in the docs). In the end your code would look like this:
resource "azurerm_databricks_workspace" "bdcc" {
name = "dbw-${var.ENV}-${var.LOCATION}"
resource_group_name = azurerm_resource_group.bdcc.name
location = azurerm_resource_group.bdcc.location
sku = "standard"
tags = {
region = var.BDCC_REGION
env = var.ENV
}
}
provider "databricks" {
host = azurerm_databricks_workspace.bdcc.workspace_url
}
resource "databricks_notebook" "notebook" {
depends_on = [azurerm_databricks_workspace.bdcc]
...
}
Unfortunately, there is no way to put depends_on at the provider level, so you will need to put it into every Databricks resource that is created together with the workspace. Usually the best practice is to have a separate module for workspace creation & a separate module for the objects inside the Databricks workspace, as sketched below.
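A minimal sketch of that split, assuming a hypothetical ./modules/workspace module that creates the workspace and exposes its URL as an output named workspace_url, and a hypothetical ./modules/workspace-objects module containing the notebooks, clusters, etc.:

# Root module: wire workspace creation and workspace objects together
module "workspace" {
  source = "./modules/workspace"
}

# The provider reads the URL from the workspace module's output,
# so resources in workspace-objects are created after the workspace
provider "databricks" {
  host = module.workspace.workspace_url
}

module "workspace_objects" {
  source = "./modules/workspace-objects"
}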
P.S. I would recommend reading a book or some documentation on Terraform. For example, Terraform: Up & Running is a very good intro.

Azure Data Lake storage Gen2 permissions

I am currently building a data lake (Gen2) in Azure. I use Terraform to provision all the resources. However, I ran into some permission inconsistencies. According to the documentation, one can set permissions for the data lake with RBAC and ACLs.
My choice is to use ACLs, since they allow fine-grained permissions on directories within the data lake. In the data lake, I created a directory raw, among other directories, on which a certain group has r-- (read-only) default permissions. Default means that all the objects under this directory are assigned the same permissions as the directory itself. When users in that group try to access the data lake with Storage Explorer, they do not see a storage account, and they do not see the actual filesystem/container in which the directory lives. So they are not able to access the directory for which they have read-only permissions.
So I was thinking of assigning the permissions needed to at least list storage accounts and filesystems (containers). Evaluating existing roles, I came to the following permissions:
Microsoft.Storage/storageAccounts/listKeys/action
Microsoft.Storage/storageAccounts/read
After applying permission 1, nothing changed. After applying permission 2 as well, users in the group could suddenly do everything in the data lake as if there was no ACL specified.
My question now is: how can I use ACLs (and RBAC) to create a data lake with directories with different permissions for different groups, so that groups are actually able to only read or write to those directories that are in the ACLs? In addition, they should be able to list storage accounts and filesystems (containers) for which they have access to certain directories.
I believe you also need to create access ACLs on the entire hierarchy of folders down to the file or folder you are trying to read, including the root container.
So if your folder "raw" was created in the top level then you'll need to create the following ACLs for that group...
"/" --x (access)
"/raw" r-x (access)
"/raw" r-x (default)
... and the default ACL will then give the group the read and execute ACL on all sub folders and files created.
You also need to give the group at least the Reader RBAC role on the resource - this can be either on the storage account, or just on the container if you want to restrict access to other containers.
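A minimal sketch of that role assignment, reusing the storage account resource and the aad_group_object_id variable from the example that follows:

# Reader on the storage account lets the group list the account
# and its containers; the ACLs still gate access to the data itself
resource "azurerm_role_assignment" "group_reader" {
  scope                = azurerm_storage_account.data_lake.id
  role_definition_name = "Reader"
  principal_id         = var.aad_group_object_id
}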
You can set the ACLs on the container with the ace property of the azurerm_storage_data_lake_gen2_filesystem Terraform resource, and then set the ACLs on the folders using the azurerm_storage_data_lake_gen2_path Terraform resource.
Here's an example where I'm storing the object_id of the Azure Active Directory group in a variable named aad_group_object_id.
# create the data lake
resource "azurerm_storage_account" "data_lake" {
  ....
}

# create a container named "acltest" with execute ACL for the group
resource "azurerm_storage_data_lake_gen2_filesystem" "data_lake_acl_test" {
  name               = "acltest"
  storage_account_id = azurerm_storage_account.data_lake.id

  ace {
    type        = "group"
    scope       = "access"
    id          = var.aad_group_object_id
    permissions = "--x"
  }
}

# create the folder "raw" and give read and execute access and default permissions to the group
resource "azurerm_storage_data_lake_gen2_path" "folder_raw" {
  path               = "raw"
  filesystem_name    = azurerm_storage_data_lake_gen2_filesystem.data_lake_acl_test.name
  storage_account_id = azurerm_storage_account.data_lake.id
  resource           = "directory"

  ace {
    type        = "group"
    scope       = "access"
    id          = var.aad_group_object_id
    permissions = "r-x"
  }

  ace {
    type        = "group"
    scope       = "default"
    id          = var.aad_group_object_id
    permissions = "r-x"
  }
}
I've not included it in the code example, but you'll also have to add the ACLs for the owning group, owner, mask and other identities that get added to the root container and sub-folders. Otherwise your Terraform plan will keep trying to drop and recreate them on every run.
You can just add these - unfortunately you need to add them to every folder you create, unless anyone knows a way around this.
ace {
  permissions = "---"
  scope       = "access"
  type        = "other"
}

ace {
  permissions = "r-x"
  scope       = "access"
  type        = "group"
}

ace {
  permissions = "r-x"
  scope       = "access"
  type        = "mask"
}

ace {
  permissions = "rwx"
  scope       = "access"
  type        = "user"
}

How to reload the terraform provider at runtime to use the different AWS profile

How can I reload the Terraform provider at runtime to use a different AWS profile?
I create a new user:
resource "aws_iam_user" "user_lake_admin" {
name = var.lake_admin_user_name
path = "/"
tags = {
tag-key = "data-test"
}
}
provider "aws" {
access_key = aws_iam_access_key.user_lake_admin_AK_SK.id
secret_key = aws_iam_access_key.user_lake_admin_AK_SK.secret
region = "us-west-2"
alias = "lake-admin-profile"
}
This lake_admin user is created in the same file. Then I'm trying to use that aliased provider for another resource:
provider "aws" {
access_key = aws_iam_access_key.user_lake_admin_AK_SK.id
secret_key = aws_iam_access_key.user_lake_admin_AK_SK.secret
region = "us-west-2"
alias = "lake-admin-profile"
}
resource "aws_glue_catalog_database" "myDB" {
name = "my-db"
provider = aws.lake-admin-profile
}
As far as I know, Terraform providers are configured first, before any resources in the Terraform files are applied. But is there any way to reload a provider's configuration in the middle of a Terraform run?
You can't do this directly.
You can apply the creation of the user in one root module and state, and then use its credentials in a provider for a second one.
For the purposes of deploying infrastructure, you are likely better off with IAM Roles and assume-role providers to handle this kind of situation, as sketched below.
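A minimal sketch of an assume-role provider; the role ARN below is a placeholder for a pre-existing deployment role:

# The provider assumes the deployment role instead of embedding
# freshly created user credentials, so no mid-run reload is needed
provider "aws" {
  region = "us-west-2"
  alias  = "lake-admin-profile"

  assume_role {
    role_arn     = "arn:aws:iam::123456789012:role/lake-admin-deploy" # placeholder
    session_name = "terraform-lake-admin"
  }
}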
Generally, you don't need to create infrastructure as a specific user. There's rarely an advantage to doing that: I can't think of a case where the principal creating infrastructure gets any implied special access to the created infrastructure.
You can use a deployment IAM Role or IAM User to deploy everything in the account, and then apply resource-based and IAM policies to enforce the restrictions within the deployment.

terraform retrieve azurerm_recovery_services_protection_policy backup_policy_id

I'm building a Terraform template to enable an Azure Recovery Services vault for a VM. The Recovery Services vault already exists, as does the backup policy. I need a data source to retrieve the backup policy ID, which is required by the resource azurerm_recovery_services_protected_vm.
I can find the data source azurerm_recovery_services_vault, but cannot find a data source for the recovery services policy. So, to achieve this objective, I have to declare a resource like:
resource "azurerm_recovery_services_protection_policy_vm" "test"{
name = "DefaultPolicy"
resource_group_name = "${var.recovery_vault_resource_group_name}"
recovery_vault_name = "${var.recovery_vault_name}"
backup = {
frequency = "Daily"
time = "09:30"
}
retention_daily = {
count = 10
}
}
The challenge is that the DefaultPolicy can vary from vault to vault. And I don't want to change it or make it the same across my whole tenant.
Is there any way I can retrieve the recovery policy ID without creating one?
According to this provider reference, the data source only returns the recovery services vault's id, location, sku and tags.
So there is currently no way of doing that in Terraform.

Adding tags to child resources created by terraform

Terraform v0.11.9
+ provider.aws v1.41.0
I want to know if there is a way to update a resource that is not directly created in the plan, but by a resource in the plan. The example is creating a managed Active Directory using aws_directory_service_directory. This process creates a security group, and I want to add tags to that security group. Here is the snippet I'm using to create the resource:
resource "aws_directory_service_directory" "NewDS" {
name = "${local.DSFQDN}"
password = "${var.ADPassword}"
size = "Large"
type = "MicrosoftAD"
short_name = "${local.DSShortName}"
vpc_settings {
vpc_id = "${aws_vpc.this.id}"
subnet_ids = ["${aws_subnet.private.0.id}",
"${aws_subnet.private.1.id}",
]
}
tags = "${merge(var.tags, var.ds_tags, map("Name", format("%s", local.VPCname)))}"
}
I can reference the newly created security group using
"${aws_directory_service_directory.NewDS.security_group_id}"
But I can't use that to update the resource. I want to add all of the tags I have on the directory to the security group, as well as update the Name tag. I've tried using a local-exec provisioner, but the results have not been consistent, and getting the map of tags into the command without hard-coding it has not worked.
Thanks
I moved the local provider out of the directory service resource and into a dummy resource.
resource "null_resource" "ManagedADTags"
{
provisioner "local-exec"
{
command = "aws --profile ${var.profile} --region ${var.region} ec2 create-tags --
resources ${aws_directory_service_directory.NewDS.security_group_id} --tags
Key=Name,Value=${format("${local.security_group_prefix}-%s","ManagedAD")}"
}
}
(The command = is a single line)
Using the format command allowed me to send the entire list of tags to the resource. Terraform doesn't "manage" it, but it does allow me to update it as part of the plan.
You can then leverage the aws_ec2_tag resource, which works on non-EC2 resources as well, in conjunction with the provider attribute ignore_tags. Please refer to another answer I made on the topic for more detail; a sketch follows.
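A minimal sketch of that approach, reusing the aws_directory_service_directory.NewDS resource and local.security_group_prefix from above:

# Tell the provider to ignore Name tags everywhere, so Terraform
# doesn't fight over tags owned by aws_ec2_tag resources
provider "aws" {
  region = "us-west-2"

  ignore_tags {
    keys = ["Name"]
  }
}

# Tag the security group that the directory service created implicitly
resource "aws_ec2_tag" "managed_ad_sg_name" {
  resource_id = aws_directory_service_directory.NewDS.security_group_id
  key         = "Name"
  value       = "${local.security_group_prefix}-ManagedAD"
}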
AWS already exposes an API for this that can tag a whole set of resources, not just one resource - not sure why Terraform doesn't implement that.
Just hit this as well. It turns out the tags propagate from the directory service, so if you tag your directory appropriately, the Name tag from your directory service will be applied to the security group.
