Automating Permissions for Databricks SQL Tables or Views - databricks

Trying to automate the setup of Databricks SQL.
I have done it from the UI and it works, so this is a natural next step.
The one thing I am unsure about is how to automate granting of the access to SQL tables and/or views using REST. I am trying to avoid a Notebooks job.
I have seen this microsoft documentation and downloaded the specification but when I opened it with Postman, I see permissions/objectType/Object id, but the only sample I have seen there is for "queries". It just seems to be applicable for Queries and Dashboards. Can't this be done for Tables and views? There is no further documentation that I could see.
So, basically how to do something like
grant select on tablename to group using REST api without using a Notebook job. I am interested to see if I can just call a REST endpoint from our release pipeline (Azure DevOps)

As of right now, there is no REST API for setting Table ACLs. But it's available as part of the Unity Catalog that is right now in the public preview.
If you can't use Unity Catalog yet, then you still have a possibility to automate assignment of Table ACLs by using databricks_sql_permissions resource of Databricks Terraform Provider - it sets permissions by executing SQL commands on a cluster, but this is hidden from administrator.

This is an extension to Alex Ott `s answer giving some details on what I tried to make the databricks_sql_permissions Resource work for Databricks SQL as was the OP's original question. All this assumes that one does not want/can use Unity Catalog which follows a different permission model and has a different Terraform resource, namely databricks_grants Resource.
Alex`s answer refers to table ACLs which had me surprised as the OP (and myself) were looking for Databricks SQL object security and not table ACLs in the classic workspace. But from what I understand so far, it seems the two are closely interlinked and the Terraform provider addresses table ACLs in the classic workspace (i.e. non-SQL) which are mirrored to SQL objects in the SQL workspace. It follows that if you like to steer SQL permissions in Databricks SQL via Terraform, you need to enable table ACLs in classic workspace (in admin console). If you (for whatever reason) cannot enable table ACLs, it seems to me the only other option is via sql scripts in the SQL workspace with the disadvantage of having to explicitly write out grants and revokes. Potentially an alternative is to throw away all permissions before one only runs grant statements but this has other negative implications.
So here is my approach:
Enable table ACL in classic workspace (this has no implications in classic workspace if you don`t use table ACL-enabled clusters afaik)
Use azurerm_databricks_workspace resource to register Databricks Azure infrastructure
Use databricks_sql_permissions Resource to manage table ACLs and thus SQL object security
Below is a minimal example that worked for me and may inspire others. It certainly does not follow Terraform config guidance but is merely used for minimal illustration.
NOTE: Due to a Terraform issue I had to ignore changes from attribute public_network_access_enabled, see GitHub issues: "azurerm_databricks_workspace" forces replacement on public_network_access_enabled while it never existed #15222
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=3.0.0"
}
databricks = {
source = "databricks/databricks"
version = "=1.4.0"
}
}
backend "azurerm" {
resource_group_name = "tfstate"
storage_account_name = "tfsa"
container_name = "tfstate"
key = "terraform.tfstate"
}
}
provider "azurerm" {
features {}
}
provider "databricks" {
azure_workspace_resource_id = "/subscriptions/mysubscriptionid/resourceGroups/myresourcegroup/providers/Microsoft.Databricks/workspaces/mydatabricksworkspace"
}
resource "azurerm_databricks_workspace" "adbtf" {
customer_managed_key_enabled = false
infrastructure_encryption_enabled = false
load_balancer_backend_address_pool_id = null
location = "westeurope"
managed_resource_group_name = "databricks-rg-myresourcegroup-abcdefg12345"
managed_services_cmk_key_vault_key_id = null
name = "mydatabricksworkspace"
network_security_group_rules_required = null
public_network_access_enabled = null
resource_group_name = "myresourcegroup"
sku = "premium"
custom_parameters {
machine_learning_workspace_id = null
nat_gateway_name = "nat-gateway"
no_public_ip = false
private_subnet_name = null
private_subnet_network_security_group_association_id = null
public_ip_name = "nat-gw-public-ip"
public_subnet_name = null
public_subnet_network_security_group_association_id = null
storage_account_name = "dbstorageabcde1234"
storage_account_sku_name = "Standard_GRS"
virtual_network_id = null
vnet_address_prefix = "10.139"
}
tags = {
creator = "me"
}
lifecycle {
ignore_changes = [
public_network_access_enabled
]
}
}
data "databricks_current_user" "me" {}
resource "databricks_sql_permissions" "database_test" {
database = "test"
privilege_assignments {
principal = "myuser#mydomain.com"
privileges = ["USAGE"]
}
}
resource "databricks_sql_permissions" "table_test_student" {
database = "test"
table = "student"
privilege_assignments {
principal = "myuser#mydomain.com"
privileges = ["SELECT", "MODIFY"]
}
}
output "adb_id" {
value = azurerm_databricks_workspace.adbtf.id
}
NOTE: Serge Smertin (Terraform Databricks maintainer) mentioned in GitHub issues: [DOC] databricks_sql_permissions Resource to be deprecated ? #1215 that the databricks_sql_permissions resource is deprecated but I could not find any indication about that in the docs, only a recommendation to use another resource when leveraging Unity Catalog which I'm not doing.

Related

Allocate AWS SSO Permission Set to Groups in Accounts

Working to fully code the aws sso set up
So far coded via Terraform I have all permission-sets and using scim to pull in groups.
Allocation of the permission sets to groups in accounts (I have over 100 accounts) is done by hand. I want to allocate permission sets to groups in selected accounts via IaC (Terraform) but I cant for the life of me find working code.
Ive tried using
aws_sso_permission_set_group_assignment,
aws_sso_permission_set_group_attachment,
aws_sso_group_permission_set_assignment,
aws_sso_group_permission_set_attachment,
aws_sso_permission_set_attachment,
aws_sso_permission_set_assignment,
These i found in some old docs but they dont work :( giving The provider hashicorp/aws does not support resource type
Does anyone have any advice they can offer of how to remedy this or how they managed to surmount this issue
Here is example of code tried
resource "aws_sso_group_permission_set_attachment" "example" {
group_id = "93sd433ee-cd43e4b-cfww-434e-re33-707a0987eb"
permission_set_id = "arn:aws:sso:::permissionSet/ssoins-63456a11we432d8/ps-1231ded3d42fcrr2"
account_id = "8765322052550"
}
resource "aws_sso_group_permission_set_attachment" "example" {
permission_set_arn = "arn:aws:sso:::permissionSet/ssoins-63456a11we432d8/ps-1231ded3d42fcrr2"
group_name = "93sd433ee-cd43e4b-cfww-434e-re33-707a0987eb"
account_id = "8765322052550"
}
ssoadmin_account_assignment resource is something which you might be looking for, please go through all the available attributes in the resource to match your needs.
resource "aws_ssoadmin_account_assignment" "example" {
instance_arn = tolist(data.aws_ssoadmin_instances.example.arns)[0]
permission_set_arn = "arn_of_the_permission_set" # replace this with actually permission set arn
principal_id = "group_id" # replace this with groupID
principal_type = "GROUP"
target_id = "012347678910" # replace with account ID
target_type = "AWS_ACCOUNT"
}

How to define the AWS Athena s3 output location using terraform when using aws_glue_catalog_database and aws_glue_catalog_table resources

Summary
The terraform config below creates aws_glue_catalog_database and aws_glue_catalog_table resources, but does not define an s3 bucket output location which is necessary to use these resources in the context of Athena. I can add the s3 output location manually through the AWS console, but need to do it programmatically using terraform.
Detail
Minimal example terraform config which creates an aws glue database and table:
resource "aws_glue_catalog_database" "GlueDB" {
name = "gluedb"
}
resource "aws_glue_catalog_table" "GlueTable" {
name = "gluetable"
database_name = aws_glue_catalog_database.gluedb.name
table_type = "EXTERNAL_TABLE"
parameters = {
EXTERNAL = "TRUE"
}
storage_descriptor {
location = var.GLUE_SOURCE_S3_LOCATION
input_format = "org.apache.hadoop.mapred.TextInputFormat"
output_format = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"
ser_de_info {
name = "jsonserde"
serialization_library = "org.openx.data.jsonserde.JsonSerDe"
parameters = {
"serialization.format" = "1"
}
}
columns {
name = "messageId"
type = "string"
comment = ""
}
}
}
The aim is to be able to access the table either via the Athena query editor (AWS console), or using the python boto3 library (boto3.client('athena')).
However, before Athena access works in either case I need to define an output location for the query. This is easy to do in the AWS console (Amazon Athena -> Query Editor -> Manage settings -> Location of query result), but I need to do this via terraform so that the entire aws infrastructure stack can be setup in one go.
There is a terraform resource called aws_athena_workgroup which has an output_location property, but it is unclear how a separate aws_athena_workgroup resource would be related to the aws_glue_catalog_database already defined (there doesn't seem to be any way to link these two resources).
This answer suggests importing the existing primary workgroup into terraform and modifying it. But what I need is a terraform implementation which sets everything up from scratch in one go.
Any suggestions on how to wire up an s3 output-location in terraform so the above glue resources can be used in the context of Athena would be greatly appreciated!
AWS Glue and Athena are two independent services. Glue doesn't have to know the Athena query output location configuration at all. It just stores the query results run in Athena.
You can simply create a new resources for aws_athena_workgroup next to Glue resources and define the result configuration bucket.
resource "aws_athena_workgroup" "example" {
name = "example"
configuration {
enforce_workgroup_configuration = true
publish_cloudwatch_metrics_enabled = true
result_configuration {
output_location = "s3://${aws_s3_bucket.example.bucket}/output/"
encryption_configuration {
encryption_option = "SSE_KMS"
kms_key_arn = aws_kms_key.example.arn
}
}
}
}

Unable to get machine type information for machine type n1-standard-2 in zone us-central-c because of insufficient permissions - Google Cloud Dataflow

I am not sure what I am missing but somehow I am not able to start the job and gets failed with insufficient permission:
Here is terraform code I run:
resource "google_dataflow_job" "poc-pubsub-stream" {
project = local.project_id
region = local.region
zone = local.zone
name = "poc-pubsub-to-cloud-storage"
template_gcs_path = "gs://dataflow-templates-us-central1/latest/Cloud_PubSub_to_GCS_Text"
temp_gcs_location = "gs://${module.poc-bucket.bucket.name}/tmp"
enable_streaming_engine = true
on_delete = "cancel"
service_account_email = google_service_account.poc-stream-sa.email
parameters = {
inputTopic = google_pubsub_topic.poc-topic.id
outputDirectory = "gs://${module.poc-bucket.bucket.name}/"
outputFilenamePrefix = "poc-"
outputFilenameSuffix = ".txt"
}
labels = {
pipeline = "poc-stream"
}
depends_on = [
module.poc-bucket,
google_pubsub_topic.poc-topic,
]
}
My SA permission that is used in the terraform code:
Any thoughts what I am missing?
The error describes being unable to get the machine type information because of insufficient permissions. To access the machine type information, add the roles/compute.viewer role to your service account.
The roles/compute.viewer role, to access machine type information and view other settings.
Refer to this doc for more information about the required permissions to create a Dataflow job.
It seems my DataFlow job needed to provide these following options. Per docs it was optional but in my case that needed to be defined.
...
network = data.terraform_remote_state.dev.outputs.network.network_name
subnetwork = data.terraform_remote_state.dev.outputs.network.subnets["us-east4/us-east4-dev"].self_link
...

Vault - Assign multiple aliases for one identity group

I've been trying to assign multiple group aliases, meaning, multiple AD groups in our company, into one identity group. So far we've had an identity group for each alias, and we realized that doesn't make sense, as they all carry the same policies.
We are using Terraform in order to maintain and provision our infrastructure.
This is my expected form:
resource "vault_identity_group" "saas-mfi" {
metadata = {
productname = "mfi"
}
name = "saas-mfi"
policies = [
"eaas-key",
"secret-store-mfi"
]
type = "external"
}
resource "vault_identity_group_alias" "alias_1" {
canonical_id = vault_identity_group.saas-mfi.id
mount_accessor = var.org_local_mount_accessor
name = "alias_1"
}
resource "vault_identity_group_alias" "alias_2" {
canonical_id = vault_identity_group.saas-mfi.id
mount_accessor = var.org_local_mount_accessor
name = "alias_2"
}
resource "vault_identity_group_alias" "alias_3" {
canonical_id = vault_identity_group.saas-mfi.id
mount_accessor = var.org_local_mount_accessor
name = "alias_3"
}
When I try to apply this configuration, I get the following error:
Error: Provider produced inconsistent result after apply
Of course, the issue does not stand with the provider. But it seems like one identity group can't have more than one alias to itself. Which is weird, as in the UI, there is a tab for identity groups called "Aliases", in plural.
If anybody has any information regarding this matter, I would really appreciate that.
I was trying to do the same thing but just came across the following paragraph in the documentation for identity:
External group serves as a mapping to a group that is outside of the identity store. External groups can have one (and only one) alias. This alias should map to a notion of group that is outside of the identity store.
From the section on External vs Internal Groups.

Terraform provider for Mongodb Atlas: where is my db?

I have successfully created a project, user and cluster via the Mongodb terraform provider, however I am expecting to see a database already created under my new cluster, which is not to be found. I am not sure what it is missing or incorrect and I could not find any example/info in the documentation that diverges from what I implemented myself. Here are the relevant info from my main.tf file:
# Create a db user
resource "mongodbatlas_database_user" "mongodb_user" {
username = "${var.database_username}"
password = "${random_string.master_password.result}"
project_id = "${mongodbatlas_project.mongodb.id}"
database_name = "admin"
roles {
role_name = "readWrite"
database_name = "admin"
}
}
group
resource "mongodbatlas_project" "mongodb"{
org_id = "${var.mongodb_atlas_org_id}"
name = "${var.project_name}-${var.stage}"
}
cluster
# Create a cluster
resource "mongodbatlas_cluster" "mongodb-cluster" {
project_id = "${mongodbatlas_project.mongodb.id}"
name = "${var.cluster_name}-${var.stage}"
num_shards = 1
replication_factor = 3
backup_enabled = true
auto_scaling_disk_gb_enabled = true
mongo_db_major_version = "4.0"
//Provider Settings "block"
provider_name = "AWS"
disk_size_gb = 100
provider_disk_iops = 300
provider_encrypt_ebs_volume = false
provider_instance_size_name = "M40"
provider_region_name = "us-east-1"
}
Any help/advice is greatly appreciated.
Thank you
The database creation is a CRUD operation, and the MongoDB Atlas API does not supports CRUD operation.
Also, Terraform is used to deploy your infrastructure and not the data inside it. You can create your own REST API which connects to the cluster created by Terraform, uses the user created by Terraform to connect, and then perform any CRUD operation you want.
Hope this answers your question.
Database and collection creation in MongoDb Atlas is a developer jobs.
I suggest you define the User's permissions on Terraform (mongodbatlas_custom_db_role).
So you can to restrict the acces, database name and collection name. it's a good approach. 😉
https://registry.terraform.io/providers/mongodb/mongodbatlas/latest/docs/resources/custom_db_role

Resources