terraform: database does not exist

I am fairly new to Terraform and am trying to create a database and schema in Snowflake using Terraform.
To start with, I have written a script that creates a database and a schema.
However, when I run the code below, the first run creates the database and then throws an error on the schema: Database SNOWACC does not exist.
When I run it again, it creates the schema successfully.
It seems like Terraform is not picking up the latest database status within the same script.
terraform {
  required_providers {
    snowflake = {
      source  = "chanzuckerberg/snowflake"
      version = "0.25.36"
    }
  }
}

provider "snowflake" {
  alias = "sys_admin"
  role  = "DBA"
}

resource "snowflake_database" "db" {
  provider = snowflake.sys_admin
  name     = "SNOWACC"
  comment  = "Database created for snowflake monitoring through terraform"
}

resource "snowflake_schema" "schema" {
  provider = snowflake.sys_admin
  database = "SNOWACC"
  name     = "MONITOR"
  comment  = "Schema for snowflake monitoring created by TF"
}
I have a feeling I will have to run the DB and schema scripts separately, and then all other objects (table, view, task, etc.) separately as well.
I am hoping there is a better way.
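A hedged note: the usual way to avoid this ordering problem is to reference the database resource instead of hard-coding its name, so Terraform sees that the schema depends on the database and creates them in the right order in a single run. A minimal sketch of that change, reusing the resource names above (not verified against this exact provider version):

# Sketch: referencing the database resource creates an implicit dependency,
# so the schema is only created after the database exists.
resource "snowflake_schema" "schema" {
  provider = snowflake.sys_admin
  database = snowflake_database.db.name
  name     = "MONITOR"
  comment  = "Schema for snowflake monitoring created by TF"
}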

Related

How to define the AWS Athena s3 output location using terraform when using aws_glue_catalog_database and aws_glue_catalog_table resources

Summary
The terraform config below creates aws_glue_catalog_database and aws_glue_catalog_table resources, but does not define an s3 bucket output location which is necessary to use these resources in the context of Athena. I can add the s3 output location manually through the AWS console, but need to do it programmatically using terraform.
Detail
Minimal example terraform config which creates an aws glue database and table:
resource "aws_glue_catalog_database" "GlueDB" {
name = "gluedb"
}
resource "aws_glue_catalog_table" "GlueTable" {
name = "gluetable"
database_name = aws_glue_catalog_database.gluedb.name
table_type = "EXTERNAL_TABLE"
parameters = {
EXTERNAL = "TRUE"
}
storage_descriptor {
location = var.GLUE_SOURCE_S3_LOCATION
input_format = "org.apache.hadoop.mapred.TextInputFormat"
output_format = "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"
ser_de_info {
name = "jsonserde"
serialization_library = "org.openx.data.jsonserde.JsonSerDe"
parameters = {
"serialization.format" = "1"
}
}
columns {
name = "messageId"
type = "string"
comment = ""
}
}
}
The aim is to be able to access the table either via the Athena query editor (AWS console) or using the Python boto3 library (boto3.client('athena')).
However, before Athena access works in either case, I need to define an output location for the query. This is easy to do in the AWS console (Amazon Athena -> Query Editor -> Manage settings -> Location of query result), but I need to do this via Terraform so that the entire AWS infrastructure stack can be set up in one go.
There is a Terraform resource called aws_athena_workgroup which has an output_location property, but it is unclear how a separate aws_athena_workgroup resource would relate to the aws_glue_catalog_database already defined (there doesn't seem to be any way to link these two resources).
This answer suggests importing the existing primary workgroup into Terraform and modifying it. But what I need is a Terraform implementation which sets everything up from scratch in one go.
Any suggestions on how to wire up an S3 output location in Terraform so the above Glue resources can be used in the context of Athena would be greatly appreciated!
AWS Glue and Athena are two independent services. Glue doesn't need to know anything about the Athena query output location; the output location is just the S3 path where Athena stores the results of the queries it runs.
You can simply create an aws_athena_workgroup resource alongside the Glue resources and define the result configuration bucket there.
resource "aws_athena_workgroup" "example" {
name = "example"
configuration {
enforce_workgroup_configuration = true
publish_cloudwatch_metrics_enabled = true
result_configuration {
output_location = "s3://${aws_s3_bucket.example.bucket}/output/"
encryption_configuration {
encryption_option = "SSE_KMS"
kms_key_arn = aws_kms_key.example.arn
}
}
}
}
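The workgroup above references an aws_s3_bucket.example and an aws_kms_key.example that are not shown. A minimal sketch of those supporting resources (the bucket name and key settings are just placeholders):

# Bucket that will hold the Athena query results (name is a placeholder).
resource "aws_s3_bucket" "example" {
  bucket = "my-athena-query-results-bucket"
}

# KMS key used to encrypt the query results (only needed for SSE_KMS).
resource "aws_kms_key" "example" {
  description = "Key for Athena query result encryption"
}

Note that queries must then run in that workgroup for the output location to apply, for example by passing WorkGroup="example" to start_query_execution in boto3 or by selecting the workgroup in the query editor.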

Automating Permissions for Databricks SQL Tables or Views

Trying to automate the setup of Databricks SQL.
I have done it from the UI and it works, so this is a natural next step.
The one thing I am unsure about is how to automate granting access to SQL tables and/or views using REST. I am trying to avoid a notebook job.
I have seen the Microsoft documentation and downloaded the specification, but when I opened it with Postman I only see permissions/objectType/object id, and the only sample there is for "queries". It seems to be applicable only to queries and dashboards. Can't this be done for tables and views? There is no further documentation that I could see.
So, basically, how can I do something like
grant select on tablename to group using the REST API, without a notebook job? I am interested to see if I can just call a REST endpoint from our release pipeline (Azure DevOps).
As of right now, there is no REST API for setting table ACLs, but one is available as part of Unity Catalog, which is currently in public preview.
If you can't use Unity Catalog yet, you can still automate the assignment of table ACLs by using the databricks_sql_permissions resource of the Databricks Terraform provider - it sets permissions by executing SQL commands on a cluster, but this is hidden from the administrator.
This is an extension of Alex Ott's answer, giving some details on what I tried in order to make the databricks_sql_permissions resource work for Databricks SQL, which was the OP's original question. All this assumes that one does not want to / cannot use Unity Catalog, which follows a different permission model and has a different Terraform resource, namely databricks_grants.
Alex's answer refers to table ACLs, which surprised me, as the OP (and I) were looking for Databricks SQL object security, not table ACLs in the classic workspace. But from what I understand so far, the two are closely interlinked: the Terraform provider manages table ACLs in the classic (non-SQL) workspace, and these are mirrored to SQL objects in the SQL workspace. It follows that if you want to manage SQL permissions in Databricks SQL via Terraform, you need to enable table ACLs in the classic workspace (in the admin console). If you cannot enable table ACLs for whatever reason, it seems to me the only other option is SQL scripts in the SQL workspace, with the disadvantage of having to explicitly write out grants and revokes. A possible alternative is to throw away all permissions before running only grant statements, but this has other negative implications.
So here is my approach:
1. Enable table ACLs in the classic workspace (as far as I know this has no implications in the classic workspace if you don't use table-ACL-enabled clusters).
2. Use the azurerm_databricks_workspace resource to register the Databricks Azure infrastructure.
3. Use the databricks_sql_permissions resource to manage table ACLs and thus SQL object security.
Below is a minimal example that worked for me and may inspire others. It certainly does not follow Terraform config guidance but is merely meant as a minimal illustration.
NOTE: Due to a Terraform issue I had to ignore changes to the attribute public_network_access_enabled; see the GitHub issue: "azurerm_databricks_workspace" forces replacement on public_network_access_enabled while it never existed #15222
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.0.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "=1.4.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "tfstate"
    storage_account_name = "tfsa"
    container_name       = "tfstate"
    key                  = "terraform.tfstate"
  }
}

provider "azurerm" {
  features {}
}

provider "databricks" {
  azure_workspace_resource_id = "/subscriptions/mysubscriptionid/resourceGroups/myresourcegroup/providers/Microsoft.Databricks/workspaces/mydatabricksworkspace"
}

resource "azurerm_databricks_workspace" "adbtf" {
  customer_managed_key_enabled          = false
  infrastructure_encryption_enabled     = false
  load_balancer_backend_address_pool_id = null
  location                              = "westeurope"
  managed_resource_group_name           = "databricks-rg-myresourcegroup-abcdefg12345"
  managed_services_cmk_key_vault_key_id = null
  name                                  = "mydatabricksworkspace"
  network_security_group_rules_required = null
  public_network_access_enabled         = null
  resource_group_name                   = "myresourcegroup"
  sku                                   = "premium"

  custom_parameters {
    machine_learning_workspace_id                        = null
    nat_gateway_name                                     = "nat-gateway"
    no_public_ip                                         = false
    private_subnet_name                                  = null
    private_subnet_network_security_group_association_id = null
    public_ip_name                                       = "nat-gw-public-ip"
    public_subnet_name                                   = null
    public_subnet_network_security_group_association_id  = null
    storage_account_name                                 = "dbstorageabcde1234"
    storage_account_sku_name                             = "Standard_GRS"
    virtual_network_id                                   = null
    vnet_address_prefix                                  = "10.139"
  }

  tags = {
    creator = "me"
  }

  lifecycle {
    ignore_changes = [
      public_network_access_enabled
    ]
  }
}

data "databricks_current_user" "me" {}

resource "databricks_sql_permissions" "database_test" {
  database = "test"

  privilege_assignments {
    principal  = "myuser@mydomain.com"
    privileges = ["USAGE"]
  }
}

resource "databricks_sql_permissions" "table_test_student" {
  database = "test"
  table    = "student"

  privilege_assignments {
    principal  = "myuser@mydomain.com"
    privileges = ["SELECT", "MODIFY"]
  }
}

output "adb_id" {
  value = azurerm_databricks_workspace.adbtf.id
}
NOTE: Serge Smertin (Terraform Databricks maintainer) mentioned in the GitHub issue "[DOC] databricks_sql_permissions Resource to be deprecated? #1215" that the databricks_sql_permissions resource is deprecated, but I could not find any indication of that in the docs, only a recommendation to use another resource when leveraging Unity Catalog, which I'm not doing.
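Since the OP's goal was a statement like grant select on tablename to group, it is worth noting that the principal in databricks_sql_permissions can also be a workspace group rather than a single user. A hedged sketch, where data_analysts is just a placeholder group name:

# Sketch: granting SELECT on a table to a workspace group instead of a user.
# "data_analysts" is a hypothetical group name.
resource "databricks_sql_permissions" "table_test_student_analysts" {
  database = "test"
  table    = "student"

  privilege_assignments {
    principal  = "data_analysts"
    privileges = ["SELECT"]
  }
}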

Create new resources on multiple accounts on AWS?

I am building a small tool using Terraform to generate sandboxes on AWS. I will be the owner of every new sandbox, and each user will be added as an IAM user with appropriate rights.
The input file looks like this: users.auto.tfvars
sandboxs_manager = "pierre-alexandre.mousset"
dev_team_members = [
  { name = "brian.davids", is_enabled = true },
  { name = "tom.hanks", is_enabled = true }
]
This is going to create two different AWS accounts:
pierre-alexandre.mousset+brian.davids@company.com
pierre-alexandre.mousset+tom.hanks@company.com
What I am now trying to achieve, is to create a simple S3 bucket on every new aws_organizations_account generated by Terraform.
This is how I am generating AWS accounts on my Terraform:
resource "aws_organizations_account" "this" {
for_each = local.all_user_names
name = "Dev Sandbox ${each.value}"
email = "${var.manager}+sbx_${each.value}#company.com"
role_name = "Administrator"
parent_id = var.sandbox_organizational_unit_id
}
Is there a way to loop over every id generated by aws_organizations_account to create an S3 bucket in each of those newly created accounts?
Based on this GitHub issue, I would need a multi-provider setup, which is not yet supported by Terraform, and it would probably look like this before generating my S3 buckets:
provider "aws" {
for_each = local.aws_accounts
alias = each.value.aws_account_id
assume_role {
role_arn = "arn:aws:iam::${aws_organizations_account.this.id}:role/TerraformAccessRole"
}
}
Is there any way to deal with this?
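For what it's worth, the usual workaround (since for_each is not allowed on provider blocks) is to declare one statically aliased provider per target account, each assuming a role in that account, and pass the alias to the resources or modules that should run there. A rough sketch under that assumption; the account ID, role name, and bucket name below are placeholders:

# One provider alias per sandbox account; provider blocks cannot use for_each,
# so each account gets its own static alias.
provider "aws" {
  alias = "sandbox_brian"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/Administrator" # placeholder account ID
  }
}

# Bucket created inside that sandbox account via the aliased provider.
resource "aws_s3_bucket" "sandbox_brian_bucket" {
  provider = aws.sandbox_brian
  bucket   = "sandbox-brian-davids-bucket" # placeholder name
}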

Terraform: Unable to fetch connection strings when using 'data' resource for existing cluster

I'm new to Terraform and am trying to set up a MongoDB Atlas PrivateLink for an existing cluster. Terraform is able to set up the PrivateLink, but when I try to fetch the PrivateLink connection string, it gives the following error:
data.mongodbatlas_cluster.cluster-atlas.connection_strings[0].aws_private_link is empty map of string
I'm using a Terraform data source to fetch the details of the cluster. I have read on some forums that data sources are read before resources are created, so I'm using depends_on on the data source, but I'm still facing the issue.
My cluster code:
data "mongodbatlas_cluster" "cluster-atlas" {
project_id = var.atlasprojectid
name = "mongoatlas-cluster-xxxxxxx"
depends_on = [time_sleep.wait_300_seconds]
}
output "atlasclusterstring" {
value = data.mongodbatlas_cluster.cluster-atlas.connection_strings
}
output "plstring" {
value = lookup(data.mongodbatlas_cluster.cluster-
atlas.connection_strings[0].aws_private_link, aws_vpc_endpoint.ptfe_service.id)
}

Terraform provider for MongoDB Atlas: where is my db?

I have successfully created a project, user, and cluster via the MongoDB Atlas Terraform provider; however, I expected to see a database already created under my new cluster, and it is nowhere to be found. I am not sure what is missing or incorrect, and I could not find any example/info in the documentation that diverges from what I implemented myself. Here is the relevant info from my main.tf file:
# Create a db user
resource "mongodbatlas_database_user" "mongodb_user" {
  username      = "${var.database_username}"
  password      = "${random_string.master_password.result}"
  project_id    = "${mongodbatlas_project.mongodb.id}"
  database_name = "admin"

  roles {
    role_name     = "readWrite"
    database_name = "admin"
  }
}
# Create a project (group)
resource "mongodbatlas_project" "mongodb" {
  org_id = "${var.mongodb_atlas_org_id}"
  name   = "${var.project_name}-${var.stage}"
}
# Create a cluster
resource "mongodbatlas_cluster" "mongodb-cluster" {
  project_id                   = "${mongodbatlas_project.mongodb.id}"
  name                         = "${var.cluster_name}-${var.stage}"
  num_shards                   = 1
  replication_factor           = 3
  backup_enabled               = true
  auto_scaling_disk_gb_enabled = true
  mongo_db_major_version       = "4.0"

  # Provider settings "block"
  provider_name               = "AWS"
  disk_size_gb                = 100
  provider_disk_iops          = 300
  provider_encrypt_ebs_volume = false
  provider_instance_size_name = "M40"
  provider_region_name        = "us-east-1"
}
Any help/advice is greatly appreciated.
Thank you
Database creation is a CRUD operation, and the MongoDB Atlas API does not support CRUD operations.
Also, Terraform is used to deploy your infrastructure, not the data inside it. You can create your own REST API which connects to the cluster created by Terraform, uses the user created by Terraform to authenticate, and then performs any CRUD operation you want.
Hope this answers your question.
Database and collection creation in MongoDB Atlas is a developer's job.
I suggest you define the user's permissions in Terraform (mongodbatlas_custom_db_role).
That way you can restrict the access, database name, and collection name; it's a good approach. 😉
https://registry.terraform.io/providers/mongodb/mongodbatlas/latest/docs/resources/custom_db_role
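A minimal sketch of what such a custom role might look like; the role, database, and collection names are placeholders, so check the linked docs for the exact schema of your provider version:

# Sketch: a custom role restricted to one database and collection.
resource "mongodbatlas_custom_db_role" "app_role" {
  project_id = mongodbatlas_project.mongodb.id
  role_name  = "appReadWrite" # placeholder role name

  actions {
    action = "FIND"
    resources {
      database_name   = "mydb"         # placeholder database
      collection_name = "mycollection" # placeholder collection
    }
  }

  actions {
    action = "INSERT"
    resources {
      database_name   = "mydb"
      collection_name = "mycollection"
    }
  }
}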
