How to create Azure Databricks Notebook via Terraform?

So I am completely new to Terraform, and I found that by using this in my Terraform main.tf I can create the Azure Databricks infrastructure:
resource "azurerm_databricks_workspace" "bdcc" {
depends_on = [
azurerm_resource_group.bdcc
]
name = "dbw-${var.ENV}-${var.LOCATION}"
resource_group_name = azurerm_resource_group.bdcc.name
location = azurerm_resource_group.bdcc.location
sku = "standard"
tags = {
region = var.BDCC_REGION
env = var.ENV
}
}
And I also found here that by using this I can even create a particular notebook in this Azure Databricks infrastructure:
resource "databricks_notebook" "notebook" {
content_base64 = base64encode(<<-EOT
# created from ${abspath(path.module)}
display(spark.range(10))
EOT
)
path = "/Shared/Demo"
language = "PYTHON"
}
But since I am new to this, I am not sure in what order I should put those pieces of code together.
It would be nice if someone could point me to a full example of how to create a notebook via Terraform on Azure Databricks.
Thanks in advance!

In general you can put these objects in any order - it's Terraform's job to detect dependencies between the objects and create/update them in the correct order. For example, you don't need depends_on in the azurerm_databricks_workspace resource, because Terraform will figure out that the resource group is needed before the workspace can be created, so workspace creation will follow the creation of the resource group. Terraform also tries to make changes in parallel where possible.
But because of this, things become slightly more complex when you have the workspace resource together with workspace objects, like notebooks, clusters, etc. As there is no explicit dependency, Terraform will try to create the notebook in parallel with the creation of the workspace, and it will fail because the workspace doesn't exist yet - usually you will get a message about an authentication error.
The solution for that is to have an explicit dependency between the notebook and the workspace, plus you need to configure the authentication of the Databricks provider to point to the newly created workspace (there are differences between user and service principal authentication - you can find more information in the docs). In the end your code would look like this:
resource "azurerm_databricks_workspace" "bdcc" {
name = "dbw-${var.ENV}-${var.LOCATION}"
resource_group_name = azurerm_resource_group.bdcc.name
location = azurerm_resource_group.bdcc.location
sku = "standard"
tags = {
region = var.BDCC_REGION
env = var.ENV
}
}
provider "databricks" {
host = azurerm_databricks_workspace.bdcc.workspace_url
}
resource "databricks_notebook" "notebook" {
depends_on = [azurerm_databricks_workspace.bdcc]
...
}
Unfortunately, there is no way to put depends_on at the provider level, so you will need to put it into every Databricks resource that is created together with the workspace. Usually the best practice is to have a separate module for workspace creation and a separate module for objects inside the Databricks workspace.
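For illustration, a complete notebook resource with that explicit dependency could look like the following minimal sketch, reusing the example path and content from the question:

# Minimal sketch: same notebook content and path as in the question;
# depends_on forces creation to wait until the workspace exists.
resource "databricks_notebook" "notebook" {
  depends_on = [azurerm_databricks_workspace.bdcc]

  path     = "/Shared/Demo"
  language = "PYTHON"
  content_base64 = base64encode(<<-EOT
    # created from ${abspath(path.module)}
    display(spark.range(10))
    EOT
  )
}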
P.S. I would recommend reading a book or some documentation on Terraform. For example, Terraform: Up & Running is a very good intro.

Related

How can I reuse the terraform output values in another deployment

I have Terraform code that deploys Azure resources and outputs a bunch of values I need to use for further configuration.
I have initialized terraform backend and the state is saved to an Azure Storage account, I see the tfstate file with all the correct values.
FYI, I have added this configuration but still no luck; I am also running terraform init in the second location so the backend is initialized with the same state:
backend "azurerm" {
storage_account_name = "${var.STATE_STORAGE_ACCOUNT_NAME}"
container_name = "${var.STATE_CONTAINER_NAME}"
key = "${var.STATE_STORAGE_ACCOUNT_KEY}"
}
What I want to be able to do is pull this state in some way so I can do terraform output -raw some_output in a different location than where I deployed the resources.
I can't seem to find a way to do this. How could this be achieved? Thanks
It really depends on your use case. You can take two different approaches:
Import the resource with a "data" source
Data sources allow Terraform to use information defined outside of Terraform, defined by another separate Terraform configuration, or modified by functions.
Terraform docs
For AWS it would be something like this:
// Create an SSM parameter in Terraform code A
resource "aws_ssm_parameter" "secret" {
  name  = "/secret"
  type  = "String"
  value = "SecretValue"
}

// Import this resource in Terraform code B
data "aws_ssm_parameter" "imported_secret" {
  name = "/secret"
}

// So later you can reference it
locals {
  secretValue = data.aws_ssm_parameter.imported_secret.value
}
Create modules
Modules are containers for multiple resources that are used together. A module consists of a collection of .tf and/or .tf.json files kept together in a directory. Modules are the main way to package and reuse resource configurations with Terraform.
Terraform docs
This is a basic example of Terraform modules. We created a module vpc whose source is the ../../modules/vpc directory, and we referenced this module via module.vpc in the rds module.
module "vpc" {
source = "../../modules/vpc"
env = var.env
azs = var.azs
cidr = var.cidr
db_subnets = var.db_subnets
private_subnets = var.private_subnets
public_subnets = var.public_subnets
}
module "rds" {
source = "../../modules/rds"
db_subnets_cidr_blocks = module.vpc.db_subnets_cidr_block
private_subnets_cidr_blocks = module.vpc.private_subnets_cidr_block
public_subnets_cidr_blocks = module.vpc.public_subnets_cidr_block
vpc_id = module.vpc.vpc_id
env = var.env
db_subnets_ids = module.vpc.db_subnets
}
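For the module.vpc.* references above to resolve, the vpc module has to declare them as outputs. A minimal sketch, where the resource names inside the module are hypothetical:

# Hypothetical outputs inside ../../modules/vpc; only the output names matter
# for module.vpc.* references, the resource names are placeholders.
output "vpc_id" {
  value = aws_vpc.this.id
}

output "db_subnets_cidr_block" {
  value = aws_subnet.db[*].cidr_block
}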
I didn't find a straightforward way to do this, so the solution was, since the state file was being saved to Azure Blob Storage:
Run an Azure CLI command to get the blob locally:
az storage blob download --container-name tstate --file $tf_state_file_name --name $tf_state_file_name --account-key $tf_state_key --account-name $tf_state_storage_account_name
Where the local file name is: $tf_state_file_name
Read the desired values using JQ:
jq '.outputs.storage_account_name.value' ./$tf_state_file_name -r
You can read the values raw thanks to the -r parameter. This is the same as doing:
terraform output -raw storage_account_name
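Alternatively, if the consumer is itself a Terraform configuration, the terraform_remote_state data source can read those outputs straight from the azurerm backend. A minimal sketch, assuming the same container as in the az command above; the storage account name and blob key are placeholders:

# Minimal sketch: reads the outputs of the first deployment's state file.
data "terraform_remote_state" "infra" {
  backend = "azurerm"

  config = {
    storage_account_name = "mystateaccount"     # placeholder account name
    container_name       = "tstate"             # container used in the az command above
    key                  = "terraform.tfstate"  # placeholder blob name of the state file
    # authentication comes from e.g. an access_key argument or the usual ARM_* environment variables
  }
}

# The other deployment's outputs are then available as attributes:
output "storage_account_name" {
  value = data.terraform_remote_state.infra.outputs.storage_account_name
}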

Using databricks workspace in the same configuration as the databricks provider

I'm having some trouble getting the azurerm & databricks providers to work together.
With the azurerm provider, I set up my workspace:
resource "azurerm_databricks_workspace" "ws" {
name = var.workspace_name
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
sku = "premium"
managed_resource_group_name = "${azurerm_resource_group.rg.name}-mng-rg"
custom_parameters {
virtual_network_id = data.azurerm_virtual_network.vnet.id
public_subnet_name = var.public_subnet
private_subnet_name = var.private_subnet
}
}
No matter how I structure this, I can't seem to get azurerm_databricks_workspace.ws.id to work in the provider statement for databricks in the same configuration. If it did work, the above workspace would be defined in the same configuration and I'd have a provider statement that looks like this:
provider "databricks" {
azure_workspace_resource_id = azurerm_databricks_workspace.ws.id
}
Error:
I have my ARM_* environment variables set to identify as a Service Principal with Contributor on the subscription.
I've tried in the same configuration & in a module and consuming outputs. The only way I can get it to work is by running one configuration for the workspace and a second configuration to consume the workspace.
This is super suboptimal in that I have a fair amount of repeating values across those configurations and it would be ideal just to have one.
Has anyone been able to do this?
Thank you :)
I've had the exact same issue with a non-working databricks provider because I was working with modules. I separated the Databricks infrastructure (Azure) from the Databricks application (databricks provider).
In my Databricks module I added the following code at the top; otherwise it would use my Azure setup:
terraform {
  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.3.1"
    }
  }
}
In my normal provider setup I have the following settings for databricks:
provider "databricks" {
azure_workspace_resource_id = module.databricks_infra.databricks_workspace_id
azure_client_id = var.ARM_CLIENT_ID
azure_client_secret = var.ARM_CLIENT_SECRET
azure_tenant_id = var.ARM_TENANT_ID
}
And of course I have the azure one. Let me know if it worked :)
If you experience technical difficulties with rolling out resources in this example, please make sure that environment variables don't conflict with other provider block attributes. When in doubt, run TF_LOG=DEBUG terraform apply to enable debug mode through the TF_LOG environment variable, and look specifically for Explicit and implicit attributes lines, which should indicate the authentication attributes used. The other common reason for technical difficulties is a missing alias attribute in provider "databricks" {} blocks or a missing provider attribute in resource "databricks_..." {} blocks. Please make sure to read the alias: Multiple Provider Configurations documentation article.
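For illustration, a minimal sketch of that alias pattern, reusing the azurerm_databricks_workspace.ws resource from the question (the notebook is just a placeholder resource):

# Minimal sketch of an aliased databricks provider pointing at the newly
# created workspace; resources opt in to it via the provider attribute.
provider "databricks" {
  alias                       = "created_workspace"
  azure_workspace_resource_id = azurerm_databricks_workspace.ws.id
}

resource "databricks_notebook" "example" {
  provider       = databricks.created_workspace
  path           = "/Shared/Example" # placeholder path
  language       = "PYTHON"
  content_base64 = base64encode("display(spark.range(10))")
}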
From the error message, it looks like authentication is not configured for the provider. Could you please configure it through one of the options mentioned above?
For more details, refer to Databricks provider - Authentication.
For passing the custom_parameters, you may check out the SO thread addressing a similar issue.
If you need more help on this issue, I would suggest opening an issue here: https://github.com/terraform-providers/terraform-provider-azurerm/issues

Terraform vsphere_tag unwanted deletion

I'm using Terraform to deploy some dev and prod VMs on our VMware vCenter infrastructure and use vSphere tags to define the responsibilities of VMs. I therefore added the following to the (sub)module:
resource "vsphere_tag" "tag" {
name = "SYS-Team"
category_id = "Responsibility"
description = "Systems group"
}
...
resource "vsphere_virtual_machine" "web" {
tags = [vsphere_tag.tag.id]
...
}
Now, when I destroy e.g. the dev infra, it also deletes the prod vSphere tag and leaves the VMs without the tag.
I tried to skip the deletion with a lifecycle block, but then I would need to delete each resource separately, which I don't like:
lifecycle {
  prevent_destroy = true
}
Is there a way to add an existing tag without having the resource managed by Terraform? Something hardcoded without having the tag included as a resource like:
resource "vsphere_virtual_machine" "web" {
tags = [{
name = "SYS-Team"
category_id = "Responsibility"
description = "Systems group"
}
]
...
}
You can use Terraform's data sources to refer to things that are either not managed by Terraform or are managed in a different Terraform context when you need to retrieve some output from the resource such as an automatically generated ID.
In this case you could use the vsphere_tag data source to look up the id of the tag:
data "vsphere_tag_category" "responsibility" {
name = "Responsibility"
}
data "vsphere_tag" "sys_team" {
name = "SYS-Team"
category_id = data.vsphere_tag_category.responsibility.id
}
...
resource "vsphere_virtual_machine" "web" {
tags = [data.vsphere_tag.sys_team.id]
...
}
This will use a vSphere tag that has either been created externally or managed by Terraform in another place, allowing you to easily run terraform destroy to destroy the VM but keep the tag.
If I understood correctly, the real problem is this:
when I destroy e.g. the dev infra, it also deletes the prod vsphere tag
That should not be happening! Any cross-environment deletion is a red flag.
It feels like the problem is in your pipeline, not your Terraform code. In the deployment pipelines I create, the dev resources are not mixed with prod; if they are mixed, you are just asking for trouble, and your team should make redesigning that a high priority.
You do ask:
Is there a way to add an existing tag without having the resource managed by Terraform?
Yes, you can use PowerCLI for that:
https://blogs.vmware.com/PowerCLI/2014/03/using-vsphere-tags-powercli.html
The command to add tags is really simple:
New-Tag -Name "jsmith" -Category "Owner"
You could even integrate that into Terraform code with a null_resource, something like:
resource "null_resource" "create_tag" {
provisioner "local-exec" {
when = "create"
command = "New-Tag –Name “jsmith” –Category “Owner”"
interpreter = ["PowerShell"]
}
}

How to share Terraform variables across workspaces/modules?

Terraform Cloud Workspaces allow me to define variables, but I'm unable to find a way to share variables across more than one workspace.
In my example I have, let's say, two workspaces:
Database
Application
In both cases I'll be using the same AzureRM credentials for connectivity. The following are common values used by the workspaces to connect to my Azure subscription:
provider "azurerm" {
subscription_id = "00000000-0000-0000-0000-000000000000"
client_id = "00000000-0000-0000-0000-000000000000"
client_secret = "00000000000000000000000000000000"
tenant_id = "00000000-0000-0000-0000-000000000000"
}
It wouldn't make sense to duplicate values (in my case I'll have probably 10 workspaces).
Is there a way to do this?
Or the correct approach is to define "database" and "application" as a Module, and then use Workspaces (DEV, QA, PROD) to orchestrate them?
In Terraform Cloud, the Workspace object is currently the least granular location where you can specify variable values directly. There is no built in mechanism to share variable values between workspaces.
However, one way to approach this would be to manage Terraform Cloud with Terraform itself. The tfe provider (named after Terraform Enterprise for historical reasons, since it was built before Terraform Cloud launched) will allow Terraform to manage Terraform Cloud workspaces and their associated variables.
variable "workspaces" {
type = set(string)
}
variable "common_environment_variables" {
type = map(string)
}
provider "tfe" {
hostname = "app.terraform.io" # Terraform Cloud
}
resource "tfe_workspace" "example" {
for_each = var.workspaces
organization = "your-organization-name"
name = each.key
}
resource "tfe_variable" "example" {
# We'll need one tfe_variable instance for each
# combination of workspace and environment variable,
# so this one has a more complicated for_each expression.
for_each = {
for pair in setproduct(var.workspaces, keys(var.common_environment_variables)) : "${pair[0]}/${pair[1]}" => {
workspace_name = pair[0]
workspace_id = tfe_workspace.example[pair[0]].id
name = pair[1]
value = var.common_environment_variables[pair[1]]
}
}
workspace_id = each.value.workspace_id
category = "env"
key = each.value.name
value = each.value.value
sensitive = true
}
With the above configuration, you can set var.workspaces to contain the names of the workspaces you want Terraform to manage and var.common_environment_variables to the environment variables you want to set for all of them.
Note that for setting credentials on a provider the recommended approach is to set them in environment variables rather than Terraform variables, because that then makes the Terraform configuration itself agnostic to how those credentials are obtained. You could potentially apply the same Terraform configuration locally (outside of Terraform Cloud) using the integration with Azure CLI auth, while the Terraform Cloud execution environment would often use a service principal.
Therefore, to provide the credentials in the Terraform Cloud environment you'd put the following environment variables in var.common_environment_variables (see the sketch after this list):
ARM_CLIENT_ID
ARM_TENANT_ID
ARM_SUBSCRIPTION_ID
ARM_CLIENT_SECRET
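Putting that together, a hypothetical terraform.tfvars (or equivalent workspace variable values) for this management configuration could look like this, with placeholder credential values:

# Hypothetical values for the management configuration above;
# all IDs and the secret are placeholders.
workspaces = ["Database", "Application"]

common_environment_variables = {
  ARM_CLIENT_ID       = "00000000-0000-0000-0000-000000000000"
  ARM_TENANT_ID       = "00000000-0000-0000-0000-000000000000"
  ARM_SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
  ARM_CLIENT_SECRET   = "00000000000000000000000000000000"
}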
If you use Terraform Cloud itself to run operations on this workspace managing Terraform Cloud (naturally, you'd need to set this one up manually to bootstrap, rather than having it self-manage) then you can configure var.common_environment_variables as a sensitive variable on that workspace.
If you instead set it via Terraform variables passed into the provider "azurerm" block (as you indicated in your example) then you force any person or system running the configuration to directly populate those variables, forcing them to use a service principal vs. one of the other mechanisms and preventing Terraform from automatically picking up credentials set using az login. The Terraform configuration should generally only describe what Terraform is managing, not settings related to who is running Terraform or where Terraform is being run.
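In other words, the provider block in each managed workspace's configuration can stay free of credentials; a minimal sketch (assuming azurerm provider 2.x or later, which requires the empty features block):

# Minimal sketch: credentials come from the ARM_* environment variables
# (or az login locally), not from provider arguments.
provider "azurerm" {
  features {}
}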
Note though that the state for the Terraform Cloud self-management workspace will include a copy of those credentials, as is normal for objects Terraform is managing, so the permissions on this workspace should be set appropriately to restrict access to it.
You can now use variable sets to reuse variables across multiple workspaces.
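A minimal sketch of that approach with the tfe provider, assuming the same organization name as in the answer above:

# Minimal sketch: a global variable set shared with every workspace in the
# organization; the set name and the client ID value are placeholders.
resource "tfe_variable_set" "common_credentials" {
  name         = "common-azure-credentials"
  organization = "your-organization-name"
  global       = true
}

resource "tfe_variable" "arm_client_id" {
  key             = "ARM_CLIENT_ID"
  value           = "00000000-0000-0000-0000-000000000000"
  category        = "env"
  sensitive       = true
  variable_set_id = tfe_variable_set.common_credentials.id
}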

How to deploy app service with terraform on Azure

I want to create an Azure App Service using Terraform.
resource "azurerm_app_service" "one-app-sts" {
name = "${var.environment}-one-app-sts"
location = "${var.location}"
resource_group_name = "${azurerm_resource_group.one.name}"
app_service_plan_id = "${azurerm_app_service_plan.one.id}"
app_settings {
"Serilog:WriteTo:0:Args:workspaceId" = "${azurerm_log_analytics_workspace.one.workspace_id}"
}
tags {
environment = "${var.environment}"
source = "terraform"
}
}
For testing it should be named test-one-app-sts, for production prod-one-app-sts.
I tried to inject variables with a tfvars file; however, Terraform plans to rename the services instead of creating new ones.
How would I write the script in a way that I can create/destroy as many different environments as I want (dev, test, prev, uat, prod)?
P.S. The example is for an App Service, but I also have a service bus, databases, and functions as part of each environment that should be recreated/destroyed.
Well, I figured it out: workspaces are the answer: https://www.terraform.io/docs/state/workspaces.html
I can deploy any number of copies of my environment =)
terraform workspace new test
will create a new state, only for test.
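To tie the workspace name into resource naming, the resources from the question could use terraform.workspace instead of a separate environment variable; a minimal sketch keeping the question's naming scheme:

# Minimal sketch: the workspace name ("test", "prod", ...) replaces the
# var.environment prefix, so each workspace gets its own copy of the resources.
resource "azurerm_app_service" "one-app-sts" {
  name                = "${terraform.workspace}-one-app-sts"
  location            = "${var.location}"
  resource_group_name = "${azurerm_resource_group.one.name}"
  app_service_plan_id = "${azurerm_app_service_plan.one.id}"
}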
