Using databricks workspace in the same configuration as the databricks provider - terraform

I'm having some trouble getting the azurerm & databricks provider to work together.
With the azurerm provider, I set up my workspace:
resource "azurerm_databricks_workspace" "ws" {
  name                        = var.workspace_name
  resource_group_name         = azurerm_resource_group.rg.name
  location                    = azurerm_resource_group.rg.location
  sku                         = "premium"
  managed_resource_group_name = "${azurerm_resource_group.rg.name}-mng-rg"

  custom_parameters {
    virtual_network_id  = data.azurerm_virtual_network.vnet.id
    public_subnet_name  = var.public_subnet
    private_subnet_name = var.private_subnet
  }
}
No matter how I structure this, I can't seem to get azurerm_databricks_workspace.ws.id to work in the provider statement for databricks in the same configuration. If it did work, the workspace above would be defined in the same configuration and I'd have a provider statement that looks like this:
provider "databricks" {
  azure_workspace_resource_id = azurerm_databricks_workspace.ws.id
}
Error:
I have my ARM_* environment variables set to identify as a Service Principal with Contributor on the subscription.
I've tried it in the same configuration, and in a module while consuming its outputs. The only way I can get it to work is by running one configuration for the workspace and a second configuration to consume the workspace.
This is super suboptimal in that I have a fair amount of repeated values across those configurations; it would be ideal to have just one.
Has anyone been able to do this?
Thank you :)

I've had the exact same issue with a non-working Databricks provider because I was working with modules. I separated the Databricks infra (Azure) from the Databricks application (databricks provider).
In my Databricks module I added the following code at the top; otherwise it would use my Azure setup:
terraform {
  required_providers {
    databricks = {
      source  = "databrickslabs/databricks"
      version = "0.3.1"
    }
  }
}
In my normal provider setup I have the following settings for databricks:
provider "databricks" {
  azure_workspace_resource_id = module.databricks_infra.databricks_workspace_id
  azure_client_id             = var.ARM_CLIENT_ID
  azure_client_secret         = var.ARM_CLIENT_SECRET
  azure_tenant_id             = var.ARM_TENANT_ID
}
And of course I have the Azure one as well. Let me know if it worked :)

If you experience technical difficulties with rolling out resources in this example, please make sure that environment variables don't conflict with other provider block attributes. When in doubt, run TF_LOG=DEBUG terraform apply to enable debug mode through the TF_LOG environment variable. Look specifically for "Explicit and implicit attributes" lines, which should indicate the authentication attributes used. The other common reason for technical difficulties is a missing alias attribute in provider "databricks" {} blocks, or a missing provider attribute in resource "databricks_..." {} blocks. Please make sure to read the "alias: Multiple Provider Configurations" documentation article.
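As a sketch of that aliasing pattern (the second workspace and the cluster resource here are purely illustrative), a resource selects the aliased provider configuration explicitly:

```hcl
# Default Databricks provider configuration.
provider "databricks" {
  azure_workspace_resource_id = azurerm_databricks_workspace.ws.id
}

# A second configuration for another workspace, distinguished by alias.
provider "databricks" {
  alias                       = "other"
  azure_workspace_resource_id = azurerm_databricks_workspace.other.id
}

# Resources opt into the aliased configuration via the provider attribute.
resource "databricks_notebook" "example" {
  provider = databricks.other
  path     = "/Shared/Example"
  language = "PYTHON"
  source   = "${path.module}/example.py"
}
```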

From the error message, it looks like the provider is reporting "Authentication is not configured for provider". Could you please configure it through one of the options mentioned above?
For more details, refer to Databricks provider - Authentication.
For passing the custom_parameters, you may check out the SO thread addressing a similar issue.
If you need more help on this issue, I would suggest opening an issue here: https://github.com/terraform-providers/terraform-provider-azurerm/issues

Related

Hashicorp Vault Required Provider Configuration in Terraform

My GitLab CI pipeline terraform configuration requires a couple of required_provider blocks to be declared. These are "hashicorp/azuread" and "hashicorp/vault" and so in my provider.tf file, I have given the below declaration:
terraform {
  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.0.0"
    }
    vault = {
      source  = "hashicorp/vault"
      version = "~> 3.0.0"
    }
  }
}
When my GitLab pipeline runs the terraform plan stage however, it throws the following error:
Error: Invalid provider configuration

Provider "registry.terraform.io/hashicorp/vault" requires explicit configuration.
Add a provider block to the root module and configure the provider's required
arguments as described in the provider documentation.
I realise my required provider block for hashicorp/vault is incomplete/not properly configured but despite all my efforts to find an example of how it should be configured, I have simply run into a brick wall.
Any help with a very basic example would be greatly appreciated.
It depends on the version of Terraform you are using. However, for each provider on the Terraform Registry there is a Use Provider button (in the top right corner) which explains how to add the required blocks of code to your files.
Each provider has some additional configuration parameters which could be added and some are required.
So based on the error, I think you are missing the second part of the configuration:
provider "vault" {
  # Configuration options
}
There is also an explanation on how to upgrade to version 3.0 of the provider. You might also want to take a look at Hashicorp Learn examples and Github repo with example code.
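A minimal sketch of what that second part can look like, assuming the Vault server address is a placeholder; in a CI pipeline you would typically supply the address and token via the VAULT_ADDR and VAULT_TOKEN environment variables rather than committing them:

```hcl
provider "vault" {
  # Address of your Vault server; can also come from VAULT_ADDR.
  address = "https://vault.example.com:8200"

  # Authentication is deliberately omitted here: exporting VAULT_TOKEN
  # (or using an auth_login block) keeps secrets out of source control.
}
```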

How to create Azure Databricks Notebook via Terraform?

So I am completely new to Terraform and I found that by using this in my main.tf I can create the Azure Databricks infrastructure:
resource "azurerm_databricks_workspace" "bdcc" {
  depends_on = [
    azurerm_resource_group.bdcc
  ]

  name                = "dbw-${var.ENV}-${var.LOCATION}"
  resource_group_name = azurerm_resource_group.bdcc.name
  location            = azurerm_resource_group.bdcc.location
  sku                 = "standard"

  tags = {
    region = var.BDCC_REGION
    env    = var.ENV
  }
}
And I also found here that by using this I can even create a particular notebook in this Azure Databricks infrastructure:
resource "databricks_notebook" "notebook" {
  content_base64 = base64encode(<<-EOT
    # created from ${abspath(path.module)}
    display(spark.range(10))
    EOT
  )
  path     = "/Shared/Demo"
  language = "PYTHON"
}
But since I am new to this, I am not sure in what order I should put those pieces of code together.
It would be nice if someone could point me to the full example of how to create notebook via terraform on Azure Databricks.
Thank you beforehand!
In general you can put these objects in any order - it's Terraform's job to detect dependencies between the objects and create/update them in the correct order. For example, you don't need depends_on in the azurerm_databricks_workspace resource, because Terraform will figure out that the resource group is needed before the workspace can be created, so workspace creation will follow the creation of the resource group. And Terraform tries to make changes in parallel where possible.
But because of this, things become slightly more complex when you have the workspace resource together with workspace objects, like notebooks, clusters, etc. As there is no explicit dependency, Terraform will try to create the notebook in parallel with the workspace, and it will fail because the workspace doesn't exist yet - usually you will get a message about an authentication error.
The solution for that is to have an explicit dependency between notebook & workspace, plus you need to configure authentication of the Databricks provider to point to the newly created workspace (there are differences between user & service principal authentication - you can find more information in the docs). In the end your code would look like this:
resource "azurerm_databricks_workspace" "bdcc" {
  name                = "dbw-${var.ENV}-${var.LOCATION}"
  resource_group_name = azurerm_resource_group.bdcc.name
  location            = azurerm_resource_group.bdcc.location
  sku                 = "standard"

  tags = {
    region = var.BDCC_REGION
    env    = var.ENV
  }
}

provider "databricks" {
  host = azurerm_databricks_workspace.bdcc.workspace_url
}

resource "databricks_notebook" "notebook" {
  depends_on = [azurerm_databricks_workspace.bdcc]
  ...
}
Unfortunately, there is no way to put depends_on at the provider level, so you will need to put it into every Databricks resource that is created together with the workspace. Usually the best practice is to have a separate module for workspace creation & a separate module for the objects inside the Databricks workspace.
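A sketch of that two-module layout (the module paths and the workspace_url output are hypothetical names, and module-level depends_on assumes Terraform 0.13+):

```hcl
# Root module: create the workspace first, then everything inside it.
module "workspace" {
  source = "./modules/workspace" # creates azurerm_databricks_workspace
}

provider "databricks" {
  host = module.workspace.workspace_url # output exposed by the module
}

module "databricks_objects" {
  source     = "./modules/databricks-objects" # notebooks, clusters, ...
  depends_on = [module.workspace]
}
```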
P.S. I would recommend reading a book or some documentation on Terraform. For example, Terraform: Up & Running is a very good intro.

Access Azure Function App system keys in Terraform

I want to create Azure EventGrid subscription using Terraform.
resource "azurerm_eventgrid_system_topic_event_subscription" "function_app" {
  name                = "RunOnBlobUploaded"
  system_topic        = azurerm_eventgrid_system_topic.function_app.name
  resource_group_name = azurerm_resource_group.rg.name

  included_event_types = [
    "Microsoft.Storage.BlobCreated"
  ]

  subject_filter {
    subject_begins_with = "/blobServices/default/containers/input"
  }

  webhook_endpoint {
    url = "https://thumbnail-generator-function-app.azurewebsites.net/runtime/webhooks/blobs?functionName=Create-Thumbnail&code=<BLOB-EXTENSION-KEY>"
  }
}
By following this doc, I successfully deployed it and it works. However, the webhook_endpoint URL needs <BLOB-EXTENSION-KEY>, which is hardcoded right now and was found in the following place in the portal:
In order to not commit a secret to GitHub, I want to get this value by reference, ideally using Terraform.
According to my research, it seems there is no way in Terraform to reference that value.
The closest thing is the azurerm_function_app_host_keys data source in Terraform. However, it doesn't cover the blobs_extension key!
Is there any good way to reference blobs_extension in Terraform without a hardcoded value?
Thanks in advance!
If TF does not support it yet, you can create your own External Data Source, which can use the Azure CLI or SDK to get the value you want and return it to your TF configuration for further use.
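A sketch of that approach with the external data source and the Azure CLI; the --query path and the blobs_extension key name are assumptions (the external data source requires the program to print a flat JSON object of strings, which is why the query narrows the output to systemKeys):

```hcl
data "external" "function_keys" {
  # Returns the function app's system keys as a flat JSON map,
  # e.g. {"blobs_extension": "<key>"} (key name assumed).
  program = [
    "az", "functionapp", "keys", "list",
    "--resource-group", azurerm_resource_group.rg.name,
    "--name", "thumbnail-generator-function-app",
    "--query", "systemKeys",
    "--output", "json",
  ]
}
```

The webhook URL could then reference `data.external.function_keys.result.blobs_extension` instead of the hardcoded `<BLOB-EXTENSION-KEY>` value.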

How to share Terraform variables across workspaces/modules?

Terraform Cloud Workspaces allow me to define variables, but I'm unable to find a way to share variables across more than one workspace.
In my example I have, let's say, two workspaces:
Database
Application
In both cases I'll be using the same AzureRM credentials for connectivity. The following are common values used by the workspaces to connect to my Azure subscription:
provider "azurerm" {
  subscription_id = "00000000-0000-0000-0000-000000000000"
  client_id       = "00000000-0000-0000-0000-000000000000"
  client_secret   = "00000000000000000000000000000000"
  tenant_id       = "00000000-0000-0000-0000-000000000000"
}
It wouldn't make sense to duplicate these values (in my case I'll probably have 10 workspaces).
Is there a way to do this?
Or the correct approach is to define "database" and "application" as a Module, and then use Workspaces (DEV, QA, PROD) to orchestrate them?
In Terraform Cloud, the Workspace object is currently the least granular location where you can specify variable values directly. There is no built in mechanism to share variable values between workspaces.
However, one way to approach this would be to manage Terraform Cloud with Terraform itself. The tfe provider (named after Terraform Enterprise for historical reasons, since it was built before Terraform Cloud launched) will allow Terraform to manage Terraform Cloud workspaces and their associated variables.
variable "workspaces" {
  type = set(string)
}

variable "common_environment_variables" {
  type = map(string)
}

provider "tfe" {
  hostname = "app.terraform.io" # Terraform Cloud
}

resource "tfe_workspace" "example" {
  for_each = var.workspaces

  organization = "your-organization-name"
  name         = each.key
}

resource "tfe_variable" "example" {
  # We'll need one tfe_variable instance for each
  # combination of workspace and environment variable,
  # so this one has a more complicated for_each expression.
  for_each = {
    for pair in setproduct(var.workspaces, keys(var.common_environment_variables)) :
    "${pair[0]}/${pair[1]}" => {
      workspace_name = pair[0]
      workspace_id   = tfe_workspace.example[pair[0]].id
      name           = pair[1]
      value          = var.common_environment_variables[pair[1]]
    }
  }

  workspace_id = each.value.workspace_id
  category     = "env"
  key          = each.value.name
  value        = each.value.value
  sensitive    = true
}
With the above configuration, you can set var.workspaces to contain the names of the workspaces you want Terraform to manage and var.common_environment_variables to the environment variables you want to set for all of them.
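For example, matching the two workspaces from the question, the input variables could be set like this (all values are placeholders):

```hcl
# terraform.tfvars (illustrative values only)
workspaces = ["Database", "Application"]

common_environment_variables = {
  ARM_CLIENT_ID       = "00000000-0000-0000-0000-000000000000"
  ARM_TENANT_ID       = "00000000-0000-0000-0000-000000000000"
  ARM_SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
  ARM_CLIENT_SECRET   = "00000000000000000000000000000000"
}
```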
Note that for setting credentials on a provider the recommended approach is to set them in environment variables rather than Terraform variables, because that then makes the Terraform configuration itself agnostic to how those credentials are obtained. You could potentially apply the same Terraform configuration locally (outside of Terraform Cloud) using the integration with Azure CLI auth, while the Terraform Cloud execution environment would often use a service principal.
Therefore to provide the credentials in the Terraform Cloud environment you'd put the following environment variables in var.common_environment_variables:
ARM_CLIENT_ID
ARM_TENANT_ID
ARM_SUBSCRIPTION_ID
ARM_CLIENT_SECRET
If you use Terraform Cloud itself to run operations on this workspace managing Terraform Cloud (naturally, you'd need to set this one up manually to bootstrap, rather than having it self-manage) then you can configure var.common_environment_variables as a sensitive variable on that workspace.
If you instead set it via Terraform variables passed into the provider "azurerm" block (as you indicated in your example) then you force any person or system running the configuration to directly populate those variables, forcing them to use a service principal vs. one of the other mechanisms and preventing Terraform from automatically picking up credentials set using az login. The Terraform configuration should generally only describe what Terraform is managing, not settings related to who is running Terraform or where Terraform is being run.
Note though that the state for the Terraform Cloud self-management workspace will include a copy of those credentials, as is normal for objects Terraform is managing, so the permissions on this workspace should be set appropriately to restrict access to it.
You can now use variable sets to reuse variables across multiple workspaces.
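A hedged sketch of that variable-set approach with the tfe provider (the set name and organization are placeholders, and this assumes the tfe_workspace.example resources from the answer above):

```hcl
resource "tfe_variable_set" "azure_credentials" {
  name         = "azure-credentials"
  organization = "your-organization-name"
}

resource "tfe_variable" "arm_client_id" {
  key             = "ARM_CLIENT_ID"
  value           = var.arm_client_id
  category        = "env"
  sensitive       = true
  variable_set_id = tfe_variable_set.azure_credentials.id
}

# Attach the set to each workspace that needs the credentials.
resource "tfe_workspace_variable_set" "attach" {
  for_each        = tfe_workspace.example
  workspace_id    = each.value.id
  variable_set_id = tfe_variable_set.azure_credentials.id
}
```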

Terraform - Deploy to multiple Azure subscriptions

I have been trying to use the same Terraform stack to deploy resources in multiple Azure subscriptions. I also need to pass parameters between these resources in different subscriptions. I tried to use multiple providers, but that is not supported:
Error: provider.azurerm: multiple configurations present; only one configuration is allowed per provider
If you have a way or an idea on how to accomplish this please let me know.
You can use multiple providers by using an alias (docs).
# The default provider configuration
provider "azurerm" {
  subscription_id = "xxxxxxxxxx"
}

# Additional provider configuration for the second subscription
provider "azurerm" {
  alias           = "y"
  subscription_id = "yyyyyyyyyyy"
}
And then specify whenever you want to use the alternative provider:
resource "azurerm_resource_group" "network_x" {
  name     = "production"
  location = "West US"
}

resource "azurerm_resource_group" "network_y" {
  provider = azurerm.y

  name     = "production"
  location = "West US"
}
Markus' answer is correct, but it is only the right solution if you need to access more than one subscription in the same set of Terraform sources.
If your purpose is to use one subscription as a sandbox and the other for real, you should simply move the provider information out of the Terraform scripts. There is more than one way to manage this:
Workspaces
Backend configuration
A wrapper script in Bash/PowerShell/Python, Terragrunt-style
Symbolic links can also be used to share files across multiple folders
I use a combination of the last three, as workspaces are too rigid for our needs.
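For instance, with Terraform CLI workspaces the subscription can be selected per workspace while keeping a single provider block (the subscription IDs are placeholders, and the features {} block assumes azurerm 2.x or later):

```hcl
locals {
  # Map each workspace name to the subscription it deploys into.
  subscription_ids = {
    sandbox = "00000000-0000-0000-0000-000000000000"
    prod    = "11111111-1111-1111-1111-111111111111"
  }
}

provider "azurerm" {
  features {}
  subscription_id = local.subscription_ids[terraform.workspace]
}
```

Switching targets is then just `terraform workspace select prod` rather than editing provider blocks.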
I got this error for a silly reason as a Terraform beginner; maybe someone here has the same problem:
I saved a backup of my main.tf file as something like mymainbackup1.tf, and Terraform interpreted it as a real .tf file even though it wasn't main.tf, therefore it thought I had more than one provider registered.
I changed the file to the .txt extension and Terraform stopped interpreting that file and stopped giving the error.
