No partial configuration for terraform_remote_state backend?

Partial Configuration allows us to specify backend settings from the command line:
terraform init \
-backend-config="region=${AWS_DEFAULT_REGION}" \
-backend-config="bucket=${TF_VAR_BACKEND_BUCKET}" \
-backend-config="key=${TF_VAR_BACKEND_KEY}" \
-backend-config="encrypt=true"
I thought the same could be used for terraform_remote_state:
data "terraform_remote_state" "vpc" {
backend = "s3"
config { }
}
However, it causes this error:
Error: Error refreshing state: 1 error(s) occurred:
* data.terraform_remote_state.vpc: 1 error(s) occurred:
* data.terraform_remote_state.vpc: data.terraform_remote_state.vpc: InvalidParameter: 1 validation error(s) found.
- minimum field size of 1, GetObjectInput.Key.
It looks like terraform_remote_state requires explicit configuration, as indicated in Terraform terraform_remote_state Partial Configuration:
data "terraform_remote_state" "vpc" {
backend = "s3"
config {
encrypt = "true"
bucket = "${var.BACKEND_BUCKET}"
key = "${var.BACKEND_KEY}"
}
}
Question
Is there a way to use partial configuration here, or is it a current limitation of Terraform that partial configuration cannot be used for terraform_remote_state?

Partial configuration only applies during initialization, to early parameters that are evaluated before any variables are available.
The concept does not apply to "normal" resources (and in this sense, a data block is "normal"). However, since you hold your secrets in the corresponding TF_VAR_* environment variables, explicitly stating them seems better than implicitly relying on their presence: the code is clearer, and every value it uses is stated in the code. This is good practice.
So the question is: why would you want to avoid explicitly stating the required variables?
Addendum:
As you indicated in the comments, you want a single location to hold each piece of information.
As you are already using environment variables in your initialization process (via the -backend-config parameter) and in your code (via variables populated from those same environment variables), you effectively have a single place that manages the information for both entries!
(Note that being able to omit the values in the backend block is merely a workaround made possible by the order in which Terraform processes the files.)
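For illustration, the data source keeps reading from that same single source of truth simply by declaring the variables; Terraform fills them from the TF_VAR_* environment variables automatically:
# Populated automatically from TF_VAR_BACKEND_BUCKET / TF_VAR_BACKEND_KEY,
# the same environment variables used for `terraform init -backend-config=...`.
variable "BACKEND_BUCKET" {}
variable "BACKEND_KEY" {}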
Please also reconsider the difference between the backend (that is, where Terraform saves its own state) and terraform_remote_state (just a normal data source that reads information from any remote state you might care about, even state held in completely separate cloud accounts and accessed with different credentials). Specifying exactly the same credentials as those used by the backend is therefore only a special use case.

Related

Provider requires dynamic output of resource: what to do?

I am successfully creating a vmc_sddc resource. One of the attributes returned from that is "nsxt_reverse_proxy_url".
I need to use the "nsxt_reverse_proxy_url" value for another provider's (nsxt) input.
Unfortunately, Terraform rejects this construct, saying that the host must be provided. In other words, the dynamic value is not accepted as input.
Question: Is there any way to use the dynamically-created value from a resource as input to another provider?
Here is the code:
resource "vmc_sddc" "harpoon_sddc" {
sddc_name = var.sddc_name
vpc_cidr = var.vpc_cidr
num_host = 1
provider_type = "AWS"
region = data.vmc_customer_subnets.my_subnets.region
vxlan_subnet = var.vxlan_subnet
delay_account_link = false
skip_creating_vxlan = false
sso_domain = "vmc.local"
deployment_type = "SingleAZ"
sddc_type = "1NODE"
}
provider "nsxt" {
host = vmc_sddc.harpoon_sddc.nsxt_reverse_proxy_url // DOES NOT WORK
vmc_token = var.api_token
allow_unverified_ssl = true
enforcement_point = "vmc-enforcementpoint"
}
Here is the error message from Terraform:
╷
│ Error: host must be provided
│
│ with provider["registry.terraform.io/vmware/nsxt"],
│ on main.tf line 55, in provider "nsxt":
│ 55: provider "nsxt" {
│
Thank you
As you've found, some providers cannot handle unknown values as part of their configuration during planning, and so it doesn't work to dynamically configure them based on objects being created in the same run in the way you tried.
In situations like this, there are two main options:
On your first run you can use terraform apply -target=vmc_sddc.harpoon_sddc to ask Terraform to focus only on the objects needed to create that one object, excluding anything related to the nsxt provider. Once that apply completes successfully you can then run terraform apply as normal and Terraform will already know the value of vmc_sddc.harpoon_sddc.nsxt_reverse_proxy_url so the provider configuration can succeed.
This is typically the best choice for a long-lived configuration that you don't expect to be recreating often, since you can just do this one-off extra step once during initial creation and then use Terraform as normal after that, as long as you never need to recreate vmc_sddc.harpoon_sddc.
You can split the configuration into two separate configurations handling the different layers. The first layer would be responsible for the "vmc" level of abstraction, allowing you to terraform apply that in isolation, and then the second configuration would be responsible for the "nsxt" level of abstraction building on top, which you can run terraform apply on once you've got the first configuration running.
This is a variant of the first option where the separation between the first and second steps is explicit in the configuration structure itself, which means that you don't need to add any extra options when you run Terraform but you do now need to manage two configurations. This approach is therefore better than the first only if you will be routinely destroying and re-creating these objects, so that you can make it explicit in the code that this is a two-step process.
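As a rough sketch of the second option (the backend type, state path, and output name below are assumptions rather than details from your configuration), the first configuration exports the URL and the second reads it before configuring the provider:
# Layer 1 ("vmc" configuration): creates the SDDC and exports the value needed later
output "nsxt_reverse_proxy_url" {
  value = vmc_sddc.harpoon_sddc.nsxt_reverse_proxy_url
}

# Layer 2 ("nsxt" configuration): reads layer 1's state, so the value is already
# known when the nsxt provider is configured during planning
data "terraform_remote_state" "vmc" {
  backend = "local" # assumption: adjust to wherever the first configuration stores its state
  config = {
    path = "../vmc/terraform.tfstate"
  }
}

provider "nsxt" {
  host                 = data.terraform_remote_state.vmc.outputs.nsxt_reverse_proxy_url
  vmc_token            = var.api_token
  allow_unverified_ssl = true
  enforcement_point    = "vmc-enforcementpoint"
}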
In principle some providers can be designed to tolerate unknown values as input and do offline planning in that case, but it isn't technically possible for all providers because sometimes there really isn't any way to create a meaningful plan without connecting to the remote system to ask it questions. I'm not familiar with this provider so I don't know if it's requiring a hostname for a strong technical reason or just because the provider developers didn't consider the possibility that you might use it in this way, and so if your knowledge of nsxt leads you to think that it might be possible in principle for it to do offline planning then a third option would be to ask the developers if it would be feasible to defer connecting to the given host until the apply phase, in which case you wouldn't need to do any extra steps like the above.

Clarification on changes made outside of Terraform

I don't fully understand how Terraform handles external changes. Let's take an example:
resource "aws_instance" "ec2-test" {
ami = "ami-0d71ea30463e0ff8d"
instance_type = "t2.micro"
}
1: security group modification
The default security group has been manually replaced by another one. Terraform detects the change:
❯ terraform plan --refresh-only
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:
# aws_instance.ec2-test has changed
~ resource "aws_instance" "ec2-test" {
id = "i-5297abcc6001ce9a8"
~ security_groups = [
- "default",
+ "test",
]
tags = {}
~ vpc_security_group_ids = [
+ "sg-8231be9a95a4b1886",
- "sg-f2fc3af19c4adefe0",
]
# (28 unchanged attributes hidden)
# (7 unchanged blocks hidden)
}
No change planned:
❯ terraform plan
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.
It seems normal as we did not set the security_groups argument in the resource block (the desired state is aligned with the current state).
2: IAM instance profile added
An IAM role has been manually attached to the instance. Terraform also detects the change:
❯ terraform plan --refresh-only
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:
# aws_instance.ec2-test has changed
~ resource "aws_instance" "ec2-test" {
+ iam_instance_profile = "test"
id = "i-5297abcc6001ce9a8"
tags = {}
# (30 unchanged attributes hidden)
# (7 unchanged blocks hidden)
}
This is a refresh-only plan, so Terraform will not take any actions to undo these. If you were expecting these changes then you can apply this plan to record the updated values in the Terraform state without
changing any remote objects.
However, Terraform also plans to revert the change:
❯ terraform plan
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# aws_instance.ec2-test will be updated in-place
~ resource "aws_instance" "ec2-test" {
- iam_instance_profile = "test" -> null
id = "i-5297abcc6001ce9a8"
tags = {}
# (30 unchanged attributes hidden)
# (7 unchanged blocks hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
I tried to figure out why these two changes don't produce the same effect. This article highlights differences depending on the argument default values: https://nedinthecloud.com/2021/12/23/terraform-apply-when-external-change-happens/
But the security_groups and iam_instance_profile arguments seem similar (optional, with no default value), so why is Terraform handling these two cases differently?
(tested with Terraform v1.2.2, hashicorp/aws 4.21.0)
The handling of these situations unfortunately depends a lot on decisions made by the provider developer, since it's the provider's responsibility to decide how to reconcile any differences between the configuration and the prior state. (The "prior state" is what Terraform calls the state that results from running the "refresh" steps to synchronize with the remote system).
Terraform Core takes the values you've defined in the configuration (if any optional arguments are unset, Terraform Core uses null to represent that) and the values from the prior state and sends both of them to the provider to implement the planning step. The provider can then do whatever logic it wants as long as the planned new value for each attribute is consistent with the input. "Consistent" means that one of the following conditions is true:
The planned value is equal to the value set in the configuration.
This is the most straightforward situation to follow, but there are various reasons why a provider might not do this, which I'll discuss later.
The planned value is equal to the value stored in the prior state.
This represents situations where the value in the prior state is functionally equivalent to the value in the configuration but not exactly equal, such as if the remote system treats a particular string as case insensitive and the two values differ only in case.
The provider indicated in its schema that this is a value that can be decided by the remote system, such as an object ID that's generated by the remote system during the apply step, and the corresponding value in the configuration was null to represent the argument not being set at all.
In this case the provider gets to choose whichever value it wants, because the configuration says nothing about the attribute and thus the remote system has authority on what the value is.
From what you've described, it sounds like in your first example the provider used approach number 3, while in the second example the provider used approach number 1.
Since I am not the developer of this provider I cannot say for certain why the developers made the decisions they did here, but one common reason why a provider developer might choose option three is for situations where a particular value can potentially be set by multiple different resource types, in which case the provider might be designed to treat an absent argument in the configuration as meaning "keep whatever the remote system already has", whereas a non-null argument in the configuration would mean "set the remote system to use this given value".
For iam_instance_profile it seems like the provider considers null to be a valid configuration value for that argument and uses it to represent the EC2 instance having no associated instance profile at all. For vpc_security_group_ids and security_groups, though, if you leave the argument set to null in the configuration (or omit it, which is equivalent), the provider treats that as "keep whatever the remote system has", and so Terraform just acknowledges the change but doesn't propose to undo it.
Based on my knowledge of EC2, I can guess that the reason here is probably that the underlying EC2 API has two different ways to set security groups: you can either use the legacy EC2-Classic style of specifying a security group by name (the security_groups argument in the provider), or the newer EC2-VPC style of specifying it by ID (the vpc_security_group_ids argument in the provider). Whichever of the two you choose, the remote system will presumably populate the other one automatically, and therefore, without this special exception in the provider, it would be impossible for any configuration to converge unless you set both security_groups and vpc_security_group_ids and made them both refer to the same security groups. To avoid that, I think the provider just lets whichever one of the two you left unset automatically track the remote system, which has the side-effect that the provider cannot automatically "fix" changes made outside of Terraform unless you set at least one of them so the provider can see what the correct value ought to be.
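To illustrate: if you want Terraform to be able to undo out-of-band security group changes as well, set at least one of the two arguments explicitly. The security group reference below is a hypothetical example, not something from your configuration:
resource "aws_instance" "ec2-test" {
  ami           = "ami-0d71ea30463e0ff8d"
  instance_type = "t2.micro"

  # With this set, replacing the instance's security group outside of Terraform
  # produces a diff that a subsequent terraform apply will plan to revert.
  vpc_security_group_ids = [aws_security_group.example.id]
}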
Terraform's ability to reconcile changes in the remote system by resetting back to match the configuration is a "best effort" mechanism because in many cases that requirement comes into conflict with other requirements, and provider developers must therefore decide on a case-by-case basis what to prioritize. Although Terraform does try its best to tell you about changes outside of Terraform and to propose fixing them where possible, the only certain way to keep your Terraform configuration and your remote system synchronized is to prevent anyone from making changes outside of Terraform, for example using IAM policies in AWS.

In Terraform 0.12, how to skip creation of resource, if resource name already exists?

I am using Terraform version 0.12. I have a requirement to skip resource creation if a resource with the same name already exists.
I did the following for this:
Read the list of custom images:
data "ibm_is_images" "custom_images" {
}
Check if the image already exists:
locals {
custom_vsi_image = contains([for x in data.ibm_is_images.custom_images.images: "true" if x.visibility == "private" && x.name == var.vnf_vpc_image_name], "true")
}
output "abc" {
value="${local.custom_vsi_image}"
}
Create the image only if it does not already exist:
resource "ibm_is_image" "custom_image" {
count = "${local.custom_vsi_image == true ? 0 : 1}"
depends_on = ["data.ibm_is_images.custom_images"]
href = "${local.image_url}"
name = "${var.vnf_vpc_image_name}"
operating_system = "centos-7-amd64"
timeouts {
create = "30m"
delete = "10m"
}
}
This works fine the first time with terraform apply: it finds that the image does not exist, so it creates the image.
When I run terraform apply a second time, it deletes the custom_image resource created above. Any idea why it is deleting the resource on the second run?
Also, how can I create a resource based on some condition (for example, only when it does not already exist)?
In Terraform, you're required to decide explicitly what system is responsible for the management of a particular object, and conversely which systems are just consuming an existing object. There is no way to make that decision dynamically, because that would make the result non-deterministic and -- for objects managed by Terraform -- make it unclear which configuration's terraform destroy would destroy the object.
Indeed, that non-determinism is why you're seeing Terraform in your situation flop between trying to create and then trying to delete the resource: you've told Terraform to only manage that object if it doesn't already exist, and so the first time you run Terraform after it exists Terraform will see that the object is no longer managed and so it will plan to destroy it.
If your goal is to manage everything with Terraform, an important design task is to decide how object dependencies flow within and between Terraform configurations. In your case, it seems like there is a producer/consumer relationship between a system that manages images (which may or may not be a Terraform configuration) and one or more Terraform configurations that consume existing images.
If the images are managed by Terraform then that suggests either that your main Terraform configuration should assume the image does not exist and unconditionally create it -- if your decision is that the image is owned by the same system as what consumes it -- or it should assume that the image does already exist and retrieve the information about it using a data block.
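For example, a configuration that consumes an existing image would just look it up with a data block (this sketch assumes the IBM provider's ibm_is_image data source and that the image already exists):
# Look up the existing image by name instead of conditionally creating it
data "ibm_is_image" "custom_image" {
  name = var.vnf_vpc_image_name
}

# Downstream resources then reference data.ibm_is_image.custom_image.id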
A possible solution here is to write a separate Terraform configuration that manages the image and then only apply that configuration in situations where that object isn't expected to already exist. Then your configuration that consumes the existing image can just assume it exists without caring about whether it was created by the other Terraform configuration or not.
There's a longer overview of this situation in the Terraform documentation section Module Composition, and in particular the sub-section Conditional Creation of Objects. That guide is focused on interactions between modules in a single configuration, but the same underlying principles apply to dependencies between configurations (via data sources) too.

Declare multiple providers for a list of regions

I have a Terraform module that manages AWS GuardDuty.
In the module, an aws_guardduty_detector resource is declared. The resource does not allow a region to be specified, yet I need to configure one of these resources for each region in a list; the region apparently has to come from the provider configuration.
The lack of for_each on modules seems to be part of the problem; module for_each, if it existed, might let me declare the whole module once for each region.
Thus, I wonder: is it possible to somehow declare a provider for each region in a list?
Or, short of writing a shell script wrapper, or doing code generation, is there any other clean way to solve this problem that I might not have thought of?
To support similar processes, I have found two approaches to this problem:
Declare multiple AWS providers in the Terraform module.
Write the module to use a single provider, and then have a separate .tfvars file for each region you want to execute against.
For the first option, it can get messy having multiple AWS providers in one file. You must give each an alias, and then each time you create a resource you must set the provider property on it so that Terraform knows which regional provider to execute against. Also, if the provider for one of the regions cannot initialize (for example, because the region is down), the entire configuration will not run until you remove that provider or the region comes back up.
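A minimal sketch of what the first option looks like (the regions and alias names here are illustrative):
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "us_west_1"
  region = "us-west-1"
}

# One detector per region, each pinned to its regional provider
resource "aws_guardduty_detector" "us_east_1" {
  provider = aws.us_east_1
  enable   = true
}

resource "aws_guardduty_detector" "us_west_1" {
  provider = aws.us_west_1
  enable   = true
}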
For the second option, you can write the Terraform for what resources you need to set up and then just run the module multiple times, once for each regional .tfvars file.
prod-us-east-1.tfvars
prod-us-west-1.tfvars
prod-eu-west-2.tfvars
My preference is the second option, as the module stays simpler and there is less duplication. The only duplication is in the .tfvars files, which should be more manageable.
EDIT: Added some sample .tfvars
prod-us-east-1.tfvars:
region = "us-east-1"
account_id = "0000000000"
tags = {
env = "prod"
}
dynamodb_read_capacity = 100
dynamodb_write_capacity = 50
prod-us-west-1.tfvars:
region = "us-west-1"
account_id = "0000000000"
tags = {
env = "prod"
}
dynamodb_read_capacity = 100
dynamodb_write_capacity = 50
We put in these files whatever variables might need to change for the service or feature based on environment and/or region. For instance, in a testing environment the DynamoDB capacity may be lower than in the production environment.
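The module is then applied once per region with the matching file, for example:
terraform apply -var-file=prod-us-east-1.tfvars
terraform apply -var-file=prod-us-west-1.tfvars
Note that each run needs its own state (for example via workspaces or distinct backend keys); how that is arranged is an assumption outside the scope of the samples above.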

Passing an attribute from a resource with count [0 or 1] defined to a module - possible?

Terraform 0.12.13, azurerm provider 1.35
Some background: I have a set of Azure App Services, hosted on an App Service Plan, in a Resource Group, in an Azure location. I now need to duplicate this stack in a different Azure location and add some additional resources like Traffic Managers and CNAMEs and whatnot in order to implement high availability. Architecturally we have Primary resources, and then a smaller subset of Secondary resources in the secondary region (not everything needs to be duplicated). Not every deployment will require high availability, so I need to be able to instantiate or not instantiate the Secondaries at run-time.
Because I was trying to be a good software engineer, I created modules to instantiate most of this stuff - one for the app services, one for the app service plan, one for the traffic managers, and so on.
The problem I have now is that I'm using the old count + ternary operator trick to control whether the secondary resources get created, and this is breaking because 1) count isn't allowed as a module meta-argument yet and 2) I can't figure out how to pass exported attributes from a resource controlled by the count meta-argument to a module as an input variable.
The following code may make this clearer.
resource "azurerm_resource_group" "appservices_secondary" {
name = "foo-services-ca-${local.secondary_release_stage_name}-${var.pipeline}-rg"
location = local.secondary_location
count = var.enable_high_availability ? 1 : 0
}
# Create the app service plan to host the secondary app services
module "plan_secondary" {
source = "./app_service_plan"
release_stage_name = local.secondary_release_stage_name
# HERE'S THE PROBLEMATIC LINE
appsvc_resource_group_name = azurerm_resource_group.appservices_secondary[0].name
location = local.secondary_location
pipeline = var.pipeline
}
If count resolves to 1 (var.enable_high_availability = true) then everything's fine.
If count resolves to 0 (var.enable_high_availability = false) then terraform plan fails:
Error: Invalid index
on .terraform\modules\services\secondary.tf line 25, in module "plan_secondary":
25: appsvc_resource_group_name = azurerm_resource_group.appservices_secondary[0].name
|----------------
| azurerm_resource_group.appservices_secondary is empty tuple
The given key does not identify an element in this collection value.
If I change the input variable value to azurerm_resource_group.appservices_secondary.name then it won't pass terraform validate because it recognizes that it needs [count.index].
Is there a simple way to resolve this? I'm increasingly thinking this is a design problem and I should have built the modules with count = [1..2] rather than count = 1 (primary) and count = [0 || 1] (secondary) but that will require me to rewrite all the modules and I'd like to avoid that if there's some clever workaround.
In order to resolve this you can use a conditional expression for appsvc_resource_group_name to provide some alternative value to use when the azurerm_resource_group.appservices_secondary resource has count = 0:
appsvc_resource_group_name = length(azurerm_resource_group.appservices_secondary) > 0 ? azurerm_resource_group.appservices_secondary[0].name : "default-value"
It looks like this other module is not useful in situations where high availability is disabled. In that case, you might want to define the variable as being optional with a default of null so that you can recognize when it isn't set in the module:
variable "appsvc_resource_group_name" {
type = string
default = null
}
Elsewhere in the configuration you can test var.appsvc_resource_group_name != null to see if it's enabled.
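For example, inside the module that null check can drive a count on whatever the resource group name gates; the resource type, names, and SKU below are placeholders rather than pieces of your actual modules:
resource "azurerm_app_service_plan" "secondary" {
  count               = var.appsvc_resource_group_name != null ? 1 : 0
  name                = "secondary-asp"
  location            = var.location
  resource_group_name = var.appsvc_resource_group_name

  sku {
    tier = "Standard"
    size = "S1"
  }
}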
When following the module composition patterns I'd likely instead build this as two modules, using one of the following two strategies:
One module for building a "normal" (non-HA) stack and another module for building a HA stack, and then choose which one to use in the root module of each configuration depending on whether a particular configuration needs the normal or HA mode.
Alternatively, if the HA stack is always a superset of the "normal" stack, have one module for the normal stack, and then another module that consumes the outputs of the first and describes the extended resources needed for HA mode.
Here's an example of the second of those approaches, just to illustrate what I mean by it:
module "primary_example" {
source = "./primary_example"
# whatever arguments are needed
}
module "secondary_example" {
source = "./secondary_example"
# Make sure the primary module exports as outputs all of the
# values required to extend to HA mode, and then just pass
# that whole object through to secondary.
primary = module.primary_example
}
In a configuration that doesn't need HA mode you can then omit module "secondary_example".
The module composition patterns are about decomposing the configuration into small pieces that describe one self-contained capability and then letting the root module select from those capabilities whatever subset of them are relevant and connecting them in a suitable way.
In this case, I'm treating non-HA infrastructure as one capability and then HA extensions to that infrastructure as a second capability that depends on the first, connecting them together in a dependency inversion style so that the HA extensions can just assume that a non-HA deployment already exists and that information about it will be passed in by its caller.
