Clarification on changes made outside of Terraform

I don't fully understand how Terraform handles external changes. Let's take an example:
resource "aws_instance" "ec2-test" {
ami = "ami-0d71ea30463e0ff8d"
instance_type = "t2.micro"
}
1: security group modification
The default security group has been manually replaced by another one. Terraform detects the change:
❯ terraform plan --refresh-only
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:
# aws_instance.ec2-test has changed
~ resource "aws_instance" "ec2-test" {
id = "i-5297abcc6001ce9a8"
~ security_groups = [
- "default",
+ "test",
]
tags = {}
~ vpc_security_group_ids = [
+ "sg-8231be9a95a4b1886",
- "sg-f2fc3af19c4adefe0",
]
# (28 unchanged attributes hidden)
# (7 unchanged blocks hidden)
}
No change is planned:
❯ terraform plan
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.
This seems normal, since we did not set the security_groups argument in the resource block (the desired state is aligned with the current state).
2: IAM instance profile added
An IAM role has been manually attached to the instance. Terraform also detects the change:
❯ terraform plan --refresh-only
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:
# aws_instance.ec2-test has changed
~ resource "aws_instance" "ec2-test" {
+ iam_instance_profile = "test"
id = "i-5297abcc6001ce9a8"
tags = {}
# (30 unchanged attributes hidden)
# (7 unchanged blocks hidden)
}
This is a refresh-only plan, so Terraform will not take any actions to undo these. If you were expecting these changes then you can apply this plan to record the updated values in the Terraform state without changing any remote objects.
However, Terraform also plans to revert the change:
❯ terraform plan
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# aws_instance.ec2-test will be updated in-place
~ resource "aws_instance" "ec2-test" {
- iam_instance_profile = "test" -> null
id = "i-5297abcc6001ce9a8"
tags = {}
# (30 unchanged attributes hidden)
# (7 unchanged blocks hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
I tried to figure out why these two changes don't produce the same effect. This article highlights differences depending on the argument default values: https://nedinthecloud.com/2021/12/23/terraform-apply-when-external-change-happens/
But the security_groups and iam_instance_profile arguments seem similar (optional with no default value), so why does Terraform handle these two cases differently?
(tested with Terraform v1.2.2, hashicorp/aws 4.21.0)

The handling of these situations unfortunately depends a lot on decisions made by the provider developer, since it's the provider's responsibility to decide how to reconcile any differences between the configuration and the prior state. (The "prior state" is what Terraform calls the state that results from running the "refresh" steps to synchronize with the remote system).
Terraform Core takes the values you've defined in the configuration (if any optional arguments are unset, Terraform Core uses null to represent that) and the values from the prior state and sends both of them to the provider to implement the planning step. The provider can then do whatever logic it wants as long as the planned new value for each attribute is consistent with the input. "Consistent" means that one of the following conditions is true:
1. The planned value is equal to the value set in the configuration.
   This is the most straightforward situation to follow, but there are various reasons why a provider might not do this, which I'll discuss later.
2. The planned value is equal to the value stored in the prior state.
   This represents situations where the value in the prior state is functionally equivalent to the value in the configuration but not exactly equal, such as if the remote system treats a particular string as case insensitive and the two values differ only in case.
3. The provider indicated in its schema that this is a value that can be decided by the remote system, such as an object ID that's generated by the remote system during the apply step, and the corresponding value in the configuration was null to represent the argument not being set at all.
   In this case the provider gets to choose whichever value it wants, because the configuration says nothing about the attribute and thus the remote system has authority on what the value is.
From what you've described, it sounds like in your first example the provider used approach number 3, while in the second example the provider used approach number 1.
Since I am not the developer of this provider I cannot say for certain why the developers made the decisions they did here, but one common reason why a provider developer might choose option three is for situations where a particular value can potentially be set by multiple different resource types, in which case the provider might be designed to treat an absent argument in the configuration as meaning "keep whatever the remote system already has", whereas a non-null argument in the configuration would mean "set the remote system to use this given value".
For iam_instance_profile it seems like the provider considers null to be a valid configuration value for that argument and uses it to represent the EC2 instance having no associated instance profile at all. For security_groups and vpc_security_group_ids, though, the provider treats leaving the argument set to null in the configuration (or omitting it, which is equivalent) as "keep whatever the remote system has", and so Terraform just acknowledges the change but doesn't propose to undo it.
Based on my knowledge of EC2, I can guess that the reason here is probably that the underlying EC2 API has two different ways to set security groups: you can either use the legacy EC2-Classic style of specifying a security group by name (the security_groups argument in the provider), or the newer EC2-VPC style of specifying it by ID (the vpc_security_group_ids argument in the provider). Whichever of the two you choose, the remote system will presumably populate the other one automatically, and therefore without this special exception in the provider it would be impossible for any configuration to converge unless you set both security_groups and vpc_security_group_ids and made them both refer to the same security groups. To avoid that, I think the provider just lets whichever one of the two you left unset automatically track the remote system, which has the side-effect that the provider cannot automatically "fix" changes made outside of Terraform unless you set at least one of them so the provider can see what the correct value ought to be.
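For example, if you want Terraform to revert out-of-band security group changes on this instance, you could pin the groups explicitly. A minimal sketch, assuming an aws_security_group.test resource is defined elsewhere in the configuration:
resource "aws_instance" "ec2-test" {
  ami           = "ami-0d71ea30463e0ff8d"
  instance_type = "t2.micro"

  # With this argument set, a manual change to the instance's security
  # groups would show up in a plan as an update back to this value.
  vpc_security_group_ids = [aws_security_group.test.id]
}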
Terraform's ability to reconcile changes in the remote system by resetting back to match the configuration is a "best effort" mechanism because in many cases that requirement comes into conflict with other requirements, and provider developers must therefore decide on a case-by-case basis what to prioritize. Although Terraform does try its best to tell you about changes outside of Terraform and to propose fixing them where possible, the only certain way to keep your Terraform configuration and your remote system synchronized is to prevent anyone from making changes outside of Terraform, for example using IAM policies in AWS.

Related

Shall we include ForceNew or throw an error during "terraform apply" for a new TF resource?

Context: Implementing a Terraform Provider via TF Provider SDKv2, following an official tutorial.
For example, AWS launch configurations cannot be updated in place, so all of the schema elements in the corresponding Terraform Provider resource aws_launch_configuration are marked as ForceNew: true. This behavior instructs Terraform to first destroy and then recreate the resource if any of the attributes change in the configuration, as opposed to trying to update the existing resource.
TF tutorial suggests we should add ForceNew: true for every non-updatable field like:
"base_image": {
Type: schema.TypeString,
Required: true,
ForceNew: true,
},
resource "example_instance" "ex" {
name = "bastion host"
base_image = "ubuntu_17.10" # base_image updates are not supported
}
However, one might run into the following issue.
Let's consider an "important" resource foo_db_instance (a DB instance that should be deleted / recreated only in exceptional scenarios) that has a name attribute:
resource "foo_db_instance" "ex" {
name = "bar" # name updates are not supported
...
}
However, its underlying API was written in a weird way and it doesn't support updates for the name attribute. There are 2 options:
1. Follow the approach of the tutorial and add ForceNew: true. Then, if a user doesn't pay attention to the terraform plan output, Terraform might recreate foo_db_instance.ex when the name attribute is changed by accident, which will cause an outage.
2. Don't follow the approach from the tutorial and don't add ForceNew: true. As a result, terraform plan will not report an error and will make it look like the update is possible. However, when running terraform apply, a user will run into an error if we add custom code to resourceUpdate() like this:
func resourceUpdate(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
  if d.HasChanges("name") {
    return diag.Errorf("error updating foo_db_instance: name attribute updates are not supported")
  }
  ...
}
There are 2 disadvantages of this approach:
- a non-failing terraform plan output
- we might need some hack to restore the Terraform state, e.g. by calling d.Set("name", oldValue)
Which approach is preferable?
I know there's the prevent_destroy = true lifecycle attribute, but it seems like it won't prevent this specific scenario (it only prevents an accidental terraform destroy).
The most typical answer is to follow your first option, and then allow Terraform to report in its UI that the change requires replacement and allow the user to decide how to proceed.
It is true that if someone does not read the plan output then they can potentially make a change they did not intend to make, but in that case the user is not making use of the specific mechanism that Terraform provides to help users avoid making undesirable changes.
You mentioned prevent_destroy = true, and indeed this is a setting that's relevant to this situation; in fact it is exactly what that option is for: it will cause Terraform to raise an error if the plan includes a "replace" action for the resource that was annotated with that setting, thereby preventing the user from accepting the plan and thus from destroying the object.
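As a sketch, reusing the resource from the question above, that would look like this:
resource "foo_db_instance" "ex" {
  name = "bar"

  lifecycle {
    # Terraform returns an error for any plan that would destroy (and
    # therefore also replace) this object.
    prevent_destroy = true
  }
}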
Some users also wrap Terraform in automation which will perform more complicated custom policy checks on the generated plan, either achieving a similar effect as prevent_destroy (blocking the operation altogether) or alternatively just requiring an additional confirmation to help ensure that the operator is aware that something unusual is happening. For example, in Terraform Cloud a programmatic policy can report a "soft failure" which causes an additional confirmation step that might be approvable only by a smaller subset of operators who are better equipped to understand the impact of what's being proposed.
It is in principle possible to write logic in either the CustomizeDiff function (which runs during planning) or the Update function (which runs during the apply step) to return an error in this or any other situation you can write logic for in the Go programming language. Of these two options I would say that CustomizeDiff would be preferable since that would then prevent creating a plan at all, rather than allowing the creation of a plan and then failing partway through the apply step, when some other upstream changes may have already been applied.
However, to do either of these would be inconsistent with the usual behavior users expect for Terraform providers. The intended model is for a Terraform provider to describe the effect of a change as accurately as possible and then allow the operator to make the final decision about whether the proposed change is acceptable, and to cancel the plan and choose another strategy if not.

Provider requires dynamic output of resource: what to do?

I am successfully creating a vmc_sddc resource. One of the attributes returned from that is "nsxt_reverse_proxy_url".
I need to use the "nsxt_reverse_proxy_url" value for another provider's (nsxt) input.
Unfortunately, Terraform rejects this construct, saying "host must be provided". In other words, the dynamic value is not accepted as input.
Question: Is there any way to use the dynamically-created value from a resource as input to another provider?
Here is the code:
resource "vmc_sddc" "harpoon_sddc" {
sddc_name = var.sddc_name
vpc_cidr = var.vpc_cidr
num_host = 1
provider_type = "AWS"
region = data.vmc_customer_subnets.my_subnets.region
vxlan_subnet = var.vxlan_subnet
delay_account_link = false
skip_creating_vxlan = false
sso_domain = "vmc.local"
deployment_type = "SingleAZ"
sddc_type = "1NODE"
}
provider "nsxt" {
host = vmc_sddc.harpoon_sddc.nsxt_reverse_proxy_url // DOES NOT WORK
vmc_token = var.api_token
allow_unverified_ssl = true
enforcement_point = "vmc-enforcementpoint"
}
Here is the error message from Terraform:
╷
│ Error: host must be provided
│
│ with provider["registry.terraform.io/vmware/nsxt"],
│ on main.tf line 55, in provider "nsxt":
│ 55: provider "nsxt" {
│
Thank you
As you've found, some providers cannot handle unknown values as part of their configuration during planning, and so it doesn't work to dynamically configure them based on objects being created in the same run in the way you tried.
In situations like this, there are two main options:
On your first run you can use terraform apply -target=vmc_sddc.harpoon_sddc to ask Terraform to focus only on the objects needed to create that one object, excluding anything related to the nsxt provider. Once that apply completes successfully you can then run terraform apply as normal and Terraform will already know the value of vmc_sddc.harpoon_sddc.nsxt_reverse_proxy_url so the provider configuration can succeed.
This is typically the best choice for a long-lived configuration that you don't expect to be recreating often, since you can just do this one-off extra step once during initial creation and then use Terraform as normal after that, as long as you never need to recreate vmc_sddc.harpoon_sddc.
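Concretely, the one-off sequence would look something like this:
❯ terraform apply -target=vmc_sddc.harpoon_sddc   # first run: create only the SDDC
❯ terraform apply                                  # subsequent runs: everything, as normal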
You can split the configuration into two separate configurations handling the different layers. The first layer would be responsible for the "vmc" level of abstraction, allowing you to terraform apply that in isolation, and then the second configuration would be responsible for the "nsxt" level of abstraction building on top, which you can run terraform apply on once you've got the first configuration running.
This is a variant of the first option where the separation between the first and second steps is explicit in the configuration structure itself, which means that you don't need to add any extra options when you run Terraform but you do now need to manage two configurations. This approach is therefore better than the first only if you will be routinely destroying and re-creating these objects, so that you can make it explicit in the code that this is a two-step process.
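As a sketch of that split, assuming the first configuration stores its state in an S3 backend and exports the URL as an output value (both details are assumptions here, not from the question), the second configuration could read it like this:
# In the first ("vmc") configuration:
output "nsxt_reverse_proxy_url" {
  value = vmc_sddc.harpoon_sddc.nsxt_reverse_proxy_url
}

# In the second ("nsxt") configuration:
data "terraform_remote_state" "vmc" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state" # hypothetical bucket, key, and region
    key    = "vmc/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "nsxt" {
  host                 = data.terraform_remote_state.vmc.outputs.nsxt_reverse_proxy_url
  vmc_token            = var.api_token
  allow_unverified_ssl = true
  enforcement_point    = "vmc-enforcementpoint"
}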
In principle some providers can be designed to tolerate unknown values as input and do offline planning in that case, but it isn't technically possible for all providers because sometimes there really isn't any way to create a meaningful plan without connecting to the remote system to ask it questions. I'm not familiar with this provider so I don't know if it's requiring a hostname for a strong technical reason or just because the provider developers didn't consider the possibility that you might use it in this way, and so if your knowledge of nsxt leads you to think that it might be possible in principle for it to do offline planning then a third option would be to ask the developers if it would be feasible to defer connecting to the given host until the apply phase, in which case you wouldn't need to do any extra steps like the above.

Terraform ignore_changes for resource output

Is there any way to ignore changes to resource output? Or tell Terraform not to refresh it?
A Terraform resource I'm using returns a state_info output (map of string) that can be modified by processes outside of Terraform. I want to ignore these changes. Is this possible?
resource "aiven_vpc_peering_connection" "this" {
lifecycle {
ignore_changes = [
state_info
]
}
}
state_info is getting set to null outside of Terraform. I'm using state_info in other Terraform resources, and it's failing with "aiven_vpc_peering_connection.this.state_info is empty map of string" on subsequent terraform plan runs.
The ignore_changes mechanism instructs Terraform to disregard a particular argument when it's comparing values in the configuration with values in the prior state snapshot, so it doesn't have any effect for attributes that are only saved in the prior state due to them not being explicitly configurable.
It sounds like what you want is instead to have Terraform disregard a particular argument when it's updating the prior state to match remote objects (the "refresh" step), so that the result would end up being a mixture of new content from the remote API and content previously saved in the state. Terraform has no mechanism to achieve that: the values stored in the state after refreshing are exactly what the provider returned. This guarantee can be important for some resource types because retaining an old value for one argument while allowing others to change could make the result inconsistent, if e.g. the same information is presented in multiple different ways.
The closest you can get to what you described is to use the value as exported by the upstream resource and then specify ignore_changes on the resource where you ultimately use that value, telling Terraform to ignore the changes in the upstream object when comparing the downstream object with its configuration.
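A minimal sketch of that pattern, using a hypothetical downstream resource type and argument (example_thing and metadata are placeholders, not from the question):
resource "example_thing" "downstream" {
  # The value is taken from the upstream resource when this object is created.
  metadata = aiven_vpc_peering_connection.this.state_info

  lifecycle {
    # Differences between this argument's configuration and its recorded value
    # are ignored, so later changes to state_info won't update this object.
    ignore_changes = [
      metadata
    ]
  }
}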

In Terraform 0.12, how to skip creation of resource, if resource name already exists?

I am using Terraform version 0.12. I have a requirement to skip resource creation if resource with the same name already exists.
I did the following for this:
Read the list of custom images,
data "ibm_is_images" "custom_images" {
}
Check if image already exists,
locals {
  custom_vsi_image = contains([for x in data.ibm_is_images.custom_images.images : "true" if x.visibility == "private" && x.name == var.vnf_vpc_image_name], "true")
}
output "abc" {
  value = "${local.custom_vsi_image}"
}
Create the image only if it does not already exist (i.e. custom_vsi_image is false).
resource "ibm_is_image" "custom_image" {
count = "${local.custom_vsi_image == true ? 0 : 1}"
depends_on = ["data.ibm_is_images.custom_images"]
href = "${local.image_url}"
name = "${var.vnf_vpc_image_name}"
operating_system = "centos-7-amd64"
timeouts {
create = "30m"
delete = "10m"
}
}
This works fine the first time with "terraform apply": it finds that the image does not exist, so it creates the image.
When I run "terraform apply" a second time, it deletes the resource "custom_image" that was created above. Any idea why it deletes the resource when run for the 2nd time?
Also, how do I create a resource based on some condition (like only when it does not exist)?
In Terraform, you're required to decide explicitly what system is responsible for the management of a particular object, and conversely which systems are just consuming an existing object. There is no way to make that decision dynamically, because that would make the result non-deterministic and -- for objects managed by Terraform -- make it unclear which configuration's terraform destroy would destroy the object.
Indeed, that non-determinism is why you're seeing Terraform in your situation flop between trying to create and then trying to delete the resource: you've told Terraform to only manage that object if it doesn't already exist, and so the first time you run Terraform after it exists Terraform will see that the object is no longer managed and so it will plan to destroy it.
If your goal is to manage everything with Terraform, an important design task is to decide how object dependencies flow within and between Terraform configurations. In your case, it seems like there is a producer/consumer relationship between a system that manages images (which may or may not be a Terraform configuration) and one or more Terraform configurations that consume existing images.
If the images are managed by Terraform then that suggests either that your main Terraform configuration should assume the image does not exist and unconditionally create it -- if your decision is that the image is owned by the same system as what consumes it -- or it should assume that the image does already exist and retrieve the information about it using a data block.
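For example, the consuming configuration could look up the existing image with a data block. This is a sketch; I'm assuming the IBM provider's ibm_is_image data source accepts a name argument:
data "ibm_is_image" "custom_image" {
  name = var.vnf_vpc_image_name
}

# Other resources can then refer to data.ibm_is_image.custom_image.id
# without this configuration having to manage the image itself.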
A possible solution here is to write a separate Terraform configuration that manages the image and then only apply that configuration in situations where that object isn't expected to already exist. Then your configuration that consumes the existing image can just assume it exists without caring about whether it was created by the other Terraform configuration or not.
There's a longer overview of this situation in the Terraform documentation section Module Composition, and in particular the sub-section Conditional Creation of Objects. That guide is focused on interactions between modules in a single configuration, but the same underlying principles apply to dependencies between configurations (via data sources) too.

No partial configuration for terraform_remote_state backend?

Partial Configuration allows us to specify backend configuration from the command line:
terraform init \
  -backend-config="region=${AWS_DEFAULT_REGION}" \
  -backend-config="bucket=${TF_VAR_BACKEND_BUCKET}" \
  -backend-config="key=${TF_VAR_BACKEND_KEY}" \
  -backend-config="encrypt=true"
I thought the same could be used for terraform_remote_state:
data "terraform_remote_state" "vpc" {
backend = "s3"
config { }
}
However, it causes the following error:
Error: Error refreshing state: 1 error(s) occurred:
* data.terraform_remote_state.vpc: 1 error(s) occurred:
* data.terraform_remote_state.vpc: data.terraform_remote_state.vpc: InvalidParameter: 1 validation error(s) found.
- minimum field size of 1, GetObjectInput.Key.
It looks like terraform_remote_state requires explicit configuration, as indicated in Terraform terraform_remote_state Partial Configuration:
data "terraform_remote_state" "vpc" {
backend = "s3"
config {
encrypt = "true"
bucket = "${var.BACKEND_BUCKET}"
key = "${var.BACKEND_KEY}"
}
}
Question
Is there a way to use the partial configuration or is it current limitation of Terraform not being able to use partial configuration for terraform_remote_state?
Partial configuration only applies to the initialization of early parameters, before any variables are evaluated.
The concept does not apply to "normal" resources (and in this sense, a data block is "normal"). However, since you hold your secrets in the corresponding TF_VAR_* environment variables, explicitly stating those seems better than implicitly relying on their presence. The code is clearer, and all used values are stated in the code. This is good practice.
So the question is: why would you want to avoid explicitly stating the required variables?
Addendum:
As you indicated in the comments, you want a single location to hold each piece of information.
As you are using environment variables in your initialization process (via --backend-config parameter) and in your code (via variable access to environment variables), you are effectively using one single place to manage the information for both entries!
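As a sketch of that pattern, the same environment variables feed both places (the variable names here match the question):
# Populated automatically from TF_VAR_BACKEND_BUCKET and TF_VAR_BACKEND_KEY,
# i.e. the same environment variables passed to terraform init above.
variable "BACKEND_BUCKET" {}
variable "BACKEND_KEY" {}

data "terraform_remote_state" "vpc" {
  backend = "s3"
  config {
    encrypt = "true"
    bucket  = "${var.BACKEND_BUCKET}"
    key     = "${var.BACKEND_KEY}"
  }
}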
(Note that the possibility of omitting the values in the backend configuration is merely a workaround, due to the order in which Terraform processes the files.)
Please also reconsider the difference between the backend (that is, where Terraform saves its state) and terraform_remote_state (which is just a normal data source that gives information about any remote state you might desire, even state living on completely separate cloud instances, accessed with potentially different credentials). Thus, specifying the credentials explicitly as those used by the backend is a special use case.
