How to avoid "Objects have changed outside of Terraform"? - terraform

Recently upgraded my Terraform project to AWS provider 3.74.0 and TF 1.1.4 (from much older versions).
I'm suddenly getting this autoscaling schedule reporting external changes:
resource "aws_autoscaling_schedule" "api-svc-tst-down-schedule" {
scheduled_action_name = "api-svc-tst-down-schedule"
min_size = 0
max_size = 1
desired_capacity = 0
// Minute Hour DayOfMonth Month DayOfWeek
recurrence = "0 13 * * *"
autoscaling_group_name = aws_autoscaling_group.api-svc-tst-asg.name
lifecycle {
ignore_changes = [start_time]
}
}
The plan command is now reporting:
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the
last "terraform apply":
# aws_autoscaling_schedule.api-svc-tst-down-schedule has changed
~ resource "aws_autoscaling_schedule" "api-svc-tst-down-schedule" {
id = "api-svc-tst-down-schedule"
~ start_time = "2022-01-31T13:00:00Z" -> "2022-02-01T13:00:00Z"
# (7 unchanged attributes hidden)
}
If I apply the plan, it doesn't appear that TF changes the ASG (I'm assuming it just updates its state file) and the notification goes away until the next day.
I note that the AWS console does show that the Scheduled action has a Start time, which seems to be being set by AWS.
I tried adding start_time to ignored_changes but it didn't seem to make a difference, still reported as externally changed.
Is this a known issue with Terraform (I'm not seeing anything via googling)?
How can I prevent TF from being marked as externally changed?
Edit: I also tried setting the start_time attribute as suggested in the comments. But the detected changes warning came back the next day.
Edit 2: I also tried deleting and re-adding the resource via Terraform, but it still gets marked as changed the next day.

This undesirable behavior was an intentional change introduced in Terraform version 0.15.4.
It cannot currently be avoided. The only workaround is that all team members (and tooling) must be educated to ignore "expected drift".
Note that this "expected drift" behavior is not limited to just aws_autoscaling_schedule resources, or even just the AWS provider. The issue happens on many different platforms/types for any resource where the cloud vendor updates the attribute after the resource is created.
Many resources will report drift immediately after being created - often you can get rid of the report by immediately doing an apply or refresh to update the TF state and as long as AWS doesn't make changes to those attributes, you won't see the resource reported as changed again.
Other resource attributes (like aws_autoscaling_schedule.start_time) get updated by the cloud vendor regularly. These types of resources will intermittently report "Objects have changed outside of Terraform", whenever you run plan.
There is a locked open issue to track: https://github.com/hashicorp/terraform/issues/28803.
Note that the issue is locked because Hashicorp got tired of people telling them how negatively this affects their teams.

Related

Shall TF Provider delete resources from state if the resource is in "DELETING" state (similarly to 404)?

Context: I'm creating a new TF provider.
TF official docs say that
When you create something in Terraform but delete it manually, Terraform should gracefully handle it. If the API returns an error when the resource doesn't exist, the read function should check to see if the resource is available first. If the resource isn't available, the function should set the ID to an empty string so Terraform "destroys" the resource in state. The following code snippet is an example of how this can be implemented; you do not need to add this to your configuration for this tutorial.
if resourceDoesntExist {
d.SetID("")
return
}
It's pretty clear when resourceDoesntExist := response.code == 404 but what about the case where the resource is in DELETING state (which means that the resource is going to be removed in like 30 minutes and at that point GET request will start returning 404).
Shall it be treated as 404 too? What about the corresponding data source, shall it return an error?

Understanding the difference between ~ and -

I'm working on importing one of our rds instance into terraform.
Terraform plan shows ~ and -
~ maintenance_window = "sat:06:10-sat:06:40" -> (known after apply)
- max_allocated_storage = 0 -> null
Both of these values are not defined in the configuration, I would like to understand, why it is showing - and Do we configure null variables also in the module?
Using Terraform 0.12.28
Basically:
~ the value is in the state and is changing after the plan
- the value is in the state and you are trying to remove it (null value)
maintenance_window is showing ~ because its value is going to change, and in your specific case its value is computed and hence known after applying the changes. From the docs:
maintenance_window - (Optional) The window to perform maintenance in. Syntax: "ddd:hh24:mi-ddd:hh24:mi". Eg: "Mon:00:00-Mon:03:00". See RDS Maintenance Window docs for more information.
If that window is fine for you, you can specify that as an argument or let Terraform change it to its default value.
max_allocated_storage is showing - because when you imported the resource in the state, it imported all Terraform known arguments, but you are not specifying that one. In particular from the docs:
max_allocated_storage - (Optional) When configured, the upper limit to which Amazon RDS can automatically scale the storage of the DB instance. Configuring this will automatically ignore differences to allocated_storage. Must be greater than or equal to allocated_storage or 0 to disable Storage Autoscaling.
In this case you can set max_allocated_storage = 0 in order to not show any change in the plan for that argument

terraform 0.13.5 resources overwrite each other on consecutive calls

I am using terraform 0.13.5 to create aws_iam resources
I have 2 terraform resources as follows
module "calls_aws_iam_policy_attachment" {
# This calls an external module to
# which among other things creates a policy attachment
# resource attaching the roles to the policy
source = ""
name = "xoyo"
roles = ["rolex", "roley"]
policy_arn = "POLICY_NAME"
}
resource "aws_iam_policy_attachment" "policies_attached" {
# This creates a policy attachment resource attaching the roles to the policy
# The roles here are a superset of the roles in the above module
roles = ["role1", "role2", "rolex", "roley"]
policy_arn = "POLICY_NAME"
name = "NAME"
# I was hoping that adding the depends on block here would mean this
# resource is always created after the above module
depends_on = [ module.calls_aws_iam_policy_attachment ]
}
The first module creates a policy and attaches some roles. I cannot edit this module
The second resource attaches more roles to the same policy along with other policies
the second resource depends_on the first resource, so I would expect that the policy attachements of the second resource always overwrite those of the first resource
In reality, the policy attachments in each resource overwrite each other on each consecutive build. So that on the first build, the second resources attachments are applied and on the second build the first resources attachements are applied and so on and so forth.
Can someone tell me why this is happening? Does depends_on not work for resources that overwrite each other?
Is there an easy fix without combining both my resources together into the same resource?
As to why this is happening:
during the first run terraform deploys the first resources, then the second ones - this order is due to the depends_on relation (the next steps work regardless of any depends_on). The second ones overwrite the first ones
during the second deploy terraform looks at what needs to be done:
the first ones are missing (were overwritten), they need to be created
the second ones are fine, terraform ignores them for this update
now only the first ones will be created and they will overwrite the second ones
during the third run the same happens but the exact other way around, seconds are missing, first are ignored, second overwrite first
repeat as often as you want, you will never end up with a stable deployment.
Solution: do not specify conflicting things in terraform. Terraform is supposed to be a description of what the infrastructure should look like - and saying "this resource should only have property A" and "this resource should only have property B" is contradictory, terraform will not be able to handle this gracefully.
What you should do specifically: do not use aws_iam_policy_attachment, basically ever, look at the big red box in the docs. Use multiple aws_iam_role_policy_attachment instead, they are additive, they will not overwrite each other.

Terraform doesn't seem to pick up manual changes

I have a very frustrating Terraform issue, I made some changes to my terraform script which failed when I applied the plan. I've gone through a bunch of machinations and probably made the situation worse as I ended up manually deleting a bunch of AWS resources in trying to resolve this.
So now I am unable to use Terraform at all (refresh, plan, destroy) all get the same error.
The Situation
I have a list of Fargate services, and a set of maps which correlate different features of the fargate services such as the "Target Group" for the load balancer (I've provided some code below). The problem appears to be that Terraform is not picking up that these resources have been manually deleted or is somehow getting confused because they don't exist. At this point if I run a refresh, plan or destroy I get an error stating that a specific list is empty, even though it isn't (or should not be).
In the failed run I added a new service to the list below along with a new url (see code below)
Objective
At this point I would settle for destroying the entire environment (its my dev environment), however; ideally I want to just get the system working such that Terraform will detect the changes and work properly.
Terraform Script is Valid
I have reverted my Terraform scripts back to the last known good version. I have run the good version against our staging environment and it works fine.
Configuration Info
MacOS Mojave 10.14.6 (18G103)
Terraform v0.12.24.
provider.archive v1.3.0
provider.aws v2.57.0
provider.random v2.2.1
provider.template v2.1.2
The Terraform state file is being stored in a S3 bucket, and terraform init --reconfigure has been called.
What I've done
I was originally getting a similar error but it was in a different location, after many hours Googling and trying stuff (which I didn't write down) I decided to manually remove the AWS resources associated with the problematic code (the ALB, Target Groups, security groups)
Example Terraform Script
Unfortunately I can't post the actual script as it is private, but I've posted what I believe is the pertinent parts but have redacted some info. The reason I mention this is that any syntax type error you might see would be caused by this redaction, as I stated above the script works fine when run in our staging environment.
globalvars.tf
In the root directory. In the case of the failed Terraform run I added a new name to the service_names (edd = "edd") list (I added as the first element). In the service_name_map_2_url I added the new entry (edd = "edd") as the last entry. I'm not sure if the fact that I added these elements in different 'order' is the problem, although it really shouldn't since I access the map via the name and not by index
variable "service_names" {
type = list(string)
description = "This is a list/array of the images/services for the cluster"
default = [
"alert",
"alert-config"
]
}
variable service_name_map_2_url {
type = map(string)
description = "This map contains the base URL used for the service"
default = {
alert = "alert"
alert-config = "alert-config"
}
}
alb.tf
In modules/alb. In this module we create an ALB and then a target group for each service, which looks like this. The items from globalvars.tf are passed into this script
locals {
numberOfServices = length(var.service_names)
}
resource "aws_alb" "orchestration_alb" {
name = "orchestration-alb"
subnets = var.public_subnet_ids
security_groups = [var.alb_sg_id]
tags = {
environment = var.environment
group = var.tag_group_name
app = var.tag_app_name
contact = var.tag_contact_email
}
}
resource "aws_alb_target_group" "orchestration_tg" {
count = local.numberOfServices
name = "${var.service_names[count.index]}-tg"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
target_type = "ip"
deregistration_delay = 60
tags = {
environment = var.environment
group = var.tag_group_name
app = var.tag_app_name
contact = var.tag_contact_email
}
health_check {
path = "/${var.service_name_map_2_url[var.service_names[count.index]]}/health"
port = var.app_port
protocol = "HTTP"
healthy_threshold = 2
unhealthy_threshold = 5
interval = 30
timeout = 5
matcher = "200-308"
}
}
output.tf
This is the output of the alb.tf, other things are outputted but this is the one that matters for this issue
output "target_group_arn_suffix" {
value = aws_alb_target_group.orchestration_tg.*.arn_suffix
}
cloudwatch.tf
In modules/cloudwatch. I attempt to create a dashboard
data "template_file" "Dashboard" {
template = file("${path.module}/dashboard.json.template")
vars = {
...
alert-tg = var.target_group_arn_suffix[0]
alert-config-tg = var.target_group_arn_suffix[1]
edd-cluster-name = var.ecs_cluster_name
alb-arn-suffix = var.alb-arn-suffix
}
}
Error
When I run terraform refresh (or plan or destroy) I get the following error (I get the same error for alert-config as well)
Error: Invalid index
on modules/cloudwatch/cloudwatch.tf line 146, in data "template_file" "Dashboard":
146: alert-tg = var.target_group_arn_suffix[0]
|----------------
| var.target_group_arn_suffix is empty list of string
The given key does not identify an element in this collection value.
AWS Environment
I have manually deleted the ALB. Dashboard and all Target Groups. I would expect (and this has worked in the past) that Terraform would detect this and update its state file appropriately such that when running a plan it would know it has to create the ALB and target groups.
Thank you
Terraform trusts its state as the single source of truth. Using Terraform in the presence of manual change is possible, but problematic.
If you manually remove infrastructure, you need to run terraform state rm [resource path] on the manually removed resource.
Gruntwork has what they call The Golden Rule of Terraform:
The master branch of the live repository should be a 1:1 representation of what’s actually deployed in production.

How to set timeout for terraform apply?

Many a times I get '[TRACE] dag/walk: vertex ' when I apply terraform apply on certain tf. I would like to set timeout instead of going on forever.
Thanks
Several examples - https://github.com/hashicorp/terraform/issues/16458
https://github.com/terraform-providers/terraform-provider-aws/issues/2068
But all of them focus on specific solution. I dont want inifinite loops whatsoever reason I just want a flag for apply that would stop trying after certain time. Iam thinking of an external command to kill it but I want to see if there is actual terraform solution before I implement it.
Today Terraform SDK have special fields for resources timeouts. Official documentation here.
For example, you can add timeouts for some operations in resource description:
resource "<resource_name>" "<resource_name>" {
...
timeouts {
create = "1h30m",
update = "2h",
delete = "20m"
}
}

Resources