How to set a timeout for terraform apply?

Many times when I run terraform apply on certain configurations, it hangs forever, endlessly logging '[TRACE] dag/walk: vertex ...' messages. I would like to set a timeout instead of letting it run indefinitely.
Thanks
There are several examples in the issue trackers:
https://github.com/hashicorp/terraform/issues/16458
https://github.com/terraform-providers/terraform-provider-aws/issues/2068
But all of them focus on a specific solution. I don't want infinite loops for whatever reason; I just want a flag for apply that would stop trying after a certain time. I am thinking of using an external command to kill it, but I want to see if there is an actual Terraform solution before I implement that.

Today the Terraform SDK has special fields for resource timeouts; see the official documentation on operation timeouts.
For example, you can add timeouts for some operations in a resource block:
resource "<resource_name>" "<resource_name>" {
...
timeouts {
create = "1h30m",
update = "2h",
delete = "20m"
}
}
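For instance, on an AWS RDS instance (a hypothetical illustration; which operations are supported, and whether the block exists at all, depends on the provider and resource type):

# A sketch of per-resource operation timeouts; the values are arbitrary examples.
resource "aws_db_instance" "example" {
  # ... other arguments ...

  timeouts {
    create = "60m"
    update = "80m"
    delete = "2h"
  }
}

Note that these timeouts are defined per resource type by the provider, and exceeding one fails only that resource's operation. As far as I know there is still no global timeout flag on terraform apply itself, so an external watchdog that kills the process remains the only way to bound a whole run.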

Related

Shall TF Provider delete resources from state if the resource is in "DELETING" state (similarly to 404)?

Context: I'm creating a new TF provider.
TF official docs say that
When you create something in Terraform but delete it manually, Terraform should gracefully handle it. If the API returns an error when the resource doesn't exist, the read function should check to see if the resource is available first. If the resource isn't available, the function should set the ID to an empty string so Terraform "destroys" the resource in state. The following code snippet is an example of how this can be implemented; you do not need to add this to your configuration for this tutorial.
if resourceDoesntExist {
    d.SetId("") // removes the resource from state
    return
}
It's pretty clear when resourceDoesntExist := response.code == 404, but what about the case where the resource is in a DELETING state (meaning the resource is going to be removed in, say, 30 minutes, at which point GET requests will start returning 404)?
Shall it be treated as 404 too? What about the corresponding data source, shall it return an error?
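To make the trade-off concrete, here is a minimal sketch of a read function that treats DELETING like a 404. Everything API-side here is a hypothetical stand-in (the widget type, apiClient, and GetWidget are not a real SDK or vendor API):

package provider

import (
    "context"

    "github.com/hashicorp/terraform-plugin-sdk/v2/diag"
    "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Hypothetical stand-ins for the real API client and its response type.
type widget struct{ Status string }
type apiClient struct{}

func (c *apiClient) GetWidget(ctx context.Context, id string) (*widget, bool, error) {
    return &widget{}, false, nil // returns (widget, notFound, err)
}

func resourceWidgetRead(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
    w, notFound, err := meta.(*apiClient).GetWidget(ctx, d.Id())
    if err != nil {
        return diag.FromErr(err)
    }
    // The documented 404 case, plus the open question: is DELETING "gone enough"?
    if notFound || w.Status == "DELETING" {
        d.SetId("") // tells Terraform the resource no longer exists
        return nil
    }
    // ... otherwise populate d from w ...
    return nil
}

Treating DELETING like 404 avoids a spurious diff on the next plan; the cost is that if the remote deletion later fails, Terraform has already forgotten the resource.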

How to avoid "Objects have changed outside of Terraform"?

Recently upgraded my Terraform project to AWS provider 3.74.0 and TF 1.1.4 (from much older versions).
I'm suddenly getting this autoscaling schedule reporting external changes:
resource "aws_autoscaling_schedule" "api-svc-tst-down-schedule" {
scheduled_action_name = "api-svc-tst-down-schedule"
min_size = 0
max_size = 1
desired_capacity = 0
// Minute Hour DayOfMonth Month DayOfWeek
recurrence = "0 13 * * *"
autoscaling_group_name = aws_autoscaling_group.api-svc-tst-asg.name
lifecycle {
ignore_changes = [start_time]
}
}
The plan command is now reporting:
Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the
last "terraform apply":

  # aws_autoscaling_schedule.api-svc-tst-down-schedule has changed
  ~ resource "aws_autoscaling_schedule" "api-svc-tst-down-schedule" {
        id         = "api-svc-tst-down-schedule"
      ~ start_time = "2022-01-31T13:00:00Z" -> "2022-02-01T13:00:00Z"
        # (7 unchanged attributes hidden)
    }
If I apply the plan, it doesn't appear that TF changes the ASG (I'm assuming it just updates its state file) and the notification goes away until the next day.
I note that the AWS console does show that the Scheduled action has a Start time, which seems to be being set by AWS.
I tried adding start_time to ignore_changes (as shown above), but it didn't seem to make a difference; the resource is still reported as externally changed.
Is this a known issue with Terraform (I'm not seeing anything via googling)?
How can I prevent this resource from being reported as externally changed?
Edit: I also tried setting the start_time attribute as suggested in the comments. But the detected changes warning came back the next day.
Edit 2: I also tried deleting and re-adding the resource via Terraform, but it still gets marked as changed the next day.
This undesirable behavior was an intentional change introduced in Terraform version 0.15.4.
It cannot currently be avoided. The only workaround is that all team members (and tooling) must be educated to ignore "expected drift".
Note that this "expected drift" behavior is not limited to just aws_autoscaling_schedule resources, or even just the AWS provider. The issue happens on many different platforms/types for any resource where the cloud vendor updates the attribute after the resource is created.
Many resources will report drift immediately after being created - often you can get rid of the report by immediately doing an apply or refresh to update the TF state and as long as AWS doesn't make changes to those attributes, you won't see the resource reported as changed again.
Other resource attributes (like aws_autoscaling_schedule.start_time) get updated by the cloud vendor regularly. These types of resources will intermittently report "Objects have changed outside of Terraform", whenever you run plan.
There is a locked open issue to track: https://github.com/hashicorp/terraform/issues/28803.
Note that the issue is locked because Hashicorp got tired of people telling them how negatively this affects their teams.

Terraform providers - how would you represent a resource that doesn't have clearly defined CRUD operations?

For work I'm learning Go and Terraform. I read in their tutorial how the different contexts are defined, but I'm not clear on exactly when these contexts are called and what triggers them.
From looking at the Hashicups example it looks like when you put this:
resource "hashicups_order" "new" {
items {
coffee {
id = 3
}
quantity = 2
}
items {
coffee {
id = 2
}
quantity = 2
}
}
in your Terraform file, Terraform takes the hashicups_order type, strips the hashicups provider prefix, and looks up a resource called order in the provider. The order resource provides the following contexts:
import "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"

func resourceOrder() *schema.Resource {
    return &schema.Resource{
        CreateContext: resourceOrderCreate,
        ReadContext:   resourceOrderRead,
        UpdateContext: resourceOrderUpdate,
        DeleteContext: resourceOrderDelete,
        // (Schema and other fields omitted here.)
    }
}
What isn't clear to me is what triggers each context. From that example it seems that, since you are increasing the value of quantity, it will trigger the update context; if this were the first run and no previous state existed, it would trigger create, and so on.
However, in my case the resource is a server, and one API capability I want to expose to the user is server power control. You would never "create" or "destroy" this resource... or would you? You can read the current power state and you can update it, but, at least intuitively, you wouldn't create or destroy it. I'm having trouble wrapping my head around how this would be modeled in Terraform/Go. I conceptually understand the coffee order resource in the example, but I'm having trouble making the leap to something like a server power capability, or other things without a clear mapping onto the CRUD operations.
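One common way to think about it: the resource doesn't have to be a physical object; it can represent Terraform's management of a setting. Create then means "start enforcing this desired power state" and Delete means "stop managing it" without touching the server. A minimal sketch of that idea (the apiClient and its SetPower/GetPower methods are hypothetical stand-ins, not a real API):

package provider

import (
    "context"

    "github.com/hashicorp/terraform-plugin-sdk/v2/diag"
    "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// Hypothetical stand-in for the real server API client.
type apiClient struct{}

func (c *apiClient) SetPower(ctx context.Context, id string, on bool) error { return nil }
func (c *apiClient) GetPower(ctx context.Context, id string) (bool, error) { return true, nil }

func resourceServerPower() *schema.Resource {
    return &schema.Resource{
        CreateContext: resourceServerPowerSet, // first apply: enforce desired state, record an ID
        ReadContext:   resourceServerPowerRead,
        UpdateContext: resourceServerPowerSet, // config changed: enforce the new desired state
        DeleteContext: resourceServerPowerDelete,
        Schema: map[string]*schema.Schema{
            "server_id": {Type: schema.TypeString, Required: true, ForceNew: true},
            "power_on":  {Type: schema.TypeBool, Required: true},
        },
    }
}

func resourceServerPowerSet(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
    if err := meta.(*apiClient).SetPower(ctx, d.Get("server_id").(string), d.Get("power_on").(bool)); err != nil {
        return diag.FromErr(err)
    }
    d.SetId(d.Get("server_id").(string))
    return resourceServerPowerRead(ctx, d, meta)
}

func resourceServerPowerRead(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
    on, err := meta.(*apiClient).GetPower(ctx, d.Id())
    if err != nil {
        return diag.FromErr(err)
    }
    d.Set("power_on", on)
    return nil
}

func resourceServerPowerDelete(ctx context.Context, d *schema.ResourceData, meta interface{}) diag.Diagnostics {
    // Nothing to destroy on the server; deleting just stops managing the setting.
    d.SetId("")
    return nil
}

Whether Delete should also restore some default power state is a design choice; the SDK only requires that the delete function remove the resource from state.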

terraform 0.13.5 resources overwrite each other on consecutive calls

I am using Terraform 0.13.5 to create aws_iam resources.
I have two Terraform resources, as follows:
module "calls_aws_iam_policy_attachment" {
# This calls an external module to
# which among other things creates a policy attachment
# resource attaching the roles to the policy
source = ""
name = "xoyo"
roles = ["rolex", "roley"]
policy_arn = "POLICY_NAME"
}
resource "aws_iam_policy_attachment" "policies_attached" {
# This creates a policy attachment resource attaching the roles to the policy
# The roles here are a superset of the roles in the above module
roles = ["role1", "role2", "rolex", "roley"]
policy_arn = "POLICY_NAME"
name = "NAME"
# I was hoping that adding the depends on block here would mean this
# resource is always created after the above module
depends_on = [ module.calls_aws_iam_policy_attachment ]
}
The first module creates a policy and attaches some roles. I cannot edit this module.
The second resource attaches more roles to the same policy, along with other policies.
The second resource depends_on the first, so I would expect the policy attachments of the second resource to always overwrite those of the first.
In reality, the policy attachments in each resource overwrite each other on each consecutive build: on the first build the second resource's attachments are applied, on the second build the first resource's attachments are applied, and so on and so forth.
Can someone tell me why this is happening? Does depends_on not work for resources that overwrite each other?
Is there an easy fix without combining both my resources into the same resource?
As to why this is happening:
- During the first run, terraform deploys the first resources, then the second ones. This order is due to the depends_on relation (the following steps work regardless of any depends_on). The second ones overwrite the first ones.
- During the second deploy, terraform looks at what needs to be done:
  - the first ones are missing (they were overwritten), so they need to be created;
  - the second ones look fine, so terraform ignores them for this update.
  Now only the first ones are created, and they overwrite the second ones.
- During the third run the same happens, just the other way around: the second ones are missing, the first ones are ignored, and the second ones overwrite the first ones.
Repeat as often as you want; you will never end up with a stable deployment.
Solution: do not specify conflicting things in Terraform. Terraform is supposed to be a description of what the infrastructure should look like, and saying "this policy should have exactly role set A attached" and "this policy should have exactly role set B attached" is contradictory; Terraform will not be able to handle this gracefully.
What you should do specifically: do not use aws_iam_policy_attachment, basically ever; look at the big red warning box in its docs. Use multiple aws_iam_role_policy_attachment resources instead, as sketched below. They are additive and will not overwrite each other.
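For example, a minimal sketch covering the roles from the question (reusing the POLICY_NAME placeholder from above; one attachment resource is created per role, and each manages only its own role-to-policy pairing):

# One additive attachment per role; changing one never touches the others.
resource "aws_iam_role_policy_attachment" "policies_attached" {
  for_each   = toset(["role1", "role2", "rolex", "roley"])
  role       = each.value
  policy_arn = "POLICY_NAME"
}

Note that for_each on resources requires Terraform 0.12.6 or later, so it is available on 0.13.5.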

Right way to delete and then reindex ES documents

I have a Python 3 script that attempts to reindex certain documents in an existing Elasticsearch index. I can't update the documents in place because I'm changing from an autogenerated id to an explicitly assigned id.
I'm currently attempting to do this by deleting the existing documents using delete_by_query and then indexing them again once the delete is complete:
self.elasticsearch.delete_by_query(
    index='%s_*' % base_index_name,
    doc_type='type_a',
    conflicts='proceed',
    wait_for_completion=True,
    refresh=True,
    body={}
)
However, the index is massive, and so the delete can take several hours to finish. I'm currently getting a ReadTimeoutError, which is causing the script to crash:
WARNING:elasticsearch:Connection <Urllib3HttpConnection: X> has failed for 2 times in a row, putting on 120 second timeout.
WARNING:elasticsearch:POST X:9200/base_index_name_*/type_a/_delete_by_query?conflicts=proceed&wait_for_completion=true&refresh=true [status:N/A request:140.117s]
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='X', port=9200): Read timed out. (read timeout=140)
Is my approach correct? If so, how can I make my script wait long enough for the delete_by_query to complete? There are two timeout parameters that can be passed to delete_by_query, search_timeout and timeout, but search_timeout defaults to no timeout (which I think is what I want), and timeout doesn't seem to do what I want. Is there some other parameter I can pass to delete_by_query to make it wait as long as the delete takes? Or do I need to make my script wait some other way?
Or is there some better way to do this using the ElasticSearch API?
You should set wait_for_completion to False. In that case you'll get back the task details and will be able to track the task's progress using the corresponding Tasks API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-task-api
Just to spell out Random's answer in code, for an ES/Python newbie like me:

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
query = {'query': {'match_all': {}}}

response = es.delete_by_query(index='index_name', doc_type='sample_doc',
                              wait_for_completion=False, body=query, ignore=[400, 404])
task_id = response['task']  # the async response carries the task id under the 'task' key

response_task = es.tasks.get(task_id=task_id)  # check on the task
is_completed = response_task['completed']      # True once the task has finished
You can write a small helper that polls the task at some interval in a while loop until it is completed, as sketched below.
I have used Python 3.x and Elasticsearch 6.x.
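For example, a minimal polling loop (a sketch building on the snippet above; the index name, doc type, and 30-second interval are placeholders):

import time

from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
query = {'query': {'match_all': {}}}

response = es.delete_by_query(index='index_name', doc_type='sample_doc',
                              wait_for_completion=False, body=query)
task_id = response['task']

# Poll the Tasks API until the delete-by-query task reports completion.
while not es.tasks.get(task_id=task_id)['completed']:
    time.sleep(30)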
You can also use the request_timeout per-request parameter. This overrides the connection's timeout settings for that one call, as described in the elasticsearch-py documentation on global options.
For example:
es.delete_by_query(index=<index_name>, body=<query>, request_timeout=300)
Or set it at the connection level, for example:
es = Elasticsearch(**get_es_connection_parms(), timeout=60)
