Terraform plan: Saved plan is stale - terraform

How do I force Terraform to rebuild its plans and tfstate files from scratch?
I'm considering moving my IAC from GCP's Deployment Manager to Terraform, so I thought I'd run a test, since my TF is pretttty rusty. In my first pass, I successfully deployed a network, subnet, firewall rule, and Compute instance. But it was all in a single file and didn't scale well for multiple environments.
I decided to break it out into modules (network and compute), and I was done with the experiment for the day, so I tore everything down with a terraform destroy
So today I refactored everything into its modules, and accidentally copypasta-ed the network resource from the network module to the compute module. Ran a terraform plan, and then a terraform apply, and it complained about the network already existing.
And I thought that it was because I had somehow neglected to tear down the network I'd created the night before? So I popped over to the GCP console, and yeah, it was there, so...I deleted it. In the UI. Sigh. I'm my own chaos engineer.
Anyway, somewhere right around there, I discovered my duplicate resource and removed it, realizing that the aforementioned complaint about the "network resource already existing" was coming from the 2nd module to run.
And I ran a terraform plan again, and it didn't complain about anything, so I ran a terraform apply, and that's when I got the "stale plan" error. I've tried the only things I could think of (terraform destroy, terraform refresh) and then tried a plan and apply after each, but the error persists.
I could just start fresh from a new directory and new names on the tfstate/tfplan files, but it bothers me that I can't seem to reconcile this "stale plan" error. Three questions:
Uggh...what did I do wrong? Besides trying to write good code after a 2-hour meeting?
Right now this is just goofing around, so who cares if everything gets nuked? I'm happy to lose all created resources. What are my options in this case?
If I end up going to prod with this, obviously idempotence is a priority here, so what are my options then, if I need to perform some disaster recovery? (Ultimately, I would be using remote state to make sure we've got the tfstate file in a safe place.)
I'm on Terraform 0.14.1, if that matters.

"Saved plan is stale" means the plan is out of date: it no longer matches the current state of your infrastructure.
Either the infrastructure was changed outside of Terraform, or the state was modified after the plan was saved, for example by a terraform apply that did not use the saved plan file.
Way 1: Run terraform plan with the -out flag to save a new plan, then apply that saved plan.
Way 2: More simply, run terraform refresh and then terraform apply.
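Way 1 could look roughly like the sketch below. The helper name replan_and_apply and the default plan file name tfplan are placeholders chosen for illustration; Terraform does not require either.

```shell
# Hypothetical helper: discard the stale saved plan, create a fresh one,
# and apply exactly that plan.
replan_and_apply() {
  plan_file="${1:-tfplan}"   # assumed file name; any path works
  rm -f "$plan_file"         # drop the stale plan so it cannot be applied by mistake
  terraform plan -out="$plan_file" && terraform apply "$plan_file"
}

# Usage (run from the directory that holds the .tf files):
#   replan_and_apply tfplan
```

Applying a saved plan file does not ask for confirmation, so read the plan output before the apply step runs.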

I created the infrastructure via the gcloud CLI first for testing purposes. As soon as it was proven to work, I transferred the configuration to GitLab and encountered the same issue in one of my jobs. The issue disappeared after I changed the names of the network and the cluster.

Related

Resolving broken deleted state in terraform

When Terraform tries to deploy something and then times out while the resource is in a state like pending or deleting, the resource will eventually reach successful or deleted, but this never gets recorded in the tf state, so when I try to run something again it errors because the state doesn't match.
Error: error waiting for EC2 Transit Gateway VPC Attachment (tgw-attach-xxxxxxxxx) deletion: unexpected state 'failed', wanted target 'deleted'. last error: %!s(<nil>)
What is the correct way to handle this? Can I do something within terraform to get it to recognise the latest state in AWS? Is it a bug on tf's part?
tl;dr
It's probably less of a bug and more of a design choice.
You should investigate and if appropriate (e.g. the resource was created or deleted successfully and the state was not updated appropriately), you could either
run terraform refresh, which will cause Terraform to refresh its state file against what actually exists with the cloud provider
manually reconcile the situation by manipulating the Terraform state with the terraform state command, removing deleted resources or adding created resources
Detail
Unlike CloudFormation, Terraform's approach to 'failures' is to just drop everything and error out, leaving the operator to investigate the issue and attempt to resolve it themselves. As a result, operations which timeout are classed as failures and so the relevant resources are often not updated in Terraform's state.
Terraform does give us some recourse to handle this however. For one, we can manually manipulate Terraform's state file. We can add resources or remove resources from the state file as we like, though this should be done with caution.
We can also ask Terraform to 'refresh' its state, basically comparing the state file to reality. Implicitly this should remove resources which no longer exist, but it will not adopt resources into the state file which were provisioned outside of a successful Terraform run.
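As a rough sketch of the manual route, the helper below drops a stale entry from the state (if one is tracked) and then adopts the real resource with terraform import. The function name resync_resource and the address and ID in the usage comment are hypothetical. terraform import also needs a matching resource block in the configuration.

```shell
# Hypothetical helper: make the state entry for one resource match reality.
#   $1 = resource address in the configuration
#   $2 = provider-side ID of the resource that actually exists
resync_resource() {
  addr="$1"
  id="$2"
  # Remove the stale entry only if the address is currently tracked.
  # This edits the state file; the real infrastructure is not touched.
  if terraform state list | grep -Fxq "$addr"; then
    terraform state rm "$addr" || return 1
  fi
  # Adopt the existing resource into the state.
  terraform import "$addr" "$id"
}

# Usage (placeholders, not values from the question):
#   resync_resource aws_ec2_transit_gateway_vpc_attachment.example tgw-attach-xxxxxxxxx
```

If the resource is simply gone at the provider, the terraform state rm step on its own is enough.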
As an aside, timeouts relating to the interaction with any service provider are a feature of the relevant Terraform Provider, in this case the AWS Provider. Only the Providers can expose configurable timeouts. For example, the AzureRM Provider provides a timeouts block on its resources, while the AWS Provider exposes one only on some resources, so check the documentation for the specific resource.
Efforts are presumably made to incorporate sensible timeout values, but it's not unusual to see trivial operations take an age to complete properly.

Is there a way to reuse a terraform script and make changes to it?

I'm new to this terraform world and I've been assigned into the task of creating many configurations to azure with it.
I'm developing a main.tf script (which creates some resources, like a resource group, vnets, a Kubernetes cluster, app services, etc.), and while coding it and executing terraform apply, it seems to apply only what changed, in effect doing updates.
Then we deleted the resource group the script created, and a colleague of mine had to run the same script with Terraform, creating a resource group with another name, since I didn't have a required permission. After that, if I run terraform apply, it fails with errors saying that the resource cannot be created because it already exists.
After reading some documentation i found that it might be because of the state
https://www.terraform.io/docs/state/index.html
Is the update of a script something that only works for each session of terraform?
Even doing a terraform refresh doesn't seem to work.
Or probably I'm just mistaken and there is no way to update some resources.
EDIT: for some reason the state file that was on the storage only had a few things, the solution was to delete everything and create again.
For new resources, there is nothing more to it: Terraform creates the resources you define in the script.
For existing resources, when you change a script that you have already deployed with Terraform, it checks the state file to work out which resources need to be updated. If there is no state file (or you deleted it), Terraform deploys the script from scratch, and if any of the resources it tries to create already exist, it fails. The terraform refresh command only updates the recorded state of resources that are already in the state file. If the deployment failed and the state file has no resources in it, refresh is not useful.
If someone else ran terraform apply for you because you didn't have access, and now you want to modify that terraform and run it yourself, you need to get the state file that was generated when that other person ran it. You absolutely have to maintain the Terraform state file somewhere, so that it can be accessed on subsequent runs. You should really configure a Terraform backend, instead of using local state files.
You need to be aware that Terraform stores everything it does in the state file, and refers to that file before every run. A terraform refresh only tells Terraform to refresh the state of the things that are in the state file, it doesn't rebuild the state file from scratch. Understanding Terraform state files is so fundamental to the use of Terraform that you really need to understand this before using it.
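Since the question is about Azure, a backend in Azure Blob Storage is one way to share the state between colleagues. The sketch below writes such a backend block. The helper name write_azurerm_backend and every value in the usage comment are placeholders, and the storage account and container are assumed to exist already.

```shell
# Hypothetical helper: write a backend block so the state lives in Azure
# Blob Storage instead of a local terraform.tfstate file.
#   $1 = resource group, $2 = storage account, $3 = container, $4 = state blob name
write_azurerm_backend() {
  cat > backend.tf <<EOF
terraform {
  backend "azurerm" {
    resource_group_name  = "$1"
    storage_account_name = "$2"
    container_name       = "$3"
    key                  = "$4"
  }
}
EOF
}

# Usage (all names are placeholders):
#   write_azurerm_backend tfstate-rg tfstatestorage tfstate prod.terraform.tfstate
#   terraform init   # re-run init so Terraform picks up the backend
```

With this in place, whoever runs terraform apply reads and writes the same state, so a second person no longer starts from an empty local file.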

How do I cause terraform to skip destroying resources?

I am using terraform to provision an Azure AKS Kubernetes cluster, including a bunch of namespaces, deployments (e.g., cert-manager, external-dns, etc), secrets, and so on. These all get deleted when the cluster is torn down, but some of them cannot be deleted by terraform. This happens most often with namespaces, like the following (it never actually finishes removing all content):
"Operation cannot be fulfilled on namespaces "cert-manager": The system is ensuring all content is removed from this namespace. Upon completion, this namespace will automatically be purged by the system."
How do I cause terraform to ignore these resources when destroying?
On the surface, this seems like a big ask from Terraform.
Terraform manages state, so it knows what it created, and what resources depend on each other. When it destroys something, it knows what dependencies to destroy as well, and this sets up an ordering of operations.
So it seems you're saying you want Terraform to control the creation, but to "forget" to destroy some things, despite it keeping a map of dependencies. This seems like a good way to get a corrupt state.
So with that caveat in mind, perhaps you could try "terraform state rm" judiciously, so that terraform isn't managing the things that need to be skipped when destroying things.
Something like
terraform apply
some script that picks holes in the state with "terraform state rm"
terraform destroy
The hard part is making sure all the things that remain do not reference anything that has been "rm'd". If they do, Terraform will get mad at you and probably refuse to do it.
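The middle step, the script that picks holes in the state, might be sketched as follows. The helper name destroy_except and the address pattern in the usage comment are assumptions for illustration; adjust the pattern to the addresses that terraform state list prints for your configuration.

```shell
# Hypothetical helper: stop tracking every resource whose address matches
# a pattern, then destroy whatever is still tracked.
destroy_except() {
  pattern="$1"
  # Collect the matching addresses first, so that a failed removal
  # aborts before anything is destroyed.
  addrs="$(terraform state list | grep -E "$pattern")"
  while IFS= read -r addr; do
    [ -n "$addr" ] || continue
    terraform state rm "$addr" || return 1
  done <<EOF
$addrs
EOF
  terraform destroy
}

# Usage (pattern is an assumed address prefix):
#   destroy_except '^kubernetes_namespace\.'
```

Resources removed this way are no longer managed by Terraform, so anything that outlives the cluster has to be cleaned up by other means.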

Backing up of Terraform statefile

I usually run all my Terraform scripts through a bastion server, and all my code, including the tf statefile, resides on the same server. There was an incident where my machine accidentally went down (hard reboot) and somehow the root filesystem got corrupted. Now my statefile is gone, but my resources still exist and are running. I don't want to run terraform apply again and recreate the whole environment with downtime. What's the best way to recover from this mess, and what can be done so that this doesn't get repeated in future?
I have already taken a look at terraform refresh and terraform import. But are there any better ways to do this ?
"and all my code including the tf statefile resides on the same server."
As you don't have a .backup file, I'm not sure you can recover the statefile smoothly in the Terraform way; do let me know if you find a way :). However, you can take a few steps that will help you get out of a situation like this.
The best practice is to keep all your statefiles in remote storage like S3 or Blob and configure your backend accordingly, so that each time you destroy or create a stack, Terraform reads and writes the statefile remotely.
On top of that, you can take advantage of terraform workspace to avoid a mess of statefiles in a multi-environment scenario. Also consider creating a plan for backtracking and versioning of previous deployments.
terraform plan -var-file "" -out "" -target=module.<blue/green>
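For the workspace suggestion, a minimal sketch is shown below. The helper name use_workspace and the environment name in the usage comment are placeholders. Each workspace keeps its own state, so environments do not overwrite each other.

```shell
# Hypothetical helper: switch to the workspace for an environment,
# creating it on first use (terraform workspace new also switches to it).
use_workspace() {
  terraform workspace select "$1" || terraform workspace new "$1"
}

# Usage (environment name is a placeholder):
#   use_workspace staging
#   terraform plan
```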
"what can be done so that this doesn't get repeated in future."
Terraform blue-green deployment is the answer to your question. We implemented this model quite a while ago and it's running smoothly. The whole idea is modularity and reusability: the same templates work for 5 different components with different architectures, without any downtime (the core template remains the same and the variable files differ).
We take advantage of Terraform modules. We have two modules called blue and green; you can name them anything. At any given point in time, either blue or green is taking traffic. If we have changes to deploy, we bring up the alternative stack based on the state output (targeting the module based on Terraform state), auto-validate it, then move the traffic to the new stack and destroy the old one.
Here is an article you can keep as a reference. It doesn't exactly reflect what we do, but it is a good place to start.
Please see this blog post, which, unfortunately, illustrates import being the only solution.
If you are still unable to recover the Terraform state, you can generate a blueprint of the Terraform configuration as well as the state for specific AWS resources using terraforming, but it requires some manual effort to edit the state so that the resources are managed again. Once you have this state file, run terraform plan and compare its output with your infrastructure. It is good to have remote state, especially in an object store like AWS S3 or a key-value store like Consul. Remote state backends can support locking the state when multiple operations happen at the same time. Backing up is also quite simple.
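If import ends up being the route, the repetitive part can be scripted. The sketch below assumes you have written a plain-text file of "ADDRESS ID" pairs by hand; the helper name import_from_list, the file name, and the addresses and IDs in the comments are all hypothetical. Every address needs a matching resource block in the configuration.

```shell
# Hypothetical helper: rebuild a lost state by importing existing resources.
# Reads "ADDRESS ID" pairs, one per line, from the file given as $1.
# File descriptor 3 is used so terraform can still read from the terminal.
import_from_list() {
  while read -r addr id <&3; do
    # Skip blank lines and comment lines.
    case "$addr" in ''|'#'*) continue ;; esac
    terraform import "$addr" "$id" || return 1
  done 3< "$1"
}

# Example file contents (placeholders):
#   aws_vpc.main         vpc-0123456789abcdef0
#   aws_instance.bastion i-0123456789abcdef0
#
# Usage:
#   import_from_list imports.txt
#   terraform plan   # ideally shows no changes once the configuration matches
```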

Is terraform destroy needed before terraform apply?

Is terraform destroy needed before terraform apply? If not, what is a workflow you follow when updating existing infrastructure and how do you decide if destroy is needed?
That would be pretty non-standard, in my opinion. Terraform destroy is only used in cases where you want to completely wipe your infrastructure. One of the biggest features of terraform is that it can do an intelligent delta of your desired infrastructure and your existing infrastructure and only make the changes needed. By performing a refresh, plan and apply you can ensure that terraform:
refresh - Has an up-to-date understanding of your current infrastructure. This is important in case anything was changed manually, outside of your terraform script.
plan - Prepares a list for you to review of what terraform intends to modify, or delete (or leave alone).
apply - Performs the changes laid out in the plan.
By executing these 3 commands in sequence terraform will only perform the changes necessary, in the order required, to bring your environments in line with any changes to your terraform file.
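The three-command sequence could be wrapped up as in the sketch below, which applies only when the plan actually contains changes. The helper name apply_if_changed and the plan file name are placeholders. It relies on terraform plan -detailed-exitcode, which exits with 0 for no changes, 1 for an error, and 2 when changes are present.

```shell
# Hypothetical helper: refresh, plan, and apply only if something changed.
apply_if_changed() {
  plan_file="${1:-tfplan}"   # assumed file name
  terraform refresh || return 1
  rc=0
  terraform plan -detailed-exitcode -out="$plan_file" || rc=$?
  case "$rc" in
    0) echo "No changes; nothing to apply." ;;
    2) terraform apply "$plan_file" ;;
    *) echo "Plan failed." >&2; return 1 ;;
  esac
}

# Usage:
#   apply_if_changed
```

On newer Terraform versions, plan and apply refresh the state themselves by default, so the explicit refresh step is mostly relevant to older releases such as the ones discussed here.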
Where I find destroy to be useful is in non-production environments or in cases where you are performing a restructure that's so invasive that starting from scratch would ensure a safer build.
*There are also edge cases where terraform may fail to understand the correct order of operations (do I modify a security group first or a security group rule?), or it will find itself in a dependency cycle and will be unable to perform an operation. In those cases, however, running destroy is a nuclear solution. In general, I would perform the problem change manually (via command line, or AWS Console, if I'm in AWS), to nudge it along and then run a refresh, plan, apply sequence to get back on track.
No, terraform destroy is not needed before terraform apply.
Your Terraform configuration (*.tf and *.tfvars files) describes the desired state of your infrastructure. It says "this is how I want my infrastructure to be."
You use the terraform tool to plan and apply changes to get your infrastructure into the desired state you have described. You can make those changes incrementally without destroying anything.
A typical workflow might be:
make changes to .tf and .tfvars files
refresh state
plan changes
review the planned changes
apply those changes
If you wanted to completely destroy that infrastructure you'd use terraform plan -destroy to see what Terraform intends to destroy. If you are happy with that you'd then use terraform destroy to destroy it.
Typically, destroy is rarely used, unless you are provisioning infrastructure for a temporary purpose (e.g., builds) or testing your ability to provision from a clean slate with different parameters. Even then, you could use a count parameter on resources to temporarily provision resources by increasing the count, then decreasing it again when no longer needed.
More comments on @mwielbut's answer.
Instead of destroy + apply, run terraform with taint + apply.
Normally we don't need to run terraform destroy at all. It is a really dangerous option, especially in a production environment.
With plan and apply alone, you can update the infrastructure from code.
But if you do need to destroy some resources and rebuild something that has already been created, you can use taint. That is the right answer to your question; it is important, and it is missing from @mwielbut's answer.
The terraform taint command manually marks a Terraform-managed resource as tainted, forcing it to be destroyed and recreated on the next apply.
This command will not modify infrastructure but does modify the state file in order to mark a resource as tainted. Once a resource is marked as tainted, the next plan will show that the resource will be destroyed and recreated and the next apply will implement this change.
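A minimal sketch of the taint workflow follows. The helper name recreate_resource and the address in the usage comment are placeholders.

```shell
# Hypothetical helper: force one resource to be destroyed and recreated.
recreate_resource() {
  addr="$1"
  terraform taint "$addr" || return 1   # only marks the resource in the state file
  terraform plan || return 1            # shows the destroy-and-recreate for review
  terraform apply                       # performs it after confirmation
}

# Usage (address is a placeholder):
#   recreate_resource google_compute_instance.example
```

On Terraform v0.15.2 and later, the same effect is available without touching the state first, via terraform apply -replace="ADDRESS".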
Refer:
command taint:
https://www.terraform.io/docs/commands/taint.html
a sample of option taint:
https://www.terraform.io/docs/modules/usage.html
Terraform destroy destroys all the resources and it is not required if you want to apply incremental changes. Destroy should be only used if you want to destroy the whole infrastructure.
There is no need to use the destroy command before apply. Use destroy only while you are in a testing period, or when you want to tear down the complete infrastructure.
You can use below flow
terraform init
terraform plan
terraform apply
If you made any manual changes that need to be reflected in your state file, use the command below to update it.
terraform refresh
You don't need to run terraform destroy. If you have made any changes to your infrastructure (added or removed a resource), the changes will be picked up automatically on the next terraform plan and terraform apply.
Terraform apply always refreshes the Terraform state, so if you change anything, it auto recognizes the changes, lets say you've updated your NSG rules, added new VM, deleted old VM, so when you run terraform apply again, your old state gets updated with the new state where you've Added/Updated/Deleted.
If you use terraform destroy, it destroys every resource tracked in the state, and the next terraform apply builds everything again from scratch.
You need to use terraform destroy only if you think you just want to bring down your infrastructure and you don't really need it.
For minor - major changes like Adding Components, Updating Rules, Deleting other things, you can use plan and apply without any problem.
Simply NO.
You don't need to run terraform destroy before terraform apply. Your Terraform (.tf) files describe the desired state of your infrastructure.
terraform apply always refreshes its view of your infrastructure: it identifies the current state and updates it.
The only use of terraform destroy is to bring down and completely wipe your infrastructure (think twice before using it). You can use terraform plan and terraform refresh to check the state of the infrastructure.
You could always manually destroy your instances, after only running your terraform apply. Then when you run terraform apply it will create brand new instances without the terraform destroy.
No! you don't need to run terraform destroy when you need a modification of resources! This is the beauty of Infra-as-Code.
Here are some more details on Terraform init, plan, apply and destroy -
terraform init command is used to initialize a working directory containing Terraform configuration files. This is the first command that should be run after writing a new Terraform configuration or cloning an existing one from version control. It is safe to run this command multiple times.
terraform plan command creates an execution plan. By default, creating a plan consists of:
a) Reading the current state of any already-existing remote objects to make sure that the Terraform state is up-to-date.
b) Comparing the current configuration to the prior state and noting any differences.
c) Proposing a set of change actions that should, if applied, make the remote objects match the configuration.
terraform apply command executes the actions proposed in a Terraform plan. (you can do an apply without plan however it's not a best practice)
terraform destroy command is a convenient way to destroy all remote objects managed by a particular Terraform configuration.
Core Terraform workflows:
The core Terraform workflow has five steps:
Write - Author infrastructure as code.
terraform init - Initializes the working directory: it downloads and installs provider plugins (including partner and community providers) to the local disk so that other commands can use them, initializes the backend, and installs child modules.
terraform plan - Preview changes before applying.
terraform apply - Provision reproducible infrastructure.
terraform destroy - Destroy your infrastructure.
No need for terraform destroy, as it will just destroy all the resources created.
You just need to provide the backend configuration in your tf file.
The backend configuration tells Terraform where to store and retrieve its state file.
The first terraform apply creates your cloud infrastructure and also updates your state file.
On the next apply, Terraform uses the state file to compare what needs to be created or updated with what already exists, and deploys accordingly.

Resources