I usually run all my Terraform scripts from a Bastion server, and all of my code, including the tf state file, resides on that same server. Recently the machine went down unexpectedly (hard reboot) and the root filesystem got corrupted. My state file is now gone, but my resources still exist and are running. I don't want to run terraform apply again and recreate the whole environment with downtime. What's the best way to recover from this mess, and what can be done so that it doesn't happen again in the future?
I have already taken a look at terraform refresh and terraform import. But are there any better ways to do this?
and all my code including the tf statefile resides on the same server.
Since you don't have a .backup file, I'm not sure you can recover the state file cleanly the Terraform way; do let me know if you find a way :). However, you can take a few steps that will help you get out of situations like this.
The best practice is to keep all your state files in remote storage such as S3 or Azure Blob Storage and configure your backend accordingly, so that every time you destroy or create a stack, Terraform always reads and writes the state file remotely.
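As a minimal sketch (the bucket, key, and table names below are placeholders, not anything from your setup), an S3 backend with DynamoDB-based locking looks roughly like this:

    terraform {
      backend "s3" {
        bucket         = "my-terraform-state"     # placeholder bucket name
        key            = "prod/terraform.tfstate" # path of the state object in the bucket
        region         = "us-east-1"
        dynamodb_table = "terraform-locks"        # optional: enables state locking
        encrypt        = true
      }
    }

After adding this, terraform init offers to migrate your existing local state into the bucket, and every later plan/apply reads and writes the state remotely, so losing the Bastion host no longer loses the state.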
On top of that, you can take advantage of terraform workspace to avoid a mess of state files in a multi-environment scenario. Also consider having a plan for backtracking and versioning of previous deployments, for example:
terraform plan -var-file="<env>.tfvars" -out="<env>.tfplan" -target=module.<blue|green>
what can be done so that this doesn't get repeated in future.
Terraform blue-green deployment is the answer to your question. We implemented this model quite a while ago and it has been running smoothly. The whole idea is modularity and reusability: the same templates serve five different components with different architectures, without any downtime (the core template stays the same; only the variable files differ).
We take advantage of Terraform modules. We have two modules, called blue and green (you can name them anything). At any given point in time, either blue or green is taking traffic. When we have changes to deploy, we bring up the alternative stack based on the state output (targeting the module based on the Terraform state), auto-validate it, then move the traffic to the new stack and destroy the old one.
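A rough sketch of that layout, just to illustrate the idea (the module source path and variable names are my own placeholders, not the actual templates described above):

    # main.tf - the same stack module instantiated twice; only one side
    # takes live traffic at any given time
    module "blue" {
      source      = "./modules/stack"   # placeholder path to the shared stack module
      environment = "blue"
    }

    module "green" {
      source      = "./modules/stack"
      environment = "green"
    }

The targeted plan command shown earlier (with -target=module.blue or -target=module.green) then builds only the idle side; once it validates, traffic is switched over and the old stack is destroyed.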
Here is an article you can keep as a reference. It doesn't exactly reflect what we do, but it's a good place to start.
Please see this blog post, which, unfortunately, illustrates that import is the only solution.
If you are still unable to recover the Terraform state, you can generate a blueprint of the Terraform configuration, as well as the state, for specific AWS resources using terraforming. It requires some manual effort to edit the generated state in order to bring the resources back under management. Once you have this state file, run terraform plan and compare its output with your actual infrastructure.
It is also good to have remote state, especially using an object store like AWS S3 or a key-value store like Consul. These support locking the state when multiple operations happen at the same time, and backing up the state is also quite simple.
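As a hedged example of what that recovery flow can look like (the resource type, file names, and the instance ID are placeholders; check terraforming's own help for the exact subcommands and flags it supports):

    # Generate HCL and matching state for existing EC2 instances with terraforming
    terraforming ec2 > ec2.tf
    terraforming ec2 --tfstate > terraform.tfstate

    # Or re-attach resources one by one with terraform import
    # (the resource address and instance ID below are made up)
    terraform import aws_instance.web i-0123456789abcdef0

    # Then verify that nothing would be changed or destroyed
    terraform plan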
I use terraform to initialize some OpenStack cloud resources.
I have a scenario where I need to initialize/prepare a volume disk using a temporary compute resource. Once the volume is fully initialized, I no longer need the temporary compute resource, but I need to attach the volume to another compute resource (a different network configuration and other settings make reusing the first one impossible). As you might have guessed, I cannot reach the intended long-term result directly without the intermediary step.
I know I could drive a state machine or some sort of processing queue from outside terraform to achieve this, but I wonder if it was possible to do it nicely in one single run of terraform.
The best I could think of is to have a main Terraform script trigger the creation/destruction of the intermediate compute resource by launching another Terraform run responsible just for the intermediate resources (terraform apply followed by terraform destroy). However, it requires extra care, such as ensuring a unique working folder to deal with concurrent "main" resource initialization, and it makes the whole thing a bit messy, I think.
I wonder if it was possible to do it nicely in one single run of terraform.
Sadly, no. Any "solution" you could possibly implement for that in a single TF run (e.g. running custom scripts through local-exec, etc.) will only be a convoluted mess, and will cause more issues than it solves in the long term.
The proper way, as you wrote, is to use a dedicated CI/CD pipeline for a multistage deployment. Alternatively, don't use TF at all and use another IaC tool.
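As a very rough sketch of what the pipeline stages could look like (the directory layout is an assumption; each stage is its own root module with its own state, and the final stage can read the volume ID from the first stage's outputs via a terraform_remote_state data source):

    # Stage 1: bring up the temporary compute resource and initialize the volume
    terraform -chdir=stages/volume-init apply -auto-approve

    # Stage 2: attach the prepared volume to the long-lived compute resource
    terraform -chdir=stages/final apply -auto-approve

    # Stage 3: tear down the temporary compute resource again
    terraform -chdir=stages/volume-init destroy -auto-approve

(-chdir requires Terraform 0.14 or newer; with older versions, cd into each stage directory instead.)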
Is there a way to inspect the state of existing resources without importing them into your state?
Currently I'm setting up GCP resources in a throw-away project, importing them into a throw-away TF file, then inspecting the resource's state and creating my final resource in my prod files from that base.
Is it possible to get the state of a resource that hasn't been imported?
I'm not sure I understood your question well, because while reading it I had only one thing in mind: "WHY?"
Terraform is meant for idempotency. Everything is meant to be replaceable and reproducible, not copyable (or mockable).
So by default there is no concept of "throw-away" and "final" resources (doing that is actually shooting yourself in the foot and heading in a direction Terraform is not meant for...).
You have the concept of workspaces, where you could have a test state to confirm everything is as you want, and then use the same code (with different variables/configs if you want) to create the production resources.
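For example (the workspace names and tfvars files are just illustrative):

    # Each workspace keeps its own state for the same configuration
    terraform workspace new test
    terraform apply -var-file="test.tfvars"

    # Later, create the production workspace and apply the same code
    terraform workspace new prod
    terraform apply -var-file="prod.tfvars"

Inside the configuration you can also reference terraform.workspace to vary names or sizes per environment.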
IMHO, there is no reason to do it the "hard way"...
Terraform workspaces
Does Terraform allow storing the state file in SVN? If this is not directly supported by Terraform, do any third party/ open source options exist?
When using Terraform it's not typical to store the Terraform state in the same version control repository as the configuration that drives it, because the expected workflow to use version control with Terraform is to review and commit the proposed changes first, and only then apply the changes to your real infrastructure from your main branch.
To understand why, it might help to think about the relationship between an application's main code and its database. We don't typically store the main database for a web application in the version control repository along with the code, because the way we interact with the two are different: many developers can be concurrently working on and proposing changes to the application source code, and our version control system is often able to merge together those separate proposals to reduce collisions, but for the application's central database it's more common to use locks so that two writers don't try to change the same data (or interconnected data) at the same time.
In this way, Terraform state is roughly analogous to Terraform's "backend database". When using Terraform in a team setting then, you'd typically select one of the backends that stores state remotely and supports locking, and then anyone working with that particular Terraform configuration will find that Terraform will take out a lock before making any remote system modifications, hold that lock throughout its work, and then write the newly-updated state to the backend before releasing the lock.
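For instance, the GCS backend gives you remote storage and locking out of the box (the bucket name below is a placeholder), and the S3 backend does the same when paired with a DynamoDB table:

    terraform {
      backend "gcs" {
        bucket = "my-terraform-state"  # placeholder bucket name
        prefix = "prod"                # folder-like prefix for the state objects
      }
    }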
Although you specifically asked about Subversion, my suggestions here are intended to apply to all version control systems. Version control is a good place to keep the source code for your Terraform modules, but it's not a good place to keep your Terraform state.
How do I force Terraform to rebuild its plans and tfstate files from scratch?
I'm considering moving my IAC from GCP's Deployment Manager to Terraform, so I thought I'd run a test, since my TF is pretttty rusty. In my first pass, I successfully deployed a network, subnet, firewall rule, and Compute instance. But it was all in a single file and didn't scale well for multiple environments.
I decided to break it out into modules (network and compute), and since I was done with the experiment for the day, I tore everything down with a terraform destroy.
So today I refactored everything into its modules, and accidentally copypasta-ed the network resource from the network module to the compute module. Ran a terraform plan, and then a terraform apply, and it complained about the network already existing.
And I thought that it was because I had somehow neglected to tear down the network I'd created the night before? So I popped over to the GCP console, and yeah, it was there, so...I deleted it. In the UI. Sigh. I'm my own chaos engineer.
Anyway, somewhere right around there, I discovered my duplicate resource and removed it, realizing that the aforementioned complaint about the "network resource already existing" was coming from the 2nd module to run.
And I ran a terraform plan again, and it didn't complain about anything, so I ran a terraform apply, and that's when I got the "stale plan" error. I've tried the only things I could think of - terraform destroy, terraform refresh - and then tried a plan and apply after that.
I could just start fresh from a new directory and new names on the tfstate/tfplan files, but it bothers me that I can't seem to reconcile this "stale plan" error. Three questions:
Uggh...what did I do wrong? Besides trying to write good code after a 2-hour meeting?
Right now this is just goofing around, so who cares if everything gets nuked? I'm happy to lose all created resources. What are my options in this case?
If I end up going to prod with this, obviously idempotence is a priority here, so what are my options then, if I need to perform some disaster recovery? (Ultimately, I would be using remote state to make sure we've got the tfstate file in a safe place.)
I'm on Terraform 0.14.1, if that matters.
"Saved plan is stale" means the plan is out of date: it no longer matches the current state of your infrastructure.
Either the infrastructure was changed outside of Terraform, or terraform apply was run without the saved plan file.
Way 1: To fix that, you could run terraform plan with the -out flag to save a new plan and then apply it.
Way 2: More simply, I would run terraform refresh and, after that, terraform apply.
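Concretely, either of these sequences should get you past the stale-plan error:

    # Way 1: produce a fresh saved plan and apply that exact plan
    terraform plan -out=tfplan
    terraform apply tfplan

    # Way 2: sync the state with reality first, then plan and apply normally
    terraform refresh
    terraform plan
    terraform apply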
I created the infrastructure via the gcloud CLI first for testing purposes. As soon as it was proven to work, I transferred the configuration to GitLab and encountered the same issue in one of my jobs. The issue disappeared after I changed the network's and cluster's names.
I am new to Terraform. Can someone please explain why we need to save the .tfstate file in local or remote storage, when terraform apply always refreshes the state file with the new infrastructure?
Thanks in advance.
The state file tracks the resources that Terraform is managing, whether it created them or imported them. Terraform's refresh only detects drift in managed resources and won't detect if you have created new resources outside of the state file.
If you lose the state you will end up with orphaned resources that are not being managed by Terraform. If, for some reason, you are okay with that or you have some other way of sharing state with other team members/CI and backing it up then you're fine.
Of course, using Terraform's remote state neatly solves those things so you should use it if you care about any of those things or think you might need to in the future (you probably will).
I will add a more developer-oriented perspective to help with understanding.
Think of it as using yarn or npm for a Node.js app: package.json is like your .tf files, while yarn.lock or package-lock.json is like your state file.
Don't take the analogy literally, though, as the Terraform state file has real, physical infrastructure behind it.