How to Roll Back to a Previous State in Terraform

I am working on Terraform tasks and trying to understand how state files work. I created a main.tf file with a vpc, firewall, subnet, and compute_instance to be created in GCP. I applied this to the GCP environment, a terraform.tfstate file was created, and I backed that file up into a folder called 1st-run.
I then updated my main.tf to 2 vpcs, 2 firewalls, 2 subnets, and a compute_instance, because I need to add another NIC to my VM. I ran terraform apply, the environment was created, and a new terraform.tfstate file was written, which I backed up into a folder called 2nd-run.
Now I want to roll the environment back to what I had after the 1st run. I still have that state file in the 1st-run folder.
What is the command to roll back using that state file, without touching the code, so that my GCP environment automatically goes back to the vpc, firewall, subnet, and compute_instance I created the first time?
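For reference, a minimal sketch of what the 1st-run configuration might look like (all names, ranges, regions, and images here are hypothetical, not the asker's actual values):

resource "google_compute_network" "vpc" {
  name                    = "demo-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "demo-subnet"
  ip_cidr_range = "10.0.0.0/24"
  region        = "us-central1"
  network       = google_compute_network.vpc.id
}

resource "google_compute_firewall" "fw" {
  name    = "demo-allow-ssh"
  network = google_compute_network.vpc.name
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
  source_ranges = ["0.0.0.0/0"]
}

resource "google_compute_instance" "vm" {
  name         = "demo-vm"
  machine_type = "e2-small"
  zone         = "us-central1-a"
  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }
  network_interface {
    subnetwork = google_compute_subnetwork.subnet.id
  }
}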

There is no way to roll back to a previous state as described in a state file in Terraform today. Terraform always plans changes with the goal of moving from the prior state (the latest state snapshot) to the goal state represented by the configuration. Terraform also uses the configuration for information that is not tracked in the state, such as the provider configurations.
The usual way to represent "rolling back" in Terraform is to put your configuration in version control and commit before each change, and then you can use your version control system's features to revert to an older configuration if needed.
Not all changes can be rolled back purely by reverting a VCS change though. For example, if you added a new provider block and resources for that provider all in one commit and then applied the result, in order to roll back you'd need to change the configuration to still include the provider block but not include any of the resource blocks, so you'd need to adjust the configuration during the revert. Terraform will then use the remaining provider block to configure the provider to run the destroy actions, after which you can finally remove the provider block too.
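For example, a sketch of that workflow, assuming the configuration lives in git and the NIC change was a single commit:

git log --oneline        # find the commit that added the second vpc/subnet/firewall/NIC
git revert <commit-id>   # restore the previous configuration
terraform plan           # the extra resources should now show as planned destroys
terraform apply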

While there are commands to manipulate state, there is no command to roll back to the previous state, i.e. to how things were before the last terraform apply.
However, if you use a remote S3 backend with a DynamoDB lock table, it is possible to roll back, provided versioning was enabled on the S3 bucket. For example, you can copy the previous version so that it becomes the latest version. You must then also update the digest in the DynamoDB table, otherwise terraform init will give you a message like:
Error refreshing state: state data in S3 does not have the expected content.
This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: vvvvvvvvvvvvvv
Use this value to update the Digest attribute in the table, and the rollback is done. To undo the rollback, delete the restored copy from the S3 bucket so the previous "latest" version becomes current again, and set the DynamoDB digest back to the corresponding value.
Note that remote state is shared with your co-workers, so the above procedure should be avoided when possible and coordinated with them if you do use it.
It's important to understand that changing the state files won't change the infrastructure by itself. That should be done by versioning the terraform code and doing terraform plan and terraform apply on the code that describes the desired infrastructure.
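As a sketch of that restore with the AWS CLI (the bucket, key, and version ID here are hypothetical):

# find the version ID of the state snapshot you want to restore
aws s3api list-object-versions --bucket my-tfstate-bucket --prefix prod/terraform.tfstate
# copy that old version on top of the current object so it becomes the latest
aws s3api copy-object \
  --bucket my-tfstate-bucket \
  --key prod/terraform.tfstate \
  --copy-source "my-tfstate-bucket/prod/terraform.tfstate?versionId=OLD_VERSION_ID"
# then update the Digest value in the DynamoDB lock table with the value
# Terraform prints in the error message above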

Make sure versioning is enabled on the AWS S3 bucket that holds your tfstate files.
With versioning enabled (Show versions / View inside the bucket), I found the tfstate file by name.
I deleted the latest version, which was causing the mismatch (in my case a Terraform version mismatch). S3 adds a delete marker for that version, so the deletion is effectively backed up: you can restore the original file just by deleting the delete marker.
I then looked through the older versions of the tfstate file, checked the deployment history, and downloaded the version I needed to restore (after downloading you can inspect the details; in my case I checked that the Terraform version matched).
I uploaded that old tfstate file to the same location from which I had deleted the conflicting one.
On resuming the deployment I got an error like the one below:
Error refreshing state: state data in S3 does not have the expected content.
This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: b55*****************************
This means a digest value is already stored for the previous tfstate lock file, and it needs to be updated with the new value from the error message; you can find it under DynamoDB > table > view table details.
After updating it, resuming the deployment in Spinnaker completed successfully. (One exception in my case: the latest pipeline included destroying an unused resource that had been created with a different provider, so I first had to revert the provider removal; after that, the resumed deployment succeeded.)

Related

Terraform: How to automatically resolve removed provider issues from old state files?

I am working on upgrading templates from Terraform 0.12.31 to 0.13.7. We need an automatic system for dealing with deployments that were created under the older version.
An issue I am working through is that I removed the use of all null providers in the move. When I attempt to plan or apply with Terraform 0.13 against a state file created by 0.12, I receive the following error:
$ terraform plan --var-file MY_VAR_FILE.json
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
Error: Provider configuration not present
To work with
module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost
its original provider configuration at
provider["registry.terraform.io/-/null"] is required, but it has been removed.
This occurs when a provider configuration is removed while objects created by
that provider still exist in the state. Re-add the provider configuration to
destroy
module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost,
after which you can remove the provider configuration again.
Error: Provider configuration not present
To work with
module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master
its original provider configuration at
provider["registry.terraform.io/-/null"] is required, but it has been removed.
This occurs when a provider configuration is removed while objects created by
that provider still exist in the state. Re-add the provider configuration to
destroy
module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master,
after which you can remove the provider configuration again.
Error: Provider configuration not present
To work with
module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config its
original provider configuration at provider["registry.terraform.io/-/null"] is
required, but it has been removed. This occurs when a provider configuration
is removed while objects created by that provider still exist in the state.
Re-add the provider configuration to destroy
module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config, after
which you can remove the provider configuration again.
My manual solution is to run terraform state rm on all the modules listed:
terraform state rm module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config
terraform state rm module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost
terraform state rm module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master
I would like to know how to do this automatically to enable a script to make these changes.
Is there some kind of terraform command I can use to list out these removed resources without the extra text, so I can loop through runs of terraform state rm to remove them from the state file?
Or is there some kind of terraform command that can automatically do this in a generic manner like terraform state rm -all-not-present?
This gives me a list I can iterate through using terraform state rm $MODULE_NAME:
$ terraform state list | grep 'null_data_source'
module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master
module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost
module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config
There are a few possibilities. Without the source code of the module it's difficult to say for sure, so providing that might be helpful.
A couple of suggestions:
Cleaning the cache
Remove the .terraform directory (normally in the directory you're running init, plan, and apply from). An older version of the module could be cached there that still contains the null references.
State refresh
Using terraform refresh you should be able to scan the infrastructure and bring the state back into alignment.
This can be dangerous and is not generally recommended by HashiCorp.
Manual removals
terraform state rm, as you've suggested, can help here and is fairly safe: it has a -dry-run option, and you point at specific resources. Again, check that the module reference isn't pointing to (or caching) an old version of the module, or the entries will just be recreated.
No, there isn't a state rm --all-missing, but if you know it's only the null data sources that are missing, you can use terraform state list to list those resources and then iterate over them, removing each one in a loop, as sketched below.
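For example, a small shell loop along those lines, assuming every stale entry matches null_data_source:

for r in $(terraform state list | grep 'null_data_source'); do
  terraform state rm "$r"
done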

Terraform State migration

I started working with Terraform and realized that the state files were created and saved locally. After some searching I found that it is not recommended that terraform state files be committed to git.
So I added a backend configuration using S3 as the backend. Then I ran the following command
terraform init -reconfigure
I realize now that this set the backend as S3 but didn't copy any files.
Now when I run terraform plan, it plans to recreate the entire infrastructure that already exists.
I don't want to destroy and recreate the existing infrastructure. I just want terraform to recognize the local state files and copy them to S3.
Any suggestions on what I might do now?
State files are basically JSON files containing information about the current setup. You can manually copy files from the local to the remote (S3) backend and use them without issues. You can read more about state files here: https://learn.hashicorp.com/tutorials/terraform/state-cli
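A sketch of the recovery, assuming the local state is still in terraform.tfstate in the working directory and the S3 backend is already initialized:

# the backend is already configured, so push the local file into it explicitly:
terraform state push terraform.tfstate
terraform state list   # should now show the existing resources
terraform plan         # should no longer plan to recreate everything
# (for future migrations: terraform init -migrate-state on Terraform 1.1+,
#  or answering "yes" to the copy prompt on older versions, copies the state
#  for you; -reconfigure explicitly skips that migration step)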
I also manage a package to handle remote states in S3/Blob/GCS, if you want to try: https://github.com/tomarv2/tfremote

state snapshot was created by Terraform v0.12.29, which is newer than

I'm using Terraform with S3 as the backend. Everything worked great before, but recently I got the following error message when running terraform plan or apply:
Error: state snapshot was created by Terraform v0.14.8, which is newer than current v0.12.29; upgrade to Terraform v0.14.8 or greater to work with this state
The strange thing is that I have already constrained the Terraform version:
terraform {
  required_version = ">= 0.12"
}
And when I pulled the latest state from S3, the version is still 0.12.29:
terraform state pull | grep version
"terraform_version": "0.12.29",
....
I really have no idea where the version 0.14.8 comes from.
This happened to me: my CI/CD deployment failed and left a lock on the Terraform state.
So I manually removed the lock from my local machine:
terraform init -backend-config="key=prod/app1.tfstate"
terraform force-unlock -force xxxxx-8df6-a7e8-46a8-xxxxxxxxxxxx
Then I tried to redeploy from CI/CD and got that error, because my local Terraform was a higher version than the one running in CI/CD.
In the end, I did this to restore the previous state:
In S3, find the state file and restore the old version (versioning was enabled on this bucket).
Run this again: terraform init -backend-config="key=prod/app1.tfstate" -reconfigure
It returns the response below with the previous digest value:
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Error refreshing state: state data in S3 does not have the expected content.
This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: ebf597a8a25619b959baaa34a7b9d905
Update the DynamoDB item with the digest above (see the sketch below).
Run the deployment again.
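For that digest update, an AWS CLI call along these lines works; the table name, bucket, and key here are hypothetical, and the LockID of the digest item follows the S3 backend's "<bucket>/<key>-md5" naming:

aws dynamodb put-item \
  --table-name terraform-locks \
  --item '{"LockID": {"S": "my-tfstate-bucket/prod/app1.tfstate-md5"}, "Digest": {"S": "ebf597a8a25619b959baaa34a7b9d905"}}'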
Are you the only developer working on terraform?
Are you running terraform locally or also via some pipeline?
There is a strong possibility that one of your team members upgraded their Terraform binary to v0.14.8 and ran an apply with it (perhaps without updating this particular remote state), and now you would need to upgrade to that version as well.
It's also not necessarily just the state you are running plan against: Terraform setups often cross-reference other remote states (for example via terraform_remote_state data sources). So look inside the remote state bucket and try to find the specific state file that was written with the different Terraform version.
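Note that required_version = ">= 0.12" is also satisfied by 0.14.8, so it does not stop a newer binary from rewriting the state. A tighter pin, sketched below, only allows 0.12.x patch releases:

terraform {
  required_version = "~> 0.12.29"
}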

How to move Terraform state from one remote store to another

We use an Azure blob storage as our Terraform remote state, and I'm trying to move state info about specific existing resources to a different container in that Storage Account. The new container (terraforminfra-v2) already exists, and the existing Terraform code points to the old container (terraforminfra). I've tried the following steps:
Use "terraform state pull > migrate.tfstate" to create a local copy of the state data in terraforminfra. When I look at this file, it seems to have all the proper state info.
Update the Terraform code to now refer to container terraforminfra-v2.
Use "terraform init" which recognizes that the backend config has changed and asks to migrate all the workspaces. I enter 'no' because I only want specific resources to change, not everything from all workspaces.
Use the command "terraform state push migrate.tfstate".
The last command seems to run for a bit like it's doing something, but when it completes (with no hint of an error), there still is no state info in the new container.
Is it because I answered 'no' in step 3? Does that mean it doesn't actually change which remote state it "points" to? Related to that, is there any way with the "terraform state" command to tell where your state is?
Am I missing a step here? Thanks in advance.
OK, I think I figured out how to do this (or at least, these steps seemed to work):
rename the current folder with the .tf files to something else (like folder.old)
use "terraform state pull" to get a local copy of the state for the current workspace (you need to repeat these steps for each workspace you want to migrate)
create a new folder with the original name and copy your code to it.
create a new workspace with the same name as the original.
modify the code for the remote backend to point to the new container (or whatever else you're changing about the name/location of the remote state).
run "terraform init" so it's pointing to the new remote backend.
use "terraform state push local state file" to push the exported state to the new backend.
I then used "terraform state list" and "terraform plan" in the new folder to sanity check that everything seemed to be there.

Terraform show and plan not matching

I am a beginner in Terraform, working in a (dangerous) live environment.
I ran a script to create 3 new accounts in AWS Organizations. Two got created, and due to a service limit error I couldn't create the third.
To add to it, there was a mistake in the parent ID in the script. I fixed the accounts in the console by moving them to the right parent.
That leaves me with one account to be created.
After raising the service limit, I tried running the script again. The plan shows 3 accounts to add and 2 to destroy. There's no way these accounts can be deleted and re-added. (Since the script is now version controlled, I can't run it just for this one account.)
Here's what I did: I modified the Terraform state (the parent ID) in the S3 bucket and made sure terraform show reflects the new changes. The terraform plan still shows 3 accounts to add and 2 to destroy.
How do I get this fixed? Any help is deeply appreciated.
Thanks.
The code is the source of truth when working with infrastructure as code; even if you change the state file, you also need to update the code to match.
There is no way Terraform can update the source code when it detects drift on your resources.
So you need to:
1- Write the manual changes you made in AWS into the Terraform code (see the sketch below).
2- Run terraform plan. It will refresh the state and show you whether there is still a difference.
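For example, the account resource in the code would need its parent_id updated to match the move done in the console; the names below are hypothetical:

resource "aws_organizations_account" "account_one" {
  name      = "account-one"
  email     = "account-one@example.com"
  parent_id = aws_organizations_organizational_unit.correct_ou.id   # now matches the console move
}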
If you modify the state file like I did, do it at your own risk. I followed a "how to clean your Terraform state" guide and performed the surgery!
Ensure that the change is also reflected properly in the code so Terraform picks it up.
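For the state surgery itself, the general pattern is pull, edit, push (a sketch, assuming a remote backend is already configured):

terraform state pull > current.tfstate
# edit the attribute (e.g. the parent ID) in current.tfstate,
# and increment the top-level "serial" field so the push is accepted
terraform state push current.tfstate
terraform plan    # confirm the diff is now what you expect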
