How to handle resource changes after provider upgrade in terraform? - azure

I am trying to upgrade the azurerm Terraform provider from 2.30.0 to 3.13.0. Naturally there are several changes in some resources (e.g. resource name changes, renamed attributes, removed attributes, etc.). I checked the Azure Resource Manager Upgrade Guide and found the changes that affect our configuration.
For example, in version 3.0.0 the attribute availability_zones is replaced by zones for the azurerm_kubernetes_cluster_node_pool resource. Therefore, when I run terraform plan I get an error that the attribute availability_zones doesn't exist.
I found a migration guide for deprecated resources. I understand the idea of removing the resource from the state and importing it again by its resource ID, but there are also other resources with changes, such as azurerm_subnet, azurerm_kubernetes_cluster and azurerm_storage_account, for which the terraform import -var-file='./my.tfvars' [..] command fails.
I am not sure whether it fails (only) because of dependencies on variables that are needed to declare the resource properly, or whether it also fails because Terraform reads the .tfvars file and compares those variables with the state.
What I actually need is a "best practice" guide on how to handle resource changes after a provider update. I don't know where to start and where to end. I tried to visualize the dependencies with terraform graph and created an SVG to figure out the order in which I have to migrate the resource changes, but it's impossible to understand the relations of the whole configuration. I could also just remove attributes from the state file that don't exist anymore, or rename attributes manually.
So, how should I handle resource changes after a provider upgrade in Terraform?

General
I was able to update the provider properly - I hope, at least. I would like to share my experience; maybe it helps other beginners. This is not a professional guide, just my own experience that I want to share.
First of all, you have to remove ALL resources affected by the provider upgrade from the state and then re-import them. What does that mean?
The new provider will contain diverse changes to different resources. For example:
Removed deprecated attributes (the attribute is removed completely)
Superseded attributes (the attribute is replaced by another)
Renamed attributes
Superseded resources (the resource can be deprecated or removed in the upgraded version)
Note
The migration guide describes how to migrate away from deprecated resources, but as I understood it, the workflow for attribute changes is the same. This is the only guide I found.
terraform plan will show you one or several errors for affected resources.
If your Terraform configuration is large and complex, don't try to remove and re-import everything at once. Go step by step and fix one affected resource at a time.
terraform plan can show changes even though it shouldn't.
Check attributes that force replacement carefully and understand why Terraform detects a change. It may seem obvious, but it isn't always.
There can be a type change, e.g. int -> string.
If the detected change is a missing secret of some kind, you can try adding the secret manually as the value of the related attribute in the state file and then run terraform plan again.
There can also be a bug in the provider. So if you can't understand a detected change, search the provider's issues - mostly on GitHub. Don't get confused if you can't find any related issue; maybe you have found a bug. Then just create a new issue.
You will also face other errors or bugs related to Terraform itself. You have to search patiently for a workaround so that you can keep applying the resource changes.
With resource targeting you can inspect the changes for a single resource, or ignore for the moment an error that occurs in another module (see the sketch below).
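For example, a targeted plan might look like this (the module and resource address are hypothetical placeholders):
terraform plan -var-file='./my.tfvars' -target=module.aks.azurerm_kubernetes_cluster_node_pool.example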
How To
---> !! BACK UP YOUR STATE FILE !! <---: You have to back up your state file before you start manipulating it. That way you can restore the state from the backup if something goes wrong. You can also use the backed-up state file to find the IDs you need when you have to import a resource.
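If your state is stored in a remote backend, one simple way to take such a backup is to pull it into a local file (a minimal sketch; the file name is just an example):
terraform state pull > terraform.tfstate.backup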
Get Affected Resource:
How can you find all affected resources? After the upgrade, the provider will not be able to parse the state file if a resource contains changes - as described in the question above. You will get an error for each affected resource. You can then look up the changes for that resource in the provider's upgrade guide, which can be found in the provider registry, e.g. for azurerm.
Terraform Configuration: Now you have to apply the changes for the affected resources in your Terraform configuration modules before you can import them, as described in the migration guide.
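For the example from the question, that means renaming the attribute in the configuration (a hedged sketch; the resource name and the zone values are placeholders):
resource "azurerm_kubernetes_cluster_node_pool" "example" {
  # ...
  # availability_zones = ["1", "2", "3"]   # removed in azurerm 3.x
  zones = ["1", "2", "3"]                  # replacement attribute since 3.0
}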
Remove Outdated Resource: As described in the migration guide, you have to remove the outdated resource from the state file because it contains the old format of the resource. The new provider is not able to handle these entries in the state file; they must be re-imported with the new provider.
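For the node pool example, the removal could look like this (the resource address is a hypothetical placeholder; use terraform state list to find the real one):
terraform state rm azurerm_kubernetes_cluster_node_pool.example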
Import Removed Resource: After you have removed the resource, you have to re-import it, as also described in the migration guide. Check the terraform import documentation for a better understanding of its usage.
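The re-import then references the Azure resource ID, which you can copy from the backed-up state file (again a sketch with placeholders for the address and the ID):
terraform import -var-file='./my.tfvars' azurerm_kubernetes_cluster_node_pool.example <node-pool-resource-id>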

So, how do you handle resource changes after a provider upgrade in Terraform?
I don't think that deleting the state file, then importing the resources and changing resource attributes every time you need to upgrade the azurerm version, is a feasible solution.
The Terraform Registry already provides update notes for every resource when changes are made in a new version. See the example below:
azurerm_app_service is used for version ~2.x, but for versions ~3.0 and ~4.0 the azurerm_linux_web_app and azurerm_windows_web_app resources are used instead.
I would suggest checking the Terraform Registry for updates to the attributes of the particular resources for your specific provider version, and proceeding accordingly.
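Until you have worked through those renames, you can also keep the provider pinned so that an upgrade never happens implicitly (a minimal sketch; the constraint is just an example):
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2.30"   # stay on the 2.x series until the 3.x changes are handled
    }
  }
}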

Related

Terraform: How to automatically resolve removed provider issues from old state files?

I am working on upgrading templates from Terraform 0.12.31 to 0.13.7. We need to ensure that we have an automatic system for dealing with deployments that were created under the older version.
An issue I am working through is that I removed the use of all null providers in the move. When I attempt to apply or plan on a state file created with 0.12 while using Terraform 0.13, I receive the following error:
$ terraform plan --var-file MY_VAR_FILE.json
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
Error: Provider configuration not present
To work with
module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost
its original provider configuration at
provider["registry.terraform.io/-/null"] is required, but it has been removed.
This occurs when a provider configuration is removed while objects created by
that provider still exist in the state. Re-add the provider configuration to
destroy
module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost,
after which you can remove the provider configuration again.
Error: Provider configuration not present
To work with
module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master
its original provider configuration at
provider["registry.terraform.io/-/null"] is required, but it has been removed.
This occurs when a provider configuration is removed while objects created by
that provider still exist in the state. Re-add the provider configuration to
destroy
module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master,
after which you can remove the provider configuration again.
Error: Provider configuration not present
To work with
module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config its
original provider configuration at provider["registry.terraform.io/-/null"] is
required, but it has been removed. This occurs when a provider configuration
is removed while objects created by that provider still exist in the state.
Re-add the provider configuration to destroy
module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config, after
which you can remove the provider configuration again.
My manual solution is to run terraform state rm on all the modules listed:
terraform state rm module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config
terraform state rm module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost
terraform state rm module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master
I would like to know how to do this automatically to enable a script to make these changes.
Is there some kind of terraform command I can use to list out these removed modules without the extra text, so I can loop through runs of terraform state rm to remove them from the state file?
Or is there some kind of terraform command that can automatically do this in a generic manner like terraform state rm -all-not-present?
This gives me a list I can iterate through using terraform state rm $MODULE_NAME:
$ terraform state list | grep 'null_data_source'
module.gcp_volt_site.module.ce_config.data.null_data_source.cloud_init_master
module.gcp_volt_site.module.ce_config.data.null_data_source.hosts_localhost
module.gcp_volt_site.module.ce_config.data.null_data_source.vpm_config
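Given that list, a small shell loop can do the removal automatically (a sketch, assuming only the null_data_source entries need to be removed; you can add -dry-run to terraform state rm first to preview):
for resource in $(terraform state list | grep 'null_data_source'); do
  terraform state rm "$resource"
done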
There are a few possibilities. Without the source code of the module it's difficult to say, so providing that might be helpful.
A couple of suggestions:
Cleaning Cache
Remove the .terraform directory (normally in the directory you run init, plan, and apply from). An older version of the module could be cached there that still contains the null references.
State Refresh
Using terraform refresh you should be able to scan the infrastructure and bring the state into alignment.
This can be dangerous and is not recommended by HashiCorp.
Manual removals
Using terraform state rm as you've suggested to manually remove those resources from the state could help here and is fairly safe: the command has a -dry-run option and you point at resources specifically. Again, check that the module reference isn't pointing to an old version of the module or caching one, or the resources will just be recreated.
No, there's no rm --all-missing, but if you know it's all the null data sources that will be missing, you can use terraform state list to list those resources and then iterate over them, removing each one in a loop.

Terraform, how to centralize providers versioning

We use Terraform for Azure PaaS resource creation, and it runs as separate pipeline steps for each component. For instance, the first step plans and applies the data component, the second step plans and applies the web component, and so on. The code is arranged into multiple components, and each of those has its own provider azurerm block. Inside the block we want to pin the provider version, and we want to control it in a centralized manner. Currently we came up with the following approach:
provider "azurerm" {
version = "=${ps.AzureRmVersion}"
skip_provider_registration = "true"
features {}
}
When the release process runs, a PowerShell step replaces the ps.AzureRmVersion marker with the version. My question is whether there is another way to control the provider version without involving a third party such as PowerShell.
The version argument in provider blocks is a legacy pattern from older versions of Terraform for specifying version constraints (a set of versions that this module is compatible with) rather than version selections (a single selected version that you want to use).
Since you want to centrally control which exact version is selected I think the best approach would be to have your automation script generate a Dependency Lock File containing the versions you want to prescribe.
Normally Terraform manages this lock file itself as you install and upgrade providers, but in that case each configuration will have its own set of locks and may therefore differ from one another. Since you want to impose central policy, you can instead use Terraform CLI with a simple configuration that only contains provider requirements declarations for the providers you want to use:
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "1.0.0"
    }
  }
}
In that directory you can run terraform providers lock to cause Terraform to select that particular version from the registry and generate a .terraform.lock.hcl file recording the checksums for all of the platforms you specified:
terraform providers lock -platform=windows_amd64 -platform=linux_amd64
You can then save that .terraform.lock.hcl file to your central location and configure your automation to copy that file into the working directory each time (overwriting any file that might already be there) before running terraform init. Terraform will then select whatever package the lock file recorded, and make sure that it matches the checksums previously recorded.
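For example, the per-component pipeline step might look something like this (a sketch; the central path /shared/terraform/.terraform.lock.hcl is a hypothetical location):
cp /shared/terraform/.terraform.lock.hcl ./.terraform.lock.hcl   # overwrite any existing lock file
terraform init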
Your individual Terraform configurations may optionally contain their own non-exact version constraints specifying which Terraform versions they are compatible with, which will then cause Terraform to report an error if the centrally-selected version recorded in your shared lock file is not compatible with one of your configurations.
Note that the lock file only constrains providers that are already recorded in it. If one of your configurations requires a different provider that's not already in the lock file then by default terraform init will select the newest compatible version of that provider and overwrite the lock file to include it.
If you want to prevent that and require all new providers to be added to the centrally-maintained lock file, you can add an additional option to terraform init to tell Terraform to fail if the action it's taking would require changes to the locked providers:
terraform init -lockfile=readonly
To add a new provider with this usage pattern, you'd need to return to the requirements-only configuration I described earlier, add the new provider to it, re-run the same terraform providers lock command to regenerate it, and then update your "master" lock file to that new version of the file.
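The refresh of the central lock file could then be scripted along these lines (again a sketch using the hypothetical shared path from above):
# in the requirements-only directory, after adding the new provider block
terraform providers lock -platform=windows_amd64 -platform=linux_amd64
cp .terraform.lock.hcl /shared/terraform/.terraform.lock.hcl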

Force Terraform to install providers from local disk only, disabling Terraform Registry

Since 1995, we have used an update mechanism which
cleanly updates and removes software
centrally stores all software meta-data internally to manage needs and artifacts from a single source of truth
NEVER triggers itself arbitrarily.
While we understand terraform has begun reaching out to a registry in a brave reinvention of that wheel without any of those features, we wish to disable it completely. Our current kit includes only one plugin:
terraform-0.13.0-1.el7.harbottle.x86_64
golang-github-terraform-provider-vsphere-1.13.0-0.1.x86_64
The goal is
never check the registry
return an error if the given module is not installed
and I'd be very grateful for good suggestions toward that end. Is there a setting I've overlooked, or can we fake it by telling it to look somewhere empty? Is there a -stay-in-your-lane switch?
Clarification:
the add-on package is a go-build package which delivers a single artifact, /usr/bin/terraform-provider-vsphere, and nothing else. This has worked wonderfully for all previous incarnations and may have only begun to act up in 0.13.
Update: These things failed:
terraform init -plugin-dir=/dev/shm
terraform init -get-plugins=false
terraform init -get=false
setting terraform::required_providers::vsphere::source=""
echo "disable_checkpoint = true" > ~/.terraformrc
$ terraform init -get-plugins=false
Initializing the backend...
Initializing provider plugins...
- Finding latest version of -/vsphere...
- Finding latest version of hashicorp/vsphere...
Update: I'm still a bit off:
rpm -qlp golang-github-terraform-provider-vsphere
/usr/share/terraform/plugins/registry.terraform.io/hashicorp/vsphere/1.14.0/linux_amd64/terraform-provider-vsphere
I feel I'm really close. /usr/share/ is in the XDG default search path, and it DOES seem to find the location, but it seems to check the registry first/at-all, which is unexpected.
Initializing provider plugins...
- Finding latest version of hashicorp/vsphere...
- Finding latest version of -/vsphere...
- Installing hashicorp/vsphere v1.14.0...
- Installed hashicorp/vsphere v1.14.0 (unauthenticated)
Error: Failed to query available provider packages
Are we sure it stops checking if it has something local, and that it does that by default? Did I read that right?
What you are describing here sounds like the intention of the Provider Installation settings in Terraform's CLI configuration file.
Specifically, you can put your provider files in a local filesystem directory of your choice -- for the sake of this example, I'm going to arbitrarily choose /usr/local/lib/terraform, and then write the following in the CLI configuration file:
provider_installation {
  filesystem_mirror {
    path = "/usr/local/lib/terraform"
  }
}
If you don't already have a CLI configuration file, you can put this in the file ~/.terraformrc.
With the above configuration, your golang-github-terraform-provider-vsphere-1.13.0-0.1.x86_64 package would need to place the provider's executable at the following path (assuming that you're working with a Linux system):
/usr/local/lib/terraform/registry.terraform.io/hashicorp/vsphere/1.13.0/linux_amd64/terraform-provider-vsphere_v1.13.0_x4
(The filename above is the one in the official vSphere provider release, but if you're building this yourself from source then it doesn't matter what exactly it's called as long as it starts with terraform-provider-vsphere.)
It looks like you are in the process of completing an upgrade from Terraform v0.12, and so Terraform is also trying to install the legacy (un-namespaced) version of this provider, -/vsphere. Since you won't have that in your local directory the installation of that would fail, but with the knowledge that this provider is now published at hashicorp/vsphere we can avoid that by manually migrating it in the state, thus avoiding the need for Terraform to infer this automatically on the next terraform apply:
terraform state replace-provider 'registry.terraform.io/-/vsphere' 'registry.terraform.io/hashicorp/vsphere'
After you run this command your latest state snapshot will not be compatible with Terraform 0.12 anymore, so if you elect to abort your upgrade and return to 0.12 you will need to restore the previous version from a backup. If your state is not stored in a location that naturally retains historical versions, one way to get such a backup is to run terraform state pull with a Terraform 0.12 executable and save the result to a file. (By default, Terraform defers taking this action until terraform apply to avoid upgrading the state format until it would've been making other changes anyway.)
The provider_installation configuration above is an answer if you want to make this true for all future use of Terraform, which seems to be your goal here, but for completeness I also want to note that the following command should behave in an equivalent way to the result of the above configuration if you want to force a local directory only for one particular invocation of terraform init:
terraform init -plugin-dir=/usr/local/lib/terraform
Since you seem to be upgrading from Terraform 0.12, it might also interest you to know that Terraform 0.13's default installation behavior (without any special configuration) is the same as Terraform 0.12 with the exception of now expecting a different local directory structure than before, to represent the hierarchical provider namespace. (That is, to distinguish hashicorp/vsphere from a hypothetical othernamespace/vsphere.)
Specifically, Terraform 0.13 (as with Terraform 0.12) will skip contacting the remote registry for any provider for which it can discover at least one version available in the local filesystem.
It sounds like your package representing the provider was previously placing a terraform-provider-vsphere executable somewhere that Terraform 0.12 could find and use it. You can adapt that strategy to Terraform 0.13 by placing the executable at the following location:
/usr/local/share/terraform/plugins/registry.terraform.io/hashicorp/vsphere/1.13.0/linux_amd64/terraform-provider-vsphere_v1.13.0_x4
(Again, the exact filename here isn't important as long as it starts with terraform-provider-vsphere.)
/usr/local/share here is assuming one of the default data directories from the XDG Base Directory specification, but if you have XDG_DATA_HOME/XDG_DATA_DIRS overridden on your system then Terraform should respect that and look in the other locations you've listed.
The presence of such a file, assuming you haven't overridden the default behavior with an explicit provider_installation block, will cause Terraform to behave as if you had written the following in the CLI configuration:
provider_installation {
  filesystem_mirror {
    path    = "/usr/local/share/terraform/plugins"
    include = ["hashicorp/vsphere"]
  }
  direct {
    exclude = ["hashicorp/vsphere"]
  }
}
This form of the configuration forces local installation only for the hashicorp/vsphere provider, thus mimicking what Terraform 0.12 would've done with a local plugin file terraform-provider-vsphere. You can get the more thorough behavior of never contacting remote registries with a configuration like the one I opened this answer with, which doesn't include a direct {} block at all.

terraform state refresh and accept the changes and not change again with old code

I have a huge Terraform module setup that launches an entire infrastructure. Post-provisioning, many changes were applied to the setup manually. I updated the state file to be aware of these changes using the terraform refresh command.
Now I've added new components to my Terraform configuration. When I execute terraform plan, it tries to reset the previously updated resources to their initial state (because that is what is defined in my Terraform code). Is there any way for Terraform to ignore the changes in the old resources and create only the newly added components?
I found a solution for this myself. There is an option called ignore_changes in the lifecycle block that should be defined for all resources that you expect to be changed by external methods.
Reference Link: https://www.terraform.io/docs/configuration/resources.html#ignore_changes
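A minimal sketch of what that looks like (the resource and the ignored attribute are hypothetical; list whichever arguments are managed outside Terraform):
resource "azurerm_storage_account" "example" {
  # ...
  lifecycle {
    ignore_changes = [
      tags,   # e.g. changed manually in the portal
    ]
  }
}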

Terraform Apply has different "plan" than Terraform Plan

I sometimes see that terraform apply has a different "plan" than terraform plan.
For instance, today one of the TF files that I tried to terraform apply resulted in only 1 "change" and 1 "add", while it showed "3 add", "1 change" and "3 destroy" when using terraform plan.
I have been using Terraform for just two months. Is this intended behavior in Terraform?
Could anyone give an explanation for this behavior? Thanks!
Terraform version: 0.11.13
This is unexpected behaviour, but the best practice is to run:
terraform plan -out deploy.tfplan
This saves the plan in the deploy.tfplan file.
Then run terraform apply deploy.tfplan.
This ensures that the plan you reviewed is the one that gets executed, every time.
This is not an intended behaviour of Terraform unless something has gone wrong somewhere. I have never seen this kind of issue until now. Did you ever edit or delete your .tfstate state file after running the terraform plan command? If you keep observing this issue, you can open an issue with the product owner. But I don't think this is an issue, and you will probably never face it again.
Try to follow these steps when performing a terraform apply:
First make sure the changes to the Terraform files have been saved.
Run terraform plan before running terraform apply.
It sounds like some of the files you made changes to were not saved with the current Terraform file.
Can you explain the full scenario? Normally, in my experience, they are the same.
The only differences I can see: either you are using a variable file with plan and apply and some variables cause some resources to differ, or you are using a remote location for state and some other job/person is also updating the state.
If you are running everything locally, it should not happen like this.
Terraform builds a graph of all the resources. It then creates the non-dependent resources in parallel to make resource creation slightly more efficient. If any resource creation fails, Terraform is left in a partially applied state, which gets recorded in the tfstate file. After fixing the issue with the resource, when you re-apply the .tf files, it shows you only the new resources to be changed. In your case, I think it has more to do with the fact that some resources have a "destroy before create" policy, which shows up in the result: when you apply a change to one such resource, it shows 1 resource destroyed and 1 created. When this happens together with resources that are not "destroy before create", you end up with output like what you mentioned above.
Did you comment out any of the resources in the Terraform file while triggering terraform apply?
If yes, please check that, as commenting out resources in an existing Terraform file will result in destroying those resources.
I have been using Terraform for quite a long time and this is not an intended behaviour. It looks like something changed between plan and apply.
But what you can do is save the plan in a file using
terraform plan -out plan.tfplan
and then deploy using the same file:
terraform apply plan.tfplan
