Local state cannot be unlocked by another process on Terraform

My Terraform remote state and locking are configured on S3 and DynamoDB under an AWS account. On a GitLab runner a plan task crashed, and on the next execution plan the following error pops up:
Error: Error locking state: Error acquiring the state lock: ConditionalCheckFailedException:
The conditional request failed
Lock Info:
ID: <some-hash>
Path: remote-terrform-states/app/terraform.tfstate
Operation: OperationTypePlan
Who: root#runner-abc-project-123-concurrent-0
Version: 0.14.10
Created: 2022-01-01 00:00:00 +0000 UTC
Info: some really nice info
While trying to release this lock in order to run the execution plan again, I get the following error:
terraform force-unlock <some-hash-abc-123>
#output:
Local state cannot be unlocked by another process
How do we release this Terraform lock?

According to the Terraform command reference for force-unlock:
Manually unlock the state for the defined configuration.
This will not modify your infrastructure. This command removes the
lock on the state for the current configuration. The behavior of this
lock is dependent on the backend being used. Local state files cannot
be unlocked by another process.
Explanation: apparently the execution plan writes the plan output file locally and applies it in the second phase of the Terraform steps, like the following example:
phase 1: terraform plan -out execution-plan.out
phase 2: terraform apply -input=false execution-plan.out
Make sure that the filename is the same in phase 1 and phase 2.
However, if phase 1 is terminated or crashes unexpectedly, the lock is tied to the local state file and therefore must be removed in DynamoDB itself, not with the terraform force-unlock command.
Solution: locate this specific item in the DynamoDB Terraform lock table and explicitly remove the locked item. You can do this either with the AWS console or through the API.
For example:
aws dynamodb delete-item \
--table-name terraform-locker-bucket \
--key file://key.json
Contents of key.json (the --key argument takes only the table's key attribute, LockID, in DynamoDB attribute-value format; the Info fields shown in the lock error live on the item but are not part of the key):
{
  "LockID": {
    "S": "remote-terrform-states/app/terraform.tfstate"
  }
}
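If you want to inspect the lock item before deleting it, a quick check with the same key file should return it (the table name terraform-locker-bucket is taken from the example above; adjust it to your own lock table):
aws dynamodb get-item \
--table-name terraform-locker-bucket \
--key file://key.json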

terraform force-unlock <lock id>
For Terragrunt, in the directory of the <terragruntfile>.hcl, run
terragrunt force-unlock <lock id>. If that doesn't work, remove terragrunt.lock.hcl and .terragrunt-cache/ and try again.
Also, if you are facing the issue while destroying:
Adding -lock=false would help proceed further.
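For example (a hedged sketch; only use -lock=false when you are sure no other process is operating on the same state):
terraform destroy -lock=false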

Related

Unable to see the Terraform provider file after running terraform init

I am stuck at the point where, when I run terraform init, the provider is not getting downloaded and it gives me no error. I am using a main.tf file and in it I have only a provider "azurerm" block. So when I run terraform init I get only the output below, and I don't see the Terraform provider file getting initialized or downloaded anywhere. I am logged in and authenticated to the Azure login page too.
> terraform init
Initializing the backend...
Initializing provider plugins...
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Terraform creates a hidden folder to store the provider. Make sure you set your OS permissions to see hidden files and folders.
From the documentation:
A hidden .terraform directory, which Terraform uses to manage cached provider plugins and modules, record which workspace is currently active, and record the last known backend configuration in case it needs to migrate state on the next run. This directory is automatically managed by Terraform, and is created during initialization.
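For example, on Linux or macOS you can confirm the provider was actually downloaded by listing that hidden directory (the exact path under .terraform/providers depends on the provider, version, and platform, so treat this as a sketch):
ls -la .terraform
ls -la .terraform/providers/registry.terraform.io/hashicorp/azurerm/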

Terraform Cloud in Git mode - failed to read schema, permission denied

I'm new to terraform and trying to use a 'custom' provider with Terraform cloud. To be clear, if I use it on my Windows machine without the TCloud everything works just fine.
On the TCloud I've got a workbook synchronized to my Git repo. The custom provider is uploaded to the Git repo: \terraform.d\plugins\zscaler.com\zpa\zpa\2.0.5\linux_amd64\terraform-provider-zpa_v2.0.5.
I've run the chmod command to compensate for Windows' inability to set the provider as executable:
git update-index --chmod=+x .\terraform.d\plugins\zscaler.com\zpa\zpa\2.0.5\linux_amd64\terraform-provider-zpa_v2.0.5
I've also updated the lock file to allow both Windows and Linux provider hashes, to deal with the "local provider doesn't match any of the checksums" issue:
terraform providers lock -fs-mirror="C:\Users\user1\AppData\Roaming\terraform.d\plugins\" -platform=windows_amd64 -platform=linux_amd64 zscaler.com/zpa/zpa
When I run terraform plan from VSCode (on my Windows machine) on the repo that's initialized to the TCloud I get the following error:
> terraform plan -var-file terraform.tfvar
. . .
2022-02-02T10:14:28.328-0600 [INFO] cloud: starting Plan operation
Terraform v1.1.4
on linux_amd64
Configuring remote state backend...
Initializing Terraform configuration...
╷
│ Error: failed to read schema for zpa_provisioning_key.iot_edge_key in zscaler.com/zpa/zpa: failed to instantiate provider "zscaler.com/zpa/zpa" to obtain schema: fork/exec .terraform/providers/zscaler.com/zpa/zpa/2.0.5/linux_amd64/terraform-provider-zpa_v2.0.5: permission denied
Enabling debug doesn't give me any more clues about what's wrong. I'd appreciate any suggestions.
Thank you

"Invalid legacy provider address" error on Terraform

I'm trying to deploy a Bitbucket pipeline using Terraform v0.14.3 to create resources in Google Cloud. After running the terraform command, the pipeline fails with this error:
Error: Invalid legacy provider address
This configuration or its associated state refers to the unqualified provider
"google".
You must complete the Terraform 0.13 upgrade process before upgrading to later
versions.
We updated our local version of Terraform to v0.13.0 and then ran terraform 0.13upgrade as referenced in this guide: https://www.terraform.io/upgrade-guides/0-13.html. A versions.tf file was generated requiring Terraform version >= 0.13, and our required_providers block now looks like this:
terraform {
  backend "gcs" {
    bucket      = "some-bucket"
    prefix      = "terraform/state"
    credentials = "key.json" # this is just a bitbucket pipeline variable
  }
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 2.20.0"
    }
  }
}
provider "google" {
  project     = var.project_ID
  credentials = "key.json"
  region      = var.project_region
}
We still get the same error when initiating the bitbucket pipeline. Does anyone know how to get past this error? Thanks in advance.
Solution
If you are using a newer version of Terraform, such as v0.14.x, you should:
use the replace-provider subcommand
terraform state replace-provider \
-auto-approve \
"registry.terraform.io/-/google" \
"hashicorp/google"
#=>
Terraform will perform the following actions:
~ Updating provider:
- registry.terraform.io/-/google
+ registry.terraform.io/hashicorp/google
Changing x resources:
. . .
Successfully replaced provider for x resources.
initialize Terraform again:
terraform init
#=>
Initializing the backend...
Initializing provider plugins...
- Reusing previous version of hashicorp/google from the dependency lock file
- Using previously-installed hashicorp/google vx.xx.x
Terraform has been successfully initialized!
You may now begin working with Terraform. Try . . .
This should take care of installing the provider.
Explanation
Terraform only supports upgrading across one major feature release at a time. Your older state file was, more than likely, created using a version earlier than v0.13.x.
If you did not run the apply command before you upgraded your Terraform version, you can expect this error: the upgrade from v0.13.x to v0.14.x was not complete.
You can find more information here.
In our case, we were on AWS and had a similar error:
...
Error: Invalid legacy provider address
This configuration or its associated state refers to the unqualified provider
"aws".
The steps to resolve were:
ensure the syntax was upgraded by running terraform init again
check the warnings and resolve them
finally, update the state file with the following method.
# update provider in state file
terraform state replace-provider -- -/aws hashicorp/aws
# reinit
terraform init
Specific to the OP's problem: if the issue still occurs, verify access to the bucket location both locally and from the pipeline, and also verify the version of Terraform running in the pipeline. Depending on the configuration, it may be that the remote state file cannot be updated.
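For example, a quick sanity check run both locally and in the pipeline might look like this (the bucket name and prefix are taken from the backend block in the question; adjust them for your setup):
terraform version
aws s3 ls s3://some-bucket/terraform/state/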
Same issue for me. I ran:
terraform providers
That gave me:
Providers required by configuration:
registry.terraform.io/hashicorp/google
Providers required by state:
registry.terraform.io/-/google
So I ran:
terraform state replace-provider registry.terraform.io/-/google registry.terraform.io/hashicorp/google
That did the trick.
To add on, I had installed Terraform 0.14.6 but the state seemed to be stuck in 0.12. In my case I had 3 references that were off; this article helped me pinpoint which ones (all the entries in "Providers required by state" which had a - in the address): https://github.com/hashicorp/terraform/issues/27615
I corrected it by running the replace-provider command for each entry that was off, then running terraform init. After doing this and running a git diff, I noted that the tfstate had been updated and now uses Terraform 0.14.x instead of my previous 0.12.x, i.e.
terraform providers
terraform state replace-provider registry.terraform.io/-/azurerm registry.terraform.io/hashicorp/azurerm
Explanation: your Terraform project contains a tfstate file that is outdated and referring to an old provider address. The error message will look like this:
Error: Invalid legacy provider address
This configuration or its associated state refers to the unqualified provider
<some-provider>.
You must complete the Terraform <some-version> upgrade process before upgrading to later
versions.
Solution: in order to solve this issue you should change the tfstate references to point to the newer required providers, update the tfstate file, and initialize the project again. The steps are:
Create or edit the required_providers block with the relevant provider name and version; I'd rather do it in a versions.tf file.
example:
terraform {
  required_version = ">= 0.14"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.35.0"
    }
  }
}
Run the terraform providers command to compare the required providers from the configuration against the required providers saved in the state.
example:
Providers required by configuration:
.
├── provider[registry.terraform.io/hashicorp/aws] >= 3.35.0
Providers required by state:
provider[registry.terraform.io/-/aws]
Reassign the required provider source address in the Terraform state (using the terraform state replace-provider command) so we can tell Terraform how to interpret the legacy provider.
The terraform state replace-provider subcommand allows re-assigning
provider source addresses recorded in the Terraform state, and so we
can use this command to tell Terraform how to reinterpret the "legacy"
provider addresses as properly-namespaced providers that match with
the provider source addresses in the configuration.
Warning: The terraform state replace-provider subcommand, like all of
the terraform state subcommands, will create a new state snapshot and
write it to the configured backend. After the command succeeds the
latest state snapshot will use syntax that Terraform v0.12 cannot
understand, so you should perform this step only when you are ready to
permanently upgrade to Terraform v0.13.
example:
terraform state replace-provider registry.terraform.io/-/aws registry.terraform.io/hashicorp/aws
output:
~ Updating provider:
- registry.terraform.io/-/aws
+ registry.terraform.io/hashicorp/aws
Run terraform init to update the references.
While you were on TF 0.13, did you apply state at least once for the running project?
According to TF docs: https://www.terraform.io/upgrade-guides/0-14.html
There is no separate automatic upgrade command in 0.14 (like there was in 0.13). The only way to upgrade is to apply state on the project at least once while still on TF 0.13, before moving from 0.13 to 0.14.
You can also try terraform init in the project directory.
My case was like this:
Error: Invalid legacy provider address
This configuration or its associated state refers to the unqualified provider
"openstack".
You must complete the Terraform 0.13 upgrade process before upgrading to later
versions.
To resolve the issue:
remove the .terraform folder
then execute the following command:
terraform state replace-provider -- -/openstack terraform-provider-openstack/openstack
After this command you will see the output below; enter yes.
Terraform will perform the following actions:
~ Updating provider:
- registry.terraform.io/-/openstack
+ registry.terraform.io/terraform-provider-openstack/openstack
Changing 11 resources:
openstack_compute_servergroup_v2.kubernetes_master
openstack_networking_network_v2.kube_router
openstack_compute_instance_v2.kubernetes_worker
openstack_networking_subnet_v2.internal
openstack_networking_subnet_v2.kube_router
data.openstack_networking_network_v2.external_network
openstack_compute_instance_v2.kubernetes_etcd
openstack_networking_router_interface_v2.internal
openstack_networking_router_v2.internal
openstack_compute_instance_v2.kubernetes_master
openstack_networking_network_v2.internal
Do you want to make these changes?
Only 'yes' will be accepted to continue.
Enter a value: yes
Successfully replaced provider for 11 resources.
I recently ran into this using Terraform Cloud for the remote backend. We had some older AWS-related workspaces set to version 0.12.4 (in the cloud) that errored out with "Invalid legacy provider address" and refused to run with the latest Terraform client 1.1.8.
I am adding my answer because it is much simpler than the other answers. We did not do any of the following:
terraform providers
terraform 0.13upgrade
remove the .terraform folder
terraform state replace-provider
Instead we simply:
In a clean folder (no local state, using local terraform.exe version 0.13.7) ran 'terraform init'
Made a small insignificant change (to ensure apply would write state) to a .tf file in the workspace
In Terraform Cloud set the workspace version to 0.13.7
Using the local 0.13.7 terraform.exe, ran apply - that saved new state.
Now we can use cloud and local terraform.exe version 1.1.8 with no more problems.
Note that we did also need to update a few AWS S3-related resources to the newer AWS provider syntax to get all our workspaces working with the latest provider.
We encountered a similar problem in our operational environments today. We successfully completed the terraform 0.13upgrade command. This indeed introduced a versions.tf file.
However, performing a terraform init with this setup was still not possible, and the following error popped up:
Error: Invalid legacy provider address
Further investigation in the state file revealed that, for some resources, the provider block was not updated. We hence had to run the following command to finalize the upgrade process.
terraform state replace-provider "registry.terraform.io/-/google" "hashicorp/google"
EDIT: Deployment to the next environment revealed that this was caused by conditional resources. To easily enable/disable some resources we leverage the count attribute and use either 0 or 1. For the resources with count = 0, which were left unaltered by the Terraform 0.13 upgrade, the provider was not updated.
I was using Terragrunt with remote S3 state and DynamoDB, and sadly this did not work for me, so I'm posting here what did in case it helps someone else.
It is a long way to make this work, as terragrunt state replace-provider does not work for me:
Download the state file from S3:
aws s3 cp s3://bucket-name/path/terraform.tfstate terraform.tfstate --profile profile
Replace the provider using Terraform:
terraform state replace-provider "registry.terraform.io/-/random" "hashicorp/random"
terraform state replace-provider "registry.terraform.io/-/aws" "hashicorp/aws"
Upload the state file back to S3, as even terragrunt state push terraform.tfstate did not work for me:
aws s3 cp terraform.tfstate s3://bucket-name/path/terraform.tfstate --profile profile
Run terragrunt apply; the command will throw an error with a digest value. Update the DynamoDB table digest value with the one received in the previous command:
Initializing the backend...
Error refreshing state: state data in S3 does not have the expected content.
This may be caused by unusually long delays in S3 processing a previous state
update. Please wait for a minute or two and try again. If this problem
persists, and neither S3 nor DynamoDB are experiencing an outage, you may need
to manually verify the remote state and update the Digest value stored in the
DynamoDB table to the following value: fe2840edf8064d9225eea6c3ef2e5d1d
Finally, run terragrunt apply again.
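A hedged sketch of that digest update via the CLI (the lock table name terraform-locks is a placeholder, and the "-md5" LockID naming is an assumption about how the S3 backend stores its checksum item; updating the value through the DynamoDB console works just as well):
aws dynamodb put-item \
--table-name terraform-locks \
--item '{"LockID": {"S": "bucket-name/path/terraform.tfstate-md5"}, "Digest": {"S": "fe2840edf8064d9225eea6c3ef2e5d1d"}}' \
--profile profile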
The other way this can get strange is if you are using Terraform workspaces, especially with remote state files.
When using a Terraform workspace, the order of operations is important.
terraform init - connects to the default workspace
terraform workspace select <env> - even if you specify the workspace here, the init has already happened against the default workspace.
This is an assumption that Terraform makes - sometimes erroneously.
To fix this - you can run your init using:
TF_WORKSPACE=<your_env> terraform init
Or remove the default workspace.

Terraform: Failed to unlock state: failed to retrieve lock info: unexpected end of JSON input

Terraform v0.13.5
provider aws v3.7.0
Backend: AWS S3+DynamoDB
terraform plan was aborted, and now it cannot acquire the state lock. I'm trying to release it manually but get an error:
terraform force-unlock -force xxx-xxx-xx-dddd
Failed to unlock state: failed to retrieve lock info:
unexpected end of JSON input
The state file looks complete and passes json syntax validation successfully.
How to fix that?
Solution: double-check you're in the correct Terraform workspace.
I had the same issue. In my case, there was more than one .tfstate file in the working directory. This was causing the problem.
1- Ensure you have only one .tfstate file in the working directory.
2- Ensure the .tfstate file is valid.
I had to switch to the right workspace and issue the command terraform force-unlock -force to get past the issue.
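A minimal sketch of that sequence (the workspace name and lock ID below are placeholders):
terraform workspace list
terraform workspace select prod
terraform force-unlock -force <lock-id>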

check if Kubernetes deployment was successful in CI/CD pipeline

I have an AKS cluster with Kubernetes version 1.14.7.
I have built CI/CD pipelines to deploy newly created images to the cluster.
I am using kubectl apply to update a specific deployment with the new image. Sometimes, and for many reasons, the deployment fails, for example with ImagePullBackOff.
Is there a command to run after the kubectl apply command to check if the pod creation and deployment were successful?
For this purpose Kubernetes has kubectl rollout, and you should use the status option.
By default 'rollout status' will watch the status of the latest rollout until it's done. If you don't want to wait for the rollout to finish then you can use --watch=false. Note that if a new rollout starts in-between, then 'rollout status' will continue watching the latest revision. If you want to pin to a specific revision and abort if it is rolled over by another revision, use --revision=N where N is the revision you need to watch for.
You can read the full description here
If you use kubectl apply -f myapp.yaml and check the rollout status you will see:
$ kubectl rollout status deployment myapp
Waiting for deployment "myapp" rollout to finish: 0 of 3 updated replicas are available…
Waiting for deployment "myapp" rollout to finish: 1 of 3 updated replicas are available…
Waiting for deployment "myapp" rollout to finish: 2 of 3 updated replicas are available…
deployment "myapp" successfully rolled out
There is another way to wait for the deployment to become available, with a configured timeout:
kubectl wait --for=condition=available --timeout=60s deploy/myapp
Otherwise kubectl rollout status can be used, but it may get stuck forever in some rare cases and will require manual cancellation of the pipeline if that happens.
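A minimal sketch of a CI step built on this (the deployment name myapp and the timeout value are placeholders; rollout status exits non-zero on failure or timeout, which fails the pipeline step):
kubectl apply -f myapp.yaml
kubectl rollout status deployment/myapp --timeout=120s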
You can parse the output through jq:
kubectl get pod -o=json | jq '.items[]|select(any( .status.containerStatuses[]; .state.waiting.reason=="ImagePullBackOff"))|.metadata.name'
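A hedged variant of the same jq filter that fails a CI step when any matching pod is found (the count-and-exit wrapper is an assumption about how you might wire it into a pipeline, not part of the answer above):
FAILED=$(kubectl get pod -o=json | jq '[.items[]|select(any(.status.containerStatuses[]?; .state.waiting.reason=="ImagePullBackOff"))] | length')
[ "$FAILED" -eq 0 ] || exit 1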
It looks like kubediff tool is a perfect match for your task:
Kubediff is a tool for Kubernetes to show you the differences between your running configuration and your version controlled configuration.
The tool can be used from the command line and as a Pod in the cluster that continuously compares YAML files in the configured repository with the current state of the cluster.
$ ./kubediff
Usage: kubediff [options] <dir/file>...
Compare yaml files in <dir> to running state in kubernetes and print the
differences. This is useful to ensure you have applied all your changes to the
appropriate environment. This tools runs kubectl, so unless your
~/.kube/config is configured for the correct environment, you will need to
supply the kubeconfig for the appropriate environment.
kubediff prints the status to stdout and returns a non-zero exit code when a difference is found. You can change this behavior using command line arguments.
You may also want to check the good article about validating YAML files:
Validating Kubernetes Deployment YAMLs
