Terraform remote state best practice

I am creating a few Terraform modules, and inside the modules I also create the resources for storing remote state (an S3 bucket and a DynamoDB table).
When I then use a module, I write something like this:
# terraform {
#   backend "s3" {
#     bucket         = "name"
#     key            = "xxxx.tfstate"
#     region         = "rrrr"
#     encrypt        = true
#     dynamodb_table = "trrrrr"
#   }
# }

terraform {
  required_version = ">= 1.0.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.region
}

module "mymodule" {
  source      = "./module/mymodule"
  region      = "param1"
  prefix      = "param2"
  project     = "xxxx"
  username    = "ddd"
  contact     = "myemail"
  table_name  = "table-name"
  bucket_name = "uniquebucketname"
}
I leave the remote state part commented out and let Terraform create a local state and all the resources (including the bucket and the DynamoDB table).
After the resources are created, I re-run terraform init and migrate the state to S3.
I wonder if this is good practice, or if there is something better for maintaining the state that also provides isolation.
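In other words, my current bootstrap workflow looks roughly like this (terraform init -migrate-state assumes Terraform 1.x; on older versions init simply prompts to copy the existing state):

# 1st pass: backend block commented out, state is local
terraform init
terraform apply                  # creates the S3 bucket and the DynamoDB table

# 2nd pass: uncomment the backend "s3" block, then move the local state into it
terraform init -migrate-state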

That is an interesting approach. I would create the S3 bucket manually, since it's a one-time creation for your state file management. Then I would add a bucket policy to prevent deletion (see here: https://serverfault.com/questions/226700/how-do-i-prevent-deletion-of-s3-buckets) and enable versioning and/or a backup.
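If you do want to keep that one-time bootstrap in code instead, a minimal sketch of such a bucket could look like the following (resource and bucket names are placeholders; this assumes AWS provider 4.x, where versioning is its own resource):

resource "aws_s3_bucket" "tf_state" {
  bucket = "my-unique-terraform-state-bucket" # placeholder name

  lifecycle {
    prevent_destroy = true # stop terraform destroy from removing the bucket
  }
}

resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Deny bucket deletion for everyone, mirroring the linked serverfault answer
resource "aws_s3_bucket_policy" "deny_delete" {
  bucket = aws_s3_bucket.tf_state.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyBucketDeletion"
      Effect    = "Deny"
      Principal = "*"
      Action    = "s3:DeleteBucket"
      Resource  = aws_s3_bucket.tf_state.arn
    }]
  })
}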
Beyond this approach there are better practices, such as using a tool like Terraform Cloud, which is free for up to 5 users. Then in your Terraform root module configuration you would put this:
terraform {
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "YOUR-TERRAFORM-CLOUD-ORG"

    workspaces {
      # name   = ""  ## for single-workspace jobs
      # prefix = ""  ## for multiple workspaces
      name = "YOUR-ROOT-MODULE-WORKSPACE-NAME"
    }
  }
}
More details in this similar Q&A: Initial setup of terraform backend using terraform

Related

Switch terraform 0.12.6 to 0.13.0 gives me provider["registry.terraform.io/-/null"] is required, but it has been removed

I manage state in remote Terraform Cloud.
I have downloaded and installed the latest Terraform 0.13 CLI.
Then I removed the .terraform directory.
Then I ran terraform init and got no error.
Then I ran:
➜ terraform apply -var-file env.auto.tfvars
Error: Provider configuration not present
To work with
module.kubernetes.module.eks-cluster.data.null_data_source.node_groups[0] its
original provider configuration at provider["registry.terraform.io/-/null"] is
required, but it has been removed. This occurs when a provider configuration
is removed while objects created by that provider still exist in the state.
Re-add the provider configuration to destroy
module.kubernetes.module.eks-cluster.data.null_data_source.node_groups[0],
after which you can remove the provider configuration again.
Releasing state lock. This may take a few moments...
This is the content of module/kubernetes/main.tf:
###################################################################################
# EKS CLUSTER
#
# This module contains configuration for EKS cluster running various applications
###################################################################################

module "eks_label" {
  source      = "git::https://github.com/cloudposse/terraform-null-label.git?ref=master"
  namespace   = var.project
  environment = var.environment
  attributes  = [var.component]
  name        = "eks"
}

#
# Local computed variables
#
locals {
  names = {
    secretmanage_policy = "secretmanager-${var.environment}-policy"
  }
}

data "aws_eks_cluster" "cluster" {
  name = module.eks-cluster.cluster_id
}

data "aws_eks_cluster_auth" "cluster" {
  name = module.eks-cluster.cluster_id
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
  token                  = data.aws_eks_cluster_auth.cluster.token
  load_config_file       = false
  version                = "~> 1.9"
}

module "eks-cluster" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = module.eks_label.id
  cluster_version = var.cluster_version
  subnets         = var.subnets
  vpc_id          = var.vpc_id

  worker_groups = [
    {
      instance_type = var.cluster_node_type
      asg_max_size  = var.cluster_node_count
    }
  ]

  tags = var.tags
}

# Grant secretmanager access to all pods inside the kubernetes cluster
# TODO:
# Adjust the implementation so that the policy is template based and we only allow
# kubernetes access to a single key based on the environment.
# We should export the key from modules/secrets and then grant access only to that specific ARN,
# so that only the production cluster is able to read production secrets, but not dev or staging.
# https://docs.aws.amazon.com/secretsmanager/latest/userguide/auth-and-access_identity-based-policies.html#permissions_grant-get-secret-value-to-one-secret
resource "aws_iam_policy" "secretmanager-policy" {
  name        = local.names.secretmanage_policy
  description = "allow to read secretmanager secrets ${var.environment}"
  policy      = file("modules/kubernetes/policies/secretmanager.json")
}

#
# Attach the policy to the k8s worker role
#
resource "aws_iam_role_policy_attachment" "attach" {
  role       = module.eks-cluster.worker_iam_role_name
  policy_arn = aws_iam_policy.secretmanager-policy.arn
}

#
# Attach the S3 policy to workers
# so we can use aws commands inside pods easily if/when needed
#
resource "aws_iam_role_policy_attachment" "attach-s3" {
  role       = module.eks-cluster.worker_iam_role_name
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}
All credit for this fix goes to the person who mentioned it on the cloudposse Slack channel:
terraform state replace-provider -auto-approve -- -/null registry.terraform.io/hashicorp/null
This fixed my issue with this error; on to the next error. All of this just to upgrade a Terraform version.
For us, we updated all the provider URLs we were using in the code, like below:
terraform state replace-provider 'registry.terraform.io/-/null' \
'registry.terraform.io/hashicorp/null'
terraform state replace-provider 'registry.terraform.io/-/archive' \
'registry.terraform.io/hashicorp/archive'
terraform state replace-provider 'registry.terraform.io/-/aws' \
'registry.terraform.io/hashicorp/aws'
I wanted to be very specific with the replacement, so I used the broken URL as the source and replaced it with the new one.
To be more specific, this is only needed with Terraform 0.13:
https://www.terraform.io/docs/providers/index.html#providers-in-the-terraform-registry
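If you have several legacy provider addresses to fix, a small shell loop over the provider names keeps the commands consistent (the names here are just the ones from the snippet above):

# Replace each legacy -/NAME address with the namespaced hashicorp/NAME one
for p in null archive aws; do
  terraform state replace-provider -auto-approve \
    "registry.terraform.io/-/${p}" "registry.terraform.io/hashicorp/${p}"
done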
This error arises when there’s an object in the latest Terraform state that is no longer in the configuration but Terraform can’t destroy it (as would normally be expected) because the provider configuration for doing so also isn’t present.
Solution:
This should arise only if you've recently removed the "data.null_data_source" object along with the provider "null" block. To proceed with this you'll need to temporarily restore that provider "null" block, run terraform apply to have Terraform destroy the data "null_data_source" object, and then you can remove the provider "null" block again because it'll no longer be needed.
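A minimal sketch of what "temporarily restore" means in practice: re-add the empty provider block, apply, and delete it again afterwards.

# Temporarily re-added so Terraform can destroy the orphaned
# data.null_data_source object; remove this block after the apply succeeds.
provider "null" {}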

Terraform attempts to create the S3 backend again when switching to a new workspace

I am following this excellent guide to terraform. I am currently on the 3rd post, exploring the state, specifically at the point where terraform workspaces are demonstrated.
So, I have the following main.tf:
provider "aws" {
region = "us-east-2"
}
resource "aws_s3_bucket" "terraform_state" {
bucket = "mark-kharitonov-terraform-up-and-running-state"
# Enable versioning so we can see the full revision history of our
# state files
versioning {
enabled = true
}
# Enable server-side encryption by default
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
}
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-up-and-running-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
terraform {
backend "s3" {
# Replace this with your bucket name!
bucket = "mark-kharitonov-terraform-up-and-running-state"
key = "workspaces-example/terraform.tfstate"
region = "us-east-2"
# Replace this with your DynamoDB table name!
dynamodb_table = "terraform-up-and-running-locks"
encrypt = true
}
}
output "s3_bucket_arn" {
value = aws_s3_bucket.terraform_state.arn
description = "The ARN of the S3 bucket"
}
output "dynamodb_table_name" {
value = aws_dynamodb_table.terraform_locks.name
description = "The name of the DynamoDB table"
}
resource "aws_instance" "example" {
ami = "ami-0c55b159cbfafe1f0"
instance_type = "t2.micro"
}
And it is all great:
C:\work\terraform [master ≡]> terraform workspace show
default
C:\work\terraform [master ≡]> terraform apply
Acquiring state lock. This may take a few moments...
aws_dynamodb_table.terraform_locks: Refreshing state... [id=terraform-up-and-running-locks]
aws_instance.example: Refreshing state... [id=i-01120238707b3ba8e]
aws_s3_bucket.terraform_state: Refreshing state... [id=mark-kharitonov-terraform-up-and-running-state]
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
Releasing state lock. This may take a few moments...
Outputs:
dynamodb_table_name = terraform-up-and-running-locks
s3_bucket_arn = arn:aws:s3:::mark-kharitonov-terraform-up-and-running-state
C:\work\terraform [master ≡]>
Now I am trying to follow the guide - create a new workspace and apply the code there:
C:\work\terraform [master ≡]> terraform workspace new example1
Created and switched to workspace "example1"!
You're now on a new, empty workspace. Workspaces isolate their state,
so if you run "terraform plan" Terraform will not see any existing state
for this configuration.
C:\work\terraform [master ≡]> terraform plan
Acquiring state lock. This may take a few moments...
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.
------------------------------------------------------------------------
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# aws_dynamodb_table.terraform_locks will be created
+ resource "aws_dynamodb_table" "terraform_locks" {
...
+ name = "terraform-up-and-running-locks"
...
}
# aws_instance.example will be created
+ resource "aws_instance" "example" {
+ ami = "ami-0c55b159cbfafe1f0"
...
}
# aws_s3_bucket.terraform_state will be created
+ resource "aws_s3_bucket" "terraform_state" {
...
+ bucket = "mark-kharitonov-terraform-up-and-running-state"
...
}
Plan: 3 to add, 0 to change, 0 to destroy.
------------------------------------------------------------------------
Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.
Releasing state lock. This may take a few moments...
C:\work\terraform [master ≡]>
And here the problems start. In the guide, the terraform plan command reports that only one resource is going to be created - an EC2 instance. This implies that terraform is going to reuse the same S3 bucket for the backend and the same DynamoDB table for the lock. But in my case, terraform informs me that it wants to create all 3 resources, including the S3 bucket. Which would definitely fail (I already tried).
So, what am I doing wrong? What is missing?
Creating a new workspace is effectively starting from scratch. The guide's steps are a bit confusing in this regard, but the guide uses two separate plans to achieve the final result. The first creates the state S3 bucket and the locking DynamoDB table, and the second contains just the instance being created but uses the terraform block to tell that plan where to store its state.
In your example you are both setting your state location and creating it in the same plan. This means that when you create a new workspace, it is going to attempt to create that state location a second time, because this workspace does not know about the other workspace's state.
In the end it's important to know that workspaces create unique state files per workspace by adding the workspace name to the remote state path. With the S3 backend, the default workspace uses the configured key as-is, while every other workspace is stored under the env:/ prefix. For example, if your state bucket is mark-kharitonov-terraform-up-and-running-state with a key of workspaces-example/terraform.tfstate, you might see the following:
Default state: mark-kharitonov-terraform-up-and-running-state/workspaces-example/terraform.tfstate
Other state: mark-kharitonov-terraform-up-and-running-state/env:/other/workspaces-example/terraform.tfstate
EDIT:
To be clear on how to get the guide's results: you need to create two separate plans in separate folders (all the .tf files in a single working directory are applied together). So create a hierarchy like:
plans >
  state >
    main.tf
  instance >
    main.tf
Inside your plans/state/main.tf file put your state location content:
provider "aws" {
region = "us-east-2"
}
resource "aws_s3_bucket" "terraform_state" {
bucket = "mark-kharitonov-terraform-up-and-running-state"
# Enable versioning so we can see the full revision history of our
# state files
versioning {
enabled = true
}
# Enable server-side encryption by default
server_side_encryption_configuration {
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "AES256"
}
}
}
}
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-up-and-running-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
output "s3_bucket_arn" {
value = aws_s3_bucket.terraform_state.arn
description = "The ARN of the S3 bucket"
}
Then in your plans/instance/main.tf file you can reference the created state location with the terraform block and should only need the following content:
terraform {
  backend "s3" {
    # Replace this with your bucket name!
    bucket         = "mark-kharitonov-terraform-up-and-running-state"
    key            = "workspaces-example/terraform.tfstate"
    region         = "us-east-2"
    # Replace this with your DynamoDB table name!
    dynamodb_table = "terraform-up-and-running-locks"
    encrypt        = true
  }
}

resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}

Configuring remote state in terraform seems duplicated?

I am configuring remote state in terraform like:
provider "aws" {
region = "ap-southeast-1"
}
terraform {
backend "s3" {
bucket = "xxx-artifacts"
key = "terraform_state.tfstate"
region = "ap-southeast-1"
}
}
data "terraform_remote_state" "s3_state" {
backend = "s3"
config {
bucket = "xxx-artifacts"
key = "terraform_state.tfstate"
region = "ap-southeast-1"
}
}
It seems very duplicated though. Why is it like that? I have the same values in the terraform block and in the terraform_remote_state data source block. Is this actually required?
The terraform.backend configuration is for configuring where to store remote state for the Terraform context/directory where Terraform is being run from.
This allows you to share state between different machines, backup your state and also co-ordinate between usages of a Terraform context via state locking.
The terraform_remote_state data source is, like other data sources, for retrieving data from an external source, in this case a Terraform state file.
This allows you to retrieve information stored in a state file from another Terraform context and use that elsewhere.
For example in one location you might create an aws_elasticsearch_domain but then need to lookup the endpoint of the domain in another context (such as for configuring where to ship logs to). Currently there isn't a data source for ES domains so you would need to either hardcode the endpoint elsewhere or you could look it up with the terraform_remote_state data source like this:
elasticsearch/main.tf
resource "aws_elasticsearch_domain" "example" {
domain_name = "example"
elasticsearch_version = "1.5"
cluster_config {
instance_type = "r4.large.elasticsearch"
}
snapshot_options {
automated_snapshot_start_hour = 23
}
tags = {
Domain = "TestDomain"
}
}
output "es_endpoint" {
value = "$aws_elasticsearch_domain.example.endpoint}"
}
logstash/userdata.sh.tpl
#!/bin/bash
sed -i 's/|ES_DOMAIN|/${es_domain}/' /etc/logstash.conf
logstash/main.tf
data "terraform_remote_state" "elasticsearch" {
backend = "s3"
config {
bucket = "xxx-artifacts"
key = "elasticsearch.tfstate"
region = "ap-southeast-1"
}
}
data "template_file" "logstash_config" {
template = "${file("${path.module}/userdata.sh.tpl")}"
vars {
es_domain = "${data.terraform_remote_state.elasticsearch.es_endpoint}"
}
}
resource "aws_instance" "foo" {
# ...
user_data = "${data.template_file.logstash_config.rendered}"
}
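Note that these snippets use the pre-0.12 syntax. On Terraform 0.12 and later, the same lookup would use config = { ... } and reference the value through outputs, roughly like this (same placeholder bucket and key as above):

data "terraform_remote_state" "elasticsearch" {
  backend = "s3"

  config = {
    bucket = "xxx-artifacts"
    key    = "elasticsearch.tfstate"
    region = "ap-southeast-1"
  }
}

# 0.12+ style reference to the exported output
locals {
  es_domain = data.terraform_remote_state.elasticsearch.outputs.es_endpoint
}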

Terraform workspace states in different s3 buckets?

I use terraform to provision resources in dev and prod environments. These environments live in two different AWS accounts. I had my state locally, but I want to push it to S3 now. The problem is that terraform stores the state for the dev and prod envs in the same S3 bucket. Is it possible to separate them? If not, what are some alternative solutions that don't require splitting my terraform codebase?
I have a bash wrapper around terraform and create a state file per account for separation of concerns. I also break the automation into many components to keep the state small, so that performance does not suffer when downloading and uploading the state to and from the bucket:
function set_backend () {
  local STATE_PATH=$1
  if [[ $BACKEND == "s3" ]]; then
    cat << EOF > ./backend.tf
terraform {
  backend "s3" {
    bucket         = "${TF_VAR_state_bucket}"
    dynamodb_table = "${DYNAMODB_STATE_TABLE}"
    key            = "terraform/$STATE_PATH/terraform.tfstate"
    region         = "$REGION"
    encrypt        = "true"
  }
}

provider "aws" {
  region  = "$REGION"
  version = "1.51.0"
}

provider "aws" {
  region  = "$DR_REGION"
  version = "1.51.0"
  alias   = "dr"
}

provider "archive" { version = "1.1.0" }
provider "external" { version = "1.0.0" }
provider "local" { version = "1.1.0" }
provider "null" { version = "1.0.0" }
provider "random" { version = "2.0.0" }
provider "template" { version = "1.0.0" }
provider "tls" { version = "1.2.0" }
EOF
  fi
}
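For context, the wrapper above would be driven by environment variables set per account; a hypothetical invocation might look like this (all values and the component path are placeholders, not a definitive setup):

# Hypothetical per-account settings for the wrapper above
export BACKEND=s3
export TF_VAR_state_bucket=my-dev-terraform-state   # placeholder bucket name
export DYNAMODB_STATE_TABLE=terraform-locks         # placeholder lock table
export REGION=eu-west-1
export DR_REGION=eu-central-1

set_backend "networking/vpc"   # writes ./backend.tf for this component
terraform init
terraform apply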
Terragrunt is a great tool for managing Terraform state files for different environments and for storing state files in different buckets, instead of using terraform workspaces.
Useful links,
https://transcend.io/blog/why-we-use-terragrunt
https://blog.gruntwork.io/how-to-manage-terraform-state-28f5697e68fa
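As a rough illustration, a per-environment terragrunt.hcl can point each environment at its own bucket via the remote_state block (bucket, table, key and region here are placeholders, not a definitive setup):

# dev/terragrunt.hcl (illustrative) - the dev environment gets its own state bucket
remote_state {
  backend = "s3"

  config = {
    bucket         = "my-dev-terraform-state"   # bucket in the dev account
    key            = "vpc/terraform.tfstate"    # per-component key
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}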

Terraform s3 backend vs terraform_remote_state

According to the documentation, to use s3 and not a local terraform.tfstate file for state storage, one should configure a backend more or less as follows:
terraform {
  backend "s3" {
    bucket = "my-bucket-name"
    key    = "my-key-name"
    region = "my-region"
  }
}
I:
- was using a local (terraform.tfstate) file
- added the above snippet to my provided.tf file
- ran terraform init (again)
- was asked by terraform to migrate my state to the above bucket
...so far so good...
But then comes this confusing part about terraform_remote_state ...
Why do I need this?
Isn't my state now saved remotely (in the aforementioned S3 bucket) already?
terraform_remote_state isn't for storing your state; it's for retrieving it in another terraform plan if you have outputs. It is a data source. For example, if you output your Elastic IP address in one state:
resource "aws_eip" "default" {
vpc = true
}
output "eip_id" {
value = "${aws_eip.default.id}"
}
Then you want to retrieve that in another state:
data "terraform_remote_state" "remote" {
backend = "s3"
config {
bucket = "my-bucket-name"
key = "my-key-name"
region = "my-region"
}
}
resource "aws_instance" "foo" {
...
}
resource "aws_eip_association" "eip_assoc" {
instance_id = "${aws_instance.foo.id}"
allocation_id = "${data.terraform_remote_state.remote.eip_id}"
}
Edit: if you are retrieving outputs in Terraform 0.12 and later, you need to reference them through outputs:
data "terraform_remote_state" "remote" {
backend = "s3"
config {
bucket = "my-bucket-name"
key = "my-key-name"
region = "my-region"
}
}
resource "aws_instance" "foo" {
...
}
resource "aws_eip_association" "eip_assoc" {
instance_id = "${aws_instance.foo.id}"
allocation_id = "${data.terraform_remote_state.remote.outputs.eip_id}"
}
Remote state allows you to collaborate with other team members and gives you a central location to store your infrastructure state.
Apart from that, by enabling S3 versioning you get versioning of the state file, so you can track changes.
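For example, versioning can be turned on for an existing state bucket with a single AWS CLI call (the bucket name is a placeholder):

aws s3api put-bucket-versioning \
  --bucket my-terraform-state-bucket \
  --versioning-configuration Status=Enabled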
