Terraform environment-specific variables

Anyone know if there's a way to populate variables in Terraform based on what the environment/workspace is? Preferably one that:
populates the var namespace (i.e. not an external data source),
doesn't require a wrapper like tf(){ terraform --var-file=$(get_tf_env).tfvars "$@"; },
takes effect by changing a Terraform env/workspace, without any additional manual steps (i.e. steps that aren't triggered by running terraform env)?

Populates the var namespace, doesn't require a wrapper, and takes effect by changing the workspace (Terraform 0.12 code):
variable "ami_id" {
  type = map(string)
  default = {
    stg = "ami-foo28929"
    prd = "ami-bar39b12"
  }
}

resource "aws_instance" "this" {
  ami = var.ami_id[terraform.workspace]
  # (...)
}

Terraform workspaces
A workspace is a named container for Terraform state. With multiple workspaces, a single directory of Terraform configuration can be used to manage multiple distinct sets of infrastructure resources.
In the 0.9 line of Terraform releases, this concept was known as "environment". It was renamed in 0.10 based on feedback about confusion caused by the overloading of the word "environment" both within Terraform itself and within organizations that use Terraform.
Referencing the current workspace is useful for changing behavior based on the workspace. For example, for non-default workspaces it may be useful to spin up smaller cluster sizes:
resource "aws_instance" "example" {
  count = "${terraform.workspace == "default" ? var.default : var.min}"

  # ... other arguments
}

There isn't a native way of doing this with Terraform that I know of. If you search around you will see that a lot of people use different folder structures as entry points into their TF configurations, where each folder can have different values in its tfvars file. One option that may get you some of the way there is to use Terraform Workspaces, introduced in 0.10.
I've implemented something similar to what you are suggesting using OctopusDeploy. If you've not used it before, Octopus is good for managing environment-specific variables. I have a default tfvars file and a list of corresponding variable values within Octopus, per environment.
I have a basic step that iterates through every variable in tfvars and looks for an Octopus variable with the same name and replaces it if it is found.
I've found this to be a decent way of working as it gives a nice separation between the Terraform tfvars file (what values are needed) and the variable values in Octopus (what the actual values are).
E.g. if I have a tfvars file containing
instance_size = "Medium"
And I have 2 environments within Octopus, Staging and Production. I can add a variable to Octopus called 'instance_size' and set a different value per environment (e.g. "Big" and "Biggest" respectively).
The step template I've written finds the corresponding value for "instance_size", so when I run it for staging I get:
instance_size = "Big"
and for production
instance_size = "Biggest"

I would recommend taking a "Stacks"-based approach for your Terraform project so that you can configure and manage the "Stacks" and the Remote State per Workspace (aka Environment). This limits the blast radius of your changes from a risk perspective, simplifies workflow, and provides for a cleaner, more maintainable code base.
What will make your day better?
An objectively simple design that allows you to reason about the platform and its moving parts. (aka Stacks)
An implementation that provides you with flexibility while limiting risk from changes. (aka Limit the Blast Radius)
A solution that delivers value today and continues to improve while building momentum over the long haul. (aka Patterns, Workflow)
Here is a quick list of good practices
Manage "State" separately for "Stacks" across "Workspaces"
Implement "Stacks" for consistent "Configuration" across "Workspaces"
Keep it objective and simple with good "Patterns" and "Workflow".
Example Terraform Project using a Stacks Based Approach
/
  /scripts
    <shell scripts>
    <terraform wrapper functions>
  /stacks
    /application_1      # Provisions Application 1 and its dependencies
    /application_2      # Provisions Application 2 and its dependencies
    /application_n      # Provisions Application N and its dependencies
      backend.tf        # Remote State
      data.tf           # Data Sources
      stack.tf          # Stack Variables and Defaults
      aws_resource.tf
      ...
    /network            # Provisions VPC, Subnets, Route Tables, Route53 Zones
    /security           # Provisions Security Groups, Network ACLs, IAM Resources
    /storage            # Provisions Storage Resources like S3, EFS, CDN
  global.tf             # Global Variables
  dev.tfvars            # Development Environment Variables
  tst.tfvars            # Testing Environment Variables
  stg.tfvars            # Staging Environment Variables
  prd.tfvars            # Production Environment Variables
  terraform.sh          # Wrapper Script for Executing Terraform (Workflow)
A few more thoughts
As your implementation grows it is much simpler to incorporate future requirements into existing stacks or as new stacks if they are a shared dependency.
Terraform allows the Remote State of one stack to be used as a Data Source in another. Configuring your own Output Variables per stack makes it much cleaner to configure and use exported resource attributes (see the sketch after this list).
Setting up your project so that you can define variables and reasonable defaults at the stack level allows you to override them at the workspace level as necessary to meet the requirements of different environments such as Dev, Test, and Production, while keeping the configuration consistent and the remote state managed separately per environment.
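For example, a consuming stack can read another stack's outputs through the terraform_remote_state data source. A minimal sketch, assuming an S3 backend; the bucket name and state key below are placeholders:
data "terraform_remote_state" "network" {
  backend   = "s3"
  workspace = terraform.workspace

  config = {
    bucket = "example-terraform-state"          # placeholder bucket name
    key    = "stacks/network/terraform.tfstate" # placeholder state key
    region = "us-east-1"
  }
}

# Consume an output exported by the network stack, e.g.:
#   subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id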
These are some of the practices we have developed and deployed on our team to improve our experience working with Terraform to manage our AWS Platform.
Cheers!

Handling environmental variables in Terraform Workspaces - Taking Advantage of Workspaces, by Miles Collier (2019), explains clearly how this works. This is just a summary.
In parameters.tf:
locals {
  env = {
    default = {
      instance_type  = "t2.micro"
      ami            = "ami-0ff8a91507f77f867"
      instance_count = 1
    }
    dev = {
      instance_type = "m5.2xlarge"
      ami           = "ami-0130c3a072f3832ff"
    }
    qa = {
      instance_type  = "m5.2xlarge"
      ami            = "ami-00f0abdef923519b0"
      instance_count = 3
    }
    prod = {
      instance_type  = "c5.4xlarge"
      ami            = "ami-0422d936d535c63b1"
      instance_count = 6
    }
  }
  environmentvars = contains(keys(local.env), terraform.workspace) ? terraform.workspace : "default"
  workspace       = merge(local.env["default"], local.env[local.environmentvars])
}
To reference a variable, add it to locals or pass it to a module:
instance_type = local.workspace["instance_type"]
This will use the value from the selected workspace, or the default value if either the variable is not defined for that workspace or no workspace is selected. If no default is defined, it fails gracefully.
Use terraform workspace select dev to select the dev workspace.
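To give a sense of how the merged map is consumed, here is a minimal sketch; the resource is illustrative and simply reuses the keys defined above:
resource "aws_instance" "example" {
  # merge() above guarantees a fallback to the default values,
  # e.g. the dev workspace inherits instance_count = 1.
  count         = local.workspace["instance_count"]
  ami           = local.workspace["ami"]
  instance_type = local.workspace["instance_type"]
}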

I use terraform workspace and have created a bash script that echoes the --var-file argument.
#!/bin/bash
echo --var-file=variables/$(terraform workspace show).tfvars
To run terraform with the tfvars file for the current workspace applied:
terraform plan $(./var.sh)
terraform apply $(./var.sh)

Clarification on changes made outside of Terraform

I don't fully understand how Terraform handles external changes. Let's take an example:
resource "aws_instance" "ec2-test" {
  ami           = "ami-0d71ea30463e0ff8d"
  instance_type = "t2.micro"
}
1: Security group modification
The default security group has been manually replaced by another one. Terraform detects the change:
❯ terraform plan --refresh-only
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:
# aws_instance.ec2-test has changed
  ~ resource "aws_instance" "ec2-test" {
        id                     = "i-5297abcc6001ce9a8"
      ~ security_groups        = [
          - "default",
          + "test",
        ]
        tags                   = {}
      ~ vpc_security_group_ids = [
          + "sg-8231be9a95a4b1886",
          - "sg-f2fc3af19c4adefe0",
        ]
        # (28 unchanged attributes hidden)
        # (7 unchanged blocks hidden)
    }
No change planned:
❯ terraform plan
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.
This seems normal, as we did not set the security_groups argument in the resource block (the desired state is aligned with the current state).
2: IAM instance profile added
An IAM role has been manually attached to the instance. Terraform also detects the change:
❯ terraform plan --refresh-only
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:
# aws_instance.ec2-test has changed
~ resource "aws_instance" "ec2-test" {
+ iam_instance_profile = "test"
id = "i-5297abcc6001ce9a8"
tags = {}
# (30 unchanged attributes hidden)
# (7 unchanged blocks hidden)
}
This is a refresh-only plan, so Terraform will not take any actions to undo these. If you were expecting these changes then you can apply this plan to record the updated values in the Terraform state without changing any remote objects.
However, Terraform also plans to revert the change:
❯ terraform plan
aws_instance.ec2-test: Refreshing state... [id=i-5297abcc6001ce9a8]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
~ update in-place
Terraform will perform the following actions:
# aws_instance.ec2-test will be updated in-place
  ~ resource "aws_instance" "ec2-test" {
      - iam_instance_profile = "test" -> null
        id                   = "i-5297abcc6001ce9a8"
        tags                 = {}
        # (30 unchanged attributes hidden)
        # (7 unchanged blocks hidden)
    }
Plan: 0 to add, 1 to change, 0 to destroy.
I tried to figure out why these two changes don't produce the same effect. This article highlights differences depending on the argument default values: https://nedinthecloud.com/2021/12/23/terraform-apply-when-external-change-happens/
But the security_groups and iam_instance_profile arguments seem similar (optional with no default value), so why is Terraform handling these two cases differently?
(tested with Terraform v1.2.2, hashicorp/aws 4.21.0)
The handling of these situations unfortunately depends a lot on decisions made by the provider developer, since it's the provider's responsibility to decide how to reconcile any differences between the configuration and the prior state. (The "prior state" is what Terraform calls the state that results from running the "refresh" steps to synchronize with the remote system).
Terraform Core takes the values you've defined in the configuration (if any optional arguments are unset, Terraform Core uses null to represent that) and the values from the prior state and sends both of them to the provider to implement the planning step. The provider can then do whatever logic it wants as long as the planned new value for each attribute is consistent with the input. "Consistent" means that one of the following conditions is true:
The planned value is equal to the value set in the configuration.
This is the most straightforward situation to follow, but there are various reasons why a provider might not do this, which I'll discuss later.
The planned value is equal to the value stored in the prior state.
This represents situations where the value in the prior state is functionally equivalent to the value in the configuration but not exactly equal, such as if the remote system treats a particular string as case insensitive and the two values differ only in case.
The provider indicated in its schema that this is a value that can be decided by the remote system, such as an object ID that's generated by the remote system during the apply step, and the corresponding value in the configuration was null to represent the argument not being set at all.
In this case the provider gets to choose whichever value it wants, because the configuration says nothing about the attribute and thus the remote system has authority on what the value is.
From what you've described, it sounds like in your first example the provider used approach number 3, while in the second example the provider used approach number 1.
Since I am not the developer of this provider I cannot say for certain why the developers made the decisions they did here, but one common reason why a provider developer might choose option three is for situations where a particular value can potentially be set by multiple different resource types, in which case the provider might be designed to treat an absent argument in the configuration as meaning "keep whatever the remote system already has", whereas a non-null argument in the configuration would mean "set the remote system to use this given value".
For iam_instance_profile it seems like the provider considers null to be a valid configuration value for that argument and uses it to represent the EC2 instance having no associated instance profile at all. For vpc_security_group_ids and security_groups though, if you leave the argument set to null in the configuration (or omit it, which is equivalent), the provider treats that as "keep whatever the remote system has", and so Terraform just acknowledges the change but doesn't propose to undo it.
Based on my knowledge of EC2, I can guess that the reason here is probably that the underlying EC2 API has two different ways to set security groups: you can either use the legacy EC2-Classic style of specifying a security group by name (the security_groups argument in the provider), or the newer EC2-VPC style of specifying it by ID (the vpc_security_group_ids argument in the provider). Whichever of the two you choose, the remote system will presumably populate the other one automatically, and therefore without this special exception in the provider it would be impossible for any configuration to converge unless you set both security_groups and vpc_security_group_ids and made them refer to the same security groups. To avoid that, I think the provider just lets whichever one of the two you left unset automatically track the remote system. This has the side effect that the provider cannot automatically "fix" changes made outside of Terraform unless you set at least one of them, so that the provider can see what the correct value ought to be.
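For illustration, if you want Terraform to converge security-group drift the same way it converged iam_instance_profile, you could declare the groups explicitly. A minimal sketch; the aws_security_group resource and var.vpc_id here are hypothetical stand-ins for whatever group you intend to be authoritative:
resource "aws_security_group" "managed" {
  name   = "managed"
  vpc_id = var.vpc_id # assumed to be defined elsewhere
}

resource "aws_instance" "ec2-test" {
  ami           = "ami-0d71ea30463e0ff8d"
  instance_type = "t2.micro"

  # With this argument set, a manual change to the instance's security groups
  # shows up as a planned in-place update back to this value.
  vpc_security_group_ids = [aws_security_group.managed.id]
}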
Terraform's ability to reconcile changes in the remote system by resetting back to match the configuration is a "best effort" mechanism because in many cases that requirement comes into conflict with other requirements, and provider developers must therefore decide on a case-by-case basis what to prioritize. Although Terraform does try its best to tell you about changes outside of Terraform and to propose fixing them where possible, the only certain way to keep your Terraform configuration and your remote system synchronized is to prevent anyone from making changes outside of Terraform, for example using IAM policies in AWS.

Create resource via terraform but do not recreate if manually deleted?

I want to initially create a resource using Terraform, but if the resource gets later deleted outside of TF - e.g. manually by a user - I do not want terraform to re-create it. Is this possible?
In my case the resource is a blob on an Azure Blob storage. I tried using ignore_changes = all but that didn't help. Every time I ran terraform apply, it would recreate the blob.
resource "azurerm_storage_blob" "test" {
  name                   = "myfile.txt"
  storage_account_name   = azurerm_storage_account.deployment.name
  storage_container_name = azurerm_storage_container.deployment.name
  type                   = "Block"
  source_content         = "test"

  lifecycle {
    ignore_changes = all
  }
}
The requirement you've stated is not supported by Terraform directly. To achieve it you will need to either implement something completely outside of Terraform or use Terraform as part of some custom scripting written by you to perform a few separate Terraform steps.
If you want to implement it by wrapping Terraform then I will describe one possible way to do it, although there are various other variants of this that would get a similar effect.
My idea for implementing it would be to implement a sort of "bootstrapping mode" which your custom script can enable only for initial creation, but then for subsequent work you would not use the bootstrapping mode. Bootstrapping mode would be a combination of an input variable to activate it and an extra step after using it.
variable "bootstrap" {
  type        = bool
  default     = false
  description = "Do not use this directly. Only for use by the bootstrap script."
}

resource "azurerm_storage_blob" "test" {
  count = var.bootstrap ? 1 : 0

  name                   = "myfile.txt"
  storage_account_name   = azurerm_storage_account.deployment.name
  storage_container_name = azurerm_storage_container.deployment.name
  type                   = "Block"
  source_content         = "test"
}
This alone would not be sufficient because normally if you were to run Terraform once with -var="bootstrap=true" and then again without it Terraform would plan to destroy the blob, after noticing it's no longer present in the configuration.
So to make this work we need a special bootstrap script which wraps Terraform like this:
terraform apply -var="bootstrap=true"
terraform state rm azurerm_storage_blob.test
That second terraform state rm command above tells Terraform to forget about the object it currently has bound to azurerm_storage_blob.test. That means that the object will continue to exist but Terraform will have no record of it, and so will behave as if it doesn't exist.
If you run the bootstrap script, then you will have the blob existing but with Terraform unaware of it. You can therefore run terraform apply as normal (without setting the bootstrap variable) and Terraform will both ignore the object previously created and not plan to create a new one, because it will now have count = 0.
This is not a typical use-case for Terraform, so I would recommend to consider other possible solutions to meet your use-case, but I hope the above is useful as part of that design work.
If you have a resource defined in your Terraform configuration then Terraform will always try to create it. I can't imagine what your setup is, but maybe you want to move the blob creation into a CLI script and run Terraform and the script in the desired order.

Multiple aws_cloudformation_stack resources based on dynamic name values with Terraform

This is a follow-up to this question.
I would like to create multiple aws_cloudformation_stack resources with names based on git branch. Git branches would be used to test different versions of CloudFormation stacks and can be deployed by several engineers (hence I would need to have the engineers access remote terraform state).
I would like to ensure that each deployed branch has its own stack, but when branches update code the stacks would get updated (stacks destroyed and recreated).
One suggestion was to use for_each to create multiple resources.
To do that I would probably need to write the branch names to a file (before terraform apply), and then read it into a list variable so for_each could iterate over the list.
However, I'd like to see if there is a better way to achieve this?
resource "aws_cloudformation_stack" "subscriptions_sam_stack" {
  for_each     = toset(split(",", file("deployed_git_hashes.txt")))
  name         = "${var.app_name}---${var.app_env}--${each.value}"
  capabilities = ["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"]
  template_url = "https://${var.sam_bucket}.s3-${data.aws_region.current.name}.amazonaws.com/${aws_s3_bucket_object.sam_deploy_object.id}"
}
deployed_git_branches.txt looks like this:
branch1, branch2, branch3
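If you do keep the file-based approach, one small refinement, sketched under the assumption that the file contains a single comma-separated line like the example above, is to trim whitespace so entries like " branch2" don't produce unexpected keys:
locals {
  deployed_branches = [
    for b in split(",", chomp(file("${path.module}/deployed_git_branches.txt"))) : trimspace(b)
  ]
}

resource "aws_cloudformation_stack" "subscriptions_sam_stack" {
  for_each = toset(local.deployed_branches)

  name         = "${var.app_name}---${var.app_env}--${each.value}"
  capabilities = ["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"]
  template_url = "https://${var.sam_bucket}.s3-${data.aws_region.current.name}.amazonaws.com/${aws_s3_bucket_object.sam_deploy_object.id}"
}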

In Terraform 0.12, how to skip creation of resource, if resource name already exists?

I am using Terraform version 0.12. I have a requirement to skip resource creation if resource with the same name already exists.
I did the following for this:
Read the list of custom images,
data "ibm_is_images" "custom_images" {
}
Check if image already exists,
locals {
  custom_vsi_image = contains([for x in data.ibm_is_images.custom_images.images : "true" if x.visibility == "private" && x.name == var.vnf_vpc_image_name], "true")
}

output "abc" {
  value = "${local.custom_vsi_image}"
}
Create the image only if custom_vsi_image is false:
resource "ibm_is_image" "custom_image" {
  count            = "${local.custom_vsi_image == true ? 0 : 1}"
  depends_on       = ["data.ibm_is_images.custom_images"]
  href             = "${local.image_url}"
  name             = "${var.vnf_vpc_image_name}"
  operating_system = "centos-7-amd64"

  timeouts {
    create = "30m"
    delete = "10m"
  }
}
This works fine the first time with "terraform apply": it finds that the image does not exist, so it creates it.
When I run "terraform apply" a second time, it deletes the resource "custom_image" that was created above. Any idea why it is deleting the resource when run for the 2nd time?
Also, how can I create a resource based on some condition (like only when it does not exist)?
In Terraform, you're required to decide explicitly what system is responsible for the management of a particular object, and conversely which systems are just consuming an existing object. There is no way to make that decision dynamically, because that would make the result non-deterministic and -- for objects managed by Terraform -- make it unclear which configuration's terraform destroy would destroy the object.
Indeed, that non-determinism is why you're seeing Terraform in your situation flop between trying to create and then trying to delete the resource: you've told Terraform to only manage that object if it doesn't already exist, and so the first time you run Terraform after it exists Terraform will see that the object is no longer managed and so it will plan to destroy it.
If your goal is to manage everything with Terraform, an important design task is to decide how object dependencies flow within and between Terraform configurations. In your case, it seems like there is a producer/consumer relationship between a system that manages images (which may or may not be a Terraform configuration) and one or more Terraform configurations that consume existing images.
If the images are managed by Terraform then that suggests either that your main Terraform configuration should assume the image does not exist and unconditionally create it -- if your decision is that the image is owned by the same system as what consumes it -- or it should assume that the image does already exist and retrieve the information about it using a data block.
A possible solution here is to write a separate Terraform configuration that manages the image and then only apply that configuration in situations where that object isn't expected to already exist. Then your configuration that consumes the existing image can just assume it exists without caring about whether it was created by the other Terraform configuration or not.
There's a longer overview of this situation in the Terraform documentation section Module Composition, and in particular the sub-section Conditional Creation of Objects. That guide is focused on interactions between modules in a single configuration, but the same underlying principles apply to dependencies between configurations (via data sources) too.
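For illustration, the conditional creation pattern from that guide, adapted to the resource names in this question, might look like the sketch below. The variable name existing_image_id is hypothetical; the point is that the caller decides explicitly whether the image already exists, rather than Terraform probing for it:
variable "existing_image_id" {
  type        = string
  default     = null
  description = "Set this to reuse an image that already exists; leave null to create one."
}

resource "ibm_is_image" "custom_image" {
  count = var.existing_image_id == null ? 1 : 0

  href             = local.image_url
  name             = var.vnf_vpc_image_name
  operating_system = "centos-7-amd64"
}

locals {
  # Downstream resources reference local.image_id regardless of which path was taken.
  image_id = var.existing_image_id != null ? var.existing_image_id : ibm_is_image.custom_image[0].id
}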

Declare multiple providers for a list of regions

I have a Terraform module that manages AWS GuardDuty.
In the module, an aws_guardduty_detector resource is declared. The resource allows no specification of region, although I need to configure one of these resources for each region in a list. The region used needs to be declared by the provider, apparently(?).
Lack of module for_each seems to be part of the problem, or at least module for_each, if it existed, might let me declare the whole module once for each region.
Thus, I wonder, is it possible to somehow declare a provider, for each region in a list?
Or, short of writing a shell script wrapper, or doing code generation, is there any other clean way to solve this problem that I might not have thought of?
To support similar processes I have found two approaches to this problem:
Declare multiple AWS providers in the Terraform module.
Write the module to use a single provider, and then have a separate .tfvars file for each region you want to execute against.
For the first option, it can get messy having multiple AWS providers in one file. You must give each an alias, and each time you create a resource you must set the provider property on the resource so that Terraform knows which regional provider to execute against (a sketch of this pattern follows). Also, if the provider for one of the regions cannot initialize, maybe because the region is down, then the entire script will not run until you remove it or the region is back up.
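A minimal sketch of the aliased-provider pattern, using an illustrative pair of regions and Terraform 0.12 syntax:
provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "eu_west_2"
  region = "eu-west-2"
}

# One detector per region, each pinned to its regional provider.
resource "aws_guardduty_detector" "us_east_1" {
  provider = aws.us_east_1
  enable   = true
}

resource "aws_guardduty_detector" "eu_west_2" {
  provider = aws.eu_west_2
  enable   = true
}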
For the second option, you can write the Terraform for what resources you need to set up and then just run the module multiple times, once for each regional .tfvars file.
prod-us-east-1.tfvars
prod-us-west-1.tfvars
prod-eu-west-2.tfvars
My preference is the second option, as the module is simpler and there is less duplication. The only duplication is in the .tfvars files, which should be more manageable.
EDIT: Added some sample .tfvars
prod-us-east-1.tfvars:
region     = "us-east-1"
account_id = "0000000000"
tags = {
  env = "prod"
}
dynamodb_read_capacity  = 100
dynamodb_write_capacity = 50

prod-us-west-1.tfvars:
region     = "us-west-1"
account_id = "0000000000"
tags = {
  env = "prod"
}
dynamodb_read_capacity  = 100
dynamodb_write_capacity = 50
We put in whatever variables might need to change for the service or feature based on environment and/or region. For instance, in a testing environment the DynamoDB capacity may be lower than in the production environment.
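For context, a sketch of the variable declarations and provider block these .tfvars files would feed; exactly how account_id is consumed is an assumption here (shown via allowed_account_ids):
variable "region" {
  type = string
}

variable "account_id" {
  type = string
}

variable "tags" {
  type    = map(string)
  default = {}
}

variable "dynamodb_read_capacity" {
  type = number
}

variable "dynamodb_write_capacity" {
  type = number
}

provider "aws" {
  region              = var.region
  allowed_account_ids = [var.account_id] # guards against applying to the wrong account
}
Each run then passes the appropriate file, e.g. terraform apply -var-file=prod-us-east-1.tfvars.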
