Environment Isolation: Dirs vs. Workspaces vs. Modules
The Terraform docs on Separate Development and Production Environments seem to take two major approaches to handling a "dev/test/stage" type of CI environment, i.e.
Directory separation - seems messy, especially when you potentially have multiple repos
Workspaces + Different Var Files
Except that when you look up workspaces, the docs seem to imply workspaces are NOT a correct solution for isolating environments:
In particular, organizations commonly want to create a strong separation between multiple deployments of the same infrastructure serving different development stages (e.g. staging vs. production) or different internal teams. In this case, the backend used for each deployment often belongs to that deployment, with different credentials and access controls. Named workspaces are not a suitable isolation mechanism for this scenario.
Instead, use one or more re-usable modules to represent the common elements, and then represent each instance as a separate configuration that instantiates those common elements in the context of a different backend. In that case, the root module of each configuration will consist only of a backend configuration and a small number of module blocks whose arguments describe any small differences between the deployments.
I would also like to consider using remote state -- e.g. azurerm backend
Best Practice Questions
When the docs refer to using a "re-usable" module, what would this look like if, say, I had an existing configuration folder? Would I still need to create a separate folder for dev/test/stage?
When using remote backends, should the state file be shared across repos by default, or separated by repo and environment?
e.g.
terraform {
  backend "azurerm" {
    storage_account_name = "tfstorageaccount"
    container_name       = "tfstate"
    key                  = "${var.environment}.terraform.tfstate"
  }
}
vs.
terraform {
  backend "azurerm" {
    storage_account_name = "tfstorageaccount"
    container_name       = "tfstate"
    key                  = "cache_cluster_${var.environment}.terraform.tfstate"
  }
}
When the docs refer to using a "re-usable" module, what would this look like if, say, I had an existing configuration folder? Would I still need to create a separate folder for dev/test/stage?
A re-usable module for your infrastructure would essentially encapsulate the part of your infrastructure that is common to all your "dev/test/stage" environments. So no, you wouldn't have any "dev/test/stage" folders in there.
If, for example, you have an infrastructure that consists of a Kubernetes cluster and a MySQL database, you could have two modules - a 'compute' module that handles the k8s cluster, and a 'storage' module that would handle the DB. These modules go into a /modules subfolder. Your root module (main.tf file in the root of your repo) would then instantiate these modules and pass the appropriate input variables to customize them for each of the "dev/test/stage" environments.
Normally it would be a bit more complex:
Any shared VPC or firewall config might go into a networking module.
Any service accounts that you might automatically create might go into a credentials or iam module.
Any DNS mappings for API endpoints might go into a dns module.
You can then easily pass in variables to customize the behavior for "dev/test/stage" as needed.
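A minimal sketch of such a root module (the module paths, variable names, and values here are hypothetical, not prescribed by the docs):

```hcl
# main.tf in the root of the repo: wires the shared modules together
# and passes in per-environment values.
variable "environment" {}

module "compute" {
  source      = "./modules/compute"
  environment = "${var.environment}"
  # e.g. a bigger k8s cluster only in production
  node_count  = "${var.environment == "prod" ? 5 : 1}"
}

module "storage" {
  source      = "./modules/storage"
  environment = "${var.environment}"
}
```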
When using remote backends, should the state file be shared across repos by default, or separated by repo and environment?
Going off the Terraform docs and their recommended separation:
In this case, the backend used for each deployment often belongs to that deployment, with different credentials and access controls.
You would not share tfstorageaccount. Now take this with a grain of salt and determine your own needs - essentially what you need to take into account is the security and data integrity implications of sharing backends/credentials. For example:
How sensitive is your state? If you have sensitive variables being output to your state, then you might not want your "production" state sitting in the same security perimeter as your "test" state.
Will you ever need to wipe your state or perform destructive actions? If, for example, your state storage provider only versions by folder, then you probably don't want your "dev/test/stage" states sitting next to each other.
Background:
I have a shared module called "releases". releases contains the following resources:
aws_s3_bucket.my_deployment_bucket
aws_iam_role.my_role
aws_iam_role_policy.my_role_policy
aws_iam_instance_profile.my_instance_profile
These resources are used by ec2 instances belonging to an ASG to pull code deployments when they provision themselves. The release resources are created once and will rarely/never change. This module is one of a handful used inside an environment-specific project (qa-static) that has its own tfstate file in AWS.
Fast Forward: It's now time to create a "prd-static" project. This project wants to re-use the environment agnostic AWS resources defined in the releases module. prd-static is basically a copy of qa with beefed up configuration for the database and cache server, etc.
The Problem:
prd-static sees the environment-agnostic AWS resources defined in the "releases" module as new resources that don't exist in AWS yet. An init and plan call shows that it wants to create these from scratch. It makes sense to me, since prd-static has its own tfstate - and tfstate is essentially the system-of-record - that terraform doesn't know no changes should be applied. But, ideally, terraform would use AWS as the source of truth for existing resources and their configuration.
If we try to apply the plan as is, the prd-static project just bombs out with an Entity Already Exists error. Leading me to this post:
what is the best way to solve EntityAlreadyExists error in terraform?
^-- logically I could import these resources into the tfstate file for prd-static and be on my merry way. Then, both projects know about the resources and in theory would only apply updates if the configuration had changed. I was able to import the bucket and the role and then re-run the plan.
Now terraform wants to delete the s3 bucket and recreate the role. That's weird - and not at all what I wanted to do.
TLDR: It appears that while modules like to be shared, modules that create single re-usable resources (like an S3 bucket) really don't want to be shared. It looks like I need to pull the environment-agnostic static resources module into its own project with its own tfstate that can be used independently, rather than try to share the releases module across environments. Environment-specific stuff that depends on the release resources can reference them via their outputs in my build process.
Should I be able to define a resource in a module, like an S3 bucket where the same instance is used across terraform projects that each have their own tfstate file (remote state in S3). Because I cannot.
If I really shouldn't be able to do this is the correct approach to extract the single instance stuff into its own project and depend on the outputs?
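That "own project plus outputs" approach could be sketched like this (the state bucket name and output names below are made up for illustration): the standalone releases project exports what others need, and each environment project reads it via the terraform_remote_state data source instead of declaring the resources itself.

```hcl
# In the standalone releases project: publish the shared resources.
output "deployment_bucket" {
  value = "${aws_s3_bucket.my_deployment_bucket.id}"
}

output "instance_profile" {
  value = "${aws_iam_instance_profile.my_instance_profile.name}"
}

# In qa-static / prd-static: consume the shared resources read-only,
# without ever owning them in this project's tfstate.
data "terraform_remote_state" "releases" {
  backend = "s3"

  config {
    bucket = "my-terraform-state"           # hypothetical state bucket
    key    = "releases/terraform.tfstate"
    region = "us-east-1"
  }
}

# Then reference e.g. "${data.terraform_remote_state.releases.deployment_bucket}"
```

Because the environment projects only read the releases state, neither qa-static nor prd-static will ever plan to create, modify, or destroy those shared resources.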
I have two different AWS configurations. On a dev laptop, the developer uses a mfa-secured profile inside a shared_credentials_file.
On jenkins, we export environment variables and then assume a role.
This means that the provider blocks look really different. At the root level, they share the same backend.tf.
I know I can have two different roots with different providers, but is there a way so I don't have to duplicate backend.tf and other root files?
I understand your point, but this is not recommended. Instead, get your AWS configuration ready via system environment variables before you run any terraform commands:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN (optional)
AWS_DEFAULT_REGION
AWS_DEFAULT_PROFILE (optional)
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
The solution that I think makes most sense is to place local developers and jenkins automations into two separate environment directories, each with their own aws.tf and backend workspace.
This made sense because developers should not be messing with resources created by automation, and any operations to the jenkins backend should be done by jenkins, otherwise devs could overwrite resources that jenkins put up, and vice versa.
Terraform addresses re-use of components via Modules. So I might put the definition for an AWS Autoscale Group in a module and then have multiple top level resource files use that ASG. Fine so far.
My question is: how to use Terraform to group and organize multiple top level resource files? In other words, what is the next level of organization?
We have a system that has multiple applications...each application would correspond to a TF resource file and those resource files would use the modules. We have different customers that use different sets of the applications so we need to keep them in their own resource files.
We're asking if there is a TF concept for deploying multiple top level resource files (applications to us).
At some point, you can't abstract any further, or it doesn't make sense to. You will always have a top level resource file (i.e. main.tf) describing the modules to use. You can organize these top level resource files via:
Use Terraform Workspaces
You can use workspaces - in your case, maybe one per client name. Workspaces each have their own backing Terraform state. Then you can use the terraform.workspace variable in your Terraform code. Workspaces can also be used to target different environments.
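For example, terraform.workspace can parameterize resources per client or environment (the resource values here are hypothetical):

```hcl
resource "aws_instance" "app" {
  ami = "${var.ami_id}"

  # Use a bigger instance type only in the production workspace.
  instance_type = "${terraform.workspace == "production" ? "m4.large" : "t2.micro"}"

  tags {
    # Tag every resource with the workspace that owns it.
    Environment = "${terraform.workspace}"
  }
}
```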
Use Separate Statefiles
Have one top level statefile for each of your clients, e.g. clienta.main.tf, clientb.main.tf, etc. You could have them all in the same repository and use a script to run them all, individually, or in whatever pattern you prefer; or you could have one repository per client.
You can also combine workspaces with your separate statefiles to target individual environments (e.g. staging, production) for each client. The Terraform docs go into more detail about workspaces and some of their downsides.
If one has two AWS accounts, one for development and one for live (for example) I am aware that one can use terraform workspaces to manage the state of each environment.
However, if I switch workspace from "dev" to "live", is there a way to tell terraform it should now be applying the state to the live account rather than the test one?
One way I thought of, which is error-prone, would be to swap my secret.auto.tfvars file each time I switch workspace, since I presume that when running with a different access key (the one belonging to the "live" account) the AWS provider will then apply to that account. However, it'd be very easy to switch workspace and have the wrong credentials present, which would run the changes against the wrong environment.
I'm looking for a way to almost link a workspace with an account id in AWS.
I did find this https://github.com/hashicorp/terraform/issues/13700 but it refers to the deprecated env command, this comment looked somewhat promising in particular
Update
I have found some information on GitHub, where I left this comment as a reply to an earlier comment that recommended considering modules instead of workspaces and indicated that workspaces aren't well suited to this task. If anyone can show how modules could be used to maintain multiple versions of the "same" infrastructure concurrently, I'd be keen to see how this improves on the workspace concept.
Here's how you could use Terraform modules to structure your live vs dev environments that point to different AWS accounts, but the environments both have/use the same Terraform code.
This is one (of many) ways that you could structure your dirs; you could even put the modules into their own Git repo, but I'm going to try not to confuse things too much. In this example, you have a simple app that has 1 EC2 instance and 1 RDS database. You write whatever Terraform code you need in the modules/*/ subdirs, making sure to parameterize whatever attributes are different across environments.
Then in your dev/ and live/ dirs, main.tf should be the same, while provider.tf and terraform.tfvars reflect environment-specific info. main.tf would call the modules and pass in the env-specific params.
modules/
|-- ec2_instance/
|-- rds_db/
dev/
|-- main.tf # --> uses the 2 modules
|-- provider.tf # --> has info about dev AWS account
|-- terraform.tfvars # --> has dev-specific values
live/
|-- main.tf # --> uses the 2 modules
|-- provider.tf # --> has info about live/prod AWS account
|-- terraform.tfvars # --> has prod-specific values
When you need to plan/apply either env, you drop into the appropriate dir and run your TF commands there.
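As an extra guard, each environment's provider.tf can pin the AWS account it is allowed to touch (the account ID and profile name below are made up):

```hcl
# dev/provider.tf
provider "aws" {
  region              = "us-east-1"
  profile             = "dev"              # dev-account credentials profile
  allowed_account_ids = ["111111111111"]   # plan/apply refuses any other account
}
```

With allowed_account_ids set, running the dev configuration with live credentials fails immediately instead of silently changing the wrong account.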
As for why this is preferred over using Terraform Workspaces, the TF docs explains it well:
In particular, organizations commonly want to create a strong separation between multiple deployments of the same infrastructure serving different development stages (e.g. staging vs. production) or different internal teams. In this case, the backend used for each deployment often belongs to that deployment, with different credentials and access controls. Named workspaces are not a suitable isolation mechanism for this scenario.
Instead, use one or more re-usable modules to represent the common elements, and then represent each instance as a separate configuration that instantiates those common elements in the context of a different backend. In that case, the root module of each configuration will consist only of a backend configuration and a small number of module blocks whose arguments describe any small differences between the deployments.
BTW, Terraform merely renamed the env subcommand to workspace when they decided that 'env' was a bit too confusing.
Hope this helps!
Terraform workspaces hold the state information. They connect to user accounts based on the way in which AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are used in the environment. In order to link any given workspace to an AWS account they'd have to store user credentials in some kind of way, which they understandably don't. Therefore, I would not expect to see workspaces ever directly support that.
But, to almost link a workspace to an account you just need to automatically switch AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY each time the workspace is switched. You can do this by writing a wrapper around terraform which:
Passes all commands on to the real terraform unless it finds workspace select in the command line.
Upon finding workspace select in the command line, it parses out the workspace name.
Exports the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY for the AWS account you want to link to that workspace.
Finishes by passing the workspace command on to the real terraform.
This would load the correct credentials each time
terraform workspace select <WORKSPACE>
was used.
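A sketch of such a wrapper in shell (the credentials-file paths and workspace names are assumptions; adapt them to your setup):

```shell
#!/bin/sh
# terraform-wrapper: exports per-workspace AWS credentials before
# delegating to the real terraform binary.

# Map a workspace name to the env file holding that account's credentials.
creds_file_for() {
  case "$1" in
    live) echo "$HOME/.aws/live.env" ;;
    dev)  echo "$HOME/.aws/dev.env" ;;
    *)    echo "" ;;
  esac
}

run() {
  if [ "$1" = "workspace" ] && [ "$2" = "select" ] && [ -n "$3" ]; then
    creds="$(creds_file_for "$3")"
    # Source AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY for that account.
    if [ -n "$creds" ] && [ -f "$creds" ]; then
      . "$creds"
    fi
  fi
  # Pass every command on to the real terraform unchanged.
  exec terraform "$@"
}

# run "$@"   # uncomment when installing this as the wrapper entry point
```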
I used to use multiple .sh files that ran different "terraform remote config" commands to switch between state files in buckets in different Google Cloud projects for different environments (dev, test and prod).
With version 0.9.0, I understand that this now goes into a .tf file:
terraform {
  backend "gcs" {
    bucket  = "terraform-state-test"
    path    = "terraform.tfstate"
    project = "cloud-test"
  }
}
In version 0.9.0 there is now also the State Environment ("terraform env"):
resource "google_container_cluster" "container_cluster" {
  initial_node_count = "${terraform.env == "prod" ? 5 : 1}"
}
But how should I now manage multiple environments in the same directory structure with the new backend configuration?
At the time of this writing, not all of the remote backends in Terraform have been updated to support state environments. For those that have, each backend has its own conventions for how to represent the now-multiple states in the data store.
As of version 0.9.2, the "consul", "s3" and "local" backends have been updated. The "gcs" backend has not yet, but once it has the procedure described here will apply to that too.
There's initially a "default" environment, but if you never run terraform apply with this environment selected then you can ignore it and name your environments whatever you want.
To create a new environment called "production" and switch to it:
terraform workspace new production
This will establish an entirely separate state on the backend, so terraform plan should show that all of the resources need to be created fresh.
You can switch between already-existing environments like this:
terraform workspace select production
Before 0.9, many teams (including yours, it sounds like) wrote wrapper scripts to simulate this behavior. It's likely that these scripts didn't follow exactly the same naming conventions in the storage backends, so some manual work will be required to migrate.
One way to do that migration is to start off using the "local" backend, which stores state in a local directory called terraform.tfstate.d. While working locally you can create the environments you want and then carefully overwrite the empty state files in the local directory with the existing state files from your previous scripted solution. Once all of the local environments have appropriate states in place you can then change the backend block in the config to the appropriate remote backend and run terraform init to trigger a migration of all of the local environments into the new backend.
After this, the terraform workspace select command will begin switching between the remote environments rather than the local ones.
If your chosen remote backend doesn't yet support environments, it's best to continue with a scripted solution for the time being. This means replacing terraform remote config in your existing wrapper script with a use of the partial configuration pattern to pass environment-specific configuration into terraform init.
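That partial configuration could look like this (the file layout is a hypothetical illustration): the backend block declares only the backend type, and each environment supplies its own settings at init time.

```hcl
# backend.tf -- intentionally holds no environment-specific values
terraform {
  backend "gcs" {}
}
```

Then, per environment, the wrapper script would run something like terraform init -backend-config=env/dev.backend.tfvars, or pass individual -backend-config="bucket=..." flags.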