Terraform and cleartext password in (remote) state file

There are many GitHub issues open on the Terraform repo about this, with lots of interesting comments, but as of now I still see no solution.
Terraform stores plain text values, including passwords, in tfstate files.
Most users need to store it remotely so the team can work concurrently on the same infrastructure, with most of them storing the state files in S3.
So how do you hide your passwords?
Is there anyone here using Terraform in production? Do you keep your passwords in plain text?
Do you have a special workflow to remove or hide them? What happens when you run a terraform apply then?
I've considered the following options:
store them in Consul - I don't use Consul
remove them from the state file - this requires another process to be executed each time and I don't know how Terraform will handle the resource with an empty/unreadable/not working password
store a default password that is then changed (so Terraform will have a non-working password in the tfstate file) - same as above
use the Vault resource - it sounds like it's not a complete workflow yet
store them in Git with git-repo-crypt - Git is not an option either
globally encrypt the S3 bucket - this will not prevent people from seeing plain text passwords if they have "manager"-level access to AWS, but it seems to be the best option so far
From my point of view, this is what I would like to see:
state file does not include passwords
state file is encrypted
passwords in the state file are "pointers" to other resources, like "vault:backend-type:/path/to/password"
each Terraform run would gather the needed passwords from the specified provider
This is just a wish.
But to get back to the question - how do you use Terraform in production?

I would also like to know the best practice here, but let me share my case, although it is limited to AWS. Basically, I do not manage credentials with Terraform.
Set an initial password for RDS, ignore the difference with a lifecycle block, and change it later. The way to ignore the difference is as follows:
resource "aws_db_instance" "db_instance" {
...
password = "hoge"
lifecycle {
ignore_changes = ["password"]
}
}
IAM users are managed by Terraform, but IAM login profiles, including passwords, are not. I believe that IAM passwords should be managed by individuals and not by the administrator.
API keys used by applications are also not managed by Terraform. They are encrypted with AWS KMS (Key Management Service) and the encrypted data is saved in the application's Git repository or an S3 bucket. The advantage of KMS encryption is that decryption permissions can be controlled by IAM role, so there is no need to manage decryption keys yourself.
Although I have not tried it yet, I recently noticed that aws ssm put-parameter --key-id can be used as a simple key-value store with KMS encryption, so this might be a good alternative as well.
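For illustration, here is a minimal sketch of how such a parameter could be consumed from Terraform, assuming it was written outside Terraform with the CLI; the parameter name and key alias are hypothetical, and note that the decrypted value will still end up in the state snapshot of whichever configuration reads it:
# The parameter itself is created outside Terraform, for example:
#   aws ssm put-parameter --name /myapp/db_password --type SecureString --key-id alias/myapp --value '...'
data "aws_ssm_parameter" "db_password" {
  name            = "/myapp/db_password"
  with_decryption = true
}

# data.aws_ssm_parameter.db_password.value can then be passed to other resources,
# but keep in mind it is recorded in the Terraform state snapshot.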
I hope this helps you.

The whole remote state handling is being reworked for 0.9, which should open things up for locking of remote state and potentially encrypting the whole state file or just the secrets.
Until then we simply use multiple AWS accounts and write the state for the stuff that goes into each account into an S3 bucket in that account. In our case we don't really care too much about the secrets that end up in there, because if you have access to read the bucket then you normally have a fair amount of access in that account. Plus, our only real secrets kept in state files are RDS database passwords, and we restrict access at the security group level to just the application instances and the Jenkins instances that build everything, so there is no direct access from the command line on people's workstations anyway.
I'd also suggest adding encryption at rest on the S3 bucket (just because it's basically free) and versioning so you can retrieve older state files if necessary.
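For reference, a minimal sketch of such a bucket in HCL, assuming a recent AWS provider (v4 or later); the bucket name is hypothetical:
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-tf-state-bucket"
}

# Keep old state versions around so they can be retrieved if necessary.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encryption at rest with the S3-managed key (SSE-S3), which costs nothing extra.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}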
To take it further, if you are worried about people with read access to your S3 buckets containing state, you could add a bucket policy that explicitly denies access to anyone other than some whitelisted roles/users, which would then be enforced above and beyond any IAM access. Extending the example from a related AWS blog post, we might have a bucket policy that looks something like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::MyTFStateFileBucket",
        "arn:aws:s3:::MyTFStateFileBucket/*"
      ],
      "Condition": {
        "StringNotLike": {
          "aws:userId": [
            "AROAEXAMPLEID:*",
            "AIDAEXAMPLEID"
          ]
        }
      }
    }
  ]
}
Where AROAEXAMPLEID represents an example role ID and AIDAEXAMPLEID represents an example user ID. These can be found by running:
aws iam get-role --role-name ROLE-NAME
and
aws iam get-user --user-name USER-NAME
respectively.
If you really want to go down the route of fully encrypting the state file, then you'd need to write a wrapper script that makes Terraform interact with the state file locally (rather than remotely), and then have the wrapper manage the remote state, encrypting it before it is uploaded to S3 and decrypting it as it is pulled down.


How to detect changes made outside of Terraform?

I have been using Terraform for some months now, and I have reached the point where my infrastructure is all based on Terraform files, and I now have better control of the resources in our multiple accounts.
But I have a big problem. If someone makes a "manual" alteration of any Terraformed resource, it is easy to detect the change.
But what happens if the resource was not created using Terraform? I just don't know how to track any new resource or changes in them if the resource was not created using Terraform.
A key design tradeoff for Terraform is that it will only attempt to manage objects that it created or that you explicitly imported into it, because Terraform is often used in mixed environments where either some objects are managed by other software (like an application deployment tool) or the Terraform descriptions are decomposed into multiple separate configurations designed to work together.
For this reason, Terraform itself cannot help with the problem of objects created outside of Terraform. You will need to solve this using other techniques, such as access policies that prevent creating objects directly, or separate software (possibly created in-house) that periodically scans your cloud vendor accounts for objects that are not present in the expected Terraform state snapshot(s).
Access policies are typically the more straightforward path to implement, because preventing objects from being created in the first place is easier than recognizing objects that already exist, particularly if you are working with cloud services that create downstream objects as a side-effect of their work, as we see with (for example) autoscaling controllers.
Martin's answer is excellent and explains that Terraform can't be the arbiter of this, as it is designed to play nicely both with other tooling and with itself (i.e. across different state files).
He also mentioned that access policies (although these have to be cloud/provider specific) are a good alternative, so this answer will instead provide some options for handling this with AWS if you do want to enforce it.
The AWS SDKs and other clients, including Terraform, all provide a user agent header in all requests. This is recorded by CloudTrail and thus you can search through CloudTrail logs with your favourite log searching tools to look for API actions that should be done via Terraform but don't use Terraform's user agent.
The other option that uses the user agent request header is to use IAM's aws:UserAgent global condition key which will block any requests that don't match the user agent header that's defined. An example IAM policy may look like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1598919227338",
      "Action": [
        "dlm:GetLifecyclePolicies",
        "dlm:GetLifecyclePolicy",
        "dlm:ListTagsForResource"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Sid": "Stmt1598919387700",
      "Action": [
        "dlm:CreateLifecyclePolicy",
        "dlm:DeleteLifecyclePolicy",
        "dlm:TagResource",
        "dlm:UntagResource",
        "dlm:UpdateLifecyclePolicy"
      ],
      "Effect": "Allow",
      "Resource": "*",
      "Condition": {
        "StringLike": {
          "aws:UserAgent": "*terraform*"
        }
      }
    }
  ]
}
The above policy allows the user, group, or role it is attached to to perform read-only actions on any DLM resource in the AWS account. It then allows any client with a user agent header containing the string terraform to perform actions that create, update, or delete DLM resources. If a client doesn't have terraform in its user agent header, any request to modify a DLM resource will be denied.
Caution: It's worth noting that clients can override the user agent string and so this shouldn't be relied on as a foolproof way of preventing access to things outside of this. The above mentioned techniques are mostly useful to get an idea about the usage of other tools (eg the AWS Console) in your account where you would prefer changes to be made by Terraform only.
The AWS documentation for the IAM global condition keys has this to say:
Warning
This key should be used carefully. Since the aws:UserAgent value is provided by the caller in an HTTP header, unauthorized parties can use modified or custom browsers to provide any aws:UserAgent value that they choose. As a result, aws:UserAgent should not be used to prevent unauthorized parties from making direct AWS requests. You can use it to allow only specific client applications, and only after testing your policy.
The Python SDK, boto, covers how the user agent string can be modified in the configuration documentation.
I haven't tried it, but my idea has always been that this should be possible with consistent usage of tags. A first, naive configuration such as
provider "aws" {
default_tags {
tags = {
Terraform = "true"
}
}
}
should be sufficient in many cases.
If you fear rogue developers will add this tag manually to hide their hacks, you could have your Terraform modules rotate the tag value over time to unpredictable values, so you could still search for inappropriately tagged resources. Hopefully the burden of overcoming such a mechanism will exceed the effort of simply terraforming the project properly. (A burden for them, not for you.)
On the downside, many resources will legitimately not be terraformed, e.g. DynamoDB tables or S3 objects. A watching process would somehow have to whitelist what is allowed to exist; not compute resources, that's for sure.
Tuning access policies and using CloudTrail, as ydaetskcoR suggests, might be unsuitable for assessing the extent of unterraformed legacy infrastructure, but they are definitely worth the effort anyway.
This Reddit thread https://old.reddit.com/r/devops/comments/9rev5f/how_do_i_diff_whats_in_terraform_vs_whats_in_aws/ discusses this very topic, with some attention gathered around the sadly archived https://github.com/dtan4/terraforming, although that feels like too much IMHO.

Terraform: Add resource-specific secrets

I know that you can pass general secrets to a resource through terraform variables. Is there a way to configure secrets which change at the resource level?
Specifically, I'm using terraform as a back-end to an app which allows users to set up a server with a password. That password is different for each server. Is there some way to set something like self.password for a single instance so that it:
Is not visible in the github repo where I track the terraform files
and
Can be changed for each individual instance
Right now I'm just going to be creating terraform files like password=var.{unique_id}_password, but it feels like there should be a better way.
More detail on the use-case:
I have a web application to provision servers for users running another web app. The password for that server is set-up by my application. The password is configured right now using a set-up script that I would like to port to terraform.
The passwords change for each server because a user can set the password for their server only, and that variable should not affect other resources.
Here's a super-simplified version of the expected output when a user tries to provision a server
# new-server.tf
resource "digitalocean_droplet" "new_server" {
  name     = "new_server"
  password = "${var.get_the_password_somehow}"

  provisioner "remote-exec" {
    inline = [
      "set-password ${self.password}"
    ]
  }
}
You can use the random_password resource from the random provider to generate a random string.
Reference: https://www.terraform.io/docs/providers/random/r/password.html
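A minimal sketch of what that could look like; the resource name is hypothetical:
resource "random_password" "server" {
  length  = 20
  special = true
}

# The generated value is available as random_password.server.result and could be
# passed to the droplet's remote-exec provisioner in place of var.get_the_password_somehow.
# It is treated as sensitive in plan output, but it is still stored in the state file.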
Not sure if your use case requires management or storage of the password, but that is also possible depending on your needs. I see that you are using DO for provisioning resources.
Maybe you can put HashiCorp Vault in place to manage the randomly generated passwords. I'm an AWS guy, so I would stick the password in Secrets Manager.
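For the AWS route, a hedged sketch of storing the generated password in Secrets Manager (the secret name is hypothetical, and it assumes the random_password resource from above):
resource "aws_secretsmanager_secret" "server_password" {
  name = "new-server-password"
}

resource "aws_secretsmanager_secret_version" "server_password" {
  secret_id     = aws_secretsmanager_secret.server_password.id
  secret_string = random_password.server.result
}

# Note: the secret value is still recorded in the Terraform state snapshot.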

Is it possible to keep secrets out of state?

For example, could I reference the password as an environment variable? Even if I did that, it would still be stored in state, right?
# Configure the MySQL provider
provider "mysql" {
  endpoint = "my-database.example.com:3306"
  username = "app-user"
  password = "app-password"
}
State snapshots include only the results of resource, data, and output blocks, so that Terraform can compare these with the configuration when creating a plan.
The arguments inside a provider block are not saved in state snapshots, because Terraform only needs the current arguments for the provider configuration, and never needs to compare with the previous.
Even though the provider arguments are not included in the state, it's best to keep specific credentials out of your configuration. Providers tend to offer arguments for credentials as a last resort for unusual situations, but should also offer other ways to provide credentials. For some providers there is some existing standard way to pass credentials, such as the AWS provider using the same credentials mechanisms as the AWS CLI. Other providers define their own mechanisms, such as environment variables.
For the MySQL provider in particular, we should set endpoint in the configuration because that describes what Terraform is managing, but we should use environment variables to specify who is running Terraform. We can use the MYSQL_USERNAME and MYSQL_PASSWORD environment variables to specify the credentials for the individual or system that is running Terraform.
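In other words, a sketch using the endpoint from the question, with the credentials supplied from the environment of whoever runs Terraform:
# Credentials are provided outside the configuration, for example:
#   export MYSQL_USERNAME="app-user"
#   export MYSQL_PASSWORD="app-password"
provider "mysql" {
  endpoint = "my-database.example.com:3306"
}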
A special exception to this is when Terraform itself is the one responsible for managing the credentials. In that case, the resource that provisioned the credentials will have its data (including the password) stored in the state. There is no way to avoid that because otherwise it would not be possible to use the password elsewhere in the configuration.
For Terraform configurations that manage credentials (rather than just using credentials), they should ideally be separated from other Terraform configurations and have their state snapshots stored in a location where they can be encrypted at rest and accessible only to the individuals or systems that will run Terraform against those configurations. In that case, treat the state snapshot itself as a secret.
No, it's not possible. Your best option is using a safe and encrypted remote backend such as S3 + DynamoDB to keep your state files. I've also read about people using git-crypt, but I have never tried it myself.
That said, you can keep secrets out of your source code using environment variables for inputs.
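A hedged sketch of such a setup; the bucket, key, and DynamoDB table names are hypothetical, and the sensitive attribute assumes Terraform 0.14 or later:
terraform {
  backend "s3" {
    bucket         = "my-state-bucket"
    key            = "app/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-locks" # state locking
  }
}

# Inputs can then be supplied via environment variables rather than source code:
#   export TF_VAR_db_password="..."
variable "db_password" {
  type      = string
  sensitive = true
}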

Make Azure storage account and container before running terraform init?

Correct me if I'm wrong: when you run terraform init, you are asked to name a storage account and container for the Terraform state.
Can these also automatically be made with terraform?
Edit: I'm using Azure.
I usually split my terraform configurations into two parts.
One creates a storage account with a container, with a specific tag (tf=backend, for example). The second creates all other resources. I share a backend.tfvars between the two, and in the second one I get the storage account key using the Azure CLI and the previously set tag (that way I don't have to look up the key and pass it manually to my second script).
You could even migrate the state of the first Terraform configuration once it is deployed, if you don't want to rely on a local state.
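A hedged sketch of what that first, bootstrap configuration might look like with the azurerm provider (3.x); all names, the location, and the tag are illustrative:
resource "azurerm_resource_group" "tfstate" {
  name     = "rg-tfstate"
  location = "westeurope"
}

resource "azurerm_storage_account" "tfstate" {
  name                     = "tfstatestorageacct"
  resource_group_name      = azurerm_resource_group.tfstate.name
  location                 = azurerm_resource_group.tfstate.location
  account_tier             = "Standard"
  account_replication_type = "LRS"

  tags = {
    tf = "backend"
  }
}

resource "azurerm_storage_container" "tfstate" {
  name                  = "tfstate"
  storage_account_name  = azurerm_storage_account.tfstate.name
  container_access_type = "private"
}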
Yes, absolutely. You would in general want an S3 bucket for each of your environments, although it's also possible to have a bucket shared across all environments and then set up access controls using bucket policies. Don't create this bucket as part of provisioning other resources, as their lifecycles will likely be different (you would want to retain the bucket for a long time and would be unlikely to want to destroy it).
What you do is you define this bucket in Terraform using local state first. After it is created, you add a remote backend pointing to this bucket.
terraform {
  required_version = ">= 0.11.7"

  backend "s3" {
    bucket  = "my-state-bucket"
    key     = "s3_state_bucket"
    region  = "us-west-2"
    encrypt = "true"
  }
}
After you run terraform init, Terraform will ask if you want to migrate the local state file to S3. Answer yes, and after this completes you can delete the local state file, as it's no longer used.
This approach allows you to break out of this chicken-and-egg situation and still manage all of your infrastructure as code, rather than creating it manually using the web console or bash scripts.

Handling run time and build time secrets in AWS CodePipeline

We are dealing with the problem of providing build time and run time secrets to our applications built using AWS CodePipeline and being deployed to ECS.
Ultimately, our vision is to create a generic pipeline for each of our applications that achieves the following goals:
Complete separation of access
The services in the app-a-pipeline CANNOT access any of the credentials or use any of the keys used in the app-b-pipeline, and vice versa
Secret management by assigned developers
Only developers responsible for app-a may read and write secrets for app-a
Here are the issues at hand:
Some of our applications require access to private repositories for dependency resolution at build time
For example, our java applications require access to a private maven repository to successfully build
Some of our applications require database access credentials at runtime
For example, the servlet container running our app requires an .xml configuration file containing credentials to find and access databases
Along with some caveats:
Our codebase resides in a public repository. We do not want to expose secrets by putting either the plaintext or the cyphertext of the secret in our repository
We do not want to bake runtime secrets into our Docker images created in CodeBuild even if ECR access is restricted
The Cloudformation template for the ECS resources and its associated parameter file reside in the public repository in plaintext. This eliminates the possibility of passing runtime secrets to the ECS Cloudformation template through parameters (As far as I understand)
We have considered using tools like credstash to help with managing credentials. This solution requires that both CodeBuild and ECS task instances have the ability to use the AWS CLI. To avoid shuffling around more credentials, we decided that it might be best to assign privileged roles to the instances that require the use of the AWS CLI. That way, the CLI can infer credentials from the role in the instance metadata.
We have tried to devise a way to manage our secrets given these restrictions. For each app, we create a pipeline. Using a Cloudformation template, we create:
4 resources:
DynamoDB credential table
KMS credential key
ECR repo
CodePipeline (Build, deploy, etc)
3 roles:
CodeBuildRole
Read access to DynamoDB credential table
Decrypt permission with KMS key
Write to ECR repo
ECSTaskRole
Read access to DynamoDB credential table
Decrypt permission with KMS key
Read from ECR repo
DeveloperRole
Read and write access to DynamoDB credential table
Encrypt and decrypt permission with KMS key
The CodeBuild step of the CodePipeline assumes the CodeBuildRole to allow it to read build-time secrets from the credential table. CodeBuild then builds the project and generates a Docker image, which it pushes to ECR. Eventually, the deploy step creates an ECS service using the CloudFormation template and the accompanying parameter file present in the project's public repository. The ECS task definition includes assuming the ECSTaskRole to allow the tasks to read runtime secrets from the credential table and to pull the required image from ECR.
Here is a simple diagram of the AWS resources and their relationships as stated above
Our current proposed solution has the following issues:
Role heavy
Creating roles is a privileged action in our organization. Not all developers who try to create the above pipeline will have permission to create the necessary roles
Manual assumption of DeveloperRole:
As it stands, developers would need to manually assume the DeveloperRole. We toyed with the idea of passing in a list of developer user ARNs as a parameter to the pipeline Cloudformation template. Does Cloudformation have a mechanism to assign a role or policy to a specified user?
Is there a more well established way to pass secrets around in CodePipeline that we might be overlooking, or is this the best we can get?
Three thoughts:
AWS Secret Manager
AWS Parameter Store
IAM roles for Amazon ECS tasks
AWS Secrets Manager helps you protect the secrets needed to access your applications, services, and IT resources. With it you can rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle.
AWS Parameter Store can protect access keys with granular access. This access can be based on ServiceRoles.
ECS provides access to the ServiceRole via this pattern:
build:
  commands:
    - curl 169.254.170.2$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI | jq 'to_entries | [ .[] | select(.key | (contains("Expiration") or contains("RoleArn")) | not) ] | map(if .key == "AccessKeyId" then . + {"key":"AWS_ACCESS_KEY_ID"} else . end) | map(if .key == "SecretAccessKey" then . + {"key":"AWS_SECRET_ACCESS_KEY"} else . end) | map(if .key == "Token" then . + {"key":"AWS_SESSION_TOKEN"} else . end) | map("export \(.key)=\(.value)") | .[]' -r > /tmp/aws_cred_export.txt
    - chmod +x /tmp/aws_cred_export.txt
    # source the generated exports so they are visible to the command that follows
    - . /tmp/aws_cred_export.txt && YOUR COMMAND HERE
If the ServiceRole provided to the CodeBuild task has access to use the Parameter Store key, you should be good to go.
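To make that concrete, here is a hedged sketch, written in Terraform HCL for consistency with the rest of this page even though the question uses CloudFormation; the variable, parameter path, and role name are all hypothetical:
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_ssm_parameter" "db_password" {
  name  = "/app-a/db_password"
  type  = "SecureString"
  value = var.db_password # note: also recorded in Terraform state
}

data "aws_iam_policy_document" "read_app_a_params" {
  statement {
    actions   = ["ssm:GetParameter", "ssm:GetParameters", "ssm:GetParametersByPath"]
    resources = ["arn:aws:ssm:*:*:parameter/app-a/*"]
  }
}

# Attach the read policy to the existing CodeBuild service role (name is hypothetical).
resource "aws_iam_role_policy" "codebuild_read_params" {
  name   = "read-app-a-params"
  role   = "app-a-codebuild-role"
  policy = data.aws_iam_policy_document.read_app_a_params.json
}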
Happy hunting and hope this helps
At a high level, you can either isolate applications in a single AWS account with granular permissions (this sounds like what you're trying to do) or by using multiple AWS accounts. Neither is right or wrong per se, but I tend to favor separate AWS accounts over managing granular permissions because your starting place is complete isolation.
