How to design a Terraform resource with multiple sensitive attributes

Context: I'm developing a new resource for my TF Provider.
This foo resource has a name and an associated config: a list of key-value pairs (both sensitive and non-sensitive).
There are 3 options I've identified:
resource "foo" "option1" {
name = "option1"
config = {
"name" = "option1"
"errors.length" = 3
"tasks.type" = "FOO"
}
config_sensitive = {
"jira.key" = "..."
"credentials.json" = "..."
}
}
resource "foo" "option2" {
name = "option2"
config = {
"name" = "option1"
"errors.length" = 3
"tasks.type" = "FOO"
"jira.key" = "..."
"credentials.json" = "..."
}
}
resource "foo" "option3" {
name = "option3"
config = file("config.json")
}
The advantage of option #3 is that it looks very readable, but it requires the user to store an extra JSON file (with secrets) in the same folder (I'm not sure how acceptable that setup is). Option #2 looks tempting, but foo should accept updates, and if we mark the whole block as sensitive (since it may contain secret key-value pairs), the update functionality will suffer (the user won't see the expected change). So option #1 is the winner in my eyes, since it's the most explicit one and allows us to distinguish between sensitive and non-sensitive attributes (while allowing updates for non-sensitive ones). Reading the whole config from a file is probably not ideal either, since it doesn't really let an engineer see what the config looks like without opening another file.
There's also this weird duplicated name attribute but let's ignore it for now.
What configuration is the most acceptable and used by other TF Providers?

Option #3 should be struck immediately for three reasons:
You cannot reliably use the sensitive flag in the schema struct like you can with options 1 and 2.
It requires a JSON-formatted value, which is cumbersome to work with unless you are forced into it (e.g. by security policies).
Someone could inline the JSON instead of storing it in a file, which would completely work around your attempt to obscure the secrets.
Options 1 and 2 are honestly no different from a secrets management perspective. You could apply the sensitive flag to either in the nested schema struct on a per-attribute basis, and use e.g. Vault to pass in values on a KV basis for either.
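For example, here is a minimal sketch of option 1 fed from Vault on a key-value basis, assuming the hashicorp/vault provider; the mount and secret path are illustrative placeholders:

data "vault_kv_secret_v2" "foo_secrets" {
  mount = "secret"      # assumed KV v2 mount
  name  = "foo/option1" # assumed secret path
}

resource "foo" "option1" {
  name = "option1"

  config = {
    "errors.length" = 3
    "tasks.type"    = "FOO"
  }

  # The secret material never appears in the .tf file itself.
  config_sensitive = {
    "jira.key"         = data.vault_kv_secret_v2.foo_secrets.data["jira.key"]
    "credentials.json" = data.vault_kv_secret_v2.foo_secrets.data["credentials.json"]
  }
}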
I would opt for 1 over 2 simply because it appears to me from your question that the arguments and values in the two blocks have no relationship with each other. Therefore, it makes more sense to organize your schema into two separate blocks for code cleanliness purposes.
I will also mention that if it is possible to refactor the credentials.json into your provider, and to leverage the JIRA provider for the jira.key, then that would be best practice for both code architecture and security. It is also how the major providers handle this situation.

Terraform providers should handle the credential/auth implementation and the resource handles the resource configuration.
e.g.
resource "jira_issue" "some_story" {
title = "My story"
type = "story"
labels = ["someexampleonstackoverflow","jakewashere"]
}
Notice there's no config that doesn't relate to the thing I'm creating inside the Terraform resource.
It's very acceptable to have some documented convention in your provider that reads credentials from somewhere, whether that's an environment variable, a file on disk, etc.
For example: the Google Cloud provider will read an environment variable if it's populated; if not, it attempts to read either a configuration file that sits inside a hidden directory within $HOME or a localhost HTTP metadata server for the credentials.
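For illustration, a hedged sketch of that convention from the practitioner's side; the project and region values are placeholders:

# No credentials appear in the configuration. The provider discovers them from
# the GOOGLE_APPLICATION_CREDENTIALS environment variable, the gcloud application
# default credentials file under $HOME, or the instance metadata server.
provider "google" {
  project = "my-example-project" # placeholder
  region  = "us-central1"
}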

Related

Terraform json/map conversion in dynamic section

I am trying to configure a specific list of users with Terraform in a dynamic section.
First, I have all my users/passwords as JSON in Vault like this:
{
  "user1": "longPassword1",
  "user2amq": "longPassword2",
  "user3": "longPassword3"
}
then I declare the Vault data with
data "vault_kv_secret_v2" "all_clients" {
provider = my.vault.provider
mount = "credentials/aws/amq"
name = "dev/clients"
}
and in a locals section:
locals {
  all_clients = tomap(jsondecode(data.vault_kv_secret_v2.all_clients.data.client_list))
}
In my .tf file, I declare the dynamic section like this:
dynamic "user" {
for_each = local.all_clients
content {
username = each.key
password = each.value
console_access = "false"
groups = ["users"]
}
}
When I apply my Terraform I get an error:
│ on modules/amq/amq.tf line 66, in resource "aws_mq_broker" "myproject":
│ 66: for_each = local.all_clients
│
│ Cannot use a map of string value in for_each. An iterable collection is
│ required.
I tried many ways to manage such a map, but I always end up with an error
(like using = instead of :, bypassing the jsondecode, or having a map or a list of objects with
"username": "user1"
"password": "pass1"
etc.). I am open to adjusting the JSON to make it work.
Nothing was working, and I am a bit out of ideas about how to map such a simple thing into Terraform. I have already checked plenty of questions/answers on SO and none of them work for me.
Terraform version 1.3.5
UPDATE:
By just putting an output on the local variable outside my module:
locals {
  all_clients = jsondecode(data.vault_kv_secret_v2.all_clients.data.client_list)
}
output "all_clients" {
  value = local.all_clients
}
After I applied the code, the command terraform output -json all_clients showed my JSON structure properly (and the same if I put = instead and skip the jsondecode, where it is just displayed as a map).
As the answer says, the issue is more related to sensitivity when declaring the loop.
Separately, I had to adjust my usernames so they are not emails, because that is not supported by AWS AmazonMQ (ActiveMQ), and the password field must be longer than 12 chars (max 250 chars).
I think the problem here is something Terraform doesn't support but isn't explaining well: you can't use a map that is marked as sensitive directly as the for_each expression, because doing so would disclose some information about that sensitive value in a way that Terraform can't hide in the UI. At the very least, it would expose the number of elements.
It seems like in this particular case it's overly conservative to consider the entire map to be sensitive, but neither Vault nor Terraform understand the meaning of your data structure and so are treating the whole thing as sensitive just to make sure nothing gets disclosed accidentally.
Assuming that only the passwords in this result are actually sensitive, I think the best answer would be to specify explicitly what is and is not sensitive using the sensitive and nonsensitive functions to override the very coarse sensitivity the hashicorp/vault provider is generating:
locals {
  all_clients = tomap({
    for user, password in jsondecode(nonsensitive(data.vault_kv_secret_v2.all_clients.data.client_list)) :
    user => sensitive(password)
  })
}
Using nonsensitive always requires care because it's overriding Terraform's automatic inference of sensitive values and so if you use it incorrectly you might show sensitive information in the Terraform UI.
In this case I first used nonsensitive on the whole JSON string returned by the vault_kv_secret_v2 data source, which therefore avoids making the jsondecode result wholly sensitive. Then I used the for expression to carefully mark just the passwords as sensitive, so that the final value would -- if assigned to somewhere that causes it to appear in the UI -- appear like this:
tomap({
  "user1"    = (sensitive value)
  "user2amq" = (sensitive value)
  "user3"    = (sensitive value)
})
Now that the number of elements in the map and the map's keys are no longer sensitive, this value should be compatible with the for_each in a dynamic block just like the one you showed. Because the value of each element is sensitive, Terraform will treat the password argument value in particular as sensitive, while disclosing the email addresses.
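As a rough sketch of that dynamic block (note that inside a dynamic block the iteration variable is named after the block label, so it is user.key / user.value rather than each.key / each.value):

dynamic "user" {
  for_each = local.all_clients
  content {
    username       = user.key   # nonsensitive map key
    password       = user.value # element value, still marked sensitive
    console_access = false
    groups         = ["users"]
  }
}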
If possible I would suggest testing this with fake data that isn't really sensitive first, just to make sure that you don't accidentally expose real sensitive data if anything here isn't quite correct.

Is DiffSuppressFunc or being more restrictive when saving to TF state preferable in Terraform SDKv2?

context: I'm adding a new resource to TF Provider (using SDKv2) with roughly the following schema:
resource "player" "football" {
type = "FOOTBALL"
...
config = {
"dribbling" = "50"
"speed" = "90"
"position" = "GOALKEEPER"
}
}
that I represent as:
"config": {
Type: schema.TypeMap,
Elem: &schema.Schema{
Type: schema.TypeString,
},
Required: true,
ForceNew: true,
},
The important detail here is that different player types have different sets of required attributes (dribbling, speed, position for football and height, can_dunk, arm_span for basketball) -- all players share the same API endpoint, so I introduced just one resource to cover them all.
I'd like to support the ability to import players, and apparently the READ response includes a bunch of fields that are optional on create (and I suspect most users won't have them in their Terraform configuration file), which means I've got a state difference when saving the whole config like:
d.Set("config", player.GetConfig()) # GetConfig includes a bunch of new attributes (optional on a create or even computed)
So I've got a question: which of the following 2 options is preferable:
Implement DiffSuppressFunc for a config attribute where I'll be ignoring these optional fields (the downside is I'll have an implicit drift between main.tf and TF state file).
Be more restrictive when writing configs to TF state file:
instead of
  d.Set("config", player.GetConfig())
do
  // filtered config will match config in main.tf
  filteredConfig := ...
  d.Set("config", filteredConfig)
In some other Terraform providers that deal with similar situations (where a particular argument has a mixture of configuration-provided and remote-system-provided nested values), the resource type implementation takes a compromise position of effectively exposing the same data in two different attributes, where one of them represents what the user configured and the other represents the full data returned by the remote system. For example, you might have config to be set in the configuration, and expanded_config representing the full set of elements the server decided.
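From the practitioner's point of view that might look something like the following sketch; expanded_config is just the illustrative attribute name used above, not an existing attribute:

resource "player" "football" {
  type = "FOOTBALL"
  config = {
    "dribbling" = "50"
    "speed"     = "90"
    "position"  = "GOALKEEPER"
  }
}

# The full, server-populated configuration is readable but not managed.
output "football_expanded_config" {
  value = player.football.expanded_config
}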
There is a challenge with that approach in that you'll probably need a special rule in your Read function to somehow decide if a change you detect in the remote system constitutes "drift" relative to the configuration or if it's just an additional element added by the server.
From what you described it seems like the rule could be that any key that's present in config in the prior state (that is, the values visible to d.Get inside Read before you call d.Set) would have its value overwritten by what the server returned, but any keys that were not present before are ignored entirely. This would create the effect then that any key the author specified in the configuration is considered "managed by Terraform" while any other key is only read by Terraform and not directly managed.
If you adopt that strategy then it's worth keeping in mind what will happen in a situation where the user has changed the configuration to include a new key or to remove a previously-present key. The Read operation is in terms of the previous state rather than the configuration, so that function will see the keys that were present at the end of the last apply, not the keys currently present in the configuration. In particular this means that if an author adds a new key that the server was already tracking then it will appear in the subsequent plan as being added, even though it might technically be more appropriate to show it as an in-place update ~ or a no-op. This is an example of the compromises we sometimes need to make in order to adapt remote APIs to fit within Terraform's model of resource instances.

Can I iterate through other resources in main.tf in readResource() when developing a TF provider?

Context: I'm developing a TF provider (here's the official guide from HashiCorp).
I run into the following situation:
# main.tf
resource "foo" "example" {
id = "foo-123"
name = "foo-name"
lastname = "foo-lastname"
}
resource "bar" "example" {
id = "bar-123"
parent_id = foo.example.id
parent_name = foo.example.name
parent_lastname = foo.example.lastname
}
where I have to declare parent_name and parent_lastname explicitly (effectively duplicating them) to be able to read these values, which are necessary for the read / create request for the Bar resource.
Is it possible to use a fancy trick with d *schema.ResourceData in
func resourceBarRead(ctx context.Context, d *schema.ResourceData, m interface{}) diag.Diagnostics {
to avoid duplication in my TF config, i.e. to have just:
resource "foo" "example" {
id = "foo-123"
name = "foo-name"
lastname = "foo-lastname"
}
resource "bar" "example" {
id = "bar-123"
parent_id = foo.example.id
}
and infer foo.example.name and foo.example.lastname just from foo.example.id in resourceBarRead() somehow, so I won't have to duplicate those fields in both resources?
Obviously, this is a minimal example; let's assume I need both foo.example.name and foo.example.lastname to send a read / create request for the Bar resource. In other words, can I iterate through other resources in the TF state / main.tf file based on a target ID to find their other attributes? It seems like a useful feature, however it's not mentioned in HashiCorp's guide, so I guess it's undesirable and I have to duplicate those fields.
In Terraform's provider/resource model, each resource block is independent of all others unless the user explicitly connects them using references like you showed.
However, in most providers it's sufficient to pass only the id (or similar unique identifier) attribute downstream to create a relationship like this, because the other information about that object is already known to the remote system.
Without knowledge about the particular remote system you are interacting with, I would expect that you'd be able to use the value given as parent_id either directly in an API call (and thus have the remote system connect it with the existing object), or to make an additional read request to the remote API to look up the object using that ID and obtain the name and lastname values that were saved earlier.
If those values only exist in the provider's context and not in the remote API then I don't think there will be any alternative but to have the user pass them in again, since that is the only way that local values (as opposed to values persisted in the remote API) can travel between resources.

How to use the sops provider with terraform using an array instead of a single value

I'm pretty new to Terraform. I'm trying to use the sops provider plugin for reading secrets encrypted in a YAML file:
Sops Provider
I need to create a Terraform user object for a later provisioning stage like this example:
users = [{
  name     = "user123"
  password = "password12"
}]
I've prepared a secrets.values.enc.yaml file for storing my secret data:
yaml_users:
  - name: user123
    password: password12
I've encrypted the file using "sops" command. I can decrypt the file successfully for testing purposes.
Now I try to use the encrypted file in Terraform for creating the user object:
data "sops_file" "test-secret" {
source_file = "secrets.values.enc.yaml"
}
# user data decryption
users = yamldecode(data.sops_file.test-secret.raw).yaml_users
Unfortunately I cannot debug the data or the structure of "users" as Terraform doesn't display sensitive data. When I try to use that users variable for the later provisioning stage than it doesn't seem to be what is needed:
Cannot use a set of map of string value in for_each. An iterable
collection is required.
When I do the same thing with the unencrypted yaml file everything seems to be working fine:
users = yamldecode(file("secrets.values.dec.yaml")).yaml_users
It looks like the sops provider decryption doesn't create an array or that "iterable collection" that I need.
Does anyone know how to use the terraform sops provider for decrypting an array of key-value pairs? A single value like "adminpassword" is working fine.
I think the "set of map of string" part of this error message is the important part: for_each requires either a map directly (in which case the map keys become the instance identifiers) or a set of individual strings (in which case those strings become the instance identifiers).
Your example YAML file shows yaml_users being defined as a YAML sequence of maps, which corresponds to a tuple of objects on conversion with yamldecode.
To use that data structure with for_each you'll need to first project it into a map whose keys will serve as the unique identifier for each instance of the resource. Assuming that the name values are suitably unique, you could project it so that those values are the keys:
data "sops_file" "test-secret" {
source_file = "secrets.values.enc.yaml"
}
locals {
users = tomap({
for u in yamldecode(data.sops_file.test-secret.raw).yaml_users :
u.name => u
})
}
The result being a sensitive value adds an extra wrinkle here, because Terraform won't allow using a sensitive value as the identifier for an instance of a resource -- to do so would make it impossible to show the resource instance address in the UI, and impossible to describe the instance on the command line for commands that need that.
However, this does seem like exactly the use-case shown in the example of the nonsensitive function at the time I'm writing this: you have a collection that is currently wholly marked as sensitive, but you know that only parts of it are actually sensitive and so you can use nonsensitive to explain to Terraform how to separate the nonsensitive parts from the sensitive parts. Here's an updated version of the locals block in my previous example using that function:
locals {
  users = tomap({
    for u in yamldecode(data.sops_file.test-secret.raw).yaml_users :
    nonsensitive(u.name) => u
  })
}
If I'm making a correct assumption that it's only the passwords that are sensitive and that the usernames are okay to disclose, the above will produce a suitable data structure where the usernames are visible in the keys but the individual element values will still be marked as sensitive.
local.users then meets all of the expectations of resource for_each, and so you should be able to use it with whichever other resources you need to repeat systematically for each user.
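For example, a sketch with a placeholder resource type standing in for whatever you repeat per user:

# "example_user" is hypothetical; substitute the resource you actually provision.
resource "example_user" "all" {
  for_each = local.users

  name     = each.key            # the nonsensitive username
  password = each.value.password # still marked as sensitive
}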
Please note that Terraform's tracking of sensitive values is for UI purposes only and will not prevent these passwords from being saved in the state as a part of whichever resources make use of them. If you use Terraform to manage sensitive data then you should treat the resulting state snapshots as sensitive artifacts in their own right, being careful about where and how you store them.

terraform to append consul_key values in json

I have a project in which I have to use Terraform, and at the end of the run I need to append values to a Consul key placed at a /path. I have the following:
resource "consul_keys" "write" {
datacenter = "dc1"
token = "xxxx-x-x---xxxxxx--xx-x-x-x"
key {
path = "path/to/name"
value = jsonencode([
{
cluster_name = "test", "region" : "us-east1"
},
{
cluster_name = "test2", "region" : "us-central1"
}
])
}
}
But if I run Terraform again with new values, it deletes all the previous values and replaces them with the new ones.
Is there any way I can keep appending values while keeping the previous values as they are?
The consul_keys resource type in the hashicorp/consul provider only supports situations where it is responsible for managing the entirety of the value of each of the given keys. This is because the underlying Consul API itself treats each key as a single atomic unit, and doesn't support partial updates of the sort you want to achieve here.
If you are able to change the design of the system that is consuming these values, a different way to get a comparable result would be to set aside a particular key prefix as a collection of values that the consumer will merge together after reading them. Consul's Read Key API includes a recurse=true mode which allows you to read all of the entries under a given prefix in a single request.
By designing your key structure this way, you can use separate keys for the data that Terraform will provide and for the data provided by each other system that will generate data under this shared prefix. These different systems can therefore each maintain their own designated sub-key and don't need to take any special extra steps to preserve existing data already stored at that location.
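A hedged sketch of what Terraform's designated sub-key could look like under such a shared prefix; the paths are illustrative:

resource "consul_keys" "write" {
  datacenter = "dc1"

  key {
    # Terraform owns only this sub-key; other systems write their own entries
    # under path/to/name/ and are left untouched by this resource.
    path = "path/to/name/terraform"
    value = jsonencode([
      { cluster_name = "test", region = "us-east1" },
      { cluster_name = "test2", region = "us-central1" },
    ])
  }
}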
If you are using consul-template then consul-template's ls function wraps the multi-key lookup I described above.
If you are reading the data from Consul in some other Terraform configuration, the consul_key_prefix data source similarly implements the operation of fetching all key/value pairs under a given prefix.
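For example, a minimal sketch of reading everything under the shared prefix from another configuration, assuming the consul_key_prefix data source exposes the entries via its subkeys attribute:

data "consul_key_prefix" "merged" {
  datacenter  = "dc1"
  path_prefix = "path/to/name/"
}

# A map of every entry under the prefix, regardless of which system wrote it,
# ready to be merged or decoded by the consumer.
output "all_entries" {
  value = data.consul_key_prefix.merged.subkeys
}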
