How can I load input data from a file in Terraform?

I defined an aws_cloudwatch_event_target in Terraform to fire an event from CloudWatch to Lambda. The input field is the event parameter, for example:
resource "aws_cloudwatch_event_target" "data" {
rule = "${aws_cloudwatch_event_rule.scheduler.name}"
target_id = "finance_producer_cloudwatch"
arn = "${aws_lambda_function.finance_data_producer.arn}"
input = "{\"test\": [\"111\"]}"
}
I wonder how I can load the input json data from an external file.

The answer here depends on a few different questions:
Is this file a static part of your configuration, checked in to version control alongside your .tf files, or is it dynamically generated as part of the apply process?
Do you want to use the file contents literally, or do you need to substitute values into it from elsewhere in the Terraform configuration?
These two questions form a matrix of four different answers:
             | Literal Content         | Include Values from Elsewhere
-------------|-------------------------|------------------------------
Static File  | file(...) function      | templatefile(...) function
Dynamic File | local_file data source  | template_file data source
I'll describe each of these four options in more detail below.
A common theme in all of these examples will be references to path.module, which evaluates to the path where the current module is loaded from. Another way to think about that is that it is the directory containing the current .tf file. Accessing files in other directories is allowed, but in most cases it's appropriate to keep things self-contained in your module by keeping the data files and the configuration files together.
Terraform strings are sequences of Unicode characters, so Terraform can only read files containing valid UTF-8 encoded text. For JSON that's no problem, but it's worth keeping in mind for other file formats that might not conventionally be UTF-8 encoded.
The file function
The file function reads the literal content of a file from disk as part of the initial evaluation of the configuration. The content of the file is treated as if it were a literal string value for validation purposes, and so the file must exist on disk (and usually, in your version control) as a static part of your configuration, as opposed to being generated dynamically during terraform apply.
resource "aws_cloudwatch_event_target" "data" {
rule = aws_cloudwatch_event_rule.scheduler.name
target_id = "finance_producer_cloudwatch"
arn = aws_lambda_function.finance_data_producer.arn
input = file("${path.module}/input.json")
}
This is the most common and simplest option. If the file function is sufficient for your needs then it's the best option to use as a default choice.
The templatefile function
The templatefile function is similar to the file function, but rather than just returning the file contents literally it instead parses the file contents as a string template and then evaluates it using a set of local variables given in its second argument. This is useful if you need to pass some data from elsewhere in the Terraform configuration, as in this example:
resource "aws_cloudwatch_event_target" "data" {
rule = aws_cloudwatch_event_rule.scheduler.name
target_id = "finance_producer_cloudwatch"
arn = aws_lambda_function.finance_data_producer.arn
input = templatefile("${path.module}/input.json.tmpl", {
instance_id = aws_instance.example.id
})
}
In input.json.tmpl you can use the Terraform template syntax to substitute that variable value:
{"instance_id":${jsonencode(instance_id)}}
In cases like this where the whole result is JSON, I'd suggest just generating the whole result using jsonencode, since then you can let Terraform worry about the JSON escaping etc and just write the data structure in Terraform's object syntax:
${jsonencode({
  instance_id = instance_id
})}
As with file, because templatefile is a function it gets evaluated during initial decoding of the configuration and its result is validated as a literal value. The template file must therefore also be a static file that is distributed as part of the configuration, rather than a dynamically-generated file.
The local_file data source
Data sources are special resource types that read an existing object or compute a result, rather than creating and managing a new object. Because they are resources, they can participate in the dependency graph and can thus make use of objects (including local files) that are created by other resources in the same Terraform configuration during terraform apply.
The local_file data source belongs to the local provider and is essentially the data source equivalent of the file function.
In the following example, I'm using var.input_file as a placeholder for any reference to a file path that is created by some other resource in the same configuration. In a real example, that is most likely to be a direct reference to an attribute of a resource.
data "local_file" "input" {
filename = var.input_file
}
resource "aws_cloudwatch_event_target" "data" {
rule = aws_cloudwatch_event_rule.scheduler.name
target_id = "finance_producer_cloudwatch"
arn = aws_lambda_function.finance_data_producer.arn
input = data.local_file.input.content
}
The template_file data source
NOTE: Since I originally wrote this answer, the provider where template_file was implemented has been declared obsolete and no longer maintained, and there is no replacement. In particular, the provider was archived prior to the release of Apple Silicon and so there is no available port for macOS on that architecture.
The Terraform team does not recommend rendering of dynamically-loaded templates, because it pushes various errors that could normally be detected at plan time to be detected only during apply time instead.
I've retained this content as I originally wrote it in case it's useful, but I would suggest treating this option as a last resort.
The template_file data source is the data source equivalent of the templatefile function. Its usage is similar to local_file: we populate the template itself by reading it either with the file function (for a static file) or with the local_file data source (for a dynamically-generated one). If the template were a static file we'd prefer the templatefile function anyway, so here we'll use the local_file data source:
data "local_file" "input_template" {
filename = var.input_template_file
}
data "template_file" "input" {
template = data.local_file.input_template.content
vars = {
instance_id = aws_instance.example.id
}
}
resource "aws_cloudwatch_event_target" "data" {
rule = aws_cloudwatch_event_rule.scheduler.name
target_id = "finance_producer_cloudwatch"
arn = aws_lambda_function.finance_data_producer.arn
input = data.template_file.input.rendered
}
The templatefile function was added in Terraform 0.12.0, so you may see examples elsewhere of using the template_file data source to render static template files. That is an old pattern, now deprecated in Terraform 0.12, because the templatefile function makes for a more direct and readable configuration in most cases.
One quirk of the template_file data source as opposed to the templatefile function is that the data source belongs to the template provider rather than to Terraform Core, and so which template features are available in it will depend on which version of the provider is installed rather than which version of Terraform CLI is installed. The template provider is likely to lag behind Terraform Core in terms of which template language features are available, which is another reason to prefer the templatefile function where possible.
Other Possibilities
This question was specifically about reading data from a file, but for completeness I also want to note that for small JSON payloads it can sometimes be preferable to inline them directly in the configuration as a Terraform data structure and convert to JSON using jsonencode, like this:
resource "aws_cloudwatch_event_target" "data" {
rule = aws_cloudwatch_event_rule.scheduler.name
target_id = "finance_producer_cloudwatch"
arn = aws_lambda_function.finance_data_producer.arn
input = jsonencode({
instance_id = aws_instance.example.id
})
}
Writing the data structure inline as a Terraform expression means that a future reader can see directly what will be sent without needing to refer to a separate file. However, if the data structure is very large and complicated then it can hurt overall readability to include it inline because it could overwhelm the other configuration in the same file.
Which option to choose will therefore depend a lot on the specific circumstances, but it's always worth considering whether the indirection of a separate file is the best choice for readability.
Terraform also has a yamlencode function (experimental at the time of writing) which can do similarly for YAML-formatted data structures, either directly inside a .tf file or in an interpolation sequence in an external template.
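As a minimal sketch (reusing the hypothetical aws_instance.example from the earlier examples), rendering a data structure as YAML into a named local value might look like:

locals {
  config_yaml = yamlencode({
    instance_id = aws_instance.example.id
  })
}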

You can use the file() function to pull data from an external file:
input = "${file("myjson.json")}"
Just make sure myjson.json exists on disk in the same directory as the rest of your Terraform files.
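One caveat: a relative path like this is resolved against the current working directory, so if your configuration might be used as a child module it can be safer to build the path from path.module, as in the answer above: input = "${file("${path.module}/myjson.json")}".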

I would use the template_file data source. Like so...
data "template_file" "my_file" {
template = "${file("${path.module}/my_file.json")}"
vars = {
var_to_use_in_file = "${var.my_value}"
}
}
Then in your resource block....
resource "aws_cloudwatch_event_target" "data" {
rule = "${aws_cloudwatch_event_rule.scheduler.name}"
target_id = "finance_producer_cloudwatch"
arn = "${aws_lambda_function.finance_data_producer.arn}"
input = "${data.template_file.my_file.rendered}"
}

Related

Terraform best practice for near-identical iam_policy_documents for each environment to avoid duplication

What's the best practice for handling policy documents that are entirely the same for each environment apart from an ID within them?
Initially, the codebase I was using simply duplicated these policies in the iam.tf file with the ID changed in each environment's resource definition. It's a single-workspace monolithic repo that I can't change.
I then refactored it to be a module which creates the policy with the ID as a variable.
I then found out about templatefile in Terraform, so I refactored it to instead be a policy .tftpl file in a subdirectory; I then call templatefile() with a different variable for each environment.
I’m aware that the recommended convention for policy documents is to implement them as a data object, but my understanding is I can’t then parameterise it to prevent entire policy documents being repeated save for a single variable (unless I modularise it like I did initially).
Does anyone have any advice on the best practise for this scenario?
You can definitely parameterize the aws_iam_policy_document data source.
data "aws_iam_policy_document" "this" {
for_each = toset(["bucket-a", "bucket-b"])
statement {
actions = ["s3:*"]
resources = ["arn:aws:s3:::${each.key}"]
}
}
You can follow this pattern for the attachments too:
resource "aws_iam_policy" "this" {
for_each = toset(["bucket-a", "bucket-b"])
name_prefix = each.key
policy = data.aws_iam_policy_document.this[each.key].json
}
resource "aws_iam_policy_attachment" "this" {
for_each = toset(["bucket-a", "bucket-b"])
name = "${each.key}-attachment"
policy_arn = aws_iam_policy.this[each.key].arn
# things to attach to
}

Extracting raw (non-string) parameter values from Terraform using terraform-config-inspect

I'm trying to generate JSON from Terraform modules using terraform-config-inspect (https://github.com/hashicorp/terraform-config-inspect).
Note: I started with terraform-docs but then found that what it uses underneath is the terraform-config-inspect library.
The problem is that I want to go beyond what terraform-config-inspect provides out of the box at the moment.
As an example, I want to get the name of an aws_ssm_parameter resource.
For example, I have a resource like this:
resource "aws_ssm_parameter" "service_security_group_id" {
name = "/${var.deployment}/shared/${var.service_name}/security_group_id"
type = "String"
value = aws_security_group.service.id
overwrite = "true"
tags = var.tags
}
and I would like to extract the value of the name parameter, but by default it does not output this parameter. I tried to hack the code by modifying the resource schema and other parts, but I ended up getting an empty string instead of the name value, or an error because it contains parts like ${var.deployment}.
When I set it to a plain string, my modified code returns what I expect:
"aws_ssm_parameter.service_security_group_id": {
"mode": "managed",
"type": "aws_ssm_parameter",
"name": "service_security_group_id",
"value": "/test-env/shared/my-service/security_group_id",
"provider": {
"name": "aws"
}
}
but in the normal case it fails with the following error:
{
  "severity": "error",
  "summary": "Unsuitable value type",
  "detail": "Unsuitable value: value must be known",
  ...
}
I know that I could build something totally custom for my specific use case but I hope there is something that could be re-used :)
So the questions are:
Is it somehow possible to take the real raw value from a Terraform resource so I could get "/${var.deployment}/shared/${var.service_name}/security_group_id" in the JSON output?
Maybe some other tool out there?
Thanks in advance!
Input Variables in Terraform are a planning option, and so resolving them fully requires creating a Terraform plan. If you are able to create a Terraform plan then you can find the resolved values in the JSON serialization of the plan, using steps like the following:
terraform plan -out=tfplan (optionally include -var=... and -var-file=... arguments if you need to set particular values for those variables).
terraform show -json tfplan to get a JSON representation of the plan.
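For example (a sketch, assuming jq is available; the resource address is the one from the question), you could then extract the resolved name with:
terraform show -json tfplan | jq '.resource_changes[] | select(.address == "aws_ssm_parameter.service_security_group_id") | .change.after.name'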
Alternatively, if you've already applied the configuration you want to analyse then you can get similar information from the JSON representation of the latest state snapshot:
terraform show -json to get a JSON representation of the latest state snapshot.
As you've seen, terraform-config-inspect is only for static analysis of the top-level declarations and so it contains no functionality for evaluating expressions.
Properly evaluating expressions here without creating a Terraform plan or reading from a Terraform state snapshot would require reimplementing the Terraform Core runtime, at least to some extent. However, for this particular expression (which relies only on input variable values) you could potentially use the HCL API directly with some hard-coded placeholder values for those variables in order to get a value for that argument, derived from whatever you happen to have set var.deployment and var.service_name to in the hcl.EvalContext you construct yourself.

Locals depends_on - Terraform

I have a module a in Terraform which creates a text file, and I need to use that text file in another module b. I am using locals to pull the content of that text file in module b, like below:
locals {
  ports = split("\n", file("ports.txt"))
}
But Terraform expects this file to be present at the start itself, and throws the error below:
Invalid value for "path" parameter: no file exists at
path/ports.txt; this function works only with files
that are distributed as part of the configuration source code, so if this file
will be created by a resource in this configuration you must instead obtain
this result from an attribute of that resource.
What am I missing here? Any help on this would be appreciated. Is there any depends_on for locals? How can I make this work?
Modules are called from within other modules using module blocks. Most arguments correspond to input variables defined by the module. To reference a value from one module, you need to declare an output in that module; then you can use the output value from other modules.
For example, I suppose you have a text file in module a.
.tf file in module a
output "textfile" {
value = file("D:\\Terraform\\modules\\a\\ports.txt")
}
.tf file in module b
variable "externalFile" {
}
locals {
ports = split("\n", var.externalFile)
}
# output "b_test" {
# value = local.ports
# }
.tf file in the root module
module "a" {
source = "./modules/a"
}
module "b" {
source = "./modules/b"
externalFile = module.a.textfile
depends_on = [module.a]
}
# output "module_b_output" {
# value = module.b.b_test
# }
For more reference, you could read https://www.terraform.io/docs/language/modules/syntax.html#accessing-module-output-values
As the error message reports, the file function is only for files that are included on disk as part of your configuration, not for files generated dynamically during the apply phase.
I would typically suggest avoiding writing files to local disk as part of a Terraform configuration, because one of Terraform's main assumptions is that any objects you manage with Terraform will persist from one run to the next. That could only be true for a local file if you always run Terraform in the same directory on the same computer, or if you use some other more complex approach such as a network filesystem. However, since you didn't mention why you are writing a file to disk, I'll assume that this is a hard requirement and make a suggestion about how to do it, even though I would consider it a last resort.
The hashicorp/local provider includes a data source called local_file which will read a file from disk in a similar way to how a more typical data source might read from a remote API endpoint. In particular, it will respect any dependencies reflected in its configuration and defer reading the file until the apply step if needed.
You could coordinate this between modules then by making the output value which returns the filename also depend on whichever resource is responsible for creating the file. For example, if the file were created using a provisioner attached to an aws_instance resource then you could write something like this inside the module:
output "filename" {
value = "D:\\Terraform\\modules\\a\\ports.txt"
depends_on = [aws_instance.example]
}
Then you can pass that value from one module to the other, which will carry with it the implicit dependency on aws_instance.example to make sure the file is actually created first:
module "a" {
source = "./modules/a"
}
module "b" {
source = "./modules/b"
filename = module.a.filename
}
Then finally, inside the second module, declare that input variable and use it as part of the configuration for a local_file data resource:
variable "filename" {
type = string
}
data "local_file" "example" {
filename = var.filename
}
Elsewhere in your second module you can then use data.local_file.example.content to get the contents of that file.
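For example, to recover the ports list from the original question (a minimal sketch using the names above):

locals {
  ports = split("\n", data.local_file.example.content)
}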
Notice that dependencies propagate automatically aside from the explicit depends_on in the output "filename" block. It's a good practice for a module to encapsulate its own behaviors so that everything needed for an output value to be useful has already happened by the time a caller uses it, because then the rest of your configuration will just get the correct behavior by default without needing any additional depends_on annotations.
But if there is any way you can return the data inside that ports.txt file directly from the first module instead, without writing it to disk at all, I would recommend doing that as a more robust and less complex approach.
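As a sketch of that alternative (assuming the first module already knows the ports as a value, here a hypothetical local.ports), the first module could expose the data itself:

output "ports" {
  value      = local.ports
  depends_on = [aws_instance.example]
}

The second module would then accept the list as an ordinary input variable, with no file on disk involved.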

Declare multiple providers for a list of regions

I have a Terraform module that manages AWS GuardDuty.
In the module, an aws_guardduty_detector resource is declared. The resource allows no specification of region, although I need to configure one of these resources for each region in a list. The region used needs to be declared by the provider, apparently(?).
Lack of module for_each seems to be part of the problem; at least, module for_each, if it existed, might let me declare the whole module once for each region.
Thus, I wonder, is it possible to somehow declare a provider, for each region in a list?
Or, short of writing a shell script wrapper, or doing code generation, is there any other clean way to solve this problem that I might not have thought of?
To support similar processes I have found two approaches to this problem:
1. Declare multiple AWS providers in the Terraform module.
2. Write the module to use a single provider, and then have a separate .tfvars file for each region you want to execute against.
For the first option, it can get messy having multiple AWS providers in one file. You must give each an alias and then each time you create a resource you must set the provider property on the resource so that Terraform knows which region provider to execute against. Also, if the provider for one of the regions can not initialize, maybe the region is down, then the entire script will not run, until you remove it or the region is back up.
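As a sketch of the first option (the alias names and the pair of regions here are just illustrative):

provider "aws" {
  alias  = "us_east_1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "us_west_1"
  region = "us-west-1"
}

resource "aws_guardduty_detector" "us_east_1" {
  provider = aws.us_east_1
}

resource "aws_guardduty_detector" "us_west_1" {
  provider = aws.us_west_1
}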
For the second option, you can write the Terraform for what resources you need to set up and then just run the module multiple times, once for each regional .tfvars file.
prod-us-east-1.tfvars
prod-us-west-1.tfvars
prod-eu-west-2.tfvars
My preference is to use the second option as the module is simpler and less duplication. The only duplication is in the .tfvars files and should be more manageable.
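For example, you might run terraform apply -var-file=prod-us-east-1.tfvars for one region and terraform apply -var-file=prod-us-west-1.tfvars for the next, typically keeping separate state for each (for example via separate workspaces or backend configurations).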
EDIT: Added some sample .tfvars
prod-us-east-1.tfvars:
region = "us-east-1"
account_id = "0000000000"
tags = {
env = "prod"
}
dynamodb_read_capacity = 100
dynamodb_write_capacity = 50
prod-us-west-1.tfvars:
region = "us-west-1"
account_id = "0000000000"
tags = {
env = "prod"
}
dynamodb_read_capacity = 100
dynamodb_write_capacity = 50
In these files we put whatever variables might need to change for the service or feature based on environment and/or region. For instance, in a testing environment the DynamoDB capacity may be lower than in the production environment.

Setting the value of a Terraform variable in a tfvars file for a nested structure

Terraform has adjusted its authorization; in main.tf [for the SQL config] I now have:
resource "google_sql_database_instance" "master" {
name = "${random_id.id.hex}-master"
region = "${var.region}"
database_version = "POSTGRES_9_6"
# allow direct access from work machines
ip_configuration {
authorized_networks = "${var.authorized_networks}"
require_ssl = "${var.sql_require_ssl}"
ipv4_enabled = true
}
}
where in variables.tf I have:
variable "authorized_networks" {
description = "The networks that can connect to cloudsql"
type = "list"
default = [
{
name = "work"
value = "xxx.xxx.xx.xxx/32"
}
]
}
where xxx.xxx.xx.xxx is the IP address I would like to allow. However, I prefer not to put this in my variables.tf file, but rather in a non-source-controlled .tfvars file.
For variables that have a simple value this is easy, but it is not clear to me how to do it with the nested structure. Replacing xxx.xxx.xx.xxx with a variable [e.g. var.work_ip] leads to an error:
variables may not be used here
Any insights?
If you omit the default argument in your main configuration altogether, you will mark variable "authorized_networks" as a required input variable, which Terraform will then check to ensure that it is set by the caller.
If this is a root module variable, then you can provide the value for it in a .tfvars file using the following syntax:
authorized_networks = [
  {
    name  = "work"
    value = "xxx.xxx.xx.xxx/32"
  },
]
If this file is being generated programmatically by some wrapping automation around Terraform, you can also write it into a .tfvars.json file and use JSON syntax, which is often easier to construct robustly in other languages:
{
  "authorized_networks": [
    {
      "name": "work",
      "value": "xxx.xxx.xx.xxx/32"
    }
  ]
}
You can either specify this file explicitly on the command line using the -var-file option, or you can give it a name ending in .auto.tfvars or .auto.tfvars.json in the current working directory when you run Terraform and Terraform will then find and load it automatically.
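For example, terraform apply -var-file=work.tfvars (the filename here is just illustrative), or name the file work.auto.tfvars so that Terraform loads it automatically.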
A common reason to keep something out of version control is because it's a dynamic setting configured elsewhere in the broader system rather than a value fixed in version control. If that is true here, then an alternative strategy is to save that setting in a configuration data store that Terraform is able to access via data sources and then write your Terraform configuration to retrieve that setting directly from the place where it is published.
For example, if the network you are modelling here were a Google Cloud Platform subnetwork, and it has either a fixed name or one that can be derived systematically in Terraform, you could retrieve this setting using the google_compute_subnetwork data source:
data "google_compute_subnetwork" "work" {
name = "work"
}
Elsewhere in configuration, you can then use data.google_compute_subnetwork.work.ip_cidr_range to access the CIDR block definition for this network.
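As a sketch, wiring that into the question's ip_configuration might look like the following (whether this list-of-objects shape matches your provider version's expectations for authorized_networks is worth verifying):

authorized_networks = [
  {
    name  = "work"
    value = data.google_compute_subnetwork.work.ip_cidr_range
  },
]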
The major Terraform providers have a wide variety of data sources like this, including ones that retrieve specific first-class objects from the target platform and also more generic ones that access configuration stores like AWS Systems Manager Parameter Store or HashiCorp Consul. Accessing the necessary information directly or publishing it "online" in a configuration store can be helpful in a larger system to efficiently integrate subsystems.
