How do I apply a CRD from GitHub to a cluster with Terraform?

I want to install a CRD with Terraform. I was hoping it would be as easy as doing this:
data "http" "crd" {
url = "https://raw.githubusercontent.com/kubernetes-sigs/application/master/deploy/kube-app-manager-aio.yaml"
request_headers = {
Accept = "text/plain"
}
}
resource "kubernetes_manifest" "install-crd" {
manifest = data.http.crd.body
}
But I get this error:
can't unmarshal tftypes.String into *map[string]tftypes.Value, expected map[string]tftypes.Value
Trying to decode it with yamldecode also doesn't work, because yamldecode doesn't support multi-document YAML files.
I could use exec, but I was already doing that while waiting for the kubernetes_manifest resource to be released. Does kubernetes_manifest only support a single resource or can it be used to create several from a raw text manifest file?

From the kubernetes_manifest documentation (emphasis mine):
Represents one Kubernetes resource by supplying a manifest attribute
That sounds to me like it does not support multiple resources / a multi-document YAML file.
However, you can manually split the incoming document and yamldecode the parts of it:
locals {
  yamls = [for data in split("---", data.http.crd.body) : yamldecode(data)]
}

resource "kubernetes_manifest" "install-crd" {
  count    = length(local.yamls)
  manifest = local.yamls[count.index]
}
Unfortunately on my machine this then complains about
'status' attribute key is not allowed in manifest configuration
for exactly one of the 11 manifests.
And since I have no clue about Kubernetes, I have no idea what that means or whether or not it needs fixing.
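(For reference, that error usually means the document carries a top-level status field, which the provider rejects because status is owned by the cluster. If that is the culprit, a minimal sketch that drops the key while decoding, by adjusting the locals block above, could look like this:)
locals {
  yamls = [
    for doc in split("---", data.http.crd.body) :
    # Keep every top-level key except "status", which kubernetes_manifest refuses.
    { for k, v in yamldecode(doc) : k => v if k != "status" }
  ]
}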
Alternatively, you can always use a null_resource with a script that fetches the YAML document and uses bash tools, Python, or whatever is installed to convert, split, and filter the incoming YAML.
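A rough sketch of that route, assuming curl and kubectl are available wherever Terraform runs (the URL is the one from the question):
resource "null_resource" "install_crd" {
  # Re-run the provisioner whenever the manifest URL changes.
  triggers = {
    manifest_url = "https://raw.githubusercontent.com/kubernetes-sigs/application/master/deploy/kube-app-manager-aio.yaml"
  }

  provisioner "local-exec" {
    # kubectl apply accepts multi-document yaml directly, so no splitting is needed.
    command = "curl -sL ${self.triggers.manifest_url} | kubectl apply -f -"
  }
}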

I got this to work using the kubectl provider. Eventually kubernetes_manifest should work as well, but it is currently (v2.5.0) still in beta and has some bugs. This example only keys on kind + name, but for full uniqueness the key should also include the API version and the namespace.
resource "kubectl_manifest" "cdr" {
# Create a map { "kind--name" => yaml_doc } from the multi-document yaml text.
# Each element is a separate kubernetes resource.
# Must use \n---\n to avoid splitting on strings and comments containing "---".
# YAML allows "---" to be the first and last line of a file, so make sure
# raw yaml begins and ends with a newline.
# The "---" can be followed by spaces, so need to remove those too.
# Skip blocks that are empty or comments-only in case yaml began with a comment before "---".
for_each = {
for pair in [
for yaml in split(
"\n---\n",
"\n${replace(data.http.crd.body, "/(?m)^---[[:blank:]]*(#.*)?$/", "---")}\n"
) :
[yamldecode(yaml), yaml]
if trimspace(replace(yaml, "/(?m)(^[[:blank:]]*(#.*)?$)+/", "")) != ""
] : "${pair.0["kind"]}--${pair.0["metadata"]["name"]}" => pair.1
}
yaml_body = each.value
}
Once Hashicorp fixes kubernetes_manifest, I would recommend using the same approach. Do not use count+element(), because if the ordering of the elements changes, Terraform will delete and recreate many resources needlessly.
resource "kubernetes_manifest" "crd" {
for_each = {
for value in [
for yaml in split(
"\n---\n",
"\n${replace(data.http.crd.body, "/(?m)^---[[:blank:]]*(#.*)?$/", "---")}\n"
) :
yamldecode(yaml)
if trimspace(replace(yaml, "/(?m)(^[[:blank:]]*(#.*)?$)+/", "")) != ""
] : "${value["kind"]}--${value["metadata"]["name"]}" => value
}
manifest = each.value
}
P.S. Please support the Terraform feature request for multi-document yamldecode. It would make things far easier than the regex above.

The kubectl provider can split a multi-resource YAML document (---) for you via its kubectl_file_documents data source (docs):
# fetch a raw multi-resource yaml
data "http" "knative_serving_crds" {
  url = "https://github.com/knative/serving/releases/download/knative-v1.7.1/serving-crds.yaml"
}

# split raw yaml into individual resources
data "kubectl_file_documents" "knative_serving_crds" {
  content = data.http.knative_serving_crds.body
}

# apply each resource from the yaml one by one
resource "kubectl_manifest" "knative_serving_crds" {
  depends_on = [kops_cluster_updater.updater]
  for_each   = data.kubectl_file_documents.knative_serving_crds.manifests
  yaml_body  = each.value
}

Related

Validating through Terraform that a Vault policy exists before using it in a group

I have the following structure
module "policies" {
source = "../../../../path/to/my/custom/modules/groups"
for_each = var.config.policies
name = each.key
policy = each.value
}
module "groups" {
source = "../../../../path/to/my/custom/modules/groups"
for_each = var.config.groups
name = each.key
type = each.value.type
policies = each.value.policies
depends_on = [
module.policies
]
}
Policies and groups are declared in a YAML file, from which the variables used in for_each are created via yamldecode.
Is there any way to make sure that the policies passed to policies = each.value.policies of the groups module DO exist?
I mean, OK, I have the depends_on clause, but I also want to guard against typos in the YAML file and other similar situations.
The usual way to declare a dependency on an external object (managed elsewhere) in Terraform is to use a data block using a data source defined by the provider responsible for that object. If the goal is only to verify that the object exists then it's enough to declare the data source and then have your downstream object's configuration refer to anything about its result, just so Terraform can see that the data source is a dependency and so should be resolved first.
Unfortunately it seems like the hashicorp/vault provider doesn't currently have a data source for declaring a dependency on a policy, although there is a feature request for it.
Assuming that it did exist then the pattern might look something like this:
data "vault_policy" "needed" {
for_each = var.config.policies
name = each.value
}
module "policies" {
source = "../../../../path/to/my/custom/modules/groups"
for_each = var.config.policies
name = each.key
# Accessing this indirectly via the data resource tells
# Terraform that it must complete the data lookup before
# planning anything which depends on this "policy" argument.
policy = data.vault_policy.needed[each.key].name
}
Without a data source for this particular object type I don't think there will be an elegant way to solve this, but you may be able to work around it by using a more general data source like hashicorp/external's external data source for collecting data by running an external program that prints JSON.
Again, because you don't actually seem to need any specific data from the policy and only want to check whether it exists, it would be sufficient to write an external program which queries Vault and then exits with an unsuccessful status if the request fails, or prints an empty JSON object {} if the request succeeds.
data "external" "vault_policy" {
for_each = var.config.policies
program = ["${path.module}/query-vault"]
query = {
policy_name = each.value
}
}
module "policies" {
source = "../../../../path/to/my/custom/modules/groups"
for_each = var.config.policies
name = each.key
policy = data.external.vault_policy.query.policy_name
}
I'm not familiar enough with Vault to suggest a specific implementation of this query-vault program, but you may be able to use a shell script wrapping the vault CLI program if you follow the advice in Processing JSON in shell scripts. You only need to do the input parsing part of that, because your result would be communicated either by exit 1 to signal failure or echo '{}' followed by exiting successfully to signal success.

Terraform Import Resources and Looping Over Those Resources

I am new to Terraform and looking to use it to manage our Snowflake environment with the "chanzuckerberg/snowflake" provider. I am specifically looking to leverage it for managing an RBAC model for roles within Snowflake.
The scenario is that I have about 60 databases in Snowflake which would equate to a resource for each in Terraform. We will then create 3 roles (reader, writer, all privileges) for each database. We will expand our roles from there.
The first question is, can I leverage map or object variables to define all database names and their attributes and import them using a for_each within a single resource or do I need to create a resource for each database and then import them individually?
The second question is, what would be the best approach for creating the 3 roles per database? Is there a way to iterate over all the resources of type snowflake_database and create the 3 roles? I was imagining the use of modules, variables, and resources based on the research I have done.
Any help in understanding how this can be accomplished would be super helpful. I understand the basics of Terraform but this is a bit of a complex situation for a newbie like myself to visualize enough to implement it. Thanks all!
Update:
This is what my project looks like and the error I am receiving is below it.
variables.tf:
variable "databases" {
type = list(object(
{
name = string
comment = string
retention_days = number
}))
}
databases.auto.tfvars:
databases = [
  {
    name           = "TEST_DB1"
    comment        = "Testing state."
    retention_days = 90
  },
  {
    name           = "TEST_DB2"
    comment        = ""
    retention_days = 1
  }
]
main.tf:
terraform {
  required_providers {
    snowflake = {
      source  = "chanzuckerberg/snowflake"
      version = "0.25.25"
    }
  }
}

provider "snowflake" {
  username = "user"
  account  = "my_account"
  region   = "my_region"
  password = "pwd"
  role     = "some_role"
}

resource "snowflake_database" "sf_database" {
  for_each = { for idx, db in var.databases : idx => db }

  name                        = each.value.name
  comment                     = each.value.comment
  data_retention_time_in_days = each.value.retention_days
}
To import the resource I run:
terraform import snowflake_database.sf_databases["TEST_DB1"] db_test_db1
I am left with this error:

Error: resource address "snowflake_database.sf_databases["TEST_DB1"]" does not exist in the configuration.

Before importing this resource, please create its configuration in the root module. For example:

resource "snowflake_database" "sf_databases" {
  # (resource arguments)
}
You should be able to define the databases using for_each and refer to the individual resources with bracketed keys in the import command. Something like:
terraform import snowflake_database.resource_id_using_for_each[foreachkey]
You could then create three snowflake_role and three snowflake_database_grant definitions using for_each over the same map of databases used for the database definitions.
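A rough sketch of that second part, reusing the var.databases list from the question and assuming role names of the form <DB>_<KIND>; the snowflake_role and snowflake_database_grant arguments shown come from the chanzuckerberg provider, so verify them against your provider version:
locals {
  # One entry per database/role combination, e.g. "TEST_DB1_READER".
  database_roles = {
    for pair in setproduct([for db in var.databases : db.name], ["READER", "WRITER", "ALL"]) :
    "${pair[0]}_${pair[1]}" => { database = pair[0], kind = pair[1] }
  }
}

resource "snowflake_role" "db_role" {
  for_each = local.database_roles

  name = each.key
}

resource "snowflake_database_grant" "db_usage" {
  # Grant USAGE on each database to its three roles; add schema/table
  # grants in the same style as the role model grows.
  for_each = { for db in var.databases : db.name => db }

  database_name = each.key
  privilege     = "USAGE"
  roles = [
    for kind in ["READER", "WRITER", "ALL"] :
    snowflake_role.db_role["${each.key}_${kind}"].name
  ]
}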
I had this exact same problem and in the end the solution was quite simple: you just need to wrap the resource address in single quotes.
So instead of
terraform import snowflake_database.sf_databases["TEST_DB1"] db_test_db1
do
terraform import 'snowflake_database.sf_databases["TEST_DB1"]' db_test_db1
This took way too long to figure out!

Terraform extensible user_data in aws_instance

Good day,
Our team utilizes a module that creates Linux instances with a standard configuration in user_data as defined below.
resource "aws_instance" "this" {
...
user_data = templatefile("${path.module}/user_data.tp", { hostname = upper("${local.prefix}${count.index + 1}"), domain = local.domain })
...
}
Contents of the user_data.tp:
#cloud-config
repo_update: true
repo_upgrade: all
preserve_hostname: false
hostname: ${hostname}
fqdn: ${hostname}.${domain}
manage_etc_hosts: false
runcmd:
- 'echo "preserve_hostname: true" >> /etc/cloud/cloud.cfg.d/99_hostname.cfg'
What is the best way to modify this module such that the contents of user_data.tp are always executed and optionally another block could be passed to install certain packages or execute certain shell scripts?
I'm assuming it involves using cloudinit_config and a multipart mime configuration, but would appreciate any suggestions.
Thank you.
Since you showed a cloud-config template I'm assuming here that you're preparing a user_data for an AMI that runs cloud-init on boot. That means this is perhaps more of a cloud-init question than a Terraform question, but I understand that you also want to know how to translate the cloud-init-specific answer into a workable Terraform configuration.
The User-data Formats documentation describes various possible ways to format user_data for cloud-init to consume. You mentioned multipart MIME in your question and indeed that could be a viable answer here if you want cloud-init to interpret the two payloads separately, rather than as a single artifact. The cloud-init docs talk about the tool make-mime, but the Terraform equivalent of that is the cloudinit_config data source belonging to the hashicorp/cloudinit provider:
variable "extra_cloudinit" {
type = object({
content_type = string
content = string
})
# This makes the variable optional to set,
# and var.extra_cloudinit will be null if not set.
default = null
}
data "cloudinit_config" "user_data" {
# set "count" to be whatever your aws_instance count is set to
count = ...
part {
content_type = "text/cloud-config"
content = templatefile(
"${path.module}/user_data.tp",
{
hostname = upper("${local.prefix}${count.index + 1}")
domain = local.domain
}
)
}
dynamic "part" {
# If var.extra_cloud_init is null then this
# will produce a zero-element list, or otherwise
# it'll produce a one-element list.
for_each = var.extra_cloudinit[*]
content {
content_type = part.value.content_type
content = part.value.content
# NOTE: should probably also set merge_type
# here to tell cloud-init how to merge these
# two:
# https://cloudinit.readthedocs.io/en/latest/topics/merging.html
}
}
}
resource "aws_instance" "example" {
count = length(data.cloudinit_config.user_data)
# ...
user_data = data.cloudinit_config.user_data[count.index].rendered
}
If you expect that the extra cloud-init configuration will always come in the form of extra cloud-config YAML values then an alternative approach would be to merge the two data structures together within Terraform and then yamlencode the merged result:
variable "extra_cloudinit" {
type = any
# This makes the variable optional to set,
# and var.extra_cloudinit will be null if not set.
default = {}
validation {
condition = can(merge(var.extra_cloudinit, {}))
error_message = "Must be an object to merge with the built-in cloud-init settings."
}
}
locals {
cloudinit_config = merge(
var.extra_cloudinit,
{
repo_update = true
repo_upgrade = "all"
# etc, etc
},
)
}
resource "aws_instance" "example" {
count = length(data.cloudinit_config.user_data)
# ...
user_data = <<EOT
#!cloud-config
${yamlencode(local.cloudinit_config)}
EOT
}
A disadvantage of this approach is that Terraform's merge function is always a shallow merge only, whereas cloud-init itself has various other merging options. However, an advantage is that the resulting single YAML document will generally be simpler than a multipart MIME payload and thus probably easier to review for correctness in the terraform plan output.
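To make "shallow" concrete with a hypothetical pair of values (not taken from the question): if both sides define runcmd, the later argument's value replaces the earlier one wholesale instead of the lists being combined, which is exactly the kind of merging cloud-init can do but Terraform's merge cannot:
locals {
  base_settings  = { repo_update = true, runcmd = ["echo base"] }
  extra_settings = { runcmd = ["echo extra"] }

  # Result: { repo_update = true, runcmd = ["echo extra"] }
  # The "runcmd" list from base_settings is discarded, not appended to.
  merged_settings = merge(local.base_settings, local.extra_settings)
}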

Terraform empty and non-empty block map variables

I want to create a backend service using Terraform, with the google_compute_backend_service resource type.
I currently have 2 backend services that were created with the gcloud command. One uses a cdn_policy block and the other does not.
The first backend service's tfstate looks like this:
...
"cdn_policy": [
  {
    "cache_key_policy": [],
    "signed_url_cache_max_age_sec": 3600
  }
]
...
And the second backend service's tfstate looks like this:
"cdn_policy": []
How can I write a Terraform configuration that works for both of them, i.e. one that handles backend services that have a cdn_policy block as well as backend services without cdn_policy?
My idea was to create 2 Terraform configurations, the first with cdn_policy and the second without it, but I don't think that is best practice.
If I put cdn_policy = [], it results in the error: An argument named "cdn_policy" is not expected here.
You can use dynamic blocks to create a set of blocks based on a list of objects in an input variable: Dynamic Blocks
resource "google_compute_backend_service" "service" {
...
dynamic "cdn_policy" {
for_each = var.cdn_policy
content {
cache_key_policy = cdn_policy.value.cache_key_policy
signed_url_cache_max_age_sec = cdn_policy.value.signed_url_cache_max_age_sec
}
}
}
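For completeness, a sketch of the input variable this expects (the variable name matches the snippet above; the attribute shape is an assumption based on the question's state output). An empty default list means no cdn_policy block gets generated at all:
variable "cdn_policy" {
  type = list(object({
    cache_key_policy             = any
    signed_url_cache_max_age_sec = number
  }))

  # [] produces zero cdn_policy blocks, matching the backend service
  # that was created without CDN settings.
  default = []
}
Passing cdn_policy = [] is then valid because the empty list only empties the for_each, instead of trying to assign cdn_policy as an argument (which is what caused the "not expected here" error).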

File not found when trying to use etag

I am trying to use etag when I update my S3 bucket, but I get this error:
Error: Error in function call
on config.tf line 48, in resource "aws_s3_bucket_object" "bucket_app":
48: etag = filemd5("${path.module}/${var.env}/app-config.json")
|----------------
| path.module is "."
| var.env is "develop"
Call to function "filemd5" failed: no file exists at develop/app-config.json.
However, this works fine:
resource "aws_s3_bucket_object" "bucket_app" {
bucket = "${var.app}-${var.env}-app-assets"
key = "config.json"
source = "${path.module}/${var.env}/app-config.json"
// etag = filemd5("${path.module}/${var.env}/app-config.json")
depends_on = [
local_file.app_config_json
]
}
I am generating the file this way:
resource "local_file" "app_config_json" {
content = local.app_config_json
filename = "${path.module}/${var.env}/app-config.json"
}
I really don't get what I am doing wrong...
If you happen to arrive here and are using an archive_file Data Source, there is an exported attribute called output_md5. This seems to provide the same results that you would get from filemd5(data.archive_file.app_config_json.output_path).
Here is a full example:
data "archive_file" "config" {
  type        = "zip"
  output_path = "${path.module}/config.zip"

  source {
    filename = "config/template-configuration.json"
    content  = "some content"
  }
}

resource "aws_s3_bucket_object" "config" {
  bucket       = aws_s3_bucket.stacks.bucket
  key          = "config.zip"
  content_type = "application/zip"
  source       = data.archive_file.config.output_path
  etag         = data.archive_file.config.output_md5
}
All functions in Terraform run during the initial configuration processing, not during the graph walk. For all of the functions that read files on disk, that means that the files must be present on disk prior to running Terraform as part of the configuration itself -- usually, checked in to version control -- rather than being generated dynamically during the Terraform operation.
The documentation for file, which filemd5 builds on, has the following to say about it:
This function can be used only with files that already exist on disk at the beginning of a Terraform run. Functions do not participate in the dependency graph, so this function cannot be used with files that are generated dynamically during a Terraform operation. We do not recommend using dynamic local files in Terraform configurations, but in rare situations where this is necessary you can use the local_file data source to read files while respecting resource dependencies.
As the documentation there suggests, the local_file data source provides a way to read a file into memory as a resource during the graph walk, although its result would still need to be passed to md5 to get the result you needed here.
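For reference, that intermediate pattern would look roughly like this, reusing the names from the question (a sketch only; the simpler variant below avoids the extra data resource entirely):
data "local_file" "app_config_json" {
  filename = "${path.module}/${var.env}/app-config.json"

  # The dependency on the managed resource gives the ordering
  # guarantee that filemd5() cannot provide.
  depends_on = [local_file.app_config_json]
}

resource "aws_s3_bucket_object" "bucket_app" {
  bucket = "${var.app}-${var.env}-app-assets"
  key    = "config.json"
  source = data.local_file.app_config_json.filename
  etag   = md5(data.local_file.app_config_json.content)
}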
Because you're creating the file with a local_file resource anyway, you can skip the need for the additional data resource and derive the MD5 hash directly from your local_file.app_config_json resource:
resource "aws_s3_bucket_object" "bucket_app" {
bucket = "${var.app}-${var.env}-app-assets"
key = "config.json"
source = local_file.app_config_json.filename
etag = md5(local_file.app_config_json.content)
}
Note that we don't need to use depends_on if we derive the configuration from attributes of the local_file.app_config_json resource, because Terraform can therefore already see that the dependency relationship exists.
