Terraform modules: correct references of variables?

I'm writing a terraform script to create an EKS cluster with its worker nodes on AWS. First time doing it so I'm a bit confused.
Here is the folder organisation:
├─── Int AWS Account
│ ├─── variables.tf
│ ├─── eks-cluster.tf (references the modules)
│ ├─── others
│
├─── Prod AWS Account
│ ├─── (will be the same as Int, with different settings in variables)
│
├─── ReadMe.md
│
├─── data sources
│
├─── Modules
│ ├─── cluster.tf
│ ├─── worker-nodes.tf
│ ├─── worker-nodes-sg.tf
I am a bit confused about how to use and pass variables. Right now, I reference ${var.name} in the module folder; in eks-cluster.tf, I either put a direct value (name = blabla, which I mostly avoid) or reference the variable again and keep a variables file in the account folder.
Is that correct?

I'm not sure I understand your question correctly, but in general you want to keep your module files parameterized with variables only, since modules are intended to be generic so you can easily include them in different environments.
When including the module in eks_cluster_int.tf or eks_cluster_prod.tf, you then pass values for all variables defined in the module itself. This way you can use environment-specific values with the same module.
module "cluster" {
source = "..."
var1 = value1 # directly passing value
var2 = ${var.int_specific_var} # can be defined in variables.tf of environment
...
}
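For completeness, the module itself declares these inputs in its own variables file; a minimal sketch, with the names var1 and var2 carried over from the example above purely for illustration:

# Modules/variables.tf (illustrative sketch, names assumed)
variable "var1" {
  type        = string
  description = "Input supplied directly by the calling environment"
}

variable "var2" {
  type        = string
  description = "Environment-specific input, set per account folder"
}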
Does this answer your question?

Terraform not declaring tfvars

I am new to Terraform and I am writing a script. Following is my directory structure
folder
---.terraform
---.terraform.lock.hcl
---main.tf
---terraform.tfvars
---variables.tf
Following is the content of my terraform.tfvars:
environment = "development"
And this is the relevant part of my main.tf:
tags = {
  environment = var.environment
}
But the values are not updating. Following is the error:
╷
│ Warning: Value for undeclared variable
│
│ The root module does not declare a variable named "environment" but a value was found in file "terraform.tfvars". If you meant to use this value, add a "variable" block to the configuration.
│
│ To silence these warnings, use TF_VAR_... environment variables to provide certain "global" settings to all configurations in your organization. To reduce the verbosity of these warnings, use the
│ -compact-warnings option.
╵
╷
│ Warning: Value for undeclared variable
│
│ The root module does not declare a variable named "admin_username" but a value was found in file "terraform.tfvars". If you meant to use this value, add a "variable" block to the configuration.
│
│ To silence these warnings, use TF_VAR_... environment variables to provide certain "global" settings to all configurations in your organization. To reduce the verbosity of these warnings, use the
│ -compact-warnings option.
╵
╷
│ Warning: Values for undeclared variables
│
│ In addition to the other similar warnings shown, 1 other variable(s) defined without being declared.
╵
╷
│ Error: Reference to undeclared input variable
│
│ on main.tf line 22, in resource "azurerm_resource_group" "tf_example_rg":
│ 22: environment = var.environment
│
│ An input variable with the name "environment" has not been declared. This variable can be declared with a variable "environment" {} block.
As I am using terraform.tfvars, I don't need to pass the filename on the CLI. I think I am doing everything right, but it's still not working.
You have to actually declare your variable using a variable block. For example:
variable "environment" {}
If you already have such declarations, double-check their spelling and location.
@AunZaidi, as stated in the error messages, Terraform cannot find declarations for the variables you defined:
The root module does not declare a variable named "environment" but a value was found in file "terraform.tfvars". If you meant to use this value, add a "variable" block to the configuration.
I would recommend taking a look at the terraform-azure-tutorials to get acquainted with the basics.
You can solve your issue by defining the missing variable with the following syntax:
variable "environment" {
  type        = string
  description = "(optional) Environment for the deployment"
}
Refer to https://developer.hashicorp.com/terraform/language/values/variables#arguments for definitions of the arguments used in Terraform variables.
Also, one of the recommended practices is to use a dedicated variables.tf file for all the variable inputs required by your Terraform code.
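Based on the warnings above, a root-level variables.tf covering both reported values might look like this sketch (the admin_username description is an assumption, since its purpose isn't shown in the question):

variable "environment" {
  type        = string
  description = "Environment for the deployment"
}

variable "admin_username" {
  type        = string
  description = "Administrator user name (assumed purpose)"
}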

Is "root" the part of "directory"?

Analyzing a path, Node.js considers the root to be part of the directory:
/home/user/dir/file.txt
┌─────────────────────┬────────────┐
│          dir        │    base    │
├──────┬              ├──────┬─────┤
│ root │              │ name │ ext │
"  /    home/user/dir / file  .txt "
└──────┴──────────────┴──────┴─────┘

C:\path\dir\file.txt
┌─────────────────────┬────────────┐
│          dir        │    base    │
├──────┬              ├──────┬─────┤
│ root │              │ name │ ext │
" C:\      path\dir   \ file  .txt "
└──────┴──────────────┴──────┴─────┘
Is it actually so? I am developing a new library, and I am wondering whether I must consider the root as part of the directory or not.
Well, actually "directory" is a fuzzy term. According to the definition,
In computing, a directory is a file system cataloging structure which
contains references to other computer files, and possibly other
directories.
Wikipedia
Nothing there answers my question. What else do we know?
When we use the cd command (the abbreviation of "change directory"), we specify a path relative to the current location. Nothing related to the root.
The cd command works within a specific drive (at least on Windows). This indirectly suggests that the root and the directory can be combined but are initially separate.
More exact terms are the "absolute path of the directory" and the "relative path of the directory". But what is the "directory" itself?
In the Windows case, the storage name can differ between computers, but that does not affect the file structure inside the storage. Again, the root and the directory are separate in this case.
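A quick way to check Node.js's actual behavior is path.parse, which reports the root in its own field while still including it as a prefix of dir:

const path = require('path');

// POSIX: the root "/" is reported separately, yet dir still starts with it.
console.log(path.posix.parse('/home/user/dir/file.txt'));
// { root: '/', dir: '/home/user/dir', base: 'file.txt', ext: '.txt', name: 'file' }

// Windows: the same idea with a drive-letter root.
console.log(path.win32.parse('C:\\path\\dir\\file.txt'));
// { root: 'C:\\', dir: 'C:\\path\\dir', base: 'file.txt', ext: '.txt', name: 'file' }

So in Node.js's model the root is part of the directory, even though it is also exposed separately.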

Spark glob filter to match a specific nested partition

I'm using PySpark, but I guess this is valid for Scala as well.
My data is stored on s3 in the following structure
main_folder
└── year=2022
    └── month=03
        ├── day=01
        │   ├── valid=false
        │   │   └── example1.parquet
        │   └── valid=true
        │       └── example2.parquet
        └── day=02
            ├── valid=false
            │   └── example3.parquet
            └── valid=true
                └── example4.parquet
(For simplicity there is only one file in each folder, and only two days; in reality there can be thousands of files and many days/months/years.)
The files that are under the valid=true and valid=false partitions have a completely different schema, and I only want to read the files in the valid=true partition
I tried using the glob filter, but it fails with AnalysisException: Unable to infer schema for Parquet. It must be specified manually. which is a symptom of having no data (so no files matched)
spark.read.parquet('s3://main_folder', pathGlobFilter='*valid=true*')
I noticed that something like this works
spark.read.parquet('s3://main_folder', pathGlobFilter='*example4*')
However, as soon as I try to use a slash or match anything above the bottom level, it fails:
spark.read.parquet('s3://main_folder', pathGlobFilter='*/example4*')
spark.read.parquet('s3://main_folder', pathGlobFilter='*valid=true*example4*')
I did try to replace the * with ** in all locations, but it didn't work
pathGlobFilter seems to work only on the ending filename, but for subdirectories you can try the approach below; note that it may skip partition discovery. To keep partition discovery, add the basePath property as a load option:
spark.read.format("parquet")\
.option("basePath","s3://main_folder")\
.load("s3://main_folder/*/*/*/valid=true/*")
However, I am not sure whether you can combine path wildcards with pathGlobFilter if you want to match on both subdirectories and ending filenames.
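If your Spark version does allow combining them, an untested sketch might look like this (pathGlobFilter is a documented option for file data sources in Spark 3.0+; treat the combination itself as an assumption to verify):

# Untested sketch: directory wildcards select the valid=true partitions,
# while pathGlobFilter filters only the leaf file names.
df = (
    spark.read.format("parquet")
    .option("basePath", "s3://main_folder")  # keeps partition discovery
    .option("pathGlobFilter", "*.parquet")   # applies to file names only
    .load("s3://main_folder/*/*/*/valid=true/*")
)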
Reference:
https://simplernerd.com/java-spark-read-multiple-files-with-glob/
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html

Terraform - Variable defined in "*.auto.tfvars" file, but still cannot be discovered

I have the following directory Structure:
.
├── ./first_terraform.tf
├── ./modules
│   └── ./modules/ssh_keys
│       ├── ./modules/ssh_keys/main.tf
│       ├── ./modules/ssh_keys/outputs.tf
│       └── ./modules/ssh_keys/variables.tf
├── ./terraform.auto.tfvars
└── ./variables.tf
I am trying to pass a variable ssh_key to the child module defined in ./modules/ssh_keys/main.tf:
resource "aws_key_pair" "id_rsa_ec2" {
  key_name   = "id_rsa_ec2"
  public_key = file(var.ssh_key)
}
For the value, I have set it in terraform.auto.tfvars as below:
# SSH Key
ssh_key = "~/.ssh/haha_key.pub"
The variable is declared in both the root-level and the child-level variables.tf files:
variable "ssh_key" {
  type        = string
  description = "ssh key for EC2 login and checks"
}
My root terraform configuration has this module declared as:
module "ssh_keys" {
source = "./modules/ssh_keys"
}
I first ran terraform init -upgrade at the root level, then ran terraform refresh and got hit by the following error:
Error: Missing required argument
on first_terraform.tf line 69, in module "ssh_keys":
69: module "ssh_keys" {
The argument "ssh_key" is required, but no definition was found.
Just for reference, line 69 in my root level configuration is where the module declaration has been made. I don't know what I have done wrong here. It seems I have all the variables declared, so am I missing some relationship between root/child module variable passing etc.?
Any help is appreciated! Thanks
I think I know what I did wrong.
Terraform modules, as per the documentation, require the parent to pass variables in as part of the invocation. For example:
module "foo" {
source = "./modules/foo"
var1 = value
var2 = value
}
The values for var1 and var2 above can come from root-level variables populated by an auto.tfvars file, environment variables (recommended), or even -var-file arguments on the command line. In fact, this is what Terraform calls "Calling a Child Module" here.
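Applied to this question's configuration, the fix would presumably look like:

module "ssh_keys" {
  source  = "./modules/ssh_keys"
  ssh_key = var.ssh_key  # root-level variable, value supplied by terraform.auto.tfvars
}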
Once I did that, everything worked like a charm! I hope this is the correct way of doing things.

Using terragrunt generate provider block causes conflicts with require providers block in module

I'm using Terragrunt with Terraform version 0.14.8.
My project uses mono repo structure as it is a project requirement to package Terragrunt files and Terraform modules together in a single package.
Folder structure:
project root:
├── environments
│   └── prd
│       ├── rds-cluster
│       │   └── terragrunt.hcl
│       └── terragrunt.hcl
└── modules
    ├── rds-cluster
    │   ├── README.md
    │   ├── main.tf
    │   ├── output.tf
    │   └── variables.tf
    └── secretsmanager-secret
        ├── README.md
        ├── main.tf
        ├── output.tf
        └── variables.tf
In prd/terragrunt.hcl I define the remote state block and the generate provider block.
remote_state {
  backend = "s3"
  ...
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}
provider "aws" {
  region = "ca-central-1"
}
EOF
}
In environments/prd/rds-cluster/terragrunt.hcl, I defined the following:
include {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules//rds-cluster"
}

inputs = {
  ...
}
In modules/rds-cluster/main.tf, I defined the following:
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.0"
    }
  }
}

// RDS related resources...
My problem is that when I try to run terragrunt plan under environments/prd/rds-cluster, I get the following error message:
Error: Duplicate required providers configuration
on provider.tf line 3, in terraform:
3: required_providers {
A module may have only one required providers configuration. The required
providers were previously configured at main.tf:2,3-21.
I can resolve this by declaring the version within the provider block as shown here. However, the version attribute in provider blocks has been deprecated in Terraform 0.13; Terraform recommends the use of the required_providers sub-block under terraform block instead.
Does anyone know what I need to do to use the new required_providers block for my aws provider?
As you've seen, Terraform expects each module to have only one definition of its required providers; this is intended to avoid situations where it's unclear why Terraform is detecting a particular provider requirement when the declarations are spread among multiple files.
However, to support this sort of piecemeal code generation use-case Terraform has an advanced feature called Override Files which allows you to explicitly mark certain files for a different mode of processing where they selectively override particular definitions from other files, rather than creating entirely new definitions.
The details of this mechanism depend on which block type you're overriding, but the section on Merging terraform blocks discusses the behavior relevant to your particular situation:
If the required_providers argument is set, its value is merged on an element-by-element basis, which allows an override block to adjust the constraint for a single provider without affecting the constraints for other providers.
In both the required_version and required_providers settings, each override constraint entirely replaces the constraints for the same component in the original block. If both the base block and the override block both set required_version then the constraints in the base block are entirely ignored.
The practical implication of the above is that if you have an override file with a required_providers block that includes an entry for the AWS provider then Terraform will treat it as a full replacement for any similar entry already present in a non-override file, but it won't affect other provider requirements entries which do not appear in the override file at all.
Putting all of this together, you should be able to get the result you were looking for by asking Terragrunt to name this generated file provider_override.tf instead of just provider.tf, which will then activate the override file processing behavior and thus allow this generated file to override any existing definition of AWS provider requirements, while allowing the configurations to retain any other provider requirements they might also be defining.
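In Terragrunt terms, that is a one-line change to the generate block in prd/terragrunt.hcl (sketch; contents unchanged from the question):

generate "provider" {
  # The "_override" suffix activates Terraform's override-file merging, so this
  # file's AWS entry replaces the one in modules/rds-cluster/main.tf instead of
  # conflicting with it.
  path      = "provider_override.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}
provider "aws" {
  region = "ca-central-1"
}
EOF
}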
