Terraform forces AKS node pool replacement without any changes - Azure

I have the following resource definition for additional node pools in my k8s cluster:
resource "azurerm_kubernetes_cluster_node_pool" "extra" {
for_each = var.node_pools
kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
name = each.key
vm_size = each.value["vm_size"]
node_count = each.value["count"]
node_labels = each.value["labels"]
vnet_subnet_id = var.subnet.id
}
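For reference, var.node_pools is a map of objects along these lines (illustrative values; the "general" key matches the pool shown in the plan output below):

variable "node_pools" {
  type = map(object({
    vm_size = string
    count   = number
    labels  = map(string)
  }))
  # Illustrative default only:
  default = {
    general = {
      vm_size = "Standard_D4s_v3"
      count   = 2
      labels  = { pool = "general" }
    }
  }
}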
Here is the output from terraform plan:
Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply":

  # module.aks.azurerm_kubernetes_cluster_node_pool.extra["general"] has been changed
  ~ resource "azurerm_kubernetes_cluster_node_pool" "extra" {
      + availability_zones = []
        id                 = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01/agentPools/general"
        name               = "general"
      + node_taints        = []
      + tags               = {}
        # (18 unchanged attributes hidden)
    }
Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these changes.
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
  # module.aks.azurerm_kubernetes_cluster_node_pool.extra["general"] must be replaced
-/+ resource "azurerm_kubernetes_cluster_node_pool" "extra" {
      - availability_zones     = [] -> null
      - enable_auto_scaling    = false -> null
      - enable_host_encryption = false -> null
      - enable_node_public_ip  = false -> null
      ~ id                     = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01/agentPools/general" -> (known after apply)
      ~ kubernetes_cluster_id  = "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourcegroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01" -> "/subscriptions/3913c9fe-c571-4af9-bc9a-533202d41061/resourceGroups/amic-resources/providers/Microsoft.ContainerService/managedClusters/amic-k8s-01" # forces replacement
      - max_count              = 0 -> null
      ~ max_pods               = 30 -> (known after apply)
      - min_count              = 0 -> null
        name                   = "general"
      - node_taints            = [] -> null
      ~ orchestrator_version   = "1.20.7" -> (known after apply)
      ~ os_disk_size_gb        = 128 -> (known after apply)
      - tags                   = {} -> null
        # (9 unchanged attributes hidden)
    }
Plan: 1 to add, 0 to change, 1 to destroy.
As you can see, Terraform forces replacement of my node pool because of a change in kubernetes_cluster_id, even though the only difference in that value is the casing of the resourcegroups segment (resourcegroups vs. resourceGroups).
I've been able to work around this by ignoring kubernetes_cluster_id changes in a lifecycle block, but I am still puzzled as to why Terraform detects a change there.
So why does Terraform find a change in this case when there is effectively none?

I'm not proud of it, but I managed to work around this bug by using the "replace" Terraform string function.
resource "azurerm_kubernetes_cluster_node_pool" "extra" {
[...]
# Use this once the bug gets fixed in the provider, and delete the workaround.
# kubernetes_cluster_id = azurerm_kubernetes_cluster.k8s.id
kubernetes_cluster_id = replace(azurerm_kubernetes_cluster.k8s.id, "resourceGroups", "resourcegroups")
[...]
}
NOTE: I'm not replacing "/resourceGroups/" with "/resourcegroups/" (with the surrounding slashes) because the replace function treats a search string wrapped in forward slashes as a regular expression, which might end up duplicating the forward slashes in the result. (I didn't test this myself.)
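To spell out why (an untested sketch, based on the documented behavior of replace): in regex mode the slashes are pattern delimiters and are not part of the match, while the replacement string stays literal, so its slashes get inserted verbatim around each match:

locals {
  id = "/subscriptions/xxx/resourceGroups/my-rg/providers/xxx"

  # Regex mode: the pattern "resourceGroups" matches without its
  # surrounding slashes, but the literal replacement "/resourcegroups/"
  # re-adds them, yielding ".../xxx//resourcegroups//my-rg/..."
  regex_replace = replace(local.id, "/resourceGroups/", "/resourcegroups/")

  # Literal mode: plain substring replacement, surrounding slashes untouched.
  literal_replace = replace(local.id, "resourceGroups", "resourcegroups")
}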

I fixed this weird bug by introducing a lifecycle block as follows:
resource "azurerm_kubernetes_cluster_node_pool" "my-node-pool" {
name = "mynodepool"
kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
...
lifecycle {
ignore_changes = [
kubernetes_cluster_id
]
}
}
Not the cleanest way, but it works. The cluster ID should not change unless you recreate the whole AKS cluster, so it should be safe.

Related

How to generate multiple names / use results output?

When using azurecaf to generate multiple names like in the following code, how do I use the results output?
resource "azurecaf_name" "names" {
name = var.appname
resource_type = "azurerm_resource_group"
resource_types = ["azurerm_mssql_database"]
prefixes = [var.environment]
suffixes = [var.resource_group_location_short]
random_length = 5
clean_input = false
}
The documentation says: results - The generated name for the Azure resources based on the resource_types list.
How do I use this? Also, can I somehow debug / print out what results looks like? (I don't know if it is an array, a key-value structure, etc.)
You can view the results in two common ways; both apply to any attribute of the resource.
[1] Export the required attribute as a Terraform output.
When you add an attribute as an output in your code, terraform apply will show you its value by default. In your use case:
output "caf_name_result" {
value = azurecaf_name.names.result
}
output "caf_name_results" {
value = azurecaf_name.names.results
}
Applying the config with the above output definitions gives the following on your terminal:
Changes to Outputs:
  + caf_name_result  = (known after apply)
  + caf_name_results = (known after apply)

azurecaf_name.names: Creating...
azurecaf_name.names: Creation complete after 0s [id=YXp1cmVybV9yZXNvdXJjZV9ncm91cAlkZXYtcmctc3RhY2tvdmVyZmxvdy15b2RncC13ZXUKYXp1cmVybV9tc3NxbF9kYXRhYmFzZQlkZXYtc3FsZGItc3RhY2tvdmVyZmxvdy15b2RncC13ZXU=]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

caf_name_result = "dev-rg-stackoverflow-yodgp-weu"
caf_name_results = tomap({
  "azurerm_mssql_database" = "dev-sqldb-stackoverflow-yodgp-weu"
  "azurerm_resource_group" = "dev-rg-stackoverflow-yodgp-weu"
})
After the first successful terraform apply, you can view them at any time using the terraform output command.
$ terraform output
caf_name_result = "dev-rg-stackoverflow-yodgp-weu"
caf_name_results = tomap({
  "azurerm_mssql_database" = "dev-sqldb-stackoverflow-yodgp-weu"
  "azurerm_resource_group" = "dev-rg-stackoverflow-yodgp-weu"
})
You can also be specific and check only a particular output value:
$ terraform output caf_name_results
tomap({
  "azurerm_mssql_database" = "dev-sqldb-stackoverflow-yodgp-weu"
  "azurerm_resource_group" = "dev-rg-stackoverflow-yodgp-weu"
})
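Since results is a map keyed by resource type, you can also index it directly in configuration wherever you need a single generated name; a minimal sketch (the local name is my own, not from the question):

locals {
  # Pick one generated name out of the results map by its key.
  sqldb_name = azurecaf_name.names.results["azurerm_mssql_database"]
}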
[2] View your applied resources via the Terraform state commands.
This is only available after the resources are applied, and only when the command runs on the same machine where the Terraform execution was done (in other words, the identity running it satisfies all the authentication, authorization, and network connectivity requirements).
It is not the recommended approach; it is just another option when you need a quick look at the applied resources.
$ terraform state list
azurecaf_name.names

$ terraform state show azurecaf_name.names
# azurecaf_name.names:
resource "azurecaf_name" "names" {
    clean_input    = false
    id             = "YXp1cmVybV9yZXNvdXJjZV9ncm91cAlkZXYtcmctc3RhY2tvdmVyZmxvdy15b2RncC13ZXUKYXp1cmVybV9tc3NxbF9kYXRhYmFzZQlkZXYtc3FsZGItc3RhY2tvdmVyZmxvdy15b2RncC13ZXU="
    name           = "stackoverflow"
    passthrough    = false
    prefixes       = [
        "dev",
    ]
    random_length  = 5
    random_seed    = 1676730686950185
    random_string  = "yodgp"
    resource_type  = "azurerm_resource_group"
    resource_types = [
        "azurerm_mssql_database",
    ]
    result         = "dev-rg-stackoverflow-yodgp-weu"
    results        = {
        "azurerm_mssql_database" = "dev-sqldb-stackoverflow-yodgp-weu"
        "azurerm_resource_group" = "dev-rg-stackoverflow-yodgp-weu"
    }
    separator      = "-"
    suffixes       = [
        "weu",
    ]
    use_slug       = true
}

The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created

I want to exempt an Azure VM from certain policies, and I have the following Terraform code to do so.
It uses locals to identify the scope at which the policies should be exempted.
locals {
  exemption_scope = try({
    mg       = length(regexall("(\\/managementGroups\\/)", var.scope)) > 0 ? 1 : 0,
    sub      = length(split("/", var.scope)) == 3 ? 1 : 0,
    rg       = length(regexall("(\\/managementGroups\\/)", var.scope)) < 1 ? length(split("/", var.scope)) == 5 ? 1 : 0 : 0,
    resource = length(split("/", var.scope)) >= 6 ? 1 : 0,
  })

  expires_on = var.expires_on != null ? "${var.expires_on}T23:00:00Z" : null
  metadata   = var.metadata != null ? jsonencode(var.metadata) : null

  # generate reference Ids when unknown, assumes the set was created with the initiative module
  policy_definition_reference_ids = length(var.member_definition_names) > 0 ? [for name in var.member_definition_names :
    replace(substr(title(replace(name, "/-|_|\\s/", " ")), 0, 64), "/\\s/", "")
  ] : var.policy_definition_reference_ids

  exemption_id = try(
    azurerm_management_group_policy_exemption.management_group_exemption[0].id,
    azurerm_subscription_policy_exemption.subscription_exemption[0].id,
    azurerm_resource_group_policy_exemption.resource_group_exemption[0].id,
    azurerm_resource_policy_exemption.resource_exemption[0].id,
  "")
}
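For clarity, the conditions map to scope ID formats like these (illustrative IDs, not from my code; note that split("/", ...) yields a leading empty element because the IDs start with "/"):

# management group: "/providers/Microsoft.Management/managementGroups/<mg>"
#                   -> matches the /managementGroups/ regex
# subscription:     "/subscriptions/<sub-id>"
#                   -> split("/", scope) has length 3
# resource group:   "/subscriptions/<sub-id>/resourceGroups/<rg>"
#                   -> split("/", scope) has length 5
# resource:         "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/<type>/<name>"
#                   -> split("/", scope) has length >= 6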
The above locals are then used as shown below:
resource "azurerm_management_group_policy_exemption" "management_group_exemption" {
count = local.exemption_scope.mg
name = var.name
display_name = var.display_name
description = var.description
management_group_id = var.scope
policy_assignment_id = var.policy_assignment_id
exemption_category = var.exemption_category
expires_on = local.expires_on
policy_definition_reference_ids = local.policy_definition_reference_ids
metadata = local.metadata
}
Both the locals and the azurerm_management_group_policy_exemption resource are part of the same module file, and the policy exemption is applied as shown below:
module "exemption_jumpbox_sql_vulnerability_assessment" {
  count  = var.enable_jumpbox == true ? 1 : 0
  source = "../policy_exemption"

  name                            = "Exemption - SQL servers on machines should have vulnerability"
  display_name                    = "Exemption - SQL servers on machines should have vulnerability"
  description                     = "Not required for Jumpbox"
  scope                           = module.create_jumbox_vm[0].virtual_machine_id
  policy_assignment_id            = module.security_center.azurerm_subscription_policy_assignment_id
  policy_definition_reference_ids = var.exemption_policy_definition_ids
  exemption_category              = "Waiver"

  depends_on = [module.create_jumbox_vm, module.security_center]
}
It works for an existing Azure VM. However, it throws the following error when trying to provision the Azure VM and apply the policy exemption to it in the same run.
Ideally, module.exemption_jumpbox_sql_vulnerability_assessment should be executed only after module.create_jumbox_vm, since it is declared as a dependency, but I am not sure why it throws this error:
│ The "count" value depends on resource attributes that cannot be determined
│ until apply, so Terraform cannot predict how many instances will be
│ created. To work around this, use the -target argument to first apply only
│ the resources that the count depends on.
I tried to reproduce the scenario in my environment.
resource "azurerm_management_group_policy_exemption" "management_group_exemption" {
count = local.exemption_scope.mg
name = var.name
display_name = var.display_name
description = var.description
management_group_id = var.scope
policy_assignment_id = var.policy_assignment_id
exemption_category = var.exemption_category
expires_on = local.expires_on
policy_definition_reference_ids = local.policy_definition_reference_ids
metadata = local.metadata
}
locals {
  exemption_scope = try({
    ...
  })
}
Received the same error:
The "count" value depends on resource attributes that cannot be determined
│ until apply, so Terraform cannot predict how many instances will be
│ created. To work around this, use the -target argument to first apply only
│ the resources that the count depends on.
Referring to the documentation on local values: the value is only known at apply time, not at plan time. So if the local did not depend on other resources, the policies would be exempted right away; but here it depends on the VM, which may still be in the process of being created.
So first target only the resource that is being depended on, since the exemption policy can only be assigned to the VM once the VM exists; see the two-pass apply sketched below.
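In practice, that means applying in two passes, using the module name from the question:

$ terraform apply -target="module.create_jumbox_vm"   # create the VM first
$ terraform apply                                     # count is now known; apply the exemption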
See using-expressions-in-count | Terraform | HashiCorp Developer.
Also note that when using the Terraform count argument with Azure virtual machines, a NIC resource also has to be created for each virtual machine:
resource "azurerm_network_interface" "nic" {
count = var.vm_count
name = "${var.vm_name_pfx}-${count.index}-nic"
location = data.azurerm_resource_group.example.location
resource_group_name = data.azurerm_resource_group.example.name
//tags = var.tags
ip_configuration {
name = "internal"
subnet_id = azurerm_subnet.internal.id
private_ip_address_allocation = "Dynamic"
}
}
Reference: terraform-azurerm-policy-exemptions/examples/count at main · AnsumanBal-MT/terraform-azurerm-policy-exemptions · GitHub

kubectl_manifest yaml_body keeps changing

kubectl_manifest keeps changing yaml_body on every terraform apply.
For this use case, I cannot use kubernetes_manifest.
Do you have any comments on the cause, or suggestions for resolving this issue?
# module.mymodule.kubectl_manifest.someresource will be updated in-place
~ resource "kubectl_manifest" "someresource" {
      id        = ""
      name      = ""
    ~ yaml_body = (sensitive value)
      # (14 unchanged attributes hidden)
  }
Plan: 0 to add, 33 to change, 0 to destroy.
Terraform v1.3.1

kubectl = {
  source  = "gavinbunney/kubectl"
  version = ">= 1.14.0"
}
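One lead worth checking (unverified, since the manifests are hidden behind (sensitive value)): a perpetual yaml_body diff is often caused by fields the cluster mutates after apply, such as defaulted values, controller-managed annotations, or status. The gavinbunney/kubectl provider has an ignore_fields argument to exclude such paths from the diff; the resource below is a sketch, and the field paths are examples only:

resource "kubectl_manifest" "someresource" {
  yaml_body = file("${path.module}/someresource.yaml") # hypothetical source

  # Exclude server-mutated fields from the diff; replace these example
  # paths with whatever actually drifts in your manifests.
  ignore_fields = [
    "metadata.annotations",
    "status",
  ]
}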

Is it possible to remove a terraform resource from an index without rebuilding

I have existing Terraform infrastructure that was created by someone before me, and they created the instance using count. We want to remove the count, but Terraform wants to rebuild the instance.
Is there a way I can remove the count without having to rebuild the instance?
I was wondering whether something like terraform state mv could achieve this, or whether something else is possible.
Thanks in advance!
UPDATE: Example shown below
# module.compute.aws_kms_alias.kms-alias will be created
+ resource "aws_kms_alias" "kms-alias" {
    + arn            = (known after apply)
    + id             = (known after apply)
    + name           = "alias/kms-eks"
    + name_prefix    = (known after apply)
    + target_key_arn = (known after apply)
    + target_key_id  = (known after apply)
  }

# module.compute.aws_kms_alias.kms-alias[1] will be destroyed
- resource "aws_kms_alias" "kms-alias" {
    - arn            = "arn:aws:kms:::alias/kms-eks" -> null
    - id             = "alias/kms-eks" -> null
    - name           = "alias/kms-eks" -> null
    - target_key_arn = "" -> null
    - target_key_id  = "" -> null
  }
You can modify the state to rename the resource so that it matches what is in your config. Since the count index contains characters that are special to the shell, the source address in the command needs to be quoted as a literal string:
terraform state mv 'module.compute.aws_kms_alias.kms-alias[1]' module.compute.aws_kms_alias.kms-alias
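On Terraform v1.1 and later, the same rename can also be recorded declaratively with a moved block; a sketch, assuming the resource is declared inside the compute module:

# Inside the "compute" module, next to the resource definition:
moved {
  from = aws_kms_alias.kms-alias[1]
  to   = aws_kms_alias.kms-alias
}

Unlike a one-off state mv, the moved block travels with the configuration, so anyone else who applies it gets the same state migration.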

Terraform: How to loop over aws_instance N times as defined within an object

I have the following variable
variable "instance_types" {
default = {
instances : [
{
count = 1
name = "control-plane"
ami = "ami-xxxxx"
instance_type = "t2.large"
iam_instance_profile = "xxx-user"
subnet_id = "subnet-xxxxx"
},
{
count = 3
name = "worker"
ami = "ami-xxxxx"
instance_type = "t2.large"
iam_instance_profile = "xxx-user"
subnet_id = "subnet-xxxxx"
}
]
}
}
And the following instance declaration, which I'm attempting to iterate:
resource "aws_instance" "k8s-node" {
# Problem here : How to turn an array of 2 objects into 4 (1 control_plane, 3 workers)
for_each = {for x in var.instance_types.instances: x.count => x}
ami = lookup(each.value, "ami")
instance_type = lookup(each.value, "instance_type")
iam_instance_profile = lookup(each.value, "iam_instance_profile")
subnet_id = lookup(each.value, "subnet_id")
tags = {
Name = lookup(each.value, "name")
Type = each.key
}
}
Goal: Get the aws_instance to iterate 4 times (1 control-plane + 3 workers) and populate the values from the corresponding entry of instance_types.
Problem: I cannot iterate over the object array with the desired result. In a typical programming language this would be achieved with a nested for loop.
This is easier to solve with a map(object) type for your input variable. The transformed data structure looks like this:
variable "instance_types" {
...
default = {
"control-plane" = {
count = 1
ami = "ami-xxxxx"
instance_type = "t2.large"
iam_instance_profile = "xxx-user"
subnet_id = "subnet-xxxxx"
},
"worker" = {
count = 3
ami = "ami-xxxxx"
instance_type = "t2.large"
iam_instance_profile = "xxx-user"
subnet_id = "subnet-xxxxx"
}
}
}
Note the name key in the object is subsumed into the map key for efficiency and cleanliness.
If the resources are split between the control plane and worker nodes, then we are finished and can use this variable's value directly in a for_each meta-argument. Combining both into a single resource, however, requires a data transformation:
locals {
  instance_types = flatten([                             # need this for the final structure type
    for instance_key, instance in var.instance_types : [ # iterate over variable input objects
      for type_count in range(1, instance.count + 1) : { # sub-iterate over objects by the specified "count"; range begins at 1 for human readability
        new_key              = "${instance_key} ${type_count}" # for resource uniqueness
        type                 = instance_key                    # for an easier tag value later
        ami                  = instance.ami                    # this and below retained from variable inputs
        instance_type        = instance.instance_type
        iam_instance_profile = instance.iam_instance_profile
        subnet_id            = instance.subnet_id
      }
    ]
  ])
}
Now we can iterate within the resource with the for_each meta-argument, and utilize the for expression to reconstruct the input for suitable usage within the resource.
resource "aws_instance" "k8s-node" {
# local.instance_types is a list of objects, and we need a map of objects with unique resource keys
for_each = { for instance_type in local.instance_types : instance_type.new_key => instance_type }
ami = each.value.ami
instance_type = each.value.instance_type
iam_instance_profile = each.value.iam_instance_profile
subnet_id = each.value.subnet_id
tags = {
Name = each.key
Type = each.value.type
}
}
This will give you the behavior you desire, and you can modify it for style preferences or different uses as the need arises.
Note the lookup functions are removed since they are only useful when a default value is specified as the third argument, and that is not possible within object type variable declarations except as an experimental feature in 0.14.
The absolute namespace for these resources' exported resource attributes would be:
(module?.<declared_module_name>?.)<resource_type>.<resource_name>[<resource_key>].<attribute>
For example, given an intra-module resource, first worker node, and private ip address exported attribute:
aws_instance.k8s-node["worker 1"].private_ip
Note you can also access all resources' exported attributes by terminating the namespace at <resource_name> (retaining the map of all resources instead of accessing a singular resource value). Then you could also use a for expression in an output declaration to create a custom aggregate output for all of the similar resources and their identical exported attribute(s).
{ for node_key, node in aws_instance.k8s-node : node_key => node.private_ip }
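Wrapped in an output declaration (the output name here is my own), that aggregate looks like:

output "node_private_ips" {
  # One entry per instance, e.g. "worker 1" => its private IP address.
  value = { for node_key, node in aws_instance.k8s-node : node_key => node.private_ip }
}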
