Terraform: Azure VMSS rolling_upgrade does not re-image instances

I have the following VMSS Terraform config:
resource "azurerm_linux_virtual_machine_scale_set" "my-vmss" {
...
instances = 2
...
upgrade_mode = "Rolling"
rolling_upgrade_policy {
max_batch_instance_percent = 100
max_unhealthy_instance_percent = 100
max_unhealthy_upgraded_instance_percent = 0
pause_time_between_batches = "PT10M"
}
extension {
name = "my-vmss-app-health-ext"
publisher = "Microsoft.ManagedServices"
type = "ApplicationHealthLinux"
automatic_upgrade_enabled = true
type_handler_version = "1.0"
settings =jsonencode({
protocol = "tcp"
port = 8080
})
...
}
However, whenever a change is applied (e.g., changing custom_data), the VMSS is updated but the instances are not reimaged. Only after a manual reimage (via the portal UI or the Azure CLI) do the instances get updated.
The terraform plan output is as expected - the custom_data change is detected:
# azurerm_linux_virtual_machine_scale_set.my-vmss will be updated in-place
~ resource "azurerm_linux_virtual_machine_scale_set" "my-vmss" {
...
~ custom_data = (sensitive value)
...
Plan: 0 to add, 1 to change, 0 to destroy.
Any idea how to make Terraform trigger the instance reimaging?

It looks like this is not a Terraform issue but rather how Azure designed rolling upgrades. From here (1) it follows that updates to custom_data won't affect existing instances. That is, until an instance is manually reimaged (e.g., via the portal UI or the Azure CLI) it won't get the new custom_data (e.g., the new cloud-init script).
In contrast, AWS does refresh instances on user data updates. Please let me know if my understanding is incorrect, or if you have an idea of how to work around this limitation in Azure.
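One possible workaround (just a sketch, not something the azurerm provider does for you) is to trigger the reimage from Terraform itself whenever the custom data changes. The var.custom_data variable and the VMSS/resource-group names below are placeholders, and the approach assumes the Azure CLI is installed and authenticated wherever Terraform runs:
# Sketch: reimage the scale set whenever the custom data changes.
# var.custom_data, "my-vmss" and "my-rg" are placeholders; the az CLI must be
# installed and authenticated on the machine running Terraform.
resource "null_resource" "reimage_on_custom_data_change" {
  triggers = {
    custom_data_hash = sha256(var.custom_data)
  }

  provisioner "local-exec" {
    command = "az vmss reimage --name my-vmss --resource-group my-rg"
  }

  depends_on = [azurerm_linux_virtual_machine_scale_set.my-vmss]
}
The provisioner only runs when the null_resource is created or replaced, i.e. when the hash in triggers changes, so applies that don't touch custom_data won't reimage anything.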

Related

Configure High Availability conditionally based on variable for `azurerm_postgresql_flexible_server`

I'm configuring my servers with Terraform. For non-prod environments, our SKU doesn't allow high availability, but in prod our SKU does.
For some reason high_availability.mode only accepts the value "ZoneRedundant" and doesn't accept any other value (according to the documentation). Depending on whether or not var.isProd is true, I want to turn high availability on or off, but how would I do that?
resource "azurerm_postgresql_flexible_server" "default" {
name = "example-${var.env}-postgresql-server"
location = azurerm_resource_group.default.location
resource_group_name = azurerm_resource_group.default.name
version = "14"
administrator_login = "sqladmin"
administrator_password = random_password.postgresql_server.result
geo_redundant_backup_enabled = var.isProd
backup_retention_days = var.isProd ? 60 : 7
storage_mb = 32768
high_availability {
mode = "ZoneRedundant"
}
sku_name = var.isProd ? "B_Standard_B2s" : "B_Standard_B1ms"
}
I believe the default for this resource is HA disabled, so it is not the mode argument that manages HA but rather the presence of the high_availability block. You can therefore manage HA by omitting the block to accept the default (disabled), or including the block to enable HA with a value of ZoneRedundant:
dynamic "high_availability" {
for_each = var.isProd ? ["this"] : []
content {
mode = "ZoneRedundant"
}
}
I am hypothesizing somewhat on the API endpoint parameter defaults, so this would need to be acceptance tested with an Azure account. However, the documentation for the Azure Postgres Flexible Server in general claims HA is in fact disabled by default, so this should function as desired.
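Putting it together with the resource from the question, the conditional block sits inside the server resource like this (a sketch; only the arguments from the question are shown):
resource "azurerm_postgresql_flexible_server" "default" {
  name                         = "example-${var.env}-postgresql-server"
  location                     = azurerm_resource_group.default.location
  resource_group_name          = azurerm_resource_group.default.name
  version                      = "14"
  administrator_login          = "sqladmin"
  administrator_password       = random_password.postgresql_server.result
  geo_redundant_backup_enabled = var.isProd
  backup_retention_days        = var.isProd ? 60 : 7
  storage_mb                   = 32768
  sku_name                     = var.isProd ? "B_Standard_B2s" : "B_Standard_B1ms"

  # The HA block is only generated when var.isProd is true; otherwise the
  # provider falls back to its default (HA disabled).
  dynamic "high_availability" {
    for_each = var.isProd ? ["this"] : []

    content {
      mode = "ZoneRedundant"
    }
  }
}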
If you deploy a flexible server using the azurerm provider, it accepts only the ZoneRedundant value for high_availability.mode, and with this provider you can deploy PostgreSQL versions 11, 12 and 13 only.
Using the AzAPI provider you can set highAvailability.mode to any of Disabled, ZoneRedundant or SameZone.
Based on your requirement I have created the sample Terraform script below. It has an environment variable that accepts only prod or non-prod values; based on this value the flexible server is deployed with the respective properties.
If the environment value is prod, the script deploys the flexible server with high availability set to ZoneRedundant, a backup retention of 35 days, and geo-redundant backup enabled.
If the environment value is non-prod, the script deploys the flexible server with high availability disabled, a backup retention of 7 days, and geo-redundant backup disabled.
Here is the Terraform Script:
terraform {
  required_providers {
    azapi = {
      source = "azure/azapi"
    }
  }
}

provider "azapi" {
}

variable "environment" {
  type = string

  validation {
    condition     = anytrue([var.environment == "prod", var.environment == "non-prod"])
    error_message = "You haven't used one of the allowed values (prod or non-prod)."
  }
}

resource "azapi_resource" "rg" {
  type      = "Microsoft.Resources/resourceGroups@2021-04-01"
  name      = "teststackhub"
  location  = "eastus"
  parent_id = "/subscriptions/<subscriptionId>"
}

resource "azapi_resource" "test" {
  type      = "Microsoft.DBforPostgreSQL/flexibleServers@2022-01-20-preview"
  name      = "example-${var.environment}-postgresql-server"
  location  = azapi_resource.rg.location
  parent_id = azapi_resource.rg.id

  body = jsonencode({
    properties = {
      administratorLogin         = "azureuser"
      administratorLoginPassword = "<password>"
      backup = {
        backupRetentionDays = var.environment == "prod" ? 35 : 7
        geoRedundantBackup  = var.environment == "prod" ? "Enabled" : "Disabled"
      }
      storage = {
        storageSizeGB = 32
      }
      highAvailability = {
        mode = var.environment == "prod" ? "ZoneRedundant" : "Disabled"
      }
      version = "14"
    }
    sku = {
      name = var.environment == "prod" ? "Standard_B2s" : "Standard_B1ms"
      tier = "GeneralPurpose"
    }
  })
}
NOTE: The above Terraform sample script is for your reference; please adjust it to your business requirements.

Terraform does not remember several ElasticBeanstalk settings

I have an environment created with the resource aws_elastic_beanstalk_environment. Unfortunately, Terraform shows me with each plan and apply that several settings have to be added, including the VPCId.
I got the settings using the AWS CLI describe-configuration-settings and they match what I specified, but Terraform says the settings need to be re-added each time.
I have tried both this statement
setting {
  name      = "VPCId"
  namespace = "aws:ec2:vpc"
  value     = var.vpc_id
  resource  = "AWSEBSecurityGroup"
}
and this one.
setting {
  name      = "VPCId"
  namespace = "aws:ec2:vpc"
  value     = var.vpc_id
  resource  = ""
}
Unfortunately without success. Does anyone have an idea?
I am using Terraform version 0.14.11 and the AWS provider version 3.74.3.

How do I implement a retry pattern with Terraform?

My use case: I need to create an AKS cluster with Terraform azurerm provider, and then set up a Network Watcher flow log for its NSG.
Note that, as with many other AKS resources, the corresponding NSG is not controlled by Terraform. Instead, it's created by Azure indirectly (and asynchronously), so I treat it as data, not a resource.
Also note that Azure will create and use its own NSG even if the AKS cluster is created with a custom VNet.
Depending on the particular region and the Azure API gateway, my team has seen up to a 40-minute delay between the AKS cluster being created and the NSG resource becoming visible in the node pool resource group.
If I don't want my Terraform config to fail, I see 3 options:
Run a CLI script that waits for the NSG, make it a null_resource, and depend on it (a sketch of this is at the end of this question)
Implement the same with a custom provider
Have a really ugly workaround that implements a retry pattern - below are 10 attempts at 30 seconds each:
data "azurerm_resources" "my_nsg_1" {
resource_group_name = var.clusterNodeResourceGroup
type = "Microsoft.Network/networkSecurityGroups"
}
resource "time_sleep" "my_nsg_sleep1" {
count = length(data.azurerm_resources.my_nsg_1.resources) == 0 ? 1 : 0
create_duration = "30s"
triggers = {
ts = timestamp()
}
}
data "azurerm_resources" "my_nsg_2" {
depends_on = [time_sleep.my_nsg_sleep1]
resource_group_name = var.clusterNodeResourceGroup
type = "Microsoft.Network/networkSecurityGroups"
}
resource "time_sleep" "my_nsg_sleep2" {
count = length(data.azurerm_resources.my_nsg_1.resources) == 0 ? 1 : 0
create_duration = length(data.azurerm_resources.my_nsg_2.resources) == 0 ? "30s" : "0s"
triggers = {
ts = timestamp()
}
}
...
data "azurerm_resources" "my_nsg_11" {
depends_on = [time_sleep.my_nsg_sleep10]
resource_group_name = var.clusterNodeResourceGroup
type = "Microsoft.Network/networkSecurityGroups"
}
// Now azurerm_resources.my_nsg_11 is OK as long as the NSG was created and became visible to the current API Gateway within 5 minutes.
Note that Terraform doesn't allow resource repetition via "for_each" or "count" at more than an individual-resource level. In addition, because it resolves dependencies during the static phase, two sets of resources created with "count" or "for_each" cannot have dependencies at the individual-element level - you can only have one list depend on the other, with no circular dependencies allowed.
E.g., my_nsg[count.index] cannot depend on my_nsg_delay[count.index-1] while my_nsg_delay[count.index] depends on my_nsg[count.index].
Hence this horrible non-DRY antipattern.
Is there a better declarative solution so I don't involve a custom provider or a script?
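For reference, option 1 from the list above would look roughly like this (a sketch, assuming the az CLI is installed and authenticated wherever Terraform runs; the null_resource should also depend on the AKS cluster resource if it lives in the same configuration):
# Sketch of option 1: poll for the NSG with a CLI loop inside a
# null_resource, then make the data source depend on it.
resource "null_resource" "wait_for_nsg" {
  provisioner "local-exec" {
    command = <<-EOT
      for i in $(seq 1 40); do
        count=$(az network nsg list --resource-group ${var.clusterNodeResourceGroup} --query "length(@)")
        [ "$count" -gt 0 ] && exit 0
        sleep 30
      done
      echo "NSG not found in time" && exit 1
    EOT
  }
}

data "azurerm_resources" "my_nsg" {
  depends_on          = [null_resource.wait_for_nsg]
  resource_group_name = var.clusterNodeResourceGroup
  type                = "Microsoft.Network/networkSecurityGroups"
}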

Terraform Error deleting App Service Plan StatusCode=409 Server farm [asp] cannot be deleted because it has web app(s) [azure-function] assigned to it

I have an Azure Function and an Azure App Service Plan that were both created using the following Terraform code:
resource "azurerm_app_service_plan" "asp" {
name = "asp-${var.environment}"
resource_group_name = var.rg_name
location = var.location
kind = "FunctionApp"
reserved = true
sku {
tier = "ElasticPremium"
size = "EP1"
}
}
resource "azurerm_function_app" "function" {
name = "function-${var.environment}"
resource_group_name= var.rg_name
location= var.location
app_service_plan_id= azurerm_app_service_plan.asp.id
storage_connection_string=azurerm_storage_account.storage.primary_connection_string
os_type = "linux"
site_config {
linux_fx_version = "DOCKER|${data.azurerm_container_registry.acr.login_server}/${var.image_name}:latest"
}
identity {
type = "SystemAssigned"
}
app_settings = {
#Lots of variables, but irrelevant for this issue I assume?
}
depends_on = [azurerm_app_service_plan.asp]
version = "~2"
}
resource "azurerm_storage_account" "storage" {
name = "storage${var.environment}"
resource_group_name = var.rg_name
location = var.location
account_tier = "Standard"
account_replication_type = "LRS"
}
The function works fine.
The issue is that any change I now try to do in Terraform ends up in the following error during apply:
2020-08-25T06:31:23.256Z [DEBUG] plugin.terraform-provider-azurerm_v2.24.0_x5: {"Code":"Conflict","Message":"Server farm 'asp-staging' cannot be deleted because it has web app(s) function-staging assigned to it.","Target":null,"Details":[{"Message":"Server farm 'asp-staging' cannot be deleted because it has web app(s) function-staging assigned to it."},{"Code":"Conflict"},{"ErrorEntity":{"ExtendedCode":"11003","MessageTemplate":"Server farm '{0}' cannot be deleted because it has web app(s) {1} assigned to it.","Parameters":["asp-staging","function-staging"],"Code":"Conflict","Message":"Server farm 'asp-staging' cannot be deleted because it has web app(s) function-staging assigned to it."}}],"Innererror":null}
...
Error: Error deleting App Service Plan "asp-staging" (Resource Group "my-resource-group"): web.AppServicePlansClient#Delete: Failure sending request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=<nil> <nil>
I have another service plan with an app service, and have had no problems applying while they are running.
I have even tried removing all references to the function and its service plan and still get the same error.
I am able to delete the function and its service plan from the portal, and then Terraform applies fine once, when it creates the function and service plan. As long as those are present when Terraform applies, it fails.
This workaround of manually deleting the function and service plan is not feasible in the long run, so I hope someone can help me point out the issue. Is there some error in the way I have created the function or service plan?
provider "azurerm" {
version = "~> 2.24.0"
...
Edit:
As suggested this might be a provider bug, so I have created this issue: https://github.com/terraform-providers/terraform-provider-azurerm/issues/8241
Edit 2:
On the bug tracker they claim it is a configuration error and that I am missing a dependency. I have updated the code with a depends_on, but I still have the same error.
I found the issue: the service plan was being replaced on every apply:
# azurerm_app_service_plan.asp must be replaced
-/+ resource "azurerm_app_service_plan" "asp" {
      ~ id                           = "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Web/serverfarms/asp" -> (known after apply)
      - is_xenon                     = false -> null
      ~ kind                         = "elastic" -> "FunctionApp" # forces replacement
        location                     = "norwayeast"
      ~ maximum_elastic_worker_count = 1 -> (known after apply)
      ~ maximum_number_of_workers    = 20 -> (known after apply)
        name                         = "asp"
      - per_site_scaling             = false -> null
        reserved                     = true
        resource_group_name          = "xxx"
      - tags                         = {
          - "Owner"       = "XXX"
          - "Service"     = "XXX"
          - "environment" = "staging"
        } -> null
Even though I created it with kind = "FunctionApp", it seems Azure changed it to "elastic".
I changed it to kind = "elastic" and Terraform has stopped destroying my service plan on every apply :)
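For reference, the service plan resource with the adjusted kind would look roughly like this (a sketch based on the block from the question, with only kind changed):
# Same plan as in the question, but with kind matching what Azure reports
# for an Elastic Premium (Function) plan, so Terraform no longer forces a replacement.
resource "azurerm_app_service_plan" "asp" {
  name                = "asp-${var.environment}"
  resource_group_name = var.rg_name
  location            = var.location
  kind                = "elastic"
  reserved            = true

  sku {
    tier = "ElasticPremium"
    size = "EP1"
  }
}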
Thanks a lot to Charles Xu for lots of help!
The reason (in my case) for the 409 was that app service plans with active app services cannot be deleted. The 409 happens when Terraform tries to delete the plan before the apps have been migrated to the new plan. The solution was to keep the existing app service plan during the migration and rerun Terraform after the applications had been migrated to the new plan, in order to remove the old one.

Creating a "random" instance with Terraform - autocreate valid configurations

I'm new to Terraform and would like to create "random" instances.
Some settings like the OS and the setup script will stay the same; mostly the region/zone would change.
How can I do that?
It seems Terraform already knows which combinations are valid. For example, with AWS EC2 or Lightsail it will complain if you choose an invalid combination. I guess this reduces the amount of work, though I'm wondering whether this holds for every provider.
How could you automatically create a valid configuration, with only the region or zone changing each time Terraform runs?
Edit: the config looks like this:
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  # profile = "default"
  # region  = "us-west-2"
  access_key = ...
  secret_key = ...
}

resource "aws_instance" "example" {
  ami           = "ami-830c94e3"
  instance_type = "t2.micro"
}
Using AWS as an example, aws_instance has two required parameters: ami and instance_type.
Thus to create an instance, you need to provide both of them:
resource "aws_instance" "my" {
ami = "ami-02354e95b39ca8dec"
instance_type = "t2.micro"
}
Everything else will be deduced or set to default values. In terms of availability zones and subnets, if not explicitly specified, they will be chosen "randomly" (AWS decides how to place them, so in fact they could all end up in one AZ).
Thus, to create 3 instances in different subnets and AZs you can do simply:
provider "aws" {
region = "us-east-1"
}
data "aws_ami" "al2_ami" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm*"]
}
}
resource "aws_instance" "my" {
count = 3
ami = data.aws_ami.al2_ami.id
instance_type = "t2.micro"
}
A declarative system like Terraform unfortunately isn't very friendly to randomness, because it expects the system to converge on a desired state, but random configuration would mean that the desired state would change on each action and thus it would never converge. Where possible I would recommend using "randomization" or "distribution" mechanisms built in to your cloud provider, such as AWS autoscaling over multiple subnets.
However, to be pragmatic Terraform does have a random provider, which represents the generation of random numbers as a funny sort of Terraform resource so that the random results can be preserved from one run to the next, in the same way as Terraform remembers the ID of an EC2 instance from one run to the next.
The random_shuffle resource can be useful for this sort of "choose any one (or N) of these options" situation.
Taking your example of randomly choosing AWS regions and availability zones, the first step would be to enumerate all of the options your random choice can choose from:
locals {
  possible_regions = toset([
    "us-east-1",
    "us-east-2",
    "us-west-1",
    "us-west-2",
  ])

  possible_availability_zones = tomap({
    us-east-1 = toset(["a", "b", "e"])
    us-east-2 = toset(["a", "c"])
    us-west-1 = toset(["a", "b"])
    us-west-2 = toset(["b", "c"])
  })
}
You can then pass these inputs into random_shuffle resources to select, for example, one region and then two availability zones from that region:
resource "random_shuffle" "region" {
input = local.possible_regions
result_count = 1
}
resource "random_shuffle" "availability_zones" {
input = local.possible_availability_zones[local.chosen_region]
result_count = 2
}
locals {
  chosen_region             = random_shuffle.region.result[0]
  chosen_availability_zones = random_shuffle.availability_zones.result
}
You can then use local.chosen_region and local.chosen_availability_zones elsewhere in your configuration.
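For example, the chosen values could be combined into a full availability zone name when defining other resources (a hypothetical snippet; aws_vpc.example is assumed to exist elsewhere, and the provider-region caveat below still applies):
# Hypothetical usage: build a full AZ name from the chosen region and
# AZ suffix, e.g. "us-east-1" + "b" = "us-east-1b".
resource "aws_subnet" "example" {
  vpc_id            = aws_vpc.example.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "${local.chosen_region}${local.chosen_availability_zones[0]}"
}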
However, there is one important catch with randomly selecting regions in particular: the AWS provider is designed to require a region, because each AWS region is an entirely distinct set of endpoints, and so the provider won't be able to successfully configure itself if the region isn't known until the apply step, as would be the case if you wrote region = local.chosen_region in the provider configuration.
To work around this you will need the exceptional-use-only -target option to terraform apply, to direct Terraform to first focus only on generating the random region and ignore everything else until that has succeeded:
# First apply with just the random region targeted
terraform apply -target=random_shuffle.region
# After that succeeds, run apply again normally to
# create everything else.
terraform apply
