Custom Script extension not Executing on VMSS - terraform

I am creating a VMSS with Terraform to use as an Azure DevOps agent pool. The VMSS is created successfully, but when I try to run a script to enroll it in the agent pool, I'm hitting a wall. Nothing seems to work. Here is my TF code:
data "local_file" "template" {
filename = "./agent_install_script.ps1"
}
data "template_file" "script" {
template = data.local_file.template.content
vars = {
agent_name = var.agent_name
pool_name = var.agent_pool_name
token = var.pat_token
user_name = var.vmss_admin_username
logon_password = random_password.vm_password.result
}
}
module "vmss_windows2022g2" {
source = "../modules/vmss_windows"
environment = var.environment
resource_group_name = var.resource_group
vmss_sku = "Standard_DS2_v2"
vmss_nic_subnet_id = module.vnet_mgt.subnet_windows_vmss_id
vmss_nsg_id = module.nsg.vmss_nsg_id
vmss_computer_name = "win2022g2"
vmss_admin_username = var.vmss_admin_username
vmss_admin_password = random_password.vm_password.result
windows_image_id = data.azurerm_image.windows_server2022_gen2.id
vmss_storage_uri = data.azurerm_storage_account.vm_storage.primary_blob_endpoint
overprovision = false
#this will be stored at %SYSTEMDRIVE%\AzureData\CustomData.bin
customData = data.template_file.script.rendered
tags = local.env_tags_map
}
resource "azurerm_virtual_machine_scale_set_extension" "ext" {
name = "InstallDevOpsAgent"
virtual_machine_scale_set_id = module.vmss_windows2022g2.id
publisher = "Microsoft.Azure.Extensions"
type = "CustomScript"
type_handler_version = "2.0"
settings = jsonencode({
"commandToExecute" = "dir C:\\ > C:\\temp\\test.txt"
#"cd C:\\AzureData; mv .\\CustomData.bin .\\install_agent.ps1; powershell -ExecutionPolicy Unrestricted -File .\\install_agent.ps1; del .\\install_agent.ps1;"
})
#protected_settings = var.protected_settings
failure_suppression_enabled = false
auto_upgrade_minor_version = false
automatic_upgrade_enabled = false
provision_after_extensions = []
timeouts {
create = "1h"
}
}
As you can see, I'm copying the PowerShell script via custom_data, and that works fine with all the variables substituted properly. I have tried executing the simple command dir C:\\ > C:\\temp\\test.txt to see if anything works, but I am not getting any output.
TF version 1.12, azurerm provider version 3.32.0

Azure DevOps should install an extension on the scale set (and in turn on the VMs) which automatically enrols the agent without the need for a script.
More details here:
https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/scale-set-agents?view=azure-devops#lifecycle-of-a-scale-set-agent
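If you still want to run your own script, one thing worth checking in the question's code is the extension handler. The publisher "Microsoft.Azure.Extensions" with type "CustomScript" is the Linux Custom Script handler; the Windows handler is published as "Microsoft.Compute" with type "CustomScriptExtension". The following is a minimal sketch under that assumption, not a verified fix for this deployment. The output path C:\AzureData\test.txt is chosen only because the question says that folder already exists on the instance.

```hcl
# Sketch only: Windows Custom Script Extension on the scale set from the question.
resource "azurerm_virtual_machine_scale_set_extension" "ext" {
  name                         = "InstallDevOpsAgent"
  virtual_machine_scale_set_id = module.vmss_windows2022g2.id

  # Windows handler (the question used the Linux one).
  publisher            = "Microsoft.Compute"
  type                 = "CustomScriptExtension"
  type_handler_version = "1.10"

  settings = jsonencode({
    # Writes a directory listing to a folder that already exists,
    # so a missing C:\temp cannot be the reason for an empty result.
    commandToExecute = "powershell -ExecutionPolicy Unrestricted -Command \"Get-ChildItem C:\\ | Out-File C:\\AzureData\\test.txt\""
  })
}
```

Depending on the scale set's upgrade policy, existing instances may need to be upgraded to the latest model before they pick up a newly added extension.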

Unable to create Service Bus Authorization Rule in Azure

We are using Terraform 0.12.19 and azurerm provider 2.10.0 to deploy a Service Bus namespace with its queues and authorization rules. When we ran terraform apply, it created the namespace and the queue, but it threw the error below when creating the authorization rules.
When we checked the Azure portal, the authorization rules were present, and the tf state file contained entries for both resources with a status of "tainted". We ran apply again to see whether it would recreate/replace the existing resources, but it failed with the same error. We are now blocked: even running plan for new resources fails at this point.
We also untainted the resources and ran apply again, but we still get the error even though the resources no longer have the tainted status in the tf state. Can you please help us resolve this? (We can't move to a newer version of the Terraform CLI, as many modules depend on it and it would impact our production deployments.)
Error: Error making Read request on Azure ServiceBus Queue Authorization Rule "" (Queue "sample-check-queue" / Namespace "sample-check-bus" / Resource Group "My-RG"): servicebus.QueuesClient#GetAuthorizationRule: Invalid input: autorest/validation: validation failed: parameter=authorizationRuleName constraint=MinLength value="" details: value length must be greater than or equal to 1
azurerm_servicebus_queue_authorization_rule.que-sample-check-lsr: Refreshing state... [id=/subscriptions//resourcegroups/My-RG/providers/Microsoft.ServiceBus/namespaces/sample-check-bus/queues/sample-check-queue/authorizationrules/lsr]
Below is the service_bus.tf file code:
provider "azurerm" {
version = "=2.10.0"
features {}
}
provider "azurerm" {
features {}
alias = "cloud_operations"
}
resource "azurerm_servicebus_namespace" "service_bus" {
name = "sample-check-bus"
resource_group_name = "My-RG"
location = "West Europe"
sku = "Premium"
capacity = 1
zone_redundant = true
tags = {
source = "terraform"
}
}
resource "azurerm_servicebus_queue" "que-sample-check" {
name = "sample-check-queue"
resource_group_name = "My-RG"
namespace_name = azurerm_servicebus_namespace.service_bus.name
dead_lettering_on_message_expiration = true
requires_duplicate_detection = false
requires_session = false
enable_partitioning = false
default_message_ttl = "P15D"
lock_duration = "PT2M"
duplicate_detection_history_time_window = "PT15M"
max_size_in_megabytes = 1024
max_delivery_count = 05
}
resource "azurerm_servicebus_queue_authorization_rule" "que-sample-check-lsr" {
name = "lsr"
resource_group_name = "My-RG"
namespace_name = azurerm_servicebus_namespace.service_bus.name
queue_name = azurerm_servicebus_queue.que-sample-check.name
listen = true
send = true
}
resource "azurerm_servicebus_queue_authorization_rule" "que-sample-check-AsyncReportBG-AsncRprt" {
name = "AsyncReportBG-AsncRprt"
resource_group_name = "My-RG"
namespace_name = azurerm_servicebus_namespace.service_bus.name
queue_name = azurerm_servicebus_queue.que-sample-check.name
listen = true
send = true
manage = false
}
I tried the Terraform code below and could create the authorization rules successfully.
I followed the azurerm_servicebus_queue_authorization_rule page (Resources | hashicorp/azurerm | Terraform Registry) for the latest version of the hashicorp/azurerm provider.
The problem may also be related to the queue_name argument: in 3.x versions of the provider, this resource takes queue_id instead.
provider "azurerm" {
features {
resource_group {
prevent_deletion_if_contains_resources = false
}
}
}
resource "azurerm_resource_group" "example" {
name = "xxxx"
location = "xx"
}
provider "azurerm" {
features {}
alias = "cloud_operations"
}
resource "azurerm_servicebus_namespace" "service_bus" {
name = "sample-check-bus"
resource_group_name = azurerm_resource_group.example.name
location = azurerm_resource_group.example.location
sku = "Premium"
capacity = 1
zone_redundant = true
tags = {
source = "terraform"
}
}
resource "azurerm_servicebus_queue" "que-sample-check" {
name = "sample-check-queue"
#resource_group_name = "My-RG"
namespace_id = azurerm_servicebus_namespace.service_bus.id
#namespace_name = azurerm_servicebus_namespace.service_bus.name
dead_lettering_on_message_expiration = true
requires_duplicate_detection = false
requires_session = false
enable_partitioning = false
default_message_ttl = "P15D"
lock_duration = "PT2M"
duplicate_detection_history_time_window = "PT15M"
max_size_in_megabytes = 1024
max_delivery_count = 05
}
resource "azurerm_servicebus_queue_authorization_rule" "que-sample-check-lsr" {
name = "lsr"
#resource_group_name = "My-RG"
#namespace_name = azurerm_servicebus_namespace.service_bus.name
queue_id = azurerm_servicebus_queue.que-sample-check.id
#queue_name = azurerm_servicebus_queue.que-sample-check.name
listen = true
send = true
manage = false
}
resource "azurerm_servicebus_queue_authorization_rule" "que-sample-check-AsyncReportBG-AsncRprt" {
name = "AsyncReportBG-AsncRprt"
#resource_group_name = "My-RG"
#namespace_name = azurerm_servicebus_namespace.service_bus.name
queue_id = azurerm_servicebus_queue.que-sample-check.id
#queue_name = azurerm_servicebus_queue.que-sample-check.name
listen = true
send = true
manage = false
}
The authorization rules were created without error.
Please try giving the authorization rule named "lsr" a longer name, and also try creating one rule at a time in your case.
Thanks all for your inputs and suggestions.
The code is working fine now with azurerm provider version 2.56.0 and Terraform CLI version 0.12.19. Please let me know if you have any concerns.
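For reference, the provider pin that resolved the issue can be sketched as follows. It uses the Terraform 0.12 style already present in the question, where the version constraint lives in the provider block; only the version string differs from the original service_bus.tf.

```hcl
# Terraform 0.12 style provider pin.
provider "azurerm" {
  version = "=2.56.0" # version reported as working in this thread
  features {}
}
```

After changing the constraint, terraform init -upgrade is needed so that the newer provider release is downloaded.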

Az Function Elastic Premium - Azure Functions runtime is unreachable

I have an Azure Function that I need to run on an Elastic Premium plan. After deploying it, I see the following error:
Azure Functions runtime is unreachable
I've tried to solve it by following the Microsoft documentation, with no luck.
Here is what I have tried so far:
We checked that the storage account is created.
The Function's subnet already has the service endpoint for the storage account.
VNet integration is already enabled on the Function, and its subnet is already added to the storage firewall.
We added the required properties to the Function settings:
WEBSITE_CONTENTAZUREFILECONNECTIONSTRING = dynamically created (connection string to the storage account)
WEBSITE_CONTENTOVERVNET = 1
WEBSITE_CONTENTSHARE = dynamically created
WEBSITE_VNET_ROUTE_ALL = 1
Here is the documentation link:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-recover-storage-account
Everything worked fine on the Premium (P1v2) plan; the error began when I moved to Elastic Premium (EP1).
I am deploying it using Terraform.
Here is an example of the TF code we are using to deploy:
locals {
app_settings = {
FUNCTIONS_WORKER_RUNTIME = "python"
FUNCTION_APP_EDIT_MODE = "readonly"
WEBSITE_VNET_ROUTE_ALL = "1"
WEBSITE_CONTENTOVERVNET = "1"
}
}
module "az_service_plan_sample" {
source = "source module"
serviceplan_name = "planname"
resource_group_name = "RG Name"
region = "East US 2"
tier = "ElasticPremium"
size = "EP1"
kind = "elastic"
capacity = 40
per_site_scaling = false
depends_on = [
module.storage_account
]
}
module "storage_account_sample" {
source = "source module"
resource_group_name = "RG Name"
location = "East US 2"
name = "saname"
storage_account_replication_type = "GRS"
subnet_ids = [subnet_ids]
}
module "sample" {
source = "source module"
azure_function_name = "functionname"
resource_group_name = "RG Name"
storage_account_name = module.storage_account.storage-account-name
storage_account_access_key = module.storage_account.storage-account-primary-key
region = "East US 2"
subnet_id = subnet_ids
app_service_id = module.az_service_plan.service_plan_id
scope_role_storage_account = module.storage_account.storage-account-id
azure_function_version = "~4"
app_settings = local.app_settings
key_vault_reference_identity_id = azurerm_user_assigned_identity.az_func.id
pre_warmed_instance_count = 2
identity_type = "UserAssigned"
user_assigned_identityies = [{
id = azurerm_user_assigned_identity.az_func.id
principal_id = azurerm_user_assigned_identity.az_func.principal_id
}]
depends_on = [
module.az_service_plan_sample,
module.storage_account_sample,
azurerm_user_assigned_identity.az_func,
]
}
AFAIK, there is no single reason for the "Azure Functions runtime is unreachable" error. Please check the workarounds below.
We created a Function app on an Elastic Premium plan and it works fine at our end.
Please make sure that WEBSITE_CONTENTAZUREFILECONNECTIONSTRING is set to the same value as AzureWebJobsStorage, then stop and start the function app.
Also try setting pre_warmed_instance_count = 1 instead of 2, as mentioned in the Microsoft documentation:
The default pre-warmed instance count is 1, and for most scenarios this value should remain as 1.
For more information, please refer to the Azure Lessons article "Azure Function Runtime Unreachable".
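The advice about keeping the two connection strings identical can be sketched like this. The resource name azurerm_storage_account.func is hypothetical (the question wraps the storage account in a module), and depending on the resource or module you use, AzureWebJobsStorage may already be derived from the storage account arguments, in which case only the content share setting needs to be added.

```hcl
# Sketch only: one local feeds both settings so they cannot drift apart.
# "azurerm_storage_account.func" is a hypothetical resource name.
locals {
  storage_connection_string = azurerm_storage_account.func.primary_connection_string

  app_settings = {
    FUNCTIONS_WORKER_RUNTIME                 = "python"
    FUNCTION_APP_EDIT_MODE                   = "readonly"
    AzureWebJobsStorage                      = local.storage_connection_string
    WEBSITE_CONTENTAZUREFILECONNECTIONSTRING = local.storage_connection_string
    WEBSITE_CONTENTOVERVNET                  = "1"
    WEBSITE_VNET_ROUTE_ALL                   = "1"
  }
}
```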
When you use a Function on an Elastic Premium plan with VNet integration, you need to set one more property, vnet_route_all_enabled, to route outbound traffic from your Azure Function through the VNet. You also need to first create a file share in your storage account whose name matches the value of WEBSITE_CONTENTSHARE in your application settings.
You can check this doc to be sure: https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-vnet
Below is my suggested code:
locals {
app_settings = {
FUNCTIONS_WORKER_RUNTIME = "python"
FUNCTION_APP_EDIT_MODE = "readonly"
WEBSITE_VNET_ROUTE_ALL = "1"
WEBSITE_CONTENTOVERVNET = "1"
WEBSITE_CONTENTSHARE = "file-function"
}
}
module "az_service_plan_sample" {
source = "source module"
serviceplan_name = "planname"
resource_group_name = "RG Name"
region = "East US 2"
tier = "ElasticPremium"
size = "EP1"
kind = "elastic"
capacity = 40
per_site_scaling = false
depends_on = [
module.storage_account_sample
]
}
module "storage_account_sample" {
source = "source module"
resource_group_name = "RG Name"
location = "East US 2"
name = "saname"
storage_account_replication_type = "GRS"
subnet_ids = [subnet_ids]
}
resource "azurerm_storage_share" "share_file_ingest_function" {
name = "file-function"
storage_account_name = module.storage_account_sample.name
depends_on = [
module.storage_account_sample
]
}
module "sample" {
source = "source module"
azure_function_name = "functionname"
resource_group_name = "RG Name"
storage_account_name = module.storage_account_sample.storage-account-name
storage_account_access_key = module.storage_account_sample.storage-account-primary-key
region = "East US 2"
subnet_id = subnet_ids
app_service_id = module.az_service_plan_sample.service_plan_id
scope_role_storage_account = module.storage_account_sample.storage-account-id
azure_function_version = "~4"
app_settings = local.app_settings
key_vault_reference_identity_id = azurerm_user_assigned_identity.az_func.id
pre_warmed_instance_count = 2
vnet_route_all_enabled = true
identity_type = "UserAssigned"
user_assigned_identityies = [{
id = azurerm_user_assigned_identity.az_func.id
principal_id = azurerm_user_assigned_identity.az_func.principal_id
}]
depends_on = [
module.az_service_plan_sample,
module.storage_account_sample,
azurerm_user_assigned_identity.az_func,
]
}

Creating an Azure VM image with packer

I am trying to create an Azure VM image using Packer. My Packer template looks like this:
variable "version" {
type = string
default = "1.0.0"
}
variable "created_by" {
type = string
}
source "azure-arm" "development_subscription" {
azure_tags = {
CreatedBy = var.created_by
CreatedDate = formatdate("DD/MM/YYYY hh:mm:ss",timestamp())
}
image_offer = "WindowsServer"
image_publisher = "MicrosoftWindowsServer"
image_sku = "2022-datacenter-g2"
managed_image_name = "MyImage_${var.version}"
managed_image_resource_group_name = "Some-RG"
os_type = "Windows"
location = "ukwest"
# client_id = var.client_id
# client_secret = var.client_secret
subscription_id = "e8204745-e84f-4b2e-9e6f-545656fe0922"
vm_size = "Standard_D2s_v3"
winrm_insecure = true
winrm_timeout = "20m"
winrm_use_ssl = true
winrm_username = "packer"
}
However I keep on getting:
==> azure-arm.development_subscription: Waiting for WinRM to become available...
==> azure-arm.development_subscription: Timeout waiting for WinRM.
Other resources I've found online imply I should try increasing the timeout, but this VM doesn't seem likely to take longer than a few seconds to boot. Do I need to do something to disable the system firewall?
I was missing tenant_id. Once I added that, everything worked fine.
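A minimal sketch of the template with tenant_id added, assuming service principal credentials are passed in as variables. The variable names client_id, client_secret, tenant_id and subscription_id are illustrative, and the explicit communicator line follows the other answer in this thread rather than the original template.

```hcl
# Illustrative variables; supply values via -var, a var file or PKR_VAR_* env vars.
variable "client_id" {
  type = string
}
variable "client_secret" {
  type      = string
  sensitive = true
}
variable "tenant_id" {
  type = string
}
variable "subscription_id" {
  type = string
}

source "azure-arm" "development_subscription" {
  # Service principal authentication, now including the tenant.
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id

  image_publisher                   = "MicrosoftWindowsServer"
  image_offer                       = "WindowsServer"
  image_sku                         = "2022-datacenter-g2"
  managed_image_name                = "MyImage_1.0.0"
  managed_image_resource_group_name = "Some-RG"
  os_type                           = "Windows"
  location                          = "ukwest"
  vm_size                           = "Standard_D2s_v3"

  communicator   = "winrm"
  winrm_insecure = true
  winrm_timeout  = "20m"
  winrm_use_ssl  = true
  winrm_username = "packer"
}

build {
  sources = ["source.azure-arm.development_subscription"]
}
```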
I tried your code; it also got stuck while connecting to WinRM and timed out.
The major issue I found in your code is that you have not set communicator = "winrm". Without it, the WinRM port doesn't get opened and you cannot connect through it.
So I added it in the code below:
variable "version" {
type = string
default = "1.0.0"
}
variable "created_by" {
type = string
default = "ajay"
}
variable "client_secret" {
default = "XXXXXXXXXXXXXXXXXXXXXXXX"
}
variable "client_id" {
default = "XXXXXXXXXXXXXXXXXXXXXXXXXX"
}
source "azure-arm" "development_subscription" {
azure_tags = {
CreatedBy = var.created_by
CreatedDate = formatdate("DD/MM/YYYY hh:mm:ss", timestamp())
}
image_offer = "WindowsServer"
image_publisher = "MicrosoftWindowsServer"
image_sku = "2022-datacenter-g2"
managed_image_name = "MyImage_${var.version}"
managed_image_resource_group_name = "ansumantest"
os_type = "Windows"
location = "ukwest"
client_id = var.client_id
client_secret = var.client_secret
subscription_id = "XXXXXXXXXXXXXXXXXXXX"
vm_size = "Standard_D2s_v3"
communicator = "winrm"
winrm_insecure = true
winrm_timeout = "20m"
winrm_use_ssl = true
winrm_username = "packer"
}
build {
name = "learn-packer"
sources = [
"source.azure-arm.development_subscription"
]
}

Creating Azure Automation DSC configuration and DSC node configuration using Terraform doesn't seem to be working

As the very first step of my release process, I run the following Terraform code:
resource "azurerm_automation_account" "automation_account" {
for_each = data.terraform_remote_state.pod_bootstrap.outputs.ops_rg
name = "${local.automation_account_prefix}-${each.key}"
location = each.key
resource_group_name = each.value.name
sku_name = "Basic"
tags = {
environment = "development"
}
}
The automation accounts are created as expected and I can see them in the Azure portal.
I also have Terraform code that creates a couple of Windows VMs; each VM creation is accompanied by the following:
resource "azurerm_virtual_machine_extension" "dsc" {
name = "DevOpsDSC"
virtual_machine_id = var.vm_id
publisher = "Microsoft.Powershell"
type = "DSC"
type_handler_version = "2.83"
settings = <<SETTINGS_JSON
{
"configurationArguments": {
"RegistrationUrl": "${var.dsc_server_endpoint}",
"NodeConfigurationName": "${var.dsc_config}",
"ConfigurationMode": "${var.dsc_mode}",
"ConfigurationModeFrequencyMins": 15,
"RefreshFrequencyMins": 30,
"RebootNodeIfNeeded": false,
"ActionAfterReboot": "continueConfiguration",
"AllowModuleOverwrite": true
}
}
SETTINGS_JSON
protected_settings = <<PROTECTED_SETTINGS_JSON
{
"configurationArguments": {
"RegistrationKey": {
"UserName": "PLACEHOLDER_DONOTUSE",
"Password": "${var.dsc_primary_access_key}"
}
}
}
PROTECTED_SETTINGS_JSON
}
As a result, a VM extension is created for each VM and its status says that provisioning succeeded.
For the next step, I run the following Terraform code:
resource "azurerm_automation_dsc_configuration" "iswebserver" {
for_each = data.terraform_remote_state.pod_bootstrap.outputs.ops_rg
name = "iswebserver"
resource_group_name = each.value.name
automation_account_name = data.terraform_remote_state.ops.outputs.automation_account[each.key].name
location = each.key
content_embedded = "configuration iswebserver {}"
}
resource "azurerm_automation_dsc_nodeconfiguration" "iswebserver" {
for_each = data.terraform_remote_state.pod_bootstrap.outputs.ops_rg
name = "iswebserver.localhost"
resource_group_name = each.value.name
automation_account_name = data.terraform_remote_state.ops.outputs.automation_account[each.key].name
depends_on = [azurerm_automation_dsc_configuration.iswebserver]
content_embedded = file("${path.cwd}/iswebserver.mof")
}
The mof file content is the following:
/*
#TargetNode='IsWebServer'
#GeneratedBy=P120bd0
#GenerationDate=02/25/2021 17:33:16
#GenerationHost=D-MJ05UA54
*/
instance of MSFT_RoleResource as $MSFT_RoleResource1ref
{
ResourceID = "[WindowsFeature]IIS";
IncludeAllSubFeature = True;
Ensure = "Present";
SourceInfo = "D:\\DSC\\testconfig.ps1::5::9::WindowsFeature";
Name = "Web-Server";
ModuleName = "PsDesiredStateConfiguration";
ModuleVersion = "1.0";
ConfigurationName = "TestConfig";
};
instance of OMI_ConfigurationDocument
{
Version="2.0.0";
MinimumCompatibleVersion = "1.0.0";
CompatibleVersionAdditionalProperties= {"Omi_BaseResource:ConfigurationName"};
Author="P120bd0";
GenerationDate="02/25/2021 17:33:16";
GenerationHost="D-MJ05UA54";
Name="TestConfig";
};
After running the code, the configuration is created as expected, and clicking the configuration entry in the UI grid shows that the node configuration is created as well. My expectation was that for each VM I would see a node configured to run the configuration provided in the mof file, but the Nodes UI is empty.
So I tried to configure a node manually to connect all the pieces together, and that fails as well.
So I am totally confused. On the one hand, there is azurerm_virtual_machine_extension, which creates the extension and binds it to the automation account. In addition, there are azurerm_automation_dsc_configuration and azurerm_automation_dsc_nodeconfiguration, which create the configuration and the node configuration. But the bottom line is that I cannot connect all those dots to create a node.
Just to confirm that the configuration is valid, I created an additional VM without using azurerm_virtual_machine_extension, and I was able to successfully add this VM to the created node configuration.
The problem was in the dsc_configuration parameter of azurerm_virtual_machine_extension (presumably the value used for NodeConfigurationName). The value needs to be the same as the name property of the azurerm_automation_dsc_nodeconfiguration resource.
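The fix can be sketched as follows, assuming the node configuration name from the question ("iswebserver.localhost"). The local dsc_node_configuration_name is a hypothetical name introduced for illustration, and jsonencode replaces the heredoc JSON only for readability.

```hcl
locals {
  # Hypothetical local; must match the `name` of the
  # azurerm_automation_dsc_nodeconfiguration resource exactly.
  dsc_node_configuration_name = "iswebserver.localhost"
}

resource "azurerm_virtual_machine_extension" "dsc" {
  name                 = "DevOpsDSC"
  virtual_machine_id   = var.vm_id
  publisher            = "Microsoft.Powershell"
  type                 = "DSC"
  type_handler_version = "2.83"

  settings = jsonencode({
    configurationArguments = {
      RegistrationUrl       = var.dsc_server_endpoint
      NodeConfigurationName = local.dsc_node_configuration_name
      ConfigurationMode     = var.dsc_mode
    }
  })

  protected_settings = jsonencode({
    configurationArguments = {
      RegistrationKey = {
        UserName = "PLACEHOLDER_DONOTUSE"
        Password = var.dsc_primary_access_key
      }
    }
  })
}
```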

Terraform - can source modules use different datasources for each instance OS?

I have an "aws_instance" resource (in the source module) that worked fine to bootstrap CentOS, CoreOS or Ubuntu instances via user_data = data.template_file.user-data.rendered, using the following data block:
data "template_file" "user-data" {
template = file("${path.module}/bootstrap-${var.os_distro}.sh")
vars = {
access_port = var.access_port
service_port1 = var.service_port1
docker_api_port = var.docker_api_port
}
}
The associated file (e.g. bootstrap-centos.sh) was then loaded and rendered depending on the value of the os_distro variable in the root module.
All was well until I switched from CoreOS to Fedora CoreOS. The issue is that I now need to call a different data source first (ct_config) to transpile my bootstrap-fcos.yaml file for Ignition.
Is there any logic I can use in the source module to use a different data source when I want to deploy a Fedora CoreOS AMI? Taking the easy way and creating a new source module just for this new OS seems to go against the power of Terraform modules.
The salient parts of the source and root modules are:
SOURCE MODULE
resource "aws_instance" "my-ec2-instance" {
count = var.node_count
availability_zone = element(var.azs, count.index)
subnet_id = var.aws_subnet_id
private_ip = length(var.private_ips) > 0 ? element(var.private_ips, count.index) : var.private_ip
ami = var.machine_ami
instance_type = var.aws_instance_type
vpc_security_group_ids = [aws_security_group.my-sg-group.id]
key_name = var.key_name
user_data = data.template_file.user-data.rendered
monitoring = false
ebs_optimized = false
associate_public_ip_address = var.public_ip
root_block_device {
volume_type = var.root_volume_type
volume_size = var.root_volume_size
delete_on_termination = true
}
}
data "template_file" "user-data" {
template = file("${path.module}/bootstrap-${var.os_distro}.sh")
vars = {
access_port = var.access_port
service_port1 = var.service_port1
docker_api_port = var.docker_api_port
}
}
variable "user_data" {
type = string
description = "userdata used to bootstrap the node"
}
variable "os_distro" {
type = string
description = "choose centos coreos or ubuntu to load either bootstrap-centos.sh, bootstrap-ubuntu.sh or bootstrap-coreos.sh from this module"
}
ROOT MODULE
module "demo_coreos_stg_ec2" {
source = ".../aws/ec2" # as per source module code above
node_count = local.node_count
azs = local.azs
aws_subnet_id = "subnet-c18c0fbb"
private_ips = ["172.31.16.20"]
machine_ami = data.aws_ami.fcos-stable-latest.id # latest stable fedora coreos release
aws_instance_type = "t2.micro"
key_name = "keys-2020"
user_data = data.ct_config.boot_config.rendered # convert the boot config in yaml to the ignition config in json via ct (config transpiler)
os_distro = var.os_distro # enables either bootstrap-centos.sh, bootstrap-ubuntu.sh or bootstrap-coreos.sh from this module
}
data "ct_config" "boot_config" {
content = data.template_file.fcos.rendered
strict = true
pretty_print = true
}
data "template_file" "fcos" {
template = file("${path.module}/bootstrap-fcos.yaml")
vars = {
access_port = var.access_port
service_port1 = var.service_port1
docker_api_port = var.docker_api_port
}
}
Notice that for Fedora CoreOS the rendered bootstrap-fcos.yaml template has to pass through the ct_config data source before it can be used as user data. Previously, all three OSes could use template_file alone to load their .sh file.
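One possible approach, sketched under assumptions rather than taken from an answer in this thread, is to keep a single source module and make the ct_config data source conditional with count. The os_distro value "fcos" and the local names below are hypothetical, and the built-in templatefile() function stands in for the template_file data source. The source module would also need to declare the ct provider.

```hcl
locals {
  # Hypothetical distro key for Fedora CoreOS.
  is_fcos = var.os_distro == "fcos"

  # YAML template for Fedora CoreOS, shell script for everything else.
  bootstrap_file = local.is_fcos ? "bootstrap-fcos.yaml" : "bootstrap-${var.os_distro}.sh"

  # Render the chosen template once with the shared variables.
  bootstrap_rendered = templatefile("${path.module}/${local.bootstrap_file}", {
    access_port     = var.access_port
    service_port1   = var.service_port1
    docker_api_port = var.docker_api_port
  })
}

# Only instantiated for Fedora CoreOS: transpiles the YAML into Ignition JSON.
data "ct_config" "boot_config" {
  count        = local.is_fcos ? 1 : 0
  content      = local.bootstrap_rendered
  strict       = true
  pretty_print = true
}

locals {
  # Ignition JSON for Fedora CoreOS, the rendered shell script otherwise.
  user_data = local.is_fcos ? data.ct_config.boot_config[0].rendered : local.bootstrap_rendered
}
```

The aws_instance resource would then set user_data = local.user_data, and the root module only has to pass os_distro. A simpler alternative is to let the module accept an already rendered user_data string and do the rendering in the root module, at the cost of repeating the template logic in every caller.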
