Azure AKS Terraform - how to specify VM Size

In my Azure subscription, I'm trying to create an AKS cluster using Terraform.
My main.tf looks like this:
## Azure resource provider ##
provider "azurerm" {
  version = "=1.36.1"
}

## Azure resource group for the kubernetes cluster ##
resource "azurerm_resource_group" "aks_demo" {
  name     = var.resource_group_name
  location = var.location
}

## AKS kubernetes cluster ##
resource "azurerm_kubernetes_cluster" "aks_demo" {
  name                = var.cluster_name
  resource_group_name = azurerm_resource_group.aks_demo.name
  location            = azurerm_resource_group.aks_demo.location
  dns_prefix          = var.dns_prefix

  linux_profile {
    admin_username = var.admin_username

    ## SSH key is generated using the "tls_private_key" resource
    ssh_key {
      key_data = "${trimspace(tls_private_key.key.public_key_openssh)} ${var.admin_username}@azure.com"
    }
  }

  agent_pool_profile {
    name            = "default"
    count           = var.agent_count
    vm_size         = "Standard_D2"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }

  service_principal {
    client_id     = var.client_id
    client_secret = var.client_secret
  }

  tags = {
    Environment = "Production"
  }
}

## Private key for the kubernetes cluster ##
resource "tls_private_key" "key" {
  algorithm = "RSA"
}

## Save the private key in the local workspace ##
resource "null_resource" "save-key" {
  triggers = {
    key = tls_private_key.key.private_key_pem
  }

  provisioner "local-exec" {
    command = <<EOF
mkdir -p ${path.module}/.ssh
echo "${tls_private_key.key.private_key_pem}" > ${path.module}/.ssh/id_rsa
chmod 0600 ${path.module}/.ssh/id_rsa
EOF
  }
}
## Outputs ##
# Example attributes available for output
output "id" {
  value = azurerm_kubernetes_cluster.aks_demo.id
}

output "client_key" {
  value = azurerm_kubernetes_cluster.aks_demo.kube_config.0.client_key
}

output "client_certificate" {
  value = azurerm_kubernetes_cluster.aks_demo.kube_config.0.client_certificate
}

output "cluster_ca_certificate" {
  value = azurerm_kubernetes_cluster.aks_demo.kube_config.0.cluster_ca_certificate
}

output "kube_config" {
  value = azurerm_kubernetes_cluster.aks_demo.kube_config_raw
}

output "host" {
  value = azurerm_kubernetes_cluster.aks_demo.kube_config.0.host
}

output "configure" {
  value = <<CONFIGURE
Run the following commands to configure kubernetes client:
$ terraform output kube_config > ~/.kube/aksconfig
$ export KUBECONFIG=~/.kube/aksconfig
Test configuration using kubectl
$ kubectl get nodes
CONFIGURE
}
My variables.tf looks like this:
## Azure config variables ##
variable "client_id" {}
variable "client_secret" {}

variable "location" {
  default = "Central US"
}

## Resource group variables ##
variable "resource_group_name" {
  default = "aksdemo-rg"
}

## AKS kubernetes cluster variables ##
variable "cluster_name" {
  default = "aksdemo1"
}

variable "vm_size" {
  default = "Standard_A0"
}

variable "agent_count" {
  default = 3
}

variable "dns_prefix" {
  default = "aksdemo"
}

variable "admin_username" {
  default = "demo"
}
When I run terraform apply, I get this error:
Error: Error creating Managed Kubernetes Cluster "aksdemo1" (Resource Group "aksdemo-rg"):
containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 --
Original Error: Code="BadRequest"
Message="The VM size of AgentPoolProfile:default is not allowed in your subscription in location 'centralus'. The available VM sizes are Standard_A2,Standard_A2_v2,Standard_A2m_v2,Standard_A3,Standard_A4,Standard_A4_v2,Standard_A4m_v2,
Standard_A5,Standard_A6,Standard_A7,Standard_A8_v2,Standard_A8m_v2,Standard_B12ms,Standard_B16ms,Standard_B20ms,Standard_B2ms,Standard_B2s,Standard_B4ms,Standard_B8ms,Standard_D11_v2,Standard_D12_v2,
Standard_D13_v2,Standard_D14_v2,Standard_D15_v2,Standard_D16_v3,Standard_D16s_v3,Standard_D1_v2,Standard_D2_v2,Standard_D2_v3,Standard_D2s_v3,Standard_D32_v3,Standard_D32s_v3,Standard_D3_v2,Standard_D48_v3,
Standard_D48s_v3,Standard_D4_v2,Standard_D4_v3,Standard_D4s_v3,Standard_D5_v2,Standard_D64_v3,Standard_D64s_v3,Standard_D8_v3,Standard_D8s_v3,Standard_DS1,Standard_DS11,Standard_DS11_v2,Standard_DS12,Standard_DS12_v2,Standard_DS13,Standard_DS13-2_v2,Standard_DS13-4_v2,Standard_DS13_v2,Standard_DS14,Standard_DS14-4_v2,Standard_DS14-8_v2,Standard_DS14_v2,Standard_DS15_v2,Standard_DS1_v2,Standard_DS2,Standard_DS2_v2,Standard_DS3,Standard_DS3_v2,Standard_DS4,Standard_DS4_v2,Standard_DS5_v2,Standard_E16_v3,Standard_E16s_v3,Standard_E2_v3,Standard_E2s_v3,Standard_E32-16s_v3,Standard_E32-8s_v3,Standard_E32_v3,Standard_E32s_v3,Standard_E48_v3,Standard_E48s_v3,Standard_E4_v3,Standard_E4s_v3,Standard_E64-16s_v3,Standard_E64-32s_v3,Standard_E64_v3,Standard_E64i_v3,Standard_E64is_v3,Standard_E64s_v3,Standard_E8_v3,Standard_E8s_v3,Standard_F16,Standard_F16s,Standard_F16s_v2,Standard_F2,Standard_F2s,Standard_F2s_v2,Standard_F32s_v2,Standard_F4,Standard_F48s_v2,Standard_F4s,Standard_F4s_v2,Standard_F64s_v2,Standard_F72s_v2,Standard_F8,
Standard_F8s,Standard_F8s_v2
For more details, please visit https://aka.ms/cpu-quota"
This is confusing to me, as there is clearly a variable named vm_size.
What can I change in order for this to work?

As I can see from the code you provided and the error you got, the mistake is in your code.
This is what your code has:
agent_pool_profile {
  name            = "default"
  count           = var.agent_count
  vm_size         = "Standard_D2"
  os_type         = "Linux"
  os_disk_size_gb = 30
}
It should look like this instead, so that your variable is actually used for the VM size:
agent_pool_profile {
  name            = "default"
  count           = var.agent_count
  vm_size         = var.vm_size
  os_type         = "Linux"
  os_disk_size_gb = 30
}
Also, make sure the VM size you choose is appropriate for your requirements and available to your subscription, for example one of the sizes listed in the error message, just like in the Terraform example.

The error message is telling you that you're trying to use a VM size (or VM type, if you will) that's not available for your subscription in that location. It's also giving you all the VM sizes you can choose from.
Note that you have probably copy-pasted this:
agent_pool_profile {
  name            = "default"
  count           = var.agent_count
  vm_size         = "Standard_D2"
  os_type         = "Linux"
  os_disk_size_gb = 30
}
The VM size is hardcoded there, so your default Standard_A0 value is not being picked up. Note that Standard_A0 is not in the allowed list from the error message either, so the default needs to change as well.
You have more than one way to debug here: first, make sure the right value is actually being used; second, change the VM size to one from the allowed list and see if that works.
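Putting both fixes together, a minimal sketch (Standard_DS2_v2 below is just an example pick from the allowed list in your error message, not a recommendation):

variable "vm_size" {
  # Example pick from the list the error message says is allowed
  default = "Standard_DS2_v2"
}

agent_pool_profile {
  name            = "default"
  count           = var.agent_count
  vm_size         = var.vm_size
  os_type         = "Linux"
  os_disk_size_gb = 30
}

You can also check what is actually available to your subscription with az vm list-skus --location centralus --output table before settling on a size.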

Related

Error: waiting for creation of MsSql Database - Terraform failure

I have had to re-develop my pipeline for building out my infrastructure to use local agent pools, and to change my Ubuntu (bash) code to Windows (PowerShell) code.
I am now building out my infrastructure, and it is failing on the most basic of tasks.
I have created my SQL Server and that seems to be OK. I have also got my logging system infra done OK, but I am really struggling to build a database on that SQL Server.
Here is my code at its most basic. The server builds OK.
resource "azurerm_mssql_server" "main" {
name = local.sqlServerName
resource_group_name = local.resourceGroupName
location = var.location
version = "12.0"
minimum_tls_version = "1.2"
administrator_login = var.sql_administrator_login
administrator_login_password = var.sql_administrator_login_password
tags = var.tags
}
resource "azurerm_sql_active_directory_administrator" "main" {
server_name = azurerm_mssql_server.main.name
resource_group_name = local.resourceGroupName
login = local.sql_ad_login
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = local.object_id
}
resource "azurerm_sql_firewall_rule" "main" {
name = var.sql_firewall_rule
resource_group_name = local.resourceGroupName
server_name = azurerm_mssql_server.main.name
start_ip_address = "0.0.0.0"
end_ip_address = "0.0.0.0"
}
resource "azurerm_mssql_database" "main" {
name = "${local.raptSqlDatabaseName}-${var.environment}"
server_id = azurerm_mssql_server.main.id
min_capacity = 0.5
max_size_gb = 100
zone_redundant = false
collation = "SQL_Latin1_General_CP1_CI_AS"
sku_name = "GP_S_Gen5_2"
auto_pause_delay_in_minutes = 60
create_mode = "Default"
}
I get an error saying:
Error: waiting for creation of MsSql Database "xxx-xxx-xxx-xxx-Prod" (MsSql Server Name "xxx-xxx-xxx-prod" / Resource Group "rg-xxx-xxx-xxx"): Code="InternalServerError" Message="An unexpected error occured while processing the request.
Before I had to redesign it all in PowerShell instead of bash and use a local pool, this worked fine and I never had a problem.
I found the issue below, which reports the same error, but nothing else in it seems to match my situation. It is odd, because I can build all my other infra fine from the same main.tf file.
https://github.com/hashicorp/terraform-provider-azurerm/issues/13194
I am also noticing that Terraform output is not working:
Here is my output file:
output "sql_server_name" {
value = azurerm_mssql_server.main.fully_qualified_domain_name
}
output "sql_server_user" {
value = azurerm_mssql_server.main.administrator_login
}
output "sql_server_password" {
value = azurerm_mssql_server.main.administrator_login_password
sensitive = true
}
#output "cl_sql_database_name" {
# value = azurerm_mssql_database.cl.name
#}
#output "rapt_sql_database_name" {
# value = azurerm_mssql_database.rapt.name
#}
output "app_insights_instrumentation_key" {
value = azurerm_application_insights.main.instrumentation_key
}
Is there any chance this is linked?
Please use the latest Terraform version (1.1.13 at the time of writing) and azurerm provider (2.92.0), as there were some bugs in previous Azure API versions that resulted in a 500 error code; these have been fixed in recent versions, as mentioned in this GitHub issue. Also note that you only get the outputs after a successful apply; they won't get stored if there is an error.
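If you want to pin those versions in the configuration itself, a minimal sketch (using the versions mentioned above) would be:

terraform {
  required_version = ">= 1.1.13"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 2.92.0"
    }
  }
}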
I tested the same code with the latest versions on both PowerShell and Bash, as below:
provider "azurerm" {
features {}
}
data "azurerm_client_config" "current" {}
locals {
resourceGroupName="ansumantest"
sqlServerName="ansumantestsql"
sql_ad_login="sqladmin"
object_id= data.azurerm_client_config.current.object_id
raptSqlDatabaseName="ansserverdb"
}
variable "location" {
default="eastus"
}
variable "sql_administrator_login" {
default="4dm1n157r470r"
}
variable "sql_administrator_login_password" {
default="4-v3ry-53cr37-p455w0rd"
}
variable "sql_firewall_rule" {
default="ansumantestfirewall"
}
variable "environment" {
default="test"
}
resource "azurerm_mssql_server" "main" {
name = local.sqlServerName
resource_group_name = local.resourceGroupName
location = var.location
version = "12.0"
minimum_tls_version = "1.2"
administrator_login = var.sql_administrator_login
administrator_login_password = var.sql_administrator_login_password
}
resource "azurerm_sql_active_directory_administrator" "main" {
server_name = azurerm_mssql_server.main.name
resource_group_name = local.resourceGroupName
login = local.sql_ad_login
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = local.object_id
}
resource "azurerm_sql_firewall_rule" "main" {
name = var.sql_firewall_rule
resource_group_name = local.resourceGroupName
server_name = azurerm_mssql_server.main.name
start_ip_address = "0.0.0.0"
end_ip_address = "0.0.0.0"
}
resource "azurerm_mssql_database" "main" {
name = "${local.raptSqlDatabaseName}-${var.environment}"
server_id = azurerm_mssql_server.main.id
min_capacity = 0.5
max_size_gb = 100
zone_redundant = false
collation = "SQL_Latin1_General_CP1_CI_AS"
sku_name = "GP_S_Gen5_2"
auto_pause_delay_in_minutes = 60
create_mode = "Default"
}
Output.tf
output "sql_server_name" {
value = azurerm_mssql_server.main.fully_qualified_domain_name
}
output "sql_server_user" {
value = azurerm_mssql_server.main.administrator_login
}
output "sql_server_password" {
value = azurerm_mssql_server.main.administrator_login_password
sensitive = true
}
output "cl_sql_database_name" {
value = azurerm_mssql_database.main.name
}
Output: (screenshot of the successful apply and its outputs omitted)
Updating Terraform to the latest version didn't really help...
I turned on logging with TF_LOG (Terraform's logging environment variable) and watched the activity logs in the resource group. I noticed it was building the DB server, then there was an error with AAD, and the DB server build failed after that. So I removed the AAD work from my main.tf and re-ran the pipeline, and then it worked fine. Phew...

Terraform tried creating an "implicit dependency" but the next stage of my code still fails to find the Azure resource group just created

I would be grateful for any assistance. I thought I had nailed this one when I stumbled across the following link...
Creating a resource group with terraform in azure: Cannot find resource group directly after creating it
However, the next stage of my code is still failing...
Error: Code="ResourceGroupNotFound" Message="Resource group 'ShowTell' could not be found
# We strongly recommend using the required_providers block to set the
# Azure Provider source and version being used
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=2.64.0"
    }
  }
}

# Configure the Microsoft Azure Provider
provider "azurerm" {
  features {}
}

variable "resource_group_name" {
  type        = string
  default     = "ShowTell"
  description = ""
}

# Create your resource group
resource "azurerm_resource_group" "example" {
  name     = var.resource_group_name
  location = "UK South"
}

# Should be accessible from LukesContainer.uksouth.azurecontainer.io
resource "azurerm_container_group" "LukesContainer" {
  name                = "LukesContainer"
  location            = "UK South"
  resource_group_name = "${var.resource_group_name}"
  ip_address_type     = "public"
  dns_name_label      = "LukesContainer"
  os_type             = "Linux"

  container {
    name   = "hello-world"
    image  = "microsoft/aci-helloworld:latest"
    cpu    = "0.5"
    memory = "1.5"

    ports {
      port     = "443"
      protocol = "TCP"
    }
  }

  container {
    name   = "sidecar"
    image  = "microsoft/aci-tutorial-sidecar"
    cpu    = "0.5"
    memory = "1.5"
  }

  tags = {
    environment = "testing"
  }
}
In order to create an implicit dependency you must refer directly to the object that the dependency relates to. In your case, that means deriving the resource group name from the resource group object itself, rather than from the variable you'd used to configure that object:
resource "azurerm_container_group" "LukesContainer" {
name = "LukesContainer"
location = "UK South"
resource_group_name = azurerm_resource_group.example.name
# ...
}
With the configuration you included in your question, both the resource group and the container group depend on var.resource_group_name, but there was no dependency between azurerm_container_group.LukesContainer and azurerm_resource_group.example, so Terraform was free to create those two objects in either order.
By deriving the container group's resource group name from the resource group object you tell Terraform that the resource group must be processed first, and then its results used to populate the container group.
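If for some reason you can't derive the value from the object itself, an explicit depends_on achieves the same ordering, although the implicit reference above is the more idiomatic fix. A sketch:

resource "azurerm_container_group" "LukesContainer" {
  name                = "LukesContainer"
  location            = "UK South"
  resource_group_name = var.resource_group_name

  # Explicitly force the resource group to be created first
  depends_on = [azurerm_resource_group.example]
  # ...
}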

Terraform - AKS Private Cloud | Infinite wait on helm release

I am trying to create a private cloud on AKS with Terraform.
The public route seemed to work fine, and I am now putting in the security pieces step by step.
After adding the networking resources (azurerm_virtual_network, azurerm_subnet), my Helm deployment seems to hang.
There are no logs, it's just an infinite wait:
helm_release.ingress: Still creating... [11m0s elapsed] (this is a simple NGINX ingress controller)
resource "azurerm_virtual_network" "foo_network" {
name = "${var.prefix}-network"
location = azurerm_resource_group.foo_group.location
resource_group_name = azurerm_resource_group.foo_group.name
address_space = ["10.1.0.0/16"]
}
resource "azurerm_subnet" "internal" {
name = "internal"
virtual_network_name = azurerm_virtual_network.foo_network.name
resource_group_name = azurerm_resource_group.foo_group.name
address_prefixes = ["10.1.0.0/22"]
}
Any pointers on how I should debug this? The lack of logs is making it difficult to understand what is going on.
Complete Script
provider "azurerm" {
features {}
}
resource "azurerm_resource_group" "foo" {
name = "${var.prefix}-k8s-resources"
location = var.location
}
resource "azurerm_kubernetes_cluster" "foo" {
name = "${var.prefix}-k8s"
location = azurerm_resource_group.foo.location
resource_group_name = azurerm_resource_group.foo.name
dns_prefix = "${var.prefix}-k8s"
default_node_pool {
name = "system"
node_count = 1
vm_size = "Standard_D4s_v3"
}
identity {
type = "SystemAssigned"
}
addon_profile {
aci_connector_linux {
enabled = false
}
azure_policy {
enabled = false
}
http_application_routing {
enabled = false
}
kube_dashboard {
enabled = true
}
oms_agent {
enabled = false
}
}
}
provider "kubernetes" {
version = "~> 1.11.3"
load_config_file = false
host = azurerm_kubernetes_cluster.foo.kube_config.0.host
username = azurerm_kubernetes_cluster.foo.kube_config.0.username
password = azurerm_kubernetes_cluster.foo.kube_config.0.password
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.foo.kube_config.0.cluster_ca_certificate)
}
provider "helm" {
# Use provider with Helm 3.x support
version = "~> 1.2.2"
}
resource "null_resource" "configure_kubectl" {
provisioner "local-exec" {
command = "az aks get-credentials --resource-group ${azurerm_resource_group.foo.name} --name ${azurerm_kubernetes_cluster.foo.name} --overwrite-existing"
environment = {
KUBECONFIG = ""
}
}
depends_on = [azurerm_kubernetes_cluster.foo]
}
resource "helm_release" "ingress" {
name = "ingress-foo"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
timeout = 3000
depends_on = [null_resource.configure_kubectl]
}
The best way to debug this is to be able to kubectl into the AKS cluster. (AKS should have documentation on how to set up kubectl.)
Then, play around with kubectl get pods -A and see if anything jumps out as being wrong. Specifically, look for nginx-ingress pods that are not in a Running status.
If there are such pods, debug further with kubectl describe pod <pod_name> or kubectl logs -f <pod_name>, depending on whether the issue happens after the container has successfully started up or not.
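One more thing worth checking, though this is an assumption on my part rather than something visible in the question: the provider "helm" block above contains no connection settings, so the Helm provider may be talking to whatever cluster your default kubeconfig points at rather than the new AKS cluster. Wiring it to the cluster explicitly, the same way the kubernetes provider is wired, would look roughly like this:

provider "helm" {
  version = "~> 1.2.2"

  kubernetes {
    # Connect to the AKS cluster directly instead of relying on a kubeconfig file
    load_config_file       = false
    host                   = azurerm_kubernetes_cluster.foo.kube_config.0.host
    client_certificate     = base64decode(azurerm_kubernetes_cluster.foo.kube_config.0.client_certificate)
    client_key             = base64decode(azurerm_kubernetes_cluster.foo.kube_config.0.client_key)
    cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.foo.kube_config.0.cluster_ca_certificate)
  }
}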

Could not read output attribute from remote state datasource

I am new to Terraform, so I will attempt to explain to the best of my ability. Terraform will not read the output from the state file and use that value in another file.
I have searched the internet for everything I could find to see if anyone has had this problem and how they fixed it.
###vnet.tf
#Remote state pulling data from the bastion resource group state
data "terraform_remote_state" "network" {
  backend = "azurerm"

  config = {
    storage_account_name = "terraformstatetracking"
    container_name       = "bastionresourcegroups"
    key                  = "terraform.terraformstate"
  }
}

#creating virtual network and putting that network in resource group created by bastion.tf file
module "quannetwork" {
  source              = "Azure/network/azurerm"
  resource_group_name = "data.terraform_remote_state.network.outputs.quan_netwk"
  location            = "centralus"
  vnet_name           = "quan"
  address_space       = "10.0.0.0/16"
  subnet_prefixes     = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  subnet_names        = ["subnet1", "subnet2", "subnet3"]

  tags = {
    environment = "quan"
    costcenter  = "it"
  }
}

terraform {
  backend "azurerm" {
    storage_account_name = "terraformstatetracking"
    container_name       = "quannetwork"
    key                  = "terraform.terraformstate"
  }
}
###resourcegroups.tf
# Create a resource group
#Bastion
resource "azurerm_resource_group" "cm" {
  name     = "${var.prefix}cm.RG"
  location = "${var.location}"
  tags     = "${var.tags}"
}

#Bastion1
resource "azurerm_resource_group" "network" {
  name     = "${var.prefix}network.RG"
  location = "${var.location}"
  tags     = "${var.tags}"
}

#bastion2
resource "azurerm_resource_group" "storage" {
  name     = "${var.prefix}storage.RG"
  location = "${var.location}"
  tags     = "${var.tags}"
}

terraform {
  backend "azurerm" {
    storage_account_name = "terraformstatetracking"
    container_name       = "bastionresourcegroups"
    key                  = "terraform.terraformstate"
  }
}
###outputs.tf
output "quan_netwk" {
description = "Quan Network Resource Group"
value = "${azurerm_resource_group.network.id}"
}
When running the vnet.tf code, it should read the output from outputs.tf, which is stored in the state file in the Azure backend storage account, and use that value for the resource_group_name in the quannetwork module. Instead, it creates a resource group literally named data.terraform_remote_state.network.outputs.quan_netwk. Any help would be greatly appreciated.
First, your module quannetwork expects a resource group name string for resource_group_name, not the resource group ID, so the quan_netwk output should expose azurerm_resource_group.network.name rather than its id.
Second, when you want to reference something from the remote state, do not just put the expression inside double quotes as a literal string; the right format is below:
resource_group_name = "${data.terraform_remote_state.network.outputs.quan_netwk}"
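Putting both points together, a sketch of the corrected files (assuming the module really does want the plain resource group name):

###outputs.tf
#Expose the name, since the module expects a name rather than an ID
output "quan_netwk" {
  description = "Quan Network Resource Group"
  value       = "${azurerm_resource_group.network.name}"
}

###vnet.tf
module "quannetwork" {
  source              = "Azure/network/azurerm"
  resource_group_name = "${data.terraform_remote_state.network.outputs.quan_netwk}"
  # ... rest unchanged
}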

How Do I Avoid Repeating A Variable In Terraform?

Terraform doesn't allow you to interpolate variables within the variables file, otherwise you get the error:
Error: Variables not allowed

  on variables.tf line 9, in variable "resource_group_name":
   9: default = "${var.prefix}-terraform-dev_rg"

Variables may not be used here.
This then means I end up duplicating the value of the prefix in my variables.tf file when I try to create the name for the resource group.
Is there a nice way around this to avoid duplicating the value of the variable?
variables.tf
variable "prefix" {
description = "The prefix used for all resources in this plan"
default = "terraform-dev"
}
variable resource_group_name {
type = "string"
default = "terraform-dev_rg"
}
variable resource_group_location {
type = "string"
default = "eastus"
}
main.tf
# Configure the Microsoft Azure Provider
provider "azurerm" {
  version = "=1.28.0"
}

# Create a resource group
resource "azurerm_resource_group" "resource-group" {
  name     = var.resource_group_name
  location = var.resource_group_location
}

#Create an application gateway with web app firewall
module "firewall" {
  source                  = "./firewall"
  resource_group_name     = var.resource_group_name
  resource_group_location = var.resource_group_location
}
./firewall/variables.tf
#Passed down from the root variables.tf
variable "prefix" {}
variable "resource_group_name" {}
variable "resource_group_location" {}
./firewall/main.tf
# Create a virtual network for the firewall
resource "azurerm_virtual_network" "firewall-vnet" {
  name                = "${var.prefix}-waf-vnet"
  address_space       = ["10.0.0.0/16"]
  resource_group_name = var.resource_group_name
  location            = var.resource_group_location
}
Try using local values: https://www.terraform.io/docs/configuration/locals.html
variable "prefix" {
description = "The prefix used for all resources in this plan"
default = "terraform-dev"
}
variable resource_group_location {
type = "string"
default = "eastus"
}
locals {
resource_group_name = "${var.prefix}_rg"
}
resource "azurerm_resource_group" "resource-group" {
name = local.resource_group_name
location = var.resource_group_location
}
Terraform does not support variables inside a variable.
If you want to generate a value based on two or more variables, you can try Terraform locals (https://www.terraform.io/docs/configuration/locals.html).
Locals should help you achieve your goal here, with something like:
variables.tf
variable "prefix" {
description = "The prefix used for all resources in this plan"
default = "terraform-dev"
}
variable resource_group_location {
type = "string"
default = "eastus"
}
main.tf
locals {
  resource_group_name = "${var.prefix}_rg"
}

# Configure the Microsoft Azure Provider
provider "azurerm" {
  version = "=1.28.0"
}

# Create a resource group
resource "azurerm_resource_group" "resource-group" {
  name     = local.resource_group_name
  location = var.resource_group_location
}
Hope this helps.
Please read the similar discussion here: https://stackoverflow.com/questions/58841060/terraform-variables-within-variables/58841360?noredirect=1#comment129460631_58841360
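For completeness, the firewall module call from the question's main.tf would then pass the same local down. A sketch (note the original call would also need to pass prefix, since ./firewall/variables.tf declares it):

module "firewall" {
  source                  = "./firewall"
  prefix                  = var.prefix
  resource_group_name     = local.resource_group_name
  resource_group_location = var.resource_group_location
}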
