Terraform - AKS Private Cloud | Infinite wait on helm release - terraform

I am trying to create a private cloud on AKS with Terraform.
The public route seemed to work fine, and I am now adding security, step by step.
After adding the networking resources azurerm_virtual_network and azurerm_subnet, my Helm deployment seems to hang.
There are no logs; it's just an infinite wait:
helm_release.ingress: Still creating... [11m0s elapsed] (this is a simple NGINX Ingress Controller)
resource "azurerm_virtual_network" "foo_network" {
name = "${var.prefix}-network"
location = azurerm_resource_group.foo_group.location
resource_group_name = azurerm_resource_group.foo_group.name
address_space = ["10.1.0.0/16"]
}
resource "azurerm_subnet" "internal" {
name = "internal"
virtual_network_name = azurerm_virtual_network.foo_network.name
resource_group_name = azurerm_resource_group.foo_group.name
address_prefixes = ["10.1.0.0/22"]
}
Any pointers on how I should debug this? The lack of logs is making it difficult to understand.
Complete Script
provider "azurerm" {
features {}
}
resource "azurerm_resource_group" "foo" {
name = "${var.prefix}-k8s-resources"
location = var.location
}
resource "azurerm_kubernetes_cluster" "foo" {
name = "${var.prefix}-k8s"
location = azurerm_resource_group.foo.location
resource_group_name = azurerm_resource_group.foo.name
dns_prefix = "${var.prefix}-k8s"
default_node_pool {
name = "system"
node_count = 1
vm_size = "Standard_D4s_v3"
}
identity {
type = "SystemAssigned"
}
addon_profile {
aci_connector_linux {
enabled = false
}
azure_policy {
enabled = false
}
http_application_routing {
enabled = false
}
kube_dashboard {
enabled = true
}
oms_agent {
enabled = false
}
}
}
provider "kubernetes" {
version = "~> 1.11.3"
load_config_file = false
host = azurerm_kubernetes_cluster.foo.kube_config.0.host
username = azurerm_kubernetes_cluster.foo.kube_config.0.username
password = azurerm_kubernetes_cluster.foo.kube_config.0.password
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.foo.kube_config.0.cluster_ca_certificate)
}
provider "helm" {
# Use provider with Helm 3.x support
version = "~> 1.2.2"
}
resource "null_resource" "configure_kubectl" {
provisioner "local-exec" {
command = "az aks get-credentials --resource-group ${azurerm_resource_group.foo.name} --name ${azurerm_kubernetes_cluster.foo.name} --overwrite-existing"
environment = {
KUBECONFIG = ""
}
}
depends_on = [azurerm_kubernetes_cluster.foo]
}
resource "helm_release" "ingress" {
name = "ingress-foo"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
timeout = 3000
depends_on = [null_resource.configure_kubectl]
}

The best way to debug this is to get kubectl access to the AKS cluster (the AKS documentation explains how to set up kubectl).
Then, play around with kubectl get pods -A and see if anything jumps out as being wrong. Specifically, look for nginx-ingress pods that are not in a Running status.
If there are such pods, debug further with kubectl describe pod <pod_name> or kubectl logs -f <pod_name>, depending on whether the issue happens after the container has successfully started up or not.

Related

Terraform force recreates Azure web app when scaling up the SKU of the service plan

I have this Terraform code:
# Configure the Azure provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2.65"
    }
  }

  required_version = ">= 0.14.7"
}

provider "azurerm" {
  features {}
}

# Generate a random integer to create a globally unique name
resource "random_integer" "ri" {
  min = 10000
  max = 99999
}

# Create the resource group
resource "azurerm_resource_group" "rg" {
  name     = "myResourceGroup-${random_integer.ri.result}"
  location = "eastus"
}

# Create the Linux App Service Plan
resource "azurerm_app_service_plan" "appserviceplan" {
  name                = "webapp-asp-${random_integer.ri.result}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  sku {
    tier = "Free"
    size = "F1"
  }
}

# Create the web app, pass in the App Service Plan ID, and deploy code from a public GitHub repo
resource "azurerm_app_service" "webapp" {
  name                = "webapp-${random_integer.ri.result}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  app_service_plan_id = azurerm_app_service_plan.appserviceplan.id

  source_control {
    repo_url           = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    branch             = "master"
    manual_integration = true
    use_mercurial      = false
  }
}
This code works as expected.
Now, when I try to scale up the SKU of the service plan (to Standard/S1), Terraform marks my web app as tainted and says that it must be replaced.
I tried to use the create_before_destroy meta-argument in the App Service Plan definition like this:
resource "azurerm_app_service_plan" "appserviceplan" {
# ...
lifecycle {
create_before_destroy = true
}
}
but it still asks to recreate the web app.
Does anyone have an idea about that?

Terraform: Error when creating Azure Kubernetes Service with local_account_disabled=true

An error occurs when I try to create an AKS cluster with Terraform. The cluster is created, but the error still appears at the end, which is ugly.
│ Error: retrieving Access Profile for Cluster: (Managed Cluster Name
"aks-1" / Resource Group "pengine-aks-rg"):
containerservice.ManagedClustersClient#GetAccessProfile: Failure responding to request:
StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400
Code="BadRequest" Message="Getting static credential is not allowed because this cluster
is set to disable local accounts."
This is my Terraform code:
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=2.96.0"
    }
  }
}

resource "azurerm_resource_group" "aks-rg" {
  name     = "aks-rg"
  location = "West Europe"
}

resource "azurerm_kubernetes_cluster" "aks-1" {
  name                   = "aks-1"
  location               = azurerm_resource_group.aks-rg.location
  resource_group_name    = azurerm_resource_group.aks-rg.name
  dns_prefix             = "aks1"
  local_account_disabled = "true"

  default_node_pool {
    name       = "nodepool1"
    node_count = 3
    vm_size    = "Standard_D2_v2"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = {
    Environment = "Test"
  }
}
Is this a Terraform bug? Can I avoid the error?
If you disable local accounts, you need to enable AKS-managed Azure Active Directory integration, since you no longer have local accounts to authenticate against AKS.
This example enables RBAC, Azure AAD & Azure RBAC:
resource "azurerm_kubernetes_cluster" "aks-1" {
...
role_based_access_control {
enabled = true
azure_active_directory {
managed = true
tenant_id = data.azurerm_client_config.current.tenant_id
admin_group_object_ids = ["OBJECT_IDS_OF_ADMIN_GROUPS"]
azure_rbac_enabled = true
}
}
}
If you don't want AAD integration, you need to set local_account_disabled = false.
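The example above references data.azurerm_client_config.current, which is not defined in the question; a minimal matching data source (a standard azurerm data source that needs no arguments) would be:
# Exposes the tenant, client and subscription IDs of the credentials Terraform
# is running with; referenced above as data.azurerm_client_config.current.tenant_id.
data "azurerm_client_config" "current" {}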

Helm package deployment using Terraform

I have set up my Azure Kubernetes cluster using Terraform and it is working well.
I am trying to deploy packages using Helm, but the deployment fails with the error below.
Error: chart "stable/nginx-ingress" not found in https://kubernetes-charts.storage.googleapis.com repository
Note: I tried other packages as well and cannot deploy any of them using the Terraform resource; the Terraform code is below. Installing the chart locally with the helm command works, so I think the issue is with the Terraform Helm resource. nginx is just a sample package; I am not able to deploy any package using Terraform.
resource "azurerm_kubernetes_cluster" "k8s" {
name = var.aks_cluster_name
location = var.location
resource_group_name = var.resource_group_name
dns_prefix = var.aks_dns_prefix
kubernetes_version = "1.19.0"
# private_cluster_enabled = true
linux_profile {
admin_username = var.aks_admin_username
ssh_key {
key_data = var.aks_ssh_public_key
}
}
default_node_pool {
name = var.aks_node_pool_name
enable_auto_scaling = true
node_count = var.aks_agent_count
min_count = var.aks_min_agent_count
max_count = var.aks_max_agent_count
vm_size = var.aks_node_pool_vm_size
}
service_principal {
client_id = var.client_id
client_secret = var.client_secret
}
# tags = data.azurerm_resource_group.rg.tags
}
provider "helm" {
version = "1.3.2"
kubernetes {
host = azurerm_kubernetes_cluster.k8s.kube_config[0].host
client_key = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].client_key)
client_certificate = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].client_certificate)
cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].cluster_ca_certificate)
load_config_file = false
}
}
resource "helm_release" "nginx-ingress" {
name = "nginx-ingress-internal"
repository = "https://kubernetes-charts.storage.googleapis.com"
chart = "stable/nginx-ingress"
set {
name = "rbac.create"
value = "true"
}
}
You should drop stable from the chart name: it is a repository name, but you have no Helm repositories defined. Your resource should look like this:
resource "helm_release" "nginx-ingress" {
name = "nginx-ingress-internal"
repository = "https://kubernetes-charts.storage.googleapis.com"
chart = "nginx-ingress"
...
}
which is equivalent to the helm command:
helm install nginx-ingress-internal nginx-ingress --repo https://kubernetes-charts.storage.googleapis.com
Alternatively, you can define repositories with the repository_config_path provider argument.
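A rough sketch of that option, assuming a Helm provider version that supports the repository_config_path argument and a repositories.yaml file you maintain yourself (the path below is only an example):
provider "helm" {
  kubernetes {
    host                   = azurerm_kubernetes_cluster.k8s.kube_config[0].host
    client_key             = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].client_key)
    client_certificate     = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].client_certificate)
    cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].cluster_ca_certificate)
  }

  # Assumed example path: a Helm repositories.yaml listing repository names
  # and URLs, so charts can then be referenced as "<repo-name>/<chart>".
  repository_config_path = "${path.module}/repositories.yaml"
}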

Azure AKS Terraform - how to specify VM Size

In my Azure subscription, I'm trying to create an AKS cluster using Terraform.
My main.tf looks like this:
## Azure resource provider ##
provider "azurerm" {
  version = "=1.36.1"
}

## Azure resource group for the kubernetes cluster ##
resource "azurerm_resource_group" "aks_demo" {
  name     = var.resource_group_name
  location = var.location
}

## AKS kubernetes cluster ##
resource "azurerm_kubernetes_cluster" "aks_demo" {
  name                = var.cluster_name
  resource_group_name = azurerm_resource_group.aks_demo.name
  location            = azurerm_resource_group.aks_demo.location
  dns_prefix          = var.dns_prefix

  linux_profile {
    admin_username = var.admin_username

    ## SSH key is generated using "tls_private_key" resource
    ssh_key {
      key_data = "${trimspace(tls_private_key.key.public_key_openssh)} ${var.admin_username}#azure.com"
    }
  }

  agent_pool_profile {
    name            = "default"
    count           = var.agent_count
    vm_size         = "Standard_D2"
    os_type         = "Linux"
    os_disk_size_gb = 30
  }

  service_principal {
    client_id     = var.client_id
    client_secret = var.client_secret
  }

  tags = {
    Environment = "Production"
  }
}

## Private key for the kubernetes cluster ##
resource "tls_private_key" "key" {
  algorithm = "RSA"
}

## Save the private key in the local workspace ##
resource "null_resource" "save-key" {
  triggers = {
    key = tls_private_key.key.private_key_pem
  }

  provisioner "local-exec" {
    command = <<EOF
mkdir -p ${path.module}/.ssh
echo "${tls_private_key.key.private_key_pem}" > ${path.module}/.ssh/id_rsa
chmod 0600 ${path.module}/.ssh/id_rsa
EOF
  }
}

## Outputs ##
# Example attributes available for output
output "id" {
  value = "${azurerm_kubernetes_cluster.aks_demo.id}"
}

output "client_key" {
  value = "${azurerm_kubernetes_cluster.aks_demo.kube_config.0.client_key}"
}

output "client_certificate" {
  value = "${azurerm_kubernetes_cluster.aks_demo.kube_config.0.client_certificate}"
}

output "cluster_ca_certificate" {
  value = "${azurerm_kubernetes_cluster.aks_demo.kube_config.0.cluster_ca_certificate}"
}

output "kube_config" {
  value = azurerm_kubernetes_cluster.aks_demo.kube_config_raw
}

output "host" {
  value = azurerm_kubernetes_cluster.aks_demo.kube_config.0.host
}

output "configure" {
  value = <<CONFIGURE
Run the following commands to configure kubernetes client:
$ terraform output kube_config > ~/.kube/aksconfig
$ export KUBECONFIG=~/.kube/aksconfig
Test configuration using kubectl
$ kubectl get nodes
CONFIGURE
}
My variables.tf looks like this:
## Azure config variables ##
variable "client_id" {}
variable "client_secret" {}

variable "location" {
  default = "Central US"
}

## Resource group variables ##
variable "resource_group_name" {
  default = "aksdemo-rg"
}

## AKS kubernetes cluster variables ##
variable "cluster_name" {
  default = "aksdemo1"
}

variable "vm_size" {
  default = "Standard_A0"
}

variable "agent_count" {
  default = 3
}

variable "dns_prefix" {
  default = "aksdemo"
}

variable "admin_username" {
  default = "demo"
}
When I run terraform apply, I get this error:
Error: Error creating Managed Kubernetes Cluster "aksdemo1" (Resource Group "aksdemo-rg"):
containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 --
Original Error: Code="BadRequest"
Message="The VM size of AgentPoolProfile:default is not allowed in your subscription in location 'centralus'. The available VM sizes are Standard_A2,Standard_A2_v2,Standard_A2m_v2,Standard_A3,Standard_A4,Standard_A4_v2,Standard_A4m_v2,
Standard_A5,Standard_A6,Standard_A7,Standard_A8_v2,Standard_A8m_v2,Standard_B12ms,Standard_B16ms,Standard_B20ms,Standard_B2ms,Standard_B2s,Standard_B4ms,Standard_B8ms,Standard_D11_v2,Standard_D12_v2,
Standard_D13_v2,Standard_D14_v2,Standard_D15_v2,Standard_D16_v3,Standard_D16s_v3,Standard_D1_v2,Standard_D2_v2,Standard_D2_v3,Standard_D2s_v3,Standard_D32_v3,Standard_D32s_v3,Standard_D3_v2,Standard_D48_v3,
Standard_D48s_v3,Standard_D4_v2,Standard_D4_v3,Standard_D4s_v3,Standard_D5_v2,Standard_D64_v3,Standard_D64s_v3,Standard_D8_v3,Standard_D8s_v3,Standard_DS1,Standard_DS11,Standard_DS11_v2,Standard_DS12,Standard_DS12_v2,Standard_DS13,Standard_DS13-2_v2,Standard_DS13-4_v2,Standard_DS13_v2,Standard_DS14,Standard_DS14-4_v2,Standard_DS14-8_v2,Standard_DS14_v2,Standard_DS15_v2,Standard_DS1_v2,Standard_DS2,Standard_DS2_v2,Standard_DS3,Standard_DS3_v2,Standard_DS4,Standard_DS4_v2,Standard_DS5_v2,Standard_E16_v3,Standard_E16s_v3,Standard_E2_v3,Standard_E2s_v3,Standard_E32-16s_v3,Standard_E32-8s_v3,Standard_E32_v3,Standard_E32s_v3,Standard_E48_v3,Standard_E48s_v3,Standard_E4_v3,Standard_E4s_v3,Standard_E64-16s_v3,Standard_E64-32s_v3,Standard_E64_v3,Standard_E64i_v3,Standard_E64is_v3,Standard_E64s_v3,Standard_E8_v3,Standard_E8s_v3,Standard_F16,Standard_F16s,Standard_F16s_v2,Standard_F2,Standard_F2s,Standard_F2s_v2,Standard_F32s_v2,Standard_F4,Standard_F48s_v2,Standard_F4s,Standard_F4s_v2,Standard_F64s_v2,Standard_F72s_v2,Standard_F8,
Standard_F8s,Standard_F8s_v2
For more details, please visit https://aka.ms/cpu-quota"
This is confusing to me, as there is clearly a variable named vm_size.
What can I change in order for this to work?
From the code you provided and the error you got, I can see the mistake in the code.
This is what you have:
agent_pool_profile {
  name            = "default"
  count           = var.agent_count
  vm_size         = "Standard_D2"
  os_type         = "Linux"
  os_disk_size_gb = 30
}
It should be like this when you use the variable for the VM size:
agent_pool_profile {
  name            = "default"
  count           = var.agent_count
  vm_size         = var.vm_size
  os_type         = "Linux"
  os_disk_size_gb = 30
}
And the VM size should be one that is appropriate for your subscription and your requirements, for example as shown in the Terraform example.
The error message is telling you that you're trying to use a VM size (or VM type, if you will) that's not available for your subscription in that location; it also lists all the VM sizes you can choose from.
Note that you have probably copy-pasted this:
agent_pool_profile {
  name            = "default"
  count           = var.agent_count
  vm_size         = "Standard_D2"
  os_type         = "Linux"
  os_disk_size_gb = 30
}
The VM size is hardcoded there, so your default Standard_A0 value is not being picked up.
You have more than one way to debug here: first, make sure the right value (var.vm_size) is actually being used; second, change the VM size to one from the allowed list and see if that works.
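As a sketch (not the only possible fix): once agent_pool_profile references var.vm_size, pointing the variable at a size from the allowed list in the error message should unblock the deployment. Standard_DS2_v2 below is just one of the listed sizes.
variable "vm_size" {
  # Standard_DS2_v2 appears in the allowed list for centralus in the error
  # message; any other listed size your quota allows works too.
  default = "Standard_DS2_v2"
}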

How to configure an Azure app service to pull images from an ACR with Terraform?

I have the following Terraform module to set up app services under the same plan:
provider "azurerm" {
}
variable "env" {
type = string
description = "The SDLC environment (qa, dev, prod, etc...)"
}
variable "appsvc_names" {
type = list(string)
description = "The names of the app services to create under the same app service plan"
}
locals {
location = "eastus2"
resource_group_name = "app505-dfpg-${var.env}-web-${local.location}"
acr_name = "app505dfpgnedeploycr88836"
}
resource "azurerm_app_service_plan" "asp" {
name = "${local.resource_group_name}-asp"
location = local.location
resource_group_name = local.resource_group_name
kind = "Linux"
reserved = true
sku {
tier = "Basic"
size = "B1"
}
}
resource "azurerm_app_service" "appsvc" {
for_each = toset(var.appsvc_names)
name = "${local.resource_group_name}-${each.value}-appsvc"
location = local.location
resource_group_name = local.resource_group_name
app_service_plan_id = azurerm_app_service_plan.asp.id
site_config {
linux_fx_version = "DOCKER|${local.acr_name}/${each.value}:latest"
}
app_settings = {
DOCKER_REGISTRY_SERVER_URL = "https://${local.acr_name}.azurecr.io"
}
}
output "hostnames" {
value = {
for appsvc in azurerm_app_service.appsvc: appsvc.name => appsvc.default_site_hostname
}
}
I am invoking it through the following configuration:
terraform {
  backend "azurerm" {
  }
}

locals {
  appsvc_names = ["gateway"]
}

module "web" {
  source       = "../../modules/web"
  env          = "qa"
  appsvc_names = local.appsvc_names
}

output "hostnames" {
  description = "The hostnames of the created app services"
  value       = module.web.hostnames
}
The container registry has the images I need:
C:\> az acr login --name app505dfpgnedeploycr88836
Login Succeeded
C:\> az acr repository list --name app505dfpgnedeploycr88836
[
"gateway"
]
C:\> az acr repository show-tags --name app505dfpgnedeploycr88836 --repository gateway
[
"latest"
]
C:\>
When I apply the Terraform configuration everything is created fine, but inspecting the created app service resource in the Azure Portal reveals that its Container Settings show no Docker image.
Now, I can manually switch to another ACR and then back to the one I want only to get this:
Cannot perform credential operations for /subscriptions/0f1c414a-a389-47df-aab8-a351876ecd47/resourceGroups/app505-dfpg-ne-deploy-eastus2/providers/Microsoft.ContainerRegistry/registries/app505dfpgnedeploycr88836 as admin user is disabled. Kindly enable admin user as per docs: https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication#admin-account
This is confusing me. According to https://learn.microsoft.com/en-us/azure/container-registry/container-registry-authentication#admin-account the admin user should not be used, and so my ACR does not have one. On the other hand, I understand that I need to somehow configure the app service to authenticate with the ACR.
What is the right way to do it then?
So this is now possible since version 2.71 of the AzureRM provider. A couple of things have to happen:
Assign a managed identity to the application (you can also use a user-assigned identity, but that is a bit more work)
Set the site_config.acr_use_managed_identity_credentials property to true
Grant the application's identity AcrPull rights on the container registry
Below is a modified version of the code above; not tested, but it should be OK.
provider "azurerm" {
}
variable "env" {
type = string
description = "The SDLC environment (qa, dev, prod, etc...)"
}
variable "appsvc_names" {
type = list(string)
description = "The names of the app services to create under the same app service plan"
}
locals {
location = "eastus2"
resource_group_name = "app505-dfpg-${var.env}-web-${local.location}"
acr_name = "app505dfpgnedeploycr88836"
}
resource "azurerm_app_service_plan" "asp" {
name = "${local.resource_group_name}-asp"
location = local.location
resource_group_name = local.resource_group_name
kind = "Linux"
reserved = true
sku {
tier = "Basic"
size = "B1"
}
}
resource "azurerm_app_service" "appsvc" {
for_each = toset(var.appsvc_names)
name = "${local.resource_group_name}-${each.value}-appsvc"
location = local.location
resource_group_name = local.resource_group_name
app_service_plan_id = azurerm_app_service_plan.asp.id
site_config {
linux_fx_version = "DOCKER|${local.acr_name}/${each.value}:latest"
acr_use_managed_identity_credentials = true
}
app_settings = {
DOCKER_REGISTRY_SERVER_URL = "https://${local.acr_name}.azurecr.io"
}
identity {
type = "SystemAssigned"
}
}
data "azurerm_container_registry" "this" {
name = local.acr_name
resource_group_name = local.resource_group_name
}
resource "azurerm_role_assignment" "acr" {
for_each = azurerm_app_service.appsvc
role_definition_name = "AcrPull"
scope = azurerm_container_registry.this.id
principal_id = each.value.identity[0].principal_id
}
output "hostnames" {
value = {
for appsvc in azurerm_app_service.appsvc: appsvc.name => appsvc.default_site_hostname
}
}
EDITED 21 Dec 2021
The MS documentation issue regarding the value being reset by Azure has now been resolved and you can also control Managed Identity via the portal.
So you can use service principal auth with App Service, but you'd have to create a service principal, grant it AcrPull permissions over the registry, and supply the service principal's login/password to the App Service through these app settings:
DOCKER_REGISTRY_SERVER_USERNAME
DOCKER_REGISTRY_SERVER_PASSWORD
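A rough sketch of that variant, replacing the managed-identity pieces of the appsvc resource above; the acr_sp_* variables are placeholders for service principal credentials you would create and pass in yourself:
# Grant the (pre-existing) service principal pull rights on the registry.
resource "azurerm_role_assignment" "acr_pull_sp" {
  scope                = data.azurerm_container_registry.this.id
  role_definition_name = "AcrPull"
  principal_id         = var.acr_sp_object_id # placeholder: the service principal's object id
}

resource "azurerm_app_service" "appsvc" {
  for_each = toset(var.appsvc_names)

  name                = "${local.resource_group_name}-${each.value}-appsvc"
  location            = local.location
  resource_group_name = local.resource_group_name
  app_service_plan_id = azurerm_app_service_plan.asp.id

  site_config {
    linux_fx_version = "DOCKER|${local.acr_name}/${each.value}:latest"
  }

  # The app service authenticates to ACR with the service principal's
  # credentials instead of a managed identity.
  app_settings = {
    DOCKER_REGISTRY_SERVER_URL      = "https://${local.acr_name}.azurecr.io"
    DOCKER_REGISTRY_SERVER_USERNAME = var.acr_sp_client_id     # placeholder variable
    DOCKER_REGISTRY_SERVER_PASSWORD = var.acr_sp_client_secret # placeholder variable
  }
}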
