Question and details
How can I allow a Kubernetes cluster in Azure to talk to an Azure Container Registry via terraform?
I want to load custom images from my Azure Container Registry. Unfortunately, I encounter a permissions error at the point where Kubernetes is supposed to download the image from the ACR.
What I have tried so far
My experiments without terraform (az cli)
It all works perfectly after I attach the ACR to the AKS cluster via the az CLI:
az aks update -n myAKSCluster -g myResourceGroup --attach-acr <acrName>
My experiments with terraform
This is my Terraform configuration (I have stripped out some unrelated parts); it works on its own.
terraform {
backend "azurerm" {
resource_group_name = "tf-state"
storage_account_name = "devopstfstate"
container_name = "tfstatetest"
key = "prod.terraform.tfstatetest"
}
}
provider "azurerm" {
}
provider "azuread" {
}
provider "random" {
}
# define the password
resource "random_string" "password" {
length = 32
special = true
}
# define the resource group
resource "azurerm_resource_group" "rg" {
name = "myrg"
location = "eastus2"
}
# define the app
resource "azuread_application" "tfapp" {
name = "mytfapp"
}
# define the service principal
resource "azuread_service_principal" "tfapp" {
application_id = azuread_application.tfapp.application_id
}
# define the service principal password
resource "azuread_service_principal_password" "tfapp" {
service_principal_id = azuread_service_principal.tfapp.id
end_date = "2020-12-31T09:00:00Z"
value = random_string.password.result
}
# define the container registry
resource "azurerm_container_registry" "acr" {
name = "mycontainerregistry2387987222"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
sku = "Basic"
admin_enabled = false
}
# define the kubernetes cluster
resource "azurerm_kubernetes_cluster" "mycluster" {
name = "myaks"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = "mycluster"
network_profile {
network_plugin = "azure"
}
default_node_pool {
name = "default"
node_count = 1
vm_size = "Standard_B2s"
}
# Use the service principal created above
service_principal {
client_id = azuread_service_principal.tfapp.application_id
client_secret = azuread_service_principal_password.tfapp.value
}
tags = {
Environment = "demo"
}
windows_profile {
admin_username = "dingding"
admin_password = random_string.password.result
}
}
# define the windows node pool for kubernetes
resource "azurerm_kubernetes_cluster_node_pool" "winpool" {
name = "winp"
kubernetes_cluster_id = azurerm_kubernetes_cluster.mycluster.id
vm_size = "Standard_B2s"
node_count = 1
os_type = "Windows"
}
# define the kubernetes name space
resource "kubernetes_namespace" "namesp" {
metadata {
name = "namesp"
}
}
# Try to grant permissions so the AKS cluster can access the ACR
resource "azurerm_role_assignment" "acrpull_role" {
scope = azurerm_container_registry.acr.id
role_definition_name = "AcrPull"
principal_id = azuread_service_principal.tfapp.object_id
skip_service_principal_aad_check = true
}
This code is adapted from https://github.com/terraform-providers/terraform-provider-azuread/issues/104.
Unfortunately, when I launch a container inside the kubernetes cluster, I receive an error message:
Failed to pull image "mycontainerregistry.azurecr.io/myunittests": [rpc error: code = Unknown desc = Error response from daemon: manifest for mycontainerregistry.azurecr.io/myunittests:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://mycontainerregistry.azurecr.io/v2/myunittests/manifests/latest: unauthorized: authentication required]
Update / note:
When I run terraform apply with the above code, the creation of resources is interrupted:
azurerm_container_registry.acr: Creation complete after 18s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222]
azurerm_role_assignment.acrpull_role: Creating...
azuread_service_principal_password.tfapp: Still creating... [10s elapsed]
azuread_service_principal_password.tfapp: Creation complete after 12s [id=000/000]
azurerm_kubernetes_cluster.mycluster: Creating...
azurerm_role_assignment.acrpull_role: Creation complete after 8s [id=/subscriptions/000/resourceGroups/myrg/providers/Microsoft.ContainerRegistry/registries/mycontainerregistry2387987222/providers/Microsoft.Authorization/roleAssignments/000]
azurerm_kubernetes_cluster.mycluster: Still creating... [10s elapsed]
Error: Error creating Managed Kubernetes Cluster "myaks" (Resource Group "myrg"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ServicePrincipalNotFound" Message="Service principal clientID: 000 not found in Active Directory tenant 000, Please see https://aka.ms/aks-sp-help for more details."
on test.tf line 56, in resource "azurerm_kubernetes_cluster" "mycluster":
56: resource "azurerm_kubernetes_cluster" "mycluster" {
I think, however, that this is just because it takes a few minutes for the service principal to be created. When I run terraform apply again a few minutes later, it goes beyond that point without issues.
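A possible workaround (not something I have in my configuration, just a sketch using the hashicorp/time provider) would be to add an explicit delay so the cluster waits for the service principal to propagate:
# sketch: wait for AAD propagation before creating the cluster (assumes the hashicorp/time provider)
resource "time_sleep" "wait_for_sp" {
  create_duration = "60s"
  depends_on      = [azuread_service_principal_password.tfapp]
}

# the cluster resource would then also declare:
#   depends_on = [time_sleep.wait_for_sp]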
(I did upvote the answer above.)
Just adding a simpler approach that doesn't require creating a service principal, for anyone else who might need it.
resource "azurerm_kubernetes_cluster" "kubweb" {
name = local.cluster_web
location = local.rgloc
resource_group_name = local.rgname
dns_prefix = "${local.cluster_web}-dns"
kubernetes_version = local.kubversion
# used to group all the internal objects of this cluster
node_resource_group = "${local.cluster_web}-rg-node"
# azure will assign the id automatically
identity {
type = "SystemAssigned"
}
default_node_pool {
name = "nodepool1"
node_count = 4
vm_size = local.vm_size
orchestrator_version = local.kubversion
}
role_based_access_control {
enabled = true
}
addon_profile {
kube_dashboard {
enabled = true
}
}
tags = {
environment = local.env
}
}
resource "azurerm_container_registry" "acr" {
name = "acr1"
resource_group_name = local.rgname
location = local.rgloc
sku = "Standard"
admin_enabled = true
tags = {
environment = local.env
}
}
# add the role to the identity the kubernetes cluster was assigned
resource "azurerm_role_assignment" "kubweb_to_acr" {
scope = azurerm_container_registry.acr.id
role_definition_name = "AcrPull"
principal_id = azurerm_kubernetes_cluster.kubweb.kubelet_identity[0].object_id
}
This code worked for me.
resource "azuread_application" "aks_sp" {
name = "sp-aks-${local.cluster_name}"
}
resource "azuread_service_principal" "aks_sp" {
application_id = azuread_application.aks_sp.application_id
app_role_assignment_required = false
}
resource "azuread_service_principal_password" "aks_sp" {
service_principal_id = azuread_service_principal.aks_sp.id
value = random_string.aks_sp_password.result
end_date_relative = "8760h" # 1 year
lifecycle {
ignore_changes = [
value,
end_date_relative
]
}
}
resource "azuread_application_password" "aks_sp" {
application_object_id = azuread_application.aks_sp.id
value = random_string.aks_sp_secret.result
end_date_relative = "8760h" # 1 year
lifecycle {
ignore_changes = [
value,
end_date_relative
]
}
}
data "azurerm_container_registry" "pyp" {
name = var.container_registry_name
resource_group_name = var.container_registry_resource_group_name
}
resource "azurerm_role_assignment" "aks_sp_container_registry" {
scope = data.azurerm_container_registry.pyp.id
role_definition_name = "AcrPull"
principal_id = azuread_service_principal.aks_sp.object_id
}
# requires Azure Provider 1.37+
resource "azurerm_kubernetes_cluster" "pyp" {
name = local.cluster_name
location = azurerm_resource_group.pyp.location
resource_group_name = azurerm_resource_group.pyp.name
dns_prefix = local.env_name_nosymbols
kubernetes_version = local.kubernetes_version
default_node_pool {
name = "default"
node_count = 1
vm_size = "Standard_D2s_v3"
os_disk_size_gb = 80
}
windows_profile {
admin_username = "winadm"
admin_password = random_string.windows_profile_password.result
}
network_profile {
network_plugin = "azure"
dns_service_ip = cidrhost(local.service_cidr, 10)
docker_bridge_cidr = "172.17.0.1/16"
service_cidr = local.service_cidr
load_balancer_sku = "standard"
}
service_principal {
client_id = azuread_service_principal.aks_sp.application_id
client_secret = random_string.aks_sp_password.result
}
addon_profile {
oms_agent {
enabled = true
log_analytics_workspace_id = azurerm_log_analytics_workspace.pyp.id
}
}
tags = local.tags
}
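Note that this snippet references a few random_string resources (as well as a resource group, log analytics workspace and locals) that are not shown here; they are defined in the repository linked below. A minimal sketch of the missing random strings, with lengths chosen arbitrarily, might look like:
resource "random_string" "aks_sp_password" {
  length  = 32
  special = true
}

resource "random_string" "aks_sp_secret" {
  length  = 32
  special = true
}

resource "random_string" "windows_profile_password" {
  length  = 24
  special = true
}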
source https://github.com/giuliov/pipeline-your-pipelines/tree/master/src/kubernetes/terraform
Just want to go into more depth, as this was something I struggled with as well.
The recommended approach is to use Managed Identities instead of a Service Principal, since they involve less overhead.
Create a Container Registry:
resource "azurerm_container_registry" "acr" {
name = "acr"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
sku = "Standard"
admin_enabled = false
}
Create an AKS cluster. The code below creates the AKS cluster with two identities:
A System Assigned Identity, which is assigned to the control plane.
A User Assigned Managed Identity, which is also automatically created and assigned to the kubelet; notice there is no specific code for it, as this happens automatically.
The kubelet is the process that goes to the Container Registry to pull the image, so we need to make sure this User Assigned Managed Identity has the AcrPull role on the Container Registry.
resource "azurerm_kubernetes_cluster" "aks" {
name = "aks"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
dns_prefix = "aks"
node_resource_group = "aks-node"
default_node_pool {
name = "default"
node_count = 1
vm_size = "Standard_Ds2_v2"
enable_auto_scaling = false
type = "VirtualMachineScaleSets"
vnet_subnet_id = azurerm_subnet.aks_subnet.id
max_pods = 50
}
network_profile {
network_plugin = "azure"
load_balancer_sku = "Standard"
}
identity {
type = "SystemAssigned"
}
}
Create the role assignment mentioned above to allow the User Assigned Managed Identity to pull from the Container Registry.
resource "azurerm_role_assignment" "ra" {
principal_id = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
role_definition_name = "AcrPull"
scope = azurerm_container_registry.acr.id
skip_service_principal_aad_check = true
}
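To double-check which identity is which, here is a small sketch (these output names are illustrative, not from the original code) that surfaces both principal IDs:
# control plane identity (SystemAssigned)
output "aks_control_plane_principal_id" {
  value = azurerm_kubernetes_cluster.aks.identity[0].principal_id
}

# kubelet identity (the one that needs AcrPull on the registry)
output "aks_kubelet_object_id" {
  value = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
}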
Hope that clears things up for you, as I have seen some confusion on the internet about the two identities created.
source: https://jimferrari.com/2022/02/09/attach-azure-container-registry-to-azure-kubernetes-service-terraform/
The Terraform documentation for the Azure Container Registry resource now covers this and should always be up to date:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/container_registry#example-usage-attaching-a-container-registry-to-a-kubernetes-cluster
resource "azurerm_resource_group" "example" {
name = "example-resources"
location = "West Europe"
}
resource "azurerm_container_registry" "example" {
name = "containerRegistry1"
resource_group_name = azurerm_resource_group.example.name
location = azurerm_resource_group.example.location
}
resource "azurerm_kubernetes_cluster" "example" {
name = "example-aks1"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
dns_prefix = "exampleaks1"
default_node_pool {
name = "default"
node_count = 1
vm_size = "Standard_D2_v2"
}
identity {
type = "SystemAssigned"
}
tags = {
Environment = "Production"
}
}
resource "azurerm_role_assignment" "example" {
principal_id = azurerm_kubernetes_cluster.example.kubelet_identity[0].object_id
role_definition_name = "AcrPull"
scope = azurerm_container_registry.example.id
skip_service_principal_aad_check = true
}
Related
I've followed the Microsoft article and created a Kubernetes cluster in Azure. Below is the complete Terraform code that I have used to build the Kubernetes cluster along with the Application Gateway.
variable "resource_group_name_prefix" {
default = "rg"
description = "Prefix of the resource group name that's combined with a random ID so name is unique in your Azure subscription."
}
variable "resource_group_location" {
default = "eastus"
description = "Location of the resource group."
}
variable "virtual_network_name" {
description = "Virtual network name"
default = "aksVirtualNetwork"
}
variable "virtual_network_address_prefix" {
description = "VNET address prefix"
default = "192.168.0.0/16"
}
variable "aks_subnet_name" {
description = "Subnet Name."
default = "kubesubnet"
}
variable "aks_subnet_address_prefix" {
description = "Subnet address prefix."
default = "192.168.0.0/24"
}
variable "app_gateway_subnet_address_prefix" {
description = "Subnet server IP address."
default = "192.168.1.0/24"
}
variable "app_gateway_name" {
description = "Name of the Application Gateway"
default = "ApplicationGateway1"
}
variable "app_gateway_sku" {
description = "Name of the Application Gateway SKU"
default = "Standard_v2"
}
variable "app_gateway_tier" {
description = "Tier of the Application Gateway tier"
default = "Standard_v2"
}
variable "aks_name" {
description = "AKS cluster name"
default = "aks-cluster1"
}
variable "aks_dns_prefix" {
description = "Optional DNS prefix to use with hosted Kubernetes API server FQDN."
default = "aks"
}
variable "aks_agent_os_disk_size" {
description = "Disk size (in GB) to provision for each of the agent pool nodes. This value ranges from 0 to 1023. Specifying 0 applies the default disk size for that agentVMSize."
default = 40
}
variable "aks_agent_count" {
description = "The number of agent nodes for the cluster."
default = 1
}
variable "aks_agent_vm_size" {
description = "VM size"
default = "Standard_B8ms"
}
variable "aks_service_cidr" {
description = "CIDR notation IP range from which to assign service cluster IPs"
default = "10.0.0.0/16"
}
variable "aks_dns_service_ip" {
description = "DNS server IP address"
default = "10.0.0.10"
}
variable "aks_docker_bridge_cidr" {
description = "CIDR notation IP for Docker bridge."
default = "172.17.0.1/16"
}
variable "aks_enable_rbac" {
description = "Enable RBAC on the AKS cluster. Defaults to false."
default = "false"
}
variable "vm_user_name" {
description = "User name for the VM"
default = "vmuser1"
}
variable "public_ssh_key_path" {
description = "Public key path for SSH."
default = "./keys/id_rsa.pub"
}
variable "tags" {
type = map(string)
default = {
source = "terraform"
}
}
# Locals block for hardcoded names
locals {
backend_address_pool_name = "${azurerm_virtual_network.test.name}-beap"
frontend_port_name = "${azurerm_virtual_network.test.name}-feport"
frontend_ip_configuration_name = "${azurerm_virtual_network.test.name}-feip"
http_setting_name = "${azurerm_virtual_network.test.name}-be-htst"
listener_name = "${azurerm_virtual_network.test.name}-httplstn"
request_routing_rule_name = "${azurerm_virtual_network.test.name}-rqrt"
app_gateway_subnet_name = "appgwsubnet"
subscription_id = "<subscription_id>"
tenant_id = "<tenant_id>"
client_id = "<client_id>"
client_secret = "<client_secret>"
client_objectid = "<client_objectid>"
}
terraform {
required_version = ">=0.12"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~>2.0"
}
}
}
provider "azurerm" {
subscription_id = local.subscription_id
tenant_id = local.tenant_id
client_id = local.client_id
client_secret = local.client_secret
features {}
}
resource "random_pet" "rg-name" {
prefix = var.resource_group_name_prefix
}
resource "azurerm_resource_group" "rg" {
name = random_pet.rg-name.id
location = var.resource_group_location
}
# User Assigned Identities
resource "azurerm_user_assigned_identity" "testIdentity" {
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
name = "identity1"
tags = var.tags
}
resource "azurerm_virtual_network" "test" {
name = var.virtual_network_name
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
address_space = [var.virtual_network_address_prefix]
subnet {
name = var.aks_subnet_name
address_prefix = var.aks_subnet_address_prefix
}
subnet {
name = "appgwsubnet"
address_prefix = var.app_gateway_subnet_address_prefix
}
tags = var.tags
}
data "azurerm_subnet" "kubesubnet" {
name = var.aks_subnet_name
virtual_network_name = azurerm_virtual_network.test.name
resource_group_name = azurerm_resource_group.rg.name
depends_on = [azurerm_virtual_network.test]
}
data "azurerm_subnet" "appgwsubnet" {
name = "appgwsubnet"
virtual_network_name = azurerm_virtual_network.test.name
resource_group_name = azurerm_resource_group.rg.name
depends_on = [azurerm_virtual_network.test]
}
# Public Ip
resource "azurerm_public_ip" "test" {
name = "publicIp1"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
allocation_method = "Static"
sku = "Standard"
tags = var.tags
}
resource "azurerm_application_gateway" "network" {
name = var.app_gateway_name
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
sku {
name = var.app_gateway_sku
tier = "Standard_v2"
capacity = 2
}
gateway_ip_configuration {
name = "appGatewayIpConfig"
subnet_id = data.azurerm_subnet.appgwsubnet.id
}
frontend_port {
name = local.frontend_port_name
port = 80
}
frontend_port {
name = "httpsPort"
port = 443
}
frontend_ip_configuration {
name = local.frontend_ip_configuration_name
public_ip_address_id = azurerm_public_ip.test.id
}
backend_address_pool {
name = local.backend_address_pool_name
}
backend_http_settings {
name = local.http_setting_name
cookie_based_affinity = "Disabled"
port = 80
protocol = "Http"
request_timeout = 1
}
http_listener {
name = local.listener_name
frontend_ip_configuration_name = local.frontend_ip_configuration_name
frontend_port_name = local.frontend_port_name
protocol = "Http"
}
request_routing_rule {
name = local.request_routing_rule_name
rule_type = "Basic"
http_listener_name = local.listener_name
backend_address_pool_name = local.backend_address_pool_name
backend_http_settings_name = local.http_setting_name
}
tags = var.tags
depends_on = [azurerm_virtual_network.test, azurerm_public_ip.test]
}
resource "azurerm_kubernetes_cluster" "k8s" {
name = var.aks_name
location = azurerm_resource_group.rg.location
dns_prefix = var.aks_dns_prefix
resource_group_name = azurerm_resource_group.rg.name
http_application_routing_enabled = false
linux_profile {
admin_username = var.vm_user_name
ssh_key {
key_data = file(var.public_ssh_key_path)
}
}
default_node_pool {
name = "agentpool"
node_count = var.aks_agent_count
vm_size = var.aks_agent_vm_size
os_disk_size_gb = var.aks_agent_os_disk_size
vnet_subnet_id = data.azurerm_subnet.kubesubnet.id
}
service_principal {
client_id = local.client_id
client_secret = local.client_secret
}
network_profile {
network_plugin = "azure"
dns_service_ip = var.aks_dns_service_ip
docker_bridge_cidr = var.aks_docker_bridge_cidr
service_cidr = var.aks_service_cidr
}
role_based_access_control {
enabled = var.aks_enable_rbac
}
depends_on = [azurerm_virtual_network.test, azurerm_application_gateway.network]
tags = var.tags
}
resource "azurerm_role_assignment" "ra1" {
scope = data.azurerm_subnet.kubesubnet.id
role_definition_name = "Network Contributor"
principal_id = local.client_objectid
depends_on = [azurerm_virtual_network.test]
}
resource "azurerm_role_assignment" "ra2" {
scope = azurerm_user_assigned_identity.testIdentity.id
role_definition_name = "Managed Identity Operator"
principal_id = local.client_objectid
depends_on = [azurerm_user_assigned_identity.testIdentity]
}
resource "azurerm_role_assignment" "ra3" {
scope = azurerm_application_gateway.network.id
role_definition_name = "Contributor"
principal_id = azurerm_user_assigned_identity.testIdentity.principal_id
depends_on = [azurerm_user_assigned_identity.testIdentity, azurerm_application_gateway.network]
}
resource "azurerm_role_assignment" "ra4" {
scope = azurerm_resource_group.rg.id
role_definition_name = "Reader"
principal_id = azurerm_user_assigned_identity.testIdentity.principal_id
depends_on = [azurerm_user_assigned_identity.testIdentity, azurerm_application_gateway.network]
}
I have created the id_rsa.pub key as shown below (screenshot omitted).
AKS Cluster: (screenshot omitted)
AppGateway: (screenshot omitted)
I have installed AGIC by running the following commands:
az account set --subscription "Dev3-Tooling"
az aks get-credentials --resource-group "rg-active-stag" --name "aks-cluster1"
kubectl apply -f https://raw.githubusercontent.com/Azure/aad-pod-identity/v1.8.6/deploy/infra/deployment.yaml --insecure-skip-tls-verify
helm repo add application-gateway-kubernetes-ingress https://appgwingress.blob.core.windows.net/ingress-azure-helm-package/
helm repo update
helm install ingress-azure -f helm-config.yaml application-gateway-kubernetes-ingress/ingress-azure --version 1.5.0
The Azure ingress pod is in an unhealthy state (screenshot omitted),
and I am getting the following error while trying to access the deployed application (screenshot omitted).
I have tried to reproduce the same in my lab environment and got the results below.
Step-1:
After provisioning the Kubernetes cluster and Application Gateway, connect to the cluster using the command below:
$ az aks get-credentials -g <resourceGroupName> -n <cluster-name>
Step-2:
Install Azure AD Pod Identity for token-based access control.
Use the command below if your cluster is RBAC-enabled:
kubectl create -f https://raw.githubusercontent.com/azure/aad-pod-identity/master/deploy/infra/deployment-rbac.yaml
Use the command below if your cluster is RBAC-disabled:
kubectl create -f https://raw.githubusercontent.com/azure/aad-pod-identity/master/deploy/infra/deployment.yaml
Step-3:
Add the Application Gateway ingress controller to the Helm repository:
$ helm repo add application-gateway-kubernetes-ingress https://appgwingress.blob.core.windows.net/ingress-azure-helm-package/
$ helm repo update
Step-4:
Now write the Helm configuration file and save it as helm-config.yaml, as below.
verbosityLevel: 3
appgw:
  subscriptionId: <subscriptionId>
  resourceGroup: <resourceGroupName>
  name: <applicationGatewayName>
  shared: false
armAuth:
  type: aadPodIdentity
  identityResourceID: <identityResourceId>
  identityClientID: <identityClientId>
rbac:
  enabled: false # true/false
aksClusterConfiguration:
  apiServerAddress: <aks-api-server-address>
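The identityResourceID and identityClientID values come from the user-assigned identity created earlier in the Terraform code; a small sketch (output names are mine, not from the original post) to surface them for this Helm config:
output "identity_resource_id" {
  value = azurerm_user_assigned_identity.testIdentity.id
}

output "identity_client_id" {
  value = azurerm_user_assigned_identity.testIdentity.client_id
}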
Step-5:
Now install AGIC using the above file.
$ helm install -f helm-config.yaml --generate-name application-gateway-kubernetes-ingress/ingress-azure
After performing the above steps, you can verify the ingress service using the command below.
$ kubectl get ingress
Access the IP address shown in the above output (screenshot omitted) and you will be able to reach the application.
I want to create AKS and ACR resources in my Azure environment. The script is able to create the two resources, and I am able to connect to each of them. But the AKS node cannot pull images from the ACR. After some research, I found I need to create a Private Endpoint between the AKS and ACR.
The strange thing is that if I create the PE using Terraform the AKS and ACR still cannot communicate. If I create the PE manually, they can communicate. I compared the parameters of the two PEs on the UI and they look the same.
Could someone help me define the PE using the following script? Or let me know what I did wrong?
Thanks!
Full TF script without the Private Endpoint
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=2.97.0"
}
}
required_version = ">= 1.1.7"
}
provider "azurerm" {
features {}
subscription_id = "xxx"
}
resource "azurerm_resource_group" "rg" {
name = "aks-rg"
location = "East US"
}
resource "azurerm_kubernetes_cluster" "aks" {
name = "my-aks"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = "myaks"
default_node_pool {
name = "default"
node_count = 2
vm_size = "Standard_B2s"
}
identity {
type = "SystemAssigned"
}
}
resource "azurerm_container_registry" "acr" {
name = "my-aks-acr-123"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
sku = "Premium"
admin_enabled = true
network_rule_set {
default_action = "Deny"
}
}
resource "azurerm_role_assignment" "acrpull" {
principal_id = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
role_definition_name = "AcrPull"
scope = azurerm_container_registry.acr.id
skip_service_principal_aad_check = true
}
Then you need to create a VNet and a subnet (not part of this code), plus a private DNS zone:
Private DNS zone:
resource "azurerm_private_dns_zone" "example" {
name = "mydomain.com"
resource_group_name = azurerm_resource_group.example.name
}
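Since the VNet and subnet are not part of this code, here is a minimal sketch of what they and the DNS zone's virtual network link might look like (names and address ranges are assumptions; note that for ACR private link the zone is normally named privatelink.azurecr.io rather than a custom domain):
resource "azurerm_virtual_network" "example" {
  name                = "example-vnet"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  address_space       = ["10.1.0.0/16"]
}

resource "azurerm_subnet" "endpoints" {
  name                 = "endpoints"
  resource_group_name  = azurerm_resource_group.example.name
  virtual_network_name = azurerm_virtual_network.example.name
  address_prefixes     = ["10.1.1.0/24"]
  # required on azurerm 2.x so a private endpoint can be placed in this subnet
  enforce_private_link_endpoint_network_policies = true
}

# link the private DNS zone to the VNet so the AKS nodes resolve the registry's private IP
resource "azurerm_private_dns_zone_virtual_network_link" "acr" {
  name                  = "acr-dns-link"
  resource_group_name   = azurerm_resource_group.example.name
  private_dns_zone_name = azurerm_private_dns_zone.example.name
  virtual_network_id    = azurerm_virtual_network.example.id
}
With a sketch like this, the YOUR_SUBNET placeholder in the private endpoint below would be azurerm_subnet.endpoints.id.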
AKS Part:
resource "azurerm_kubernetes_cluster" "aks" {
name = "my-aks"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
dns_prefix = "myaks"
private_cluster_enabled = true
default_node_pool {
name = "default"
node_count = 2
vm_size = "Standard_B2s"
}
identity {
type = "SystemAssigned"
}
}
You need to create the ACR and a private endpoint for the ACR:
resource "azurerm_container_registry" "acr" {
name = "my-aks-acr-123"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
public_network_access_enabled = false
sku = "Premium"
admin_enabled = true
}
resource "azurerm_private_endpoint" "acr" {
name = "pvep-acr"
location = azurerm_resource_group.example.location
resource_group_name = azurerm_resource_group.example.name
subnet_id = YOUR_SUBNET
private_service_connection {
name = "example-acr"
private_connection_resource_id = azurerm_container_registry.acr.id
is_manual_connection = false
subresource_names = ["registry"]
}
private_dns_zone_group {
name = azurerm_private_dns_zone.example.name
private_dns_zone_ids = [azurerm_private_dns_zone.example.id]
}
}
resource "azurerm_role_assignment" "acrpull" {
principal_id = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
role_definition_name = "AcrPull"
scope = azurerm_container_registry.acr.id
skip_service_principal_aad_check = true
}
I'm trying to assign a UAMI to an AKS kubelet using Terraform, but I don't have permissions and it fails with the following error.
Error: creating Managed Kubernetes Cluster "ClusterName" (Resource Group "ResourceGroupName"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="CustomKubeletIdentityMissingPermissionError" Message="The cluster user assigned identity must be given permission to assign kubelet identity /subscriptions/***/resourceGroups/ResourceGroupName/providers/Microsoft.ManagedIdentity/userAssignedIdentities/UAMI. Check access result not allowed for action Microsoft.ManagedIdentity/userAssignedIdentities/assign/action.
I would like to grant permissions, but the error message does not mention the scope, so I do not know where to assign permissions.
In addition, I am using the same UAMI that is currently assigned to the control plane; is there any problem?
Thank you for your cooperation.
You need to grant the permission Microsoft.ManagedIdentity/userAssignedIdentities/assign/action. As it is not directly present in any built-in role definition in Azure, you will have to create a custom role and then assign it to the UAMI to set the kubelet identity.
I tried the same after receiving the error, as below:
Terraform code:
provider"azurerm"{
features{}
}
provider "random" {}
data "azurerm_subscription" "primary" {
}
data "azurerm_client_config" "example" {
}
data "azurerm_resource_group" "rg" {
name = "ansumantest"
}
resource "azurerm_user_assigned_identity" "UAMI" {
resource_group_name = data.azurerm_resource_group.rg.name
location = data.azurerm_resource_group.rg.location
name = "AKS-MI"
}
resource "random_uuid" "customrole" {}
resource "random_uuid" "roleassignment" {}
resource "azurerm_role_definition" "example" {
role_definition_id = random_uuid.customrole.result
name = "CustomKubeletIdentityPermission"
scope = data.azurerm_subscription.primary.id
permissions {
actions = ["Microsoft.ManagedIdentity/userAssignedIdentities/assign/action"]
not_actions = []
}
assignable_scopes = [
data.azurerm_subscription.primary.id,
]
}
resource "azurerm_role_assignment" "example" {
name = random_uuid.roleassignment.result
scope = data.azurerm_subscription.primary.id
role_definition_id = azurerm_role_definition.example.role_definition_resource_id
principal_id = azurerm_user_assigned_identity.UAMI.principal_id
}
resource "azurerm_user_assigned_identity" "kubletIdentity" {
resource_group_name = data.azurerm_resource_group.rg.name
location = data.azurerm_resource_group.rg.location
name = "Kublet-MI"
}
resource "azurerm_kubernetes_cluster" "aks" {
name = "ansumantestaks"
location = data.azurerm_resource_group.rg.location
resource_group_name = data.azurerm_resource_group.rg.name
dns_prefix = "ansumantestaks-dns"
default_node_pool {
name = "system"
node_count = 1
vm_size = "Standard_B2ms"
type = "VirtualMachineScaleSets"
availability_zones = [1, 2, 3]
enable_auto_scaling = false
}
identity{
type = "UserAssigned"
user_assigned_identity_id = azurerm_user_assigned_identity.UAMI.id
}
kubelet_identity {
client_id = azurerm_user_assigned_identity.kubletIdentity.client_id
object_id = azurerm_user_assigned_identity.kubletIdentity.principal_id
user_assigned_identity_id = azurerm_user_assigned_identity.kubletIdentity.id
}
depends_on = [
azurerm_role_assignment.example
]
}
Output: (screenshot omitted)
I was able to achieve the same with a built-in role.
data "azurerm_resource_group" "main" {
name = var.resource_group_name
}
resource "azurerm_user_assigned_identity" "this" {
location = data.azurerm_resource_group.main.location
resource_group_name = data.azurerm_resource_group.main.name
name = "${var.cluster_name}-msi"
}
resource "azurerm_role_assignment" "this" {
scope = data.azurerm_resource_group.main.id
role_definition_name = "Managed Identity Operator"
principal_id = azurerm_user_assigned_identity.this.principal_id
}
resource "azurerm_kubernetes_cluster" "this" {
depends_on = [
azurerm_role_assignment.this,
]
name = var.cluster_name
kubernetes_version = var.kubernetes_version
location = data.azurerm_resource_group.main.location
resource_group_name = data.azurerm_resource_group.main.name
dns_prefix = var.prefix
sku_tier = var.sku_tier
private_cluster_enabled = var.private_cluster_enabled
kubelet_identity {
user_assigned_identity_id = azurerm_user_assigned_identity.this.id
client_id = azurerm_user_assigned_identity.this.client_id
object_id = azurerm_user_assigned_identity.this.principal_id
}
...
}
It's WRONG to say we need to create a Custom Role because "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action is not directly present in any built-in role definition in Azure".
It is present in the built-in role "Managed Identity Operator". (Please check the corresponding screenshot from Azure, omitted here.)
Most people find it difficult to find the correct scope, so start with the subscription scope and it will do the job (e.g. var.subscription_id).
To be honest, the scope should be set to the ID of the Managed Identity itself.
The Terraform code then only needs the following role assignment, assuming the Managed Identity has already been created:
resource "azurerm_role_assignment" "kubelet_identity" {
scope = azurerm_user_assigned_identity.module.id
role_definition_name = "Managed Identity Operator"
principal_id = azurerm_user_assigned_identity.module.principal_id
}
I am using the Terraform code below to create a resource group and an AKS cluster, and I am trying to allow the AKS cluster to use an existing ACR in the same subscription, using a data {} reference. It works fine without the role assignment block, but when I use it I keep getting the error below:
Error: Invalid index
on main.tf line 40, in resource "azurerm_role_assignment" "aks_to_acr_role":
40: principal_id = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
|----------------
| azurerm_kubernetes_cluster.aks.kubelet_identity is empty list of object
The given key does not identify an element in this collection value.
I have looked all over Stack Exchange, the Microsoft Azure docs, Terraform issues and lots of blog posts; I honestly have no idea what is wrong at this point. Any suggestions would be greatly appreciated.
resource "azurerm_resource_group" "rg" {
name = var.resource_group_name
location = var.location
}
resource "azurerm_kubernetes_cluster" "aks" {
name = var.cluster_name
kubernetes_version = var.kubernetes_version
location = var.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = var.cluster_name
default_node_pool {
name = "system"
node_count = var.system_node_count
vm_size = "Standard_B2ms"
type = "VirtualMachineScaleSets"
availability_zones = [1, 2, 3]
enable_auto_scaling = false
}
service_principal {
client_id = var.appId
client_secret = var.password
}
}
data "azurerm_container_registry" "acr_name" {
name = "xxxxx"
resource_group_name = "xxxxx"
}
resource "azurerm_role_assignment" "aks_to_acr_role" {
scope = data.azurerm_container_registry.acr_name.id
role_definition_name = "AcrPull"
principal_id = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
skip_service_principal_aad_check = true
}
The ACR name and RG name are xxxxx'd out of the code just for privacy.
When using a Service Principal as the identity for the Kubernetes cluster, kubelet_identity will be empty, because you have not defined an identity block while creating the AKS cluster. The identity block conflicts with the service_principal block, so they can't be used together.
Solutions:
You can use an identity of type SystemAssigned instead of a Service Principal; then you don't have to configure the kubelet_identity block, it will automatically get preconfigured and you can use azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id successfully. So your code will be like below:
provider"azurerm"{
features{}
}
data "azurerm_resource_group" "rg" {
name = "ansumantest"
}
resource "azurerm_kubernetes_cluster" "aks" {
name = "ansumantestaks"
location = data.azurerm_resource_group.rg.location
resource_group_name = data.azurerm_resource_group.rg.name
dns_prefix = "ansumantestaks-dns"
default_node_pool {
name = "system"
node_count = 1
vm_size = "Standard_B2ms"
type = "VirtualMachineScaleSets"
availability_zones = [1, 2, 3]
enable_auto_scaling = false
}
identity{
type = "SystemAssigned"
}
}
data "azurerm_container_registry" "acr_name" {
name = "ansumantestacr"
resource_group_name = data.azurerm_resource_group.rg.name
}
resource "azurerm_role_assignment" "aks_to_acr_role" {
scope = data.azurerm_container_registry.acr_name.id
role_definition_name = "AcrPull"
principal_id = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
skip_service_principal_aad_check = true
}
Output: (screenshot omitted)
If you want to use only a Service Principal instead of an identity, then you have to use the Service Principal's object ID in the role assignment, as the AKS cluster is also using the same Service Principal. The code with the service_principal block will be like below:
provider"azurerm"{
features{}
}
provider"azuread"{}
# Service Principal Which is being used by AKS.
data "azuread_service_principal" "akssp"{
display_name = "aksspansuman"
}
data "azurerm_resource_group" "rg" {
name = "ansumantest"
}
resource "azurerm_kubernetes_cluster" "aks" {
name = "ansumantestaks"
location = data.azurerm_resource_group.rg.location
resource_group_name = data.azurerm_resource_group.rg.name
dns_prefix = "ansumantestaks-dns"
default_node_pool {
name = "system"
node_count = 1
vm_size = "Standard_B2ms"
type = "VirtualMachineScaleSets"
availability_zones = [1, 2, 3]
enable_auto_scaling = false
}
service_principal {
client_id = data.azuread_service_principal.akssp.application_id
client_secret = "e997Q~xxxxxxxx"
}
}
data "azurerm_container_registry" "acr_name" {
name = "ansumantestacr"
resource_group_name = data.azurerm_resource_group.rg.name
}
resource "azurerm_role_assignment" "aks_to_acr_role" {
scope = data.azurerm_container_registry.acr_name.id
role_definition_name = "AcrPull"
principal_id = data.azuread_service_principal.akssp.object_id
skip_service_principal_aad_check = true
}
Outputs: (screenshot omitted)
I was looking for something similar, to assign the Network Contributor role to AKS. The role assignment needs the principal ID, and I got that by inspecting the AKS Terraform object:
terraform state show azurerm_kubernetes_cluster.aks
---
identity {
principal_id = "9966f59f-745a-4210-abcd-123456789"
tenant_id = "18518570-0488-436a-abcd-123456789"
type = "SystemAssigned"
}
So I realized I just need to change the azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id part.
This resolved the issue for me:
resource "azurerm_role_assignment" "example" {
scope = data.azurerm_resource_group.staging-rg.id
role_definition_name = "Network Contributor"
principal_id = azurerm_kubernetes_cluster.aks.identity[0].principal_id
}
I am trying to create a private AKS cluster via Terraform using an existing VNet and subnet. I was able to create the cluster before, but suddenly the error below appeared.
│ Error: creating Managed Kubernetes Cluster "demo-azwe-aks-cluster" (Resource Group "demo-azwe-aks-rg"): containerservice.ManagedClustersClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="CustomRouteTableWithUnsupportedMSIType" Message="Clusters using managed identity type SystemAssigned do not support bringing your own route table. Please see https://aka.ms/aks/customrt for more information"
│
│ with azurerm_kubernetes_cluster.aks_cluster,
│ on aks_cluster.tf line 30, in resource "azurerm_kubernetes_cluster" "aks_cluster":
│ 30: resource "azurerm_kubernetes_cluster" "aks_cluster" {
# Provision AKS Cluster
resource "azurerm_kubernetes_cluster" "aks_cluster" {
name = "${var.global-prefix}-${var.cluster-id}-${var.environment}-azwe-aks-cluster"
location = "${var.location}"
resource_group_name = azurerm_resource_group.aks_rg.name
dns_prefix = "${var.global-prefix}-${var.cluster-id}-${var.environment}-azwe-aks-cluster"
kubernetes_version = data.azurerm_kubernetes_service_versions.current.latest_version
node_resource_group = "${var.global-prefix}-${var.cluster-id}-${var.environment}-azwe-aks-nrg"
private_cluster_enabled = true
default_node_pool {
name = "dpool"
vm_size = "Standard_DS2_v2"
orchestrator_version = data.azurerm_kubernetes_service_versions.current.latest_version
availability_zones = [1, 2, 3]
enable_auto_scaling = true
max_count = 2
min_count = 1
os_disk_size_gb = 30
type = "VirtualMachineScaleSets"
vnet_subnet_id = data.azurerm_subnet.aks.id
node_labels = {
"nodepool-type" = "system"
"environment" = "${var.environment}"
"nodepoolos" = "${var.nodepool-os}"
"app" = "system-apps"
}
tags = {
"nodepool-type" = "system"
"environment" = "dev"
"nodepoolos" = "linux"
"app" = "system-apps"
}
}
# Identity (System Assigned or Service Principal)
identity {
type = "SystemAssigned"
}
# Add On Profiles
addon_profile {
azure_policy {enabled = true}
oms_agent {
enabled = true
log_analytics_workspace_id = azurerm_log_analytics_workspace.insights.id
}
}
# RBAC and Azure AD Integration Block (the Azure AD admins group is defined after this cluster resource)
role_based_access_control {
enabled = true
azure_active_directory {
managed = true
admin_group_object_ids = [azuread_group.aks_administrators.id]
}
}
# Linux Profile
linux_profile {
admin_username = "ubuntu"
ssh_key {
key_data = file(var.ssh_public_key)
}
}
# Network Profile
network_profile {
network_plugin = "kubenet"
load_balancer_sku = "Standard"
}
tags = {
Environment = "prod"
}
}
# Create Azure AD Group in Active Directory for AKS Admins
resource "azuread_group" "aks_administrators" {
name = "${azurerm_resource_group.aks_rg.name}-cluster-administrators"
description = "Azure AKS Kubernetes administrators for the ${azurerm_resource_group.aks_rg.name}-cluster."
}
You are trying to create a private AKS cluster with an existing VNet and existing subnets for both AKS and the firewall. So, as per the error "CustomRouteTableWithUnsupportedMSIType", you need a managed identity to create the route table and a role assigned to it, i.e. Network Contributor.
The network profile will be azure instead of kubenet, as you are using an Azure VNet and its subnet.
You can use add-ons as per your requirement, but please ensure you have used a data block for the workspace; otherwise you can directly give the resource ID. So, instead of
log_analytics_workspace_id = azurerm_log_analytics_workspace.insights.id
you can use
log_analytics_workspace_id = "/subscriptions/SubscriptionID/resourcegroups/resourcegroupname/providers/microsoft.operationalinsights/workspaces/workspacename"
Example of creating a private cluster with an existing VNet and subnets (I haven't added the add-ons):
provider "azurerm" {
features {}
}
#resource group as this will be referred to in managed identity creation
data "azurerm_resource_group" "base" {
name = "resourcegroupname"
}
#exisiting vnet
data "azurerm_virtual_network" "base" {
name = "ansuman-vnet"
resource_group_name = data.azurerm_resource_group.base.name
}
#exisiting subnets
data "azurerm_subnet" "aks" {
name = "akssubnet"
resource_group_name = data.azurerm_resource_group.base.name
virtual_network_name = data.azurerm_virtual_network.base.name
}
data "azurerm_subnet" "firewall" {
name = "AzureFirewallSubnet"
resource_group_name = data.azurerm_resource_group.base.name
virtual_network_name = data.azurerm_virtual_network.base.name
}
#user assigned identity required to create route table
resource "azurerm_user_assigned_identity" "base" {
resource_group_name = data.azurerm_resource_group.base.name
location = data.azurerm_resource_group.base.location
name = "mi-name"
}
#role assignment required to create route table
resource "azurerm_role_assignment" "base" {
scope = data.azurerm_resource_group.base.id
role_definition_name = "Network Contributor"
principal_id = azurerm_user_assigned_identity.base.principal_id
}
#route table
resource "azurerm_route_table" "base" {
name = "rt-aksroutetable"
location = data.azurerm_resource_group.base.location
resource_group_name = data.azurerm_resource_group.base.name
}
#route
resource "azurerm_route" "base" {
name = "dg-aksroute"
resource_group_name = data.azurerm_resource_group.base.name
route_table_name = azurerm_route_table.base.name
address_prefix = "0.0.0.0/0"
next_hop_type = "VirtualAppliance"
next_hop_in_ip_address = azurerm_firewall.base.ip_configuration.0.private_ip_address
}
#route table association
resource "azurerm_subnet_route_table_association" "base" {
subnet_id = data.azurerm_subnet.aks.id
route_table_id = azurerm_route_table.base.id
}
#firewall
resource "azurerm_public_ip" "base" {
name = "pip-firewall"
location = data.azurerm_resource_group.base.location
resource_group_name = data.azurerm_resource_group.base.name
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_firewall" "base" {
name = "fw-akscluster"
location = data.azurerm_resource_group.base.location
resource_group_name = data.azurerm_resource_group.base.name
ip_configuration {
name = "ip-firewallakscluster"
subnet_id = data.azurerm_subnet.firewall.id
public_ip_address_id = azurerm_public_ip.base.id
}
}
#kubernetes_cluster
resource "azurerm_kubernetes_cluster" "base" {
name = "testakscluster"
location = data.azurerm_resource_group.base.location
resource_group_name = data.azurerm_resource_group.base.name
dns_prefix = "dns-testakscluster"
private_cluster_enabled = true
network_profile {
network_plugin = "azure"
outbound_type = "userDefinedRouting"
}
default_node_pool {
name = "default"
node_count = 1
vm_size = "Standard_D2_v2"
vnet_subnet_id = data.azurerm_subnet.aks.id
}
identity {
type = "UserAssigned"
user_assigned_identity_id = azurerm_user_assigned_identity.base.id
}
depends_on = [
azurerm_route.base,
azurerm_role_assignment.base
]
}
Output: (screenshots of the Terraform plan, Terraform apply, and the Azure portal omitted)
Note: By default, Azure requires the firewall subnet to be named AzureFirewallSubnet. If you use a subnet with any other name for the firewall, it will error out. So, please ensure the existing subnet to be used by the firewall is named AzureFirewallSubnet.