Error in NMI pod after adding and installing Bitnami External DNS via Terraform and Helm - No AzureIdentityBinding found for pod - azure

I am struggling to get the azureIdentity for ExternalDNS bound and get DNS entries into our zone(s).
Key error: I0423 19:27:52.830107 1 mic.go:610] No AzureIdentityBinding found for pod default/external-dns-84dcc5f68c-cl5h5 that matches selector: external-dns. it will be ignored
Also, no azureAssignedIdentity is created since there is no match for the pod and selector/aadpodidbinding.
I'm building IaC using Terraform, Helm, Azure, Azure AKS, and VS Code, with three Kubernetes add-ons so far: aad-pod-identity, application-gateway-kubernetes-ingress, and Bitnami external-dns.
Since the identity isn't being bound, an azureAssignedIdentity isn't being created and ExternalDNS isn't able to put records into our DNS zone(s).
The names and aadpodidbindings seem correct. I've tried passing in fullnameOverride in the Terraform kubectl_manifest provider for the Helm install of Bitnami ExternalDNS. I've tried suppressing the suffixes on ExternalDNS names and labels. I've tried editing the Helm and Kubernetes YAML on the cluster itself to try to force a binding. I've tried using the AKS user managed identity which is used for AAD Pod Identity and is located in the cluster's nodepools resource group. I've tried letting the Bitnami ExternalDNS configure and add an azure.json file, and I've also done so manually prior to adding and installing ExternalDNS. I've tried assigning the managed identity to the VMSS of the AKS cluster.
Thanks!
JBP
PS C:\Workspace\tf\HelmOne> kubectl logs pod/external-dns-84dcc5f68c-542mv
: Refresh request failed. Status Code = '404'. Response body: getting assigned identities for pod default/external-dns-84dcc5f68c-542mv in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>. Check MIC pod logs for identity assignment errors\n"
time="2021-04-24T19:57:30Z" level=debug msg="Retrieving Azure DNS zones for resource group: one-hi-sso-dnsrg-tf."
time="2021-04-24T20:06:02Z" level=error msg="azure.BearerAuthorizer#WithAuthorization: Failed to refresh the Token for request to https://management.azure.com/subscriptions/8fb55161-REDACTED-3400b5271a8c/resourceGroups/one-hi-sso-dnsrg-tf/providers/Microsoft.Network/dnsZones?api-version=2018-05-01: StatusCode=404 -- Original Error: adal: Refresh request failed. Status Code = '404'. Response body: getting assigned identities for pod default/external-dns-84dcc5f68c-542mv in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>. Check MIC pod logs for identity assignment errors\n"
time="2021-04-24T20:06:02Z" level=debug msg="Retrieving Azure DNS zones for resource group: one-hi-sso-dnsrg-tf."
PS C:\Workspace\tf\HelmOne> kubectl logs pod/aad-pod-identity-nmi-vtmwm
I0424 20:07:22.400942 1 server.go:196] status (404) took 80007557875 ns for req.method=GET reg.path=/metadata/identity/oauth2/token req.remote=10.0.8.7
E0424 20:08:44.427353 1 server.go:375] failed to get matching identities for pod: default/external-dns-84dcc5f68c-542mv, error: getting assigned identities for pod default/external-dns-84dcc5f68c-542mv in CREATED state failed after 16 attempts, retry duration [5]s, error: <nil>. Check MIC pod logs for identity assignment errors
I0424 20:08:44.427400 1 server.go:196] status (404) took 80025612263 ns for req.method=GET reg.path=/metadata/identity/oauth2/token req.remote=10.0.8.7
PS C:\Workspace\TF\HelmOne> kubectl logs pod/aad-pod-identity-mic-86944f67b8-k4hds
I0422 21:05:11.298958 1 main.go:114] starting mic process. Version: v1.7.5. Build date: 2021-04-02-21:14
W0422 21:05:11.299031 1 main.go:119] --kubeconfig not passed will use InClusterConfig
I0422 21:05:11.299038 1 main.go:136] kubeconfig () cloudconfig (/etc/kubernetes/azure.json)
I0422 21:05:11.299205 1 main.go:144] running MIC in namespaced mode: false
I0422 21:05:11.299223 1 main.go:148] client QPS set to: 5. Burst to: 5
I0422 21:05:11.299243 1 mic.go:139] starting to create the pod identity client. Version: v1.7.5. Build date: 2021-04-02-21:14
I0422 21:05:11.318835 1 mic.go:145] Kubernetes server version: v1.18.14
I0422 21:05:11.319465 1 cloudprovider.go:122] MIC using user assigned identity: c380##### REDACTED #####814b for authentication.
I0422 21:05:11.392322 1 probes.go:41] initialized health probe on port 8080
I0422 21:05:11.392351 1 probes.go:44] started health probe
I0422 21:05:11.392458 1 metrics.go:341] registered views for metric
I0422 21:05:11.392544 1 prometheus_exporter.go:21] starting Prometheus exporter
I0422 21:05:11.392561 1 metrics.go:347] registered and exported metrics on port 8888
I0422 21:05:11.392568 1 mic.go:244] initiating MIC Leader election
I0422 21:05:11.393053 1 leaderelection.go:243] attempting to acquire leader lease default/aad-pod-identity-mic...
E0423 01:47:52.730839 1 leaderelection.go:325] error retrieving resource lock default/aad-pod-identity-mic: etcdserver: request timed out
resource "helm_release" "external-dns" {
name = "external-dns"
repository = "https://charts.bitnami.com/bitnami"
chart = "external-dns"
namespace = "default"
version = "4.0.0"
set {
name = "azure.cloud"
value = "AzurePublicCloud"
}
#MyDnsResourceGroup
set {
name = "azure.resourceGroup"
value = data.azurerm_resource_group.dnsrg.name
}
set {
name = "azure.tenantId"
value = data.azurerm_subscription.currenttenantid.tenant_id
}
set {
name = "azure.subscriptionId"
value = data.azurerm_subscription.currentSubscription.subscription_id
}
set {
name = "azure.userAssignedIdentityID"
value = azurerm_user_assigned_identity.external-dns-mi-tf.client_id
}
#Verbosity of the logs (options: panic, debug, info, warning, error, fatal, trace)
set {
name = "logLevel"
value = "trace"
}
set {
name = "sources"
value = "{service,ingress}"
}
set {
name = "domainFilters"
value = "{${var.child_domain_prefix}.${lower(var.parent_domain)}}"
}
#DNS provider where the DNS records will be created (mandatory) (options: aws, azure, google, ...)
set {
name = "provider"
value = "azure"
}
#podLabels: {aadpodidbinding: <selector>} # selector you defined above in AzureIdentityBinding
set {
name = "podLabels.aadpodidbinding"
value = "external-dns"
}
set {
name = "azure.useManagedIdentityExtension"
value = true
}
}
resource "helm_release" "aad-pod-identity" {
name = "aad-pod-identity"
repository = "https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts"
chart = "aad-pod-identity"
}
resource "helm_release" "ingress-azure" {
name = "ingress-azure"
repository = "https://appgwingress.blob.core.windows.net/ingress-azure-helm-package/"
chart = "ingress-azure"
namespace = "default"
version = "1.4.0"
set {
name = "debug"
value = "true"
}
set {
name = "appgw.name"
value = data.azurerm_application_gateway.appgwpub.name
}
set {
name = "appgw.resourceGroup"
value = data.azurerm_resource_group.appgwpubrg.name
}
set {
name = "appgw.subscriptionId"
value = data.azurerm_subscription.currentSubscription.subscription_id
}
set {
name = "appgw.usePrivateIP"
value = "false"
}
set {
name = "armAuth.identityClientID"
value = azurerm_user_assigned_identity.agic-mi-tf.client_id
}
set {
name = "armAuth.identityResourceID"
value = azurerm_user_assigned_identity.agic-mi-tf.id
}
set {
name = "armAuth.type"
value = "aadPodIdentity"
}
set {
name = "rbac.enabled"
value = "true"
}
set {
name = "verbosityLevel"
value = "5"
}
set {
name = "appgw.environment"
value = "AZUREPUBLICCLOUD"
}
set {
name = "metadata.name"
value = "ingress-azure"
}
}
PS C:\Workspace\tf\HelmOne> kubectl get azureassignedidentities
NAME AGE
ingress-azure-68c97fd496-qbptf-default-ingress-azure 23h
PS C:\Workspace\tf\HelmOne> kubectl get azureidentity
NAME AGE
ingress-azure 23h
one-hi-sso-agic-mi-tf 23h
one-hi-sso-external-dns-mi-tf 23h
PS C:\Workspace\tf\HelmOne> kubectl edit azureidentity one-hi-sso-external-dns-mi-tf
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"aadpodidentity.k8s.io/v1","kind":"AzureIdentity","metadata":{"annotations":{},"name":"one-hi-sso-external-dns-mi-tf","namespace":"default"},"spec":{"clientID":"f58e7c55-REDACTED-a6e358e53912","resourceID":"/subscriptions/8fb55161-REDACTED-3400b5271a8c/resourceGroups/one-hi-sso-kuberg-tf/providers/Microsoft.ManagedIdentity/userAssignedIdentities/one-hi-sso-external-dns-mi-tf","type":0}}
  creationTimestamp: "2021-04-22T20:44:42Z"
  generation: 2
  name: one-hi-sso-external-dns-mi-tf
  namespace: default
  resourceVersion: "432055"
  selfLink: /apis/aadpodidentity.k8s.io/v1/namespaces/default/azureidentities/one-hi-sso-external-dns-mi-tf
  uid: f8e22fd9-REDACTED-6cdead0d7e22
spec:
  clientID: f58e7c55-REDACTED-a6e358e53912
  resourceID: /subscriptions/8fb55161-REDACTED-3400b5271a8c/resourceGroups/one-hi-sso-kuberg-tf/providers/Microsoft.ManagedIdentity/userAssignedIdentities/one-hi-sso-external-dns-mi-tf
  type: 0
PS C:\Workspace\tf\HelmOne> kubectl edit azureidentitybinding external-dns-mi-binding
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"aadpodidentity.k8s.io/v1","kind":"AzureIdentityBinding","metadata":{"annotations":{},"name":"external-dns-mi-binding","namespace":"default"},"spec":{"AzureIdentity":"one-hi-sso-external-dns-mi-tf","Selector":"external-dns"}}
  creationTimestamp: "2021-04-22T20:44:42Z"
  generation: 1
  name: external-dns-mi-binding
  namespace: default
  resourceVersion: "221101"
  selfLink: /apis/aadpodidentity.k8s.io/v1/namespaces/default/azureidentitybindings/external-dns-mi-binding
  uid: f39e7418-e896-4b8e-b596-035cf4b66252
spec:
  AzureIdentity: one-hi-sso-external-dns-mi-tf
  Selector: external-dns
resource "kubectl_manifest" "one-hi-sso-external-dns-mi-tf" {
yaml_body = <<YAML
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentity
metadata:
name: one-hi-sso-external-dns-mi-tf
namespace: default
spec:
type: 0
resourceID: /subscriptions/8fb55161-REDACTED-3400b5271a8c/resourceGroups/one-hi-sso-kuberg-tf/providers/Microsoft.ManagedIdentity/userAssignedIdentities/one-hi-sso-external-dns-mi-tf
clientID: f58e7c55-REDACTED-a6e358e53912
YAML
}
resource "kubectl_manifest" "external-dns-mi-binding" {
yaml_body = <<YAML
apiVersion: "aadpodidentity.k8s.io/v1"
kind: AzureIdentityBinding
metadata:
name: external-dns-mi-binding
spec:
AzureIdentity: one-hi-sso-external-dns-mi-tf
Selector: external-dns
YAML
}

The managed identity I'm using was not added to the virtual machine scale set (VMSS). Once I added it, the binding worked and the azureAssignedIdentity was created.
Also, I converted the AzureIdentity and Selector keys in my AzureIdentityBinding YAML from uppercase to lowercase first letters.
Correct:
azureIdentity:
selector:
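For reference, a minimal sketch of the corrected AzureIdentityBinding with the lowercase keys, reusing the names from my manifests above; the selector has to match the pod's aadpodidbinding label:
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: external-dns-mi-binding
  namespace: default
spec:
  azureIdentity: one-hi-sso-external-dns-mi-tf
  selector: external-dns   # must equal the pod label aadpodidbinding: external-dns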

Related

Can't connect externally to Neo4j server in AKS cluster using Neo4j browser

I have an AKS cluster with a Node.js server connecting to a Neo4j-standalone instance all deployed with Helm.
I installed an ingress-nginx controller, referenced a default Let's Encrypt certificate, and enabled TCP ports with Terraform as
resource "helm_release" "nginx" {
name = "ingress-nginx"
repository = "ingress-nginx"
# repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx/ingress-nginx"
namespace = "default"
set {
name = "tcp.7687"
value = "default/cluster:7687"
}
set {
name = "tcp.7474"
value = "default/cluster:7474"
}
set {
name = "tcp.7473"
value = "default/cluster:7473"
}
set {
name = "tcp.6362"
value = "default/cluster-admin:6362"
}
set {
name = "tcp.7687"
value = "default/cluster-admin:7687"
}
set {
name = "tcp.7474"
value = "default/cluster-admin:7474"
}
set {
name = "tcp.7473"
value = "default/cluster-admin:7473"
}
set {
name = "controller.extraArgs.default-ssl-certificate"
value = "default/tls-secret"
}
set {
name = "controller.service.externalTrafficPolicy"
value = "Local"
}
set {
name = "controller.service.annotations.service.beta.kubernetes.io/azure-load-balancer-internal"
value = "true"
}
set {
name = "controller.service.loadBalancerIP"
value = var.public_ip_address
}
set {
name = "controller.service.annotations.service.beta.kubernetes.io/azure-dns-label-name"
value = "xxx.westeurope.cloudapp.azure.com"
}
set {
name = "controller.service.annotations.service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path"
value = "/healthz"
}
}
I then have an Ingress with paths pointing to Neo4j services so on https://xxx.westeurope.cloudapp.azure.com/neo4j-tcp-http/browser/ I can get to the browser.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-service
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2$3$4
    # nginx.ingress.kubernetes.io/rewrite-target: /
    # certmanager.k8s.io/acme-challenge-type: http01
    nginx.ingress.kubernetes/cluster-issuer: letsencrypt-issuer
    ingress.kubernetes.io/ssl-redirect: "true"
    # kubernetes.io/tls-acme: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - xxxx.westeurope.cloudapp.azure.com
      secretName: tls-secret
  rules:
    # - host: xxx.westeurope.cloud.app.azure.com # dns from Azure PublicIP
    ### Node.js server
    - http:
        paths:
          - path: /(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: server-clusterip-service
                port:
                  number: 80
    - http:
        paths:
          - path: /server(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: server-clusterip-service
                port:
                  number: 80
    ##### Neo4j
    - http:
        paths:
          # 502 bad gateway
          # /any character 502 bad gatway
          - path: /neo4j-tcp-bolt(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                # neo4j chart
                # name: cluster
                # neo4j-standalone chart
                name: neo4j
                port:
                  # name: tcp-bolt
                  number: 7687
    - http:
        paths:
          # /browser/ show browser
          # /any character shows login to xxx.westeurope.cloudapp.azure.com:443 from https, :80 from http
          - path: /neo4j-tcp-http(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                # neo4j chart
                # name: cluster
                # neo4j-standalone chart
                name: neo4j
                port:
                  # name: tcp-http
                  number: 7474
    - http:
        paths:
          # 502 bad gateway
          # /any character 502 bad gatway
          - path: /neo4j-tcp-https(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                # neo4j chart
                # name: cluster
                # neo4j-standalone chart
                name: neo4j
                port:
                  # name: tcp-https
                  number: 7473
I can get to the Neo4j Browser at https://xxx.westeurope.cloudapp.azure.com/neo4j-tcp-http/browser/, but when using the Connect URL bolt+s//server.bolt it won't connect to the server, failing with the error ServiceUnavailable: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver.
Now I'm guessing that is because the Neo4j Bolt connector is not using the certificate used by the ingress-nginx controller.
vincenzocalia#vincenzos-MacBook-Air helm_charts % kubectl describe secret tls-secret
Name: tls-secret
Namespace: default
Labels: controller.cert-manager.io/fao=true
Annotations: cert-manager.io/alt-names: xxx.westeurope.cloudapp.azure.com
cert-manager.io/certificate-name: tls-certificate
cert-manager.io/common-name: xxx.westeurope.cloudapp.azure.com
cert-manager.io/ip-sans:
cert-manager.io/issuer-group:
cert-manager.io/issuer-kind: ClusterIssuer
cert-manager.io/issuer-name: letsencrypt-issuer
cert-manager.io/uri-sans:
Type: kubernetes.io/tls
Data
====
tls.crt: 5648 bytes
tls.key: 1679 bytes
I tried to use it by overriding the chart values, but then the Neo4j driver from the Node.js server won't connect to the server:
ssl:
  # setting per "connector" matching neo4j config
  bolt:
    privateKey:
      secretName: tls-secret # we set up the template to grab `private.key` from this secret
      subPath: tls.key # we specify the privateKey value name to get from the secret
    publicCertificate:
      secretName: tls-secret # we set up the template to grab `public.crt` from this secret
      subPath: tls.crt # we specify the publicCertificate value name to get from the secret
    trustedCerts:
      sources: [ ] # a sources array for a projected volume - this allows someone to (relatively) easily mount multiple public certs from multiple secrets for example.
    revokedCerts:
      sources: [ ] # a sources array for a projected volume
  https:
    privateKey:
      secretName: tls-secret
      subPath: tls.key
    publicCertificate:
      secretName: tls-secret
      subPath: tls.crt
    trustedCerts:
      sources: [ ]
    revokedCerts:
      sources: [ ]
Is there a way to use it, or should I set up another certificate just for Neo4j? If so, what dnsNames should I set on it?
Is there something else I'm doing wrong?
Thank you very much.
From what I can gather from your information, the problem seems to be that you're trying to expose the Bolt port behind an ingress. Ingresses are implemented as an L7 (protocol-aware) reverse proxy and manage load balancing, etc. The Bolt protocol has its own load balancing and routing for cluster applications. So you will need to expose the network service directly for every instance of Neo4j you are running.
Check out this part of the documentation for more information:
https://neo4j.com/docs/operations-manual/current/kubernetes/accessing-neo4j/#access-outside-k8s
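For example, a minimal sketch of exposing Bolt directly via a per-instance LoadBalancer Service; the service name, selector labels and ports here are assumptions, not taken from your setup:
apiVersion: v1
kind: Service
metadata:
  name: neo4j-bolt-external   # hypothetical name
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: neo4j                # assumption: must match the labels on your Neo4j pod(s)
  ports:
    - name: tcp-bolt
      port: 7687
      targetPort: 7687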
Finally, after a few days of going in circles, I found what the problems were.
First, using a staging certificate causes the Neo4j Bolt connection to fail, as it's not trusted, with the error:
ServiceUnavailable: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browsers development console to determine the root cause of the failure. Common reasons include the database being unavailable, using the wrong connection URL or temporary network problems. If you have enabled encryption, ensure your browser is configured to trust the certificate Neo4j is configured to use. WebSocket readyState is: 3
found here https://grishagin.com/neo4j/2022/03/29/neo4j-websocket-issue.html
Then, I had not assigned a general listening address to the Bolt connector, which by default listens only on 127.0.0.1:7687: https://neo4j.com/docs/operations-manual/current/configuration/connectors/
To listen for Bolt connections on all network interfaces (0.0.0.0)
so I added server.bolt.listen_address: "0.0.0.0:7687" to the Neo4j chart's values config.
Next, since I'm connecting the default neo4j ClusterIP service's TCP ports to the ingress controller's exposed TCP connections through the Ingress, as described here https://neo4j.com/labs/neo4j-helm/1.0.0/externalexposure/ (as an alternative to using a LoadBalancer), the Neo4j LoadBalancer service is not needed, so services.neo4j.enabled gets set to "false". In my tests I actually found that if you leave it enabled, Bolt won't connect despite everything being set correctly.
Other missing Neo4j config was server.bolt.enabled: "true", server.bolt.tls_level: "REQUIRED", dbms.ssl.policy.bolt.client_auth: "NONE" and dbms.ssl.policy.bolt.enabled: "true"; the complete list of config options is here: https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
The Neo4j chart's values for the ssl config were fine.
So now I can use the (renamed for brevity) path /neo4j/browser/ to serve the Neo4j Browser app, and either the /bolt path as the browser Connect URL, or the public IP's <DNS>:<bolt port>.
You are connected as user neo4j
to bolt+s://xxxx.westeurope.cloudapp.azure.com/bolt
Connection credentials are stored in your web browser.
Hope this explanation and the code recap below will help others.
Cheers.
ingress controller
resource "helm_release" "nginx" {
name = "ingress-nginx"
namespace = "default"
repository = "https://kubernetes.github.io/ingress-nginx"
chart = "ingress-nginx"
set {
name = "version"
value = "4.4.2"
}
### expose tcp connections for neo4j service
### bolt url connection port
set {
name = "tcp.7687"
value = "default/neo4j:7687"
}
### http browser app port
set {
name = "tcp.7474"
value = "default/neo4j:7474"
}
set {
name = "controller.extraArgs.default-ssl-certificate"
value = "default/tls-secret"
}
set {
name = "controller.service.externalTrafficPolicy"
value = "Local"
}
set {
name = "controller.service.annotations.service.beta.kubernetes.io/azure-load-balancer-internal"
value = "true"
}
set {
name = "controller.service.loadBalancerIP"
value = var.public_ip_address
}
set {
name = "controller.service.annotations.service.beta.kubernetes.io/azure-dns-label-name"
value = "xxx.westeurope.cloudapp.azure.com"
}
set {
name = "controller.service.annotations.service.beta.kubernetes.io/azure-load-balancer-health-probe-request-path"
value = "/healthz"
}
}
Ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-service
  namespace: default
  annotations:
    nginx.ingress.kubernetes.io/use-regex: "true"
    nginx.ingress.kubernetes.io/rewrite-target: /$2$3$4
    ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes/cluster-issuer: letsencrypt-issuer
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - xxx.westeurope.cloudapp.azure.com
      secretName: tls-secret
  rules:
    ### Node.js server
    - http:
        paths:
          - path: /(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: server-clusterip-service
                port:
                  number: 80
    - http:
        paths:
          - path: /server(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: server-clusterip-service
                port:
                  number: 80
    ##### Neo4j
    - http:
        paths:
          - path: /bolt(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: neo4j
                port:
                  # name: tcp-bolt
                  number: 7687
    - http:
        paths:
          - path: /neo4j(/|$)(.*)
            pathType: Prefix
            backend:
              service:
                name: neo4j
                port:
                  # name: tcp-http
                  number: 7474
Values.yaml (Umbrella chart)
neo4j-db: # chart dependency alias
  nameOverride: "neo4j"
  fullnameOverride: 'neo4j'
  neo4j:
    # Name of your cluster
    name: "xxxx" # this will be the label: app: value for the service selector
    password: "xxxxx"
    ##
    passwordFromSecret: ""
    passwordFromSecretLookup: false
    edition: "community"
    acceptLicenseAgreement: "yes"
    offlineMaintenanceModeEnabled: false
    resources:
      cpu: "1000m"
      memory: "2Gi"
  volumes:
    data:
      mode: 'volumeClaimTemplate'
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        storageClassName: neo4j-sc-data
        resources:
          requests:
            storage: 4Gi
    backups:
      mode: 'share' # share an existing volume (e.g. the data volume)
      share:
        name: 'logs'
    logs:
      mode: 'volumeClaimTemplate'
      volumeClaimTemplate:
        accessModes:
          - ReadWriteOnce
        storageClassName: neo4j-sc-logs
        resources:
          requests:
            storage: 4Gi
  services:
    # A LoadBalancer Service for external Neo4j driver applications and Neo4j Browser, this will create "cluster-neo4j" svc
    neo4j:
      enabled: false
  config:
    server.bolt.enabled: "true"
    server.bolt.tls_level: "REQUIRED"
    server.bolt.listen_address: "0.0.0.0:7687"
    dbms.ssl.policy.bolt.client_auth: "NONE"
    dbms.ssl.policy.bolt.enabled: "true"
  startupProbe:
    failureThreshold: 1000
    periodSeconds: 50
  ssl:
    bolt:
      privateKey:
        secretName: tls-secret
        subPath: tls.key
      publicCertificate:
        secretName: tls-secret
        subPath: tls.crt
      trustedCerts:
        sources: [ ]
      revokedCerts:
        sources: [ ] # a sources array for a projected volume

keda: Azure Service bus scaled object not scaling deployment-Scaling is not performed because triggers are not active

I'm trying to autoscale a pod based on inbound messages from an Azure Service Bus topic using KEDA. The scaled object is defined as
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: main-router-scaledobject
  namespace: rehmannazar-camel-dev
spec:
  minReplicaCount: 0
  maxReplicaCount: 10
  scaleTargetRef:
    name: mainrouter
    kind: Deployment
  triggers:
    - type: azure-servicebus
      metadata:
        topicName: topic4test
        subscriptionName: sub3
        messageCount: "10"
        activationMessageCount: "0"
      authenticationRef:
        name: trigger-auth-service
with trigger-auth-service defined as
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-service
spec:
  secretTargetRef:
    - parameter: connection
      name: connectionsecret
      key: connection
and connectionsecret defines connection string to azure service bus.
kubectl describe scaledobject main-router-scaledobject
shows the following status:
Status:
  Conditions:
    Message:               ScaledObject is defined correctly and is ready for scaling
    Reason:                ScaledObjectReady
    Status:                True
    Type:                  Ready
    Message:               Scaling is not performed because triggers are not active
    Reason:                ScalerNotActive
    Status:                False
    Type:                  Active
    Message:               No fallbacks are active on this scaled object
    Reason:                NoFallbackFound
    Status:                False
    Type:                  Fallback
  External Metric Names:
    s0-azure-servicebus-topic4test
  Health:
    s0-azure-servicebus-topic4test:
      Number Of Failures:  0
      Status:              Happy
  Hpa Name:                keda-hpa-main-router-scaledobject
  Original Replica Count:  1
  Scale Target GVKR:
    Group:                 apps
    Kind:                  Deployment
    Resource:              deployments
    Version:               v1
  Scale Target Kind:       apps/v1.Deployment
Events:
  Type    Reason                      Age                    From           Message
  ----    ------                      ----                   ----           -------
  Normal  KEDAScaleTargetDeactivated  3m53s (x191 over 98m)  keda-operator  Deactivated apps/v1.Deployment rehmannazar-camel-dev/mainrouter from 1 to 0
kubectl get ScaledObject main-router-scaledobject
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK AGE
main-router-scaledobject apps/v1.Deployment mainrouter 0 10 azure-servicebus trigger-auth-service True False False 101m
Yet the pods are not scaled to zero, and when posting messages to subscription sub3 the pods are not scaled up. The pods are also not scaled down to zero when sub3 has no messages. There is always a single pod in the running state. The only activity I am observing is pods getting terminated and new pods getting started, but the replica count always remains 1. Is there something I missed in the KEDA configuration?
The KEDA configuration is working. The issue was the KEDA integration with Camel K.

KEDAScalerFailed : no azure identity found for request clientID

I tried various methods but am not able to access the Azure Storage queues via pod identity. The resource group and client ID already exist.
The steps:-
kubectl create namespace keda
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity= --namespace keda
kubectl create namespace myapp
The first few sections of myapp.yaml :
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: <idvalue>
  namespace: myapp
spec:
  clientID: "<clientId>"
  resourceID: "<resourceId>"
  type: 0
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: <idvalue>-binding
  namespace: myapp
spec:
  azureIdentity: <idvalue>
  selector: <idvalue> # keeping same as identity
---
The rest of the file is the deployment section, so not pasting here.
Then I ran Helm to deploy myapp.yaml via the myappInt.values.yaml file:
helm install -f C:\MyApp\myappInt.values.yaml (this file contains the cluster name, role, etc.)
myappInt.values.yaml file:-
image:
  registry: <registryname>
deployment:
  environment: INT
  clusterName: <clustername>
  clusterRole: <clusterrole>
  region: <region>
  processingRegion: <processingregion>
  azureIdentityClientId: "<clientId>"
  azureIdentityResourceId: "<resourceId>"
Then the scaler ->
kubectl apply -f c:\MyApp\kedascaling.yaml --namespace myapp
The kedascaling.yaml:-
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-pod-identity-auth
spec:
  podIdentity:
    provider: azure
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaledobject
  namespace: myapp
spec:
  scaleTargetRef:
    name: myapp # Corresponds with Deployment Name
  minReplicaCount: 2
  maxReplicaCount: 3
  triggers:
    - type: azure-queue
      metadata:
        queueName: myappqueue # Required
        accountName: myappstorage # Required when pod identity is used
        queueLength: "1" # Required
      authenticationRef:
        name: keda-pod-identity-auth # AuthenticationRef would need pod identity
Finally it gives the error below:-
kind: Event
apiVersion: v1
metadata:
  name: myapp-scaledobject.16def024b939fdf2
  namespace: myappnamespace
  uid: someuid
  resourceVersion: '186302648'
  creationTimestamp: '2022-03-23T06:55:54Z'
  managedFields:
    - manager: keda
      operation: Update
      apiVersion: v1
      time: '2022-03-23T06:55:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:count: {}
        f:firstTimestamp: {}
        f:involvedObject:
          f:apiVersion: {}
          f:kind: {}
          f:name: {}
          f:namespace: {}
          f:resourceVersion: {}
          f:uid: {}
        f:lastTimestamp: {}
        f:message: {}
        f:reason: {}
        f:source:
          f:component: {}
        f:type: {}
involvedObject:
  kind: ScaledObject
  namespace: myapp
  name: myapp-scaledobject
  uid: <some id>
  apiVersion: keda.sh/v1alpha1
  resourceVersion: '<some version>'
reason: KEDAScalerFailed
message: |
  no azure identity found for request clientID
source:
  component: keda-operator
firstTimestamp: '2022-03-23T06:55:54Z'
lastTimestamp: '2022-03-23T07:30:54Z'
count: 71
type: Warning
eventTime: null
reportingComponent: ''
reportingInstance: ''
Any idea what I am doing wrong here? Any help would be greatly appreciated. I asked at the KEDA repo but got no response.
I had a similar error recently... I needed to make sure that the AAD Pod Identity was in the same namespace as the KEDA operator service.
Whatever identity you assigned to KEDA when creating KEDA with Helm, ensure that it's within the same namespace (which in your instance is "keda").
For example after running:
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=my-keda-identity --namespace keda
if my-keda-identity is not in namespace "keda" then the KEDA operator will not be able to bind AAD because it can't find it. If you need to update the AAD reference you can simply run:
helm upgrade keda kedacore/keda --set podIdentity.activeDirectory.identity=my-second-app-reference --namespace keda
Next, recreate the KEDA operator pod (I like to do this to test things out in a clean manner) and then run the following command to see if the binding worked:
kubectl logs -n keda <keda-operator-pod-name> -c keda-operator
You should see the error go away (as long as the identity has access to retrieve queue messages from Azure Storage via RBAC)
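For reference, a rough sketch of what the identity and binding in the keda namespace could look like, reusing the my-keda-identity name from the example above; the clientID and resourceID values are placeholders, and I believe the KEDA chart labels the operator pod with aadpodidbinding set to the identity name you pass in:
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: my-keda-identity
  namespace: keda            # same namespace as the KEDA operator
spec:
  type: 0
  clientID: <clientId>       # placeholder
  resourceID: <resourceId>   # placeholder
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: my-keda-identity-binding
  namespace: keda
spec:
  azureIdentity: my-keda-identity
  selector: my-keda-identity # should match the aadpodidbinding label on the KEDA operator pod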

k8s pods stuck in failed/shutdown state after preemption (gke v1.20)

TL;DR - gke 1.20 preemptible nodes cause pods to zombie into Failed/Shutdown
We have been using GKE for a few years with clusters containing a mixture of both stable and preemptible node pools. Recently, since gke v1.20, we have started seeing preempted pods enter into a weird zombie state where they are described as:
Status: Failed
Reason: Shutdown
Message: Node is shutting, evicting pods
When this started occurring we were convinced it was related to our pods failing to properly handle the SIGTERM at preemption. We decided to eliminate our service software as a source of a problem by boiling it down to a simple service that mostly sleeps:
/* eslint-disable no-console */
let exitNow = false
process.on( 'SIGINT', () => {
console.log( 'INT shutting down gracefully' )
exitNow = true
} )
process.on( 'SIGTERM', () => {
console.log( 'TERM shutting down gracefully' )
exitNow = true
} )
const sleep = ( seconds ) => {
return new Promise( ( resolve ) => {
setTimeout( resolve, seconds * 1000 )
} )
}
const Main = async ( cycles = 120, delaySec = 5 ) => {
console.log( `Starting ${cycles}, ${delaySec} second cycles` )
for ( let i = 1; i <= cycles && !exitNow; i++ ) {
console.log( `---> ${i} of ${cycles}` )
await sleep( delaySec ) // eslint-disable-line
}
console.log( '*** Cycle Complete - exiting' )
process.exit( 0 )
}
Main()
This code is built into a docker image using the tini init to spawn the pod process running under nodejs (fermium-alpine image). No matter how we shuffle the signal handling it seems the pods never really shutdown cleanly, even though the logs suggest they are.
Another oddity is that, according to the Kubernetes pod logs, we see the pod termination start and then get cancelled:
2021-08-06 17:00:08.000 EDT Stopping container preempt-pod
2021-08-06 17:02:41.000 EDT Cancelling deletion of Pod preempt-pod
We have also tried adding a preStop 15 second delay just to see if that has any effect, but nothing we try seems to matter - the pods become zombies. New replicas are started on the other nodes that are available in the pool, so it always maintains the minimum number of successfully running pods on the system.
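For reference, the preStop delay we tried looked roughly like this in the deployment's container spec (the container name and image are placeholders):
spec:
  containers:
    - name: preempt-pod            # placeholder container name
      image: our-service:latest    # placeholder image
      lifecycle:
        preStop:
          exec:
            # give the process a little extra time before SIGTERM is sent
            command: ["/bin/sh", "-c", "sleep 15"]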
We are also testing the preemption cycle using a sim maintenance event:
gcloud compute instances simulate-maintenance-event node-id
After poking around various posts I finally relented to running a cronjob every 9 minutes to avoid the alertManager trigger that occurs after pods have been stuck in shutdown for 10+ minutes. This still feels like a hack to me, but it works, and it forced me to dig in to k8s cronjob and RBAC.
This post started me on the path:
How to remove Kubernetes 'shutdown' pods
And the resultant cronjob spec:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-accessor-role
  namespace: default
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "delete", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-access
  namespace: default
subjects:
  - kind: ServiceAccount
    name: cronjob-sa
    namespace: default
roleRef:
  kind: Role
  name: pod-accessor-role
  apiGroup: ""
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cronjob-sa
  namespace: default
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-zombie-killer
  namespace: default
spec:
  schedule: "*/9 * * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          name: cron-zombie-killer
          namespace: default
        spec:
          serviceAccountName: cronjob-sa
          restartPolicy: Never
          containers:
            - name: cron-zombie-killer
              imagePullPolicy: IfNotPresent
              image: bitnami/kubectl
              command:
                - "/bin/sh"
              args:
                - "-c"
                - "kubectl get pods -n default --field-selector='status.phase==Failed' -o name | xargs kubectl delete -n default 2> /dev/null"
status: {}
Note that the redirect of stderr to /dev/null is to simply avoid the error output from kubectl delete when the kubectl get doesn't find any pods in the failed state.
Update: added the missing "delete" verb to the role, and added the missing RoleBinding.
Update: added imagePullPolicy.
Starting with GKE 1.20.5 and later, the kubelet graceful node shutdown feature is enabled on preemptible nodes. From the note on the feature page:
When pods were evicted during the graceful node shutdown, they are marked as failed. Running kubectl get pods shows the status of the evicted pods as Shutdown. And kubectl describe pod indicates that the pod was evicted because of node shutdown:
Status: Failed
Reason: Shutdown
Message: Node is shutting, evicting pods
Failed pod objects will be preserved until explicitly deleted or cleaned up by the GC. This is a change of behavior compared to abrupt node termination.
These pods should eventually be garbage collected, although I'm not sure of the threshold value.

Create kubeconfig with restricted permission

I need to create a kubeconfig with restricted access. I want to be able to grant permission to update a ConfigMap in a specific namespace. How can I create such a kubeconfig with the following permissions?
for the specific namespace (myns)
update only the configmap (mycm)
Is there a simple way to create it?
The tricky part here is that some program needs access to cluster X and must modify only this ConfigMap. How would I do that from an outside process without providing the full kubeconfig file, which can be problematic for security reasons?
To make it clear: I own the cluster; I just want to give some program restricted permissions.
This is not straightforward, but it is still possible.
Create the namespace myns if not exists.
$ kubectl create ns myns
namespace/myns created
Create a service account cm-user in myns namespace. It'll create a secret token as well.
$ kubectl create sa cm-user -n myns
serviceaccount/cm-user created
$ kubectl get sa cm-user -n myns
NAME SECRETS AGE
cm-user 1 18s
$ kubectl get secrets -n myns
NAME TYPE DATA AGE
cm-user-token-kv5j5 kubernetes.io/service-account-token 3 63s
default-token-m7j9v kubernetes.io/service-account-token 3 96s
Get the token and ca.crt from cm-user-token-kv5j5 secret.
$ kubectl get secrets cm-user-token-kv5j5 -n myns -oyaml
Base64 decode the value of token from cm-user-token-kv5j5.
Now create a user using the decoded token.
$ kubectl config set-credentials cm-user --token=<decoded token value>
User "cm-user" set.
Now generate a kubeconfig file kubeconfig-cm.
apiVersion: v1
kind: Config
clusters:
  - cluster:
      certificate-authority-data: <ca.crt value from cm-user-token-kv5j5 secret>
      server: <kubernetes server>
    name: <cluster>
contexts:
  - context:
      cluster: <cluster>
      namespace: myns
      user: cm-user
    name: cm-user
current-context: cm-user
users:
  - name: cm-user
    user:
      token: <decoded token>
Now create a role and rolebinding for sa cm-user.
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: myns
  name: cm-user-role
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["update", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cm-user-rb
  namespace: myns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cm-user-role
subjects:
  - namespace: myns
    kind: ServiceAccount
    name: cm-user
We are done. Now using this kubeconfig file you can update the mycm configmap. It doesn't have any other privileges.
$ kubectl get cm -n myns --kubeconfig kubeconfig-cm
NAME DATA AGE
mycm 0 8s
$ kubectl delete cm mycm -n myns --kubeconfig kubeconfig-cm
Error from server (Forbidden): configmaps "mycm" is forbidden: User "system:serviceaccount:myns:cm-user" cannot delete resource "configmaps" in API group "" in the namespace "myns"
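If you want to restrict updates to just the mycm ConfigMap rather than all ConfigMaps in myns, you can, as far as I know, add resourceNames to the rule. A rough sketch (note that the list verb cannot be restricted by resourceNames, so it is omitted here):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: myns
  name: cm-user-role
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["mycm"]   # only this ConfigMap can be read or updated
    verbs: ["update", "get"]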
You need to use RBAC: define a Role and then bind that Role to a user or ServiceAccount using a RoleBinding.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: configmap-reader
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["configmaps"]
    verbs: ["update", "get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
# This role binding allows "jane" to read config maps in the "default" namespace.
# You need to already have a Role named "configmap-reader" in that namespace.
kind: RoleBinding
metadata:
  name: read-configmap
  namespace: default
subjects:
  # You can specify more than one "subject"
  - kind: User
    name: jane # "name" is case sensitive
    apiGroup: rbac.authorization.k8s.io
roleRef:
  # "roleRef" specifies the binding to a Role / ClusterRole
  kind: Role # this must be Role or ClusterRole
  name: configmap-reader # this must match the name of the Role or ClusterRole you wish to bind to
  apiGroup: rbac.authorization.k8s.io
https://kubernetes.io/docs/reference/access-authn-authz/rbac/
