KEDA: Azure Service Bus scaled object not scaling deployment - "Scaling is not performed because triggers are not active"

I am trying to autoscale a pod based on inbound messages from an Azure Service Bus topic using KEDA. The ScaledObject is defined as:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: main-router-scaledobject
  namespace: rehmannazar-camel-dev
spec:
  minReplicaCount: 0
  maxReplicaCount: 10
  scaleTargetRef:
    name: mainrouter
    kind: Deployment
  triggers:
  - type: azure-servicebus
    metadata:
      topicName: topic4test
      subscriptionName: sub3
      messageCount: "10"
      activationMessageCount: "0"
    authenticationRef:
      name: trigger-auth-service
with trigger-auth-service defined as
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: trigger-auth-service
spec:
  secretTargetRef:
  - parameter: connection
    name: connectionsecret
    key: connection
and connectionsecret holds the connection string to the Azure Service Bus namespace.
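For reference, a secret like that can be created along these lines (a minimal sketch; the connection string is a placeholder, not a real value):
kubectl create secret generic connectionsecret \
  --namespace rehmannazar-camel-dev \
  --from-literal=connection='Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>'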
kubectl describe scaledobject main-router-scaledobject
shows the following status:
Status:
  Conditions:
    Message:  ScaledObject is defined correctly and is ready for scaling
    Reason:   ScaledObjectReady
    Status:   True
    Type:     Ready
    Message:  Scaling is not performed because triggers are not active
    Reason:   ScalerNotActive
    Status:   False
    Type:     Active
    Message:  No fallbacks are active on this scaled object
    Reason:   NoFallbackFound
    Status:   False
    Type:     Fallback
  External Metric Names:
    s0-azure-servicebus-topic4test
  Health:
    s0-azure-servicebus-topic4test:
      Number Of Failures:  0
      Status:              Happy
  Hpa Name:                keda-hpa-main-router-scaledobject
  Original Replica Count:  1
  Scale Target GVKR:
    Group:     apps
    Kind:      Deployment
    Resource:  deployments
    Version:   v1
  Scale Target Kind:  apps/v1.Deployment
Events:
  Type    Reason                      Age                    From           Message
  ----    ------                      ----                   ----           -------
  Normal  KEDAScaleTargetDeactivated  3m53s (x191 over 98m)  keda-operator  Deactivated apps/v1.Deployment rehmannazar-camel-dev/mainrouter from 1 to 0
kubectl get ScaledObject main-router-scaledobject
NAME SCALETARGETKIND SCALETARGETNAME MIN MAX TRIGGERS AUTHENTICATION READY ACTIVE FALLBACK AGE
main-router-scaledobject apps/v1.Deployment mainrouter 0 10 azure-servicebus trigger-auth-service True False False 101m
Yet the pods are not scaled to zero, and when I post messages on subscription sub3 the pods are not scaled out either. They are also not scaled down to zero when sub3 has no messages: there is always a single pod in the Running state. The only activity I observe is a pod being terminated and a new pod starting, but the replica count always stays at 1. Is there something I missed in the KEDA configuration?

The KEDA configuration was working; the issue was the KEDA integration with Camel K.
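For context (this is an assumption on my part, not something confirmed above): Camel K's operator owns the Deployment it creates and keeps reconciling its replica count, so a common fix is to point the ScaledObject at the Integration custom resource, which exposes the scale subresource, rather than at the Deployment. A sketch of such a scaleTargetRef:
scaleTargetRef:
  apiVersion: camel.apache.org/v1
  kind: Integration
  name: mainrouter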

Related

Event Hub triggered Azure Function running on AKS with KEDA does not scale out

I have deployed an Event Hub triggered Azure Function written in Java on AKS. The function should scale out using KEDA.
The function is correctly triggered and working, but it does not scale out when the load increases. I added sleep calls to the function implementation to make sure it is not burning through the events too fast and should be forced to scale out, but this did not change anything either.
kubectl get hpa shows the following output
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-eventlogger Deployment/eventlogger 64/64 (avg) 1 20 1 3m41s
This seems to be a first indicator that something is not correct, as I assume the first number in the TARGETS column is the number of unprocessed events in Event Hub. It stays the same no matter how many events I pump into the hub.
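As a debugging aid, one can check what the KEDA metrics adapter exposes through the external metrics API (a sketch; piping to jq is optional):
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .
Querying the specific metric additionally needs the label selector that ties it to the ScaledObject; kubectl get hpa keda-hpa-eventlogger -o yaml shows both the metric name and that selector.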
The Function was deployed using the following Kubernetes Deployment Manifest
data:
  AzureWebJobsStorage: <removed>
  FUNCTIONS_WORKER_RUNTIME: amF2YQ==
  EventHubConnectionString: <removed>
apiVersion: v1
kind: Secret
metadata:
  name: eventlogger
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eventlogger
  labels:
    app: eventlogger
spec:
  selector:
    matchLabels:
      app: eventlogger
  template:
    metadata:
      labels:
        app: eventlogger
    spec:
      containers:
      - name: eventlogger
        image: <removed>
        env:
        - name: AzureFunctionsJobHost__functions__0
          value: eventloggerHandler
        envFrom:
        - secretRef:
            name: eventlogger
        readinessProbe:
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 240
          httpGet:
            path: /
            port: 80
            scheme: HTTP
        startupProbe:
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 240
          httpGet:
            path: /
            port: 80
            scheme: HTTP
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: eventlogger
  labels:
    app: eventlogger
spec:
  scaleTargetRef:
    name: eventlogger
  pollingInterval: 5
  cooldownPeriod: 5
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
  - type: azure-eventhub
    metadata:
      storageConnectionFromEnv: AzureWebJobsStorage
      connectionFromEnv: EventHubConnectionString
---
The Event Hub connection string contains the "EntityPath=" section, as described in the KEDA Event Hub scaler documentation, and has Manage permissions on the Event Hub namespace.
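For illustration, such a connection string has roughly this shape (placeholders only, not real values):
Endpoint=sb://<eventhub-namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=<eventhub-name>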
The output of kubectl describe ScaledObject is
Name:         eventlogger
Namespace:    default
Labels:       app=eventlogger
              scaledobject.keda.sh/name=eventlogger
Annotations:  <none>
API Version:  keda.sh/v1alpha1
Kind:         ScaledObject
Metadata:
  Creation Timestamp:  2022-04-17T10:30:36Z
  Finalizers:
    finalizer.keda.sh
  Generation:  1
  Managed Fields:
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
        f:labels:
          .:
          f:app:
      f:spec:
        .:
        f:cooldownPeriod:
        f:maxReplicaCount:
        f:minReplicaCount:
        f:pollingInterval:
        f:scaleTargetRef:
          .:
          f:name:
        f:triggers:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-04-17T10:30:36Z
    API Version:  keda.sh/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:finalizers:
          .:
          v:"finalizer.keda.sh":
        f:labels:
          f:scaledobject.keda.sh/name:
      f:status:
        .:
        f:conditions:
        f:externalMetricNames:
        f:lastActiveTime:
        f:originalReplicaCount:
        f:scaleTargetGVKR:
          .:
          f:group:
          f:kind:
          f:resource:
          f:version:
        f:scaleTargetKind:
    Manager:         keda
    Operation:       Update
    Time:            2022-04-17T10:30:37Z
  Resource Version:  1775052
  UID:               3b6a68c1-c3b9-4cdf-b5d5-41a9721ac661
Spec:
  Cooldown Period:    5
  Max Replica Count:  20
  Min Replica Count:  0
  Polling Interval:   5
  Scale Target Ref:
    Name:  eventlogger
  Triggers:
    Metadata:
      Connection From Env:          EventHubConnectionString
      Storage Connection From Env:  AzureWebJobsStorage
    Type:                           azure-eventhub
Status:
  Conditions:
    Message:  ScaledObject is defined correctly and is ready for scaling
    Reason:   ScaledObjectReady
    Status:   False
    Type:     Ready
    Message:  Scaling is performed because triggers are active
    Reason:   ScalerActive
    Status:   True
    Type:     Active
    Status:   Unknown
    Type:     Fallback
  External Metric Names:
    s0-azure-eventhub-$Default
  Last Active Time:        2022-04-17T10:30:47Z
  Original Replica Count:  1
  Scale Target GVKR:
    Group:     apps
    Kind:      Deployment
    Resource:  deployments
    Version:   v1
  Scale Target Kind:  apps/v1.Deployment
Events:
  Type    Reason              Age  From           Message
  ----    ------              ---- ----           -------
  Normal  KEDAScalersStarted  10s  keda-operator  Started scalers watch
  Normal  ScaledObjectReady   10s  keda-operator  ScaledObject is ready for scaling
So I'm a bit stuck, as I don't see any errors but it's still not behaving as expected.
Versions:
Kubernetes version: 1.21.9
KEDA Version: 2.6.1 installed using kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.6.1/keda-2.6.1.yaml
Azure Functions using Java 11 and extensionBundle in host.json is configured using version [2.8.4, 3.0.0)
I was able to find a solution to the problem.
Event Hub triggered Azure Functions deployed on AKS show the same scaling characteristics as Azure Functions on App Service:
You only get one consumer per partition to allow for ordering per partition.
This characteristic overrules maxReplicaCount in the Kubernetes deployment manifest.
So, to solve my own issue: by increasing the partition count of the Event Hub I get one pod per partition, and KEDA scales the workload as expected.
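For reference, the partition count is set when the hub is created and, on the standard tier, normally cannot be changed afterwards, so increasing it may mean creating a new hub. A sketch with placeholder names:
az eventhubs eventhub create \
  --resource-group <resource-group> \
  --namespace-name <eventhub-namespace> \
  --name <eventhub-name> \
  --partition-count 20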

KEDAScalerFailed : no azure identity found for request clientID

I have tried various methods but am not able to access the Azure Storage queues via pod identity. The resource group and client ID already exist.
The steps:-
kubectl create namespace keda
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity= --namespace keda
kubectl create namespace myapp
The first few sections of myapp.yaml:
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: <idvalue>
  namespace: myapp
spec:
  clientID: "<clientId>"
  resourceID: "<resourceId>"
  type: 0
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: <idvalue>-binding
  namespace: myapp
spec:
  azureIdentity: <idvalue>
  selector: <idvalue> # keeping same as identity
---
The rest of the file is the deployment section, so I am not pasting it here.
Then I ran Helm to deploy myapp.yaml via the myappInt.values.yaml file:
helm install -f C:\MyApp\myappInt.values.yaml (this file contains the cluster name, role, etc.)
The myappInt.values.yaml file:
image:
  registry: <registryname>
deployment:
  environment: INT
  clusterName: <clustername>
  clusterRole: <clusterrole>
  region: <region>
  processingRegion: <processingregion>
azureIdentityClientId: "<clientId>"
azureIdentityResourceId: "<resourceId>"
Then I applied the scaler:
kubectl apply -f c:\MyApp\kedascaling.yaml --namespace myapp
The kedascaling.yaml:
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-pod-identity-auth
spec:
  podIdentity:
    provider: azure
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp-scaledobject
  namespace: myapp
spec:
  scaleTargetRef:
    name: myapp                     # Corresponds with Deployment Name
  minReplicaCount: 2
  maxReplicaCount: 3
  triggers:
  - type: azure-queue
    metadata:
      queueName: myappqueue         # Required
      accountName: myappstorage     # Required when pod identity is used
      queueLength: "1"              # Required
    authenticationRef:
      name: keda-pod-identity-auth  # AuthenticationRef would need pod identity
Finally, it gives the error below:
kind: Event
apiVersion: v1
metadata:
  name: myapp-scaledobject.16def024b939fdf2
  namespace: myappnamespace
  uid: someuid
  resourceVersion: '186302648'
  creationTimestamp: '2022-03-23T06:55:54Z'
  managedFields:
    - manager: keda
      operation: Update
      apiVersion: v1
      time: '2022-03-23T06:55:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:count: {}
        f:firstTimestamp: {}
        f:involvedObject:
          f:apiVersion: {}
          f:kind: {}
          f:name: {}
          f:namespace: {}
          f:resourceVersion: {}
          f:uid: {}
        f:lastTimestamp: {}
        f:message: {}
        f:reason: {}
        f:source:
          f:component: {}
        f:type: {}
involvedObject:
  kind: ScaledObject
  namespace: myapp
  name: myapp-scaledobject
  uid: <some id>
  apiVersion: keda.sh/v1alpha1
  resourceVersion: '<some version>'
reason: KEDAScalerFailed
message: |
  no azure identity found for request clientID
source:
  component: keda-operator
firstTimestamp: '2022-03-23T06:55:54Z'
lastTimestamp: '2022-03-23T07:30:54Z'
count: 71
type: Warning
eventTime: null
reportingComponent: ''
reportingInstance: ''
Any idea what I am doing wrong here? Any help would be greatly appreciated. I asked at the KEDA repo but got no response.
I had a similar error recently... I needed to make sure that the AAD Pod Identity was in the same namespace as the KEDA operator service.
Whatever identity you assigned to KEDA when installing it with Helm, ensure that it is in the same namespace (which in your case is "keda").
For example after running:
helm install keda kedacore/keda --set podIdentity.activeDirectory.identity=my-keda-identity --namespace keda
if my-keda-identity is not in namespace "keda" then the KEDA operator will not be able to bind AAD because it can't find it. If you need to update the AAD reference you can simply run:
helm upgrade keda kedacore/keda --set podIdentity.activeDirectory.identity=my-second-app-reference --namespace keda
Next, recreate the KEDA operator pod (I like to do this to test things out in a clean manner) and then run the following command to see if the binding worked:
kubectl logs -n keda <keda-operator-pod-name> -c keda-operator
You should see the error go away, as long as the identity has access to retrieve queue messages from Azure Storage via RBAC.
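For completeness, granting the identity data-plane access to the queue could look roughly like this (a sketch with placeholder IDs; the exact built-in role depends on what the scaler needs, e.g. Storage Queue Data Reader or Storage Queue Data Contributor):
az role assignment create \
  --assignee <identity-client-id> \
  --role "Storage Queue Data Contributor" \
  --scope /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/myappstorage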

FluXCD Helm deployment from Azure ACR - no chart name found error

I am attempting to deploy a Helm chart to AKS using FluxCD. The chart has been pushed to Azure ACR using the Helm CLI (helm push ...). The chart is declared in the ACR as helm/release-services:0.1.0.
I am receiving the following error after a Flux reconcile:
'chart pull error: failed to get chart version for remote reference:
no chart name found'
with helm-controller logs as follows
{"level":"info","ts":"2022-02-07T12:40:18.121Z","logger":"controller.helmrelease","msg":"HelmChart 'flux-system/release-services-test-release-services' is not ready","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"release-services","namespace":"release-services-test"}
{"level":"info","ts":"2022-02-07T12:40:18.135Z","logger":"controller.helmrelease","msg":"reconcilation finished in 15.458307ms, next run in 5m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"release-services","namespace":"release-services-test"}
Below is the HelmChart resource in AKS:
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmChart
metadata:
  creationTimestamp: "2022-02-07T07:30:16Z"
  finalizers:
  - finalizers.fluxcd.io
  generation: 1
  name: release-services-test-release-services
  namespace: flux-system
  resourceVersion: "105266699"
  selfLink: /apis/source.toolkit.fluxcd.io/v1beta1/namespaces/flux-system/helmcharts/release-services-test-release-services
  uid: e4820a70-8885-44a1-8dfd-0e2bf7256915
spec:
  chart: release-services
  interval: 5m0s
  reconcileStrategy: ChartVersion
  sourceRef:
    kind: HelmRepository
    name: psbombb-helm-acr-dev
  version: '>=0.1.0'
status:
  conditions:
  - lastTransitionTime: "2022-02-07T11:02:49Z"
    message: 'chart pull error: failed to get chart version for remote reference:
      no chart name found'
    reason: ChartPullFailed
    status: "False"
    type: Ready
  observedGeneration: 1
observedGeneration: 1
and the HelmRelease is as follows
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  creationTimestamp: "2022-02-07T04:34:14Z"
  finalizers:
  - finalizers.fluxcd.io
  generation: 9
  labels:
    kustomize.toolkit.fluxcd.io/name: apps
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  name: release-services
  namespace: release-services-test
  resourceVersion: "105341484"
  selfLink: /apis/helm.toolkit.fluxcd.io/v2beta1/namespaces/release-services-test/helmreleases/release-services
  uid: 6a6e5f5c-951d-4655-9c15-fa9fe7421a04
spec:
  chart:
    spec:
      chart: release-services
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: psbombb-helm-acr-dev
        namespace: flux-system
      version: '>=0.1.0'
  install:
    remediation:
      retries: 3
  interval: 5m
  releaseName: release-services
  timeout: 12m
  values:
    image:
      name: release-services
      pullPolicy: IfNotPresent
      registry: <repository>.azurecr.io
      repository: <repository>.azurecr.io/helm/release-services
      tag: 0.1.0
    postgres:
      secret:
        create: false
        existingName: release-services-secrets
status:
  conditions:
  - lastTransitionTime: "2022-02-07T08:27:13Z"
    message: HelmChart 'flux-system/release-services-test-release-services' is not
      ready
    reason: ArtifactFailed
    status: "False"
    type: Ready
  failures: 50
  helmChart: flux-system/release-services-test-release-services
  observedGeneration: 9
Is there anything I am missing that anyone can spot for me please?
Thank you kindly
I think your issue is that Azure Container Registry stores Helm charts as OCI artifacts.
The Flux source controller pulls the index.yaml from an HTTP Helm chart repository to look for tags, and this does not work with an OCI registry.
Here is the GitHub issue for this, where you can see that the Flux team will work on it; as of now the OCI feature is stable with Helm 3.8.0.
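For reference, newer Flux releases added native OCI support to the source controller, so with a sufficiently recent version the registry can be declared as an OCI HelmRepository, roughly like this (a sketch assuming a Flux version that supports type: oci; names are taken from the question):
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: psbombb-helm-acr-dev
  namespace: flux-system
spec:
  type: oci
  interval: 5m
  url: oci://<repository>.azurecr.io/helm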

k8s pods stuck in failed/shutdown state after preemption (gke v1.20)

TL;DR - GKE 1.20 preemptible nodes cause pods to become zombies in the Failed/Shutdown state.
We have been using GKE for a few years with clusters containing a mixture of both stable and preemptible node pools. Recently, since GKE v1.20, we have started seeing preempted pods enter a weird zombie state where they are described as:
Status: Failed
Reason: Shutdown
Message: Node is shutting, evicting pods
When this started occurring we were convinced it was related to our pods failing to properly handle the SIGTERM at preemption. We decided to eliminate our service software as the source of the problem by boiling it down to a simple service that mostly sleeps:
/* eslint-disable no-console */
let exitNow = false
process.on( 'SIGINT', () => {
  console.log( 'INT shutting down gracefully' )
  exitNow = true
} )
process.on( 'SIGTERM', () => {
  console.log( 'TERM shutting down gracefully' )
  exitNow = true
} )
const sleep = ( seconds ) => {
  return new Promise( ( resolve ) => {
    setTimeout( resolve, seconds * 1000 )
  } )
}
const Main = async ( cycles = 120, delaySec = 5 ) => {
  console.log( `Starting ${cycles}, ${delaySec} second cycles` )
  for ( let i = 1; i <= cycles && !exitNow; i++ ) {
    console.log( `---> ${i} of ${cycles}` )
    await sleep( delaySec ) // eslint-disable-line
  }
  console.log( '*** Cycle Complete - exiting' )
  process.exit( 0 )
}
Main()
This code is built into a Docker image that uses the tini init process to spawn the pod process under Node.js (fermium-alpine image). No matter how we shuffle the signal handling, the pods never seem to shut down cleanly, even though the logs suggest they do.
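For context, the image is built roughly along these lines (a sketch, not the exact Dockerfile; service.js is a placeholder name for the script above):
FROM node:fermium-alpine
RUN apk add --no-cache tini
WORKDIR /app
COPY service.js .
# tini runs as PID 1 and forwards SIGTERM/SIGINT to the node process
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["node", "service.js"]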
Another oddity is that, according to the Kubernetes pod logs, we see the pod termination start and then get cancelled:
2021-08-06 17:00:08.000 EDT Stopping container preempt-pod
2021-08-06 17:02:41.000 EDT Cancelling deletion of Pod preempt-pod
We have also tried adding a preStop 15 second delay just to see if that has any effect, but nothing we try seems to matter - the pods become zombies. New replicas are started on the other nodes that are available in the pool, so it always maintains the minimum number of successfully running pods on the system.
We are also testing the preemption cycle using a simulated maintenance event:
gcloud compute instances simulate-maintenance-event node-id
After poking around various posts I finally relented to running a cron job every 9 minutes to avoid the Alertmanager trigger that occurs after pods have been stuck in Shutdown for 10+ minutes. This still feels like a hack to me, but it works, and it forced me to dig into Kubernetes CronJobs and RBAC.
This post started me on the path:
How to remove Kubernetes 'shutdown' pods
And the resultant cronjob spec:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-accessor-role
  namespace: default
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "delete", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-access
  namespace: default
subjects:
- kind: ServiceAccount
  name: cronjob-sa
  namespace: default
roleRef:
  kind: Role
  name: pod-accessor-role
  apiGroup: ""
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cronjob-sa
  namespace: default
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-zombie-killer
  namespace: default
spec:
  schedule: "*/9 * * * *"
  successfulJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          name: cron-zombie-killer
          namespace: default
        spec:
          serviceAccountName: cronjob-sa
          restartPolicy: Never
          containers:
          - name: cron-zombie-killer
            imagePullPolicy: IfNotPresent
            image: bitnami/kubectl
            command:
            - "/bin/sh"
            args:
            - "-c"
            - "kubectl get pods -n default --field-selector='status.phase==Failed' -o name | xargs kubectl delete -n default 2> /dev/null"
status: {}
Note that the redirect of stderr to /dev/null is simply to avoid the error output from kubectl delete when kubectl get doesn't find any pods in the Failed state.
Update: added the missing "delete" verb to the Role, and added the missing RoleBinding.
Update: added imagePullPolicy.
Starting with GKE 1.20.5, the kubelet graceful node shutdown feature is enabled on preemptible nodes. From the note on the feature page:
When pods were evicted during the graceful node shutdown, they are marked as failed. Running kubectl get pods shows the status of the evicted pods as Shutdown. And kubectl describe pod indicates that the pod was evicted because of node shutdown:
Status: Failed
Reason: Shutdown
Message: Node is shutting, evicting pods
Failed pod objects will be preserved until explicitly deleted or cleaned up by the GC. This is a change of behavior compared to abrupt node termination.
These pods should eventually be garbage collected, although I'm not sure of the threshold value.
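For a one-off cleanup (essentially what the cron job above automates), something like the following can be run manually; this is a sketch and assumes the Failed pods you want removed are the Shutdown ones:
kubectl delete pods -n default --field-selector=status.phase=Failed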

ImagePullBackOff unauthorized: authentication required

I have gone through all the motions and I have what appears to be a common problem. Unfortunately, none of the solutions I've tried from GitHub and SO have worked yet. Here's the error:
Warning Failed 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
-- created the service principal
az ad sp create-for-rbac
--scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry
--role Reader
--name kimage-reader
-- created the secret for Kube
kubectl create secret docker-registry kimagereadersecret --docker-server ussmicroserviceregistry.azurecr.io --docker-email coreyp#united-systems.com --docker-username=kimage-reader --docker-password 4b37b896-a04e-48b4-a950-5f1abdd3e7aa
-- kubectl.exe describe pod simpledotnetapi-deployment-6fbf97df55-2hg2m
Name:               simpledotnetapi-deployment-6fbf97df55-2hg2m
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               aks-agentpool-97052351-0/10.240.0.4
Start Time:         Mon, 17 Jun 2019 15:22:30 -0500
Labels:             app=simpledotnetapi-pod
                    pod-template-hash=6fbf97df55
Annotations:        <none>
Status:             Pending
IP:                 10.240.0.26
Controlled By:      ReplicaSet/simpledotnetapi-deployment-6fbf97df55
Containers:
  simpledotnetapi-simpledotnetapi:
    Container ID:
    Image:          ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
    Image ID:
    Port:           5000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hj9b5 (ro)
Conditions:
  Type             Status
  Initialized      True
  Ready            False
  ContainersReady  False
  PodScheduled     True
Volumes:
  default-token-hj9b5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hj9b5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From                               Message
  ----     ------     ----               ----                               -------
  Normal   Scheduled  5m                 default-scheduler                  Successfully assigned default/simpledotnetapi-deployment-6fbf97df55-2hg2m to aks-agentpool-97052351-0
  Normal   BackOff    4m (x6 over 5m)    kubelet, aks-agentpool-97052351-0  Back-off pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi"
  Normal   Pulling    4m (x4 over 5m)    kubelet, aks-agentpool-97052351-0  pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi"
  Warning  Failed     4m (x4 over 5m)    kubelet, aks-agentpool-97052351-0  Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
  Warning  Failed     4m (x4 over 5m)    kubelet, aks-agentpool-97052351-0  Error: ErrImagePull
  Warning  Failed     24s (x22 over 5m)  kubelet, aks-agentpool-97052351-0  Error: ImagePullBackOff
-- kubectl.exe get pod simpledotnetapi-deployment-6fbf97df55-2hg2m -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: 2019-06-17T20:22:30Z
  generateName: simpledotnetapi-deployment-6fbf97df55-
  labels:
    app: simpledotnetapi-pod
    pod-template-hash: 6fbf97df55
  name: simpledotnetapi-deployment-6fbf97df55-2hg2m
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: simpledotnetapi-deployment-6fbf97df55
    uid: a99e4ac8-8ec3-11e9-9bf8-86d46846735e
  resourceVersion: "813190"
  selfLink: /api/v1/namespaces/default/pods/simpledotnetapi-deployment-6fbf97df55-2hg2m
  uid: a1c220a2-913d-11e9-801a-c6aef815c06a
spec:
  containers:
  - image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
    imagePullPolicy: Always
    name: simpledotnetapi-simpledotnetapi
    ports:
    - containerPort: 5000
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-hj9b5
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: kimagereadersecret
  nodeName: aks-agentpool-97052351-0
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-hj9b5
    secret:
      defaultMode: 420
      secretName: default-token-hj9b5
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2019-06-17T20:22:30Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2019-06-17T20:22:30Z
    message: 'containers with unready status: [simpledotnetapi_simpledotnetapi]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2019-06-17T20:22:30Z
    message: 'containers with unready status: [simpledotnetapi_simpledotnetapi]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2019-06-17T20:22:30Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
    imageID: ""
    lastState: {}
    name: simpledotnetapi-simpledotnetapi
    ready: false
    restartCount: 0
    state:
      waiting:
        message: Back-off pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi"
        reason: ImagePullBackOff
  hostIP: 10.240.0.4
  phase: Pending
  podIP: 10.240.0.26
  qosClass: BestEffort
  startTime: 2019-06-17T20:22:30Z
-- yaml configuration file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simpledotnetapi-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: simpledotnetapi-pod
  template:
    metadata:
      labels:
        app: simpledotnetapi-pod
    spec:
      imagePullSecrets:
      - name: kimagereadersecret
      containers:
      - name: simpledotnetapi_simpledotnetapi
        image: ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: simpledotnetapi-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: simpledotnetapi
    type: front-end
-- output of kubectl get secret kimagereadersecret
NAME TYPE DATA AGE
kimagereadersecret kubernetes.io/dockerconfigjson 1 1h
-- credentials/secret from Kube dashboard
{
  "kind": "Secret",
  "apiVersion": "v1",
  "metadata": {
    "name": "kimagereadersecret",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/secrets/kimagereadersecret",
    "uid": "86006aff-9156-11e9-801a-c6aef815c06a",
    "resourceVersion": "830006",
    "creationTimestamp": "2019-06-17T23:20:41Z"
  },
  "data": {
    ".dockerconfigjson": "eyJhdXRocyI6eyJ1c3NtaWNyb3NlcnZpY2VyZWdpc3RyeS5henVyZWNyLmlvIjp7InVzZXJuYW1lIjoiMzNjYjBjZTQtOTVmMC00NGJkLWJiYmYtNTZkNTA2ZmY0ZWIzIiwicGFzc3dvcmQiOiI0YjM3Yjg5Ni1hMDRlLTQ4YjQtYTk1MC01ZjFhYmRkM2U3YWEiLCJlbWFpbCI6ImNvcmV5cEB1bml0ZWQtc3lzdGVtcy5jb20iLCJhdXRoIjoiTXpOallqQmpaVFF0T1RWbU1DMDBOR0prTFdKaVltWXROVFprTlRBMlptWTBaV0l6T2pSaU16ZGlPRGsyTFdFd05HVXRORGhpTkMxaE9UVXdMVFZtTVdGaVpHUXpaVGRoWVE9PSJ9fX0="
  },
  "type": "kubernetes.io/dockerconfigjson"
}
-- Full dump from the Kube Dashboard
Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: manifest for ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
The entire project is on GitHub at https://github.com/coreyperkins/KubeSimpleDotNetApi
-- ACR screenshot
-- Pod Failure in Kube
I'm fairly certain you didn't give it enough permissions:
az ad sp create-for-rbac
--scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry
--role Reader
--name kimage-reader
The role should be acrpull, not Reader. And just delete the kimagereadersecret secret and the reference to it in the pod; Kubernetes will handle that for you.
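For reference, the pull role can be granted either by re-creating the service principal with --role acrpull or by adding a role assignment to the existing one; a sketch reusing the scope from the question (the assignee app ID is a placeholder):
az ad sp create-for-rbac \
  --name kimage-reader \
  --role acrpull \
  --scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry
# or, for the existing service principal:
az role assignment create \
  --assignee <service-principal-app-id> \
  --role acrpull \
  --scope /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry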
Looks like you may be missing the kimagereadersecret in your Kubernetes cluster. As I understand it, az ad sp create-for-rbac just grants access to Azure resources, but how does Kubernetes know which credentials to use to pull from the registry? You can follow this to create the registry secret. You can check that it exists with:
$ kubectl get secret kimagereadersecret
In your case, it could be that it's defaulting to no credentials, or using whatever you have configured for Docker, which doesn't have access to ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi.
For your issue, it may just be a small mistake. Everything else you have done is fine. In the deployment, you need to change the image to include a tag, like below:
image: ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi:tag
Set the tag to the same one you pushed to the ACR. Then it will work. If you do not set a tag, it will use the default tag latest, which is probably not what you pushed.
