ImagePullBackOff unauthorized: authentication required - azure

I have gone through all the motions and I have what appears to be a common problem. Unfortunately, none of the solutions I've tried from GitHub and Stack Overflow have worked. Here's the error:
Warning Failed 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
-- created the service principal
az ad sp create-for-rbac
--scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry
--role Reader
--name kimage-reader
-- created the secret for Kube
kubectl create secret docker-registry kimagereadersecret --docker-server ussmicroserviceregistry.azurecr.io --docker-email coreyp@united-systems.com --docker-username=kimage-reader --docker-password 4b37b896-a04e-48b4-a950-5f1abdd3e7aa
-- kubectl.exe describe pod simpledotnetapi-deployment-6fbf97df55-2hg2m
Name: simpledotnetapi-deployment-6fbf97df55-2hg2m
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: aks-agentpool-97052351-0/10.240.0.4
Start Time: Mon, 17 Jun 2019 15:22:30 -0500
Labels: app=simpledotnetapi-pod
pod-template-hash=6fbf97df55
Annotations: <none>
Status: Pending
IP: 10.240.0.26
Controlled By: ReplicaSet/simpledotnetapi-deployment-6fbf97df55
Containers:
simpledotnetapi-simpledotnetapi:
Container ID:
Image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
Image ID:
Port: 5000/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hj9b5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-hj9b5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hj9b5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned default/simpledotnetapi-deployment-6fbf97df55-2hg2m to aks-agentpool-97052351-0
Normal BackOff 4m (x6 over 5m) kubelet, aks-agentpool-97052351-0 Back-off pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi"
Normal Pulling 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi"
Warning Failed 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
Warning Failed 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 Error: ErrImagePull
Warning Failed 24s (x22 over 5m) kubelet, aks-agentpool-97052351-0 Error: ImagePullBackOff
-- kubectl.exe get pod simpledotnetapi-deployment-6fbf97df55-2hg2m -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: 2019-06-17T20:22:30Z
generateName: simpledotnetapi-deployment-6fbf97df55-
labels:
app: simpledotnetapi-pod
pod-template-hash: 6fbf97df55
name: simpledotnetapi-deployment-6fbf97df55-2hg2m
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: simpledotnetapi-deployment-6fbf97df55
uid: a99e4ac8-8ec3-11e9-9bf8-86d46846735e
resourceVersion: "813190"
selfLink: /api/v1/namespaces/default/pods/simpledotnetapi-deployment-6fbf97df55-2hg2m
uid: a1c220a2-913d-11e9-801a-c6aef815c06a
spec:
containers:
- image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
imagePullPolicy: Always
name: simpledotnetapi-simpledotnetapi
ports:
- containerPort: 5000
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-hj9b5
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: kimagereadersecret
nodeName: aks-agentpool-97052351-0
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-hj9b5
secret:
defaultMode: 420
secretName: default-token-hj9b5
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
message: 'containers with unready status: [simpledotnetapi_simpledotnetapi]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
message: 'containers with unready status: [simpledotnetapi_simpledotnetapi]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
status: "True"
type: PodScheduled
containerStatuses:
- image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
imageID: ""
lastState: {}
name: simpledotnetapi-simpledotnetapi
ready: false
restartCount: 0
state:
waiting:
message: Back-off pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi"
reason: ImagePullBackOff
hostIP: 10.240.0.4
phase: Pending
podIP: 10.240.0.26
qosClass: BestEffort
startTime: 2019-06-17T20:22:30Z
-- yaml configuration file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: simpledotnetapi-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: simpledotnetapi-pod
  template:
    metadata:
      labels:
        app: simpledotnetapi-pod
    spec:
      imagePullSecrets:
        - name: kimagereadersecret
      containers:
        - name: simpledotnetapi_simpledotnetapi
          image: ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi
          ports:
            - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: simpledotnetapi-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
  selector:
    app: simpledotnetapi
    type: front-end
-- output of kubectl get secret kimagereadersecret
NAME TYPE DATA AGE
kimagereadersecret kubernetes.io/dockerconfigjson 1 1h
-- credentials/secret from Kube dashboard
{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "kimagereadersecret",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/secrets/kimagereadersecret",
"uid": "86006aff-9156-11e9-801a-c6aef815c06a",
"resourceVersion": "830006",
"creationTimestamp": "2019-06-17T23:20:41Z"
},
"data": {
".dockerconfigjson": "eyJhdXRocyI6eyJ1c3NtaWNyb3NlcnZpY2VyZWdpc3RyeS5henVyZWNyLmlvIjp7InVzZXJuYW1lIjoiMzNjYjBjZTQtOTVmMC00NGJkLWJiYmYtNTZkNTA2ZmY0ZWIzIiwicGFzc3dvcmQiOiI0YjM3Yjg5Ni1hMDRlLTQ4YjQtYTk1MC01ZjFhYmRkM2U3YWEiLCJlbWFpbCI6ImNvcmV5cEB1bml0ZWQtc3lzdGVtcy5jb20iLCJhdXRoIjoiTXpOallqQmpaVFF0T1RWbU1DMDBOR0prTFdKaVltWXROVFprTlRBMlptWTBaV0l6T2pSaU16ZGlPRGsyTFdFd05HVXRORGhpTkMxaE9UVXdMVFZtTVdGaVpHUXpaVGRoWVE9PSJ9fX0="
},
"type": "kubernetes.io/dockerconfigjson"
}
-- Full dump from the Kube Dashboard
Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: manifest for ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
The entire project is on GitHub at https://github.com/coreyperkins/KubeSimpleDotNetApi
-- ACR screenshot
-- Pod Failure in Kube

I'm fairly certain you didn't give it enough permissions:
az ad sp create-for-rbac
--scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry
--role Reader
--name kimage-reader
The role should be acrpull, not Reader. Also, you can simply delete the kimagereadersecret secret and the reference to it in the pod spec; Kubernetes will handle that for you.
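A minimal sketch of that fix, reusing the scope from the command above (the AKS cluster name is a placeholder you would need to fill in):

az ad sp create-for-rbac \
    --scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry \
    --role acrpull \
    --name kimage-reader

# or grant the pull permission to the cluster itself, so no imagePullSecrets are needed:
az aks update --name <aks-cluster-name> --resource-group USS-MicroService-Test --attach-acr UssMicroServiceRegistry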

Looks like you may be missing the kimagereadersecret in your Kubernetes cluster. As I understand it, az ad sp create-for-rbac just creates access to Azure resources, but how does Kubernetes know which credentials to use to pull from the registry? You can follow this to create the registry secret, and you can check that it exists with:
$ kubectl get secret kimagereadersecret
In your case, it could be defaulting to no credentials, or using whatever you have configured for Docker, which doesn't have access to ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi.
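If the secret exists but pulls still fail, one quick way to rule the credentials themselves in or out is to try them directly against the registry. Note that with a service principal the username must be its appId (a GUID), not its display name, so this is only a sketch:

docker login ussmicroserviceregistry.azurecr.io --username <service-principal-appId> --password <service-principal-password>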

Your issue may simply be a missing image tag; everything else you have done looks fine. In the deployment, specify the image with a tag, like this:
image: ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi:tag
Use the same tag you pushed to ACR. If you do not set a tag, it defaults to latest, which probably does not exist in your registry.
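To see which tags actually exist for the repository named in the error, something like this should list them (repository name taken from the failing image reference):

az acr repository show-tags --name UssMicroServiceRegistry --repository simpledotnetapi_simpledotnetapi --output table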

Related

Unable to connect to MongoDB: MongoNetworkError & MongoNetworkError connecting to kubernetis MongoDB pod with mongoose

I am trying to connect to MongoDB in a microservice-based project using Node.js, Kubernetes, Ingress, and Skaffold.
I got two errors when running skaffold dev:
MongoNetworkError: failed to connect to server [auth-mongo-srv:21017] on first connect [MongoNetworkTimeoutError: connection timed out.
Mongoose default connection error: MongoNetworkError: MongoNetworkError: failed to connect to server [auth-mongo-srv:21017] on first connect [MongoNetworkTimeoutError: connection timed out at connectionFailureError.
My auth-mongo-deploy.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: auth-mongo-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: auth-mongo
  template:
    metadata:
      labels:
        app: auth-mongo
    spec:
      containers:
        - name: auth-mongo
          image: mongo
---
apiVersion: v1
kind: Service
metadata:
  name: auth-mongo-srv
spec:
  selector:
    app: auth-mongo
  ports:
    - name: db
      protocol: TCP
      port: 27017
      targetPort: 27017
My server.ts
const dbURI: string = "mongodb://auth-mongo-srv:21017/auth"
logger.debug(dbURI)
logger.info('connecting to database...')
// changing {} --> options change nothing!
mongoose.connect(dbURI, {}).then(() => {
logger.info('Mongoose connection done')
app.listen(APP_PORT, () => {
logger.info(`server listening on ${APP_PORT}`)
})
console.clear();
}).catch((e) => {
logger.info('Mongoose connection error')
logger.error(e)
})
Additional information:
1. pod is created:
rhythm#vivobook:~/Documents/TicketResale/server$ kubectl get pods
NAME STATUS RESTARTS AGE
auth-deploy-595c6cbf6d-9wzt9 1/1 Running 0 5m53s
auth-mongo-deploy-6b96b7798c-9726w 1/1 Running 0 5m53s
tickets-deploy-675b7b9b58-f5bzs 1/1 Running 0 5m53s
2. pod description:
kubectl describe pod auth-mongo-deploy-6b96b7798c-9726w
Name: auth-mongo-deploy-694b67f76d-ksw82
Namespace: default
Priority: 0
Node: minikube/192.168.49.2
Start Time: Tue, 21 Jun 2022 14:11:47 +0530
Labels: app=auth-mongo
pod-template-hash=694b67f76d
skaffold.dev/run-id=2f5d2142-0f1a-4fa4-b641-3f301f10e65a
Annotations: <none>
Status: Running
IP: 172.17.0.2
IPs:
IP: 172.17.0.2
Controlled By: ReplicaSet/auth-mongo-deploy-694b67f76d
Containers:
auth-mongo:
Container ID: docker://fa43cd7e03ac32ed63c82419e5f9722deffd2f93206b6a0f2b25ae9be8f6cedf
Image: mongo
Image ID: docker-pullable://mongo@sha256:37e84d3dd30cdfb5472ec42b8a6b4dc6ca7cacd91ebcfa0410a54528bbc5fa6d
Port: <none>
Host Port: <none>
State: Running
Started: Tue, 21 Jun 2022 14:11:52 +0530
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zw7s9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-zw7s9:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 79s default-scheduler Successfully assigned default/auth-mongo-deploy-694b67f76d-ksw82 to minikube
Normal Pulling 79s kubelet Pulling image "mongo"
Normal Pulled 75s kubelet Successfully pulled image "mongo" in 4.429126953s
Normal Created 75s kubelet Created container auth-mongo
Normal Started 75s kubelet Started container auth-mongo
I have also tried:
kubectl describe service auth-mongo-srv
Name: auth-mongo-srv
Namespace: default
Labels: skaffold.dev/run-id=2f5d2142-0f1a-4fa4-b641-3f301f10e65a
Annotations: <none>
Selector: app=auth-mongo
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.100.42.183
IPs: 10.100.42.183
Port: db 27017/TCP
TargetPort: 27017/TCP
Endpoints: 172.17.0.2:27017
Session Affinity: None
Events: <none>
And then changed:
const dbURI: string = "mongodb://auth-mongo-srv:21017/auth" to
const dbURI: string = "mongodb://172.17.0.2:27017:21017/auth"
which generated a different error, MongooseServerSelectionError.
const dbURI: string = "mongodb://auth-mongo-srv:27017/auth"
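The last URI above matches the Service definition: the Service listens on port 27017, while the original connection string uses 21017. A quick way to test that the Service is reachable on that port from inside the cluster is a throwaway client pod (a sketch; depending on the mongo image version the shell binary is mongosh or mongo):

kubectl run mongo-client --rm -it --image=mongo -- mongosh "mongodb://auth-mongo-srv:27017/auth" --eval "db.runCommand({ ping: 1 })"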

FluXCD Helm deployment from Azure ACR - no chart name found error

I am attempting to deploy a Helm chart to AKS using FluxCD. The chart has been pushed to Azure ACR using the Helm CLI ("helm push ..."), and it is declared in the ACR as helm/release-services:0.1.0.
I am receiving the following error after a Flux reconcile:
'chart pull error: failed to get chart version for remote reference:
no chart name found'
with helm-controller logs as follows
{"level":"info","ts":"2022-02-07T12:40:18.121Z","logger":"controller.helmrelease","msg":"HelmChart 'flux-system/release-services-test-release-services' is not ready","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"release-services","namespace":"release-services-test"}
{"level":"info","ts":"2022-02-07T12:40:18.135Z","logger":"controller.helmrelease","msg":"reconcilation finished in 15.458307ms, next run in 5m0s","reconciler group":"helm.toolkit.fluxcd.io","reconciler kind":"HelmRelease","name":"release-services","namespace":"release-services-test"}
Below is the HelmChart resource in AKS:
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmChart
metadata:
creationTimestamp: "2022-02-07T07:30:16Z"
finalizers:
- finalizers.fluxcd.io
generation: 1
name: release-services-test-release-services
namespace: flux-system
resourceVersion: "105266699"
selfLink: /apis/source.toolkit.fluxcd.io/v1beta1/namespaces/flux-system/helmcharts/release-services-test-release-services
uid: e4820a70-8885-44a1-8dfd-0e2bf7256915
spec:
chart: release-services
interval: 5m0s
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: psbombb-helm-acr-dev
version: '>=0.1.0'
status:
conditions:
- lastTransitionTime: "2022-02-07T11:02:49Z"
message: 'chart pull error: failed to get chart version for remote reference:
no chart name found'
reason: ChartPullFailed
status: "False"
type: Ready
observedGeneration: 1
and the HelmRelease is as follows
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
creationTimestamp: "2022-02-07T04:34:14Z"
finalizers:
- finalizers.fluxcd.io
generation: 9
labels:
kustomize.toolkit.fluxcd.io/name: apps
kustomize.toolkit.fluxcd.io/namespace: flux-system
name: release-services
namespace: release-services-test
resourceVersion: "105341484"
selfLink: /apis/helm.toolkit.fluxcd.io/v2beta1/namespaces/release-services-test/helmreleases/release-services
uid: 6a6e5f5c-951d-4655-9c15-fa9fe7421a04
spec:
chart:
spec:
chart: release-services
reconcileStrategy: ChartVersion
sourceRef:
kind: HelmRepository
name: psbombb-helm-acr-dev
namespace: flux-system
version: '>=0.1.0'
install:
remediation:
retries: 3
interval: 5m
releaseName: release-services
timeout: 12m
values:
image:
name: release-services
pullPolicy: IfNotPresent
registry: <repository>.azurecr.io
repository: <repository>.azurecr.io/helm/release-services
tag: 0.1.0
postgres:
secret:
create: false
existingName: release-services-secrets
status:
conditions:
- lastTransitionTime: "2022-02-07T08:27:13Z"
message: HelmChart 'flux-system/release-services-test-release-services' is not
ready
reason: ArtifactFailed
status: "False"
type: Ready
failures: 50
helmChart: flux-system/release-services-test-release-services
observedGeneration: 9
Is there anything I am missing that anyone can spot for me please?
Thank you kindly
I think your issue is that Azure Container Registry stores Helm charts as OCI artifacts.
The Flux source controller pulls the index.yaml from an HTTP Helm chart repository to discover available versions, and that does not work against an OCI registry.
Here is the GitHub issue for this, where you can see that the Flux maintainers plan to work on it; as of now, the OCI feature is stable in Helm 3.8.0.
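For reference, once the Flux source controller gained OCI support (v1beta2 of the source API), a HelmRepository pointing at ACR as an OCI registry looks roughly like the sketch below; the secret name is a placeholder and the fields should be checked against the Flux version you run:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: psbombb-helm-acr-dev
  namespace: flux-system
spec:
  type: oci
  interval: 5m0s
  url: oci://<repository>.azurecr.io/helm
  secretRef:
    name: acr-credentials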

AKS timeout container kubectl Rollout status check failed

I have a sporadic issue that I am struggling to understand: an Azure pipeline promote stage fails on kubectl rollout status Deployment/name --timeout 120s --namespace xyz.
I have tried increasing progressDeadlineSeconds (though I'm not sure it takes effect), and I have tried raising the replicas to 2, but it still does not apply. I do not fully understand this error, but there is clearly a rollout issue.
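One thing that stands out from the snippets below is that the deployment sets progressDeadlineSeconds: 600 while the pipeline only waits --timeout 120s, so the status check can give up long before the deployment itself is considered failed. A sketch of a check that matches the deadline:

kubectl rollout status Deployment/datahub-recon --timeout 600s --namespace xyz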
Yaml file
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: #{KubeComponentName}#
namespace: #{Namespace}#
spec:
selector:
matchLabels:
app: #{KubeComponentName}#
progressDeadlineSeconds: 600
replicas: #{ReplicaCount}#
template:
metadata:
labels:
app: #{KubeComponentName}#
annotations:
spec:
securityContext:
runAsUser: 999
serviceAccountName: #{KubeComponentName}#
containers:
- name: #{KubeComponentName}#
image: #{ImageRegistry}#/datahub/#{KubeComponentName}#:latest
#command: ["/bin/bash", "-c", "--"]
#args: [ "while true; do sleep 30; done;" ]
volumeMounts:
ports:
env:
- name: NodeName
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PodName
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: PodNamespace
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: PodIp
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: PodServiceAccount
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: ComponentInfo__ComponentName
value: #{KubeComponentName}#
- name: ComponentInfo__ComponentHost
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: ComponentInfo__ServiceUser
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: MongoDbUserName
valueFrom:
secretKeyRef:
name: mongodb-xyz-username
key: secret-value
- name: MongoDbPassword
valueFrom:
secretKeyRef:
name: mongodb-xyz-password
key: secret-value
- name: MongoDbKubernetesHosts
value: #{MongoDbKubernetesHosts}#
- name: MongoDbScriptBasePath
value: #{MongoDbScriptBasePath}#
volumes:
The errors keep recurring: either a timeout waiting for the rollout, or an exceeded progress deadline:
----
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl rollout status Deployment/datahub-recon --timeout 120s --namespace xyz
Waiting for deployment "datahub-recon" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition
##[error]Error: error: timed out waiting for the condition
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl describe Deployment datahub-recon --namespace xyz
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: <none>
NewReplicaSet: datahub-recon-567c7d6958 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m1s deployment-controller Scaled up replica set datahub-recon-567c7d6958 to 1
For more information, go to https://dev.azure.com/pbc/Premera/_environments/23
##[error]Rollout status check failed.
----
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl rollout status Deployment/datahub-recon --timeout 120s --namespace xyz
error: deployment "datahub-recon" exceeded its progress deadline
##[error]Error: error: deployment "datahub-recon" exceeded its progress deadline
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl describe Deployment datahub-recon --namespace xyz
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: datahub-recon-6bc6f85fc6 (2/2 replicas created)
NewReplicaSet: datahub-recon-bd7d9754 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 13m deployment-controller Scaled up replica set datahub-recon-bd7d9754 to 1
For more information, go to https://dev.azure.com/pbc/Premera/_environments/23
##[error]Rollout status check failed.
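The rollout output only says that 0 of 1 updated replicas ever become available; the usual next step is to inspect the pods behind the new ReplicaSet to see why they never go Ready, for example (assuming the app label resolves to datahub-recon):

kubectl get pods --namespace xyz -l app=datahub-recon
kubectl describe pod <pod-name> --namespace xyz
kubectl logs <pod-name> --namespace xyz --previous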

Nextcloud with Replicas on Azure Kubernetes - Failing to Mount Azure Files ReadWriteMany Volume

I'm trying to deploy Nextcloud with HPA (replicas, horizontal scaling) on Azure Kubernetes using the official Nextcloud Helm chart and a ReadWriteMany volume created following the official instructions, but the volume never mounts, and I get this error (or some version of it):
kind: Event
apiVersion: v1
metadata:
name: nextcloud-6bc9b947bf-z6rlh.16bf7711bc2827a5
namespace: nextcloud
uid: c3c5619b-19da-4070-afbb-24bce111ddbe
resourceVersion: '55858'
creationTimestamp: '2021-12-10T18:08:27Z'
managedFields:
- manager: kubelet
operation: Update
apiVersion: v1
time: '2021-12-10T18:08:27Z'
fieldsType: FieldsV1
fieldsV1:
f:count: {}
f:firstTimestamp: {}
f:involvedObject: {}
f:lastTimestamp: {}
f:message: {}
f:reason: {}
f:source:
f:component: {}
f:host: {}
f:type: {}
involvedObject:
kind: Pod
namespace: nextcloud
name: nextcloud-6bc9b947bf-z6rlh
uid: 6106d13f-7033-4a4e-a6e9-a8e3947c52a4
apiVersion: v1
resourceVersion: '55764'
reason: FailedMount
message: >
MountVolume.MountDevice failed for volume "nextcloud-rwx" : rpc error: code =
Internal desc = volume(#azure-secret#aksshare#) mount
"//nextcloudcluster.file.core.windows.net/aksshare" on
"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount"
failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o
dir_mode=0777,file_mode=0777,gid=33,mfsymlinks,actimeo=30,<masked>
//nextcloudcluster.file.core.windows.net/aksshare
/var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log
messages (dmesg)
source:
component: kubelet
host: aks-agentpool-16596208-vmss000002
firstTimestamp: '2021-12-10T18:08:27Z'
lastTimestamp: '2021-12-10T18:08:35Z'
count: 5
type: Warning
eventTime: null
reportingComponent: ''
reportingInstance: ''
Here is my PersistentVolume yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nextcloud-rwx
  namespace: nextcloud
spec:
  capacity:
    storage: 32Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: azure-secret
    shareName: aksshare
    readOnly: false
  mountOptions:
    - dir_mode=0777
    - file_mode=0777
    - gid=33
    - mfsymlinks
PersistentVolumeClaim yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nextcloud-rwx
  namespace: nextcloud
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 32Gi
I've also tried changing uid and gid to 0, 1000, etc, and get an even more egregious permission denied message because it doesn't "match the fsgroup(33)" (hence why I tried with gid=33).
Any ideas would be greatly appreciated! Thank you for your time.
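For what it's worth, the azure-secret referenced by the PersistentVolume is normally created from the storage account name and one of its access keys, and a mount error(13): Permission denied often means those values are wrong or stale. A minimal sketch, with the storage account name taken from the mount output above and the key left as a placeholder:

kubectl create secret generic azure-secret \
    --namespace nextcloud \
    --from-literal azurestorageaccountname=nextcloudcluster \
    --from-literal azurestorageaccountkey=<storage-account-key>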

AWS EKS terraform tutorial (with assumeRole) - k8s dashboard error

I followed the tutorial at https://learn.hashicorp.com/tutorials/terraform/eks.
Everything works fine with a single IAM user with the required permissions as specified at https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/iam-permissions.md
But when I try to assume a role in a cross-account AWS scenario, I run into errors and failures.
I started kubectl proxy as per step 5.
However, when I try to access the k8s dashboard at http://127.0.0.1:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/ (after completing steps 1-5), I get the error message as follows -
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "no endpoints available for service \"kubernetes-dashboard\"",
"reason": "ServiceUnavailable",
"code": 503
}
I also got zero pods in READY state for the metrics server deployment in step 3 of the tutorial -
$ kubectl get deployment metrics-server -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 0/1 1 0 21m
My kube dns too has zero pods in READY state and the status is -
kubectl -n kube-system -l=k8s-app=kube-dns get pod
NAME READY STATUS RESTARTS AGE
coredns-55cbf8d6c5-5h8md 0/1 Pending 0 10m
coredns-55cbf8d6c5-n7wp8 0/1 Pending 0 10m
My terraform version info is as below -
$ terraform version
2021/03/06 21:18:18 [WARN] Log levels other than TRACE are currently unreliable, and are supported only for backward compatibility.
Use TF_LOG=TRACE to see Terraform's internal logs.
----
2021/03/06 21:18:18 [INFO] Terraform version: 0.14.7
2021/03/06 21:18:18 [INFO] Go runtime version: go1.15.6
2021/03/06 21:18:18 [INFO] CLI args: []string{"/usr/local/bin/terraform", "version"}
2021/03/06 21:18:18 [DEBUG] Attempting to open CLI config file: /Users/user1/.terraformrc
2021/03/06 21:18:18 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2021/03/06 21:18:18 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2021/03/06 21:18:18 [DEBUG] ignoring non-existing provider search directory /Users/user1/.terraform.d/plugins
2021/03/06 21:18:18 [DEBUG] ignoring non-existing provider search directory /Users/user1/Library/Application Support/io.terraform/plugins
2021/03/06 21:18:18 [DEBUG] ignoring non-existing provider search directory /Library/Application Support/io.terraform/plugins
2021/03/06 21:18:18 [INFO] CLI command args: []string{"version"}
Terraform v0.14.7
+ provider registry.terraform.io/hashicorp/aws v3.31.0
+ provider registry.terraform.io/hashicorp/kubernetes v2.0.2
+ provider registry.terraform.io/hashicorp/local v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.0
+ provider registry.terraform.io/hashicorp/template v2.2.0
Output of describe pods for kube-system ns is -
$ kubectl describe pods -n kube-system
Name: coredns-7dcf49c5dd-kffzw
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: <none>
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
pod-template-hash=7dcf49c5dd
Annotations: eks.amazonaws.com/compute-type: ec2
kubernetes.io/psp: eks.privileged
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-7dcf49c5dd
Containers:
coredns:
Image: 602401143452.dkr.ecr.ca-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-sqv8j (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-sqv8j:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-sqv8j
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 34s (x16 over 15m) default-scheduler no nodes available to schedule pods
Name: coredns-7dcf49c5dd-rdw94
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: <none>
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
pod-template-hash=7dcf49c5dd
Annotations: eks.amazonaws.com/compute-type: ec2
kubernetes.io/psp: eks.privileged
Status: Pending
IP:
Controlled By: ReplicaSet/coredns-7dcf49c5dd
Containers:
coredns:
Image: 602401143452.dkr.ecr.ca-central-1.amazonaws.com/eks/coredns:v1.8.0-eksbuild.1
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8080/health delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/tmp from tmp (rw)
/var/run/secrets/kubernetes.io/serviceaccount from coredns-token-sqv8j (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
coredns-token-sqv8j:
Type: Secret (a volume populated by a Secret)
SecretName: coredns-token-sqv8j
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 35s (x16 over 15m) default-scheduler no nodes available to schedule pods
Name: metrics-server-5889d4b758-2bmc4
Namespace: kube-system
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: k8s-app=metrics-server
pod-template-hash=5889d4b758
Annotations: kubernetes.io/psp: eks.privileged
Status: Pending
IP:
Controlled By: ReplicaSet/metrics-server-5889d4b758
Containers:
metrics-server:
Image: k8s.gcr.io/metrics-server-amd64:v0.3.6
Port: <none>
Host Port: <none>
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from metrics-server-token-wsqkn (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
metrics-server-token-wsqkn:
Type: Secret (a volume populated by a Secret)
SecretName: metrics-server-token-wsqkn
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6s (x9 over 6m56s) default-scheduler no nodes available to schedule pods
Also,
$ kubectl get nodes
No resources found.
And,
$ kubectl describe nodes
returns nothing
Can someone help me troubleshoot and fix this?
TIA.
Self-documenting my solution.
Given my AWS setup is as follows:
account1:user1:role1
account2:user2:role2
and the role setup is as below:
arn:aws:iam::account2:role/role2
<< trust relationship >>
eks.amazonaws.com
ec2.amazonaws.com
arn:aws:iam::account1:user/user1
arn:aws:sts::account2:assumed-role/role2/user11
Updating the eks-cluster.tf as below -
map_roles = [
  {
    "groups":   [ "system:masters" ],
    "rolearn":  "arn:aws:iam::account2:role/role2",
    "username": "role2"
  }
]
map_users = [
  {
    "groups":   [ "system:masters" ],
    "userarn":  "arn:aws:iam::account1:user/user1",
    "username": "user1"
  },
  {
    "groups":   [ "system:masters" ],
    "userarn":  "arn:aws:sts::account2:assumed-role/role2/user11",
    "username": "user1"
  }
]
p.s.: Yes, "user11" is a generated username: the account1 user "user1" with a "1" suffix.
This makes everything work!
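For context, the map_roles/map_users entries above are what the EKS Terraform module ends up writing into the aws-auth ConfigMap in kube-system; the rendered result looks roughly like this (a sketch, not the exact output):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::account2:role/role2
      username: role2
      groups:
        - system:masters
  mapUsers: |
    - userarn: arn:aws:iam::account1:user/user1
      username: user1
      groups:
        - system:masters
    - userarn: arn:aws:sts::account2:assumed-role/role2/user11
      username: user1
      groups:
        - system:masters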
