OpenShift Angular Application S2I Build ImagePullBackOff - node.js

I'm trying to complete an OpenShift S2I build using the Node.js builder image and am running into this error: npm ERR! enoent ENOENT: no such file or directory, open '/opt/app-root/src/package.json'.
Here are the logs of the build:
Adding cluster TLS certificate authority to trust store
Cloning "https://dev.azure.com/westfieldgrp/PL/_git/rule_tool_frontend" ...
Commit: 620bcb6c63dd479ffb4c73f72bea0d71eeb4ba55 (deleted files that have been moved)
Author: D************ <D************@************.com>
Date: Fri Dec 16 09:39:09 2022 -0500
Adding cluster TLS certificate authority to trust store
Adding cluster TLS certificate authority to trust store
time="2022-12-16T14:40:04Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
I1216 14:40:04.659698 1 defaults.go:102] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
Caching blobs under "/var/cache/blobs".
Trying to pull image-registry.openshift-image-registry.svc:5000/openshift/nodejs@sha256:ec4bda6a4daaea3a28591ea97afc0ea52d06d881a5d966e18269f9c0d0c87892...
Getting image source signatures
Copying blob sha256:600dbb68a707d0370701a1985b053a56c1b71c054179497b8809f0bbdcf72fda
Copying blob sha256:2cf6011ee4f717c20cb7060fe612341720080cd81c52bcd32f54edb15af12991
Copying blob sha256:417723e2b937d59afc1be1bee1ba70a3952be0d1bc922efd8160e0a7060ff7d4
Copying config sha256:f6dc2bbf0dea77c31c3c5d0435fee81c1f52ab70ecdeb0092102b2ae86b0a1ef
Writing manifest to image destination
Storing signatures
Generating dockerfile with builder image image-registry.openshift-image-registry.svc:5000/openshift/nodejs@sha256:ec4bda6a4daaea3a28591ea97afc0ea52d06d881a5d966e18269f9c0d0c87892
Adding transient rw bind mount for /run/secrets/rhsm
STEP 1/9: FROM image-registry.openshift-image-registry.svc:5000/openshift/nodejs@sha256:ec4bda6a4daaea3a28591ea97afc0ea52d06d881a5d966e18269f9c0d0c87892
STEP 2/9: LABEL "io.openshift.build.commit.date"="Fri Dec 16 09:39:09 2022 -0500" "io.openshift.build.commit.id"="620bcb6c63dd479ffb4c73f72bea0d71eeb4ba55" "io.openshift.build.commit.ref"="main" "io.openshift.build.commit.message"="deleted files that have been moved" "io.openshift.build.source-context-dir"="/" "io.openshift.build.image"="image-registry.openshift-image-registry.svc:5000/openshift/nodejs@sha256:ec4bda6a4daaea3a28591ea97afc0ea52d06d881a5d966e18269f9c0d0c87892" "io.openshift.build.commit.author"="DominicRomano <DominicRomano@westfieldgrp.com>"
STEP 3/9: ENV OPENSHIFT_BUILD_NAME="rule-tool-frontend2-3" OPENSHIFT_BUILD_NAMESPACE="rule-tool-webapp2" OPENSHIFT_BUILD_SOURCE="https://************@dev.azure.com/************/**/_git/rule_tool_frontend" OPENSHIFT_BUILD_COMMIT="620bcb6c63dd479ffb4c73f72bea0d71eeb4ba55"
STEP 4/9: USER root
STEP 5/9: COPY upload/src /tmp/src
STEP 6/9: RUN chown -R 1001:0 /tmp/src
STEP 7/9: USER 1001
STEP 8/9: RUN /usr/libexec/s2i/assemble
---> Installing application source ...
---> Installing all dependencies
npm ERR! code ENOENT
npm ERR! syscall open
npm ERR! path /opt/app-root/src/package.json
npm ERR! errno -2
npm ERR! enoent ENOENT: no such file or directory, open '/opt/app-root/src/package.json'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent
npm ERR! A complete log of this run can be found in:
npm ERR! /opt/app-root/src/.npm/_logs/2022-12-16T14_40_22_836Z-debug-0.log
error: build error: error building at STEP "RUN /usr/libexec/s2i/assemble": error while running runtime: exit status 254
Here is the YAML of the failing pod:
kind: Pod
apiVersion: v1
metadata:
generateName: rule-tool-frontend2-f484544fb-
annotations:
k8s.ovn.org/pod-networks: >-
{"default":{"ip_addresses":["**.***.*.**/**"],"mac_address":"**:**:**:**:**:**","gateway_ips":["**.***.*.*"],"ip_address":"**.***.*.**/**","gateway_ip":"**.***.*.*"}}
k8s.v1.cni.cncf.io/network-status: |-
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"**.***.*.*"
],
"mac": "**:**:**:**:**:**",
"default": true,
"dns": {}
}]
k8s.v1.cni.cncf.io/networks-status: |-
[{
"name": "ovn-kubernetes",
"interface": "eth0",
"ips": [
"**.***.*.*"
],
"mac": "**:**:**:**:**:**",
"default": true,
"dns": {}
}]
openshift.io/scc: restricted
resourceVersion: '186661887'
name: rule-tool-frontend2-f484544fb-sb24h
uid: faf4501f-417f-481a-a05f-d57b411188b7
creationTimestamp: '2022-12-16T13:54:49Z'
managedFields:
- manager: kube-controller-manager
operation: Update
apiVersion: v1
time: '2022-12-16T13:54:49Z'
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:generateName': {}
'f:labels':
.: {}
'f:app': {}
'f:deploymentconfig': {}
'f:pod-template-hash': {}
'f:ownerReferences':
.: {}
'k:{"uid":"19afb68c-c09d-4ced-97bf-a69cfeb3c05e"}': {}
'f:spec':
'f:containers':
'k:{"name":"rule-tool-frontend2"}':
.: {}
'f:image': {}
'f:imagePullPolicy': {}
'f:name': {}
'f:ports':
.: {}
'k:{"containerPort":8080,"protocol":"TCP"}':
.: {}
'f:containerPort': {}
'f:protocol': {}
'f:resources': {}
'f:terminationMessagePath': {}
'f:terminationMessagePolicy': {}
'f:dnsPolicy': {}
'f:enableServiceLinks': {}
'f:restartPolicy': {}
'f:schedulerName': {}
'f:securityContext': {}
'f:terminationGracePeriodSeconds': {}
- manager: svatwfldopnshft-2v2n5-master-0
operation: Update
apiVersion: v1
time: '2022-12-16T13:54:49Z'
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
'f:k8s.ovn.org/pod-networks': {}
- manager: multus
operation: Update
apiVersion: v1
time: '2022-12-16T13:54:51Z'
fieldsType: FieldsV1
fieldsV1:
'f:metadata':
'f:annotations':
'f:k8s.v1.cni.cncf.io/network-status': {}
'f:k8s.v1.cni.cncf.io/networks-status': {}
subresource: status
- manager: Go-http-client
operation: Update
apiVersion: v1
time: '2022-12-16T13:54:52Z'
fieldsType: FieldsV1
fieldsV1:
'f:status':
'f:conditions':
'k:{"type":"ContainersReady"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
'k:{"type":"Initialized"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:status': {}
'f:type': {}
'k:{"type":"Ready"}':
.: {}
'f:lastProbeTime': {}
'f:lastTransitionTime': {}
'f:message': {}
'f:reason': {}
'f:status': {}
'f:type': {}
'f:containerStatuses': {}
'f:hostIP': {}
'f:podIP': {}
'f:podIPs':
.: {}
'k:{"ip":"**.***.*.*"}':
.: {}
'f:ip': {}
'f:startTime': {}
subresource: status
namespace: rule-tool-webapp2
ownerReferences:
- apiVersion: apps/v1
kind: ReplicaSet
name: rule-tool-frontend2-f484544fb
uid: 19afb68c-c09d-4ced-97bf-a69cfeb3c05e
controller: true
blockOwnerDeletion: true
labels:
app: rule-tool-frontend2
deploymentconfig: rule-tool-frontend2
pod-template-hash: f484544fb
spec:
restartPolicy: Always
serviceAccountName: default
imagePullSecrets:
- name: default-dockercfg-g9tqv
priority: 0
schedulerName: default-scheduler
enableServiceLinks: true
terminationGracePeriodSeconds: 30
preemptionPolicy: PreemptLowerPriority
nodeName: svatwfldopnshft-2v2n5-worker-kmk85
securityContext:
seLinuxOptions:
level: 's0:c28,c12'
fsGroup: 1000780000
containers:
- resources: {}
terminationMessagePath: /dev/termination-log
name: rule-tool-frontend2
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000780000
ports:
- containerPort: 8080
protocol: TCP
imagePullPolicy: Always
volumeMounts:
- name: kube-api-access-k2tzb
readOnly: true
mountPath: /var/run/secrets/kubernetes.io/serviceaccount
terminationMessagePolicy: File
image: >-
image-registry.openshift-image-registry.svc:5000/rule-tool-webapp2/rule-tool-frontend2:latest
serviceAccount: default
volumes:
- name: kube-api-access-k2tzb
projected:
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
name: kube-root-ca.crt
items:
- key: ca.crt
path: ca.crt
- downwardAPI:
items:
- path: namespace
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- configMap:
name: openshift-service-ca.crt
items:
- key: service-ca.crt
path: service-ca.crt
defaultMode: 420
dnsPolicy: ClusterFirst
tolerations:
- key: node.kubernetes.io/not-ready
operator: Exists
effect: NoExecute
tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
operator: Exists
effect: NoExecute
tolerationSeconds: 300
status:
phase: Pending
conditions:
- type: Initialized
status: 'True'
lastProbeTime: null
lastTransitionTime: '2022-12-16T13:54:49Z'
- type: Ready
status: 'False'
lastProbeTime: null
lastTransitionTime: '2022-12-16T13:54:49Z'
reason: ContainersNotReady
message: 'containers with unready status: [rule-tool-frontend2]'
- type: ContainersReady
status: 'False'
lastProbeTime: null
lastTransitionTime: '2022-12-16T13:54:49Z'
reason: ContainersNotReady
message: 'containers with unready status: [rule-tool-frontend2]'
- type: PodScheduled
status: 'True'
lastProbeTime: null
lastTransitionTime: '2022-12-16T13:54:49Z'
hostIP: **.***.*.*
podIP: **.***.*.*
podIPs:
- ip: **.***.*.*
startTime: '2022-12-16T13:54:49Z'
containerStatuses:
- name: rule-tool-frontend2
state:
waiting:
reason: ImagePullBackOff
message: >-
Back-off pulling image
"image-registry.openshift-image-registry.svc:5000/rule-tool-webapp2/rule-tool-frontend2:latest"
lastState: {}
ready: false
restartCount: 0
image: >-
image-registry.openshift-image-registry.svc:5000/rule-tool-webapp2/rule-tool-frontend2:latest
imageID: ''
started: false
qosClass: BestEffort
package.json is located in rule_tool_frontend/src. How do I change where npm looks for package.json? Is this something that should be edited in the build YAML?
Thank you for any help.
I tried to complete the OpenShift S2I build using the Node.js builder image, expecting a successful build, but got the error described above instead.
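For reference, if package.json lives in a subdirectory rather than at the repository root, the S2I build can be pointed at that subdirectory with spec.source.contextDir on the BuildConfig. A rough sketch only; the contextDir value and BuildConfig name are assumptions based on the layout described above, not the actual configuration:
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: rule-tool-frontend2
spec:
  source:
    type: Git
    git:
      uri: https://dev.azure.com/westfieldgrp/PL/_git/rule_tool_frontend
    contextDir: src   # assumed subdirectory that actually contains package.json
  strategy:
    type: Source
    sourceStrategy:
      from:
        kind: ImageStreamTag
        name: nodejs:latest
        namespace: openshift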

Related

AKS timeout container kubectl Rollout status check failed

I have a sporadic issue that I am struggling to understand: an Azure pipeline fails on promote because of kubectl rollout status Deployment/name --timeout 120s --namespace xyz.
I have tried increasing progressDeadlineSeconds, though I know it may not take effect, and I have tried bumping the replica count to 2, but the change still does not apply. I don't fully understand this error, and there is clearly a rollout problem.
YAML file:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: #{KubeComponentName}#
namespace: #{Namespace}#
spec:
selector:
matchLabels:
app: #{KubeComponentName}#
progressDeadlineSeconds: 600
replicas: #{ReplicaCount}#
template:
metadata:
labels:
app: #{KubeComponentName}#
annotations:
spec:
securityContext:
runAsUser: 999
serviceAccountName: #{KubeComponentName}#
containers:
- name: #{KubeComponentName}#
image: #{ImageRegistry}#/datahub/#{KubeComponentName}#:latest
#command: ["/bin/bash", "-c", "--"]
#args: [ "while true; do sleep 30; done;" ]
volumeMounts:
ports:
env:
- name: NodeName
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: PodName
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: PodNamespace
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: PodIp
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: PodServiceAccount
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: ComponentInfo__ComponentName
value: #{KubeComponentName}#
- name: ComponentInfo__ComponentHost
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: ComponentInfo__ServiceUser
valueFrom:
fieldRef:
fieldPath: spec.serviceAccountName
- name: MongoDbUserName
valueFrom:
secretKeyRef:
name: mongodb-xyz-username
key: secret-value
- name: MongoDbPassword
valueFrom:
secretKeyRef:
name: mongodb-xyz-password
key: secret-value
- name: MongoDbKubernetesHosts
value: #{MongoDbKubernetesHosts}#
- name: MongoDbScriptBasePath
value: #{MongoDbScriptBasePath}#
volumes:
The errors keep happening: I either get a timeout waiting for the rollout to finish or the deployment exceeds its progress deadline.
----
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl rollout status Deployment/datahub-recon --timeout 120s --namespace xyz
Waiting for deployment "datahub-recon" rollout to finish: 0 of 1 updated replicas are available...
error: timed out waiting for the condition
##[error]Error: error: timed out waiting for the condition
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl describe Deployment datahub-recon --namespace xyz
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing True ReplicaSetUpdated
OldReplicaSets: <none>
NewReplicaSet: datahub-recon-567c7d6958 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 2m1s deployment-controller Scaled up replica set datahub-recon-567c7d6958 to 1
For more information, go to https://dev.azure.com/pbc/Premera/_environments/23
##[error]Rollout status check failed.
----
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl rollout status Deployment/datahub-recon --timeout 120s --namespace xyz
error: deployment "datahub-recon" exceeded its progress deadline
##[error]Error: error: deployment "datahub-recon" exceeded its progress deadline
/opt/vsts-agent/_work/_tool/kubectl/1.18.6/x64/kubectl describe Deployment datahub-recon --namespace xyz
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: datahub-recon-6bc6f85fc6 (2/2 replicas created)
NewReplicaSet: datahub-recon-bd7d9754 (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 13m deployment-controller Scaled up replica set datahub-recon-bd7d9754 to 1
For more information, go to https://dev.azure.com/pbc/Premera/_environments/23
##[error]Rollout status check failed.
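Since the describe output only shows the new ReplicaSet being scaled up, the usual next step would be to inspect the pods behind that ReplicaSet and the namespace events to see why they never become available (image pull failures, crash loops, failing readiness probes, and so on). A rough sketch of those checks, assuming the datahub-recon deployment and xyz namespace from above and that the pods carry the app: datahub-recon label from the manifest; the pod name is a placeholder:
kubectl get pods --namespace xyz -l app=datahub-recon
kubectl describe pod <pod-name> --namespace xyz
kubectl logs <pod-name> --namespace xyz --all-containers
kubectl get events --namespace xyz --sort-by=.lastTimestamp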

Nextcloud with Replicas on Azure Kubernetes - Failing to Mount Azure Files ReadWriteMany Volume

I'm trying to deploy Nextcloud with HPA (replicas / horizontal scaling) on Azure Kubernetes using the official Nextcloud Helm chart and a ReadWriteMany volume created by following the official instructions, but the volume never mounts and I get this error (or some variation of it):
kind: Event
apiVersion: v1
metadata:
name: nextcloud-6bc9b947bf-z6rlh.16bf7711bc2827a5
namespace: nextcloud
uid: c3c5619b-19da-4070-afbb-24bce111ddbe
resourceVersion: '55858'
creationTimestamp: '2021-12-10T18:08:27Z'
managedFields:
- manager: kubelet
operation: Update
apiVersion: v1
time: '2021-12-10T18:08:27Z'
fieldsType: FieldsV1
fieldsV1:
f:count: {}
f:firstTimestamp: {}
f:involvedObject: {}
f:lastTimestamp: {}
f:message: {}
f:reason: {}
f:source:
f:component: {}
f:host: {}
f:type: {}
involvedObject:
kind: Pod
namespace: nextcloud
name: nextcloud-6bc9b947bf-z6rlh
uid: 6106d13f-7033-4a4e-a6e9-a8e3947c52a4
apiVersion: v1
resourceVersion: '55764'
reason: FailedMount
message: >
MountVolume.MountDevice failed for volume "nextcloud-rwx" : rpc error: code =
Internal desc = volume(#azure-secret#aksshare#) mount
"//nextcloudcluster.file.core.windows.net/aksshare" on
"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount"
failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o
dir_mode=0777,file_mode=0777,gid=33,mfsymlinks,actimeo=30,<masked>
//nextcloudcluster.file.core.windows.net/aksshare
/var/lib/kubelet/plugins/kubernetes.io/csi/pv/nextcloud-rwx/globalmount
Output: mount error(13): Permission denied
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log
messages (dmesg)
source:
component: kubelet
host: aks-agentpool-16596208-vmss000002
firstTimestamp: '2021-12-10T18:08:27Z'
lastTimestamp: '2021-12-10T18:08:35Z'
count: 5
type: Warning
eventTime: null
reportingComponent: ''
reportingInstance: ''
Here is my PersistentVolume yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: nextcloud-rwx
namespace: nextcloud
spec:
capacity:
storage: 32Gi
accessModes:
- ReadWriteMany
azureFile:
secretName: azure-secret
shareName: aksshare
readOnly: false
mountOptions:
- dir_mode=0777
- file_mode=0777
- gid=33
- mfsymlinks
PersistentVolumeClaim yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: nextcloud-rwx
namespace: nextcloud
spec:
accessModes:
- ReadWriteMany
storageClassName: ""
resources:
requests:
storage: 32Gi
I've also tried changing uid and gid to 0, 1000, etc., and get an even more egregious permission-denied message because it doesn't "match the fsgroup(33)" (hence why I tried gid=33).
Any ideas would be greatly appreciated! Thank you for your time.
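Given that mount error(13): Permission denied from mount.cifs usually points at credential problems, one thing worth double-checking is that the azure-secret referenced by the PersistentVolume holds the correct storage account name and key for the share, and sits in the namespace the kubelet looks it up in. A rough sketch of recreating it; the key is obviously a placeholder, and the account name is taken from the mount URL in the error above:
kubectl create secret generic azure-secret \
  --namespace nextcloud \
  --from-literal=azurestorageaccountname=nextcloudcluster \
  --from-literal=azurestorageaccountkey='<storage-account-key>'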

I am completely stuck when trying to run skaffold on my project. It keeps throwing an error from the ingress-srv manifest

When I run skaffold, this is the error I get. Skaffold generates tags, checks the cache, starts the deploy, and then cleans up.
- stderr: "error: error parsing C: ~\k8s\\ingress-srv.yaml: error converting YAML to JSON: yaml: line 20: mapping values are not allowed in this context
\n"
- cause: exit status 1
Docker creates a container for the server. Here is the ingress-srv YAML file:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-srv
annotations:
kubernetes.io/ingress.class: "nginx"
spec:
rules:
- host: northernherpgeckosales.dev
http:
paths:
- path: /api/users/?(.*)
pathType: Prefix
backend:
service:
name: auth-srv
port:
number: 3000
- path: /?(.*)
pathType: Prefix
backend:
service:
name: front-end-srv
port:
number: 3000
For good measure here is the skaffold file:
apiVersion: skaffold/v2alpha3
kind: Config
deploy:
kubectl:
manifests:
- ./infra/k8s/*
build:
local:
push: false
artifacts:
- image: giantgecko/auth
context: auth
docker:
dockerfile: Dockerfile
sync:
manual:
- src: 'src/**/*.ts'
dest: .
- image: giantgecko/front-end
context: front-end
docker:
dockerfile: Dockerfile
sync:
manual:
- src: '**/*.js'
dest: .
Take a closer look at your Ingress definition file (starting from line 19):
- path: /?(.*)
pathType: Prefix
backend:
service:
name: front-end-srv
port:
number: 3000
You have unnecessary indentation from line 20 (pathType: Prefix) to the end of the file. Just format your YAML file properly. The previous path, /api/users/?(.*), is fine - no unnecessary indentation there.
The final YAML looks like this:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-srv
annotations:
kubernetes.io/ingress.class: "nginx"
spec:
rules:
- host: northernherpgeckosales.dev
http:
paths:
- path: /api/users/?(.*)
pathType: Prefix
backend:
service:
name: auth-srv
port:
number: 3000
- path: /?(.*)
pathType: Prefix
backend:
service:
name: front-end-srv
port:
number: 3000
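As a side note, this kind of indentation error can also be caught before skaffold runs by validating the manifest with a client-side dry run; a quick sketch (adjust the path to wherever ingress-srv.yaml actually lives):
kubectl apply --dry-run=client -f infra/k8s/ingress-srv.yaml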

How does the master bootstrap process work and how can I debug it?

I am working to stand up 3 instances of the yugabyte master and tserver in separate k8s clusters connected over LoadBalancer services on bare metal. However, on all three master instances it looks like the bootstrap process is failing:
I0531 19:50:28.081645 1 master_main.cc:94] NumCPUs determined to be: 2
I0531 19:50:28.082594 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.082682 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.082937 1 mem_tracker.cc:249] MemTracker: hard memory limit is 1.699219 GB
I0531 19:50:28.082963 1 mem_tracker.cc:251] MemTracker: soft memory limit is 1.444336 GB
I0531 19:50:28.083189 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.090148 1 server_base_options.cc:124] Updating master addrs to {yb-master-black.example.com:7100},{yb-master-blue.example.com:7100},{yb-master-white.example.com:7100},{:7100}
I0531 19:50:28.090863 1 rpc_server.cc:86] yb::server::RpcServer created at 0x1a7e210
I0531 19:50:28.090924 1 master.cc:146] yb::master::Master created at 0x7ffe2d4bd140
I0531 19:50:28.090958 1 master.cc:147] yb::master::TSManager created at 0x1a90850
I0531 19:50:28.090975 1 master.cc:148] yb::master::CatalogManager created at 0x1dea000
I0531 19:50:28.091152 1 master_main.cc:115] Initializing master server...
I0531 19:50:28.093097 1 server_base.cc:462] Could not load existing FS layout: Not found (yb/util/env_posix.cc:1482): /mnt/disk0/yb-data/master/instance: No such file or directory (system error 2)
I0531 19:50:28.093150 1 server_base.cc:463] Creating new FS layout
I0531 19:50:28.193439 1 fs_manager.cc:463] Generated new instance metadata in path /mnt/disk0/yb-data/master/instance:
uuid: "5f2f6ad78d27450b8cde9c8bcf40fefa"
format_stamp: "Formatted at 2020-05-31 19:50:28 on yb-master-0"
I0531 19:50:28.238484 1 fs_manager.cc:463] Generated new instance metadata in path /mnt/disk1/yb-data/master/instance:
uuid: "5f2f6ad78d27450b8cde9c8bcf40fefa"
format_stamp: "Formatted at 2020-05-31 19:50:28 on yb-master-0"
I0531 19:50:28.377483 1 fs_manager.cc:251] Opened local filesystem: /mnt/disk0,/mnt/disk1
uuid: "5f2f6ad78d27450b8cde9c8bcf40fefa"
format_stamp: "Formatted at 2020-05-31 19:50:28 on yb-master-0"
I0531 19:50:28.378015 1 server_base.cc:245] Auto setting FLAGS_num_reactor_threads to 2
I0531 19:50:28.380707 1 thread_pool.cc:166] Starting thread pool { name: Master queue_limit: 10000 max_workers: 1024 }
I0531 19:50:28.382266 1 master_main.cc:118] Starting Master server...
I0531 19:50:28.382313 24 async_initializer.cc:74] Starting to init ybclient
I0531 19:50:28.382365 1 master_main.cc:119] ulimit cur(max)...
ulimit: core file size unlimited(unlimited) blks
ulimit: data seg size unlimited(unlimited) kb
ulimit: open files 1048576(1048576)
ulimit: file size unlimited(unlimited) blks
ulimit: pending signals 22470(22470)
ulimit: file locks unlimited(unlimited)
ulimit: max locked memory 64(64) kb
ulimit: max memory size unlimited(unlimited) kb
ulimit: stack size 8192(unlimited) kb
ulimit: cpu time unlimited(unlimited) secs
ulimit: max user processes unlimited(unlimited)
W0531 19:50:28.383322 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:50:28.383525 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
I0531 19:50:28.383685 1 service_pool.cc:148] yb.master.MasterBackupService: yb::rpc::ServicePoolImpl created at 0x1a82b40
I0531 19:50:28.384888 1 service_pool.cc:148] yb.master.MasterService: yb::rpc::ServicePoolImpl created at 0x1a83680
I0531 19:50:28.385342 1 service_pool.cc:148] yb.tserver.TabletServerService: yb::rpc::ServicePoolImpl created at 0x1a838c0
I0531 19:50:28.388526 1 thread_pool.cc:166] Starting thread pool { name: Master-high-pri queue_limit: 10000 max_workers: 1024 }
I0531 19:50:28.388588 1 service_pool.cc:148] yb.consensus.ConsensusService: yb::rpc::ServicePoolImpl created at 0x201eb40
I0531 19:50:28.393231 1 service_pool.cc:148] yb.tserver.RemoteBootstrapService: yb::rpc::ServicePoolImpl created at 0x201ed80
I0531 19:50:28.393501 1 webserver.cc:148] Starting webserver on 0.0.0.0:7000
I0531 19:50:28.393544 1 webserver.cc:153] Document root: /home/yugabyte/www
I0531 19:50:28.394471 1 webserver.cc:240] Webserver started. Bound to: http://0.0.0.0:7000/
I0531 19:50:28.394668 1 service_pool.cc:148] yb.server.GenericService: yb::rpc::ServicePoolImpl created at 0x201efc0
I0531 19:50:28.395015 1 rpc_server.cc:169] RPC server started. Bound to: 0.0.0.0:7100
I0531 19:50:28.420223 23 tcp_stream.cc:308] { local: 10.233.80.35:55710 remote: 172.16.0.34:7100 }: Recv failed: Network error (yb/util/net/socket.cc:537): recvmsg error: Connection refused (system error 111)
E0531 19:51:28.523921 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 293) passed its deadline 2074493.105s (passed: 60.140s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:51:29.524827 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:51:29.524914 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
E0531 19:52:29.524785 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/outbound_call.cc:512): Could not locate the leader master: GetMasterRegistration RPC (request call id 2359) to 172.29.1.1:7100 timed out after 0.033s
W0531 19:52:30.525079 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:52:30.525205 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
W0531 19:53:28.114395 36 master-path-handlers.cc:150] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
W0531 19:53:29.133951 36 master-path-handlers.cc:1002] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
E0531 19:53:30.625366 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 299) passed its deadline 2074615.247s (passed: 60.099s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:53:31.625660 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:53:31.625742 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
W0531 19:53:34.024369 37 master-path-handlers.cc:150] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
E0531 19:54:31.870801 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 300) passed its deadline 2074676.348s (passed: 60.244s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:54:32.871065 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:54:32.871222 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
W0531 19:55:28.190217 41 master-path-handlers.cc:1002] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
W0531 19:55:31.745038 42 master-path-handlers.cc:1002] Illegal state (yb/master/catalog_manager.cc:6854): Unable to list Masters: Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
E0531 19:55:33.164300 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 299) passed its deadline 2074737.593s (passed: 60.292s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
W0531 19:55:34.164574 24 master.cc:186] Failed to get current config: Illegal state (yb/master/catalog_manager.cc:6854): Node 5f2f6ad78d27450b8cde9c8bcf40fefa peer not initialized.
I0531 19:55:34.164667 24 client-internal.cc:1847] New master addresses: [yb-master-black.example.com:7100,yb-master-blue.example.com:7100,yb-master-white.example.com:7100,:7100]
E0531 19:56:34.315380 24 async_initializer.cc:84] Failed to initialize client: Timed out (yb/rpc/rpc.cc:213): Could not locate the leader master: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 299) passed its deadline 2074798.886s (passed: 60.150s): Not found (yb/master/master_rpc.cc:284): no leader found: GetLeaderMasterRpc(addrs: [yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, :7100], num_attempts: 1)
As far as connectivity goes, I am able to verify the LoadBalancer endpoints are responding across the different network boundaries by curling the same service endpoint but on the UI port:
[root@yb-master-0 yugabyte]# curl -I http://yb-master-blue.example.com:7000
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1975
Access-Control-Allow-Origin: *
[root@yb-master-0 yugabyte]# curl -I http://yb-master-white.example.com:7000
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1975
Access-Control-Allow-Origin: *
[root@yb-master-0 yugabyte]# curl -I http://yb-master-black.example.com:7000
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1975
Access-Control-Allow-Origin: *
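Since the log shows Connection refused against port 7100 while these curl checks only exercise the UI port (7000), a similar probe against the RPC port might also be worth running; a rough sketch using bash's /dev/tcp redirection, assuming bash and timeout are available inside the master pod:
timeout 5 bash -c 'cat < /dev/null > /dev/tcp/yb-master-blue.example.com/7100' && echo "7100 reachable" || echo "7100 not reachable"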
What strategies are there to debug the bootstrap process?
EDIT:
Here are the startup flags for each of the masters:
/home/yugabyte/bin/yb-master --fs_data_dirs=/mnt/disk0,/mnt/disk1 --server_broadcast_addresses=yb-master-white.example.com:7100 --master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, --replication_factor=3 --enable_ysql=true --rpc_bind_addresses=0.0.0.0:7100 --metric_node_name=yb-master-0 --memory_limit_hard_bytes=1824522240 --stderrthreshold=0 --num_cpus=2 --undefok=num_cpus,enable_ysql --default_memory_limit_to_ram_ratio=0.85 --leader_failure_max_missed_heartbeat_periods=10 --placement_cloud=AAAA --placement_region=XXXX --placement_zone=XXXX
/home/yugabyte/bin/yb-master --fs_data_dirs=/mnt/disk0,/mnt/disk1 --server_broadcast_addresses=yb-master-blue.example.com:7100 --master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, --replication_factor=3 --enable_ysql=true --rpc_bind_addresses=0.0.0.0:7100 --metric_node_name=yb-master-0 --memory_limit_hard_bytes=1824522240 --stderrthreshold=0 --num_cpus=2 --undefok=num_cpus,enable_ysql --default_memory_limit_to_ram_ratio=0.85 --leader_failure_max_missed_heartbeat_periods=10 --placement_cloud=AAAA --placement_region=YYYY --placement_zone=YYYY
/home/yugabyte/bin/yb-master --fs_data_dirs=/mnt/disk0,/mnt/disk1 --server_broadcast_addresses=yb-master-black.example.com:7100 --master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, --replication_factor=3 --enable_ysql=true --rpc_bind_addresses=0.0.0.0:7100 --metric_node_name=yb-master-0 --memory_limit_hard_bytes=1824522240 --stderrthreshold=0 --num_cpus=2 --undefok=num_cpus,enable_ysql --default_memory_limit_to_ram_ratio=0.85 --leader_failure_max_missed_heartbeat_periods=10 --placement_cloud=AAAA --placement_region=ZZZZ --placement_zone=ZZZZ
For the sake of completeness, here is one of the k8s manifests I've modified from one of the Helm chart examples. It is modified to use a LoadBalancer for the master service:
---
# Source: yugabyte/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: "yb-masters"
labels:
app: "yb-master"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
type: LoadBalancer
loadBalancerIP: 172.16.0.34
ports:
- name: "rpc-port"
port: 7100
- name: "ui"
port: 7000
selector:
app: "yb-master"
---
# Source: yugabyte/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
name: "yb-tservers"
labels:
app: "yb-tserver"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
clusterIP: None
ports:
- name: "rpc-port"
port: 7100
- name: "ui"
port: 9000
- name: "yedis-port"
port: 6379
- name: "yql-port"
port: 9042
- name: "ysql-port"
port: 5433
selector:
app: "yb-tserver"
---
# Source: yugabyte/templates/service.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: "yb-master"
namespace: "yugabytedb"
labels:
app: "yb-master"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
serviceName: "yb-masters"
podManagementPolicy: Parallel
replicas: 1
volumeClaimTemplates:
- metadata:
name: datadir0
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
- metadata:
name: datadir1
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0
selector:
matchLabels:
app: "yb-master"
template:
metadata:
labels:
app: "yb-master"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
affinity:
# Set the anti-affinity selector scope to YB masters.
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- "yb-master"
topologyKey: kubernetes.io/hostname
containers:
- name: "yb-master"
image: "yugabytedb/yugabyte:2.1.6.0-b17"
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- "sh"
- "-c"
- >
mkdir -p /mnt/disk0/cores;
mkdir -p /mnt/disk0/yb-data/scripts;
if [ ! -f /mnt/disk0/yb-data/scripts/log_cleanup.sh ]; then
if [ -f /home/yugabyte/bin/log_cleanup.sh ]; then
cp /home/yugabyte/bin/log_cleanup.sh /mnt/disk0/yb-data/scripts;
fi;
fi
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: HOSTNAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
limits:
cpu: 2
memory: 2Gi
requests:
cpu: 500m
memory: 1Gi
command:
- "/home/yugabyte/bin/yb-master"
- "--fs_data_dirs=/mnt/disk0,/mnt/disk1"
- "--server_broadcast_addresses=yb-master-blue.example.com:7100"
- "--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, "
- "--replication_factor=3"
- "--enable_ysql=true"
- "--rpc_bind_addresses=0.0.0.0:7100"
- "--metric_node_name=$(HOSTNAME)"
- "--memory_limit_hard_bytes=1824522240"
- "--stderrthreshold=0"
- "--num_cpus=2"
- "--undefok=num_cpus,enable_ysql"
- "--default_memory_limit_to_ram_ratio=0.85"
- "--leader_failure_max_missed_heartbeat_periods=10"
- "--placement_cloud=AAAA"
- "--placement_region=YYYY"
- "--placement_zone=YYYY"
ports:
- containerPort: 7100
name: "rpc-port"
- containerPort: 7000
name: "ui"
volumeMounts:
- name: datadir0
mountPath: /mnt/disk0
- name: datadir1
mountPath: /mnt/disk1
- name: yb-cleanup
image: busybox:1.31
env:
- name: USER
value: "yugabyte"
command:
- "/bin/sh"
- "-c"
- >
mkdir /var/spool/cron;
mkdir /var/spool/cron/crontabs;
echo "0 * * * * /home/yugabyte/scripts/log_cleanup.sh" | tee -a /var/spool/cron/crontabs/root;
crond;
while true; do
sleep 86400;
done
volumeMounts:
- name: datadir0
mountPath: /home/yugabyte/
subPath: yb-data
volumes:
- name: datadir0
hostPath:
path: /mnt/disks/ssd0
- name: datadir1
hostPath:
path: /mnt/disks/ssd1
---
# Source: yugabyte/templates/service.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: "yb-tserver"
namespace: "yugabytedb"
labels:
app: "yb-tserver"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
serviceName: "yb-tservers"
podManagementPolicy: Parallel
replicas: 1
volumeClaimTemplates:
- metadata:
name: datadir0
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
- metadata:
name: datadir1
annotations:
volume.beta.kubernetes.io/storage-class: rook-ceph-block
labels:
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
accessModes:
- "ReadWriteOnce"
storageClassName: rook-ceph-block
resources:
requests:
storage: 10Gi
updateStrategy:
type: RollingUpdate
rollingUpdate:
partition: 0
selector:
matchLabels:
app: "yb-tserver"
template:
metadata:
labels:
app: "yb-tserver"
heritage: "Helm"
release: "blue"
chart: "yugabyte"
component: "yugabytedb"
spec:
affinity:
# Set the anti-affinity selector scope to YB masters.
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- "yb-tserver"
topologyKey: kubernetes.io/hostname
containers:
- name: "yb-tserver"
image: "yugabytedb/yugabyte:2.1.6.0-b17"
imagePullPolicy: IfNotPresent
lifecycle:
postStart:
exec:
command:
- "sh"
- "-c"
- >
mkdir -p /mnt/disk0/cores;
mkdir -p /mnt/disk0/yb-data/scripts;
if [ ! -f /mnt/disk0/yb-data/scripts/log_cleanup.sh ]; then
if [ -f /home/yugabyte/bin/log_cleanup.sh ]; then
cp /home/yugabyte/bin/log_cleanup.sh /mnt/disk0/yb-data/scripts;
fi;
fi
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: HOSTNAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
resources:
limits:
cpu: 2
memory: 4Gi
requests:
cpu: 500m
memory: 2Gi
command:
- "/home/yugabyte/bin/yb-tserver"
- "--fs_data_dirs=/mnt/disk0,/mnt/disk1"
- "--server_broadcast_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local:9100"
- "--rpc_bind_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local"
- "--cql_proxy_bind_address=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local"
- "--enable_ysql=true"
- "--pgsql_proxy_bind_address=$(POD_IP):5433"
- "--tserver_master_addrs=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100, "
- "--metric_node_name=$(HOSTNAME)"
- "--memory_limit_hard_bytes=3649044480"
- "--stderrthreshold=0"
- "--num_cpus=2"
- "--undefok=num_cpus,enable_ysql"
- "--leader_failure_max_missed_heartbeat_periods=10"
- "--placement_cloud=AAAA"
- "--placement_region=YYYY"
- "--placement_zone=YYYY"
- "--use_cassandra_authentication=false"
ports:
- containerPort: 7100
name: "rpc-port"
- containerPort: 9000
name: "ui"
- containerPort: 6379
name: "yedis-port"
- containerPort: 9042
name: "yql-port"
- containerPort: 5433
name: "ysql-port"
volumeMounts:
- name: datadir0
mountPath: /mnt/disk0
- name: datadir1
mountPath: /mnt/disk1
- name: yb-cleanup
image: busybox:1.31
env:
- name: USER
value: "yugabyte"
command:
- "/bin/sh"
- "-c"
- >
mkdir /var/spool/cron;
mkdir /var/spool/cron/crontabs;
echo "0 * * * * /home/yugabyte/scripts/log_cleanup.sh" | tee -a /var/spool/cron/crontabs/root;
crond;
while true; do
sleep 86400;
done
volumeMounts:
- name: datadir0
mountPath: /home/yugabyte/
subPath: yb-data
volumes:
- name: datadir0
hostPath:
path: /mnt/disks/ssd0
- name: datadir1
hostPath:
path: /mnt/disks/ssd1
This was mostly resolved (although it looks like I've now run into an unrelated issue) by dropping the extraneous trailing comma from the master addresses list:
--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100,
vs
--master_addresses=yb-master-black.example.com:7100, yb-master-blue.example.com:7100, yb-master-white.example.com:7100

ImagePullBackOff unauthorized: authentication required

I have gone through all the motions and I have what appears to be a common problem. Unfortunately, none of the solutions I've tried from GitHub and SO have worked yet. Here's the error:
Warning Failed 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
-- created the service principal
az ad sp create-for-rbac
--scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry
--role Reader
--name kimage-reader
-- created the secret for Kube
kubectl create secret docker-registry kimagereadersecret --docker-server ussmicroserviceregistry.azurecr.io --docker-email coreyp@united-systems.com --docker-username=kimage-reader --docker-password 4b37b896-a04e-48b4-a950-5f1abdd3e7aa
-- kubectl.exe describe pod simpledotnetapi-deployment-6fbf97df55-2hg2m
Name: simpledotnetapi-deployment-6fbf97df55-2hg2m
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: aks-agentpool-97052351-0/10.240.0.4
Start Time: Mon, 17 Jun 2019 15:22:30 -0500
Labels: app=simpledotnetapi-pod
pod-template-hash=6fbf97df55
Annotations: <none>
Status: Pending
IP: 10.240.0.26
Controlled By: ReplicaSet/simpledotnetapi-deployment-6fbf97df55
Containers:
simpledotnetapi-simpledotnetapi:
Container ID:
Image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
Image ID:
Port: 5000/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hj9b5 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-hj9b5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hj9b5
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m default-scheduler Successfully assigned default/simpledotnetapi-deployment-6fbf97df55-2hg2m to aks-agentpool-97052351-0
Normal BackOff 4m (x6 over 5m) kubelet, aks-agentpool-97052351-0 Back-off pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi"
Normal Pulling 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi"
Warning Failed 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
Warning Failed 4m (x4 over 5m) kubelet, aks-agentpool-97052351-0 Error: ErrImagePull
Warning Failed 24s (x22 over 5m) kubelet, aks-agentpool-97052351-0 Error: ImagePullBackOff
-- kubectl.exe get pod simpledotnetapi-deployment-6fbf97df55-2hg2m -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: 2019-06-17T20:22:30Z
generateName: simpledotnetapi-deployment-6fbf97df55-
labels:
app: simpledotnetapi-pod
pod-template-hash: 6fbf97df55
name: simpledotnetapi-deployment-6fbf97df55-2hg2m
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: simpledotnetapi-deployment-6fbf97df55
uid: a99e4ac8-8ec3-11e9-9bf8-86d46846735e
resourceVersion: "813190"
selfLink: /api/v1/namespaces/default/pods/simpledotnetapi-deployment-6fbf97df55-2hg2m
uid: a1c220a2-913d-11e9-801a-c6aef815c06a
spec:
containers:
- image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
imagePullPolicy: Always
name: simpledotnetapi-simpledotnetapi
ports:
- containerPort: 5000
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-hj9b5
readOnly: true
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: kimagereadersecret
nodeName: aks-agentpool-97052351-0
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: default-token-hj9b5
secret:
defaultMode: 420
secretName: default-token-hj9b5
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
message: 'containers with unready status: [simpledotnetapi_simpledotnetapi]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
message: 'containers with unready status: [simpledotnetapi_simpledotnetapi]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2019-06-17T20:22:30Z
status: "True"
type: PodScheduled
containerStatuses:
- image: ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi
imageID: ""
lastState: {}
name: simpledotnetapi-simpledotnetapi
ready: false
restartCount: 0
state:
waiting:
message: Back-off pulling image "ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi"
reason: ImagePullBackOff
hostIP: 10.240.0.4
phase: Pending
podIP: 10.240.0.26
qosClass: BestEffort
startTime: 2019-06-17T20:22:30Z
-- yaml configuration file
apiVersion: apps/v1
kind: Deployment
metadata:
name: simpledotnetapi-deployment
spec:
replicas: 3
selector:
matchLabels:
app: simpledotnetapi-pod
template:
metadata:
labels:
app: simpledotnetapi-pod
spec:
imagePullSecrets:
- name: kimagereadersecret
containers:
- name: simpledotnetapi_simpledotnetapi
image: ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi
ports:
- containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
name: simpledotnetapi-service
spec:
type: LoadBalancer
ports:
- port: 80
selector:
app: simpledotnetapi
type: front-end
-- output of kubectl get secret kimagereadersecret
NAME TYPE DATA AGE
kimagereadersecret kubernetes.io/dockerconfigjson 1 1h
-- credentials/secret from Kube dashboard
{
"kind": "Secret",
"apiVersion": "v1",
"metadata": {
"name": "kimagereadersecret",
"namespace": "default",
"selfLink": "/api/v1/namespaces/default/secrets/kimagereadersecret",
"uid": "86006aff-9156-11e9-801a-c6aef815c06a",
"resourceVersion": "830006",
"creationTimestamp": "2019-06-17T23:20:41Z"
},
"data": {
".dockerconfigjson": "eyJhdXRocyI6eyJ1c3NtaWNyb3NlcnZpY2VyZWdpc3RyeS5henVyZWNyLmlvIjp7InVzZXJuYW1lIjoiMzNjYjBjZTQtOTVmMC00NGJkLWJiYmYtNTZkNTA2ZmY0ZWIzIiwicGFzc3dvcmQiOiI0YjM3Yjg5Ni1hMDRlLTQ4YjQtYTk1MC01ZjFhYmRkM2U3YWEiLCJlbWFpbCI6ImNvcmV5cEB1bml0ZWQtc3lzdGVtcy5jb20iLCJhdXRoIjoiTXpOallqQmpaVFF0T1RWbU1DMDBOR0prTFdKaVltWXROVFprTlRBMlptWTBaV0l6T2pSaU16ZGlPRGsyTFdFd05HVXRORGhpTkMxaE9UVXdMVFZtTVdGaVpHUXpaVGRoWVE9PSJ9fX0="
},
"type": "kubernetes.io/dockerconfigjson"
}
-- Full dump from the Kube Dashboard
Failed to pull image "ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi": [rpc error: code = Unknown desc = Error response from daemon: manifest for ussmicroserviceregistry.azurecr.io/simpledotnetapi_simpledotnetapi:latest not found: manifest unknown: manifest unknown, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required, rpc error: code = Unknown desc = Error response from daemon: Get https://ussmicroserviceregistry.azurecr.io/v2/simpledotnetapi_simpledotnetapi/manifests/latest: unauthorized: authentication required]
The entire project is on GitHub at https://github.com/coreyperkins/KubeSimpleDotNetApi
-- ACR screenshot
-- Pod Failure in Kube
I'm fairly certain you didn't give it enough permissions:
az ad sp create-for-rbac
--scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry
--role Reader
--name kimage-reader
The role should be acrpull, not Reader. Also, just delete the kimagereadersecret secret and the reference to it in the pod; Kubernetes will handle that for you.
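Based on that, the service principal creation would look roughly like this (same scope as in the question, only the role changed); treat it as a sketch rather than a verified command:
az ad sp create-for-rbac \
  --scopes /subscriptions/11870e73-bdb2-47b0-bf27-25d24c41ae24/resourcegroups/USS-MicroService-Test/providers/Microsoft.ContainerRegistry/registries/UssMicroServiceRegistry \
  --role acrpull \
  --name kimage-reader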
It looks like you may be missing the kimagereadersecret in your Kubernetes cluster. As I understand it, az ad sp create-for-rbac just creates access to Azure resources, but how does k8s know which credentials to use to pull from the registry? You can follow the documentation on pulling from a private registry to create the registry secret. You can check that it exists with:
$ kubectl get secret kimagereadersecret
In your case, it could be that it's defaulting to no credentials, or using whatever you have configured for Docker, which doesn't have access to ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi.
For your issue, maybe it's just a small mistake. Everything you have done is OK; you just need to change the image in the deployment to include a tag, like below:
image: ussmicroserviceregistry.azurecr.io/simpledotnetapi-simpledotnetapi:tag
Set the same tag that you pushed to the ACR, and then it will work. If you do not set a tag, it will use the default tag latest, which is probably not what you want.
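To see which tags actually exist in the registry (and therefore what to put after the colon), something along these lines should work; the repository name is whichever one was actually pushed (the question shows both simpledotnetapi_simpledotnetapi and simpledotnetapi-simpledotnetapi):
az acr repository show-tags --name UssMicroServiceRegistry --repository simpledotnetapi_simpledotnetapi --output table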
