Airflow Kubernetes executor , Tasks in queued state and pod says invalid image and uses local executor - python-3.x

strong textI have installed airflow in docker and using kubernetes executor unable to trigger dags. Dag runs are in queued state. KubernetesExecutor is creating pod but it says invalid image. If i describe the pod it uses local executor instead of kubernetes executor. Please help
Attaching log file for reference
**kubectl describe pod tablescreationschematablescreation-ecabd38a66664a33b6645a72ef056edc
Name: swedschematablescreationschematablescreation-ecabd38a66664a33b6645a72ef056edc
Namespace: default
Priority: 0
Node: 10.73.96.181
Start Time: Mon, 11 May 2020 18:22:15 +0530
Labels: airflow-worker=5888feda-6aee-49c8-a94b-39cbe5758062
airflow_version=1.10.10
dag_id=Swed-schema-tables-creation
execution_date=2020-05-11T12_52_09.829627_plus_00_00
kubernetes_executor=True
task_id=Schema_Tables_Creation
try_number=1
Annotations: <none>
Status: Pending
IP: 172.17.0.46
IPs:
IP: 172.17.0.46
Containers:
base:
Container ID:
Image: :
Image ID:
Port: <none>
Host Port: <none>
Command:
airflow
run
Swed-schema-tables-creation
Schema_Tables_Creation
2020-05-11T12:52:09.829627+00:00
--local
--pool
default_pool
-sd
/root/airflow/dags/User_Creation_dag.py
State: Waiting
Reason: InvalidImageName
Ready: False
Restart Count: 0
Environment:
AIRFLOW__CORE__EXECUTOR: LocalExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflowkube:airflowkube#10.73.96.181:5434/airflowkube
Mounts:
/root/airflow/dags from airflow-dags (ro)
/root/airflow/logs from airflow-logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-64cxg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
airflow-dags:
Type: HostPath (bare host directory volume)
Path: /data/Naveen/Airflow/dags
HostPathType:
airflow-logs:
Type: HostPath (bare host directory volume)
Path: /data/Naveen/Airflow/Logs
HostPathType:
default-token-64cxg:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-64cxg
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/swedschematablescreationschematablescreation-ecabd38a66664a33b6645a72ef056edc to evblfnclnullnull1538
Warning Failed 2m15s (x12 over 4m28s) kubelet, evblfnclnullnull1538 Error: InvalidImageName
Warning InspectFailed 2m (x13 over 4m28s) kubelet, evblfnclnullnull1538 Failed to apply default image tag ":": couldn't parse image reference ":": invalid reference format**

It looks like there is no image defined for the workers. You can set this in the airflow.cfg. Be sure to set these worker_container_repository and worker_container_tag. These are null by default which results in the behaviour you're experiencing.
A working example:
AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: "apache/airflow"
AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: "1.10.10-python3.6"
AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"
The airflow docs have some more detailed info on the available properties: https://airflow.apache.org/docs/stable/configurations-ref.html#kubernetes

Related

Kubernetes nfs-subdir-external-provisioner stuck in ContainerCreating / Unable to attach or mount volumes: unmounted volumes=[nfs-client-root]

I'm trying to install a nfs-client-provisioner and run a mongdb with it.
Unfortunately, the nfs-client-provisioner hangs in ContainerCreating and says "Warning FailedMount 3m35s (x13 over 37m) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-client-root], unattached volumes=[nfs-client-root kube-api-access-lr9tl]: timed out waiting for the condition
".
The nfs server is configured on the same VPS machine (Debian 10).
I am able to mount and write files on the nfs server from a second vps with debian 10.
The cluster is setup with K0s
I have an error with the helm chart and manual installation.
Any help is apechiatet1
For Some more info see below:
Helm version:
version.BuildInfo{Version:"v3.8.2", GitCommit:"6e3701edea09e5d55a8ca2aae03a68917630e91b", GitTreeState:"clean", GoVersion:"go1.17.5"}
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5+k0s", GitCommit:"5ab78974affb1a76f1e5687aaa8b02aeac4380b8", GitTreeState:"clean", BuildDate:"2022-03-24T22:59:27Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
k0s version: v1.23.5+k0s.0
worker added with:
token=$(k0s token create --role=worker)
docker run -d --name k0s-worker1 --hostname k0s-worker1 --privileged -v /var/lib/k0s docker.io/k0sproject/k0s:latest k0s worker $token
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k0s-worker9 Ready <none> 42m v1.23.5+k0s
v2202204173709187201 Ready control-plane 43m v1.23.5+k0s
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
managed-nfs-storage k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 40m
/ect/exports
/data/nfs-storage *(rw,sync,no_root_squash,no_subtree_check,insecure)
Output with
sudo k0s kubectl describe pod nfs-client-provisioner-6889579fdb-t7j74
Name: nfs-client-provisioner-6889579fdb-t7j74
Namespace: default
Priority: 0
Node: k0s-worker9/172.17.0.2
Start Time: Tue, 26 Apr 2022 08:45:49 +0200
Labels: app=nfs-client-provisioner
pod-template-hash=6889579fdb
Annotations: kubernetes.io/psp: 00-k0s-privileged
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nfs-client-provisioner-6889579fdb
Containers:
nfs-client-provisioner:
Container ID:
Image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.1
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
PROVISIONER_NAME: k8s-sigs.io/nfs-subdir-external-provisioner
NFS_SERVER: 47.122.181.39
NFS_PATH: /data/nfs-storage
Mounts:
/persistentvolumes from nfs-client-root (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lr9tl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nfs-client-root:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 47.122.181.39
Path: /data/nfs-storage
ReadOnly: false
kube-api-access-lr9tl:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned default/nfs-client-provisioner-6889579fdb-t7j74 to k0s-worker9
Warning FailedMount 2m42s (x6 over 16m) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-client-root], unattached volumes=[nfs-client-root kube-api-access-lr9tl]: timed out waiting for the condition
Warning FailedMount 24s (x2 over 7m14s) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-client-root], unattached volumes=[kube-api-access-lr9tl nfs-client-root]: timed out waiting for the condition
command using helm:
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=47.122.181.39 \
--set nfs.path=/data/nfs-storage
without helm:
from https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/tree/v4.0.2/deploy
kubectl create -f rbac.yaml
kubectl create -f class.yaml
kubectl create -f deployment.yaml
class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: managed-nfs-storage
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name, must match deployment's env PROVISIONER_NAME'
parameters:
archiveOnDelete: "false"
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nfs-client-provisioner
labels:
app: nfs-client-provisioner
# replace with namespace where provisioner is deployed
namespace: default
spec:
replicas: 1
strategy:
type: Recreate
selector:
matchLabels:
app: nfs-client-provisioner
template:
metadata:
labels:
app: nfs-client-provisioner
spec:
serviceAccountName: nfs-client-provisioner
containers:
- name: nfs-client-provisioner
image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.1
volumeMounts:
- name: nfs-client-root
mountPath: /persistentvolumes
env:
- name: PROVISIONER_NAME
value: k8s-sigs.io/nfs-subdir-external-provisioner
- name: NFS_SERVER
value: 47.122.181.39
- name: NFS_PATH
value: /data/nfs-storage
volumes:
- name: nfs-client-root
nfs:
server: 47.122.181.39
path: /data/nfs-storage
K0s cluster setup:
sudo curl -sSLf https://get.k0s.sh | sudo sh
sudo k0s install controller --enable-worker
sudo k0s start
sudo cp /var/lib/k0s/pki/admin.conf ~/admin.conf
export KUBECONFIG=~/admin.conf
token=$(k0s token create --role=worker)
docker run -d --name k0s-worker9 --hostname k0s-worker9 --privileged -v /var/lib/k0s docker.io/k0sproject/k0s:latest k0s worker $token
Have you tried:
nfs.mountOptions = {
nfsvers = 4
}
I also used this provisioner. It works only with nfs4. And also check mount point of your NFS server if it's root / or not. If your provisioner is configured and ready try to recreate pvc.

Running containers issue

I have a little windows .exe deployed in azure kubernetes cluster. When I run kubectl get podsI get the following result,
NAME READY STATUS RESTARTS AGE
sample-deploy-548d6b9c6b-8v2nb 0/1 CrashLoopBackOff 5 6m12s
sample-deploy-548d6b9c6b-fpmz9 0/1 CrashLoopBackOff 5 6m12s
sample-deploy-548d6b9c6b-hgsj7 0/1 CrashLoopBackOff 5 6m12s
When I run kubectl describe pod sample-deploy-548d6b9c6b-8v2nb I get the following details
Name: sample-deploy-548d6b9c6b-8v2nb
Namespace: default
Priority: 0
Node: akswin000000/10.240.0.35
Start Time: Thu, 02 Jul 2020 16:59:02 +0100
Labels: app=sampleservice
pod-template-hash=548d6b9c6b
Annotations: <none>
Status: Running
IP: 10.240.0.45
IPs: <none>
Controlled By: ReplicaSet/sample-deploy-548d6b9c6b
Containers:
sampleservice:
Container ID: docker://3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca
Image: samplekube.azurecr.io/sample:v1
Image ID: docker-pullable://samplekube.azurecr.io/sample#sha256:a814e92d5af97b8cfbd6cd0789e164858848f82f0316a771670382ce0bbcba92
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: ContainerCannotRun
Message: hcsshim::CreateComputeSystem 3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca: The container operating system does not match the host operating system.
(extra info: {"SystemType":"Container","Name":"3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca","Owner":"docker","VolumePath":"\\\\?\\Volume{58649455-b9a5-4d00-b151-485ec8ab6006}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\
\windowsfilter\\3d22a9e647d4652227a9986f6940c6806e477f0b790a74f5795840131cc861ca","Layers":[{"ID":"7d7579eb-d8f7-5314-b6a0-399937aee9ca","Path":"C:\\ProgramData\\docker\\windowsfilter\\e0357f9d6b48e4b580a09cefedec8aac329894b57a49a30f9dc27795a1626aca"},{"ID":"f9bd195c-3ff
c-5c98-9713-1a7658666667","Path":"C:\\ProgramData\\docker\\windowsfilter\\019404385f250e8807ea3b693e35813b3328b3a14e83da51e8119401f0d20f9f"},{"ID":"0d763990-3499-5a19-b5e9-5e0788397f83","Path":"C:\\ProgramData\\docker\\windowsfilter\\3be0598c3fa3671a1436c670b6964c0a30ddc
2bd2e4011f347e6ef503888826a"},{"ID":"88fb7b4f-d24a-5ddf-9b67-861041ffef72","Path":"C:\\ProgramData\\docker\\windowsfilter\\978600b419ddd768b0b03c09e198d7b8d411cc6ca63b5ba15b6cc5343bb8b2a7"}],"ProcessorWeight":5000,"HostName":"sample-deploy-548d6b9c6b-8v2nb","MappedDirect
ories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\8257607b-9506-42af-9068-a3965bb46648\\volumes\\kubernetes.io~secret\\default-token-9wzn2","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"Create
InUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"fbd7d679302c57485ca7d4842528077fbb09e43ad691f47dd4cc84cbd8d3e3db"})
Exit Code: 128
Started: Thu, 02 Jul 2020 16:59:37 +0100
Finished: Thu, 02 Jul 2020 16:59:37 +0100
Ready: False
Restart Count: 2
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9wzn2 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-9wzn2:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9wzn2
Optional: false
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=windows
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned default/sample-deploy-548d6b9c6b-8v2nb to akswin000000
Warning BackOff 18s (x2 over 45s) kubelet, akswin000000 Back-off restarting failed container
Normal Pulling 7s (x4 over 60s) kubelet, akswin000000 Pulling image "samplekube.azurecr.io/sample:v1"
Normal Pulled 6s (x4 over 57s) kubelet, akswin000000 Successfully pulled image "samplekube.azurecr.io/sample:v1"
Normal Created 5s (x4 over 57s) kubelet, akswin000000 Created container sampleservice
Warning Failed 5s (x4 over 56s) kubelet, akswin000000 Error: failed to start container "sampleservice": Error response from daemon: hcsshim::CreateComputeSystem sampleservice: The container operating system does not match the host operating system.
(extra info: {"SystemType":"Container","Name":"sampleservice","Owner":"docker","VolumePath":"\\\\?\\Volume{58649455-b9a5-4d00-b151-485ec8ab6006}","IgnoreFlushesDuringBoot":true,"LayerFolderPath":"C:\\ProgramData\\docker\\windowsfilter\\sampleservice","Layers":[{"ID":"7d7
579eb-d8f7-5314-b6a0-399937aee9ca","Path":"C:\\ProgramData\\docker\\windowsfilter\\e0357f9d6b48e4b580a09cefedec8aac329894b57a49a30f9dc27795a1626aca"},{"ID":"f9bd195c-3ffc-5c98-9713-1a7658666667","Path":"C:\\ProgramData\\docker\\windowsfilter\\019404385f250e8807ea3b693e35
813b3328b3a14e83da51e8119401f0d20f9f"},{"ID":"0d763990-3499-5a19-b5e9-5e0788397f83","Path":"C:\\ProgramData\\docker\\windowsfilter\\3be0598c3fa3671a1436c670b6964c0a30ddc2bd2e4011f347e6ef503888826a"},{"ID":"88fb7b4f-d24a-5ddf-9b67-861041ffef72","Path":"C:\\ProgramData\\do
cker\\windowsfilter\\978600b419ddd768b0b03c09e198d7b8d411cc6ca63b5ba15b6cc5343bb8b2a7"}],"ProcessorWeight":5000,"HostName":"sample-deploy-548d6b9c6b-8v2nb","MappedDirectories":[{"HostPath":"c:\\var\\lib\\kubelet\\pods\\8257607b-9506-42af-9068-a3965bb46648\\volumes\\kuber
netes.io~secret\\default-token-9wzn2","ContainerPath":"c:\\var\\run\\secrets\\kubernetes.io\\serviceaccount","ReadOnly":true,"BandwidthMaximum":0,"IOPSMaximum":0,"CreateInUtilityVM":false}],"HvPartition":false,"NetworkSharedContainerName":"fbd7d679302c57485ca7d4842528077
fbb09e43ad691f47dd4cc84cbd8d3e3db"})
It seems to me is try to run the containers under linux instead of windows, I have both a windows and linux nodes in my cluster. How can I resolve this. Thanks
Below the kubectl get nodes -o wide --show-labels
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME LABELS
aks-agentpool-38156504-vmss000000 Ready agent 5h17m v1.15.11 10.240.0.4 <none> Ubuntu 16.04.6 LTS 4.15.0-1089-azure docker://3.0.10+azure agentpool=agentpool,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_DS2_v2,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=westeurope,failure-domain.beta.kubernetes.io/zone=0,kubernetes.azure.com/cluster=MC_testmass_anthonycluster_westeurope,kubernetes.azure.com/mode=system,kubernetes.azure.com/node-image-version=AKSUbuntu-1604-2020.06.18,kubernetes.azure.com/role=agent,kubernetes.io/arch=amd64,kubernetes.io/hostname=aks-agentpool-38156504-vmss000000,kubernetes.io/os=linux,kubernetes.io/role=agent,node-role.kubernetes.io/agent=,storageprofile=managed,storagetier=Premium_LRS
akswin000000 Ready agent 5h14m v1.15.11 10.240.0.35 <none> Windows Server 2019 Datacenter 10.0.17763.1282 docker://19.3.5 agentpool=win,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=Standard_DS2_v2,beta.kubernetes.io/os=windows,failure-domain.beta.kubernetes.io/region=westeurope,failure-domain.beta.kubernetes.io/zone=0,kubernetes.azure.com/cluster=MC_testmass_anthonycluster_westeurope,kubernetes.azure.com/node-image-version=AKSWindows-2019-17763.1282.200610,kubernetes.azure.com/role=agent,kubernetes.io/arch=amd64,kubernetes.io/hostname=akswin000000,kubernetes.io/os=windows,kubernetes.io/role=agent,node-role.kubernetes.io/agent=,storageprofile=managed,storagetier=Premium_LRS
this is the yaml file
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-deploy
labels:
app: sampleservice
spec:
replicas: 3
template:
metadata:
name: sampleservice
labels:
app: sampleservice
spec:
nodeSelector:
"kubernetes.io/os": windows
containers:
- name: sampleservice
image: samplekube.azurecr.io/sample:v1
imagePullPolicy: Always
restartPolicy: Always
selector:
matchLabels:
app: sampleservice
---
apiVersion: v1
kind: Service
metadata:
name: sample-service
spec:
selector:
app: sampleservice
ports:
- port: 80
type: LoadBalancer
This is what I am getting now when describe a pod,
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 94s default-scheduler Successfully assigned default/sample-deploy-6d4b86bf46-djtvf to aksnpwin000000
Normal Pulled 31s (x4 over 82s) kubelet, aksnpwin000000 Container image "masskube.azurecr.io/sample2:v1" already present on machine
Normal Created 31s (x4 over 82s) kubelet, aksnpwin000000 Created container sampleservice
Normal Started 29s (x4 over 79s) kubelet, aksnpwin000000 Started container sampleservice
Warning BackOff 3s (x5 over 59s) kubelet, aksnpwin000000 Back-off restarting failed container
And this is what I get on the after running kubectl logs podname
'Sample.exe' is not recognized as an internal or external command,
operable program or batch file.
I have managed to resolve the issue by amending the docker file as follows,
FROM mcr.microsoft.com/dotnet/framework/runtime:4.8-windowsservercore-ltsc2019
WORKDIR /app
EXPOSE 80
COPY /bin/Release .
ENTRYPOINT ["Sample.exe"]
Thanks everyone for the help
You need to set a node selector for your deployment’s template, like this:
nodeSelector:
kubernetes.io/os: windows
You have got nodeSelector kubernetes.io/os: windows in the deployment but the windows node has got label beta.kubernetes.io/os=windows. The nodeSelector and label need to exactly match.
You need to have nodeSelector as below in the pod spec to schedule the pod on windows node.
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-deploy
labels:
app: sampleservice
spec:
replicas: 3
template:
metadata:
name: sampleservice
labels:
app: sampleservice
spec:
nodeSelector:
"beta.kubernetes.io/os": windows
containers:
- name: sampleservice
image: samplekube.azurecr.io/sample:v1
imagePullPolicy: Always
restartPolicy: Always
selector:
matchLabels:
app: sampleservice
https://learn.microsoft.com/en-us/azure/aks/windows-container-cli

Azure Container Registry image pulls are very slow with image size ~150 MBs

When deploying an image to an AKS instance, the image pull from the ACR (Premium SKU) is very slow, even for "small" images around ~150 MBs in size.
Both the AKS resource and the ACR resource are in the Canada East region.
Here is an example:
root#076fff2831b2:/tmp# kubectl describe pod application-service-59bcf96874-pvrmb
Name: application-service-59bcf96874-pvrmb
Namespace: default
Priority: 0
Node: aks-41067869-1/10.255.13.163
Start Time: Tue, 11 Feb 2020 18:15:53 -0500
Labels: app.kubernetes.io/instance=application-service
app.kubernetes.io/name=application-service
pod-template-hash=59bcf96874
Annotations: <none>
Status: Running
IP: 10.255.13.175
IPs: <none>
Controlled By: ReplicaSet/application-service-59bcf96874
Containers:
application-service:
Container ID: docker://0e86526a293d9055d482a09f043f0be68c594244fe4216f8fb190bc2caf6b65b
Image: myacr01.azurecr.io/microservices/application-service:0.0.6
Image ID: docker-pullable://myacr01.azurecr.io/microservices/application-service#sha256:cfbb3ffa7adc52da9cc0b8d7f78376076ea712025b59df8e406c559d369f4085
Port: 3000/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 11 Feb 2020 18:35:00 -0500
Finished: Tue, 11 Feb 2020 18:35:00 -0500
Ready: False
Restart Count: 5
Liveness: http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
PORT: 3000
undefined: undefined
Mounts:
/kvmnt from application-service-kv-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from application-service-token-9jk8j (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
application-service-kv-volume:
Type: FlexVolume (a generic volume resource that is provisioned/attached using an exec based plugin)
Driver: azure/kv
FSType:
SecretRef: &LocalObjectReference{Name:kvcreds,}
ReadOnly: false
Options: map[keyvaultname:testIt2 keyvaultobjectnames:APPLICATION-SVC-SQLDB-CS;INGESTION-CONSUMER-EHB-CS;INGESTION-PRODUCER-EHB-CS keyvaultobjecttypes:secret;secret;secret tenantid:REMOVED usepodidentity:false usevmmanagedidentity:false]
application-service-token-9jk8j:
Type: Secret (a volume populated by a Secret)
SecretName: application-service-token-9jk8j
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20m default-scheduler Successfully assigned default/application-service-59bcf96874-pvrmb to aks-41067869-1
Normal Pulling 20m kubelet, aks-41067869-1 Pulling image "myacr01.azurecr.io/microservices/application-service:0.0.6"
Normal Pulled 4m39s kubelet, aks-41067869-1 Successfully pulled image "myacr01.azurecr.io/microservices/application-service:0.0.6"
Normal Started 3m36s (x4 over 4m33s) kubelet, aks-41067869-1 Started container application-service
Warning BackOff 3m4s (x11 over 4m30s) kubelet, aks-41067869-1 Back-off restarting failed container
Normal Pulled 2m52s (x4 over 4m32s) kubelet, aks-41067869-1 Container image "myacr01.azurecr.io/microservices/application-service:0.0.6" already present on machine
Normal Created 2m51s (x5 over 4m33s) kubelet, aks-41067869-1 Created container application-service
Some details were modified/removed for privacy reasons.
However, the thing to note is the ~15m needed to go from a state of "Pulling" to "Pulled" for an image from an ACR.
This issue is occurring daily. The Azure Insights blade of the AKS instance shows a maximum of 26% node CPU and 14.32% node memory utilization over the last 7 days.
How we can go about troubleshooting this further to determine the possible causes of delays?
Any help is greatly appreciated.
Thanks!

kubernetes giving CrashLoopBackOff error while creating pods

I'm creating a pod of node container, and it is giving CrashLoopBackOff error.
kubectl get pods
kubectl describe pod test-node3
Any help would be appreciated.
You can add command as below so that pod will remain in running state.
apiVersion: v1
kind: Pod
metadata:
name: myapp-pod
labels:
app: myapp
spec:
containers:
- name: myapp-container
image: busybox
command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
Ref: Doc
Your container does not have a long running process. The main process in the container is exiting with exit code 0 which usually means that the process has terminated successfully. You can see it in the kubectl describe output you have shared.
Reason: Completed
Exit Code: 0
Once Pod is assigned to a node by scheduler, kubelet starts creating containers using container runtime. There are three possible states of containers: Waiting, Running and Terminated.
Terminated: Indicates that the container completed its execution and has stopped running.
A container enters into this when it has successfully completed execution or when it has failed for some reason. Regardless, a reason and exit code is displayed, as well as the container’s start and finish time.
On your screenshot its clear that container inside pod is running to completion with its work done, with exit code 0 as below snippet
State: Terminated
Reason: Completed
Exit Code: 0
You should either add a long running process to your container or define restartPolicy: Never on pod definition.
Tested your image with adding correct restart policy and POD runs correctly to completion with no crash
kubectl run test --image=abhishekk27/kube-pub:new --restart=Never
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test 0/1 Completed 0 8m12s
yaml genrated :
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: test
name: test
spec:
containers:
- image: abhishekk27/kube-pub:new
name: test
resources: {}
dnsPolicy: ClusterFirst
restartPolicy: Never
Result:
$ kubectl describe pod test
Name: test
Namespace: default
Priority: 0
Node: dlv-k8s-node-1/131.160.200.104
Start Time: Fri, 17 Jan 2020 09:45:00 +0000
Labels: run=test
Annotations: <none>
Status: Succeeded
IP: 10.244.1.12
IPs:
IP: 10.244.1.12
Containers:
test:
Container ID: docker://b335e5fef022dced824f85ba2bfe4c024608c9b5463599eb36591a14d709786d
Image: abhishekk27/kube-pub:new
Image ID: docker-pullable://abhishekk27/kube-pub#sha256:6a696bd733edaa48b9be781960f4ee178d16f1c9aea51e53bd0f54326a3d05b1
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 17 Jan 2020 09:45:48 +0000
Finished: Fri, 17 Jan 2020 09:45:48 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7f4mt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-7f4mt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-7f4mt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m50s default-scheduler Successfully assigned default/test to dlv-k8s-node-1
Normal Pulling 6m46s kubelet, dlv-k8s-node-1 Pulling image "abhishekk27/kube-pub:new"
Normal Pulled 5m58s kubelet, dlv-k8s-node-1 Successfully pulled image "abhishekk27/kube-pub:new"
Normal Created 5m58s kubelet, dlv-k8s-node-1 Created container test
Normal Started 5m58s kubelet, dlv-k8s-node-1 Started container test

how to solve EADDRINUSE in GKE container?

I am new to containers and using GKE. I used to run my node server app with npm run debug and am trying to do this as well on GKE using the shell of my container. When I log into the shell of myapp container and do this I get:
> api_server#0.0.0 start /usr/src/app
> node src/
events.js:167
throw er; // Unhandled 'error' event
^
Error: listen EADDRINUSE :::8089
Normally I deal with this using something like killall -9 node but when I do this it looks like I am kicked out of my shell and the container is restarted by kubernetes. It seems node is using the port already or something:
netstat -tulpn | grep 8089
tcp 0 0 :::8089 :::* LISTEN 23/node
How can I start my server from the shell?
My config files:
Dockerfile:
FROM node:10-alpine
RUN apk add --update \
libc6-compat
WORKDIR /usr/src/app
COPY package*.json ./
COPY templates-mjml/ templates-mjml/
COPY public/ public/
COPY src/ src/
COPY data/ data/
COPY config/ config/
COPY migrations/ migrations/
ENV NODE_ENV 'development'
ENV PORT '8089'
RUN npm install --development
myapp.yaml:
apiVersion: v1
kind: Service
metadata:
name: myapp
labels:
app: myapp
spec:
ports:
- port: 8089
name: http
selector:
app: myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
labels:
app: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: gcr.io/myproject-224713/firstapp:v4
ports:
- containerPort: 8089
env:
- name: POSTGRES_DB_HOST
value: 127.0.0.1:5432
- name: POSTGRES_DB_USER
valueFrom:
secretKeyRef:
name: mysecret
key: username
- name: POSTGRES_DB_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password
- name: cloudsql-proxy
image: gcr.io/cloudsql-docker/gce-proxy:1.11
command: ["/cloud_sql_proxy",
"-instances=myproject-224713:europe-west4:mydatabase=tcp:5432",
"-credential_file=/secrets/cloudsql/credentials.json"]
securityContext:
runAsUser: 2
allowPrivilegeEscalation: false
volumeMounts:
- name: cloudsql-instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
volumes:
- name: cloudsql-instance-credentials
secret:
secretName: cloudsql-instance-credentials
---
myrouter.yaml:
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: myapp-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- "*"
gateways:
- myapp-gateway
http:
- match:
- uri:
prefix: /
route:
- destination:
host: myapp
weight: 100
websocketUpgrade: true
EDIT:
I got following logs:
EDIT 2:
After adding a featherjs health service I get following output for describe:
Name: myapp-95df4dcd6-lptnq
Namespace: default
Node: gke-standard-cluster-1-default-pool-59600833-pcj3/10.164.0.3
Start Time: Wed, 02 Jan 2019 22:08:33 +0100
Labels: app=myapp
pod-template-hash=518908782
Annotations: kubernetes.io/limit-ranger: LimitRanger plugin set: cpu request for container myapp; cpu request for container cloudsql-proxy
sidecar.istio.io/status:
{"version":"3c9617ff82c9962a58890e4fa987c69ca62487fda71c23f3a2aad1d7bb46c748","initContainers":["istio-init"],"containers":["istio-proxy"]...
Status: Running
IP: 10.44.3.17
Controlled By: ReplicaSet/myapp-95df4dcd6
Init Containers:
istio-init:
Container ID: docker://768b2327c6cfa57b3d25a7029e52ce6a88dec6848e91dd7edcdf9074c91ff270
Image: gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxy_init#sha256:e30d47d2f269347a973523d0c5d7540dbf7f87d24aca2737ebc09dbe5be53134
Port: <none>
Host Port: <none>
Args:
-p
15001
-u
1337
-m
REDIRECT
-i
*
-x
-b
8089,
-d
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 02 Jan 2019 22:08:34 +0100
Finished: Wed, 02 Jan 2019 22:08:35 +0100
Ready: True
Restart Count: 0
Environment: <none>
Mounts: <none>
Containers:
myapp:
Container ID: docker://5566a3e8242ec6755dc2f26872cfb024fab42d5f64aadc3db1258fcb834f8418
Image: gcr.io/myproject-224713/firstapp:v4
Image ID: docker-pullable://gcr.io/myproject-224713/firstapp#sha256:0cbd4fae0b32fa0da5a8e6eb56cb9b86767568d243d4e01b22d332d568717f41
Port: 8089/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 02 Jan 2019 22:09:19 +0100
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 02 Jan 2019 22:08:35 +0100
Finished: Wed, 02 Jan 2019 22:09:19 +0100
Ready: False
Restart Count: 1
Requests:
cpu: 100m
Liveness: http-get http://:8089/health delay=15s timeout=20s period=10s #success=1 #failure=3
Readiness: http-get http://:8089/health delay=5s timeout=5s period=10s #success=1 #failure=3
Environment:
POSTGRES_DB_HOST: 127.0.0.1:5432
POSTGRES_DB_USER: <set to the key 'username' in secret 'mysecret'> Optional: false
POSTGRES_DB_PASSWORD: <set to the key 'password' in secret 'mysecret'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9vtz5 (ro)
cloudsql-proxy:
Container ID: docker://414799a0699abe38c9759f82a77e1a3e06123714576d6d57390eeb07611f9a63
Image: gcr.io/cloudsql-docker/gce-proxy:1.11
Image ID: docker-pullable://gcr.io/cloudsql-docker/gce-proxy#sha256:5c690349ad8041e8b21eaa63cb078cf13188568e0bfac3b5a914da3483079e2b
Port: <none>
Host Port: <none>
Command:
/cloud_sql_proxy
-instances=myproject-224713:europe-west4:osm=tcp:5432
-credential_file=/secrets/cloudsql/credentials.json
State: Running
Started: Wed, 02 Jan 2019 22:08:36 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 100m
Environment: <none>
Mounts:
/secrets/cloudsql from cloudsql-instance-credentials (ro)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9vtz5 (ro)
istio-proxy:
Container ID: docker://898bc95c6f8bde18814ef01ce499820d545d7ea2d8bf494b0308f06ab419041e
Image: gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0
Image ID: docker-pullable://gcr.io/gke-release/istio/proxyv2#sha256:826ef4469e4f1d4cabd0dc846f9b7de6507b54f5f0d0171430fcd3fb6f5132dc
Port: <none>
Host Port: <none>
Args:
proxy
sidecar
--configPath
/etc/istio/proxy
--binaryPath
/usr/local/bin/envoy
--serviceCluster
myapp
--drainDuration
45s
--parentShutdownDuration
1m0s
--discoveryAddress
istio-pilot.istio-system:15007
--discoveryRefreshDelay
1s
--zipkinAddress
zipkin.istio-system:9411
--connectTimeout
10s
--statsdUdpAddress
istio-statsd-prom-bridge.istio-system:9125
--proxyAdminPort
15000
--controlPlaneAuthPolicy
NONE
State: Running
Started: Wed, 02 Jan 2019 22:08:36 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 10m
Environment:
POD_NAME: myapp-95df4dcd6-lptnq (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
ISTIO_META_POD_NAME: myapp-95df4dcd6-lptnq (v1:metadata.name)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
Mounts:
/etc/certs/ from istio-certs (ro)
/etc/istio/proxy from istio-envoy (rw)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
cloudsql-instance-credentials:
Type: Secret (a volume populated by a Secret)
SecretName: cloudsql-instance-credentials
Optional: false
default-token-9vtz5:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9vtz5
Optional: false
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
istio-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio.default
Optional: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 68s default-scheduler Successfully assigned myapp-95df4dcd6-lptnq to gke-standard-cluster-1-default-pool-59600833-pcj3
Normal SuccessfulMountVolume 68s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 MountVolume.SetUp succeeded for volume "istio-envoy"
Normal SuccessfulMountVolume 68s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 MountVolume.SetUp succeeded for volume "default-token-9vtz5"
Normal SuccessfulMountVolume 68s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 MountVolume.SetUp succeeded for volume "cloudsql-instance-credentials"
Normal SuccessfulMountVolume 68s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 MountVolume.SetUp succeeded for volume "istio-certs"
Normal Pulled 67s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Container image "gcr.io/gke-release/istio/proxy_init:1.0.2-gke.0" already present on machine
Normal Created 67s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Created container
Normal Started 67s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Started container
Normal Pulled 66s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Container image "gcr.io/cloudsql-docker/gce-proxy:1.11" already present on machine
Normal Created 66s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Created container
Normal Started 66s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Started container
Normal Created 65s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Created container
Normal Started 65s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Started container
Normal Pulled 65s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Container image "gcr.io/gke-release/istio/proxyv2:1.0.2-gke.0" already present on machine
Normal Created 65s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Created container
Normal Started 65s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Started container
Warning Unhealthy 31s (x4 over 61s) kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Readiness probe failed: HTTP probe failed with statuscode: 404
Normal Pulled 22s (x2 over 66s) kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Container image "gcr.io/myproject-224713/firstapp:v4" already present on machine
Warning Unhealthy 22s (x3 over 42s) kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Liveness probe failed: HTTP probe failed with statuscode: 404
Normal Killing 22s kubelet, gke-standard-cluster-1-default-pool-59600833-pcj3 Killing container with id docker://myapp:Container failed liveness probe.. Container will be killed and recreated.
This is just how Kubernetes works, as long as your pod has processes running it will remain 'up'. The moment you kill one if its processes Kubernetes will restart the pod because it crashed or something went wrong.
If you really want to debug with npm run debug consider either:
Create a container with the CMD (at the end) or ENTRYPOINT value in your Dockerfile that is npm run debug. Then run it using a Deployment definition in Kubernetes.
Override the command in the myapp container in your deployment definition with something like:
spec:
containers:
- name: myapp
image: gcr.io/myproject-224713/firstapp:v4
ports:
- containerPort: 8089
command: ["npm", "run", "debug" ]
env:
- name: POSTGRES_DB_HOST
value: 127.0.0.1:5432
- name: POSTGRES_DB_USER
valueFrom:
secretKeyRef:
name: mysecret
key: username
- name: POSTGRES_DB_PASSWORD
valueFrom:
secretKeyRef:
name: mysecret
key: password

Resources