nodeSelector constraint ignored on AKS with mixed node pools (Windows/Linux)?

I tried installing a basic NGINX ingress controller using Helm by running the following command:
helm install nginx-ingress --namespace ingress-basic ingress-nginx/ingress-nginx \
--set controller.service.loadBalancerIP='52.232.109.226' \
--set controller.nodeSelector."beta\.kubernetes\.io/os"='linux' \
--set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"='linux' \
--set controller.replicaCount=1 \
--set rbac.create=true
Shortly after installing I noticed the pod was scheduled onto a Windows node instead of a Linux node:
wesley@Azure:~$ kubectl get pods -n ingress-basic -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-ingress-ingress-nginx-admission-create-jcp6x 0/1 ContainerCreating 0 18s <none> akswin000002 <none> <none>
wesley@Azure:~$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
aks-agentpool-59412422-vmss000000 Ready agent 5h32m v1.17.11 10.240.0.4 <none> Ubuntu 16.04.7 LTS 4.15.0-1096-azure docker://19.3.12
aks-linuxpool-59412422-vmss000000 Ready agent 5h32m v1.17.11 10.240.0.128 <none> Ubuntu 16.04.7 LTS 4.15.0-1096-azure docker://19.3.12
akswin000000 Ready agent 5h28m v1.17.11 10.240.0.35 <none> Windows Server 2019 Datacenter 10.0.17763.1397 docker://19.3.11
akswin000001 Ready agent 5h28m v1.17.11 10.240.0.66 <none> Windows Server 2019 Datacenter 10.0.17763.1397 docker://19.3.11
akswin000002 Ready agent 5h28m v1.17.11 10.240.0.97 <none> Windows Server 2019 Datacenter 10.0.17763.1397 docker://19.3.11
Running a describe on the nginx pod revealed that the Node-Selectors field is still set to <none>.
Name: nginx-ingress-ingress-nginx-admission-create-jcp6x
Namespace: ingress-basic
Priority: 0
PriorityClassName: <none>
Node: akswin000002/10.240.0.97
Start Time: Fri, 16 Oct 2020 20:09:36 +0000
Labels: app.kubernetes.io/component=admission-webhook
app.kubernetes.io/instance=nginx-ingress
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/version=0.40.2
controller-uid=d03091cd-8138-4923-a369-afeca669099c
helm.sh/chart=ingress-nginx-3.7.1
job-name=nginx-ingress-ingress-nginx-admission-create
Annotations: <none>
Status: Pending
IP:
Controlled By: Job/nginx-ingress-ingress-nginx-admission-create
Containers:
create:
Container ID:
Image: docker.io/jettech/kube-webhook-certgen:v1.3.0
Image ID:
Port: <none>
Host Port: <none>
Args:
create
--host=nginx-ingress-ingress-nginx-controller-admission,nginx-ingress-ingress-nginx-controller-admission.$(POD_NAMESPACE).svc
--namespace=$(POD_NAMESPACE)
--secret-name=nginx-ingress-ingress-nginx-admission
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
POD_NAMESPACE: ingress-basic (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from nginx-ingress-ingress-nginx-admission-token-8x6ct (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nginx-ingress-ingress-nginx-admission-token-8x6ct:
Type: Secret (a volume populated by a Secret)
SecretName: nginx-ingress-ingress-nginx-admission-token-8x6ct
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 5m (x5401 over 2h) kubelet, akswin000002 Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 31s (x5543 over 2h) kubelet, akswin000002 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "nginx-ingress-ingress-nginx-admission-create-jcp6x": Error response from daemon: container a734c23d20338d7fed800752c19f5e94688fd38fe82c9e90fc14533bae90c6bc encountered an error during hcsshim::System::CreateProcess: failure in a Windows system call: The user name or password is incorrect. (0x52e) extra info: {"CommandLine":"cmd /S /C pauseloop.exe","User":"2000","WorkingDirectory":"C:\\","Environment":{"PATH":"c:\\Windows\\System32;c:\\Windows"},"CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
I expected the pod to be scheduled onto a Linux node instead. Does anyone have a clue why this is happening? I saw no taints or anything, and this is a freshly spun-up cluster. The only workaround for now seems to be to scale the Windows node pool down to zero, install the NGINX ingress, and then scale the Windows nodes back up.
Kubernetes version: 1.17.11

The command below works. The key addition is admissionWebhooks.patch.nodeSelector, which pins the admission-webhook job (the pod that was landing on the Windows node) to Linux as well.
https://learn.microsoft.com/en-us/azure/aks/ingress-basic#create-an-ingress-controller
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-basic \
--set controller.replicaCount=2 \
--set controller.nodeSelector."beta\.kubernetes\.io/os"=linux \
--set defaultBackend.nodeSelector."beta\.kubernetes\.io/os"=linux \
--set controller.admissionWebhooks.patch.nodeSelector."beta\.kubernetes\.io/os"=linux
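Note that beta.kubernetes.io/os is a deprecated label; on newer Kubernetes versions the stable equivalent is kubernetes.io/os, which recent AKS node pools also carry. The same install with the stable label would be:
helm install nginx-ingress ingress-nginx/ingress-nginx \
--namespace ingress-basic \
--set controller.replicaCount=2 \
--set controller.nodeSelector."kubernetes\.io/os"=linux \
--set defaultBackend.nodeSelector."kubernetes\.io/os"=linux \
--set controller.admissionWebhooks.patch.nodeSelector."kubernetes\.io/os"=linux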

Related

Kubernetes nfs-subdir-external-provisioner stuck in ContainerCreating / Unable to attach or mount volumes: unmounted volumes=[nfs-client-root]

I'm trying to install an nfs-client-provisioner and run a MongoDB with it.
Unfortunately, the nfs-client-provisioner hangs in ContainerCreating and says "Warning FailedMount 3m35s (x13 over 37m) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-client-root], unattached volumes=[nfs-client-root kube-api-access-lr9tl]: timed out waiting for the condition".
The NFS server is configured on the same VPS machine (Debian 10).
I am able to mount and write files on the NFS server from a second VPS with Debian 10.
The cluster is set up with k0s.
I get the same error with both the Helm chart and a manual installation.
Any help is appreciated.
Some more info below:
Helm version:
version.BuildInfo{Version:"v3.8.2", GitCommit:"6e3701edea09e5d55a8ca2aae03a68917630e91b", GitTreeState:"clean", GoVersion:"go1.17.5"}
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:49:13Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5+k0s", GitCommit:"5ab78974affb1a76f1e5687aaa8b02aeac4380b8", GitTreeState:"clean", BuildDate:"2022-03-24T22:59:27Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"linux/amd64"}
k0s version: v1.23.5+k0s.0
worker added with:
token=$(k0s token create --role=worker)
docker run -d --name k0s-worker1 --hostname k0s-worker1 --privileged -v /var/lib/k0s docker.io/k0sproject/k0s:latest k0s worker $token
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k0s-worker9 Ready <none> 42m v1.23.5+k0s
v2202204173709187201 Ready control-plane 43m v1.23.5+k0s
kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
managed-nfs-storage k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 40m
/etc/exports:
/data/nfs-storage *(rw,sync,no_root_squash,no_subtree_check,insecure)
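For reference, edits to /etc/exports only take effect once the export table is reloaded; a quick sanity check on the server:
sudo exportfs -ra        # re-export everything listed in /etc/exports
showmount -e localhost   # confirm the export is visible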
Output of:
sudo k0s kubectl describe pod nfs-client-provisioner-6889579fdb-t7j74
Name: nfs-client-provisioner-6889579fdb-t7j74
Namespace: default
Priority: 0
Node: k0s-worker9/172.17.0.2
Start Time: Tue, 26 Apr 2022 08:45:49 +0200
Labels: app=nfs-client-provisioner
pod-template-hash=6889579fdb
Annotations: kubernetes.io/psp: 00-k0s-privileged
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nfs-client-provisioner-6889579fdb
Containers:
nfs-client-provisioner:
Container ID:
Image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.1
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
PROVISIONER_NAME: k8s-sigs.io/nfs-subdir-external-provisioner
NFS_SERVER: 47.122.181.39
NFS_PATH: /data/nfs-storage
Mounts:
/persistentvolumes from nfs-client-root (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-lr9tl (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
nfs-client-root:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 47.122.181.39
Path: /data/nfs-storage
ReadOnly: false
kube-api-access-lr9tl:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned default/nfs-client-provisioner-6889579fdb-t7j74 to k0s-worker9
Warning FailedMount 2m42s (x6 over 16m) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-client-root], unattached volumes=[nfs-client-root kube-api-access-lr9tl]: timed out waiting for the condition
Warning FailedMount 24s (x2 over 7m14s) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-client-root], unattached volumes=[kube-api-access-lr9tl nfs-client-root]: timed out waiting for the condition
command using helm:
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=47.122.181.39 \
--set nfs.path=/data/nfs-storage
without helm:
from https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/tree/v4.0.2/deploy
kubectl create -f rbac.yaml
kubectl create -f class.yaml
kubectl create -f deployment.yaml
class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name, must match the deployment's env PROVISIONER_NAME
parameters:
  archiveOnDelete: "false"
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: gcr.io/k8s-staging-sig-storage/nfs-subdir-external-provisioner:v4.0.1
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 47.122.181.39
            - name: NFS_PATH
              value: /data/nfs-storage
      volumes:
        - name: nfs-client-root
          nfs:
            server: 47.122.181.39
            path: /data/nfs-storage
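For reference, a PVC that consumes this StorageClass would look roughly like the following (a minimal sketch; the claim name and size are hypothetical):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim               # hypothetical name
spec:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi               # hypothetical size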
K0s cluster setup:
sudo curl -sSLf https://get.k0s.sh | sudo sh
sudo k0s install controller --enable-worker
sudo k0s start
sudo cp /var/lib/k0s/pki/admin.conf ~/admin.conf
export KUBECONFIG=~/admin.conf
token=$(k0s token create --role=worker)
docker run -d --name k0s-worker9 --hostname k0s-worker9 --privileged -v /var/lib/k0s docker.io/k0sproject/k0s:latest k0s worker $token
Have you tried setting the NFS mount options? In values.yaml form for the chart:
nfs:
  mountOptions:
    - nfsvers=4
I also used this provisioner; it works only with NFSv4. Also check whether the mount point on your NFS server is the root / or not. If your provisioner is configured and ready, try recreating the PVC.
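The same mount options can be passed on the command line with Helm's list syntax for --set (a sketch reusing the install command from the question):
helm install nfs-subdir-external-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner \
--set nfs.server=47.122.181.39 \
--set nfs.path=/data/nfs-storage \
--set 'nfs.mountOptions={nfsvers=4}'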

Unable to get Azure Key Vault integrated with Azure Kubernetes Service

Stuck on getting this integration working. I'm following the documentation step-by-step.
The following is everything I have done starting from scratch, so if it isn't listed here, I haven't tried it (I apologize in advance for the long series of commands):
# create the resource group
az group create -l westus -n k8s-test
# create the azure container registery
az acr create -g k8s-test -n k8stestacr --sku Basic -l westus
# create the azure key vault and add a test value to it
az keyvault create --name k8stestakv --resource-group k8s-test -l westus
az keyvault secret set --vault-name k8stestakv --name SECRETTEST --value abc123
# create the azure kubernetes service
az aks create -n k8stestaks -g k8s-test --kubernetes-version=1.19.7 --node-count 1 -l westus --enable-managed-identity --attach-acr k8stestacr -s Standard_B2s
# switch to the aks context
az aks get-credentials -n k8stestaks -g k8s-test
# install helm charts for secrets store csi
helm repo add csi-secrets-store-provider-azure https://raw.githubusercontent.com/Azure/secrets-store-csi-driver-provider-azure/master/charts
helm install csi-secrets-store-provider-azure/csi-secrets-store-provider-azure --generate-name
# create role managed identity operator
az role assignment create --role "Managed Identity Operator" --assignee <k8stestaks_clientId> --scope /subscriptions/<subscriptionId>/resourcegroups/MC_k8s-test_k8stestaks_westus
# create role virtual machine contributor
az role assignment create --role "Virtual Machine Contributor" --assignee <k8stestaks_clientId> --scope /subscriptions/<subscriptionId>/resourcegroups/MC_k8s-test_k8stestaks_westus
# install more helm charts
helm repo add aad-pod-identity https://raw.githubusercontent.com/Azure/aad-pod-identity/master/charts
helm install pod-identity aad-pod-identity/aad-pod-identity
# create identity
az identity create -g MC_k8s-test_k8stestaks_westus -n TestIdentity
# give the new identity a reader role for AKV
az role assignment create --role "Reader" --assignee <TestIdentity_principalId> --scope /subscriptions/<subscription_id/resourceGroups/k8s-test/providers/Microsoft.KeyVault/vaults/k8stestakv
# allow the identity to get secrets from AKV
az keyvault set-policy -n k8stestakv --secret-permissions get --spn <TestIdentity_clientId>
That is pretty much it for the az CLI commands. Everything up to this point executes fine with no errors. I can go into the portal and see these roles on the MC_ group, the TestIdentity with read-only access to secrets, etc.
After that, the documentation has you build secretProviderClass.yaml:
apiVersion: secrets-store.csi.x-k8s.io/v1alpha1
kind: SecretProviderClass
metadata:
  name: azure-kvname
spec:
  provider: azure
  parameters:
    usePodIdentity: "true"
    useVMManagedIdentity: "false"
    userAssignedIdentityID: ""
    keyvaultName: "k8stestakv"
    cloudName: ""
    objects: |
      array:
        - |
          objectName: SECRETTEST
          objectType: secret
          objectVersion: ""
    resourceGroup: "k8s-test"
    subscriptionId: "<subscriptionId>"
    tenantId: "<tenantId>"
And also the podIdentityBinding.yaml:
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentity
metadata:
  name: azureIdentity
spec:
  type: 0
  resourceID: /subscriptions/<subscriptionId>/resourcegroups/MC_k8s-test_k8stestaks_westus/providers/Microsoft.ManagedIdentity/userAssignedIdentities/TestIdentity
  clientID: <TestIdentity_clientId>
---
apiVersion: aadpodidentity.k8s.io/v1
kind: AzureIdentityBinding
metadata:
  name: azure-pod-identity-binding
spec:
  azureIdentity: azureIdentity
  selector: azure-pod-identity-binding-selector
Then just apply them:
# this one executes fine
kubectl apply -f k8s/secret/secretProviderClass.yaml
# this one does not
kubectl apply -f k8s/identity/podIdentityBinding.yaml
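As an aside, for a pod to actually pick up this identity, aad-pod-identity matches pods whose aadpodidbinding label equals the binding's selector; a minimal sketch of the relevant workload metadata (pod name and image are hypothetical placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: demo-app                                            # hypothetical
  labels:
    aadpodidbinding: azure-pod-identity-binding-selector    # must match the AzureIdentityBinding selector
spec:
  containers:
    - name: demo
      image: nginx                                          # placeholder image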
Problem #1
With the last one I get:
unable to recognize "k8s/identity/podIdentityBinding.yaml": no matches for kind "AzureIdentity" in version "aadpodidentity.k8s.io/v1"
unable to recognize "k8s/identity/podIdentityBinding.yaml": no matches for kind "AzureIdentityBinding" in version "aadpodidentity.k8s.io/v1"
Not sure why, because the helm install pod-identity aad-pod-identity/aad-pod-identity command was successful. Looking at my pods, however...
Problem #2
I've followed these steps three times, and every time the issue is the same: the aad-pod-identity-nmi-xxxxx pod will not launch:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
aad-pod-identity-mic-7b4558845f-hwv8t 1/1 Running 0 37m
aad-pod-identity-mic-7b4558845f-w8mxt 1/1 Running 0 37m
aad-pod-identity-nmi-4sf5q 0/1 CrashLoopBackOff 12 37m
csi-secrets-store-provider-azure-1613256848-cjlwc 1/1 Running 0 41m
csi-secrets-store-provider-azure-1613256848-secrets-store-m4wth 3/3 Running 0 41m
$ kubectl describe pod aad-pod-identity-nmi-4sf5q
Name: aad-pod-identity-nmi-4sf5q
Namespace: default
Priority: 0
Node: aks-nodepool1-40626841-vmss000000/10.240.0.4
Start Time: Sat, 13 Feb 2021 14:57:54 -0800
Labels: app.kubernetes.io/component=nmi
app.kubernetes.io/instance=pod-identity
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=aad-pod-identity
controller-revision-hash=669df55fd8
helm.sh/chart=aad-pod-identity-3.0.3
pod-template-generation=1
tier=node
Annotations: <none>
Status: Running
IP: 10.240.0.4
IPs:
IP: 10.240.0.4
Controlled By: DaemonSet/aad-pod-identity-nmi
Containers:
nmi:
Container ID: containerd://5f9e17e95ae395971dfd060c1db7657d61e03052ffc3cbb59d01c774bb4a2f6a
Image: mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4
Image ID: mcr.microsoft.com/oss/azure/aad-pod-identity/nmi@sha256:0b4e296a7b96a288960c39dbda1a3ffa324ef33c77bb5bd81a4266b85efb3498
Port: <none>
Host Port: <none>
Args:
--node=$(NODE_NAME)
--http-probe-port=8085
--operation-mode=standard
--kubelet-config=/etc/default/kubelet
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Sat, 13 Feb 2021 15:34:40 -0800
Finished: Sat, 13 Feb 2021 15:34:40 -0800
Ready: False
Restart Count: 12
Limits:
cpu: 200m
memory: 512Mi
Requests:
cpu: 100m
memory: 256Mi
Liveness: http-get http://:8085/healthz delay=10s timeout=1s period=5s #success=1 #failure=3
Environment:
NODE_NAME: (v1:spec.nodeName)
FORCENAMESPACED: false
Mounts:
/etc/default/kubelet from kubelet-config (ro)
/run/xtables.lock from iptableslock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from aad-pod-identity-nmi-token-8sfh4 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
iptableslock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
kubelet-config:
Type: HostPath (bare host directory volume)
Path: /etc/default/kubelet
HostPathType:
aad-pod-identity-nmi-token-8sfh4:
Type: Secret (a volume populated by a Secret)
SecretName: aad-pod-identity-nmi-token-8sfh4
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 38m default-scheduler Successfully assigned default/aad-pod-identity-nmi-4sf5q to aks-nodepool1-40626841-vmss000000
Normal Pulled 38m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 14.677657725s
Normal Pulled 38m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 5.976721016s
Normal Pulled 37m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 627.112255ms
Normal Pulling 37m (x4 over 38m) kubelet Pulling image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4"
Normal Pulled 37m kubelet Successfully pulled image "mcr.microsoft.com/oss/azure/aad-pod-identity/nmi:v1.7.4" in 794.669637ms
Normal Created 37m (x4 over 38m) kubelet Created container nmi
Normal Started 37m (x4 over 38m) kubelet Started container nmi
Warning BackOff 3m33s (x170 over 38m) kubelet Back-off restarting failed container
I'm not sure whether the two problems are related, and I haven't been able to get the failing pod to start up.
Any suggestions here?
It looks like it is related to the default network plugin that AKS picks for you if you don't specify "Advanced" for the network options: kubenet.
This integration can still be done with kubenet, as outlined here:
https://azure.github.io/aad-pod-identity/docs/configure/aad_pod_identity_on_kubenet/
If you are creating a new cluster, enable Advanced networking or add the --network-plugin azure flag.
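For example, a new-cluster variant of the az aks create command from the question (a sketch; if you keep kubenet instead, the linked document describes running NMI with --allow-network-plugin-kubenet=true):
az aks create -n k8stestaks -g k8s-test \
--kubernetes-version=1.19.7 \
--node-count 1 -l westus \
--enable-managed-identity \
--attach-acr k8stestacr -s Standard_B2s \
--network-plugin azure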

Airflow Kubernetes executor: tasks stuck in queued state, pod reports invalid image and uses local executor

I have installed Airflow in Docker and am using the Kubernetes executor, but I am unable to trigger DAGs; DAG runs stay in the queued state. The KubernetesExecutor creates a pod, but it says invalid image, and if I describe the pod it uses the local executor instead of the Kubernetes executor. Please help.
Attaching the log for reference:
kubectl describe pod tablescreationschematablescreation-ecabd38a66664a33b6645a72ef056edc
Name: swedschematablescreationschematablescreation-ecabd38a66664a33b6645a72ef056edc
Namespace: default
Priority: 0
Node: 10.73.96.181
Start Time: Mon, 11 May 2020 18:22:15 +0530
Labels: airflow-worker=5888feda-6aee-49c8-a94b-39cbe5758062
airflow_version=1.10.10
dag_id=Swed-schema-tables-creation
execution_date=2020-05-11T12_52_09.829627_plus_00_00
kubernetes_executor=True
task_id=Schema_Tables_Creation
try_number=1
Annotations: <none>
Status: Pending
IP: 172.17.0.46
IPs:
IP: 172.17.0.46
Containers:
base:
Container ID:
Image: :
Image ID:
Port: <none>
Host Port: <none>
Command:
airflow
run
Swed-schema-tables-creation
Schema_Tables_Creation
2020-05-11T12:52:09.829627+00:00
--local
--pool
default_pool
-sd
/root/airflow/dags/User_Creation_dag.py
State: Waiting
Reason: InvalidImageName
Ready: False
Restart Count: 0
Environment:
AIRFLOW__CORE__EXECUTOR: LocalExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflowkube:airflowkube@10.73.96.181:5434/airflowkube
Mounts:
/root/airflow/dags from airflow-dags (ro)
/root/airflow/logs from airflow-logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-64cxg (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
airflow-dags:
Type: HostPath (bare host directory volume)
Path: /data/Naveen/Airflow/dags
HostPathType:
airflow-logs:
Type: HostPath (bare host directory volume)
Path: /data/Naveen/Airflow/Logs
HostPathType:
default-token-64cxg:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-64cxg
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned default/swedschematablescreationschematablescreation-ecabd38a66664a33b6645a72ef056edc to evblfnclnullnull1538
Warning Failed 2m15s (x12 over 4m28s) kubelet, evblfnclnullnull1538 Error: InvalidImageName
Warning InspectFailed 2m (x13 over 4m28s) kubelet, evblfnclnullnull1538 Failed to apply default image tag ":": couldn't parse image reference ":": invalid reference format
It looks like there is no image defined for the workers. You can set this in airflow.cfg; be sure to set worker_container_repository and worker_container_tag. These are null by default, which results in the behaviour you're experiencing.
A working example:
AIRFLOW__KUBERNETES__WORKER_CONTAINER_REPOSITORY: "apache/airflow"
AIRFLOW__KUBERNETES__WORKER_CONTAINER_TAG: "1.10.10-python3.6"
AIRFLOW__KUBERNETES__RUN_AS_USER: "50000"
The airflow docs have some more detailed info on the available properties: https://airflow.apache.org/docs/stable/configurations-ref.html#kubernetes
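The same settings can also be placed in airflow.cfg directly; for Airflow 1.10.x the environment variables above map to this section and these keys:
[kubernetes]
worker_container_repository = apache/airflow
worker_container_tag = 1.10.10-python3.6
run_as_user = 50000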

kubernetes giving CrashLoopBackOff error while creating pods

I'm creating a pod from a Node container, and it is giving a CrashLoopBackOff error.
kubectl get pods
kubectl describe pod test-node3
(The output of both commands was attached as screenshots.)
Any help would be appreciated.
You can add a command as below so that the pod remains in a running state.
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
    - name: myapp-container
      image: busybox
      command: ['sh', '-c', 'echo Hello Kubernetes! && sleep 3600']
Ref: Doc
Your container does not have a long-running process. The main process in the container is exiting with exit code 0, which usually means it terminated successfully. You can see this in the kubectl describe output you have shared.
Reason: Completed
Exit Code: 0
Once a pod is assigned to a node by the scheduler, the kubelet starts creating its containers using the container runtime. There are three possible container states: Waiting, Running, and Terminated.
Terminated: indicates that the container completed its execution and has stopped running.
A container enters this state when it has successfully completed execution or when it has failed for some reason. Either way, a reason and exit code are displayed, as well as the container's start and finish time.
Your screenshot makes it clear that the container inside the pod runs to completion with its work done, exiting with code 0, as in the snippet below:
State: Terminated
Reason: Completed
Exit Code: 0
You should either add a long-running process to your container or define restartPolicy: Never in the pod definition.
I tested your image with the correct restart policy, and the pod runs to completion with no crash:
kubectl run test --image=abhishekk27/kube-pub:new --restart=Never
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test 0/1 Completed 0 8m12s
Generated YAML:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: test
  name: test
spec:
  containers:
    - image: abhishekk27/kube-pub:new
      name: test
      resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Never
Result:
$ kubectl describe pod test
Name: test
Namespace: default
Priority: 0
Node: dlv-k8s-node-1/131.160.200.104
Start Time: Fri, 17 Jan 2020 09:45:00 +0000
Labels: run=test
Annotations: <none>
Status: Succeeded
IP: 10.244.1.12
IPs:
IP: 10.244.1.12
Containers:
test:
Container ID: docker://b335e5fef022dced824f85ba2bfe4c024608c9b5463599eb36591a14d709786d
Image: abhishekk27/kube-pub:new
Image ID: docker-pullable://abhishekk27/kube-pub@sha256:6a696bd733edaa48b9be781960f4ee178d16f1c9aea51e53bd0f54326a3d05b1
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 17 Jan 2020 09:45:48 +0000
Finished: Fri, 17 Jan 2020 09:45:48 +0000
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-7f4mt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-7f4mt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-7f4mt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m50s default-scheduler Successfully assigned default/test to dlv-k8s-node-1
Normal Pulling 6m46s kubelet, dlv-k8s-node-1 Pulling image "abhishekk27/kube-pub:new"
Normal Pulled 5m58s kubelet, dlv-k8s-node-1 Successfully pulled image "abhishekk27/kube-pub:new"
Normal Created 5m58s kubelet, dlv-k8s-node-1 Created container test
Normal Started 5m58s kubelet, dlv-k8s-node-1 Started container test

Can't resolve monitoring-influxdb on Kubernetes with heapster and kube-dns

I am trying to get Heapster working on my Kubernetes cluster. I am using Kube-DNS for DNS resolution.
My Kube-DNS seems to be set up correctly:
kubectl describe pod kube-dns-v20-z2dd2 -n kube-system
Name: kube-dns-v20-z2dd2
Namespace: kube-system
Node: 172.31.48.201/172.31.48.201
Start Time: Mon, 22 Jan 2018 09:21:49 +0000
Labels: k8s-app=kube-dns
version=v20
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
scheduler.alpha.kubernetes.io/tolerations=[{"key":"CriticalAddonsOnly", "operator":"Exists"}]
Status: Running
IP: 172.17.29.4
Controlled By: ReplicationController/kube-dns-v20
Containers:
kubedns:
Container ID: docker://13f95bdf8dee273ca18a2eee1b99fe00e5fff41279776cdef5d7e567472a39dc
Image: gcr.io/google_containers/kubedns-amd64:1.8
Image ID: docker-pullable://gcr.io/google_containers/kubedns-amd64@sha256:39264fd3c998798acdf4fe91c556a6b44f281b6c5797f464f92c3b561c8c808c
Ports: 10053/UDP, 10053/TCP
Args:
--domain=cluster.local.
--dns-port=10053
State: Running
Started: Mon, 22 Jan 2018 09:22:05 +0000
Ready: True
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/healthz-kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
dnsmasq:
Container ID: docker://576ebc30e8f7aae13000a2d06541c165a3302376ad04c604b12803463380d9b5
Image: gcr.io/google_containers/kube-dnsmasq-amd64:1.4
Image ID: docker-pullable://gcr.io/google_containers/kube-dnsmasq-amd64@sha256:a722df15c0cf87779aad8ba2468cf072dd208cb5d7cfcaedd90e66b3da9ea9d2
Ports: 53/UDP, 53/TCP
Args:
--cache-size=1000
--no-resolv
--server=127.0.0.1#10053
--log-facility=-
State: Running
Started: Mon, 22 Jan 2018 09:22:20 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:8080/healthz-dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
healthz:
Container ID: docker://3367d05fb0e13c892243a4c86c74a170b0a9a2042387a70f6690ed946afda4d2
Image: gcr.io/google_containers/exechealthz-amd64:1.2
Image ID: docker-pullable://gcr.io/google_containers/exechealthz-amd64@sha256:503e158c3f65ed7399f54010571c7c977ade7fe59010695f48d9650d83488c0a
Port: 8080/TCP
Args:
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
--url=/healthz-dnsmasq
--cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1:10053 >/dev/null
--url=/healthz-kubedns
--port=8080
--quiet
State: Running
Started: Mon, 22 Jan 2018 09:22:32 +0000
Ready: True
Restart Count: 0
Limits:
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9zxzd (ro)
Conditions:
Type Status
Initialized True
Ready True
PodScheduled True
Volumes:
default-token-9zxzd:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9zxzd
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned kube-dns-v20-z2dd2 to 172.31.48.201
Normal SuccessfulMountVolume 43m kubelet, 172.31.48.201 MountVolume.SetUp succeeded for volume "default-token-9zxzd"
Normal Pulling 43m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/kubedns-amd64:1.8"
Normal Pulled 43m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/kubedns-amd64:1.8"
Normal Created 43m kubelet, 172.31.48.201 Created container
Normal Started 43m kubelet, 172.31.48.201 Started container
Normal Pulling 43m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
Normal Pulled 42m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/kube-dnsmasq-amd64:1.4"
Normal Created 42m kubelet, 172.31.48.201 Created container
Normal Started 42m kubelet, 172.31.48.201 Started container
Normal Pulling 42m kubelet, 172.31.48.201 pulling image "gcr.io/google_containers/exechealthz-amd64:1.2"
Normal Pulled 42m kubelet, 172.31.48.201 Successfully pulled image "gcr.io/google_containers/exechealthz-amd64:1.2"
Normal Created 42m kubelet, 172.31.48.201 Created container
Normal Started 42m kubelet, 172.31.48.201 Started container
kubectl describe svc kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=KubeDNS
Annotations: <none>
Selector: k8s-app=kube-dns
Type: ClusterIP
IP: 10.254.0.2
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 172.17.29.4:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 172.17.29.4:53
Session Affinity: None
Events: <none>
kubectl describe ep kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=KubeDNS
Annotations: <none>
Subsets:
Addresses: 172.17.29.4
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
dns 53 UDP
dns-tcp 53 TCP
Events: <none>
kubectl exec -it busybox1 -- nslookup kubernetes.default
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.254.0.1 kubernetes.default.svc.cluster.local
However, if I try to resolve monitoring-influxdb from either the heapster container or the busybox container (the latter outside the kube-system namespace), it can't be resolved:
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup http://monitoring-influxdb
Server: (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost
nslookup: can't resolve 'http://monitoring-influxdb': Try again
command terminated with exit code 1
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- cat /etc/resolv.conf
nameserver 10.254.0.2
search kube-system.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'http://monitoring-influxdb'
command terminated with exit code 1
kubectl exec -it busybox1 -- cat /etc/resolv.conf
nameserver 10.254.0.2
search default.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal
options ndots:5
Finally, here are the logs from the heapster pod; I could not find any error in the DNS pod logs:
kubectl logs heapster-v1.2.0-7657f45c77-65w7w heapster -n kube-system
E0122 09:22:46.966896 1 influxdb.go:217] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb:8086" - Get http://monitoring-influxdb:8086/ping: dial tcp: lookup monitoring-influxdb on 10.254.0.2:53: server misbehaving, will retry on use
Any pointers are highly appreciated.
EDIT:
The monitoring-influxdb is located in the same namespace as the heapster (kube-system).
kubectl exec -it heapster-v1.2.0-7657f45c77-65w7w --container heapster -n kube-system -- nslookup monitoring-influxdb.kube-system
Server: (null)
Address 1: 127.0.0.1 localhost
Address 2: ::1 localhost
nslookup: can't resolve 'monitoring-influxdb.kube-system': Name does not resolve
command terminated with exit code 1
But for whatever reason busybox is able to resolve the server.
kubectl exec -it busybox1 -- nslookup http://monitoring-influxdb.kube-system
Server: 10.254.0.2
Address 1: 10.254.0.2 kube-dns.kube-system.svc.cluster.local
Name: monitoring-influxdb.kube-system
Address 1: 10.254.48.109 monitoring-influxdb.kube-system.svc.cluster.local
kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
heapster ClusterIP 10.254.193.208 <none> 80/TCP 1h
kube-dns ClusterIP 10.254.0.2 <none> 53/UDP,53/TCP 1h
kubernetes-dashboard NodePort 10.254.89.241 <none> 80:32431/TCP 1h
monitoring-grafana ClusterIP 10.254.176.96 <none> 80/TCP 1h
monitoring-influxdb ClusterIP 10.254.48.109 <none> 8083/TCP,8086/TCP 1h
kubectl -n kube-system get ep
NAME ENDPOINTS AGE
heapster 172.17.29.7:8082 1h
kube-controller-manager <none> 1h
kube-dns 172.17.29.6:53,172.17.29.6:53 1h
kubernetes-dashboard 172.17.29.5:9090 1h
monitoring-grafana 172.17.29.3:3000 1h
monitoring-influxdb 172.17.29.3:8086,172.17.29.3:8083 1h
In Kubernetes, you can resolve services by their name alone, but only from inside the same namespace.
Services are also reachable through a DNS name in the form:
<service name>.<namespace>
From your question it is not clear in which namespace you deployed influxdb, but give the above suggestion a try.
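For example, from the busybox pod in the question, both the namespaced and the fully qualified names should resolve regardless of the caller's namespace:
kubectl exec -it busybox1 -- nslookup monitoring-influxdb.kube-system
kubectl exec -it busybox1 -- nslookup monitoring-influxdb.kube-system.svc.cluster.local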
