I created an AKS cluster using the az aks create command with kubenet networking and 2 nodes. Due to a permissions issue in the AD account, the NSG had to be switched off before running the az aks create command. After the AKS cluster was created successfully, the NSG was reapplied.
In order to check the health of the newly created cluster, when I run:
kubectl get nodes --all-namespaces;
there are no nodes returned.
However, when I look in the Azure portal at the corresponding vNet, there are 2 VMSS instances created, using IPs within the subnet range.
When I run:
kubectl get pods --all-namespaces;
all pods are in pending state:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-xxxxdxxxxx-xxxxx 0/1 Pending 0 5h
kube-system coredns-autoscaler-xxdxxxxxxxx-xxxx 0/1 Pending 0 5h
kube-system kubernetes-dashboard-xxdxxxxxx-xxxxx 0/1 Pending 0 5h
kube-system metrics-server-xxxxxxxdxx-xxxx 0/1 Pending 0 5h
kube-system omsagent-rs-xxxxxxxxdx-xxxxx 0/1 Pending 0 5h
kube-system tiller-deploy-xxxxxxxdxxx-xxxx 0/1 Pending 0 34m
kube-system tunnelfront-xxxxxxxdx-xxxxx 0/1 Pending 0 5h
I then did a describe on the coredns pod:
kubectl describe pod coredns-xxxxxxxxxx-xxxx -n kube-system
Warning FailedScheduling 2m40s (x2242 over 2d5h) default-scheduler
no nodes available to schedule pods
I need to deploy some containers using helm/tiller and when I run the installation commands I get the error
Error: could not find a ready tiller pod
I know this is not directly related to the helm/tiller installation; the issue may be a bit deeper.
I am new to Kubernetes, so any thoughts on how to diagnose the issue would be much appreciated.
If no nodes are returned from kubectl get nodes, I'd suggest recreating the cluster, since with no nodes no pods can ever run on this cluster. You might also try upgrading the cluster to a newer version of Kubernetes (this would effectively redeploy the nodes); that might help.
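If you want to try the upgrade route first, a rough sketch with the Azure CLI would look like the following (the cluster and resource group names are placeholders, not taken from the question):
az aks get-upgrades --resource-group myResourceGroup --name myAKSCluster --output table
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version <newer-version>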
You could also check the tiller pod logs:
kubectl logs --namespace kube-system tiller-deploy-xxxxxxxdxxx-xxxx
As stated in the comments above, there are no nodes and all the pods are in the Pending state according to your logs. As recommended there, you need to delete the cluster and recreate it.
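A rough sketch of the recreate, assuming the same kubenet/2-node setup as in the question (resource group and cluster names are placeholders):
az aks delete --resource-group myResourceGroup --name myAKSCluster
az aks create --resource-group myResourceGroup --name myAKSCluster --network-plugin kubenet --node-count 2 --generate-ssh-keys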
Related
I am trying to install Helm in Kubernetes, and I have installed Helm successfully.
When I check the Helm version, it shows the below error:
helm version
Client: &version.Version{SemVer:"v2.X.X", GitCommit:"XXXXXXXXXXXXXXXXX", GitTreeState:"clean"}
Error: could not find tiller
When I execute the init command, it says Tiller is already installed in the cluster:
helm init --history-max 200 --service-account tiller
$HELM_HOME has been configured at /home/user/.helm
warning: Tiller is already installed in the cluster
When I check the events for the pod, I am able to see the below error:
Type     Reason        Age                  From                   Message
Warning  FailedCreate  11m (x25 over 132m)  replicaset-controller  error creating: pod "tiller-deploy-xxxxx" is forbidden: error looking up service account: tiller not found
How can I resolve this issue? Any ideas?
I tried to reproduce the same issue in my environment and got the below results
When I checked the Helm version, I got the same error.
When I ran the init command, it showed the same message that Tiller is already installed:
helm init --history-max 200 --service-account tiller
I was getting this error because the tiller service account did not exist.
To resolve this issue, I created a yaml file for the tiller cluster role binding as shown below. I took this script from the SO link and made changes as per my requirements.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
I deployed this using the below command:
kubectl apply -f filename.yaml
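Note that the error complains that the tiller service account itself cannot be found, so the ServiceAccount also has to exist in kube-system (the other answer creates it with kubectl create serviceaccount). A minimal manifest sketch for it, if you prefer yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system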
Then I deleted the tiller replica set so that a new one gets created:
kubectl -n kube-system delete replicaset replica-name
After deleting the replica set, a new one is automatically recreated:
kubectl -n kube-system get replicaset
When I check the Helm version now, it succeeds and shows both the client and server versions.
Are you sure the tiller service account is created?
Try creating the service account and giving it the required permissions:
kubectl create serviceaccount tiller --namespace kube-system
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
After that initialize Helm again and see if the error goes away
helm init --history-max 200 --service-account tiller
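To verify, you can check that the service account exists, wait for the tiller-deploy rollout, and run helm version again (a quick sketch; the tiller-deploy deployment name is taken from the question's pod names):
kubectl -n kube-system get serviceaccount tiller
kubectl -n kube-system rollout status deployment tiller-deploy
helm version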
At first, I set a limit range for the kube-system namespace as below:
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range
  namespace: kube-system
spec:
  limits:
  - default:
      cpu: 500m
      memory: 500Mi
    defaultRequest:
      cpu: 100m
      memory: 100Mi
    type: Container
However, I later found that there is insufficient CPU and memory to start my pod, because the limits in the kube-system namespace already add up to more than 100%.
How can I reset reasonable limits for pods in kube-system? It would be better to set their limits to unlimited, but I don't know how to do that.
Supplementary information for the kube-system namespace:
I'm not sure if your kube-system namespace has a limit set. You can confirm it by describing the namespace itself:
kubectl describe namespace kube-system
If you have a limit range or a resource quota set, it will appear in the description. Something like the following:
Name: default-cpu-example
Labels: <none>
Annotations: <none>
Status: Active
No resource quota.
Resource Limits
Type Resource Min Max Default Request Default Limit Max Limit/Request Ratio
---- -------- --- --- --------------- ------------- -----------------------
Container cpu - - 500m 1 -
In this case I have set resource limits for my namespace.
Now I can list all the ResourceQuotas and LimitRanges using:
kubectl get resourcequotas -n kube-system
kubectl get limitranges -n kube-system
If anything is returned, you can simply remove it:
kubectl delete resourcequotas NAME_OF_YOUR_RESOURCE_QUOTA -n kube-system
kubectl delete limitranges NAME_OF_YOUR_LIMIT_RANGE -n kube-system
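In your case, the LimitRange shown in the question is named cpu-limit-range, so the delete would look like:
kubectl delete limitrange cpu-limit-range -n kube-system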
I'm still not sure if that's your true problem, but that answers your question.
You can find more info here:
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/
I have tried to install Cassandra on my Kubernetes cluster. After executing the commands
kubectl apply -f Cassandra-service.yaml
and
kubectl apply -f cassandra-statefulset.yaml
I get no errors, but the three Cassandra pods are not starting up.
When I execute
kubectl get pods -o wide
the result shows a pod called cassandra-0 that is not ready. I expected the Cassandra pods to already be running.
This is my cassandra-statefulset.yaml file: https://raw.githubusercontent.com/kubernetes/website/master/content/en/examples/application/cassandra/cassandra-statefulset.yaml
I expect there to be three Cassandra pods, but there is only one, and it is in the Pending state.
What Kubernetes environment do you use? Do you use Minikube?
It seems that the cluster cannot create the PersistentVolumeClaim. Maybe the StorageClass configuration doesn't suit your cluster.
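To confirm that, you could check the claims and storage classes; a quick sketch (the PVC name is a guess based on the cassandra-data volumeClaimTemplate in the linked example):
kubectl get pvc
kubectl describe pvc cassandra-data-cassandra-0
kubectl get storageclass
kubectl describe pod cassandra-0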
Also example Cassandra deployment contains:
resources:
  limits:
    cpu: "500m"
    memory: 1Gi
  requests:
    cpu: "500m"
    memory: 1Gi
So, for three replicas your cluster should have about 1.5 free CPUs and ~3 GiB of free memory.
In my opinion, it's better and easier to use Helm charts for infrastructure deployments, for example https://github.com/bitnami/charts/tree/master/bitnami/cassandra
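A minimal sketch with current (Helm 3) syntax, where the release name is arbitrary:
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-cassandra bitnami/cassandra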
Maybe there are insufficient resources in the minikube config, so try to delete, reconfigure, and start minikube again, then deploy Cassandra again.
Note: minikube delete will delete the whole configured k8s cluster, so be careful.
minikube delete
minikube config set cpus 4
minikube config set memory 5120
minikube start
kubectl apply -f https://k8s.io/examples/application/cassandra/cassandra-service.yaml
kubectl apply -f https://k8s.io/examples/application/cassandra/cassandra-statefulset.yaml
Ref: https://kubernetes.io/docs/tutorials/stateful-application/cassandra/
I am trying to pull an image from an ACR using a secret, and I can't get it to work.
I created resources using azure cli commands:
az login
az provider register -n Microsoft.Network
az provider register -n Microsoft.Storage
az provider register -n Microsoft.Compute
az provider register -n Microsoft.ContainerService
az group create --name aksGroup --location westeurope
az aks create --resource-group aksGroup --name aksCluster --node-count 1 --generate-ssh-keys -k 1.9.2
az aks get-credentials --resource-group aksGroup --name aksCluster
az acr create --resource-group aksGroup --name aksClusterRegistry --sku Basic --admin-enabled true
After that I logged in and pushed image successfully to created ACR from local machine.
docker login aksclusterregistry.azurecr.io
docker tag jetty aksclusterregistry.azurecr.io/jetty
docker push aksclusterregistry.azurecr.io/jetty
The next step was creating a secret:
kubectl create secret docker-registry secret --docker-server=aksclusterregistry.azurecr.io --docker-username=aksClusterRegistry --docker-password=<Password from tab ACR/Access Keys> --docker-email=some@email.com
And eventually I tried to create pod with image from the ACR:
# pod.yml
apiVersion: v1
kind: Pod
metadata:
  name: jetty
spec:
  containers:
  - name: jetty
    image: aksclusterregistry.azurecr.io/jetty
  imagePullSecrets:
  - name: secret
kubectl create -f pod.yml
As a result, I have a pod with status ImagePullBackOff:
>kubectl get pods
NAME READY STATUS RESTARTS AGE
jetty 0/1 ImagePullBackOff 0 1m
> kubectl describe pod jetty
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m default-scheduler Successfully assigned jetty to aks-nodepool1-62963605-0
Normal SuccessfulMountVolume 2m kubelet, aks-nodepool1-62963605-0 MountVolume.SetUp succeeded for volume "default-token-w8png"
Normal Pulling 2m (x2 over 2m) kubelet, aks-nodepool1-62963605-0 pulling image "aksclusterregistry.azurecr.io/jetty"
Warning Failed 2m (x2 over 2m) kubelet, aks-nodepool1-62963605-0 Failed to pull image "aksclusterregistry.azurecr.io/jetty": rpc error: code = Unknown desc = Error response from daemon: Get https://aksclusterregistry.azurecr.io/v2/jetty/manifests/latest: unauthorized: authentication required
Warning Failed 2m (x2 over 2m) kubelet, aks-nodepool1-62963605-0 Error: ErrImagePull
Normal BackOff 2m (x5 over 2m) kubelet, aks-nodepool1-62963605-0 Back-off pulling image "aksclusterregistry.azurecr.io/jetty"
Normal SandboxChanged 2m (x7 over 2m) kubelet, aks-nodepool1-62963605-0 Pod sandbox changed, it will be killed and re-created.
Warning Failed 2m (x6 over 2m) kubelet, aks-nodepool1-62963605-0 Error: ImagePullBackOff
What's wrong? Why does the approach with the secret not work?
Please don't advise me to use the service principal approach, because I would like to understand why this approach doesn't work. I think it should work.
The "old" way with AKS was to create a secret as you mentioned. That is no longer recommended.
The "new" way is to attach the container registry. This article explains the "new" way to attach ACR, and also provides a link to the old way to clear up confusion. When you create your cluster, attach with:
az aks create -n myAKSCluster -g myResourceGroup --attach-acr $MYACR
Or if you've already created your cluster, update it with:
az aks update -n myAKSCluster -g myResourceGroup --attach-acr $MYACR
Notes:
$MYACR is just the name of your registry without the .azurecr.io. Ex: MYACR=foobar not MYACR=foobar.azurecr.io.
After you attach your ACR, it will take a few minutes for the ImagePullBackOff to transition to Running.
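If you want to confirm that the cluster can now pull from the registry, newer versions of the Azure CLI include a validation command (using the names from the question):
az aks check-acr --resource-group aksGroup --name aksCluster --acr aksclusterregistry.azurecr.io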
This looks good to me as well. That said, the recommendation is not to use the admin account, but rather a service principal. With the SP you gain some granular control over access rights to the ACR instance (read, contributor, owner).
This doc includes two methods for authentication between AKS and ACR using service principals.
https://learn.microsoft.com/en-us/azure/container-registry/container-registry-auth-aks
It's not exactly the question's case, but I was having a similar issue with the attach-ACR approach. My problem was uppercase characters in the registry name. The below warning was being generated by the az CLI:
Uppercase characters are detected in the registry name. When using its server url in docker commands, to avoid authentication errors, use all lowercase
So make sure to use all-lowercase ACR URLs in Docker commands.
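For example, if the registry was created with mixed case (say AksClusterRegistry, a hypothetical name), the docker commands should still use the all-lowercase login server:
docker login aksclusterregistry.azurecr.io
docker tag jetty aksclusterregistry.azurecr.io/jetty
docker push aksclusterregistry.azurecr.io/jetty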
I have a Tectonic Kubernetes cluster installed on Azure. It was built from the tectonic-installer GH repo, from master (commit 0a7a1edb0a2eec8f3fb9e1e612a8ef1fd890c332).
> kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2", GitCommit:"922a86cfcd65915a9b2f69f3f193b8907d741d9c", GitTreeState:"clean", BuildDate:"2017-07-21T08:23:22Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
On the cluster I created storage class, PVC and pod as in: https://gist.github.com/mwieczorek/28b7c779555d236a9756cb94109d6695
But the pod cannot start. When I run:
kubectl describe pod mypod
I get in events:
FailedMount Unable to mount volumes for pod "mypod_default(afc68bee-88cb-11e7-a44f-000d3a28f26a)":
timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[mypd]
In kubelet logs (https://gist.github.com/mwieczorek/900db1e10971a39942cba07e202f3c50) I see:
Error: Volume not attached according to node status for volume "pvc-61a8dc6a-88cb-11e7-ad19-000d3a28f2d3"
(UniqueName: "kubernetes.io/azure-disk//subscriptions/abc/resourceGroups/tectonic-cluster-mwtest/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-61a8dc6a-88cb-11e7-ad19-000d3a28f2d3") pod "mypod" (UID: "afc68bee-88cb-11e7-a44f-000d3a28f26a")
When I create the PVC, a new disk is created on Azure.
After creating the pod, I can see in the Azure portal that the disk is attached to the worker VM where the pod is scheduled.
> fdisk -l
shows:
Disk /dev/sdc: 2 GiB, 2147483648 bytes, 4194304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
I found a similar issue on GH (kubernetes/kubernetes/issues/50150), but my cluster is built from master, so it's not the udev rules (I checked; the file /etc/udev/rules.d/66-azure-storage.rules exists).
Does anybody know if it's a bug (maybe a known issue)?
Or am I doing something wrong?
Also: how can I troubleshoot that further?
I tested this in my lab, using your yaml file to create the pod; after one hour, it still shows Pending.
root@k8s-master-ED3DFF55-0:~# kubectl get pod
NAME READY STATUS RESTARTS AGE
mypod 0/1 Pending 0 1h
task-pv-pod 1/1 Running 0 2h
We can use this yaml file to create pod:
PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
  namespace: kube-public
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
Output:
root@k8s-master-ED3DFF55-0:~# kubectl get pvc --namespace=kube-public
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
mypvc Bound pvc-1b097337-8960-11e7-82fc-000d3a191e6a 100Gi RWO default 3h
Pod:
kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
Output:
root@k8s-master-ED3DFF55-0:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
task-pv-pod 1/1 Running 0 3h
As a workaround, we can use default as the storageclass.
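A sketch of a claim that pins that class explicitly, matching the task-pv-claim used by the pod above (the size is arbitrary):
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: task-pv-claim
spec:
  storageClassName: default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi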
In Azure, there are managed disks and unmanaged disks. If your nodes use managed disks, two storage classes will be created to provide access for creating Kubernetes persistent volumes using Azure managed disks.
They are managed-standard and managed-premium, and they map to the Standard_LRS and Premium_LRS managed disk types, respectively.
If your nodes use unmanaged disks, the default storage class will be used if persistent volume resources don't specify a storage class as part of the resource definition.
The default storage class uses non-managed blob storage and will provision the blob within an existing storage account present in the resource group or provision a new storage account.
Non-managed persistent volume types are available on all VM sizes.
For more information about managed and unmanaged disks, please refer to this link.
Here is the test result:
root@k8s-master-ED3DFF55-0:~# kubectl get pvc --namespace=default
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
shared Pending standard-managed 2h
shared1 Pending managed-standard 15m
shared12 Pending standard-managed 14m
shared123 Bound pvc-a379ced4-897c-11e7-82fc-000d3a191e6a 2Gi RWO default 12m
task-pv-claim Bound pvc-3cefd456-8961-11e7-82fc-000d3a191e6a 3Gi RWO default 3h
Update:
Here is my K8s agent's unmanaged disk:
In your case, "kubectl describe pod <pod-name>" does not provide sufficient info; you need to provide the k8s controller manager logs for troubleshooting.
Get the controller manager logs on master:
# get the "CONTAINER ID" of the "/hyperkube controller-manager" container
docker ps -a | grep "hyperkube controller" | awk -F ' ' '{print $1}'
#get controller manager logs
docker logs "CONTAINER ID" > "CONTAINER ID".log 2>&1 &
Provisioning should be very quick. Check your controller logs to make sure the PV required by the PVC is provisioned correctly:
Navigate to Azure portal > cluster > Activity Log
Remove filter for namespaces and look for "Update Storage Account Create" entries.
In our case we needed to register our cluster subscription for the 'Microsoft.Storage' namespace so that the controller can provision the required PV. You can do this with the Azure CLI:
az provider register --namespace Microsoft.Storage
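You can confirm the registration state afterwards with:
az provider show --namespace Microsoft.Storage --query registrationState --output tsv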
I had a similar issue; this command worked for me:
az resource update --ids /subscriptions/<SUBSCRIPTION-ID>/resourcegroups/<RESOURCE-GROUP>/providers/Microsoft.ContainerService/managedClusters/<AKS-CLUSTER-NAME>/agentpools/<NODE-GROUP-NAME>