Why does pod status remain 'PENDING'? - linux

Here is the output I am getting:
[root#ip-10-0-3-103 ec2-user]# kubectl get pod --namespace=migration
NAME READY STATUS RESTARTS AGE
clear-nginx-deployment-cc77649fb-j8mzj 0/1 Pending 0 118m
clear-nginx-deployment-temp-cc77649fb-hxst2 0/1 Pending 0 41s
Could not understand the message shown in json:
*"status":
{
"conditions": [
{
"message": "0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.",
"reason": "Unschedulable",
"status": "False",
"type": "PodScheduled"
}
],
"phase": "Pending",
"qosClass": "BestEffort"
}*
If you could please help to get through this.
The earlier question on stackoverflow doesn't answer my query as my message output is different.

This is due to the fact that your Pods have been instructed to claim storage, however, in your case there is storage available.
Check your Pods with kubectl get pods <pod-name> -o yaml and look at the exact yaml that has been applied to the cluster. In there you should be able to see that the Pod is trying to claim a PersistentVolume (PV).
To quickly create a PV backed by a hostPath apply the following yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: stackoverflow-hostpath
namespace: migration
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/mnt/data"
Kubernetes will exponentially try to schedule the Pod again; to speed things up delete one of your pods (kubectl delete pods <pod-name>) to reschedule it immediately.

Related

Azure Kubernetes Service - Increase Windows disk space

I'd like to host Windows containers, which act as build agents, at an Azure Kubernetes Service instance - unfortunately I can't increase the default 20GB pod disk space. I'd need more disk space for running build jobs at the pods.
The pod is getting deployed using an ADO pipeline by applying YAML which describes the workload.
Attaching the pod, and proving the disk space results in following:
PS: C:\ Get-PSDrive C
Name Used (GB) Free (GB) Provider Root
---- --------- --------- -------- ----
C 0.31 19.57 FileSystem C:\
Does anybody know how to increase the disk space?
At our on-premise cluster it is possible by adding
--storage-opt 50G
as parameter with regard to the modified Docker service parameter.
But how does it work for AKS?
Thank you a lot in advance!
We can increase the pod disk size in AKS by creating the disks manually using persistent volume
By default disk size will be 4GiB
For me its 30GiB, I increased to 50GiB
To increase the disk size please follow the below steps
I have created the storage class for disk
vi sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: azuredisk-premium-retain
provisioner: kubernetes.io/azure-disk
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters :
storageaccounttype: Premium_LRS
kind: Managed
To deploy the Storage class use below command
kubectl apply -f sc.yaml
Please use the below command to check the storage class created or not
kubectl get sc
I have created the persistent volume to to create the disk manually
vi pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: azure-managed-disk-pvc
spec:
accessModes:
- ReadWriteOnce
storageClassName: azuredisk-premium-retain
resources:
requests:
storage: 50GiB
In the pvc file i am increasing the storage to 50GiB
To deploy the PVC use below commands
kubectl apply -f pvc.yaml
kubectl get pvc
I have created the pod for mounting the volume
vi pod.yaml
kind: Pod
apiVersion: v1
metadata:
name: newpod #pod name
spec:
containers:
- name: newpod
image: nginx:latest
volumeMounts:
- mountPath: "/mnt/azure" # mounting the volume
name: volume
volumes:
- name: volume
persistentVolumeClaim:
claimName: azure-managed-disk-pvc
To deploy the pod
kubectl apply -f pod.yaml
kubectl get pods
After deploying the pvc_file Go-To>portal>disks>search with pvc_name you created, disk will be increased with created with 50GiB
Previously it was 30GiB now it increased to 50GiB
NOTE : we cannot decrease the disk size once it increase
Reference:
MS-DOC

How to unset limit of pods to Unlimited in Kubernetes Yaml fle?

At first, I set limit range for namespace kube-system as below:
apiVersion: v1
kind: LimitRange
metadata:
name: cpu-limit-range
namespace: kube-system
spec:
limits:
- default:
cpu: 500m
memory: 500Mi
defaultRequest:
cpu: 100m
memory: 100Mi
type: Container
However, later found that there is insufficient CPU and Memory to start up my pod as limits are > 100% from namespace kube-system already.
How can I reset reasonable limits for pods in kube-system. It is better to set their limits to unlimitted but I don't know how to set it.
Supplement information for namespace kube-system:
Not sure if your kube-system namespace has a limit set. You can confirm it checking the namespace itself:
kubectl describe namespace kube-system
If you have a limit range or a resource quota set, it will appear in the description. Something like the following:
Name: default-cpu-example
Labels: <none>
Annotations: <none>
Status: Active
No resource quota.
Resource Limits
Type Resource Min Max Default Request Default Limit Max Limit/Request Ratio
---- -------- --- --- --------------- ------------- -----------------------
Container cpu - - 500m 1 -
In this case I have set resource limits for my namespace.
Now I can list all the ResourceQuotas and LimitRanges using:
kubectl get resourcequotas -n kube-system
kubectl get limitranges -n kube-system
If somethings returns, from those you can simply remove it:
kubectl delete resourcequotas NAME_OF_YOUR_RESOURCE_QUOTA -n kube-system
kubectl delete limitranges NAME_OF_YOUR_LIMIT_RANGE -n kube-system
I'm still not sure if that's your true problem, but that answers your question.
You can find more info here:
https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/cpu-default-namespace/

How to configure a manually provisioned Azure Managed Disk to use as a Kubernetes persistent volume?

I'm trying to run the Jenkins Helm chart. As part of this setup, I'd like to pass in a persistent volume that I provisioned ahead of time (or perhaps exported from another cluster during a migration).
I'm trying to get my persistent volume (PV) and persistent volume claim (PVC) setup in a such a way that when Jenkins starts, it uses my predefined PV and PVC.
I think the problem originates from the persistent storage definition for the Azure disk points to a VHD in my storage account. Is there any way to point it to an existing managed disk -and not a blob?
This is how I setup my persistent storage using Azure Managed Disk
apiVersion: v1
kind: PersistentVolume
metadata:
name: jenkins-home
spec:
capacity:
storage: 10Gi
storageClassName: default
azureDisk:
diskName: jenkins-home
diskURI: https://<storageaccount>.blob.core.windows.net/jenkins-data/jenkins-home.vhd
fsType: ext4
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
claimRef:
name: jenkins-home-pvc
namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jenkins-home-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: default
I then start helm like this...
helm install --name jenkins stable/jenkins --values=values.yaml
Where my values.yaml file looks like
Persistence:
ExistingClaim: jenkins-home-pvc
Here is the error I receive when the Jenkins' pod starts.
AttachVolume.Attach failed for volume "jenkins-home" : Attach volume "jenkins-home" to instance "aks-agentpool-40897452-0" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure responding to request: StatusCode=409 -- Original Error: autorest/azure: Service returned an error. Status=409 Code="OperationNotAllowed" Message="Addition of a blob based disk to VM with managed disks is not supported."
I posed this question to the Azure team here.
Through their help I arrived at the following solution...
I had tried to use the managed disk resource ID before but it yelled at me saying it expected a .vhd file. But after adding 'kind: Managed', it was perfectly happy to take the managed disk resource id.
Creating an empty and formatted managed disk is of course a pre-requisite for this to work. Copying the managed disk into the same resource group as the AKS cluster was also required.
So now my PV and PVC look like this and it's working...
apiVersion: v1
kind: PersistentVolume
metadata:
name: jenkins-home
spec:
capacity:
storage: 10Gi
storageClassName: default
azureDisk:
kind: Managed
diskName: jenkins-home
diskURI: /subscriptions/{subscription-id}/resourceGroups/{aks-controlled-resource-group-name}/providers/Microsoft.Compute/disks/jenkins-home
fsType: ext4
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
claimRef:
name: jenkins-home-pvc
namespace: default
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jenkins-home-pvc
annotations:
volume.beta.kubernetes.io/storage-class: default
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: default

PVC volume mount on Azure kubernetes takes over an hour

I have tectonic kubernetes cluster installed on Azure. It's made from tectonic-installer GH repo, from master (commit 0a7a1edb0a2eec8f3fb9e1e612a8ef1fd890c332).
> kubectl version
Client Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.2", GitCommit:"922a86cfcd65915a9b2f69f3f193b8907d741d9c", GitTreeState:"clean", BuildDate:"2017-07-21T08:23:22Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"7", GitVersion:"v1.7.3+coreos.0", GitCommit:"42de91f04e456f7625941a6c4aaedaa69708be1b", GitTreeState:"clean", BuildDate:"2017-08-07T19:44:31Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
On the cluster I created storage class, PVC and pod as in: https://gist.github.com/mwieczorek/28b7c779555d236a9756cb94109d6695
But the pod cannot start. When I run:
kubectl describe pod mypod
I get in events:
FailedMount Unable to mount volumes for pod "mypod_default(afc68bee-88cb-11e7-a44f-000d3a28f26a)":
timeout expired waiting for volumes to attach/mount for pod "default"/"mypod". list of unattached/unmounted volumes=[mypd]
In kubelet logs (https://gist.github.com/mwieczorek/900db1e10971a39942cba07e202f3c50) I see:
Error: Volume not attached according to node status for volume "pvc-61a8dc6a-88cb-11e7-ad19-000d3a28f2d3"
(UniqueName: "kubernetes.io/azure-disk//subscriptions/abc/resourceGroups/tectonic-cluster-mwtest/providers/Microsoft.Compute/disks/kubernetes-dynamic-pvc-61a8dc6a-88cb-11e7-ad19-000d3a28f2d3") pod "mypod" (UID: "afc68bee-88cb-11e7-a44f-000d3a28f26a")
When I create PVC - new disc on Azure is created.
And after creating pod - I see on the azure portal that the disc is attached to worker VM where the pod is scheduled.
> fdisk -l
shows:
Disk /dev/sdc: 2 GiB, 2147483648 bytes, 4194304 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
I found similar issue on GH ( kubernetes/kubernetes/issues/50150) but I have cluster built from master so it's not the udev rules (I checked - file /etc/udev/rules.d/66-azure-storage.rules exists)
Does anybody knows if it's a bug (maybe know issue)?
Or am I doing something wrong?
Also: how can I troubleshoot that further?
I had test in lab, use your yaml file to create pod, after one hour, it still show pending.
root#k8s-master-ED3DFF55-0:~# kubectl get pod
NAME READY STATUS RESTARTS AGE
mypod 0/1 Pending 0 1h
task-pv-pod 1/1 Running 0 2h
We can use this yaml file to create pod:
PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mypvc
namespace: kube-public
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 100Gi
Output:
root#k8s-master-ED3DFF55-0:~# kubectl get pvc --namespace=kube-public
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
mypvc Bound pvc-1b097337-8960-11e7-82fc-000d3a191e6a 100Gi RWO default 3h
Pod:
kind: Pod
apiVersion: v1
metadata:
name: task-pv-pod
spec:
volumes:
- name: task-pv-storage
persistentVolumeClaim:
claimName: task-pv-claim
containers:
- name: task-pv-container
image: nginx
ports:
- containerPort: 80
name: "http-server"
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: task-pv-storage
Output:
root#k8s-master-ED3DFF55-0:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
task-pv-pod 1/1 Running 0 3h
As a workaround, we can use default as the storageclass.
In Azure, there are managed disk and unmanaged disk. if your nodes are use managed disk, two storage classes will be created to provide access to create Kubernetes persistent volumes using Azure managed disks.
They are managed-premium and managed-standard and map to Standard_LRS and Premium_LRS managed disk types respectively.
If your nodes are use non-managed disk, the default storage class will be used if persistent volume resources don't specify a storage class as part of the resource definition.
The default storage class uses non-managed blob storage and will provision the blob within an existing storage account present in the resource group or provision a new storage account.
Non-managed persistent volume types are available on all VM sizes.
More information about managed disk and non-managed disk, please refer to this link.
Here is the test result:
root#k8s-master-ED3DFF55-0:~# kubectl get pvc --namespace=default
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
shared Pending standard-managed 2h
shared1 Pending managed-standard 15m
shared12 Pending standard-managed 14m
shared123 Bound pvc-a379ced4-897c-11e7-82fc-000d3a191e6a 2Gi RWO default 12m
task-pv-claim Bound pvc-3cefd456-8961-11e7-82fc-000d3a191e6a 3Gi RWO default 3h
Update:
Here is my K8s agent's unmanaged disk:
In your case, "kubectl describe pod-name" does not provide suffiecient info, you need to provide k8s contoller manager logs for troubleshooting
Get the controller manager logs on master:
#get the "CONTAINER ID" of "/hyperkube controlle"
docker ps -a | grep "hyperkube controlle" | awk -F ' ' '{print $1}'
#get controller manager logs
docker logs "CONTAINER ID" > "CONTAINER ID".log 2>&1 &
Provisioning should be very quick. Check your controller logs to make sure the PV required by the PVC is provisioned correctly:
Navigate to Azure portal > cluster > Activity Log
Remove filter for namespaces and look for "Update Storage Account Create" entries.
In our case we needed to register our cluster subscription for the 'Microsoft.Storage' namespace so that the controller can provision the required PV. You can do this with the azure cli:
az provider register --namespace Microsoft.Storage
I had a similar issue, this command worked for me.
az resource update --ids /subscriptions/<SUBSCRIPTION-ID>/resourcegroups/<RESOURCE-GROUP>/providers/Microsoft.ContainerService/managedClusters/<AKS-CLUSTER-NAME>/agentpools/<NODE-GROUP-NAME>

Kubernetes Unable to mount volumes for pod with timeout

I am trying to mount an NFS volume to my pods but with no success.
I have a server running the nfs mount point, when I try to connect to it from some other running server
sudo mount -t nfs -o proto=tcp,port=2049 10.0.0.4:/export /mnt works fine
Another thing worth mentioning is when I remove the volume from the deployment and the pod is running. I log into it and i can telnet to 10.0.0.4 with ports 111 and 2049 successfully. so there really doesnt seem to be any communication problems
as well as:
showmount -e 10.0.0.4
Export list for 10.0.0.4:
/export/drive 10.0.0.0/16
/export 10.0.0.0/16
So I can assume that there is no network or configuration problems between the server and the client (I am using Amazon and the server that i tested on is in the same security group as the k8s minions)
P.S:
The server is a simple ubuntu->50gb disk
Kubernetes v1.3.4
So I start creating my PV
apiVersion: v1
kind: PersistentVolume
metadata:
name: nfs
spec:
capacity:
storage: 50Gi
accessModes:
- ReadWriteMany
nfs:
server: 10.0.0.4
path: "/export"
And my PVC
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
name: nfs-claim
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 50Gi
here is how kubectl describes them:
Name: nfs
Labels: <none>
Status: Bound
Claim: default/nfs-claim
Reclaim Policy: Retain
Access Modes: RWX
Capacity: 50Gi
Message:
Source:
Type: NFS (an NFS mount that lasts the lifetime of a pod)
Server: 10.0.0.4
Path: /export
ReadOnly: false
No events.
AND
Name: nfs-claim
Namespace: default
Status: Bound
Volume: nfs
Labels: <none>
Capacity: 0
Access Modes:
No events.
pod deployment:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: mypod
labels:
name: mypod
spec:
replicas: 1
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
name: mypod
labels:
# Important: these labels need to match the selector above, the api server enforces this constraint
name: mypod
spec:
containers:
- name: abcd
image: irrelevant to the question
ports:
- containerPort: 80
env:
- name: hello
value: world
volumeMounts:
- mountPath: "/mnt"
name: nfs
volumes:
- name: nfs
persistentVolumeClaim:
claimName: nfs-claim
When I deploy my POD i get the following:
Volumes:
nfs:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: nfs-claim
ReadOnly: false
default-token-6pd57:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-6pd57
QoS Tier: BestEffort
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
13m 13m 1 {default-scheduler } Normal Scheduled Successfully assigned xxx-2140451452-hjeki to ip-10-0-0-157.us-west-2.compute.internal
11m 7s 6 {kubelet ip-10-0-0-157.us-west-2.compute.internal} Warning FailedMount Unable to mount volumes for pod "xxx-2140451452-hjeki_default(93ca148d-6475-11e6-9c49-065c8a90faf1)": timeout expired waiting for volumes to attach/mount for pod "xxx-2140451452-hjeki"/"default". list of unattached/unmounted volumes=[nfs]
11m 7s 6 {kubelet ip-10-0-0-157.us-west-2.compute.internal} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "xxx-2140451452-hjeki"/"default". list of unattached/unmounted volumes=[nfs]
Tried everything I know, and everything i can think of. What am i missing or doing wrong here?
I tested version 1.3.4 and 1.3.5 of Kubernetes and NFS mount didn't work for me. Later I switched to the 1.2.5 and that version gave me some more detailed info ( kubectl describe pod ...). It turned out that 'nfs-common' is missing in the hyperkube image. After I added nfs-common to all container instances based on hyperkube image on master and worker nodes the NFS share started to work normally (mount was successful). So that's the case here. I tested it in practice and it solved my problem.

Resources