Docker Registry Stays Pending After Deployment - openshift-enterprise

I have installed OpenShift Enterprise as per the online guide (quick installation) but I'm stuck at deploying the registry.
https://docs.openshift.com/enterprise/3.0/admin_guide/install/docker_registry.html#deploy-registry
I create the registry
oadm registry --config=/etc/openshift/master/admin.kubeconfig \
--credentials=/etc/openshift/master/openshift-registry.kubeconfig \
--images='registry.access.redhat.com/openshift3/ose-${component}:${version}'
I check that it was configured
[justin@172 ~]$ oc get se docker-registry
NAME LABELS SELECTOR IP(S) PORT(S)
docker-registry docker-registry=default docker-registry=default 172.30.144.220 5000/TCP
But it never runs; it stays Pending:
[justin@172 ~]$ oc get pods
NAME READY STATUS RESTARTS AGE
docker-registry-1-deploy 0/1 Pending 0 2h
I try to get some more info:
[justin@172 ~]$ oc logs docker-registry-1-deploy
[justin@172 ~]$
but the logs command returns nothing

I had attempted an install with one node sharing the machine with the master.
My nodes looked like this:
[root@master ~]# oc get nodes
NAME LABELS STATUS
master.mydomain.com kubernetes.io/hostname=master.mydomain.com Ready,SchedulingDisabled
Note: SchedulingDisabled
I ran this command:
oc describe pod docker-registry-1-deploy
And it gave the reason for not being deployed, which was that there were no nodes to schedule a deployment on. Just to get things going quickly, I performed the install again and added a node on another VM.
Then
[root@master ~]# oc get nodes
NAME LABELS STATUS
master.mydomain.com kubernetes.io/hostname=master.mydomain.com Ready,SchedulingDisabled
node1.mydomain.com kubernetes.io/hostname=node1.mydomain.com Ready
and I managed to successfully deploy the registry.
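For anyone who wants to stay on a single machine instead of adding a second VM, a possible alternative (untested in my setup, and assuming the OpenShift Enterprise 3.x oadm tooling) is to mark the master schedulable so the deployer pod can run on it:

# Allow pods to be scheduled on the master (single-machine setups only)
oadm manage-node master.mydomain.com --schedulable=true

# The SchedulingDisabled status should disappear
oc get nodes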

Related

Could not get apiVersions from Kubernetes: Unable to retrieve the complete list of server APIs

While trying to deploy an application, I got the error below:
Error: UPGRADE FAILED: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
The output of kubectl api-resources includes some resources, along with the same error at the end.
Environment: Azure Cloud, AKS Service
Solution:
The steps I followed are:
kubectl get apiservices: if the metrics-server service is down with the error CrashLoopBackOff, follow step 2; otherwise just try to restart the metrics-server service using kubectl delete apiservice/"service_name". For me it was v1beta1.metrics.k8s.io.
kubectl get pods -n kube-system: I found out that pods like metrics-server and kubernetes-dashboard were down because the main CoreDNS pod was down.
For me it was:
NAME READY STATUS RESTARTS AGE
pod/coredns-85577b65b-zj2x2 0/1 CrashLoopBackOff 7 13m
Use kubectl describe pod/"pod_name" to check the error in the CoreDNS pod. If it is down because of /etc/coredns/Corefile:10 - Error during parsing: Unknown directive proxy, then we need to use forward instead of proxy in the YAML where the CoreDNS config lives, because the CoreDNS 1.5.x version used by the image no longer supports the proxy keyword (see the sketch below).
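As a rough sketch of that fix (assuming the stock CoreDNS setup, i.e. the Corefile lives in the coredns ConfigMap in kube-system and the pods carry the default k8s-app=kube-dns label):

# Open the CoreDNS ConfigMap and change the proxy line to forward,
# e.g. "proxy . /etc/resolv.conf" becomes "forward . /etc/resolv.conf"
kubectl -n kube-system edit configmap coredns

# Recreate the CoreDNS pods so they pick up the corrected Corefile
kubectl -n kube-system delete pod -l k8s-app=kube-dns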
This error commonly happens when your metrics-server pod is not reachable by the master node. Possible reasons are:
The metrics-server pod is not running. This is the first thing you should check. Then look at the logs of the metrics-server pod to check whether it has permission issues while trying to get metrics.
Try to confirm communication between master and slave nodes.
Try running kubectl top nodes and kubectl top pods -A to see if metric-server runs ok.
From these points you can proceed further.
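A minimal sketch of those checks, assuming metrics-server runs as a Deployment named metrics-server in kube-system with the usual k8s-app=metrics-server label:

# Is the metrics-server pod actually running?
kubectl -n kube-system get pods -l k8s-app=metrics-server

# Look for permission or connectivity errors in its logs
kubectl -n kube-system logs deploy/metrics-server

# If the metrics API answers, both of these should return data
kubectl top nodes
kubectl top pods -A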

exec user process caused "exec format error" during setup

I'm trying to install haproxy-ingress under Kubernetes ver 1.18 (hosted on a Raspberry Pi).
The master node has been correctly labeled with role=ingress-controller.
The kubectl create also works fine:
# kubectl create -f https://haproxy-ingress.github.io/resources/haproxy-ingress.yaml
namespace/ingress-controller created
serviceaccount/ingress-controller created
clusterrole.rbac.authorization.k8s.io/ingress-controller created
role.rbac.authorization.k8s.io/ingress-controller created
clusterrolebinding.rbac.authorization.k8s.io/ingress-controller created
rolebinding.rbac.authorization.k8s.io/ingress-controller created
configmap/haproxy-ingress created
daemonset.apps/haproxy-ingress created
But then, the pod is in crash loop:
# kubectl get pods -n ingress-controller -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
haproxy-ingress-dpcvc 0/1 CrashLoopBackOff 1 30s 192.168.1.101 purple.cloudlet <none> <none>
And the logs shows that error:
# kubectl logs haproxy-ingress-dpcvc -n ingress-controller
standard_init_linux.go:211: exec user process caused "exec format error"
Does anyone experience something similar? Can this be related to the arm (32-bit) architecture of the raspbian that I'm using?
Raspberry Pis run ARM architectures, which unfortunately are not supported by haproxy-ingress.
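One way to confirm the mismatch (a sketch, not specific to haproxy-ingress; substitute whatever image your DaemonSet actually references):

# CPU architecture reported by each node (arm, arm64, amd64, ...)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'

# Platforms the image is published for (needs a Docker CLI with manifest support)
docker manifest inspect <haproxy-ingress-image> | grep architecture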

Kops rolling-update fails with "Cluster did not pass validation" for master node

For some reason my master node can no longer connect to my cluster after upgrading from kubernetes 1.11.9 to 1.12.9 via kops (version 1.13.0). In the manifest I'm upgrading kubernetesVersion from 1.11.9 -> 1.12.9. This is the only change I'm making. However when I run kops rolling-update cluster --yes I get the following error:
Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-01234567" has not yet joined cluster.
Cluster did not validate within 5m0s
After that if I run a kubectl get nodes I no longer see that master node in my cluster.
Doing a little bit of debugging by sshing into the disconnected master node instance I found the following error in my api-server log by running sudo cat /var/log/kube-apiserver.log:
controller.go:135] Unable to perform initial IP allocation check: unable to refresh the service IP block: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
I suspect the issue might be related to etcd, because when I run sudo netstat -nap | grep LISTEN | grep etcd there is no output.
Anyone have any idea how I can get my master node back in the cluster or have advice on things to try?
I have done some research and have a few ideas for you:
If there is no output from the etcd grep, it means that your etcd server is down. Check the logs of the exited etcd container: docker ps -a | grep Exited | grep etcd and then docker logs <etcd-container-id>
Try these instructions I found:
1 - Remove the old master from the etcd cluster using etcdctl. You will need to connect to the etcd-server container to do this.
2 - On the new master node, stop the kubelet and protokube services.
3 - Empty the etcd data dirs (data and data-events).
4 - Edit /etc/kubernetes/manifests/etcd.manifest and etcd-events.manifest, changing ETCD_INITIAL_CLUSTER_STATE from new to existing.
5 - Get the name and PeerURLs from the new master and use etcdctl to add the new master to the cluster (etcdctl member add "name" "PeerURL"). You will need to connect to the etcd-server container to do this. See the etcdctl sketch below.
6 - Start the kubelet and protokube services on the new master.
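A hedged sketch of the etcdctl part of steps 1 and 5, run from inside the etcd-server container (this assumes the etcd v2 API and the 127.0.0.1:4001 client endpoint that shows up in your apiserver log; adjust for your layout):

# List the current members and note the ID of the old master
etcdctl --endpoints http://127.0.0.1:4001 member list

# Remove the old master by its member ID
etcdctl --endpoints http://127.0.0.1:4001 member remove <member-id>

# Add the new master back using its name and peer URL
etcdctl --endpoints http://127.0.0.1:4001 member add <name> <peer-url>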
If that is not the case, then you might have a problem with the certs. They are provisioned during the creation of the cluster, and some of them contain the allowed master endpoints. If that is the case, you'd need to create new certs and roll them out for the API server/etcd clusters.
Please let me know if that helped.

AKS using Kubernetes : not able to connect to cluster nodes once logged in to the cluster through azure-cli on Ubuntu

I am running into issues when trying to get information about the nodes created using AKS (Azure Kubernetes Service) after creating the cluster and getting the credentials.
I am using the azure-cli on ubuntu linux machine.
Followed the Url for creation of clusters: https://learn.microsoft.com/en-us/azure/aks/kubernetes-walkthrough
I get the following error when using the command kubectl get nodes after connecting to the cluster using
az aks get-credentials --resource-group <resource_group_name> --name <cluster_name>
Error:
kubectl get nodes
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)
I get the same error when I use:
kubectl get pods -n kube-system -o=wide
When I connect back as another user with the following commands, i.e.,
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
I am able to retrieve the nodes, i.e.:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
<host-name> Ready master 20m v1.10.0
~$ kubectl get pods -n kube-system -o=wide
NAME READY STATUS RESTARTS AGE
etcd-actaz-prod-nb1 1/1 Running 0
kube-apiserver-actaz-prod-nb1 1/1 Running 0
kube-controller-manager-actaz-prod-nb1 1/1 Running 0
kube-dns-86f4d74b45-4qshc 3/3 Running 0
kube-flannel-ds-bld76 1/1 Running 0
kube-proxy-5s65r 1/1 Running 0
kube-scheduler-actaz-prod-nb1 1/1 Running 0
But this actually overwrites the newly created cluster information in the file $HOME/.kube/config.
Am I missing something when connecting to the AKS cluster with the get-credentials command that's leading me to the error
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding (get nodes)?
After you run
az aks get-credentials -n cluster-name -g resource-group
it should have merged into your local configuration:
/home/user-name/.kube/config
Can you check your config
kubectl config view
And check if it is pointing to the right cluster.
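For example, using only standard kubectl config subcommands:

# Which context is kubectl currently using?
kubectl config current-context

# List every context merged into ~/.kube/config
kubectl config get-contexts

# Switch to the AKS context if a different one is active
kubectl config use-context <aks-cluster-name>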
Assuming you have chosen the default configuration while deploying AKS, you need to create an SSH key pair to log in to the AKS node.
Push the public key created above to the AKS node using "az vm user update" (please check the help to see which switches you need to pass; it is quite simple).
To create an SSH connection to an AKS node, you run a helper pod in your AKS cluster. This helper pod provides you with SSH access into the cluster and then additional SSH node access.
To create and use this helper pod, complete the following steps:
- Run a Debian (or any other, like CentOS 7) container image and attach a terminal session to it. This container can be used to create an SSH session with any node in the AKS cluster:
kubectl run -it --rm aks-ssh --image=debian
The base Debian image doesn't include SSH components, so install them:
apt-get update && apt-get install openssh-client -y
Copy the private key (the one you created in the beginning) to the pod using the kubectl cp command. The kubectl toolkit must be present on the machine from which you created the SSH key pair.
kubectl cp <private-key-file> <pod-name>:/<path>
Now you will see the private key file at that location in the container. Change the private key's permissions to 600 and you will be able to SSH to your AKS node; see the sketch below.
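Putting those last steps together from inside the helper pod, a rough sketch (azureuser is only the admin user if you kept the AKS defaults; the key path and node IP are placeholders):

# Tighten the permissions on the copied private key
chmod 600 <path-to-private-key>

# SSH to the node's internal IP (visible via kubectl get nodes -o wide)
ssh -i <path-to-private-key> azureuser@<node-internal-ip>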
Hope this helps.

Cannot get kube-dns to start on Kubernetes

Hoping someone can help.
I have a 3x node CoreOS cluster running Kubernetes. The nodes are as follows:
192.168.1.201 - Controller
192.168.1.202 - Worker Node
192.168.1.203 - Worker Node
The cluster is up and running, and I can run the following commands:
> kubectl get nodes
NAME STATUS AGE
192.168.1.201 Ready,SchedulingDisabled 1d
192.168.1.202 Ready 21h
192.168.1.203 Ready 21h
> kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
kube-apiserver-192.168.1.201 1/1 Running 2 1d
kube-controller-manager-192.168.1.201 1/1 Running 4 1d
kube-dns-v20-h4w7m 2/3 CrashLoopBackOff 15 23m
kube-proxy-192.168.1.201 1/1 Running 2 1d
kube-proxy-192.168.1.202 1/1 Running 1 21h
kube-proxy-192.168.1.203 1/1 Running 1 21h
kube-scheduler-192.168.1.201 1/1 Running 4 1d
As you can see, the kube-dns service is not running correctly. It keeps restarting and I am struggling to understand why. Any help in debugging this would be greatly appreciated (or pointers at where to read about debugging this). Running kubectl logs does not bring anything back... not sure if the addons function differently to standard pods.
Running a kubectl describe pods, I can see the containers are killed due to being unhealthy:
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Created Created container with docker id 189afaa1eb0d; Security:[seccomp=unconfined]
16m 16m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Started Started container with docker id 189afaa1eb0d
14m 14m 1 {kubelet 192.168.1.203} spec.containers{kubedns} Normal Killing Killing container with docker id 189afaa1eb0d: pod "kube-dns-v20-h4w7m_kube-system(3a545c95-ea19-11e6-aa7c-52540021bfab)" container "kubedns" is unhealthy, it will be killed and re-created
Please find a full output of this command as a github gist here: https://gist.github.com/mehstg/0b8016f5398a8781c3ade8cf49c02680
Thanks in advance!
If you installed your cluster with kubeadm you should add a pod network after installing.
If you choose flannel as your pod network, you should have this argument in your init command: kubeadm init --pod-network-cidr 10.244.0.0/16.
The flannel YAML file can be found in the CoreOS flannel repo.
All you need to do if your cluster was initialized properly (read above), is to run kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Once this is up and running (it will create pods on every node), your kube-dns pod should come up.
If you need to reset your installation (for example to add the argument to kubeadm init), you can use kubeadm reset on all nodes.
Normally, you would run the init command on the master, then add a pod network, and then add your other nodes.
This is all described in more detail in the Getting started guide, step 3/4 regarding the pod network.
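A condensed sketch of that sequence on the master (the flannel manifest URL is the one from this answer, not necessarily the current recommendation):

# Wipe the previous kubeadm state (run on every node)
kubeadm reset

# Re-initialize the master with the CIDR flannel expects
kubeadm init --pod-network-cidr 10.244.0.0/16

# Install the flannel pod network
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

# kube-dns should come up once flannel is running on every node
kubectl get pods --namespace=kube-system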
As your gist says, your pod network seems to be broken. You are using some custom pod network with 10.10.10.X. You should communicate these IPs to all components.
Please check that there is no collision with other existing nets.
I recommend you set it up with Calico, as this was the solution that got CoreOS k8s working for me.
After following the steps in the official kubeadm doc with flannel networking, I ran into a similar issue:
http://janetkuo.github.io/docs/getting-started-guides/kubeadm/
It appears that the networking pods get stuck in error states:
kube-dns-xxxxxxxx-xxxvn (rpc error)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
kube-flannel-ds-xxxxx (CrashLoopBackOff)
In my case it was related to RBAC permission errors and was resolved by running:
kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel-rbac.yml
Afterwards, all kube-system pods went into running states. The upstream issue is discussed on GitHub: https://github.com/kubernetes/kubernetes/issues/44029
