What happens when I stop aks cluster and start?
Will my pods remain in the same state?
Do the node pool and the nodes inside it stop?
Do the services inside the cluster still run and cost me, for example if one of them is a load balancer?
Stopping the cluster removes all the pods; starting it again creates new pods with the same names, but the pod IP addresses will change.
Pods are only scheduled once in their lifetime. Once a Pod is scheduled (assigned) to a Node, the Pod runs on that Node until it stops or is terminated.
Do the node pool and nodes inside that stop? Do the services inside the cluster still run and cost me if it is a load balancer?
Yes, it will stop the nodes and the complete node pool as well. Services inside the cluster will also stop, and they will not incur cost while the cluster is stopped.
Reference : https://learn.microsoft.com/en-us/azure/aks/start-stop-cluster?tabs=azure-cli
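For reference, the stop and start are each a single Azure CLI call (the resource group and cluster names below are placeholders):
az aks stop --resource-group myResourceGroup --name myAKSCluster
az aks start --resource-group myResourceGroup --name myAKSCluster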
I know that with Azure AKS, the master components are fully managed by the service. But I'm a little confused when it comes to picking the node pools. I understand that there are two kinds of pools, system and user, where the user node pools host my application pods. I read in the official documentation that system node pools serve the primary purpose of hosting critical system pods such as CoreDNS and tunnelfront, and I'm aware that we can rely on system nodes alone to create and run our Kubernetes cluster.
My question is: do they mean by "system node" the MASTER node? If so, why do we then have the option not to create user nodes (worker nodes by analogy)? Because, as we know, in an on-prem Kubernetes solution we cannot create a cluster with master nodes only.
I'd appreciate any help.
System node pools in AKS do not contain master nodes. Master nodes in AKS are 100% managed by Azure and sit outside your VNet. A system node pool contains worker nodes on which AKS automatically applies the label kubernetes.azure.com/mode: system; that's about it. AKS then uses that label to deploy critical pods like tunnelfront, which is used to create a secure communication channel from your nodes to the control plane. You need at least one system node pool per cluster, and system pools have the following restrictions:
System pools osType must be Linux.
System pools must contain at least one node, and user node pools may contain zero or more nodes.
System node pools require a VM SKU with at least 2 vCPUs and 4 GB of memory; burstable VMs (B series) are not recommended.
System node pools must support at least 30 pods as described by the minimum and maximum value formula for pods.
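As a quick illustration (resource group, cluster, and pool names below are placeholders), you can check which mode each worker node belongs to via the label mentioned above, and add a separate user pool for your application pods with the Azure CLI:
kubectl get nodes -L kubernetes.azure.com/mode
az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name userpool --mode User --node-count 2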
I am using AWS EKS (managed Kubernetes service) and Fargate (managed nodes) to deploy a pod running a nodejs React service on port 5000. The pod switches from "Running" state to "Terminating" state continuously immediately after deployment to Fargate. Eventually, it settles on "Running". Other pods are running fine on Fargate.
I am unable to view the logs because kubectl reports net/http: TLS handshake timeout.
The service is fronted by AWS Application Load Balancer (ALB). In the target group, I can see continuous registration and deregistration of the pod/node IP.
How can I troubleshoot this further?
Some ways to troubleshoot:
With kubectl, if your pods are run with a K8s deployment:
kubectl describe deployment <deployment-name> 👈 check for events
With kubectl, before the pod goes into Terminating:
kubectl logs <pod-id>
kubectl describe pod <pod-id> 👈 check for events
Check the EKS control plane logs wherever you are shipping them (when control plane logging is enabled, EKS delivers them to CloudWatch Logs).
The idea here is to troubleshoot with the Kubernetes tools.
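A couple of extra commands that can help when the pod keeps flapping (the pod name is a placeholder, and the logs call only works while the API connection is healthy):
kubectl get pods -w 👈 watch the pod cycling in real time
kubectl get events --sort-by=.lastTimestamp 👈 recent events, newest last
kubectl logs <pod-id> --previous 👈 logs from the previous (terminated) container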
It turned out the React service was taking a long time to start due to the compute allocation of 0.25 vCPU and 0.5 GB, and it was eventually failing after about 10 minutes. We set the following resource requests and limits in the deployment manifest, and the pod now starts within a couple of minutes without problems.
resources:
  limits:
    cpu: 1000m
    memory: 2000Mi
  requests:
    cpu: 800m
    memory: 1500Mi
I wanted to install Kubeflow on Azure, so I started off creating an Azure Kubernetes Service (AKS) cluster with a single node (a B4MS virtual machine). During the installation, I didn't enable the virtual node pool option. After creating the AKS cluster, I ran "$ kubectl describe node aks-agentpool-3376354-00000" to check the specs. The allocatable number of pods was 110 and I was able to install Kubeflow without any issues. However, some time later I wanted an AKS cluster with the virtual node pool option enabled so I could use GPUs for training. So I deleted the old cluster and created a new AKS cluster with the same B4MS virtual machine and with the virtual node pool option enabled. This time, when I ran the same command to describe the node specs, the allocatable number of pods was 30 and the Kubeflow installation failed due to a lack of pods to provision.
Can someone explain to me why the number of allocatable pods changes when the virtual node option is enabled or disabled? How do I keep the number of allocatable pods at 110 while having the virtual node pool option enabled?
Thank you in advance!
Virtual node pools require the Advanced Networking configuration of AKS, which brings in the Azure CNI network plugin.
The default pod count per node on AKS when using Azure CNI is 30 pods.
https://learn.microsoft.com/en-us/azure/aks/configure-azure-cni#maximum-pods-per-node
This is the main reason why you are now getting a maximum of 30 pods per node.
This can be raised to a bigger number when you provision the cluster with the Azure CLI:
https://learn.microsoft.com/en-us/cli/azure/ext/aks-preview/aks?view=azure-cli-latest#ext-aks-preview-az-aks-create
--max-pods -m
The maximum number of pods deployable to a node.
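For example, a cluster-creation sketch with a higher per-node limit (resource group, cluster name, and VM size are placeholders; 110 matches the kubenet default you saw before):
az aks create --resource-group myResourceGroup --name myAKSCluster --network-plugin azure --node-vm-size Standard_B4ms --node-count 1 --max-pods 110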
We have a k8s deployment of several services including Apache Spark. All services seem to be operational. Our application connects to the Spark master to submit a job using the k8s DNS service for the cluster where the master is called spark-api so we use master=spark://spark-api:7077 and we use spark.submit.deployMode=cluster. We submit the job through the API not by the spark-submit script.
This will run the "driver" and all "executors" on the cluster and this part seems to work but there is a callback to the launching code in our app from some Spark process. For some reason it is trying to connect to harness-64d97d6d6-4r4d8, which is the pod ID, not the k8s cluster IP or DNS.
How could this pod ID be getting into the system? Spark somehow seems to think it is the address of the service that called it. Needless to say any connection to the k8s pod ID fails and so does the job.
Any idea how Spark could think the pod ID is an IP address or DNS name?
BTW if we run a small sample job with master=local all is well, but the same job executed with the above config tries to connect to the spurious pod ID.
BTW2: the k8s DNS for the calling pod is harness-api
You could consider using a headless Service for the harness-64etcetc Pod to give it a stable DNS identity. A headless Service creates an endpoint for every pod matched by its selector, and as a result an A record for the pod is added to the Kubernetes DNS configuration, so other components can reach it by a service name instead of by pod ID.
I also found the related GitHub issue #266, which may bring some useful information for further investigation.
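A minimal sketch of such a headless Service, assuming the harness pods carry a label like app: harness and the callback arrives on port 9090 (both the label and the port are assumptions, and the name is chosen so it doesn't clash with your existing harness-api service):
apiVersion: v1
kind: Service
metadata:
  name: harness-api-headless
spec:
  clusterIP: None        # headless: DNS resolves directly to the pod IP(s)
  selector:
    app: harness         # assumed pod label
  ports:
    - port: 9090         # assumed callback port
The launching code can then be reached at harness-api-headless.<namespace>.svc.cluster.local instead of the raw pod ID.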
I use Google's managed Kubernetes (GKE) with preemptible instances.
I'm facing a problem: when Google preempts the node that is serving the kube-dns pod, I get 5-7 minutes of failures in all the other pods with "Cannot resolve" errors.
I tried running a second kube-dns pod, but sometimes both DNS pods end up on the same node and I get failures again. I tried defining a nodeSelector for the kube-dns pod, but got this error:
Pod "kube-dns-2185667875-8b42l" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
Is there a way to run DNS pods redundantly on different nodes? Are there any best practices?
You cannot modify a Pod like this; you need to modify the Deployment that owns it. You might also want to look into pod anti-affinity to separate the pods of the same deployment so that they are never scheduled on the same node. Alternatively, you can switch from a Deployment to a DaemonSet to get exactly one pod running per node in the cluster.
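A minimal anti-affinity sketch, assuming the DNS pods carry the standard k8s-app: kube-dns label (adjust the label to your setup, and patch this into the Deployment rather than the Pod):
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  k8s-app: kube-dns
              topologyKey: kubernetes.io/hostname   # never co-locate two DNS pods on one node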