Who manages the nodes in an AKS cluster? - azure

I started using the AKS service with a 3-node setup. As I was curious, I peeked at the provisioned VMs which are used as nodes. I noticed I can get root on these and that there were some updates waiting to be installed. As I couldn't find anything in the docs, my question is: who is in charge of managing the AKS nodes (VMs)?
Do I have to do this myself, or what is the idea here?
Thank you in advance.

Azure automatically applies security patches to the nodes in your cluster on a nightly schedule. However, you are responsible for ensuring that nodes are rebooted as required.
You have several options for performing node reboots:
Manually, through the Azure portal or the Azure CLI.
By upgrading your AKS cluster. Cluster upgrades automatically cordon and drain nodes, then bring them back up with the latest Ubuntu image. You can update the OS image on your nodes without changing Kubernetes versions by specifying the current cluster version in az aks upgrade (see the sketch after this list).
Using Kured, an open-source reboot daemon for Kubernetes. Kured runs as a DaemonSet and monitors each node for the presence of a file indicating that a reboot is required. It then manages OS reboots across the cluster, following the same cordon and drain process described earlier.
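As a minimal sketch of that second option, assuming placeholder resource group and cluster names, you can read back the current version and pass it to az aks upgrade unchanged:
# Look up the version the cluster is currently running
az aks show --resource-group myResourceGroup --name myAKSCluster --query kubernetesVersion --output tsv
# Passing that same version back refreshes the node OS image without a Kubernetes version change
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version <version-returned-above>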

Related

What is an agent node in AKS?

I found the mention of an agent node in the AKS documentation, but I'm not finding the definition of it. Can anyone please explain it to me? I also want to know whether it is an Azure concept or a Kubernetes concept.
Regards,
In Kubernetes the term node refers to a compute node. Depending on the role of the node it is usually referred to as control plane node or worker node. From the docs:
A Kubernetes cluster consists of a set of worker machines, called nodes, that run containerized applications. Every cluster has at least one worker node.
The worker node(s) host the Pods that are the components of the application workload. The control plane manages the worker nodes and the Pods in the cluster. In production environments, the control plane usually runs across multiple computers and a cluster usually runs multiple nodes, providing fault-tolerance and high availability.
Agent nodes in AKS refer to the worker nodes (not to be confused with the kubelet, which is the primary "node agent" that runs on each worker node).
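As a quick illustration, the agent nodes are simply what you see when listing nodes with kubectl; on AKS their names typically include the node pool they belong to (e.g. aks-nodepool1-...):
kubectl get nodes -o wide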

How can I upgrade an AKS cluster using Terraform without downtime?

I want to upgrade my AKS cluster using Terraform with no (or minimal) downtime.
What happens to the workloads during the cluster upgrade?
Can I do the AKS cluster upgrade and the node upgrade at the same time?
Azure provides Scheduled AKS cluster maintenance (a preview feature); does that mean Azure does the cluster upgrade?
You have several questions listed here, so I will try to answer them as best I can. Your questions are generic and not specific to Terraform, so I will address Terraform separately at the bottom.
What happens to the workloads during the cluster upgrade?
During an upgrade, it depends on whether Azure is doing the upgrade or you are doing it manually. If Azure does the upgrade, it may be disruptive depending on the settings you chose when you created the cluster.
If you do the upgrade yourself, you can do it with no downtime, but it does require some Azure CLI usage due to how the AKS Terraform code is designed.
Can I do the AKS cluster upgrade and the node upgrade at the same time?
Yes. If your node is out of date and you schedule a cluster upgrade, the nodes will be brought up to date in the process of upgrading the cluster.
Azure provides Scheduled AKS cluster maintenance (a preview feature); does that mean Azure does the cluster upgrade?
No. A different setting determines whether Azure does the upgrade. The Scheduled Maintenance feature is designed to let you specify the times and days when Microsoft is NOT allowed to do maintenance. If you don't configure a maintenance window, the default is that Microsoft may perform maintenance at any time:
https://learn.microsoft.com/en-us/azure/aks/planned-maintenance
Your AKS cluster has regular maintenance performed on it automatically. By default, this work can happen at any time. Planned Maintenance allows you to schedule weekly maintenance windows that will update your control plane as well as your kube-system Pods on a VMSS instance and minimize workload impact
The feature you are looking for regarding AKS performing cluster upgrades is called Cluster Autoupgrade, and you can read about that here: https://learn.microsoft.com/en-us/azure/aks/upgrade-cluster#set-auto-upgrade-channel-preview
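A rough sketch of both settings with the Azure CLI, assuming placeholder resource group and cluster names:
# Allow maintenance only in a weekly window starting Monday at 01:00
az aks maintenanceconfiguration add --resource-group myResourceGroup --cluster-name myAKSCluster --name default --weekday Monday --start-hour 1
# Opt the cluster into automatic upgrades via an auto-upgrade channel
az aks update --resource-group myResourceGroup --name myAKSCluster --auto-upgrade-channel stable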
Now, regarding performing a cluster upgrade with Terraform: currently, due to how azurerm_kubernetes_cluster is designed, it is not possible to perform an upgrade of a cluster using only Terraform; some Azure CLI usage is required. It is possible to perform a cluster upgrade without downtime, but not by exclusively using Terraform. The steps to perform such an upgrade are detailed pretty well in this blog post: https://blog.gft.com/pl/2020/08/26/zero-downtime-migration-of-azure-kubernetes-clusters-managed-by-terraform/
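Roughly, one common zero-downtime pattern (not necessarily the exact steps in the linked post; all names and versions below are placeholders) is to upgrade the control plane first and then rotate node pools:
# Upgrade only the control plane
az aks upgrade --resource-group myResourceGroup --name myAKSCluster --kubernetes-version <newVersion> --control-plane-only
# Add a new node pool already running the new version
az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name newpool --node-count 3 --kubernetes-version <newVersion>
# Move workloads off the old pool, then remove it
kubectl cordon -l agentpool=oldpool
kubectl drain -l agentpool=oldpool --ignore-daemonsets --delete-emptydir-data
az aks nodepool delete --resource-group myResourceGroup --cluster-name myAKSCluster --name oldpool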
An AKS cluster uses the concept of a buffer node when an upgrade is performed: it brings up a buffer node, moves the workload to the buffer node, and upgrades the actual node. The time taken to upgrade the cluster depends on the number of nodes in the cluster.
https://learn.microsoft.com/en-us/azure/aks/upgrade-cluster#upgrade-an-aks-cluster
You can upgrade the control plane as well as the node pools (data plane) using the Azure CLI:
az aks upgrade --resource-group <ResourceGroup> --name <ClusterName> --kubernetes-version <KubernetesVersion>
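Before upgrading, you can also check which Kubernetes versions the cluster can move to (placeholder names again):
az aks get-upgrades --resource-group <ResourceGroup> --name <ClusterName> --output table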

What can you not do when you use AKS instead of a self-managed Kubernetes cluster?

I'm deciding whether I should use vanilla Kubernetes or Azure Kubernetes Service for my CI build agents.
What control will I lose if I use AKS? SSH inside the cluster? Turning the VMs on and off? How about the cost? I see that AKS uses VM pricing; is there something beyond that?
There are several limitations which come to my mind, but none of them should restrict your use case:
You lose control over the master nodes (control plane). This shouldn't be an issue in your use case, and I can hardly imagine a situation where it would be a limitation. You can still SSH into the worker nodes in AKS.
You lose fine-grained control over the size of worker nodes. Node pools become the abstraction for controlling VM size. In a self-managed cluster you can attach VMs of completely different sizes to the cluster; in AKS all the nodes in the same pool must be of the same size, but you can create several node pools with different VM sizes (see the sketch after this list).
It's not possible to choose the node OS in AKS (it's Ubuntu-based).
You're not flexible in choosing network plugins for Kubernetes. It's either kubenet or Azure CNI, but that's fine as long as you're not running some unusual applications which require L2 networking, more info here.
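For instance, adding a second pool with a different VM size is a single CLI call (the pool name, VM size, and cluster names below are placeholders):
az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name largepool --node-count 2 --node-vm-size Standard_D8s_v3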
There are definitely benefits to AKS:
You're not managing the control plane, which is a real pain reliever.
AKS can scale its nodes dynamically, which may be a good option for bursty workloads like build agents, but it also imposes additional delay during the node scaling procedure (see the autoscaler sketch after this list).
Cluster upgrades (control and data planes) are just a couple of clicks in the Azure portal.
The control plane is free in AKS (in contrast to, e.g., EKS on Amazon); you pay only for the worker nodes. You can calculate your price here.
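A minimal sketch of enabling that dynamic scaling on an existing node pool (all names are placeholders):
az aks nodepool update --resource-group myResourceGroup --cluster-name myAKSCluster --name nodepool1 --enable-cluster-autoscaler --min-count 1 --max-count 5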

Can a Kubernetes cluster share Windows and Linux nodes?

I am trying to create Windows nodes in an already existing Kubernetes cluster in Azure. The Kubernetes cluster has two Linux nodes running in it.
I am trying to use the az aks CLI to create Windows nodes, but I don't see any option for it.
So can we create Linux and Windows nodes in the same Kubernetes cluster? If yes, how?
Yes, this is possible, but not using the CLI/portal (at this stage). You need to use acs-engine.
You need to use this definition (adjust it to your needs):
https://github.com/Azure/acs-engine/blob/master/examples/windows/kubernetes-hybrid.json
There is a bit of a learning curve, but it's not that hard.
https://github.com/Azure/acs-engine/blob/master/docs/kubernetes/deploy.md
https://github.com/Azure/acs-engine/blob/master/docs/clusterdefinition.md
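Very roughly, the workflow in those docs is to generate ARM templates from the cluster definition and then deploy them with the Azure CLI; the resource group name and output paths below are illustrative:
# Generate ARM templates from the hybrid cluster definition
acs-engine generate kubernetes-hybrid.json
# Deploy the generated templates into a resource group
az group create --name my-acs-rg --location westeurope
az group deployment create --resource-group my-acs-rg --template-file _output/<dnsPrefix>/azuredeploy.json --parameters _output/<dnsPrefix>/azuredeploy.parameters.json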

Declarative way of specifying units that should run on a CoreOS cluster?

Is there a declarative way of specifying which units should be running on a CoreOS cluster, and of applying that to an existing CoreOS cluster? It is undesirable to have to run manual fleetctl commands each time the unit setup changes.
It would be similar to how Ansible makes it possible to declaratively specify what packages should be installed on a server and then apply that to an existing server.
CoreOS machines can be customized by writing a cloud configuration file.
The cloud config is executed upon reboot, so you should expect to reboot the machines in your cluster when you make any changes. However, CoreOS is designed for this kind of ad-hoc rebooting so there shouldn't be any problem.
There are a couple of ways to associate cloud configuration data with a VM instance. You can have instances pull cloud configuration files from a read-only storage drive, or you can attach the cloud configuration file to a VM instance directly as metadata if the cloud provider supports this (EC2 and GCE support this style of metadata tagging).
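As a minimal sketch (the unit itself is a made-up example), a cloud-config that declares a unit to run on each machine looks roughly like this:
#cloud-config
coreos:
  units:
    - name: hello.service
      command: start
      content: |
        [Unit]
        Description=Example unit declared in cloud-config

        [Service]
        ExecStart=/usr/bin/docker run --rm busybox /bin/sh -c "while true; do echo hello; sleep 60; done"
You would then feed this file to your provisioning mechanism (metadata or config drive, as described above) rather than running fleetctl by hand.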
