Upgrade virtual-node-aci-linux in Azure Kubernetes Cluster - azure

Does anyone have any links or know how to upgrade the virtual-node-aci-linux node on Azure?
I am currently on version v1.19.10-vk-azure-aci-v1.4.1, but my other node pools are now on v1.22.11 after upgrading Kubernetes.
I have been seeing some odd behaviour since the upgrade. It seems I now have to specify a single instance in my VMSS for the virtual-node-aci-linux node to become Ready. I don't remember having to do this before.
NAME                              STATUS   ROLES   AGE    VERSION
aks-control-13294507-vmss000006   Ready    agent   86s    v1.22.11
virtual-node-aci-linux            Ready    agent   164d   v1.19.10-vk-azure-aci-v1.4.1
Also, I am sure that previously only my virtual-node-aci-linux node was visible in the node list.
Any help would be appreciated.
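To make the version gap in the node listing above easier to spot, here is a minimal Python sketch that parses `kubectl get nodes` text and reports how many minor versions each node is behind the newest one. The function names (`minor_version`, `node_versions`, `lagging_nodes`) are made up for illustration, and the sample text is simply the listing from the question. It only reports the gap; whether a given gap is supported depends on the Kubernetes version skew policy for your release.

```python
import re

# Sample `kubectl get nodes` output copied from the question.
SAMPLE = """\
NAME STATUS ROLES AGE VERSION
aks-control-13294507-vmss000006 Ready agent 86s v1.22.11
virtual-node-aci-linux Ready agent 164d v1.19.10-vk-azure-aci-v1.4.1
"""


def minor_version(version):
    """Return (major, minor) parsed from a kubelet version such as 'v1.22.11'."""
    match = re.match(r"v(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def node_versions(table):
    """Map node name -> (major, minor) from whitespace-separated table text."""
    rows = [line.split() for line in table.strip().splitlines()[1:]]
    return {row[0]: minor_version(row[-1]) for row in rows if row}


def lagging_nodes(table):
    """Return {name: minors_behind} for nodes older than the newest node."""
    versions = node_versions(table)
    newest = max(versions.values())
    return {
        name: newest[1] - ver[1]
        for name, ver in versions.items()
        if ver[0] == newest[0] and ver < newest
    }


if __name__ == "__main__":
    for name, gap in lagging_nodes(SAMPLE).items():
        print(f"{name} is {gap} minor version(s) behind")
```

In practice the table text would come from running `kubectl get nodes` yourself rather than from a hard-coded string.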

Related

How to patch GKE Managed Instance Groups (Node Pools) for package security updates?

I have a GKE cluster running multiple nodes across two zones. My goal is to schedule a weekly job that runs sudo apt-get upgrade to update the system packages. While researching, I found that GCP provides a tool called "OS patch management" that does exactly that. I tried to use it, but the Patch Job execution failed with this error:
Failure reason: Instance is part of a Managed Instance Group.
I also noticed that when creating a GKE node pool there is an option to enable "Auto upgrade". But according to its description, it only upgrades the Kubernetes version.
According to the blog post "Exploring container security: the shared responsibility model in GKE":
For GKE, at a high level, we are responsible for protecting:
The nodes’ operating system, such as Container-Optimized OS (COS) or Ubuntu. GKE promptly makes any patches to these images available. If you have auto-upgrade enabled, these are automatically deployed. This is the base layer of your container—it’s not the same as the operating system running in your containers.
Conversely, you are responsible for protecting:
The nodes that run your workloads. You are responsible for any extra software installed on the nodes, or configuration changes made to the default. You are also responsible for keeping your nodes updated. We provide hardened VM images and configurations by default, manage the containers that are necessary to run GKE, and provide patches for your OS—you’re just responsible for upgrading. If you use node auto-upgrade, it moves the responsibility of upgrading these nodes back to us.
The node auto-upgrade feature DOES patch the OS of your nodes; it does not just upgrade the Kubernetes version.
OS Patch Management only works for GCE VMs, not for GKE nodes.
You should refrain from doing OS-level upgrades in GKE, as that could cause unexpected behavior (for example, a package gets upgraded and changes something that breaks the GKE configuration).
You should let GKE auto-upgrade both the OS and Kubernetes. Auto-upgrade also upgrades the OS because GKE releases are intertwined with OS image releases.
One easy way to do this is to enroll your clusters in a release channel; they then get upgraded as often as you want (depending on the channel) and your OS is patched regularly.
You can also follow the GKE hardening guide, which provides steps to make your GKE clusters as secure as possible.

Azure kubernetes service node pool upgrades & patches

I have some confusion about AKS node pool upgrades and patching. Could you please clarify the following?
I have one AKS node pool with 4 nodes, and I want to upgrade the Kubernetes version on only two of the nodes. Is that possible?
If it is possible to upgrade two nodes, how can we upgrade the remaining two later? And how can we find out which two nodes are still running the old Kubernetes version?
During the upgrade process, will it create two new nodes with the latest Kubernetes version and then delete the old nodes in the node pool?
Azure automatically applies patches on nodes, but will it create new nodes with the new patches and delete the old nodes?
1. According to the docs:
you can upgrade a specific node pool.
So the approach with an additional node pool, mentioned by 4c74356b41, is the way to go.
Additional info:
Node upgrades
There is an additional process in AKS that lets you upgrade a cluster. An upgrade is typically to move to a newer version of Kubernetes, not just apply node security updates.
An AKS upgrade performs the following actions:
A new node is deployed with the latest security updates and Kubernetes version applied.
An old node is cordoned and drained.
Pods are scheduled on the new node.
The old node is deleted.
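The four actions above can be sketched as a small simulation. This is only an illustrative model, not how AKS is implemented: the function `rolling_upgrade` and the `replacement-N` node names are hypothetical, and it assumes nodes are replaced strictly one at a time.

```python
def rolling_upgrade(nodes, target_version):
    """Simulate replacing nodes one at a time.

    `nodes` maps node name -> Kubernetes version.
    Returns (resulting_nodes, step_log, peak_node_count).
    """
    current = dict(nodes)
    log = []
    peak = len(current)
    for index, old_name in enumerate(sorted(nodes)):
        if current[old_name] == target_version:
            continue  # already on the target version, nothing to do
        new_name = f"replacement-{index}"  # hypothetical naming scheme
        # 1. A new node is deployed with the target version.
        current[new_name] = target_version
        peak = max(peak, len(current))
        log.append(f"add {new_name} ({target_version})")
        # 2. The old node is cordoned and drained.
        log.append(f"cordon and drain {old_name}")
        # 3. Pods are scheduled on the new node.
        log.append(f"reschedule pods from {old_name}")
        # 4. The old node is deleted.
        del current[old_name]
        log.append(f"delete {old_name}")
    return current, log, peak


if __name__ == "__main__":
    pool = {"node-a": "1.14.8", "node-b": "1.14.8"}
    upgraded, steps, peak = rolling_upgrade(pool, "1.15.10")
    print("\n".join(steps))
    print(f"peak node count: {peak}, final pool: {upgraded}")
```

The peak node count never exceeds the original pool size plus one in this model, which matches the idea of a single extra node being used during the upgrade.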
2. By default, AKS configures upgrades to surge with one additional node.
You can control this process by increasing the --max-surge parameter:
To speed up the node image upgrade process, you can upgrade your node images using a customizable node surge value.
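To get a feel for what a larger surge value buys you, here is a rough Python sketch that converts a max-surge setting into a node count and estimates how many replacement rounds a pool needs. The helper names `resolve_surge` and `upgrade_rounds` are hypothetical. The sketch assumes that percentage values are rounded up to a whole node and that at least one surge node is always used; check the AKS documentation for the exact rules that apply to your cluster version.

```python
import math


def resolve_surge(max_surge, node_count):
    """Turn a max-surge setting ('1', '3', '33%') into a whole number of nodes."""
    text = str(max_surge).strip()
    if text.endswith("%"):
        percent = float(text[:-1])
        # Assumption: partial nodes are rounded up.
        nodes = math.ceil(node_count * percent / 100)
    else:
        nodes = int(text)
    # Assumption of this sketch: never fewer than one surge node.
    return max(nodes, 1)


def upgrade_rounds(node_count, max_surge="1"):
    """Estimate how many replace-and-drain rounds a pool of `node_count` needs."""
    return math.ceil(node_count / resolve_surge(max_surge, node_count))


if __name__ == "__main__":
    for setting in ("1", "2", "33%", "100%"):
        print(setting, "->", upgrade_rounds(4, setting), "round(s) for 4 nodes")
```

A higher surge shortens the upgrade but needs more spare quota, since that many extra nodes exist at the same time.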
3. Security and kernel updates to Linux nodes:
In an AKS cluster, your Kubernetes nodes run as Azure virtual machines (VMs). These Linux-based VMs use an Ubuntu image, with the OS configured to automatically check for updates every night. If security or kernel updates are available, they are automatically downloaded and installed.
Some security updates, such as kernel updates, require a node reboot to finalize the process. A Linux node that requires a reboot creates a file named /var/run/reboot-required. This reboot process doesn't happen automatically.
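A minimal sketch of checking for that marker file on a node, in Python. The path comes from the quoted documentation; the function name `needs_reboot` is made up for illustration, and the script has to run on the node itself (for example over SSH or from a privileged pod with the host path mounted).

```python
from pathlib import Path

# Marker file named in the quoted AKS documentation.
REBOOT_FLAG = Path("/var/run/reboot-required")


def needs_reboot(flag_path=REBOOT_FLAG):
    """Return True when the reboot marker file exists on this node."""
    return Path(flag_path).exists()


if __name__ == "__main__":
    print("reboot required" if needs_reboot() else "no reboot pending")
```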
This tutorial summarizes the process of Cluster Maintenance and Other Tasks
No. Create another pool with 2 nodes and test your application there, or create another cluster. You can find each node's version with kubectl get nodes.
By default it updates nodes gradually, one by one. You can change this. Spot instances cannot be upgraded.
Yes, the latest patch version image will be used.

Need to upgrade AKS version from 1.14.8 to 1.15.10. Not sure if the Nodes will reboot with this or not

I need to upgrade AKS from version 1.14.8 to 1.15.10, and I am not sure whether the nodes will reboot during the upgrade.
Could anyone please clarify this?
If you are using higher-level controllers such as a Deployment and running multiple replicas of the pod, you should not have downtime in your application: the scheduler tries to spread replicas across different Kubernetes nodes, so when a particular node is cordoned and drained for upgrade or maintenance, other replicas of the pod are still running on other nodes.
If you run bare pods directly, you will have downtime in your application while the upgrade is happening.
Reading the documentation, we find:
During the upgrade process, AKS adds a new node to the cluster that runs the specified Kubernetes version, then carefully cordon and drains one of the old nodes to minimize disruption to running applications. When the new node is confirmed as running application pods, the old node is deleted.
They will not be rebooted, only replaced with new ones.
By default, AKS upgrades nodes by temporarily increasing the node capacity: one extra node is spun up with the Kubernetes version you are upgrading to.
Then, using a rolling strategy, it upgrades the nodes one by one.
It moves all the pods to the new extra node and deletes the old node. This cycle continues until all nodes are on the new version.
If we have a ReplicaSet or Deployment, there should ideally be no downtime.
We can also use podAntiAffinity so that no two replicas land on the same node, which avoids downtime when a single node is drained.
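As a rough illustration of the podAntiAffinity idea, the Python sketch below builds a Deployment manifest whose replicas refuse to share a node. The helper name `anti_affinity_deployment`, the `app` label, and the example image are assumptions for illustration; the manifest fields themselves (`podAntiAffinity`, `requiredDuringSchedulingIgnoredDuringExecution`, `topologyKey`) are standard Kubernetes API fields.

```python
import json


def anti_affinity_deployment(name, image, replicas=2):
    """Build a Deployment manifest that keeps replicas on separate nodes."""
    labels = {"app": name}  # hypothetical label used to match the replicas
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # Hard rule: never co-locate two pods with these labels.
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    # One pod per node (hostname is the topology domain).
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(anti_affinity_deployment("web", "nginx:1.25"), indent=2))
```

Note that with a hard (required) rule, a replica stays Pending if no eligible node is free, so the pool needs at least as many schedulable nodes as replicas during the upgrade.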

Unable to add node via OpsCenter

When trying to add a node via OpsCenter 5.0.1 I get the following
The Ec2Snitch is being used by this cluster. Provisioning nodes using this endpoint_snitch is not supported at this time.
Which seems contrary to the instructions given here.
I had the same problem. Just add the new node using DseDelegateSnitch; after provisioning is done, change the snitch to Ec2Snitch and restart the node. That's it.

Cassandra: where to modify opscenter agent for a newly added node to existing cluster

I have a single-node Cassandra cluster on EC2 (launched from a DataStax AMI). I manually added a new node, also backed by the same DataStax AMI, after deleting its data directory and modifying cassandra.yaml. I can see two nodes in the Nodes section of OpsCenter, but I see the OpsCenter agent is not installed on the new node (1 of 2 agents are connected). It looks like the new node has its own OpsCenter installation, which somehow conflicts with the OpsCenter installation on the first node. I guess I have to fix some configuration file of the OpsCenter agent on the new node so that it points to the OpsCenter installation on the first node, but I can't find where to modify it.
Thanks!
It is the stomp_interface setting in /var/lib/datastax-agent/conf/address.yaml
I had to manually put stomp_interface into the configuration file. Also, I noticed that the process was looking for /etc/datastax-agent/address.yaml and never looked for /var/lib/datastax-agent/conf/address.yaml
Also, local_interface was not necessary to get things to work for me. YMMV.
I'm not sure where this gets set, or if this changed between agent versions at some point in time. FWIW, I installed both opscenter and the agents via packages.
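For anyone scripting this, here is a minimal Python sketch that sets stomp_interface in an agent's address.yaml. It assumes the file is a flat list of `key: value` lines, the helper name `set_yaml_key` is made up, and the IP address in the example is a placeholder for your OpsCenter host. Which of the two paths mentioned above your agent actually reads may depend on how it was installed, so the path is passed in explicitly.

```python
from pathlib import Path


def set_yaml_key(path, key, value):
    """Set `key: value` in a flat YAML file, replacing an existing entry if present."""
    file = Path(path)
    lines = file.read_text().splitlines() if file.exists() else []
    entry = f"{key}: {value}"
    for i, line in enumerate(lines):
        # Compare only the part before the first colon, so comments are left alone.
        if line.split(":", 1)[0].strip() == key:
            lines[i] = entry
            break
    else:
        lines.append(entry)
    file.write_text("\n".join(lines) + "\n")


if __name__ == "__main__":
    # Placeholder values: adjust the path and the OpsCenter address for your setup.
    set_yaml_key("address.yaml", "stomp_interface", "10.0.0.1")
    print(Path("address.yaml").read_text())
```

The agent will most likely need a restart afterwards to pick up the change.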

Resources