Pre-pull images for new Kubernetes cluster autoscale-up nodes - Azure

My pods/containers run on a Docker image that is about 4 GiB in size. Pulling the image from the container registry takes about 2 minutes whenever a new VM node is spun up because resources are insufficient.
That is to say, whenever a new request comes in and the Kubernetes service autoscales up a new node, it takes more than 2 minutes, so a user has to wait that long to put through a request. Not ideal. I am currently using Azure AKS to deploy my application, with its cluster autoscaler feature.
I am using a typical deployment setup with 1 fixed master pod and 3 fixed worker pods. These 3 worker pods correspond to 3 different types of requests. Each time a request comes in, the worker pod generates a Kubernetes Job to process the request.
The big question is: how can I pre-pull the images so that when a new node is spun up in the Kubernetes cluster, users don't have to wait so long for the new Job to be ready?

If you are using Azure Container Registry (ACR) for storing and pulling your images, you can enable ACR Teleport, which can significantly reduce your image pull time. Refer to the link for more information.
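Another option, beyond the registry-side feature above, is to pre-pull the image with a DaemonSet so that every node, including freshly autoscaled ones, downloads it before a Job lands there. This is only a minimal sketch; the image name myregistry.azurecr.io/worker:latest and the assumption that the image contains a shell are placeholders, not details from the question.

```yaml
# Sketch of a pre-pull DaemonSet (placeholder names and images).
# An init container forces the kubelet to pull the large worker image on every
# node, then a tiny pause container keeps the pod running so the image stays
# cached locally.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-worker-image
spec:
  selector:
    matchLabels:
      app: prepull-worker-image
  template:
    metadata:
      labels:
        app: prepull-worker-image
    spec:
      initContainers:
        - name: prepull
          # Placeholder: the ~4 GiB image used by the worker Jobs.
          image: myregistry.azurecr.io/worker:latest
          # Assumes the image has a shell; the command exits right after the pull.
          command: ["/bin/sh", "-c", "exit 0"]
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
```

When the cluster autoscaler adds a node, the DaemonSet pod is scheduled there first, so the pull happens while the node is still warming up and later Job pods referencing the same image can start without the 2-minute wait (unless the kubelet has garbage-collected the image in the meantime).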

Related

Scheduling start of a container in Azure Container Apps

I have a short-running container that does background processing (no ingress) that I have deployed to the Azure Container Apps service. My config is min replicas 0 (for when the container completes its work and exits) and max replicas 1 (I only want one instance of my container running at any time).
I want to start my container once every hour; it generally runs for about 3 minutes, completes its task, and exits.
Is there any way with Azure Container Apps to schedule the start of my container? At the moment I have resorted to running my Azure DevOps pipeline on a schedule, which calls the az containerapp update command, but it feels like the wrong way to go about this.
There's no scheduling concept in Container Apps. Here are some ideas:
1. Enable ingress and create a Function or a Logic App that runs on a schedule and "pings" the Container App to start the process.
2. Create a Logic App that runs on a schedule, creates a Container Instance every hour, waits for it to complete, and then deletes it.
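For reference, the scheduled-pipeline workaround mentioned in the question can be kept quite small. The sketch below is one way it might look as an Azure DevOps YAML pipeline; the service connection and resource names are placeholders, and it assumes that bumping the revision suffix is enough to start a new run of the container.

```yaml
# Hypothetical Azure DevOps pipeline: runs hourly and calls az containerapp update.
trigger: none             # no CI trigger; schedule only

schedules:
  - cron: "0 * * * *"     # top of every hour (UTC)
    displayName: Hourly container run
    branches:
      include:
        - main
    always: true           # run even when there are no new commits

pool:
  vmImage: ubuntu-latest

steps:
  - task: AzureCLI@2
    inputs:
      azureSubscription: my-azure-connection   # placeholder service connection
      scriptType: bash
      scriptLocation: inlineScript
      inlineScript: |
        # Placeholder names; a new revision suffix is assumed to trigger a fresh run.
        az containerapp update \
          --name my-container-app \
          --resource-group my-rg \
          --revision-suffix "run-$BUILD_BUILDID"
```

As the answer notes, a timer-triggered Function or Logic App that pings an ingress endpoint keeps the scheduling inside Azure itself rather than in the CI system.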

How to reschedule my pods after scaling down a node in Azure Kubernetes Service (AKS)?

I am going to start with an example. Say I have an AKS cluster with three nodes. Each of these nodes runs a set of pods, let's say 5 pods. That's 15 pods running on my cluster in total, 5 pods per node, 3 nodes.
Now let's say that my nodes are not fully utilized at all and I decide to scale down to 2 nodes instead of 3.
When I choose to do this within Azure and change my node count from 3 to 2, Azure will close down the 3rd node. However, it will also delete all pods that were running on the 3rd node. How do I make my cluster reschedule the pods from the 3rd node to the 1st or 2nd node so that I don't lose them and their contents?
The only way I feel safe scaling down nodes right now is to do the rescheduling manually.
Assuming you are using Kubernetes Deployments (or ReplicaSets), this should be handled for you. Your Deployment is configured with a set number of replicas to create for each pod; when you remove a node, Kubernetes will see that the current number of active pods is less than the desired number and create new ones.
If you are just deploying pods without a Deployment, then this won't happen, and the only solution is redeploying manually, which is why you want to use a Deployment.
Bear in mind, though, that what gets created are new pods; you are not moving the previously running pods. Any state you had on the previous pods that is not persisted will be gone. This is how it is intended to work.
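As a concrete illustration of the answer above, here is a hedged sketch of such a Deployment; the names and image are placeholders. If the node running some of these pods is removed, the ReplicaSet sees fewer than 5 pods and recreates the missing ones on the remaining nodes.

```yaml
# Illustrative Deployment (placeholder names): Kubernetes keeps 5 replicas
# running, recreating any pods lost when a node is scaled down or drained.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5                 # desired count that will be restored
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: myregistry.azurecr.io/my-app:1.0   # placeholder image
          # Replacement pods start fresh: anything written only to the container
          # filesystem is lost, so persist important state in a volume.
```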

Azure Kubernetes Cluster Node Failure Scenario

Let's say I have 3 nodes in my cluster and I want to run 300 jobs.
If I run 1 job per pod and 100 pods per node, what will happen if a node fails in Azure Kubernetes Service?
Those Jobs will go to Pending: Kubernetes supports at most 110 pods per node, so the remaining nodes wouldn't have the capacity to take over the failed node's jobs. You could look at using the Cluster Autoscaler (Beta), which would provision more hosts to run the jobs that are left in a Pending state.
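To make the scenario concrete, the 300 jobs could be expressed as a single Kubernetes Job with 300 completions, one pod per completion; everything below (names, image, requests, parallelism) is illustrative rather than taken from the question. Pods that cannot be placed after a node failure simply stay Pending until capacity returns or the Cluster Autoscaler adds a node.

```yaml
# Illustrative Job: 300 completions, one pod each; unplaceable pods stay Pending.
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-run
spec:
  completions: 300            # total pods to run to completion
  parallelism: 100            # at most 100 pods at once (placeholder value)
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: myregistry.azurecr.io/job-worker:1.0   # placeholder image
          resources:
            requests:
              cpu: 100m       # placeholder; Pending requests drive the autoscaler
```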
If a node fails, the Cluster Autoscaler (CA) can be used to handle node failures in Azure using autoscaling groups:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md
https://learn.microsoft.com/en-us/azure/aks/autoscaler
https://learn.microsoft.com/en-us/azure/aks/scale-cluster

Long Running Tasks in Service Fabric and Scaling Cluster In

We are using an Azure Service Fabric stateless service that gets messages from an Azure Service Bus queue and processes them. The tasks generally take between 5 minutes and 5 hours.
When it's busy we want to scale out to more servers, and when it gets quiet we want to scale back in again.
How do we scale in without interrupting long-running tasks? Is there a way we can tell Service Fabric which server is free to scale in?
Azure Monitor custom metric
1. Integrate your SF service with EventFlow; for instance, make it send its logs into Application Insights.
2. While a task is being processed, send logs that indicate it is in progress.
3. Configure a custom metric in Azure Monitor to scale in only when there are no logs indicating that a machine has in-progress tasks.
The trade-off here is that you have to wait for all in-progress tasks to finish before the scale-in can happen.
There is a good article that explains how to Scale a Service Fabric cluster programmatically
Here is another approach, which requires a bit of coding: automate manual scaling.
Develop another service, either as part of the SF application or as a VM extension. The point here is to have the service run on all the nodes in the cluster and track the status of task execution.
There are well-defined steps for how one can manually exclude an SF node from the cluster:
1. Run Disable-ServiceFabricNode with intent ‘RemoveNode’ to disable the node you’re going to remove (the highest instance in that node type).
2. Run Get-ServiceFabricNode to make sure that the node has indeed transitioned to disabled. If not, wait until the node is disabled. You cannot hurry this step.
3. Follow the sample/instructions in the quick start template gallery to change the number of VMs by one in that node type. The instance removed is the highest VM instance.
And so forth... Find more info in Scale a Service Fabric cluster in or out using auto-scale rules. The takeaway here is that these steps can be automated.
Implement the scaling logic in a new service that monitors which nodes have finished their tasks and are sitting idle, and scale those nodes in using the steps described above.
Hopefully it makes sense.
Thanks a lot to #tank104 for helping me elaborate my answer!

Marathon on Azure Container Service - cannot scale to all nodes

I have set up a VM cluster using Azure Container Service. The container orchestrator is DC/OS. There are 3 master nodes and 3 slave agents.
I have a Docker app that I am trying to launch on my cluster using Marathon. Each time I launch it, I notice that the CPU utilization of 3 of the nodes is always 0, i.e. the app is never scheduled on them. The other 3 nodes, on the other hand, reach almost 100% CPU utilization as I scale the application. At that point the scaling stops and Marathon shows the state "waiting" for resource offers from Mesos.
I don't understand why Marathon is not scheduling more containers, despite there being empty nodes when I try to scale the application.
I know that Marathon runs on the Master nodes; is it unaware of the presence of the slave agents? (Assuming that the 3 free nodes are the slaves.)
Here is the config file of the application: pastebin-config-file
How can I make full use of the machines using Marathon?
Tasks are not scheduled on the masters; those nodes are reserved for management of the cluster.
