Kubernetes workload scaling on multi-threaded code

I'm getting started with Kubernetes and have the following question.
Say a microservice has the following C# code snippet:
var tasks = _componentBuilders.Select(b =>
{
    return Task.Factory.StartNew(() => b.SetReference(context, typedModel));
});
Task.WaitAll(tasks.ToArray());
On my box, I understand that each thread will be executed on a vCPU. So if I have 4 cores with hyperthreading enabled, I will be able to execute 8 tasks concurrently. Therefore, if I have about 50,000 tasks, it will take roughly
(50,000/8) * approximate time per task
to complete this work. This ignores context switching, etc.
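As a rough sanity check on the estimate above, here is a minimal Python sketch of the same arithmetic. The function name and the 0.5-second task duration are made up for illustration; it assumes tasks run in full "waves" of as many tasks as there are concurrent slots and ignores scheduling overhead.

```python
import math


def estimate_batch_seconds(task_count: int, concurrent_slots: int, seconds_per_task: float) -> float:
    """Back-of-the-envelope wall-clock estimate for running tasks in waves."""
    if concurrent_slots <= 0:
        raise ValueError("concurrent_slots must be positive")
    # Each wave runs up to `concurrent_slots` tasks at once.
    waves = math.ceil(task_count / concurrent_slots)
    return waves * seconds_per_task


if __name__ == "__main__":
    # 50,000 tasks on 8 hardware threads, assuming 0.5 s per task (hypothetical).
    print(estimate_batch_seconds(50_000, 8, 0.5))
```

Doubling the number of slots roughly halves the estimate, which is the effect horizontal scaling is supposed to reproduce across machines.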
Now, moving to the cloud, assume this code is in a Docker container managed by a Kubernetes Deployment, and that we have a single container per VM to keep this simple. How does the above code scale horizontally across the VMs in the deployment? I cannot find clear guidance on this, so if anyone has any reference material, that would be helpful.

You'll typically use a Kubernetes Deployment object to deploy application code. That has a replicas: setting, which launches some number of identical disposable Pods. Each Pod has a container, and each pod will independently run the code block you quoted above.
The challenge here is distributing work across the Pods. If each Pod generates its own 50,000 work items, they'll all do the same work and things won't happen any faster. Just running your application in Kubernetes doesn't give you any prebuilt way to share thread pools or task queues between Pods.
A typical approach here is to use a job queue system; RabbitMQ is a popular open-source option. One part of the system generates the tasks and writes them into RabbitMQ. One or more workers read jobs from the queue and run them. You can set this up and demonstrate it to yourself without using container technology, then repackage it in Docker or Kubernetes, changing only the RabbitMQ broker address at deploy time.
In this setup I'd probably have the worker run jobs serially, one at a time, with no threading. That will simplify the implementation of the worker. If you want to run more jobs in parallel, run more workers; in Kubernetes, increase the Deployment replicas: count.
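To make the producer/worker split concrete, here is a minimal sketch in Python. It is not RabbitMQ code: an in-process queue.Queue stands in for the broker and threads stand in for worker Pods, purely so the pattern can be run locally. The names run_workers and handle are hypothetical.

```python
import queue
import threading


def run_workers(jobs, worker_count, handle):
    """Distribute `jobs` across `worker_count` workers through one shared queue.

    Each worker takes one job at a time and runs it serially; parallelism
    comes only from the number of workers, as with Deployment replicas.
    """
    job_queue = queue.Queue()
    for job in jobs:  # the "producer" side: enqueue every job up front
        job_queue.put(job)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            outcome = handle(job)
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    squares = run_workers(range(10), worker_count=3, handle=lambda n: n * n)
    print(sorted(squares))
```

Because every job is taken from the queue exactly once, adding workers splits the work rather than repeating it, which is what separate Pods cannot do on their own.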

In Kubernetes, when we deploy containers as Pods we can include the resources.limits.cpu and resources.requests.cpu fields for each container in the Pod's manifest:
resources:
  requests:
    cpu: "1000m"
  limits:
    cpu: "2000m"
In the example above we have a request for 1 CPU and a limit of 2 CPUs. The scheduler will place the Pod on a worker node that can satisfy the request, and the container will not be allowed to use more CPU than the limit.
One cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers and 1 hyperthread on bare-metal Intel processors.
We can vertically scale by increasing / decreasing the values for the requests and limits fields. Or we can horizontally scale by increasing / decreasing the number of replicas of the pod.
For more details, see the Kubernetes documentation on resource units.
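To illustrate the CPU units used above, here is a small Python sketch that converts a CPU quantity string into a number of cores. The helper name is hypothetical, and it only handles plain numbers and the "m" (millicpu) suffix shown in the manifest.

```python
def parse_cpu_quantity(quantity: str) -> float:
    """Convert a Kubernetes-style CPU quantity ("500m", "2", "0.5") to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "m" means millicpu: 1000m is one full CPU.
        return int(quantity[:-1]) / 1000
    return float(quantity)


if __name__ == "__main__":
    for value in ("1000m", "2000m", "500m", "2"):
        print(value, "->", parse_cpu_quantity(value), "CPU")
```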

Related

Kubernetes Jobs or Pods for completion Jobs with auto scaling

I have CPU-intensive jobs/tasks that I need to run in Kubernetes. The process for each job/task is:
We get a request from a queue or an API call.
A Pod should be created to process the task (some jobs run for minutes, some for hours).
The Pod is deleted once the task completes.
This should happen at scale: if there are more jobs in the queue, create more jobs (up to a maximum of 10, 20, 30; we should define it).
I am using KEDA. The Pod is created, but after the job completes it goes into CrashLoopBackOff. This is the default behaviour in the Pod lifecycle, because Kubernetes tries to recreate the Pod since the restart policy is set to Always. There are other options such as OnFailure and Never, but I have read that Kubernetes Jobs are more suitable.
Which is the better option for the above task, Kubernetes Pods or Jobs? We need to consider scaling Pods and also scaling Kubernetes nodes (cloud vendors support this) based on usage and the number of tasks in the queue.
KEDA ScaledJobs are best for such scenarios and can be triggered by a queue, storage, etc. (the currently available scalers are listed in the KEDA documentation).
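The "more jobs in the queue, more Jobs, up to a maximum" rule from the question can be sketched as follows. This is a simplified illustration in Python, not KEDA's actual scaling algorithm; the function name is hypothetical, and it assumes queue_length counts only messages that no running Job has picked up yet.

```python
def jobs_to_create(queue_length: int, running_jobs: int, max_jobs: int) -> int:
    """Return how many new Jobs to start for the waiting messages.

    One Job is started per waiting message, but the total number of
    running Jobs is never allowed to exceed `max_jobs`.
    """
    capacity_left = max(max_jobs - running_jobs, 0)
    return min(max(queue_length, 0), capacity_left)


if __name__ == "__main__":
    print(jobs_to_create(queue_length=50, running_jobs=0, max_jobs=20))
    print(jobs_to_create(queue_length=50, running_jobs=18, max_jobs=20))
    print(jobs_to_create(queue_length=5, running_jobs=0, max_jobs=20))
```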

Does clustering in Node.js and auto-scaling web application using Kubernetes serve the same purpose?

Node.js has introduced the Cluster module to scale up applications for performance optimization. We have Kubernetes doing the same thing.
I'm confused about whether both serve the same purpose. My assumption is that clustering can spawn at most 8 processes (if there are 4 CPU cores with 2 threads each), whereas there is no such limitation in Kubernetes.
Kubernetes and the Node.js Cluster module operate at different levels.
Kubernetes is in charge of orchestrating containers (amongst many other things). From its perspective, there are resources to be allocated, and deployments that require or use a specific amount of resources.
The Node.js Cluster module behaves as a load-balancer that forks N times and spreads the requests between the various processes it owns, all within the limits defined by its environment (CPU, RAM, Network, etc).
In practice, Kubernetes has the possibility to spawn additional Node.js containers (scaling horizontally). On the other hand, Node.js can only grow within its environment (scaling vertically). You can read about this here.
From a performance perspective both approaches might be relatively similar (you can use the same number of cores in both cases). The problem with scaling vertically on a single machine is that you lose the high-availability aspect that Kubernetes provides. If you instead deploy several Node.js containers on different machines, you are much more tolerant of one of them going down.

Should You Use PM2, Node Cluster, or Neither in Kubernetes?

I am deploying some NodeJS code into Kubernetes. It used to be that you needed to run either PM2 or the NodeJS cluster module in order to take full advantage of multi-core hardware.
Now that we have Kubernetes, it is unclear if one must use one or the other, to get the full benefit of multiple cores.
Should a person specify the number of CPU units in their pod YAML configuration?
Or is there simply no need to account for multiple cores with NodeJS in Kubernetes?
You'll achieve utilization of multiple cores either way. The difference is that with the Node.js cluster module approach you have to request more resources (i.e., multiple cores) for a single Pod, which might be more difficult for Kubernetes to schedule than several containers requesting one core (or less) each. Kubernetes can schedule those smaller containers across multiple nodes instead of having to find one node with enough available cores.

Start kubernetes pod memory depending on size of data job

Is there a way to dynamically scale the memory size of a Pod based on the size of the data job (my use case)?
Currently we have Job and Pods that are defined with memory amounts, but we wouldn't know how big the data will be for a given time-slice (sometimes 1000 rows, sometimes 100,000 rows).
So it will break if the data is bigger than the memory we have allocated beforehand.
I have thought of using slices by data volume, i.e. cut by every 10,000 rows, we will know memory requirement of processing a fixed amount of rows. But we are trying to aggregate by time hence the need for time-slice.
Or are there any other solutions, like Spark on Kubernetes?
Another way of looking at it:
How can we implement something like Cloud Dataflow in Kubernetes on AWS?
It's a best practice to always define resources in your container definition, in particular:
limits: the upper bound on CPU and memory
requests: the minimum amount of CPU and memory
This allows the scheduler to take a better decision and it eases the assignment of Quality of Service (QoS) for each pod (https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/) which falls into three possible classes:
Guaranteed (highest priority): when requests = limits
Burstable: when requests < limits
BestEffort (lowest priority): when requests and limits are not set.
The QoS class provides a criterion for killing pods when the system is overcommitted.
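The three classes above can be expressed as a small decision function. This is a minimal Python sketch for a Pod with a single container, not Kubernetes source code. The function name is hypothetical, and quantities are compared as plain strings, so it assumes they are written in the same form (for example "1" and "1000m" would be treated as different).

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Classify a single-container Pod as Guaranteed, Burstable or BestEffort."""
    if not requests and not limits:
        return "BestEffort"
    # Kubernetes defaults a missing request to the limit of the same resource.
    effective_requests = {**limits, **requests}
    guaranteed = all(
        resource in limits and effective_requests.get(resource) == limits[resource]
        for resource in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    print(qos_class({}, {}))
    print(qos_class({"cpu": "1000m"}, {"cpu": "2000m"}))
    print(qos_class({"cpu": "1", "memory": "1Gi"}, {"cpu": "1", "memory": "1Gi"}))
```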
If you don't know the memory requirement for your Pod a priori for a given time-slice, then it is difficult for the Kubernetes Cluster Autoscaler to automatically scale the node pool for you, as per this documentation [1]. Therefore, both of your suggestions, running either Cloud Dataflow or Spark on Kubernetes with the Kubernetes Cluster Autoscaler, may not work for your case.
However, you can use custom scaling as a workaround. For example, you can export memory related metrics of the pod to Stackdriver, then deploy HorizontalPodAutoscaler (HPA) resource to scale your application as [2].
[1] https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#how_cluster_autoscaler_works
[2] https://cloud.google.com/kubernetes-engine/docs/tutorials/custom-metrics-autoscaling
I have found a partial solution to this.
Note there are 2 parts to this problem.
1. Make the Pod request the correct amount of memory depending on size of data job
2. Ensure that this Pod can find a Node to run on.
The Kubernetes Cluster Autoscaler (CA) can solve part 2.
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
According to the readme:
Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when there are pods that failed to run in the cluster due to insufficient resources.
Thus if there is a data job that needs more memory than available in the currently running nodes, it will start a new node by increasing the size of a node group.
Details:
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md
I am still unsure how to do point 1.
An alternative to point 1 is to start the container without a specific memory request or limit:
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#if-you-don-t-specify-a-memory-limit
If you don't specify a memory limit for a Container, then one of these situations applies:
The Container has no upper bound on the amount of memory it uses.
or
The Container could use all of the memory available on the Node where it is running.
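One possible way to approach point 1, sketched in Python, is to count the rows in the time-slice before creating the Job and derive the memory request from that count. This assumes memory use grows roughly linearly with the number of rows. The function name, the 2048 bytes per row and the 256 MiB fixed overhead are all hypothetical placeholders that would have to be measured for the real workload.

```python
import math


def memory_request_for_rows(row_count: int, bytes_per_row: int = 2048, overhead_mib: int = 256) -> str:
    """Estimate a memory request (as a Kubernetes "Mi" quantity) for a data job.

    Assumes linear growth: fixed overhead plus a per-row cost. Both default
    values are illustrative guesses, not measurements.
    """
    data_mib = math.ceil(row_count * bytes_per_row / (1024 * 1024))
    return f"{data_mib + overhead_mib}Mi"


if __name__ == "__main__":
    for rows in (1_000, 100_000):
        print(rows, "rows ->", memory_request_for_rows(rows))
```

The resulting string could then be written into resources.requests.memory of the Job manifest generated for that time-slice, leaving the Cluster Autoscaler to find or create a node that fits.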

How to scale Azure VM Cores

I have Python code that I need to run on 1000 CSVs in parallel to do calculations. One CPU core takes 8 hours to run the code over each CSV.
Thus I am looking for a way to use Azure for this. I would like to create several virtual machines, say 4x D5v2 with 16 cores each, and access them as a single Windows Server running on a 64-core machine.
I tried to create these VMs in the same Cloud Service and I put them into the same Availability Set, which worked fine. When all VMs are running and I access any one of those VMs, I see that the cores on all other VMs are allocated to "Other Roles".
My questions are:
1) Is it possible to create a hypothetical VM out of 4 VMs to use more cores?
2) How can I manually allocate all cores in the Cloud Service to one specific VM?
Your best solution would be to use Azure Batch. With Batch you create a job, and it will run on as many CPUs as you allow it to.
Taken from the Batch front page
When you are ready to run a job, Batch starts a pool of compute virtual machines for you, installing applications and staging data, running jobs with as many tasks as you have, identifying failures and re-queuing work and scaling down the pool as work completes. You have control over scale to meet deadlines, manage costs, and run at the right scale for your application.
1) Is it possible to create a hypothetical VM out of 4 VMs to use more cores?
No, you cannot.
2) How can I manually allocate all cores in the Cloud Service to one specific VM?
You cannot do this. You need to use a cloud-native solution to scale your process over multiple resources.
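Since the cores of several VMs cannot be pooled into one machine, the work itself has to be split. Here is a minimal Python sketch of dividing the CSV list into one shard per VM. It is not Azure Batch code (Batch handles this distribution for you); the function name and file names are hypothetical.

```python
def shard(items, shard_count):
    """Split `items` round-robin into `shard_count` lists, one per machine."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    shards = [[] for _ in range(shard_count)]
    for index, item in enumerate(items):
        shards[index % shard_count].append(item)
    return shards


if __name__ == "__main__":
    # 1000 hypothetical CSV files spread over 4 VMs.
    files = [f"data_{i:04d}.csv" for i in range(1000)]
    per_vm = shard(files, 4)
    print([len(files_for_vm) for files_for_vm in per_vm])
```

Each VM would then process only its own shard, using its 16 local cores, with no need for the VMs to share CPUs.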

Resources