We have Django application that run with gunicorn. We ware facing site downtime because of high trafic. Gunicorn is utilizng 96 % of the cpu that what cauing the issue.
Our system specification:
8 GB ram, 4 cpu
How to setup gunicron in such way that it can handle more than 100 request per second ?
What are the system specifcation required for handling 100 request per second ?
Is gunicorn workers are cpu count ?
Keep one worker only and increase number of threads in that worker. or use something like gevent in gunicorn
I have a spark cluster managed by YARN with 200GB RAM and 72 vCPUs.
And I have a number of small pyspark applications that perform Spark Streaming tasks. These applications are long-running, with each micro-batch running between 1 - 30min.
However, I can only run 7 applications stably. When I tried to run the 8th application, all jobs would restart frequently.
When 8 jobs are running, resource consumption is only 120 GB and about 30 CPUs.
May I understand why jobs could be instable although there are huge memories (80GB) left?
BTW, there are about 16GB RAM configured for OS.
I wish to run the Spring batch application in Azure Kubernetes.
At present, my on-premise VM has the below configuration
CPU Speed: 2,593
CPU Cores: 4
My application uses multithreading(~15 threads)
how do I define the CPU in AKS.
resources:
limits:
cpu: "4"
requests:
cpu: "0.5"
args:
- -cpus
- "4"
Reference: Kubernetes CPU multithreading
AKS Node Pool:
First of all, please note that Kubernetes CPU is an absolute unit:
Limits and requests for CPU resources are measured in cpu units. One
cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers
and 1 hyperthread on bare-metal Intel processors.
CPU is always requested as an absolute quantity, never as a relative
quantity; 0.1 is the same amount of CPU on a single-core, dual-core,
or 48-core machine
In other words, a CPU value of 1 corresponds to using a single core continiously over time.
The value of resources.requests.cpu is used during scheduling and ensures that the sum of all requests on a single node is less than the node capacity.
When you create a Pod, the Kubernetes scheduler selects a node for the
Pod to run on. Each node has a maximum capacity for each of the
resource types: the amount of CPU and memory it can provide for Pods.
The scheduler ensures that, for each resource type, the sum of the
resource requests of the scheduled Containers is less than the
capacity of the node. Note that although actual memory or CPU resource
usage on nodes is very low, the scheduler still refuses to place a Pod
on a node if the capacity check fails. This protects against a
resource shortage on a node when resource usage later increases, for
example, during a daily peak in request rate.
The value of resources.limits.cpu is used to determine how much CPU can be used given that it is available, see How pods with limist are run
The spec.containers[].resources.limits.cpu is converted to its
millicore value and multiplied by 100. The resulting value is the
total amount of CPU time in microseconds that a container can use
every 100ms. A container cannot use more than its share of CPU time
during this interval.
In other words, the requests is what the container is guaranteed in terms of CPU time, and the limit is what it can use given that it is not used by someone else.
The concept of multithreading does not change the above, the requests and limits apply to the container as a whole, regardless of how many threads run inside. The Linux scheduler do scheduling decisions based on waiting time, and with containers Cgroups is used to limit the CPU bandwidth. Please see this answer for a detailed walkthrough: https://stackoverflow.com/a/61856689/7146596
To finally answer the question
Your on premises VM has 4 cores, operating on 2,5 GHz, and if we assume that the CPU capacity is a function of clock speed and number of cores, you currently have 10 GHz "available"
The CPU's used in standard_D16ds_v4 has a base speed of 2.5GHz and can run up to 3.4GHz or shorter periods according to the documentation
The D v4 and Dd v4 virtual machines are based on a custom Intel® Xeon®
Platinum 8272CL processor, which runs at a base speed of 2.5Ghz and
can achieve up to 3.4Ghz all core turbo frequency.
Based on this specifying 4 cores should be enough ti give you the same capacity as onpremises.
However number of cores and clock speed is not everything (caches etc also impacts performance), so to optimize the CPU requests and limits you may have to do some testing and fine tuning.
I'm afraid there is no easy answer to your question, while planning the right size of VM Node Pools for Kubernetes cluster to fit appropriately your workload requirements for resource consumption . This is a constant effort for cluster operators, and requires you to take into account many factors, let's mention few of them:
What Quality of Service (QoS) class (Guaranteed, Burstable, BestEffort) should I specify for my Pod Application, and how many of them I plan to run ?
Do I really know the actual usage of CPU/Memory resources by my app VS. how much of VM compute resources stay idle ? (any on-prem monitoring solution in place right now, that could prove it, or be easily moved to Kubernetes in-cluster one ?)
Do my cluster is multi-tenant environment, where I need to share cluster resources with different teams ?
Node (VM) capacity is not the same as total available resources to workloads
You should think here in terms of cluster Allocatable resources:
Allocatable = Node Capacity - kube-reserved - system-reserved
In case of Standard_D16ds_v4 VM size in AZ, you would have for workloads disposal: 14 CPU Cores not 16 as assumed earlier.
I hope you are aware, that scpecified through args number of CPUs:
args:
- -cpus
- "2"
is app specific approach (in this case the 'stress' utility written in go), not general way to spawn a declared number of threads per CPU.
My suggestion:
To avoid over-provisioning or under-provisioning of cluster resources to your workload application (requested resource VS. actually utilized resources), and to optimize costs and performance of your applications, I would in your place do a preliminary sizing estimation on your own of VM Node Pool size and type required by your SpringBoot multithreaded app, and thus familiarize first with concepts like bin-packing and app right-sizing. For these two last topics I don't know a better public guide than recently published by GCP tech team:
"Monitoring gke-clusters for cost optimization using cloud monitoring"
I would encourage you to find an answer to your question by your self. Do the proof of concept on GKE first (with free trial), replace in quide above the demo app with your own workload, come back here, and share your own observation, would be valuable for others too with similar task !
We are running a Koa web app in 5 Fargate containers. They are pretty straightforward crud/REST API's with Koa over Mongo Atlas. We started doing capacity testing, and noticed that the node servers started to slow down significantly with plenty of headroom left on CPU (sitting at 30%), Memory (sitting at or below 20%), and Mongo (still returning in < 10ms).
To further test this, we removed the Mongo operations and just hammered our health-check endpoints. We did see a lot of throughput, but significant degradation occurred at 25% CPU and Node actually crashed at 40% CPU.
Our fargate tasks (containers) are CPU:2048 (2 "virtual CPUs") and Memory 4096 (4 gigs).
We raised our ulimit nofile to 64000 and also set the max-old-space-size to 3.5 GB. This didn't result in a significant difference.
We also don't see significant latency in our load balancer.
My expectation is that CPU or memory would climb much higher before the system began experiencing issues.
Any ideas where a bottleneck might exist?
The main issue here was that we were running containers with 2 CPUs. Since Node only effectively uses 1 CPU, there was always a certain amount of CPU allocation that was never used. The ancillary overhead never got the container to 100%. So node would be overwhelmed on its 1 cpu while the other was basically idle. This resulted in our autoscaling alarms never getting triggered.
So adjusted to 1 cpu containers with more horizontal scale out (ie more instances).
We have a solr cloud setup of 4 shards (one shard per physical machine) having ~100 million documents. Zookeeper is on one of those 4 machines. We encounter complex queries having wild cards and proximity searches together and it sometimes takes more than 15 secs to get top 100 documents. Query traffic is very very low at the moment (2-3 queries every minute). 4 Servers hosting cloud have following specs:
(2 servers -> 64 GB RAM, 24 CPU cores, 2.4 GHz) + (2 servers -> 48 GB RAM, 24 CPU cores, 2.4GHz).
We are providing 8 GB JVM memory per shard. Our 510GB index on SSDs per machine (totalling to 4*510 GB = 2.4TB) is mapped into OS disk cache on remaining RAM on each server. So I suppose RAM is not an issue for us.
Now Interesting thing to note is: When a query is fired to the cloud, only one CPU core is utilized to 100% and rest are all at 0%. Same behaviour is replicated on all the machines. No other processes are running on these machines.
Shouldn't solr be doing multi-threading of some-kind to utilize the CPU cores? Can I anyhow increase CPU consumption for each queries as traffic is not a problem. If so, how?
A single request to a Solr shard is largely processed single-threaded (you can set threads for faceting on multiple fields). Rule of thumb is to keep document count for shards to no more than a very few hundreds of millions. You are well below that with 25M/shard, but as you say, your queries are complex. What you see is simple the effect of single-threaded processing.
The solution to your problem is to use more shards, as all shards are queried in parallel. As you have a lot of free CPU cores and very little traffic, you might want to try running 10 shards on each machine. It is not a problem for SolrCloud to use 40 shards in total and the increased merging overhead should be insignificant compared to your heavy queries.