We have a Django application that runs with gunicorn. We are facing site downtime because of high traffic: gunicorn is utilizing 96% of the CPU, which is what is causing the issue.
Our system specification:
8 GB RAM, 4 CPUs
How do we set up gunicorn so that it can handle more than 100 requests per second?
What system specifications are required to handle 100 requests per second?
Should the number of gunicorn workers match the CPU count?
Keep only one worker and increase the number of threads in that worker, or use an asynchronous worker class such as gevent in gunicorn.
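As a rough illustration, a single-worker, multi-threaded setup can be expressed in a gunicorn.conf.py. This is a minimal sketch, assuming a project whose WSGI module is myproject.wsgi; the names and numbers are placeholders to adjust for your app and load:

import multiprocessing

# One worker process with many threads, as suggested above.
workers = 1
threads = multiprocessing.cpu_count() * 4  # e.g. 16 threads on a 4-CPU box

# Alternatively, switch to an async worker class instead of threads:
# worker_class = "gevent"
# worker_connections = 1000

bind = "0.0.0.0:8000"
timeout = 30

Started with something like: gunicorn -c gunicorn.conf.py myproject.wsgi:application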
I am testing a node-webrtc project on a 16-core CPU with 32 GB RAM.
I started the process with pm2, and after some time the node process stops responding.
The URL becomes unreachable and video streaming stops.
What I noticed:
1) Every time it stopped at around 3.5 GB memory consumption and 900% CPU. I tried increasing the old space size to 24 GB, but then it failed randomly after reaching about 9 GB of memory and 1100% CPU.
2) In the pm2 logs I found
"(node:3397) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 newBroadcast listeners added. Use emitter.setMaxListeners() to increase limit", but the process keeps running after this warning.
A) I'm not sure: is this a memory leak issue?
B) About the CPU consumption (900% out of 1600%): as far as I know Node is a single-threaded process, so is there any chance the threads assigned to the main node process have hit their limit?
Any suggestions on how I can debug this?
Concurrent users at that time were around 110-120.
The issue was the server's outbound bandwidth.
The server has a maximum uplink speed of 128 MB/s (~1 Gbps), the stream was consuming the maximum allowed bandwidth, and beyond that the connection to the server became unreachable.
It was fixed by switching our server to a 500 MB/s uplink.
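For anyone debugging a similar symptom, watching outbound throughput directly makes this kind of saturation obvious. A minimal sketch, assuming psutil is installed (any bandwidth monitor such as iftop works just as well):

import time
import psutil

# Sample total bytes sent once per second and print outbound throughput.
prev = psutil.net_io_counters().bytes_sent
while True:
    time.sleep(1)
    cur = psutil.net_io_counters().bytes_sent
    # Compare this against the uplink limit (e.g. 128 MB/s in the case above).
    print(f"outbound: {(cur - prev) / 1e6:.1f} MB/s")
    prev = cur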
We are running a Koa web app in 5 Fargate containers. They are pretty straightforward CRUD/REST APIs with Koa over Mongo Atlas. We started doing capacity testing and noticed that the node servers started to slow down significantly with plenty of headroom left on CPU (sitting at 30%), memory (sitting at or below 20%), and Mongo (still returning in < 10 ms).
To further test this, we removed the Mongo operations and just hammered our health-check endpoints. We did see a lot of throughput, but significant degradation occurred at 25% CPU and Node actually crashed at 40% CPU.
Our fargate tasks (containers) are CPU:2048 (2 "virtual CPUs") and Memory 4096 (4 gigs).
We raised our ulimit nofile to 64000 and also set the max-old-space-size to 3.5 GB. This didn't result in a significant difference.
We also don't see significant latency in our load balancer.
My expectation is that CPU or memory would climb much higher before the system began experiencing issues.
Any ideas where a bottleneck might exist?
The main issue here was that we were running containers with 2 CPUs. Since Node only effectively uses 1 CPU, there was always a certain amount of CPU allocation that was never used. The ancillary overhead never got the container to 100%. So Node would be overwhelmed on its one CPU while the other was basically idle. This resulted in our autoscaling alarms never getting triggered.
So we adjusted to 1-CPU containers with more horizontal scale-out (i.e., more instances).
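A quick way to confirm this pattern is to look at per-core usage instead of the aggregate figure. A minimal sketch, assuming psutil is available inside the container:

import psutil

# Sample each core individually over one second. A single-threaded Node
# process typically shows one core near 100% while the others sit idle,
# even though the aggregate looks comfortably low.
per_core = psutil.cpu_percent(interval=1, percpu=True)
for i, pct in enumerate(per_core):
    print(f"core {i}: {pct:.0f}%")
print(f"aggregate: {sum(per_core) / len(per_core):.0f}%")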
I am using 3 containers as microservices and RabbitMQ for communication between them. I am using Azure VMs to run the containers. The machine has 64 GB RAM and 32 cores. The architecture is given below:
Container C1: a Python web server that just receives requests and does small DB operations
Container C2: processing, but not intensive
Container C3: processing of highly intensive tasks
When I try to process one web request, it takes around 20 minutes across all containers (C3 takes around 90% of that). Then I tested with higher-core machines (e.g., 64 cores with 128 GB RAM, and 72 cores with 144 GB RAM) and was shocked to see that the time also increased; I expected it to be lower because there are more CPU cores.
Using htop I can see that the C3 container uses all cores at 100%. When I tried with multiple web requests (= 10), the time taken to complete processing for one request was around 40-50 minutes, because they are all handled in parallel in C2 & C3. I don't see the benefit of a large number of CPU cores in the Docker containers.
Is there anything wrong with the architecture, or should I start new containers for every web request (C2 & C3)?
I have 4 EC2 instances running on AWS. PM2 is running in cluster mode on all instances. When I get 5K+ concurrent requests, the response time of the app increases significantly.
All requests fetch a Redis key, and a fetch that normally takes only 50 ms takes up to 10 seconds under this load. What can the issue be here?
We need to pinpoint the bottleneck. Let's do some diagnostics:
Are the EC2 instances multi-core, so they can take advantage of PM2's clustering?
When you execute pm2 start app.js -i X, are you sure X = the number of vCPUs of the EC2 instance?
When you execute pm2 monit, do you see all instances of the cluster sharing CPU and memory usage equally?
When you run htop, what is your total CPU and memory usage?
When you execute iftop, what are your RX and TX totals compared to the maximum bandwidth available on your machine?
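If those checks look healthy, it also helps to time Redis round trips in isolation, to tell whether Redis itself is slow or the delay builds up on the app side under load. A minimal sketch, assuming redis-py is available; the host and key name are placeholders for whatever your app actually uses:

import time
import redis

# Connect to the same Redis endpoint the app uses.
r = redis.Redis(host="my-redis-host", port=6379)

# Time a batch of GETs; if the average here stays near 1 ms while the app
# reports multi-second fetches, the latency is being added outside Redis.
start = time.time()
for _ in range(1000):
    r.get("some_key")
elapsed = time.time() - start
print(f"avg GET latency: {elapsed / 1000 * 1000:.2f} ms")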
The celery docs don't have much information about the optimal number of celery workers to use per machine. I believe that by default celery creates one worker process per machine core.
I know from experimentation that on a single-core machine, starting more celery workers is definitely beneficial (the default is 1 worker because of the 1 core). I'm looking for the threshold where adding more workers gives marginally diminishing returns, i.e. the optimal number of workers per core. I am currently using a celery daemon, with the daemon config file celeryd containing this line:
CELERYD_NODES="worker1 worker2 worker3"
My intention is to create 3 workers PER CORE (so if I started a 4-core machine, there would be 12 workers total). Am I doing this correctly, or will this only start 3 workers regardless of the number of cores?