I am doing performance testing on an app. I found when the number of virtual users increases, the response time increases linearly(should be natural, right?), but the CPU utilization stops increasing when reaches around 60%. Does it mean the CPU is the bottleneck? If not, what could be the bottleneck?
The bottleneck might or might not be CPU, you need to consider monitoring other OS metrics as well, to wit:
Physical RAM
Swap usage
Network IO
Disk IO
Each of them could be the bottleneck.
Also when you increase number of users ideal system should increase the number of TPS (transactions per second) by the same factor. When you increase virtual users and TPS is not getting increased the situation is called saturation point and you need to find out what is slowing your system down.
If resources utilization is far from 95-100% and your system provides large response times the reason can be non-optimal code of your application or slow database query or something like that, in this case you will need to use profiling tools to get to the bottom of the issue.
See How to Monitor Your Server Health & Performance During a JMeter Load Test article for more information on the application under test monitoring concept
Related
I currently have an Azure App Service API that usually runs extremely low average and max CPU (<10% utilization). Every now and again, the CPU will spike due to a temporary spike in client requests. These spikes seemingly only last for a split second, but I’m wondering if this is cause for concern. What is the result of an Azure App Service CPU maxing out temporarily (either for a split second or for several seconds). Will this cause the app to crash, or will it just buffer requests until intensive tasks complete? It is worth noting that despite the spike in CPU, memory utilization remains low. Thanks in advance for input.
It looks like the CPU load is caused entirely from a large number of intensive requests all coming in at the same time.
When the CPU utilization of your Azure App Service API spikes temporarily, it could cause the app to slow down or become unresponsive, depending on the level and duration of the spike.
The system will try to buffer requests during this time, but if the spike is too high, some requests may be dropped or time out. This can result in a poor user experience or errors, especially if the spike is sustained for an extended period of time.
To mitigate this issue, you can take a few steps.
You can optimize the code of your API to reduce the CPU utilization of each request. This could involve reducing the number of operations performed for each request, using caching or other performance-enhancing techniques, or optimizing the algorithm used for processing requests.
You can also consider scaling up the resources of your App Service, such as increasing the number of CPU cores or adding more memory, to handle the increased load during spikes. Another option is to use horizontal scaling by adding more instances of your API to distribute the load across multiple servers, which can help reduce the impact of spikes.
And you can monitor your App Service to detect and respond to spikes in CPU utilization in real-time.
For example, you can use Azure Monitor to set up alerts that trigger when CPU utilization exceeds a certain threshold, and automatically scale your App Service in response.
By checking the Azure monitor logs and Web App Diagnostics, can find the reasons behind CPU Utilization.
Diagnose and solve problems:
CPU Usage:
CPU Drill Down:
References taken from
Application monitoring for Azure App Service
CPU Diagnostics, Identify and Diagnose High CPU issues
We are running a web API hosted in IIS 10 on an 8 core machine with 16 GB Memory and running Windows 10, and throwing a load of say 100 to 200 requests per second through JMeter on the server.
Individual transactions are taking less than 500 milliseconds. When we throw the load initially, IIS threads grow up to around 150-160 mark (monitored through resource monitor and Performance monitor) and throughput increases up to 22-24 transactions per second but throughput and number of threads stop to grow beyond this point even though the CPU usage is less than 40 per cent and we have enough physical memory also available at the peak, the resource monitor does not show any choking at the network or IO level.
The web API is making calls to the Oracle database (3-4 select calls and 2-3 inserts/updates).
We fail to understand what is stopping IIS to further grow its thread pool to process more requests in parallel while all the resources including processing power, memory, network etc are available.
We have placed many performance counters as well, there is no queue build-up (that's probably because jmeter works in synchronous mode)
Also, we have tried to set the min and max threads settings through machine.config as well as ThreadPool.SetMin and Max threads APIs but no difference was observed and seems like those setting are not taking any effect.
Important to mention that we are using synchronous calls/operations (no asnch and await). Someone has advised to convert all our blocking IO calls e.g. database calls to asynchronous mode to achieve more throughput but my understanding is that if threads cant be grown beyond this level then making async calls might not help or may indeed negatively impact the throughput. Since our code size is huge, that would be a very costly activity in terms of time and effort and we dont want to invest in it till we are sure that it would really help. If someone has anything to share on these two problems, pls do share.
Below is a screenshot of the permanence monitor.
So, i have a nodejs process which is, when scaled consumes around 5-8gb of RAM. It is running within the docker container. launching with the arg --max-old-space-size=12192 to increase the node process limit.
The memory consumption is OK, since i try to use a dedicated server (AMD EPYC CPU, 64GB memory) in place of the horizontal scaling with AWS or other cloud provider, because it is 10x cheaper in case if i can make it work on dedicated server (most of expenses for AWS/Google Cloud goes for network traffic, while the VDS have unlimited. The network side is already optimised with the use of GraphQL and minimising the amount of requests). The process itself processes huge amount of data in memory, in multithreaded fashion. There is no further significant optimisation from the side of the process code itself.
When the process memory consumption reaches 3Gb+, it is significantly slowing down. Docker is not limiting the container resources. The server itself is running on 5-10% load in terms of memory and CPU. SSD driver -> low drive load (low amount of I/O on the server side).
I guess re-writing the app to golang for example might improve it significantly, however that is really a lot of work.
Anything can be done on the server setup / nodejs app side to prevent slowing down?
Thanks!
Solved by horizontal scaling within a single VDS with docker.
As jfriend00 noticed, the nature of the problem might be with the garbage collection - personally have no other guess.
I am doing load testing for one hour and observed one transaction is taking high response time compared to expected value. Why is this happening? What could be the reasons even if GC, thread and system resources (CPU and Memory) utilization are normal.
How to analyze it?
Numerous. The most obvious ones would be:
Slow database query - use a DB Monitoring Tool to see what happens on database level
An issue with your application code (memory leak, large object, "heavy" function) - re-run your test with Profiler Tool telemetry and collect all the information on JVM heap, threads, objects, etc. as you can. A thread dump can shed light on where your application is stuck
It can be even a networking issue, response time includes metrics like Connect Time and Latency (time to first byte) so you can receive higher response times due to low network capacity or even a faulty router
Hey guys I am having a big problem and i need some advice .
I have a Dedicated server with those informations :
Atom C2750 8/8t 2,4 / 2,6 GHz
16 GB RAM DDR3 1600MHz
12TB
500Mbps Bandwidth
List item
1Gbps Network Burst
I am running a website using Nodejs where users can download high volume files .
The website evolved rapidly and I am having 10K users per day and an average of 1K concurrent users (downloads).
The problem is the server is getting lower and lower download speed on client's side so I have added a throttle to the downloads to 800Kb/s , it did help a bit but the problem remains the same what should I ?
Thanks
You have to figure out where your worst bottleneck is. You will have to design some measurements and tests and maybe some calculations to help determine where the bottleneck is.
Here are some possibilities:
Total server bandwidth over your server ethernet connection to the Internet. If you have 1K users all trying to download something and you have 500Mbps total bandwidth, then you're only going to get 0.5Mbps or 500Kbps per user. At that point, you need to either shrink the data, reduce the number of users or increase your server bandwidth.
Server CPU. This should be easy to detect. Check your CPU utilization. If your node.js process is using 100% of a CPU, then you are CPU bound and you either need a faster computer or if you have a multi-CPU processor, you can cluster your server on the same host to get more CPUs working for you. If you don't have a multi-CPU process, then get one or cluster across multiple servers (the idea is you need more CPUs). Though I have no idea if you are CPU bound (I suspect you're more likely bandwidth bound), this Atom C2750 has 8 cores so it would be good for clustering, but each core is not particularly fast compared to other Intel CPUs.
Network card. It's possible that your network card could be holding you back and not fully saturating your bandwidth. For example, if you only had a 100Mbps network connection to your server, then that's the max bandwidth you can use. If you think you should have a 1Gbps network connection to your server, then you need to make sure you actually are getting that fast a link.
FYI, the 1Gbps Network Burst probably doesn't help you much if you have lots of users downloading stuff over a longer period of time. That is most useful for a sudden and short peak of activity, not for a continuous high load.