What to expect if Azure App Service CPU maxes out momentarily? - azure

I currently have an Azure App Service API that usually runs at extremely low average and max CPU (<10% utilization). Every now and again, the CPU will spike due to a temporary spike in client requests. These spikes seemingly only last for a split second, but I’m wondering if this is cause for concern. What is the result of an Azure App Service CPU maxing out temporarily (either for a split second or for several seconds)? Will this cause the app to crash, or will it just buffer requests until intensive tasks complete? It is worth noting that despite the spike in CPU, memory utilization remains low. Thanks in advance for input.
It looks like the CPU load is caused entirely by a large number of intensive requests all coming in at the same time.

When the CPU utilization of your Azure App Service API spikes temporarily, it could cause the app to slow down or become unresponsive, depending on the level and duration of the spike.
The system will try to buffer requests during this time, but if the spike is too high, some requests may be dropped or time out. This can result in a poor user experience or errors, especially if the spike is sustained for an extended period of time.
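The difference between a split-second spike and a sustained one can be illustrated with a toy queue model. This is only a sketch under stated assumptions and not a description of how App Service queues requests internally: the function name `simulate_burst`, the per-tick capacity, and the timeout are all hypothetical values chosen for illustration.

```python
def simulate_burst(arrivals, capacity_per_tick, timeout_ticks):
    """Toy model (hypothetical, not App Service internals).

    arrivals[i] is the number of requests arriving at tick i. Each tick the
    server handles up to `capacity_per_tick` requests in FIFO order. A request
    that has waited more than `timeout_ticks` ticks is counted as timed out.
    Returns (served, timed_out).
    """
    queue = []  # arrival tick of every request still waiting
    served = 0
    timed_out = 0
    tick = 0
    while tick < len(arrivals) or queue:
        if tick < len(arrivals):
            queue.extend([tick] * arrivals[tick])
        # Drop requests that have waited longer than the timeout.
        alive = [t for t in queue if tick - t <= timeout_ticks]
        timed_out += len(queue) - len(alive)
        # Serve the oldest requests first, up to this tick's capacity.
        served += min(capacity_per_tick, len(alive))
        queue = alive[capacity_per_tick:]
        tick += 1
    return served, timed_out


# A one-tick spike is absorbed by the queue; a sustained overload is not.
short_spike = simulate_burst([2, 2, 30, 2, 2], capacity_per_tick=10, timeout_ticks=5)
sustained = simulate_burst([30] * 5, capacity_per_tick=10, timeout_ticks=2)
print("short spike (served, timed_out):", short_spike)
print("sustained   (served, timed_out):", sustained)
```

In this model the brief spike only adds latency, while the sustained overload produces timeouts, which matches the distinction drawn above.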
To mitigate this issue, you can take a few steps.
You can optimize the code of your API to reduce the CPU utilization of each request. This could involve reducing the number of operations performed for each request, using caching or other performance-enhancing techniques, or optimizing the algorithm used for processing requests.
You can also consider scaling up the resources of your App Service, such as increasing the number of CPU cores or adding more memory, to handle the increased load during spikes. Another option is to use horizontal scaling by adding more instances of your API to distribute the load across multiple servers, which can help reduce the impact of spikes.
And you can monitor your App Service to detect and respond to spikes in CPU utilization in real-time.
For example, you can use Azure Monitor to set up alerts that trigger when CPU utilization exceeds a certain threshold, and automatically scale your App Service in response.
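The idea behind a threshold alert can be sketched in plain Python. This is not the Azure Monitor API: `should_alert`, the 80% threshold, and the five-sample window are hypothetical. It only shows why a rule that averages over a window ignores split-second spikes but reacts to sustained load.

```python
def should_alert(cpu_samples, threshold=80.0, window=5):
    """Return True if the mean of the last `window` CPU samples exceeds `threshold`.

    Hypothetical helper for illustration; the sample values are percentages.
    """
    if len(cpu_samples) < window:
        return False  # not enough data to evaluate the rule yet
    recent = cpu_samples[-window:]
    return sum(recent) / window > threshold


# A single 100% sample among idle readings does not trigger the rule...
print(should_alert([5, 6, 100, 7, 5]))
# ...but sustained high usage does.
print(should_alert([90, 95, 100, 92, 88]))
```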
By checking the Azure Monitor logs and Web App Diagnostics, you can find the reasons behind the CPU utilization.
[Screenshots omitted: Diagnose and solve problems; CPU Usage; CPU Drill Down]
References taken from
Application monitoring for Azure App Service
CPU Diagnostics, Identify and Diagnose High CPU issues

Related

IIS - Worker threads not increasing beyond certain number even though the CPU usage is less than 40 percent

We are running a web API hosted in IIS 10 on an 8 core machine with 16 GB Memory and running Windows 10, and throwing a load of say 100 to 200 requests per second through JMeter on the server.
Individual transactions take less than 500 milliseconds. When we first apply the load, the IIS thread count grows to around 150-160 (monitored through Resource Monitor and Performance Monitor) and throughput increases to 22-24 transactions per second. Beyond this point, neither throughput nor the number of threads grows any further, even though CPU usage is below 40 percent and plenty of physical memory is still available at peak. Resource Monitor does not show any choking at the network or I/O level.
The web API is making calls to the Oracle database (3-4 select calls and 2-3 inserts/updates).
We fail to understand what is stopping IIS from growing its thread pool further to process more requests in parallel while all the resources, including processing power, memory and network, are available.
We have also set up many performance counters, and there is no queue build-up (probably because JMeter works in synchronous mode).
We have also tried setting the min and max thread settings through machine.config as well as through the ThreadPool.SetMinThreads and ThreadPool.SetMaxThreads APIs, but no difference was observed and it seems those settings are not taking effect.
It is important to mention that we are using synchronous calls/operations (no async and await). Someone has advised us to convert all our blocking I/O calls, e.g. database calls, to asynchronous mode to achieve more throughput, but my understanding is that if the thread count can't grow beyond this level, making async calls might not help and might even reduce throughput. Since our codebase is huge, that would be very costly in time and effort, and we don't want to invest in it until we are sure that it would really help. If anyone has anything to share on these two problems, please do.
Below is a screenshot of Performance Monitor. [Screenshot omitted]
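One way to sanity-check the figures quoted in this question is Little's Law (requests in the system = throughput × time in the system). The sketch below is arithmetic only, under the loud assumption that every one of the roughly 150 observed threads is holding exactly one in-flight request. That may not be true, since the thread count from Resource Monitor can include threads that are not serving requests. The function names are hypothetical.

```python
def implied_time_in_system(concurrent_requests, throughput_per_sec):
    """Little's Law rearranged: W = L / lambda (seconds per request)."""
    return concurrent_requests / throughput_per_sec


def max_throughput(concurrent_requests, seconds_per_request):
    """Little's Law rearranged: lambda = L / W (requests per second)."""
    return concurrent_requests / seconds_per_request


# Figures from the question; the one-request-per-thread mapping is an assumption.
threads = 150
observed_tps = 23          # midpoint of the reported 22-24 TPS
reported_latency = 0.5     # "less than 500 milliseconds"

print(f"Implied time per request: {implied_time_in_system(threads, observed_tps):.1f} s")
print(f"Throughput {threads} busy threads could sustain at {reported_latency} s each: "
      f"{max_throughput(threads, reported_latency):.0f} TPS")
```

Under that assumption the reported figures are not mutually consistent: 150 busy threads at 23 TPS implies several seconds per request, not 500 ms. That suggests either the threads are not all serving requests, or requests spend much longer in the system than the measured transaction time. It is worth establishing which before investing in an async rewrite.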

100% Memory usage on Azure App Service Plan with two Apps - working set used 10gb+

I've got an app service plan with 14gb of memory - it should be plenty for my application's needs. There are two application services running on it, each identical - the private memory consumption of these hovers around 1gb but can spike to 4gb during periods of high usage. One app has a heavier usage pattern than the other.
Lately, during periods of high usage, I've noticed that the heavily used service can become unresponsive, and memory usage stays at 100% in the App Service Plan.
The high traffic service is using 4gb of private memory and starting to massively slow down. When I head over to the /scm.../ProcessExplorer/ page, I can see that the low traffic service has 1gb private memory used and 10gb of 'Working Set'.
As I understand it, on a single machine at least, the working set should be freed up when that memory is needed on another process. Does this happen naturally when two App Services share a single Plan?
It looks to me like the working set on the low-traffic instance is not being freed up to supply the needs of the high-traffic App Service.
If this is indeed the case, the simple fix is to move them to separate App Service Plans, each with 7gb of memory. However this seems like it might potentially be just shifting the problem around - has anyone else noticed similar issues with multiple Apps on a single App Service Plan? As far as I understand it, these shouldn't interfere with one another to the extent that they all need to be separated. Or have I got the wrong diagnosis?
In some high memory-consumption scenarios, your app might truly require more computing resources. In that case, consider scaling to a higher service tier so the application gets all the resources it needs. Other times, a bug in the code might cause a memory leak, or a coding practice might increase memory consumption. Getting insight into what's triggering high memory consumption is a two-part process: first create a process dump, then analyze it. Crash Diagnoser from the Azure Site Extension Gallery can efficiently perform both steps. For more information, refer to Capture and analyze a dump file for intermittent high memory for Web Apps.
In the end we solved this one via mitigation, rather than getting to the root cause.
We found a mitigation strategy for our previous memory issues several months ago, which was simply to restart the server each night using a PowerShell script. This seems to prevent the memory from building up over time, and only costs us a few seconds of downtime. Our system doesn't have much overnight traffic as our users are all based in the same geographic location.
However, we recently found that the overnight restart was reporting 'success' but actually failing each night due to expired credentials. This meant that the memory issues described in the question I posted were actually exacerbated by server uptimes of several weeks. Restoring the overnight restart resolved the memory issues we were seeing, and we certainly don't see our system ever using 10gb+ again.
We'll investigate the memory issues if they rear their heads again. KetanChawda-MSFT's suggestion of using memory dumps to analyse the memory usage will be employed for this investigation when it's needed.
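The silent failure described above (a restart job that reports success without restarting anything) can be guarded against by checking the outcome rather than the job's status. The sketch below is a hypothetical illustration in Python, not the PowerShell script from this answer: `restart_verified` and the ten-minute tolerance are assumptions, and how the process start time is obtained is deliberately left out.

```python
from datetime import datetime, timedelta


def restart_verified(process_started_at, restart_requested_at,
                     tolerance=timedelta(minutes=10)):
    """Return True only if the process start time falls after the restart
    request and within `tolerance` of it (hypothetical check)."""
    return restart_requested_at <= process_started_at <= restart_requested_at + tolerance


requested = datetime(2024, 1, 10, 2, 0, 0)

# Restart really happened: the process started shortly after the request.
print(restart_verified(datetime(2024, 1, 10, 2, 0, 45), requested))
# Restart silently failed: the process has been up since weeks before.
print(restart_verified(datetime(2023, 12, 20, 2, 0, 30), requested))
```

A check of this kind would fail loudly on the first night the credentials expired, instead of letting uptime accumulate for weeks.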

What would cause high KUDU usage (and eventual 502 errors) on an Azure App Service Plan?

We have a number of API apps and WebApps on an Azure App Service P2v2 instance. We've been experiencing an amount of platform instability: the App Service becomes unhealthy and we get a rash of 502 errors across various of the Apps (different ones each time), attributable to very high CPU and Memory usage on the app service. We've tried scaling all the way up to P3v2, but whatever the issue is seems eventually to consume all resources available.
Whenever we've been able to trace a culprit among the apps, it has turned out not to be the app itself but the Kudu service related to it.
A sample error message is: "High physical memory usage detected on multiple occasions. The kudu process for the app [sitename]'pe-services-color' is the most common cause of high memory usage. The most common cause of high memory usage for the kudu process is web jobs." The app whose Kudu service is named in the message changes quite frequently.
What could be causing the Kudu services to consume so much CPU/Memory, and what can we do to stabilise this app service?
Is it simply that we have too many apps running on one plan? This seems unlikely since all these apps ran previously on a single classic cloud service instance, but if so, what are the limits for apps and slots on a single plan?
(I have seen this question but the answer doesn't help)
Update
From Azure support, these are apparently the limits on Small - Medium - Large non-shared app services:
Worker Size   Max sites
Small         5
Medium        10
Large         20
with 'sites' comprising app services/api apps and their slots.
They seem ridiculously low, and make the larger App Service units highly uneconomic. Can anyone confirm these numbers?
(Incidentally, we found that turning off Always On across the board fixed the issue - it was only causing a problem on empty sites though - we haven't had a chance yet to see if performance is good with all the sites filled.)
High CPU and memory utilization is most often caused by the application code itself. A large number of CPU-intensive tasks, or heavy use of parallel programming that spawns many new threads, can drive up both CPU and memory utilization, so review your code for such cases. As the number of parallel threads grows, CPU utilization rises and the plan may start scaling up frequently, which adds to your cost and can sometimes lead to lost threads and unexpected results. Because Azure resources are expensive, plan your performance accordingly.
You can monitor this using the Metrics option in the App Service plan blade.

CPU utilization in performance testing

I am doing performance testing on an app. I found that when the number of virtual users increases, the response time increases linearly (which should be natural, right?), but the CPU utilization stops increasing when it reaches around 60%. Does this mean the CPU is the bottleneck? If not, what could be the bottleneck?
The bottleneck might or might not be CPU, you need to consider monitoring other OS metrics as well, to wit:
Physical RAM
Swap usage
Network IO
Disk IO
Each of them could be the bottleneck.
Also, when you increase the number of users, an ideal system should increase the number of TPS (transactions per second) by the same factor. The point at which you add virtual users and TPS no longer increases is called the saturation point, and you need to find out what is slowing your system down.
If resource utilization is far from 95-100% and your system still shows long response times, the reason may be inefficient application code, a slow database query, or something similar. In this case you will need to use profiling tools to get to the bottom of the issue.
See the article How to Monitor Your Server Health & Performance During a JMeter Load Test for more information on monitoring the application under test.
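The saturation point described above can be located from load-test results with a few lines of Python. This is a minimal sketch: `find_saturation_point`, the 5% minimum gain, and the sample measurements are all hypothetical, and real data would come from your load-testing tool.

```python
def find_saturation_point(measurements, min_gain=0.05):
    """measurements: list of (virtual_users, tps) sorted by users ascending.

    Returns the user count after which adding more users raised TPS by less
    than `min_gain` (as a fraction of the previous TPS), or None if TPS kept
    growing across every step.
    """
    for (prev_users, prev_tps), (_, tps) in zip(measurements, measurements[1:]):
        if prev_tps > 0 and (tps - prev_tps) / prev_tps < min_gain:
            return prev_users
    return None


# Made-up results: TPS scales with users up to 40, then flattens.
results = [(10, 100), (20, 200), (40, 390), (80, 400), (160, 395)]
print("Saturation at about", find_saturation_point(results), "virtual users")
```

Once the saturation point is known, the OS metrics listed earlier can be inspected at that load level to see which resource, if any, is exhausted.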

Limit CPU time given to an azure web app

I have an app on Azure. Once in a while the CPU usage jumps and stays high to the point of unresponsiveness.
I would like to be able to profile the process to see what work the CPU is doing (no problem doing this normally). However, any action to do with the app "container" is super slow because the app is taking up all the CPU time.
Is there a way to limit the app process to, let's say, 80% of the CPU time so that I can get a reading on the cause of the problem?
