Azure AppService Performance Issue - azure

We have ASP.Net Core 2.1 Web API hosted in AppService (S1) that talks to Azure SQL DB (S1-20DTUs). Both are in same region. During load testing we found that some API instances are taking too much time to return the result.
We tried to troubleshoot the performance issue and below are our observations.
API responds within 0.5 secs most of the time.
API methods are all async methods.
Sometimes it takes around 50 secs to over a minute.
CPU & Memory utilization are below 60%
Database has 20 DTU capacity, out of which 6 DTUs are used during load testing.
In the below example snapshot from Application Insights, we see total duration of the request was 27.4 secs. But the database dependency duration was just 97ms. There is no activity till the database was called. Please refer below example.
Can someone please help me to understand what was happening in this 27 secs of wait time. What could be the reason for this?

I would recommend checking the Application Map on Application Insights resource as shown below to double check the dependencies.
Verify the CPU and Memory metrics by going to the "Diagnose and solve problems" link on App service as shown below and run the Availability and Performance report to find out if there were any issues during your load testing.
Use Async methods on your API to maximize the CPU usage. It may be that the worker process threads are hitting the limits and your app is the bottleneck. You should get some insights when you run the report mentioned in point 2 above.

The S1 tier will support no more than 900 concurrent session. If you request per second (RPS rate) during the load test is very high you may face issues.
Also S3 and above are recommended for intensive workloads. Checking if all the connections are closed properly also helps
You can find details about different pricing tiers and their capabilities in the below link
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-dtu-resource-limits-single-databases

Related

What is the optimal architecture design on Azure for an infrequently used backend that needs a robust configuration?

I'm trying to find the optimal cloud architecture to host a software on Microsoft Azure.
The scenario is the following:
A (containerised) REST API is exposed to the users through which they can submit POST and GET requests. POST requests trigger a backend that needs a robust configuration to operate properly and GET requests are sent to fetch the result of the backend, if any. This component of the solution is currently hosted on an Azure Web App Service which does the job perfectly.
The (containerised) backend (triggered by POST requests) perform heavy calculations during a short amount of time (typically 5-10 minutes are allotted for the calculation). This backend needs (at least) 4 cores and 16 Gb RAM, but the more the better.
The current configuration consists in the backend hosted together with the REST API on the App Service with a plan that accommodates the backend's requirements. This is clearly not very cost-efficient, as the backend is idle ~90% of the time. On top of that it's not really scalable despite an automatic scaling rule to spawn new instances based on the CPU use: it's indeed possible that if several POST requests come at the same time, they are handled by the same instance and make it crash due to a lack of memory.
Azure Functions doesn't seem to be an option: the serverless (consumption plan) solution they propose is restricted to 1.5 Gb RAM and doesn't have Docker support.
Azure Container Instances neither, because first the max number of CPUs is 4 (which is really few for the needs here, although acceptable) and second there are cold starts of approximately 2 minutes (I imagine due to the creation of the container group, pull of the image, and so on). Despite the process is async from a user perspective, a high latency is not allowed as the result is expected within 5-10 minutes, so cold starts are a problem.
Azure Batch, which at first glance appears to be a perfect fit (beefy configurations available, made for hpc, cost effective, made for time limited tasks, ...) seems to be slow too (it takes a couple of minutes to create a pool and jobs don't run immediately when submitted).
Do you have any idea what I could use?
Thanks in advance!
Azure Functions
You could look at Azure Functions Elastic Premium plan. EP3 has 4 cores, 14GB of RAM and 250GB of storage.
Premium plan hosting provides the following benefits to your functions:
Avoid cold starts with perpetually warm instances
Virtual network connectivity.
Unlimited execution duration, with 60 minutes guaranteed.
Premium instance sizes: one core, two core, and four core instances.
More predictable pricing, compared with the Consumption plan.
High-density app allocation for plans with multiple function apps.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-premium-plan?tabs=portal
Batch Considerations
When designing an application that uses Batch, you must consider the possibility of Batch not being available in a region. It's possible to encounter a rare situation where there is a problem with the region as a whole, the entire Batch service in the region, or your specific Batch account.
If the application or solution using Batch always needs to be available, then it should be designed to either failover to another region or always have the workload split between two or more regions. Both approaches require at least two Batch accounts, with each account located in a different region.
https://learn.microsoft.com/en-us/azure/batch/high-availability-disaster-recovery

App services on azure seems to be very slow

I am trying to track down when our frontend started to work that slow. Recently I created new app services within the same service plan.
so now I have six apps (2 frontend, 4 backend) running under same App Service plan using Basic pricing tier. Also, we use Kudu for deployments.
Could that be the reason? or how to look for the reason?
this is overview of that service plan
appreciating any ideas and suggestions
#user122222 This is a high CPU issue and not a slow request issue as others have pointed out.
An immediate action you can take is to scale up. If you are using a B1 instance in the basic tier, try to scale up to a B3, which will provide you with more CPU cores and RAM. See if that provides you relief. If so, then you likely need to remain at this instance level. At this point it would also be worth while to analyze your number of requests. You should scale up when you are running many sites or resource intensive sites and you should scale out when you are receiving a high number of requests.
My money is on the fact that you likely have an issue with your code that is causing a deadlock or similar. Your CPU usage graph is stuck at 100% usage over many hours. Even an overloaded ASP will see a few dips over the course of a few hours.
To troubleshoot high CPU usage, start by using the diagnose and solve problems blade in your app service plan. This is the same troubleshooting tool that a support engineer would use in a paid technical support case. Use it to troubleshoot high CPU (not slow requests as based on your screenshot, it would appear the CPU is the culprit of the slow requests).
This can tell you what app in the ASP is causing the issue and sometimes even tell you the process in that app that is causing the issue. Beyond this, I'd suggest creating and analyzing a memory dump of the problematic web app. More steps on how to do that here.
Please try to restart the worker instance.
https://learn.microsoft.com/en-us/rest/api/appservice/app-service-plans/reboot-worker#code-try-0

How do I reduce cpu percentage in portal azure?

Right now my website is slow and when I see the xxxx-cd-hp it looks like picture below. CPU: 90, 36%. Is this still normal?
Apparently at certain times, CPU percentage increased. Maybe because many users have access
How can I solve this problem?
CPU time or process time is an indication of how much processing time on the CPU, a process has used since the process has started and CPU Percentage = Process time/Total CPU Time* 100
Suppose If the process has been running for 5 hours and the CPU time is 5 hours, and it is a single core machine, then that means that the process has been utilizing 100% of the resources of the CPU. This may either be a good or bad thing depending on whether you want to keep resource consumption low or want to utilize the entire power of the system.
App Service Diagnostics is an intelligent and interactive experience to help you troubleshoot your app with no configuration required. When you run into issues with your app, App Service Diagnostics points out what’s wrong to guide you to the right information to more easily troubleshoot and resolve issues. To access App Service diagnostics, navigate to your App Service web app in the Azure portal. In the left navigation, click on Diagnose and solve problems.

ASP.NET Core 2.2 experiencing high CPU usage

So I have hosted asp.net core 2.2 web service on Azure(S2 plan). The problem is that my application sometimes getting high CPU usage(almost 99%). What I have done for now - checked process explorer on azure. I see there a lot of processes who are consuming CPU. Maybe someone knows if it's okay for these processes consume CPU?
Currently, I don't have an idea where do they come from. Maybe it's normal to have them here.
Shortly about my application:
Currently, there is not much traffic. 500-600 request in a day. Most of the request is used to communicate with MS SQL by querying records, adding, etc.
As well I am using MS Websocket, but high CPU happens when no WebSocket client is connected to web service, so I hardly believe that it's a cause. I tried to use apache ab for load testing, but there isn't any pattern, that after one request's load test, I would get high CPU. So sometimes happens, sometimes don't during load testing.
So I just update screenshot of processes, I see that lots of threads are being locked/used during the time when fluent migrator start running its logging.
Update*
I will remove fluent migrator logging middleware from Configure method. Will look forward with the situation.
UPDATE**
So I removed logging of FluentMigrator. Until now I didn't notice any CPU usage over 90%.
But still, I am confused. My CPU usage is spinning. Is it health CPU usage graph or not?
Also, I tried to make a load test on the websocket server.
I made a script that calls some functions of WebSocket every 100ms from 6-7 clients. So every 100ms there are 7 calls to WebSocket server from different clients, every function within itself queries some data/insert (approximately 3-4 queries of every WebSocket function).
What I did notice, on Azure S1 DTU 20 after 2min I am getting out of SQL pool connections, If I increase DTU to 100, it handles 7 clients properly without any errors of 'no connection pool'.
So the first question: is it a normal CPU spinning?
Second: should I get an error message of 'no SQL connection free' using this kind of load test on DTU 10 Azure SQL. I am afraid that when creating a scoped service on singleton WebSocket Service I am leaking connections.
This topic gets too long, maybe I should move it to a new topic?
-
At this stage I would say you need to profile your application and figure out what areas of your code are CPU intensive. In the past I have used dotTrace, this highlighted methods which are the most expensive with a call tree.
Once you know what areas of your code base are the least efficient, you can begin to refactor them so that they are more efficient. This could simply be changing some small operations, adding caching for queries or using distributed locking for example.
I believe the reason the other DLLs are showing CPU usage is because your code calling methods which are within those DLLs.

Occasional delays in response from azure cloud service

I maintain an azure cloud service. It is set to auto-scale based on load. To monitor the health of this service I have another service which pings this service every 2 minutes. The usual response time from this service is around 100ms.
Once or twice a week I see that the service does not respond. It is not really a worry for me - because it happens quite infrequently. I still am trying to figure out what could be causing the service to not respond. I do not think the problem is with the pinging service - I don't see any of the other services (not on azure, but on other servers) that it pings having any issues.
What could be causing these occasional delays. Any other azure service owners seeing such delays ?
Having quite similar problems. But I use Applications Inside, so I have some statistics. For example that reponse time increases together with SQL azure access time and CPU usage. My average response time according to Applications Inside is about 600ms and average RPS is about 0,6. During these problems RPS usually higher than avarage - up to 1.5, but average response time grows up to 1min! (During the day my RPS can grow up to 3 or even higher without any reponse time growth). As I have 1min sql connection timeout and I have drammatical growth of total SQL azure access time during this periods I can assume that problem happens bacause of SQL Azure. This also happens once a day or two, for about 10-15 minutes max and my ping service also always reports that service doesn't respond.
So my advice here - install Application Insights to analyze what happens dusring these response delays. It would be great if you share your results here.
P.S. I also use autoscale based on load. Though it doesn't really help in these concrete situations.

Resources