Azure App Service throttling? - azure

I'm facing an issue with my MVC app service deployed on Azure.
My MVC action method receives requests and, depending on the parameters in the query string, redirects to external URLs.
The usual response time is a few milliseconds, but some requests take much longer.
The action method is really simple, with very little logic in it, so it can be summarized as follows:
    public ActionResult performRedirect(string id)
    {
        System.Diagnostics.Trace.TraceInformation("start");
        if (id == "1")
            return Redirect("http://URLA");
        else if (id == "2")
            return Redirect("http://URLB");
        else
            return Redirect("http://URLC");
    }
My application uses Application Insights, so I analyzed the telemetry and found that whenever there is a "slow" request, there is a delay between the time the request is handled by the action method and the "start" diagnostic trace (up to 10 seconds!).
My question is: why is this happening? Is it because an increase in requests to the action cannot be handled, so an incoming queue has to be drained? Should I scale up the resource (I'm currently using S1 with 2 instances)?

There is no single explanation for this performance issue. It may be caused by several things, such as bandwidth restrictions or resource limits.
You could try troubleshooting with the article. There are also some ways to mitigate the issue:
Scale the web app
Use AutoHeal
Restart the web app
Also, here is a similar issue that you could refer to.

Related

Azure slow communication between APIs

In roughly 1-5% of our requests, we are seeing slow communication between APIs (REST API requests). Both APIs are developed by us and hosted on Azure, each app service on its own App Service plan in the same region, P1v2 tier.
What we are seeing on application insights is that POST or GET requests on origin API can take a few seconds to execute, while real execution time on destination API is only a few milliseconds.
Examples (first line POST request on origin, second execution time on destination API): slow req 1, slow req 2
Our best guess is that the time difference is lost in communication between components. We don't have an explanation for it since the payload is really small and in most cases, communication takes less than 5 milliseconds.
We have ruled out a component cold start as the explanation, since it happens under constant load and no horizontal scaling was performed.
Do you have any idea what might cause it or how to do additional analysis in order to discover it?
If you're running multiple sites on the App Service Plan, enable the "Always On" setting: open your web app > All Settings > Application Settings and turn on Always On.
See here for details: https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
When Always On is off, the site is shut down after 20 minutes of inactivity to free up resources for any additional websites that might be using the same App Service Plan.
The site then needs time to collect, process, and present the information it requires, which involves internal calls as well. Depending on server load and usage, this can take around 6 to 7 seconds, sometimes more.
To troubleshoot that latency, try these steps provided by Microsoft.

High response duration on first request for .net core api on Azure

I have deployed a .NET Core API to Azure as an App Service.
I have set the Always On feature to true.
When I log the requests, I see that the Azure Always On requests arrive every 5 minutes.
I call the API over HTTPS, but the Always On requests are sent over HTTP. I don't know if this is the cause.
The first request sometimes takes 10 seconds, but subsequent requests take around 100 ms.
What is missing here?
I have logged the durations:
There are quite a few reasons why this might be the case:
You're connecting to resources that take time to connect to the first time
Some information is being cached and needs to be read the first time
There is initialization code present
Lazy instantiation of (static/singleton) instances
... other ...
Add some logging to your application, enable Application Insights if you haven't done so already, and try to find the culprit.
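To see how lazy initialization alone can produce a slow first request, here is a minimal sketch. `WarmupDemo`, `TimedGet`, and the 200 ms `Thread.Sleep` are hypothetical stand-ins for whatever your API initializes on first use (connections, caches, singletons); they are not taken from the question's code.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class WarmupDemo
{
    // Counts how many times the expensive initializer actually ran.
    public static int InitCount;

    // Hypothetical expensive dependency: the Sleep stands in for opening
    // connections or filling caches the first time the value is needed.
    private static readonly Lazy<string> Resource = new Lazy<string>(() =>
    {
        Interlocked.Increment(ref InitCount);
        Thread.Sleep(200);
        return "ready";
    });

    // Returns the resource value and how long this particular call took.
    public static (string Value, long ElapsedMs) TimedGet()
    {
        var sw = Stopwatch.StartNew();
        string value = Resource.Value;
        sw.Stop();
        return (value, sw.ElapsedMilliseconds);
    }
}
```

Calling `WarmupDemo.TimedGet()` twice should show the first call paying the initialization cost and the second returning almost immediately. Logging timings like this around each suspected dependency is one way to narrow down which of the reasons above applies.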

Azure Function with ServiceBusTrigger circuit breaker pattern

I have an Azure function with ServiceBusTrigger which will post the message content to a webservice behind an Azure API Manager. In some cases the load of the (3rd party) webserver backend is too high and it collapses returning error 500.
I'm looking for a proper way to implement circuit breaker here.
I've considered the following:
Disable the Azure function, but this might result in data loss due to multiple messages held in memory (serviceBus.prefetchCount)
Implement a rate-limit policy in API Manager, but this seems counterproductive as it runs fine in most cases
Re-architecting the 3rd party webservice is out of scope :)
Set the queue to ReceiveDisabled. This is the preferred solution, but it results in my InputBinding throwing a huge number of MessagingEntityDisabledExceptions, which I'm (so far) unable to catch and handle myself. I've checked the docs for host.json, ServiceBusTrigger and the Run parameters but was unable to find a useful setting there.
Keep some sort of record of response codes and increase the retry time; not ideal in a serverless scenario with multiple parallel functions.
Let API Manager map 500 errors to 429 and reschedule those later. This will probably work, but since we send a lot of messages it will hammer the service for some time. In addition, it's hard to distinguish between a temporary 500 error and a consecutive one.
Note that this question is not about deciding whether or not to trigger the circuit breaker, merely about how to handle the appropriate action afterwards.
Additional info
Azure Functions V2, .NET Core 3.1, running in the consumption plan
API Manager runs Basic SKU
Service Bus runs in premium tier
Messagecount: 300.000
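
For readers unfamiliar with the pattern, the breaker state itself can be sketched as below. This is a minimal, single-threaded illustration and not a Functions or Service Bus API: `CircuitBreaker`, `AllowRequest`, `RecordSuccess`, `RecordFailure`, and the injectable clock are all hypothetical names. On a consumption plan with parallel instances, the state would also have to live somewhere shared, which this sketch does not cover.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Minimal circuit breaker sketch (hypothetical, not thread-safe).
public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private readonly Func<DateTimeOffset> _clock;
    private int _consecutiveFailures;
    private DateTimeOffset _openedAt;

    public CircuitState State { get; private set; } = CircuitState.Closed;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration,
                          Func<DateTimeOffset> clock = null)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
        _clock = clock ?? (() => DateTimeOffset.UtcNow);
    }

    // True if the caller may send the message to the backend now.
    public bool AllowRequest()
    {
        // After the open period, allow trial calls until a result is recorded.
        if (State == CircuitState.Open && _clock() - _openedAt >= _openDuration)
            State = CircuitState.HalfOpen;
        return State != CircuitState.Open;
    }

    public void RecordSuccess()
    {
        _consecutiveFailures = 0;
        State = CircuitState.Closed;
    }

    public void RecordFailure()
    {
        _consecutiveFailures++;
        // A failed trial call, or too many failures in a row, opens the circuit.
        if (State == CircuitState.HalfOpen || _consecutiveFailures >= _failureThreshold)
        {
            State = CircuitState.Open;
            _openedAt = _clock();
        }
    }
}
```

What to do while `AllowRequest()` returns false, which is what the question is really asking, is left to the caller.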

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that they should spawn as many instances as are needed to handle the load sent to them. But this function app might not be scaling out quickly enough, as it's only handling those 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You don't provide much context about your app, but here are a few steps to improve it:
If you want more control, use an App Service plan with Always On to avoid cold starts. You will also need to configure autoscaling yourself, since in this plan you are responsible for scaling and autoscale is not enabled by default.
Your Azure function must be fully async, since you have external dependencies and you don't want to block a thread while calling them.
Look at the limits. You can tweak them in host.json.
A 429 error means the function is too busy to process your request, so you are probably not using async when writing to the table and are blocking a thread.
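As a rough illustration of the "fully async" advice, the sketch below awaits both external calls without blocking a thread. `SaveToTableAsync` and `SendToQueueAsync` are hypothetical stand-ins that simulate I/O with `Task.Delay`; they are not the storage or queue SDK calls. Starting both operations before awaiting them is a design choice of this sketch, not something the question's code is known to do.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncFanOut
{
    // Hypothetical stand-in for the table write; Task.Delay simulates
    // I/O latency without blocking a thread.
    public static async Task<string> SaveToTableAsync(string body)
    {
        await Task.Delay(100);
        return "table:" + body;
    }

    // Hypothetical stand-in for the queue send.
    public static async Task<string> SendToQueueAsync(string body)
    {
        await Task.Delay(100);
        return "queue:" + body;
    }

    // Starts both operations, then awaits them together, so the request
    // costs roughly the slower of the two rather than their sum.
    public static async Task<string[]> HandleAsync(string body)
    {
        Task<string> table = SaveToTableAsync(body);
        Task<string> queue = SendToQueueAsync(body);
        return await Task.WhenAll(table, queue);
    }
}
```

The point to avoid is calling `.Result` or `.Wait()` on these tasks inside the function, which holds a thread for the whole duration of the I/O.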
Function apps work very well and scale as advertised. The cause could be that the requests come from a single IP and Azure may be treating them as a DDoS. You can do the following:
Azure DevOps Load Test
You can load test using one of the Azure services; I am quite sure they handle IPs better. Azure DevOps Load Test
Provision VM in Azure
What I normally do is provision a VM (Windows 10 Pro) in Azure and use JMeter to load test. I have used this method and it works fine. You can provision a couple of them and split the load between them.
Use professional Load testing services
If possible, you may use services like Loader.io. They use sophisticated algorithms to run the load test and provision a bunch of VMs to run the same test.
Use Application Insights
If you aren't already, you should use Application Insights to get a better view from the server's perspective. Go to the live stream and see how many instances it provisions to handle the load test. You can easily look into any events and error logs that arise, and dig into each associated dependency to investigate the problem.

TelemetryClient produces inconsistent results in Application Insights

I tried tracking custom metrics with and without flushing. However, the metrics only intermittently show up in Application Insights under the "Custom" section. First question: Is it required to run "flush()" after every single "TrackMetric(metric)" call in order for the telemetry to be sent to Application Insights? Second: Why is there this intermittent behavior? I'm only writing one metric at a time, so it's not as if I'm overloading Application Insights with thousands of separate calls. Here is my code (this is from a simple Console App):
    using Microsoft.ApplicationInsights;
    using Microsoft.ApplicationInsights.DataContracts;

    public class Program
    {
        public static void Main(string[] args)
        {
            var telemetryClient = new TelemetryClient()
            {
                Context = { InstrumentationKey = "{{hidden instrumentation key}}" }
            };
            var metric = new MetricTelemetry
            {
                Name = "ImsWithContextMetric2",
                Sum = 42.0
            };
            telemetryClient.TrackMetric(metric);
            telemetryClient.Flush();
        }
    }
I'm also getting this strange behavior in Application Insights in which the custom metric I add shows up under an "Unavailable/deprecated Metrics" section. And a metric that I didn't even add called "Process CPU (all cores)" pops up under the "Custom" section. Any ideas why this strange behavior would occur?
Is it required to run "flush()" after every single "TrackMetric(metric)" call in order for the telemetry to be sent to Application Insights?
Since you are using a Console Application to send events to Application Insights, which might be short-lived, it is definitely a good practice to call .Flush() every once in a while. The SDK uses the InMemoryChannel to send telemetry, and it sends items in batches from an in-memory queue. So it is very important to call .Flush() so that the data is forcibly pushed. A good practice might be to add a short wait after the flush:
    telemetryClient.Flush();
    Thread.Sleep(1000);
More reading: Flushing data, Ensure you don't lose telemetry
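To make the batching behavior concrete, here is a deliberately simplified model. `BufferedChannel` is hypothetical and is not the Application Insights SDK (the real channel also sends on a timer, which this model ignores); it only illustrates why an item tracked just before the process exits never leaves the buffer unless something flushes it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of a batching telemetry channel.
public sealed class BufferedChannel
{
    private readonly List<string> _buffer = new List<string>();
    private readonly int _batchSize;

    // Items that have actually been "transmitted".
    public List<string> Sent { get; } = new List<string>();

    // Items still waiting in memory.
    public int Pending => _buffer.Count;

    public BufferedChannel(int batchSize) { _batchSize = batchSize; }

    // Queues an item; transmits only once a full batch has accumulated.
    public void Track(string item)
    {
        _buffer.Add(item);
        if (_buffer.Count >= _batchSize) Flush();
    }

    // Pushes everything still buffered, regardless of batch size.
    public void Flush()
    {
        Sent.AddRange(_buffer);
        _buffer.Clear();
    }
}
```

With a batch size of 3, a single `Track` call leaves the item pending; only `Flush()` moves it to `Sent`. A console app that tracks one metric and exits is in exactly that position.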
However, the metrics only intermittently show up in Application Insights under the "Custom" section. Why is there this intermittent behavior? I'm only writing one metric at a time, so it's not as if I'm overloading Application Insights with thousands of separate calls.
Sometimes there is a delay before metrics show up in the Azure Portal. It can be up to a few minutes. But if you have set everything up correctly, you aren't exceeding the throttling limit, and adaptive sampling is disabled, then there is no reason for telemetry to be intermittent. If you still feel something is wrong, start a Fiddler trace (make sure you are capturing non-browser sessions) and check whether a call is going out to dc.services.visualstudio.com. Make sure the response is 200 OK and that the items were accepted by the server.
I'm also getting this strange behavior in Application Insights in which the custom metric I add shows up under an "Unavailable/deprecated Metrics" section.
What version of the SDK are you using? I just tried out the same scenario and the custom metrics are showing up correctly.
And a metric that I didn't even add called "Process CPU (all cores)" pops up under the "Custom" section.
"Process CPU" is a performance counter which is used to track CPU utilization. I believe the SDK will only be able to track these counters if the app is running under IIS or on Azure. It probably got added internally when you created your Application Insights resource. You can ignore it since it won't have data to chart.
Hope this helps!

Resources