Azure API Manager increased response time when function scales - azure

After doing some load testing with a an azure function on the consumption plan that scales I ran into an interesting scenario. Right now I have a function app that is expecting an http post behind an api management instance. The function app does some processing and returns a response to the caller. The API manager has no caching for the time being since we are expecting the process to just pass through the body through for processing.
When I call the function directly, the app will scale appropriately and I see SOME cold start behavior from the function app, but the average response time is sub 200 ms from a local request to azure for a sustained load.
When I call the function through the api manager and have a sustained load on the instance I start seeing 30-60 second response times around the 5 minute mark of the load test.
The load is 2 requests per second and fully asynchronous so there is no blocking. Additionally, there are no database calls, this is solely a compute function.
Has anyone else seen this behavior?

It seems to be a setting or LB problem. What you can do is reduce the functionTimeout setting in your host.json (e.g. 20 seconds) and then, add a retry policy on API Management.
https://learn.microsoft.com/en-us/azure/api-management/api-management-advanced-policies#example-7

Related

Azure Functions service not recognizing request sent from outside client

We have a service which pings our EP1 Premium service and yesterday we received 3 client side timeout errors after 2 minutes of waiting. When opening the trace in App insights, these requests which time out are not even logged and have no trace of ever being received Azure side, and therefore stay unanswered. By looking at the metrics provided in the Azure Functions app, I found out that 1-2 minutes after the request has been sent, the app loses all its ability to work as its Total App Domains falls to 0 as well as all connections, threads and so on and this state lasts until the next request is received, therefore "skipping" the request that happened beforehand. This is a big issue as I need to make sure requests get answered in a timely manner.
The client service sent HTTP requests to the Azure Functions app expecting an answer, only to time out while the Azure-side doesn't have any record of ever receiving the request.
I believe this issues is related to Consumption Plan of Azure Functions called Cold Start behaviour. The "skipping" mechanism is explained below:
Apps may scale to zero when idle, meaning some requests may have additional latency at startup. The consumption plan does have some optimizations to help decrease cold start time, including pulling from pre-warmed placeholder functions that already have the function host and language processes running.https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#cold-start-behavior
Please also consider of having look on this article, which explains the behaviour. https://azure.microsoft.com/en-us/blog/understanding-serverless-cold-start/

Azure slow communication between APIs

In some 1-5% of our requests, we are seeing slow communication between APIs (REST API requests). Both APIs are developed by us and hosted on Azure, each app service on its own app service plan in the same region, P1v2 tier.
What we are seeing on application insights is that POST or GET requests on origin API can take a few seconds to execute, while real execution time on destination API is only a few milliseconds.
Examples (first line POST request on origin, second execution time on destination API): slow req 1, slow req 2
Our best guess is that the time difference is lost in communication between components. We don't have an explanation for it since the payload is really small and in most cases, communication takes less than 5 milliseconds.
We dismiss the possible explanation it could be due to component cold start since it happens during constant load and no horizontal scaling was performed.
Do you have any idea what might cause it or how to do additional analysis in order to discover it?
If you're running multiple sites on the App Service Plan, then enable the "Always On" setting for your web app > All Settings > Application Settings > Click on Always On
See here for details: https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
When Always On is off, the site is shut down after 20 minutes of inactivity to free up resources for any additional websites that might be using the same App Service Plan.
The amount of information it needs to collect, process and then present itself requires some time, and involve internal calls as well, that is why considering the server load and usage, it takes around 6 to 7 seconds sometimes even more.
To Troubleshoot that latency, try this steps, provided by Microsoft.

High response duration on first request for .net core api on Azure

I have deployed a .Net Core API to Azure as an App Service.
I have set the Always on feature to true.
When I log the requests, I see that Azure Always on requests are coming every 5 minutes.
My usage with API is HTTPS but Always on requests are sending with HTTP. I don't know if this is the case
For the first request, it is sometimes 10 seconds, but after the first request, it is around 100ms.
What is missing here?
I have logged the durations:
There are quite a few reasons why this might be the case:
You're connecting to resources that take time connecting to the first time
Some information is being cached and needs to be read the first time
There is initialization code present
Lazy instantiation of (static/singleton) instances
... other ...
Add some logging to your application, maybe enable Application Insights if you haven't done so already and go try to find the culprit.

Azure Function with ServiceBusTrigger circuit breaker pattern

I have an Azure function with ServiceBusTrigger which will post the message content to a webservice behind an Azure API Manager. In some cases the load of the (3rd party) webserver backend is too high and it collapses returning error 500.
I'm looking for a proper way to implement circuit breaker here.
I've considered the following:
Disable the azure function, but it might result in data loss due to multiple messages in memory (serviceBus.prefetchCount)
Implement API Manager with rate-limit policy, but this seems counter productive as it runs fine in most cases
Re-architecting the 3rd party webservice is out of scope :)
Set the queue to ReceiveDisabled, this is the preferred solution, but it results in my InputBinding throwing a huge amount of MessagingEntityDisabledExceptions which I'm (so far) unable to catch and handle myself. I've checked the docs for host.json, ServiceBusTrigger and the Run parameters but was unable to find a useful setting there.
Keep some sort of responsecode resultset and increase retry time, not ideal in a serverless scenario with multiple parallel functions.
Let API manager map 500 errors to 429 and reschedule those later, will probably work but since we send a lot of messages it will hammer the service for some time. In addition it's hard to distinguish between a temporary 500 error or a consecutive one.
Note that this question is not about deciding whether or not to trigger the circuitbreaker, merely to handle the appropriate action afterwards.
Additional info
Azure functionsV2, dotnet core 3.1 run in consumption plan
API Manager runs Basic SKU
Service Bus runs in premium tier
Messagecount: 300.000

Is there any method to increase the request time out for a rest api call in Azure Logic App?

I'm creating a Recurrence Azure Logic App for calling a rest API one after another in the interval of 30 minutes. The scenario is, once I call the primary API, after the response comes as 200 (status code) then only the next API should be called. But the issue I'm facing is my primary API is taking about 3 mins 40 seconds to fetch the data. So every time the request fails as by default the request timeout is 2 minutes.
Please suggest a way to create this logic app.
As of now, you don't have any option in the Logic App HTTP action to configure the timeout duration, so it is probably not possible at the moment.
May be you should consider using Durable Function with the Function Chaining Pattern.

Resources