Azure Timer Function connection limits

I have an Azure Timer function that executes every 15 minutes. The function compiles data from 3 data sources (a WCF service, a REST endpoint, and Table Storage) and inserts the data into CosmosDb. Where I am running into an issue is that after 7 or 8 executions of the function I get the "Host thresholds exceeded: [Connections]" error. Here is what is really strange: the function takes about 2 minutes to execute, yet the error doesn't show in the logs until well after the function is done executing.
I have gone through all the connection limits documentation and understand it. Where I am a bit confused is when the limits matter. A single execution of my function does not come anywhere close to hitting the 600 active connection limit. Do the connection limits apply to an individual execution of the timer function, or are the limits cumulative across multiple executions?
Here is the real kicker: this function was running fine for two weeks until 07/22/2012. Nothing in the code has changed and it has not been redeployed.
Runtime is 3.1.3

Is your function on a Consumption Plan or in an App Service Plan?
From your description it just sounds like your code may be leaking connections and building up a large number of them over time.
Maybe this blog post can help in ensuring the right usage patterns? https://4lowtherabbit.github.io/blogs/2019/10/SNAT/
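To make the leak pattern concrete, here is a minimal sketch (Node.js) of the connection reuse the blog post recommends. The endpoint URL and response handling are placeholders; the point is that the keep-alive agent is created once at module scope instead of per invocation, so executions on the same instance reuse sockets rather than opening new ones until the connection threshold is hit:

    const https = require("https");

    // Created once per instance and shared across all invocations,
    // so outbound sockets are reused instead of piling up (SNAT exhaustion)
    const keepAliveAgent = new https.Agent({ keepAlive: true });

    module.exports = async function (context, myTimer) {
        const body = await new Promise((resolve, reject) => {
            // Placeholder endpoint; pass the shared agent on every call
            https.get("https://example.com/api", { agent: keepAliveAgent }, res => {
                let data = "";
                res.on("data", chunk => (data += chunk));
                res.on("end", () => resolve(data));
            }).on("error", reject);
        });
        context.log(`Fetched ${body.length} bytes`);
    };

The anti-pattern to look for in your own code is the equivalent of this in whatever stack you use: newing up a client (WCF channel, HttpClient, agent) inside the function body on every execution and never disposing it.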

Related

Azure Function Proxy - Cold Startup - Error 429 Too many requests

I've set up a function app in Azure. I've added a proxy to the function (so I can assign it a different URI).
When the proxy and function have been torn down and it's time to wake it up, I sometimes get the error code 429: Too many requests from a single Postman/Insomnia request to wake it up.
How do I stop this from happening?
For the time being, I've added a logic app to ping it every 5 mins.
It seems to be something with the last release, https://github.com/Azure/azure-functions-host/releases/tag/v3.0.15185
On the date of this release we started receiving a lot of 429s on the functions that had been running for a long time.
We fixed it by adding the following to the host.json:
{
  "extensions": {
    "http": {
      "dynamicThrottlesEnabled": false
    }
  }
}
Doc: https://learn.microsoft.com/pt-br/azure/azure-functions/functions-bindings-http-webhook-output
My guess is that they've changed some default values.
EDIT:
We have been operating for a long time using BOTH the host.json update from above and the pinned version stated by sanjo (https://stackoverflow.com/a/65311645/10585914).
You can follow the entire discussion here: https://github.com/Azure/azure-functions-host/issues/6984
And the PR: https://github.com/Azure/azure-functions-host/pull/6986
We are also experiencing 429s in our Azure Function and have been advised by MS to force the Azure Functions extension to a lower version by setting FUNCTIONS_EXTENSION_VERSION to 3.0.14916.0 instead of ~3.
We're still evaluating the "solution".
From Microsoft support, there are 2 workarounds:
Cassio's answer, which actually worked for us for a couple of hours but then stopped working. We had been getting very consistent 429s for multiple days, then a brief stoppage after the change, then it came back.
Update your FUNCTIONS_EXTENSION_VERSION app setting to the previous version (3.0.14916.0). This has worked again in the short time since we've changed it.
(screenshot: App Setting Update)
I don't think your 5-minute ping is a problem, unlike the answer from Hury Shen suggests. We have recently begun receiving 429s anytime our functions wake from a cold period. I don't know what has changed on the Azure side, but it is not good! One fix you could try is simply redeploying your function; we did this and it worked, at least for a time. I will report back if we find anything else.
It seems the error was caused by the logic app pinging the function every 5 mins. Per my understanding, you schedule the logic app to request the function in order to keep it awake.
If so, you do not need to create the logic app specifically to wake it up. You can choose the Premium plan for your function app when you create it.
Then go to the "Scale out" tab of your function app, where you can set Always Ready Instances to 1. Your function will then have one instance always awake, and the function will not cold start when a request comes in.
As the Premium plan provides the same features and scaling mechanism as the Consumption plan (based on the number of events) but with no cold start, it will cost much more than the Consumption plan. You can refer to this page about function cost.
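If you prefer to script this rather than click through the portal, here is a sketch of the same setting at the ARM level. It assumes (my assumption, based on the Microsoft.Web/sites schema) that the "Always Ready Instances" slider maps to the minimumElasticInstanceCount site config property; this is a fragment of a site resource, not a complete template:

    {
      "properties": {
        "siteConfig": {
          "minimumElasticInstanceCount": 1
        }
      }
    }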

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You don't provide much context about your app, but here are a few steps for how you can improve:
If you want more control you need to use an App Service plan with Always On to avoid cold starts. You will also need to configure auto scaling, since you are responsible for it in this plan and auto scale is not enabled by default in an App Service plan.
Your Azure function must be fully async: you have external dependencies, so you don't want to block threads while you are calling them.
Look at the limits. Using host.json you can tweak them, for example:
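A minimal host.json sketch of the HTTP throttling knobs (the property names are the documented extensions.http settings; the values here are illustrative only, not recommendations):

    {
      "extensions": {
        "http": {
          "maxOutstandingRequests": 200,
          "maxConcurrentRequests": 100,
          "dynamicThrottlesEnabled": true
        }
      }
    }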
A 429 error means the function is too busy to process your request, so probably when you are writing to the table you are not using async and are blocking the thread. A non-blocking sketch follows.
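As a concrete illustration, a minimal non-blocking sketch (Node.js, v3 programming model), assuming table and queue output bindings named outputTable and outputQueue are declared in function.json; with output bindings the host performs the table insert and the queue send, so your code never blocks a thread on either:

    module.exports = async function (context, req) {
        const payload = req.body;

        // Safekeeping copy: the host writes this entity to Table Storage
        context.bindings.outputTable = {
            PartitionKey: "requests",
            RowKey: context.invocationId, // unique per execution
            body: JSON.stringify(payload)
        };

        // Second copy: the host enqueues this for downstream processing
        context.bindings.outputQueue = payload;

        context.res = { status: 202 };
    };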
Function apps work very well and scale as advertised. It could be that because the requests come from a single IP, Azure is treating them as a DDoS attack. You can do the following:
Azure DevOps Load Test
You can load test using one of the Azure services; I am very sure they have better criteria for handling IPs.
Provision a VM in Azure
The way I normally do it is to provision a VM (Windows 10 Pro) in Azure and use JMeter to load test. I have used this method and it works fine. You can provision a couple of them and subdivide the load.
Use professional load testing services
If possible you may use services like Loader.io. They use sophisticated algorithms to run the load test and provision a bunch of VMs to run the same test.
Use Application Insights
If you are not already, you should be using Application Insights to get a better look from the server's perspective. Go to Live Stream and see how many instances it provisions to handle the load test. You can easily look into events and error logs that may be arising, and deep dive into each associated dependency to investigate the problem.

Delay in Azure function triggering off IOThub

I have data going from my system to an Azure IoT hub. I timestamp the data packet when I send it. Then I have an Azure function that is triggered by the IoT hub. In the Azure function I get the message, read the timestamp, and record how long it took the data to get to the function. I also have another program running on my system that listens for data on the IoT hub and records that time too.
Most of the time the latency in the Azure function is in milliseconds, but sometimes I see a long delay before the Azure function is triggered (I conclude it is the trigger because the program that reads from the IoT hub shows that the data reached the IoT hub quickly and without delay).
Would anybody know the reasons why the Azure function might be triggering late?
Is this the same question that was asked here? https://github.com/Azure/Azure-Functions/issues/711
I'll copy/paste my answer for others to see:
Based on what I see in the logs and your description, I think the latency can be explained as being caused by a cold-start of your function app process. If a function app goes idle for approximately 20 minutes, then it is unloaded from memory and any subsequent trigger will initiate a cold start.
Basically, the following sequence of events takes place:
The function app goes idle and is unloaded (this happened about 5 minutes before the trigger you mentioned).
You send the new event.
The event eventually gets noticed by our scale controller, which polls for events on a 10 second interval.
Our scale controller initiates a cold-start of your function app. This can add a few more seconds depending on the content of your function app (it was about 6 seconds in this case).
So unfortunately this is a known behavior with the consumption plan. You can read up on this issue here: https://blogs.msdn.microsoft.com/appserviceteam/2018/02/07/understanding-serverless-cold-start/. The blog post also discusses some ways you can work around this if it's problematic for your scenario.

Can I use a retry policy in an azure function?

I'm using Event Hubs to temporarily store data which will first be saved to Azure Table Storage and then indexed to Elasticsearch.
I was thinking that I should do the storage-saving calls in an Azure function, and do the same for the Elasticsearch indexing using NEST.
It is important that the data is processed, so I was thinking that I'll use Polly as a retry policy in case the Elasticsearch server is failing. However, won't a retry policy potentially make the Azure function expensive?
Is Azure Functions even the right way to go?
Yes, you can use Polly for retries inside your Azure Functions. Some further considerations:
Yes, you will pay for the retry time. But given that your Elasticsearch is "mostly up", the extra price for occasional retries should not be too high.
If you want to retry saving to Table Storage too, you will have to write the calls decorated with Polly yourself instead of using the otherwise preferred output binding.
Make sure to check whether the order of writes is important to you and whether you should retry Table Storage writes to completion before you start writing to Elastic, or vice versa. Otherwise you can do them in parallel with async and then Task.WaitAll.
The maximum execution time of a Function is 5 minutes by default; you can configure it up to 10 minutes max. If you need to handle outages longer than that, you probably need a plan B, e.g. start copying the events that keep failing for longer than 4 (or 9) minutes to a dedicated queue and retry from there (a sketch follows below), or disable the Function for such periods of downtime.
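The plan-B idea is language-agnostic; here is a minimal sketch in JavaScript for illustration, assuming a batched Event Hub trigger and a queue output binding named fallbackQueue declared in function.json (indexToElastic is a hypothetical helper standing in for the real Elastic write):

    module.exports = async function (context, eventHubMessages) {
        // Leave a safety margin inside the 5-minute default limit
        const deadline = Date.now() + 4 * 60 * 1000;
        const deferred = [];

        for (const msg of eventHubMessages) {
            if (Date.now() >= deadline) {
                deferred.push(msg); // out of time budget: defer instead of timing out
                continue;
            }
            try {
                await indexToElastic(msg); // hypothetical helper
            } catch (err) {
                deferred.push(msg); // failed within budget: retry later from the queue
            }
        }

        // Assigning an array to a queue output binding enqueues one message per item
        if (deferred.length > 0) {
            context.bindings.fallbackQueue = deferred;
        }
    };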
Yes it is. You could use a library, or better, just write a simple linear backoff strategy (like try 5 times with 5 seconds of sleep in between) and do something like
    context.log.error({
        message: `Transient failure. This is Retry number ${retryCount}.`,
        errorCode: errorCodeFromCallingElasticSearch,
        errorDetails: moreContextMaybeSomeStack
    });
every time you hit the retry logic so it goes to App Insights (make sure you integrate with App Insights, else you have no ops or it's completely dark ops).
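For completeness, a minimal sketch of that linear backoff loop (callElasticSearch and the error fields are hypothetical stand-ins for your actual Elastic call and its failure details):

    // Promise-based sleep used between attempts
    const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

    async function indexWithRetry(context, doc) {
        const maxAttempts = 5;
        for (let retryCount = 1; retryCount <= maxAttempts; retryCount++) {
            try {
                return await callElasticSearch(doc); // hypothetical helper
            } catch (err) {
                context.log.error({
                    message: `Transient failure. This is Retry number ${retryCount}.`,
                    errorCode: err.code,
                    errorDetails: err.stack
                });
                if (retryCount === maxAttempts) throw err; // out of attempts
                await sleep(5000); // linear backoff: fixed 5-second wait
            }
        }
    }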
You can then query for how often it is really a miss and get an idea of how well things go at the 95th percentile.
Occasionally running 10 seconds over the normal 1-second execution time for your function is going to cost extra, but probably nowhere near a full dedicated App Service plan. If it comes close, just switch to that; it means your function is mostly on rather than off, which is still a perfectly good case for running a function.
App Insights can also trigger alerts if some metric goes haywire; if your retry count goes up to 11 for 24 hours, you probably want to know about that deviation. You'll need to send the retry count as a custom metric to trigger an alert off of it:
    context.log.metric("CallElasticSearchRetryCount", retryCount);

Queue trigger in Azure apparently not clearing up after successful function runs

I am very new to Azure so I am not sure if my question is stated correctly but I will do my best:
I have an app that sends data in the form (1.bin, 2.bin, 3.bin...), always in consecutive order, to a blob input container. When this happens it triggers an Azure function via QueueTrigger, and the output of the function (1output.bin, 2output.bin, 3output.bin...) is stored in a blob output container.
When Azure crashes, the program tries 5 times before giving up. When Azure succeeds, it runs just once and that's it.
I am not sure what happened last week, but since then, after each successful run, the function is idle for about 7 minutes and then it starts the process again as if it were the first time. So for example the blob receives 22.bin, and the function processes 22.bin and generates 22output.bin. It is supposed to stop after that, but after seven minutes it processes 22.bin again.
I don't think it is the app, because each time the app sends data, even if it is the same data, it names it with the next number (in my example, 23.bin). But that is not what happens: it just does 22.bin again, as if the trigger queue was not cleared after the successful run, and it keeps doing it over and over until I have to stop the function and make it crash in order to stop it.
Any idea why this is happening and what I can try to correct it is greatly appreciated. I am just starting to learn about all this stuff.
One thing that could possibly be happening is that the function execution time is exceeding 5 mins. Since this is a hard limit, the function runtime would terminate the current execution and restart the function host.
One way to test this would be to create a function app using a Standard App Service plan instead of the Consumption plan. A function app created with the Standard plan does not have an execution time limit. You can log the function start time and end time to see if it is taking longer than 5 mins to finish processing the queue message; a sketch of that logging follows below.
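A minimal sketch of that timing check (Node.js), assuming a queue trigger bound as myQueueItem in function.json; processBlob is a hypothetical stand-in for the actual work:

    module.exports = async function (context, myQueueItem) {
        const start = Date.now();
        context.log(`Started processing ${myQueueItem} at ${new Date(start).toISOString()}`);

        await processBlob(myQueueItem); // hypothetical helper doing the real work

        const elapsedMs = Date.now() - start;
        context.log(`Finished processing ${myQueueItem} in ${elapsedMs} ms`);
        if (elapsedMs > 5 * 60 * 1000) {
            // Over the Consumption plan's 5-minute default timeout
            context.log.warn("Execution exceeded 5 minutes");
        }
    };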
