I have an Azure Functions application which once in a while "freezes" and stops processing messages and timed events.
When this happens I do not see anything in the logs (AppInsight), neither exceptions nor any kind of unfamiliar traces.
The application has following functions:
One processing messages from a Service Bus topic subscription (belonging to another application)
One processing from an internal storage queue
One timer based function triggered every half hour
Four HTTP endpoints
Our production app runs fine. This is due to an internal dashboard (on big screen in the office), which polls one of the HTTP endpoints every 5 minutes, there by keeping it alive.
Our test, stage and preproduction apps stop after a while, stopping to process messages and timer events.
This question is more or less the same as my previous question, but the without error message that was in focus then. Much fewer error messages now, as our deployment has been fixed.
A more detailed analysis can be found in the GitHub issue.
On a consumption plan, all triggers are registered in the host, so that these can be handled, leading to my functions being called at the right time. This part of the host also handles scalability.
I had two bugs:
Wrong deployment. Do zip based deployment as described in the Docs.
Malformed host.json. Comments in JSON are not right, although it does work in most circumstances in Azure Functions. But not all.
The sites now works as expected, both concerning availability and scalability.
Thanks to the people in the Azure Functions team (Ling Toh, Fabio Cavalcante, David Ebbo) for helping me out with this.
Related
We have a service which pings our EP1 Premium service and yesterday we received 3 client side timeout errors after 2 minutes of waiting. When opening the trace in App insights, these requests which time out are not even logged and have no trace of ever being received Azure side, and therefore stay unanswered. By looking at the metrics provided in the Azure Functions app, I found out that 1-2 minutes after the request has been sent, the app loses all its ability to work as its Total App Domains falls to 0 as well as all connections, threads and so on and this state lasts until the next request is received, therefore "skipping" the request that happened beforehand. This is a big issue as I need to make sure requests get answered in a timely manner.
The client service sent HTTP requests to the Azure Functions app expecting an answer, only to time out while the Azure-side doesn't have any record of ever receiving the request.
I believe this issues is related to Consumption Plan of Azure Functions called Cold Start behaviour. The "skipping" mechanism is explained below:
Apps may scale to zero when idle, meaning some requests may have additional latency at startup. The consumption plan does have some optimizations to help decrease cold start time, including pulling from pre-warmed placeholder functions that already have the function host and language processes running.https://learn.microsoft.com/en-us/azure/azure-functions/functions-scale#cold-start-behavior
Please also consider of having look on this article, which explains the behaviour. https://azure.microsoft.com/en-us/blog/understanding-serverless-cold-start/
I have an Azure function with ServiceBusTrigger which will post the message content to a webservice behind an Azure API Manager. In some cases the load of the (3rd party) webserver backend is too high and it collapses returning error 500.
I'm looking for a proper way to implement circuit breaker here.
I've considered the following:
Disable the azure function, but it might result in data loss due to multiple messages in memory (serviceBus.prefetchCount)
Implement API Manager with rate-limit policy, but this seems counter productive as it runs fine in most cases
Re-architecting the 3rd party webservice is out of scope :)
Set the queue to ReceiveDisabled, this is the preferred solution, but it results in my InputBinding throwing a huge amount of MessagingEntityDisabledExceptions which I'm (so far) unable to catch and handle myself. I've checked the docs for host.json, ServiceBusTrigger and the Run parameters but was unable to find a useful setting there.
Keep some sort of responsecode resultset and increase retry time, not ideal in a serverless scenario with multiple parallel functions.
Let API manager map 500 errors to 429 and reschedule those later, will probably work but since we send a lot of messages it will hammer the service for some time. In addition it's hard to distinguish between a temporary 500 error or a consecutive one.
Note that this question is not about deciding whether or not to trigger the circuitbreaker, merely to handle the appropriate action afterwards.
Additional info
Azure functionsV2, dotnet core 3.1 run in consumption plan
API Manager runs Basic SKU
Service Bus runs in premium tier
Messagecount: 300.000
We are looking at implementing AppInsights for our non-web application. One of the things that we want to monitor for is processes that may be "hung" for more than N number of seconds or minutes. I have been unable to find something built in that does this. The closest thing I have seen or thought of would be to log 2 custom events for the start and end of a process, and then have an alert for a custom log that queries events with no matching "end" event after N minutes.
Is there another way to monitor for hung processes using AppInsights that I am not seeing? Thanks for any help.
If you choose to use application insights, here is the suggestion just for your reference(but if you have another better solution, you can ignore this):
As per this post, you can leverage heartbeat feature, details as below:
if this application runs more than several seconds, you can leverage heartbeat
feature - it sends metric every N minutes/seconds (configurable) and the absence of such
metric will indicate that application is no longer actively running. However, if
Application Insights thread survives, then heartbeat will still be reported.
You can rely on presense/absense of the telemetry from this app in general as well as
couple custom events as you outlined above - Azure Monitor allows to set an alert on
analytics query, so you'll be able to craft a query that returns nothing in case of
application issues and set an alert on 0 count returned by such a query.
I've got an Azure Function app in production on an event hub trigger, it's low throughput with the function typically only being triggered once daily. It's running on an S1 plan at the moment and has a few other functions such as timer triggered and HTTP triggered.
It's been running fine but today it stopped being triggered by new messages until I restarted the app. All other functions were working just fine and responding to their associated triggers.
I've look through App Insights and there are no reported errors or issues, it's just not doing anything.
Has anyone else had this issue or know of what may be causing it?
First of all - is your App Service has Always On enabled?
Second thing - have you tried to test your trigger locally, so you can be sure, that there are no issues with your Event Hub?
Personally, I faced such issues when Event Host Processor implemented in EventHubTrigger was losing a lease because of additional processor introduced. It is also possible, that since it faces a low throughput, it lost a lease and for some reason was not able to renew it:
As an instance of EventProcessorHost starts it will acquire as many
leases as possible and begin reading events. As the leases draw near
expiration EventProcessorHost will attempt to renew them by placing a
reservation. If the lease is available for renewal the processor
continues reading, but if it is not the reader is closed and
CloseAsync is called - this is a good time to perform any final
cleanup for that partition.
https://blogs.msdn.microsoft.com/servicebus/2015/01/21/event-processor-host-best-practices-part-2/
Nonetheless, it is worth to contact the support to make sure there were no other issues.
I have an Azure Function app with 4 functions
one triggered on a timer every 24 hours
one triggered on events from IoT Hub
two others triggered on events from Service Bus as a result of the previous function
All functions work as expected when first deployed but after a period of time the functions stop running and the app appears to be scaled down with no servers online. At this point the functions are never triggered again unless I either restart the app, or drill into a function and view details of it (supposedly, forcing the function to start up).
I have the exact same code deployed to a different environment and it runs perfectly and has never encountered this issue. I've checked all the settings and configuration and can't see any material differences between the two.
This is really frustrating and is becoming a big issue. Any help would be much appreciated.
Function App is hosted in Australia Southeast.
This is the last execution (as of now)
10:45 PM UTC - Function started (Id=4d29555b-d3af-43d7-95e9-1a4a2d43dc46)
The event triggered function should run every few minutes as the IoT Hub it's triggering from has a steady stream of events coming in. When I prod the function (or restart it) and it comes to life it quickly churns through a backlog of messages queued in the IoT Hub.
I see the problem: you have comments in your host.json, which makes it invalid and throws off the parser at the scale controller level.
Admittedly, the error handling is quite poor here. But anyway, remove the commented out logger, and it should all work.