Azure App Service health check alert not firing - azure

I have an app service that provides a health check endpoint. I have enabled "Health check" on this service and provided the health check endpoint path. I have validated the endpoint in a browser and it is reachable. When everything is running, the metric reports a value of 100. I have set up an alert rule on this metric in Application Insights and tried both Average and Min in this rule (< 100). When I kill or stop the service, the rule never fires.
It is stated here that this should be possible but I have not found a way to do this:
https://azure.github.io/AppService/2020/08/24/healthcheck-on-app-service.html#alerts
Also I'm not sure what the 100 even is: %?
In the chart when I hover over the last few minutes it doesn't show a numeric value but rather "--". Which is probably why the rule doesn't fire. Anyone got this working? Is it a bug?

Just to report back, this seems to be working as expected now. It also depends on the type of "unhealthiness" of the service. If you stop the service manually, the alert doesn't fire because that counts as a planned stop. If you kill the main process, the service is automatically restarted (it seems to be implemented with Docker, with the main process as the entrypoint). If, however, you make the server report that it is down (for example, by returning a 5XX status code from the health endpoint), then it works as expected. The metrics are recorded correctly, so the metric graphs are correct, and alerting on those metrics works as well.
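To reproduce the case that does trigger the alert, the health endpoint itself has to answer with a 5XX status. Below is a minimal sketch of such an endpoint using only the Python standard library. The "/health" path, the port, and the "database" check are assumptions made for illustration; they are not taken from the original app.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple


def evaluate_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Return (status_code, body): 200 if every check passes, otherwise 503."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises is treated the same as a failing check.
            ok = False
        if not ok:
            failed.append(name)
    if failed:
        return 503, "unhealthy: " + ", ".join(failed)
    return 200, "healthy"


# Hypothetical dependency checks; replace with real probes of your own.
CHECKS: Dict[str, Callable[[], bool]] = {"database": lambda: True}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":  # assumed health check path
            self.send_error(404)
            return
        status, body = evaluate_health(CHECKS)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def run_server(port: int = 8000) -> None:
    """Serve the health endpoint until interrupted (port is an assumption)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling run_server() starts the endpoint. Making one of the checks return False should then produce 503 responses, which is the kind of failure the health check metric records.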

This is not a direct answer, but as a workaround you could take a look at Application Insights availability tests.
They are easy to configure and use for health check purposes.

Related

Why is my Azure node.js app becoming unresponsive?

I recently deployed a Node.js backend service to Azure and have the following problem. The service becomes unresponsive after a certain amount of time and only comes back to life when an external request is sent. The problem is that it takes about 3 minutes for the container to start back up and actually answer the request. I'm running Node 14 LTS. I also added a health check yesterday, but Azure doesn't actually keep the app alive, as the health check metric in Azure shows.
I verified that Azure is trying to reach the correct endpoint, and it is. I also have "Always On" enabled. I also verified that the app itself is not crashing. I log every request, and all of a sudden requests are no longer received, which means the health endpoint doesn't respond either, but this does not result in a container restart. It just waits for an external request to appear and then decides to start everything back up, which takes too long.
I feel like it's some kind of configuration issue, because the app itself is not very complex and I never experienced crashes when doing local development.
According to the official documentation, Always On does not take effect on the Free pricing tier, which is the tier you are currently using.
How do I decrease the response time for the first request after idle time?

Azure Function Proxy - Cold Startup - Error 429 Too many requests

I've set up a function app in Azure. I've added a proxy to the function (so I can assign it a different URI).
When the proxy and function have been torn down and it's time to wake them up, I sometimes get error code 429 (Too Many Requests) from a single Postman/Insomnia request sent to wake them up.
How do I stop this from happening?
For the time being, I've added a logic app to ping it every 5 mins.
This seems to be related to the latest release of the Functions host: https://github.com/Azure/azure-functions-host/releases/tag/v3.0.15185
On the date of this release we started receiving a lot of 429s on functions that had been running for a long time.
We fixed it by adding the following to host.json:
    "extensions": {
        "http": {
            "dynamicThrottlesEnabled": false
        }
    }
Doc: https://learn.microsoft.com/pt-br/azure/azure-functions/functions-bindings-http-webhook-output
My guess is that they've changed some default values.
EDIT:
We have been running for a long time with BOTH the host.json update from above and the pinned version mentioned by sanjo (https://stackoverflow.com/a/65311645/10585914).
You can follow the entire discussion here: https://github.com/Azure/azure-functions-host/issues/6984
And the PR: https://github.com/Azure/azure-functions-host/pull/6986
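If you would rather apply the host.json change described above with a script than by hand, here is a rough sketch. The helper names disable_dynamic_throttles and patch_host_json are made up for this example, and it assumes the file is plain JSON without comments.

```python
import json
from pathlib import Path


def disable_dynamic_throttles(host_config: dict) -> dict:
    """Return a copy of a host.json dict with
    extensions.http.dynamicThrottlesEnabled set to false."""
    # Round-trip through JSON to get a deep copy of JSON-compatible data.
    updated = json.loads(json.dumps(host_config))
    http = updated.setdefault("extensions", {}).setdefault("http", {})
    http["dynamicThrottlesEnabled"] = False
    return updated


def patch_host_json(path: str) -> None:
    """Rewrite the host.json file at `path` in place (hypothetical helper)."""
    file = Path(path)
    config = json.loads(file.read_text(encoding="utf-8"))
    patched = disable_dynamic_throttles(config)
    file.write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")
```

Existing settings under "extensions" and "http" are kept, so only the one flag is added or overwritten.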
We are also experiencing 429s in our Azure Function and have been advised by MS to pin the Azure Functions runtime to an earlier version by setting FUNCTIONS_EXTENSION_VERSION to 3.0.14916.0 instead of ~3.
We're still evaluating the "solution".
From Microsoft support, there are 2 workarounds:
1. Cassio's answer, which actually worked for us for a couple of hours but then stopped working. We had been getting very consistent 429s for multiple days, then a brief stoppage after the change, then they came back.
2. Update your FUNCTIONS_EXTENSION_VERSION app setting to the previous version (3.0.14916.0). This has worked again in the short time since we changed it.
Unlike Hury Shen's answer, I don't think your 5-minute ping is the problem. We have recently begun receiving 429 responses any time our functions wake from a cold period. I don't know what has changed on the Azure side, but it is not good! One fix you could try is to simply redeploy your function; we did this and it worked, at least for a time. I will report back if we find anything else.
It seems the error was caused by the logic app pinging the function every 5 minutes. As I understand it, you scheduled the logic app to call the function in order to keep the function awake.
If so, you do not need to create a logic app just to wake it up. You can choose the Premium plan for your function app when you create it.
Then go to the "Scale out" tab of your function app and set Always Ready Instances to 1. Your function will then have one instance always awake, so it will not cold start when a request comes in.
The Premium plan provides the same features and scaling mechanism as the Consumption plan (based on the number of events) but with no cold start, so it costs considerably more than the Consumption plan. You can refer to this page about function cost.

How to find/cure source of function app throughput issues

I have an Azure function app triggered by an HttpRequest. The function app reads the request, tosses one copy of it into a storage table for safekeeping and sends another copy to a queue for further processing by another element of the system. I have a client running an ApacheBench test that reports approximately 148 requests per second processed. That rate of processing will not be enough for our expected load.
My understanding of function apps is that it should spawn as many instances as is needed to handle the load sent to it. But this function app might not be scaling out quickly enough as it’s only handling that 148 requests per second. I need it to handle at least 200 requests per second.
I’m not 100% sure the problem is on my end, though. In analyzing the performance of my function app I found a LOT of 429 errors. What I found online, particularly https://learn.microsoft.com/en-us/azure/azure-resource-manager/resource-manager-request-limits, suggests that these errors could be due to too many requests being sent from a single IP. Would several ApacheBench 10K and 20K request load tests within a given day cause the 429 error?
However, if that’s not it, if the problem is with my function app, how can I force my function app to spawn more instances more quickly? I assume this is the way to get more throughput per second. But I’m still very new at working with function apps so if there is a different way, I would more than welcome your input.
Maybe the Premium app service plan that’s in public preview would handle more throughput? I’ve thought about switching over to that and running a quick test but am unsure if I’d be able to switch back?
Maybe EventHub is something I need to investigate? Is that something that might increase my apparent throughput by catching more requests and holding on to them until the function app could accept and process them?
Thanks in advance for any assistance you can give.
You don't provide much context about your app, but here are a few steps that may help:
If you want more control, use an App Service plan with Always On to avoid cold starts. You will also need to configure autoscaling yourself, since it is your responsibility on this plan and is not enabled by default.
Your Azure function should be fully async, because it has external dependencies and you don't want to block a thread while calling them.
Look at the limits. You can tweak them in host.json.
A 429 error means the function is too busy to process your request, so you are probably not using async when writing to the table and are blocking a thread.
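The "fully async" advice above can be illustrated roughly as follows. This is only a sketch in Python with asyncio: save_to_table and send_to_queue are hypothetical stand-ins that simulate I/O with a short sleep, not real Azure SDK calls, and the original function app may well be written in another language.

```python
import asyncio
from typing import List


async def save_to_table(payload: str) -> str:
    # Stand-in for a non-blocking storage table write (hypothetical helper).
    await asyncio.sleep(0.05)
    return f"table:{payload}"


async def send_to_queue(payload: str) -> str:
    # Stand-in for a non-blocking queue send (hypothetical helper).
    await asyncio.sleep(0.05)
    return f"queue:{payload}"


async def handle_request(payload: str) -> List[str]:
    # Start both writes and await them together, so the handler never
    # blocks a thread while waiting on either dependency.
    results = await asyncio.gather(
        save_to_table(payload),
        send_to_queue(payload),
    )
    return list(results)
```

For example, asyncio.run(handle_request("order-1")) waits for both simulated writes concurrently instead of one after the other.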
Function apps work very well and scale as advertised. The problem could be that the requests are coming from a single IP and Azure may be treating them as a DDoS attack. You can try the following:
Azure DevOps Load Test
You can load test using one of the Azure services, which most likely handle source IPs better than a single client machine does.
Provision VM in Azure
What I normally do is provision a VM (Windows 10 Pro) in Azure and use JMeter to load test. I have used this method and it works fine. You can provision a couple of them and split the load between them.
Use professional Load testing services
If possible, you can use a service like Loader.io. Such services provision a number of VMs to run the same test in parallel.
Use Application Insights
If you are not already doing so, use Application Insights to get a better view from the server's perspective. Go to the live stream and see how many instances are provisioned to handle the load test. You can easily look at the events and error logs that come up and investigate them. You can also drill into each associated dependency to investigate the problem.

Azure Front Door - How to do rolling update of backend pool?

Has anyone successfully done rolling updates with Azure Front Door? We have an application in 2 regions, and we want to disable the backend in region 1 while it gets updated, and then do the same for the backend in region 2. However, there seems to be a ridiculous amount of lag between when you disable or remove a backend from a pool and when the change takes effect, making this basically impossible.
We've tried:
Disabling/totally removing backends
Setting high/low backend priorities/weights
Modifying health probe intervals
Changing sample size/successful samples/latency to 1/1/100
I have an endpoint that I watch during the deployment process which tells me which region it's in, and it never changes during the operation, and becomes unavailable when the region is being updated. There's gotta be a way to do this, right?
I have a suggestion:
1. Reduce the health probe interval.
2. Reduce the sample size and the number of successful samples required. (Make sure you are probing a simple HTTP page so your backend resource can handle the load. You will start receiving probes from all the POP servers at the interval you specified.)
3. For the server that needs maintenance, stop the service or make the probe fail, so that all traffic switches to the healthy server. Then do the maintenance and start the service again. This makes sure your service is not disrupted.
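One way to "make the probe fail" in step 3 without stopping the service is to let the probe endpoint check a drain marker. The sketch below assumes a marker file on disk; the helper names and the idea of using a file are illustrative assumptions, not something Front Door requires.

```python
from pathlib import Path


def probe_status(drain_marker: Path) -> int:
    """Status code the probe endpoint should return:
    503 while the drain marker exists, otherwise 200."""
    return 503 if drain_marker.exists() else 200


def start_draining(drain_marker: Path) -> None:
    # Create the marker so the probe starts failing and traffic moves away.
    drain_marker.touch()


def stop_draining(drain_marker: Path) -> None:
    # Remove the marker once maintenance is finished (Python 3.8+).
    drain_marker.unlink(missing_ok=True)
```

A deployment script could call start_draining, wait until the watched endpoint reports the other region, update the backend, and then call stop_draining.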

AppInsights - Monitor for Hung Processes

We are looking at implementing AppInsights for our non-web application. One of the things that we want to monitor for is processes that may be "hung" for more than N number of seconds or minutes. I have been unable to find something built in that does this. The closest thing I have seen or thought of would be to log 2 custom events for the start and end of a process, and then have an alert for a custom log that queries events with no matching "end" event after N minutes.
Is there another way to monitor for hung processes using AppInsights that I am not seeing? Thanks for any help.
If you choose to use Application Insights, here is a suggestion for your reference (if you have a better solution, feel free to ignore it):
As per this post, you can leverage the heartbeat feature, details as below:
if this application runs for more than several seconds, you can leverage the heartbeat
feature - it sends a metric every N minutes/seconds (configurable), and the absence of that
metric indicates that the application is no longer actively running. However, if the
Application Insights thread survives, the heartbeat will still be reported.
You can rely on the presence/absence of telemetry from this app in general, as well as on
a couple of custom events as you outlined above - Azure Monitor lets you set an alert on
an analytics query, so you can craft a query that returns nothing in case of
application issues and set an alert on a count of 0 returned by that query.
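The start/end matching idea from the question can be sketched in plain Python as below. This only illustrates the matching logic on in-memory data; in practice the same logic would live in the analytics query. The event tuple layout and the names used here are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple

# (operation_id, kind, timestamp) where kind is "start" or "end" (assumed layout)
Event = Tuple[str, str, datetime]


def find_hung_operations(
    events: Iterable[Event], now: datetime, max_age: timedelta
) -> List[str]:
    """Return ids of operations that started more than `max_age` ago
    and have no matching "end" event."""
    started: Dict[str, datetime] = {}
    ended: Set[str] = set()
    for op_id, kind, timestamp in events:
        if kind == "start":
            started[op_id] = timestamp
        elif kind == "end":
            ended.add(op_id)
    return sorted(
        op_id
        for op_id, timestamp in started.items()
        if op_id not in ended and now - timestamp > max_age
    )
```

An alert would then fire whenever this list is non-empty, which mirrors alerting on the row count of a query.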
