We are trying to test Azure's "Health check status" metric for use in an auto-scaled App Service.
We have an API endpoint which returns status code 200 when it's healthy and 500 when it fails.
The problem is that we don't see any indication on the metrics blade. We are forcing the endpoint to return 500 for a 10-minute period every 30 minutes, and the metric doesn't show any difference. We only see values for the "count" aggregation.
Does anyone know what "avg", "min", and "max" mean in that blade?
Additionally, is there any place where we can validate that the endpoint is being called every minute and see the status code received? We assume it's working, but we have a trace log (using log4net) that only shows messages when we call the endpoint manually; there are no records every minute, as there should be if the endpoint were being called automatically.
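For reference, if Application Insights is attached to the app, a query along these lines should show whether the probe arrives every minute and with what status code (a sketch; the health endpoint path below is an assumption, substitute the configured one):
requests
| where url endswith "/api/health" // hypothetical path; use the configured health check path
| summarize probes = count() by resultCode, bin(timestamp, 1m)
| order by timestamp desc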
Help is appreciated
Related
Apparently, Azure App Service has a 230-second timeout. However, when I look at my logs in App Insights' requests table, I see requests to my .NET API with durations of 400-500 seconds that resulted in a 200. On the other hand, I also see some 500s where the duration is over 230 seconds.
So my question is why do I see this discrepancy?
I can think of two theories:
Either the 230-second timeout is not always enforced.
Or the requests table in App Insights shows what the app returned, NOT the actual user experience; i.e. if, for example, my backend takes 300 seconds and returns a 200, then that's what I see in the logs, even though the user got a 500 after 230 seconds.
Answering my own question in case anyone else runs into this...
I did some testing and confirmed that the 230-second timeout is indeed enforced, i.e. the caller of the API gets a 500 after 230 seconds if the API hasn't returned a response yet. However, the duration field in the requests table reflects the time the app took to return a response: if the API takes 5 minutes to return a 200, the caller gets a 500 right after 230 seconds, but in the logs you'll see that the request took 5 minutes.
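A rough way to surface this discrepancy in the telemetry (a sketch, assuming the standard App Insights requests table, where duration is in milliseconds):
requests
| where duration > 230000 // server-side processing longer than the 230 s front-end timeout
| project timestamp, name, resultCode, duration
| order by timestamp desc
Any 200s in the output took longer than the client-facing cutoff, so the caller would have seen a 500 even though the log records a success.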
That timeout is enforced at the load balancer, and it's documented. App Service will continue to process your request; the timeout only affects your client.
If you need to know when something has completed, have a look at the Asynchronous Request-Reply pattern.
I've defined an availability test on a Function App (called watchdog) accessible with a function key. The watchdog performs a URL ping on other health endpoints (protected by AAD). The JSON response is evaluated; if one of the functions reports an unhealthy or degraded status, the overall status will be unhealthy and the HTTP status will be 500.
The watchdog results are correct, since I see the proper outcomes when adding/removing fault injection in one of the watched functions.
The test is defined as follows:
Test type: URL ping test
URL: watchdog endpoint with function key
Parse dependent requests: unchecked
Enable retries for availability test failures: unchecked
Test frequency: 5 minutes
Test locations: West Europe
Test Timeout: 120 seconds
Status code must equal: 200
Content match: {successful JSON response}
Alerts: Enabled
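For what it's worth, a query like the following against App Insights (a sketch; the test name is hypothetical) shows whether the test itself is recording the failures the alert should react to:
availabilityResults
| where name == "watchdog-ping" // hypothetical test name; use the one configured above
| summarize passRate = avg(toint(success)) by bin(timestamp, 5m)
| order by timestamp desc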
I've selected the test and clicked on the Open Rules (Alerts) page to edit the generated alert rule.
The only thing I've added is an action group configured to send a message to my work email.
I've double-checked the email address for correctness, and I've double-checked my spam folder in Outlook.
I've read docs, watched tutorials, and investigated, but I still have no clue why this alert is not being fired.
If someone could put me on the right path, I'll be very grateful!
Regards,
Giacomo S. S.
This was happening because, in the first place, I had done a lot of different configuration experiments, reaching the maximum number of alert rules.
Now the alert is properly fired and the email sent.
I want an Azure Alert to trigger when a certain function app fails. I set it up as a GTE 1 threshold on the [function name] Failed metric, thinking that would yield the expected result. However, when it runs daily I get notifications that the alert fired, but I cannot find anything in Application Insights to indicate a failure, and the function appears to run successfully to completion.
Here is the triggered alert summary:
Here is the invocation monitoring from the portal showing that same function over the past few days with no failures:
And here is an application insights search over that time period showing no exceptions and all successful dependency actions:
The question is: what could be causing an Azure Function Failed metric to register non-zero values without any corresponding telemetry in Application Insights?
Update - here is the alert configuration
And the specific condition settings:
The Failures blade for a wider time range:
There are some dependency failures on a blob 404, but I think those are from a different function that explicitly checks for the existence of blobs at certain paths to know which files to download from an external source. Also, the timestamps don't fall within the sample period.
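One way to confirm that attribution (a sketch against the standard App Insights dependencies table) is to group the failures by the operation that produced them:
dependencies
| where success == false
| summarize failures = count() by operation_Name, resultCode, bin(timestamp, 1h)
| order by timestamp desc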
No exceptions:
Per a comment on the question by @ivan-yang, I have switched the alerting to use a custom log search instead of the built-in Azure Functions Failed metric. At this point that metric seems to be pretty opaque as to what triggers it, and it was triggering every day when I ran the Azure Function with no apparent underlying failure. I plan to avoid this metric now.
My log-based alert is using the following query for now to get what I was looking for (an exception happened or a function failed):
requests
| where success == false
| union (exceptions)
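// note: the union appends every exception, not only ones tied to failed requests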
| order by timestamp desc
Thanks to @ivan-yang and @krishnendughosh-msft for the help
I have created an App Service on Azure that runs on Tomcat. I'm using metrics for monitoring and set up an alert rule that should send me an email when 4xx errors occur. Nothing happens, even though I've generated more errors than the rule needs to fire the alert.
Thanks, Dominik
According to your screenshot and the reference for Metric Definitions, it seems normal that no mail was sent, because your alert rule means the count of HTTP 4xx events must be greater than or equal to 5 over the last 5 minutes, not the average count per minute. You can try adjusting the threshold value or the period so that the condition is obviously satisfied, and then check whether the mail sender is triggered. Meanwhile, if you doubt whether the alert trigger works at all, you can retrieve the activity logs via Azure CLI or PowerShell.
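If Application Insights happens to be attached to the app, a sketch like this shows whether any 5-minute window actually contained 5 or more 4xx responses:
requests
| where toint(resultCode) between (400 .. 499)
| summarize http4xx = count() by bin(timestamp, 5m)
| where http4xx >= 5
| order by timestamp desc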
I have configured an Azure Web Application Monitoring rule such that if there are more than 30 requests over a five-minute period, an alert should fire that both sends me an email and triggers a webhook.
The problem is, the alert doesn't fire even when its parameters are clearly satisfied. I took a screenshot of the traffic graph after I made over 30 requests to the server within a five-minute window. I've also included the specific configuration menus for this alert.
How can I make this alert fire?
I checked one of my similar alerts, set with a 5-minute threshold on response time: the alert fired if the response time for a given request exceeded a certain value (12 ms) and that condition held for a period of 5 minutes, at which point an email was sent. I have attached a snapshot of when this happened to help illustrate. So in your case, if the requests were greater than 30 from, say, 12:00 PM until 12:05 PM, i.e. for a full period of 5 minutes, your alert would fire; if it did not, then you may need to check something else.
So my guess here is that if there was a flat line at more than 30 for a period of 5 minutes, meaning you had more than 30 requests for a continuous period of 5 minutes, your alert would and should work.
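If the site has Application Insights attached, a query along these lines (a sketch) would confirm whether any 5-minute window actually crossed the threshold:
requests
| summarize requestCount = count() by bin(timestamp, 5m)
| where requestCount > 30
| order by timestamp desc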