Azure alert not working - azure

Does anyone know why my Azure alert isn't working? I've created an alert using response time as the metric and 5 seconds as the threshold, 15 minutes as the evaluation window.
As you can see in the linked image, the response time has been above 30 seconds for most of the previous 2 hours but there are no alerts.

Because that's below for less than 15 minutes (the evaluation window).

Related

TTL configuration and billing in Azure Integration Runtime

Doing some tests, I could see that having an Azure Integration Runtime (AIR) allowed us to reduce considerably the amount of time required to finish a pipeline.
To fully understand the use of this configuration and its billing as well, I've got these questions. Let's assume I've got two independent pipelines, all of their Data Flow activities use the same AIR with a TTL = 10 minutes.
The first pipeline takes 7 minutes to finish. The billing will be (if I understand well):
billing: time to acquire cluster + job execution time + TTL (7 + 10)
Five minutes later, I trigger the second pipeline. It will take only 3 minutes to finish (I understand it will also use the same pool as the first one). After it concludes, the TTL is setting up to 10 minutes again or is equal to 2 minutes
10 - 5 -3 (original TTL - start time second pipe - runtime second pipe), in this case, what will happen if I trigger a third pipeline that could take more than 2 minutes?
What about the billing, how is it going to be calculated?
Regards.
Look at the ADF pipeline monitoring view and find all of your data flow activity executions.
Add up that total data flow activity execution time.
Now add the TTL value for that Azure IR you were using to that total.
That is the total time you will be billed.

Azure logic app twitter trigger not working

I have created a logic app to trigger when a tweet is posted with a given hashtag. The trigger is set to check every 10 seconds. The reality is that the Logic App does not run, even if I wait minutes for it, but then if i manually run it, it then executes with the expected input. Any idea what is happening here?
I was having a similar issue, and believe this is due to the specific limitations set for the Twitter Connector (4. Frequency of trigger polls: 1 hour).
https://learn.microsoft.com/en-us/connectors/twitterconnector/
LIMITS
The following are some of the limits and restrictions:
Maximum number of connections per user: 2
API call rate limit for POST operation: 12 per hour
API call rate limit for other operations: 600 per hour
Frequency of trigger polls: 1 hour
Maximum size of image upload: 5 MB
Maximum size of video upload: 15 MB
Maximum number of search results: 100
Maximum number of new tweets tracked within one polling interval: 5
There should be some error occurred. You can inspect all runs of the triggers on the 'Trigger History' blade. This page gives a good overview of monitoring of logic apps: https://azure.microsoft.com/en-us/documentation/articles/app-service-logic-monitor-your-logic-apps/

Azure Monitor Custom log search Query - understanding Period and Frequency

UPDATE:
the actual problem is different from what I've described. I'll provide and update/edit to this ticket once we'll resolve the issue. More details may be found at this thread - https://techcommunity.microsoft.com/t5/Azure-Log-Analytics/Reliably-trigger-alerts-for-Log-Analytics-log-entries/m-p/319315/highlight/false#M1224
Original question:
We use Azure Monitor to create alerts based on logs in Log Analytics. For this we choose our Log Analytics account as a "RESOURCE", then choose "Custom log search" signal name for "CONDITION". Alert logic - "Number of results greater than 0".
Sample query:
search *
| where ResourceProvider == "MICROSOFT.DATAFACTORY" and status_s == "Failed"
For Period and Frequency lets set 15 minutes. All looks simple, but...
The issue: described above setup does not work (it works sometimes), because alerts are fired only sometimes, a lot of them are missed which is completely unacceptable behavior.
If we set Period = Frequency = 5 minutes we basically miss almost every event. Period = Frequency = 15 minutes works better, but still a lot of events are missing. Period = Frequency = 30 works even better, but all this looks weird.
Important notice - logs are collected from Data Factory V2 into Log Analytics. I suspect that alert misses are due to the fact that logs are delivered to Log Analytics with some delay (up to several minutes). So when Azure Monitor evaluates alert query for the last 15 minutes (Period=15) it might be that most resent log entries are still not in Log Analytics. When next query evaluation is executed in 15 minutes it will miss the logs that were ingressed with a delay for prev 15 minutes interval. Is this assumption correct? If so, this is very weird - how then we supposed to configure Period and Frequency values? If I set Period > Frequency (e.g. Period = 30 and Frequency = 5, which means "evaluate expression every 5 minutes, take data for last 30 minutes from current time") then we get multiple duplicated alerts because Period is larger than Frequency so there is a big chance of log search query returning the same log entries every 5 minutes - this is highly undesirable behavior.
Issue happened to be with a buggy bahavior of ARM template creating alerts. Thanks to Stanislav Zhelyazkov it has been nailed down and resolved - I use alternative ARM API now and it seems to work fine. More details on the topic may be found here - https://techcommunity.microsoft.com/t5/Azure-Log-Analytics/Reliably-trigger-alerts-for-Log-Analytics-log-entries/m-p/309610.

Azure LogicApp Metrics

I have a LogicApp which runs for every 2 minutes. All runs in this LogicApp has failed between 4:45pm to 5:00pm.
when I check for the metric Run Failure Percentage under 'Metrics' in the Azure portal with time granularity as 15 minutes and aggregation as Average, it shows me Run Failure Percentage as 114.29% for time window 4:45 pm to 5:00 pm.
My question is, how the metric value is calculated? shouldn't it be less than 100 for percentage?

Tracking metrics using StatsD (via etsy) and Graphite, graphite graph doesn't seem to be graphing all the data

We have a metric that we increment every time a user performs a certain action on our website, but the graphs don't seem to be accurate.
So going off this hunch, we invested the updates.log of carbon and discovered that the action had happened over 4 thousand times today(using grep and wc), but according the Integral result of the graph it returned only 220ish.
What could be the cause of this? Data is being reported to statsd using the statsd php library, and calling statsd::increment('metric'); and as stated above, the log confirms that 4,000+ updates to this key happened today.
We are using:
graphite 0.9.6 with statsD (etsy)
After some research through the documentation, and some conversations with others, I've found the problem - and the solution.
The way the whisper file format is designed, it expect you (or your application) to publish updates no faster than the minimum interval in your storage-schemas.conf file. This file is used to configure how much data retention you have at different time interval resolutions.
My storage-schemas.conf file was set with a minimum retention time of 1 minute. The default StatsD daemon (from etsy) is designed to update to carbon (the graphite daemon) every 10 seconds. The reason this is a problem is: over a 60 second period StatsD reports 6 times, each write overwrites the last one (in that 60 second interval, because you're updating faster than once per minute). This produces really weird results on your graph because the last 10 seconds in a minute could be completely dead and report a 0 for the activity during that period, which results in completely nuking all of the data you had written for that minute.
To fix this, I had to re-configure my storage-schemas.conf file to store data at a maximum resolution of 10 seconds, so every update from StatsD would be saved in the whisper database without being overwritten.
Etsy published the storage-schemas.conf configuration that they were using for their installation of carbon, which looks like this:
[stats]
priority = 110
pattern = ^stats\..*
retentions = 10:2160,60:10080,600:262974
This has a 10 second minimum retention time, and stores 6 hours worth of them. However, due to my next problem, I extended the retention periods significantly.
As I let this data collect for a few days, I noticed that it still looked off (and was under reporting). This was due to 2 problems.
StatsD (older versions) only reported an average number of events per second for each 10 second reporting period. This means, if you incremented a key 100 times in 1 second and 0 times for the next 9 seconds, at the end of the 10th second statsD would report 10 to graphite, instead of 100. (100/10 = 10). This failed to report the total number of events for a 10 second period (obviously).Newer versions of statsD fix this problem, as they introduced the stats_counts bucket, which logs the total # of events per metric for each 10 second period (so instead of reporting 10 in the previous example, it reports 100).After I upgraded StatsD, I noticed that the last 6 hours of data looked great, but as I looked beyond the last 6 hours - things looked weird, and the next reason is why:
As graphite stores data, it moves data from high precision retention to lower precision retention. This means, using the etsy storage-schemas.conf example, after 6 hours of 10 second precision, data was moved to 60 second (1 minute) precision. In order to move 6 data points from 10s to 60s precision, graphite does an average of the 6 data points. So it'd take the total value of the oldest 6 data points, and divide it by 6. This gives an average # of events per 10 seconds for that 60 second period (and not the total # of events, which is what we care about specifically).This is just how graphite is designed, and for some cases it might be useful, but in our case, it's not what we wanted. To "fix" this problem, I increased our 10 second precision retention time to 60 days. Beyond 60 days, I store the minutely and 10-minutely precisions, but they're essentially there for no reason, as that data isn't as useful to us.
I hope this helps someone, I know it annoyed me for a few days - and I know there isn't a huge community of people that are using this stack of software for this purpose, so it took a bit of research to really figure out what was going on and how to get a result that I wanted.
After posting my comment above I found Graphite 0.9.9 has a (new?) configuration file, storage-aggregation.conf, in which one can control the aggregation method per pattern. The available options are average, sum, min, max, and last.
http://readthedocs.org/docs/graphite/en/latest/config-carbon.html#storage-aggregation-conf

Resources