How to exclude a health check resource from Datadog metric alert monitor query? - terraform

We are setting up a metric alert monitor and other monitors using Terraform. The query looks like this:
query = "max(last_10m):p95:trace.netty.request{env:${var.env},service:${local.service_name}} >= 4"
We would like to exclude health checks from this particular metric only, e.g. GET /healthcheck
How can this be achieved? Are there some examples?

Resources like the health check have a resource_name tag. This tag can be used to exclude them, e.g. !resource_name:get_/health
Here is an example of a query excluding the health check resources:
query = "max(last_10m):p95:trace.netty.request{env:${var.env},service:${local.service_name},!resource_name:get_/health} >= 4"
See the Datadog documentation for more information.
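If it helps to see the exclusion tag in a complete monitor definition, here is a rough sketch using Datadog's Python client (datadogpy) rather than Terraform; the API/app keys, env, service, monitor name and message are placeholders, not values from the question:

from datadog import initialize, api

# Placeholder keys; supply real ones or set DATADOG_API_KEY / DATADOG_APP_KEY.
initialize(api_key="<api_key>", app_key="<app_key>")

api.Monitor.create(
    type="metric alert",
    # Same query shape as above: !resource_name:get_/health excludes the health check resource.
    query="max(last_10m):p95:trace.netty.request{env:prod,service:my-service,!resource_name:get_/health} >= 4",
    name="p95 latency (health checks excluded)",
    message="p95 latency is above 4s.",
    options={"thresholds": {"critical": 4}},
)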

Related

How to make an Azure Cost Management export run daily and only export that day's or the previous day's data? Using Terraform or the Azure API

Taking the Terraform resource example:
resource "azurerm_billing_account_cost_management_export" "example" {
name = "example"
billing_account_id = "example"
recurrence_type = "Daily"
recurrence_period_start_date = "2020-08-18T00:00:00Z"
recurrence_period_end_date = "2020-09-18T00:00:00Z"
export_data_storage_location {
container_id = azurerm_storage_container.example.resource_manager_id
root_folder_path = "/root/updated"
}
export_data_options {
type = "Usage"
time_frame = "Custom"
}
}
The documentation isn't very clear about what time_frame = "Custom" does and what else needs to be added here. I would like to create an export that runs daily but only exports that day's, or perhaps the previous day's, worth of data; month-to-date is the closest option, but I do not want all of the other days' data in that export. Does setting time_frame to Custom allow me to do this? Will I have to set a start_date and end_date? And if so, could I then run an update request daily, perhaps from a script, to change the dates as an alternative option?
I tried creating a month-to-date export, but the file is too large and accumulates unwanted data as the end of the month approaches.
Does setting the time_frame to Custom allow me to do this?
Yes, this can be done via the JSON (REST) API; a sketch is shown further down.
The Terraform provider itself does not support this option.
I have tried to replicate this using Terraform, with no luck. Using Terraform we only have the options below; time_frame only allows the following parameters.
Possible values include: WeekToDate, MonthToDate, BillingMonthToDate, TheLastWeek, TheLastMonth, TheLastBillingMonth, Custom.
It seems the respective custom parameters are month-specific.
For example, we can use:
time_frame = "TheLastMonth"
As an alternative, I would suggest using the Azure Cost Management connector in Power BI Desktop, publishing it to Power BI, and setting up a daily refresh.
You can then query the datamart like a database on a daily basis.

Transcribing Splunk's "transaction" Command into Azure Log Analytics / Azure Data Analytics / Kusto

We're using AKS and have our container logs writing to Log Analytics. We have an application that emits several print statements in the container log per request, and we'd like to group all of those events/log lines into aggregate events, one event per incoming request, so it's easier for us to find lines of interest. So, for example, if the request started with the line "GET /my/app" and then later the application printed something about an access check, we want to be able to search through all the log lines for that request with something like | where LogEntry contains "GET /my/app" and LogEntry contains "access_check".
I'm used to queries with Splunk. Over there, this type of inquiry would be a cinch to handle with the transaction command.
But, with Log Analytics, it seems like multiple commands are needed to pull this off. Seems like I need to use extend with row_window_session in order to give all the related log lines a common timestamp, then summarize with make_list to group the lines of log output together into a JSON blob, then finally parse_json and strcat_array to assemble the lines into a newline-separated string.
Something like this:
ContainerLog
| sort by TimeGenerated asc
| extend RequestStarted= row_window_session(TimeGenerated, 30s, 2s, ContainerID != prev(ContainerID))
| summarize logLines = make_list(LogEntry) by RequestStarted
| extend parsedLogLines = strcat_array(parse_json(logLines), "\n")
| where parsedLogLines contains "GET /my/app" and parsedLogLines contains "access_check"
| project Timestamp=RequestStarted, LogEntry=parsedLogLines
Is there a better/faster/more straightforward way to be able to group multiple lines for the same request together into one event and then perform a search across the contents of that event?
After reading your question: there is no easy way to do that in Azure Log Analytics.
If the logs are in this format, you will need to do some additional work to meet your requirement.

Azure Metric: dtu_used does not accept zero dimension case

I am trying to get the metrics of a Microsoft.Sql/servers resource:
The resource_id is: /subscriptions/******-8**2-44**-95**-****13a5****/resourceGroups/SQLTesting/providers/Microsoft.Sql/servers/sqltest
As in the metrics example:
resource_id="/subscriptions/******-8**2-44**-95**-****13a5****/resourceGroups/SQLTesting/providers/Microsoft.Sql/servers/sqltest"
today = datetime.datetime.now().date()
yesterday = today - datetime.timedelta(days=1)
metrics_data = client.metrics.list(
resource_id,
timespan="{}/{}".format(yesterday, today),
interval='PT1H',
metricnames='dtu_used',
aggregation='Average'
)
for item in metrics_data.value:
# azure.mgmt.monitor.models.Metric
print("{} ({})".format(item.name.localized_value, item.unit.name))
for timeserie in item.timeseries:
for data in timeserie.data:
# azure.mgmt.monitor.models.MetricData
print("{}: {}".format(data.time_stamp, data.total))
But I am getting an "ErrorResponseException: Metric: dtu_used does not accept zero dimension case" error. How can I fix it?
In the metric definition there is a property called isDimensionRequired; when it is true, you must provide a filter for the dimension when querying, otherwise the metric query will fail.
So, for the dtu_used metric of SQL Server, the value is true, which means that you need to provide the DatabaseResourceId in the filter.
Here is a sample: a minimal sketch of the same call with a dimension filter added. The credential setup and the exact dimension name (DatabaseResourceId) are assumptions to verify against the metric definition:
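import datetime

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/SQLTesting"
    "/providers/Microsoft.Sql/servers/sqltest"
)

today = datetime.datetime.now().date()
yesterday = today - datetime.timedelta(days=1)

metrics_data = client.metrics.list(
    resource_id,
    timespan="{}/{}".format(yesterday, today),
    interval="PT1H",
    metricnames="dtu_used",
    aggregation="Average",
    # The filter satisfies the required dimension; '*' asks for all databases.
    filter="DatabaseResourceId eq '*'",
)

for item in metrics_data.value:
    print("{} ({})".format(item.name.localized_value, item.unit))
    for timeserie in item.timeseries:
        for data in timeserie.data:
            # With aggregation='Average', read data.average rather than data.total.
            print("{}: {}".format(data.time_stamp, data.average))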
So, basically, you need to add a filter in the metric query.
I can reproduce your issue. The metric dtu_used belongs to the SQL database, not the SQL server, i.e. to the resource type Microsoft.Sql/servers/databases rather than Microsoft.Sql/servers; see this link.
To fix the issue, use the resource ID of a specific SQL database as resource_id, like /subscriptions/******-8**2-44**-95**-****13a5****/resourceGroups/SQLTesting/providers/Microsoft.Sql/servers/sqltest/databases/Mydatabase, and then it will work fine.
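For illustration, the database-scoped call would look like this; the subscription placeholder and database name are examples only, and the rest of the original code stays the same:

resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/SQLTesting"
    "/providers/Microsoft.Sql/servers/sqltest/databases/Mydatabase"
)

metrics_data = client.metrics.list(
    resource_id,
    timespan="{}/{}".format(yesterday, today),
    interval="PT1H",
    metricnames="dtu_used",
    aggregation="Average",
)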

Azure Log Analytics - Alerts Advice

I have a question about Azure Log Analytics alerts: I don't quite understand how the time frame works in the context of setting up an alert based on an aggregated value.
I have the code below:
Event
| where Source == "EventLog" and EventID == 6008
| project TimeGenerated, Computer
| summarize AggregatedValue = count(TimeGenerated) by Computer, bin_at(TimeGenerated, 24h, datetime(now()))
For time window : 24/03/2019, 09:46:29 - 25/03/2019, 09:46:29
In the above, the alert configuration interface insists on adding bin_at(TimeGenerated, 24h, datetime(now())), so I add the function, passing the arguments for a 24-hour time period. If this is already being added, then what is the point of the time window?
Basically the result I am looking for is capturing this event over a 24 hour period and alerting when the event count is over 2. I don't understand why a time window is also necessary on top of this because I just want to run the code every five minutes and alert if it detects more than two instances of this event.
Can anyone help with this?
AFAIK you may use a query like the one below to accomplish your requirement of capturing the required event over a 24-hour time period.
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue= any(EventID) by Computer, bin(TimeGenerated, 1s)
The '1s' in this sample query is the time frame over which we aggregate and get the output from the Log Analytics workspace repository. For more information, refer to https://learn.microsoft.com/en-us/azure/kusto/query/summarizeoperator
And to create an alert, you may have to go to Azure portal -> YOURLOGANALYTICSWORKSPACE -> Monitoring tile -> Alerts -> Manage alert rules -> New alert rule -> Add condition -> Custom log search -> Paste any of the above queries under the 'Search query' section -> Type '2' under the 'Threshold value' parameter of the 'Alert logic' section -> Click 'Done' -> Under the 'Action Groups' section, select an existing action group or create a new one as explained in the article below -> Update 'Alert Details' -> Click 'Create alert rule'.
https://learn.microsoft.com/en-us/azure/azure-monitor/platform/action-groups
Hope this helps!! Cheers!! :)
To answer your question in the comments: yes, the alert insists on adding the bin function, which is why I provided the relevant query with a '1s' bin and tried to explain it in my previous answer.
If you put '1s' in the bin function, the output is fetched from Log Analytics by aggregating the value of any EventID over one-second bins, so you get one row per computer per second, where aaaaaaa stands for a VM name and x for a particular time.
If you put '24h' instead of '1s' in the bin function, the output is aggregated over 24-hour bins, so you get a single row per computer for the whole day, again with aaaaaaa as a VM name and x as a particular time.
So in this case we should not use '24h' in the bin function together with the 'any' aggregation, because we would then see only one row in a 24-hour timespan, which does not help you find the event occurrence count with a query that uses 'any' for aggregation. Instead, you may use the 'count' aggregation instead of 'any' if you want a '24h' bin. The query would then look like this:
Event
| where Source == "EventLog" and EventID == 6008
| where TimeGenerated > ago(24h)
| summarize AggregatedValue= count(EventID) by Computer, bin(TimeGenerated, 24h)
The output of this query would look something like the sketch below, where aaaaaaa stands for a VM name, x for a particular time, and y and z for event counts.
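An illustrative layout (not actual data), using the placeholders described above:

Computer    TimeGenerated    AggregatedValue
aaaaaaa     x                y
aaaaaaa     x + 24h          z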
One other note is, all the above mentioned queries and outputs are in the context of setting up an alert based on an aggregated value i.e., setting up an alert when opting 'metric measurement' under alert logic based on section. In other words, aggregatedvalue column is expected in alert query when you opt 'metric measurement' under alert logic based on section. But when you say 'you get a count of the events' that means If i am not wrong, may be you are opting 'number of results' under alert logic based on section, which would not required any aggregation column in the query.
Hope this clarifies!! Cheers!!

Create an alert on calling a third party API using Azure Application Insights

I've enabled application insights on an Azure WebApp that I created. My WebApp is calling a third party API which runs on a quota. I am only allowed 100k calls per month.
I need to track those API calls so that I can create an alert when the number of calls has reached 50% of the quota, and then another alert at 75%.
I am using TrackEvent every time the call is made and the event in the AppInsights dashboard does increment. But I can't seem to create an alert when a certain number of calls is made. I can't see it from the list of 'events' dropdown.
In addition, one other requirement is to create an alert when the number of calls goes over 10 per minute.
Is TrackEvent the right method to use for these requirements?
I did something like this ...
var telemetryEventClient = new Microsoft.ApplicationInsights.TelemetryClient(
    new Microsoft.ApplicationInsights.Extensibility.TelemetryConfiguration()
    {
        InstrumentationKey = "Instrumentation Key"
    });
telemetryEventClient.Context.Operation.Name = "MyAPIProvider";

var properties = new Dictionary<string, string>
{
    { "Source", "WebAppToAPI" }
};
var metrics = new Dictionary<string, double>
{
    { "CallingAPIMetric", 1 }
};

telemetryEventClient.TrackEvent("CallingAPI", properties, metrics);
but when I looked at setting up the alert and placed a threshold of 50000 (for testing, I just put 5), I never reach that as the event count is always 1. Am I approaching this the right way?
The alert you're trying to define always looks at the value you supply in your custom event - not the number of events you're firing.
You can create an automated flow to query your events and send you an email whenever the query result passes some threshold.
The Application Insights Connector, which works for both Flow and Microsoft Logic Apps, was created just for that, and can be defined on any query result from any document type (event, metric or even traces).
Step-by-step documentation on how to create your own flow is here.
As for your query - you need a simple analytics query like this:
customEvents
| where timestamp > ago(1h) // or any time range you need
| where name == "CallingAPI"
| count
