I'm trying to set up health check alerts for critical functionality across my site. For things like registrations, payments and critical emails, I have started logging custom event telemetry using the telemetry client, like so:
var tc = new TelemetryClient();
tc.TrackEvent(emailType.ToString());
This is currently working great and I'm able to create an Application Insights analytics dashboard out of this data, which forms the basis of my alerts.
From the portal I have now started creating alerts whose criterion is a custom log search (Azure Portal > Application Insights > Alerts > Add New Rule > Add Criteria).
The problem is that the period has a maximum length of 24 hours, which means that for an event that fires infrequently (let's say once over the course of a week) we would get false alerts on a daily basis.
The question is: how can I set up alerting in Application Insights for events like these?
I would prefer a solution that does not require additional WebJobs or code crunching numbers to figure out whether thresholds are met, as I feel an alerting system should have as few moving parts as possible.
Update 1
After I contacted Microsoft's alert feedback group, they extended the period dropdown to 48 hours; however, this is still inadequate for my use case.
I have tried alternative tools like Grafana (with an App Insights plugin). Sadly, that particular plugin does not support alerting (whilst Grafana itself does).
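To make the window problem concrete, here is a minimal sketch (not an Azure feature, just the arithmetic) that counts how many evaluation windows contain no event when an event fires once in a week. The function name `windows_without_events` and the sample dates are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def windows_without_events(event_times, start, end, window):
    """Count back-to-back evaluation windows in [start, end) that saw no event."""
    empty = 0
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)
        if not any(cursor <= t < upper for t in event_times):
            empty += 1  # an "event count < 1" alert would fire for this window
        cursor = upper
    return empty


# Hypothetical data: a single event during one week of monitoring.
start = datetime(2024, 1, 1)
end = start + timedelta(days=7)
events = [datetime(2024, 1, 4, 9, 30)]

daily_false_alerts = windows_without_events(events, start, end, timedelta(hours=24))
two_day_false_alerts = windows_without_events(events, start, end, timedelta(hours=48))
weekly_false_alerts = windows_without_events(events, start, end, timedelta(days=7))
```

With this sample data, the 24-hour and 48-hour periods both leave several empty windows, while a 7-day period leaves none, which is why the longer period is needed.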
Related
I wanted to monitor Azure Logic Apps with the help of Azure Monitor alerts. In alerts, I came across a metric, Run Throttled Events, which has been showing some numbers in recent days. But I couldn't find the events anywhere in order to resolve the issue. Is it possible to view the actual run throttled events in the Azure Portal?
You will need to set up diagnostic logging for Logic Apps, see here.
Once you are done with the setup and an initial run-through of the logs, if you want to run more advanced queries against this log data, go here.
Specifically on throttling, you need to see this. Also take a look at the limits set for Logic Apps here.
There's an awful lot of disjointed documentation on monitoring network/resources in Azure. What I'm looking for is which pieces are needed to get information from VMs, NVA firewalls, Azure load balancers, and other network resources and network connectivity into a single pane of glass in Azure. I'm only concerned about Azure, not on-prem, for now.
I've come across Azure Monitor, Log Analytics workspaces, Event Hub, VM extensions, Network Watcher, Insights, etc., but I'm not sure which are required and which are not. One doc leads to the next and I end up with 30 tabs open. I'll also need to be able to push logs to other security devices such as a SIEM.
Does anyone know of a deployment guide that wraps this all up in a more logical fashion? Does anyone have any feedback on which pieces from azure (not 3rd parties) are required at a minimum to accomplish a single pane of glass to view my Azure environment holistically?
General overview of observability in Azure
Most likely, the thing you're looking for is Azure Monitor. It's an umbrella term for everything observability-related inside Azure.
To store metrics and logs you need Log Analytics: it can query data with Kusto Query Language, visualize results, and define alerts on queries.
Alerting is quite a complex beast, as it is spread across the entire cloud. The two types that I use the most:
Log Analytics alerts (which I mentioned above)
The Alerts tab, which is available in every Azure component view. For example, open a resource group and scroll down to the Monitoring section
Each component also has a subset of built-in metrics. You have likely noticed that many Azure components display some charts on the Overview view. For example, an Azure Storage Account displays Total egress, Total ingress, and other line charts. When you click on these charts you can customize them. These metrics and charts are free to use.
Microsoft also has an all-in-one observability solution for Azure Functions and Web Apps: Application Insights
Dashboards let you combine multiple charts into a single view and share it with others.
If you care about security, Azure offers Azure Security Center
Deployment/management strategy
I suggest starting with:
Create a Log Analytics workspace, which is the storage for metrics and logs. The Azure docs article explains how to design it: how many instances to use, how to rate-limit ingestion (it might get expensive if it goes out of control), how to access it and so on.
To get logs from Azure components, look for the Diagnostic Settings tab on the component's page in the Azure portal, but note that not all components have it (sic!). I suggest
sending the most critical data to the Log Analytics workspace to store it in a queryable format for 30 days (this is within the free tier). This is needed for investigating current issues with your infrastructure
if you might need logs older than 30 days, send them to a Storage Account
you mentioned SIEM integration: route the required events to Event Hub and then process the stream according to your requirements
So, if you need long-term storage - you need to create Azure Storage Account.
If you need real-time analysis - you need to build a pipeline based on Azure Event Hub.
If you have Azure Functions and Web Apps, add Application Insights. In my experience, it is best to start with a separate instance per Azure Function resource or service.
Create alerts for each component separately. If you do it through the UI, open the component page in the portal and look for the Alerts tab there. If you're automating the process (please do so as soon as possible), do not expect an easy ride: I used ARM templates and Terraform, and in both cases there are dozens of barely documented features.
Combine the core metrics of related components into dashboards and share them with the team. This guide is a good starting point. Note that when you share a dashboard, it is also persisted as an Azure resource in the subscription.
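The routing advice above (Log Analytics for recent queryable data, a Storage Account for long-term retention, Event Hub for SIEM or real-time processing) can be summarized as a small decision rule. This is only an illustrative sketch: the function name, parameters, and destination labels are hypothetical and do not correspond to any Azure API.

```python
def diagnostic_destinations(retention_days, needs_streaming):
    """Pick diagnostic-settings targets following the rules of thumb above."""
    # Critical data always goes to the Log Analytics workspace so it is queryable.
    destinations = ["log_analytics"]
    # Anything that must be kept beyond the 30-day window goes to a Storage Account.
    if retention_days > 30:
        destinations.append("storage_account")
    # SIEM integration or real-time analysis is fed through Event Hub.
    if needs_streaming:
        destinations.append("event_hub")
    return destinations


# Example: a firewall whose logs must be kept for a year and forwarded to a SIEM.
firewall_targets = diagnostic_destinations(retention_days=365, needs_streaming=True)
```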
I am working on an Azure web API which uses the Log4Net Application Insights appender to track traces. We are planning to use the Application Insights SDK directly (TrackTrace(), TrackEvent()) in order to use the built-in alerting features.
However, it seems that Azure does not support trace or event alerts, only metric alerts. So we have an issue there.
If I go one step back, the web API is invoked by a number of Logic App runs at x time intervals. Each Logic App simply calls the web API (the business logic lives there), and the web API logs all information and managed exceptions.
The main requirement is to be proactive when an exception happens, for example by sending a mail to a technical inbox. The secondary requirement is to notify the data sources if there are any data issues.
Any suggestions on our approach, please? What more can we do to fulfill our requirements?
Please refer to this App Insights Exception Alerts. It is possible to set alerts for rates of exceptions in a defined time period.
Set up Exception alerts
You can also invoke webhooks to perform additional actions when the alert is fired.
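As a rough illustration of what "rate of exceptions in a defined time period" means, the sketch below counts exceptions inside a trailing window and compares the count to a threshold. It is plain Python, not the Application Insights implementation; the function name, parameters, and sample timestamps are assumptions made for the example.

```python
from datetime import datetime, timedelta


def exception_alert_fires(exception_times, now, period, threshold):
    """Return True when more than `threshold` exceptions occurred in the last `period`."""
    window_start = now - period
    recent = [t for t in exception_times if window_start < t <= now]
    return len(recent) > threshold


# Hypothetical data: one old exception and four within the last five minutes.
now = datetime(2024, 1, 1, 12, 0)
exceptions = [
    datetime(2024, 1, 1, 11, 50),
    datetime(2024, 1, 1, 11, 56),
    datetime(2024, 1, 1, 11, 57),
    datetime(2024, 1, 1, 11, 58),
    datetime(2024, 1, 1, 11, 59),
]
should_notify = exception_alert_fires(exceptions, now, timedelta(minutes=5), threshold=3)
```

When the check is true, the alert's action (an email to the technical inbox, or a webhook) would be triggered.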
Thanks Sreejit for your suggestion.
To conclude the answer, what's the preferred engine to process alerts, please: Application Insights or Log Analytics?
Application Insights supports metric alerts only, so the application exceptions need to be flagged as custom metrics as opposed to traces.
[TelemetryClient --> TrackTrace() vs TrackMetric()]
If we go for Log Analytics, we can use custom events and then use the alert management solution in the OMS workspace. Perhaps we could even use a separate Logic App to build the logic for sending alerts.
[TelemetryClient --> TrackEvent()]
I've recently been playing around with Bing's Image Search API; however, I have a concern I hope to resolve.
It has to do with the limit on the number of API requests that are allowed per month. After doing some reading, it seems that if I were to exceed this limit, my Azure account would be billed depending on the number of API calls by which I have gone over my limit. Is it possible to set up some kind of alert through the Azure management portal that will stop the API from processing any more calls once a specific threshold has been passed?
If anyone has any experience using the Search API and can enlighten me, that would be great.
Try Metrics Monitoring. Go to the service within the Azure Portal, scroll down to Monitoring -> Metrics and then click Add Metric Alert.
You can create an alert based on the number of successful calls or total calls, and the alert can notify you via e-mail. Additionally, if you want to take action automatically after reaching the threshold, you can use webhooks to call out to a web application, or an Azure Automation Runbook to automatically run PowerShell scripts or some other code to prevent overuse. You can also use Logic Apps for that. Check the following link for further details and examples at the end of the page:
https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/insights-webhooks-alerts
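One way the webhook could "prevent overuse" is for your own application to stop issuing requests once the alert arrives. The sketch below is a hypothetical in-process gate, not part of Azure or the Bing API. It assumes the webhook payload carries a `status` field with values such as `Activated` and `Resolved`; verify this against the actual payload described at the link above before relying on it.

```python
class ApiGate:
    """Blocks outgoing API calls while an overuse alert is active."""

    def __init__(self):
        self.enabled = True

    def handle_alert(self, payload):
        # Assumed payload shape: {"status": "Activated" | "Resolved", ...}
        status = payload.get("status")
        if status == "Activated":
            self.enabled = False
        elif status == "Resolved":
            self.enabled = True

    def call(self, func, *args, **kwargs):
        # Refuse to forward the call once the threshold alert has fired.
        if not self.enabled:
            raise RuntimeError("API calls are paused: usage alert is active")
        return func(*args, **kwargs)


def fake_image_search(query):
    """Stand-in for the real search call, used only for this example."""
    return "results for " + query


gate = ApiGate()
first_result = gate.call(fake_image_search, "cats")
gate.handle_alert({"status": "Activated"})
```

After `handle_alert` receives the activated alert, further `gate.call(...)` invocations raise instead of reaching the API, so no additional billable requests are sent from this process.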
A customer has asked that we start tracking user and administration actions on our website for security purposes.
e.g. AdminUser {id: 3} impersonated user {id: 5} on 2015-08-04T12:00:00
The thought was we would publish that data using the Application Insights SDK. Our customer would then be able to monitor these events through the Azure portal.
Given the data retention policies of Application Insights, we would enable Continuous Export of data to table storage in case a forensic analysis needs to be conducted past the 30-day cut-off.
Are there any obvious red flags to using Application Insights to provide visibility into these security details?
No, that should just work.
Here is how you create events: https://azure.microsoft.com/en-us/documentation/articles/app-insights-api-custom-events-metrics/
Note that if you also want to add custom properties to events there is a limit on number of unique property names per application. Currently it is 200 but that may decrease in the future.
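Because the limit applies to unique property names, it helps to keep names fixed (for example one name for the admin id and one for the impersonated user id) and put the varying data in the values. Below is a minimal sketch of a guard that tracks the names in use. The class, the property names, and the idea of checking client-side are all assumptions for illustration, and the value 200 is simply the figure quoted above.

```python
MAX_PROPERTY_NAMES = 200  # limit quoted in the answer above; may change


class PropertyNameRegistry:
    """Tracks unique custom property names and rejects events that exceed the limit."""

    def __init__(self, limit=MAX_PROPERTY_NAMES):
        self.limit = limit
        self.names = set()

    def check(self, properties):
        new_names = set(properties) - self.names
        if len(self.names) + len(new_names) > self.limit:
            raise ValueError("too many unique property names for this application")
        self.names |= new_names
        return properties


registry = PropertyNameRegistry()
# Hypothetical property names for the impersonation event from the question.
event_properties = registry.check({
    "adminUserId": "3",
    "impersonatedUserId": "5",
    "occurredAt": "2015-08-04T12:00:00",
})
```

The returned dictionary is what you would pass as the custom properties of the event; reusing the same three names for every impersonation event keeps the unique-name count constant.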