Trigger an alert in GCP Monitoring with more than a certain number of tasks in a Cloud Tasks queue and with a retries value > 0 - google-cloud-monitoring

Is there a way to create an alert policy in GCP Monitoring that triggers an incident if there are more than 50 tasks in a Cloud Tasks queue with a retries value greater than 0?

To answer your question: it's not possible to combine these metrics so that they work with each other the way you want. Each condition is evaluated separately, as metric-based alerting policies are used to track metric data collected by Cloud Monitoring. Cloud Tasks is also currently not a monitored resource for custom metrics.
Cloud Tasks currently has the following metrics relevant to your use case:
Queue depth
Task attempt count
You can set multiple conditions per policy, and each condition can relate to a Cloud Tasks metric. An incident will then be created if there are more than 50 tasks in a queue and/or if one of your tasks has a retry count higher than 0, depending on how you combine the conditions.
For more information, feel free to check Introduction to Alerting.
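As a sketch of what a two-condition policy could look like, here is a hedged example that builds the policy body you might pass to the Monitoring API or to `gcloud alpha monitoring policies create --policy-from-file`. The metric type strings, thresholds, and display names are illustrative assumptions; verify them against the current Cloud Tasks metrics list before use.

```python
import json

# Illustrative alert-policy body; the metric type strings and thresholds
# below are assumptions to verify against the Cloud Tasks metrics list.
policy = {
    "displayName": "Cloud Tasks queue backlog",
    # "OR" fires when EITHER condition is met; conditions are still
    # evaluated independently and cannot reference each other.
    "combiner": "OR",
    "conditions": [
        {
            "displayName": "Queue depth above 50",
            "conditionThreshold": {
                "filter": 'metric.type = "cloudtasks.googleapis.com/queue/depth" '
                          'AND resource.type = "cloud_tasks_queue"',
                "comparison": "COMPARISON_GT",
                "thresholdValue": 50,
                "duration": "60s",
            },
        },
        {
            "displayName": "Task attempts (retries) occurring",
            "conditionThreshold": {
                "filter": 'metric.type = "cloudtasks.googleapis.com/queue/task_attempt_count" '
                          'AND resource.type = "cloud_tasks_queue"',
                "comparison": "COMPARISON_GT",
                "thresholdValue": 0,
                "duration": "60s",
            },
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Note that even with both conditions in one policy, each condition is still evaluated against its own metric stream; the policy cannot express "tasks that are both queued AND retried".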

Related

Azure Function running with batchsize of 1 in consumption mode still running in parallel

Per this documentation, I am using the Azure Function consumption plan and am trying to limit the parallelism of one of the queue-triggered functions so that only one instance runs at a time:
{
  "queues": {
    "batchSize": 1
  }
}
The queue is part of an Azure Storage account; it isn't a Service Bus queue.
However, my problem is that the function is still being run in parallel if there are multiple items on the queue at once. I read in the fine print of the documentation above:
If you want to avoid parallel execution for messages received on one queue, you can set batchSize to 1. However, this setting eliminates concurrency only so long as your function app runs on a single virtual machine (VM). If the function app scales out to multiple VMs, each VM could run one instance of each queue-triggered function.
Since I am using a consumption plan, how do I know if the function app is running on a single VM or on multiple? How can I successfully limit this function's batch size to one?
On a Consumption plan, a single function app scales out to a maximum of 200 instances. A single instance may process more than one message or request at a time, though, so there is no set limit on the number of concurrent executions.
Also, when you're using a Consumption plan, instances of the Azure Functions host are dynamically added and removed based on the number of incoming events.
Since you want to limit the parallelism of one of the queue-triggered functions, I suggest using an Azure App Service plan to achieve it, as that gives you control over the number of instances.
For more details, you could refer to this article.

How to enable alerts on Azure runbook?

Is there a way to monitor the number of times a runbook has been called and then report on it (send email, text)? When I try to create an alert rule, I only see an option for the activity log, not metrics. The runbook is getting called from Event Grid via webhook.
You can use automation runbooks with three alert types:
Classic metric alerts - Sends a notification when any platform-level metric meets a specific condition. For example, when the value for CPU % on a VM is greater than 90 for the past 5 minutes.
Activity log alerts
Near real-time metric alerts - Sends a notification faster than metric alerts when one or more platform-level metrics meet specified conditions. For example, when the value for CPU % on a VM is greater than 90, and the value for Network In is greater than 500 MB for the past 5 minutes.
When an alert calls a runbook, the actual call is an HTTP POST request to the webhook. The body of the POST request contains a JSON-formatted object with useful properties related to the alert.
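As a minimal sketch of handling that POST body in a Python runbook, here is one way to pull a couple of fields out of the JSON payload. The field names ("context", "name", "conditionType") are assumptions for illustration; check the actual JSON your alert type posts before relying on them.

```python
import json

def parse_alert_webhook(body: str) -> dict:
    """Extract a few fields from an alert webhook POST body.

    The field names used here ("context", "name", "conditionType") are
    illustrative assumptions, not a documented schema; inspect a real
    payload from your alert type before depending on them.
    """
    payload = json.loads(body)
    context = payload.get("context", {})
    return {
        "alert_name": context.get("name"),
        "condition_type": context.get("conditionType"),
        "raw": payload,
    }

# Example with a made-up body of the rough shape described above:
sample = json.dumps({"context": {"name": "cpu-high", "conditionType": "Metric"}})
info = parse_alert_webhook(sample)
print(info["alert_name"])  # cpu-high
```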
This Microsoft documentation link might help with metric alerts for runbooks:
https://learn.microsoft.com/en-us/azure/automation/automation-create-alert-triggered-runbook
You can send your Azure Automation runbook status data to Log Analytics. From there, you can alert on the different states. This documentation should help you with this process: https://azure.microsoft.com/en-us/updates/send-your-runbook-job-status-and-job-streams-from-automation-to-log-analytics-oms/
There are multiple answers to this question although none of them are a perfect solution. What I ended up doing was putting a logic app in front of the runbook that then calls the runbook. This allows me to alert on the metrics of my logic app.
We have just added support for monitoring runbooks and having alerts on them. Go to the Alerts experience in Azure Monitor, where you should be able to choose an Automation account and then a runbook as a dimension, and then perform actions based on the number of jobs for that runbook, etc.

Sending emails through Amazon SES with Azure Functions

The Problem:
So we are building a newsletter system for our app that must have a capacity to send 20k-40k emails up to several times a day.
Tools of Preference:
Amazon SES - for pricing and scalability
Azure Functions - for serverless compute to send emails
Limitations of Amazon SES:
Amazon SES throttling with a max send rate - Amazon SES throttles sending via their services by imposing a maximum send rate. Right now, being out of the SES sandbox environment, our capacity is 14 emails/sec with a 50K daily email cap. This limit can be increased via a support ticket.
Limitations of Azure Functions:
On a Consumption plan, there's no way to limit how many instances of your Azure Function execute. Currently the scaling is handled internally by Azure, so the function can run on anywhere from just a few to hundreds of instances.
From reading other posts on Azure Functions, there seems to be a "warm-up" period for Azure Functions, meaning the function may not execute as soon as it is triggered via one of the documented triggers.
Limitations of Azure Functions with SES:
The obvious problem is Amazon SES throttling the emails sent from Azure Functions, because the scaled-out execution of the Azure Function that sends the emails will far exceed the allowed SES send rate.
Due to the "warm-up" period of Azure Functions, messages may pile up in a queue before the Azure Function actually starts processing them at scale and sending out emails, so there's a very high probability of hitting that send-rate limit.
Question:
How can we send emails via Azure Functions while staying under the X emails/second limit of SES? Is there a way to limit how many times an Azure Function can execute per time frame? Say, for example, we don't want more than 30 instances of the Azure Function running per second.
Other thoughts:
Amazon SES might not like continuous throttling for a customer whose implementation is constantly hitting that throttling limit. Amazon SES folks, can you please comment?
Azure Functions - as per documentation, the scaling of Azure Functions on a Consumption Plan is handled internally. But isn't there a way to put a manual "cap" on scaling? This seems like such a common requirement from a customer's point of view. The problem is not that Azure Functions can't handle the load, the problem is that other components of the system that interface with Azure Functions can't handle the load at the massive scale at which Azure Functions can handle it.
Thank you for your help.
If I understand your problem correctly, the easiest method is a custom queue throttling solution.
Basically, your Azure Function just receives all the mailing requests and queues them into a queueing system (say Service Bus / Event Hub / IoT Hub). Then you can have another Azure Function that runs on an x-minute interval and pulls a maximum of y messages per run, pushing them to SES. Your control point becomes that clock function, and since the queue system tracks message delivery status (whether a message has been sent to SES yet) and you can pop each message once it's done, you can ensure every job is eventually processed.
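The clock-function idea above can be sketched in plain Python. Here a `deque` stands in for the queue system and a callback stands in for the SES send call; the rate constant comes from the quota mentioned in the question, and a real implementation would use a timer trigger for the interval rather than a loop:

```python
from collections import deque

SES_MAX_PER_TICK = 14  # quota from the question (14 emails/sec); raise after a limit increase

def drain_queue(queue: deque, send_email) -> int:
    """Pull at most SES_MAX_PER_TICK messages and hand them to send_email.

    `queue` stands in for Service Bus / Event Hub; `send_email` stands in
    for the SES call. A message is only removed once send_email returns,
    mirroring the pop-after-delivery behaviour described above, so a failed
    send leaves the message queued for the next tick.
    """
    sent = 0
    while queue and sent < SES_MAX_PER_TICK:
        msg = queue[0]
        send_email(msg)   # an exception here leaves msg at the head of the queue
        queue.popleft()
        sent += 1
    return sent

# Simulated run: 30 queued messages drain in 3 ticks at 14 per tick.
outbox = []
q = deque(range(30))
ticks = 0
while q:
    drain_queue(q, outbox.append)
    ticks += 1
print(ticks, len(outbox))  # 3 30
```

The key property is that the send rate is governed by the clock function's schedule and batch size, not by however far the Consumption plan scales the ingesting function.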
You should be able to set maxConcurrentCalls to 1 in the host.json file for the function; this ensures that only one function execution occurs at any given time and should throttle your processing rate to something more agreeable from AWS's perspective in terms of sends per second:
host.json
{
  // The unique ID for this job host. Can be a lower case GUID
  // with dashes removed. When running in Azure Functions, the id can
  // be omitted, and one gets generated automatically.
  "id": "9f4ea53c5136457d883d685e57164f08",

  // Configuration settings for 'serviceBus' triggers. (Optional)
  "serviceBus": {
    // The maximum number of concurrent calls to the callback the message
    // pump should initiate. The default is 16.
    "maxConcurrentCalls": 1,
    ...

Azure Eventhub alerting system

We have an IoT service running on Azure which produces a lot of events. We need to build a new feature that allows our end users to configure alerts based on system events. It lets the user pick an event and configure an action (e-mail, webhook, etc.) to be executed when such an event occurs. We're evaluating Azure Event Hubs and possibly Azure Stream Analytics as candidates for the job.
The problem we face is that we expect a lot of Stream Analytics jobs. If, for example, we have 3,000 customers each configuring 3 alerts, we would need to run 9,000 Stream Analytics jobs, each selecting specific events from the Event Hub and pushing them into a queue for alert processing. This would not only be tough to maintain, but I also don't think it's a cost-effective solution.
Any thoughts on this or better solutions?
I am assuming events from multiple customers go to a fixed small set of event hubs and actions go to a fixed small set of queues.
You can design it such that a single Azure Stream Analytics job handles processing for multiple customers. Reference data (let's call it CustomerToAlertLookup) can be used to decide which customer has configured which alert, and the event stream can be joined with CustomerToAlertLookup to decide whether there should be an alert.
The number of Azure Stream Analytics jobs required would then be a factor of the number of outputs and your manageability preferences.
I can attempt to write a sample query if you add a hypothetical but more concrete scenario.
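In Stream Analytics this would be a JOIN between the event stream and a reference-data input; here is the same idea as a plain Python sketch, with made-up customer IDs, event types, and action names for illustration:

```python
# Stand-in for the CustomerToAlertLookup reference data described above.
# Maps (customer_id, event_type) -> configured action; all values are
# hypothetical examples.
customer_to_alert_lookup = {
    ("cust-1", "device_offline"): "email",
    ("cust-1", "temp_high"): "webhook",
    ("cust-2", "device_offline"): "email",
}

def alerts_for(events):
    """Yield (customer_id, event_type, action) for events a customer subscribed to.

    This mirrors the stream-to-reference-data join: events with no matching
    lookup row simply produce no alert.
    """
    for ev in events:
        key = (ev["customer_id"], ev["type"])
        action = customer_to_alert_lookup.get(key)
        if action is not None:
            yield ev["customer_id"], ev["type"], action

stream = [
    {"customer_id": "cust-1", "type": "temp_high"},
    {"customer_id": "cust-2", "type": "temp_high"},  # not subscribed -> dropped
]
matched = list(alerts_for(stream))
print(matched)  # [('cust-1', 'temp_high', 'webhook')]
```

One job holding one lookup table serves all 3,000 customers; only the reference data grows, not the number of jobs.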

Azure Queues: Enqueue a message periodically from only one worker role instance

I want to do a certain task periodically (per day) on our Web/Worker role. I have multiple instances in my Cloud Service, and I want only one of these instances to do this task per day (for example Instance0 can do it one day, next day it could be Instance1 doing the work, but 0 and 1 will not try to do the same work during the same day/period)
Azure queues seem to be a great way to achieve this because by design only one instance will dequeue the message (assuming it deletes it after doing the work).
What I am having trouble with is figuring out a way to put only one copy of this message in the queue per day. The only way I have figured to do this is by enqueuing a message every day from Azure scheduler jobs.
My problem with Azure scheduler is the fact that I need to create a job for every single storage account I have across all of my deployments.
Is there a way to do this from within the cloud service, without taking the scheduler dependency?
If you don't want to have Scheduler dependency, consider using Blob leases as a semaphore of sorts. http://justazure.com/azure-blob-storage-part-8-blob-leases/
At a certain time of the day, have your worker instances compete to get a lease on some central storage blob. Whoever gets the lease, prevents other instances from getting that lease and can queue up messages into the queue.
Having said that, why are you afraid of the Scheduler dependency? Have it kick off a single job that queues up a "start work" message. Have your instances monitor that queue. Whichever instance picks up that message can then run through all of your storage accounts and queue up individual storage-work messages for all instances to pick up.
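The compete-for-a-lease pattern can be sketched without any Azure dependency. Here an in-memory class stands in for the blob lease purely to show the election semantics; against real storage you would use the blob lease API instead:

```python
class LeaseStandIn:
    """In-memory stand-in for a blob lease: the first acquirer wins.

    This class exists only to illustrate the compete-and-elect pattern;
    a real implementation would acquire a lease on a central blob.
    """
    def __init__(self):
        self._holder = None

    def try_acquire(self, instance_id: str) -> bool:
        if self._holder is None:
            self._holder = instance_id
            return True
        return False

def run_daily_job(instances, lease, enqueue):
    """Every instance competes for the lease; only the winner enqueues."""
    for instance_id in instances:
        if lease.try_acquire(instance_id):
            enqueue(f"start-work (queued by {instance_id})")

queue = []
run_daily_job(["instance-0", "instance-1", "instance-2"], LeaseStandIn(), queue.append)
print(queue)  # exactly one message, from whichever instance won the lease
```

Because exactly one instance holds the lease at the scheduled time, exactly one copy of the daily message lands in the queue, regardless of how many instances are running.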
If you know you need to perform one job a day, I don't understand why you need a queue - you could just have a scheduled job run once per day. If the job only needs to run when a certain condition is met, I would build that logic into your scheduled task - maybe by setting a property in a table or something like that.