All,
I'm in the process of learning Databricks. If there is a runaway process that has been running on a Job Cluster, I would like to get notified/alerted and, if possible, have it terminated. In SQL Server, this is normally done with the help of the Query Governor.
Is there something comparable in Azure Databricks that I can put to use?
Thanks,
grajee
Databricks jobs (or, under the new name, workflows) support sending email alerts on job start and/or termination (successful runs or errors only); just use the UI or the API to configure these notifications (an API sketch follows the steps below). You can also configure such notifications at the level of individual tasks inside the workflow/job.
You can add one or more email addresses to notify when runs of this job begin, complete, or fail:
Click Edit alerts.
Click Add.
Enter the email address to notify and click the check box for each alert type to send to that address.
To enter another email address for notification, click Add.
If you do not want to receive alerts for skipped job runs, click the check box.
Click Confirm.
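Since the same settings are exposed through the Jobs API (2.1), here is a minimal sketch of configuring them programmatically in Python. This is an illustration only: the workspace URL, token, and job ID are placeholders, and the requests package is assumed to be available.

import requests

DATABRICKS_HOST = "https://<workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                            # placeholder
JOB_ID = 123                                                 # placeholder: your job's ID

# Partially update the job, setting only its email notifications.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "job_id": JOB_ID,
        "new_settings": {
            "email_notifications": {
                "on_start": ["me@example.com"],
                "on_failure": ["me@example.com"],
                "no_alert_for_skipped_runs": True,
            }
        },
    },
)
resp.raise_for_status()

The same email_notifications block can also be supplied when creating the job in the first place.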
Related
I have a Functions app where I've configured signal logic to send me an alert whenever a failure count greater than or equal to one has occurred in my application. I have been getting emails every day saying my Azure Monitor alert was triggered, followed by a later email saying that the failure was resolved. I know that my app didn't fail, because I checked in Application Insights. For instance, I did not have a failure today, but did have failures the prior two days.
However, I did receive a failure email today. If I go to configure the signal logic, where I set a static threshold of failure count greater than or equal to 1, it shows this:
Why is it showing a failure for today when I know that isn't true from the Application Insights logs? Also, if I change the signal logic to look at total failures instead of the count of failures, it looks correct:
I've decided to use the total failures metric instead, but it seems that the count functionality is broken.
I suggest using Custom log search as the signal if you have already connected your function app with Application Insights (I use this kind of signal and don't see behavior like yours).
The steps are as below:
Step 1: For the signal, select Custom log search.
Step 2: When the Azure function times out, it throws an error whose type is Microsoft.Azure.WebJobs.Host.FunctionTimeoutException, so you can use the query below to check whether it has timed out:
exceptions
| where type == "Microsoft.Azure.WebJobs.Host.FunctionTimeoutException"
Put the above query in the "Search query" field and configure the other settings as you need.
Then configure the remaining settings, like the action group. Please let me know if you still have this issue.
One thing should be noted: some kinds of triggers support retry logic, like the blob trigger. So if it retries, you may also receive the alert email. But you can disable the retry logic as per this doc.
I've been trying to find the best way to trigger an alert when an ADF task (i.e. a Copy activity or a Stored Procedure task) has been running for more than N hours. I wanted to use Azure Monitor, as it is one of the recommended notification services in Azure; however, I have not been able to find a "Running" criterion, so I had to work with the available criteria (Succeeded and Failed) and check them every N hours. This is still not perfect, because I don't know when the process started, and we may run the process manually multiple times a day. Is there any way you would recommend doing this? Something like an event-based notification that listens to some time variable and, as soon as it is greater than the threshold, triggers an email notification?
Is there any way you would recommend doing this? Something like an
event-based notification that listens to some time variable and, as soon
as it is greater than the threshold, triggers an email notification?
Based on your requirements, I suggest using the Azure Data Factory SDKs to monitor your pipelines and activities.
You could create a timer-triggered Azure Function that runs every N hours. In that function:
List all running activities in the Data Factory account.
Then loop over them and monitor the DurationInMs property of the ActivityRun class to check whether any activity has been running for more than N hours while still in In-Progress status.
Finally, send the email, kill the activity, or do whatever you want (see the sketch after these steps).
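Here is a minimal Python sketch of that function body, assuming the azure-mgmt-datafactory and azure-identity packages; the subscription, resource group, factory name, and threshold are placeholders, and the exact filter values are assumptions you should verify against your SDK version.

from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters, RunQueryFilter

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
RESOURCE_GROUP = "<resource-group>"    # placeholder
FACTORY_NAME = "<factory-name>"        # placeholder
MAX_HOURS = 4                          # alert threshold (the "N" above)

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

now = datetime.now(timezone.utc)
# Look back twice the threshold so runs that started before it are included.
filters = RunFilterParameters(
    last_updated_after=now - timedelta(hours=MAX_HOURS * 2),
    last_updated_before=now,
    filters=[RunQueryFilter(operand="Status", operator="Equals", values=["InProgress"])],
)

for run in client.pipeline_runs.query_by_factory(RESOURCE_GROUP, FACTORY_NAME, filters).value:
    activities = client.activity_runs.query_by_pipeline_run(
        RESOURCE_GROUP, FACTORY_NAME, run.run_id, filters
    )
    for act in activities.value:
        # DurationInMs may be unset while a run is in progress, so fall
        # back to computing the elapsed time from the activity start.
        running_ms = act.duration_in_ms or int(
            (now - act.activity_run_start).total_seconds() * 1000
        )
        if act.status == "InProgress" and running_ms > MAX_HOURS * 3_600_000:
            # Here you would send the email, cancel the run, etc.
            print(f"{run.pipeline_name}/{act.activity_name} running for {running_ms} ms")

Wire this into a timer-triggered function on an every-N-hours schedule and replace the print with your notification of choice.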
I would suggest a simple solution:
A Kusto query that lists all pipeline runs whose status is "Queued" and joins them on CorrelationId against the runs we are not interested in - typically "Succeeded" and "Failed". The leftanti join flavor does the job by "returning all the records from the left side that don't have matches from the right" (as specified in the Microsoft documentation).
The next step is to set your desired timeout value - it is 30m in the example below.
Finally, you can configure an alert rule based on this query and get your email notification, or whatever you need.
ADFPipelineRun
| where Status == "Queued"
| join kind=leftanti (
    ADFPipelineRun
    | where Status in ("Failed", "Succeeded")
) on CorrelationId
| where Start < ago(30m)
I tested this only briefly, so maybe something is missing. One idea is to add other statuses to be removed from the result, like "Cancelled".
I have timer-triggered Azure functions running in production, but now I want to be notified if the function fails.
In my case, access to various connected services can cause crashes, and there are many to troubleshoot. The crash is the type of error I need notification for.
When the function does fail, the log entry indicates failure, so I wonder if there is a hook in the system that would allow me to cause the system to generate a notification.
I know that blob and queue bindings, for instance, support the creation of poison queue entries, but the timer trigger binding documentation doesn't say anything about trigger outputs of that nature.
I see that functions can pass their $return status as input to other functions, but that operation is not explained in depth in the docs. Also, in that case, I need to write another function to process the error status, and I was looking for something built-in.
I have inquired with #AzureSupport on this, but their answer had nothing to do with Azure Functions; they referred me to DLL notification hooks and then recommended I file on UserVoice.
I'm sure there must be people here who have implemented some sort of error status notification. I prefer a solution that doesn't require code.
The recommended way to monitor and alert on failures is to use App Insights, which now integrates fully with Azure Functions:
https://blogs.msdn.microsoft.com/appserviceteam/2017/04/06/azure-functions-application-insights/
Since all the logs are available in App Insights, it's easy to monitor for failures and set up alerts based on your own criteria.
However, if you only care about alerting and not about things like monitoring, you could use Azure Monitor instead: https://learn.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-get-started
When the function does fail, the log entry indicates failure, so I wonder if there is a hook in the system that would allow me to cause the system to generate a notification.
...
I prefer a solution that doesn't require code.
This is a zero-code solution:
I poked #AzureFunctions once before on this topic, and the suggested response was to use Application Insights. It can handle alerts upon failure and can also use webhooks.
See the Azure Functions App-Insights documentation on how to link your function app to App Insights. Then set up any alerts you want.
Unfortunately this hook doesn't exist.
Can you switch from a timer trigger to a queue trigger?
You can get retries (if you want them), and after the specified number of attempts the message is sent to a poison queue.
To schedule executions you can add queue messages with a visibility timeout to match your schedule.
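For illustration, here is a minimal Python sketch of the visibility-timeout idea, assuming the azure-storage-queue package; the connection string, queue name, and message payload are placeholders.

from datetime import timedelta

from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string(
    "<storage-connection-string>",  # placeholder
    queue_name="jobs",              # placeholder
)

# The message stays invisible for an hour, so the queue-triggered
# function picks it up (and runs) roughly on that schedule.
queue.send_message(
    "run-nightly-cleanup",  # hypothetical payload your function understands
    visibility_timeout=int(timedelta(hours=1).total_seconds()),
)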
In order to get alerts on failure you have two options:
A timer trigger that scans the execution logs (via SFTP) for failures.
Wrap the whole function in a try/catch block and, in the catch block, write a few lines to send yourself an email with the error details (a sketch of this approach follows below).
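As a sketch of the second option, here is the try/catch idea expressed as a Python timer function; the SMTP host and addresses are placeholders, and do_work stands in for your actual job logic.

import smtplib
import traceback
from email.message import EmailMessage

import azure.functions as func

SMTP_HOST = "smtp.example.com"     # placeholder
ALERT_FROM = "alerts@example.com"  # placeholder
ALERT_TO = "me@example.com"        # placeholder

def do_work() -> None:
    ...  # hypothetical: your actual job logic goes here

def main(mytimer: func.TimerRequest) -> None:
    try:
        do_work()
    except Exception:
        # Mail the full stack trace, then re-raise so the failure
        # still shows up in the function's own logs.
        msg = EmailMessage()
        msg["Subject"] = "Timer function failed"
        msg["From"] = ALERT_FROM
        msg["To"] = ALERT_TO
        msg.set_content(traceback.format_exc())
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)
        raise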
Hope this helps.
No code:
Go to your Azure cloud account.
From the menu, select Monitor.
Then select Add New Rule.
Then select your condition and action, and add the alert details.
I am wondering if there is a way to hook in some notifications for jobs submitted in Dataproc. We are planning to use Dataproc to run a streaming application 24/7, but Dataproc doesn't seem to have a way to notify about failed jobs.
Just wondering if Google StackDriver can be used by any means.
Thanks
Suren
Sure, StackDriver can be used to set an alert policy on a defined log-metric.
For example, you can set a Metric Absence policy which will monitor for successful job completion and alert if it's missing for a defined period of time.
Go to Logging in your console and set a filter:
resource.type="cloud_dataproc_cluster"
jsonPayload.message:"completed with exit code 0"
Click Create Metric; after filling in the details you'll be redirected to the log-based metrics page, where you'll be able to create an alert from the metric.
As noted in the answer above, log-based metrics can be coerced into providing the functionality the OP requires. But metric absence for long-running jobs means you would have to wait longer than your guess at the longest job running time (and you might still get an alert if the job takes a bit longer without failing).
What 'we' really want is a way of monitoring and alerting on a job status of failed, or on a service completion message indicating failure (like your example), so that we are alerted immediately. Yes, you can define a Stackdriver log-based metric looking for specific strings or values that indicate failure, and this 'works', but metrics are measures that are counted - for example, 'how many jobs failed' - and they require inconvenient workarounds to turn an alert-from-metric into a simple 'this job failed' alert. To make this work, for example, the alert filters on a metric and also needs to specify a mean aggregator over an interval before it fires. Nasty :(
I need to execute a long-running WebJob on certain schedules or on demand, with some parameters that need to be passed. I had it set up so that the scheduled WebJob would put a message on the queue with the parameters and a queue-triggered job would take over - or some user interaction would put the same message on the queue with the parameters and the triggered job would take over. However, for some reason the triggered function never finishes, and right now I cannot see any exceptions being displayed in the dashboard outputs (see Time limit on Azure Webjobs triggered by Queue).
I'm looking into whether I can execute my triggered WebJob as an on-demand WebJob and pass the parameters to it. Is there any way to call an on-demand WebJob from a scheduled WebJob and pass it some command-line parameters?
Thanks for your help!
Queue-triggered WebJob functions work very well when configured properly. Please see my answer on the other question, which points to documentation resources on how to set up your WebJobs SDK continuous host properly.
Queue messaging is the correct pattern for you to be using for this scenario. It allows you to pass arbitrary data along to your job, and will also allow you to scale out to multiple instances as needed when your load increases.
You can use the WebJobs Dashboard to invoke your job function directly via its "Run Function" button - you can specify the queue message input directly in the Dashboard as a string. This allows you to invoke the function directly as needed with whatever inputs you want, in addition to letting the function continue to respond to queue messages actually added to the queue.