Azure Functions with Azure Service Bus trigger - metrics for Service Bus

I am using Azure Functions V2 with an Azure Service Bus trigger, set up to fire when there is a message on an ASB subscription. I am trying to see whether any metrics are available on the message receive latency so that I can plot it in my dashboard. I am using the ASB Standard SKU, so when a noisy-neighbor issue happens, at least I can identify it as such.
I suspect the code behind the trigger attribute takes care of retrieving the message. So is there a way I can instrument this to visualize the average latency? Basically, the function provides metrics only for the execution; what I am looking for is end-to-end latency metrics.

If you are using Application Insights for logging, the function runtime logs a line like the one below for each message:
2019-08-04 21:09:06.026 Trigger Details: MessageId: <Guid>, DeliveryCount: 1, EnqueuedTime: 8/4/2019 9:09:05 PM, LockedUntil: 8/4/2019 9:09:35 PM
If you parse this log line for the EnqueuedTime, and also take the logging time (the first timestamp), you can calculate the send-to-receive latency.
I know this is not ideal. I wish the function runtime would log events that could easily be queried in Azure Monitor, or would pass the EnqueuedTime in with the trigger input, so we could log it ourselves as a custom event in Application Insights.
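As a sketch of that parsing approach (assuming the trace text keeps exactly the format shown above once exported from Application Insights), the send-to-receive latency can be computed like this:

```python
import re
from datetime import datetime

# Sample "Trigger Details" line as logged by the Functions runtime.
# The leading timestamp is the trace's own logging time.
LINE = ("2019-08-04 21:09:06.026 Trigger Details: MessageId: abc, "
        "DeliveryCount: 1, EnqueuedTime: 8/4/2019 9:09:05 PM, "
        "LockedUntil: 8/4/2019 9:09:35 PM")

def receive_latency_seconds(line: str) -> float:
    # First 23 characters: "YYYY-MM-DD HH:MM:SS.mmm" logging timestamp.
    logged_at = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
    # EnqueuedTime is logged in "M/D/YYYY H:MM:SS AM/PM" format (UTC,
    # same as the logging timestamp).
    enqueued_raw = re.search(r"EnqueuedTime: ([^,]+),", line).group(1)
    enqueued_at = datetime.strptime(enqueued_raw, "%m/%d/%Y %I:%M:%S %p")
    return (logged_at - enqueued_at).total_seconds()

print(receive_latency_seconds(LINE))  # ~1.026 seconds for the sample line
```

You could run this over exported traces and push the result back to Application Insights as a custom metric for dashboarding.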

Related

Azure Function App Execution count and IoT Hub triggers

I am using a Consumption Plan Function App.
I have IoT devices which communicate with the IoT Hub, and the IoT Hub triggers an Azure Function in my Function App.
The image below was obtained from the Azure Function App settings, and it shows that the IoT Hub-triggered function has an execution count of over 250.
Does this mean that there are 250 instances of the Azure Function App? Is that normal?
If I were to introduce batch processing for IoT Hub messages, what would count as a batch of messages? Do the messages need timestamps within a certain window?
Your Consumption Plan Function App won't scale beyond 200 instances (100 on Linux); that's a hard limit. You can use the Function's metrics to check the number of instances running: select a metric and split it by 'Instance'.
Edit with info from comments:
It's possible that you're handling all those requests from one instance. The Function will scale out automatically; how that works is described in the docs, but no exact scaling logic is documented, and it's not hard-linked to the number of messages available on the Event Hub.
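As a rough sketch, you could derive the instance count from an Azure Monitor metrics response that was split by the Instance dimension. The response shape below follows Azure Monitor's metrics API; the dimension name and the sample payload are assumptions to verify against your app's actual metric definitions:

```python
# Count distinct instances in a metrics response where the metric was
# split ("split by") the "Instance" dimension. Each timeseries carries
# the dimension value in its metadatavalues.
def count_instances(metrics_response: dict) -> int:
    instances = set()
    for metric in metrics_response.get("value", []):
        for series in metric.get("timeseries", []):
            for meta in series.get("metadatavalues", []):
                if meta.get("name", {}).get("value", "").lower() == "instance":
                    instances.add(meta["value"])
    return len(instances)

# Minimal simulated response with two instances reporting executions.
sample = {"value": [{"timeseries": [
    {"metadatavalues": [{"name": {"value": "Instance"}, "value": "vm-01"}]},
    {"metadatavalues": [{"name": {"value": "Instance"}, "value": "vm-02"}]},
]}]}
print(count_instances(sample))  # 2
```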

How to log messages to Azure Event Hub from Azure API Management Service

In a recent project, I need to add messages (>200 KB) to Azure Event Hubs through an endpoint exposed by Azure API Management. A Stream Analytics job then reads these messages from Event Hubs and writes them to the respective tables in SQL Server.
I was using the "log-to-eventhub" policy to log the messages to the event hub, but it has a size limitation of 200 KB.
What would be the best approach to overcome this size limitation, or should I consider a different way to log the payload to Event Hubs?
Any help is much appreciated.
This limit is described in the official docs:
The maximum supported message size that can be sent to an event hub from this API Management policy is 200 kilobytes (KB). If a message that is sent to an event hub is larger than 200 KB, it will be automatically truncated, and the truncated message will be transferred to event hubs.
You could consider using the Azure Event Hubs output binding for Azure Functions instead.
As for how Functions consume Event Hubs events, you could use multiple parallel function instances under the Consumption plan.
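Another workaround, sketched here only as an idea rather than an official pattern, is to split oversized payloads into chunks that each fit under the 200 KB limit and reassemble them downstream (for example in an Azure Function). The chunk-envelope field names here are made up for illustration:

```python
import json
import math
import uuid

MAX_CHUNK_BYTES = 190_000  # stay safely under the 200 KB policy limit

def chunk_payload(payload: str) -> list[str]:
    """Split a large payload into chunk envelopes that each fit the limit.
    Caveat: slicing UTF-8 bytes can split a multibyte character; for
    non-ASCII payloads, base64-encode first."""
    raw = payload.encode("utf-8")
    batch_id = str(uuid.uuid4())  # lets the consumer group the parts
    total = max(1, math.ceil(len(raw) / MAX_CHUNK_BYTES))
    chunks = []
    for i in range(total):
        part = raw[i * MAX_CHUNK_BYTES:(i + 1) * MAX_CHUNK_BYTES]
        chunks.append(json.dumps({
            "batchId": batch_id,
            "index": i,
            "total": total,
            "data": part.decode("utf-8", errors="ignore"),
        }))
    return chunks

parts = chunk_payload("x" * 500_000)
print(len(parts))  # 3 chunks for a 500 KB payload
```

The consumer would buffer chunks by batchId until all `total` parts arrive, then concatenate the `data` fields in `index` order.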

Can I create monitoring alerts for azure event grid domain topics?

I would like to setup following alerts for domain topics when
Delivery Failed Events (at domain) exceed x in y amount of time
Delivery Failed Events (at domain topic 1) exceed x in y amount of time
Delivery Failed Events (at domain topic 2) exceed x in y amount of time
The reason I want domain-topic granularity is that the topic 1 customer may be fine while the topic 2 customer is having issues. Say the customer for topic 2 is down and in an extended outage that may last more than a day: I want to be able to disable the alert for topic 2 only, and re-enable it once that customer is up and running again, while keeping all other topic-level alerts enabled.
I did not see a way to configure the above in the portal. Is it possible to configure this in any other way at this time? If so, can you please point me in the right direction?
AEG provides durable, at-least-once delivery of each event message to every subscriber, based on its subscription. More details can be found in the docs.
When AEG cannot successfully deliver a message after retrying, the dead-lettering feature (configured per subscription) can be used for notification and/or analysis via storage eventing, where the dead-lettered message is stored.
On the publisher side, the publisher receives a standard HTTP response from the event domain endpoint immediately after posting; see more details in the docs.
The current version of AEG is not integrated with Diagnostic settings (as, for instance, Event Hubs is), which would allow pushing metrics and/or logs into a streaming pipeline for analysis.
However, as a workaround, the Azure Monitor REST API can help you.
Using "List the metric values for an event domain", we can obtain topic metrics such as Publish Succeeded, Publish Failed, and Unmatched.
The following is an example of the REST GET:
https://management.azure.com/subscriptions/{myId}/resourceGroups/{myRG}/providers/Microsoft.EventGrid/domains/{myDomain}/providers/Microsoft.Insights/metrics?api-version=2018-01-01&interval=PT1M&aggregation=none&metricnames=PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,DroppedEventCount
Using a polling technique, you can push the event domain metric values into a streaming pipeline for analysis, monitoring, alerting, etc., for example with an Azure Stream Analytics job. Your management requirements (for instance, publisher_topic1 is disabled) can be fed to the job as reference input.
Note that the event domain metrics do not give topic-level granularity, and there is no activity event log at that level either. I recommend raising this on the AEG feedback page.
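A minimal polling sketch against that endpoint (the subscription ID, resource group, domain name, and bearer token are placeholders you must supply; the metric names are the ones from the GET example above):

```python
import json
import urllib.parse
import urllib.request

def metrics_url(sub_id: str, rg: str, domain: str) -> str:
    """Build the Azure Monitor metrics URL for an Event Grid domain."""
    base = (f"https://management.azure.com/subscriptions/{sub_id}"
            f"/resourceGroups/{rg}/providers/Microsoft.EventGrid"
            f"/domains/{domain}/providers/Microsoft.Insights/metrics")
    query = urllib.parse.urlencode({
        "api-version": "2018-01-01",
        "interval": "PT1M",
        "aggregation": "none",
        "metricnames": "PublishSuccessCount,PublishFailCount,"
                       "PublishSuccessLatencyInMs,DroppedEventCount",
    })
    return f"{base}?{query}"

def poll_metrics(url: str, bearer_token: str) -> dict:
    # Network call: requires a valid Azure AD token for management.azure.com.
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {bearer_token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

print(metrics_url("mySub", "myRG", "myDomain"))
```

On a timer (e.g. every minute), you would call `poll_metrics` and forward the response values to your Event Hub or alerting logic.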

Diagnosing failures in azure event grid?

I did not find much in the way of troubleshooting an events-lost scenario in Azure Event Grid.
Hence I am asking question in relation to following scenario:
Our code publishes the events to the domain.
The events are delivered to the webhook configured in the subscription.
This works for a while.
The consumer (who owns the webhook endpoint) complains that he is not receiving some events, though most are coming through.
We look in the configured dead-letter queue and find that there are no events. It has been more than a day and hence all retries are already exhausted.
Hence we assume that all events are being delivered because there are no failed delivery events in the metrics.
We also make sure that we indeed submitted these mysterious events to the grid.
But the consumer insists there is a problem and demonstrates that nothing is wrong on his side.
Now we need to figure out if some of these events are being swallowed by the event grid.
How do I go about troubleshooting this scenario?
The current version of AEG is not integrated with the Diagnostic settings feature, which would help a great deal with streaming metrics and logs.
For your scenario, which is based on Event Domains (still in public preview; see the limits), the Azure Monitor REST API can help you see all the metrics for your specific Event Domain.
The valid metrics are:
PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,MatchedEventCount,DeliveryAttemptFailCount,DeliverySuccessCount,DestinationProcessingDurationInMs,DroppedEventCount,DeadLetteredCount
The following example is a REST GET request that obtains all metric values within your event domain for a specific timespan and interval:
https://management.azure.com/subscriptions/{mySubId}/resourceGroups/{myRG}/providers/Microsoft.EventGrid/domains/{myDomain}/providers/Microsoft.Insights/metrics?api-version=2018-01-01&interval=PT1H&aggregation=count,total&timespan=2019-02-06T07:58:12Z/2019-02-07T08:58:12Z&metricnames=PublishSuccessCount,PublishFailCount,PublishSuccessLatencyInMs,MatchedEventCount,DeliveryAttemptFailCount,DeliverySuccessCount,DestinationProcessingDurationInMs,DroppedEventCount,DeadLetteredCount
Based on the response values, you can see how AEG behaves from the publisher side through to event delivery to the subscriber. For production, I recommend polling all the metrics from AEG and pushing them to an Event Hub for streaming analysis, alerting, etc. Depending on the query parameters (timespan, interval, etc.), this can be close to real time. Once Diagnostic settings are supported by AEG, this polling and publishing of metrics becomes obsolete, and only a small modification of the analyzing stream job will be needed.
The other point is to extend your eventing model with an auditing part. I recommend the following:
Add a domain-scope subscription to capture all events in the event domain and push them to an Event Hub for streaming purposes. Note that every event published within the event domain will then appear in this stream pipeline.
Add a storage subscription for dead-letter messages and push those to the same Event Hub for streaming purposes.
(optional) Add Diagnostic settings (some metrics) of the dead-letter storage to the same Event Hub. Note that a dead-letter message is dropped after 4 hours of failing to store it in the blob container; there is no log message for that failure, only a metric counter.
On the consumer side, I recommend that each subscriber create a log message (AEG headers + event message) for auditing and troubleshooting purposes, stored in a blob container, or locally and then uploaded. This record can be very useful for the analyzing stream job when quickly figuring out where the problem is.
In addition, your publisher should periodically (for instance, once per hour) probe the event domain endpoint by sending a probe event message to a dedicated probe topic. The event subscription for that probe topic is configured with a dead-lettering option, and the subscriber webhook handler always fails with error code HttpStatusCode.BadRequest, so there is no retrying. Note that there is a delay of up to 300 seconds before the dead-letter message is stored; in other words, the dead-lettered probe message should appear in the stream pipeline within about 5 minutes of the probe event. This scenario probes the functionality of AEG from both the publisher and the delivery point of view.
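The probe publisher could look roughly like the sketch below. The domain endpoint, SAS key, topic name, and event type are placeholders; the envelope follows the Event Grid event schema:

```python
import json
import uuid
import urllib.request
from datetime import datetime, timezone

def make_probe_event(topic: str) -> dict:
    """Build an Event Grid event-schema envelope for the probe."""
    return {
        "id": str(uuid.uuid4()),
        "topic": topic,                      # domain topic, e.g. "probe"
        "eventType": "Probe.HealthCheck",    # hypothetical event type
        "subject": "probe",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "data": {"note": "end-to-end delivery probe"},
        "dataVersion": "1.0",
    }

def publish_probe(domain_endpoint: str, sas_key: str, topic: str) -> int:
    # Event Grid expects a JSON array of events and the aeg-sas-key header.
    body = json.dumps([make_probe_event(topic)]).encode("utf-8")
    req = urllib.request.Request(
        domain_endpoint, data=body, method="POST",
        headers={"aeg-sas-key": sas_key,
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:  # network call
        return resp.status

event = make_probe_event("probe")
print(sorted(event.keys()))
```

Scheduled hourly (e.g. from a timer-triggered function), a missing dead-lettered probe in the stream pipeline after ~5 minutes signals a delivery problem.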
The solution described above was illustrated with a screen snippet (image not included here).

Monitoring Azure Event Hub

I have been researching Microsoft Azure Event Hubs. My goal is to figure out a way to provide automatic scalability. This is experimental work and I am really only trying to learn what I can do with Azure Event Hubs. I do not have access to the Azure platform to test anything :(.
So far, I have found that through the REST API and Service Bus PowerShell I can add throughput units (to increase performance; I am relying on this: Scale Azure Service Bus through Powershell or API) and increase or decrease the event expiration time (which might influence capacity: https://msdn.microsoft.com/en-us/library/azure/dn790675.aspx).
The problem is that, presuming the previous techniques work and I am able to scale Event Hubs' performance automatically, I still need a way to know when to trigger the scaling mechanisms. To know when and how to trigger scaling, I need functions that rely on the event hub's metrics (or some way of monitoring them). The problem is that I can't really find any metrics. The only thing I found is this: https://azure.microsoft.com/en-us/documentation/articles/cloud-services-how-to-monitor/ - which does not solve my problem, because although it presents some interesting metrics, it does not serve the purposes of my "application" (which will come about if I can prove that I can successfully scale Azure automatically); and this: Azure service bus statistics/Monitoring - whose links are not working.
Surely I can find more information about Service Bus Explorer, and it may provide some interesting insights into event hub metrics. I am just wondering whether there is something like this: https://github.com/HBOCodeLabs/incubator-storm/blob/master/STORM-UI-REST-API.md that would let me access some kind of metrics, rather than creating my own.
Thanks in advance
Best regards
You can retrieve metrics about Event Hubs (an Event Hub is a Service Bus entity) using the Service Bus Entity Metrics REST APIs (https://msdn.microsoft.com/library/azure/dn163589.aspx). With these you can retrieve the same metrics displayed in the portal, such as:
Number of incoming messages
Incoming throughput
Outgoing throughput
These should help you determine when you need to scale your application up or down.
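For example, a toy scale decision based on the incoming-throughput metric. One throughput unit (TU) allows roughly 1 MB/s of ingress; the 70% headroom here is illustrative, not official guidance:

```python
import math

TU_INGRESS_BYTES_PER_SEC = 1_000_000  # ~1 MB/s ingress per throughput unit

def recommended_throughput_units(ingress_bytes_per_sec: float,
                                 headroom: float = 0.7) -> int:
    """Return the TU count needed to keep observed ingress below
    `headroom` (default 70%) of purchased capacity."""
    return max(1, math.ceil(ingress_bytes_per_sec /
                            (TU_INGRESS_BYTES_PER_SEC * headroom)))

print(recommended_throughput_units(2_500_000))  # 4 TUs for ~2.5 MB/s ingress
```

A monitoring loop would compare this recommendation against the current TU count and call the management API (or a runbook) only when they differ, to avoid flapping.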
This video is useful for getting started https://channel9.msdn.com/Blogs/Subscribe/Service-Bus-Namespace-Management-and-Analytics
If third-party services are an option, look into CloudMonix at http://cloudmonix.com.
It can monitor Event Hubs (among a gazillion other Azure-related things) and execute Azure Automation runbooks (among a gazillion other actions) in reaction to load conditions/throughput of a whole hub or of individual partitions, optionally combined with any other metrics in your environment.
Your Azure Automation runbooks could then contain the logic to increase your Event Hub's throughput units, etc.
Disclaimer: I'm affiliated with the product.
HTH
Service Bus Explorer is great; I actually use it: ServiceBus Explorer
