Why are IoT Hub events delayed when stored in Time Series Insights? - azure

I have a Time Series Insights Environment with an IoT Hub data source configured.
What I noticed is that there is a consistent 20-30 second delay between sending an event to IoT Hub and seeing it stored in TSI.
After I noticed this, I attached a Function trigger directly to the IoT Hub. Events were received immediately by the trigger, but TSI returned them 20-30 seconds later.
So, I have two questions:
Where does that delay come from?
Is there anything I can do about minimizing the delay?
Thanks!

There is an expected, measurable delay of up to 1 minute before you will see an event in TSI, and you cannot dial that up or down. It's just how the service works.
Just in case you haven't already, also make sure you've configured your SKU and capacity to support your use cases.

Related

IoT Edge device connection state monitoring

We have a business requirement to maintain the connected state of IoT Edge devices in an Azure Digital Twins instance. It should be near real time, but short delays of up to a few minutes are acceptable.
That is, in the Digital Twins instance we have a twin for each IoT Edge device, and it has a property Online (true/false).
In production we will have up to a few hundred devices in total.
We are looking for a good method of monitoring the connection state of the Edge devices.
Our initial attempt was to subscribe an Azure Function to the Event Grid DeviceConnected/DeviceDisconnected notifications from IoT Hub.
After initial testing we found that Event Grid apparently cannot be used as the single source of truth. Further research turned up the following information:
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-event-grid#limitations-for-device-connected-and-device-disconnected-events
IoT Hub does not report each individual device connect and disconnect, but rather publishes the current connection state taken at a periodic 60 second snapshot. Receiving either the same connection state event with different sequence numbers or different connection state events both mean that there was a change in the device connection state during the 60 second window.
And another one:
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-troubleshoot-connectivity#mqtt-device-disconnect-behavior-with-azure-iot-sdks
Azure IoT device SDKs disconnect from IoT Hub and then reconnect when they renew SAS tokens over the MQTT (and MQTT over WebSockets) protocol….
…
If you're monitoring device connections with Event Hub, make sure you build in a way of filtering out the periodic disconnects due to SAS token renewal. For example, do not trigger actions based on disconnects as long as the disconnect event is followed by a connect event within a certain time span.
Next, after more searching on the topic, we found the following question:
Best way to Fetch connectionState from 1000's of devices - Azure IoTHub
The accepted answer suggests using the heartbeat pattern; however, the official documentation clearly states that it should not be used in a production environment:
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-identity-registry#device-heartbeat
The article describing the heartbeat pattern also mentions a "short expiry time pattern", but gives little detail about it.
For a complete picture, we also found the following article:
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-how-to-order-connection-state-events
But it is based on an Event Grid subscription and therefore will not provide accurate data.
Finally, after reading all of this, we have the following plan to address the problem:
We will have an Azure Function subscribed to the Event Grid DeviceConnected/DeviceDisconnected notifications.
If a DeviceConnected event is received, the function will check device connectivity immediately.
If a DeviceDisconnected event is received, the function will wait for 90 seconds, as we found that a follow-up DeviceConnected event usually arrives within ~60 seconds for a given device. After the delay it will check the device connectivity (see the sketch after this list).
Device connectivity will be checked by sending a cloud-to-device message with acknowledgment, as described here:
https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-csharp-csharp-c2d#receive-delivery-feedback
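To make the plan concrete, here is a rough sketch of the function we have in mind (Python v1 programming model for Azure Functions; the eventGridTrigger binding would live in function.json). check_device_connectivity and update_digital_twin are placeholder helpers, not real code yet; a possible implementation of the connectivity check is sketched in the EDIT below.

import logging
import time
import azure.functions as func

DISCONNECT_GRACE_SECONDS = 90  # a follow-up DeviceConnected usually arrives within ~60 s

def check_device_connectivity(device_id: str) -> bool:
    # Placeholder: see the direct-method sketch in the EDIT below.
    raise NotImplementedError

def update_digital_twin(device_id: str, online: bool) -> None:
    # Placeholder: would patch the Online property of the device's twin in ADT.
    raise NotImplementedError

def main(event: func.EventGridEvent) -> None:
    data = event.get_json()
    device_id = data.get("deviceId")

    if event.event_type == "Microsoft.Devices.DeviceConnected":
        # Connected events are checked immediately.
        online = check_device_connectivity(device_id)
    elif event.event_type == "Microsoft.Devices.DeviceDisconnected":
        # Wait out the SAS-token-renewal reconnect window before probing.
        # (A durable timer would avoid holding the function instance; a plain
        # sleep is just the simplest expression of the plan.)
        time.sleep(DISCONNECT_GRACE_SECONDS)
        online = check_device_connectivity(device_id)
    else:
        return

    logging.info("Device %s online=%s", device_id, online)
    update_digital_twin(device_id, online)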
Concerns with this solution:
Complexity.
The Azure Function would need the IoT Hub service connection string.
The disconnected state of a device might be reported with a delay of up to a few minutes.
Can anyone suggest a better solution?
Thanks!
EDIT:
In our case we do not use DeviceClient but ModuleClient on the Edge devices, and modules do not support C2D messages, as stated here:
https://learn.microsoft.com/en-us/azure/iot-edge/module-development?view=iotedge-2018-06&WT.mc_id=IoT-MVP-5004034#iot-hub-primitives
So we would need to use direct methods instead to test whether the device is online.
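For illustration, here is a rough sketch of how that connectivity probe could look with the azure-iot-hub service SDK for Python, assuming each Edge device runs a module (called myModule here, an illustrative name) that implements a ping direct method; the connection string is the service connection string mentioned in the concerns above.

from azure.iot.hub import IoTHubRegistryManager
from azure.iot.hub.models import CloudToDeviceMethod

IOTHUB_SERVICE_CONNECTION_STRING = "<iot-hub-service-connection-string>"
MODULE_ID = "myModule"  # illustrative: a module that implements a "ping" direct method

def check_device_connectivity(device_id: str) -> bool:
    registry_manager = IoTHubRegistryManager(IOTHUB_SERVICE_CONNECTION_STRING)
    ping = CloudToDeviceMethod(
        method_name="ping",
        payload="{}",
        response_timeout_in_seconds=30,
        connect_timeout_in_seconds=5,
    )
    try:
        result = registry_manager.invoke_device_module_method(device_id, MODULE_ID, ping)
        return result.status == 200
    except Exception:
        # The service raises an error (e.g. the module is not online or not found)
        # when the method cannot be delivered; we treat that as offline.
        return False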

Is ServiceBus message delivery time reliable?

I'm working on creating an events system with Azure Service Bus. I find that events generally arrive reliably at the scheduled time I set them to run, so if event 'pop' is supposed to run at 12:30 pm, it is generally delivered to my receiver at that time.
I want to know whether there is a guarantee that events are always fired at the scheduled time, or whether that is more of a suggested time and the system can get clogged and backlogged, causing longer queues to form?
There are quite a few differences between messages (which are handled with Service Bus) and events, as you can see in the article Choose between Azure messaging services - Event Grid, Event Hubs, and Service Bus.
An event is a lightweight notification of a condition or a state change. The publisher of the event has no expectation about how the event is handled. The consumer of the event decides what to do with the notification. Events can be discrete units or part of a series.
[...]
A message is raw data produced by a service to be consumed or stored elsewhere. The message contains the data that triggered the message pipeline.
It sounds like you need a reliable way to have a timer trigger execute at a specific time. Service Bus is not the correct service for that, since "the message enqueuing time does not mean that the message will be sent at that time. It will get enqueued, but the actual sending time depends on the queue's workload and its state." (see BrokeredMessage.ScheduledEnqueueTimeUtc Property).
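For illustration, this is roughly what scheduling looks like with the current azure-servicebus SDK for Python (the queue name and connection string are placeholders); the scheduled time is only the earliest enqueue time, not a guaranteed delivery moment.

from datetime import datetime, timedelta, timezone
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONNECTION_STR = "<service-bus-connection-string>"  # placeholder
QUEUE_NAME = "events"                                # placeholder

with ServiceBusClient.from_connection_string(CONNECTION_STR) as client:
    with client.get_queue_sender(queue_name=QUEUE_NAME) as sender:
        run_at = datetime.now(timezone.utc) + timedelta(minutes=30)
        # The message becomes *available* at run_at; actual delivery to the
        # receiver still depends on the queue's workload and state.
        sender.schedule_messages(ServiceBusMessage("pop"), run_at)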
For handling the triggering in a reliable way, you could use services like Logic Apps (if you want to create it low-code/no-code) or Azure Functions (for a serverless solution with code).
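For example, a minimal sketch of a timer-triggered function that runs at 12:30 every day (Python v1 programming model; the NCRONTAB schedule itself lives in function.json and is shown here only as a comment):

import logging
import azure.functions as func

# function.json for this function would contain a "timerTrigger" binding
# with "schedule": "0 30 12 * * *"  (12:30:00 every day).

def main(mytimer: func.TimerRequest) -> None:
    if mytimer.past_due:
        logging.warning("Timer is running late")
    logging.info("Running the scheduled 'pop' work")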
If you're actually looking for events, consider Event Grid.

Azure Event Grid Function Trigger - Probation

We have an Azure setup with an Event Grid topic, and subscribed to it we have an Azure Function app with about 15 functions that subscribe to the topic via different prefix filters. The Function app is set up as a consumption-based resource and should be able to scale as it prefers.
Each subscription is set up to attempt delivery up to 10 times within a maximum of 4 hours before dropping the event. So far so good, and the setup works as expected most of the time.
In certain situations, unknown to us, it seems the Event Grid topic cannot deliver events to the different functions. What we can see is that our dead-letter storage fills up with events that have not been delivered.
Now to my question
From the logs we can see the reason various events were not delivered. The reason is most often Outcome: Probation. We cannot find any information from Microsoft on what this actually means.
In addition, Event Grid gives up and adds the event to the dead-letter log before either the timeout policy (4 hours) or the delivery-attempts policy (10 retries) has been reached. Sometimes the Function app is idle and does not receive any events from the Grid.
Do any of you good people have ideas on how we can proceed with the troubleshooting? What has happened between the Grid and the Function app when the outcome Probation occurs? One thing we have noticed is that the number of connections from the Grid to our Function app is quite high in comparison to the number of events delivered.
There are no other incoming connections to the Function app besides Event Grid.
Example of a dead letter message
[{
"id":"a40a1f02-5ec8-46c3-a349-aea6aaff646f",
"eventTime":"2020-06-02T17:45:09.9710145Z",
"eventType":"mitbalAdded",
"dataVersion":"1",
"metadataVersion":"1",
"topic":"/subscriptions/XXXXXXX/resourceGroups/XXXX_STAGING/providers/Microsoft.EventGrid/topics/XXXXXstaging",
"subject":"odl/type/mitbal/v1",
"deadLetterReason":"TimeToLiveExceeded",
"deliveryAttempts":6,
"lastDeliveryOutcome":"Probation",
"publishTime":"2020-06-02T17:45:10.1869491Z",
"lastDeliveryAttemptTime":"2020-06-02T19:30:10.5756332Z",
"data":"<?xml version=\"1.0\" encoding=\"utf-8\"?><Stock><Action>ADD</Action><Id>123456</Id><Store>123</Store><Shelf>1</Shelf></Stock>"
}]
Function app metrics (graph): blue = connections (count), red = function executions (count), white = requests (count).
I'm not sure if you have figured out the issue here, but here are some insights for others in a comparable situation.
Firstly, Probation is the delivery outcome recorded when the destination is considered unhealthy; Event Grid will still attempt deliveries to an endpoint in probation.
Based on the graph, it looks like functions hit the 100 executions mark and then took a while to scale out for the next 100. You could get better results by tweaking the host.json settings depending on what each function execution does.
Including scale controller logs could shed more light on what is happening internally when scaling out.
Another option would be to send events into Service Bus or Event Hubs first and then have a function run from there.
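As a rough sketch of that last option (the queue name and connection setting are assumptions): publish the events to a Service Bus queue instead of delivering them straight to the functions, and let a queue-triggered function drain it at its own pace.

import logging
import azure.functions as func

# Python v1 programming model: function.json would contain a "serviceBusTrigger"
# binding with queueName "incoming-events" and a connection setting pointing at
# the Service Bus namespace (names are assumptions).

def main(msg: func.ServiceBusMessage) -> None:
    body = msg.get_body().decode("utf-8")
    logging.info("Processing buffered event: %s", body)
    # ... same processing that the Event Grid subscription used to drive ...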

Azure Function - Event Hub Trigger stopped

I've got an Azure Function app in production with an Event Hub trigger; it's low throughput, with the function typically being triggered only once daily. It's running on an S1 plan at the moment and has a few other functions, such as timer-triggered and HTTP-triggered ones.
It's been running fine but today it stopped being triggered by new messages until I restarted the app. All other functions were working just fine and responding to their associated triggers.
I've looked through App Insights and there are no reported errors or issues; it's just not doing anything.
Has anyone else had this issue or know of what may be causing it?
First of all, does your App Service have Always On enabled?
Second, have you tried testing your trigger locally, so you can be sure there are no issues with your Event Hub?
Personally, I have faced such issues when the EventProcessorHost used by the Event Hub trigger lost a lease because an additional processor was introduced. It is also possible that, since your app has low throughput, it lost a lease and for some reason was not able to renew it:
As an instance of EventProcessorHost starts it will acquire as many leases as possible and begin reading events. As the leases draw near expiration EventProcessorHost will attempt to renew them by placing a reservation. If the lease is available for renewal the processor continues reading, but if it is not the reader is closed and CloseAsync is called - this is a good time to perform any final cleanup for that partition.
https://blogs.msdn.microsoft.com/servicebus/2015/01/21/event-processor-host-best-practices-part-2/
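If you want to inspect the lease/checkpoint state while troubleshooting, one option is to list the checkpoint blobs. A sketch, assuming the Functions host keeps its Event Hub checkpoints in the azure-webjobs-eventhub container of the AzureWebJobsStorage account, which is the usual default:

import json
from azure.storage.blob import ContainerClient

STORAGE_CONNECTION_STRING = "<AzureWebJobsStorage-connection-string>"  # assumption
CONTAINER = "azure-webjobs-eventhub"  # container typically used by the Functions host

container = ContainerClient.from_connection_string(STORAGE_CONNECTION_STRING, CONTAINER)
for blob in container.list_blobs():
    content = container.download_blob(blob.name).readall()
    try:
        checkpoint = json.loads(content)
    except ValueError:
        checkpoint = content  # some host versions store checkpoint data differently
    # Typically shows which host instance owns each partition lease and its offset.
    print(blob.name, checkpoint)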
Nonetheless, it is worth contacting support to make sure there were no other issues.

Delay in Azure function triggering off IOThub

I have data going from my system to an Azure IoT Hub. I timestamp the data packet when I send it. Then I have an Azure Function that is triggered by the IoT Hub. In the function I read the message, extract the timestamp, and record how long it took the data to reach the function. I also have another program running on my system that listens for data on the IoT Hub and records that time too.
Most of the time the latency measured in the Azure Function is in milliseconds, but sometimes I see a long delay before the function is triggered (I conclude this because the program that reads from the IoT Hub shows that the data reached the hub quickly, with no delay).
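For reference, a simplified sketch of the measuring function (the sentAtUtc field name and the binding details are illustrative, not the exact code):

import json
import logging
from datetime import datetime, timezone
import azure.functions as func

# function.json binds this to the IoT Hub's Event Hub-compatible endpoint
# with an "eventHubTrigger" binding.

def main(event: func.EventHubEvent) -> None:
    received_at = datetime.now(timezone.utc)
    body = json.loads(event.get_body())
    # Illustrative field; assumed to be an ISO 8601 timestamp with a UTC offset.
    sent_at = datetime.fromisoformat(body["sentAtUtc"])
    logging.info(
        "enqueued at %s, end-to-end latency %s",
        event.enqueued_time,
        received_at - sent_at,
    )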
Does anybody know the reasons why the Azure Function might be triggering late?
Is this the same question that was asked here? https://github.com/Azure/Azure-Functions/issues/711
I'll copy/paste my answer for others to see:
Based on what I see in the logs and your description, I think the latency can be explained as being caused by a cold-start of your function app process. If a function app goes idle for approximately 20 minutes, then it is unloaded from memory and any subsequent trigger will initiate a cold start.
Basically, the following sequence of events takes place:
The function app goes idle and is unloaded (this happened about 5 minutes before the trigger you mentioned).
You send the new event.
The event eventually gets noticed by our scale controller, which polls for events on a 10 second interval.
Our scale controller initiates a cold-start of your function app. This can add a few more seconds depending on the content of your function app (it was about 6 seconds in this case).
So unfortunately this is a known behavior with the consumption plan. You can read up on this issue here: https://blogs.msdn.microsoft.com/appserviceteam/2018/02/07/understanding-serverless-cold-start/. The blog post also discusses some ways you can work around this if it's problematic for your scenario.
