Azure Messaging services: detecting the absence of a message - azure

Within an Azure micro-services environment, implementing an event driven architecture, I have the challenge of acting upon a received event. I also have the counter challenge of detecting a duration with no events.
Has anyone identified a way in which the recipient can be notified after a period of no events?
I have considered things like Azure functions polling the dataset however this won't scale particularly well due to the volume of data.

You could use Azure Stream Analytics, and Windowing to detect if a window, e.g. the last x minutes, did not contain any new events. See here for a similar answer: https://stackoverflow.com/a/53430421/1537195

Related

Azure EventHub Push/Pull?

When it comes to Apache Kafka, on the consumer side I know it's a pull model. What about Azure EventHubs? Are they pull or push?
From what I've gathered so far unlike kafka event hubs "push" events to the listeners. Can someone confirm? Any additional details or references would be helpful.
A simple google search landed me on the this page to back up my claim
Is there a simple way to test this theory out?
Yes, Azure Event Hub push events to event consumers, there is no need to 'poll' for consuming the events. The event processor defines event handlers which are invoked as new events are ingested into the event stream.
The event consumer can do something called as checkpoint that marks the event upto which the events have been consumed.
See the doc for more details.
The short answer to this is that the model for consuming events depends on the type of client that your application has chosen to use. The official Azure SDK packages offer consumer types that are push-based and those that are pull-based.
You don't mention the specific language that you're using but, since you're comparing to Kafka, I'll assume that you're interested in Java. The azure-messaging-eventhubs family of packages are the current generation of the Azure SDK and has the following clients for reading events:
EventProcessorClient: This is a push-based client intended to serve as the primary consumer of events in production scenarios for the majority of workloads. It is responsible for reading and processing events for all partitions of an Event Hub and collaborates with other EventProcessorClient instances using the same Event Hub and consumer group to balance work between them. A high degree of fault tolerance is built-in, allowing the processor to be resilient in the face of errors.
EventHubConsumerAsyncClient: This is a push-based client focused on reading events from a single partition using a Flux-based subscription via the Reactor library. This client requires applications to own responsibility for resilience and processing state persistence.
EventHubConsumerClient: This is a pull-based client focused on reading events from a single partition using an iterator pattern. This client requires applications to own responsibility for resilience and processing state persistence.
More information around the package, its types, and basic usage in the Azure Event Hubs client library for Java overview. More detailed samples can be found in the samples overview, including those for Consuming events and Using the EventProcessorClient.

How to process current and previous IoT Hub event from an azure function?

I have a simple scenario where I want to take the diff between current value of a parameter and previous value from IoT hub telemetry messages and attach this result and send to Time Series Insights environment (via an event hub if required so).
How can I achieve this? I am studying about Azure functions but not able to figure out how to exactly go about it.
The minimum timestamp difference between messages is 1 second and only edge devices (at max perhaps 3) will send the telemetry data. Each edge device might be collecting data from around 500 devices.
I am looking for a guidance on logical steps involved and a few critical pieces of Python code
Are these telemetry messages or property changes? Also what's the scale (number of devices)? To do this effectively you need to ensure you have both the current and previous values, which means storing the last reported value and timestamp externally as it could be a long time between. The Event Hub is not guaranteed to have all past messages (default is 24h), so if there's a long lag between messages it's not the right store to rely on.
Durable Entities can be used to store state (using something similar to the Actor Model). These are persisted in Azure Storage so at extremely high throughput a memory-only calculation option might make sense with delayed persistence, but you can build a memory-caching layer into your function to help if needed. This is likely going to be the best bet for what you want to do.
For most people the performance hit of going to Azure storage and back is minimal and Durable Entities will be the easiest path forward.
If you are doing it in a near-real time stream, the best solution is to use Azsure Streaming Analytics using the LAG operator. ASA has a bunch of useful features that you will need such as the PARTITION BY and event ordering policies. Beware, ASA can be expensive to run and hard to work with, but is a good service for commercial solutions.
If you don't need near-real time, a plain 'ol python script that queries (blob) persisted data is a good option, and can be wrapped up in an Azure function if it doesn't take too long to run.
Azure functions are not recommended for stateful message processing. You simply have insufficient control of the number of function instances running, the size of the batch, etc. So it is impossible to consistently and confidently know what the 'previous' timeseries value is. With Azure functions, you have to develop assuming that concurrency is never going to be an issue, which you cannot do with streaming IoT data.

Ensuring Azure Service Bus never loses a single message

I have a system where losing messages from Azure Service Bus would be a disaster, that is, the data would be lost forever with no practical means to repair the damage without major disruption.
Would I ever be able to rely on ASB entirely in this situation? (even if it was to go down for hours, it would always come back up with the same messages it was holding when it failed)
If not, what is a good approach for solving this problem where messages are crucial and cannot be lost? Is it common to simply persist the message to a database prior to publishing so that it can be referred to in the event of a disaster?
Azure Service Bus doesn’t lose messages. Once a message is successfully received by the broker, it’s there and doesn’t go anywhere. Where usually things go wrong is with message receiving and processing, which is user custom code. That’s one of the reasons why Service Bus has PeekLock receive mode and dead-lettering based on delivery count.
If you need strong guarantees, don’t reinvent the wheel yourself. Use messaging frameworks that do it for you, such as NServiceBus or MassTransit.
With azure service bus you can make this and be sure 99.99% percent,
at worst case you will find your message at the dead-letter queues but it will be never deleted.
Another choice is to use Azure Storage Queue and setting TTL to -1 it will give a infinity life time ,
but because i'am a little bit an old school and to be sure 101% I would suggest an manual solution using azure table storage,
so it's you who decide when add/delete or update a ligne because the criticty of information and data that you work with

Many ordered queues - how to auto rebalancing streams between app instances?

Problem description
I want to deploy distributed, ordered queues solution for my project but I have questions/problems:
Which tool/solution should I use? Which would be the easiest to implement/learn and infrastructure cost me less? RabbitMQ, Kafka, Redis Streams?
How to implement auto rebalancing of topics/streams for each consumer in failure situation or when new topic/stream was added to system?
In other words, I want to realize something like that:
distributed queues
..but, if one of my application are failed, other instances should take all traffic which is currently left with proper distribution (equal load).
Note, that my code was written in node.js v10 (TypeScript) and my infrastructure are based on Azure, so besides self-hosted solution (like RabbitMQ), azure-based solution (like Azure Service Bus) are also possible, but less vendor-lock, the better solution for me
My current architecture
Now I provide a more detailed background of my system:
I have 100 000 vehicle's tracker devices (different ones, many manufactures and protocols), each of them communicate with one of my custom app called decoder. This small microservice decodes and unifies payload from tracker and send it to distributed queue. Each tracker sends message every 10-30 seconds.
Note, that I must keep order of messages from single device, this is very important!
In next step, I have processing app microservice which I want to scale (forking / clustering) depends of number of tracker devices. Each fork of this app should subscribe to some of topics/consumer groups to process messages from devices, while keeping order. Processing of each message takes about 1-3 seconds.
Note, that in every moment of time, I can add or remove tracker devices, and this information should be auto-propagate to forks of processing app and this instances should be able to auto rebalancing traffic from queue.
The question is how to do that with as little as possible lines of (node.js) code, and at the same time, keeping solution easy, clean and cheap? :)
As you see at picture above, if fork no.3 failed, system must decide which of working forks should be get "blue" messages. Also, if fork no.3 return back, rebalancing is also needed.
My own research
I read about Apache Kafka with Consumer Groups, but Kafka is difficult to learn and to implement for me.
I read about RabbitMQ and Consumer Groups / many topics, but I don't know how to write auto rebalancing feature and also how I can use RabbitMQ (which plugins? which settings / configurations? there's so many options...).
I read about Azure Service Bus with message sessions but it has vendor-lock (azure cloud), it costs a lot, and like other solutions, doesn't provide full auto-rebalancing out-of-box.
I read about Redis Streams (with consumer groups) but it's new feature (lack of libraries for node.js) and also doesn't provide auto-rebalancing.
1 Message Brocker
For the first question you should look for a mature m2m protocol brocker which will give you freedom in designing your own intelligent data switching algorithms.
2 Loadbalancer
The answer to the second question you must employ well performed load balancer for handling such a huge number of 100000 connected cars. My suggestion to use Azure API Gateway or Nginx load balancer.
Now lets look at some of connected car solutions and analyze how the Aws IoT or Azure IoT doing the job nicely.
OpenSource IoT Solution
OpenSource IoT Solution
Nginx or API Gateway is used for the load Balancing purposes while the event processing is done on Kafka. Using kafka you can implement your own rule engine for intelligent data switching. Similarly any Message Broker as IoT bridge would do better. If I were you would be using VerneMQ to implement MQTTv5 features and data routing. In this case queue is not required.
Again if you want to use azure queue you have to concentrate on managing the queue forking and preempting. To control the queue seamlessly you have to write Azure Queue Trigger server-less Function. Thus your goal to not be vendor locked would be impossible to achieve.
In single word using VerneMQ, MQTT V5 implementation with Nginx would be great to implement but as all these are opensource product you must be strong in implementation and trouble shooting otherwise your business operation would be in support failure.
Its better to use professional IoT cloud services for a solution of thousands of connected cars. This is paying of as the SLA of the service is very high standard and little effort in system operation management.
Azure IoT Solution
Azure IoT Solution
If you are using Azure Solution, you be using IoT Hub where you don't have to worry about load balancing. Using Azure device SDK you can connect all the car with mobile LTE sim, OBD plugin etc to the cloud. Then azure function can handle the event processing and so on.
AWS IoT Solution
AWS IoT Solution
Unlike Azure IoT Device SDK, AWS IoT have sdk for devices. But in this architecture we want to complete the connected car project a little differently. For the shake of thing shadow and actual device status synchronization we have used AWS GreenGrass core solution in the edge side. Along with the server-less IoT event processing we have settled the whole connected car solution.
Similarly Azure IoT edge could be used to provide all can information to the device twin and synchronize between the actual car and twins.
Hope this will give you a clear idea how to implement and see the cost benefit over the vendor locked or unlocked situation.
Thank you.

Can Azure EventHub be used for critical transactional data in production?

Reading the documentation, Azure EventHubs is meant for:
Application instrumentation
User experience or workflow processing
Internet of Things (IoT) scenarios
Can this be used for any transactional data, handling revenue or application sensitive data?
Based on what I read, looks like it is meant for handling data that one should not be worried about any data loss. Is this the case?
It is mainly designed for large scale ingestion of data. That is why typical scenario's include IoT solutions which consists of a multitude of devices sending mass amounts of telemetry data.
To allow for this kind of scale it does not include some features other messaging service, like Azure Service Bus, do have. I think this blog does a good job of listening the differences. Especially the section Use Case explains things very well:
From a target use case perspective if we consider some of our typical enterprise integration patterns then if you are implementing a pattern which uses a Command Message, or a Request/Reply Message then you probably want to use Azure Service Bus Messaging.  RPC patterns can be implemented using Request/Reply messages on Azure Service Bus using a response queue.  These are really about ESB and EAI style messaging patterns where you want to send messages between applications and probably want to use other features such as property based routing.
Azure Event Hubs is more likely to be used if you’re implementing patterns with Event Messages and you want somewhere reliable to send them that is capable of dealing with a massive scale but will allow you to do stuff with the events out of process.
With these core target use cases in mind it is easy to see where the scale differences come into play.  For messaging it’s about one application telling one or more apps to DO SOMETHING or GIVE ME SOMETHING.  The alternative is that in eventing the applications are saying SOMETHING HAS HAPPENED.  When you consider this in typical application scenarios and you put events into the telemetry and logging space you can quickly see that the SOMETHING HAS HAPPENED scenario will produce a lot more traffic than the other.
Now I’m not saying that you can’t implement some messaging type functions using event hubs and that you can’t push events to a Service Bus topic as in integration there are always different requirements which result in different implementation scenarios, but I think if you follow the above as a general rule then you will usually be on the right path.
That does not mean however, that it is only capable of handling data that one should not be worried about any data loss. Data is stored for a configurable amount of time and if necessary, this data can be read from an earlier point in time.
Now, given your scenario I do not think Event Hub is the best fit. But truth to be told, I am not sure because you will have to elaborate more on what you want to do exactly.
Addition
The idea behind Event Hubs is that you will get at least once delivery at great scale. (Source). See also this question: Does Azure Event Hub guarantees at least once delivery?

Resources