Azure Event Hub with multiple listeners - azure

Is it possible to have an Azure Event Hub with multiple listeners in the same application?
What I'm looking for is to trigger an event to a single hub and have two Azure Functions listen to it so they can both perform a task based on the same event.
I've implemented this at the minute with "Consumer Groups" but this feels wrong as I feel that this should be used when you have multiple applications reading the events.
Is there a better mechanism of doing this or am I looking at this the wrong way?
Thanks

Yes, it is possible. The general answer for the best approach to doing so is "it depends on your application". In the scenario that you're describing, you'll need to use two separate consumer groups due to how the Azure Functions bindings work.
Internally, the Function bindings use an event processor which will attempt to collaborate with other processors working against the consumer group to share work and prevent two instances reading from the same partition. The same collaboration would take place if you were hosting one of the event processor types in your application as well.
For the other consumer types available in the Event Hubs SDK, working against the same consumer group is not a problem so long as you stay within the documented quota (at the time of writing, 5 concurrent readers per partition per consumer group).
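As a rough illustration of why separate consumer groups let two functions each see every event, the sketch below models a hub in plain Python. It is a toy in-memory model, not the Azure SDK: the `ToyEventHub` class and the group names `email-function` and `audit-function` are invented for this example, and a single partition is assumed.

```python
from collections import defaultdict


class ToyEventHub:
    """In-memory stand-in for a single-partition event stream (not the Azure SDK)."""

    def __init__(self):
        self._events = []
        # One read position per consumer group; groups never share a cursor.
        self._offsets = defaultdict(int)

    def send(self, event):
        self._events.append(event)

    def receive(self, consumer_group):
        """Return the events this group has not yet seen and advance its cursor."""
        start = self._offsets[consumer_group]
        batch = self._events[start:]
        self._offsets[consumer_group] = len(self._events)
        return batch


hub = ToyEventHub()
hub.send("order-created")
hub.send("order-paid")

# Two functions, each bound to its own (hypothetical) consumer group name.
emails = hub.receive("email-function")
audit = hub.receive("audit-function")
```

Because each group keeps its own cursor, both `emails` and `audit` contain the full stream. Two readers sharing one group name in this model would split the events between them instead.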

Related

Azure EventHub Push/Pull?

When it comes to Apache Kafka, on the consumer side I know it's a pull model. What about Azure EventHubs? Are they pull or push?
From what I've gathered so far, unlike Kafka, Event Hubs "pushes" events to the listeners. Can someone confirm? Any additional details or references would be helpful.
A simple Google search landed me on this page to back up my claim.
Is there a simple way to test this theory out?
Yes, Azure Event Hubs pushes events to event consumers; there is no need to poll to consume them. The event processor defines event handlers, which are invoked as new events are ingested into the event stream.
The event consumer can also checkpoint, which marks the position up to which events have been consumed.
See the doc for more details.
The short answer to this is that the model for consuming events depends on the type of client that your application has chosen to use. The official Azure SDK packages offer consumer types that are push-based and those that are pull-based.
You don't mention the specific language that you're using but, since you're comparing to Kafka, I'll assume that you're interested in Java. The azure-messaging-eventhubs family of packages is the current generation of the Azure SDK and has the following clients for reading events:
EventProcessorClient: This is a push-based client intended to serve as the primary consumer of events in production scenarios for the majority of workloads. It is responsible for reading and processing events for all partitions of an Event Hub and collaborates with other EventProcessorClient instances using the same Event Hub and consumer group to balance work between them. A high degree of fault tolerance is built-in, allowing the processor to be resilient in the face of errors.
EventHubConsumerAsyncClient: This is a push-based client focused on reading events from a single partition using a Flux-based subscription via the Reactor library. This client requires applications to own responsibility for resilience and processing state persistence.
EventHubConsumerClient: This is a pull-based client focused on reading events from a single partition using an iterator pattern. This client requires applications to own responsibility for resilience and processing state persistence.
More information about the package, its types, and basic usage is available in the Azure Event Hubs client library for Java overview. More detailed samples can be found in the samples overview, including those for Consuming events and Using the EventProcessorClient.
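The push/pull distinction between these clients can be shown without any SDK at all. The following is a minimal sketch in plain Python, assuming a hypothetical `ToyPartition` class that stands in for one partition's events. It only illustrates who drives the loop and does not reproduce the API of any of the Java clients named above.

```python
from typing import Callable, Iterator, List


class ToyPartition:
    """Hypothetical stand-in for the events of a single partition."""

    def __init__(self, events: List[str]):
        self._events = list(events)

    # Pull style: the caller drives the loop and asks for the next event.
    def iterate(self) -> Iterator[str]:
        yield from self._events

    # Push style: the caller registers a handler and the source invokes it.
    def subscribe(self, handler: Callable[[str], None]) -> None:
        for event in self._events:
            handler(event)


partition = ToyPartition(["e1", "e2", "e3"])

# Pull: the application iterates at its own pace.
pulled = [event for event in partition.iterate()]

# Push: the application hands over a callback and receives events through it.
pushed: List[str] = []
partition.subscribe(pushed.append)
```

Both styles deliver the same events in the same order. The difference is only whether the application or the client library owns the receive loop.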

Many ordered queues - how to auto rebalancing streams between app instances?

Problem description
I want to deploy a distributed, ordered queue solution for my project, but I have some questions:
Which tool/solution should I use? Which would be the easiest to implement and learn, and cost the least in infrastructure? RabbitMQ, Kafka, Redis Streams?
How do I implement automatic rebalancing of topics/streams across consumers when a consumer fails or when a new topic/stream is added to the system?
In other words, I want to achieve something like this:
distributed queues
...but if one of my application instances fails, the other instances should take over all of its remaining traffic with an even distribution (equal load).
Note that my code is written in Node.js v10 (TypeScript) and my infrastructure is based on Azure, so besides a self-hosted solution (like RabbitMQ), an Azure-based solution (like Azure Service Bus) is also possible, but the less vendor lock-in, the better.
My current architecture
Here is a more detailed background of my system:
I have 100,000 vehicle tracker devices (different models, many manufacturers and protocols), each of which communicates with one of my custom apps called a decoder. This small microservice decodes and unifies the payload from the tracker and sends it to the distributed queue. Each tracker sends a message every 10-30 seconds.
Note that I must keep the order of messages from a single device; this is very important!
In the next step, I have a processing app microservice which I want to scale (forking / clustering) depending on the number of tracker devices. Each fork of this app should subscribe to some of the topics/consumer groups to process messages from devices while keeping their order. Processing each message takes about 1-3 seconds.
Note that I can add or remove tracker devices at any moment, and this information should propagate automatically to the forks of the processing app, which should be able to automatically rebalance the traffic from the queue.
The question is how to do that with as few lines of (Node.js) code as possible while keeping the solution easy, clean and cheap? :)
As you can see in the picture above, if fork no. 3 fails, the system must decide which of the working forks should get the "blue" messages. Also, if fork no. 3 comes back, rebalancing is needed again.
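One common way to combine per-device ordering with rebalancing is to hash each device onto a fixed set of partitions and give every partition exactly one owner among the live forks. The sketch below shows that idea in plain Python. It is an illustration under stated assumptions, not the mechanism of any particular broker: the partition count of 8, the fork names and both function names are invented for this example.

```python
import zlib
from typing import Dict, List

NUM_PARTITIONS = 8  # assumed fixed partition count for this sketch


def partition_for(device_id: str) -> int:
    """Stable hash, so a given device always maps to the same partition."""
    return zlib.crc32(device_id.encode("utf-8")) % NUM_PARTITIONS


def assign(workers: List[str]) -> Dict[str, List[int]]:
    """Spread all partitions round-robin over the currently live workers."""
    live = sorted(workers)
    if not live:
        return {}
    assignment: Dict[str, List[int]] = {worker: [] for worker in live}
    for partition in range(NUM_PARTITIONS):
        assignment[live[partition % len(live)]].append(partition)
    return assignment


before = assign(["fork-1", "fork-2", "fork-3"])
after = assign(["fork-1", "fork-2"])  # fork-3 has failed
```

Order per device is kept because all messages of one device land in one partition and each partition has a single owner at a time. When the set of live forks changes, calling `assign` again redistributes the orphaned partitions evenly. A real system would also need failure detection and a safe handover of in-flight messages, which this sketch leaves out.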
My own research
I read about Apache Kafka with consumer groups, but Kafka is difficult for me to learn and implement.
I read about RabbitMQ and consumer groups / many topics, but I don't know how to write an auto-rebalancing feature or how to set up RabbitMQ (which plugins? which settings / configurations? there are so many options...).
I read about Azure Service Bus with message sessions, but it means vendor lock-in (Azure cloud), it costs a lot and, like the other solutions, it doesn't provide full auto-rebalancing out of the box.
I read about Redis Streams (with consumer groups), but it's a new feature (lacking libraries for Node.js) and also doesn't provide auto-rebalancing.
1 Message Broker
For the first question, you should look for a mature M2M protocol broker, which will give you freedom in designing your own intelligent data switching algorithms.
2 Load balancer
For the second question, you must employ a well-performing load balancer to handle such a large number (100,000) of connected cars. My suggestion is to use Azure API Gateway or an Nginx load balancer.
Now let's look at some connected car solutions and analyze how AWS IoT or Azure IoT do the job nicely.
OpenSource IoT Solution
Nginx or an API gateway is used for load balancing, while the event processing is done on Kafka. Using Kafka, you can implement your own rule engine for intelligent data switching. Similarly, any message broker acting as an IoT bridge would do well. If I were you, I would use VerneMQ to implement MQTT v5 features and data routing. In that case, a queue is not required.
Again, if you want to use an Azure queue, you have to concentrate on managing the queue forking and preempting. To control the queue seamlessly, you have to write a serverless Azure Queue Trigger function. Your goal of avoiding vendor lock-in would then be impossible to achieve.
In short, VerneMQ (an MQTT v5 implementation) with Nginx would be a great combination, but since all of these are open-source products, you must be strong in implementation and troubleshooting; otherwise your business operations would suffer from a lack of support.
It's better to use professional IoT cloud services for a solution with thousands of connected cars. This pays off, as the SLA of such services is of a very high standard and little effort is needed for system operations management.
Azure IoT Solution
If you are using the Azure solution, you would use IoT Hub, where you don't have to worry about load balancing. Using the Azure device SDK, you can connect all the cars (via mobile LTE SIM, OBD plug-in, etc.) to the cloud. Azure Functions can then handle the event processing and so on.
AWS IoT Solution
Like Azure with its IoT Device SDK, AWS IoT has an SDK for devices. But in this architecture we wanted to complete the connected car project a little differently. To synchronize the thing shadow with the actual device status, we used the AWS Greengrass core on the edge side. Together with serverless IoT event processing, this completes the whole connected car solution.
Similarly, Azure IoT Edge could be used to provide all CAN information to the device twin and synchronize between the actual car and its twin.
I hope this gives you a clear idea of how to implement this and of the cost-benefit trade-off between vendor-locked and vendor-neutral options.
Thank you.

Can Azure EventHub be used for critical transactional data in production?

Reading the documentation, Azure EventHubs is meant for:
Application instrumentation
User experience or workflow processing
Internet of Things (IoT) scenarios
Can this be used for any transactional data, handling revenue or application sensitive data?
Based on what I read, looks like it is meant for handling data that one should not be worried about any data loss. Is this the case?
It is mainly designed for large-scale ingestion of data. That is why typical scenarios include IoT solutions, which consist of a multitude of devices sending large amounts of telemetry data.
To allow for this kind of scale, it does not include some features that other messaging services, like Azure Service Bus, do have. I think this blog does a good job of listing the differences. The Use Case section in particular explains things very well:
From a target use case perspective if we consider some of our typical enterprise integration patterns then if you are implementing a pattern which uses a Command Message, or a Request/Reply Message then you probably want to use Azure Service Bus Messaging.  RPC patterns can be implemented using Request/Reply messages on Azure Service Bus using a response queue.  These are really about ESB and EAI style messaging patterns where you want to send messages between applications and probably want to use other features such as property based routing.
Azure Event Hubs is more likely to be used if you’re implementing patterns with Event Messages and you want somewhere reliable to send them that is capable of dealing with a massive scale but will allow you to do stuff with the events out of process.
With these core target use cases in mind it is easy to see where the scale differences come into play.  For messaging it’s about one application telling one or more apps to DO SOMETHING or GIVE ME SOMETHING.  The alternative is that in eventing the applications are saying SOMETHING HAS HAPPENED.  When you consider this in typical application scenarios and you put events into the telemetry and logging space you can quickly see that the SOMETHING HAS HAPPENED scenario will produce a lot more traffic than the other.
Now I’m not saying that you can’t implement some messaging type functions using event hubs and that you can’t push events to a Service Bus topic as in integration there are always different requirements which result in different implementation scenarios, but I think if you follow the above as a general rule then you will usually be on the right path.
That does not mean, however, that it is only capable of handling data whose loss would not matter. Data is stored for a configurable amount of time and, if necessary, it can be read again from an earlier point in time.
Now, given your scenario, I do not think Event Hubs is the best fit. But truth be told, I am not sure, because you would have to elaborate on what exactly you want to do.
Addition
The idea behind Event Hubs is that you will get at least once delivery at great scale. (Source). See also this question: Does Azure Event Hub guarantees at least once delivery?
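At-least-once delivery means a consumer can see the same event more than once, so handlers for revenue-related data are usually written to be idempotent. The sketch below shows one way to do that in plain Python. It assumes each event can be identified by its partition and a per-partition sequence number; the `IdempotentLedger` class and the sample deliveries are invented for illustration, and a production version would persist the set of seen keys rather than hold it in memory.

```python
from typing import Set, Tuple


class IdempotentLedger:
    """Applies each payment at most once, even if the event is redelivered."""

    def __init__(self):
        self.balance = 0
        self._seen: Set[Tuple[str, int]] = set()

    def handle(self, partition_id: str, sequence_number: int, amount: int) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        key = (partition_id, sequence_number)
        if key in self._seen:
            return False  # duplicate delivery, already applied
        self._seen.add(key)
        self.balance += amount
        return True


ledger = IdempotentLedger()
# The third tuple repeats the second one, simulating a redelivery.
deliveries = [("0", 1, 100), ("0", 2, 50), ("0", 2, 50)]
results = [ledger.handle(*delivery) for delivery in deliveries]
```

With this guard in place, a redelivered event leaves the balance unchanged, which is what makes at-least-once delivery tolerable for transactional data.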

Azure Service Bus - Multiple Topics vs Filtered Topic

I have written an implementation of azure service bus into our application using Topics which are subscribed to by a number of applications. One of the discussions in our team is whether we stick with a single Topic and filter via the properties of the message or alternatively create a Topic for our particular needs.
Our scenario is that we wish to filter by a priority and an environment variable (test and uat environments share a connection).
So do we have Topics (something like):
TestHigh
TestMedium
TestLow
UatHigh
UatMedium
UatLow
OR, just a single topic with these values set as two properties?
My preference is that we create separate topics, as we'd be utilising the functionality available and I would imagine that under high load this would scale better? I've read peeking large queues can be inefficient. It also seems cleaner to subscribe to a single topic.
Any advice would be appreciated.
I would go with separate topics for each environment. It's cleaner. Message counts in topics can be monitored separately for each environment. It's marginally more scalable (e.g. topic size limits won't be shared) - but the limits are generous and won't matter much in testing.
But my main argument: that's how production will (hopefully) go. As in, production will have its own connection (and namespace) in ASB, and will have separate topics. Thus you would not be filtering messages via properties in production, so why do it differently in testing?
Last tip: to make topic provision easier, I'd recommend having your app auto create them on start up. It's easy to do - check if they exist, and create if they don't.
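The "check if they exist, and create if they don't" step could look roughly like the sketch below. It is written against a hypothetical in-memory `InMemoryTopicAdmin` class so that it runs on its own; the method names `topic_exists` and `create_topic` are assumptions for this example and are not taken from the Service Bus SDK.

```python
from typing import Iterable, List, Set


class InMemoryTopicAdmin:
    """Hypothetical stand-in for a broker's management client."""

    def __init__(self):
        self._topics: Set[str] = set()

    def topic_exists(self, name: str) -> bool:
        return name in self._topics

    def create_topic(self, name: str) -> None:
        if name in self._topics:
            raise ValueError(f"topic already exists: {name}")
        self._topics.add(name)


def ensure_topics(admin: InMemoryTopicAdmin, names: Iterable[str]) -> List[str]:
    """Create any missing topics and return the names that were created."""
    created = []
    for name in names:
        if not admin.topic_exists(name):
            admin.create_topic(name)
            created.append(name)
    return created


admin = InMemoryTopicAdmin()
first_run = ensure_topics(admin, ["TestHigh", "TestMedium", "TestLow"])
second_run = ensure_topics(admin, ["TestHigh", "TestMedium", "TestLow"])
```

The second call creates nothing, so running this on every start-up is safe. Against a real broker, two instances starting at the same time could still race between the check and the create, so the create call would also need to tolerate an "already exists" error.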
Either approach works. More topics and subscriptions mean that you have more entities to manage at deployment time. If High/Medium/Low reflect priorities, then multiple topics may be a better choice since you can pull from the highest priority subscription first.
From a scalability perspective there really isn't too much of a difference that you would notice, since Service Bus already spreads the load across multiple logs internally, so whether you use six topics or two will not make a material difference.
What does impact performance predictability is the choice of service class. If you choose "Standard", throughput and latency are best effort over a shared multi-tenant infrastructure. Other tenants on the same cluster may impact your throughput. If you choose "Premium", you get ringfenced resources that give you predictable performance, and your two or six Topics get processed out of that resource pool.
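Pulling from the highest-priority subscription first can be sketched as a simple loop over the priorities. The example below uses plain Python deques as stand-ins for the three subscriptions; the message names and the `next_message` helper are invented for illustration and no Service Bus API is involved.

```python
from collections import deque
from typing import Deque, Dict, List, Optional

PRIORITY_ORDER = ["High", "Medium", "Low"]


def next_message(subscriptions: Dict[str, Deque[str]]) -> Optional[str]:
    """Return the next message from the highest-priority non-empty subscription."""
    for priority in PRIORITY_ORDER:
        queue = subscriptions[priority]
        if queue:
            return queue.popleft()
    return None  # every subscription is empty


subscriptions = {
    "High": deque(["h1"]),
    "Medium": deque(["m1", "m2"]),
    "Low": deque(["l1"]),
}

drained: List[str] = []
while (message := next_message(subscriptions)) is not None:
    drained.append(message)
```

A strict scheme like this can starve the low-priority subscription under sustained high-priority load, so a real consumer might cap how many high-priority messages it takes in a row.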

using cloud services to aggregate and group real-time statistics in a time window to trigger notifications

I'm trying to build a real-time achievements processor for things like:
every time there is a new participant in a thread, send a notification to the last 3 participants
group and aggregate activity stream notifications by type per day
This description of event stream processing seems like a good fit for what I need https://en.wikipedia.org/wiki/Event_stream_processing
If the use case were just to update or trigger from single events, I can use one of the many cloud queue or publisher services from amazon or azure, things like Kinesis or SQS and use say an AWS lambda function to process messages from the queue. Azure seems like it offers something called an Event Hub which can act as the data stream broadcaster. Essentially, have a cloud queue of all actions/events and multiple notification processors as subscribers to the events stream(s) and the logic triggers and aggregations and achievement awards are encapsulated in each achievement processor.
However, since I need to group items by some arbitrary rules (each achievement can have many grouping parameters), I can't just simply look at the latest event in the action queue to process each achievement in real-time. Would I have to keep a set in memory to make this efficient? The alternative is to have each achievement processor do a database lookup with every event (e.g. to select all events for the day that match this type) but I'm worried if I do that it will not be very performant. I've heard mention of things like spark streaming and snowplow, so I'm wondering if there is both a pattern and a product on either AWS or Azure cloud services that can be useful to solve this in a very scalable and simple manner - and if the existing data streaming services on azure and aws (event hubs and kinesis) would fit this data-aggregation use case.
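To make the "keep a set in memory" option concrete, here is a minimal single-process sketch of the two example achievements in plain Python. It assumes "last 3 participants" means the three users who most recently joined the thread, and it groups events by calendar day of their timestamp. The names `ThreadParticipants` and `daily_counts` are invented for this example, and a managed streaming product would replace this state with its own windowing.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


class ThreadParticipants:
    """Keeps the join order of participants per thread in memory."""

    def __init__(self, keep: int = 3):
        self._keep = keep
        self._order: Dict[str, List[str]] = defaultdict(list)

    def on_post(self, thread_id: str, user: str) -> List[str]:
        """Return the users to notify if `user` is new to the thread, else []."""
        participants = self._order[thread_id]
        if user in participants:
            return []
        to_notify = participants[-self._keep:]  # the most recent joiners
        participants.append(user)
        return to_notify


def daily_counts(events: List[Tuple[datetime, str]]) -> Dict[Tuple[str, str], int]:
    """Count events per (ISO date, event type) pair."""
    counts: Counter = Counter()
    for timestamp, event_type in events:
        counts[(timestamp.date().isoformat(), event_type)] += 1
    return dict(counts)


tracker = ThreadParticipants()
posters = ["ann", "bob", "ann", "cat", "dan", "eve"]
notified = [tracker.on_post("t1", user) for user in posters]

counts = daily_counts([
    (datetime(2024, 1, 1, 9, 0), "like"),
    (datetime(2024, 1, 1, 17, 30), "like"),
    (datetime(2024, 1, 2, 8, 0), "like"),
    (datetime(2024, 1, 1, 12, 0), "comment"),
])
```

Each event is handled with a small in-memory update instead of a database query, which is the trade-off the question asks about. The cost is that this state is lost on restart and must be partitioned if several processors run in parallel.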
Both Azure and AWS now offer something that can fit this use case:
https://azure.microsoft.com/en-us/services/stream-analytics/
and
https://aws.amazon.com/kinesis/analytics/
Disclaimer: I'm a Product Manager at Striim.
Just for the sake of answering the question, Striim lets you run SQL queries on live streams of data, aggregate it with time/count/hybrid windows, and trigger alerts. It's horizontally scalable as well.
Striim is available on both the Azure and AWS marketplaces. The other nice thing is that the same pipeline can easily be transferred between clouds and also run on-premises.

Resources