Azure service bus: is it wise to create a separate topic for every event you broadcast? - azure

I am trying to design the strategy that my organization will employ to create topics, and which messages will go to which one. I am looking at either creating a separate topic for each event, or a single topic to hold messages from all events, and then to triage with filters. I am convinced that using a separate topic for every event is better because:
Filters will be less complex and thus more performant, since each
event is already separated in its own topic.
There will be less chance of message congestion in any given topic.
Messages are less likely to be needlessly copied into any given
subscription.
More topics means more messaging stores, which means better message
retrieval and sending.
From a risk management perspective, it seems like having more topics
is better. If I only used a single topic, an outage would affect all
subscribers for all messages. If I use many topics, then perhaps outages would only affect some
topics and leave the others operational.
I get 12 more shared access keys per topic. It's easier to have more granular control over which topics are
exposed to which client apps since I can add/revoke access by
add/revoking the shared access key for each app on a per-topic basis.
Any thoughts would be appreciated

Like Sean already mentioned, there is really no one answer but here are some details about topics that could help you.
Topics are designed for large number of recipients by sending messages to multiple (upto 2000) subscriptions, which actually have the filters
Topics don't really store messages but subscriptions do
For outages, unless you have topics across regions, I'm not sure if it would help as such
The limit is for shared access authorization rules per policy. You should be using one of these to generate a SAS key for your clients.
Also, chaining service bus with autoforwarding is something you could consider as required.

Related

Azure Service Bus - is it a good solution for peer-to-peer messaging platform?

We are designing a system where users can exchange "messages" (let's say XML files for simplicity sake). This system is peer to peer by design - meaning only directed messages are supported. User A can only send message to User B, it is not possible to send messages to "groups" of users etc. FIFO order is mandatory requirement as well.
This must be a reliable solution - so we started looking into Azure and its services. And Service Bus does look like the right solution to me. It offers all bells and whistles we are looking for:
FIFO order is guaranteed
Dead-letter queue with timeouts
Geo-redundancy
Transactions
and so on
So naturally, I started playing with it. And the first idea I had was to give each user of my system a QUEUE from the service bus. It will act as an INBOX for them. Other users send messages to the user (let's say using unique USER_ID as a queue ID for example), messages get accumulated in the queue and when user decides to check the inbox, they will get all the messages in the correct order. This way we "outsource" all routing, security etc etc to the service bus itself - thus considerable simplifying the app logic.
But there is a serious caveat in this approach - Service Bus supports only up to 10,000 queues: https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted#capacity-and-quotas and the number of users in my system can reach tens of thousands (but max out at 100,000 or so). So I'm somewhat in the range but not really. Therefore, I have questions:
Is there a flaw in my approach? Overall, is that a good idea to give a queue to the user exclusively? Or perhaps I should implement some kind of metadata and route messages based on it?
Am I looking at the right solution? I want to use SaaS as much as possible so I don't want to start building RabbitMQs on VMs etc - but are there built-in alternatives? may be a different approach should be considered?..
As for the numbers, I'm looking to start with 2,000 users and 200,000 messages a day - not a high load by any means. But if things work out, I see how these numbers can increase by 20x - 30x (but no more).
I would appreciate any options on this. Thank you.

if we use Azure Service bus with SessionId is slower than without ? Or do they have the same speed

I'm using service bus service From azure to Send Messages and I was wondering if Using SessionId will effect the speed of sending messages than the Case if I dont use it.
I know that SessionId will preserve the Order but what about the all in all speed ?
Thanks
Sending a message will not be much slower when you specify a session ID. Processing will be, but this is the wrong terminology to use. You can't compare handling messages w/o a session by multiple concurrent consumers and sessioned messages where the intent is to process those messages in the order they were sent in. Different business requirements that have different justifications, right? If you plan to use sessions, processing will be somewhat slower due to only a single active consumer being able to process all the messages from a given session. And that has to be backed up by a requirement, probably.
Take, for example handling items scanned at a grocery checkout. If you want to know what items are purchased in general, competing consumers is the way to go. However, if you want to know what items were bought per purchase, you can't use a competing consumer and have to use sessions to ensure only items for a given purchase are included and nothing else. Will the latter be somewhat slower? Yes, but you can't accomplish it with a competing consumer and if the business wants it, they'll accept the cost of slightly slower processing to gain the insights. Note, there are always multiple ways to solve the problem and maybe sessions is not what's needed at all.

Architecture issue - Azure servicebus and message order guarantee

Ok so i'm relatively new to the servicebus. Working on a project where we use Azure servicebus for queueing messages. Our architecture roughly looks like the following:
So the idea is that in our SourceSystem all kinds of stuff happens, which leads to messages being put on the servicebustopics. Now our responsibility is syncing these events to the external client so they are aware of what we are doing.
Now the issue is that currently we dont use servicebus sessions so message order isnt guaranteed. Also consider the following scenario:
OrderCreated
OrderUpdate 1
OrderUpdate 2
OrderClosed
What happens now is if the externalclients API is down for say OrderUpdate 1 and OrderUpdate 2, we could potentially send the messages in order: OrderCreated, OrderClosed, OrderUpdate 1, OrderUpdate 2.
Currently we just retry a message a few times and then it moves into the deadletter queue for manual reprocessing.
What steps should we take to better guarantee message order? I feel like in the scope of an order, message order needs to be guaranteed.
Should we force the sourcesystem to put all messages for a order in a servicebus session? But how can we handle this with multiple topics? And what do we do if message 1 from a session ends up in the deadletter?
There are a lot of considerations here, should we use a single topic so its easier to manage the sessions? But this opens up other problems with different message structures being in a single topic?
Id love to hear your opinions on this
Have a look at Durable Functions in Azure. You can use the 'Async Http API' or one of the other patterns to achieve the orchestration you need to do.
NServicebus' Sagas might also be a good option, here is an article that does a very good comparison between NServicebus and Durable Functions.
If the external client has to receive all those events and order matters, sending those messages to multiple topics where a topic is per message type will make your mission extremely hard to accomplish. For ordered messaging first you need to use a single entity (queue or topic) with Sessions enabled. That way you can guarantee ordered message processing. In case you have multiple external clients, you'd need to have a session-enabled entity (topic) per external client.
Another option is to implement a pattern known as Process Manager. The process manager would be responsible to make the decisions about the incoming messages and conclude when the work for a given order is completed or not.
There are also libraries (MassTransit, NServiceBus, etc) that can help you. NServiceBus implements Process Manager via a feature called Saga (tutorial) and MassTransit has it as well (documentation).

what is best practice to consume messages from multiple kafka topics?

I need to consumer messages from different kafka topics,
Should i create different consumer instance per topic and then start a new processing thread as per the number of partition.
or
I should subscribe all topics from a single consumer instance and the should start different processing threads
Thanks & regards,
Megha
The only rule is that you have to account for what Kafka does and doesn't not guarantee:
Kafka only guarantees message order for a single topic/partition. edit: this also means you can get messages out of order if your single topic Consumer switches partitions for some reason.
When you subscribe to multiple topics with a single Consumer, that Consumer is assigned a topic/partition pair for each requested topic.
That means the order of incoming messages for any one topic will be correct, but you cannot guarantee that ordering between topics will be chronological.
You also can't guarantee that you will get messages from any particular subscribed topic in any given period of time.
I recently had a bug because my application subscribed to many topics with a single Consumer. Each topic was a live feed of images at one image per message. Since all the topics always had new images, each poll() was only returning images from the first topic to register.
If processing all messages is important, you'll need to be certain that each Consumer can process messages from all of its subscribed topics faster than the messages are created. If it can't, you'll either need more Consumers committing reads in the same group, or you'll have to be OK with the fact that some messages may never be processed.
Obviously one Consumer/topic is the simplest, but it does add some overhead to have the additional Consumers. You'll have to determine whether that's important based on your needs.
The only way to correctly answer your question is to evaluate your application's specific requirements and capabilities, and build something that works within those and within Kafka's limitations.
This really depends on logic of your application - does it need to see all messages together in one place, or not. Sometimes, consumption from single topic could be easier to implement in terms of business logic of your application.

Azure event hubs and multiple consumer groups

Need help on using Azure event hubs in the following scenario. I think consumer groups might be the right option for this scenario, but I was not able to find a concrete example online.
Here is the rough description of the problem and the proposed solution using the event hubs (I am not sure if this is the optimal solution. Will appreciate your feedback)
I have multiple event-sources that generate a lot of event data (telemetry data from sensors) which needs to be saved to our database and some analysis like running average, min-max should be performed in parallel.
The sender can only send data to a single endpoint, but the event-hub should make this data available to both the data handlers.
I am thinking about using two consumer groups, first one will be a cluster of worker role instances that take care of saving the data to our key-value store and the second consumer group will be an analysis engine (likely to go with Azure Stream Analysis).
Firstly, how do I setup the consumer groups and is there something that I need to do on the sender/receiver side such that copies of events appear on all consumer groups?
I did read many examples online, but they either use client.GetDefaultConsumerGroup(); and/or have all partitions processed by multiple instances of a same worker role.
For my scenario, when a event is triggered, it needs to be processed by two different worker roles in parallel (one that saves the data and second one that does some analysis)
Thank You!
TLDR: Looks reasonable, just make two Consumer Groups by using different names with CreateConsumerGroupIfNotExists.
Consumer Groups are primarily a concept so exactly how they work depends on how your subscribers are implemented. As you know, conceptually they are a group of subscribers working together so that each group receives all the messages and under ideal (won't happen) circumstances probably consumes each message once. This means that each Consumer Group will "have all partitions processed by multiple instances of the same worker role." You want this.
This can be implemented in different ways. Microsoft has provided two ways to consume messages from Event Hubs directly plus the option to use things like Streaming Analytics which are probably built on top of the two direct ways. The first way is the Event Hub Receiver, the second which is higher level is the Event Processor Host.
I have not used Event Hub Receiver directly so this particular comment is based on the theory of how these sorts of systems work and speculation from the documentation: While they are created from EventHubConsumerGroups this serves little purpose as these receivers do not coordinate with one another. If you use these you will need to (and can!) do all the coordination and committing of offsets yourself which has advantages in some scenarios such as writing the offset to a transactional DB in the same transaction as computed aggregates. Using these low level receivers, having different logical consumer groups using the same Azure consumer group probably shouldn't (normative not practical advice) be particularly problematic, but you should use different names in case it either does matter or you change to EventProcessorHosts.
Now onto more useful information, EventProcessorHosts are probably built on top of EventHubReceivers. They are a higher level thing and there is support to enable multiple machines to work together as a logical consumer group. Below I've included a lightly edited snippet from my code that makes an EventProcessorHost with a bunch of comments left in explaining some choices.
//We need an identifier for the lease. It must be unique across concurrently
//running instances of the program. There are three main options for this. The
//first is a static value from a config file. The second is the machine's NETBIOS
//name ie System.Environment.MachineName. The third is a random value unique per run which
//we have chosen here, if our VMs have very weak randomness bad things may happen.
string hostName = Guid.NewGuid().ToString();
//It's not clear if we want this here long term or if we prefer that the Consumer
//Groups be created out of band. Nor are there necessarily good tools to discover
//existing consumer groups.
NamespaceManager namespaceManager =
NamespaceManager.CreateFromConnectionString(eventHubConnectionString);
EventHubDescription ehd = namespaceManager.GetEventHub(eventHubPath);
namespaceManager.CreateConsumerGroupIfNotExists(ehd.Path, consumerGroupName);
host = new EventProcessorHost(hostName, eventHubPath, consumerGroupName,
eventHubConnectionString, storageConnectionString, leaseContainerName);
//Call something like this when you want it to start
host.RegisterEventProcessorFactoryAsync(factory)
You'll notice that I told Azure to make a new Consumer Group if it doesn't exist, you'll get a lovely error message if it doesn't. I honestly don't know what the purpose of this is because it doesn't include the Storage connection string which needs to be the same across instances in order for the EventProcessorHost's coordination (and presumably commits) to work properly.
Here I've provided a picture from Azure Storage Explorer of leases the leases and presumably offsets from a Consumer Group I was experimenting with in November. Note that while I have a testhub and a testhub-testcg container, this is due to manually naming them. If they were in the same container it would be things like "$Default/0" vs "testcg/0".
As you can see there is one blob per partition. My assumption is that these blobs are used for two things. The first of these is the Blob leases for distributing partitions amongst instances see here, the second is storing the offsets within the partition that have been committed.
Rather than the data getting pushed to the Consumer Groups the consuming instances are asking the storage system for data at some offset in one partition. EventProcessorHosts are a nice high level way of having a logical consumer group where each partition is only getting read by one consumer at a time, and where the progress the logical consumer group has made in each partition is not forgotten.
Remember that the throughput per partition is measured so that if you're maxing out ingress you can only have two logical consumers that are all up to speed. As such you'll want to make sure you have enough partitions, and throughput units, that you can:
Read all the data you send.
Catch up within the 24 hour retention period if you fall behind for a few hours due to issues.
In conclusion: consumer groups are what you need. The examples you read that use a specific consumer group are good, within each logical consumer group use the same name for the Azure Consumer Group and have different logical consumer groups use different ones.
I haven't yet used Azure Stream Analytics, but at least during the preview release you are limited to the default consumer group. So don't use the default consumer group for something else, and if you need two separate lots of Azure Stream Analytics you may need to do something nasty. But it's easy to configure!

Resources