Has anybody thought about implementing strategies for Azure storage queues that would allow dequeuing messages in an arbitrary order (other than first-in, first-out). For examples, some people might be interested in LIFO, some people might want to dequeue "important" messages ahead of less important ones, etc.
Personally, I am interested in implementing a strategy that would allow messages in a multi-tenant system to be dequeued in a way that ensures large number of messages related to a particular tenant will not cause messages for other tenants to be delayed.
I am also interested in other queuing systems that may have implemented similar strategies.
Are there other queuing systems that allow this kind of
What you are looking for is referred to as a Priority Queue Pattern which you can read more about here.
There are a couple of strategies for achieving this. One is to use different queues for the higher priority messages. Or in your case, a queue for each customer.
Another approach, and is one I would prefer for your scenario, is to use the ServiceBus Topics and Subscriptions (pub/sub basically).
Both of these are discussed in more detail in the link provided above.
The priority queue pattern is the way to go. Use different queues for different message priorities. You can also assign appropriate numbers of workers to each queue to drain at an appropriate rate.
Related
Is it possible to have an Azure Event Hub with multiple listeners in the same application?
What I'm looking for is to trigger an event to a single hub and have two Azure Functions listen to it so they can both perform a task based on the same event.
I've implemented this at the minute with "Consumer Groups" but this feels wrong as I feel that this should be used when you have multiple applications reading the events.
Is there a better mechanism of doing this or am I looking at this the wrong way?
Thanks
Yes, it is possible. The general answer for the best approach to doing so is "it depends on your application". In the scenario that you're describing, you'll need to use two separate consumer groups due to how the Azure Functions bindings work.
Internally, the Function bindings use an event processor which will attempt to collaborate with other processors working against the consumer group to share work and prevent two instances reading from the same partition. The same collaboration would take place if you were hosting one of the event processor types in your application as well.
For the other consumer types available in the Event Hubs SDK, working against the same consumer group is not a problem so long as you have less than the documented quotas. (at the time of writing, this was 5 consumers per group)
I'm trying to design a robust architecture, however I'm having trouble on solving the message delivery.
Let me try to explain
The API would be clustered on ECS receiving a bunch of requests.
The Workers would be clustered too subscribing the same channels. (that's the problem, if we were working with only one worker it wouldn't have any issue)
How to deal with multiple workers avoiding duplicated messages?
What would be a good simple approach, keeping many workers occupied.?
Thank you.
This sounds like a very fundamental problem, for a message broker: having one channel and multiple workers subscribed to it, and all of them to receive the same message. It wouldn't really be useful to process the same message multiple times.
This problem has been addressed in most message brokers (I believe). For example, when you consume a message from an Amazon SQS queue, that message is not visible to other consumers, for a particular timeframe (visibility timeout).
When the worker processed the message, it has to delete it from the queue. Otherwise, if the timeout expired, other workers will see the message and process it.
SQS in particular has a distributed architecture and sometimes you get duplicate messages in the queue, which are processed by different workers. That's the effect of the at-least-once delivery guarantee that SQS provides.
If your system has to be strict about duplicate messages, then you need to build a de-duplication mechanism around it.
The keywords you are looking for is "exactly once guarantee in a distributed system". With that you can do some research on your own, but here some pointers.
You could use the right Event Queue System that supports "exactly once" guarantees. For example Apache Pulsar (see this link) or Kafka, or you can use their approach as inspiration in your own implementation (which may be somewhat hard to do).
For your own implementation you could write a special consumer that is the only consumer and acts a distributor for worker tasks and whose task it is to guarantee "exactly once". It would be a tradeoff and could prove a bottleneck, depending on your scalability requirements. This article explains why it is a difficult problem in distributed systems.
I have written an implementation of azure service bus into our application using Topics which are subscribed to by a number of applications. One of the discussions in our team is whether we stick with a single Topic and filter via the properties of the message or alternatively create a Topic for our particular needs.
Our scenario is that we wish to filter by a priority and an environment variable (test and uat environments share a connection).
So do we have Topics (something like):
TestHigh
TestMedium
TestLow
UatHigh
UatMedium
UatLow
OR, just a single topic with these values set as two properties?
My preference is that we create separate topics, as we'd be utilising the functionality available and I would imagine that under high load this would scale better? I've read peeking large queues can be inefficient. It also seems cleaner to subscribe to a single topic.
Any advice would be appreciated.
I would go with separate topics for each environment. It's cleaner. Message counts in topics can be monitored separately for each environment. It's marginally more scalable (e.g. topic size limits won't be shared) - but the limits are generous and won't matter much in testing.
But my main argument: that's how production will (hopefully) go. As in, production will have it's own connection (and namespace) in ASB, and will have separate topics. Thus you would not be filtering messages via properties in production, so why do it differently in testing?
Last tip: to make topic provision easier, I'd recommend having your app auto create them on start up. It's easy to do - check if they exist, and create if they don't.
Either approach works. More topics and subscriptions mean that you have more entities to manage at deployment time. If High/Medium/Low reflect priorities, then multiple topics may be a better choice since you can pull from the the highest priority subscription first.
From a scalability perspective there really isn't too much of a difference that you would notice since Service Bus already spreads the load across multiple logs internally, so if you use six topics or two topics will not make a material difference.
What does impact performance predictability is the choice of service class. If you choose "Standard", throughput and latency are best effort over a shared multi-tenant infrastructure. Other tenants on the same cluster may impact your throughput. If you choose "Premium", you get ringfenced resources that give you predictable performance, and your two or six Topics get processed out of that resource pool.
In another words, if I create messaging layout which uses rather large number of messaging entities (like several thousands), instead of smaller number, is there something in Azure ServiceBus that gets irritated by that and makes it perform less than ideally, or generates significantly different costs. Let us assume that number of messages will remain roughly the same in both scenarios.
So to make clear I am not asking if messaging layout with many entities is sound from applications point of view, but rather is there in Azure some that performs badly in such situations. If there are advantages to it (perhaps Azure can scale it more easily), that would be also interesting.
I am aware of 10000 entites limit in single ServiceBus namespace.
It is the more matter of programming and architecture of the solution i think - for example, we saw the problems with the ACS (authentication mechanism) - SB started to throttle the client sometimes when there were many requests. Take a look at the guidance about SB high availability - there are some issues listed that should be considered when you have a lot of load.
And, you always have other options that can be more suitable for highload scenarios - for example, Azure Event Hubs, more lightweight queue mechanism intended to be the service for the extremely high amount of messages.
I have a Java application, which uses an Oracle Queue to store messages in the queue for later processing by multiple threads consuming queued messages. The messages in this queue can be related to each other, and must therefore be processed in a specific order based on the business logic of my application. Basically, I want to achieve that the dequeueing of one message A is held back as long as another message B in the queue has not been completely processed. The only weapon given by Oracle AQ I see here, are the Delay and an Priority parameters. These, however, cannot be used to achieve the scenario outlined above, since there are situations, where two related messages still can be dequeued and processed at the same time. Are there any tools that can help establishing an advanced processing order of messages?
I came to the conclusion that it is not a good idea to order these messages using the queue, because it would need a custom and very specialized dequeue strategy, which has a very bad smell to me, both, complexity and most likely performance wise. It also tries to fix communication protocol issues using the queue, which are application specific and therefore should find treatment in the application itself. Instead, the application / communication protocol should be tolerant enough to handle ordering issues.