Azure Service Bus Scalability

I am trying to understand how I can make an Azure Service Bus Topic scale to handle more than 10,000 requests/second from more than 50 different clients. I found this article from Microsoft - http://msdn.microsoft.com/en-us/library/windowsazure/hh528527.aspx. It provides a lot of good input on scaling Azure Service Bus, such as creating multiple message factories, sending and receiving asynchronously, and doing batch send/receive.
But all of this input is from the publisher and subscriber client perspective. What if the node running the Topic cannot handle the huge number of transactions? How do I monitor that? How do I have the Topic running on multiple nodes? Any input on that would be helpful.
I am also wondering if anyone has done capacity testing with Topics/Queues; I would be eager to see those results.
Thanks,
Prasanna

If you need 10K or 100K or 1M or more requests per second, take a look at what's done on the highway. More traffic, more lanes.
You can get effectively arbitrary flow rates out of Service Bus by partitioning your traffic across multiple entities. Service Bus gives a number of assurances about reliability, e.g. that we don't lose messages once we have taken them from you, and that we assign gapless sequence numbers, and that has a throughput impact on an individual entity such as a single Topic. That's exactly like a single highway lane only being able to handle X cars/hour. Make more lanes.
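A minimal sketch of that "more lanes" idea, assuming you pre-create a fixed set of topics (the names `topic-0` … `topic-3` are hypothetical) and route each message by a stable hash of a partition key, so related messages always land on the same lane:

```typescript
// Route messages across N pre-created topics ("lanes") by hashing a stable key.
// The topic names and lane count here are assumptions; create the entities up front.
const LANES = ["topic-0", "topic-1", "topic-2", "topic-3"];

// Simple FNV-1a hash: deterministic, so one customer/device always maps to one lane.
function fnv1a(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Pick the lane (topic) for a given partition key, e.g. a customer or device id.
function laneFor(partitionKey: string): string {
  return LANES[fnv1a(partitionKey) % LANES.length];
}
```

A sender would then hold one client per topic and publish to `laneFor(key)`; receivers subscribe to all lanes in parallel, which multiplies the per-entity throughput ceiling by the number of lanes.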

Since these replies, Microsoft has released a ton of new capability.
Azure Auto-Scale can monitor the messages in a queue (or CPU load)
and start or stop instances to maintain that target.
Service Bus introduced partitioned queues (and topics). These let you send messages over multiple queues, but they look like a single queue to your API, dramatically increasing the throughput of a queue.
Before you do that, I'd recommend you try:
Async and batched writes to the queue.
Changing the Prefetch parameter on the reads.
Also look at the OnMessage() receive loop to ensure you get messages the millisecond they are available.
This will improve your perf from ~5 messages/sec to many hundreds or thousands per sec.
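Async and batched writes can be sketched as chunking the outgoing messages and awaiting one send per chunk; `sendBatch` below is a hypothetical stand-in for whatever batch-send call your client library exposes:

```typescript
// Split outgoing messages into batches so each network call carries many messages.
function chunk<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical async batch sender: one awaited round trip per chunk
// instead of one per message.
async function sendAll(
  messages: string[],
  sendBatch: (batch: string[]) => Promise<void>,
  batchSize = 100
): Promise<void> {
  for (const batch of chunk(messages, batchSize)) {
    await sendBatch(batch); // one round trip per <= batchSize messages
  }
}
```

The batch size of 100 is an illustrative default; in practice it is bounded by the broker's maximum message/batch size, so tune it for your payloads.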

Service Bus has its limitations ("Capacity and Quotas"); check out this article for a very good overview of these: https://learn.microsoft.com/en-gb/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted
I suggest you reach out to your local Microsoft specialist if you have a use case that will push the boundaries of Azure Service Bus. Microsoft has dedicated teams in Redmond (and around the world) that can help you design for and push these boundaries at massive scale: the Windows Azure CAT (Customer Advisory Team). Their goal is to solve real-world customer problems, and it sounds like you might have one...
You need to performance and load test to get the answers to all of your questions above for your specific scenario.
The Azure CAT team has a wealth of metrics on capacity and load testing with Service Bus (and Azure in general); these are not always publicly available, so again, reach out if you can...

If it can handle that many requests, you want to make sure that you receive the messages in such a way that you don't hit the maximum size of the topic. You can use multiple instances of a worker role in Azure to listen to specific subscriptions, so you can process messages faster without getting near the maximum size.

Azure Event Hubs limits and its comparison to pure Kafka cluster

Recently Azure released a feature called Azure Event Hubs for Kafka that allows you to use Event Hubs as if it were a Kafka cluster, using the same Kafka libraries. That would allow us to migrate from our current IaaS Kafka solution to a PaaS solution, with all the advantages of a fully managed service, and with only minimal changes to our code base (at least, that's the promise).
However, while analyzing the migration, we are finding it hard to fit our infrastructure inside the Azure Event Hubs limits. We have hundreds of topics in Kafka and we know we will scale to thousands in the future, but that can't easily fit inside Event Hubs.
In Azure, the match for the concept of a topic is the event hub, and then you also have namespaces, which match a Kafka cluster. In fact, each namespace has a different DNS name, making it a completely different system. The limitations are the following: you can have up to 10 event hubs per namespace, and up to 100 namespaces per subscription. That, translated into Kafka jargon, is up to 1,000 topics. Let's suppose that's enough for our purposes; however, I would then need different parts of my application to connect to a different Kafka cluster (namespace) for each 10 topics I have, adding unneeded complexity to the whole story.
It seems like, in the end, I am trading the difficulty of managing the infrastructure of my own cluster for the difficulty of re-architecting my application so that it fits inside that strange 10-topics-per-cluster limit. With Kafka I can have 100 topics in one cluster. With Event Hubs I need 10 clusters of 10 topics each, which adds the complexity of knowing which cluster your consumers and producers need to connect to. That completely changes the architecture of your application (making it much more complex).
I've looked through the Internet for an answer to this with no luck; everyone seems to see a lot of advantages in using Event Hubs, so I am starting to think maybe I am missing something. What would be an efficient way of fitting lots of topics inside that 10-topic limit without changing my architecture a lot?
Azure Event Hubs offers Kafka/Event Hubs for data streaming under two different umbrellas: single tenancy and multi-tenancy. While multi-tenancy gives you the flexibility to reserve small and use small capacity, it is enforced with quotas and limits. These are stringent and cannot be flexed. The reason: analogically, you can imagine multi-tenancy as a huge Kafka cluster whose %CPU and %memory are shared, with strict boundaries, among different tenants. With this infrastructure, to honor multi-tenancy we define boundaries, and those boundaries are enforced by the quotas and limits. Event Hubs is the only PaaS service that charges you for reserving your bandwidth and for the ingress of events. There is no egress charge; we let you ingress x MBps and egress 2x MBps, and the quotas hold us to this boundary.
Our single-tenant clusters can be thought of as mimicking an exact Kafka cluster with no quotas attached. The limits we enforce there are the actual physical limits. The limits of 1,000 topics per namespace and 50 namespaces per capacity unit are soft limits, which can be relaxed, as they just enforce best practices. The cost justification when you compare Standard and Dedicated is not any different; in fact, when you do more than 50 MBps you gain an advantage, as the whole capacity is dedicated to one tenant with Dedicated. Also, a single capacity unit (the unit in which Dedicated clusters are sold) lets you achieve anywhere between 100 MBps and 250 MBps, based on your send/receive pattern, payload size, frequency, and more.
For comparison purposes (although we do not do 0 TUs on Standard, and there is no direct relation/mapping between Dedicated CUs and Standard TUs), below is a pricing example:
50 TUs = $0.03/hr x 50 = $1.50 per hour | 50,000 events per second = 180,000,000 events per hour
180,000,000 / 1,000,000 = 180 units of 1,000,000 messages | 180 x $0.028 = $5.04 | So, a grand total of $6.54 per hour
Note that the above does not include Capture pricing, and for a grand total of $6.85 per hour you get Dedicated with Capture included.
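The arithmetic in that example can be sketched as a small calculator; the per-TU and per-million-events rates are the figures quoted above, not necessarily current pricing:

```typescript
// Standard-tier hourly cost: throughput-unit reservation plus per-million-events ingress.
// Rates are the ones quoted in the example above and may have changed since.
const TU_PER_HOUR_USD = 0.03;
const PER_MILLION_EVENTS_USD = 0.028;

function standardHourlyCost(throughputUnits: number, eventsPerSecond: number): number {
  const reservation = throughputUnits * TU_PER_HOUR_USD; // 50 TUs -> $1.50/hr
  const eventsPerHour = eventsPerSecond * 3600;          // 50,000/s -> 180,000,000/hr
  const ingress = (eventsPerHour / 1_000_000) * PER_MILLION_EVENTS_USD; // -> $5.04/hr
  return reservation + ingress;
}
// standardHourlyCost(50, 50_000) -> 1.50 + 5.04 = 6.54
```

At these rates, Standard at 50,000 events/second works out to $6.54/hour, which is why the answer compares it against the flat Dedicated price point.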
I was looking into the limitation; it seems that the Dedicated tier allows 1,000 event hubs per namespace, although there will be some additional cost due to the Dedicated tier.

Many ordered queues - how to auto-rebalance streams between app instances?

Problem description
I want to deploy a distributed, ordered queues solution for my project, but I have questions/problems:
Which tool/solution should I use? Which would be the easiest to implement/learn and cost me the least in infrastructure? RabbitMQ, Kafka, Redis Streams?
How do I implement auto-rebalancing of topics/streams for each consumer in a failure situation, or when a new topic/stream is added to the system?
In other words, I want to realize something like this:
distributed queues
...but if one of my application instances fails, the other instances should take over all the traffic it leaves behind, with proper distribution (equal load).
Note that my code is written in Node.js v10 (TypeScript) and my infrastructure is based on Azure, so besides self-hosted solutions (like RabbitMQ), Azure-based solutions (like Azure Service Bus) are also possible; but the less vendor lock-in, the better the solution is for me.
My current architecture
Now I provide a more detailed background of my system:
I have 100,000 vehicle tracker devices (different ones, from many manufacturers and with many protocols); each of them communicates with one of my custom apps called a decoder. This small microservice decodes and unifies the payload from the tracker and sends it to a distributed queue. Each tracker sends a message every 10-30 seconds.
Note that I must keep the order of messages from a single device; this is very important!
In the next step, I have a processing app microservice which I want to scale (forking/clustering) depending on the number of tracker devices. Each fork of this app should subscribe to some of the topics/consumer groups to process messages from devices, while keeping order. Processing each message takes about 1-3 seconds.
Note that at any moment I can add or remove tracker devices; this information should auto-propagate to the forks of the processing app, and those instances should be able to auto-rebalance the traffic from the queue.
The question is how to do that with as few lines of (Node.js) code as possible, while keeping the solution easy, clean, and cheap. :)
As you can see in the picture above, if fork no. 3 fails, the system must decide which of the working forks should get the "blue" messages. Also, if fork no. 3 comes back, rebalancing is needed again.
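To make the assignment I have in mind concrete, here is a sketch using rendezvous (highest-random-weight) hashing: each device is owned by exactly one live fork, per-device order is preserved, and when a fork dies or returns, only that fork's share of devices moves:

```typescript
// Rendezvous hashing: each device id is assigned to the live fork with the
// highest hash(device, fork) score. Removing a fork only reassigns that
// fork's devices; all other devices keep their owner, preserving per-device order.
function score(deviceId: string, forkId: string): number {
  let h = 0x811c9dc5; // FNV-1a over "device|fork"
  const s = deviceId + "|" + forkId;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

function ownerOf(deviceId: string, liveForks: string[]): string {
  let best = liveForks[0];
  for (const fork of liveForks) {
    if (score(deviceId, fork) > score(deviceId, best)) best = fork;
  }
  return best;
}
```

Each fork would run `ownerOf` for incoming device ids against the current membership list (kept in, say, Redis or a control topic; both are assumptions here) and only process the devices it owns.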
My own research
I read about Apache Kafka with consumer groups, but Kafka is difficult for me to learn and implement.
I read about RabbitMQ with consumer groups / many topics, but I don't know how to write the auto-rebalancing feature, nor how I should use RabbitMQ (which plugins? which settings/configurations? there are so many options...).
I read about Azure Service Bus with message sessions, but it has vendor lock-in (Azure cloud), it costs a lot, and, like the other solutions, it doesn't provide full auto-rebalancing out of the box.
I read about Redis Streams (with consumer groups), but it's a new feature (with a lack of libraries for Node.js) and it also doesn't provide auto-rebalancing.
1 Message Broker
For the first question, you should look for a mature M2M protocol broker which will give you freedom in designing your own intelligent data-switching algorithms.
2 Load Balancer
For the second question, you must employ a well-performing load balancer to handle such a huge number (100,000) of connected cars. My suggestion is to use Azure API Gateway or an Nginx load balancer.
Now let's look at some connected-car solutions and analyze how AWS IoT or Azure IoT does the job nicely.
OpenSource IoT Solution
Nginx or API Gateway is used for load-balancing purposes, while event processing is done on Kafka. Using Kafka you can implement your own rule engine for intelligent data switching. Similarly, any message broker acting as an IoT bridge would do. If I were you, I would use VerneMQ to implement MQTT v5 features and data routing; in this case a queue is not required.
Again, if you want to use an Azure queue, you have to concentrate on managing queue forking and preemption. To control the queue seamlessly, you have to write an Azure Queue Trigger serverless Function; thus your goal of not being vendor-locked would be impossible to achieve.
In a word, VerneMQ (an MQTT v5 implementation) with Nginx would be great to implement, but as these are all open-source products, you must be strong in implementation and troubleshooting; otherwise your business operation will suffer from support failures.
It is better to use professional IoT cloud services for a solution with thousands of connected cars. This pays off, as the SLA of the service is of a very high standard and little effort is needed in system operation management.
Azure IoT Solution
If you are using an Azure solution, you would be using IoT Hub, where you don't have to worry about load balancing. Using the Azure device SDK you can connect all the cars (with mobile LTE SIMs, OBD plugins, etc.) to the cloud. Then an Azure Function can handle the event processing, and so on.
AWS IoT Solution
Unlike the Azure IoT device SDK, AWS IoT has its own SDK for devices. But in this architecture we chose to complete the connected-car project a little differently. For the sake of thing-shadow and actual-device-status synchronization, we used the AWS Greengrass Core solution on the edge side. Along with serverless IoT event processing, we assembled the whole connected-car solution.
Similarly, Azure IoT Edge could be used to provide all car information to the device twin and synchronize between the actual car and its twin.
Hope this gives you a clear idea of how to implement it, and of the cost-benefit trade-off between the vendor-locked and unlocked situations.
Thank you.

Availability of Azure Media Services Job notifications in Storage Queue

I want to get job change notifications from Azure Media Services using a Storage queue. I read the statement "It is also possible that some state change notifications get skipped" here.
Can someone tell me how reliable this service is? Is it a very small percentage of all messages that gets skipped, or is it quite noticeable?
We do not provide a guarantee on delivery, but I would say that it is highly reliable.
There are cases where notifications can get dropped (a storage outage so that Media Services cannot deliver notifications to a queue for an extended period of time for example). I would recommend that you have a fallback plan that if a job hasn’t been updated by notification for some expected duration, that you have backup code to GET the job details from the API directly.
The “expected duration” is a little bit difficult to predict since job processing times vary by operation and input content size/complexity. There also can be queuing depending on the number of jobs submitted and the number of reserved units you have.
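The fallback suggested above can be sketched as a watchdog that tracks when each job was last updated by a notification and flags jobs to poll via the API directly; the silence threshold is your own guess at the "expected duration":

```typescript
// Track last-notification times per job; any job silent for longer than
// maxSilenceMs should be polled via the API directly as a fallback.
class JobWatchdog {
  private lastUpdate = new Map<string, number>();

  // Call whenever a queue notification for a job arrives (now = epoch millis).
  recordNotification(jobId: string, now: number): void {
    this.lastUpdate.set(jobId, now);
  }

  // Jobs we have not heard about within the expected duration.
  staleJobs(now: number, maxSilenceMs: number): string[] {
    const stale: string[] = [];
    for (const [jobId, t] of this.lastUpdate) {
      if (now - t > maxSilenceMs) stale.push(jobId);
    }
    return stale;
  }
}
```

A periodic timer would call `staleJobs` and, for each result, GET the job's current state from the Media Services API instead of waiting for a notification that may have been dropped.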
Hope that helps,
John

Azure Service Bus - Multiple Topics vs Filtered Topic

I have written an implementation of Azure Service Bus into our application using Topics, which are subscribed to by a number of applications. One of the discussions in our team is whether to stick with a single topic and filter via the properties of the message, or alternatively create a topic for each particular need.
Our scenario is that we wish to filter by a priority and an environment variable (the test and UAT environments share a connection).
So do we have Topics (something like):
TestHigh
TestMedium
TestLow
UatHigh
UatMedium
UatLow
OR just a single topic with these values set as two properties?
My preference is that we create separate topics, as we'd be utilising the functionality available, and I would imagine that under high load this would scale better. I've read that peeking large queues can be inefficient. It also seems cleaner to subscribe to a single topic.
Any advice would be appreciated.
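To make the two options concrete, here is a sketch; the topic names follow the list above, while the property names `Environment`/`Priority` and the SQL-style subscription filter string are illustrative assumptions:

```typescript
// Option 1: one topic per environment/priority combination.
function topicFor(env: "Test" | "Uat", priority: "High" | "Medium" | "Low"): string {
  return env + priority; // e.g. "TestHigh"
}

// Option 2: a single topic; each message carries two application properties,
// and each subscription is created with a SQL filter on those properties.
function filterFor(env: string, priority: string): string {
  return `Environment = '${env}' AND Priority = '${priority}'`;
}
```

With option 1 the routing decision happens at send time (pick a topic); with option 2 it happens at the broker via subscription filters, so the sender stays identical across environments.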
I would go with separate topics for each environment. It's cleaner, and message counts in topics can be monitored separately for each environment. It's also marginally more scalable (e.g. topic size limits won't be shared), though the limits are generous and won't matter much in testing.
But my main argument: that's how production will (hopefully) go. That is, production will have its own connection (and namespace) in ASB, and will have separate topics. Thus you would not be filtering messages via properties in production, so why do it differently in testing?
Last tip: to make topic provisioning easier, I'd recommend having your app auto-create the topics on startup. It's easy to do - check if they exist, and create them if they don't.
Either approach works. More topics and subscriptions mean that you have more entities to manage at deployment time. If High/Medium/Low reflect priorities, then multiple topics may be a better choice, since you can pull from the highest-priority subscription first.
From a scalability perspective there really isn't much of a difference that you would notice, since Service Bus already spreads the load across multiple logs internally; whether you use six topics or two topics will not make a material difference.
What does impact performance predictability is the choice of service class. If you choose "Standard", throughput and latency are best-effort over a shared multi-tenant infrastructure, and other tenants on the same cluster may impact your throughput. If you choose "Premium", you get ring-fenced resources that give you predictable performance, and your two or six topics are processed out of that resource pool.

Are there disadvantages to using a large number of entities in Azure Service Bus

In other words, if I create a messaging layout which uses a rather large number of messaging entities (several thousand) instead of a smaller number, is there something in Azure Service Bus that is irritated by that and performs less than ideally, or generates significantly different costs? Let us assume that the number of messages remains roughly the same in both scenarios.
To be clear, I am not asking whether a messaging layout with many entities is sound from the application's point of view, but rather whether there is something in Azure that performs badly in such situations. If there are advantages to it (perhaps Azure can scale it more easily), that would also be interesting.
I am aware of the 10,000-entity limit in a single Service Bus namespace.
I think it is more a matter of the programming and architecture of the solution. For example, we saw problems with ACS (the authentication mechanism): Service Bus sometimes started to throttle the client when there were many requests. Take a look at the guidance on Service Bus high availability; there are some issues listed that should be considered when you have a lot of load.
And you always have other options that may be more suitable for high-load scenarios, for example Azure Event Hubs, a more lightweight queue mechanism intended as the service for extremely high numbers of messages.
