Azure Service Bus - is it a good solution for peer-to-peer messaging platform?


We are designing a system where users can exchange "messages" (let's say XML files, for simplicity's sake). This system is peer-to-peer by design, meaning only directed messages are supported: User A can only send a message to User B, and it is not possible to send messages to "groups" of users. FIFO order is a mandatory requirement as well.
This must be a reliable solution, so we started looking into Azure and its services. Service Bus looks like the right solution to me. It offers all the bells and whistles we are looking for:
FIFO order is guaranteed
Dead-letter queue with timeouts
Geo-redundancy
Transactions
and so on
So naturally, I started playing with it. The first idea I had was to give each user of my system a QUEUE from the service bus. It will act as an INBOX for them. Other users send messages to the user (say, using the unique USER_ID as the queue ID), the messages accumulate in the queue, and when the user decides to check the inbox, they get all the messages in the correct order. This way we "outsource" all routing, security, and so on to the service bus itself, thus considerably simplifying the app logic.
But there is a serious caveat in this approach: Service Bus supports only up to 10,000 queues (https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted#capacity-and-quotas), and the number of users in my system can reach tens of thousands (maxing out at 100,000 or so). So the limit covers my starting point but not the upper end. Therefore, I have questions:
Is there a flaw in my approach? Overall, is it a good idea to give each user an exclusive queue? Or perhaps I should implement some kind of metadata and route messages based on it?
Am I looking at the right solution? I want to use SaaS as much as possible, so I don't want to start building RabbitMQ clusters on VMs, but are there built-in alternatives? Maybe a different approach should be considered?
As for the numbers, I'm looking to start with 2,000 users and 200,000 messages a day - not a high load by any means. But if things work out, I see how these numbers can increase by 20x - 30x (but no more).
I would appreciate any opinions on this. Thank you.
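
One way to picture the metadata-routing alternative raised in the question is a single shared entity where every message carries the recipient's ID and each recipient's messages are kept in arrival order. The sketch below is an in-memory model only, not Service Bus code; `SharedInbox`, `send`, and `receiveAll` are hypothetical names chosen for illustration.

```javascript
// In-memory model of one shared queue partitioned by recipient ID.
// SharedInbox and its methods are hypothetical, not part of any Azure SDK.
class SharedInbox {
  constructor() {
    // recipientId -> array of messages in arrival (FIFO) order
    this.partitions = new Map();
  }

  // Store a message under the recipient's partition.
  send(fromUserId, toUserId, body) {
    if (!this.partitions.has(toUserId)) {
      this.partitions.set(toUserId, []);
    }
    this.partitions.get(toUserId).push({ from: fromUserId, to: toUserId, body });
  }

  // Drain and return everything waiting for one recipient, oldest first.
  receiveAll(userId) {
    const messages = this.partitions.get(userId) || [];
    this.partitions.delete(userId);
    return messages;
  }
}

const inbox = new SharedInbox();
inbox.send('userA', 'userB', '<doc>1</doc>');
inbox.send('userC', 'userB', '<doc>2</doc>');
inbox.send('userA', 'userC', '<doc>3</doc>');

const forB = inbox.receiveAll('userB');
console.log(forB.map((m) => m.body));
```

In this shape the number of users no longer maps to the number of entities, so a per-namespace queue limit stops being the constraint. Whether a session-enabled queue is the right way to get the same partitioning in Service Bus would still need to be checked against its quotas.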

Related

Is Azure Service Bus slower with SessionId than without, or is the speed the same?

I'm using Azure Service Bus to send messages, and I was wondering whether using SessionId affects the speed of sending messages compared to not using it.
I know that SessionId preserves the order, but what about the overall speed?
Thanks

Sending a message will not be much slower when you specify a session ID. Processing will be, but "slower" is the wrong terminology to use. You can't compare messages handled without a session by multiple concurrent consumers to sessioned messages, where the intent is to process the messages in the order they were sent. These are different business requirements with different justifications, right? If you plan to use sessions, processing will be somewhat slower because only a single active consumer can process the messages from a given session. And that should probably be backed by a requirement.
Take, for example, handling items scanned at a grocery checkout. If you want to know what items are purchased in general, competing consumers is the way to go. However, if you want to know what items were bought per purchase, you can't use competing consumers and have to use sessions to ensure that only items for a given purchase are included and nothing else. Will the latter be somewhat slower? Yes, but you can't accomplish it with competing consumers, and if the business wants it, they'll accept the cost of slightly slower processing to gain the insights. Note that there are always multiple ways to solve a problem, and maybe sessions are not what's needed at all.
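
The difference between the two questions in the grocery example can be sketched in plain JavaScript. This is an in-memory illustration, not Service Bus code; `purchaseId` stands in for a session ID, and the function names are made up for the example.

```javascript
// Scanned items arriving from checkout lanes; purchaseId plays the role of a session ID.
const scans = [
  { purchaseId: 'p1', item: 'milk' },
  { purchaseId: 'p2', item: 'bread' },
  { purchaseId: 'p1', item: 'eggs' },
  { purchaseId: 'p2', item: 'milk' },
];

// Competing-consumer view: any worker can take any message, so only totals make sense.
function countItems(messages) {
  const totals = {};
  for (const { item } of messages) {
    totals[item] = (totals[item] || 0) + 1;
  }
  return totals;
}

// Session view: messages are grouped by purchaseId and kept in arrival order.
function groupByPurchase(messages) {
  const purchases = {};
  for (const { purchaseId, item } of messages) {
    (purchases[purchaseId] = purchases[purchaseId] || []).push(item);
  }
  return purchases;
}

const totals = countItems(scans);
const purchases = groupByPurchase(scans);
console.log(totals, purchases);
```

The totals can be computed by any number of workers in any order, while the per-purchase grouping needs all messages for one `purchaseId` to reach the same place, which is the constraint sessions impose.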

Architecture issue - Azure Service Bus and message order guarantee

OK, so I'm relatively new to Service Bus. I'm working on a project where we use Azure Service Bus for queueing messages. Our architecture roughly looks like the following:
The idea is that all kinds of things happen in our SourceSystem, which leads to messages being put on Service Bus topics. Our responsibility is syncing these events to the external client so they are aware of what we are doing.
The issue is that we currently don't use Service Bus sessions, so message order isn't guaranteed. Consider the following scenario:
OrderCreated
OrderUpdate 1
OrderUpdate 2
OrderClosed
What happens now is that if the external client's API is down for, say, OrderUpdate 1 and OrderUpdate 2, we could potentially send the messages in this order: OrderCreated, OrderClosed, OrderUpdate 1, OrderUpdate 2.
Currently we just retry a message a few times and then it moves into the deadletter queue for manual reprocessing.
What steps should we take to better guarantee message order? I feel like in the scope of an order, message order needs to be guaranteed.
Should we force the source system to put all messages for an order in a Service Bus session? But how can we handle this with multiple topics? And what do we do if message 1 from a session ends up in the dead-letter queue?
There are a lot of considerations here. Should we use a single topic so it's easier to manage the sessions? But doesn't that open up other problems, with different message structures being in a single topic?
I'd love to hear your opinions on this.

Have a look at Durable Functions in Azure. You can use the 'Async HTTP API' or one of the other patterns to achieve the orchestration you need.
NServiceBus Sagas might also be a good option; here is an article that does a very good comparison between NServiceBus and Durable Functions.

If the external client has to receive all those events and order matters, sending those messages to multiple topics where a topic is per message type will make your mission extremely hard to accomplish. For ordered messaging first you need to use a single entity (queue or topic) with Sessions enabled. That way you can guarantee ordered message processing. In case you have multiple external clients, you'd need to have a session-enabled entity (topic) per external client.
Another option is to implement a pattern known as Process Manager. The process manager would be responsible to make the decisions about the incoming messages and conclude when the work for a given order is completed or not.
There are also libraries (MassTransit, NServiceBus, etc) that can help you. NServiceBus implements Process Manager via a feature called Saga (tutorial) and MassTransit has it as well (documentation).
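
To make the Process Manager idea concrete, here is a minimal sketch for the four-event order scenario from the question. It assumes every event carries a sequence number assigned by the source system, which the original thread does not state; `OrderProcess` and its fields are hypothetical and unrelated to the NServiceBus or MassTransit APIs.

```javascript
// Hypothetical process manager for one order. It assumes every event carries a
// sequence number assigned by the source system (an assumption, not in the thread).
class OrderProcess {
  constructor() {
    this.nextSeq = 1;         // sequence number expected next
    this.pending = new Map(); // seq -> event that arrived too early
    this.delivered = [];      // event types forwarded so far, in order
    this.completed = false;
  }

  // Accept an event in any order; forward it only once all predecessors are forwarded.
  handle(event) {
    this.pending.set(event.seq, event);
    while (this.pending.has(this.nextSeq)) {
      const next = this.pending.get(this.nextSeq);
      this.pending.delete(this.nextSeq);
      this.delivered.push(next.type);
      if (next.type === 'OrderClosed') {
        this.completed = true;
      }
      this.nextSeq += 1;
    }
  }
}

const order = new OrderProcess();
order.handle({ seq: 1, type: 'OrderCreated' });
order.handle({ seq: 4, type: 'OrderClosed' });   // arrives early, held back
order.handle({ seq: 2, type: 'OrderUpdate 1' });
order.handle({ seq: 3, type: 'OrderUpdate 2' }); // releases 3, then 4
console.log(order.delivered, order.completed);
```

A real saga would also persist this state and decide what to do when an expected event never arrives; both are left out of the sketch.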

Azure service bus: is it wise to create a separate topic for every event you broadcast?

I am trying to design the strategy that my organization will employ to create topics, and which messages will go to which one. I am looking at either creating a separate topic for each event, or a single topic to hold messages from all events, and then to triage with filters. I am convinced that using a separate topic for every event is better because:
Filters will be less complex and thus more performant, since each event is already separated in its own topic.
There will be less chance of message congestion in any given topic.
Messages are less likely to be needlessly copied into any given subscription.
More topics means more messaging stores, which means better message retrieval and sending.
From a risk management perspective, it seems like having more topics is better. If I only used a single topic, an outage would affect all subscribers for all messages. If I use many topics, then perhaps outages would only affect some topics and leave the others operational.
I get 12 more shared access keys per topic. It's easier to have more granular control over which topics are exposed to which client apps since I can add/revoke access by add/revoking the shared access key for each app on a per-topic basis.
Any thoughts would be appreciated

Like Sean already mentioned, there is really no one answer but here are some details about topics that could help you.
Topics are designed for a large number of recipients: they send messages to multiple (up to 2,000) subscriptions, and it is the subscriptions that hold the filters
Topics don't really store messages but subscriptions do
For outages, unless you have topics across regions, I'm not sure if it would help as such
The limit is for shared access authorization rules per policy. You should be using one of these to generate a SAS key for your clients.
Also, chaining service bus with autoforwarding is something you could consider as required.
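
The relationship between a topic, its subscriptions, and their filters can be pictured with a small in-memory model. This is a sketch under simplifying assumptions, not Service Bus SDK code; `Topic`, `subscribe`, `publish`, and `messagesFor` are illustrative names.

```javascript
// In-memory model: a topic fans messages out; each subscription owns a filter and a store.
// Topic, subscribe, and publish are illustrative names, not Service Bus SDK calls.
class Topic {
  constructor() {
    this.subscriptions = new Map(); // name -> { filter, messages }
  }

  subscribe(name, filter) {
    this.subscriptions.set(name, { filter, messages: [] });
  }

  // The topic keeps nothing itself; it copies the message into each matching subscription.
  publish(message) {
    for (const sub of this.subscriptions.values()) {
      if (sub.filter(message)) {
        sub.messages.push({ ...message });
      }
    }
  }

  messagesFor(name) {
    return this.subscriptions.get(name).messages;
  }
}

const events = new Topic();
events.subscribe('billing', (m) => m.type === 'OrderClosed');
events.subscribe('audit', () => true);

events.publish({ type: 'OrderCreated', id: 1 });
events.publish({ type: 'OrderClosed', id: 1 });

console.log(events.messagesFor('billing').length, events.messagesFor('audit').length);
```

Seen this way, a single topic with filtered subscriptions and many single-purpose topics differ mainly in where the selection happens, not in where the messages are stored.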

Number of channels and billing

I am looking at building an app that monitors the public transport buses for a major city:
I did a quick prototype using PubNub. The buses have a phone transmitting GPS signals to a channel, and bus users have phones subscribed to channels. I have questions:
I am planning to have one channel for each bus route. The city has 50 routes, so there will be 50 channels. Does this adhere to best practice?
Is there an API to list channels?
I am sending a message to a channel every second. Assume there are 50 routes with 5 buses each, running 24 hours. There will be 216000000 daily messages. What will I be charged for a day?
Does your Android client open a network connection every time publish is called? I want to minimize the bandwidth used by the phone that is transmitting the GPS signal.
Bus users may want to see the location of multiple buses. I know best practice is to subscribe to one public and one private channel. What is the best way to do it?
I would appreciate if you could answer the above questions.

Full disclosure up front - I work for PubNub Customer Success, so responses to pricing-related questions are informational in nature only and not to be construed as promotional. The asker specifically mentions PubNub, and the information provided below is publicly available from the PubNub website.
Anant, also as an FYI StackOverflow would normally ask that each of these questions gets asked as a separate thread. Moving forward please do your best to adhere to community guidelines.
1 Every implementation will be different as far as the specific architecture and design pattern strategy though your proposed approach seems to be a sensible utilization of channel methodology. PubNub does not limit the total number of channels in use, however as a practical limitation for most mobile development frameworks subscribing to more than 50 channels simultaneously would be around the upper limit. Adding more than that and both iOS and Android will begin exhibiting performance limitations. If new bus lines are added the subscriptions can be managed to only subscribe to nearby routes, etc.
Question 1 the second with the indent. Yes that can be done with the here_now API
2 PubNub charges $1 per million messages (without SSL enabled) so based on your hypothetical your message charges would be $216 per day. That being said, there is significant room here for design pattern optimization so that buses only publish a new location whenever there is a change - repeated publishes while the bus is standing still are unnecessary. This optimization on its own will bring the message usage figure down significantly, and there are other strategies which can be utilized to further optimize depending on your specific implementation approach. If you anticipate needing more than 1 billion messages per month, a deployment to Global Cloud would make sense so as to avail yourself of volume discount pricing not otherwise available on Go Cloud.
3 Rather than opening a new connection with every publish, PubNub keeps an active socket connection open until unsubscribed or disconnected via loss of network connection/app force close. The bandwidth utilization to keep this connection active over a period of several hours and absent any other publish/subscribe activity typically measures less than 1K depending on your configuration parameters. Android supports background threading so even when the app is not in focus the connection can remain open to facilitate data push alerts which can be used to prompt the user to bring the app back into the foreground to review any updated information.
4 This question is not clear. Assuming that the bus locations are published to the public channel, what purpose would the private channel serve? If you meant a private channel to receive alerts for the arrival of the user's selected bus, then yes, that would be an appropriate implementation strategy. Please clarify if you meant something different.
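
The message volume and the publish-on-change optimization discussed in point 2 can be worked through in a few lines. The inputs are the ones stated in the question (50 routes, 5 buses per route, one publish per second) and the price is the one quoted in the answer; `shouldPublish` and the `{ lat, lon }` shape are hypothetical.

```javascript
// Figures taken from the question: 50 routes, 5 buses per route, one publish per second.
const routes = 50;
const busesPerRoute = 5;
const secondsPerDay = 24 * 60 * 60;
const messagesPerDay = routes * busesPerRoute * secondsPerDay;

// Price quoted in the answer: $1 per million messages (without SSL).
const dollarsPerMillion = 1;
const costPerDay = (messagesPerDay / 1e6) * dollarsPerMillion;

// Publish-on-change: skip a GPS fix identical to the last one sent.
// shouldPublish and the { lat, lon } shape are hypothetical.
function shouldPublish(last, current) {
  return !last || last.lat !== current.lat || last.lon !== current.lon;
}

const fixes = [
  { lat: 1.0, lon: 2.0 },
  { lat: 1.0, lon: 2.0 }, // bus standing still
  { lat: 1.1, lon: 2.0 },
];
let last = null;
let published = 0;
for (const fix of fixes) {
  if (shouldPublish(last, fix)) {
    published += 1;
    last = fix;
  }
}

console.log(messagesPerDay, costPerDay, published);
```

Read as 250 buses each publishing once per second, these inputs give 21,600,000 messages per day, a tenth of the 216,000,000 quoted in the thread, so the daily charge at the stated price would scale down accordingly. Real GPS fixes jitter, so a practical change check would likely compare against a distance threshold rather than exact equality.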

SocketIO scaling architecture and large rooms requirements

We are using socketIO on a large chat application.
At some points we want to dispatch "presence" (user availability) to all other users.
io.in('room1').emit('availability:update', {userid: 'xxx', isAvailable: false});
room1 may contain a lot of users (500 max). We observe a significant rise in our NodeJS load when many availability updates are triggered.
The idea was to use something similar to the Redis store with Socket.IO, and have web browser clients connect to different NodeJS servers.
When we want to emit to a room, we dispatch the "emit to room1" payload to all other NodeJS processes using Redis PubSub, ZeroMQ, or even RabbitMQ for persistence. Each process will then call its own io.in('room1').emit to target its subset of connected users.
One of the concern with this setup is that the inter-process communication may become quite busy and I was wondering if it may become a problem in the future.
Here is the architecture I have in mind.

Could you batch changes and only distribute them every 5 seconds or so? In other words, on each node server, simply take a 'snapshot' every X seconds of the current state of all users (e.g. 'connected', 'idle', etc.) and then send that to the other relevant servers in your cluster.
Each server then does the same, every 5 seconds or so it sends the same message - of only the changes in user state - as one batch object array to all connected clients.
Right now, I'm rather surprised you are attempting to send information about each user as a packet. Batching seems like it would solve your problem quite well, as it would also make better use of standard packet sizes that are normally transmitted via routers and switches.
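
A minimal sketch of the batching idea, assuming changes are keyed by user so that only the latest state per user is sent. `PresenceBatcher` is a hypothetical helper, and the `emit` callback stands in for whatever actually sends to the room or to the other servers.

```javascript
// Collects presence changes and emits them as one array per flush.
// PresenceBatcher is a hypothetical helper; `emit` stands in for io.in(room).emit.
class PresenceBatcher {
  constructor(emit) {
    this.emit = emit;
    this.changes = new Map(); // userid -> latest isAvailable value
  }

  // Record a change; a later change for the same user overwrites the earlier one.
  update(userid, isAvailable) {
    this.changes.set(userid, isAvailable);
  }

  // Send all accumulated changes as a single message, then reset.
  flush() {
    if (this.changes.size === 0) return;
    const batch = Array.from(this.changes, ([userid, isAvailable]) => ({ userid, isAvailable }));
    this.changes.clear();
    this.emit('availability:update', batch);
  }
}

const sent = [];
const batcher = new PresenceBatcher((event, payload) => sent.push({ event, payload }));
batcher.update('u1', false);
batcher.update('u2', true);
batcher.update('u1', true); // overwrites the earlier change for u1
batcher.flush();
batcher.flush(); // nothing pending, so nothing is sent

console.log(sent.length, sent[0].payload);
// In a server this would be driven by a timer, e.g. setInterval(() => batcher.flush(), 5000);
```

Three updates collapse into one emitted message here, and a user who toggles several times within the window costs a single entry in the batch.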

You are looking for this library:
https://github.com/automattic/socket.io-redis
Which can be used with this emitter:
https://github.com/Automattic/socket.io-emitter

As for the available-users function, I think there are two alternatives: you can create a "users queue" that holds the "public data" of connected users, or you can use exchange binding information to show which users are connected. If you use a "users queue", it would be shared by every "room", and you could update it when a user leaves by "popping" their state message from the queue (although you would have to "reorganize" all the queue's messages to do so).
Nevertheless, I think RabbitMQ is designed for asynchronous communication, and keeping a register of user presence is not a very good fit for it. I think it is better suited to applications where you don't know when the user will receive the message or their "real availability" ("fire and forget" architectures). ZeroMQ requires more work from scratch, but you could implement something more specific to your situation with better performance.
A publish/subscribe example from the RabbitMQ site could be a good starting point for a design like yours, where a message is sent to several users at the same time. In summary, I would create two queues per user (one for receiving and one for sending messages) and use a specific exchange for each chat room, tracking which users are in each room through the exchange's binding information. You always have two queues per user, and you create exchanges to bind them to one or more chat rooms.
I hope this answer is useful to you. Sorry for my bad English.

This is the common approach for sharing data across several Socket.io processes. You have done well, so far, with a single process and a single thread. I could lamely assume that you could pick any of the mentioned technologies for communicating shared data without hitting any performance issues.
If all you need is IPC, you could perhaps have a look at Faye. If, however, you need to have some data persisted, you could start a Redis cluster with as many Redis masters as you have CPUs, though this will add minor networking noise for Pub/Sub.
