What is the cost of having multiple subscriptions in Pulsar?

What is the cost of creating multiple subscriptions on a topic? Does creating more subscriptions affect the broker, Bookkeeper or both?
I'm assuming it only adds workload to the broker, and that the extra work per subscription is minimal since the broker would only have to duplicate the messages. Is that right?

In Pulsar, a topic (or a partition) is owned by a single broker, meaning that all reads and writes go through that broker. Brokers cache entries from the bookies in memory so that they can dispatch messages directly to all consumers. This avoids a network round-trip and a possible disk read on the bookies.
In addition, you should note that a broker must send data over the network for each subscription. This can lead to network saturation if you have a very high throughput.
Moreover, bookies also have a write/read cache to reduce disk access.
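
To put a rough number on the network cost mentioned above, here is a minimal back-of-the-envelope sketch. It is not a Pulsar API; the function name and the workload figures are hypothetical, and it assumes every subscription receives every message exactly once with no batching, compression or redelivery.

```python
def broker_egress_bytes_per_sec(publish_rate_msgs, avg_msg_bytes, subscriptions):
    """Rough outbound traffic from the owning broker to consumers.

    Assumes each subscription receives each message once and ignores
    protocol overhead, batching, compression and redeliveries.
    """
    if min(publish_rate_msgs, avg_msg_bytes, subscriptions) < 0:
        raise ValueError("inputs must be non-negative")
    return publish_rate_msgs * avg_msg_bytes * subscriptions


# Hypothetical workload: 50,000 msg/s of 1 KiB messages.
for subs in (1, 2, 5):
    egress = broker_egress_bytes_per_sec(50_000, 1024, subs)
    print(f"{subs} subscription(s): {egress / 1e6:.1f} MB/s outbound")
```

Under these assumptions, outbound traffic grows linearly with the number of subscriptions, while the write path to the bookies stays the same.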

Related

Azure Service Bus ordered processing of message

Suppose an Azure Service Bus topic has a single subscription with some filter. Microservice A has created a SubscriptionClient for that subscription with a concurrency of 1 for reading messages. There are 2 replicas of service A, and 3 messages are inserted into an unpartitioned Service Bus topic at times t1, t2 and t3.
t1 < t2 < t3
Is it possible for Service Bus to deliver the t2 message to Replica-2 before the t1 message is delivered to Replica-1?
If not, what is the scaling strategy for Service Bus topics when processing subscriptions and adding replicas of the consuming microservice?
Note: By comparison, Kafka ensures that a message for one partition is delivered to only one replica and to the one thread listening to that partition, so ordered processing is guaranteed. I'm not sure about Azure Service Bus topics: if multiple replicas listen to the same subscription with different SubscriptionClients, can they receive or process messages out of order?

If you want to enable ordered message processing with Azure Service Bus, you have to use Sessions.
You can use a message's SessionId as an equivalent of the partitionId you may use in Kafka. This way, you can still scale your consumers, limited by the number of distinct SessionId values at any given time.
Message sessions. Implement workflows that require message ordering or message deferral.
Sessions provide concurrent de-multiplexing of interleaved message streams while preserving and guaranteeing ordered delivery.
When multiple concurrent receivers pull from the queue, the messages belonging to a particular session are dispatched to the specific receiver that currently holds the lock for that session. With that operation, an interleaved message stream in one queue or subscription is cleanly de-multiplexed to different receivers and those receivers can also live on different client machines, since the lock management happens service-side, inside Service Bus.
Service Bus Partitions spread out the load across multiple nodes, and don't provide any ordering guarantees.
Partitioning means that the overall throughput of a partitioned entity is no longer limited by the performance of a single message broker or messaging store. In addition, a temporary outage of a messaging store does not render a partitioned queue or topic unavailable.
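
The session behaviour described above can be illustrated with a small simulation. This is a toy model, not the Service Bus SDK: `dispatch_by_session`, the receiver names and the session IDs are all hypothetical, and a session lock here is held forever, whereas real session locks are released and can be re-acquired.

```python
from collections import defaultdict


def dispatch_by_session(messages, receivers):
    """Assign each session to one receiver and deliver in arrival order.

    messages  -- iterable of (session_id, payload) in enqueue order
    receivers -- list of receiver names (hypothetical)
    The first receiver to claim a session (round-robin here) keeps it,
    so the order of messages within a session is preserved.
    """
    lock_owner = {}                # session_id -> receiver name
    delivered = defaultdict(list)  # receiver name -> [(session_id, payload)]
    next_receiver = 0
    for session_id, payload in messages:
        if session_id not in lock_owner:
            lock_owner[session_id] = receivers[next_receiver % len(receivers)]
            next_receiver += 1
        delivered[lock_owner[session_id]].append((session_id, payload))
    return dict(delivered)


# Two interleaved streams in one subscription.
stream = [("order-A", "t1"), ("order-B", "t1"), ("order-A", "t2"),
          ("order-B", "t2"), ("order-A", "t3")]
result = dispatch_by_session(stream, ["replica-1", "replica-2"])
for receiver, msgs in result.items():
    print(receiver, msgs)
```

Each replica sees only whole sessions, so t1, t2 and t3 of the same session can never be split across replicas. Parallelism is bounded by the number of distinct sessions.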

Capture messages sent to Azure Service Bus Topics with no subscriptions or filtered out?

I want to create a Service Bus Topic with a couple of subscriptions using filters for different message types. However I need to guarantee that all messages sent to the Topic will be received and successfully processed by at least one subscription, even if all of the subscribing processes go offline.
Is there a better way than auto-forwarding to queues for each filter, and a way to capture messages ignored by all filtering subscribers without capturing all messages?
Edit: my motivation is to provide a queue-like mechanism with prioritisation without creating a queue for each message type/priority level, or at least manage the complexity of multiple queues on the listening side. A queue generally guarantees a consumer. Rather than have the publisher have to push to different queues I would like to use a topic and use filters to manage priority.
Based on my current knowledge of the SB I suspect that I just need to make sure the subscriptions are in place for a topic including an inverse catch-all filter subscription before exposing the topic for use. I don't know whether subscriptions are completely reliable.

However I need to guarantee that all messages sent to the Topic will be received and successfully processed by at least one subscription, even if all of the subscribing processes go offline.
There's a problem in that statement. Topics and subscriptions are there to implement pub/sub and decouple publishers from subscribers. The broker itself does not guarantee there will be subscribers.
While topics support EnableFilteringMessagesBeforePublishing (TopicDescription.EnableFilteringMessagesBeforePublishing) it is not recommended for production use.
Update
Based on the updated question, the general answer remains the same. Topics/subscriptions are for pub/sub and decoupling. If you want to ensure that no message is lost before a subscriber comes online, you will need to ensure that the subscription is created first.
I don't know whether subscriptions are completely reliable.
Yes, subscriptions are reliable. Behind the scenes, a subscription is a queue.
In case you want to route your messages to different processors based on message type, publishing that message to a topic and having forwarding subscriptions is a good approach. You do need to be mindful of the quotas (how many subscriptions per topic you can create), but those are fairly high. And if you get to that point, it's possible to reduce the number of subscriptions, when a given processor handles multiple message types, by using more complex SQL filtering rules.
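
The "inverse catch-all" idea from the question can be sketched as plain predicate logic. This is only an illustration of the routing rule, not Service Bus filter syntax; the `priority` property, the subscription names and the helper functions are hypothetical.

```python
def route(message, subscriptions):
    """Return the names of subscriptions whose filter accepts the message."""
    return [name for name, accepts in subscriptions.items() if accepts(message)]


def build_subscriptions(filters):
    """Add an 'unmatched' subscription that is the inverse of all other filters."""
    subs = dict(filters)
    # Catch-all: accepts a message only when no named filter does.
    subs["unmatched"] = lambda m: not any(f(m) for f in filters.values())
    return subs


# Hypothetical filters keyed on a 'priority' property.
filters = {
    "high": lambda m: m.get("priority") == "high",
    "low": lambda m: m.get("priority") == "low",
}
subscriptions = build_subscriptions(filters)

for msg in ({"priority": "high"}, {"priority": "low"}, {"priority": "medium"}, {}):
    print(msg, "->", route(msg, subscriptions))
```

With the catch-all in place before the topic is exposed, every message lands in at least one subscription, and the "unmatched" one only collects what the others ignore.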

Unsure which azure queue should I use

Currently, I have a problem handling data that I send from my application to an Azure Queue. The data must be processed in FIFO order, but Azure Queue cannot guarantee ordering, whereas Azure Service Bus Queue is guaranteed to be FIFO.
I am not sure what other differences there are between Azure Service Bus Queue and Azure Queue.

As a solution architect/developer, you should consider using Storage queues when:
Your application must store over 80 GB of messages in a queue, where the messages have a lifetime shorter than 7 days.
Your application wants to track progress for processing a message inside of the queue. This is useful if the worker processing a message crashes. A subsequent worker can then use that information to continue from where the prior worker left off.
You require server side logs of all of the transactions executed against your queues.
As a solution architect/developer, you should consider using Service Bus queues when:
Your solution must be able to receive messages without having to poll the queue. With Service Bus, this can be achieved through the use of the long-polling receive operation using the TCP-based protocols that Service Bus supports.
Your solution requires the queue to provide a guaranteed first-in-first-out (FIFO) ordered delivery.
You want a symmetric experience in Azure and on Windows Server (private cloud). For more information, see Service Bus for Windows Server.
Your solution must be able to support automatic duplicate detection.
You want your application to process messages as parallel long-running streams (messages are associated with a stream using the SessionId property on the message). In this model, each node in the consuming application competes for streams, as opposed to messages. When a stream is given to a consuming node, the node can examine the state of the application stream state using transactions.
Your solution requires transactional behavior and atomicity when sending or receiving multiple messages from a queue.
The time-to-live (TTL) characteristic of the application-specific workload can exceed the 7-day period.
Your application handles messages that can exceed 64 KB but will not likely approach the 256 KB limit.
You deal with a requirement to provide a role-based access model to the queues, and different rights/permissions for senders and receivers.
Your queue size will not grow larger than 80 GB.
You want to use the AMQP 1.0 standards-based messaging protocol. For more information about AMQP, see Service Bus AMQP Overview.
You can envision an eventual migration from queue-based point-to-point communication to a message exchange pattern that enables seamless integration of additional receivers (subscribers), each of which receives independent copies of either some or all messages sent to the queue. The latter refers to the publish/subscribe capability natively provided by Service Bus.
Your messaging solution must be able to support the "At-Most-Once" delivery guarantee without the need for you to build the additional infrastructure components.
You would like to be able to publish and consume batches of messages.
https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-azure-and-service-bus-queues-compared-contrasted

Messages in Storage queues are typically first-in-first-out, but sometimes they can be out of order; for example, when a message's visibility timeout duration expires (for example, as a result of a client application crashing during processing). When the visibility timeout expires, the message becomes visible again on the queue for another worker to dequeue it. At that point, the newly visible message might be placed in the queue (to be dequeued again) after a message that was originally enqueued after it.
You will find this article helpful in making the decision for your case: Storage Queues and Service Bus Queue comparison. It compares some of the fundamental queuing capabilities provided by Storage queues and Service Bus queues.
Also read Get started with Service Bus Queues.
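
The visibility-timeout reordering described above can be reproduced with a toy queue. This is a simplified model, not the Azure Storage SDK; the class, its methods and the 30-second timeout are assumptions for illustration.

```python
class VisibilityQueue:
    """Toy queue with a visibility timeout (not the Azure SDK)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._items = []  # each item: [payload, visible_at]

    def enqueue(self, payload):
        self._items.append([payload, 0])

    def dequeue(self, now):
        """Return the first visible payload and hide it, or None."""
        for item in self._items:
            if item[1] <= now:
                item[1] = now + self.visibility_timeout
                return item[0]
        return None

    def delete(self, payload):
        self._items = [i for i in self._items if i[0] != payload]


q = VisibilityQueue(visibility_timeout=30)
for m in ("m1", "m2", "m3"):
    q.enqueue(m)

completed = []
q.dequeue(now=0)          # worker A takes m1, then crashes without deleting it
for t in (1, 2, 31):      # worker B keeps polling
    msg = q.dequeue(now=t)
    if msg is not None:
        completed.append(msg)
        q.delete(msg)
print(completed)
```

In this run m1 is completed last even though it was enqueued first, because it only becomes visible again after the timeout expires.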

Does Microsoft's Service Bus replicate message for every subscription in a topic?

Does the Azure Service Bus and its on-premise version, Service Bus for Windows Server, replicate a message for every subscriber?
For example, let's say that there is a single topic with five subscribers, then is that message stored in the service bus' database five times - once for each subscriber - or is that message only stored once with business logic to determine which subscribers have read the message?
It would be nice if there is an official site and/or documentation to provide as a reference.

The behavior of Azure Service Bus seems to be that it keeps a copy per subscriber. I tested this by creating a topic with two subscriptions. I sent in a single message and saw that the size of the topic in bytes was 464 (using topic.SizeInBytes). When I received the message from one subscription, the size dropped by half to 232. I tested it with three subscriptions and the same behavior occurred: 696 bytes.
Even if they aren't keeping a copy of the message per subscription they are counting the size of the message times the number of subscriptions against the maximum size of the topic, which may be what you were trying to determine.
I agree it would be nice if they documented the behavior, especially for Service Bus for Windows Server since that could affect planning for the amount of storage you need to set aside. As for the Azure Service Bus side, I'm not sure the implementation behind the scenes matters as much as knowing how it factors towards the max size of the topic.
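
The arithmetic behind those measurements is simple enough to write down. This sketch only models the observed accounting (232 bytes per message per subscription); it says nothing about how Service Bus stores messages internally, and the function name is hypothetical.

```python
def topic_size_bytes(stored_message_bytes, pending_per_subscription):
    """Size charged to the topic if each subscription counts its own copy.

    pending_per_subscription -- number of unreceived messages per subscription
    Assumes all messages have the same stored size.
    """
    return stored_message_bytes * sum(pending_per_subscription)


PER_COPY = 232  # bytes per message per subscription, from the measurement above

print(topic_size_bytes(PER_COPY, [1, 1]))     # two subscriptions, one message each
print(topic_size_bytes(PER_COPY, [0, 1]))     # after one subscription receives it
print(topic_size_bytes(PER_COPY, [1, 1, 1]))  # three subscriptions
```

The three calls correspond to the 464, 232 and 696 byte readings reported in the test above.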

A subscription to a topic resembles a virtual queue that receives copies of the messages that were sent to the topic. You can optionally register filter rules for a topic on a per-subscription basis, which allows you to filter/restrict which messages to a topic are received by which topic subscriptions.
I think it copies messages. If it did not copy, it would always have to check whether all subscribers had received the message. Additionally, if there is a filter, it would have to check only the matching subscribers before deleting the message. I think copying and applying a simple consume implementation costs less than not copying.
Article

Can Azure Service Bus Sub/Topic implementation work for this approach?

I have potentially tens or even hundreds of thousands of clients who need to communicate with a central server.
Communication is in the form of:
receive command from central servers (process it on the client)
respond with a status to central servers
I would like to avoid having the client machines talk to any intermediate web/API servers; instead, I want them to go directly to ASB.
No client can see another client's messages. Whatsoever. I understand I can use SAS tokens to provide temporary privileges to clients and renew them on a scheduled basis, and that's great and works within my architecture. However, I'm not sure if I can utilize the same ASB topic and have each client have their own topic inside?
Is ASB even the right technology for this? Can I somehow maintain only two queues/service-bus subscriptions for this (request/reply) or must I create an individual queue for each individual client?
TIA

It’s difficult to tell without knowing more about the nature of the messages you are sending – e.g. how many are being sent. However, with this many clients you may be coming up against the quotas which are shown here:
https://msdn.microsoft.com/en-us/library/azure/ee732538.aspx
The salient limitations are:
100 concurrent connections per entity (i.e. topic, queue or subscription)
2,000 subscriptions per topic
10,000 queues or topics per service bus namespace
100,000 correlation filters per topic
It’s worth taking a look at the Azure scalability scenarios described in the second half of this document:
https://msdn.microsoft.com/en-us/library/azure/hh528527.aspx
It may be possible to get the broadcast side of things going by getting clients to connect with correlation filters though I have not tried using them on this scale.
If you want to have lots of senders going to a single queue then you should consider using the Service Bus REST API for message sending.
Otherwise, I'm afraid you may want to consider a proxy...
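
As a rough sizing aid, the quotas listed above can be turned into a quick check. This is a sketch under stated assumptions: the constants are the figures quoted in this answer (they may have changed since), every client is assumed to get its own subscription, and the helper names are hypothetical.

```python
import math

# Quotas as quoted in the answer above; they may have changed since.
MAX_SUBSCRIPTIONS_PER_TOPIC = 2_000
MAX_TOPICS_PER_NAMESPACE = 10_000


def topics_needed(client_count, subs_per_topic=MAX_SUBSCRIPTIONS_PER_TOPIC):
    """Topics required if every client gets its own subscription."""
    return math.ceil(client_count / subs_per_topic)


def fits_in_one_namespace(client_count):
    """True if the required topics stay within the per-namespace quota."""
    return topics_needed(client_count) <= MAX_TOPICS_PER_NAMESPACE


for clients in (10_000, 100_000, 500_000):
    verdict = "fits" if fits_in_one_namespace(clients) else "exceeds namespace quota"
    print(clients, "clients ->", topics_needed(clients), "topics,", verdict)
```

This only covers the subscription and topic counts. The connection and filter limits, and the cost of broadcasting each command to many topics, would need to be checked separately.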
