We have an application which creates lots of topics.
When there are no more clients listening (cluster-wide) to a specific topic, that topic should be destroyed. But how to find out, in a clustered environment, that there are no more clients listening to that topic? We don't want to destroy the topic if there are other clients listening on that topic on other nodes in the cluster.
Regards
Fredrik
You most probably have to register / unregister them using a distributed MultiMap where the key is the name of the topic and the values are the listeners.
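For illustration, here is a minimal sketch of that idea, assuming Hazelcast (which the distributed MultiMap suggests) and its Python client. The map name and the destroy-on-empty check are assumptions, and a production version would need to guard against the race between the count check and a concurrent registration:

```python
import hazelcast

client = hazelcast.HazelcastClient()
# Distributed MultiMap: topic name -> ids of listeners, visible cluster-wide.
listeners = client.get_multi_map("topic-listeners").blocking()

def register(topic_name, listener_id):
    listeners.put(topic_name, listener_id)

def unregister(topic_name, listener_id):
    listeners.remove(topic_name, listener_id)
    # No listeners left on any node: safe to destroy the topic.
    # (Racy as written: a listener registering concurrently could be lost.)
    if listeners.value_count(topic_name) == 0:
        client.get_topic(topic_name).destroy()
```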
Problem
I am facing what I assume is a common problem: I have a publisher which publishes on topics that are both high- and low-volume (i.e., topics where dropping is acceptable when the subscriber or network is slow, because messages are high-frequency, and topics where updates are infrequent and I never want to drop messages).
Possible Solutions
Unbounded queue on Publisher side
This might work, but the high-volume topics may be so fast that they flood memory. It is desirable for high-volume topics to be dropped and low-volume topics to be protected.
One PUB socket per high-volume topic, for all high-volume topics, or for every topic
Based on my understanding of ZeroMQ's queueing model (one queue per connection on both the publisher and subscriber sides), the only way to protect my low-volume topics from being dropped and pushed out by the high-volume topics is to create a separate PUB socket for each high-volume topic (or one for all of them) and somehow communicate that to subscribers, who will then need to connect to multiple endpoints on the same node.
This complicates things on the subscriber side, as they now need prior knowledge of mappings between ports and topics or they need to be able to request an index from the publisher node. This then requires that the port numbers are fixed or that every time a subscriber has a connection issue, it checks the index to see if the port changed.
Publisher- and Subscriber-Side Topic Queues Per Connection
The only solution I can see at the moment is to create a queue per subscriber and topic on both the publisher side and the subscriber side, hidden away inside a library so neither side needs to think about it. When messages for a specific topic overflow, they can still be dropped without pushing out messages for other topics. A separate ordered dictionary would be needed to maintain the queue pull/get order, used by a worker feeding messages into the PUB socket or pulling events out on the subscriber side.
This solution only works if I know when the ZeroMQ queue on the publisher side is in a mute state and will drop messages, so I know to hold off "publishing" the next message, which would probably be lost. This can be detected with the ZMQ_XPUB_NODROP socket option (http://api.zeromq.org/master:zmq-setsockopt).
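A minimal sketch of that scheme in Python with pyzmq; the port, topic names, and queue limits are illustrative, and a real version would also need to drain subscription events from the XPUB socket:

```python
import collections
import zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.XPUB)
pub.setsockopt(zmq.XPUB_NODROP, 1)  # send() raises EAGAIN in mute state instead of silently dropping
pub.setsockopt(zmq.SNDHWM, 1000)
pub.bind("tcp://*:5556")

# One bounded deque per topic: a flooded topic drops its own oldest
# messages instead of pushing out other topics' messages.
queues = collections.OrderedDict()

def enqueue(topic, payload, maxlen=100):
    queues.setdefault(topic, collections.deque(maxlen=maxlen)).append(payload)

def pump():
    # Worker: try to flush queued messages topic by topic, in insertion order.
    for topic, q in queues.items():
        while q:
            try:
                pub.send_multipart([topic.encode(), q[0]], flags=zmq.DONTWAIT)
                q.popleft()
            except zmq.Again:
                break  # mute state: keep the message queued, retry later
```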
This approach will probably work, but it is non-trivial, probably slower than it could be because of the language I use (Python), and the kind of thing I would have expected a messaging library to handle for me.
I know you can configure a topic subscription as a shared subscription to allow multiple consumers on the same topic. Can the same be done for multiple producers?
For some reason, when I try to, I get: Producer with name '<topic_name>' is already connected to topic
Yes, you can have multiple producers on a topic. You just have to make sure each producer has a unique name. From the ProducerBuilder.producerName section of the Java client API docs:
When specifying a name, it is up to the user to ensure that, for a
given topic, the producer name is unique across all Pulsar's clusters.
Brokers will enforce that only a single producer with a given name can be
publishing on a topic.
The easiest way to ensure the producer name is unique is to let Pulsar set it automatically for you. From the same section:
If not assigned, the system will generate a globally unique name which
can be accessed with Producer.getProducerName().
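For example, with the Python client (the service URL and topic are illustrative), omitting the producer name lets Pulsar generate a unique one per producer, so several producers can attach to the same topic:

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")
# No producer_name given: Pulsar generates a globally unique name for each.
p1 = client.create_producer("persistent://public/default/my-topic")
p2 = client.create_producer("persistent://public/default/my-topic")
print(p1.producer_name(), p2.producer_name())  # two distinct generated names
p1.send(b"from producer 1")
p2.send(b"from producer 2")
client.close()
```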
I'm investigating a tech stack for our cluster. Pulsar looks good, but its usage looks more like a queueing system. Of course, a queueing system is good to have, but I have a specific requirement: broadcasting.
We would like to use one machine to generate the data and publish it to a Pulsar topic. Then we use a group of servers, forming a set of replicas. Each server consumes the message flow on that topic and serves clients via WebSocket.
This is different from a Shared subscription, because each server needs to receive all the messages, not a fraction of them.
I came across this post: https://kafkaesque.io/subscriptions-multiple-groups-of-consumers-on-pulsar-topic/ , which explains how to do such a job: each server creates a new exclusive subscription (say, using a UUID as its subscription name), and from that unique exclusive subscription it gets the full message flow of the topic.
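A minimal sketch of that approach with the Python client (service URL and topic are illustrative):

```python
import uuid
import pulsar

client = pulsar.Client("pulsar://localhost:6650")
# Each server gets its own exclusive subscription, so it sees every message.
consumer = client.subscribe(
    "persistent://public/default/updates",
    subscription_name="server-" + str(uuid.uuid4()),
    consumer_type=pulsar.ConsumerType.Exclusive,
)
while True:
    msg = consumer.receive()
    # ... push msg.data() to WebSocket clients ...
    consumer.acknowledge(msg)
```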
But since our set of server replicas is dynamic, whenever some of the servers restart they will create new UUID subscriptions, which leaves many orphaned subscriptions on the topic and eventually becomes a maintenance headache.
Does anyone have experience setting up a broadcast use case using Pulsar?
Actually, I found that the "Reader Interface" is exactly for this kind of use case:
https://pulsar.apache.org/docs/en/concepts-clients/#reader-interface
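A minimal sketch with the Python client (URL and topic are illustrative): a reader attaches at a chosen position, receives the full message flow, and creates no durable subscription to orphan on restart:

```python
import pulsar

client = pulsar.Client("pulsar://localhost:6650")
# Start from the latest message; every server replica can do the same and
# each one sees the full flow, with no subscription state left behind.
reader = client.create_reader("persistent://public/default/updates",
                              pulsar.MessageId.latest)
while True:
    msg = reader.read_next()
    # ... fan out msg.data() to WebSocket clients ...
```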
Using an exclusive subscription for each consumer is the only way to ensure that each of your consumers receives ALL of the messages on the topic, and Pulsar handles multiple subscriptions quite well.
The issue, it seems, is the server restart use case, and I don't think that simply connecting with a new UUID subscription is the right approach (putting aside the orphaned subscriptions). You really want the server to reuse its previous subscription after it restarts. Each subscription keeps track of the last message in the topic that it has processed and acknowledged, so if you reconnect with the same subscription UUID you pick up exactly where you left off before the server crashed. If you connect with a new UUID, you will start processing messages produced from that point in time forward, and all messages produced during the restart period will be "lost".
Therefore, you will need a mechanism to share these UUIDs across server failures and hand them back to the restarting server. One approach is something similar to ZooKeeper leader election, in which each server is granted an exclusive lease that expires periodically. The server must periodically refresh the lease to retain it; if the server crashes, it fails to refresh the lease on its UUID, and the restarting server is granted the lease when it attempts to reconnect.
See https://curator.apache.org/curator-recipes/leader-election.html for a better explanation of the pattern.
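If you stay with durable subscriptions, here is a minimal sketch of the lease idea using ZooKeeper ephemeral locks via the kazoo library (a Python stand-in for the Curator recipe linked above); the pool size, paths, and names are all assumptions:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

# Fixed pool of subscription names shared by all server replicas.
SUBSCRIPTION_POOL = ["broadcast-sub-%d" % i for i in range(10)]

def claim_subscription():
    for name in SUBSCRIPTION_POOL:
        lock = zk.Lock("/subscription-leases/" + name)
        if lock.acquire(blocking=False):  # held via an ephemeral znode
            return name, lock  # keep the lock for the process lifetime
    raise RuntimeError("no free subscription name in the pool")

# If this server crashes, its ZooKeeper session expires, the ephemeral
# lock is released, and the restarted server reclaims the same name
# (and thus resumes from the subscription's saved position).
sub_name, lease = claim_subscription()
```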
We have an Azure Service Fabric microservice which listens to multiple Azure Service Bus topics (Topic A, Topic B).
Topic A has more than 10 times the message traffic of Topic B, and to handle the scalability of the service we will create multiple instances of the service.
1. My first question is: since Topic B has less traffic, most service instances will never get a message from Topic B. Will that be a waste of resources?
2. Is it better to create separate microservices for the Topic A and Topic B listeners, and run 10x instances of the Topic A listener service and x instances of the Topic B listener service?
3. Does a message listener in Azure Service Bus keep polling for messages all the time, i.e., continuously checking whether a message is there or not?
Thanks, guys, for your support.
If one service receives messages from 2 topics, there's little waste of resources. Listening for messages is not a very resource-intensive process.
This depends on your application requirements.
This depends on whether you are using SBMP / SOAP (default) or AMQP as the communication protocol. AMQP is connection based. SBMP does (long) polling.
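To illustrate with the Python SDK (which speaks AMQP): the receiver below holds an open connection and simply waits, rather than busy-polling. The topic, subscription, and connection string are placeholders:

```python
from azure.servicebus import ServiceBusClient

conn_str = "<service-bus-connection-string>"
with ServiceBusClient.from_connection_string(conn_str) as sb:
    # An open AMQP link: the broker pushes messages; no busy polling loop.
    with sb.get_subscription_receiver("topic-b", "my-subscription") as receiver:
        for msg in receiver:  # blocks idly until a message arrives
            print(str(msg))
            receiver.complete_message(msg)
```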
Microservices advocate the idea of loosely coupled services, where each microservice handles its own domain.
Following the microservices approach, if you found that you had to create two different topics to publish your messages, it is probably because they have different scopes/domains, each needing its own microservice.
From your description it is hard to tell whether the domains of Topic A and Topic B are related, so we cannot offer a definitive suggestion.
In any case, suppose one service listens to both topics, with Topic A handling 1000 messages per second and Topic B handling 100.
If you have to publish a new version of your application to handle changes to Topic B's messages, you would also have to stop the handling of Topic A, which was not necessary. So you are coupling services that should either be two independent services to begin with, or both topics should be handled as a single one.
Regarding your questions:
1. My first question is: since Topic B has less traffic, most service instances will never get a message from Topic B. Will that be a waste of resources?
Whether resources are wasted depends on how you design your application. It might be wasteful if your service both listens to the queue/topic and handles the messages at the same time, and uses too much memory to keep running all the time. In that scenario it would make sense to split them into a queue/topic listener and a separate message handler that receives the messages to process; if the handler goes too long without processing messages, you shut it down, leaving just the listener. You could also use actors instead of a service.
2. Is it better to create separate microservices for the Topic A and Topic B listeners, and run 10x instances of the Topic A listener service and x instances of the Topic B listener service?
Yes for the services. Regarding the number of instances, it should be driven by the size of the queue; otherwise you would have too many listeners and would also be wasting resources. If you follow the approach of splitting the services, you would need one listener receiving the messages from the queue/topic and delivering them to multiple message handlers (service instances/actors), with the queue/topic listener controlling the number of instances running at the same time.
3. Does a message listener in Azure Service Bus keep polling for messages all the time, i.e., continuously checking whether a message is there or not?
It is not the only approach, but it is correct.
My requirement is to load balance 2 MQTT nodes running on different VMs and then have consumers on these MQTT brokers on both nodes. The job of the consumers will be to subscribe to one topic and, after receiving the data, publish it to Kafka. The problem I see is that since both MQTT consumers are subscribed to the same topic, they will receive the same message and both will insert it into Kafka, thereby creating duplicates. Is there any way to avoid writing duplicates into Kafka?
I have tried the Mosquitto and Mosca brokers, but they do not support clustering. So subscribed clients were not getting messages if they were subscribed to a different node than the node where the message was published. Both nodes are behind HAProxy.
I am currently using the emqtt broker, which supports clustering and solves the load balancing issue, but it seems it does not support shared subscriptions across cluster nodes.
A feature like Kafka consumer groups is what is required, I believe. Any ideas?
Have you tried HiveMQ?
It offers so-called shared subscriptions.
If shared subscriptions are used, all clients which share the same subscription will receive messages in an alternating fashion.
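A minimal sketch with paho-mqtt (1.x API), assuming the $share/<group>/<topic> shared-subscription syntax (standardized in MQTT 5 and supported by HiveMQ and recent EMQ X) and illustrative broker/topic names: both bridge instances subscribe with the same group name, so each message is delivered to only one of them and reaches Kafka once:

```python
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    # Only one member of the "kafka-bridge" group receives each message,
    # so forwarding it to Kafka here cannot produce duplicates.
    print(msg.topic, msg.payload)

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.subscribe("$share/kafka-bridge/sensors/#", qos=1)
client.loop_forever()
```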