MQTT to Kafka. How to avoid duplicates - node.js

My requirement is to load balance 2 MQTT nodes running on different VMs and to have consumers attached to the MQTT brokers on both nodes. The job of the consumers is to subscribe to one topic and, after receiving the data, publish it to Kafka. The problem I see is that since both MQTT consumers are subscribed to the same topic, they will receive the same message and both will insert it into Kafka, thereby creating duplicates. Is there any way to avoid writing duplicates into Kafka?
I have tried the Mosquitto and Mosca brokers, but they do not support clustering, so subscribed clients were not getting messages if they were subscribed to a different node than the one where the message was published. Both nodes are behind HAProxy.
I am currently using the emqtt broker, which supports clustering and so solves the load balancing issue, but it seems it does not support shared subscriptions across cluster nodes.
I believe a feature like the Kafka consumer group is what is required. Any ideas?

Have you tried HiveMQ?
It offers so-called shared subscriptions.
If shared subscriptions are used, all clients which share the same subscription will receive messages in an alternating fashion.
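To make the alternating delivery concrete, here is a minimal in-memory sketch of the idea. It is not HiveMQ or MQTT client code: the class `SharedSubscriptionSketch`, its methods, and the client names are hypothetical, and a real broker may distribute messages differently. It only illustrates why each message reaches a single bridge and is therefore written to Kafka once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a shared subscription; not a broker API.
public class SharedSubscriptionSketch {
    private final List<String> clients = new ArrayList<>();
    private final Map<String, List<String>> delivered = new LinkedHashMap<>();
    private int next = 0;

    // Adds a client to the group sharing the subscription.
    public void subscribe(String clientId) {
        clients.add(clientId);
        delivered.put(clientId, new ArrayList<>());
    }

    // Delivers the message to exactly one client, rotating through the group.
    public String publish(String message) {
        if (clients.isEmpty()) {
            throw new IllegalStateException("no subscribers");
        }
        String target = clients.get(next);
        next = (next + 1) % clients.size();
        delivered.get(target).add(message);
        return target;
    }

    // Messages received so far by the given client (null if it never subscribed).
    public List<String> deliveredTo(String clientId) {
        return delivered.get(clientId);
    }

    public static void main(String[] args) {
        SharedSubscriptionSketch group = new SharedSubscriptionSketch();
        group.subscribe("bridge-a");
        group.subscribe("bridge-b");
        for (int i = 1; i <= 4; i++) {
            group.publish("m" + i);
        }
        System.out.println("bridge-a: " + group.deliveredTo("bridge-a"));
        System.out.println("bridge-b: " + group.deliveredTo("bridge-b"));
    }
}
```

With two bridges in the group, each one sees every second message, so no message is forwarded to Kafka twice.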

Related

How multiple Kafka Consumers in the same consumer group read messages from one partition in the topic?

I would like to know how consumers in the same consumer group read messages from a topic that has only one partition.
For example, I have 3 consumers in one consumer group, and that group is polling messages from Topic A, which has a single partition A. If 1000 messages arrive one by one in Topic A, how will they be delivered to the 3 consumers?
Will 3 messages be delivered to the 3 consumers in parallel, with the next message delivered once each one has been processed? In other words, will they receive messages in parallel?
Or will only one consumer fetch the messages, since there is only one partition?
Please also suggest the best architectural approach for the above scenario.
Thanks,
I want to process multiple messages in parallel, from one topic that has one partition, across 4 consumers.
I am using the kafka structure with NodeJS microservices with kafkajs package.
In your scenario, only one consumer of that consumer group will read the data, most probably the first one you started. I'm not 100% sure as I never tried it out, but I assume the additional consumers will just idle without workload.
This question is essentially the same as yours.
If you want consumers to work in parallel, you cannot avoid having multiple partitions; that is the main purpose of the whole partitioning concept.
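The rule behind this is that, within one consumer group, a partition is assigned to at most one consumer at a time. The sketch below illustrates that rule with a simple modulo assignment. It is not Kafka's actual assignor and does not use kafkajs or any Kafka client API; the class `PartitionAssignmentSketch`, the method `assign`, and the consumer names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "one partition, one consumer per group".
public class PartitionAssignmentSketch {

    // Each partition goes to exactly one consumer; consumers beyond the
    // partition count end up with an empty list and stay idle.
    public static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> group = List.of("consumer-1", "consumer-2", "consumer-3");
        System.out.println("1 partition:  " + assign(group, 1));
        System.out.println("3 partitions: " + assign(group, 3));
    }
}
```

With one partition, only one consumer receives an assignment and the other two get nothing. With three partitions, each consumer owns one partition and the group can process three messages at a time.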

Consumer is receiving only 50 percent of the messages published to the topic

We're noticing that exactly 50 percent of the messages produced to my Pulsar topic are reaching my app. Everything was working fine yesterday where our Pulsar consumer app was getting 100% of messages that were produced to the topic. We haven't made any setting changes in our app. What is happening with the missing messages? Where are they going?
Pulsar isn't losing your messages.
It looks like you're using a shared subscription and have more than one consumer connected. That other consumer is receiving your other messages, since the topic dispatches them round-robin when a shared subscription is used. This behavior can occur by design if your consumers are auto-scaling on a shared subscription.
If you check topic stats ($ pulsar-admin topics stats options, documented here), in the response, in "subscriptions", look for your subscription by its name. In that object, you can see the "type", which will be marked as "shared," and you will see a list of "consumers". I'd expect that you have more than one consumer in that list.

Reading messages in bulk through a Pulsar consumer

I am using node pulsar client to consume messages from a Pulsar topic. The consumer is subscribed to the topic using a shared subscription mode. Currently, each call to receive gets a single message from the topic. Is there a way to receive messages in bulk?
The fact that you get messages one by one doesn't mean that the Pulsar client doesn't use batching and other optimization techniques in the background. The official documentation for the Pulsar Java consumer describes the receiverQueueSize parameter, which controls how many messages the consumer accumulates. By default, the Pulsar consumer uses reasonable values for its parameters, and it should perform quite well for most applications. Do you experience any kind of issues or slow performance?
Update
Since version 2.4.1 of Apache Pulsar, it is possible to receive messages in batches using the consumer. First, the consumer should be created with the BatchReceivePolicy config (change the values to ones more appropriate for your use case):
Consumer<GenericRecord> consumer = pulsarClient
        .newConsumer(Schema.AUTO_CONSUME())
        .batchReceivePolicy(BatchReceivePolicy.builder()
                .maxNumMessages(5000)
                .maxNumBytes(10 * 1024 * 1024)
                .timeout(1, TimeUnit.SECONDS).build())
        // .. other configuration such as topic and subscription
        .subscribe();
Second, use the batchReceive method to get a batch of messages:
Messages<GenericRecord> messages = consumer.batchReceive();
When all messages are processed, simply acknowledge all of them:
consumer.acknowledge(messages);

Cleaning up topics

We have an application which creates lots of topics.
When there are no more clients listening (cluster-wide) to a specific topic, that topic should be destroyed. But how do we find out, in a clustered environment, that there are no more clients listening to that topic? We don't want to destroy the topic if there are other clients listening to it on other nodes in the cluster.
Regards
Fredrik
You most probably have to register / unregister them using a distributed MultiMap where the key is the name of the topic and the values are the listeners.
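A rough sketch of that bookkeeping is shown below. It uses a local `ConcurrentHashMap` as a stand-in for the distributed MultiMap, so it only demonstrates the register / unregister logic on a single JVM; in a cluster the map would have to be replaced by a structure shared across nodes. The class `TopicListenerRegistry`, its methods, and the listener id format are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical local stand-in for a distributed MultiMap: topic name -> listener ids.
public class TopicListenerRegistry {
    private final Map<String, Set<String>> listenersByTopic = new ConcurrentHashMap<>();

    // Records that a listener (for example "node-1:client-42") is attached to the topic.
    public void register(String topic, String listenerId) {
        listenersByTopic.compute(topic, (t, ids) -> {
            Set<String> updated = (ids == null) ? new HashSet<>() : ids;
            updated.add(listenerId);
            return updated;
        });
    }

    // Removes the listener and returns true when this removal left the topic
    // without any listeners, i.e. when the topic can be destroyed.
    public boolean unregister(String topic, String listenerId) {
        boolean[] becameEmpty = {false};
        listenersByTopic.computeIfPresent(topic, (t, ids) -> {
            ids.remove(listenerId);
            if (ids.isEmpty()) {
                becameEmpty[0] = true;
                return null; // drops the topic entry from the map
            }
            return ids;
        });
        return becameEmpty[0];
    }

    // True while at least one listener is registered for the topic.
    public boolean hasListeners(String topic) {
        return listenersByTopic.containsKey(topic);
    }
}
```

A node would call `register` when a client starts listening and `unregister` when it stops, and destroy the topic only when `unregister` returns true.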

Redis pubsub and scaling?

We use pub/sub for communication between our various services. If you create multiple workers, they all receive the same data and all process it. What methods are there to make sure that a given message is handled by only one of the workers?
PS Perhaps there are some intermediate layers for this, so-called message brokers.
