Design suggestion for RabbitMQ Producer & Consumer [closed] - node.js

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I have a RabbitMQ design question. Let's say I have 6 exchanges and 10 queues, split up as follows:
5 exchanges of type 'fanout', with 5 queues bound to them
1 exchange of type 'topic' that routes to 5 different queues based on the routing key
I have a microservice application that runs in Kubernetes with 25 replicas, and each process acquires 1 RabbitMQ connection. So 25 RabbitMQ connections act as producers.
I have another application that also runs in Kubernetes with 1 replica, and it acquires 1 RabbitMQ connection. So 1 RabbitMQ connection acts as a consumer.
Numbers: let's say every exchange receives 100k messages per day.
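For a sense of scale, the stated numbers work out roughly as below. This is a back-of-the-envelope calculation that assumes traffic is spread evenly over the day and across all 25 producer replicas; real traffic is usually burstier.

```javascript
// Average publish rates implied by the numbers in the question.
// Assumption: messages are spread evenly over 24 hours and across all producer replicas.
const EXCHANGES = 6;
const MESSAGES_PER_EXCHANGE_PER_DAY = 100_000;
const PRODUCER_REPLICAS = 25;
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

const publishesPerDay = EXCHANGES * MESSAGES_PER_EXCHANGE_PER_DAY; // 600,000
const publishesPerSecond = publishesPerDay / SECONDS_PER_DAY; // about 6.94
const publishesPerSecondPerReplica = publishesPerSecond / PRODUCER_REPLICAS; // about 0.28

console.log({ publishesPerDay, publishesPerSecond, publishesPerSecondPerReplica });
```

These are publishes, not deliveries: a fanout exchange copies each message to every queue bound to it, so the number of messages landing in queues can be higher.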
Tech stack: Node.js + amqplib
Questions:
How many channels should the producer create for publishing messages to the exchanges?
How many channels should the consumer create for consuming messages from the queues?
Is it a good approach to have one application act as a consumer that consumes messages from all the queues?
How can I scale the consumers automatically based on queue size in Kubernetes?
Is it possible to build priority based on consumers? Let's say that under heavy load, I would like the consumers to stop consuming from a couple of queues and focus all resources on the remaining queues.
How many connections should the producer and consumer create?
Thanks in advance :)

Semantically, there will be publishing and consuming components in your system. Each should use its own channel, primarily because error reporting and handling are channel-scoped.
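A minimal sketch of that split, assuming the amqplib promise API: one connection per process, a confirm channel for publishing and a separate channel for consuming. The function takes an already-open connection (e.g. from `amqp.connect(url)`) rather than assuming a broker URL, and the prefetch default of 10 is an arbitrary placeholder.

```javascript
// Sketch: one AMQP connection per process, with separate channels for publishing and
// consuming. `connection` is assumed to be an open amqplib connection, e.g. the result
// of `await amqp.connect(url)`; the prefetch default of 10 is an arbitrary placeholder.
async function openChannels(connection, { prefetch = 10 } = {}) {
  // Confirm channel: the broker acks each publish, so publish failures reach the producer.
  const publishChannel = await connection.createConfirmChannel();
  // Plain channel for consumers; a channel-level error here does not close the publisher.
  const consumeChannel = await connection.createChannel();

  // Limit unacknowledged deliveries per consumer on this channel.
  await consumeChannel.prefetch(prefetch);

  // Errors are channel-scoped, so each role gets its own handler.
  publishChannel.on("error", (err) => console.error("publish channel error:", err.message));
  consumeChannel.on("error", (err) => console.error("consume channel error:", err.message));

  return { publishChannel, consumeChannel };
}
```

Whether one consuming channel serves all queues or each queue gets its own is a judgment call: a channel error closes every consumer registered on that channel.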
Whether a single application should consume from "all" queues depends entirely on how you structure your services.
The same goes for controlling which consumers consume from which queues. Usually queues and consumers have semantic "types" and serve specific purposes.
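For the "stop consuming from some queues under heavy load" part of the question, one possible approach with amqplib is to cancel the consumers on lower-priority queues (`channel.cancel(consumerTag)`) and re-subscribe later (`channel.consume`). The `PrioritizedConsumer` class and the `lowPriority` flag below are hypothetical, and deciding when load counts as "heavy" is left to the caller.

```javascript
// Hypothetical helper: pause low-priority queues under heavy load and resume them later.
// `channel` is assumed to be an amqplib channel: consume() resolves to { consumerTag },
// and cancel(consumerTag) stops deliveries to that consumer.
class PrioritizedConsumer {
  constructor(channel, queues, onMessage) {
    this.channel = channel;
    this.queues = queues; // e.g. [{ name: "orders", lowPriority: false }, ...]
    this.onMessage = onMessage;
    this.consumerTags = new Map(); // queue name -> consumer tag while subscribed
  }

  async subscribe(queueName) {
    if (this.consumerTags.has(queueName)) return; // already consuming
    const { consumerTag } = await this.channel.consume(queueName, (msg) => {
      if (msg !== null) this.onMessage(queueName, msg); // null means the broker cancelled us
    });
    this.consumerTags.set(queueName, consumerTag);
  }

  async start() {
    for (const q of this.queues) await this.subscribe(q.name);
  }

  // Stop consuming from low-priority queues; their messages stay in RabbitMQ.
  async shedLowPriority() {
    for (const q of this.queues) {
      const tag = this.consumerTags.get(q.name);
      if (q.lowPriority && tag !== undefined) {
        await this.channel.cancel(tag);
        this.consumerTags.delete(q.name);
      }
    }
  }

  // Resume consuming from every queue once load has dropped.
  async restoreAll() {
    await this.start();
  }

  activeQueues() {
    return [...this.consumerTags.keys()];
  }
}
```

Cancelling a consumer does not affect messages already delivered to it; those still need to be acked or nacked, so a bounded prefetch keeps that number small.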
Simply adding more consumers and increasing prefetch only works up to a point: a single queue has a practical throughput limit.
Scaling application instances based on queue length (specifically, the number of messages in the Ready state) involves monitoring individual queue metrics. That only works with a small number of queues (with, e.g., 100K queues, collecting all metrics from all of them becomes really expensive).
A small application that monitors the metrics of an individual queue (or the totals) and updates the number of replicas of a deployment through the Kubernetes API should do.
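Such a small application could be built around decision logic like the sketch below. The sizing rule (one replica per `messagesPerReplica` ready messages, clamped to a min/max) is a hypothetical policy. The `messages_ready` field is what the RabbitMQ management HTTP API reports for each queue, and the patch body targets the Deployment's `scale` subresource in the Kubernetes API; the HTTP calls and authentication are left out.

```javascript
// Decision logic for a small autoscaler process (HTTP calls omitted).
// Hypothetical policy: one consumer replica per `messagesPerReplica` ready messages,
// clamped to [min, max].
function desiredReplicas(messagesReady, { messagesPerReplica, min = 1, max = 10 }) {
  const wanted = Math.ceil(messagesReady / messagesPerReplica);
  return Math.min(max, Math.max(min, wanted));
}

// The management API returns queue objects (GET /api/queues/{vhost}/{name}) with a
// `messages_ready` field; sum it across the queues this consumer deployment serves.
function totalReady(queueInfos) {
  return queueInfos.reduce((sum, q) => sum + (q.messages_ready ?? 0), 0);
}

// Merge-patch body for the Deployment's scale subresource:
// PATCH /apis/apps/v1/namespaces/{namespace}/deployments/{name}/scale
function scalePatch(replicas) {
  return { spec: { replicas } };
}
```

A real loop would poll every few seconds and add a cool-down, so the replica count does not flap when the queue length hovers around a threshold.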

Related

Are condition variables, message queues used for inter-process communication, and message brokers (Kafka) the same?

In a university-level course on operating systems, we were told that inter-process communication can take place using message queues.
Also, in multi-threading, condition variables are used with queues to solve the producer-consumer problem.
Recently I have been working with Kafka.
Are the three above (Kafka, message queues in inter-process communication, and condition variables in multi-threading) the same?
Thanks in advance
A message queue / broker is not the same as a queue data structure.
Queue data structures and inter-process communication mechanisms operate within the context of one machine, while message brokers are often external servers, sometimes made up of several machines, that serve their clients over a TCP-based protocol. You'd need to dig into their source code to find out how each server uses threads and locking, but typical users shouldn't need to concern themselves with that.

Node.js Cluster with MQTT client [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I have a Node.js engine that uses MQTT to subscribe to messages from many IoT devices. As the number of IoT devices has increased, I want to run the Node.js engine in cluster mode. This results in every cluster worker receiving all the MQTT messages. Is there a way to avoid this, so that each MQTT message is received only once and the load of servicing the messages is distributed equally?
Setup:
Node.js engine with MQTT client running via pm2 on an EC2 instance.
MQTT broker running on another EC2 instance.
You need to use a broker that supports shared subscriptions.
This allows multiple clients to connect to the broker and all subscribe to the same topic(s); the broker then delivers each message to only one client in the group, typically in a round-robin fashion (the exact distribution strategy is up to the broker).
Shared subscriptions are an optional part of the MQTT v5 spec, and some brokers have non-standard implementations for MQTT v3 clients.
You can read more about shared subscriptions here
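In MQTT v5 a shared subscription is requested by prefixing the topic filter with `$share/{ShareName}/`. A small helper for building that filter is sketched below; the share name `engine` and the `devices/+/telemetry` topic in the usage note are hypothetical.

```javascript
// Build an MQTT v5 shared-subscription filter: $share/{ShareName}/{TopicFilter}.
// Every worker that subscribes with the same share name joins one group, and the broker
// delivers each matching message to only one member of that group.
function sharedTopic(shareName, topicFilter) {
  // The spec requires a non-empty share name without "/", "+" or "#".
  if (shareName.length === 0 || /[\/+#]/.test(shareName)) {
    throw new Error(`invalid share name: ${shareName}`);
  }
  if (topicFilter.length === 0) {
    throw new Error("empty topic filter");
  }
  return `$share/${shareName}/${topicFilter}`;
}
```

With the mqtt.js client, each pm2 worker would connect with `protocolVersion: 5` and call `client.subscribe(sharedTopic("engine", "devices/+/telemetry"), { qos: 1 })`. Because all workers use the same share name, the broker spreads the messages across them instead of copying each one to every worker.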

Azure Service Bus message time to live setting [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I would like to ask what the best practice is for the Azure Service Bus message TTL (time-to-live) option - https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-expiration.
We use Azure Service Bus to import data from one system to another; the number of records is a couple of million.
Briefly, this option tells ASB how long a message can stay in a queue or topic before it is moved to the dead-letter queue (if that is configured) - https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dead-letter-queues#moving-messages-to-the-dlq.
However, I cannot find how the TTL value impacts ASB throughput and performance. What is the difference between setting the TTL to 5 minutes, 1 hour and 20 hours in terms of ASB queue/topic performance?
Thank you in advance
The time-to-live property sets the expiration window for messages in Service Bus.
Once the configured TTL has elapsed, a message is either moved to the dead-letter queue (if dead-lettering on expiration is enabled) or removed from the queue. How to use this property depends on the use case.
For example, if I am sure that my system will not go down and will pick up messages as soon as they are enqueued, I will configure the TTL to a very short window, say 1 minute (this also helps verify that the system is working, by monitoring the dead-letter queue length). If my system is not reliable, or it runs only once a day to process the messages, then I should use a higher value for this property, so that the messages stay in the queue long enough for the system to process them.
As for performance, higher TTL values should not noticeably reduce queue performance.
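The answer's rule of thumb (the TTL must comfortably exceed the longest time a message may wait before a consumer picks it up) can be sketched as follows. The 2x safety factor is a hypothetical choice, not a Service Bus recommendation; `timeToLive` (in milliseconds) is the per-message property of a message passed to `ServiceBusSender.sendMessages()` in the `@azure/service-bus` JavaScript SDK.

```javascript
// Pick a message TTL from how long a message may wait before a consumer runs.
// The 2x safety factor is a hypothetical default.
function chooseTtlMs(maxWaitBeforeProcessingMs, safetyFactor = 2) {
  return Math.ceil(maxWaitBeforeProcessingMs * safetyFactor);
}

// A message expires once enqueuedAt + ttl has passed; with dead-lettering on
// expiration enabled it moves to the dead-letter queue, otherwise it is dropped.
function isExpired(enqueuedAtMs, ttlMs, nowMs) {
  return nowMs >= enqueuedAtMs + ttlMs;
}

// Message shape for ServiceBusSender.sendMessages() in @azure/service-bus (TTL in ms).
function buildMessage(body, ttlMs) {
  return { body, timeToLive: ttlMs };
}

const MINUTE = 60 * 1000;
const DAY = 24 * 60 * MINUTE;
// Always-on consumer that drains within ~30 s: a short TTL (the answer suggests ~1 minute).
const alwaysOnTtl = chooseTtlMs(30 * 1000); // 60,000 ms
// Batch consumer that runs once a day: the TTL must outlive the gap between runs.
const dailyBatchTtl = chooseTtlMs(DAY); // 2 days
```

With the daily-batch TTL, a message enqueued right after a run is still available when the next run starts 24 hours later; with a 1-minute TTL it would have expired long before.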

How to set partition in Azure Event Hub consumer Java code [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I want to know the purpose of the hostname in EventProcessorHost and how to set the partition on the consumer side. Right now I am able to get data from the consumer group, but all partitions go to the output stream.
Questions:
1. How to set the partition via Java code.
2. Use of the hostname in EventProcessorHost.
3. An example in Java code with multiple consumers, each with its own partition.
I would highly appreciate any help.
There is a complete Java example; see the docs.
You don't need to set a partition when you use an EventProcessorHost. Instead, each instance leases the partitions it will work on. So if you created the event hub with, say, 4 partitions, you should instantiate 4 EventProcessorHost instances to get better throughput. See the linked docs as well:
This tutorial uses a single instance of EventProcessorHost. To increase throughput, it is recommended that you run multiple instances of EventProcessorHost, preferably on separate machines. This provides redundancy as well. In those cases, the various instances automatically coordinate with each other in order to load balance the received events.
Leases are given out for a limited time only; after that, another receiver can take over the lease. If you give it a while, you should notice that all instances retrieve data.
About the hostname:
When receiving events from different machines, it might be useful to specify names for EventProcessorHost instances based on the machines (or roles) in which they are deployed.
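To make the lease behaviour concrete, here is a deliberately simplified model (not the actual EventProcessorHost balancing algorithm): each host, identified by its host name, owns partition leases that expire unless renewed, running hosts aim for an even share, and a lease left behind by a stopped host can be taken over once it expires. `LEASE_MS` and the host names are hypothetical.

```javascript
// Simplified model of lease-based partition ownership. This is NOT the actual
// EventProcessorHost algorithm; it only illustrates leases, expiry and even balancing.
const LEASE_MS = 30_000; // hypothetical lease duration

function createLeases(partitionCount) {
  return Array.from({ length: partitionCount }, (_, i) => ({
    partitionId: String(i),
    owner: null, // host name of the current owner
    expiresAt: 0,
  }));
}

// A host may take a lease that is unowned, expired, or already its own.
function tryAcquire(lease, hostName, nowMs) {
  if (lease.owner !== null && nowMs < lease.expiresAt && lease.owner !== hostName) {
    return false; // still held by another host
  }
  lease.owner = hostName;
  lease.expiresAt = nowMs + LEASE_MS;
  return true;
}

// One balancing round over the currently running hosts.
function balance(leases, hostNames, nowMs) {
  const holds = (lease, host) => lease.owner === host && nowMs < lease.expiresAt;
  // Running hosts renew the leases they still hold.
  for (const lease of leases) {
    if (hostNames.includes(lease.owner) && nowMs < lease.expiresAt) {
      lease.expiresAt = nowMs + LEASE_MS;
    }
  }
  // Each host aims for an even share: ceil(partitions / hosts).
  const target = Math.ceil(leases.length / hostNames.length);
  for (const host of hostNames) {
    for (const lease of leases) {
      if (leases.filter((l) => holds(l, host)).length >= target) break;
      if (!holds(lease, host)) tryAcquire(lease, host, nowMs);
    }
  }
  return leases;
}
```

With 4 partitions and 4 hosts each host ends up with one lease; with 4 partitions and 2 hosts each gets two; a fifth host would stay idle, which is why running more instances than partitions does not add throughput.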

Send Data to Multiple Processes in Linux [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I need to update multiple processes with several different pieces of data, at varying rates, but as fast as 10 Hz. I don't want the receiving processes to have to actively get this data, but rather have it pushed to them, so that they only have to do anything about the new data when there actually is any (no polling).
I'm only sending probably a few bytes of data to each process. The data being transmitted will not likely need to be stored permanently, at least not before being received and processed by the recipients. Also, no data is updated less frequently than once every few seconds, so receiver crashes are not a concern (once a crashed receiver recovers, it can just wait for the next update).
I've looked at unix domain sockets and UDP and a little bit at pipes and shared memory, but it seems that they don't quite fit what I'm trying to do:
Domain sockets require the sender to send a separate message to each recipient (i.e., no broadcasting/multicasting)
Shared memory has the disadvantage of having the clients check that data has been updated (unless there's a mechanism I'm not familiar with that can notify them)
UDP doesn't guarantee that the messages will arrive (probably not a problem for communication on the same computer?), and I have some concern about the overhead from the network stack (which domain sockets don't have)
The concern about TCP (and other protocols that support inter-device communication) is that there is functionality that's not needed for interprocess communication on a single device, and that that could create unnecessary overhead.
Any suggestions and direction to references and resources are appreciated.
Have you looked at ZeroMQ? It is a lightweight messaging library that supports various push/pull access patterns over several transport mechanisms.
One option is to write the records to flat files or an SQLite database on the same box.
Then keep a separate control file containing a process-shared mutex, a condition variable and a record count, mapped into the memory of the publisher and the subscribers. This is the notification mechanism.
This way you would have full history of records in the file or the database which makes it easy to replay records, debug and recover subscribers from crashes.
The publisher would:
Map the control file into memory.
Add new records to the file or the database.
Lock the mutex.
Update the record count.
Call notify_all on the condition variable.
Unlock the mutex.
The subscribers would:
Map the control file into memory.
Lock the mutex.
Wait on the condition variable till there are new records (each subscriber maintains its own count of already processed records).
Unlock the mutex.
Process the new records from the file or the database.
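Node.js has no process-shared pthread mutex or condition variable, so the closest built-in analogue of the control-file protocol above is a counter in a `SharedArrayBuffer` with `Atomics.wait`/`Atomics.notify`. This is a swapped-in technique: it works between worker threads of one Node.js process, not between unrelated processes, and the plain array standing in for the flat file or SQLite table is hypothetical. It keeps the same ordering: the publisher persists records before bumping the count and waking everyone, and each subscriber tracks how many records it has already processed.

```javascript
// Index of the published-record counter inside the shared control block.
const COUNT = 0;

// Stand-in for the memory-mapped control file: one Int32 slot on a SharedArrayBuffer.
function createControlBlock() {
  return new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));
}

// Publisher: persist records first, then publish the new count and wake all waiters.
function publish(control, store, records) {
  store.push(...records); // stand-in for appending to the flat file / SQLite table
  Atomics.add(control, COUNT, records.length);
  Atomics.notify(control, COUNT); // wakes every waiter, like notify_all
}

// Subscriber: like waiting on the condition variable. Returns immediately if records
// beyond `processed` exist; otherwise blocks for up to timeoutMs, then returns new records.
function waitForNew(control, store, processed, timeoutMs) {
  let published = Atomics.load(control, COUNT);
  if (published === processed) {
    Atomics.wait(control, COUNT, processed, timeoutMs);
    published = Atomics.load(control, COUNT);
  }
  return store.slice(processed, published);
}
```

`Atomics.wait` blocks the calling thread, so subscribers would normally run in `worker_threads` Workers that receive `control.buffer` via `workerData` and call `waitForNew` in a loop, advancing their own `processed` count after each batch.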

Resources